Wednesday, November 13, 2013

Parsing and Negating Boolean Strings in Haskell

It appears that the dailyprogrammer subreddit is back after a pretty long hiatus, and they kicked back into gear with a really interesting problem. The problem was, paraphrasing:
Given a Boolean expression as a string S, compute and print the negation of S as a string using DeMorgan's laws.
The problem is also detailed in full here.
I completed the challenge and posted my solution to reddit, but wanted to share it here as well, so here it is, with slight modifications:

This is a problem that is heavily suited to three major things that Haskell advocates: Algebraic Data Types, Pattern Matching, and Monadic Parsing.

First off, if you've had any experience with automata theory, it's pretty clear that the input language of Boolean expressions can be represented by a context free grammar. It just so happens that Haskell makes it incredibly easy to model CFGs right out of the box using Algebraic Data Types. Let's take a look at this data type representing Boolean expressions:

Simple. Now, the main problem of this challenge was actually performing the simplification of the not operation. Using pattern matching, we can directly encode these rules in the following manner:

Here we're giving a literal definition of rules for negating Boolean expressions. If you use Haskell, this is really easy to read. If you don't: stare at it for a second; you'll see what it's doing! That's the brunt of the challenge, right there. That's it. Encode a Boolean expression into an Expr and call not on it, and it will spit out a new Expr expressing the negation of your original expression. DeMorgan's laws are represented in the And and Or rules.We can also do this in a slightly modified way, using a function simplify :: Expr -> Expr that simplifies expressions and another function not = simplify . Not. Not to compute the same thing. It's a similar solution so I won't post it, but if you'd like to, feel free to experiment and/or add more simplification rules (e.g. simplify e@(And a b) = if a == b then a else e). We can also display our expressions as a string by declaring Expr an instance of Show in the following way:

Now we can type in Boolean expressions using our data type, not them, and print them out as nice expressions. But, now we are faced with, in my opinion, the tougher part of the challenge. We're able to actually compute everything we need to, but what about parsing a Boolean expression (as a string) into an Expr? We can use a monadic parsing library, namely Haskell's beloved Parsec, to do this in a rather simple way. We'll be using Parsec's Token and Expr libraries, as well as the base, in this example. Let's take a look.

We essentially define the structure of our input here and parse it into an Expr using a bunch of case-specific parsing rules. variable parses a single char into a Varparens matches and returns a SubExpr, and everything else is handled by using the convenience function buildExpressionParser along with a list of operator strings, the types they translate to and their operator precedence. Here we're using applicative style to do our parsing, but monadic style is fine too. Check this out for more on applicative style parsing.

Given that, we can define a main function to read in a file of expressions and output the negation of each of the expressions, like so:

Concise and to the point. We make sure that each line gets parsed properly, not the expressions, and print them. Here's what we get when we run the program:
inputs.txt                             --- output
a                                      --- NOT a
NOT a                                  --- a
a AND b                                --- NOT a OR NOT b 
NOT a AND b                            --- a OR NOT b
NOT (a AND b)                          --- a AND b
NOT (a OR b AND c) OR NOT(a AND NOT b) --- (a OR b AND c) AND (a AND NOT b)
Finally, here's the full (40 line!) source on Github!

Thanks for reading.


  1. I think that the SubExpr time is not necessary. The one way to handle this, might be to have a layer of expr for every tier of priority.

    In the case of OR < AND < NOT, and left-associativity,

    expr = or_expr
    or_expr = or_expr " OR " and_expr || and_expr
    and_expr = and_expr " AND " not_expr || not_expr
    not_expr = "NOT " var || parens or_expr || var

    in the case of OR = AND < NOT

    expr = and_or_expr
    and_or_expr = and_or_expr " OR " not_expr || and_or_expr " AND " not_expr || not_expr
    not_expr = "NOT " var || parens and_or_expr || var

    these layers can be parameterized by a list of some sort of course to prevent code repetition...this is really what parsec does best :). and in this way you get rid of SubExpr.

    For Showing, you can nest your pattern matches:

    instance Show Expr where
    show (Not e@(And _ _)) = "NOT (" ++ show e ++")"
    show (Not e@(Or _ _)) = "NOT (" ++ show e ++")"
    show (Not e@(Var _)) = "NOT " ++ show e
    show (And e1 e2) = show e1 ++ " AND " ++ show e2
    show (Or e1 e2) = show e1 ++ " OR " ++ show e2
    show (Var c) = [c]

    1. Thanks for your input! I may make some changes to this post soon. I'm currently studying for an exam and can't get to it, but I appreciate your taking the time to let me know how to do this better.