Abstract Nonsense: haskell

Showing posts with label haskell. Show all posts

Thursday, December 6, 2012

Graphs and Topological Sorting in the Functional Paradigm

What is a Graph?

From Wikipedia:

In mathematics, a graph is a representation of a set of objects where some pairs of the objects are connected by links. The interconnected objects are represented by mathematical abstractions called vertices, and the links that connect some pairs of vertices are called edges.

Simply put, a graph is just a bunch of points with links between them. A road map is a simple example: roads being edges, and intersections being vertices. In fact, Google maps uses graphs for just this purpose! Graphs are widely used in a wide variety of places. Facebook uses graphs to model your friend connections and likes. In fact, the entire internet is just a giant graph; websites act as vertices, with hyperlinks as edges. Graphs are highly useful structures, as they can be used to model many different types of situations, and as such, they will be the focus of this blog post. I am going to discuss one way to represent a graph in the Haskell programming language, and how to functionally solve a common problem using graphs.

Graphs are often represented visually like this:

Graph representing abstract data

This graph links the first six letters of the alphabet in an arbitrary way. This data doesn't really mean anything, but it will serve as a simple foray into the world of graphs, and provides an initial graph to work towards representing in Haskell.

Let's get right to it; here's the data structure we'll be using, along with some convenience methods:

First we define the actual Graph a data type: It's simply a set vertices and edges in the form of 2-tuples (The tuple (a, b) connects vertex a to vertex b), which fits our definition. I've also defined the removeEdge method, which does just what you'd expect. The outbound and inbound methods find the outbound and inbound connections to any point in the graph, respectively. They make use of the polymorphic connections method in order to get this done in a small amount of code. Finally, the Graph module exports the relevant functions at the top of the file.

Now that we've got our framework in order, we can go ahead and build the graph we mentioned above:

We import the Graph module and define a simple Letter data type, then build our Graph from it. The set of vertices are the letters A, B, C, D, E, and F, and the edges are modeled as above.

Now that we know how to build graphs, we can start modeling more important information with them.

Modeling Actual Scenarios using Graphs

Suppose some of the characters from NBC's Parks and Recreation have just finished competing in a dance competition, and we know the following about their rankings:

Leslie beat April.
April beat Ann.
Ron beat April.
Ron beat Ann.
April beat Andy.
Leslie beat Ron.
Andy beat Jerry.
Ron beat Andy.
Ann beat Jerry.
Leslie beat Andy.
Ann beat Andy.

This is a little hard to understand, so why don't we model it as a graph to make it a little more readable? Each person can be represented as a vertex, with outgoing edges representing connections to the people they beat.

A graph of dance competition results

It would be nice to be able to be able to read scenarios like this from a text file containing the important data and parse it into a graph. Let's go ahead and set up a function to do this for us, so we don't have to hard-code each and every graph that we want to use:

Here's our data file, with a list of space-separated connections, one on each line:

And our parsing function:

The graphFromFile function takes a String and returns an IO (Graph String). The function reads a file, parses it into two important pieces: verts (the set of all unique strings in the file, or, our vertices) and conns (the set of connections between strings in the file). It then builds a Graph from this data, wraps it in the IO monad with return, and gives it back.

Now you might have been wondering from the beginning of this section what the ranking from the dance competition was (maybe you even figured it out on your own!). How do we do this programmatically, using our graph?

Enter Topological Sort

Again, from Wikipedia:

In computer science, a topological sort of a directed graph is a linear ordering of its vertices such that, for every edge uv, u comes before v in the ordering.

In our case, this just means that each person must come before all of the people that he or she beat in the competition, in the ordering.

The basic procedure for topological sort is as follows:

L = {} --sorted list
S = Set of vertices with no incoming connections
while S is not empty:
for each vertex v in S with no incoming connections:
push v to L
for each edge e from v to u:
remove e from graph
if u has no more incoming connections:
push u to S

if edges still exist in the graph:
error: there is at least one cycle in the graph

else return L

If you do not understand this, I urge you to work through topologically sorting a graph on paper first; it's not too tough to understand once you've done it on paper, but can get a little confusing in psuedocode.

The problem with this algorithm is that you see a ton of loops -- control structures that we do not have in Haskell. Therefore, we must rely on recursion, folds, and maps to achieve what we want to do. Here's how it looks:

Our tsort function first finds the elements in the graph with no incoming edges using the function noInbound. We pass this into a sub-routine tsort' that takes a sorted list l, a list of vertices with no incoming connections (n:s), and a graph g.

We operate on the first element of the set of vertices with no incoming connections n, finding outEdges (the outgoing edges from n), and outNodes (the nodes that n points to). We build a new graph g' with the outEdges removed, and find the nodes in g' with no inbound connections, and add them to s.

We then recursively call tsort' with these new parameters (and prepend our current n to the sorted list), until there are no more nodes to check. At this point, if the edge list in the graph is empty, all is well and we return the list of sorted elements. Otherwise, an error is thrown stating that there is at least one cycle in the graph.

Now that we've got that, we're ready to find out how everyone ranked in the dance competition!

This produces the output:
["Leslie", "Ron", "April", "Ann", "Andy", "Jerry"]

(Of course Jerry lost.)

Conclusion

As you can see, Graphs are very useful data structures. They can be used to model a huge variety of things (see how many more you can come up with, they're everywhere!). Topological sort in particular is a pretty remarkable algorithm, and can be applied in many different situations from the one above. For example, finding paths through college courses with prerequisites. It's even used in UNIX systems to schedule processes according to their dependencies.

Hope you enjoyed the post! Until next time,

Ben

Wednesday, August 15, 2012

Haskell: Implementing a Fractional Data Type and Investigating the Expansion of the Square Root of 2

A couple of weeks ago, I completed Project Euler #57: Investigating the Expansion of the Square Root of 2. I found the problem really interesting, since it required me to write up my own fractional operations (addition, division, etc.) Today, I decided that I wanted to take my minimally defined fractional library and make it a more full-featured one. First, I'll first walk through the building of the data type and implementing fractional operations, and then I'll get to my solution of the Project Euler problem.

First thing's first. A Fraction is defined as a numerator and a denominator. My fractions consist of only Integers, and here's how the data type is defined:

I wanted fractions to show up in the form x / y, so that is what the instance of Show I defined does.

So, now that we have a data type, what can we do with it? Here's a list of common functions that I think are necessary in the definition of a fraction: Addition, Subtraction, Multiplication, Division, Exponentiation, Simplification, Negation, getting the Numerator, and getting the Denominator.

Let's start with the simple ones:

Those are are numerator and denominator functions. We simply take the first or second argument of the Frac in order to get the parameter that we need, and disregard the other one (using _) since it's unused.

Simplifying is also simple enough. We just need to divide the numerator and the denominator by the greatest common divisor of the two. Since Haskell gives us a GCD function, this is pretty darn simple. Here's how it's implemented:

Easy! We just make a new fraction out of the divided values. The function `quot` is basically just integer division that truncates the result towards 0. The second half of that is unimportant in this instance, since we're dividing by the GCD of the numbers and the result will always be an integer value.

Okay, so we have a couple of functions. Great! But, what about implementing addition, subtraction, etc? Well, basically what we want to do here is define a couple of instances of numeric types for our Fractional data type.

The first instance we need to derive is Num, which has a few operations that we want:

The three "common" operators (+, -, *) are defined here, which means that we can now evaluate expressions such as Frac 1 2 * Frac 1 2. Cool, right? Those three operations are fairly self-explanatory, and the code (hopefully) isn't too tough to follow. There are also three other function definitions here, that maybe aren't quite as clear. The function negate simply turns a negative fraction positive, or a positive fraction negative. The function abs takes the absolute value of the fraction. This is fairly simple; we just use a function that maps abs (Integer absolute value) over the numerator and denominator of the fraction. The last is signum, which looks probably the most complicated, but all it actually does is tells us whether the Fraction is less than, greater than, or equal to 0 (returning -1, 1, and 0, respectively).

Cool, so since we got all of those functions out of Num, where can we find the rest? We're missing /, so we'll make our Fraction an instance of Fractional. Seems appropriate, right?

Cool, we get division out of that, which is simple enough! We just take the reciprocal of the second fraction, and multiply the two. This may look a little funky, so I'll explain that last.

The other two functions defined here are recip and fromRational. recip simply flips the numerator and denominator in a Fraction, and this is easily handled with pattern matching. fromRational takes a rational number (which is provided the numerator and denominator functions) and turns it into a Fraction. Knowing what the numerator and denominator functions are, this function is incredibly trivial.

Okay, so about that division function. Our division appears to take only one argument, but it actually takes two. (*) f f' is just syntactic sugar for f * f'. We want to compose the function f* with the function recip f', so we use the function (*), apply it to f, and then apply that to the function recip, which is then called on the second argument of the function. This kind of function is hard to read for a lot of people, but it definitely comes more naturally with Haskell practice.

Alright! We've got plenty of functions at our disposal now, so what's next? Well, we want to be able to compare Fractions, so let's go ahead and make it an instance of Eq and Ord, which allow us to check equivalency and order, respectively.

There's a lot of functions that are similar in structure to our / function from earlier, so understanding what is happening with those will make these much easier to understand.

Starting with Eq, we have two functions, /= (not equals) and ==. == simply checks to see if the simplified version of f and the simplified version of f' have the same numerators and denominators. /= basically just returns the opposite of == by calling not on the result.

Ord isn't too tough either. First we have compare, which returns a comparator (LT, GT, or EQ) based on the relationship between two Fractions. The inequality functions are all designed around this function. The max and min functions are simple, and designed around our inequality functions.

So, what's left? I decided I wanted to experiment, so I decided to also make Fraction an instance of Monoid, and give our Fraction a simpler constructor. Here's the rest!

The instance of Monoid defines a couple of functions: mempty, mappend, and mconcat. mempty is the minimal value for the Monoid. I chose 0 (which gets automatically boxed into a Fraction). mappend is a combiner function, and I thought that addition of fractions fit this bill well enough, so I simply gave it an alias. mconcat concatenates a list of fractions with some function, and in our case, sums the entire list.

Our type constructor (%) takes two integers and boxes them up into a Fraction, which is then simplified. Easy enough.

One final note on all of this. You may have noticed that exponentiation (^) isn't implemented here. But, it actually is! Turns out, any data type that has a Num instance defined (like our Fraction) can use the ^ operator. So things like (1 % 2)^2 actually get evaluated properly. (In this case, to 1 % 4).

All right! Let's use this library to solve Project Euler #57! Knowing how to use the Fraction library, this should be relatively easy to follow. Here we go:

Fairly simple. We directly implement the function and run it on each iteration, building a list as we go. We then filter our list so that the only entries left are the ones that have more digits in the numerator than the denominator. Lastly, we evaluate the length of that list, which gives us the answer.

Pretty neat, huh? Here's the full Fraction.hs source file:

Until next time!

Ben

Sunday, August 5, 2012

My first real code golf program!

I've always been interested in Code golf, which essentially boils down to creating the shortest possible program that completes some task.

I finally got around to completing one of these challenges, and though I wasn't really that close to winning, it was fun to mess around with it it was one of the shorter Haskell solutions!

So, the challenge I attempted was Reddit's Tiny Code Challenge #4, asking for the shortest program that finds the longest identical subsequence in a string.

Directly from the challenge page:

For example:
aaabbbbccc aabccdcdd -> aab, bcc
adeaabbbbccc aabccdedcdd -> aab, bcc
abcdef ghijk -> nothing
abcdef fghijk -> f

So, here was my solution in Haskell:

Let's break this down into steps, since it probably looks like a bunch of mumbo-jumbo!

And there you have it! 193 characters of Haskell code got us a pretty cool program.

Here's a little bit of output:

>tc04.exe "aaabbbbccc" "aabccdcdd"

>"bcc"
>tc04.exe "abcdef" "ghijk"
>tc04.exe "abcdef" "fghijk"
>"f"
>tc04.exe "fhqwhgads" "where are all the dragons?"
>"wh"

Cool! :)

-Ben

Tuesday, July 17, 2012

Introducing fhck, a brainf*ck interpreter written in Haskell!

If you haven't heard of brainf*ck, you should totally check it out, because it's probably the coolest programming language of them all!

Also, probably one of the simplest...

That's why I decided to write an interpreter for it!

This is the product of about 3 days of work, give or take, but the sessions were long and I put a lot into it.

If you're a Haskeller reading this, please feel free to critique it (I'm sure it could be improved in many places)!

If you're not, please feel free to check it out anyway! I encourage everyone to play around with brainfuck a little bit because it's actually a lot of fun.

If you're not familiar with the language, a quick look through brainfuck's Wikipedia page should help you to understand what's going on in brainfuck (it's not too complex).

With the interpreter, you can write a file, say thisisbf.b, containing the classic "Hello, world!" brainfuck program directly from the wikipedia page:

and pass it to the interpreter:

on Windows:

$ build\windows\fhck.exe thisisbrainfuck.b

or Linux (or presumably Mac OSX, though I haven't tried):

$ build/linux/fhck thisisbrainfuck.b

...and it should print "Hello, world!" to your console!

Neat, right? (I think so!)

Download link/source is located here: fhck 1.0

Enjoy!

Wednesday, July 11, 2012

99 Haskell Problems #13: Run-length Encoding

I've been working on The Ninety-Nine Haskell Problems lately, and I've come across a particular one that seems to be practical, and was a lot of fun to implement!

The 99 Problems start out with a lot of list-based exercises, and eventually you're asked to build up to a point where you can implement a function that performs Run-length encoding on a String. Basically what this means is, if we have a string, say "aabbcccd", we'd group together the adjacent, identical characters, count them, and then output the characters along with their counts in a new list.

Thus, the previous example "aabbcccd" would output something like this: a2b2c3d

A couple of the 99 Problems ask you to implement this indirectly, meaning, actually group the identical characters into sub-lists and count the length of those lists. This is fairly trivial, making use of Haskell's built-in function group, which takes a list and separates it in exactly the manner we would need in order to get sub-lists of adjacent identical values.

The real fun came in Problem 13, where it asks for a direct implementation of the run-length encoding algorithm, meaning, we're no longer allowed to split up the groups in the way that we were before. Additionally, the problem asks that we use a data wrapper on each ending value so we are able to discern Single values (example in above: d) vs Multiple values (ex: a2).

Here's what I came up with:

Let's take a look at this.

So, first I defined a new data type called Count that can be either a Single that contains a character, or a Multiple that contains a character and it's count (as an Int)

What encodeDirect does is actually parses a list of Chars (which, in Haskell, can be represented as a String, like "aabbcccd") into a list that looks something like this:

[Multiple 2 a, Multiple 2 b, Multiple 3 c, Single d]

The procedure is as follows:

First, we use encode every character in our input set by using a foldr and an encode function in order to get a list that contains 2-tuples with our character as the second value, and its count as the first. ( [(2, 'a'), (2, 'b') ... ] )

The encode function might look a little cryptic, but broken down, it makes sense. First, if we have an empty resultant list of Counts, we need to put the first character we encounter in the resultant list. Once we have a resultant list (no matter how small), we are able to encode the other values by following some logic:
1. If the first character value in the resultant list matches what we are currently trying encode, we simply need to step up the count on that entry.
2. Otherwise, we need to add the new character to the front of the list with a count of 1.

When the foldr is complete, we have a nice list that looks just like how I described previously.

Now, to get everything wrapped up into a Count declaration, we simply map the toCount function over the list we just generated. The toCount function takes one of our 2-tuple values and creates a Single value out of it if its count is equivalent to 1, otherwise it packs it into a Multiple value.

Once this is mapped, we have our resultant list!

encodeDirect "aabbcccd" = [Multiple 2 a, Multiple 2 b, Multiple 3 c, Single d]

PS:I picked up a bit of new syntax from a video I watched last night, located here: Sokoban Live Coding. I learned a lot from it. If you're interested in Haskell, check it out.

Until next time!

5outh

Tuesday, April 3, 2012

Functional Sorting in Haskell

Sorting is a fairly major topic of study when it comes to programming, and there are tons of ways to do it. I don't know how interesting these algorithms are to other people, but they have always been an interest of mine. I think they're neat because they illustrate that there can be several ways to tackle even the simplest of tasks (which, in my opinion, is one of the things that makes programming so fun).

Since I'm always looking for new things to try, I've decided to take a few of the common sorting algorithms and implement them on my own in Haskell. For those of you who aren't familiar with it, here's a brief overview.

Most programming languages (Java, C++, Python, Ruby, Javascript, etc) are defined as imperative, which basically means that programs are executed line-by-line. Many of the aforementioned languages are also defined as object-oriented, meaning that virtually everything is a part of an object. I won't get into the details about OOP here, since we're talking about Haskell, but you can read about it if you'd like. Haskell differs from these languages by conforming to a paradigm in which everything is defined as a function. Objects are not present here, and iterative steps don't quite work in the same way -- nothing is being executed line-by-line -- all instructions are defined in functions. This is a rather mind-blowing concept, but it's extremely cool once you can get a mental grasp on it.

Back to the point of the blog post. I'll be covering how to implement three sorting algorithms in a functional manner: Bubble Sort, Insertion Sort, and finally, Quicksort. If you don't have a background in Haskell, this might all come off as programming jargon, but if you're looking for an interesting new look at these algorithms you've seen a billion times implemented imperatively, or are interested in picking up Haskell as a hobby or something, I'd recommend reading on.

Bubble Sort:

The bubble sort algorithm is as follows:

1. Take an array's first two elements (we'll call the first one x, and the second one y)

2. If x is less than y, leave those elements alone. If not, swap y with x, so that the array now reads

[y, x, (the rest of the array)]

3. Repeat this for each element in the array, swapping as it goes along.

4. Check to see if the list is sorted. If not, rerun the algorithm on the result of the last iteration of the function.

5. When list is sorted, return the result.

Here it is in Haskell:

Rather long, but effective. Bubble Sort isn't the most efficient sorting algorithm, especially since it has to do all of that checking if it's already sorted. I'm not a big fan of this one.

Insertion Sort:
Insertion sort is pretty simple to follow. We start with an array, as always, and then follow these steps:
1. Pop the first element of the array off of the array, and populate a new array with it.
2. For each element after the first in the array, insert it at it's proper sorted location in the array.
3. Return the final list of sorted numbers.

Let's see what this looks like in Haskell:

How elegant is that? This could take many lines in an imperative language, but we make use of Haskell's recursive foldr method to populate our sorted list, and the appropriately named function insert taken from the Data.List package helps us to populate it in the exact manner we're looking for.

Quicksort:
The Quicksort algorithm can be pretty intimidating, especially at first glance. This is because it makes use of recursion to accomplish it's sorting, which is a particularly tough concept to fully understand when it comes to programming. The instructions for Quicksort look a little something like this:
1. Choose any point in the array as a "pivot" point.
2. Create two new arrays, one populated with the numbers in the array less than the pivot, and another with the numbers greater than the pivot.
3. Recursively call Quicksort on both of the lists, until these lists (inevitably) turn up empty, at which point the function returns an empty list.
4. At this point, the recursion unwinds itself, and we concatenate the recursively sorted arrays together. (I would recommend reading up on this algorithm if the above instructions do not make sense; I am not the best at explaining things)

Sounds complicated, right? Here's a Haskell implementation:

What? It only takes five lines to implement Quicksort? This is why Haskell is so cool to me. Compare this with something like, say, a Java implementation, and you'll see what I mean. It's just so simple! We're simply defining quickSort as a quickSorted list of smaller numbers concatenated with a singleton list containing a pivot, concatenated with a quickSorted list of larger numbers. Easy to read, though the algorithm may seem very complicated.

Hope you enjoyed the read!

-Ben