Facebull Problem Definition

I mentioned earlier that the puzzle named “Facebull” in Facebook, is a variation of the traveling salesman problem. Let us define the facebull problem more formally and see if it is really true.

As I mentioned before, this problem can be modeled as a graph problem, where every chemical compound is a vertex and every machine is a directed edge in the graph. The cost of the machine is the weight of the edge in the graph. In this model, the following information is given in the problem definition.

  • The given graph is a weighted strongly connected directed graph, because every vertex can be reached from any other vertex.
  • The given graph need not be a fully connected graph, because every pair of vertices need not have a edge.

Now, the following are the unknowns,

  • We want to find the set of edges among the given set of edges, such that every vertex can still be reached from any other vertex.
  • When there are multiple such subsets, we want to find the subset that is the minimum weight sum of its edges.

Thus facebull(FB) problem can be stated concisely as below.

Given a strongly connected directed graph G = (V, E) with vertex set V and weighted edge set E, find the strongly connected subgraph G’ = (V, E’), with E’ ⊆ E having minimum total weight.

The definition of a asymmetric travelling salesman problem(TSP) is as below.

Given a fully connected directed graph, find the minimum cost tour that visits every vertex exactly once and returns to the starting vertex.

A tour that visits every vertex of a graph exactly once is also known as a hamiltonian cycle.

Any fully connected graph is also strongly connected, so the inputs to TSP and FB could be the same graph. Usually in a real TSP the distances between the vertices are Euclidean distances, this is not true for the FB problem.

The main difference between the TSP and the FB problem is the unknown in the problem. In TSP, the unknown is a cycle in the graph. In FB problem, the unknown is a set of edges, with the only constraint that the resultant graph be strongly connected. Another key difference is, every time a edge is used in a TSP solution, a cost is added to the solution. However, a edge that is added to the FB solution is a machine that is bought and therefore could be used multiple times at no cost.

Finally, consider the computational complexity of TSP and the FB problem. TSP belongs to the class of NP-Hard problems, it is O(n!) in its worst case, where n is the number of vertices of the graph. For a graph of 12 nodes, the program must consider 12! = 479001600 possibilities. A brute force solution that considers all the possibilites for a graph of 20 nodes, will take many years to complete!

There are 2^m possible subsets of edges in a graph of m edges. Therefore, the complexity of FB is 2^m, where m is the number of edges of the graph. If the given graph is fully connected, m = n², where n is the number of nodes. Therefore FB is O(2^n²) is its worst case. For a graph of 12 nodes, a solution to FB must consider 22300745198530623141535718272648361505980416 possibilites. O(2^n²) is worse than O(n!).  FB is a harder problem than TSP!

Another way of saying this is, the size of the search space for a FB solution is 2^m.  The solution can be any where within that space, so any correct algorithm must guarantee to be able to find the solution any where in that space. An algorithm can restrict the size of its search space, but only if it can prove that the solution cannot be in the unsearched area. I will talk more about some of these pruning techniques in a future blog entry.

[cross posted from my Sun Blog]

Modelling a software problem

A solution to every software problem must have two characteristics, it must be correct and and it must be efficient. One of the most important steps of solving a problem is to model the problem. A good software model of the problem determines how the problem gets solved and how efficient the solution is. Right modelling requires a good knowledge of software techniques and tools.

An engineer with good knowledge of data structures and algorithm techniques should be able to the model a single problem in multiple different ways, be able to see the trade-offs in the different models and pick a model using which the problem can be solved. Once the model has been decided, then a variety of tools can be used to implement the solution.

Here is a example. Facebook lists a variety of problems in their puzzles page that is ordered by how hard it is to solve the puzzle. Let us look at one of the hardest puzzles,  "FaceBull". The problem is to find the cheapest way to produce all the chemical compounds needed for the new super-energy drink from any single source compound. How would you model this problem?

Let each compound to be a vertex of a graph. There will be a edge between any two compounds, when there is machine that converts one compound to another. The weight of this edge is the cost of acquiring the machine that does the conversion. Let us draw this graph with the example input given in the problem.

Now in this model, the problem becomes, starting from any vertex find a path with the minimum cost that visits every vertex of this graph atleast once. This is a variation of the classic traveling salesman problem(TSP). Traveling salesman problem is a NP-complete problem. No wonder Facebook marked this as a hard problem.

We have identified a well known and well studied problem. The brute force approach to solve the problem is in the order of O(n!), where n is the number of vertices. It will take years for such a program to complete for just 20 vertices. Although it is difficult to solve TSP and to find the optimal solution for all inputs, there are many heuristics and approximation algorithms that will give a solution that is close to the optimal solution.

Notice how we transformed a chemical industry application problem into a well studied computer science problem with the right modelling.

When Nick and Thorsten were visiting the bayarea for Open HA Cluster Summit, we talked about some of the difficult problems in Solaris Cluster and how we may model them. This is one of the exciting aspects of working at Sun and at Solaris Cluster. There are many difficult problems to solve and people are eager to solve it. I  love the opportunity that every difficult problem presents. Stay tuned, we are just getting started.

[ cross posted at http://blogs.sun.com/augustus/  ]