After about 10 years at Sun, and then at Oracle, I have decided to leave to pursue other opportunities. Today, 9th April, is my last day.

My time at Sun was enjoyable. I will miss all the intelligent and friendly people at Sun. I hope Oracle and Solaris Cluster continue to do well in the future.

I will continue blogging here.


Modelling a software problem

A solution to a software problem must have two characteristics: it must be correct and it must be efficient. One of the most important steps in solving a problem is modelling it. A good software model of the problem determines how the problem gets solved and how efficient the solution is. Modelling well requires a good knowledge of software techniques and tools.

An engineer with a good knowledge of data structures and algorithms should be able to model a single problem in multiple ways, see the trade-offs between the different models, and pick the model with which the problem is best solved. Once the model has been decided, a variety of tools can be used to implement the solution.

Here is an example. Facebook lists a variety of problems on its puzzles page, ordered by how hard each puzzle is to solve. Let us look at one of the hardest puzzles, "FaceBull". The problem is to find the cheapest way to produce all the chemical compounds needed for the new super-energy drink from any single source compound. How would you model this problem?

Let each compound be a vertex of a graph. There is an edge between two compounds whenever there is a machine that converts one compound into the other. The weight of the edge is the cost of acquiring the machine that does the conversion. Let us draw this graph with the example input given in the problem.
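As a concrete sketch of this model, the graph can be captured as a weighted adjacency map. The compound names and machine costs below are hypothetical, made up purely for illustration; the puzzle's actual example input is not reproduced here.

```python
# Weighted directed graph for the model (hypothetical data):
# graph[src][dst] is the cost of the machine that converts
# compound src into compound dst.
graph = {
    "C1": {"C2": 5, "C3": 9},
    "C2": {"C1": 3, "C3": 4},
    "C3": {"C1": 7, "C2": 2},
}

def machine_cost(src, dst):
    """Return the conversion cost from src to dst, or None if no machine exists."""
    return graph.get(src, {}).get(dst)
```

Any representation with weighted edges (adjacency matrix, edge list) would serve equally well; the map form simply makes "is there a machine for this conversion?" a cheap lookup.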

Now, in this model, the problem becomes: starting from any vertex, find a path with the minimum cost that visits every vertex of the graph at least once. This is a variation of the classic traveling salesman problem (TSP). The traveling salesman problem is NP-complete. No wonder Facebook marked this as a hard problem.

We have identified a well-known and well-studied problem. The brute-force approach runs in O(n!) time, where n is the number of vertices; such a program would take years to complete for just 20 vertices. Although it is difficult to solve TSP optimally for all inputs, there are many heuristics and approximation algorithms that give a solution close to the optimal one.
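To make the O(n!) growth concrete, here is a brute-force sketch that tries every ordering of the compounds. For simplicity it assumes each compound is visited exactly once, whereas the actual puzzle allows revisits, so treat this as an illustration of the search-space size rather than a full solver; the compound names and costs are hypothetical.

```python
from itertools import permutations

# Hypothetical conversion costs: cost[src][dst] is the price of the
# machine that converts compound src into compound dst.
cost = {
    "A": {"B": 1, "C": 10},
    "B": {"A": 2, "C": 2},
    "C": {"A": 5, "B": 7},
}

def cheapest_path(cost):
    """Try every ordering of the compounds: n! orderings in total."""
    best_cost, best_path = float("inf"), None
    for path in permutations(cost):
        total, valid = 0, True
        for src, dst in zip(path, path[1:]):
            edge = cost[src].get(dst)
            if edge is None:        # no machine for this conversion
                valid = False
                break
            total += edge
        if valid and total < best_cost:
            best_cost, best_path = total, path
    return best_cost, best_path
```

With 3 compounds there are only 3! = 6 orderings to try, but with 20 compounds there are about 2.4 × 10^18, which is why the brute force takes years and why heuristics become attractive.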

Notice how we transformed a chemical industry application problem into a well studied computer science problem with the right modelling.

When Nick and Thorsten were visiting the Bay Area for the Open HA Cluster Summit, we talked about some of the difficult problems in Solaris Cluster and how we might model them. This is one of the exciting aspects of working at Sun on Solaris Cluster: there are many difficult problems to solve, and people are eager to solve them. I love the opportunity that every difficult problem presents. Stay tuned, we are just getting started.

[ cross posted at http://blogs.sun.com/augustus/  ]

Shared Nothing Storage in Open HA Cluster

Two years back I led and designed a project to make Solaris Cluster easy to use. The wizards that resulted from this effort are now a key element of the Solaris Cluster user experience, and many new projects want their features supported through the wizards. The architecture of the project even made it to the IEEE Cluster conference.

This past year I led another important effort for Solaris Cluster. This feature, Shared Nothing Storage, released as part of Open HA Cluster 2009.06, removed a major hardware requirement for the cluster: the need for shared storage. This was achieved by configuring the iSCSI protocol stack in COMSTAR in a particular fashion and layering a ZFS mirror on top. The feature allows a user to use any local disk present in the system as storage for a service and to make that service highly available.

There is no need to turn on disk fencing for this configuration, which also removes the need for SCSI reservations. Here is a picture of the configuration, with detailed configuration instructions here.

The key challenge in providing this feature was to make the cluster device subsystem robust enough to handle devices that are attached via the network. The design details are present here.

This configuration becomes more interesting when I/O multipathing is configured, because it shows the flexibility and power of the COMSTAR architecture. With COMSTAR, a single logical unit of storage can be accessed via multiple port providers, in this case multiple iSCSI targets. These multiple iSCSI targets can be used to create multiple paths to the same logical unit, which provides fast mirroring of data in the cluster configuration. If you want to understand the different configurations with multipathing, Aaron Dailey and Scott Tracy have an excellent white paper on using MPxIO on Solaris. Here is a picture of the cluster configuration with I/O multipathing.

 Try it out. Join the discussions at ha-clusters-discuss@opensolaris.org

Sometimes the snow comes down in June

June was a good, busy month. The source code to Solaris Cluster was open sourced in late May, and we have been seeing some excitement from universities so far. See here, here and here. There is great technology here that is accessible to everyone now.

My fun at home became "highly available" with the arrival of my second son, Dalvin Jonan Diraviam, in the last week of June. It is pretty exciting to see nature at its best.

The marathon training for the San Francisco marathon is in its final month now. Julianne joined our long runs a couple of times; she is an experienced runner and it was great fun to run with her. Here is the long run schedule for July. Sorry about the delay in posting this. This is the last one; you are ready for the marathon after this. Good luck.

 July 5th  13 miles
 July 12th  22 miles
 July 19th  13 miles
 July 26th  6 miles

Synchronization of common agent container security files

Solaris Cluster uses the common agent container as part of its management infrastructure. The common agent container (CAC) uses public key mechanisms for encryption and authentication. Here is the complete guide that explains CAC in a lot more detail.

In Solaris Cluster, the CAC keys must be the same on all the nodes of the cluster so that the management infrastructure can communicate with every cluster node. The cluster software ensures that these keys are identical on all the nodes. However, there are scenarios in which the keys can go out of sync. When that happens, you will start seeing errors like the following:

             ERROR: Unable to connect to the common agent container on node
             pneta1. Ensure that the common agent container is running and you
             have the required authorizations to connect to the common agent
             container on this node.

    Press RETURN to continue


 Here are the steps to correct this situation.

1. Stop CAC on all the cluster nodes

   # /usr/sbin/cacaoadm stop


2. Copy the CAC security files from one node of the cluster to all the other nodes of the cluster.

    On any one node do, 

   cd /etc/cacao/instances/default/

   tar cf /tmp/SECURITY.tar security

   then transfer /tmp/SECURITY.tar to all the other nodes and on each of them do,

   cd /etc/cacao/instances/default/

   tar xf /tmp/SECURITY.tar

   You can now remove all the copies of SECURITY.tar


3. Restart the CAC on all the cluster nodes

    /usr/sbin/cacaoadm start

 This procedure is explained in detail here. Join our communities around CAC and Solaris Cluster for more.

Changing Sun Cluster Manager port, 6789

There have been requests from people who want to change the port through which Sun Cluster Manager (SCM) is accessed. SCM, like many other web applications from Sun, is accessed through the Sun Java Web Console. By default, the Sun Java Web Console is accessed via secure HTTP on port 6789. In fact, the port numbers 6786 to 6789 are assigned to the Sun Java Web Console and no other application should use these ports.

Here is a procedure, which I used recently, for changing these ports if necessary. Maybe it will be useful for others as well.

1. Find out the version of the Sun Java Web Console that you currently have.

    /usr/sbin/smcwebserver -V

2. If the version is 3.0.2, then do the following.

   smcwebserver stop

   cd /var/webconsole/domains

   rm -rf console

   cd /etc/webconsole/console

   rm status.properties

   rm regcache/registry.properties

   edit config.properties

       Replace values for console_httpsport and console_httpport
       // If on Solaris 10, clear the service:

      svcadm clear system/webconsole:console

   smcwebserver start

3. If the version is greater than 3.0.2, then do the following.

   smcwebserver stop

   /usr/share/webconsole/bin/wcswap -t tomcat -s <nnnn> -p <nnnn>
       // If on Solaris 10, clear the service:

      svcadm clear system/webconsole:console

   smcwebserver start


IEEE Cluster 2007 Conference

Ira and I attended the IEEE Cluster 2007 conference last month. The conference was held in Austin, Texas from September 17-20. It was a technical conference with hands-on tutorials, paper presentations, poster sessions and panel discussions related to cluster computing. At this conference, cluster computing covers both high-performance cluster computing and high-availability clustering.

A poster paper that I co-authored, "CHAF – An Object Oriented Framework for Configuring Applications in a Clustered Environment", was accepted for the conference. This framework was implemented in Sun Cluster 3.2. I gave a live demonstration of the implementation on my laptop with a lab cluster at the back end. My session and demo were well received, to the extent that a Sun customer referred to them in his email to Sun later.

[ Sun Cluster Manager Task Page ]

Andy Bechtolsheim gave the opening keynote at the conference, on "Scaling to Petaflops". He talked about the challenges and opportunities of peta-scale computing and Sun's work in this area. He said that the primary challenge is scaling memory speed to match the extra compute power delivered by multiple cores.

The two main topics in the research papers and discussions were multi-core and virtualization. The panel discussion topic was multi-core computing, and the panelists were from IBM, Intel, UT Austin, nVIDIA and AMD. Prof. Steve Keckler used the phrase "termites, chainsaws and bulldozers" to refer to the different numbers of cores per chip, and it was clear by the end of the panel session that the phrase had caught on among the panelists and the audience!

Ira and I also got an opportunity to visit the Texas Advanced Computing Center at the University of Texas at Austin. The center is building a new supercomputer using Sun machines and the new Sun Magnum switch. It will be the largest supercomputer in the world when it becomes operational at the end of this year, with about 4000 nodes in the system. The whole site and the system were very impressive.

The conference organizers had arranged a social outing with a barbecue dinner and live music (two rock bands) at an Austin landmark restaurant, Stubb's. You might have seen their barbecue sauce in a local Safeway.

One thought that stayed with me after the conference came from the closing keynote, "The Challenges and Rewards of Petascale Clusters", by Mark Seager from Lawrence Livermore National Labs. He observed that technologies that are mainstream today were present in the research community at least 20 years earlier; the examples he gave were garbage collection, virtual machines, and object-oriented design. He stated that parallel programming is a technology that is not yet mainstream. I look forward to seeing Sun playing a big part in this.