Shared Nothing Storage in Open HA Cluster

Two years ago I designed and led a project to make Solaris Cluster easy to use. The wizards that resulted from this effort are now a key element of the Solaris Cluster user experience, and many new projects want their features supported through the wizards. The architecture of the project even made it to the IEEE Cluster conference.

This past year I led another important effort for Solaris Cluster. The resulting feature, Shared Nothing Storage, was released as part of Open HA Cluster 2009.06 and removes a major hardware requirement for the cluster: the need for shared storage. This is achieved by configuring the iSCSI protocol stack present in COMSTAR in a particular fashion and layering a ZFS mirror on top. The feature lets a user take any local disk in the system, use it as storage for a service, and make that service highly available.

Disk fencing does not need to be turned on for this configuration, which also removes the need for SCSI reservations. Here is a picture of the configuration, with detailed configuration instructions here.

The key challenge in providing this feature was to make the cluster device subsystem robust enough to handle devices that are attached via the network. The design details are present here.

This configuration becomes more interesting when I/O multipathing is configured, because it shows the flexibility and power of the COMSTAR architecture. With COMSTAR, a single logical unit of storage can be accessed via multiple port providers, which in this case are multiple iSCSI targets. These iSCSI targets can be used to create multiple paths to the same logical unit. This provides fast mirroring of data in the cluster configuration. If you want to understand the different multipathing configurations, Aaron Dailey and Scott Tracy have an excellent white paper on using MPxIO on Solaris. Here is a picture of the cluster configuration with I/O multipathing.
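To make the multipath idea concrete, here is a minimal toy model in Python. It is only a sketch of the concept (one logical unit reachable through several targets), not COMSTAR or MPxIO code. The class names, the GUID string, and the "first online path wins" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalUnit:
    """One backing store; the guid value is a made-up identifier."""
    guid: str
    blocks: dict = field(default_factory=dict)


@dataclass
class TargetPath:
    """One iSCSI target exposing the logical unit (hypothetical model)."""
    name: str
    lu: LogicalUnit
    online: bool = True


class MultipathDevice:
    """Presents several paths to the same logical unit as one device."""

    def __init__(self, paths):
        # Every path must lead to the same logical unit.
        if len({p.lu.guid for p in paths}) != 1:
            raise ValueError("all paths must lead to the same logical unit")
        self.paths = list(paths)

    def _active_path(self):
        # Assumed policy: use the first path that is still online.
        for path in self.paths:
            if path.online:
                return path
        raise IOError("no online path to the logical unit")

    def write(self, block, data):
        path = self._active_path()
        path.lu.blocks[block] = data
        return path.name  # report which path carried the I/O

    def read(self, block):
        return self._active_path().lu.blocks[block]
```

Because both paths point at the same `LogicalUnit`, data written through one target remains readable through the other after the first path goes offline.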

Try it out. Join the discussions at ha-clusters-discuss@opensolaris.org.

2 thoughts on “Shared Nothing Storage in Open HA Cluster”

  1. This is so incredibly innovative!
    Does the ZFS volume need to be mounted on one system at a time?
    How do you know when you are allowed to move the ZFS mounting from Node 1 to Node 2?
    What is the process for that? (i.e. if Node 1 becomes unavailable… let’s say from a network outage, how do you bring up ZFS on Node 2, and what keeps ZFS from coming up on Node 1… let’s say after a network outage is restored?)

  2. Hi David,
    Thank you. The mirrored zpool that is created using the iSCSI devices is then given to the clustering software to be managed. See the instructions here, which detail the steps:
    http://docs.sun.com/app/docs/doc/820-4682/gbspx?l=en&a=view
    The HAStoragePlus resource that is created (see the above link for details) will then handle node failures, other failures, and ZFS mounting appropriately.
    Internally, the HAStoragePlus resource would export the zpool and import the zpool at the correct node based on the cluster membership and a user-given priority order of nodes.
    Yes, the zpool can be imported on only one node at a time. The HAStoragePlus resource ensures this. The code is here:
    http://src.opensolaris.org/source/xref/ohac/ohac/usr/src/cmd/ha-services/hastorageplus/hastorageplus_zfs_private.c
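The node-selection idea described in the reply above can be sketched in a few lines of Python. This is a toy model, not the HAStoragePlus implementation. The function and class names are hypothetical, and the rule that the current owner keeps the pool while it remains a cluster member is an assumption of the sketch, not something confirmed by the reply.

```python
def choose_import_node(priority_order, members):
    """Return the highest-priority node that is a current cluster member."""
    for node in priority_order:
        if node in members:
            return node
    return None  # no eligible node is up


class PoolOwner:
    """Tracks which single node has the zpool imported (toy model)."""

    def __init__(self, priority_order):
        self.priority_order = list(priority_order)
        self.owner = None
        self.log = []  # record of import actions, for illustration

    def on_membership_change(self, members):
        # Assumption: a node that still owns the pool and is still a
        # member keeps it, so a rejoining node does not import it again.
        if self.owner in members:
            return self.owner
        self.owner = choose_import_node(self.priority_order, members)
        if self.owner is not None:
            self.log.append(("import", self.owner))
        return self.owner


pool = PoolOwner(["node1", "node2"])
pool.on_membership_change({"node1", "node2"})  # node1 imports the pool
pool.on_membership_change({"node2"})           # node1 lost, node2 imports
pool.on_membership_change({"node1", "node2"})  # node1 returns, node2 keeps it
```

In this model the pool has at most one owner at any time, which is the property the reply says HAStoragePlus enforces.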
