Virtual Keyspaces
One of the use cases for Keyspaces in Cassandra was multi-tenant applications. Unfortunately, Keyspaces use a lot of memory, to the degree that you are unlikely to run with any significant number of them. In the discussion of this topic on the Cassandra-user mailing list, the consensus was that if you need Keyspace-like functionality for a large number of Keyspaces (e.g. a user can point and click to create a new Keyspace), you have to use a single static Keyspace and simulate the different keyspaces in the application layer. Hector now has a feature that makes it much simpler to maintain these “virtual keyspaces” within your application.
The approach is similar to the “Shared Database, Shared Schema” approach to multi-tenancy often used with conventional RDBMSs, where an additional column containing some sort of tenant id is added to every table in the database schema (see 1 and 2). With Cassandra, while adding an additional column may make sense for indexed queries, much of the time you are working with row keys. One potentially simple way to implement a virtual keyspace model in Cassandra, therefore, is to prepend a tenant id to every CF row key value.
Hector does a good job of abstracting access to Cassandra away from Thrift and the native Thrift data structures, and it passes all operations through the KeyspaceService interface, which is implemented by KeyspaceServiceImpl. Virtual keyspaces are implemented by a subclass of KeyspaceServiceImpl called PrefixedKeyspaceServiceImpl. It adds the prefix to all row keys that are sent to Cassandra, removes the prefix from all keys that are returned, and discards any returned key that does not carry a matching prefix. This should have the effect of completely hiding rows that aren’t in your virtual keyspace.
Keep in mind, though, that you may well still want an indexed tenant-id column in your CF if you run lots of indexed queries. The virtual keyspace code discards returned rows whose row key isn’t prefixed with the correct tenant id, but if you also have an indexed tenant-id column and specify it in your indexed query, that will be more efficient than relying on the virtual keyspace code to filter out a large number of returned rows. Adding and using that tenant-id column is currently left up to you; the virtual keyspace code doesn’t handle that part.
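The prefix handling described above can be sketched in plain Java. This is only an illustration of the idea, not Hector's actual PrefixedKeyspaceServiceImpl code: the class name PrefixedKeys and the methods addPrefix, stripPrefix, and visibleKeys are hypothetical, and keys are assumed to be raw byte arrays.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating row-key prefixing; not part of Hector.
final class PrefixedKeys {

    // Build the key that is actually sent to Cassandra: prefix followed by the application key.
    static byte[] addPrefix(byte[] prefix, byte[] key) {
        byte[] stored = new byte[prefix.length + key.length];
        System.arraycopy(prefix, 0, stored, 0, prefix.length);
        System.arraycopy(key, 0, stored, prefix.length, key.length);
        return stored;
    }

    // Recover the application key from a returned key.
    // Returns null when the returned key does not start with this tenant's prefix.
    static byte[] stripPrefix(byte[] prefix, byte[] stored) {
        if (stored.length < prefix.length) {
            return null;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (stored[i] != prefix[i]) {
                return null;
            }
        }
        return Arrays.copyOfRange(stored, prefix.length, stored.length);
    }

    // Strip the prefix from every returned key and drop keys that belong to other tenants.
    static List<byte[]> visibleKeys(byte[] prefix, List<byte[]> returned) {
        List<byte[]> visible = new ArrayList<>();
        for (byte[] stored : returned) {
            byte[] key = stripPrefix(prefix, stored);
            if (key != null) {
                visible.add(key);
            }
        }
        return visible;
    }
}
```

Note that the filtering in visibleKeys happens on the client after the rows have already been fetched, which is why an indexed tenant-id column can be cheaper for indexed queries that would otherwise return many rows from other tenants.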
To make use of this, call HFactory.createPrefixedKeyspace rather than HFactory.createKeyspace. Unless you do this, none of the new virtual keyspace code will be in your execution path, so there won’t be any risk to your existing applications. Ideally, you should only use this with a clean, empty physical keyspace. Make sure that all of your prefixes serialize to byte arrays of equal length; the expectation is that the prefix will typically be a UUID. Note that although any prefix type that has a serializer is supported, the OrderPreservingPartitioner expects row keys to be UTF-8 encoded, so your prefix should also be a UTF-8 string or the OPP will complain.
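As a rough illustration of the equal-length requirement, the sketch below turns a UUID tenant id into a prefix in two ways: as 16 raw bytes, and as its 36-character string form encoded in UTF-8 (the form that would suit the OrderPreservingPartitioner). The class TenantPrefixes and its methods are hypothetical and use only the JDK, not Hector's serializers. One likely reason equal lengths matter, which the text above does not spell out, is that with variable-length prefixes one tenant's prefix could itself be the start of another tenant's prefix, so the prefix match would no longer separate the two.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper showing fixed-length tenant prefixes; not part of Hector.
final class TenantPrefixes {

    // Binary form: always 16 bytes (two 64-bit halves of the UUID).
    static byte[] binaryPrefix(UUID tenantId) {
        return ByteBuffer.allocate(16)
                .putLong(tenantId.getMostSignificantBits())
                .putLong(tenantId.getLeastSignificantBits())
                .array();
    }

    // Text form: the canonical 36-character UUID string, encoded as UTF-8.
    static byte[] utf8Prefix(UUID tenantId) {
        return tenantId.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Either form gives every tenant a prefix of the same length, as long as you pick one form and use it consistently within a physical keyspace.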
The unit test for it is currently a subclass of ApiV2SystemTest that performs all the same tests but using a prefixed keyspace.
1 http://iablog.sybase.com/kleisath/index.php/2009/11/multi-tenant-database-architecture-part-5/
2 http://msdn.microsoft.com/en-us/library/aa479086.aspx