Clustering
The MITREid Connect server is designed to scale horizontally, with multiple parallel instances of the server connecting to a common data store. However, there are a few important considerations to keep in mind.
All instances of a given server need to talk to a single, common data store. Connections to the data store are wrapped in transactions and considered atomic operations by the system. Therefore, a process that starts on one system should be able to complete on another. This is best shown by the handling of authorization codes, which are stored in the database when created and removed from the database once used.
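The authorization code handling described above can be sketched as follows. This is a minimal illustration, not MITREid Connect's actual code: `AuthCodeStore`, `InMemoryAuthCodeStore`, and `ServerInstance` are hypothetical names, and an in-memory map stands in for the shared database only so that the sketch runs on its own.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical interface standing in for the shared data store
// that every server instance talks to.
interface AuthCodeStore {
    void save(String code, String clientId);

    // Atomically removes the code and returns the client it was issued to,
    // or empty if the code is unknown or has already been used.
    Optional<String> consume(String code);
}

// In-memory stand-in for the database (assumption: a real deployment
// would use a transactional database shared by all instances).
class InMemoryAuthCodeStore implements AuthCodeStore {
    private final ConcurrentMap<String, String> rows = new ConcurrentHashMap<>();

    public void save(String code, String clientId) {
        rows.put(code, clientId);
    }

    public Optional<String> consume(String code) {
        // remove() is atomic: only one caller can get the stored value back.
        return Optional.ofNullable(rows.remove(code));
    }
}

// Hypothetical server instance that keeps no local state about codes.
class ServerInstance {
    private final AuthCodeStore store;

    ServerInstance(AuthCodeStore store) {
        this.store = store;
    }

    void issue(String code, String clientId) {
        store.save(code, clientId);
    }

    boolean redeem(String code, String clientId) {
        return store.consume(code).map(clientId::equals).orElse(false);
    }
}

class AuthCodeDemo {
    public static void main(String[] args) {
        AuthCodeStore shared = new InMemoryAuthCodeStore();
        ServerInstance a = new ServerInstance(shared);
        ServerInstance b = new ServerInstance(shared);

        a.issue("code-123", "client-1");
        // The code is issued on instance a and redeemed on instance b.
        System.out.println(b.redeem("code-123", "client-1")); // true
        // A second use is rejected because the code was removed on first use.
        System.out.println(a.redeem("code-123", "client-1")); // false
    }
}
```

Because neither instance holds the code locally, it does not matter which instance receives the second half of the exchange, as long as both see the same store.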
As a consequence, most data-layer caches are turned off. When items are needed from the database, they are read again instead of pulled from an in-memory cache. This approach can hinder performance at large scales but it allows for parallelization of the server.
Out of the box, MITREid Connect is built around a JPA-based data platform. This allows any JPA-compliant database to be swapped in, provided there's a connector and an appropriate schema for it. The project ships with support for HyperSQL (both in-memory and on-disk), MySQL, PostgreSQL, and Oracle databases. These connections are configured through the data-context.xml configuration file.
If the underlying database allows for clustering, this needs to be configured apart from the MITREid Connect connection to it. MITREid Connect assumes a single database entity, but most database clustering technologies hide that from the application layer. As long as that holds true, any database clustering technology should work.
For interactive pages, MITREid Connect saves certain items in the current user's session during processing of a request. For the server to be clustered, session sharing needs to be set up in the servlet container hosting the application. The methods for doing this are specific to the servlet container, but the effect is that each instance of the server transparently has access to the same session information.
There is a special exception for cases where a user's interactions are all handled by a single server, but other parts of the protocol (such as back-channel calls to the token endpoint) are handled by other servers. In these cases, the front-channel and back-channel servers do not need to share session information. Only server instances that will see a given user within a single transaction need to share sessions.
The APIs do not have any session requirements.
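One practical check before enabling session sharing is whether the objects placed in the session survive being copied between instances. The sketch below assumes the servlet container replicates sessions through Java serialization, which is common but container-specific. `PendingApproval` and `SessionCopy` are hypothetical names for illustration and are not MITREid Connect classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical session attribute saved while a user's request is in progress.
class PendingApproval implements Serializable {
    private static final long serialVersionUID = 1L;

    final String clientId;
    final String scope;

    PendingApproval(String clientId, String scope) {
        this.clientId = clientId;
        this.scope = scope;
    }
}

class SessionCopy {
    // Serializes and deserializes a value, roughly what a container might do
    // when it hands a session to another instance (assumption, see lead-in).
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Value cannot be copied between instances", e);
        }
    }
}

class SessionCopyDemo {
    public static void main(String[] args) {
        PendingApproval original = new PendingApproval("client-1", "openid");
        PendingApproval copy = SessionCopy.roundTrip(original);
        // The copy is a distinct object that carries the same data.
        System.out.println(copy != original && copy.clientId.equals(original.clientId));
    }
}
```

A value that holds a non-serializable field would fail in `roundTrip`, which is the kind of problem that otherwise only shows up once a request lands on a different instance.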
There are several database cleanup functions referenced in task-config.xml that take care of things like cleaning out expired tokens and removing orphaned data objects. These functions are designed to be called by only a single instance of the server; if multiple instances call them simultaneously, data corruption can occur. In a clustered or split deployment environment, only one server should run the tasks, and all other servers should have their task functions disabled.
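The single-runner policy can be illustrated with a small guard. This is only a sketch of the idea: in MITREid Connect the tasks themselves are set up in task-config.xml, and both the `CleanupGuard` class and the `cluster.runCleanupTasks` property name below are hypothetical, not settings the project defines.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical guard that decides whether this instance is the one
// designated to run cleanup tasks.
class CleanupGuard {
    // Hypothetical property name, chosen for this example only.
    static final String PROPERTY = "cluster.runCleanupTasks";

    private final boolean enabled;

    CleanupGuard(boolean enabled) {
        this.enabled = enabled;
    }

    static CleanupGuard fromSystemProperty() {
        // Boolean.getBoolean returns false when the property is unset,
        // so an instance runs no tasks unless it is explicitly opted in.
        return new CleanupGuard(Boolean.getBoolean(PROPERTY));
    }

    // Runs the task only on the designated instance; reports whether it ran.
    boolean runIfEnabled(Runnable task) {
        if (!enabled) {
            return false;
        }
        task.run();
        return true;
    }
}

class CleanupGuardDemo {
    public static void main(String[] args) {
        AtomicInteger runs = new AtomicInteger();
        CleanupGuard taskRunner = new CleanupGuard(true);
        CleanupGuard otherInstance = new CleanupGuard(false);

        taskRunner.runIfEnabled(runs::incrementAndGet);
        otherInstance.runIfEnabled(runs::incrementAndGet);

        // Only the designated instance executed the cleanup.
        System.out.println(runs.get()); // 1
    }
}
```

Defaulting to disabled means a newly added instance cannot start running cleanup by accident; exactly one instance would be started with `-Dcluster.runCleanupTasks=true`.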
Software is available under the Apache 2.0 license. Documentation available under the Creative Commons 3.0 By-NC license.