API access architecture #100

jace · 2017-02-11T10:03:22Z

Coaster needs to provide the foundation for API-based access to HasGeek apps.

Our current approach has tight coupling between view functions and the rendered output, whether as HTML or JSON. It also assumes a front-end and back-end developed in sync with each other, in the same repository. These assumptions change when we have a single page application (SPA) that may be long lived in the browser, going out of sync with back-end deployments. It gets worse with native apps, which can be out of sync by weeks or months.

To decouple front-end and back-end, we need some changes:

Long lived endpoints that guarantee an API regardless of actual data model. This can be via three approaches:
1. Distinct, versioned URLs in the form /api/<version>/<method> where each version can have a distinct calling pattern.
2. A REST API where the URL is the same across versions, and the version is selected via an explicit HTTP header. GitHub does this via the Accept header.
3. A hybrid model where some URLs are explicitly versioned and within each, further customisation is possible via the Accept header. Coaster's render_with facilitates this approach.
As a necessary outcome of the previous, views are now wrappers around a lower layer that handles actual business logic. This is the workflow layer. Coaster provides a docflow module for this. However, Docflow's architecture hasn't been tested with a non-trivial app and could do with more attention. It is currently only used with Kharcha, which exposed some limitations. (Update: StateManager was introduced in LabeledEnum helper property to replace Docflow #150 and Docflow is now deprecated, pending removal once the module is moved to Kharcha.)
The back-end can also be an API consumer, especially as we move to distributed data storage. Lastuser provides an OAuth-based permission request and grant workflow that allows one app to request access to resources hosted in another app. However, OAuth is limiting as it recognises the notion of a type of resource and an action on it, but not a specific resource. For example, rather than grant access to one specific jobpost in Hasjob, the user can only grant access to all jobposts in Hasjob. Google's libmacaroons provides a framework for addressing this, but we need to build a workflow around it.
Another concern with decoupled front-ends and back-ends is that a front-end may have a data requirement that the back-end API does not provide. We have seen this with Funnel's JSON API to the Android app, where the API is too limiting and results in unnecessarily verbose data transfer. Since the projects are separately maintained, having requirements synchronised is a challenge. One approach to this problem is to expose a query API and require the front-end to have an intimate knowledge of the required data model. Facebook's GraphQL is a viable candidate.
GraphQL introduces a new problem. If we link it to SQLAlchemy, we risk exposing sensitive data to a third party. This has been a known problem with Ruby on Rails and automatic form construction. In HasGeek apps we always wrap db model write access with a form. However, this is inadequate:
1. Read access isn't wrapped. A view can still accidentally expose data the caller isn't authorised to receive. Coaster's permission model (provided in sqlalchemy.PermissionMixin and enforced in views.load_models only checks for the authorisation to call a view. We do not have any mechanism to define what attributes a caller is authorised to receive. This weakness is visible in Funnel's JSON API, which has a bunch of if conditions to determine if data should be included in the results. We need an equivalent to PermissionMixin that specifies the conditions for read and write access to attributes on the model.
2. Forms are shallow, providing all or nothing access to each attribute of a db model. In the case of relationships, which represent nested attributes (the so called "document" model of document databases, rather than the flat row model of SQL databases), there is no established way to represent this data. JSON Models are an option, but will require explicit specification separate from the model, as we currently do for forms. This increases the effort required to spec out a new data model.
SQLAlchemy models are also our source of truth, superceding the actual backing database. When in doubt, we regenerate the database from the models and reimport data. Database migrations must always produce a result that matches the model definition. This unfortunately means that there is no schema versioning if we use GraphQL to directly expose SQLAlchemy. We have opposing constraints here that force a compromise layer:
1. The database can never be in two states. Database consistency cannot be compromised.
2. An API consumer can never be cut off without notice.
3. A wrapping layer is required. We can bake this into the SQLAlchemy models with attributes that wrap other attributes (using SQLAlchemy's own synonym and hybrid_property features, perhaps), but this will add layers of cruft to the model. It'll help if we can separate these out and explicitly mark their reasoning and maintenance window.
4. SQLAlchemy can't do cross-database relationships. For example, if we restrict the User, Organization and Team models to Lastuser, only storing UUID foreign keys in other apps (thereby removing these tables in all apps), an attribute like JobPost.user (in Hasjob) cannot be populated by SQLAlchemy. We will need another layer that populates this via RPC to Lastuser.
There may be a case for public vs private APIs, the latter restricted to a more tightly coordinated HasGeek team. A public API could be exposed over HTTPS while a private API requires local network access via AMQP, for example. While such private APIs will be more performant and have lower maintenance overheads, this approach has two consequences:
1. It's a break from our commitment (so far) to using the same APIs we expose to everyone else.
2. Our own native apps can't access the private API. They still need to go through the public API, which means the maintenance overheads remain. The private APIs only provide a performance boost for back-end data gathering.
Finally, when can APIs be retired? All API calls will need a logging mechanism to keep track of usage, and perhaps a rate limiter to prevent abuse. We could outsource the logger to Nginx, or we could have it as a decorator in the app, giving us more fine grained control over what data is logged.

Checklist:

Workflow management (LabeledEnum helper property to replace Docflow #150)
Column annotations (Immutable columns #138)
Role-based access control (Column-level permissions (RoleMixin) #109)
Class-based views (Class-based views #167)
Anchors (for libmacaroons) (Inbound and outbound anchors #157)
RPC-populated columns (replacing SQLAlchemy columns)
Message queue framework for two-way coordination with workers

The text was updated successfully, but these errors were encountered:

jace · 2017-02-11T10:39:10Z

This transition will not happen in a single step, so we should file separate tickets to help with each step of the process.

jace · 2017-02-11T10:54:11Z

Interestingly, GraphQL's best practices recommend not versioning. Three reasons that bring up versioning:

A field has been introduced. Since GraphQL requires it to be explicitly requested, this is not a problem.
A field has changed data type. Integer to UUID, for example. GraphQL now forces us to use a new name for changed type, leading us to condition 3.
A field is no longer present (perhaps because we changed its type from integer to UUID). We need to provide a wrapper here that emulates the old field. Since fields are only returned if requested, the overhead exists for only clients that haven't been updated.

jace · 2017-02-11T10:59:27Z

Further discussion that recommends an add-only approach to APIs: graphql/graphql-spec#134

jace · 2017-02-11T11:03:40Z

It appears we also get subscriptions for free with GraphQL. http://graphql.org/blog/subscriptions-in-graphql-and-relay/

jace · 2017-08-29T11:40:06Z

Columns on database models need additional annotations beyond what the database provides (data type, nullable, etc). Specifically, these three:

Mutable: Regular column whose value can change over time. (This may not need specific annotation.)
Immutable: This column's content is a natural key and cannot change. For example, in Lastuser's UserEmail model, the email column is a natural key. Surrogate primary keys (autoincrementing serial numbers or UUIDs) also qualify for immutability.
Cache: This column's content is a copy of a mutable field from an external source. It is not a source of truth and in case of doubt should be disregarded or refreshed.

Example: Lastuser's User model has a username field that ideally should not change, but by design users are allowed to change it as they feel. This qualifies it as a mutable field.

Since immutable and cache are the special cases, we need separate tickets to discuss their implementation.

jace · 2017-11-07T07:00:47Z

#150 describes a new workflow layer that's baked into the model. Since models are the principal objects passed around in business logic, it makes sense to host state management within the model.

This was referenced May 31, 2017

Column-level permissions (RoleMixin) #109

Closed

Column-level permissions #110

Closed

jace mentioned this issue Jun 28, 2017

Replace userid with UUID-backed buid across all principals hasgeek/lastuser#211

Merged

jace mentioned this issue Aug 3, 2017

Allow client credentials in place of auth tokens for GraphQL endpoint hasgeek/lastuser#212

Open

jace mentioned this issue Aug 29, 2017

Immutable columns #138

Closed

jace mentioned this issue Nov 7, 2017

Share campaigns hasgeek/hasmail#18

Open

This was referenced Feb 15, 2018

Class-based views #167

Merged

LabeledEnum helper property to replace Docflow #150

Closed

jace added the code-architecture label Feb 27, 2018

This was referenced Feb 28, 2018

Role access proxies should support custom methods #177

Closed

Use materialized views for all charts hasgeek/hasjob#419

Open

jace mentioned this issue Oct 1, 2018

Role Based Access Control hasgeek/lastuser#129

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API access architecture #100

API access architecture #100

jace commented Feb 11, 2017 •

edited

Loading

jace commented Feb 11, 2017

jace commented Feb 11, 2017

jace commented Feb 11, 2017

jace commented Feb 11, 2017

jace commented Aug 29, 2017

jace commented Nov 7, 2017

API access architecture #100

API access architecture #100

Comments

jace commented Feb 11, 2017 • edited Loading

jace commented Feb 11, 2017

jace commented Feb 11, 2017

jace commented Feb 11, 2017

jace commented Feb 11, 2017

jace commented Aug 29, 2017

jace commented Nov 7, 2017

jace commented Feb 11, 2017 •

edited

Loading