
[Feature request] Returning fieldASTs to be able to cache them for performance improvements #158

Closed
gyzerok opened this issue Aug 26, 2015 · 19 comments

Comments

@gyzerok

gyzerok commented Aug 26, 2015

I have a use case where my underlying data may change but the GraphQL queries stay the same. It would be nice if I could cache the result of GraphQL language parsing (presumably the fieldASTs) and pass it to the graphql function instead of the raw query string, to improve performance when I need to execute the same query repeatedly.

What do you think about this?

@leebyron
Contributor

You can do this today!

Once ahead of time:

import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';

var ast = parse(queryString);
var errors = validate(schema, ast);
if (errors.length) {
  process.exit(1);
}
cacheAwayAST(ast);

Later at runtime:

import { execute } from 'graphql/execution';
var ast = getCachedAST();
var result = execute(schema, ast);
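
cacheAwayAST and getCachedAST above are placeholders; a minimal in-memory version of that cache, keyed by the raw query string, might look like the following (getOrParseAST and the Map-based cache are illustrative names, not part of graphql-js):

import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';

// Illustrative in-memory cache keyed by the raw query string.
const astCache = new Map();

function getOrParseAST(schema, queryString) {
  let ast = astCache.get(queryString);
  if (!ast) {
    ast = parse(queryString);
    const errors = validate(schema, ast);
    if (errors.length) {
      throw new Error(errors.map(e => e.message).join('\n'));
    }
    astCache.set(queryString, ast);
  }
  return ast;
}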

@gyzerok
Author

gyzerok commented Aug 27, 2015

Awesome! We can add a doc label to mention it in the docs later on.

@gyzerok
Author

gyzerok commented Aug 27, 2015

Btw, what do you think about exporting these functions in the package root?

@KyleAMathews

Curious how much time this saves?

Also isn't the plan to cache named queries?

@leebyron
Contributor

leebyron commented Sep 1, 2015

Btw, what do you think about exporting these functions in the package root?

I'll consider that in the future. We've been trying to keep the API surface area small.

Curious how much time this saves?
Also isn't the plan to cache named queries?

Actual parse time for most queries tends to be sub-millisecond on most hardware, and single-digit milliseconds for very large queries. My guess is that between parsing a query coming in from the network and reading in a pre-parsed JSON file there are essentially zero savings, and possibly a net loss depending on how expensive the "get from cache" step is, as it may require an internal round trip to a storage mechanism which may itself need to spin a disk.

Where you can in fact make significant savings with a technique like this is in uploaded bytes. GraphQL's top-priority clients are mobile, their internet connections tend to be very poor, and GraphQL queries for complex applications tend to grow into the KB range, so upload time for a query can be a measurable part of the whole round-trip time. At one point in Facebook's history, nearly 50% of non-media transferred bytes were uploaded GraphQL.

Another point at which (more modest) savings can be made is in doing query validation once per written query, usually at build time, rather than validating a query every time you submit it to a server. For complex queries against a larger schema, this can save many milliseconds.

At Facebook we use a technique similar to what I posted above, though we store the original query string, not a parsed AST, in our query cache. We parse the cached query string to an AST on every request. We found this didn't have a meaningful impact on our server performance, and it had a slightly lighter impact on our cache since a GraphQL query string is much shorter than the equivalent printed JSON AST. It also made us feel better about what might happen if we ever wanted to change the code responsible for the AST representation, knowing that the query string itself was much more stable.

@KyleAMathews

Huh, so if I'm understanding things correctly, GraphQL can essentially switch from client-controlled queries to server-controlled RPC queries if needed as an optimization? I.e. these mobile clients are just sending the query name + vars and the actual query is fetched from a store?

@leebyron
Contributor

leebyron commented Sep 2, 2015

I.e. these mobile clients are just sending the query name + vars and the actual query is fetched from a store?

Yeah, that's pretty much right. At Facebook, we run a script during development time which submits our queries to a cache and returns a unique identifier. Clients then submit this unique identifier (along with vars) instead of the query string.
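
A rough server-side sketch of that flow; the queryStore lookup and the id format are assumptions here, since graphql-js itself only provides parse, validate, and execute:

import { parse } from 'graphql/language';
import { execute } from 'graphql/execution';

// queryStore is a hypothetical storage layer mapping a pre-registered id
// to a query string that was already validated when it was registered.
async function executePersistedQuery(schema, queryStore, queryId, variables) {
  const queryString = await queryStore.get(queryId);
  if (!queryString) {
    throw new Error('Unknown query id: ' + queryId);
  }
  const ast = parse(queryString); // parse only; validation happened at registration
  // Positional arguments follow recent graphql-js
  // (schema, document, rootValue, contextValue, variableValues);
  // older releases take slightly different arguments.
  return execute(schema, ast, null, null, variables);
}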

@jardakotesovec

I have done some measuring to see whether the GraphQL endpoint we are building could run into performance issues.

This basically confirms what @leebyron said: parsing takes under 1 ms, but validation takes 20 ms in my case for a particular (real) query generated by Relay, which is unacceptable for us.

Are there any security risks in omitting the validation step in production? For us it's OK, because our API only faces our own clients, so we know the queries are valid. But I am afraid it lowers resistance to malicious requests.

I don't see how I could cache the validation step, as Relay generates somewhat random queries, using random names for fragments and aliases.

Profiling suggests that most of the time is spent in visitor.js.
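
For anyone who wants to reproduce that kind of measurement, a small sketch follows; schema and queryString stand in for whatever you are testing, and the absolute numbers will vary with hardware and query size:

import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';

// Time a synchronous step in milliseconds using the Node.js high-resolution clock.
function time(label, fn) {
  const start = process.hrtime();
  const result = fn();
  const [s, ns] = process.hrtime(start);
  console.log(label + ': ' + (s * 1e3 + ns / 1e6).toFixed(2) + ' ms');
  return result;
}

const ast = time('parse', () => parse(queryString));
time('validate', () => validate(schema, ast));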

@KyleAMathews

I can confirm that removing validation saves ~15ms per request.

@KyleAMathews

Actually, with more careful measuring, validation with NODE_ENV=production and a warm JIT cache takes 5-8 ms on my dev box.

@KyleAMathews

(this is for a very simple query I should add)

@leebyron
Contributor

leebyron commented Oct 2, 2015

Are there any security risks in omitting the validation step in production? For us it's OK, because our API only faces our own clients, so we know the queries are valid. But I am afraid it lowers resistance to malicious requests.

There should not be security risks: only fields defined by the type system will be executed, but you could certainly issue an invalid or ambiguous request and get unexpected information back.

Doing validation in DEV and skipping it in PROD is a rough approximation of how we use GraphQL at Facebook, but we also ensure that only previously validated queries are allowed to run in PROD, which removes the whole security/ambiguity angle.

However, validation should really not be a bottleneck. GraphQL-JS has not been optimized under load, so there may be opportunities for optimization.
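
One way to approximate "validate in DEV, skip in PROD", assuming every query that reaches production has already been validated at build or registration time (the function name is illustrative):

import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';
import { execute } from 'graphql/execution';

function runQuery(schema, queryString) {
  const ast = parse(queryString);
  // Only pay the validation cost outside of production; this is only safe
  // if every query was validated at build or registration time.
  if (process.env.NODE_ENV !== 'production') {
    const errors = validate(schema, ast);
    if (errors.length) {
      throw new Error(errors.map(e => e.message).join('\n'));
    }
  }
  return execute(schema, ast);
}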

@mattkrick
Contributor

At Facebook, we run a script during development time which submits our queries to a cache and returns a unique identifier

@leebyron Do you do this statically with something like babel to replace the whole query in an uglify-ish step or do you send down an additional lookup table?

@helfer
Contributor

helfer commented Mar 22, 2016

Are there any security risks in omitting the validation step in production? For us it's OK, because our API only faces our own clients, so we know the queries are valid. But I am afraid it lowers resistance to malicious requests.

The only potential issue I see is that your server could get tied up in an infinite loop if you don't detect cycles in fragments during validation. Obviously that could also be detected during query execution.

Example:

fragment A on Person {
  id
  ...B
}

fragment B on Person {
  name
  ...A
}

For more details, see this section on cyclic fragments in the spec.
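
If you do skip full validation, one middle ground is to keep just the fragment-cycle check, since validate accepts a custom rule list as its third argument. The rule's export name and import path have varied across graphql-js releases, so treat this as a sketch:

import { validate } from 'graphql/validation';
// Exported from the package root as NoFragmentCyclesRule in recent
// graphql-js releases; older releases use a different name/path.
import { NoFragmentCyclesRule } from 'graphql';

const errors = validate(schema, ast, [NoFragmentCyclesRule]);
if (errors.length) {
  // Reject the request rather than risk unbounded fragment expansion.
  throw new Error(errors.map(e => e.message).join('\n'));
}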

@leebyron
Contributor

leebyron commented Apr 7, 2016

Closing this aging issue since it's mostly answered.

@helfer is correct though - it is only safe to execute a query that was validated at some point in time, but that point can be when you cache the query rather than when you execute it. (This is in fact how GraphQL is used at Facebook.)

Recently a section was added to the spec under Validation to make this a little clearer: http://facebook.github.io/graphql/#sec-Validation

@leebyron leebyron closed this as completed Apr 7, 2016
@leebyron
Contributor

leebyron commented Apr 7, 2016

@mattkrick to answer your question, we've used both approaches at Facebook. For build-size reasons we strive to include only the data we actually need in production builds, so we've recently removed the query text from production builds altogether and include only the cache identifier that is used in place of the full query.
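
A very rough sketch of such a build step, assuming queries live in .graphql files and the identifier is simply a content hash; this is illustrative, not the actual Facebook tooling, and schema is assumed to be constructed elsewhere:

import { createHash } from 'crypto';
import { readFileSync, writeFileSync } from 'fs';
import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';

// Validate each query once at build time, hash it, and emit a manifest the
// server can load. Production clients then ship only the id, not the query.
function buildQueryManifest(schema, queryFiles) {
  const manifest = {};
  for (const file of queryFiles) {
    const queryString = readFileSync(file, 'utf8');
    const errors = validate(schema, parse(queryString));
    if (errors.length) {
      throw new Error(file + ': ' + errors.map(e => e.message).join('\n'));
    }
    const id = createHash('sha256').update(queryString).digest('hex');
    manifest[id] = queryString;
  }
  writeFileSync('query-manifest.json', JSON.stringify(manifest, null, 2));
  return manifest;
}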

@mattkrick
Contributor

@leebyron thanks! Any plans to open source that query-to-cache-id tool? I found myself manually doing exactly what you're talking about for GraphQL subscriptions (a GraphQL query string makes for an ugly pub/sub channel name; better to have a cars/123 channel and then turn that into a GraphQL query on the server).

@leebyron
Contributor

leebyron commented Apr 7, 2016

There's not much to open source beyond the idea, but we may add hooks to express-graphql to illustrate. Obviously you would need to provide your own storage layer.

@antmdvs

antmdvs commented Aug 18, 2016

@mattkrick See here: graphql/express-graphql#109

mxstbr pushed a commit to mxstbr/apollo-server that referenced this issue Jan 10, 2019
A first RFC to kick off a discussion about caching parsed and validated queries. This is based on the ideas from graphql/graphql-js#158.

GraphQL APIs often get the same queries with different variables over and over. Even though the query string doesn't change, Apollo Server parses and validates it on every request.

Instead, we can cache the resulting AST and skip the parsing and validation step on further requests with the same query. While I don't think this will provide an order-of-magnitude speedup, it would avoid unnecessary work, could help in performance-sensitive scenarios, and doesn't add much complexity.

This commit implements this in a very crude way to kick off a discussion about including this in Apollo Server.

7 participants