
[Feature request] Returning fieldASTs to be able to cache them for performance improvements #158

Closed
gyzerok opened this issue Aug 26, 2015 · 19 comments

Comments

@gyzerok

gyzerok commented Aug 26, 2015

I have a use case where my underlying data may change but the GraphQL queries stay the same. It would be nice if I could cache the result of GraphQL language parsing (presumably the fieldASTs) and pass it to the graphql function instead of the raw query string, to improve performance when I need to execute the same query repeatedly.

What do you think about this?

@leebyron
Contributor

You can do this today!

Once ahead of time:

import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';

var ast = parse(queryString);
var errors = validate(schema, ast);
if (errors.length) {
  process.exit(1);
}
cacheAwayAST(ast);

Later at runtime:

import { execute } from 'graphql/execution';
var ast = getCachedAST();
var result = execute(schema, ast);
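
cacheAwayAST and getCachedAST above are placeholders; a minimal in-memory version of that cache, keyed by the raw query string, might look like the following (getOrParseAST and the Map-based cache are illustrative names, not part of graphql-js):

import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';

// Illustrative in-memory cache keyed by the raw query string.
const astCache = new Map();

function getOrParseAST(schema, queryString) {
  let ast = astCache.get(queryString);
  if (!ast) {
    ast = parse(queryString);
    const errors = validate(schema, ast);
    if (errors.length) {
      throw new Error(errors.map(e => e.message).join('\n'));
    }
    astCache.set(queryString, ast);
  }
  return ast;
}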

@gyzerok
Author

gyzerok commented Aug 27, 2015

Awesome! We can add a doc label to mention it in the docs later on.

@gyzerok
Author

gyzerok commented Aug 27, 2015

Btw, what do you think about exporting these functions in the package root?

@KyleAMathews

Curious how much time this saves?

Also isn't the plan to cache named queries?

@leebyron
Contributor

leebyron commented Sep 1, 2015

Btw, what do you think about exporting these functions in the package root?

I'll consider that in the future. We've been trying to keep the API surface area small.

Curious how much time this saves?
Also isn't the plan to cache named queries?

Actual parse time for most queries tends to be sub-millisecond on most hardware, and single-digit milliseconds for very large queries. My guess is that between parsing a query coming in from the network and reading in a pre-parsed JSON file there are essentially zero savings, and possibly a net loss depending on how expensive the "get from cache" step is, as it may require an internal round trip to a storage mechanism which may itself need to spin a disk.

Where you can in fact make significant savings with a technique like this is in uploaded bytes. GraphQL's top-priority clients are mobile, their internet connections tend to be very poor, and GraphQL queries for complex applications tend to grow into the KB range, so upload time for a query can be a measurable part of the whole round-trip time. At one point in Facebook's history, nearly 50% of non-media transferred bytes were uploaded GraphQL.

Another point at which (more modest) savings can be made is in doing query validation once per written query, usually at build time, rather than validating a query every time you submit it to a server. For complex queries against a larger schema, this can save many milliseconds.

At Facebook we use a technique similar to what I posted above, though we store the original query string, not a parsed AST, in our query cache. We parse the cached query string to an AST on every request. We found this didn't have a meaningful impact on our server performance, and it had a slightly lighter impact on our cache since a GraphQL query string is much shorter than the equivalent printed JSON AST. It also made us feel better about what might happen if we ever wanted to change the code responsible for the AST representation, knowing that the query string itself was much more stable.

@KyleAMathews

Huh, so if I'm understanding things correctly, GraphQL can essentially switch from client-controlled queries to server-controlled RPC queries if needed as an optimization? I.e. these mobile clients are just sending the query name + vars and the actual query is fetched from a store?

@leebyron
Contributor

leebyron commented Sep 2, 2015

I.e. these mobile clients are just sending the query name + vars and the actual query is fetched from a store?

Yeah, that's pretty much right. At Facebook, we run a script during development time which submits our queries to a cache and returns a unique identifier. Clients then submit this unique identifier (along with vars) instead of the query string.
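
A rough server-side sketch of that flow; the queryStore lookup and the id format are assumptions here, since graphql-js itself only provides parse, validate, and execute:

import { parse } from 'graphql/language';
import { execute } from 'graphql/execution';

// queryStore is a hypothetical storage layer mapping a pre-registered id
// to a query string that was already validated when it was registered.
async function executePersistedQuery(schema, queryStore, queryId, variables) {
  const queryString = await queryStore.get(queryId);
  if (!queryString) {
    throw new Error('Unknown query id: ' + queryId);
  }
  const ast = parse(queryString); // parse only; validation happened at registration
  // Positional arguments follow recent graphql-js
  // (schema, document, rootValue, contextValue, variableValues);
  // older releases take slightly different arguments.
  return execute(schema, ast, null, null, variables);
}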

@jardakotesovec

I have done some measuring to see whether the GraphQL endpoint we are building could run into performance issues.

This basically confirms what @leebyron said: parsing takes under 1 ms, but validation takes 20 ms in my case for a particular (real) query generated by Relay, which is unacceptable for us.

Are there any security risks in omitting the validation step in production? For us it's OK, because our API only faces our own clients, so we know the queries are valid. But I am afraid it lowers resistance to malicious requests.

I don't see how I could cache the validation step, as Relay generates somewhat random queries, using random names for fragments and aliases.

Profiling suggests that most of the time is spent in visitor.js.
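
For anyone who wants to reproduce that kind of measurement, a small sketch follows; schema and queryString stand in for whatever you are testing, and the absolute numbers will vary with hardware and query size:

import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';

// Time a synchronous step in milliseconds using the Node.js high-resolution clock.
function time(label, fn) {
  const start = process.hrtime();
  const result = fn();
  const [s, ns] = process.hrtime(start);
  console.log(label + ': ' + (s * 1e3 + ns / 1e6).toFixed(2) + ' ms');
  return result;
}

const ast = time('parse', () => parse(queryString));
time('validate', () => validate(schema, ast));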

@KyleAMathews

I can confirm that removing validation saves ~15ms per request.

@KyleAMathews

Actually, with more careful measuring, validation with NODE_ENV=production and a warm JIT cache takes 5-8 ms on my dev box.

@KyleAMathews

(this is for a very simple query I should add)

@leebyron
Contributor

leebyron commented Oct 2, 2015

Are there any security risks in omitting the validation step in production? For us it's OK, because our API only faces our own clients, so we know the queries are valid. But I am afraid it lowers resistance to malicious requests.

There should not be security risks: only fields defined by the type system will be executed, but you could certainly issue an invalid or ambiguous request and get unexpected information back.

Doing validation in DEV and skipping it in PROD is a rough approximation of how we use GraphQL at Facebook, but we also ensure that only previously validated queries are allowed to run in PROD, which removes the whole security/ambiguity angle.

However, validation should really not be a bottleneck. GraphQL-JS has not been optimized under load, so there may be opportunities for optimization.
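
One way to approximate "validate in DEV, skip in PROD", assuming every query that reaches production has already been validated at build or registration time (the function name is illustrative):

import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';
import { execute } from 'graphql/execution';

function runQuery(schema, queryString) {
  const ast = parse(queryString);
  // Only pay the validation cost outside of production; this is only safe
  // if every query was validated at build or registration time.
  if (process.env.NODE_ENV !== 'production') {
    const errors = validate(schema, ast);
    if (errors.length) {
      throw new Error(errors.map(e => e.message).join('\n'));
    }
  }
  return execute(schema, ast);
}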

@mattkrick
Contributor

At Facebook, we run a script during development time which submits our queries to a cache and returns a unique identifier

@leebyron Do you do this statically with something like babel to replace the whole query in an uglify-ish step or do you send down an additional lookup table?

@helfer
Contributor

helfer commented Mar 22, 2016

Are there any security risks in omitting the validation step in production? For us it's OK, because our API only faces our own clients, so we know the queries are valid. But I am afraid it lowers resistance to malicious requests.

The only potential issue I see is that your server could get tied up in an infinite loop if you don't detect cycles in fragments during validation. Obviously that could also be detected during query execution.

Example:

fragment A on Person {
  id
  ...B
}

fragment B on Person {
  name
  ...A
}

For more details, see this section on cyclic fragments in the spec.
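
If you do skip full validation, one middle ground is to keep just the fragment-cycle check, since validate accepts a custom rule list as its third argument. The rule's export name and import path have varied across graphql-js releases, so treat this as a sketch:

import { validate } from 'graphql/validation';
// Exported from the package root as NoFragmentCyclesRule in recent
// graphql-js releases; older releases use a different name/path.
import { NoFragmentCyclesRule } from 'graphql';

const errors = validate(schema, ast, [NoFragmentCyclesRule]);
if (errors.length) {
  // Reject the request rather than risk unbounded fragment expansion.
  throw new Error(errors.map(e => e.message).join('\n'));
}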

@leebyron
Contributor

leebyron commented Apr 7, 2016

Closing this aging issue since it's mostly answered.

@helfer is correct though - it is only safe to execute a query that was validated at some point in time, but that point can be when you cache the query rather than when you execute it. (This is in fact how GraphQL is used at Facebook.)

Recently a section was added to the spec under Validation to make this a little clearer: http://facebook.github.io/graphql/#sec-Validation

@leebyron leebyron closed this as completed Apr 7, 2016
@leebyron
Contributor

leebyron commented Apr 7, 2016

@mattkrick to answer your question, we've used both approaches at Facebook. For build-size reasons we strive to include only the data we actually need in production builds, so we've recently removed the query text from production builds altogether and include only the cache identifier that is used in place of the full query.
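
A very rough sketch of such a build step, assuming queries live in .graphql files and the identifier is simply a content hash; this is illustrative, not the actual Facebook tooling, and schema is assumed to be constructed elsewhere:

import { createHash } from 'crypto';
import { readFileSync, writeFileSync } from 'fs';
import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';

// Validate each query once at build time, hash it, and emit a manifest the
// server can load. Production clients then ship only the id, not the query.
function buildQueryManifest(schema, queryFiles) {
  const manifest = {};
  for (const file of queryFiles) {
    const queryString = readFileSync(file, 'utf8');
    const errors = validate(schema, parse(queryString));
    if (errors.length) {
      throw new Error(file + ': ' + errors.map(e => e.message).join('\n'));
    }
    const id = createHash('sha256').update(queryString).digest('hex');
    manifest[id] = queryString;
  }
  writeFileSync('query-manifest.json', JSON.stringify(manifest, null, 2));
  return manifest;
}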

@mattkrick
Contributor

@leebyron thanks! Any plans to open source that query-to-cache-id tool? I found myself manually doing exactly what you're talking about for GraphQL subscriptions (a GraphQL query string makes for an ugly pub/sub channel name; better to have a cars/123 channel and then turn that into a GraphQL query on the server).

@leebyron
Contributor

leebyron commented Apr 7, 2016

There's not much to open source beyond the idea, but we may add hooks to express-graphql to illustrate. Obviously you would need to provide your own storage layer.

@antmdvs

antmdvs commented Aug 18, 2016

@mattkrick See here: graphql/express-graphql#109

mxstbr pushed a commit to mxstbr/apollo-server that referenced this issue Jan 10, 2019
A first RFC to kick off a discussion about caching parsed and validated queries. This is based on the ideas from graphql/graphql-js#158.

GraphQL APIs often get the same queries with different variables over and over. Even though the query string doesn't change, Apollo Server parses and validates it on every request.

Instead, we can cache the resulting AST and skip the parsing and validation step on further requests with the same query. While I don't think this will provide an order-of-magnitude speedup, it would avoid unnecessary work, could help in performance-sensitive scenarios, and doesn't add much complexity.

This commit implements this in a very crude way to kick off a discussion about including this in Apollo Server.

7 participants