-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
[Feature request] Returning fieldASTs to be able to cache them for performance imporvements #158
Comments
You can do this today! Once ahead of time: import { parse } from 'graphql/language';
import { validate } from 'graphql/validation';
var ast = parse(queryString);
var errors = validate(schema, ast);
if (errors.length) {
exit(1);
}
cacheAwayAST(ast); Later at runtime: import { execute } from 'graphql/execution';
var ast = getCachedAST();
var result = execute(schema, ast); |
Awesome! We can add a |
Btw, what do you think about exporting this functions in the package root? |
Curious how much time this saves? Also isn't the plan to cache named queries?
|
I'll consider that in the future. We've been trying to keep the API surface area small.
Actual parse time for most queries tends to be sub-millisecond on most hardware, and single-digit milliseconds for very large queries. my guess is that between parsing a query coming in from the network, and reading in a pre-parsed JSON file, there is 0 savings, possibly a net loss depending on how expensive the "get from cache" step is, as it may require an internal round trip to a storage mechanism which may itself need to spin a disk. Where you can in fact make a significant savings using a technique like this is in uploaded bytes. Since GraphQL's top priority clients are mobile, their internet connections tend to be very poor, and GraphQL queries for complex applications tend to grow into the KB range, upload time for a query can actually be a measurable part of the whole round trip time for a query. At one point in Facebook's history, non-media transferred bytes were nearly 50% uploaded GraphQL. Another point at which (more modest) savings can be made is in doing query validation once per written query and usually at build time, rather than validating a query every time you submit it to a server. For complex queries against larger schema, this can save many milliseconds. At Facebook we use a technique similar to what I posted above, though we store the original string query, not a parsed AST, in our query cache. We parse the cached query string to an AST on every request. We found this didn't have a meaningful impact on our server performance, and had a slightly lighter impact on our cache as the GraphQL query string is much shorter than the equivalent printed JSON AST. It also made us feel better about what might happen if we ever wanted to make changes to the code responsible for the AST representation, knowing that the query string itself was much more stable to change. |
Huh, so if I'm understanding things correctly, GraphQL can essentially switch from client-controlled queries to server-controlled RPC queries if needed as an optimization? I.e. these mobile clients are just sending the query name + vars and the actual query is fetched from a store? |
Yeah, that's pretty much right. At Facebook, we run a script during development time which submits our queries to a cache and returns back a unique identifier. Clients submit this unique identifier instead of the query string (along with vars) |
I have done some measuring to see if our graphql endpoint we are building can get into performance issues. And basically confirming what @leebyron said. Parsing under 1ms, but validation 20ms in my case for particular (real) query generated in Relay. Which is unacceptable for us. Are there some security risks when omitting validation step in production? For us its ok, because our api will be facing just to our clients, so we know that the query are ok. But I am afraid if it lowers resistance for malicious requests. I don't see how I could cache validation step, as relay generate kind of random query - using random names for fragments and aliases. Profiling suggested that it spends most time in |
I can confirm that removing validation saves ~15ms per request. |
Actually, with more careful measuring, validation with |
(this is for a very simple query I should add) |
There should not be security risks - only fields defined by type systems will be executed, but you could certainly issue an invalid or ambiguous request and get unexpected information back. Doing validation in DEV and skipping in PROD is the rough approximation of how we use GraphQL at Facebook, but we also ensure only previously validated queries are allowed to be run in PROD removing the whole security/ambiguity angle. However validation should really not be a bottleneck. GraphQL-JS has not been optimized under load so there may be opportunities for optimization. |
@leebyron Do you do this statically with something like babel to replace the whole query in an uglify-ish step or do you send down an additional lookup table? |
The only potential issue I see is that your server could get tied up in an infinite loop if you don't detect cycles in fragments during validation. Obviously that could also be detected during query execution. Example:
For more details, see this section on cyclic fragments in the spec. |
Closing this aging issue since it's mostly answered. @helfer is correct through - it is only safe to execute a query that was validated at some point in time, but it could be at the point where you cache the query, rather than the point where you execute it. (This is in fact how GraphQL is used at Facebook). Recently a section was added to the spec under validation to make this a little bit more clear http://facebook.github.io/graphql/#sec-Validation |
@mattkrick to answer your question, we've used both approaches at Facebook and strive to only include the data we actually need in production builds for build size reasons, so we've recently actually removed query text from production builds and only include this cache identifier that is used in place of the full query. |
@leebyron thanks! any plans to open source that query-to-cache-id tool? I found myself manually doing exactly what you're talking about for graphQL subscriptions (a graphQL query string makes for an ugly pubsub channel name. better to have a |
There's not much to open source beyond the idea, but we may add hooks to express-graphql to illustrate. Obviously you would need to provide your own storage layer. |
@mattkrick See here: graphql/express-graphql#109 |
A first RFC to kick off a discussion about caching parsed and validated queries. This is based on the ideas from [graphql/graphql-js#158](graphql/graphql-js#158) GraphQL APIs often get the same queries with different variables over and over. Even though the query string doesn't change, Apollo server parses and validates it on every request. Instead, we can cache the resulting AST and skip the parsing and validation step on further requests with the same query. While I don't think this will provide an order of magnitude of a speedup, it seems like it would avoid unnecessary work, could help in performance sensitive scenarios and doesn't add much complexity. This commit implements this in a very crude way to kick off a discussion about including this in Apollo Server.
I have a use-case where my underlying data may change, but GraphQL queries are still the same. I think it may be nice if I can cache result of GraphQL language parsing (suppose its fieldASTs) and pass it to
graphql
function instead of raw query to be able to imporve performance when I need to execute the same query.What do you think about this?
The text was updated successfully, but these errors were encountered: