
Bootstrapping

Peter Goodman edited this page Sep 19, 2023 · 1 revision

What does Multiplier's bootstrap process do, and why do we do it? Before reading this, you should read about PASTA's bootstrap process.

  1. Creates the serializable form that covers the various API surface areas of PASTA (pasta::Decl, pasta::Stmt, etc.). This is saved in lib/AST.capnp. Right now we use Cap'n Proto. The way we persistently represent objects is somewhat opaque: we have a concept of "typed slots," with roughly one slot per method. Slots don't have meaningful names, because the same slot can have different meanings in different classes. The key challenge is that PASTA has a class hierarchy (because of Clang), and a given object needs to be castable up and down that hierarchy without changing. Cap'n Proto doesn't support subtyping, and it exposes data through a reader interface, so it wouldn't be safe, or really even possible, to do arbitrary pointer arithmetic on its underlying storage to emulate changing the type through which we view the backing storage. Instead, all entities in a given class hierarchy (e.g. Decls) uniformly share the same serialized representation.

  2. Creates code to persist the surface area of PASTA's AST API into a serializable form (i.e. Cap'n Proto). A serializer for a class needs to call the serializers for that class's base classes, if any. Then, it needs to call each method, and perform type-specific serialization of the method's return value into a designated "slot" in the persistent representation. For example, if there is a method in PASTA returning a pasta::Decl, then the serializer will use an indexer::EntityMapper to look up the mx::RawEntityId (a 64-bit entity id) for that decl, and store that integer into the persistent form. The serializer knows how to store a variety of types (enums, integers, optionals, vectors, etc.).

  3. Creates a "clone" of PASTA's API to access serialized data as though we were dealing with a normal object graph. Whereas PASTA's methods call their Clang counterparts, the implementations of Multiplier's API methods read their data out of the serialized storage form (i.e. Cap'n Proto). For example, the PASTA counterpart of mx::Decl::canonical_declaration returns a pasta::Decl, which in persistent form is an mx::RawEntityId. The bootstrapped mx::Decl::canonical_declaration method therefore needs to read this entity id out of the slot corresponding to canonical_declaration, and then ask mx::EntityProvider::DeclFor to look up the entity id, so that it can return an mx::Decl object.

  4. Removes some PASTA methods from Multiplier's API surface. Our approach to serialization is based on saturation: we call all the methods (for which we support serialization), and if the return value is an entity that we haven't seen yet, then we queue that up for serialization too. Some methods, like pasta::Type::WithConst, could lead us to find entities that aren't needed (i.e. aren't referenced by anything in the actual code), so we don't want to include these.

  5. Removes some unsafe PASTA methods from Multiplier's API surface. Some methods cannot always be used correctly, and calling them can trigger assertion failures. There is a blacklisting mechanism to remove these methods.

  6. Canonicalizes enumerators so that they are all trivially enumerable, i.e. their entries all have default values. This means generating "migrators" from PASTA enums to Multiplier ones.

  7. Renames things in the API to be consistent.

  8. Adds convenience methods, e.g. mx::Type::tokens. In Clang and PASTA, types don't have tokens, whereas in Multiplier they do. Multiplier types can all be rendered in a printable way, because we invent their tokens (when indexing) using the pasta::PrintedTokenRange::Create API.
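
The "typed slots" idea from steps 1 to 3 can be illustrated with a small, self-contained C++ sketch. Everything in it is a hypothetical stand-in rather than Multiplier's generated code: `DeclStorage` plays the role of the Cap'n Proto record, `EntityMapper` and `EntityProvider` only mimic the roles described above, and the slot numbers are made up. It assumes one flat record per hierarchy, a serializer that calls its base-class serializer first, and reader-side views that all wrap the same storage.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a 64-bit entity id.
using RawEntityId = std::uint64_t;

// One flat record shared by every class in the Decl hierarchy. Slots are
// numbered, not named, because a slot's meaning depends on the class using it.
struct DeclStorage {
  std::vector<std::uint64_t> u64_slots;
  std::vector<bool> bool_slots;
};

// Hypothetical stand-ins for the AST objects being serialized.
struct AstDecl {
  const AstDecl *canonical = nullptr;
};
struct AstNamedDecl : AstDecl {
  bool is_hidden = false;
};

// Assigns a stable id to each AST object the first time it is seen.
struct EntityMapper {
  std::unordered_map<const AstDecl *, RawEntityId> ids;
  RawEntityId IdOf(const AstDecl *d) {
    auto [it, added] = ids.emplace(d, ids.size() + 1);
    return it->second;
  }
};

// Base-class serializer: stores the canonical declaration as an entity id.
void SerializeDecl(EntityMapper &em, const AstDecl &d, DeclStorage &out) {
  assert(d.canonical != nullptr);
  out.u64_slots.resize(1);
  out.u64_slots[0] = em.IdOf(d.canonical);  // u64 slot 0
}

// Derived-class serializer: runs the base serializer, then fills its own slots.
void SerializeNamedDecl(EntityMapper &em, const AstNamedDecl &d,
                        DeclStorage &out) {
  SerializeDecl(em, d, out);
  out.bool_slots.resize(1);
  out.bool_slots[0] = d.is_hidden;  // bool slot 0
}

// Reader-side views. Both wrap the same storage, so moving up or down the
// hierarchy never changes the underlying representation.
class Decl {
 public:
  explicit Decl(const DeclStorage *s) : storage_(s) {}
  RawEntityId canonical_declaration_id() const {
    return storage_->u64_slots.at(0);
  }

 protected:
  const DeclStorage *storage_;
};

class NamedDecl : public Decl {
 public:
  using Decl::Decl;
  bool is_hidden() const { return storage_->bool_slots.at(0); }
};

// Resolves an entity id back into a view over its stored record.
struct EntityProvider {
  std::unordered_map<RawEntityId, DeclStorage> decls;
  std::optional<Decl> DeclFor(RawEntityId id) const {
    auto it = decls.find(id);
    if (it == decls.end()) return std::nullopt;
    return Decl(&it->second);
  }
};
```

Because `Decl` and `NamedDecl` are only views over one `DeclStorage`, a lookup that returns a `Decl` can later be reinterpreted as a `NamedDecl` without touching the stored bytes, which is the property the shared representation is meant to provide.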

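The saturation approach from step 4 is essentially a worklist traversal, and excluding a method prunes everything reachable only through it. The sketch below is a minimal model under stated assumptions: entities are plain integers, each "method" is a named edge to another entity, and `Saturate`, `Graph`, and the `PointeeType` method name used in the usage note are hypothetical, not part of Multiplier or PASTA.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical model: an entity is an integer, and each of its methods
// returns another entity.
using EntityId = int;
struct Method {
  std::string name;
  EntityId result;
};
using Graph = std::unordered_map<EntityId, std::vector<Method>>;

// Returns every entity reachable from `roots` by calling methods whose names
// are not in `excluded`. Newly seen results are queued for their own visit.
std::unordered_set<EntityId> Saturate(
    const Graph &graph, const std::vector<EntityId> &roots,
    const std::unordered_set<std::string> &excluded) {
  std::unordered_set<EntityId> seen(roots.begin(), roots.end());
  std::deque<EntityId> work(roots.begin(), roots.end());
  while (!work.empty()) {
    EntityId entity = work.front();
    work.pop_front();
    auto it = graph.find(entity);
    if (it == graph.end()) continue;  // Entity has no outgoing methods.
    for (const Method &m : it->second) {
      if (excluded.count(m.name)) continue;  // Method removed from the API.
      if (seen.insert(m.result).second) work.push_back(m.result);
    }
  }
  return seen;
}
```

For instance, if entity 1 reaches entity 2 through `PointeeType` and entity 3 through `WithConst`, and entity 2 reaches entity 4 through `WithConst`, then saturating from entity 1 with `WithConst` excluded visits only entities 1 and 2, while saturating with nothing excluded visits all four.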