|
1 |
| -An informal guide to reading and working on the rustc compiler. |
2 |
| -================================================================== |
| 1 | +For more information about how rustc works, see the [rustc guide]. |
3 | 2 |
|
4 |
| -If you wish to expand on this document, or have a more experienced |
5 |
| -Rust contributor add anything else to it, please get in touch: |
6 |
| - |
7 |
| -* https://internals.rust-lang.org/ |
8 |
| -* https://chat.mibbit.com/?server=irc.mozilla.org&channel=%23rust |
9 |
| - |
10 |
| -or file a bug: |
11 |
| - |
12 |
| -https://github.com/rust-lang/rust/issues |
13 |
| - |
14 |
| -Your concerns are probably the same as someone else's. |
15 |
| - |
16 |
| -You may also be interested in the |
17 |
| -[Rust Forge](https://forge.rust-lang.org/), which includes a number of |
18 |
| -interesting bits of information. |
19 |
| - |
20 |
| -Finally, at the end of this file is a GLOSSARY defining a number of |
21 |
| -common (and not necessarily obvious!) names that are used in the Rust |
22 |
| -compiler code. If you see some funky name and you'd like to know what |
23 |
| -it stands for, check there! |
24 |
| - |
25 |
| -The crates of rustc |
26 |
| -=================== |
27 |
| - |
28 |
| -Rustc consists of a number of crates, including `syntax`, |
29 |
| -`rustc`, `rustc_back`, `rustc_trans`, `rustc_driver`, and |
30 |
| -many more. The source for each crate can be found in a directory |
31 |
| -like `src/libXXX`, where `XXX` is the crate name. |
32 |
| - |
33 |
| -(NB. The names and divisions of these crates are not set in |
34 |
| -stone and may change over time -- for the time being, we tend towards |
35 |
| -a finer-grained division to help with compilation time, though as |
36 |
| -incremental compilation improves, that may change.) |
37 |
| - |
38 |
| -The dependency structure of these crates is roughly a diamond: |
39 |
| - |
40 |
| -``` |
41 |
| - rustc_driver |
42 |
| - / | \ |
43 |
| - / | \ |
44 |
| - / | \ |
45 |
| - / v \ |
46 |
| -rustc_trans rustc_borrowck ... rustc_metadata |
47 |
| - \ | / |
48 |
| - \ | / |
49 |
| - \ | / |
50 |
| - \ v / |
51 |
| - rustc |
52 |
| - | |
53 |
| - v |
54 |
| - syntax |
55 |
| - / \ |
56 |
| - / \ |
57 |
| - syntax_pos syntax_ext |
58 |
| -``` |
59 |
| - |
60 |
| -The `rustc_driver` crate, at the top of this lattice, is effectively |
61 |
| -the "main" function for the rust compiler. It doesn't have much "real |
62 |
| -code", but instead ties together all of the code defined in the other |
63 |
| -crates and defines the overall flow of execution. (As we transition |
64 |
| -more and more to the [query model](ty/maps/README.md), however, the |
65 |
| -"flow" of compilation is becoming less centrally defined.) |
66 |
| - |
67 |
| -At the other extreme, the `rustc` crate defines the common and |
68 |
| -pervasive data structures that all the rest of the compiler uses |
69 |
| -(e.g., how to represent types, traits, and the program itself). It |
70 |
| -also contains some amount of the compiler itself, although that is |
71 |
| -relatively limited. |
72 |
| - |
73 |
| -Finally, all the crates in the bulge in the middle define the bulk of |
74 |
| -the compiler -- they all depend on `rustc`, so that they can make use |
75 |
| -of the various types defined there, and they export public routines |
76 |
| -that `rustc_driver` will invoke as needed (more and more, what these |
77 |
| -crates export are "query definitions", but those are covered later |
78 |
| -on). |
79 |
| - |
80 |
| -Below `rustc` lie various crates that make up the parser and error |
81 |
| -reporting mechanism. For historical reasons, these crates do not have |
82 |
| -the `rustc_` prefix, but they are really just as much an internal part |
83 |
| -of the compiler and not intended to be stable (though they do wind up |
84 |
| -getting used by some crates in the wild, a practice we hope to |
85 |
| -gradually phase out). |
86 |
| - |
87 |
| -Each crate has a `README.md` file that describes, at a high level, |
88 |
| -what it contains, and tries to give some kind of explanation (some |
89 |
| -better than others). |
90 |
| - |
91 |
| -The compiler process |
92 |
| -==================== |
93 |
| - |
94 |
| -The Rust compiler is in a bit of transition right now. It used to be a |
95 |
| -purely "pass-based" compiler, where we ran a number of passes over the |
96 |
| -entire program, and each did a particular check or transformation. |
97 |
| - |
98 |
| -We are gradually replacing this pass-based code with an alternative |
99 |
| -setup based on on-demand **queries**. In the query-model, we work |
100 |
| -backwards, executing a *query* that expresses our ultimate goal (e.g., |
101 |
| -"compile this crate"). This query in turn may make other queries |
102 |
| -(e.g., "get me a list of all modules in the crate"). Those queries |
103 |
| -make other queries that ultimately bottom out in the base operations, |
104 |
| -like parsing the input, running the type-checker, and so forth. This |
105 |
| -on-demand model permits us to do exciting things like only do the |
106 |
| -minimal amount of work needed to type-check a single function. It also |
107 |
| -helps with incremental compilation. (For details on defining queries, |
108 |
| -check out `src/librustc/ty/maps/README.md`.) |
109 |
| - |
110 |
| -Regardless of the general setup, the basic operations that the |
111 |
| -compiler must perform are the same. The only thing that changes is |
112 |
| -whether these operations are invoked front-to-back, or on demand. In |
113 |
| -order to compile a Rust crate, these are the general steps that we |
114 |
| -take: |
115 |
| - |
116 |
| -1. **Parsing input** |
117 |
| - - this processes the `.rs` files and produces the AST ("abstract syntax tree") |
118 |
| - - the AST is defined in `syntax/ast.rs`. It is intended to match the lexical |
119 |
| - syntax of the Rust language quite closely. |
120 |
| -2. **Name resolution, macro expansion, and configuration** |
121 |
| - - once parsing is complete, we process the AST recursively, resolving paths |
122 |
| - and expanding macros. This same process also processes `#[cfg]` nodes, and hence |
123 |
| - may strip things out of the AST as well. |
124 |
| -3. **Lowering to HIR** |
125 |
| - - Once name resolution completes, we convert the AST into the HIR, |
126 |
| - or "high-level IR". The HIR is defined in `src/librustc/hir/`; that module also includes |
127 |
| - the lowering code. |
128 |
| - - The HIR is a lightly desugared variant of the AST. It is more processed than the |
129 |
| - AST and more suitable for the analyses that follow. It is **not** required to match |
130 |
| - the syntax of the Rust language. |
131 |
| - - As a simple example, in the **AST**, we preserve the parentheses |
132 |
| - that the user wrote, so `((1 + 2) + 3)` and `1 + 2 + 3` parse |
133 |
| - into distinct trees, even though they are equivalent. In the |
134 |
| - HIR, however, parentheses nodes are removed, and those two |
135 |
| - expressions are represented in the same way. |
136 |
| -4. **Type-checking and subsequent analyses** |
137 |
| - - An important step in processing the HIR is to perform type |
138 |
| - checking. This process assigns types to every HIR expression, |
139 |
| - for example, and also is responsible for resolving some |
140 |
| - "type-dependent" paths, such as field accesses (`x.f` -- we |
141 |
| - can't know what field `f` is being accessed until we know the |
142 |
| - type of `x`) and associated type references (`T::Item` -- we |
143 |
| - can't know what type `Item` is until we know what `T` is). |
144 |
| - - Type checking creates "side-tables" (`TypeckTables`) that include |
145 |
| - the types of expressions, the way to resolve methods, and so forth. |
146 |
| - - After type-checking, we can do other analyses, such as privacy checking. |
147 |
| -5. **Lowering to MIR and post-processing** |
148 |
| - - Once type-checking is done, we can lower the HIR into MIR ("mid-level IR"), which |
149 |
| - is a **very** desugared version of Rust, well suited to borrowck and also to |
150 |
| - certain high-level optimizations. |
151 |
| -6. **Translation to LLVM and LLVM optimizations** |
152 |
| - - From MIR, we can produce LLVM IR. |
153 |
| - - LLVM then runs its various optimizations, which produces a number of `.o` files |
154 |
| - (one for each "codegen unit"). |
155 |
| -7. **Linking** |
156 |
| - - Finally, those `.o` files are linked together. |
157 |
| - |
158 |
| -Glossary |
159 |
| -======== |
160 |
| - |
161 |
| -The compiler uses a number of...idiosyncratic abbreviations and |
162 |
| -things. This glossary attempts to list them and give you a few |
163 |
| -pointers for understanding them better. |
164 |
| - |
165 |
| -- AST -- the **abstract syntax tree** produced by the `syntax` crate; reflects user syntax |
166 |
| - very closely. |
167 |
| -- codegen unit -- when we produce LLVM IR, we group the Rust code into a number of codegen |
168 |
| - units. Each of these units is processed by LLVM independently from one another, |
169 |
| - enabling parallelism. They are also the unit of incremental re-use. |
170 |
| -- cx -- we tend to use "cx" as an abbreviation for context. See also tcx, infcx, etc. |
171 |
| -- `DefId` -- an index identifying a **definition** (see `librustc/hir/def_id.rs`). Uniquely |
172 |
| - identifies a `DefPath`. |
173 |
| -- HIR -- the **High-level IR**, created by lowering and desugaring the AST. See `librustc/hir`. |
174 |
| -- `HirId` -- identifies a particular node in the HIR by combining a |
175 |
| - def-id with an "intra-definition offset". |
176 |
| -- `'gcx` -- the lifetime of the global arena (see `librustc/ty`). |
177 |
| -- generics -- the set of generic type parameters defined on a type or item |
178 |
| -- ICE -- internal compiler error; reported when the compiler itself crashes. |
179 |
| -- ICH -- incremental compilation hash. |
180 |
| -- infcx -- the inference context (see `librustc/infer`) |
181 |
| -- MIR -- the **Mid-level IR** that is created after type-checking for use by borrowck and trans. |
182 |
| - Defined in the `src/librustc/mir/` module, but much of the code that manipulates it is |
183 |
| - found in `src/librustc_mir`. |
184 |
| -- obligation -- something that must be proven by the trait system; see `librustc/traits`. |
185 |
| -- local crate -- the crate currently being compiled. |
186 |
| -- node-id or `NodeId` -- an index identifying a particular node in the |
187 |
| - AST or HIR; gradually being phased out and replaced with `HirId`. |
188 |
| -- query -- a sub-computation performed during compilation; see `librustc/ty/maps`. |
189 |
| -- provider -- the function that executes a query; see `librustc/ty/maps`. |
190 |
| -- sess -- the **compiler session**, which stores global data used throughout compilation |
191 |
| -- side tables -- because the AST and HIR are immutable once created, we often carry extra |
192 |
| - information about them in the form of hashtables, indexed by the id of a particular node. |
193 |
| -- span -- a location in the user's source code, used for error |
194 |
| - reporting primarily. These are like a file-name/line-number/column |
195 |
| - tuple on steroids: they carry a start/end point, and also track |
196 |
| - macro expansions and compiler desugaring. All while being packed |
197 |
| - into a few bytes (really, it's an index into a table). See the |
198 |
| - `Span` datatype for more. |
199 |
| -- substs -- the **substitutions** for a given generic type or item |
200 |
| - (e.g., the `i32, u32` in `HashMap<i32, u32>`) |
201 |
| -- tcx -- the "typing context", main data structure of the compiler (see `librustc/ty`). |
202 |
| -- trans -- the code to **translate** MIR into LLVM IR. |
203 |
| -- trait reference -- a trait and values for its type parameters (see `librustc/ty`). |
204 |
| -- ty -- the internal representation of a **type** (see `librustc/ty`). |
| 3 | +[rustc guide]: https://rust-lang-nursery.github.io/rustc-guide/ |
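
The removed README describes the query model as working backwards from a goal query ("compile this crate") that issues further queries until they bottom out in base operations such as parsing. A minimal sketch of that idea follows. It is only an illustration under assumptions: `Query`, `Engine`, and the numeric "results" are hypothetical stand-ins and do not reflect rustc's actual query system.

```rust
use std::collections::HashMap;

// Hypothetical query keys; these names are illustrative, not rustc's.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Query {
    Parse,             // base operation: parse the input
    TypeCheck(String), // type-check a single function
    CompileCrate,      // the ultimate goal
}

// A toy engine: each query runs at most once and its result is cached.
struct Engine {
    functions: Vec<String>,       // stand-in for the crate's source
    cache: HashMap<Query, usize>, // memoized results
    executed: Vec<Query>,         // queries in the order they finished
}

impl Engine {
    fn new(functions: &[&str]) -> Self {
        Engine {
            functions: functions.iter().map(|s| s.to_string()).collect(),
            cache: HashMap::new(),
            executed: Vec::new(),
        }
    }

    // Run a query on demand, reusing the cached result when there is one.
    fn get(&mut self, q: Query) -> usize {
        if let Some(&v) = self.cache.get(&q) {
            return v;
        }
        let v = match &q {
            // Stand-in result: the number of functions "parsed".
            Query::Parse => self.functions.len(),
            // Type-checking one function only needs the parse result.
            Query::TypeCheck(name) => {
                self.get(Query::Parse);
                name.len()
            }
            // The goal query fans out into one query per function.
            Query::CompileCrate => {
                let names = self.functions.clone();
                names
                    .into_iter()
                    .map(|n| self.get(Query::TypeCheck(n)))
                    .sum::<usize>()
            }
        };
        self.executed.push(q.clone());
        self.cache.insert(q, v);
        v
    }
}
```

Asking this engine for `Query::TypeCheck("main".to_string())` runs only `Parse` and that one type-check, which mirrors the README's point about doing the minimal work needed for a single function. A later `Query::CompileCrate` reuses both cached results.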
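
The removed README's "Lowering to HIR" step notes that the AST preserves the parentheses the user wrote while the HIR drops them, so `((1 + 2) + 3)` and `1 + 2 + 3` end up represented the same way. The sketch below illustrates that with two hypothetical, heavily simplified tree types. `AstExpr`, `HirExpr`, and `lower` are assumptions made for this example and are far simpler than rustc's real data structures.

```rust
// Hypothetical, simplified expression trees (not rustc's real types).
#[derive(Debug, PartialEq)]
enum AstExpr {
    Lit(i64),
    Add(Box<AstExpr>, Box<AstExpr>),
    Paren(Box<AstExpr>), // the AST keeps the parentheses the user wrote
}

#[derive(Debug, PartialEq)]
enum HirExpr {
    Lit(i64),
    Add(Box<HirExpr>, Box<HirExpr>),
    // No Paren variant: grouping is already encoded by the tree shape.
}

// Lower an AST expression to HIR, discarding parenthesis nodes.
fn lower(expr: &AstExpr) -> HirExpr {
    match expr {
        AstExpr::Lit(n) => HirExpr::Lit(*n),
        AstExpr::Add(l, r) => HirExpr::Add(Box::new(lower(l)), Box::new(lower(r))),
        AstExpr::Paren(inner) => lower(inner),
    }
}

// Small constructors to keep the examples readable.
fn add(l: AstExpr, r: AstExpr) -> AstExpr {
    AstExpr::Add(Box::new(l), Box::new(r))
}

fn paren(e: AstExpr) -> AstExpr {
    AstExpr::Paren(Box::new(e))
}

// `((1 + 2) + 3)` as the parser would see it, parentheses included.
fn with_parens() -> AstExpr {
    paren(add(
        paren(add(AstExpr::Lit(1), AstExpr::Lit(2))),
        AstExpr::Lit(3),
    ))
}

// `1 + 2 + 3`, which associates to the left without any parentheses.
fn without_parens() -> AstExpr {
    add(add(AstExpr::Lit(1), AstExpr::Lit(2)), AstExpr::Lit(3))
}
```

In this sketch `with_parens()` and `without_parens()` are distinct AST values, but `lower` maps both to the same `HirExpr`, which is the property the README describes.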