As mentioned in the discussion for "Should Generated Parser Source Be Committed?", most parser / grammar repositories currently do.
It turns out that the tree-sitter library has a notion of an ABI:
/**
* The latest ABI version that is supported by the current version of the
* library. When Languages are generated by the Tree-sitter CLI, they are
* assigned an ABI version number that corresponds to the current CLI version.
* The Tree-sitter library is generally backwards-compatible with languages
* generated using older CLI versions, but is not forwards-compatible.
*/
The cli is typically built against the tree-sitter library that lives in the same monorepo, and the above comment comes from the library's source.
Thus, which ABI is used for one's generated parser source (typically wired into src/parser.c), usually depends on specifically which version (or build) of the tree-sitter cli was used when invoking the generate subcommand.
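To see which ABI a given generated file was produced at, one can look inside the file itself. The following is a minimal sketch, assuming the generated src/parser.c contains a line of the form `#define LANGUAGE_VERSION <n>`; the function name `parser_abi` is made up for illustration.

```shell
# Print the ABI number recorded in a generated parser.c.
# Assumption: the file has a line like "#define LANGUAGE_VERSION 14".
# parser_abi is a hypothetical helper name, not part of tree-sitter.
parser_abi() {
  # Print the third field of the first matching #define, then stop.
  awk '$1 == "#define" && $2 == "LANGUAGE_VERSION" { print $3; exit }' "$1"
}
```

Something like `parser_abi src/parser.c` would then print the number, or nothing if no such line is found.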
For release versions of the tree-sitter cli, the default ABI became 14 in version 0.20.7. In version 0.20.3, it became possible to specify generation of at least one level lower via the --abi option to the generate subcommand.
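As a sketch of how the --abi option might be used, assuming a tree-sitter cli recent enough to accept it is on PATH and the current directory is a grammar repository (the wrapper name `generate_at_abi` is made up for illustration):

```shell
# Regenerate parser source at a chosen ABI level.
# Assumption: the installed tree-sitter cli accepts --abi
# (0.20.3 or later, per the text above).
# generate_at_abi is a hypothetical wrapper name.
generate_at_abi() {
  if ! command -v tree-sitter >/dev/null 2>&1; then
    echo "tree-sitter cli not found on PATH" >&2
    return 127
  fi
  # $1 is the requested ABI level, e.g. 13.
  tree-sitter generate --abi "$1"
}
```

For example, `generate_at_abi 13` would ask the cli to generate at ABI 13. Whether a particular level is accepted depends on the cli version in use.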
Note that these changes occurred in 2022.
The following is an incomplete table of which versions used which ABI levels:
ABI  Version  Release
---  -------  -------
9    0.14.0   2019-02
10   0.15.2   2019-06
11   0.16.0   2019-12
12   0.17.0   2020-09
12   0.18.0   2021-01
13   0.19.0   2021-03
13   0.20.0   2021-06
14   0.20.7   2022-09
14   0.20.8   2023-04
There's a demo below whose purpose is to print out a table of which ABI versions appear in locally fetched src/parser.c files.
See the section of the corresponding name in the repository README.
- Ensure the current working directory is the repository root directory.
cd questions/what-abi-level-should-be-used
- Invoke
sh ./script/gen-abi-stats.sh
- Observe output like:
ABI: Count
9: 12
10: 5
11: 4
12: 5
13: 93
14: 171
Minimum number of repositories with parser.c: 282
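The internals of gen-abi-stats.sh are not reproduced here. As a rough approximation of how such a tally could be computed, assuming each generated parser.c carries a `#define LANGUAGE_VERSION <n>` line (the function name `tally_abis` is made up for illustration):

```shell
# Count how many parser.c files under a directory use each ABI number.
# Assumption: every generated parser.c has "#define LANGUAGE_VERSION <n>".
# tally_abis is a hypothetical name; this is not the repository's script.
tally_abis() {
  find "${1:-.}" -type f -name parser.c -exec awk '
    FNR == 1 { seen = 0 }                      # reset for each new file
    !seen && $1 == "#define" && $2 == "LANGUAGE_VERSION" {
      print $3                                 # first match per file only
      seen = 1
    }
  ' {} + |
    sort -n | uniq -c |
    awk '{ printf "%s: %s\n", $2, $1 }'        # "abi: count"
}
```

Unlike the demo script, this sketch counts files rather than repositories, so a repository containing more than one parser.c would be counted more than once.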
At least for what was collected, about 30% use 13, while about 60% use 14.
Perhaps it's not unreasonable to assume that a sufficient sample of repositories was collected, though.
As the ability to explicitly specify an ABI level was only made available and advertised in 2022, these results may be more a reflection of which cli happened to be used than of a conscious decision on the part of individual grammar repository maintainers.
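The rough shares quoted above can be checked with a little arithmetic. The following sketch reads lines of the form `ABI: count` and prints each count as a percentage of the sum of the listed counts. Using that sum (290 for the sample output) as the denominator is an assumption; the script reports 282 as a minimum number of repositories, which is a slightly different quantity. The function name `abi_shares` is made up for illustration.

```shell
# Convert "abi: count" lines on stdin into rounded percentage shares.
# Lines that are not "<number>: <number>" (headers, summaries) are skipped.
# abi_shares is a hypothetical helper name.
abi_shares() {
  awk -F': *' '
    /^[0-9]+: *[0-9]+$/ {
      count[$1] = $2
      total += $2
      order[++k] = $1                 # remember input order
    }
    END {
      for (i = 1; i <= k; i++)
        printf "%s: %.0f%%\n", order[i], 100 * count[order[i]] / total
    }
  '
}
```

Piping the demo output through it, as in `sh ./script/gen-abi-stats.sh | abi_shares`, gives 32% for ABI 13 and 59% for ABI 14 with the counts shown above.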
Tip: to get a list of which repositories use which ABI version, invoke:
sh ./script/list-parser-c-abi-nums.sh | sort -n
This should produce output that starts something like:
9 tree-sitter-abnf.jmitchell
9 tree-sitter-clojure.Tavistock
9 tree-sitter-dhall.jmitchell
9 tree-sitter-email.maxnordlund
9 tree-sitter-graphql.dralletje
9 tree-sitter-qml.rschiang
9 tree-sitter-todo.Aerijo
9 tree-sitter-xml.unhammer
10 tree-sitter-biber.Aerijo
10 tree-sitter-eno.eno-lang
10 tree-sitter-sml.stonebuddha
11 tree-sitter-carp.GrayJack
11 tree-sitter-move.move-hub
11 tree-sitter-odin.lucypero
11 tree-sitter-souffle.julienhenry
12 tree-sitter-diff.vigoux
12 tree-sitter-sexp.AbstractMachinesLab
12 tree-sitter-twitchchat.rockerBOO
12 tree-sitter-xml.dorgnarg
Note that the repository for a result such as:
tree-sitter-carp.GrayJack
is likely to be found at:
https://github.com/GrayJack/tree-sitter-carp
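That mapping can be sketched as a small helper. It assumes the entry has the form `<repository>.<owner>`, that the owner name itself contains no dot, and that the repository is hosted on GitHub; none of this is guaranteed for every entry, and the function name `entry_to_url` is made up for illustration.

```shell
# Turn an entry like "tree-sitter-carp.GrayJack" into a likely GitHub URL.
# Assumptions: format is <repository>.<owner>, owner has no dots,
# and the repository lives on github.com.
# entry_to_url is a hypothetical helper name.
entry_to_url() {
  entry="$1"
  owner="${entry##*.}"   # text after the last dot
  repo="${entry%.*}"     # text before the last dot
  printf 'https://github.com/%s/%s\n' "$owner" "$repo"
}
```

Splitting on the last dot rather than the first keeps repository names that themselves contain dots intact.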