Commit be8d416

Merge branch 'ds/path-walk' into seen
* ds/path-walk:
  pack-objects: thread the path-based compression
  pack-objects: refactor path-walk delta phase
  scalar: enable path-walk during push via config
  pack-objects: enable --path-walk via config
  repack: update usage to match docs
  repack: add --path-walk option
  pack-objects: introduce GIT_TEST_PACK_PATH_WALK
  p5313: add performance tests for --path-walk
  pack-objects: update usage to match docs
  pack-objects: add --path-walk option
  pack-objects: extract should_attempt_deltas()
  path-walk: add prune_all_uninteresting option
  revision: create mark_trees_uninteresting_dense()
  path-walk: allow visiting tags
  path-walk: allow consumer to specify object types
  t6601: add helper for testing path-walk API
  path-walk: introduce an object walk by path
2 parents 3826e6d + f6d0289 commit be8d416

31 files changed

+1563
-50
lines changed

Documentation/config/feature.txt

+4
@@ -20,6 +20,10 @@ walking fewer objects.
 +
 * `pack.allowPackReuse=multi` may improve the time it takes to create a pack by
 reusing objects from multiple packs instead of just one.
++
+* `pack.usePathWalk` may speed up packfile creation and make the packfiles be
+significantly smaller in the presence of certain filename collisions with Git's
+default name-hash.
 
 feature.manyFiles::
 Enable config options that optimize for repos with many files in the

Documentation/config/pack.txt

+8
@@ -155,6 +155,14 @@ pack.useSparse::
 commits contain certain types of direct renames. Default is
 `true`.
 
+pack.usePathWalk::
+When true, git will default to using the '--path-walk' option in
+'git pack-objects' when the '--revs' option is present. This
+algorithm groups objects by path to maximize the ability to
+compute delta chains across historical versions of the same
+object. This may disable other options, such as using bitmaps to
+enumerate objects.
+
 pack.preferBitmapTips::
 When selecting which commits will receive bitmaps, prefer a
 commit at the tip of any reference that is a suffix of any value

Documentation/git-pack-objects.txt

+17-6
@@ -10,12 +10,13 @@ SYNOPSIS
 --------
 [verse]
 'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied]
-[--no-reuse-delta] [--delta-base-offset] [--non-empty]
-[--local] [--incremental] [--window=<n>] [--depth=<n>]
-[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
-[--cruft] [--cruft-expiration=<time>]
-[--stdout [--filter=<filter-spec>] | <base-name>]
-[--shallow] [--keep-true-parents] [--[no-]sparse] < <object-list>
+[--no-reuse-delta] [--delta-base-offset] [--non-empty]
+[--local] [--incremental] [--window=<n>] [--depth=<n>]
+[--revs [--unpacked | --all]] [--keep-pack=<pack-name>]
+[--cruft] [--cruft-expiration=<time>]
+[--stdout [--filter=<filter-spec>] | <base-name>]
+[--shallow] [--keep-true-parents] [--[no-]sparse]
+[--path-walk] < <object-list>
 
 
 DESCRIPTION
@@ -345,6 +346,16 @@ raise an error.
 Restrict delta matches based on "islands". See DELTA ISLANDS
 below.
 
+--path-walk::
+By default, `git pack-objects` walks objects in an order that
+presents trees and blobs in an order unrelated to the path they
+appear relative to a commit's root tree. The `--path-walk` option
+enables a different walking algorithm that organizes trees and
+blobs by path. This has the potential to improve delta compression
+especially in the presence of filenames that cause collisions in
+Git's default name-hash algorithm. Due to changing how the objects
+are walked, this option is not compatible with `--delta-islands`,
+`--shallow`, or `--filter`.
 
 DELTA ISLANDS
 -------------
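
The `--path-walk` text above refers to collisions in Git's default name-hash. The sketch below illustrates why such collisions can happen. It is written from memory of `pack_name_hash()` in Git's `pack-objects.h`, so the exact body should be treated as an assumption. The names `sketch_name_hash` and `print_name_hashes` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Sketch of Git's default name-hash (assumed to match pack_name_hash()).
 * Each new character is added into the top bits and everything seen so
 * far is shifted right by two bits, so the tail of the path dominates
 * and leading directory names are gradually shifted out of the word.
 */
static uint32_t sketch_name_hash(const char *name)
{
	uint32_t c, hash = 0;

	assert(name != NULL);
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace((int)c))
			continue; /* whitespace does not contribute */
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}

/* Print one "hash  path" line per path so equal hashes are easy to spot. */
static void print_name_hashes(const char **paths, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		printf("%08x  %s\n",
		       (unsigned)sketch_name_hash(paths[i]), paths[i]);
}
```

Under this sketch, `a/package-lock.json` and `b/package-lock.json` hash to the same value, because the only differing character has been shifted out by the time the end of the path is reached. Objects from unrelated directories can therefore be sorted next to each other as delta candidates. Grouping by full path, as `--path-walk` does, avoids relying on this hash for that grouping.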

Documentation/git-repack.txt

+16-1
@@ -9,7 +9,9 @@ git-repack - Pack unpacked objects in a repository
 SYNOPSIS
 --------
 [verse]
-'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m] [--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>] [--write-midx]
+'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
+[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
+[--write-midx] [--path-walk]
 
 DESCRIPTION
 -----------
@@ -249,6 +251,19 @@ linkgit:git-multi-pack-index[1]).
 Write a multi-pack index (see linkgit:git-multi-pack-index[1])
 containing the non-redundant packs.
 
+--path-walk::
+This option passes the `--path-walk` option to the underlying
+`git pack-objects` process (see linkgit:git-pack-objects[1]).
+By default, `git pack-objects` walks objects in an order that
+presents trees and blobs in an order unrelated to the path they
+appear relative to a commit's root tree. The `--path-walk` option
+enables a different walking algorithm that organizes trees and
+blobs by path. This has the potential to improve delta compression
+especially in the presence of filenames that cause collisions in
+Git's default name-hash algorithm. Due to changing how the objects
+are walked, this option is not compatible with `--delta-islands`
+or `--filter`.
+
 CONFIGURATION
 -------------
 

+73
@@ -0,0 +1,73 @@
+Path-Walk API
+=============
+
+The path-walk API is used to walk reachable objects, but to visit objects
+in batches based on a common path they appear in, or by type.
+
+For example, all reachable commits are visited in a group. All tags are
+visited in a group. Then, all root trees are visited. At some point, all
+blobs reachable via a path `my/dir/to/A` are visited. When there are
+multiple paths possible to reach the same object, then only one of those
+paths is used to visit the object.
+
+When walking a range of commits with some `UNINTERESTING` objects, the
+objects with the `UNINTERESTING` flag are included in these batches. In
+order to walk `UNINTERESTING` objects, the `--boundary` option must be
+used in the commit walk in order to visit `UNINTERESTING` commits.
+
+Basics
+------
+
+To use the path-walk API, include `path-walk.h` and call
+`walk_objects_by_path()` with a customized `path_walk_info` struct. The
+struct is used to set all of the options for how the walk should proceed.
+Let's dig into the different options and their use.
+
+`path_fn` and `path_fn_data`::
+The most important option is the `path_fn` option, which is a
+function pointer to the callback that can execute logic on the
+object IDs for objects grouped by type and path. This function
+also receives a `data` value that corresponds to the
+`path_fn_data` member, for providing custom data structures to
+this callback function.
+
+`revs`::
+To configure the exact details of the reachable set of objects,
+use the `revs` member and initialize it using the revision
+machinery in `revision.h`. Initialize `revs` using calls such as
+`setup_revisions()` or `parse_revision_opt()`. Do not call
+`prepare_revision_walk()`, as that will be called within
+`walk_objects_by_path()`.
++
+It is also important that you do not specify the `--objects` flag for the
+`revs` struct. The revision walk should only be used to walk commits, and
+the objects will be walked in a separate way based on those starting
+commits.
++
+If you want the path-walk API to emit `UNINTERESTING` objects based on the
+commit walk's boundary, be sure to set `revs.boundary` so the boundary
+commits are emitted.
+
+`commits`, `blobs`, `trees`, `tags`::
+By default, these members are enabled and signal that the path-walk
+API should call the `path_fn` on objects of these types. Specialized
+applications could disable some options to make it simpler to walk
+the objects or to have fewer calls to `path_fn`.
++
+While it is possible to walk only commits in this way, consumers would be
+better off using the revision walk API instead.
+
+`prune_all_uninteresting`::
+By default, all reachable paths are emitted by the path-walk API.
+This option allows consumers to declare that they are not
+interested in paths where all included objects are marked with the
+`UNINTERESTING` flag. This requires using the `boundary` option in
+the revision walk so that the walk emits commits marked with the
+`UNINTERESTING` flag.
+
+Examples
+--------
+
+See example usages in:
+`t/helper/test-path-walk.c`,
+`builtin/pack-objects.c`
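
The batching behavior described in the API document above can be modeled with a small standalone program. This is a toy sketch only. It does not include `path-walk.h` or use Git's real types, and every name in it (`toy_entry`, `toy_path_fn`, `toy_walk_by_path`, `toy_count`, `toy_stats`, `TOY_MAX`) is hypothetical. It assumes the input is a flat list of (path, type, object ID) records and shows two ideas from the text: the callback runs once per path-and-type batch, and an object reachable through several paths is reported under only one of them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX 64 /* arbitrary cap that keeps the sketch allocation-free */

enum toy_type { TOY_TREE, TOY_BLOB };

/* One "object reachable at a path" record (hypothetical layout). */
struct toy_entry {
	const char *path;
	enum toy_type type;
	const char *oid;
};

/* Called once per (path, type) batch; a nonzero return stops the walk. */
typedef int (*toy_path_fn)(const char *path, const char **oids, size_t nr,
			   enum toy_type type, void *data);

struct toy_stats {
	size_t batches;
	size_t objects;
};

/* Example callback: print the batch and count what was visited. */
static int toy_count(const char *path, const char **oids, size_t nr,
		     enum toy_type type, void *data)
{
	struct toy_stats *stats = data;

	printf("%s %s:", type == TOY_TREE ? "tree" : "blob", path);
	for (size_t i = 0; i < nr; i++)
		printf(" %s", oids[i]);
	printf("\n");
	stats->batches++;
	stats->objects += nr;
	return 0;
}

static int toy_seen(const char **seen, size_t nr, const char *oid)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(seen[i], oid))
			return 1;
	return 0;
}

/* Group entries by (path, type) and emit each object ID at most once. */
static int toy_walk_by_path(const struct toy_entry *entries, size_t nr,
			    toy_path_fn fn, void *data)
{
	const char *seen[TOY_MAX];
	char done[TOY_MAX] = { 0 };
	size_t nr_seen = 0;

	assert(fn != NULL);
	if (nr > TOY_MAX)
		return -1;
	for (size_t i = 0; i < nr; i++) {
		const char *batch[TOY_MAX];
		size_t nr_batch = 0;
		int ret;

		if (done[i])
			continue;
		for (size_t j = i; j < nr; j++) {
			if (done[j] || entries[j].type != entries[i].type ||
			    strcmp(entries[j].path, entries[i].path))
				continue;
			done[j] = 1;
			if (toy_seen(seen, nr_seen, entries[j].oid))
				continue; /* already visited via another path */
			seen[nr_seen++] = entries[j].oid;
			batch[nr_batch++] = entries[j].oid;
		}
		if (!nr_batch)
			continue;
		ret = fn(entries[i].path, batch, nr_batch, entries[i].type, data);
		if (ret)
			return ret;
	}
	return 0;
}
```

A caller would fill an array of `toy_entry` records, pass `toy_count` as the callback, and pass a zeroed `toy_stats` as the data pointer. The real API additionally visits commits, tags, and root trees as their own groups and derives the paths by walking trees. The toy model leaves all of that out.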

Makefile

+2
@@ -822,6 +822,7 @@ TEST_BUILTINS_OBJS += test-parse-options.o
 TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
 TEST_BUILTINS_OBJS += test-partial-clone.o
 TEST_BUILTINS_OBJS += test-path-utils.o
+TEST_BUILTINS_OBJS += test-path-walk.o
 TEST_BUILTINS_OBJS += test-pcre2-config.o
 TEST_BUILTINS_OBJS += test-pkt-line.o
 TEST_BUILTINS_OBJS += test-proc-receive.o
@@ -1098,6 +1099,7 @@ LIB_OBJS += parse-options.o
 LIB_OBJS += patch-delta.o
 LIB_OBJS += patch-ids.o
 LIB_OBJS += path.o
+LIB_OBJS += path-walk.o
 LIB_OBJS += pathspec.o
 LIB_OBJS += pkt-line.o
 LIB_OBJS += preload-index.o
