-
Notifications
You must be signed in to change notification settings - Fork 534
packfile: cache undeltified objects to improve decode performance #218
Conversation
Feel free to add any comment or ways to improve this. |
Current coverage is 76.36% (diff: 80.88%)@@ master #218 diff @@
==========================================
Files 96 98 +2
Lines 6299 6359 +60
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 4845 4856 +11
- Misses 922 978 +56
+ Partials 532 525 -7
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
- We should discuss about if cache should be a package on its own.
- It would be nice if you could measure what's the speed up of cloning the git/git repository with this change.
if max <= 0 { | ||
max = 1 | ||
} | ||
c.maxElements = max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra space after if
|
||
func (c *cache) Add(o plumbing.EncodedObject) { | ||
if len(c.order) >= c.maxElements { | ||
d, order := c.order[0], c.order[1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite inefficient, we should probably implement a queue as a circular list backed by a slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely.
@@ -62,6 +67,8 @@ type Decoder struct { | |||
|
|||
offsetToType map[int64]plumbing.ObjectType | |||
decoderType plumbing.ObjectType | |||
|
|||
cache *cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could make cache public in its own package, since an object cache might be useful in other cases...
@@ -63,8 +63,7 @@ func (s *BaseSuite) NewRepositoryFromPackfile(f *fixtures.Fixture) *Repository { | |||
p := f.Packfile() | |||
defer p.Close() | |||
|
|||
n := packfile.NewScanner(p) | |||
d, err := packfile.NewDecoder(n, r.s) | |||
d, err := packfile.NewDecoder(p, r.s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this PR change the way of instantiating Scanner/Decoder?
@@ -128,9 +138,20 @@ func (d *Decoder) Decode() (checksum plumbing.Hash, err error) { | |||
return d.s.Checksum() | |||
} | |||
|
|||
func (d *Decoder) doDecode() error { | |||
func (d *Decoder) Count() (uint32, error) { | |||
_, count, err := d.s.Header() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like if Count is called twice, the decoding is broken?
return 0, err | ||
} | ||
|
||
d.cache.SetMaxElements(int(count / magicCacheNumber)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting the max elements of cache inside this method (public Count) looks wierd.
@@ -128,9 +138,20 @@ func (d *Decoder) Decode() (checksum plumbing.Hash, err error) { | |||
return d.s.Checksum() | |||
} | |||
|
|||
func (d *Decoder) doDecode() error { | |||
func (d *Decoder) Count() (uint32, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you the documentation, I don't know the goal of this function
"gopkg.in/src-d/go-git.v4/plumbing" | ||
) | ||
|
||
type cache struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a description about how operates this cache?
The improvement is great, but why? Will be great know the average of the re-used objects in a repository, maybe we can have a more sophisticated algorithm. How is done this at jgit and libgit? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an awesome contribution.
|
||
func (c *cache) Add(o plumbing.EncodedObject) { | ||
if len(c.order) >= c.maxElements { | ||
d, order := c.order[0], c.order[1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely.
I wrote from scratch the implementation, taking into account all your comments. Basically, is the same implementation as JGit. |
@@ -105,6 +108,8 @@ func NewDecoderForType(s *Scanner, o storer.EncodedObjectStorer, | |||
|
|||
offsetToType: make(map[int64]plumbing.ObjectType, 0), | |||
decoderType: t, | |||
|
|||
cache: cache.NewObjectFIFO(cache.MaxSize), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache.MaxSize
should be added as external repository configuration. Some idea of how to do it correctly?
Simple object cache that keep in memory the last undeltified objects. When no more objects can be keept into memory, the oldest one is deleted.
Code used to check times:
|
We should probably improve this in the future with an LRU cache with maximum memory size, but so far it looks great. LGTM. |
* Simple object cache that keeps in memory the last undeltified objects. When no more objects can be kept into memory, the oldest one is deleted (FIFO). This speeds up packfile operations preventing redundant seeks and decodes.
Simple object cache that keep in memory the last undeltified objects. When no more objects can be keept into memory, the oldest one is deleted.
Benchmarks: