allow deletions of indexes, handle null integrity values in compact better #55 (Closed; 2 commits)
README.md (15 additions, 2 deletions)
cacache.rm.all(cachePath).then(() => {
})
```

#### <a name="rm-entry"></a> `> cacache.rm.entry(cache, key) -> Promise`
#### <a name="rm-entry"></a> `> cacache.rm.entry(cache, key, [opts]) -> Promise`

Alias: `cacache.rm`

Removes the index entry for `key`. Content will still be accessible if
requested directly by content address ([`get.stream.byDigest`](#get-stream)).

By default, this appends a new entry to the index with an integrity of `null`.
If `opts.removeFully` is set to `true`, the index file itself will be
physically deleted rather than appending a `null` entry.
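
A quick sketch of both modes (assuming `cachePath` points at an existing cache that contains the key):

```javascript
// default: appends a null-integrity entry, acting as a tombstone
cacache.rm.entry(cachePath, 'my-thing').then(() => {
  console.log('entry tombstoned; content is still reachable by digest')
})

// removeFully: physically deletes the index bucket file for this key
cacache.rm.entry(cachePath, 'my-thing', { removeFully: true }).then(() => {
  console.log('index file deleted outright')
})
```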

To remove the content itself (which might still be used by other entries), use
[`rm.content`](#rm-content). Or, to safely vacuum any unused content, use
[`verify`](#verify).
cacache.rm.content(cachePath, 'sha512-SoMeDIGest/IN+BaSE64==').then(() => {
})
```

#### <a name="index-compact"></a> `> cacache.index.compact(cache, key, matchFn) -> Promise`
#### <a name="index-compact"></a> `> cacache.index.compact(cache, key, matchFn, [opts]) -> Promise`

Uses `matchFn`, which must be a synchronous function that accepts two entries
and returns a boolean indicating whether or not the two entries match, to
deduplicate all entries in the cache for the given `key`.

If `opts.validateEntry` is provided, it will be called with a single index
entry as its only argument and must return a Boolean. If it returns `true`,
the entry is considered valid and will be kept in the index; if it returns
`false`, the entry will be removed from the index.

If `opts.validateEntry` is not provided, entries are deduplicated from newest
to oldest until the first `null` integrity is reached; the `null` entry and
every entry written before it are removed.
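
For instance, here is a sketch of that tombstone behavior (run inside an async function; the `rev` metadata scheme is illustrative, not part of the API):

```javascript
const data = Buffer.from('hello')
await cacache.put(cachePath, 'my-thing', data, { metadata: { rev: 1 } })
await cacache.rm.entry(cachePath, 'my-thing') // appends a null-integrity tombstone
await cacache.put(cachePath, 'my-thing', data, { metadata: { rev: 2 } })

const matchFn = (a, b) => a.metadata.rev === b.metadata.rev
// compaction stops at the tombstone: only the rev 2 entry survives,
// and the rev 1 entry written before the null is dropped
const entries = await cacache.index.compact(cachePath, 'my-thing', matchFn)
console.log(entries.length) // 1
```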
> **Review comment (Contributor):** So, if I'm reading this correctly, adding a null index entry and then a new index entry, then compacting is somewhat the same as doing removeAll and then adding the new one? Like, the null means "ignore all that came before"?


The deduplicated list of entries is both written to the index, replacing the
existing content, and returned in the Promise.
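
For example, a sketch that deduplicates on a hypothetical `metadata.rev` field and drops anything at or below rev 1 (again, the metadata scheme is illustrative):

```javascript
const matchFn = (a, b) => a.metadata.rev === b.metadata.rev
// treat entries at rev 1 or below as invalid, even ones with null integrity
const validateEntry = (entry) => entry.metadata.rev > 1

cacache.index.compact(cachePath, 'my-thing', matchFn, { validateEntry }).then((entries) => {
  // entries is the deduplicated list that was also written back to the index
  console.log(entries)
})
```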

lib/entry-index.js (46 additions, 12 deletions)

const fixOwner = require('./util/fix-owner')
const hashToSegments = require('./util/hash-to-segments')
const indexV = require('../package.json')['cache-version'].index
const moveFile = require('@npmcli/move-file')
const _rimraf = require('rimraf')
const rimraf = util.promisify(_rimraf)
rimraf.sync = _rimraf.sync

const appendFile = util.promisify(fs.appendFile)
const readFile = util.promisify(fs.readFile)
// …

module.exports.compact = compact
async function compact (cache, key, matchFn, opts = {}) {
  const bucket = bucketPath(cache, key)
  const entries = await bucketEntries(bucket)
  const newEntries = []
  // we loop backwards because the bottom-most result is the newest
  // since we add new entries with appendFile
  for (let i = entries.length - 1; i >= 0; --i) {
    const entry = entries[i]
    // a null integrity could mean either a delete was appended
    // or the user has simply stored an index that does not map
    // to any content. we determine if the user wants to keep the
    // null integrity based on the validateEntry function passed in options.
    // if the integrity is null and no validateEntry is provided, we break
    // as we consider the null integrity to be a deletion of everything
    // that came before it.
    if (entry.integrity === null && !opts.validateEntry) {
      break
    }

    // if this entry is valid, and it is either the first entry or
    // the newEntries array doesn't already include an entry that
    // matches this one based on the provided matchFn, then we add
    // it to the beginning of our list
    if ((!opts.validateEntry || opts.validateEntry(entry) === true) &&
      (newEntries.length === 0 || !newEntries.find((oldEntry) => matchFn(oldEntry, entry)))) {
      newEntries.unshift(entry)
    }
  }

  const newIndex = '\n' + newEntries.map((entry) => {
    const stringified = JSON.stringify(entry)
  // …

  // write the file atomically
  await disposer(setup(), teardown, write)

  // we reverse the list we generated such that the newest
  // entries come first in order to make looping through them easier
  // the true passed to formatEntry tells it to keep null
  // integrity values, if they made it this far it's because
  // the user specified a metadata field in keepMissingIntegrityWhenSet
  // and as such we should return it
  return newEntries.reverse().map((entry) => formatEntry(cache, entry, true))
}

> **Review comment (Contributor):** Is keepMissingIntegrityWhenSet still a thing? I thought the caller just provides their own validateEntry function to do that.

module.exports.insert = insert
// …

module.exports.delete = del

function del (cache, key, opts = {}) {
  if (!opts.removeFully) {
    return insert(cache, key, null, opts)
  }

  const bucket = bucketPath(cache, key)
  return rimraf(bucket)
}

module.exports.delete.sync = delSync

function delSync (cache, key, opts = {}) {
  if (!opts.removeFully) {
    return insertSync(cache, key, null, opts)
  }

  const bucket = bucketPath(cache, key)
  return rimraf.sync(bucket)
}

module.exports.lsStream = lsStream
// …

rm.js (2 additions, 2 deletions)

const rmContent = require('./lib/content/rm')
module.exports = entry
module.exports.entry = entry

function entry (cache, key, opts) {
  memo.clearMemoized()
  return index.delete(cache, key, opts)
}

module.exports.content = content
// …

test/entry-index.js (113 additions, 6 deletions)

test('compact', async (t) => {
  // …
  await Promise.all([
    index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 1 } }),
    index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 2 } }),
    index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 2 } }),
    index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 1 } })
  ])

  const bucket = index.bucketPath(CACHE, KEY)
  const entries = await index.bucketEntries(bucket)
  t.equal(entries.length, 4, 'started with 4 entries')

  const filter = (entryA, entryB) => entryA.metadata.rev === entryB.metadata.rev
  const compacted = await index.compact(CACHE, KEY, filter)
  t.equal(compacted.length, 2, 'should return only two entries')

  const newEntries = await index.bucketEntries(bucket)
  t.equal(newEntries.length, 2, 'bucket was deduplicated')
})

test('compact: treats null integrity without validateEntry as a delete', async (t) => {
  t.teardown(() => {
    index.delete.sync(CACHE, KEY)
  })
  // this one does not use Promise.all because we want to be certain
  // things are written in the right order
  await index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 1 } })
  await index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 2 } })
  // this is a delete, revs 1, 2 and 3 will be omitted
  await index.insert(CACHE, KEY, null, { metadata: { rev: 3 } })
  await index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 4 } })

  const bucket = index.bucketPath(CACHE, KEY)
  const entries = await index.bucketEntries(bucket)
  t.equal(entries.length, 4, 'started with 4 entries')

  const filter = (entryA, entryB) => entryA.metadata.rev === entryB.metadata.rev
  const compacted = await index.compact(CACHE, KEY, filter)
  t.equal(compacted.length, 1, 'should return only one entry')
  t.equal(compacted[0].metadata.rev, 4, 'kept rev 4')

  const newEntries = await index.bucketEntries(bucket)
  t.equal(newEntries.length, 1, 'bucket was deduplicated')
})

test('compact: leverages validateEntry to skip invalid entries', async (t) => {
  t.teardown(() => {
    index.delete.sync(CACHE, KEY)
  })
  await Promise.all([
    index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 1 } }),
    index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 2 } }),
    index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 2 } }),
    index.insert(CACHE, KEY, INTEGRITY, { metadata: { rev: 1 } })
  ])

  const bucket = index.bucketPath(CACHE, KEY)
  const entries = await index.bucketEntries(bucket)
  t.equal(entries.length, 4, 'started with 4 entries')

  const matchFn = (entryA, entryB) => entryA.metadata.rev === entryB.metadata.rev
  const validateEntry = (entry) => entry.metadata.rev > 1
  const compacted = await index.compact(CACHE, KEY, matchFn, { validateEntry })
  t.equal(compacted.length, 1, 'should return only one entry')
  t.equal(compacted[0].metadata.rev, 2, 'kept the rev 2 entry')

  const newEntries = await index.bucketEntries(bucket)
  t.equal(newEntries.length, 1, 'bucket was deduplicated')
})

test('compact: validateEntry allows for keeping null integrity', async (t) => {
  t.teardown(() => {
    index.delete.sync(CACHE, KEY)
  })
  await Promise.all([
    index.insert(CACHE, KEY, null, { metadata: { rev: 1 } }),
    index.insert(CACHE, KEY, null, { metadata: { rev: 2 } }),
    index.insert(CACHE, KEY, null, { metadata: { rev: 2 } }),
    index.insert(CACHE, KEY, null, { metadata: { rev: 1 } })
  ])

  const bucket = index.bucketPath(CACHE, KEY)
  const entries = await index.bucketEntries(bucket)
  t.equal(entries.length, 4, 'started with 4 entries')

  const matchFn = (entryA, entryB) => entryA.metadata.rev === entryB.metadata.rev
  const validateEntry = (entry) => entry.metadata.rev > 1
  const compacted = await index.compact(CACHE, KEY, matchFn, { validateEntry })
  t.equal(compacted.length, 1, 'should return only one entry')
  t.equal(compacted[0].metadata.rev, 2, 'kept the rev 2 entry')

  const newEntries = await index.bucketEntries(bucket)
  t.equal(newEntries.length, 1, 'bucket was deduplicated')
})

test('compact: ENOENT in chownr does not cause failure', async (t) => {
  // …
})

test('delete.sync: removes a cache entry', (t) => {
  // …
})

test('delete.sync: removeFully deletes the index entirely', async (t) => {
  const bucket = index.bucketPath(CACHE, KEY)
  await index.insert(CACHE, KEY, INTEGRITY)
  const entries = await index.bucketEntries(bucket)
  t.equal(entries.length, 1, 'has an entry')

  // do a normal delete first, this appends a null integrity
  index.delete.sync(CACHE, KEY)
  const delEntries = await index.bucketEntries(bucket)
  t.equal(delEntries.length, 2, 'should now have 2 entries')
  t.equal(delEntries[1].integrity, null, 'has a null integrity last')

  // then a full delete
  index.delete.sync(CACHE, KEY, { removeFully: true })
  await t.rejects(index.bucketEntries(bucket), { code: 'ENOENT' }, 'rejects with ENOENT because file is gone')
})

test('delete: removeFully deletes the index entirely', async (t) => {
  const bucket = index.bucketPath(CACHE, KEY)
  await index.insert(CACHE, KEY, INTEGRITY)
  const entries = await index.bucketEntries(bucket)
  t.equal(entries.length, 1, 'has an entry')

  // do a normal delete first, this appends a null integrity
  await index.delete(CACHE, KEY)
  const delEntries = await index.bucketEntries(bucket)
  t.equal(delEntries.length, 2, 'should now have 2 entries')
  t.equal(delEntries[1].integrity, null, 'has a null integrity last')

  // then a full delete
  await index.delete(CACHE, KEY, { removeFully: true })
  await t.rejects(index.bucketEntries(bucket), { code: 'ENOENT' }, 'rejects with ENOENT because file is gone')
})

test('find: error on parsing json data', (t) => {
  // mocks readFile in order to return a borked json payload
  const { find } = getEntryIndex({
    // …