Backup, Sync and git tracking #51

Closed
perrette opened this issue Apr 25, 2023 · 5 comments

perrette commented Apr 25, 2023

Originally, the git tracking feature was added to make handling a global papers install safer.
Its implementation details are now at odds with local installs. Local installs are often git-tracked themselves, and nested git repos do not play well together. Worse, a git-enabled papers install might trigger commits in a directory where they are not expected (fortunately it is off by default, so enabling it still requires explicit user action). In the original implementation, the git directory could also be separate from the bibtex file. In that case, the bibtex would be copied to the git directory upon saving, and a commit would be made. That works, but using git commands to revert or reset to a previous commit would then only affect the git repo, and not the original bibtex, making the overall behavior unintuitive. Clearly, some overhaul is needed.

It is not entirely clear to me yet how that feature should evolve, but the basic idea of using git to safeguard the bibtex, and undo unwanted changes, is still relevant IMO. Here are a few options:

  • use git as an internal tool in papers, without explicitly asking the user about it. papers undo (and a new command, papers redo) could be used to navigate the git history. The git repo would be saved in a central papers dir, using different branches to handle different bibtex locations (using a slug of the full bibtex path as the branch name, for instance). That might work even without a proper installation. Issue: renaming the bibtex would break the flow by creating a new branch. We could live with that.

  • propose hooks upon bibtex save. Here a whole workflow could be fine-tuned by users. Hooks could also be used internally to implement higher-level features.

  • add options to track files, sync with a remote server etc.
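The "slug of the full bibtex path as branch name" idea from the first option could be sketched as follows. This is only an illustration: the helper name `branch_name_for` and the exact slug rules are assumptions, not part of papers. It restricts the name to characters that are always valid in a git branch name and appends a short hash so that two paths with the same slug do not collide.

```python
import hashlib
import re


def branch_name_for(bibtex_path: str) -> str:
    """Hypothetical helper: derive a git-safe branch name from a bibtex path.

    The caller is expected to pass the full (absolute) path.
    """
    # Keep lowercase letters and digits; collapse every other run of characters to '-'.
    slug = re.sub(r"[^a-z0-9]+", "-", bibtex_path.lower()).strip("-")
    # A short hash of the exact path separates paths that produce the same slug.
    digest = hashlib.sha1(bibtex_path.encode("utf-8")).hexdigest()[:8]
    # Truncate long slugs so branch names stay readable.
    slug = slug[:60].rstrip("-")
    return f"{slug}-{digest}" if slug else digest
```

As noted in the option itself, renaming or moving the bibtex changes the path, so it would map to a new branch.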

For now I'll just leave this issue open to collect ideas. The current simplistic implementation works OK.

perrette pushed a commit that referenced this issue Apr 25, 2023
- git commit does not copy files any more
- also remove git-lfs print
- checks on git directory (and suppress inline help on --gitdir)

TODO: contextualize in git-specific PR #51
perrette changed the title from "Rethink git tracking" to "Backup, Sync and git tracking" on Apr 27, 2023

perrette commented Apr 27, 2023

While there are many ways of implementing back-ups and git tracking, the git model of a local, self-contained folder is the most elegant in my opinion. It is easy to keep track of and to clean up (in contrast to a centralized repo with various branches for various files -- the number of branches would accumulate over time and be hard to maintain).

To avoid double-tracking and conflicts with an existing, larger git repo, it should be possible to simply add a .gitignore file next to the .papers directory (or append .papers to an existing .gitignore), and let the user choose whether to git-track or not in the first place. Git tracking will initially be opt-in, but could become opt-out if its usefulness outweighs other concerns, which I presume will be the case -- reliability is the number one concern when building a bibliography over time.
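A minimal sketch of the .gitignore step, assuming a hypothetical helper `ensure_ignored` (not an existing papers function) that creates the file when missing and appends the entry only once:

```python
from pathlib import Path


def ensure_ignored(project_dir, entry=".papers/"):
    """Hypothetical helper: make sure `entry` is listed in project_dir/.gitignore.

    Returns True if the file was modified, False if the entry was already there.
    """
    gitignore = Path(project_dir) / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if entry in (line.strip() for line in lines):
        return False  # already ignored, leave the file untouched
    lines.append(entry)
    gitignore.write_text("\n".join(lines) + "\n")
    return True
```

The check is a plain string comparison, so an existing `.papers` line without the trailing slash would not be detected; a real implementation might want to normalize entries first.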

To let papers handle git-tracking behind the scenes, any change to the bibtex (and optionally, to the associated files) has to be mirrored to a dedicated git repo. If file-tracking is activated, the mirrored bibtex cannot be a mere copy, but needs to maintain its own "file" field pointing to local files. Hard links could be used for files to keep disk usage to a minimum -- at the expense of Windows users (workarounds, like a copy, could be found later for them).
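The hard-link-with-copy-fallback idea could look roughly like this. The function name `link_or_copy` is made up for illustration; `os.link` and `shutil.copy2` are standard library calls.

```python
import os
import shutil


def link_or_copy(src, dst):
    """Hypothetical helper: mirror src at dst, preferring a hard link over a copy."""
    os.makedirs(os.path.dirname(os.path.abspath(dst)), exist_ok=True)
    if os.path.lexists(dst):
        os.remove(dst)  # replace a stale entry in the mirror
    try:
        os.link(src, dst)  # same data on disk, no extra space used
        return "hardlink"
    except OSError:
        # Hard links are not available here (unsupported filesystem,
        # src and dst on different devices, ...): fall back to a real copy.
        shutil.copy2(src, dst)
        return "copy"
```

Note that a hard link shares its content with the original: a tool that edits the original file in place also changes the mirrored one, while a tool that replaces the file leaves the mirror pointing at the old content.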

For a local install, the resulting file structure would look like:

papers.bib         => that could be anywhere else
files/             => that could be anywhere else, or be an untidy collection of files
.gitignore         => so that no conflict arises with an already git-tracked repo
.papers/
    config.json
    papers.bib     => copy of bibtex with updated file links
    files/         => a tidy, renamed version of files
        file1.pdf  => could be a hard link toward the actual file, to save disk space
        ...
    .git            => yet another copy of papers.bib and files + history
    .gitattributes  => produced by `git lfs track files`

A global install would be pretty much the same, except that the .papers directory would be stored in some global location.

perrette commented Apr 27, 2023

The model outlined above would ensure a solid backup whatever the user configuration. Restoring a previous bibtex would work with this sequence of commands:

cd .papers
git reset --hard HEAD^   # reset the git repo to the previous commit (or any other specific version)
cd ..
rm -f papers.bib
touch papers.bib
papers add .papers/papers.bib --rename --copy

The last line is not a perfect undo: it does keep track of the files, but it forces a rename.
This example shows that renaming may be a must for git-tracking of files.

The sequence of commands above can be used for undos until the beginning of time, but it cannot be used for redo. Here is an alternative sequence for papers undo, with a hack to keep track of future states (only the section between cd .papers and cd .. is written below):

git rev-parse HEAD >> futures
git reset --hard HEAD^

and for papers redo:

git reset --hard $(tail -1 futures)
head -n -1 futures > futures.tmp && mv -f futures.tmp futures

Any new modification to the bib would empty futures (no redo after branching out).
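The bookkeeping behind this undo / redo scheme can be illustrated with a small in-memory model. It is only a sketch of the logic: the class `UndoRedo` is hypothetical, a plain list stands in for the git history, and another list stands in for the futures file.

```python
class UndoRedo:
    """In-memory model of the undo/redo scheme (no git involved)."""

    def __init__(self, initial):
        self.commits = [initial]  # stands in for the git history; last item is HEAD
        self.futures = []         # stands in for the `futures` file

    @property
    def head(self):
        return self.commits[-1]

    def commit(self, state):
        # A new modification branches out: redo is no longer possible.
        self.commits.append(state)
        self.futures.clear()

    def undo(self):
        if len(self.commits) < 2:
            raise IndexError("nothing to undo")
        # Like: git rev-parse HEAD >> futures; git reset --hard HEAD^
        self.futures.append(self.commits.pop())

    def redo(self):
        if not self.futures:
            raise IndexError("nothing to redo")
        # Like: git reset --hard $(tail -1 futures), then drop that last line
        self.commits.append(self.futures.pop())
```

The futures list behaves as a stack: the most recently undone state is the first one restored by redo, which matches reading the last line of the file with tail -1.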

@perrette
Owner Author

Upon saving the bibliography, the following could work (a more efficient version would be needed to avoid moving files around when not necessary):

rm -rf .papers/papers.bib .papers/files
touch .papers/papers.bib
papers add papers.bib --bibtex .papers/papers.bib --filesdir .papers/files --no-check-duplicate
cd .papers
git add .
git commit -m 'action that triggered the change'
# maybe: git push remote --force
rm -f futures   # redo disabled

perrette commented Apr 27, 2023

The model above is a kind of black box that leaves the implementation details to papers. Alternatively, a simpler, more transparent implementation would involve git tracking directly in the working directory.

papers.bib
files/
.papersconfig.json
.git
.gitattributes

Here plain git commands would work, without the need to move the bibtex and files around each time the bibliography is saved.

Pros and cons of the black-box, .papers model

  • Works regardless of the location of files and bibtex (=> will move/rename them anyway)
  • Minimal interference with existing git tracking (through .gitignore) => can keep parallel systems
  • Slower, as full-size bibtex manipulation is necessary at every step
  • Somewhat counter-intuitively, it could be more universal despite the complexity, because locally we'd track the files in a standardized form.
  • Larger disk usage (but hard links can largely alleviate that issue)

Pros and cons of the transparent, same-dir model

  • faster (no need to rewrite the whole bibliography each time)
  • less error-prone (fewer actions needed, simpler)
  • easier to implement and maintain code-wise
  • cannot keep track of bibtex and files outside the git directory
  • may interfere with an existing git repo => can just give up on git tracking, make the local install in a subfolder listed in .gitignore, or use a git submodule
  • does it add any benefit at all compared to just letting the user use git?

While I am sensitive to the arguments of simplicity and maintenance, the very last point seems the strongest argument in favor of a black-box model, or in favor of dropping the feature altogether. Since this issue is about doing something, let's discuss it further. In the case of an already-tracked project repo (which might be common for a local install), the only benefit of the transparent, same-dir model is to automate the commit / sync. That could also be addressed via some kind of hook on savebib, redo, undo (a set of commands stored in the config file). The black-box model, in contrast, would have a redo/undo system that operates regardless of whether the larger project is handled in git or not.
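The hook idea (a set of commands stored in the config file and run on savebib, redo, undo) could be sketched like this. The config layout, the "hooks" key and the function `run_hooks` are all assumptions made for illustration; papers does not necessarily expose any of them.

```python
import subprocess


def run_hooks(config, event, runner=subprocess.run):
    """Hypothetical helper: run every command registered for `event`.

    `config` is assumed to look like
        {"hooks": {"savebib": [["git", "add", "papers.bib"], ...]}}
    Each command is an argument list. `runner` is injectable so the logic
    can be exercised without actually spawning processes.
    """
    commands = config.get("hooks", {}).get(event, [])
    for argv in commands:
        # check=True stops at the first failing command instead of continuing silently
        runner(argv, check=True)
    return len(commands)
```

With such a scheme, the automatic commit / sync of the same-dir model would just be a user-defined savebib hook, for instance a `git add` followed by a `git commit`.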

@perrette
Owner Author

Now included in release 2.4.

perrette pushed a commit that referenced this issue Apr 29, 2023
- with this command undo / redo also restore files that have been renamed (only if installed with --git-lfs option)

caveat: these files will not be deleted in subsequent redos => redo only
applies to tracked files

In addition, this large commit overhauls the way
git tracking works, by now using a history branch to do undo/redo, and
a main branch to save forward advances. This is cleaner and does not
require a futures.txt file any more.

New tests have been added to support both the --restore parameter and
to make sure git does the right thing (only local install was tested --
in fact it seems the global install backup is buggy at this stage, more
in next commit)
perrette pushed a commit that referenced this issue Apr 29, 2023
perrette pushed a commit that referenced this issue Jun 27, 2023
Despite what I first argued in #51, I now consider the central backup
option to be the safest. More background work with git is needed to keep
the size of the backup repo reasonable (e.g. remove what's older than
X), but that is not urgent, by definition, and manual maintenance is
possible (like removing the whole thing, since the backup dir is
indicated in the config anyway).

TODO:
- initialize back-up git repo upon saving in case it was removed, or to
  make it work even without install.