- Install DVC in your environment together with the extras for the storage backends you need: [s3], [gdrive], [azure], [ssh], etc. To install all of them, use [all]:
pip install "dvc[gdrive]"
- Initialize the DVC repository (inside an existing Git repository):
$ dvc init
- Create a folder in your Google Drive and get its id from the URL:
https://drive.google.com/drive/folders/<id>
Then, set that folder as your remote storage for DVC:
$ dvc remote add -d storage gdrive://<id>
The first time you access Google Drive, DVC will ask you to authenticate. Make sure your account has the required permissions on that folder.
- Push the config files to your repository:
$ git commit .dvc/config -m <message>
$ git push
- Start tracking a file with DVC. This adds the file to the .gitignore next to it and creates a .dvc file that keeps track of it:
$ dvc add <data_folder/file>
- Add the newly created files to your repository:
$ git add <data_folder>/.gitignore <data_folder/file>.dvc
$ git commit -m <message>
$ git push
- Make sure to always push your changes to your remote storage:
$ dvc push
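The tracking steps above can also be scripted. The following is a minimal sketch in Python, assuming git and dvc are on the PATH and the remote storage is already configured. The names track_commands and run_all and the example path data/train.csv are hypothetical helpers for illustration, not part of DVC.

```python
import subprocess
from pathlib import PurePosixPath


def track_commands(path, message):
    """Build the git/dvc commands that put `path` under DVC control."""
    # dvc add writes the .gitignore in the same folder as the tracked file
    gitignore = str(PurePosixPath(path).parent / ".gitignore")
    return [
        ["dvc", "add", path],                      # creates <path>.dvc and updates .gitignore
        ["git", "add", gitignore, path + ".dvc"],  # version the pointer files, not the data
        ["git", "commit", "-m", message],
        ["git", "push"],
        ["dvc", "push"],                           # upload the data itself to the remote storage
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(track_commands("data/train.csv", "Track train.csv")) would execute the steps. Printing the returned list first is a cheap dry run.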
- Make the changes in your file and add the new version to DVC:
$ dvc add <data_folder/file>
- Add the changed .dvc file to your repository to keep track of the new version:
$ git add <data_folder/file>.dvc
$ git commit -m <message>
$ git push
- Make sure to push the new change to the remote storage:
$ dvc push
- As with git status, check the status of your DVC-tracked files (add -c to dvc status to compare the local cache with the remote storage):
$ dvc data status
or
$ dvc status
- Fetch the tracked files that are missing from your local cache:
$ dvc fetch <targets>
- Finally, pull from the remote storage:
$ dvc pull
- Check out the previous commit of the .dvc file in git, then check out the data with DVC:
$ git checkout HEAD^1 <data_folder/file>.dvc
$ dvc checkout
dvc checkout restores <data_folder/file> to the previous version.
- If you want to keep tracking that version, commit the change:
$ git commit <data_folder/file>.dvc -m <message>
$ git push
- Add the restored file to DVC:
$ dvc add <data_folder/file>
- No need to run dvc push, since that version is already in the remote storage
- Check out the desired commit or branch in git, and then check out the data with DVC:
$ git checkout <...> <...>.dvc
$ dvc checkout
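The two-step switch above (git first, then DVC) is easy to wrap in a helper. This is only a sketch, assuming git and dvc are on the PATH. The function name restore_version and its runner parameter are hypothetical; runner exists only so the commands can be inspected or tested without executing them.

```python
import subprocess


def restore_version(rev, dvc_file, runner=subprocess.run):
    """Check out `dvc_file` as of Git revision `rev`, then sync the data with DVC."""
    commands = [
        ["git", "checkout", rev, "--", dvc_file],  # restore the .dvc pointer file from `rev`
        ["dvc", "checkout"],                       # make the workspace data match the .dvc files
    ]
    for cmd in commands:
        runner(cmd, check=True)  # check=True stops at the first failing command
    return commands
```

For example, restore_version("HEAD^1", "data/train.csv.dvc") would bring back the previous version of a hypothetical data/train.csv.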
- Follow the steps above
- Install DVC in your environment:
pip install dvc
- Import DVC API in your code:
import dvc.api
- To get a file from DVC repository:
with dvc.api.open("path_to_file", repo="<repo_url>") as fd:
    # Stream the file contents
    fd.read()
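Building on the snippet above, dvc.api.open also accepts a rev argument (a commit hash, tag or branch name), which makes it possible to read an older version of a file without checking it out. The sketch below is one hedged way to wrap that. The function name read_tracked_text and its opener parameter are hypothetical; opener defaults to dvc.api.open and is only there so the helper can be exercised without a real repository.

```python
from typing import Callable, Optional


def read_tracked_text(path: str, repo: str, rev: Optional[str] = None,
                      opener: Optional[Callable] = None) -> str:
    """Return the text content of a DVC-tracked file at a given Git revision."""
    if opener is None:
        # Imported lazily so this module loads even where DVC is not installed
        import dvc.api
        opener = dvc.api.open
    # rev=None lets DVC use its default revision for the repository
    with opener(path, repo=repo, rev=rev) as fd:
        return fd.read()
```

A call such as read_tracked_text("path_to_file", repo="<repo_url>", rev="<tag_or_commit>") would return the file as it was at that revision.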
- $ dvc list <address_dvc_repository>: List all files in a DVC repository
- $ dvc diff <a_rev> <b_rev>: Compare two commits and show the added, modified and deleted DVC-tracked files.
  <a_rev>: Old commit. Defaults to HEAD
  <b_rev>: New commit. Defaults to the current workspace
- $ dvc doctor: Display DVC version, environment and project information