lock file persists after task is complete #10672

Open
chris-rapson-formus opened this issue Jan 15, 2025 · 3 comments
Labels
awaiting response (we are waiting for your reply, please respond! :)), triage (Needs to be triaged)

Comments

@chris-rapson-formus

chris-rapson-formus commented Jan 15, 2025

Bug Report

pull: permission denied

Description

dvc pull file_to_pull

After the pull has completed, the lock file (.dvc/tmp/lock) is still there. It seems like it is deleted when the next task begins.

This is a problem in our multi-user environment, which shares a single copy of the repo. When a different user wants to dvc pull, they are unable to remove the lock file, so the pull command is blocked:

ERROR: unexpected error - [Errno 13] Permission denied: '/path/to/repo/.dvc/tmp/lock'
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

Reproduce

  1. Call dvc pull with one user
  2. Change to a different user (e.g. su user2)
  3. Call dvc pull again
  4. Verify that a lock file still exists, and is owned by the first user:
$ ls -l .dvc/tmp/lock
-rw-rw-r-- 1 user1 user1 9 Jan 16 12:13 .dvc/tmp/lock
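
The check in step 4 can also be scripted. The following is a minimal sketch, not part of DVC: the helper name `describe_file` is hypothetical, and it assumes a POSIX system and that it is run from the repository root so that `.dvc/tmp/lock` resolves.

```python
import os
import stat

try:
    import pwd  # POSIX only; used to map a numeric uid to a user name
except ImportError:
    pwd = None


def describe_file(path):
    """Return (owner, mode string) for path, similar to the `ls -l` columns."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name if pwd else str(st.st_uid)
    except KeyError:
        # uid has no passwd entry (common in containers): fall back to the number
        owner = str(st.st_uid)
    return owner, stat.filemode(st.st_mode)


if __name__ == "__main__":
    # Path taken from the report; adjust if the repo lives elsewhere.
    lock_path = os.path.join(".dvc", "tmp", "lock")
    if os.path.exists(lock_path):
        print(describe_file(lock_path))
    else:
        print("no lock file at", lock_path)
```

Running this as the second user after the first user's `dvc pull` should show whether the lock file is still present and who owns it.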

Expected

The lock file should be removed when the task is complete.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 3.59.0 (snap)

Platform: Python 3.12.8 on Linux-5.15.0-130-generic-x86_64-with-glibc2.31
Subprojects:
dvc_data = 3.16.8
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.40.2
scmrepo = 3.3.9
Supports:
azure (adlfs = 2024.12.0, knack = 0.12.0, azure-identity = 1.19.0),
gdrive (pydrive2 = 1.21.3),
gs (gcsfs = 2024.12.0),
hdfs (fsspec = 2024.12.0, pyarrow = 18.1.0),
http (aiohttp = 3.11.11, aiohttp-retry = 2.9.1),
https (aiohttp = 3.11.11, aiohttp-retry = 2.9.1),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.12.0, boto3 = 1.35.93),
ssh (sshfs = 2024.9.0),
webdav (webdav4 = 0.10.0),
webdavs (webdav4 = 0.10.0),
webhdfs (fsspec = 2024.12.0)
Config:
Global: /home/user1/.config/dvc
System: /etc/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda1
Caches: local
Remotes: gs
Workspace directory: ext4 on /dev/sda1
Repo: dvc (subdir), git
Repo.site_cache_dir: /var/tmp/dvc/repo/61bc2c9403cc66ebc304ff80f4128dba

Additional Information (if any):

N/A

@shcheklein
Member

Is the first dvc pull done by the time you change to the second user? Why is the second user important here?

If the task for the first user is done, do you still see the lock? Can you run the command a second time?

@shcheklein added the "awaiting response" and "triage" labels on Jan 26, 2025
@chris-rapson-formus
Author

Yes, the dvc pull is done. It has finished printing messages and brought up the next terminal prompt. As the first user I can run the command again. (The next time, everything is up to date and nothing happens, but also no errors.)

The second user is important because we are working on a multi-user system with a dataset that is hundreds of GB. We'd like to avoid duplicating the dataset if possible.

@shcheklein
Member

Okay, I think I understand the issue now. I think we keep the lock file: it contains the DVC PID, and we check whether that process still exists.
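
To illustrate the idea of a lock file that stores a PID, here is a minimal sketch of a staleness check. It is not DVC's actual code: the function names `pid_is_running` and `lock_is_stale` are hypothetical, and it assumes the file holds nothing but a decimal PID, which may not match the real format. It is POSIX only.

```python
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks the target
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def lock_is_stale(lock_path):
    """Return True if the lock file names a PID that is no longer running."""
    try:
        with open(lock_path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return False  # no lock file, so nothing can be stale
    if not text.isdigit() or int(text) <= 0:
        return False  # unknown format (assumption): treat the lock as held
    return not pid_is_running(int(text))
```

Note that a check like this only needs read access to the lock file. The `Permission denied` error in the report would come from a later step that needs write access, such as reopening or removing the file.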

I think DVC might be creating quite a lot of files with -rw-r--r-- permissions - locks, .gitignore.

How do you deal with other files? How do you allow multiple users to edit them?
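
For context on where such permission bits come from: on POSIX systems the mode of a newly created file is the requested mode with the process umask bits cleared. The sketch below shows this in general terms. Whether DVC's lock file follows the umask in this way is an assumption, and the helper name `mode_of_new_file` is hypothetical.

```python
import os
import stat
import tempfile


def mode_of_new_file(umask):
    """Create a file under the given umask and return its permission string."""
    old = os.umask(umask)
    try:
        with tempfile.TemporaryDirectory() as d:
            path = os.path.join(d, "example")
            # Request rw for everyone (0o666); the umask clears bits from this.
            fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
            os.close(fd)
            return stat.filemode(os.stat(path).st_mode)
    finally:
        os.umask(old)  # always restore the caller's umask


print(mode_of_new_file(0o022))  # -rw-r--r--
print(mode_of_new_file(0o002))  # -rw-rw-r--
```

The listing in the report (`-rw-rw-r-- 1 user1 user1`) is consistent with a umask of `0o002`, but group write access only helps a second user who is a member of the file's group, here `user1`.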

Would it be better to set up a shared cache? See:

https://dvc.org/doc/use-cases/fast-data-caching-hub
https://dvc.org/doc/user-guide/how-to/share-a-dvc-cache#how-to-share-a-dvc-cache


2 participants