PPY-68: Added multithreading support to neptune sync #1897

Open
wants to merge 1 commit into
base: dev/1.x
from

SiddhantSadangi
Member

Before submitting checklist

  • Did you update the CHANGELOG? (not for test updates, internal changes/refactors or CI/CD setup)
  • Did you ask the docs owner to review all the user-facing changes?

SiddhantSadangi requested a review from a team December 27, 2024 11:22
SiddhantSadangi self-assigned this Dec 27, 2024
SiddhantSadangi added this to the 1.14 milestone Dec 27, 2024

codecov bot commented Dec 27, 2024

Codecov Report

Attention: Patch coverage is 71.42857% with 10 lines in your changes missing coverage. Please review.

Project coverage is 75.59%. Comparing base (be19b8e) to head (fecb9cc).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/neptune/cli/commands.py | 16.66% | 5 Missing ⚠️ |
| src/neptune/cli/sync.py | 82.75% | 5 Missing ⚠️ |
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##           dev/1.x    #1897      +/-   ##
===========================================
- Coverage    77.53%   75.59%   -1.95%
===========================================
  Files          303      303
  Lines        15384    15378       -6
===========================================
- Hits         11928    11625     -303
- Misses        3456     3753     +297
```
| Flag | Coverage Δ |
|---|---|
| e2e | ? |
| e2e-management | ? |
| e2e-s3 | ? |
| e2e-s3-gcs | ? |
| macos | 75.32% <71.42%> (-1.89%) ⬇️ |
| py3.12 | ? |
| py3.8 | 75.59% <71.42%> (-1.95%) ⬇️ |
| ubuntu | 75.46% <71.42%> (-1.91%) ⬇️ |
| unit | 75.59% <71.42%> (-0.01%) ⬇️ |
| windows | 74.50% <71.42%> (-1.95%) ⬇️ |

Flags with carried forward coverage won't be shown.

Contributor

AleksanderWWW left a comment

Something to consider here (no strong opinion):

Python threads can't execute Python code in parallel anyway because of the GIL, and I think these operations are I/O-bound, so perhaps the multithreading approach should be replaced with asyncio-based concurrency?
@PatrykGala
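For comparison, a minimal asyncio sketch, assuming the per-container sync calls stay blocking (as `sync_selected_offline` appears to be in this PR). In that case asyncio still has to hand each call to a thread pool via `loop.run_in_executor`, so it mainly changes how the concurrency is expressed; a real gain over threads would need a non-blocking backend client. `fake_sync` and `sync_all_async` are hypothetical names, and an `asyncio.Semaphore` stands in for `max_workers`.

```python
import asyncio
import time
from typing import Callable, List, Optional


def fake_sync(name: str) -> str:
    """Hypothetical stand-in for one blocking, I/O-bound container sync."""
    time.sleep(0.05)  # simulate waiting on the network
    return name


async def sync_all_async(
    names: List[str],
    sync_one: Callable[[str], str],
    max_concurrency: Optional[int] = None,
) -> List[str]:
    """Run sync_one(name) for every name concurrently; results keep input order."""
    loop = asyncio.get_running_loop()
    # The semaphore plays the role of ThreadPoolExecutor's max_workers.
    semaphore = asyncio.Semaphore(max_concurrency or max(len(names), 1))

    async def run_one(name: str) -> str:
        async with semaphore:
            # sync_one blocks, so it still runs in the loop's default thread
            # pool; asyncio alone does not make a blocking call non-blocking.
            return await loop.run_in_executor(None, sync_one, name)

    # gather returns results in the order the coroutines were passed in.
    return list(await asyncio.gather(*(run_one(name) for name in names)))


if __name__ == "__main__":
    print(asyncio.run(sync_all_async(["run-1", "run-2", "run-3"], fake_sync, 2)))
```

Because the blocking call ends up in a thread pool either way, the choice between this and `ThreadPoolExecutor` is mostly about code style unless the backend itself becomes async.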

Comment on lines +156 to +169
```python
with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
    futures = [
        executor.submit(
            sync_selected_offline,
            backend=backend,
            base_path=base_path,
            container_names=[selected_offline],
            containers=containers.offline_containers,
            project_name=project_name,
        )
        for selected_offline in offline_selected
    ]
    # as_completed yields futures as they finish; result() re-raises any
    # exception raised inside a worker thread.
    for future in concurrent.futures.as_completed(futures):
        future.result()
```

Contributor

I feel that this exact piece of logic (with slightly different parameters each time) is repeated so often it deserves a separate function ;)

Member Author

Will create a reusable function if we stick with this approach :)
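A minimal sketch of such a reusable helper, assuming the repeated blocks differ only in the function and the keyword arguments passed per item. `run_in_thread_pool` and `kwargs_list` are hypothetical names, not part of the PR. The loop keeps the PR's `as_completed` pattern so a failure is re-raised, and results come back in submission order.

```python
import concurrent.futures
from typing import Any, Callable, Dict, Iterable, List, Optional


def run_in_thread_pool(
    func: Callable[..., Any],
    kwargs_list: Iterable[Dict[str, Any]],
    max_workers: Optional[int] = None,
) -> List[Any]:
    """Hypothetical helper: call func(**kwargs) for each kwargs dict in a thread pool."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(func, **kwargs) for kwargs in kwargs_list]
        # Re-raise the first failure to complete; leaving the with-block
        # still waits for the remaining tasks before the exception propagates.
        for future in concurrent.futures.as_completed(futures):
            future.result()
    # Every future is done here, so this just collects results in submission order.
    return [future.result() for future in futures]
```

The offline branch could then become `run_in_thread_pool(sync_selected_offline, ({**common_kwargs, "container_names": [name]} for name in offline_selected), max_workers=num_threads)`, where `common_kwargs` (also hypothetical) holds `backend`, `base_path`, `containers` and `project_name`.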

Comment on lines +63 to +66
```python
project = get_project(
    project_name_flag=QualifiedName(project_name) if project_name else None,
    backend=backend,
)
```

Contributor

I assume this change is related to automatic formatting, right?

Member Author

yup

```python
if not project:
    raise CannotSynchronizeOfflineRunsWithoutProject

for container in containers.offline_containers:
    container.sync(base_path=base_path, backend=backend, project=project)
with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
```

Contributor

What if num_threads is None? I think we should perform an ordinary loop then, right?

Member Author

ThreadPoolExecutor uses its default if max_workers is None (on Python 3.8 to 3.12, that is min(32, os.cpu_count() + 4) workers)
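For reference, a sketch of how CPython 3.8 to 3.12 computes that default; it mirrors the expression in the standard library's `ThreadPoolExecutor.__init__`, and later versions may differ. `default_max_workers` is a hypothetical helper, not a stdlib or Neptune API.

```python
import os


def default_max_workers() -> int:
    """Hypothetical helper mirroring CPython 3.8-3.12's ThreadPoolExecutor default."""
    # The same expression ThreadPoolExecutor evaluates when max_workers is None:
    # at least 5 workers (cpu_count() may return None), capped at 32.
    return min(32, (os.cpu_count() or 1) + 4)


if __name__ == "__main__":
    print(f"default ThreadPoolExecutor workers here: {default_max_workers()}")
```

On an 8-core machine this gives 12 workers, so, assuming `num_threads` is passed straight through as `max_workers` as in the excerpts above, leaving the thread count unset would still allow up to 12 concurrent syncs.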

Contributor

So what if a user doesn't want to use threads?

Member Author

neptune sync -n 1 ;)

Contributor

True, but

  • setting up a thread pool might add some overhead compared to an ordinary for loop
  • it's counterintuitive that if I don't specify a thread count, I still end up with threaded behaviour
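If the team goes with the reviewer's suggestion, a minimal sketch of the fallback could look like this. `run_tasks` and `kwargs_list` are hypothetical names, not the PR's code; treating `num_threads=1` like `None` is an added assumption, and values below 1 are left for `ThreadPoolExecutor` to reject with `ValueError`.

```python
import concurrent.futures
from typing import Any, Callable, Dict, Iterable, List, Optional


def run_tasks(
    func: Callable[..., Any],
    kwargs_list: Iterable[Dict[str, Any]],
    num_threads: Optional[int] = None,
) -> List[Any]:
    """Hypothetical policy: only use threads when more than one is requested."""
    if num_threads is None or num_threads == 1:
        # Plain sequential loop in the calling thread: no pool setup cost and
        # no threaded behaviour unless the user explicitly asks for it.
        return [func(**kwargs) for kwargs in kwargs_list]
    # num_threads <= 0 makes ThreadPoolExecutor raise ValueError here.
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor:
        futures = [executor.submit(func, **kwargs) for kwargs in kwargs_list]
        # result() re-raises worker exceptions; order matches kwargs_list.
        return [future.result() for future in futures]
```

This keeps the default path identical to the original `for container in containers.offline_containers` loop, at the cost of making users opt in to concurrency.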

Contributor

But it's a design choice the team should discuss internally anyway. Apart from that and the other points, LGTM ;)

2 participants