-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
PPY-68: Added multithreading support to neptune sync
#1897
base: dev/1.x
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dev/1.x #1897 +/- ##
===========================================
- Coverage 77.53% 75.59% -1.95%
===========================================
Files 303 303
Lines 15384 15378 -6
===========================================
- Hits 11928 11625 -303
- Misses 3456 3753 +297
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thing to consider here, no strong opinion:
Python is anyway single-threaded and those operations are IO-bound I think, so perhaps multi-threading approach should be replaced with concurrency and asyncio?
@PatrykGala
with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor: | ||
futures = [ | ||
executor.submit( | ||
sync_selected_offline, | ||
backend=backend, | ||
base_path=base_path, | ||
container_names=[selected_offline], | ||
containers=containers.offline_containers, | ||
project_name=project_name, | ||
) | ||
for selected_offline in offline_selected | ||
] | ||
for future in concurrent.futures.as_completed(futures): | ||
future.result() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that this exact piece of logic (with slightly different parameters each time) is repeated so often it deserves a separate function ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will create a reusable function if we stick with this approach :)
project = get_project( | ||
project_name_flag=QualifiedName(project_name) if project_name else None, | ||
backend=backend, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this change is related to automatic formatting, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup
if not project: | ||
raise CannotSynchronizeOfflineRunsWithoutProject | ||
|
||
for container in containers.offline_containers: | ||
container.sync(base_path=base_path, backend=backend, project=project) | ||
with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as executor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if num_threads
is None
? I think we should perform an ordinary loop then, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ThreadPoolExecutor
uses it's defaults if num_workers
is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what if a user doesn't want to use threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neptune sync -n 1
;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but
- setting up a thread might have some cost compared to ordinary for loop
- it's counter- intuitive that if I don't specify threads I still end up with threaded behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's anyway a design choice that the team should discuss internally. Save for that and the other points LGTM ;)
Before submitting checklist