[io] don't miss writing a histogram that is only in a last file with option -n 2 #18679
Open: ferdymercury wants to merge 5 commits into root-project:master from ferdymercury:haddn2
Commits
5111f69 [io] don't miss writing a histogram that is only in a last file with … (ferdymercury)
cb44a2f [test][io] add merge test for single file hist with -n 2 (ferdymercury)
6a1e569 [io][test] remove stray comment (ferdymercury)
e5d0915 [io][nfc] add a print for debugging/optimization purposes (ferdymercury)
f312f3b [nfc] debug1 (ferdymercury)
Why is it `&&` and not `||`?

Or more exactly, I don't understand yet why the fact that the histogram can be found means that it is not written in the end ...

Check out:
https://github.com/ferdymercury/root/blob/1531153ea11a7b54f4eb3c170bbd28e9bc46447f/io/io/src/TFileMerger.cxx#L808-L817
So if canBeFound is true, there is an optimization that spares some write cycles.
We use && to keep it from being true, i.e. to force writing to the file. Using || would go in the opposite direction.
Do you want me to rename canBeFound to skipPartialWriting?

Not yet. I am still confused.
The original `canBeFound` meant (in the context of incremental merge) 'histogram can be found in the source', while the new version is `histogram can be found in the source and in the target`.
The optimization is indeed 'skip partial writing if we can find the histogram again'.
So I don't understand (yet) the semantics of the change, i.e. why is the new criterion the right choice? Is the new criterion instead 'just' making `canBeFound` always false?
Another avenue of inquiry: the original code assumes that if `canBeFound` is true then there will be another chance to write the histogram. Why is that not true anymore (it does not seem to be related to 'cannot be found in target')? Are there other variations of the example that also fail (how does the `-n X` value relate to the number of files in the input list and to how many are 'missing' the histograms)?
Related: is it possible that the alternative is that at the refresh boundary there needs to be a flush/write as if we were at the end?

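To make the three readings of the flag easier to compare, here is a minimal sketch of them side by side. It is not ROOT's code: `Lookup`, `canBeFoundOriginal`, `canBeFoundNew`, `canBeFoundWithOr` and `writesPartialResult` are hypothetical names, and the sketch assumes only what the thread states, namely that the two operands are 'found in the source' and 'found in the target', and that a true flag skips the partial write.

```cpp
// Hypothetical model of the decision discussed in this thread.
// None of these names exist in TFileMerger; they only illustrate the logic.
struct Lookup {
    bool inSource;  // histogram can be found in the source being merged
    bool inTarget;  // histogram can already be found in the target
};

// Original criterion as described in the thread: found in the source.
inline bool canBeFoundOriginal(const Lookup& l) { return l.inSource; }

// Criterion after the change: found in the source AND in the target.
inline bool canBeFoundNew(const Lookup& l) { return l.inSource && l.inTarget; }

// The '||' variant the reviewer asked about.
inline bool canBeFoundWithOr(const Lookup& l) { return l.inSource || l.inTarget; }

// Assumption taken from the thread: a true flag enables the optimization
// that skips the partial write, so a write happens only when it is false.
inline bool writesPartialResult(bool canBeFound) { return !canBeFound; }
```

In this model, a histogram that is in the source but not yet in the target (`Lookup{true, false}`) triggers a partial write only under `canBeFoundNew`. Once it is in both (`Lookup{true, true}`), all three variants skip the write, so under these assumptions the new flag is not always false.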
Thanks for the review. I am not sure about these questions; I just followed jblomer's suggestion. My (limited) understanding is that this change just forces an extra partial write the first time that a new histogram appears in any file. So it does not really do harm, but it is suboptimal since, if all files have exactly the same histograms, we could have waited until the end. But it makes it work if some files have the histogram and some do not, independently of the chosen N.

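The understanding stated above ("an extra partial write the first time that a new histogram appears in any file") can be expressed as a small counting model. This is only a sketch under that stated understanding, not a description of what TFileMerger actually does. `FileContents` and `countForcedPartialWrites` are hypothetical names, and each input file is reduced to a list of histogram names.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model: an input file is just the names of its histograms.
using FileContents = std::vector<std::string>;

// Counts the forced partial writes, ASSUMING a write is forced only the
// first time a histogram name is seen (i.e. when it is not yet in the target).
inline int countForcedPartialWrites(const std::vector<FileContents>& files) {
    std::set<std::string> seenInTarget;  // names already written to the target
    int writes = 0;
    for (const auto& file : files) {
        for (const auto& name : file) {
            // insert() reports whether the name was new to the set
            if (seenInTarget.insert(name).second) {
                ++writes;
            }
        }
    }
    return writes;
}
```

Under this assumption the count depends on the number of distinct histogram names, not on the number of input files. Three files that each contain `h1` and `h2` give 2, and so do three files where `h2` appears only in the last one.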
That would be fair enough. Can we verify that it is just one and not the number of input files? (Related but probably unavoidable: for the case of just 2 files with all the same histograms, the number of writes is doubled ... actually, the number of writes is doubled for each histogram that is in more than one file, i.e. the usual case.)

See here the test with many more files: mtest.cpp.txt
root -l mtest.cpp.txt+ -b -q 2>&1 | grep MergeOne
Info in TFileMerger::MergeOne: Writing partial result of h1 into target
Info in TFileMerger::MergeOne: Writing partial result of h2 into target