-
Notifications
You must be signed in to change notification settings - Fork 185
Fix(experiment): Update coverage calculation logic for #727 #960
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
base: main
Are you sure you want to change the base?
Fix(experiment): Update coverage calculation logic for #727 #960
Conversation
CC @DonggeLiu |
experiment/evaluator.py
Outdated
if total_lines and run_result.coverage: | ||
coverage_diff = run_result.coverage.covered_lines / total_lines | ||
if self.baseline_total_lines > 0 and run_result.coverage: | ||
coverage_diff = newly_covered_lines / self.baseline_total_lines |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is inaccurate.
The lines linked with the new fuzz target could be different from the baseline, hence the denominator should be the union of both sets.
See this example: #898 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, in experiment/evaluator.py (within the check_target method), I have now implemented:
Obtain the Textcov objects for both the baseline (existing_textcov) and the current run.
Create a merged Textcov representing the union (using a copy of the current run's coverage and merge(existing_textcov)).
Use the total_lines property of this merged Textcov object (union_total_lines) as the denominator when calculating coverage_diff (newly_covered_lines / union_total_lines).
CC @DonggeLiu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @demoncoder-crypto !
The logic looks good to me now, here are some suggestions / questions re. implementation.
experiment/evaluator.py
Outdated
if run_result.coverage: | ||
run_result.coverage.subtract_covered_lines(existing_textcov) | ||
if run_result and run_result.coverage: | ||
current_coverage_copy = run_result.coverage.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this will only create a shallow copy, the modification below will affect the original run_result.coverage
.
Try using python's builtin deepcopy
package.
We can add a function in class Textcov
for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line current_coverage_copy = run_result.coverage.copy() in evaluator.py should therefore already be performing a deep copy, preventing unintended modifications to the original run_result.coverage object. I think it will do deep copy please correct me if i am wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@demoncoder-crypto Could you confirmed this via a simple script?
The coverage
is a Textcov
, which is a dataclass and does not have builtin function copy:
(Pdb) from experiment.textcov import Textcov
(Pdb) cov = Textcov()
(Pdb) cov.copy()
*** AttributeError: 'Textcov' object has no attribute 'copy'
experiment/evaluator.py
Outdated
run_result.coverage.subtract_covered_lines(existing_textcov) | ||
newly_covered_lines = run_result.coverage.covered_lines | ||
else: | ||
newly_covered_lines = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recon the conditions of these two blocks (if current_coverage_copy:
and if run_result.coverage:
) are essentially the same?
We can merge them for simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can merge, good point, thanks
experiment/evaluator.py
Outdated
f'Warning: total_lines == 0 in {generated_oss_fuzz_project}.') | ||
coverage_diff = 0.0 | ||
if run_result: | ||
existing_textcov = self.load_existing_textcov() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please add a TODO for this?
TODO(dongge): Move load_existing_textcov to OSS-Fuzz module so that we only need to run it once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
experiment/evaluator.py
Outdated
elif self._load_existing_coverage_summary(): | ||
coverage_summary = self._load_existing_coverage_summary() | ||
total_lines_for_percent = compute_total_lines_without_fuzz_targets( | ||
coverage_summary, generated_target_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to modify the logic here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my understanding this logic ensures that the overall coverage_percent accurately reflects the coverage achieved by the current fuzz target using its own relevant lines as the denominator, distinct from the coverage_diff which uses the union of lines. It correctly uses the pre-subtraction coverage data stored in current_coverage_copy for this calculation. That is my understanding of this please correct me if i i am wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my understanding this logic ensures that the overall coverage_percent accurately reflects the coverage achieved by the current fuzz target using its own relevant lines as the denominator, distinct from the coverage_diff which uses the union of lines.
This makes sense, but:
- Is there any code using
total_lines_for_percent
after you assigned it? - Why did we prefer to remove
coverage_summary = self._load_existing_coverage_summary()
from the top and call the function twice?
Original code
if run_result:
# Gets line coverage (diff) details.
coverage_summary = self._load_existing_coverage_summary()
...
elif coverage_summary:
total_lines = compute_total_lines_without_fuzz_targets(
coverage_summary, generated_target_name)
- Why did we remove the final
else
?
Original code:
elif coverage_summary:
total_lines = compute_total_lines_without_fuzz_targets(
coverage_summary, generated_target_name)
else:
total_lines = 0
- Could you please add back the comment for JVM and Python? Thanks.
Original code:
# The Jacoco.xml coverage report used to generate summary.json on
# OSS-Fuzz for JVM projects does not trace the source file location.
# Thus the conversion may miss some classes because they are not
# present during coverage report generation. This fix gets the total
# line calculation from the jacoco.xml report of the current run
# directly and compares it with the total_lines retrieved from
# summary.json. Then the larger total_lines is used which is assumed
# to be more accurate. This is the same case for python project which
# the total line is determined from the all_cov.json file.
experiment/evaluator.py
Outdated
dual_logger.log( | ||
f'Warning: union_total_lines is 0 but newly_covered_lines is {newly_covered_lines}. Cannot calculate coverage diff accurately.' | ||
) | ||
coverage_diff = 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this part is getting more complex, could you please separate it into an individual function?
Thanks!
Thanks @DonggeLiu for viewing this I will work on the remaining issue and fix them by tomorrow, thanks for support |
I have the necessary changes @DonggeLiu. I have tried to address the issue mentioned do let me know if any thing else is needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @demoncoder-crypto, thanks for making the changes.
However, some indentation changes (2 spaces to 4 spaces) confused git
. Could you please double-check this PR only modifies the parts needed?
Thanks!
experiment/evaluator.py
Outdated
if run_result.coverage: | ||
run_result.coverage.subtract_covered_lines(existing_textcov) | ||
if run_result and run_result.coverage: | ||
current_coverage_copy = run_result.coverage.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@demoncoder-crypto Could you confirmed this via a simple script?
The coverage
is a Textcov
, which is a dataclass and does not have builtin function copy:
(Pdb) from experiment.textcov import Textcov
(Pdb) cov = Textcov()
(Pdb) cov.copy()
*** AttributeError: 'Textcov' object has no attribute 'copy'
experiment/evaluator.py
Outdated
elif self._load_existing_coverage_summary(): | ||
coverage_summary = self._load_existing_coverage_summary() | ||
total_lines_for_percent = compute_total_lines_without_fuzz_targets( | ||
coverage_summary, generated_target_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my understanding this logic ensures that the overall coverage_percent accurately reflects the coverage achieved by the current fuzz target using its own relevant lines as the denominator, distinct from the coverage_diff which uses the union of lines.
This makes sense, but:
- Is there any code using
total_lines_for_percent
after you assigned it? - Why did we prefer to remove
coverage_summary = self._load_existing_coverage_summary()
from the top and call the function twice?
Original code
if run_result:
# Gets line coverage (diff) details.
coverage_summary = self._load_existing_coverage_summary()
...
elif coverage_summary:
total_lines = compute_total_lines_without_fuzz_targets(
coverage_summary, generated_target_name)
- Why did we remove the final
else
?
Original code:
elif coverage_summary:
total_lines = compute_total_lines_without_fuzz_targets(
coverage_summary, generated_target_name)
else:
total_lines = 0
- Could you please add back the comment for JVM and Python? Thanks.
Original code:
# The Jacoco.xml coverage report used to generate summary.json on
# OSS-Fuzz for JVM projects does not trace the source file location.
# Thus the conversion may miss some classes because they are not
# present during coverage report generation. This fix gets the total
# line calculation from the jacoco.xml report of the current run
# directly and compares it with the total_lines retrieved from
# summary.json. Then the larger total_lines is used which is assumed
# to be more accurate. This is the same case for python project which
# the total line is determined from the all_cov.json file.
def check_target(self, ai_binary, target_path: str) -> Result: | ||
"""Builds and runs a target.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you prefer to remove this doc-string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Un intentional
if GENERATE_CORPUS: | ||
self.extend_build_with_corpus(ai_binary, target_path, | ||
self.extend_build_with_corpus(ai_binary, target_path, | ||
generated_oss_fuzz_project) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect indentation.
experiment/evaluator.py
Outdated
try: | ||
build_result, run_result = self.builder_runner.build_and_run( | ||
generated_oss_fuzz_project, target_path, llm_fix_count, | ||
self.benchmark.language) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are multiple indentation changes and other unnecessary changes.
Could you please ensure the PR only changes the parts needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really sorry for this
In general I am not against using AI or automation tools to assist coding, but we would certainly appreciate it if you could review their result before submitting the code (e.g., indentation mismatches, etc.). |
@DonggeLiu you are absolutely right, Thanks for the note. I do leverage AI and automation to accelerate my coding, but I understand the importance of manual review to ensure everything meet the standards, including proper indentation and formatting. I am sorry my implementation was not up to the mark this time. I did not check this code thoroughly its my mistake. I will improve upon it on further iterations. |
This makes sense, but: Is there any code using total_lines_for_percent after you assigned it? So for this I am really sorry about removing JVM and Python comment. For this - Why did we prefer to remove coverage_summary = self._load_existing_coverage_summary() from the top and call the function twice? And for this Why did we prefer to remove coverage_summary = self._load_existing_coverage_summary() from the top and call the function twice? The intention wasn't to call it twice, but rather to call it only when necessary. Loading the summary.json involves fetching from Google Cloud Storage and parsing JSON, which can be relatively slow. The new code optimizes this by removing the unconditional load at the beginning. Instead, self._load_existing_coverage_summary() is now called only inside the specific fallback condition (if total_lines_for_percent == 0:). Therefore, the function is called at most once per execution of _calculate_coverage_metrics, and only if the primary methods for determining line counts fail. This avoids unnecessary GCS reads and improves performance in the common case. Why did we remove the final else? |
Hi, this time as per your recommendation I ran the deep copy feature. Please let me know if there is any other changes or implementation you want me to fix I am more than happy to do. Thanks for the support. @DonggeLiu . |
Thanks @demoncoder-crypto, could you please keep the indentation consistent? Many lines are considered as new code because now they indent with 4 spaces (instead of 2 spaces, like the rest code of the projects). |
For sure fixing it |
Now I fixed this using yapf hopefully all indentation is fixed |
Any updates @DonggeLiu. Please let me know if everything is fixed. Thanks |
New Coverage Formulas:
Absolute Coverage:
Computes the percentage as:
Coverage(f1) = LinesCovered(f1) / LinesLinked(f1)
Coverage Improvement:
Computes the improvement as:
Improvement(f1 vs f0) = NewlyCoveredLines(f1 vs f0) / LinesLinked(f0)
(Here, LinesLinked(f0) approximates the total relevant lines for the baseline build.)
Data Extraction:
For the new target (f1):
LinesCovered(f1): Retrieved from run_result.coverage.covered_lines.
LinesLinked(f1): Extracted from summary.json using the new helper function _get_total_lines_from_summary.
For the baseline target (f0):
LinesCovered(f0): Obtained from self.existing_textcov.covered_lines.
LinesLinked(f0): Retrieved from self.existing_coverage_summary and parsed with _get_total_lines_from_summary, stored in self.baseline_total_lines.
Calculation Logic:
Absolute Coverage:
Calculated by dividing the new target’s covered lines by its total linked lines.
Coverage Improvement:
A copy of the new target’s coverage object is created.
The copy subtracts the baseline coverage (self.existing_textcov) to compute the number of newly covered lines.
Improvement is then computed as the ratio of these newly covered lines to the baseline’s total linked lines.
Raw Count Storage:
New fields (newly_covered_lines, total_lines, baseline_total_lines) are added to store the integer counts used in the calculations.
Modifications in Core Logic:
Evaluator.init:
Loads the baseline coverage summary and initializes self.baseline_total_lines using _get_total_lines_from_summary.
Evaluator.check_target:
After a successful build-and-run, it validates the existence of both run_result.coverage and run_result.coverage_summary.
It calculates total_lines_f1 using the helper function.
Computes result.coverage and result.line_coverage_diff (using the newly calculated values) while handling division by zero.
Uses a non-destructive approach by copying the coverage object before performing subtraction.
Robustness Enhancements:
The helper function _get_total_lines_from_summary ensures safe parsing of the summary.json structure, handling potential errors such as missing keys or incorrect types by returning 0 when necessary.