Fix(experiment): Update coverage calculation logic for #727 #960

Open
wants to merge 9 commits into
base: main

demoncoder-crypto

New Coverage Formulas:

Absolute Coverage:
Computes the percentage as:
Coverage(f1) = LinesCovered(f1) / LinesLinked(f1)

Coverage Improvement:
Computes the improvement as:
Improvement(f1 vs f0) = NewlyCoveredLines(f1 vs f0) / LinesLinked(f0)
(Here, LinesLinked(f0) approximates the total relevant lines for the baseline build.)

Data Extraction:

For the new target (f1):

LinesCovered(f1): Retrieved from run_result.coverage.covered_lines.

LinesLinked(f1): Extracted from summary.json using the new helper function _get_total_lines_from_summary.

For the baseline target (f0):

LinesCovered(f0): Obtained from self.existing_textcov.covered_lines.

LinesLinked(f0): Retrieved from self.existing_coverage_summary and parsed with _get_total_lines_from_summary, stored in self.baseline_total_lines.

Calculation Logic:

Absolute Coverage:
Calculated by dividing the new target’s covered lines by its total linked lines.

Coverage Improvement:

A copy of the new target’s coverage object is created.

The copy subtracts the baseline coverage (self.existing_textcov) to compute the number of newly covered lines.

Improvement is then computed as the ratio of these newly covered lines to the baseline’s total linked lines.

Raw Count Storage:
New fields (newly_covered_lines, total_lines, baseline_total_lines) are added to store the integer counts used in the calculations.

Modifications in Core Logic:

Evaluator.__init__:
Loads the baseline coverage summary and initializes self.baseline_total_lines using _get_total_lines_from_summary.

Evaluator.check_target:

After a successful build-and-run, it validates the existence of both run_result.coverage and run_result.coverage_summary.

It calculates total_lines_f1 using the helper function.

Computes result.coverage and result.line_coverage_diff (using the newly calculated values) while handling division by zero.

Uses a non-destructive approach by copying the coverage object before performing subtraction.

Robustness Enhancements:

The helper function _get_total_lines_from_summary ensures safe parsing of the summary.json structure, handling potential errors such as missing keys or incorrect types by returning 0 when necessary.
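A minimal sketch of what such a defensive helper might look like. The layout assumed here (`data[0].totals.lines.count`, in the style of an llvm-cov summary export) and the function name `get_total_lines_from_summary` are assumptions for illustration; the PR's actual helper is not shown in this thread.

```python
from typing import Any


def get_total_lines_from_summary(summary: Any) -> int:
  """Returns the total line count from a coverage summary, or 0 on any error.

  Assumed layout: {"data": [{"totals": {"lines": {"count": N}}}]}.
  """
  try:
    count = summary['data'][0]['totals']['lines']['count']
  except (KeyError, IndexError, TypeError):
    # Missing keys, empty lists, or a None/non-dict summary all end up here.
    return 0
  # Reject bools and non-integers rather than guessing at a conversion.
  if isinstance(count, bool) or not isinstance(count, int):
    return 0
  return max(count, 0)
```

Returning 0 keeps the caller's division-by-zero guard as the single place where a missing denominator is handled.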

@demoncoder-crypto
Author

CC @DonggeLiu

- if total_lines and run_result.coverage:
-   coverage_diff = run_result.coverage.covered_lines / total_lines
+ if self.baseline_total_lines > 0 and run_result.coverage:
+   coverage_diff = newly_covered_lines / self.baseline_total_lines
Collaborator

This is inaccurate.
The lines linked with the new fuzz target could be different from the baseline, hence the denominator should be the union of both sets.

See this example: #898 (comment)
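A toy illustration of the reviewer's point, using plain Python sets of (file, line) pairs rather than the project's Textcov class. All of the line sets below are made up for the example.

```python
# Lines linked into each build, as (file, line_number) pairs. Hypothetical data.
baseline_linked = {('a.c', n) for n in range(1, 101)}  # 100 lines
new_linked = ({('a.c', n) for n in range(1, 51)} |     # 50 shared lines
              {('b.c', n) for n in range(1, 151)})     # 150 lines only in f1

baseline_covered = {('a.c', n) for n in range(1, 41)}  # 40 lines
new_covered = ({('a.c', n) for n in range(1, 31)} |    # 30 already covered
               {('b.c', n) for n in range(1, 91)})     # 90 newly covered

newly_covered = new_covered - baseline_covered
union_linked = baseline_linked | new_linked

# Dividing by the baseline alone overstates the gain: 90 / 100.
improvement_vs_baseline = len(newly_covered) / len(baseline_linked)
# Dividing by the union also counts lines only the new target links: 90 / 250.
improvement_vs_union = len(newly_covered) / len(union_linked)
```

With the baseline-only denominator, a target that links a lot of new code could even report an improvement above 100%; the union keeps the ratio within [0, 1].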

Author

Specifically, in experiment/evaluator.py (within the check_target method), I have now implemented:
- Obtain the Textcov objects for both the baseline (existing_textcov) and the current run.
- Create a merged Textcov representing the union (using a copy of the current run's coverage and merge(existing_textcov)).
- Use the total_lines property of this merged Textcov object (union_total_lines) as the denominator when calculating coverage_diff (newly_covered_lines / union_total_lines).

@demoncoder-crypto
Author

CC @DonggeLiu

Collaborator

@DonggeLiu left a comment

Thanks a lot @demoncoder-crypto !
The logic looks good to me now, here are some suggestions / questions re. implementation.

- if run_result.coverage:
-   run_result.coverage.subtract_covered_lines(existing_textcov)
+ if run_result and run_result.coverage:
+   current_coverage_copy = run_result.coverage.copy()
Collaborator

IIUC, this will only create a shallow copy, so the modification below will affect the original run_result.coverage.
Try using Python's built-in copy.deepcopy.
We can add a function in class Textcov for this.

Author

The line current_coverage_copy = run_result.coverage.copy() in evaluator.py should therefore already be performing a deep copy, preventing unintended modifications to the original run_result.coverage object. I think it performs a deep copy; please correct me if I am wrong.

Collaborator

@demoncoder-crypto Could you confirm this via a simple script?
The coverage is a Textcov, which is a dataclass and does not have a built-in copy method:

(Pdb) from experiment.textcov import Textcov
(Pdb) cov = Textcov()
(Pdb) cov.copy()
*** AttributeError: 'Textcov' object has no attribute 'copy'
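The kind of simple script requested here could look like the sketch below. `ToyCov` is a hypothetical stand-in for Textcov (whose real fields are not shown in this thread); it only demonstrates how `copy.copy` and `copy.deepcopy` differ on a dataclass that holds nested containers and is mutated in place.

```python
import copy
import dataclasses


@dataclasses.dataclass
class ToyCov:
  """Hypothetical stand-in for Textcov: function name -> covered line set."""
  functions: dict[str, set[int]] = dataclasses.field(default_factory=dict)

  def subtract(self, other: 'ToyCov') -> None:
    """Removes lines covered by `other`, mutating this object in place."""
    for name, lines in self.functions.items():
      lines -= other.functions.get(name, set())


baseline = ToyCov({'parse': {1, 2}})

original = ToyCov({'parse': {1, 2, 3}})
shallow = copy.copy(original)  # New ToyCov, but it shares the dict and sets.
shallow.subtract(baseline)
shallow_leaked = original.functions['parse'] == {3}  # Original was modified.

original = ToyCov({'parse': {1, 2, 3}})
deep = copy.deepcopy(original)  # Nested containers are fully independent.
deep.subtract(baseline)
deep_isolated = original.functions['parse'] == {1, 2, 3}
```

Both flags end up `True`: the shallow copy leaks the subtraction into the original, while the deep copy leaves it untouched.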

  run_result.coverage.subtract_covered_lines(existing_textcov)
  newly_covered_lines = run_result.coverage.covered_lines
else:
  newly_covered_lines = 0
Collaborator

I reckon the conditions of these two blocks (if current_coverage_copy: and if run_result.coverage:) are essentially the same?
We can merge them for simplicity.

Author

Yes we can merge, good point, thanks

f'Warning: total_lines == 0 in {generated_oss_fuzz_project}.')
coverage_diff = 0.0
if run_result:
existing_textcov = self.load_existing_textcov()
Collaborator

Could you please add a TODO for this?
TODO(dongge): Move load_existing_textcov to OSS-Fuzz module so that we only need to run it once.

Author

Done.

elif self._load_existing_coverage_summary():
  coverage_summary = self._load_existing_coverage_summary()
  total_lines_for_percent = compute_total_lines_without_fuzz_targets(
      coverage_summary, generated_target_name)
Collaborator

Why do we need to modify the logic here?

Author

As per my understanding, this logic ensures that the overall coverage_percent accurately reflects the coverage achieved by the current fuzz target using its own relevant lines as the denominator, distinct from the coverage_diff, which uses the union of lines. It uses the pre-subtraction coverage data stored in current_coverage_copy for this calculation. Please correct me if I am wrong.

Collaborator

As per my understanding this logic ensures that the overall coverage_percent accurately reflects the coverage achieved by the current fuzz target using its own relevant lines as the denominator, distinct from the coverage_diff which uses the union of lines.

This makes sense, but:

  1. Is there any code using total_lines_for_percent after you assigned it?
  2. Why did we prefer to remove coverage_summary = self._load_existing_coverage_summary() from the top and call the function twice?
    Original code
      if run_result:
        # Gets line coverage (diff) details.
        coverage_summary = self._load_existing_coverage_summary()
...
        elif coverage_summary:
          total_lines = compute_total_lines_without_fuzz_targets(
              coverage_summary, generated_target_name)
  3. Why did we remove the final else?
    Original code:
        elif coverage_summary:
          total_lines = compute_total_lines_without_fuzz_targets(
              coverage_summary, generated_target_name)
        else:
          total_lines = 0
  4. Could you please add back the comment for JVM and Python? Thanks.
    Original code:
          # The Jacoco.xml coverage report used to generate summary.json on
          # OSS-Fuzz for JVM projects does not trace the source file location.
          # Thus the conversion may miss some classes because they are not
          # present during coverage report generation. This fix gets the total
          # line calculation from the jacoco.xml report of the current run
          # directly and compares it with the total_lines retrieved from
          # summary.json. Then the larger total_lines is used which is assumed
          # to be more accurate. This is the same case for python project which
          # the total line is determined from the all_cov.json file.

dual_logger.log(
f'Warning: union_total_lines is 0 but newly_covered_lines is {newly_covered_lines}. Cannot calculate coverage diff accurately.'
)
coverage_diff = 0.0
Collaborator

As this part is getting more complex, could you please separate it into an individual function?
Thanks!
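One possible shape for such an extracted function is sketched below. The name `compute_coverage_diff` and the use of the standard `logging` module (instead of the project's `dual_logger`) are assumptions made to keep the example self-contained.

```python
import logging

logger = logging.getLogger(__name__)


def compute_coverage_diff(newly_covered_lines: int,
                          union_total_lines: int) -> float:
  """Returns newly covered lines as a fraction of the union line count.

  Hypothetical helper: returns 0.0 when the denominator is not positive,
  so callers never divide by zero.
  """
  if union_total_lines <= 0:
    logger.warning(
        'union_total_lines is %d but newly_covered_lines is %d; '
        'cannot calculate coverage diff accurately.', union_total_lines,
        newly_covered_lines)
    return 0.0
  return newly_covered_lines / union_total_lines
```

Keeping the zero check and the warning inside one function means check_target only needs a single call and no nested branching for this metric.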

@demoncoder-crypto
Author

Thanks @DonggeLiu for reviewing this. I will work on the remaining issues and fix them by tomorrow. Thanks for the support.

@DonggeLiu DonggeLiu marked this pull request as draft April 11, 2025 05:16
@demoncoder-crypto demoncoder-crypto marked this pull request as ready for review April 12, 2025 17:35
@demoncoder-crypto
Author

I have made the necessary changes, @DonggeLiu. I have tried to address the issues mentioned; do let me know if anything else is needed.

Collaborator

@DonggeLiu left a comment

Hi @demoncoder-crypto, thanks for making the changes.
However, some indentation changes (2 spaces to 4 spaces) confused git. Could you please double-check this PR only modifies the parts needed?
Thanks!

def check_target(self, ai_binary, target_path: str) -> Result:
"""Builds and runs a target."""
Collaborator

Why do you prefer to remove this doc-string?

Author

Unintentional.

if GENERATE_CORPUS:
self.extend_build_with_corpus(ai_binary, target_path,
self.extend_build_with_corpus(ai_binary, target_path,
generated_oss_fuzz_project)
Collaborator

Incorrect indentation.

try:
build_result, run_result = self.builder_runner.build_and_run(
generated_oss_fuzz_project, target_path, llm_fix_count,
self.benchmark.language)
Collaborator

There are multiple indentation changes and other unnecessary changes.
Could you please ensure the PR only changes the parts needed?

Author

I am really sorry for this

@DonggeLiu
Collaborator

In general I am not against using AI or automation tools to assist coding, but we would certainly appreciate it if you could review their result before submitting the code (e.g., indentation mismatches, etc.).

@demoncoder-crypto
Author

@DonggeLiu you are absolutely right, thanks for the note. I do leverage AI and automation to accelerate my coding, but I understand the importance of manual review to ensure everything meets the standards, including proper indentation and formatting. I am sorry my implementation was not up to the mark this time. I did not check this code thoroughly; it's my mistake. I will improve on it in further iterations.

@demoncoder-crypto
Author

This makes sense, but:

Is there any code using total_lines_for_percent after you assigned it?
Why did we prefer to remove coverage_summary = self._load_existing_coverage_summary() from the top and call the function twice?
Original code
if run_result:
# Gets line coverage (diff) details.
coverage_summary = self._load_existing_coverage_summary()
...
elif coverage_summary:
total_lines = compute_total_lines_without_fuzz_targets(
coverage_summary, generated_target_name)
Why did we remove the final else?
Original code:
elif coverage_summary:
total_lines = compute_total_lines_without_fuzz_targets(
coverage_summary, generated_target_name)
else:
total_lines = 0
Could you please add back the comment for JVM and Python? Thanks.
Original code:
# The Jacoco.xml coverage report used to generate summary.json on
# OSS-Fuzz for JVM projects does not trace the source file location.
# Thus the conversion may miss some classes because they are not
# present during coverage report generation. This fix gets the total
# line calculation from the jacoco.xml report of the current run
# directly and compares it with the total_lines retrieved from
# summary.json. Then the larger total_lines is used which is assumed
# to be more accurate. This is the same case for python project which
# the total line is determined from the all_cov.json file.

I am really sorry about removing the JVM and Python comment.

For this - Is there any code using total_lines_for_percent after you assigned it?
Yes. After total_lines_for_percent is calculated (either from current_coverage_copy or the fallback coverage_summary), it's immediately used in the next if condition: if total_lines_for_percent > 0 and original_covered_lines != -1:. If this condition is true, it's used as the denominator to calculate coverage_percent. It's also used in the else block within the warning log message: dual_logger.log(f'Warning: Could not determine coverage percentage... total_lines={total_lines_for_percent}'). So, it's essential for both the calculation and logging in the fallback scenario.

And for this Why did we prefer to remove coverage_summary = self._load_existing_coverage_summary() from the top and call the function twice?

The intention wasn't to call it twice, but rather to call it only when necessary.

Loading the summary.json involves fetching from Google Cloud Storage and parsing JSON, which can be relatively slow.
In the original code, it was loaded unconditionally at the start of the if run_result: block, even though it might only be needed much later in the fallback logic (if run_result.total_pcs was 0 and current_coverage_copy.total_lines was 0).

The new code optimizes this by removing the unconditional load at the beginning. Instead, self._load_existing_coverage_summary() is now called only inside the specific fallback condition (if total_lines_for_percent == 0:).

Therefore, the function is called at most once per execution of _calculate_coverage_metrics, and only if the primary methods for determining line counts fail. This avoids unnecessary GCS reads and improves performance in the common case.
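The "load only when needed, and at most once" idea can be sketched with `functools.cached_property`. `ToyEvaluator`, its method names, and the summary contents are hypothetical; the real loader fetches and parses summary.json.

```python
import functools


class ToyEvaluator:
  """Hypothetical evaluator that defers an expensive summary load."""

  def __init__(self):
    self.load_count = 0

  @functools.cached_property
  def existing_coverage_summary(self) -> dict:
    # Stand-in for the slow fetch-and-parse step; runs at most once
    # per instance, on first access.
    self.load_count += 1
    return {'data': [{'totals': {'lines': {'count': 200}}}]}

  def total_lines_for_percent(self, lines_from_current_run: int) -> int:
    if lines_from_current_run > 0:
      return lines_from_current_run  # Common case: summary never loaded.
    summary = self.existing_coverage_summary
    return summary['data'][0]['totals']['lines']['count']


evaluator = ToyEvaluator()
primary = evaluator.total_lines_for_percent(50)   # No load happens.
fallback = evaluator.total_lines_for_percent(0)   # First and only load.
fallback_again = evaluator.total_lines_for_percent(0)
```

Caching the result also avoids the pattern the reviewer flagged, where the loader is invoked once in the `elif` condition and again in its body.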

Why did we remove the final else?
As per my understanding, the final else branch (total_lines_for_percent = 0) was removed because the variable total_lines_for_percent is already initialized to 0 at the start of that code block. If the attempts to calculate it from coverage data fail, it simply retains its initial value of 0, which the subsequent logic handles correctly.

@demoncoder-crypto
Author

Screenshot 2025-04-15 234034

@demoncoder-crypto
Author

Hi, this time, as per your recommendation, I tested the deep copy behavior. Please let me know if there are any other changes you want me to make; I am more than happy to do so. Thanks for the support, @DonggeLiu.

@DonggeLiu
Collaborator

Thanks @demoncoder-crypto, could you please keep the indentation consistent?
For example, this is likely caused by previous commits:
image

Many lines are considered new code because they are now indented with 4 spaces (instead of 2 spaces, like the rest of the project's code).

@demoncoder-crypto
Author

For sure fixing it

@demoncoder-crypto
Author

I have now fixed this using yapf; hopefully all the indentation is corrected.

@demoncoder-crypto
Author

Any updates, @DonggeLiu? Please let me know if everything is fixed. Thanks.
