Fix(experiment): Update coverage calculation logic for #727 #960

demoncoder-crypto · 2025-04-03T13:45:39Z

New Coverage Formulas:

Absolute Coverage:
Computes the percentage as:
Coverage(f1) = LinesCovered(f1) / LinesLinked(f1)

Coverage Improvement:
Computes the improvement as:
Improvement(f1 vs f0) = NewlyCoveredLines(f1 vs f0) / LinesLinked(f0)
(Here, LinesLinked(f0) approximates the total relevant lines for the baseline build.)

Data Extraction:

For the new target (f1):

LinesCovered(f1): Retrieved from run_result.coverage.covered_lines.

LinesLinked(f1): Extracted from summary.json using the new helper function _get_total_lines_from_summary.

For the baseline target (f0):

LinesCovered(f0): Obtained from self.existing_textcov.covered_lines.

LinesLinked(f0): Retrieved from self.existing_coverage_summary and parsed with _get_total_lines_from_summary, stored in self.baseline_total_lines.

Calculation Logic:

Absolute Coverage:
Calculated by dividing the new target’s covered lines by its total linked lines.

Coverage Improvement:

A copy of the new target’s coverage object is created.

The copy subtracts the baseline coverage (self.existing_textcov) to compute the number of newly covered lines.

Improvement is then computed as the ratio of these newly covered lines to the baseline’s total linked lines.

Raw Count Storage:
New fields (newly_covered_lines, total_lines, baseline_total_lines) are added to store the integer counts used in the calculations.

Modifications in Core Logic:

Evaluator.init:
Loads the baseline coverage summary and initializes self.baseline_total_lines using _get_total_lines_from_summary.

Evaluator.check_target:

After a successful build-and-run, it validates the existence of both run_result.coverage and run_result.coverage_summary.

It calculates total_lines_f1 using the helper function.

Computes result.coverage and result.line_coverage_diff (using the newly calculated values) while handling division by zero.

Uses a non-destructive approach by copying the coverage object before performing subtraction.

Robustness Enhancements:

The helper function _get_total_lines_from_summary ensures safe parsing of the summary.json structure, handling potential errors such as missing keys or incorrect types by returning 0 when necessary.

demoncoder-crypto · 2025-04-03T13:45:51Z

CC @DonggeLiu

DonggeLiu · 2025-04-03T23:57:12Z

experiment/evaluator.py

-        if total_lines and run_result.coverage:
-          coverage_diff = run_result.coverage.covered_lines / total_lines
+        if self.baseline_total_lines > 0 and run_result.coverage:
+          coverage_diff = newly_covered_lines / self.baseline_total_lines


This is inaccurate.
The lines linked with the new fuzz target could be different from the baseline, hence the denominator should be the union of both sets.

See this example: #898 (comment)

Specifically, in experiment/evaluator.py (within the check_target method), I have now implemented:
Obtain the Textcov objects for both the baseline (existing_textcov) and the current run.
Create a merged Textcov representing the union (using a copy of the current run's coverage and merge(existing_textcov)).
Use the total_lines property of this merged Textcov object (union_total_lines) as the denominator when calculating coverage_diff (newly_covered_lines / union_total_lines).

demoncoder-crypto · 2025-04-04T07:15:01Z

CC @DonggeLiu

DonggeLiu

Thanks a lot @demoncoder-crypto !
The logic looks good to me now, here are some suggestions / questions re. implementation.

DonggeLiu · 2025-04-10T04:58:08Z

experiment/evaluator.py

-        if run_result.coverage:
-          run_result.coverage.subtract_covered_lines(existing_textcov)
+      if run_result and run_result.coverage:
+          current_coverage_copy = run_result.coverage.copy()


IIUC, this will only create a shallow copy, the modification below will affect the original run_result.coverage.
Try using python's builtin deepcopy package.
We can add a function in class Textcov for this.

The line current_coverage_copy = run_result.coverage.copy() in evaluator.py should therefore already be performing a deep copy, preventing unintended modifications to the original run_result.coverage object. I think it will do deep copy please correct me if i am wrong

@demoncoder-crypto Could you confirmed this via a simple script?
The coverage is a Textcov, which is a dataclass and does not have builtin function copy:

(Pdb) from experiment.textcov import Textcov (Pdb) cov = Textcov() (Pdb) cov.copy() *** AttributeError: 'Textcov' object has no attribute 'copy'

DonggeLiu · 2025-04-10T05:21:33Z

experiment/evaluator.py

+              run_result.coverage.subtract_covered_lines(existing_textcov)
+              newly_covered_lines = run_result.coverage.covered_lines
+          else:
+              newly_covered_lines = 0


I recon the conditions of these two blocks (if current_coverage_copy: and if run_result.coverage:) are essentially the same?
We can merge them for simplicity.

Yes we can merge, good point, thanks

DonggeLiu · 2025-04-10T05:24:05Z

experiment/evaluator.py

-              f'Warning: total_lines == 0 in {generated_oss_fuzz_project}.')
-          coverage_diff = 0.0
+      if run_result:
+          existing_textcov = self.load_existing_textcov()


could you please add a TODO for this?
TODO(dongge): Move load_existing_textcov to OSS-Fuzz module so that we only need to run it once.

DonggeLiu · 2025-04-10T05:59:20Z

experiment/evaluator.py

+          elif self._load_existing_coverage_summary():
+              coverage_summary = self._load_existing_coverage_summary()
+              total_lines_for_percent = compute_total_lines_without_fuzz_targets(
+                  coverage_summary, generated_target_name)


Why do we need to modify the logic here?

As per my understanding this logic ensures that the overall coverage_percent accurately reflects the coverage achieved by the current fuzz target using its own relevant lines as the denominator, distinct from the coverage_diff which uses the union of lines. It correctly uses the pre-subtraction coverage data stored in current_coverage_copy for this calculation. That is my understanding of this please correct me if i i am wrong

As per my understanding this logic ensures that the overall coverage_percent accurately reflects the coverage achieved by the current fuzz target using its own relevant lines as the denominator, distinct from the coverage_diff which uses the union of lines.

This makes sense, but:

Is there any code using total_lines_for_percent after you assigned it?

Why did we prefer to remove coverage_summary = self._load_existing_coverage_summary() from the top and call the function twice?
Original code

if run_result: # Gets line coverage (diff) details. coverage_summary = self._load_existing_coverage_summary() ... elif coverage_summary: total_lines = compute_total_lines_without_fuzz_targets( coverage_summary, generated_target_name)

Why did we remove the final else?
Original code:

elif coverage_summary: total_lines = compute_total_lines_without_fuzz_targets( coverage_summary, generated_target_name) else: total_lines = 0

Could you please add back the comment for JVM and Python? Thanks.
Original code:

# The Jacoco.xml coverage report used to generate summary.json on # OSS-Fuzz for JVM projects does not trace the source file location. # Thus the conversion may miss some classes because they are not # present during coverage report generation. This fix gets the total # line calculation from the jacoco.xml report of the current run # directly and compares it with the total_lines retrieved from # summary.json. Then the larger total_lines is used which is assumed # to be more accurate. This is the same case for python project which # the total line is determined from the all_cov.json file.

DonggeLiu · 2025-04-10T06:14:29Z

experiment/evaluator.py

+                   dual_logger.log(
+                       f'Warning: union_total_lines is 0 but newly_covered_lines is {newly_covered_lines}. Cannot calculate coverage diff accurately.'
+                   )
+              coverage_diff = 0.0


As this part is getting more complex, could you please separate it into an individual function?
Thanks!

demoncoder-crypto · 2025-04-10T16:37:31Z

Thanks @DonggeLiu for viewing this I will work on the remaining issue and fix them by tomorrow, thanks for support

demoncoder-crypto · 2025-04-12T17:35:59Z

I have the necessary changes @DonggeLiu. I have tried to address the issue mentioned do let me know if any thing else is needed

DonggeLiu

Hi @demoncoder-crypto, thanks for making the changes.
However, some indentation changes (2 spaces to 4 spaces) confused git. Could you please double-check this PR only modifies the parts needed?
Thanks!

DonggeLiu · 2025-04-14T04:26:08Z

experiment/evaluator.py

-        if run_result.coverage:
-          run_result.coverage.subtract_covered_lines(existing_textcov)
+      if run_result and run_result.coverage:
+          current_coverage_copy = run_result.coverage.copy()


@demoncoder-crypto Could you confirmed this via a simple script?
The coverage is a Textcov, which is a dataclass and does not have builtin function copy:

(Pdb) from experiment.textcov import Textcov (Pdb) cov = Textcov() (Pdb) cov.copy() *** AttributeError: 'Textcov' object has no attribute 'copy'

DonggeLiu · 2025-04-14T04:41:20Z

experiment/evaluator.py

+          elif self._load_existing_coverage_summary():
+              coverage_summary = self._load_existing_coverage_summary()
+              total_lines_for_percent = compute_total_lines_without_fuzz_targets(
+                  coverage_summary, generated_target_name)


As per my understanding this logic ensures that the overall coverage_percent accurately reflects the coverage achieved by the current fuzz target using its own relevant lines as the denominator, distinct from the coverage_diff which uses the union of lines.

This makes sense, but:

Is there any code using total_lines_for_percent after you assigned it?

Why did we prefer to remove coverage_summary = self._load_existing_coverage_summary() from the top and call the function twice?
Original code

if run_result: # Gets line coverage (diff) details. coverage_summary = self._load_existing_coverage_summary() ... elif coverage_summary: total_lines = compute_total_lines_without_fuzz_targets( coverage_summary, generated_target_name)

Why did we remove the final else?
Original code:

elif coverage_summary: total_lines = compute_total_lines_without_fuzz_targets( coverage_summary, generated_target_name) else: total_lines = 0

Could you please add back the comment for JVM and Python? Thanks.
Original code:

# The Jacoco.xml coverage report used to generate summary.json on # OSS-Fuzz for JVM projects does not trace the source file location. # Thus the conversion may miss some classes because they are not # present during coverage report generation. This fix gets the total # line calculation from the jacoco.xml report of the current run # directly and compares it with the total_lines retrieved from # summary.json. Then the larger total_lines is used which is assumed # to be more accurate. This is the same case for python project which # the total line is determined from the all_cov.json file.

DonggeLiu · 2025-04-14T04:54:01Z

experiment/evaluator.py

  def check_target(self, ai_binary, target_path: str) -> Result:
-    """Builds and runs a target."""


Why do you prefer to remove this doc-string?

Un intentional

DonggeLiu · 2025-04-14T04:54:50Z

experiment/evaluator.py

    if GENERATE_CORPUS:
-      self.extend_build_with_corpus(ai_binary, target_path,
+        self.extend_build_with_corpus(ai_binary, target_path,
                                    generated_oss_fuzz_project)


Incorrect indentation.

DonggeLiu · 2025-04-14T04:57:15Z

experiment/evaluator.py

+        try:
+            build_result, run_result = self.builder_runner.build_and_run(
+                generated_oss_fuzz_project, target_path, llm_fix_count,
+                self.benchmark.language)


There are multiple indentation changes and other unnecessary changes.
Could you please ensure the PR only changes the parts needed?

I am really sorry for this

DonggeLiu · 2025-04-14T05:04:04Z

In general I am not against using AI or automation tools to assist coding, but we would certainly appreciate it if you could review their result before submitting the code (e.g., indentation mismatches, etc.).

demoncoder-crypto · 2025-04-14T05:10:17Z

@DonggeLiu you are absolutely right, Thanks for the note. I do leverage AI and automation to accelerate my coding, but I understand the importance of manual review to ensure everything meet the standards, including proper indentation and formatting. I am sorry my implementation was not up to the mark this time. I did not check this code thoroughly its my mistake. I will improve upon it on further iterations.

demoncoder-crypto · 2025-04-15T17:46:16Z

This makes sense, but:

Is there any code using total_lines_for_percent after you assigned it?
Why did we prefer to remove coverage_summary = self._load_existing_coverage_summary() from the top and call the function twice?
Original code
if run_result:
# Gets line coverage (diff) details.
coverage_summary = self._load_existing_coverage_summary()
...
elif coverage_summary:
total_lines = compute_total_lines_without_fuzz_targets(
coverage_summary, generated_target_name)
Why did we remove the final else?
Original code:
elif coverage_summary:
total_lines = compute_total_lines_without_fuzz_targets(
coverage_summary, generated_target_name)
else:
total_lines = 0
Could you please add back the comment for JVM and Python? Thanks.
Original code:
# The Jacoco.xml coverage report used to generate summary.json on
# OSS-Fuzz for JVM projects does not trace the source file location.
# Thus the conversion may miss some classes because they are not
# present during coverage report generation. This fix gets the total
# line calculation from the jacoco.xml report of the current run
# directly and compares it with the total_lines retrieved from
# summary.json. Then the larger total_lines is used which is assumed
# to be more accurate. This is the same case for python project which
# the total line is determined from the all_cov.json file.

So for this I am really sorry about removing JVM and Python comment.

For this - Why did we prefer to remove coverage_summary = self._load_existing_coverage_summary() from the top and call the function twice?
Yes. After total_lines_for_percent is calculated (either from current_coverage_copy or the fallback coverage_summary), it's immediately used in the next if condition: if total_lines_for_percent > 0 and original_covered_lines != -1:. If this condition is true, it's used as the denominator to calculate coverage_percent. It's also used in the else block within the warning log message: dual_logger.log(f'Warning: Could not determine coverage percentage... total_lines={total_lines_for_percent}'). So, it's essential for both the calculation and logging in the fallback scenario.

And for this Why did we prefer to remove coverage_summary = self._load_existing_coverage_summary() from the top and call the function twice?

The intention wasn't to call it twice, but rather to call it only when necessary.

Loading the summary.json involves fetching from Google Cloud Storage and parsing JSON, which can be relatively slow.
In the original code, it was loaded unconditionally at the start of the if run_result: block, even though it might only be needed much later in the fallback logic (if run_result.total_pcs was 0 and current_coverage_copy.total_lines was 0).

The new code optimizes this by removing the unconditional load at the beginning. Instead, self._load_existing_coverage_summary() is now called only inside the specific fallback condition (if total_lines_for_percent == 0:).

Therefore, the function is called at most once per execution of _calculate_coverage_metrics, and only if the primary methods for determining line counts fail. This avoids unnecessary GCS reads and improves performance in the common case.

Why did we remove the final else?
As per my understanding the else- total_lines_for_percent = 0 was removed because the variable total_lines_for_percent is already initialized to 0 at the start of that code block. If the attempts to calculate it from coverage data fail, it simply retains its initial 0 value, which the subsequent logic handles correctly.

demoncoder-crypto · 2025-04-15T18:11:03Z

demoncoder-crypto · 2025-04-15T18:13:54Z

Hi, this time as per your recommendation I ran the deep copy feature. Please let me know if there is any other changes or implementation you want me to fix I am more than happy to do. Thanks for the support. @DonggeLiu .

DonggeLiu · 2025-04-16T00:27:19Z

Thanks @demoncoder-crypto, could you please keep the indentation consistent?
For example, this is likely caused by previous commits:

Many lines are considered as new code because now they indent with 4 spaces (instead of 2 spaces, like the rest code of the projects).

demoncoder-crypto · 2025-04-16T06:43:22Z

For sure fixing it

demoncoder-crypto · 2025-04-16T07:59:33Z

Now I fixed this using yapf hopefully all indentation is fixed

demoncoder-crypto · 2025-04-20T15:52:41Z

Any updates @DonggeLiu. Please let me know if everything is fixed. Thanks

Fix(experiment): Update coverage calculation logic for google#727

06ce551

DonggeLiu reviewed Apr 3, 2025

View reviewed changes

Refactor(experiment): Use union of lines for coverage diff denominator

2acf980

DonggeLiu reviewed Apr 10, 2025

View reviewed changes

DonggeLiu marked this pull request as draft April 11, 2025 05:16

Refactor(evaluator): Extract coverage calculation logic

3b7ecde

demoncoder-crypto marked this pull request as ready for review April 12, 2025 17:35

DonggeLiu reviewed Apr 14, 2025

View reviewed changes

Fix: Address review comments on coverage calculation

68d41ed

Merge branch 'main' into issue-727-coverage-fix

da115a8

Merge branch 'main' into issue-727-coverage-fix

02b374a

demoncoder-crypto added 2 commits April 16, 2025 13:27

Style: Reformat evaluator.py to use 2-space indentation

dad5627

Merge branch 'main' into issue-727-coverage-fix

b049b55

Merge branch 'main' into issue-727-coverage-fix

dbed9ec

		def check_target(self, ai_binary, target_path: str) -> Result:
		"""Builds and runs a target."""

Fix(experiment): Update coverage calculation logic for #727 #960

Are you sure you want to change the base?

Fix(experiment): Update coverage calculation logic for #727 #960

Uh oh!

Conversation

demoncoder-crypto commented Apr 3, 2025

Uh oh!

demoncoder-crypto commented Apr 3, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

demoncoder-crypto commented Apr 4, 2025

Uh oh!

DonggeLiu left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

demoncoder-crypto commented Apr 10, 2025

Uh oh!

demoncoder-crypto commented Apr 12, 2025

Uh oh!

DonggeLiu left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DonggeLiu commented Apr 14, 2025

Uh oh!

demoncoder-crypto commented Apr 14, 2025

Uh oh!

demoncoder-crypto commented Apr 15, 2025

Uh oh!

demoncoder-crypto commented Apr 15, 2025

Uh oh!

demoncoder-crypto commented Apr 15, 2025

Uh oh!

DonggeLiu commented Apr 16, 2025

Uh oh!

demoncoder-crypto commented Apr 16, 2025

Uh oh!

demoncoder-crypto commented Apr 16, 2025

Uh oh!

demoncoder-crypto commented Apr 20, 2025

Uh oh!

DonggeLiu left a comment •

edited

Loading