Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Deprecations cause SourceMapper to scan all <source/> files #6111

Closed
dantleech opened this issue Jan 28, 2025 · 9 comments
Closed

Deprecations cause SourceMapper to scan all <source/> files #6111

dantleech opened this issue Jan 28, 2025 · 9 comments
Assignees
Labels
feature/test-runner CLI test runner type/performance Issues related to resource consumption (time and memory) version/11 Something affects PHPUnit 11 version/12 Something affects PHPUnit 12

Comments

@dantleech
Copy link
Contributor

dantleech commented Jan 28, 2025

Q A
PHPUnit version 11.5.3
PHP version 8.3
Installation Method PHAR + Composer

Summary

Hi 👋

When deprecations are encountered the SouceMapper recursively scans all files for each <source> include and exclude paths.

This causes a siginficant pause in PHPUnit as it scans all these files, initially for 16 seconds until we isolated the issue and refined the <source> list - now it's "only" 3 seconds.

Current behavior

Scans all files when a deprecation is encountered.

How to reproduce

Add a <source> configuration:

    <source>
        <include>
            <directory suffix=".php">src/</directory>
            <directory suffix=".php">perhaps-another-directory-here/</directory>
        </include>
        <exclude>
            <directory suffix=".php">src/SubFolder</directory>
            <!-- maybe serveral more here -->
        </exclude>
    </source>
  • Run a test that would trigger a deprecation
  • Observe that source mapper will recursively iterate over all the files in the above paths.

https://github.com/sebastianbergmann/phpunit/blob/main/src/Runner/ErrorHandler.php#L178

Expected behavior

That a triggered deprecation would not scan all the files and block the test run for N seconds.

@dantleech dantleech added the type/bug Something is broken label Jan 28, 2025
@sebastianbergmann sebastianbergmann added type/performance Issues related to resource consumption (time and memory) feature/test-runner CLI test runner version/11 Something affects PHPUnit 11 version/12 Something affects PHPUnit 12 and removed type/bug Something is broken labels Jan 28, 2025
@sebastianbergmann sebastianbergmann changed the title Deprecations cause SourceMapper to scan all <source/> files Deprecations cause SourceMapper to scan all <source/> files Jan 28, 2025
@sebastianbergmann sebastianbergmann self-assigned this Jan 28, 2025
@dantleech
Copy link
Contributor Author

dantleech commented Jan 28, 2025

Thankyou! unfortuantely it doesn't seem to fix the issue. I added some screen recording with some diagnostic output below

recording.mp4

This runs a specific test with --filter. The test triggers an E_DEPRECATION. I've then added debug output in the SourceMapper for each directory scanned.

@dantleech

This comment has been minimized.

@sebastianbergmann
Copy link
Owner

sebastianbergmann commented Jan 28, 2025

I cannot reproduce this:

diff --git a/src/TextUI/Configuration/SourceMapper.php b/src/TextUI/Configuration/SourceMapper.php
index cbe8aa4c65..d173a17ee7 100644
--- a/src/TextUI/Configuration/SourceMapper.php
+++ b/src/TextUI/Configuration/SourceMapper.php
@@ -9,6 +9,8 @@
  */
 namespace PHPUnit\TextUI\Configuration;
 
+use const FILE_APPEND;
+use function file_put_contents;
 use function realpath;
 use SebastianBergmann\FileIterator\Facade as FileIteratorFacade;
 use SplObjectStorage;
@@ -30,6 +32,8 @@ final class SourceMapper
      */
     public function map(Source $source): array
     {
+        file_put_contents('/tmp/debug.txt', '*', FILE_APPEND);
+
         if (self::$files === null) {
             self::$files = new SplObjectStorage;
         }
.
├── phpunit.xml
├── src
│   └── Issue6111.php
└── tests
    └── Issue6111Test.php

phpunit.xml

<?xml version="1.0" encoding="UTF-8"?>
<phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="../../../../phpunit.xsd"
         bootstrap="src/Issue6111.php"
>
    <testsuites>
        <testsuite name="default">
            <directory>tests</directory>
        </testsuite>
    </testsuites>

    <source ignoreIndirectDeprecations="true" restrictNotices="true" restrictWarnings="true">
        <include>
            <directory>src</directory>
        </include>
    </source>
</phpunit>

src/Issue6111.php

<?php declare(strict_types=1);
/*
 * This file is part of PHPUnit.
 *
 * (c) Sebastian Bergmann <sebastian@phpunit.de>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */
namespace PHPUnit\TestFixture\Issue6111;

use const E_USER_DEPRECATED;
use function trigger_error;

final readonly class Issue6111
{
    public function doSomething(): bool
    {
        trigger_error('one', E_USER_DEPRECATED);
        trigger_error('two', E_USER_DEPRECATED);

        return true;
    }
}

src/Issue6111Test.php

<?php declare(strict_types=1);
/*
 * This file is part of PHPUnit.
 *
 * (c) Sebastian Bergmann <sebastian@phpunit.de>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */
namespace PHPUnit\TestFixture\Issue6111;

use PHPUnit\Framework\TestCase;

final class Issue6111Test extends TestCase
{
    public function testOne(): void
    {
        $o = new Issue6111;

        $this->assertTrue($o->doSomething());
    }
}
❯ rm /tmp/debug.txt
❯ ll /tmp/debug.txt
ls: cannot access '/tmp/debug.txt': No such file or directory
❯ phpunit
PHPUnit 11.5.4-2-g339c2c8aad by Sebastian Bergmann and contributors.

Runtime:       PHP 8.4.3
Configuration: /usr/local/src/phpunit/tests/end-to-end/regression/6111/phpunit.xml

D                                                                   1 / 1 (100%)

Time: 00:00.017, Memory: 10.00 MB

OK, but there were issues!
Tests: 1, Assertions: 1, Deprecations: 2.
❯ cat /tmp/debug.txt
*

As you can see, and unless I am doing something stupid that I am missing, SourceMapper::map() is only called once while ErrorHandler::__invoke() (and ErrorHandler::trigger() in turn) is called twice to handle the two E_USER_DEPRECATED issues.

The only other place from where SourceMapper::map() is called that I can find is https://github.com/sebastianbergmann/phpunit/blob/11.5.4/src/TextUI/Configuration/CodeCoverageFilterRegistry.php#L65. I plan to refactor the code so that SourceMapper::map() is called at most once.

In the current state of PHPUnit 11.5.4, SourceMapper::map() is called at most twice: once when a deprecation needs to be handled, once when code coverage data is processed, and twice when a deprecation needs to be handled and code coverage data is processed.

What am I missing?

@dantleech
Copy link
Contributor Author

dantleech commented Jan 28, 2025

Thanks - the problem is not that SourceMapper::map() is being called multiple times, it's that when it is called it is potentially very expensive as demonstrated in the screen recording (that was a single call to map).

I'll attempt to create a reproduction repo :)

@sebastianbergmann
Copy link
Owner

We need to perform this operation at least once. Before PHPUnit 11.5.4 it was performed for each deprecation.

@dantleech
Copy link
Contributor Author

dantleech commented Jan 28, 2025

Here is a "reproduction" repository: https://github.com/dantleech/phpunit-6111

I think I understand why it's called, but wonder if it could be more efficient? We get the issue after upgrading from 10.5.37

(At the extreme I guess the includes($file): bool use case could be done without any file traversal).

@sebastianbergmann
Copy link
Owner

Maybe includes($file): bool could be done without filesystem traversal, and of course that would be more efficient. But given the complexity/variety of what can be expressed under <source> with <include> and <exclude> as well as glob-ing, the only reliable way I have found so far is what we have right now. And to be honest, this is good enough for me to not work on this myself. I would, of course, consider a PR that improves this.

For me, this issue was about unnecessarily doing the same work multiple times. This has been fixed, so I will close this issue now.

@dantleech
Copy link
Contributor Author

dantleech commented Jan 29, 2025

I'd certainly consider contributing here! For me, in this project, it compromises the core value of PHPUnit - i.e. the tool is introducing latency due to a deprecation being issued (in our initial case 16 seconds) - this interupts the TDD cycle and is a source of confusion.

While we may be able to optimise our <source/> map the best case is that it will traverse the entire source tree once when a deprecation is encountered.

That said I think there is more than one option:

  1. Do nothing: Accept a 1-N second latency on each test run in our 120k file project or we could remove the source map and reintroduce it somehow for code coverage.
  2. Defer processing: The latency could be made more transparent perhaps by deferring the traversal until after the tests run - in which case perhaps a diagnostic message could be shown Building source map.....
  3. Opt-out: Make** the "first party code" behavior optional - remove the latency completely at the cost of not being able to differentiate between their code and our code.
  4. Do not traverse the entire tree for each <directory/> with the same path: it currently traverses the tree for each suffix on the same path. But the suffixes on the same path could be compiled to a single predicate. In my reporoduction repo this could reduce the latency to about 1 second or so depending on if we traverse the tree once or twice instead of 5 times.
  5. Compile all include/exclude rules to a single regex: This option could potentially be used both for include(): bool use case and to resolve the file paths - significantly improving the performance in perhaps both cases.

Options 4 and 5 would be transparent improvements with option 5 being the most risky and challenging. I actually have poor solution to this exact problem in one of my open source projects so I'd have another reason for approaching that.

I appreciate the time you've spent on this issue, let me know if you'd like me to contribute on any of the options above, and I may well attempt option 5 in my open-source project in anycase!

@sebastianbergmann
Copy link
Owner

"Do nothing" is not ideal, of course.

"Defer processing" is not possible. We need to know the information that is (currently) expensive to determine at the time an E_DEPRECATED is triggered, for example, as the event that is emitted by the test runner (in the error handler) needs to have this information. The responsibility of determining first-party code vs. third-party code etc. must not be pushed to event consumers.

"Opt-out" might be feasible. This would mean, of course, that the events that are emitted by the test runner (in the error handler) will not know whether the issue was triggered in/by first-party or third-party code.

I am not entirely sure I understand "Do not traverse the entire tree for each <directory/> with the same path", but would, of course, consider a pull request you send for this.

Finally, "Compile all include/exclude rules to a single regex" seems like the ideal solution. On the code coverage side, we have had sebastianbergmann/php-code-coverage#386 open for almost ten years. So far, I have not dared working on this. As you correctly state, this is the "the most risky and challenging" option.

TL;DR: Yes, I would appreciate any contribution you would like to make on "Do not traverse the entire tree for each <directory/> with the same path" and/or "Compile all include/exclude rules to a single regex".

And I will try to look into if/how "opt-out" could be tackled.

dantleech added a commit to dantleech/phpunit that referenced this issue Jan 29, 2025
This POC "compresses" the direcrories to a map of paths to
prefixes,suffixes tuples.

This can reduce the overhead of scanning paths when multiple directory
elements are defined with the same path but differnet prefixes or
suffixes.

For example:

```xml
    <exclude>
        <directory suffix="Controller.php">src/</directory>
        <directory suffix="Factory.php">src/</directory>
        <directory suffix="Bus.php">src/</directory>
        <directory suffix="Car.php">src/</directory>
    </exclude>
```

Would be resolved as:

```php
[
   'src/' => [
       [], // no prefixes
       ['Controller.php', 'Factory.php', 'Bus.php', 'Car.php'],
   ],
],
```

The File Iterator already accepts an array of suffixes or prefixes.
dantleech added a commit to dantleech/phpunit that referenced this issue Jan 29, 2025
This PR aggregates `<directory/>` entires to a map of paths to prefixes
and suffixes.

This can reduce the overhead of scanning paths when multiple
`<directory/>` elements are defined with the same path but differnet
prefixes or suffixes.
dantleech added a commit to dantleech/phpunit that referenced this issue Jan 29, 2025
This PR aggregates the directories to a map of paths to tuples
containing prefixes and suffixes.

This will reduce the overhead of scanning paths when multiple directory
elements are defined with the same path but differnet prefixes or
suffixes.

Addresses sebastianbergmann#6111
dantleech added a commit to dantleech/phpunit that referenced this issue Jan 29, 2025
This PR aggregates the directories to a map of paths to tuples
containing prefixes and suffixes.

This will reduce the overhead of scanning paths when multiple directory
elements are defined with the same path but differnet prefixes or
suffixes.

Addresses sebastianbergmann#6111
dantleech added a commit to dantleech/phpunit that referenced this issue Jan 29, 2025
This PR aggregates the directories to a map of paths to tuples
containing prefixes and suffixes.

This will reduce the overhead of scanning paths when multiple directory
elements are defined with the same path but differnet prefixes or
suffixes.

Addresses sebastianbergmann#6111
sebastianbergmann pushed a commit that referenced this issue Jan 30, 2025
This PR aggregates the directories to a map of paths to tuples
containing prefixes and suffixes.

This will reduce the overhead of scanning paths when multiple directory
elements are defined with the same path but differnet prefixes or
suffixes.

Addresses #6111
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
feature/test-runner CLI test runner type/performance Issues related to resource consumption (time and memory) version/11 Something affects PHPUnit 11 version/12 Something affects PHPUnit 12
Projects
None yet
Development

No branches or pull requests

2 participants