Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

cyclomatic complexity of ternary/null coalescing in control structures #3501

Open
mmarton opened this issue Dec 13, 2021 · 12 comments
Open

cyclomatic complexity of ternary/null coalescing in control structures #3501

mmarton opened this issue Dec 13, 2021 · 12 comments

Comments

@mmarton
Copy link

mmarton commented Dec 13, 2021

Describe the bug

This might not be a bug, I'm just curious of your opinion

After #3469 multiple part of my code is reporting too much cyclomatic complexity.
i have something similar:

if (($variable1ThatCanBeNullOrFloat ?? 0.0) !== 0.0) {
   $context->addViolation('description');
}

if (($variable2ThatCanBeNullOrFloat ?? 0.0) !== 0.0) {
   $context->addViolation('description2');
}
// 6 more time

Until now one of this statements was +1 to complexity, not it's +2 because the null coalescing. But it's still 2 options here. stepping into the if or stepping it over.

Yes I can rewrite it to
if ($variable1ThatCanBeNullOrFloat !== null && $variable1ThatCanBeNullOrFloat !== 0.0) but it's a lot more character and I like to spare my keyboard.

what do you think?

@gsherwood
Copy link
Member

I think the new change is correct as it's more accurately counting the conditions in the IF that would evaluate it to TRUE or FALSE. Keen to hear what @MarkBaker thinks.

@MarkBaker
Copy link
Contributor

There are a number of ways of writing this code:

$variable1ToTest = $variable1ThatCanBeNullOrFloat ?? 0.0;
if (($variable1ToTest !== 0.0) {
   $context->addViolation('description');
}

would be another way of expressing the same code logic using an intermediate assignment; and which clearly should increase the complexity by two - once for the null coalescence expression, and once for the if statement. Internally, PHP is doing just this, though the intermediate variable is kept hidden. Combining the two into a single line like if (($variable1ThatCanBeNullOrFloat ?? 0.0) !== 0.0) {...} doesn't reduce that complexity, it just "hides" it: the null coalescence is still evaluated with two possible paths, and the if condition is still tested with its two paths.

I believe that the complexity is still reflected accurately here - as least as accurately as Cyclomatic Complexity can be measured - in reflecting both the null coalescence expression and the if condition.

The CYC count isn't 100% accurate: coding structures have evolved since Henry McCabe's original 1976 paper on the subject.
There is some argument on whether "Decisions involving compound predicates like those found in high-level languages like IF cond1 AND cond2 THEN ... should be counted in terms of predicate variables involved, i.e. in this example one should count two decision points, because at machine level it is equivalent to IF cond1 THEN IF cond2 THEN ...." (quoting from Wikipedia). This would be the case for your if ($variable1ThatCanBeNullOrFloat !== null && $variable1ThatCanBeNullOrFloat !== 0.0) alternative... however, this takes no account of lazy evaluation for || within an if evaluation... it's probably more pertinent to measuring NPath Complexity than the more simplistic Cyclomatic Complexity.
Nor does the CYC calculation make any allowance for early returns (McCabe's definition of complexity is based one one entry point, one exit point within the code), or jumps (goto). And if I had my way nested ternaries should increment the complexity logarithmically, rather than simply by one for each level of nesting.

At the end of the day, Cyclomatic Complexity is a guideline that we can use to assess the maintenance risks of our code, not a hard and fast rule. Even the warning and error levels that we apply are just guidelines based on general principles of what is considered complex.

If it's any consolation; the most complex method in my most popular OSS library has a CYC of over 21 million. I've tried refactoring it, but any refactoring has a significant adverse affect on performance. That CYC makes it a pain to maintain; but a 3x performance overhead would make it impossible to run. I choose to live with that maintenance risk for the benefit of performance.

@mmarton
Copy link
Author

mmarton commented Dec 15, 2021

Okay, i can accept that argument, but still weird that some operators count and some doesn't.
Anyway, since you changed the calculation making it harder to stay inside the recommendations, wouldn't it make sense to also increase the warning/error limit?

@mmarton
Copy link
Author

mmarton commented Dec 15, 2021

And another theoretical question:

Shouldn't optional chaining counted too?
return $object->getX()?->getY() is technically return $object->getX() !== null ? $object->getX()->getY() : null

@MarkBaker
Copy link
Contributor

MarkBaker commented Dec 15, 2021

Anyway, since you changed the calculation making it harder to stay inside the recommendations, wouldn't it make sense to also increase the warning/error limit?

This is probably a call for Greg to make.

In McCabe's original paper, he refers to as an upper threshold of 10 as a “reasonable, but not magical, upper limit”. But a subsequent presentation that he gave on the topic ('Software Quality Metrics to Identify Risk') and published as a NIST advisory used the following table to assess risk based on complexity:

CYC Risk
1-10 A simple program without much risk
11-20 More complex, moderate risk
21-50 Complex, high risk program
51+ Untestable program, very high risk

Steve McConnell in his book "Code Complete" was stricter:

CYC Risk
1-5 The routine is probably fine
6-10 Start to think about was to simplify the routine
11+ Break part of the routine into a second routine, and call it from the first routine

More recently, Phil Koopman seems to suggest thresholds of 10-15, and 30-50 (so potentially 15 and 30) although he's thinking in terms of embedded systems

Personally, I'd have no problem with setting the thresholds at 15 and 30 rather than 10 and 20; to make allowance for more modern coding structures and expressions.

@MarkBaker
Copy link
Contributor

MarkBaker commented Dec 15, 2021

Shouldn't optional chaining counted too? return $object->getX()?->getY() is technically return $object->getX() !== null ? $object->getX()->getY() : null

It probably should, because it's an early break from the full expression, one that I'd forgotten while I was looking at what was missing from the sniff.

There's other omissions too:

An expression like;

$file = fopen($filename, 'r')
    or exit("unable to open file ($filename)");

or the use of break or continue while in a loop.

The CYC check is fairly simplistic in that it simply looks for certain tokens inside a function/method, and increments the CYC value by 1 each time it finds one of those tokens.
The two cases that I mentioned above couldn't be counted in the same way because they're dependent on the context, not simply the presence of the tokens in the code. break can also occur in the cases of a switch statement, and shouldn't be counted when it's in that context, but only when the context is exiting from a loop. The or token can also occur when combining conditions in an if statement, so it shouldn't increment the CYC value in that context.
Rewriting the sniff itself to take that context check into account would make it a lot more complicated.

Other possible omissions are subject to debate. In a switch or match statement, some people would argue that each case increments CYC by one, regardless of the number of conditions being tested in that case; others argue that a case with multiple conditions should increment for each condition. e.g.

    return match(strtolower(substr($monthName, 0, 3))){
        'apr', 'jun', 'sep',  'nov'  => 30,
        'jan', 'mar', 'may', 'jul', 'aug',  'oct', 'dec'  => 31,
        'feb' => is_leap_year($year) ? 29 : 28,
        default => throw new InvalidArgumentException("Invalid month"),
    }

should increment CYC by 13 rather than by 4 (+1 for the leap year ternary)

A throw is another case that sees some argument about whether it should increment CYC as well.


Getting back to your actual question though, I'd say that yes, the nullsafe operator (?->) should increase CYC by 1. I'll look to submit a PR for that.

@mmarton
Copy link
Author

mmarton commented Dec 15, 2021

brake and continue in a loop are (almost) always inside of an if statement, so that would be double counting imo

@MarkBaker
Copy link
Contributor

Normally; but you could get very dirty code like

$data = [
    'Hello',
    'World',
    'Terminate',
    'Goodbye',
];

foreach ($data as $entry) {
    switch ($entry) {
        case 'Terminate': break 2;
        default: echo $entry, PHP_EOL;
    }
    
}

@mmarton
Copy link
Author

mmarton commented Dec 15, 2021

Yes, and I can use goto and eval() too :) But I don't think there is an common part of the ones who are doing that and who are caring about code quality :D

@MarkBaker
Copy link
Contributor

@gsherwood Any opinion on changing the Warning and Error thresholds for CYC?

@jrfnl
Copy link
Contributor

jrfnl commented Dec 20, 2021

@MarkBaker The thresholds can be customized per project via the project custom ruleset:
https://github.com/squizlabs/PHP_CodeSniffer/wiki/Customisable-Sniff-Properties#genericmetricscyclomaticcomplexity

@mmarton
Copy link
Author

mmarton commented Dec 23, 2021

@MarkBaker The thresholds can be customized per project via the project custom ruleset: https://github.com/squizlabs/PHP_CodeSniffer/wiki/Customisable-Sniff-Properties#genericmetricscyclomaticcomplexity

yes, I can, and I'm doing it right now in my projects. But the question here is should the recommended default values go up with the recent changes in the calculation. And I think they should.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

4 participants