Complexity Risk
Complexity Risk is the risk to your project due to its underlying "complexity". Over the next few sections, we'll break down exactly what we mean by complexity, looking at Dependency Risk and Boundary Risk as two particular sub-types of Complexity Risk. However, in this section, we're going to focus specifically on code you write: the size of your code-base, the number of modules, the interconnectedness of the modules and how well-factored the code is.
You could think of this section, then, as Codebase Risk: we'll look at three separate measures of codebase complexity, talk about Technical Debt, and look at the places in which Codebase Risk is at its greatest.
The standard Computer Science definition of complexity is Kolmogorov Complexity, which:
"...is the length of the shortest computer program (in a predetermined programming language) that produces the object as output." - Kolmogorov Complexity, Wikipedia
This is a fairly handy definition for us, as it means that, in writing software to solve a problem, there is a lower bound on the size of the software we write. In practice, this is pretty much impossible to quantify. But that doesn't really matter: the techniques for moving in that direction are all we are interested in, and these basically amount to compression.
Let's say we wanted to write a JavaScript program to output this string:
abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
We might choose this representation:
function out() {                                     (7)
  return "abcdabcdabcdabcdabcdabcdabcdabcdabcdabcd"  (45)
}                                                    (1)
The numbers in brackets indicate how many symbols each line contains, so in total this code block contains 53 symbols, if you count function, out and return as one symbol each.
But, if we write it like this:
const ABCD="ABCD";                           (11)
function out() {                             (7)
  return ABCD+ABCD+ABCD+ABCD+ABCD+ABCD+ABCD+ (16)
         ABCD+ABCD+ABCD;                     (6)
}                                            (1)
With this version, we now have 41 symbols (ABCD is a single symbol, because we could have called it anything). And with this version:
const ABCD="ABCD";        (11)
function out() {          (7)
  return ABCD.repeat(10)  (7)
}                         (1)
... we have 26 symbols.
What's happening here is that we're exploiting a pattern: we noticed that ABCD occurs several times, so we defined it a single time and then used it over and over, like a stamp.
By applying techniques such as Abstraction, we can improve in the direction of the Kolmogorov limit. And, by allowing ourselves to say that symbols (like out and ABCD) are worth one complexity point, we've allowed ourselves to be descriptive in our function and const names. Naming things is an important part of abstraction, because to use something, you have to be able to refer to it.
But we could go further down into Code Golf territory. This JavaScript program plays FizzBuzz up to 100, but is less readable than you might hope:
for(i=0;i<100;)document.write(((++i%3?'':'Fizz')+
(i%5?'':'Buzz')||i)+"<br>") (62)
So at some point there is a trade-off to be made between Complexity Risk and Communication Risk, a topic we'll address more fully in the section on Communication Risk. But for now, it should be said that Communication Risk is about misunderstanding: the more complex a piece of software is, the more difficulty users will have understanding it, and the more difficulty developers will have changing it.
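For contrast, here's a minimal sketch of a longer, more readable version of the same program, trading extra symbols for understandability:

// A longer, clearer FizzBuzz: more symbols, less Communication Risk.
for (let i = 1; i <= 100; i++) {
  let line = "";
  if (i % 3 === 0) line += "Fizz";
  if (i % 5 === 0) line += "Buzz";
  document.write((line === "" ? String(i) : line) + "<br>");
}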
A second, useful measure of complexity comes from graph theory, and that is the connectivity of a graph:
"...the minimum number of elements (nodes or edges) that need to be removed to disconnect the remaining nodes from each other" - Connectivity, Wikipedia
To see this in action, have a look at the graph below:
It has 10 vertices, labelled a to j, and it has 15 edges (or links) connecting the vertices together. If any single edge were removed from this diagram, the 10 vertices would still be linked together. Because of this, we can say that the graph is 2-connected. That is, to disconnect any single vertex, you'd have to remove at least two edges.
As a slight aside, let's consider the Kolmogorov Complexity of this graph, by inventing a mini-language to describe graphs. It could look something like this:
<item> : [<item>,]* <item> # Indicates that the item
# before the colon
# has a connection to all
# the items after the colon
So our graph could be defined like this:
a: b,c,d
b: c,f,e
c: f,d
d: j
e: h,j
f: h
g: j
h: i
i: j
(39)
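As an aside, the same structure could be captured directly in TypeScript as an adjacency list. A minimal sketch:

// The first graph, encoded as an adjacency list.
const graph: Record<string, string[]> = {
  a: ["b", "c", "d"],
  b: ["c", "f", "e"],
  c: ["f", "d"],
  d: ["j"],
  e: ["h", "j"],
  f: ["h"],
  g: ["j"],
  h: ["i"],
  i: ["j"],
};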
Let's remove some of those extra links:
In this graph, I've removed 6 of the edges. Now, we're in a situation where if any single edge is removed, the graph becomes disconnected: it's broken into distinct chunks. So, it's 1-connected.
The second graph is clearly simpler than the first. And we can show this by looking at the Kolmogorov Complexity in our little language:
a: d,g
b: f
c: d,f
d: j
f: h
e: h
h: i
(25)
Connectivity is also Complexity. Heavily connected programs/graphs are much harder to work with than less-connected ones. Even laying out the first graph sensibly is a harder task than the second (the second is a doddle). But the reason programs with greater connectivity are harder to work with is that changing one module potentially impacts many others.
In the second, simplified graph, I've arranged it as a hierarchy, which I can do now that it's only 1-connected. For 10 vertices, we need 9 edges to connect everything up. For a hierarchy, it's always:
edges = vertices - 1
(every vertex except the root has exactly one edge up to its parent).
Note that I could pick any hierarchy here: I don't have to start at c (although it has the nice property that it has two roughly even sub-trees attached to it).
How does this help us? Imagine if a - j were modules of a software system, and the edges of the graph showed communications between the different sub-systems. In the first graph, we're in a worse position: who's in charge? What deals with what? Can I isolate a component and change it safely? What happens if one component disappears? But the second graph is easier to reason about, because of the reduced number of connections and the new hierarchy of organisation.
On the down-side, perhaps our messages have farther to go now: in the original graph, i could send a message straight to j, but now it has to go all the way via c. But this is the basis of Modularisation and Hierarchy.
As a tool to battle complexity, we don't just see this in software, but everywhere in our lives. Society, business, nature and even our bodies:
- Organelles - such as Mitochondria.
- Cells - such as blood cells, nerve cells, skin cells in the Human Body.
- Organs - like hearts, livers, brains etc.
- Organisms - like you and me.
The great complexity-reducing mechanism of modularisation is that you only have to consider your local environment. Elements of the program that are "far away" in the hierarchy can be relied on not to affect you. This is somewhat akin to the Principle Of Locality:
"Spatial locality refers to the use of data elements within relatively close storage locations." - Locality Of Reference, Wikipedia
A variation on this graph connectivity metric is our third measure of complexity, Cyclomatic Complexity. This is:
Cyclomatic Complexity = edges − vertices + 2P,
where P is the number of Connected Components (i.e. distinct parts of the graph that aren't connected to one another by any edges).
So, our first graph had a Cyclomatic Complexity of 7 (15 - 10 + 2), while our second was 1 (9 - 10 + 2).
Cyclomatic complexity is all about the number of different routes through the program. The more branches a program has, the greater its cyclomatic complexity. Hence, this is a useful metric in Testing and Code Coverage: the more branches you have, the more tests you'll need to exercise them all.
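To make this concrete, here's a small, hypothetical TypeScript function with three decision points, giving it a Cyclomatic Complexity of 4: a thorough test suite needs at least four cases to exercise every path.

// Hypothetical example: work out a shipping cost.
// Decision points: 3 (the two ifs and the ternary), so
// Cyclomatic Complexity = 3 + 1 = 4.
function shippingCost(weightKg: number, express: boolean): number {
  if (weightKg <= 0) {               // branch 1: invalid input
    throw new Error("weight must be positive");
  }
  let cost = weightKg < 2 ? 5 : 10;  // branch 2: light vs heavy
  if (express) {                     // branch 3: express delivery
    cost *= 2;
  }
  return cost;
}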
Although we ended up with our second graph having a Cyclomatic Complexity of 1 (the minimum), we can go further through abstraction, because this representation isn't minimal from a Kolmogorov Complexity point-of-view. For example, we might observe that there are further similarities in the graph that we can "draw out":
Here, we've spotted that the structure of subgraphs P1 and P2 is the same: we can use the same functions to assemble both. Noticing and exploiting patterns of repetition is one of the fundamental tools we have in the fight against Complexity Risk.
So, we've looked at some measures of software structure complexity, so that we can say "this is more complex than that". However, we've not really said why complexity entails Risk. So let's address that now by looking at two analogies: Mass and Technical Debt.
The first way to look at complexity is as Mass or Inertia: a software project with more complexity has greater Inertia or Mass than one with less complexity.
Newton's Second Law states:
"F = ma ( Force = Mass x Acceleration )" - Newton's Laws Of Motion, Wikipedia
That is, in order to move your project somewhere new, and make it do new things, you need to give it a push, and the more Mass it has, the more Force you'll need to move (accelerate) it.
Inertia and Mass are equivalent concepts in physics:
"mass is the quantitative or numerical measure of a body’s inertia, that is of its resistance to being accelerated". - Inertia, Wikipedia
You could stop here and say that the more lines of code a project contains, the higher its mass. And that makes sense, because in order to get it to do something new, you're likely to need to change more lines.
But there is actually some underlying sense in which this is real, as discussed in this Veritasium video. To paraphrase:
"Most of your mass you owe to E=mc², you owe to the fact that your mass is packed with energy, because of the interactions between these quarks and gluon fluctuations in the gluon field... what we think of as ordinarily empty space... that turns out to be the thing that gives us most of our mass." - Your Mass is NOT From the Higgs Boson, Veritasium
I'm not an expert in physics, at all, and so there is every chance that I am pushing this analogy too hard. But, substituting quarks and gluons for pieces of software we can (in a very handwaving-y way) say that more complex software has more interactions going on, and therefore has more mass than simple software.
The reason I am labouring this analogy is to try and make the point that Complexity Risk is really fundamental:
- Feature Risk: like money.
- Schedule Risk: like time.
- Complexity Risk: like mass.
At a basic level, Complexity Risk heavily impacts on Schedule Risk: more complexity means you need more force to get things done, which takes longer.
The most common way we talk about unnecessary complexity in software is as Technical Debt:
"Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite... The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organisations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise." - Ward Cunningham, 1992
Building a perfect first-time solution is a waste, because perfection takes a long time. You're taking on more attendant Schedule Risk than necessary and Meeting Reality more slowly than you could.
A quick-and-dirty, over-complex implementation mitigates the same Feature Risk and allows you to Meet Reality faster.
But, having mitigated the Feature Risk, you are now carrying more Complexity Risk than you necessarily need, and it's time to think about how to Refactor the software to reduce this risk again.
It’s often hard to make the case for minimising Technical Debt: it often feels that there are more important priorities, especially when technical debt can be “swept under the carpet” and forgotten about until later. (See Discounting The Future.)
One helpful analogy I have found is to imagine your code-base is a kitchen. After preparing a meal (i.e. delivering the first implementation), you need to tidy up the kitchen. This is just something everyone does as a matter of basic sanitation.
Now of course, you could carry on with the messy kitchen. When tomorrow comes and you need to make another meal, you find yourself needing to wash up saucepans as you go, or working around the mess by using different surfaces to chop on.
It's not long before someone comes down with food poisoning.
We wouldn't tolerate this behaviour in a restaurant kitchen, so why put up with it in a software project?
In Brooks' essay "No Silver Bullet - Essence and Accident in Software Engineering", a distinction is made between:
- Essence: the difficulties inherent in the nature of the software.
- Accident: those difficulties that attend its production but are not inherent. - Fred Brooks, No Silver Bullet
The problem with this definition is that we are accepting features of our software as essential.
The Risk-First approach is that if you want to mitigate some Feature Risk then you have to pick up Complexity Risk as a result. But, that's a choice you get to make.
Therefore, Feature Creep (or Gold Plating) is a failure to observe this basic equation: instead of considering this trade-off, you're building every feature possible. This has an impact on Complexity Risk, which in turn impacts Communication Risk and also Schedule Risk.
Sometimes, feature creep happens because managers feel they need to keep their staff busy, or because the staff decide on their own that they need to keep themselves busy. But now we can see that this basically boils down to bad risk management.
"Perfection is Achieved Not When There Is Nothing More to Add, But When There Is Nothing Left to Take Away" - Antoine de Saint-Exupery
Dead-End Risk is where you build functionality that you think is useful, only to find out later that actually, it was a dead-end, and is superseded by something else.
For example, let's say that the Accounting sub-system needed password protection (so you built this). Then the team realised that you needed a way to change the password (so you built that). Then, that you needed to have more than one user of the Accounting system so they would all need passwords (OK, fine).
Finally, the team realises that actually logging-in would be something that all the sub-systems would need, and that it had already been implemented more thoroughly by the Approvals sub-system.
At this point, you realise you're in a Dead End:
- Option 1: You carry on making minor incremental improvements to the accounting password system (carrying the extra Complexity Risk of the duplicated functionality).
- Option 2: You rip out the accounting password system, and merge in the Approvals system, surfacing new, hidden Complexity Risk in the process, due to the difficulty in migrating users from the old to new way of working.
- Option 3: You start again, trying to take into account both sets of requirements at the same time, again, possibly surfacing new hidden Complexity Risk due to the combined approach.
Sometimes, the path from your starting point to your goal on the Risk Landscape will take you to dead ends: places where the only way towards your destination is to lose something, and do it again another way.
This is because you surface new Hidden Risk along the way. And the source of a lot of this hidden risk will be unexpected Complexity Risk in the solutions you choose. This happens a lot.
Version Control Systems like Git are a useful mitigation of Dead-End Risk, because it means you can go back to the point where you made the bad decision and go a different way. Additionally, they provide you with backups against the often inadvertent Dead-End Risk of someone wiping the hard-disk.
Option 3 (rewriting the code, or even the whole project) can seem like a way to mitigate Complexity Risk, but it usually doesn't work out too well. As Joel Spolsky says:
"There's a subtle reason that programmers always want to throw away the code and start over. The reason is that they think the old code is a mess. And here is the interesting observation: they are probably wrong. The reason that they think the old code is a mess is because of a cardinal, fundamental law of programming: It's harder to read code than to write it." - Things You Should Never Do, Part 1, Joel Spolsky
The problem that Joel is outlining here is that the developer mistakes hard-to-understand code for unnecessary Complexity Risk. Also, perhaps there is Agency Risk, because the developer is doing something that is more useful to them than to the project. We're going to return to this problem again in Communication Risk.
Complexity isn't spread evenly within a software project. Some problems, some areas, have more than their fair share of issues. We're going to cover a few of these now, but be warned, this is not a complete list by any means:
- Memory Management
- Protocols / Types
- Algorithmic (Space and Time) Complexity
- Concurrency / Mutability
- Networks / Security
Memory Management is another place where Complexity Risk hides:
"Memory leaks are a common error in programming, especially when using languages that have no built in automatic garbage collection, such as C and C++." - Memory Leak, Wikipedia
Garbage Collectors (as found in JavaScript or Java) offer you the deal that they will mitigate the Complexity Risk of you having to manage your own memory, but in return perhaps give you fewer guarantees about the performance of your software. Again, there are times when you can't accommodate this Operational Risk, but these are rare and usually only affect a small portion of an entire software system.
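Even with a Garbage Collector, memory can still leak: anything that stays reachable can't be collected. A minimal sketch (the cache here is a hypothetical example):

// A module-level cache that is never evicted: because every entry
// stays reachable, the garbage collector can never reclaim it, and
// the map grows for the lifetime of the process.
const resultCache = new Map<string, string>();

function expensiveLookup(key: string): string {
  if (!resultCache.has(key)) {
    resultCache.set(key, key.toUpperCase()); // stand-in for real work
  }
  return resultCache.get(key)!;
}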
Whenever two components of a software system need to interact, they have to establish a protocol for doing so. There are lots of different ways this can work, but the simplest example I can think of is where some component a calls some function b, e.g.:
function b(a, b, c) {
  return "whatever"  // do something here.
}

function a() {
  var bOut = b("one", "two", "three");
  return "something " + bOut;
}
If component b then changes in some backwards-incompatible way, say:
function b(a, b, c, d /* new parameter */) {
  return "whatever"  // do something here.
}
Then, we can say that the protocol has changed. This problem is so common, so endemic to computing, that we've had compilers that check function arguments since the 1960s. The point is that it's totally possible for the compiler to warn you when a protocol within the program has changed.
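To illustrate (a sketch, using the TypeScript compiler as the checker): once b gains a fourth required parameter, the stale call in a no longer compiles.

function b(a: string, b2: string, c: string, d: string): string {
  return "whatever"  // do something here.
}

function a(): string {
  // Compile error: "Expected 4 arguments, but got 3."
  const bOut = b("one", "two", "three");
  return "something " + bOut;
}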
The same is basically true of Data Types: whenever we change the Data Type, we need to correct the usages of that type. Note that above I've given the JavaScript example, but I'm going to switch to TypeScript now:
interface BInput {
  a: string,
  b: string,
  c: string,
  d: string
}

function b(input: BInput): string {
  return "whatever"  // do something here.
}
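A brief usage sketch: because the protocol is now defined in one place, the compiler checks every call site against BInput.

// Every call site must satisfy the BInput interface:
const ok = b({ a: "one", b: "two", c: "three", d: "four" });

// Leaving out d would fail to compile:
// "Property 'd' is missing in type '{ a: string; b: string; c: string; }'"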
Now, of course, there is a tradeoff: we mitigate Complexity Risk, because we define the protocols / types once only in the program, and ensure that usages all match the specification. But the tradeoff is (as we can see in the TypeScript code) more finger-typing, which some people argue counts as Schedule Risk.
Nevertheless, compilers and type-checking are so prevalent in software that clearly, you have to accept that in most cases the trade-off has been worth it: even languages like Clojure have been retro-fitted with type checkers.
We're going to head into much more detail on this in the section on Protocol Risk.
So far, we've looked at a couple of definitions of complexity in terms of the codebase itself. However, in Computer Science there is a whole branch of complexity theory devoted to how the software runs, namely Big O Complexity.
Once running, an algorithm or data structure will consume space or runtime dependent on its characteristics. As with Garbage Collectors, these characteristics can introduce Performance Risk, which can easily catch out the unwary. By and large, using off-the-shelf data structures and algorithms helps, but you still need to know their performance characteristics.
The Big O Cheat Sheet is a wonderful resource to investigate this further.
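As a minimal sketch of why this matters: the same membership test can be O(n) or O(1), depending on the data structure you choose.

const items: string[] = ["a", "b", "c" /* ... thousands more */];
const itemSet: Set<string> = new Set(items);

// O(n): scans the whole array on every call.
function hasItemSlow(x: string): boolean {
  return items.includes(x);
}

// O(1) on average: a single hash lookup.
function hasItemFast(x: string): boolean {
  return itemSet.has(x);
}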
Although modern languages include plenty of concurrency primitives (such as the java.util.concurrent libraries), concurrency is still hard to get right.
Race conditions and Deadlocks thrive in over-complicated concurrency designs: complexity issues are magnified by concurrency concerns, and are also hard to test and debug.
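Even in single-threaded JavaScript, interleaved async code can race. Here's a minimal sketch of a lost update (the names are invented for illustration):

// Two tasks read a shared balance, yield, then write back;
// one update is lost in the interleaving.
let balance = 100;

async function withdraw(amount: number): Promise<void> {
  const current = balance;    // read
  await Promise.resolve();    // yield: the other task runs here
  balance = current - amount; // write back a stale value
}

async function main() {
  await Promise.all([withdraw(10), withdraw(20)]);
  console.log(balance); // 80, not 70: the first update was lost
}

main();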
Recently, languages such as Clojure have introduced persistent collections to alleviate concurrency issues. The basic premise is that any time you want to change the contents of a collection, you get given back a new collection. So, any collection instance is immutable once created. The tradeoff is again attendant Performance Risk to mitigate Complexity Risk.
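You can approximate the same idea in TypeScript by treating collections as immutable and copying on write. A minimal sketch:

// Instead of mutating the original array, produce a new one.
// Callers holding the old reference are unaffected.
function addItem(list: readonly string[], item: string): string[] {
  return [...list, item]; // copy-on-write, like a persistent collection
}

const v1 = ["a", "b"];
const v2 = addItem(v1, "c");
console.log(v1); // ["a", "b"] - unchanged
console.log(v2); // ["a", "b", "c"]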
An important lesson here is that choice of language can reduce complexity: and we'll come back to this in Software Dependency Risk.
There are plenty of Complexity Risk perils in anything to do with networked code, chief amongst them being error handling and (again) protocol evolution.
In the case of security, exploits thrive on the complexity of your code and the weaknesses that occur because of it. In particular, Schneier's Law says: never implement your own cryptographic scheme:
"Anyone, from the most clueless amateur to the best cryptographer, can create an algorithm that he himself can't break. It's not even hard. What is hard is creating an algorithm that no one else can break, even after years of analysis." - Bruce Schneier, 1998
Luckily, most good languages include cryptographic libraries that you can include to mitigate these Complexity Risks from your own code-base.
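For example, in Node.js you might lean on the built-in crypto module rather than rolling your own primitives. A minimal sketch:

import { createHash, randomBytes } from "crypto";

// Use vetted library primitives instead of inventing your own scheme.
const salt = randomBytes(16).toString("hex");
const digest = createHash("sha256")
  .update(salt + "my secret value")
  .digest("hex");

console.log(digest);
// For real password storage, prefer a dedicated KDF (e.g. crypto.scrypt).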
This is a strong argument for the use of libraries. But, when should you use a library and when should you implement yourself? This is again covered in the section on Software Dependency Risk.
Although Type-Systems help to avoid Communication Risk in the small, when software systems grow large it becomes hard to communicate their intent and keep their degree of connectivity low.
"A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy." - Big Ball Of Mud, Wikipedia