
Is everything actually broken? Software will always be flawed, but here’s what you can do about it

Lee Parkes 17 Jun 2014


Sure, software is buggy, sometimes insecure, and flawed like you wouldn’t believe. We’re going to look at why, and then talk about how to mitigate the risk. Let’s start with some truths, and not inconvenient ones either:

  • Humans are imperfect
  • We’re noisy
  • We don’t understand complex things
  • Have I mentioned that we’re imperfect?

Invariably, security problems arise from a lack of understanding. In fact, most failures do. Now, failure is not a bad thing. Watch Adam Savage’s presentation at DefCon 17 (http://youtu.be/1825zkmJVuE). Failure is *always* an option; it’s how we learn. Very few people get it right first time. But the problem with software is that we don’t seem to be learning: similar issues (Heartbleed, for one) keep cropping up on a regular basis.
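Heartbleed, at its heart, was a failure to validate an attacker-supplied length field before using it. Here is a minimal sketch of that bug class in C; the function and its names are hypothetical, not OpenSSL’s actual code:

#include <stdlib.h>
#include <string.h>

/* Hypothetical echo handler: 'payload' is the data the peer actually
 * sent, 'claimed_len' is the length field taken from the request. */
char *echo_reply(const char *payload, size_t actual_len, size_t claimed_len)
{
    char *reply = malloc(claimed_len);
    if (reply == NULL)
        return NULL;
    /* BUG: copies claimed_len bytes even though only actual_len bytes
     * were supplied -- the remainder is read from adjacent heap memory
     * and leaked back to the peer. */
    memcpy(reply, payload, claimed_len);
    /* FIX: if (claimed_len > actual_len) return NULL; */
    return reply;
}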

As the interactions between the components of an operating system and the applications that rely on it grow in features and complexity, the ability of any one person to understand the whole diminishes drastically.

I read an article recently by Quinn Norton, entitled “Everything is Broken”.

I agree with the author. For those of you with short attention spans (I include mysel…. SQUIRREL!), it basically states that no matter what we do, software will always fail at some point. Analysing code to work out what it does is a time-consuming, laborious and, you guessed it, failure-prone enterprise. Anyone who has ever written code that doesn’t work and tried to hunt down the obscure bug knows that debugging is a frustrating exercise. Often, getting another person to look at it will find the bug straightaway. But what if that bug is hidden somewhere in 50 million lines of code? What if the bug only appears when a certain set of conditions is met (planetary alignment syndrome)?
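A classic illustration of planetary alignment syndrome is the midpoint calculation at the heart of binary search, which sat in textbooks and standard libraries for decades before anyone hit it, because it only fails once the planets (or rather, the array sizes) align:

/* Find the midpoint between two indices during a binary search. */
int midpoint(int low, int high)
{
    /* BUG: low + high overflows a signed int once the indices exceed
     * about a billion -- inputs nobody had when the code was written,
     * so the bug lay dormant for decades. */
    return (low + high) / 2;
    /* FIX: return low + (high - low) / 2; */
}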

OK, let’s automate it. Guess what? Yup, you guessed right: a human has to write the software that assesses the software as well as the test cases! You can see where I’m going with this….
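A contrived sketch of the problem (both the function and the test are hypothetical): if the person writing the test shares the developer’s misunderstanding, the automated check passes and the bug ships anyway.

#include <assert.h>

/* Code under test: how many separators are needed to join n items? */
static int separators_needed(int n)
{
    return n;               /* BUG: should be n - 1 (fencepost error) */
}

int main(void)
{
    /* The hand-written test encodes the same fencepost error, so the
     * suite passes: joining "a", "b", "c" needs 2 commas, not 3. */
    assert(separators_needed(3) == 3);
    return 0;
}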

Take the following C code:

#include <stdio.h>

int main(void)
{
    printf("Hello, World!\n");
}

We’ve all seen it before, and it’s pretty obvious what it does. Not a lot, really: once compiled and run, it prints “Hello, World!” (minus the quotes) to the terminal. That’s just five lines of code (excluding whitespace). What that doesn’t take into account is the amount of code that goes into the compiler itself.

Let’s move on to some more complex code. I took the file “test_features2d.cpp” from the OpenCV codebase and ran it through Nick Dunn’s VisualCodeGrepper (VCG), which reported a number of issues, two of which are presented below:

HIGH: Potentially Unsafe Code – Signed/Unsigned Comparison

Line: 144 – C:\Research\BlogPosts\CodeVerification\test_features2d.cpp
The code appears to compare a signed numeric value with an unsigned numeric value. This behaviour can return unexpected results as negative numbers will be forcibly cast to large positive numbers.

for( size_t v = 0; v < validKeypoints.size(); v++ )

HIGH: Potentially Unsafe Code – Signed/Unsigned Comparison

Line: 149 – C:\Research\BlogPosts\CodeVerification\test_features2d.cpp
The code appears to compare a signed numeric value with an unsigned numeric value. This behaviour can return unexpected results as negative numbers will be forcibly cast to large positive numbers.

for( size_t c = 0; c < calcKeypoints.size(); c++ )
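
To see why this class of issue rates HIGH, consider what unsigned arithmetic and mixed comparisons can do to a loop bound. The following is a contrived sketch:

#include <stdio.h>

int main(void)
{
    size_t count = 0;                  /* e.g. an empty list */

    /* BUG: because count is unsigned, count - 1 wraps around to
     * SIZE_MAX rather than -1, and the signed i is converted to
     * unsigned for the comparison -- so a loop that "obviously"
     * never executes instead walks far past the end of the data. */
    for (int i = 0; i < count - 1; i++)
        printf("element %d\n", i);

    return 0;
}

Note, though, that both of the flagged loops already use a size_t counter compared against a size_t .size(), so deciding whether each finding is a genuine bug or a false positive still takes a human.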


VCG is a great tool, and certainly helps to reduce the time it takes to perform a code review while increasing the number of issues found that need to be fixed. However, it can’t supply context, or tell you what it would mean to exploit the issue. It also finds the issue after the code has been written, which could mean the issue is already out in the wild, or that the code can’t be changed for any number of reasons.

The following table gives an estimate of the number of lines of code for major Microsoft Windows operating systems:

Year    Operating System        SLOC (million)
1993    Windows NT 3.1          4–5
1994    Windows NT 3.5          7–8
1996    Windows NT 4.0          11–12
2000    Windows 2000            more than 29
2001    Windows XP              45
2003    Windows Server 2003     50

Source: http://en.wikipedia.org/wiki/Source_lines_of_code (SLOC = Source Lines of Code)

As the number of lines of code increases, so do the complexity and the interactions between different parts of the system. With just 50 components there are already over 1,200 possible pairwise interactions, and the number of combined states grows exponentially. It’s impossible, at least in practical terms, to enumerate every possible interaction once the number of entities grows beyond a relatively small number. There will always be paths that cannot be predicted and that could lead to a compromise of the software.

Unfortunately, failure when it comes to IT tends to have expensive consequences. This is usually in terms of financial and/or reputational loss, but there have been a few cases where lives have been lost as well. Failure isn’t always malicious, of course, but loss of any form is bad no matter how it was caused.

Trusting Other People…

Software is rarely created in total isolation. One of the central paradigms of software development is to not keep re-inventing the wheel. Whilst admirable, the approach does have drawbacks. How do you trust libraries that are created by people who are nothing more than email addresses? What if the library isn’t supported any more? Even Microsoft end-of-life their software as they bring new operating systems and applications out. It is, of course, understandable: what’s the point in supporting a 15 year old OS? The onus should be on the user to upgrade in order to ensure that they are using the latest (and greatest? That’s a matter of opinion) software.

There’s another fly in the ointment: how do you trust the compiler or interpreter to:

  1. Compile or interpret the code correctly and adhere to the accepted standards (for example, ANSI C)?
  2. Not insert any form of backdoor or otherwise malicious code into your compiled software?

Ken Thompson, he of UNIX fame, demonstrated the problem in his Turing Award lecture “Reflections on Trusting Trust”: a compiler can be modified to insert a backdoor into the programs it compiles, and to re-insert that modification into every future version of itself built from clean source, leaving no trace in any code a human would ever review.
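A toy sketch of the idea (nothing like a real compiler, and every name below is made up) shows how little the source being compiled has to do with what comes out:

#include <stdio.h>
#include <string.h>

/* Hypothetical "compiler" pass: it emits each source line faithfully,
 * except that when it spots the login routine it quietly adds a master
 * password that appears in nobody's source code. */
static void compile_line(const char *line)
{
    printf("%s\n", line);
    if (strstr(line, "int check_password(") != NULL)
        puts("    if (strcmp(pw, \"s3cr3t-master\") == 0) return 1;");
}

int main(void)
{
    compile_line("int check_password(const char *pw) {");
    compile_line("    return strcmp(pw, stored_pw) == 0;");
    compile_line("}");
    return 0;
}

Reviewing the login program’s source would show nothing wrong; the backdoor exists only in the compiler’s output.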

So, if you can’t trust your compiler then what can you trust? How do you prove that a compiler or interpreter is doing what you expect? You could try reverse engineering the binary, but you’re at the mercy of the disassembler at that point. Ultimately we end up going round in circles and therefore implicitly trusting software because that is all we have. It doesn’t matter whether the software is free and/or open source or paid for, the same problems apply to both.

What Can We Do?

Perhaps one of the most important things, and this goes for any type of project, is to build security in from step one. Understanding the requirements and then assessing them under the critical eye of security will help to ensure that the development lifecycle not only delivers the functionality required by the scope, but does so in a secure manner. Most of the issues discovered during a penetration test of either a pre-production or live application could have been caught a lot earlier. It’s a lot harder to fix things after the fact (think bolting the gate after the horse has escaped). And here’s the financial kicker: fixing issues sooner rather than later is a lot cheaper.

Secondly, there is an elegance in simplicity and efficiency. It makes things easier to understand, and therefore less likely to contain errors. Adding too much complexity into anything practically ensures that it will fail at some point. Of course, not everything is simple, but ask yourself if a particular feature is *really* required…. And if it is, what does that mean for the rest of the application or environment? Can a feature be kept within a restricted zone? Does input to that feature need to go into the same database as the sensitive information?

One problem is the need to get things to market quickly. Some software has a short shelf life, and the first one out there is usually the one that makes the most money. However, being first might also mean being taken apart piece-by-piece by the bad guys first…. Even for projects that are needed quickly, try to fit security in from the start.

An important aspect here is getting everyone to buy into security. It doesn’t matter whether it’s developers or management; everyone needs to be on the same page. Part of the problem is that security often gets in the way. It adds cost and time and sometimes forces the removal of things that are desired. Ultimately, security is worth the investment. Get it right now and avoid a lot of pain later.

Formal verification can be used to ensure that code does what it is supposed to do. However, this gets more difficult as the complexity and size of the codebase increase. The paper “seL4: Formal Verification of an OS Kernel” (Klein et al., SOSP 2009) details the formal verification of a microkernel.

The kernel itself consists of 8,700 lines of C and 600 lines of assembler. Compare that with Windows Server 2003 at around 50 million lines of code and it’s easy to see how difficult it is to prove that something works as expected.
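To give a flavour of what formal verification looks like at a small scale, here is a sketch using ACSL annotations, the specification language of the Frama-C verifier (the function itself is hypothetical). The contract and loop invariants are what a tool attempts to prove for *all* inputs, not just the handful a test would exercise:

/* The contract: for any n in range, the result must equal n*(n+1)/2.
 * The bound on n keeps the computation within a 32-bit int. */
/*@ requires 0 <= n <= 46340;
  @ ensures \result == n * (n + 1) / 2;
  @*/
int sum_to(int n)
{
    int total = 0;
    /*@ loop invariant 1 <= i <= n + 1;
      @ loop invariant total == (i - 1) * i / 2;
      @ loop variant n - i;
      @*/
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

Even this toy takes real effort to specify, which is why proofs the size of seL4’s are landmark results rather than everyday practice.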

The good news is that security professionals will continue to have work until the end of the Universe, although the complexity will increase dramatically (a subject for another blog post, no doubt).

Recommended reading

A Logic of Authentication, Burrows, Abadi and Needham: http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-39.pdf