
Security testing advice for automotive OEMs and Tier 1s

Andrew Tierney 29 Jan 2017

Security Testing – Aims and Limitations

Security testing aims to find vulnerabilities.

A vulnerability is a weakness that could be exploited to perform unauthorised actions on a computer system. Vulnerabilities may require that certain conditions are met, such as:

  • Physical access – access to one or more devices is required to exploit the vulnerability.
  • RF range – must be within radio frequency range.
  • Authentication – a set of valid credentials is required, e.g. a Wi-Fi password for a hotspot.
  • User interaction – a user of the system must carry out an action e.g. click on a link.

Once found, the business can then decide how to handle these vulnerabilities. They may:

  • Fix the vulnerability – preventing it from being exploited.
  • Mitigate the risk – put barriers in place to make the vulnerability harder to exploit, or reduce the impact if it is exploited.
  • Accept the risk – decide to leave the vulnerability in place.

Accepting risk is often seen as an undesirable option. As a result of this, testing is sometimes artificially constrained to prevent issues from being discovered.

“Out of sight, out of mind” would describe this attitude. It is harmful and high-risk. Examples of this would include:

  • Withholding a hardcoded password. Testing may discover a password common to all devices, stored only as a hash, without the plaintext being provided. Given time and resources, the plaintext could be obtained anyway, either by brute force or by another method (such as a breach of development systems). Providing the password would give better depth of testing and avoid wasting time.
  • Relying on code readout protection to keep testers away from firmware. Given time and resources, firmware can be recovered from most devices and then reverse engineered, revealing issues such as backdoor access, hardcoded keys, or debug functionality. Giving direct access to firmware allows testing to progress faster.
  • Withholding source code. Whilst compiled firmware can be disassembled to determine its function and find vulnerabilities, this can take a long time, especially on large systems (e.g. IVIs) or ones with little information available (e.g. some CAN gateways).
  • Limiting scope. For example, taking backend services out of scope so that they cannot be attacked. A real attacker is not limited in which actions they can perform.
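The first example is easy to demonstrate. This is a minimal sketch, assuming a hypothetical device that stores an unsalted SHA-256 hex digest of its shared password and an attacker who has a wordlist; real firmware may use a different scheme. Anyone who recovers the hash can run something like this offline, so withholding the plaintext mainly costs test time.

```python
import hashlib
from typing import Iterable, Optional


def crack_unsalted_sha256(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the candidate whose SHA-256 digest matches target_hex, or None.

    Assumes a hypothetical scheme where the device stores sha256(password)
    as hex, with no salt and no key stretching.
    """
    target = target_hex.lower()
    for word in candidates:
        # Hash each guess exactly as the (assumed) device would.
        if hashlib.sha256(word.encode("utf-8")).hexdigest() == target:
            return word
    return None


if __name__ == "__main__":
    # Hypothetical digest, as it might be recovered from a firmware image.
    stored = hashlib.sha256(b"service1234").hexdigest()
    wordlist = ["admin", "password", "service", "service1234"]
    print(crack_unsalted_sha256(stored, wordlist))  # service1234
```

Each guess costs a single fast hash, which is why unsalted, unstretched hashes offer little protection against a determined attacker.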

There are several realities about security testing that should be accepted:

  • Testing is always limited in time.
  • Some deeply hidden vulnerabilities will not be uncovered. There are hundreds of examples of issues in widely deployed software and hardware that have existed for many years without being found.

Because security testing cannot guarantee that every vulnerability will be found, an approach called defence-in-depth should be used. This means that multiple security mechanisms are put in place to protect assets, so that the failure of one does not expose them. There are many clear examples of this in real systems:

  • Password hashing – passwords are stored hashed, so that even if the data is leaked, the passwords cannot easily be recovered.
  • Firewalling – a firewall can stop machines on an internal network from being reached from the Internet, so that a vulnerable service on one of them cannot be exploited directly.
  • Key diversification – each device uses a different key, so that even if one key is compromised, it cannot be used to compromise other devices.
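Two of these mechanisms can be sketched with the Python standard library. This is a minimal illustration rather than a production design: deriving keys with HMAC-SHA256 over a device identifier, the example identifiers, and the PBKDF2 iteration count are all assumptions. Because each derived key depends on the device identifier, a key extracted from one vehicle does not unlock another.

```python
import hashlib
import hmac
import os
from typing import Tuple


def diversify_key(master_key: bytes, device_id: str) -> bytes:
    """Derive a 32-byte per-device key from a master key and a device identifier.

    HMAC-SHA256 acts as a simple key derivation step here. The master key is
    assumed to stay in a protected backend and never ship on a device.
    """
    return hmac.new(master_key, device_id.encode("utf-8"), hashlib.sha256).digest()


def hash_password(password: str, iterations: int = 600_000) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # a fresh random salt per password defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes, iterations: int = 600_000) -> bool:
    """Recompute the digest for a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    master = os.urandom(32)
    key_a = diversify_key(master, "DEVICE-0001")  # hypothetical identifiers
    key_b = diversify_key(master, "DEVICE-0002")
    print(key_a != key_b)  # True: one extracted key does not unlock other devices

    salt, digest = hash_password("correct horse")
    print(verify_password("correct horse", salt, digest))  # True
    print(verify_password("wrong guess", salt, digest))  # False
```

The high iteration count makes each guess deliberately slow, and the per-password salt stops one precomputed table from covering every device. Neither layer helps if the master key itself is shipped on a device, which is why it is assumed to stay in the backend.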

Just because a total system compromise is not found during a security test does not mean that one will not be found in the future. 10 vulnerabilities could be identified during a test, but it may take the 11th to permit total compromise.

In terms of risk to automotive systems, there are several classes of attack that stand out as high impact:

  • Break-once Run Everywhere (BORE) attacks – for example, obtaining a hard-coded password common across all IVIs. The password could be spread and used by many to perform other attacks.
  • Centralised attacks – telematics platforms, firmware updates etc. could allow multiple (or even all) connected entities to be compromised.

It is vital that testing does not exclude these classes of attack by artificially removing backend systems from scope.

The eventual risk that a vulnerability produces depends on several aspects:

  • How hard is the vulnerability to discover?
  • How hard is the vulnerability to exploit?
  • What is the impact of it being exploited?

As a result, it is rarely possible to say that a system or device is secure or not; all that can be said is that the risk is acceptable. Security cannot be seen as “on or off”.
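One way to turn the three questions above into a decision is a simple ordinal rating. The 1 to 5 scales, the multiplication and the threshold below are hypothetical choices for illustration, not a standard method (schemes such as CVSS are more rigorous). The point is that the output is a judgement against the business's risk appetite, not a secure/insecure verdict.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    name: str
    discoverability: int  # 1 = very hard to discover .. 5 = trivial to discover
    exploitability: int   # 1 = very hard to exploit .. 5 = trivial to exploit
    impact: int           # 1 = negligible .. 5 = fleet-wide or safety impact

    def score(self) -> int:
        """Combine the three ratings into a single 1-125 score (hypothetical model)."""
        for value in (self.discoverability, self.exploitability, self.impact):
            if not 1 <= value <= 5:
                raise ValueError("ratings must be between 1 and 5")
        return self.discoverability * self.exploitability * self.impact


def decide(finding: Finding, acceptable_max: int) -> str:
    """Return 'accept' if the score is within the business's risk appetite, else 'treat'.

    'treat' means fix or mitigate; the threshold is a business decision.
    """
    return "accept" if finding.score() <= acceptable_max else "treat"


if __name__ == "__main__":
    findings = [
        Finding("Debug header needing physical access", 2, 4, 2),
        Finding("Hash of password shared by all IVIs", 3, 5, 5),
    ]
    for f in findings:
        print(f"{f.name}: {f.score()} -> {decide(f, acceptable_max=20)}")
    # Debug header needing physical access: 16 -> accept
    # Hash of password shared by all IVIs: 75 -> treat
```

Changing the threshold changes the decision without changing the findings, which mirrors the point that security is a sliding scale.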

Security Testing – Considerations

Coverage vs Depth

Coverage relates to the aspects of a system that are examined. For example, an IVI comprises physical interfaces, WiFi, Bluetooth, CAN, USB, firmware update, the operating system, etc.

Depth relates to the amount of time/effort spent looking at each area.

A test should aim to get both good coverage and depth, with a balance struck between the two.

Black/grey/white box

This describes the level of access a tester is given to the tested system.

  • Black box – the tester is given the same level of access a member of the public would have.
  • White box – the tester is given full access, potentially including source code, documentation, schematics, keys, passwords, debug access etc.
  • Grey box – somewhere between black and white box.

Black box testing provides poor value. A large portion of the test may be consumed gaining access to the system being tested and performing reverse engineering, leaving little time to find other security issues. A genuine adversary would not stop at this stage, as they are not time-boxed.

White box testing can be challenging to arrange. For example, a TCU may be produced by one vendor, use the cellular infrastructure of a given provider, connect to the telematics infrastructure of another provider, and hold data concerning other manufacturers.

For this reason, grey box testing is the preferred approach.

A period of black-box testing can be carried out to check security mechanisms such as code readout protection.

Risk-based vs compliance-based security testing

Whilst conventional IoT security testing is scoped and described before testing commences, it is most frequently carried out “open-ended”. Testing will follow the paths that are most likely to result in serious impact. Checklists and experience will be used. If entities (infrastructure, applications, or companies) are discovered, they can be brought into scope so as not to constrain testing.

In contrast to this, some industries are highly focused on compliance-based testing. This is where standards, checklists and simple “yes/no” or “go/no-go” tests are the driver. This conflicts with several aspects of security testing:

  • Checklists/schedules are often drawn up ahead of testing starting, which does not allow for open-ended testing.
  • Security is rarely on or off – it is a sliding scale. A tester identifies a vulnerability; the business must determine the risk and act on this. Checklists do not allow subtle distinctions to be made.

Compliance-based testing is particularly common in industries where safety is a primary driver, or where regulation is strong. This includes automotive, maritime, and chemical industries.

There are several examples of this type of testing going wrong:

  • Inappropriate tests – for example, tests for Bluetooth Low Energy on a system that only supports Bluetooth Classic.
  • Impossible schedules – a test schedule is drawn up on the assumption that root access to the device will be provided, and it is not. As a result, the entire test is taken up trying to obtain access rather than performing the planned tests.
  • Constraining tests – checklists are often based on previous testing of older generations of a device, or similar classes of device. This results in tests for vulnerabilities found in other devices being applied even when they are not appropriate. For example, performing a review of a Linux system on a Windows CE based IVI.

It is recommended that open-ended testing is used on all systems. Good communication with the customer ensures that coverage remains acceptable and that the direction testing takes is appropriate.

Time-boxed vs goal-oriented

Nearly all testing is time-boxed: a given period of time is provided for testing.

The period of time must be appropriate for the system being tested. It is better for a test to be too long than too short.

For complex tests, consider a reserve period of testing. This can be used as a top-up if only low coverage has been achieved in some areas.

Goal-oriented testing will follow a list of goals, with the testing not being complete until the goals are achieved. This works well with checklist-based testing.

System vs component

This denotes what is under test.

If it refers to the scope, it can be helpful. For example, if a TCU is being tested, constraining the scope to the component will prevent time being wasted on other aspects of the car.

If “system or component” refers to the provided equipment, then it tends to be unhelpful. Testing a component totally standalone (without any other devices to power or stimulate it) will generally result in slow progress and very low coverage and depth, whilst time is spent trying to reverse engineer it.

In general, tests should never be carried out with a single, standalone component. This can result in all time being occupied by simply making the device work as intended.

Testing a component as part of a system is sensible and productive.

Destructive vs non-destructive

It should be accepted that devices are likely to be damaged during testing and almost certainly cannot be used safely in a vehicle.

Non-destructive testing severely limits testing and slows down what testing can still be performed.

Authorisation

Security testing requires authorisation covering every entity and component in the system.

This can become complex when dealing with large systems, where an ECU may be made by one vendor, use a SIM card from another provider, and interface with a backend managed by someone else.

It is the customer’s responsibility to determine which parties must authorise testing; this should happen weeks in advance of a test.

Incorrect or missing authorisation can severely hamper a test.
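Tracking who has signed off is mostly bookkeeping, but on large systems it is worth automating. This is a minimal sketch, assuming a hypothetical record of the date each party gave written authorisation and a two-week lead time (the text only says weeks in advance); the party names are illustrative.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def authorisation_gaps(
    parties: List[str],
    signed_on: Dict[str, Optional[date]],
    test_start: date,
    lead_time: timedelta = timedelta(weeks=2),
) -> List[str]:
    """Report parties whose written authorisation is missing or arrived too late."""
    deadline = test_start - lead_time
    problems = []
    for party in parties:
        signed = signed_on.get(party)
        if signed is None:
            problems.append(f"{party}: no written authorisation")
        elif signed > deadline:
            problems.append(
                f"{party}: authorised {signed.isoformat()}, after deadline {deadline.isoformat()}"
            )
    return problems


if __name__ == "__main__":
    parties = ["ECU vendor", "Cellular provider", "Backend operator"]  # hypothetical
    signed_on = {"ECU vendor": date(2017, 1, 2), "Cellular provider": date(2017, 2, 20)}
    for problem in authorisation_gaps(parties, signed_on, test_start=date(2017, 3, 1)):
        print(problem)
    # Cellular provider: authorised 2017-02-20, after deadline 2017-02-15
    # Backend operator: no written authorisation
```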

Legal

Certain aspects of testing will require that devices such as fake 2G base stations are used, or techniques such as RF jamming.

If these tests are to be carried out on-site or abroad, it is the customer’s responsibility to determine if these activities are legal. This must be confirmed in writing.

Scheduling

Most tests will follow several phases:

  • Setup – connecting the equipment and getting it working.
  • Basic observation – determining, from a high-level, how the system works. This guides the next stage.
  • Instrumentation – using various tools (logic analysers, CAN monitors, oscilloscopes, serial adapters, WiFi access points) to monitor the system.
  • Detailed observation – using the instrumentation to observe, in-depth, how the system operates.
  • Attack planning – using the observation to determine which attacks are appropriate.
  • Attack – carrying out the attacks.
  • Reporting – documenting the attacks.
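During detailed observation of a vehicle network, a common first step is finding which CAN IDs carry changing data. This is a minimal sketch, assuming traffic was captured in the log format written by candump from Linux can-utils (for example, `candump -l can0`). Only classic data frames are parsed; CAN FD and remote frames are skipped.

```python
import re
from typing import Dict, Iterable, Set

# Classic CAN data frame in candump log format, e.g.
# "(1436509052.249713) can0 244#0011223344556677"
FRAME = re.compile(
    r"^\((?P<ts>[0-9.]+)\)\s+(?P<iface>\S+)\s+(?P<id>[0-9A-Fa-f]{3,8})#(?P<data>(?:[0-9A-Fa-f]{2})*)$"
)


def changing_bytes(lines: Iterable[str]) -> Dict[int, Set[int]]:
    """Map each CAN ID to the payload byte positions that ever differ from its first frame."""
    first_seen: Dict[int, bytes] = {}
    changed: Dict[int, Set[int]] = {}
    for line in lines:
        match = FRAME.match(line.strip())
        if match is None:
            continue  # skip CAN FD (##), remote frames (#R) and anything unparseable
        can_id = int(match["id"], 16)
        data = bytes.fromhex(match["data"])
        reference = first_seen.setdefault(can_id, data)
        positions = changed.setdefault(can_id, set())
        positions.update(i for i, (a, b) in enumerate(zip(reference, data)) if a != b)
    return changed


if __name__ == "__main__":
    log = [
        "(1000.000000) can0 244#0011223344556677",
        "(1000.010000) can0 244#0011223344556678",
        "(1000.020000) can0 3E9#00",
        "(1000.030000) can0 3E9#00",
        "(1000.040000) can0 123#R",
    ]
    for can_id, positions in sorted(changing_bytes(log).items()):
        print(f"{can_id:03X}: {sorted(positions)}")
    # 244: [7]
    # 3E9: []
```

IDs with changing bytes are natural candidates for closer instrumentation and, later, for attack planning, which fits the iterative loop described below.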

In general, there will be several days required for the setup, basic observation, and instrumentation of a system before work can begin finding deeper vulnerabilities.

Reporting takes a variable amount of time, but at least one day.

This is generally an iterative process. As new aspects are found, further instrumentation is put in place alongside developing new attacks.

Parallel testing

For hardware testing, it is preferred if no more than 2 testers are scheduled onto a job. There is simply too much overhead in communication as more testers are involved.

If two testers are scheduled, they should work on the system concurrently. Running consecutively means that all information gained during each testing period must be passed to the next tester, so budget a one-day penalty for each swap. For example, if two testers consecutively perform 5 days of testing each, only 9 days of output should be expected.
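The swap penalty is simple arithmetic. This is a minimal sketch, assuming the one-day-per-swap rule above and that adjacent blocks hand over whenever the tester changes; concurrent work by two testers incurs no swap in this model.

```python
from typing import List, Tuple


def expected_output_days(blocks: List[Tuple[str, float]], swap_penalty: float = 1.0) -> float:
    """Estimate productive days when testing blocks run one after another.

    blocks lists (tester, days) in running order. Each change of tester between
    adjacent blocks is a handover that costs swap_penalty days.
    """
    total = sum(days for _, days in blocks)
    swaps = sum(1 for (prev, _), (cur, _) in zip(blocks, blocks[1:]) if prev != cur)
    return total - swaps * swap_penalty


if __name__ == "__main__":
    print(expected_output_days([("A", 5), ("B", 5)]))  # 9.0, the example above
    print(expected_output_days([("A", 3), ("B", 3), ("A", 3)]))  # 7.0, two handovers
```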

It is possible to schedule two testers for differing periods. The second tester, ideally, would be scheduled in the middle or end of the first tester’s time.

Multiple testers cannot work efficiently on a single device. If only one device is expected to be available, avoid scheduling multiple testers.

Lab space

The ideal scenario is one where your tester has proper testing facilities of their own. At Pen Test Partners we have space for two cars or one small van.

Customer Site testing

If testing must be carried out on-site with any significant equipment (e.g. soldering station, oscilloscope, toolkit), then 0.5 days should be allocated for packing/setup and 0.5 days for takedown/unpacking.

In addition to this, if travel abroad with equipment is expected, this can add 0.5 days and approximately 750GBP for dealing with a customs Carnet or similar.

Time guidelines

The following are given as guidelines only.

Examples

Looking at smaller components of system testing:

  • Web application test – for most applications, tested from a user perspective, 4-5 days.
  • Mobile application test – for most modern applications using an API common between iOS and Android, 5-8 days.
  • Debug interface testing – time-boxed test to identify and interact with debug interfaces (JTAG, ICSP, serial) on a device, 5 days. This is a strongly time-boxed test – some devices may take weeks of reverse engineering. If a debug interface is open on a device, the vendor should already know this; as a result, this test can be largely pointless, simply demonstrating that not enough time was allocated to find an issue.
  • Chip-off firmware extraction – removing memory chips and reading the contents to recover firmware. There is nearly always complexity around recovering data (e.g. dealing with the flash translation layer or custom file systems), 5 days. Note that there may need to be a period of downtime in the middle of this whilst adapters are built (between 3 days and 6 weeks).
  • USB host testing – test the host interface on an IVI or other device, using a device such as a FaceDancer, 2 days.
  • Network device testing – looking at a device that connects to a WiFi or Ethernet network and offers services or connects outbound to others, 5 days. This can be hugely variable depending on the services offered or used – some devices are entirely closed, others totally open.
  • Full compiled firmware evaluation – another strongly time-boxed test, but expect at least 10 days to get good value, 20 days for better coverage.
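The guideline figures above, plus the on-site and travel overheads given earlier, can be combined into a rough first-pass estimate. This is a minimal sketch: the item keys are hypothetical labels, travel abroad is assumed to imply on-site work, and reporting time and chip-off adapter downtime are not included. The output is only as good as the guidelines themselves.

```python
from typing import Dict, Iterable, Tuple

# Guideline effort ranges in days (low, high); keys are hypothetical labels.
GUIDELINE_DAYS: Dict[str, Tuple[float, float]] = {
    "web_app": (4, 5),
    "mobile_app": (5, 8),
    "debug_interfaces": (5, 5),
    "chip_off": (5, 5),        # excludes 3 days to 6 weeks of adapter-building downtime
    "usb_host": (2, 2),
    "network_device": (5, 5),  # can vary hugely with the services offered or used
    "firmware_eval": (10, 20),
}


def estimate(items: Iterable[str], onsite: bool = False, abroad: bool = False) -> Tuple[float, float, int]:
    """Return (low_days, high_days, extra_cost_gbp) for a rough first-pass scope."""
    items = list(items)
    low = sum(GUIDELINE_DAYS[item][0] for item in items)
    high = sum(GUIDELINE_DAYS[item][1] for item in items)
    overhead = 0.0
    cost_gbp = 0
    if onsite or abroad:
        overhead += 0.5 + 0.5  # packing/setup plus takedown/unpacking
    if abroad:
        overhead += 0.5  # dealing with a customs carnet or similar
        cost_gbp += 750  # approximate figure from the text
    return low + overhead, high + overhead, cost_gbp


if __name__ == "__main__":
    print(estimate(["mobile_app", "network_device", "usb_host"], abroad=True))
    # (13.5, 16.5, 750)
```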

Retesting

Scoping retests is challenging. Often it is assumed that performing the same tests will be far faster, but the setup, instrumentation and reporting times still factor in. For example, an ECU may require soldering etc. to be able to monitor all required signals.

Retesting of previously found issues is hugely variable depending on the setup time, and the changes expected. For example, if a serial console has been removed from an IVI, it may not be possible to confirm if a hardcoded key has been removed from firmware.

A second test of the same device is also highly variable. If all previous issues have been fixed, this can be largely a validation exercise. Care must be taken, however, as new issues may have been introduced and can be missed if the retest is under-scoped.

Always involve the consultants when scoping retests.