We Can't Fully Test Digital Systems


03 Feb 2022

Why can’t we use immunity testing to manage the Functional Safety risks that can be caused by EMI?

There are at least three reasons, but in this blog I want to deal with just one of them:

It is impossible to fully test a digital system... Ever.

Anyone remember the floating-point division error (the “FDIV bug”) in the then-new Intel Pentium microprocessor, back in 1994?

This was a hardware fault, and Intel came right out and said (something like) “Hey guys, be cool, you know it is impossible to test all the functionality of a modern microprocessor. We thought this was a little-used function and our time was better spent testing others. Sorry!”

The Pentium was awesome in its day, but now we have microprocessors that are at least 10 times more complex, so at least 10 times more impossible to test.

And a few years ago I was told that the largest software company in the world had managed to reach a milestone and do something that no-one had ever been able to do before – they had managed to fully test all the possible digital states of a... printer driver!

Don’t take my word for it, here are some quotations from important people or committees:

  • “Our programs are often used in unanticipated ways and it is impossible to test even fairly small programs in every way that they could possibly be used. With current practices, large software systems are riddled with defects, and many of these defects cannot be found even by the most extensive testing. Unfortunately, it is true that there is no way to prove that a software system is defect free.”

From Watts S. Humphrey in “The Quality Attitude”, March 1, 2004, page 33. Watts is a Senior Member of Technical Staff, Software Engineering Institute, Carnegie Mellon University, USA.

  • “We no longer have the luxury of carefully testing systems and designs to understand all the potential behaviors and risks before commercial or scientific use.”

From Nancy Leveson in: “A New Accident Model for Engineering Safer Systems”, in “Safety Science,” Vol. 42, No. 4, April 2004, pp. 237-270, http://sunnyday.mit.edu/accidents/safetyscience-single.pdf. Nancy is Professor of Aeronautics and Astronautics, and Professor of Engineering Systems, at the Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, and very well known in the Functional Safety world.

  • “Computer systems lack continuous behaviour so that, in general, a successful set of tests provides little or no information about how the system would behave in circumstances that differ, even slightly, from the test conditions.”

From the IET, “Computer Based Safety-Critical Systems”, September 2008. This text does not appear in the 2013 edition.

  • “Software essentially requires infinite testing”

From “Software Reliability”, NASA, Goddard Space Flight Center, USA.

  • “The current state-of-the-art is such that neither the application of quality assurance methods (so-called fault avoiding measures and fault detecting measures) nor the application of software fault tolerant approaches can guarantee the absolute safety of the software.”

From BS EN 50128 July 2011, “Railway applications – Communication, signalling and processing systems – Software for railway control and protection systems”.

Let’s try putting some numbers to the issue.

If a digital system had four inputs each digitized to 8-bit accuracy, plus sixteen binary inputs (e.g. switches: either on or off), and if all inputs were independent of each other, there would be 2⁴⁸ possible combinations of correct inputs, about 2.8·10¹⁴. If we assume we can test 10 input states per microsecond it would take 2.8·10⁷ seconds to test them all – about 326 days of testing 24/7.

It’s no wonder that the digital industry has known for decades that their systems can fail in unpredictable ways as the direct result of untested combinations of perfectly correct inputs (read J. A. Whittaker, “What Is Software Testing and Why is it so Hard”, IEEE Software, Jan-Feb 2000, pp 70-79).

Of course, there are many more system states than are required for just the “input space”, not least to handle the processing of the input data. To discover by immunity testing alone whether EMI could cause an unsafe error or malfunction, each immunity test would have to be applied in turn to all of the possible system states. Figure 1 attempts to sketch a view of this problem.

If we limit our example to testing the input space alone, when performing a radiated immunity test (e.g. to IEC 61000-4-3) the lowest frequency would be set at the correct level (taking measurement uncertainty into account), and the test would dwell at that frequency while the complete set of correct input states was exercised. For the simple example system above, this would take about 326 days.

Then the test frequency would be stepped 1% higher for another 326 days. About 230 such steps would cover the first decade of frequency, taking about 205 years of testing 24/7, and so far this has only considered one frequency decade, with one antenna angle, and one antenna polarization!
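For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation for the simple example system. The function and variable names are my own illustrative choices, and the 230 steps per decade is simply the figure used in the text; treat it as a rough estimate under the stated assumptions, not a test-planning tool.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25


def input_state_count(analog_inputs, bits_per_analog, binary_inputs):
    """Number of distinct input combinations, assuming independent inputs."""
    total_bits = analog_inputs * bits_per_analog + binary_inputs
    return 2 ** total_bits


def exhaustive_test_days(states, states_per_second):
    """Days needed to exercise every input state once at a fixed rate."""
    return states / states_per_second / SECONDS_PER_DAY


# Four 8-bit inputs plus sixteen on/off inputs, as in the example above.
states = input_state_count(analog_inputs=4, bits_per_analog=8, binary_inputs=16)

# 10 input states per microsecond is 10 million states per second.
days_per_frequency = exhaustive_test_days(states, states_per_second=10e6)

# Assumption taken from the text: about 230 steps of 1% cover one decade.
steps_per_decade = 230
years_per_decade = steps_per_decade * days_per_frequency / DAYS_PER_YEAR

print(f"input states:       {states:.2e}")
print(f"days per frequency: {days_per_frequency:.0f}")
print(f"years per decade:   {years_per_decade:.0f}")
```

Changing the input counts or the test rate shows how quickly the totals run away: every extra input bit doubles the time.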

Of course, this is all a gross simplification: it might be possible to reduce the testing time; and it might also be possible to speed up the testing of the system states.

Let’s assume that some kind of “intelligent” digital testing is developed that reduces the number of states to be tested by a factor of 10. Let’s also assume that each digital state can be tested in 1 ns instead of 100 ns. In this case, even a very simple immunity test plan would take 500 years just to test the input state space of this simple system.

Future mass-produced safety-related systems, such as will be used to control autonomous cars, can be a lot more complex than the simple example above. An input space consisting of eighteen 8-bit digitized monochrome camera signals (typical of a future autonomous vehicle) would have 2¹⁴⁴ possible input states, 2⁹⁶ times more than the simple worked example above.

Even the simple test plan of Figure 1 would need over 10²⁸ years to complete – more than 10¹⁷ times the age of the universe!
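To get a feel for the scale, the short Python sketch below compares the two input spaces and converts a test duration into multiples of the age of the universe. The helper names are hypothetical, and the figure of roughly 1.38·10¹⁰ years for the age of the universe is a commonly quoted approximation that I have assumed; it does not come from the calculations above.

```python
import math

# Assumption: commonly quoted approximate age of the universe, in years.
AGE_OF_UNIVERSE_YEARS = 1.38e10


def input_bits(eight_bit_inputs, binary_inputs=0):
    """Total number of independent input bits."""
    return 8 * eight_bit_inputs + binary_inputs


def universe_ages(years):
    """Express a duration in years as multiples of the age of the universe."""
    return years / AGE_OF_UNIVERSE_YEARS


simple_bits = input_bits(eight_bit_inputs=4, binary_inputs=16)  # simple example
camera_bits = input_bits(eight_bit_inputs=18)                   # camera example

# Each extra input bit doubles the number of input states.
extra_bits = camera_bits - simple_bits
growth = 2 ** extra_bits

print(f"camera input space is 2^{extra_bits} times larger "
      f"(about 10^{math.log10(growth):.0f})")
print(f"1e28 years is about {universe_ages(1e28):.1e} ages of the universe")
```

The point of the comparison is that the input space grows exponentially with the number of input bits, so no plausible speed-up of the test equipment can keep pace.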

These have of course been very crude calculations, but nevertheless they reveal the absolute impossibility of ever proving, by immunity testing, that a digital system is safe enough as regards risks caused by EMI.

And this is just the start! The above assumes that all such digital systems are always assembled from nominal-spec. components which never degrade and never suffer any kinds of malfunctions or failures throughout their operational lifecycle – which we know can never be true. But this is a topic for a future blog!

For more on this topic, and how to make it possible to design digital systems that are safe-enough, read the article “Why Do We Need an IEEE EMC Standard on Managing Functional Safety and Other Risks?”.

https://www.emcstandards.co.uk/why-do-we-need-an-ieee-emc-standard-on-managing

Photo by Umberto on Unsplash
