For our last meeting during the Fall 2011 semester, ACM invited Dr. Gary Leavens to deliver an insightful lecture about testing for a program’s correctness.
How Do We Know It’s Right?
Dr. Leavens opened the talk with two questions: “Should business care?” and “Should society care?” The answers varied from person to person, but it is clear that not everyone cares the same about the correctness of a program (especially if someone is creating it just for profit!) He also mentioned that it’s important to note that computer programs are everywhere (pacemakers to cars) and their performance and correctness must be maintained for both accuracy and for safety reasons.
Some other questions he proposed were: “What does correctness mean?” Take the correctness of PowerPoint over the correctness of the same pacemaker mentioned before – correctness has many applications and thus becomes a challenge to define and comprehend. Even so, at the most basic level correctness is defined as two parts: specification (description of behavior) and validation (comparison to specification). You create a list of desired behaviors for the system and then test after development for correctness.
The question then becomes, how do you specify? You can do it in a variety of ways depending on situations such as “BATCH” which studies the relation between inputs and outputs, “INTERACTIVE” which is based on a state machine with inputs and outputs, or “REAL-TIME” which is a set of tasks. Each of these model a set of behaviors that have testable criteria. Then one wonders, what do you specify? After noting how you then decide to specify either the entire program or the components of the program (i.e., classes or methods).
Dr. Leavens proposed more questions for the attendees: “Are Tests Adequate?” “Is It Enough?” “Is This Code Correct?” He then displayed a method which contained the set up: “if input happens to be 2, then return 3″ which is correct but does no calculations (answers hard-coded internally). This is not a real solution!
To effectively test for correctness, one could use relational specifications (domain and relation) to describe inputs and outputs (the classic “BATCH”). The Hoare logic approach to relational specification would be to use domain as the standard “precondition” that many of us are use to. The “preconditions” specify what are true of legal states; what correct input would contain. The “postconditions” describe what is true of the input/output relations – as it describes what the output will be based on input.
Correctness is then defined (in a behavioral sense) in that “all calls that satisfy precondition’s result and final state must satisfy postconditions” which is a fancy way of saying that expected output must match actual output for a valid set of inputs.
Next we looked into validation which has two parts: testing (check correctness for some inputs) and verification (prove code is correct for all input). Testing has its advantages of real errors, being economical for students who are first learning to program, and can improve confidence. However the disadvantages include its inability to prove a program is correct for all inputs. Verification can prove all inputs and can do so before the program is ever run, but can be expensive and may need new theories or tools for new kinds of software to be able to determine correctness effectively.
After some more discussion, Dr. Leavens noted that precise verification is impossible – therefore one must be conservative. This means that when there is doubt in the answer, be conservative about it and say false!
In summary, Dr. Leaven’s talk fell back to the basics of specifications and verification. Specifications are needed for validation, and verification is possible but conservatively-sound verification tools will be incomplete.