Wednesday, May 25, 2005

From Email on Path Testing

"Paths" are a phenomenon of "computational models", in which (roughly speaking) real-world "objects" are represented by "nodes" (vertices) and the relationships between them by "links" (edges, arcs). A set of nodes and links constituting a computational model is a "graph". If the links have a preferred direction of traversal (from a "source node" to a "target node"), they are "directed links" and you have a "directed graph".

A finite state machine (as you asked about) is usually a "partially directed graph" (most transitions may be bi-directional, but some transitions may be irreversible) in which the nodes represent component or system states, and the links represent transitions between them. Control flow graphs and cause-effect graphs are always directed graphs; in the first, the nodes represent points at which flow of control is exercised other than by fall-through, and the links represent the flows of control; in the second, the nodes represent "causes" (stimuli) or "effects" (responses) and logical operations between them, while the links represent the logical relationships that enter into, and result from, the logical operations.

In a computational model, a "path" is a sequence of nodes joined by links. Some (many) paths may be "unachievable" (infeasible) because they would depend on forbidden combinations of conditions. Testing all "achievable" paths is known as "path coverage", with the implication (if unqualified) that "100% path coverage" ("all-paths coverage") is meant. When you ask about "traversing all the paths", Saravana, you presumably mean "achieving all-paths coverage", which is often infeasible in testing non-trivial components because the number of achievable paths is often more than astronomical -- particularly when loops are involved. (Of course, OO methods often fall into the "trivial" class.)

Apart from its basic definition given above, "path" can have different meanings for different types of model, such as finite state machines, flowgraphs, cause-effect graphs, etc. In a finite state machine, a "path" is known as an "n-switch", meaning a sequence of transitions between "n+2" states. E.g., given three states, "A", "B", and "C", and no forbidden transitions: the sequences "A->B", "B->A", "B->C", "C->B", "C->A", and "A->C" would achieve "0-switch coverage"; triplet sequences such as "A->B->C", "B->C->A" (etc.) would achieve "1-switch coverage"; quadruplet sequences such as "A->B->C->B", "A->B->C->A" (etc.) would achieve "2-switch coverage"; and so on. ("Computation of the sizes of covering sets for 1-switch and 2-switch coverage in this example model will be left as an exercise for the reader.") It will often be true that a complete set of switches (all-paths coverage) will be infinite in size -- for example, just a single member of such a set might be "A -> B -> A -> B ..." ad infinitum, each transition utilising a different set of start and end data.

In a control flowgraph (as might be developed from, say, a use case or a set of sequence diagrams), paths are uni-directional, in the sense that a link such as "A->B" cannot be traversed in reverse. (A triad such as "B->A" in the same graph would represent a link different from "A->B", the two triads together representing a loop.) Generally, only "entry-exit paths" are of interest here (starting at an "entry node" of a process and ending at an "exit node"), since processes normally can't start or end in the middle, except by the insertion of additional entry and/or exit nodes. All-paths coverage won't require an infinite set of paths unless the graph includes at least one error-free uncounted loop, but even relatively simple graphs may require quite high numbers of paths to achieve coverage. In principle (again), the number of paths needed to achieve 100% path coverage of a non-looping graph doubles with each binary decision node in the graph ("n^2"), though nested decisions and unachievable combinations both work to reduce the number.

You (Saravana) worry that, "When u developed the system, and give it to client, the system may have traversed all the paths, suppose we have not covered a particular path in FSM, then this path is explored in future, the system may fail. So there is a need to traverse all paths" -- but, as I've indicated above, this may well be impracticable if not impossible, especially for state transition graphs. This is part of the risk-based nature of all testing. The trick is to find a set of paths that provides *adequate* coverage without being *exhaustive*.

I know of no simple way to identify an "adequate set" for finite state models (which doesn't mean there isn't one -- I'm not well-read on the subject). For control flowgraphs, the commonly accepted criterion for adequacy is the "basis set", which covers 100% of the "basis paths" (a.k.a "independent circuits"). A basis path is an achievable entry-exit path such that, during selection of "this" path, only one decision node in "the prior" path is switched to an alternate link (i.e., only one new node sequence, not previously covered, is introduced). Also, if the path traverses a loop, there is only one iteration of the loop, unless forced by a minimum iterations criterion.

Basis paths have the property that, when you have a complete set (a "basis set"), you can generate *all achievable paths* (i.e., 100% path coverage) by simple addition and subtraction of path elements in the existing set -- hence the name, "basis path set" (a basis for "all paths" coverage, if you want to try for it ...).

A basis set generally contains more paths than a branch-covering set (which exercises all explicit branches), which in turn is generally larger than a statement-covering set (which excludes "null" branches such as an empty "else"). From the testing point of view, it has many advantages -- not just that it exercises more paths than branch coverage, but that:

* It's very systematic (so there's less chance of "missing" cases);
* Test data preparation may be simplified (you may need to change only two variables per test data set in order to exercise or "force" successive paths);
* Debugging may be greatly simplified (the bug is either in or closely related to the "new path segment" exercised for the first time in the test case that found the bug).

With regard to the last point, the operating ideal is, "not more than one bug per test path; not more than one test path per bug". Glen Myers called this, "high-yield testing".

The basic method for generating a basis set is reasonably simple:

0. Create a flowgraph of the algorithm you wish to cover.
1. Identify and record the simplest, most obvious entry-exit path.
2. In the most recently recorded path, identify the first decision node which, when switched to an alternate branch, will lead to a smallest amount of change from the most recently recorded path; trace and record the new path.
3. Repeat (2) until all nodes and links are covered.

Note that, at step (1), some authorities prefer to record the simplest "happy-day scenario" path, which is often longer than the shortest achievable path (often an error case). However, this may make it harder to prepare the initial test data set, and (because of the larger scope of the test case) to isolate the location of any bug(s) it catches.

As a general example, consider the following algorithm:

0. begin
1. if order specifies air shipment
2. if Weight <= 2 kg
3. then Rate = 6 units
4. else if [Weight > 2 kg but] Weight <= 20 kg
5. then Rate = Weight x 3 units
6. else
7. Excess = Weight – 20; Rate = 60 + (Excess x 4 units)
8. endif
9. endif
10. if Destination = Brazil or Eire or China
11. then Rate = Rate + 5%
12. endif
13. endif
14. end

Using line numbers as node numbers, the simplest and most obvious entry-exit path traverses 0-1-13-14 ("order doesn't specify air shipment". An alternative "happy-day" path would traverse 0-1-2-3-9-10-12-13-14: "air shipment for <=2 kg, not going via Brazil, Eire, or China".).

In Path (1) (0-1-13-14), there is only one decision node, node (1), so for path (2) we select the *simplest* consequence of switching it (from "false" to "true"). This leads to the subpath "0-1-2-3" rather than "0-1-2-4", since selecting node (3) represents a "simpler" choice than the consequences of selecting node (4) (a single action, at node 3, rather than a nested decision followed by a choice of actions, at node 4). From node (3), we continue with the sequence, "9-10-13-14". We now have two paths as follows, with the differences from the first to the second (the newly-traversed nodes) set off in brackets:

Path 1: 0-1-13-14
Path 2: 0-1-[2-3-9-10-12]-13-14

If execution of Path 2 reveals a bug, it will be either in or closely related to the bracketed sequence.

I'll spare you description of how the remaining paths are generated, and simply present the full basis test set (one of several candidate sets), marking out the new path segment (subpath) in each path:

Path 1: 0-1-13-14
Path 2: 0-1-[2-3-9-10-12]-13-14
Path 3: 0-1-2-3-9-10-[11]-12-13-14
Path 4: 0-1-2-[4-5-8]-9-10-11-12-13-14
Path 5: 0-1-2-4-[6-7]-8-9-10-11-12-13-14

The paths correspond to the following scenarios:

Path 1: not an air shipment
Path 2: air shipment, weight <= 2 kg, destination not Brazil / Eire / China
Path 3: air shipment, weight <= 2 kg, destination = Brazil / Eire / China
Path 4: air shipment, 2kg <= weight <= 20 kg, destination = Brazil / Eire / China
Path 5: air shipment, weight > 20kg, destination = Brazil / Eire / China

We can verify coverage by simple inspection (is each of the 14 nodes present at least once?) or by constructing a coverage verification table as we go (nodes horizontally, paths vertically; for each path, mark in that row the nodes it traverses; when there is at least one tick for each node, you have node *and* link coverage).

The maximum size of a basis set is easily computed for any algorithm as L - N + E + X, where "L" = links, "N" = nodes, "E" = "entry nodes", and "X" = "exit nodes". Entry- and exit-nodes get counted twice. This computation works for any number of exit and entry nodes; if the algorithm has only a single entry and a single exit, the more usual form (L - N + 2) is adequate. In either case, the resulting number is the "cyclomatic complexity" of the algorithm with the sigillum, "V(G)". (Usually I drop the (G), which simply means "the V for *this* graph".)

Applying the first formula to our algorithm above, we have 17 links joining 14 nodes, of which 1 is an entry node and 1 an exit node, so for this algorithm, V = 17 - 14 + 1 + 1 = 5 (the number of paths I actually selected).

Step (3) of the method outlined above might be rewritten as "Repeat (2) until you have 'V' paths", except that the *achievable* size of the basis test set may be smaller than V because of "predicate correlations" and "predicate dependencies". Predicates are correlated when the truth value of one condition is "forced" by what has happened at a prior condition. A "dependent predicate" is a truth-value forced by some prior processing. Either of these conditions may make a given branch sequence unachievable for all cases in practice, even though it might look feasible on paper, and such unachievable (sub)paths may reduce the achievable size of the basis test set. Often you can't immediately tell from the content of a branch condition which if any previous decisions or computations it's correlated to (e.g., where a boolean or other resultant value in a predicate was computed earlier in the algorithm), and it may be necessary to "interpret" predicates (trace them back through the algorithm) in order to identify the entry variables they depend on. This can be, shall we say, an unexciting activity to engage in, but I never promised excitement.

Saravana, you seemed most concerned with finite state machines, and I was unable to provide you with a full answer there; I hope someone else will be able to. In fact, my answer regarding control flowgraphs is incomplete too, since I haven't considered loops except in passing, but if you have to deal with process (algorithm) models, what I've described will give you a start. (Basic rules for loops: you must cover each subpath *within* the loop; try to test each nested subpath with only a single loop iteration, otherwise observe all minimum iteration criteria; you must test any minimum iteration limits; you must test any maximum iteration limits; apply boundary test concepts to testing limits. We further have to consider nested loops and concatenated loops. And there's also the fact that the condition at node (10) is a compound condition ... Beizer, "Software Testing Techniques", is the principal authority for this stuff.)

As for path testing with cause-effect graphs -- see Nursimulu's and Probert's paper at . The paper also (indeed primarily) discusses the use of such models in requirements verification.

These notes are preliminary to an article I am preparing for the British Computer Society's "The Tester" journal, so I'll be doubly interested in any feedback.

Regards to all,

-- Don
Post a Comment