[LLVMdev] [icFuzz] Help needed with analyzing randomly generated tests that fail on clang 3.4 trunk

Karen Shaeffer shaeffer at neuralscape.com
Tue Jun 25 14:46:01 PDT 2013


On Tue, Jun 25, 2013 at 01:15:25AM +0000, Haghighat, Mohammad R wrote:
> Hi Karen,
> 
> Thanks much for your comment and for sharing your experience. icFuzz has a core that is "really" random, but does not cover the entire C space. The tool was designed from scratch to be extensible, and it comes with a couple of extensions that target some of the optimizations that optimizing compilers typically perform: CSE, loop interchange, vectorization, etc. But even in the case of the extensions, other than the structure of the extension, most of the details are highly parametric and configurable. The chances of a totally random test meeting all the criteria for certain optimizations to kick in are lower than in guided test generation. I actually have x-y charts where x is the number of generated tests and y is the number of fails, for both completely random tests and random+guided tests, and the curve for guided tests is significantly above that for random tests. Also, the test generator is restricted to the "unsigned int" type, where the C++ semantics are precisely defined even in cases of overflow. Also, features that have implementation-dependent behavior are excluded by design (e.g., expressions with side effects during their evaluation, out-of-bounds array accesses, etc.).
> 
> But you are right in that the generated code has some structure ;) 
> 
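To make the "unsigned int" restriction concrete (an illustration of the language rule,
not actual icFuzz output): unsigned arithmetic in C++ wraps modulo 2^N, so a generated
test's expected result is identical under any conforming compiler at any optimization
level, whereas the analogous signed overflow would be undefined behavior.

    // Illustration only (not icFuzz output): unsigned arithmetic wraps modulo
    // 2^N (N = bits in unsigned int, 32 on typical platforms), so the answer
    // below is fully defined even though the sum overflows.
    #include <cstdio>

    int main() {
        unsigned int a = 4000000000u;
        unsigned int b = 1000000000u;
        unsigned int sum = a + b;   // wraps: 5000000000 mod 2^32 == 705032704
        std::printf("%u\n", sum);   // same result at -O0 and -O2
        return 0;
    }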

Hi Mohammad,
Thanks for clarifying. Enjoyed reading the details.

I understand your point of view and objectives; they are quite reasonable. Taking a
fully randomized engine and adding configurable constraints is the natural progression.
I built a number of such constraints into my random generator. One could restrict
sequence generation to any subset of bytecodes, and one could also define a weight for
each valid bytecode, where all bytecodes carried the same probability by default. I also
made it possible to inject a starting sequence of bytecodes, so the randomized portion
would begin with the DUT in a specified state. With those capabilities, one could do
tightly focused unit testing with the generator.
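As a minimal sketch of that configuration model (the names here are hypothetical, not
the original tool's): restrict generation to an allowed subset of bytecodes, draw each
bytecode according to its weight (uniform by default), and prepend an injected seed
sequence so the random portion starts with the DUT in a known state.

    // Minimal sketch, hypothetical names: a configurable random bytecode
    // sequence generator with an allowed-opcode subset, per-opcode weights
    // (uniform by default), and an injected starting sequence.
    #include <cstddef>
    #include <cstdint>
    #include <random>
    #include <vector>

    struct OpChoice {
        std::uint8_t opcode;
        double weight = 1.0;   // uniform by default; raise to bias the mix
    };

    std::vector<std::uint8_t> generate_sequence(
            const std::vector<std::uint8_t>& seed_prefix,
            const std::vector<OpChoice>& allowed,
            std::size_t random_len,
            std::mt19937& rng) {
        std::vector<std::uint8_t> seq(seed_prefix);   // start DUT in a specified state
        std::vector<double> weights;
        for (const OpChoice& c : allowed) weights.push_back(c.weight);
        std::discrete_distribution<std::size_t> pick(weights.begin(), weights.end());
        for (std::size_t i = 0; i < random_len; ++i)
            seq.push_back(allowed[pick(rng)].opcode);
        return seq;
    }

With all weights left at 1.0 this degenerates to the fully uniform case; biasing a
handful of opcodes or shrinking the allowed set is what turns it into a focused unit
test.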

Using such a tool in unconstrained, fully random mode is a pure mathematical brute-force
method. It is completely counterintuitive relative to conventional QA and debugging
methodologies. The generated code sequences represent pure chaos and don't direct the
DUT to do anything 'useful'. In my work, I used very long randomized sequences of 50,000
fully random bytecodes per test run. The intent was to perturb system state with each
new random input token, because every bug requires a specific state transition, or
perhaps a specific sequence of consecutive state transitions, to occur. You need fully
automated failure detection and sufficient computing resources to realize full coverage
in a given time frame. And if everything is done properly, you can find all the bugs in
an analytically rigorous process. In my experience, the test setup can be the most
challenging part of such a system.
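And a sketch of the detection side, again with hypothetical stand-ins: the DUT runner
and the reference oracle are passed in as callables, each run feeds one long random
sequence to both, and any divergence is flagged for triage.

    // Sketch of an automated failure-detection loop; `dut` and `oracle` are
    // hypothetical callables standing in for the system under test and a
    // trusted reference implementation.
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <functional>
    #include <random>
    #include <vector>

    using Bytecode = std::vector<std::uint8_t>;
    using Result   = std::vector<std::uint32_t>;   // whatever observable state matters
    using Runner   = std::function<Result(const Bytecode&)>;

    void fuzz_loop(const Runner& dut, const Runner& oracle,
                   const std::function<Bytecode(std::mt19937&)>& gen,
                   std::size_t num_runs, std::mt19937& rng) {
        for (std::size_t run = 0; run < num_runs; ++run) {
            Bytecode seq = gen(rng);        // e.g. 50,000 fully random bytecodes
            if (dut(seq) != oracle(seq))    // automated failure detection
                std::fprintf(stderr, "run %zu: divergence detected, saving sequence\n", run);
        }
    }

The interesting engineering is in the two runners and in capturing enough observable
state for the comparison to be meaningful, which is where the test setup becomes the
hard part.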

Good luck with your work. I believe in the efficacy of randomized testing.

enjoy,
Karen
-- 
Karen Shaeffer
Neuralscape, Mountain View, CA 94040



