I. Fuzz Testing

A. What is Fuzz Testing

------------------------------------------
WHAT IS FUZZING

Def: *fuzz testing* (or black box fuzzing) is

Idea:
                 random input
    [ Fuzzer ] ----------------> [ Program ]
                                     |
                                     | crashes
                                     v
                         inputs that crash program
------------------------------------------

	What could cause a program crash?

	What is likely to be the largest cost in terms of time
	in fuzz testing?

------------------------------------------
FUZZING VS. REGRESSION TESTING

             Regression          Fuzzing

Goal

Inputs

------------------------------------------

	What is regression testing?

	What is the goal of regression testing?
	What is the goal of fuzz testing?

	What kind of inputs are used in regression testing?
	What kind of inputs are used in fuzz testing?

	Can we use fuzz testing during regression testing?

1. monitoring program execution

------------------------------------------
MONITORING OF PROGRAM

Fuzz tester:
- maintains list of inputs that cause crashes
------------------------------------------

	How can the fuzzer know what inputs cause crashes?

	What if the program goes into an infinite loop?

	What information can a crash tell the developer/tester?

	How could we find more bugs than just running the program?

2. advantages and disadvantages

------------------------------------------
ADVANTAGES AND DISADVANTAGES OF FUZZING

Advantages:


Disadvantages:

------------------------------------------

	How hard would this be to implement?

	How much time of a programmer is involved
	in doing fuzz testing?

	Are random inputs going to be effective at finding crashes?

	How does the fuzzer know what inputs are valid?

3. finding inputs that crash the program

------------------------------------------
FUZZ TESTING AND INPUT VALIDATION

Reality:

    [ Fuzzer ]
        |
        | random
        | bytes
        v
    [ Driver ] ----------------> [ Program ]
                                     |
                                     | crashes
                                     v
                         inputs that crash program
------------------------------------------

	What kind of programs would take bytes as inputs?
B. mutation-based fuzzing

	Could we add some human expertise to fuzz testing? How?

1. idea of mutation-based fuzzing

------------------------------------------
MUTATION-BASED FUZZING

1. get seed input(s) from human
2. track list of good inputs
3. a. pick a random good input
   b. mutate it
   c. feed mutant to the program
   d. if it causes a crash,
4. repeat

Example fuzzers: ZZUF, FileFuzz, Taof, ProxyFuzz, etc.

Example scenario: fuzzing a PDF viewer
1. collect PDF files from web
2. mutate those files
3. record which mutants crash viewer
------------------------------------------

	When does this process stop?

	What kinds of mutations of inputs would be effective?

2. advantages and disadvantages

------------------------------------------
MUTATION-BASED FUZZING

Advantages:


Disadvantages:

------------------------------------------

	How hard is mutation-based fuzzing to implement?

	Does mutation-based fuzzing require precise specifications?

	Is there any danger of missing kinds of attacks?

	What kinds of inputs would resist fuzz testing with mutation?

C. generation-based fuzzing

	How could we fuzz-test programs that parse/validate
	their inputs?

------------------------------------------
GENERATION-BASED FUZZING

Idea:
1. Use specification of format to generate inputs
2. Mutations added to syntactic spots in inputs
3. Otherwise like mutation-based
------------------------------------------

	What does it mean to be otherwise like
	mutation-based fuzzing?

	Could we use heuristics in generating inputs?

D. comparison

------------------------------------------
ADVANTAGES: MUTATION VS. GENERATION

Mutation-based fuzz testing:


Generation-based fuzz testing:

------------------------------------------

	Which is easier to implement?

	Which requires the least amount of expertise to use?

	Which can work for programs that validate input syntax?

	Which works for data with checksums?

	Which is more likely to find problems?

	How should we measure the performance of these?

	Which has better performance?
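Step 3b above ("mutate it") can be sketched concretely. The function below is a toy illustration that uses just one strategy, flipping random bits of a copied seed; real mutation-based fuzzers such as ZZUF combine many strategies (byte substitution, truncation, block swaps, and so on).

```c
#include <stdlib.h>
#include <string.h>

/* Mutate a copy of a good input: copy src (len bytes) into dst,
   then flip nflips randomly chosen bits.  One of many possible
   mutation strategies, shown here for illustration. */
void mutate(unsigned char *dst, const unsigned char *src,
            size_t len, int nflips) {
    memcpy(dst, src, len);
    for (int i = 0; i < nflips; i++) {
        size_t byte = (size_t)rand() % len;   /* which byte */
        int bit = rand() % 8;                 /* which bit  */
        dst[byte] ^= (unsigned char)(1 << bit);
    }
}
```

In the PDF-viewer scenario, `src` would be one of the collected PDF files, and each call produces one mutant to feed to the viewer.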
E. when to stop fuzzing

------------------------------------------
WHEN TO STOP FUZZING?

How do we know we have tested enough?
------------------------------------------

	The original and mutation-based fuzz testers run forever,
	so when do we stop them?

	Generation-based fuzzers may only generate a limited
	number of inputs; how do we know if that's enough?

1. code coverage

------------------------------------------
CODE COVERAGE

Def: *code coverage* is

Tools for profiling: gcov, lcov
    (see https://about.codecov.io/tool/lcov/)

Also for LLVM see:
    clang.llvm.org/docs/SourceBasedCodeCoverage.html
------------------------------------------

	Does code coverage actually tell us about a testing
	process's ability to find bugs?

a. line coverage

------------------------------------------
LINE COVERAGE

Measures: percentage of lines of code executed (or statements)

Example:

    void check(int a, int b) {
      if (a > 2) {
        x = 1;
      }
      if (b > 2) {
        y = 2;
      }
    }

How many pairs (a,b) are needed for 100% line coverage?

Variation: block coverage is
------------------------------------------

	Do we care about the lines with only a } in them?

2. branch coverage

------------------------------------------
BRANCH COVERAGE

Also called: decision coverage

Measures: percentage of branches of code executed

Example:

    void check(int a, int b) {
      if (a > 2) {
        x = 1;
      }
      if (b > 2) {
        y = 2;
      }
    }

How many pairs (a,b) are needed for 100% branch coverage?
------------------------------------------

3. path coverage

------------------------------------------
PATH COVERAGE

Measures: percentage of paths in code executed

Example:

    void check(int a, int b) {
      if (a > 2) {
        x = 1;
      }
      if (b > 2) {
        y = 2;
      }
    }

How many pairs (a,b) are needed for 100% path coverage?
------------------------------------------

	How many paths can a program have?

4. comparison (omit)

------------------------------------------
COMPARING TYPES OF COVERAGE

Which type is most desirable?

Does 100% coverage guarantee finding bugs?
------------------------------------------
5. evaluation

a. benefits of code coverage

------------------------------------------
BENEFITS OF CODE COVERAGE

------------------------------------------

	What benefits could one get from code coverage?

b. problems with code coverage

------------------------------------------
PROBLEMS OF CODE COVERAGE

Consider:

    void myStrCpy(char *dst, char *src) {
      if (dst && src) {
        strcpy(dst, src);
      }
    }
------------------------------------------

	Is there any vulnerability in that code?

	Will code coverage help us find that vulnerability?

6. how good is code coverage?

------------------------------------------
HOW GOOD A MEASURE IS CODE COVERAGE?

See: Xia Cai and Michael R. Lyu.
"The effect of code coverage on fault detection
under different testing profiles."
In A-MOST '05: Proceedings of the 1st International
Workshop on Advances in Model-Based Testing,
pp. 1-7, May 2005.
https://doi.org/10.1145/1083274.1083288

"code coverage is simply a moderate indicator for the
capability of fault detection on the whole test set."

[code coverage is] "clearly a good estimator for the
fault detection of exceptional test cases, but a poor
one for test cases in normal operations."

"the correlation between code coverage and fault
coverage is higher in functional test than in
random testing"
------------------------------------------

F. coverage-guided gray-box fuzzing

------------------------------------------
COVERAGE-GUIDED GRAY-BOX FUZZING

Idea:
1. Mutants of good inputs are good if

Algorithm:
1. Good inputs start from seed input(s)
   that don't crash or loop forever
2. Mutate random good input
3. Execute it and monitor execution and its coverage
4. If program crashes or
   then save input as a new good input

Example: AFL
------------------------------------------
a. AFL

------------------------------------------
AMERICAN FUZZY LOP (AFL)

Instruments code at compile time
    (can instrument assembly code or LLVM)

Tracks all edges in CFG (with counters)
- hashtable tracks log_2 of executions
  with 8 bits per edge

Target program runs as separate process
------------------------------------------

	What's the advantage of using log base 2 of executions?

	What's the disadvantage of using log base 2 of executions?

b. libFuzzer

	Would it help to combine the ideas of fuzzing and
	concolic testing?  How would we do that?

------------------------------------------
LIBFUZZER

libFuzzer, "a library for coverage-guided fuzz testing"
in LLVM -- llvm.org/docs/LibFuzzer.html

See also tutorial.libfuzzer.info

Uses LLVM's SanitizerCoverage for instrumentation

e.g., to build also with address sanitizer:

    clang -g -O1 -fsanitize=fuzzer,address \
        target.c

Can be used with AFL
------------------------------------------

c. go-fuzz

------------------------------------------
GO-FUZZ

For the Go programming language

Built into the Go tool chain

See https://go.dev/security/fuzz/
------------------------------------------

G. challenges in fuzz testing

------------------------------------------
CHALLENGES IN FUZZ TESTING

Seeds need to be crafted well
- must not crash program
- must not cause loops
- should cover different branches

Some branches make code hard to cover...
------------------------------------------

1. coverage challenges

------------------------------------------
DIFFICULT-TO-FUZZ BRANCHES

    void test1(int n) {
      if (n == 0x12345678) {
        abort();
      }
    }
------------------------------------------

	How hard would it be to get to the abort in test1?
------------------------------------------
EASIER TO FUZZ TEST

    void test2(int n) {
      int dummy = 0;
      char *p = (char *)&n;
      if (p[0] == 0x12) dummy++;
      if (p[1] == 0x34) dummy++;
      if (p[2] == 0x56) dummy++;
      if (p[3] == 0x78) dummy++;
      if (dummy == 4) {
        abort();
      }
    }
------------------------------------------

	How many inputs are needed in the worst case for test2?

H. rules of thumb

------------------------------------------
WISDOM FROM FUZZ TESTERS

- Knowing input format helps
- Generational testing better than random,
  but needs good specifications
- Implementations vary,
  different fuzzers ==> different bugs
- Running longer will find more bugs
  (but reaches a limit)
- Coverage guidance works best
- Use profiling to see where getting stuck,
  design better seeds to get past those...
------------------------------------------
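As a closing illustration of the coverage-guided approach in section F: the entry point below, `LLVMFuzzerTestOneInput`, is the actual interface libFuzzer expects (see llvm.org/docs/LibFuzzer.html); everything else, including the tiny `parse_record` function it exercises and its "FUZ" header format, is invented for this sketch. libFuzzer supplies `main` and the mutation/coverage loop when the file is built with `clang -g -O1 -fsanitize=fuzzer,address harness.c`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical code under test: parse a record in a made-up
   format that starts with the 3-byte header "FUZ" followed by
   a record-type byte.  Returns the record type, or -1. */
static int parse_record(const uint8_t *data, size_t size) {
    if (size < 4 || memcmp(data, "FUZ", 3) != 0)
        return -1;
    return data[3];
}

/* The libFuzzer entry point: called once per generated input.
   Returning 0 means "this input did not need special handling";
   crashes and sanitizer reports are what get recorded. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_record(data, size);
    return 0;
}
```

Note how this harness follows the rules of thumb above: the "FUZ" header is exactly the kind of input-format knowledge that a good seed corpus would encode, and coverage guidance lets the fuzzer learn to get past the header check on its own.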