I. Side-channel attacks in Web Applications
------------------------------------------
     SIDE CHANNEL ATTACKS ON WEB APPS

Side-channel attacks on encrypted traffic
have been shown against, for example:
      - ssh
      - Voice over IP
      - Video streaming
      - Tor
      - software-as-a-service apps
          e.g., Salesforce

Based on:

  S. Chen, R. Wang, X. Wang and K. Zhang,
  "Side-Channel Leaks in Web Applications:
  A Reality Today, a Challenge Tomorrow,"
  In IEEE Symp. on Security and Privacy,
  2010, pp. 191-206,
  doi: 10.1109/SP.2010.20.
------------------------------------------
 A. attack model
------------------------------------------
        ATTACK MODEL FOR WEB APPS

Assume encryption used so
    attacker cannot see message contents
    
Attacker can still observe:

  - Number of messages sent
  - Timing and direction of each message
  - Size of each message


------------------------------------------
        Is there a military analogy here?
        What could an army infer about an enemy from radio traffic?
        Could an attacker identify what app is running? How?
 B. vulnerabilities
------------------------------------------
  DIFFERENCE FROM APP ON SINGLE COMPUTER

Internal communications are


Effects, leaks of:

    - personal health data
    - family income
    - investment details
    - search queries

despite use of HTTPS and WPA2 encryption

Causes:
   - stateful communication
   - low entropy input
   - significant traffic distinctions
------------------------------------------
        What's the analogy of traffic analysis for a web app?
------------------------------------------
     WEB-BASED PRIVACY VULNERABILITIES

Attackers can fingerprint web pages:

     - resource objects of diff. sizes


How?


Web flows split between client & server:

   - input points
   - program logic
   - program states
------------------------------------------
 C. mitigations
------------------------------------------
          MITIGATIONS APP-SPECIFIC

Mitigations different for each app:

Revise:
 - feature designs,
 - traffic characteristics,
 - publicly available domain knowledge

to


Need to protect app state transitions

------------------------------------------
 D. model of attack, measurement
  1. ambiguity reduction
------------------------------------------
   ATTACKER'S GOAL: REDUCE AMBIGUITY

Ambiguity set of data:


Measuring loss of ambiguity

     If ambiguity set reduced by
        factor of 1/R,
        then
              
              
------------------------------------------
  2. Web app model
------------------------------------------
            WEB APP MODEL

A quintuple (S, Sigma, delta, f, V),
where:
 - S = set of program states
 - Sigma = set of inputs accepted
 - delta = state transition function
     delta : S x Sigma -> S
 - f = output function
     f : S x Sigma -> V
 - V = set of visible outputs
     e.g., packet sizes

Notation:
   - 50 -> Browser sends 50 bytes
   - 1024 <- Server sends 1024 bytes
   
------------------------------------------
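The quintuple above can be written down as two lookup tables. The following is a minimal sketch, assuming a toy auto-suggest app: every state name, input, and packet size here is invented for illustration and does not come from Chen et al. Each visible output is a pair (bytes sent by browser, bytes sent by server), printed in the slide's arrow notation.

```python
from typing import Dict, List, Tuple

State = str
Symbol = str
Visible = Tuple[int, int]  # (bytes browser -> server, bytes server -> browser)

# Hypothetical toy app: all states, inputs, and sizes are invented.
# delta : S x Sigma -> S
DELTA: Dict[Tuple[State, Symbol], State] = {
    ("empty", "a"): "typed_a",
    ("empty", "b"): "typed_b",
    ("typed_a", "c"): "typed_ac",
}
# f : S x Sigma -> V
F: Dict[Tuple[State, Symbol], Visible] = {
    ("empty", "a"): (50, 1024),
    ("empty", "b"): (50, 980),
    ("typed_a", "c"): (51, 700),
}


def run(start: State, inputs: List[Symbol]) -> Tuple[State, List[Visible]]:
    """Feed inputs to the machine; return the final state and visible trace."""
    state = start
    trace: List[Visible] = []
    for sigma in inputs:
        trace.append(F[(state, sigma)])   # what an eavesdropper can see
        state = DELTA[(state, sigma)]     # hidden state change
    return state, trace


def show(trace: List[Visible]) -> str:
    """Render a trace in the arrow notation used on the slide."""
    return ", ".join(f"{up} ->, {down} <-" for up, down in trace)


final_state, trace = run("empty", ["a", "c"])
print(show(trace))
```

The eavesdropper sees only the trace; the states and inputs stay inside the encrypted channel.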
  3. What attacker must do
------------------------------------------
       WHAT ATTACKER TRIES TO DO

From unknown state s in S:
   observe N outputs
      (v1, v2, ..., vN)
   determine inputs
      (sigma1, sigma2, ..., sigmaN)
   or

   
------------------------------------------
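One way to picture the attacker's task is a brute-force search over the model. The sketch below assumes the attacker already knows delta and f (for example, by using the app themselves) and simply enumerates every start state and input sequence whose outputs match what was observed. The tiny machine and its sizes are hypothetical.

```python
from typing import Dict, List, Tuple

State = str
Symbol = str

# Hypothetical model known to the attacker; sizes are invented.
STATES: List[State] = ["q0", "qa", "qb", "done"]
DELTA: Dict[Tuple[State, Symbol], State] = {
    ("q0", "a"): "qa", ("q0", "b"): "qb",
    ("qa", "x"): "done", ("qa", "y"): "done",
    ("qb", "x"): "done", ("qb", "y"): "done",
}
F: Dict[Tuple[State, Symbol], int] = {
    ("q0", "a"): 600, ("q0", "b"): 600,   # first step is ambiguous
    ("qa", "x"): 300, ("qa", "y"): 310,
    ("qb", "x"): 300, ("qb", "y"): 420,
}


def consistent_inputs(observed: List[int]) -> List[Tuple[State, Tuple[Symbol, ...]]]:
    """Return every (start state, input sequence) that explains `observed`."""
    results: List[Tuple[State, Tuple[Symbol, ...]]] = []

    def search(start: State, state: State, idx: int, seq: List[Symbol]) -> None:
        if idx == len(observed):
            results.append((start, tuple(seq)))
            return
        for (s, sigma), v in F.items():
            if s == state and v == observed[idx]:
                search(start, DELTA[(s, sigma)], idx + 1, seq + [sigma])

    for s in STATES:          # the start state is unknown to the attacker
        search(s, s, 0, [])
    return results


print(consistent_inputs([600, 310]))
print(consistent_inputs([600, 300]))
```

In this toy model the first observation leaves two candidates, and the second observation either resolves the ambiguity (310) or does not (300).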
  4. How web app designs help attacker
     What can lead to a large(r) reduction in ambiguity?
------------------------------------------
          FACTORS THAT HELP ATTACKER


------------------------------------------
        What kinds of inputs typically lead immediately to a response?
        How are reduction factors combined?
  5. Density measures difficulty/ease of attack
------------------------------------------
           DENSITY

def: Let P be a set of packet sizes.
     Then
       density(P)= #P / [max(P) - min(P)]

     where
        #P is cardinality of P
        max(P) is maximum size in P
        min(P) is minimum size in P

"A density below 1.0 often indicates
 packets that are easy to distinguish"
      - Chen et al., 2010, p. 195

------------------------------------------
        Why would a density < 1.0 help the attacker?
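The density definition above translates directly into code. This is a minimal sketch; the two sample sets of packet sizes are made up to contrast a sparse set with a tightly packed one.

```python
from typing import Iterable


def density(sizes: Iterable[int]) -> float:
    """density(P) = #P / (max(P) - min(P)) for a set P of packet sizes."""
    p = set(sizes)                      # P is a set, so duplicates collapse
    if len(p) < 2:
        raise ValueError("need at least two distinct sizes")
    return len(p) / (max(p) - min(p))


# Hypothetical packet sizes, in bytes.
sparse = [500, 700, 1300, 2500]         # sizes spread far apart
packed = [100, 101, 102, 103]           # sizes nearly adjacent

print(density(sparse))
print(density(packed))
```

The sparse set scores well below 1.0 and the packed set scores above it, which matches the intuition that widely spaced sizes are easier to tell apart.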
  6. Examples
   a. Health App
------------------------------------------
          HEALTH APP EXAMPLE

Tabbed data entry with tabs for:
 - Conditions
 - Medications
 - Procedures
 - Test results
 - Immunizations

Density was 0.000211

Problems:
 - Auto-suggestion for typing
    Each keystroke generates web flows
      (253 ->, 581 <-, x <-)
      where x is size of suggestion list

    Density of first character: 0.11
    Density after initial 'a': 0.064

 - Selection from structured dialog/menu
     2670 conditions, density = 0.0046
     
------------------------------------------
        Does anyone play Wordle?
        Would clicking on a suggestion also reduce ambiguity?
        Would selecting from a hierarchical menu benefit the attacker?
        Would information from "find a doctor" reveal a condition?
   b. Tax Form App
------------------------------------------
         TAX FORM APP

Clear workflow
   starting with personal information
   
------------------------------------------
        How many filing statuses are there?
        Does family income (AGI) determine which forms to file?
   c. Other applications
------------------------------------------
          EXAMPLE: ONLINE INVESTING

Funds displayed as GIF images
    and each has a web page
    
Can infer:


------------------------------------------
        Would the GIF image size correlate to a particular fund?
        Can the size of an image be determined separately from HTML size?
        How could attacker determine allocation of funds from
           size of a pie chart?
   d. Web search engines (Google, Bing, etc.)
------------------------------------------
         WEB SEARCH ENGINES

Google, Bing, etc.

  - attack can reveal query history
  
Attacker can use:

  - Auto-suggestion sizes
  
------------------------------------------
        Would a company want its employees' search histories revealed?
        Does capitalization matter in a web query?
   e. Wi-Fi
------------------------------------------
            WPA2 STANDARD FOR WI-FI

Uses CCMP = 128-bit AES in counter mode
   counter mode means
     size of ciphertext = size of message

   So, size is


------------------------------------------
        Does lack of padding help attackers?
 E. mitigations
  1. application agnostic
------------------------------------------
 SAMPLE APPLICATION AGNOSTIC MITIGATIONS

SSH:
  - sends a packet every 50 msec

VoIP:
  - rounds up all packet sizes to 128 bytes
------------------------------------------
   a. padding
------------------------------------------
          PADDING PACKETS

Rounding:
 - Round up to nearest multiple of D bytes

Random padding:
 - Append padding of 0 to D bytes


Measurements for health app found:

  - D = 128 not enough
         (responses about 200 bytes)
         average overhead was ~ 14%
  - D = 512 hid information
         but average overhead was ~ 33%


For tax form app:
  - D = 1024 only allows attacker
          to distinguish 7 income ranges
          but about 25% overhead

Apps also need to:
   - merge states on longer paths
   - or add extra states on shorter paths
         
------------------------------------------
        Why do these tactics help?
        What is the average overhead of these tactics?
        So, is there one effective mitigation tactic for all apps?
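The two padding tactics and their cost can be sketched as follows. The response sizes are invented and are not the paper's measurements, so the overhead figures printed here will not match the percentages on the slide. Random padding is taken as 0 to D bytes inclusive, following the slide's wording.

```python
import random
from typing import List


def round_up(size: int, d: int) -> int:
    """Rounding: pad to the nearest multiple of d bytes at or above size."""
    return -(-size // d) * d            # ceiling division, then scale


def random_pad(size: int, d: int, rng: random.Random) -> int:
    """Random padding: append between 0 and d bytes (inclusive)."""
    return size + rng.randint(0, d)


def average_overhead(sizes: List[int], d: int) -> float:
    """Extra bytes sent under rounding, as a fraction of the original bytes."""
    padded = [round_up(s, d) for s in sizes]
    return (sum(padded) - sum(sizes)) / sum(sizes)


# Hypothetical response sizes, in bytes.
sizes = [120, 200, 260, 390]

for d in (128, 512):
    rounded = [round_up(s, d) for s in sizes]
    distinct = len(set(rounded))
    print(f"D={d}: rounded={rounded}, distinct sizes={distinct}, "
          f"overhead={average_overhead(sizes, d):.0%}")

rng = random.Random(0)                  # fixed seed so runs are repeatable
print([random_pad(s, 128, rng) for s in sizes])
```

With these sample sizes, D = 128 still leaves every response distinguishable, while D = 512 makes them identical at a much higher cost in bandwidth.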
   b. problems with mashups
------------------------------------------
        MASHUP PROBLEMS

Online investment app
   fetches charts from financial data app
      which makes charts public

Will the financial data app pad data?


------------------------------------------
   c. Prospects for generality
------------------------------------------
      IS THERE A GENERAL MITIGATION?

Is there a general mitigation?


------------------------------------------
        What does that mean for developers?
   d. App-specific mitigation
------------------------------------------
        CHALLENGES FOR DEVELOPERS

- Finding side-channel vulnerabilities
   look for:
    - stateful communication
    - communication based on user inputs
    - correlations between
        inputs and outputs

- Specifying mitigation policies

- Building policy enforcement mechanisms
   - coordinating browsers and web servers

------------------------------------------
        What problems would there be in determining message sizes?
------------------------------------------
     RECOMMENDED DEVELOPMENT PRACTICE

1. specify privacy policies
2. track info. flows, including web flows
3. vulnerabilities related to policies?
    if yes:
    4. can they be solved by manipulating
       individual packets?
       if yes: add mitigations to packets
            (rounding or random padding)
       if no:
          5. change design of app features
          6. goto step 2
    if no:
       done!
------------------------------------------
        What kind of tool could help in step 3?