I. Side-channel attacks in Web Applications

------------------------------------------
SIDE CHANNEL ATTACKS ON WEB APPS

Side-channel attacks on encrypted traffic have targeted, for example:
- ssh
- Voice over IP
- Video streaming
- Tor
- software-as-a-service apps, e.g., Salesforce

Based on:
S. Chen, R. Wang, X. Wang and K. Zhang, "Side-Channel Leaks in Web
Applications: A Reality Today, a Challenge Tomorrow," in IEEE Symp. on
Security and Privacy, 2010, pp. 191-206, doi: 10.1109/SP.2010.20.
------------------------------------------

A. attack model

------------------------------------------
ATTACK MODEL FOR WEB APPS

Assume encryption is used, so attacker cannot see message contents

Attacker can see for each message:
- Number sent
- Timing and direction
- Size
------------------------------------------

Is there a military analogy here?
What could an army infer about an enemy from radio traffic?

Could an attacker identify what app is running? How?

B. vulnerabilities

------------------------------------------
DIFFERENCE FROM APP ON SINGLE COMPUTER

Internal communications are

Effects: leaks of
- personal health data
- family income
- investment details
- search queries
despite use of HTTPS and WPA2 encryption

Causes:
- stateful communication
- low entropy input
- significant traffic distinctions
------------------------------------------

What's the analogy of traffic analysis for a web app?

------------------------------------------
WEB-BASED PRIVACY VULNERABILITIES

Attackers can fingerprint web pages:
- resource objects of diff. sizes

How?

Web flows split between client & server:
- input points
- program logic
- program states
------------------------------------------

C. mitigations

------------------------------------------
MITIGATIONS ARE APP-SPECIFIC

Mitigations differ for each app:

Revise:
- feature designs,
- traffic characteristics,
- publicly available domain knowledge
to

Need to protect app state transitions
------------------------------------------

D. model of attack, measurement

1. ambiguity reduction

------------------------------------------
ATTACKER'S GOAL: REDUCE AMBIGUITY

Ambiguity set of data:

Measuring loss of ambiguity

If ambiguity set is reduced to 1/R of its size, then
------------------------------------------

2. Web app model

------------------------------------------
WEB APP MODEL

A quintuple (S, Sigma, delta, f, V), where:
- S = set of program states
- Sigma = set of inputs accepted
- delta = state transition function
    delta : S x Sigma -> S
- f = output function
    f : S x Sigma -> V
- V = set of visible outputs, e.g., packet sizes

Notation:
- 50 ->     Browser sends 50 bytes
- 1024 <-   Server sends 1024 bytes
------------------------------------------

3. What attacker must do

------------------------------------------
WHAT ATTACKER TRIES TO DO

From unknown state s in S:
observe N outputs (v1, v2, ..., vN)
determine inputs (sigma1, sigma2, ..., sigmaN)
or
------------------------------------------

4. How web app designs help attacker

What can lead to a large(r) reduction in ambiguity?

------------------------------------------
FACTORS THAT HELP ATTACKER

------------------------------------------

What kinds of inputs typically lead immediately to a response?

How are reduction factors combined?

5. Density measures difficulty/ease of attack

------------------------------------------
DENSITY

def: Let P be a set of packet sizes. Then
    density(P) = #P / [max(P) - min(P)]
where
    #P is cardinality of P
    max(P) is maximum size in P
    min(P) is minimum size in P

"A density below 1.0 often indicates packets that are easy to distinguish"
    - Chen et al., 2010, p. 195
------------------------------------------

Why would a density < 1.0 help the attacker?

6. Examples

a. Health App

------------------------------------------
HEALTH APP EXAMPLE

Tabbed data entry with tabs for:
- Conditions
- Medications
- Procedures
- Test results
- Immunizations

Density was 0.000211

Problems:
- Auto-suggestion for typing
    Each keystroke generates web flows
    (253 ->, 581 <-, x <-)
    where x is size of suggestion list
    Density of first character: 0.11
    Density after initial 'a': 0.064
- Selection from structured dialog/menu
    2670 conditions, density = 0.0046
------------------------------------------

Does anyone play Wordle?

Would clicking on a suggestion also reduce ambiguity?

Would selecting from a hierarchical menu benefit the attacker?

Would information from "find a doctor" reveal a condition?

b. Tax Form App

------------------------------------------
TAX FORM APP

Clear workflow starting with personal information
------------------------------------------

How many filing statuses are there?

Does family income (AGI) determine which forms to file?

c. Other applications

------------------------------------------
EXAMPLE: ONLINE INVESTING

Funds displayed as GIF images and each has a web page

Can infer:
------------------------------------------

Would the GIF image size correlate to a particular fund?

Can the size of an image be determined separately from HTML size?

How could an attacker determine allocation of funds from size of a pie chart?

d. Web search engines (Google, Bing, etc.)

------------------------------------------
WEB SEARCH ENGINES

Google, Bing, etc.
- attack can reveal query history

Attacker can use:
- Auto-suggestion sizes
------------------------------------------

Would a company want its employees' search histories revealed?

Does capitalization matter in a web query?

e. Wi-Fi

------------------------------------------
WPA2 STANDARD FOR WI-FI

Uses CCMP = 128-bit AES in counter mode

counter mode means size of ciphertext = size of message
(plus a fixed CCMP header and MIC per frame)

So, size is
------------------------------------------

Does lack of padding help attackers?

E. mitigations

1. application agnostic

------------------------------------------
SAMPLE APPLICATION AGNOSTIC MITIGATIONS

Example, SSH:
- sends a packet every 50 msec

VoIP:
- round up all packet sizes to 128 bytes
------------------------------------------

a. padding

------------------------------------------
PADDING PACKETS

Rounding:
- Round up to nearest multiple of D bytes

Random padding:
- Append padding of 0 to D bytes

Measurements for health app found:
- D = 128 not enough (responses about 200 bytes)
    average overhead was ~ 14%
- D = 512 hid information
    but average overhead was ~ 33%

For Income Tax App:
- D = 1024 only allows attacker to distinguish 7 income ranges
    but about 25% overhead

Apps also need to:
- merge states on longer paths
- or add extra states on shorter paths
------------------------------------------

Why do these tactics help?

What is the average overhead of these tactics?

So, is there one effective mitigation tactic for all apps?

b. problems with mashups

------------------------------------------
MASHUP PROBLEMS

Online investment app fetches charts from a financial data app,
which makes the charts public

Will the financial data app pad data?
------------------------------------------

c. Prospects for generality

------------------------------------------
IS THERE A GENERAL MITIGATION?

------------------------------------------

What does that mean for developers?

d. App-specific mitigation

------------------------------------------
CHALLENGES FOR DEVELOPERS

- Finding side-channel vulnerabilities
  look for:
    - stateful communication
    - communication based on user inputs
    - correlations between inputs and outputs
- Specifying mitigation policies
- Building policy enforcement mechanisms
    - coordinating browsers and web servers
------------------------------------------

What problems would there be in determining message sizes?

------------------------------------------
RECOMMENDED DEVELOPMENT PRACTICE

1. specify privacy policies
2. track information flows, including web flows
3. vulnerabilities related to policies?
   if yes:
     4. can they be solved by manipulating individual packets?
        if yes: add mitigations to packets
                (rounding or random padding)
        if no:
          5. change design of app features
          6. goto step 2
   if no: done!
------------------------------------------

What kind of tool could help in step 3?
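Section D.1 measures leakage as a shrinking ambiguity set. A minimal sketch, assuming every candidate in the set is equally likely to the attacker (an assumption not stated in the notes), expresses each shrink as a ratio R = |before| / |after| and converts it to bits with log2. The 2670 figure comes from the health app example; the intermediate set sizes 30 and 3 are invented for illustration.

```python
import math

def reduction_factor(before: int, after: int) -> float:
    """Ratio of ambiguity-set sizes before and after an observation."""
    if not 0 < after <= before:
        raise ValueError("need 0 < after <= before")
    return before / after

def bits_leaked(before: int, after: int) -> float:
    """Bits gained by the attacker, assuming all candidates are equally likely."""
    return math.log2(reduction_factor(before, after))

# Hypothetical narrowing: 2670 candidate conditions -> 30 -> 3.
step1 = reduction_factor(2670, 30)
step2 = reduction_factor(30, 3)
overall = reduction_factor(2670, 3)
print(step1, step2, overall)  # 89.0 10.0 890.0
print(round(bits_leaked(2670, 3), 2))
```

Because each step divides the set that the previous step left behind, the step ratios telescope into the overall ratio in this example.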
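The quintuple model in D.2 and the attacker's task in D.3 can be sketched as a tiny finite-state machine. Everything here is hypothetical: the state names, the two inputs "a" and "b", and the byte counts are invented, and the attacker is assumed to know the starting state (a simplification, since the notes allow it to be unknown). `consistent_inputs` brute-forces every input sequence and keeps those whose visible outputs match what was observed; that list is the attacker's remaining ambiguity set.

```python
from itertools import product

# Hypothetical toy app. delta : S x Sigma -> S (state transitions)
delta = {
    ("start", "a"): "after_a", ("start", "b"): "after_b",
    ("after_a", "a"): "done", ("after_a", "b"): "done",
    ("after_b", "a"): "done", ("after_b", "b"): "done",
}
# f : S x Sigma -> V, where V = response size in bytes (invented values)
f = {
    ("start", "a"): 581, ("start", "b"): 602,
    ("after_a", "a"): 320, ("after_a", "b"): 344,
    ("after_b", "a"): 320, ("after_b", "b"): 320,
}
SIGMA = ["a", "b"]

def run(state, inputs):
    """Feed inputs from state and return the visible outputs."""
    outputs = []
    for sigma in inputs:
        outputs.append(f[(state, sigma)])
        state = delta[(state, sigma)]
    return outputs

def consistent_inputs(start, observed):
    """All input sequences from start whose outputs equal the observed ones."""
    return [seq for seq in product(SIGMA, repeat=len(observed))
            if run(start, seq) == list(observed)]

print(consistent_inputs("start", [581, 344]))  # [('a', 'b')]
print(consistent_inputs("start", [602, 320]))  # [('b', 'a'), ('b', 'b')]
```

The second observation leaves two candidates because f maps both ("after_b", "a") and ("after_b", "b") to 320 bytes; colliding outputs are what keep ambiguity high.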
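The density definition in D.5 translates directly into a function. This sketch treats P as a set, so repeated sizes count once (matching "#P is cardinality of P"), and it raises an error when all sizes are equal, because max(P) - min(P) would then be zero; that error handling is a choice made here, not something from the notes. The packet sizes are invented.

```python
def density(sizes) -> float:
    """density(P) = #P / (max(P) - min(P)) for a set P of packet sizes."""
    p = set(sizes)  # duplicates collapse: #P is the cardinality of the set
    if len(p) < 2:
        raise ValueError("need at least two distinct sizes")
    return len(p) / (max(p) - min(p))

# Hypothetical suggestion-list sizes (bytes) for five different keystrokes.
sparse = [512, 640, 777, 903, 1010]
# Hypothetical sizes packed into consecutive byte counts.
dense = range(300, 400)
print(density(sparse), density(dense))
```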
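Section D.6.e notes that counter mode preserves message size. Counter mode turns a block cipher into a keystream that is XORed with the message, so the ciphertext has exactly as many bytes as the plaintext. The sketch below uses SHA-256 from Python's `hashlib` as a toy keystream generator in place of AES, only so it runs with the standard library; it is not CCMP and not secure. Real CCMP frames add a fixed 8-byte CCMP header and 8-byte MIC, a constant that does not hide the payload size.

```python
import hashlib

def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream from SHA-256(key || nonce || counter).
    Toy stand-in for AES: NOT secure and NOT CCMP."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def toy_ctr_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it a second time decrypts."""
    ks = toy_keystream(key, nonce, len(data))
    return bytes(b ^ k for b, k in zip(data, ks))

key = b"k" * 16  # hypothetical key
for i, msg in enumerate([b"q=a", b"q=ab", b"x" * 1000]):
    nonce = i.to_bytes(8, "big")  # a fresh nonce per message
    ct = toy_ctr_encrypt(key, nonce, msg)
    print(len(msg), len(ct))  # the two lengths are always equal
```

A one-character difference in the query ("q=a" vs "q=ab") shows up as a one-byte difference in ciphertext length.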
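The two padding schemes in E.1.a can be sketched as functions on a packet size. The response sizes are hypothetical, and overhead here is total added bytes divided by total original bytes, which may not match how the paper averaged its ~14%, ~33%, and ~25% figures.

```python
import random

def round_up(size: int, d: int) -> int:
    """Rounding: pad up to the nearest multiple of d bytes (no-op if aligned)."""
    return ((size + d - 1) // d) * d

def random_pad(size: int, d: int, rng: random.Random) -> int:
    """Random padding: append between 0 and d bytes, inclusive."""
    return size + rng.randint(0, d)

def overhead(original: list[int], padded: list[int]) -> float:
    """Extra bytes sent, as a fraction of the original bytes."""
    return (sum(padded) - sum(original)) / sum(original)

sizes = [181, 197, 203, 215, 640]  # hypothetical response sizes in bytes
for d in (128, 512):
    rounded = [round_up(s, d) for s in sizes]
    # Fewer distinct sizes after rounding means less for the attacker to see.
    print(d, sorted(set(rounded)), f"{overhead(sizes, rounded):.0%}")

rng = random.Random(0)  # seeded only so the demo is repeatable
print([random_pad(s, 128, rng) for s in sizes])
```

Rounding is deterministic, so identical responses still produce identical sizes; random padding changes the size on each send.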