Week 5 Fault Attacks and SoC Security

2026-03-01  |  Lecture, Cryptography, Fault Attacks, SoC Security

Download the lecture slides for this week here: COMP6420_2026T1_Week5_Fault_Attacks_and_SoC_security.pdf

COMP6420 Week 5 – Fault Attacks and SoC Security


Part A — Fault Attacks

1. Faults: when hardware doesn’t compute correctly

Circuits require sufficient supply voltage and sufficient time (a long enough clock period) to settle to correct values. If either condition is violated, internal state can become incorrect and computations can fail.

Faults can be injected in many ways, at very different price points. Some methods are expensive and require specialized equipment (example images show lab-scale laser fault injection setups). Others can be surprisingly low-cost, such as simple voltage glitch circuits or clock glitch injection logic.


2. Why faults help attackers

Fault attacks intentionally create pairs of outputs: a correct ciphertext C and a faulty ciphertext C* produced from the same plaintext.

Those paired results create constraints (effectively “simultaneous equations”) that can dramatically reduce the key search space for cryptography.

A toy analogy is used: if a delivery person breaks exactly one item in transit and a refund rule applies, the payment/refund information can reveal what was shipped. Small controlled perturbations leak hidden information.


3. Differential Fault Analysis (DFA) on AES

3.1 Round 10 byte fault: “local” effect

A convenient fault model is: one byte in the AES intermediate state is corrupted late in the encryption.

If the fault happens before the last-round S-box (round 10 in AES-128), the corrupted byte passes only through SubBytes and ShiftRows; round 10 has no MixColumns, so the error does not diffuse to other bytes.

Result: typically only one ciphertext byte differs, and that byte ties directly to a single last-round key byte.

For the affected byte position j, the correct and faulty ciphertext bytes satisfy

  C_j = S(x_j) ⊕ k_j    and    C*_j = S(x_j ⊕ e) ⊕ k_j,

where x_j is the fault-free intermediate byte, e is the injected fault, and k_j is the corresponding last-round key byte.

Key-byte guessing is done “backwards” through the inverse S-box: for each candidate k, compute InvS(C_j ⊕ k) ⊕ InvS(C*_j ⊕ k).

For the correct guess k = k_j, the XOR of those two inverse S-box values equals the fault e, since InvS(C_j ⊕ k_j) = x_j and InvS(C*_j ⊕ k_j) = x_j ⊕ e.

A single fault pair is usually insufficient because e is unknown, so a fault model (fixed fault, bit-flip, stuck-at, small fault set, etc.) is used to reject impossible key guesses. Repeated faulting with the same plaintext helps identify whether the fault is well-behaved (takes values from a small, stable set) or inconsistent (harder to exploit).
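A minimal Python sketch of this guessing loop (the helper names such as `round10_candidates` are illustrative, not from any particular tool; the S-box is generated from GF(2^8) arithmetic rather than hard-coded):

```python
def _rotl8(b, n):
    """Rotate an 8-bit value left by n."""
    return ((b << n) | (b >> (8 - n))) & 0xFF

def make_sbox():
    """Generate the AES S-box: GF(2^8) inverse followed by the affine map.

    Walks p = 3^i and q = 3^(-i) in parallel, so q is always p's inverse.
    """
    sbox = [0] * 256
    p = q = 1
    while True:
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF  # p *= 3
        q = (q ^ (q << 1)) & 0xFF                               # q /= 3
        q = (q ^ (q << 2)) & 0xFF
        q = (q ^ (q << 4)) & 0xFF
        if q & 0x80:
            q ^= 0x09
        sbox[p] = q ^ _rotl8(q, 1) ^ _rotl8(q, 2) ^ _rotl8(q, 3) ^ _rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63  # 0 has no inverse; affine map of 0 is 0x63
    return sbox

SBOX = make_sbox()
INV_SBOX = [0] * 256
for i, v in enumerate(SBOX):
    INV_SBOX[v] = i

def round10_candidates(c, c_star, fault_set):
    """Last-round key-byte guesses consistent with one correct/faulty
    ciphertext byte pair under the assumed set of possible faults e."""
    return [k for k in range(256)
            if INV_SBOX[c ^ k] ^ INV_SBOX[c_star ^ k] in fault_set]
```

Each additional fault pair at the same byte position yields another candidate list; intersecting the lists narrows the guess toward a single key byte.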

3.2 Round 9 fault: diffusion creates strong structure

Faulting earlier (round 9) is “harder” in the sense that the disturbance spreads, but it also creates a highly restrictive pattern:

MixColumns expands a byte error e to one of a few structured 4-byte vectors like (2e, e, e, 3e) (or rotations), depending on which column is hit.
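Because MixColumns is linear, an XOR difference passes through it unchanged, so the spreading can be computed directly. A sketch (`mixcol_fault_pattern` is an illustrative helper; `row` is the row of the faulted byte within its column):

```python
def xtime(b):
    """Multiply by x (i.e. by 2) in GF(2^8) with the AES polynomial."""
    return ((b << 1) ^ (0x1B if b & 0x80 else 0)) & 0xFF

def gmul(a, b):
    """Multiply b by a, for the MixColumns coefficients a in {1, 2, 3}."""
    return {1: b, 2: xtime(b), 3: xtime(b) ^ b}[a]

# The MixColumns matrix over GF(2^8)
MIX = [[2, 3, 1, 1],
       [1, 2, 3, 1],
       [1, 1, 2, 3],
       [3, 1, 1, 2]]

def mixcol_fault_pattern(e, row):
    """4-byte MixColumns output difference for a single-byte fault e.

    The input difference column is (0, ..., e, ..., 0), so the output
    difference is just the MixColumns matrix applied to that column.
    """
    col = [0, 0, 0, 0]
    col[row] = e
    return [gmul(MIX[r][0], col[0]) ^ gmul(MIX[r][1], col[1])
            ^ gmul(MIX[r][2], col[2]) ^ gmul(MIX[r][3], col[3])
            for r in range(4)]
```

A fault in row 0 yields (2e, e, e, 3e); the other rows give rotations of the same pattern.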

This structure allows very fast narrowing: for a candidate fault value e, only key guesses that reproduce one of the structured 4-byte difference patterns survive, which collapses the 2^32 possibilities for a column of the last-round key to a small set.

Often ~2 good fault pairs are enough for a round-9 style attack.

A step-by-step recovery loop is outlined:

  1. Collect (PT, C, C*).
  2. Check whether differences match a single-column, 4-byte pattern.
  3. Map affected positions to a round-9 column.
  4. Guess e, derive expected per-byte differences.
  5. For each byte position, collect candidate key bytes satisfying the inverse-S-box difference equation.
  6. Combine candidates across the 4 bytes via a Cartesian product (typically far smaller than the naive 2^32).
  7. Repeat with another fault pair and intersect candidates.
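Steps 2–3 of the loop can be sketched as a simple pattern check, assuming the usual column-major AES state layout (byte i sits at row i % 4, column i // 4); `faulted_column` is an illustrative name:

```python
def faulted_column(c, c_star):
    """Return the round-9 column consistent with where a correct and a
    faulty 16-byte ciphertext differ, or None if no column matches.

    A fault in round-9 column `col` spreads, via round-10 ShiftRows,
    to the ciphertext positions (row r, column (col - r) mod 4).
    """
    diff = {i for i in range(16) if c[i] != c_star[i]}
    for col in range(4):
        expected = {4 * ((col - r) % 4) + r for r in range(4)}
        if diff and diff <= expected:
            return col
    return None
```

If the difference pattern fits no column, the fault probably hit an earlier round (or more than one byte) and the pair is discarded.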

3.3 Round 8 and “single-fault” ambitions

Faulting earlier (e.g., round 8) can spread to many bytes (even all 16) and, with the right model and some brute force, can still recover a full 128-bit key from a small number of outputs.

However, single-fault DFA is fragile: it depends entirely on the assumed fault model holding, and a single mischaracterized faulty output can eliminate the correct key from the search.

Pragmatically: if one fault is possible, more faults are often possible, and collecting more faulty outputs generally makes recovery easier.

A practical note: DFA is so well studied that tools exist (e.g., a Python library that can solve round 8/9 AES faults automatically, but not round 10).


4. Lab context: Fault Attack (Lab 3)

Lab 3 involves three AES FPGA bitstreams, one each for round 8, round 9, and round 10 faults. Interaction is over SPI, and a button press triggers a faulty encryption, allowing collection of (PT, C, C*) triples. Round 8/9 faults can be solved with an existing DFA tool, while round 10 requires custom reasoning and implementation. The lab is released by Friday and due in Week 7.


5. Defenses against fault attacks

Fault defenses aim to either detect faults (and abort/zeroize) or tolerate them (and still compute correctly).

Two main families are highlighted: redundancy (5.1) and concurrent error detection with parity (5.2).

5.1 Redundancy

Compute twice and compare results (spatial redundancy) or repeat computations (temporal redundancy). Very effective but can have large area/performance overhead.
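A minimal sketch of temporal redundancy (the wrapper name is illustrative):

```python
def redundant_encrypt(encrypt, plaintext):
    """Temporal redundancy: run the operation twice and compare.

    A transient fault that hits only one of the two runs makes the
    results disagree, so a faulty ciphertext is never released.
    """
    c1 = encrypt(plaintext)
    c2 = encrypt(plaintext)
    if c1 != c2:
        raise RuntimeError("fault detected: results disagree, output suppressed")
    return c1
```

Spatial redundancy is the same comparison with two physical copies of the circuit instead of two runs. Either way, an attacker who can reproduce the identical fault in both computations still defeats the check, which motivates cheaper and more diverse checks like parity.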

5.2 Concurrent Error Detection (CED) with parity

Parity-based checking can detect many fault types with modest overhead. Several AES operations have predictable parity behaviour: ShiftRows only permutes bytes, so parities move with them; AddRoundKey is a bytewise XOR, so output parities are the XOR of the input parities; SubBytes and MixColumns parities can be predicted with small auxiliary tables and logic.

An FPGA implementation of parity-based AES CED is described as having <10% area/performance overhead.
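The cheap parity predictions for the linear steps can be sketched directly (helper names are illustrative; a real design would also carry a predicted parity through SubBytes via an augmented S-box table, omitted here):

```python
def parity(b):
    """Even/odd parity of a byte: 1 if the number of set bits is odd."""
    return bin(b & 0xFF).count("1") & 1

def predicted_addroundkey_parity(state_parities, key_parities):
    """AddRoundKey is a bytewise XOR, so each output parity is just the
    XOR of the input parities -- no recomputation from the data path."""
    return [s ^ k for s, k in zip(state_parities, key_parities)]

def check_addroundkey(state, key):
    """Compare the cheap parity prediction against the actual result;
    a mismatch signals a fault in the AddRoundKey data path."""
    actual = [parity(s ^ k) for s, k in zip(state, key)]
    predicted = predicted_addroundkey_parity(
        [parity(s) for s in state], [parity(k) for k in key])
    return actual == predicted
```

The checker only carries one extra bit per byte, which is where the low area/performance overhead comes from.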


Part B — SoC Security

6. Security assurance and security objectives

Hardware often anchors the root of trust for confidentiality, integrity, and availability. Security assurance is the process of verifying that a design meets its security objectives—even after manufacturing and distribution.

Security objectives are market- and threat-model-dependent: a smartcard, a phone SoC, and a server part face different attackers with different capabilities, so each design states its own concrete objectives (the slides walk through an example).


7. Functional validation vs security validation

Functional verification checks “does it meet spec?” against test plans, use cases, and corner cases.

Security validation adds an adversarial perspective: probing behaviour outside the spec, with inputs and access patterns no honest user would produce.

A useful distinction: functional validation asks whether the design does everything it should; security validation asks whether it also does anything it shouldn’t.


8. SoC development lifecycle and where security fits

A typical SoC lifecycle:

  1. Architecture definition (hardware architecture specification, HAS)
  2. Microarchitecture design (microarchitecture specification, MAS)
  3. Pre-silicon verification (simulation/emulation/formal)
  4. Post-silicon verification (debugging manufactured hardware)
  5. Field analysis (post-mortems and user reports)

Security-focused lifecycle work is increasingly formalized (example reference to a USENIX Security 2019 work on security development lifecycles).


9. Common SoC security features and common pitfalls

Common hardware security features for SoCs include mechanisms such as secure boot, protected key storage, debug access control, and isolation between trusted and untrusted execution.

Common pitfalls map closely to those features: each mechanism becomes an attack surface when it is misconfigured, incompletely specified, or left enabled in production.

These pitfalls align naturally with many of the earlier attacks in the course.


10. Taxonomies and risk classification

Security-relevant hardware design faults can be categorized using Hardware CWE (Common Weakness Enumeration). Individual real-world incidents map to CVEs (Common Vulnerabilities and Exposures).

Severity can be scored using CVSS (Common Vulnerability Scoring System).


11. Third-party IP and hardware Trojans (SoC context)

SoC integration often relies on many components that may be third-party. This creates Trojan risk at the SoC/IP-block level, similar in spirit to PCB supply-chain risk.

A practical reminder is given via an “actually inserted Trojans” dataset (Hack@DAC) demonstrating that many distinct Trojan designs can be intentionally introduced.