Week 5 Fault Attacks and SoC Security

2026-03-01  |  Lecture, Cryptography, Fault Attacks, SoC Security

Download the lecture slides for this week here: COMP6420_2026T1_Week5_Fault_Attacks_and_SoC_security.pdf

COMP6420 Week 5 – Fault Attacks and SoC Security


Part A — Fault Attacks

1. Faults: when hardware doesn’t compute correctly

Circuits require sufficient supply voltage and sufficient time (a long enough clock period) to settle to correct values. If either condition is violated, internal state can become incorrect and computations can fail.

Faults can be injected in many ways, at very different price points. Some methods are expensive and require specialized equipment (example images show lab-scale laser fault injection setups). Others can be surprisingly low-cost, such as simple voltage glitch circuits or clock glitch injection logic.


2. Why faults help attackers

Fault attacks intentionally create pairs of outputs: a correct ciphertext C and a faulty ciphertext C* produced from the same plaintext.

Those paired results create constraints (effectively “simultaneous equations”) that can dramatically reduce the key search space for cryptography.

A toy analogy is used: if a delivery person breaks exactly one item in transit and a refund rule applies, the payment/refund information can reveal what was shipped. Small controlled perturbations leak hidden information.


3. Differential Fault Analysis (DFA) on AES

3.1 Round 10 byte fault: “local” effect

A convenient fault model is: one byte in the AES intermediate state is corrupted late in the encryption.

If the fault happens before the last-round S-box (round 10 in AES-128), the corrupted byte passes only through SubBytes and ShiftRows; round 10 has no MixColumns, so the error does not diffuse to other bytes.

Result: typically only one ciphertext byte differs, and that byte ties directly to a single last-round key byte.

For the affected byte position j, the correct and faulty ciphertext bytes satisfy

  C_j = S(x_j) ⊕ k_j    and    C*_j = S(x_j ⊕ e) ⊕ k_j,

where x_j is the fault-free intermediate byte, e is the injected fault, and k_j is the corresponding last-round key byte.

Key-byte guessing is done “backwards” through the inverse S-box: for each candidate k, compute InvS(C_j ⊕ k) ⊕ InvS(C*_j ⊕ k).

For the correct guess k = k_j, the XOR of those two inverse S-box values equals the fault e, since InvS(C_j ⊕ k_j) = x_j and InvS(C*_j ⊕ k_j) = x_j ⊕ e.

A single fault pair is usually insufficient because e is unknown, so a fault model (fixed fault, bit-flip, stuck-at, small fault set, etc.) is used to reject impossible key guesses. Repeated faulting with the same plaintext helps identify whether the fault is well-behaved (takes values from a small, stable set) or inconsistent (harder to exploit).
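A minimal Python sketch of this guessing loop (the helper names such as `round10_candidates` are illustrative, not from any particular tool; the S-box is generated from GF(2^8) arithmetic rather than hard-coded):

```python
def _rotl8(b, n):
    """Rotate an 8-bit value left by n."""
    return ((b << n) | (b >> (8 - n))) & 0xFF

def make_sbox():
    """Generate the AES S-box: GF(2^8) inverse followed by the affine map.

    Walks p = 3^i and q = 3^(-i) in parallel, so q is always p's inverse.
    """
    sbox = [0] * 256
    p = q = 1
    while True:
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF  # p *= 3
        q = (q ^ (q << 1)) & 0xFF                               # q /= 3
        q = (q ^ (q << 2)) & 0xFF
        q = (q ^ (q << 4)) & 0xFF
        if q & 0x80:
            q ^= 0x09
        sbox[p] = q ^ _rotl8(q, 1) ^ _rotl8(q, 2) ^ _rotl8(q, 3) ^ _rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63  # 0 has no inverse; affine map of 0 is 0x63
    return sbox

SBOX = make_sbox()
INV_SBOX = [0] * 256
for i, v in enumerate(SBOX):
    INV_SBOX[v] = i

def round10_candidates(c, c_star, fault_set):
    """Last-round key-byte guesses consistent with one correct/faulty
    ciphertext byte pair under the assumed set of possible faults e."""
    return [k for k in range(256)
            if INV_SBOX[c ^ k] ^ INV_SBOX[c_star ^ k] in fault_set]
```

Each additional fault pair at the same byte position yields another candidate list; intersecting the lists narrows the guess toward a single key byte.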

3.2 Round 9 fault: diffusion creates strong structure

Faulting earlier (round 9) is “harder” in the sense that the disturbance spreads, but it also creates a highly restrictive pattern:

MixColumns expands a byte error e to one of a few structured 4-byte vectors like (2e, e, e, 3e) (or rotations), depending on which column is hit.
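Because MixColumns is linear, an XOR difference passes through it unchanged, so the spreading can be computed directly. A sketch (`mixcol_fault_pattern` is an illustrative helper; `row` is the row of the faulted byte within its column):

```python
def xtime(b):
    """Multiply by x (i.e. by 2) in GF(2^8) with the AES polynomial."""
    return ((b << 1) ^ (0x1B if b & 0x80 else 0)) & 0xFF

def gmul(a, b):
    """Multiply b by a, for the MixColumns coefficients a in {1, 2, 3}."""
    return {1: b, 2: xtime(b), 3: xtime(b) ^ b}[a]

# The MixColumns matrix over GF(2^8)
MIX = [[2, 3, 1, 1],
       [1, 2, 3, 1],
       [1, 1, 2, 3],
       [3, 1, 1, 2]]

def mixcol_fault_pattern(e, row):
    """4-byte MixColumns output difference for a single-byte fault e.

    The input difference column is (0, ..., e, ..., 0), so the output
    difference is just the MixColumns matrix applied to that column.
    """
    col = [0, 0, 0, 0]
    col[row] = e
    return [gmul(MIX[r][0], col[0]) ^ gmul(MIX[r][1], col[1])
            ^ gmul(MIX[r][2], col[2]) ^ gmul(MIX[r][3], col[3])
            for r in range(4)]
```

A fault in row 0 yields (2e, e, e, 3e); the other rows give rotations of the same pattern.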

This structure allows very fast narrowing: for a candidate fault value e, only key guesses that reproduce one of the structured 4-byte difference patterns survive, which collapses the 2^32 possibilities for a column of the last-round key to a small set.

Often ~2 good fault pairs are enough for a round-9 style attack.

A step-by-step recovery loop is outlined:

  1. Collect (PT, C, C*).
  2. Check whether differences match a single-column, 4-byte pattern.
  3. Map affected positions to a round-9 column.
  4. Guess e, derive expected per-byte differences.
  5. For each byte position, collect candidate key bytes satisfying the inverse-S-box difference equation.
  6. Combine candidates across the 4 bytes via a Cartesian product (typically far smaller than the naive 2^32).
  7. Repeat with another fault pair and intersect candidates.
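Steps 2–3 of the loop can be sketched as a simple pattern check, assuming the usual column-major AES state layout (byte i sits at row i % 4, column i // 4); `faulted_column` is an illustrative name:

```python
def faulted_column(c, c_star):
    """Return the round-9 column consistent with where a correct and a
    faulty 16-byte ciphertext differ, or None if no column matches.

    A fault in round-9 column `col` spreads, via round-10 ShiftRows,
    to the ciphertext positions (row r, column (col - r) mod 4).
    """
    diff = {i for i in range(16) if c[i] != c_star[i]}
    for col in range(4):
        expected = {4 * ((col - r) % 4) + r for r in range(4)}
        if diff and diff <= expected:
            return col
    return None
```

If the difference pattern fits no column, the fault probably hit an earlier round (or more than one byte) and the pair is discarded.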

3.3 Round 8 and “single-fault” ambitions

Faulting earlier (e.g., round 8) can spread to many bytes (even all 16) and, with the right model and some brute force, can still recover a full 128-bit key from a small number of outputs.

However, single-fault DFA is fragile: it depends entirely on the assumed fault model holding, and a single mischaracterized faulty output can eliminate the correct key from the search.

Pragmatically: if one fault is possible, more faults are often possible, and collecting more faulty outputs generally makes recovery easier.

A practical note: DFA is so well studied that tools exist (e.g., a Python library that can solve round 8/9 AES faults automatically, but not round 10).


4. Lab context: Fault Attack (Lab 3)

Lab 3 involves three AES FPGA bitstreams, one each for round 8, round 9, and round 10 faults. Interaction is over SPI, and a button press triggers a faulty encryption, allowing collection of (PT, C, C*) triples. Round 8/9 faults can be solved with an existing DFA tool, while round 10 requires custom reasoning and implementation. The lab is released by Friday and due in Week 7.


5. Defenses against fault attacks

Fault defenses aim to either detect faults (and abort/zeroize) or tolerate them (and still compute correctly).

Two main families are highlighted: redundancy (5.1) and concurrent error detection with parity (5.2).

5.1 Redundancy

Compute twice and compare results (spatial redundancy) or repeat computations (temporal redundancy). Very effective but can have large area/performance overhead.
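A minimal sketch of temporal redundancy (the wrapper name is illustrative):

```python
def redundant_encrypt(encrypt, plaintext):
    """Temporal redundancy: run the operation twice and compare.

    A transient fault that hits only one of the two runs makes the
    results disagree, so a faulty ciphertext is never released.
    """
    c1 = encrypt(plaintext)
    c2 = encrypt(plaintext)
    if c1 != c2:
        raise RuntimeError("fault detected: results disagree, output suppressed")
    return c1
```

Spatial redundancy is the same comparison with two physical copies of the circuit instead of two runs. Either way, an attacker who can reproduce the identical fault in both computations still defeats the check, which motivates cheaper and more diverse checks like parity.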

5.2 Concurrent Error Detection (CED) with parity

Parity-based checking can detect many fault types with modest overhead. Several AES operations have predictable parity behaviour: ShiftRows only permutes bytes, so parities move with them; AddRoundKey is a bytewise XOR, so output parities are the XOR of the input parities; SubBytes and MixColumns parities can be predicted with small auxiliary tables and logic.

An FPGA implementation of parity-based AES CED is described as having <10% area/performance overhead.
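The cheap parity predictions for the linear steps can be sketched directly (helper names are illustrative; a real design would also carry a predicted parity through SubBytes via an augmented S-box table, omitted here):

```python
def parity(b):
    """Even/odd parity of a byte: 1 if the number of set bits is odd."""
    return bin(b & 0xFF).count("1") & 1

def predicted_addroundkey_parity(state_parities, key_parities):
    """AddRoundKey is a bytewise XOR, so each output parity is just the
    XOR of the input parities -- no recomputation from the data path."""
    return [s ^ k for s, k in zip(state_parities, key_parities)]

def check_addroundkey(state, key):
    """Compare the cheap parity prediction against the actual result;
    a mismatch signals a fault in the AddRoundKey data path."""
    actual = [parity(s ^ k) for s, k in zip(state, key)]
    predicted = predicted_addroundkey_parity(
        [parity(s) for s in state], [parity(k) for k in key])
    return actual == predicted
```

The checker only carries one extra bit per byte, which is where the low area/performance overhead comes from.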


Part B — SoC Security

6. Security assurance and security objectives

Hardware often anchors the root of trust for confidentiality, integrity, and availability. Security assurance is the process of verifying that a design meets its security objectives—even after manufacturing and distribution.

Security objectives are market- and threat-model-dependent: a smartcard, a phone SoC, and a server part face different attackers with different capabilities, so each design states its own concrete objectives (the slides walk through an example).


7. Functional validation vs security validation

Functional verification checks “does it meet spec?” against test plans, use cases, and corner cases.

Security validation adds an adversarial perspective: probing behaviour outside the spec, with inputs and access patterns no honest user would produce.

A useful distinction: functional validation asks whether the design does everything it should; security validation asks whether it also does anything it shouldn’t.


8. SoC development lifecycle and where security fits

A typical SoC lifecycle:

  1. Architecture definition (hardware architecture specification, HAS)
  2. Microarchitecture design (microarchitecture specification, MAS)
  3. Pre-silicon verification (simulation/emulation/formal)
  4. Post-silicon verification (debugging manufactured hardware)
  5. Field analysis (post-mortems and user reports)

Security-focused lifecycle work is increasingly formalized (example reference to a USENIX Security 2019 work on security development lifecycles).


9. Common SoC security features and common pitfalls

Common hardware security features for SoCs include mechanisms such as secure boot, protected key storage, debug access control, and isolation between trusted and untrusted execution.

Common pitfalls map closely to those features: each mechanism becomes an attack surface when it is misconfigured, incompletely specified, or left enabled in production.

These pitfalls align naturally with many of the earlier attacks in the course.


10. Taxonomies and risk classification

Security-relevant hardware design faults can be categorized using Hardware CWE (Common Weakness Enumeration). Individual real-world incidents map to CVEs (Common Vulnerabilities and Exposures).

Severity can be scored using CVSS (Common Vulnerability Scoring System).


11. Third-party IP and hardware Trojans (SoC context)

SoC integration often relies on many components that may be third-party. This creates Trojan risk at the SoC/IP-block level, similar in spirit to PCB supply-chain risk.

A practical reminder is given via an “actually inserted Trojans” dataset (Hack@DAC) demonstrating that many distinct Trojan designs can be intentionally introduced.