# Hardware Managed Scratchpad for Embedded Systems

Ben Rudzyn









# Approach

- SimpleScalar simulator, with syscall modification

| Component                 | Configuration                                                                                                         |
|---------------------------|-----------------------------------------------------------------------------------------------------------------------|
| CPU                       | 6-stage, statically scheduled, single instruction pipeline                                                            |
| Functional units          | 1 integer ALU, 1 integer multiplier, 1 integer divider                                                                |
| Functional unit latencies | All single cycle                                                                                                      |
| Instruction cache         | 2 KB (256 entries), direct mapped, 1-word block (64 bit), 1-cycle hit latency, 2-cycle miss latency (plus mem delay) |
| Instruction SPM           | 8 KB (1024 entries), 1-cycle hit latency                                                                              |
| Instruction main memory   | 8 Mb SDRAM (10 ns), simplified burst mode 10-1-1-1*, 4-word line size                                                 |
| Data cache                | 2 KB (512 entries), direct mapped, 1-word block (32 bit), 1-cycle hit latency, 2-cycle miss latency (plus mem delay) |
| Data main memory          | 8 Mb SDRAM (10 ns), simplified burst mode 10-1-1-1*, 4-word line size                                                 |

- Accuracy within 0.05% of hardware simulations for large programs (> 1 million cycles)
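
The cache sizes in the table follow from entries × block width when the sizes are read as kilobytes. The short sketch below checks that arithmetic and adds up a burst transfer. It is illustrative only: the function names are hypothetical, the 64-bit SPM entry width is an assumption (the table gives only the entry count), and "10-1-1-1" is read in the conventional way, as 10 cycles for the first word and 1 cycle for each following word.

```python
def capacity_bytes(entries: int, block_bits: int) -> int:
    """Storage capacity in bytes for a memory of `entries` blocks."""
    return entries * block_bits // 8


def burst_cycles(words: int, pattern=(10, 1, 1, 1)) -> int:
    """Cycles to transfer `words` words of one burst.

    Assumption: the pattern lists per-word delays in cycles, first word first.
    """
    if not 1 <= words <= len(pattern):
        raise ValueError("words must fit within one burst")
    return sum(pattern[:words])


ICACHE_BYTES = capacity_bytes(256, 64)   # instruction cache: 256 x 64-bit
DCACHE_BYTES = capacity_bytes(512, 32)   # data cache: 512 x 32-bit
SPM_BYTES = capacity_bytes(1024, 64)     # assumes 64-bit SPM entries

SINGLE_WORD_CYCLES = burst_cycles(1)     # one word from SDRAM
FULL_LINE_CYCLES = burst_cycles(4)       # a full 4-word line

print(ICACHE_BYTES, DCACHE_BYTES, SPM_BYTES)
print(SINGLE_WORD_CYCLES, FULL_LINE_CYCLES)
```

Under these assumptions both caches come to 2048 bytes and the SPM to 8192 bytes, and a full 4-word burst costs 13 cycles against 10 for a single word.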







## Problems

- Overhead
  - SPM fill vs cache miss
  - SPM fill vs cache hit (common case)
  - SPM hit vs cache hit (optimised case)
- Loop size
  - Three extra instructions (*smi, jump, jump*)
  - Number of iterations
- Loop structure
  - *if* statements
  - Nested loops
- Optimised case
  - Saves on SPM fill overhead
  - Still 2 instructions
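
The interaction between loop size and iteration count can be sketched with a rough instruction-count model. This is an illustration, not the measured behaviour: it assumes the extra instructions (three in the normal case, two in the optimised case) execute once per loop entry, and it ignores the SPM fill cycles entirely. The function name and the example loop body of 10 instructions are hypothetical.

```python
def extra_instruction_fraction(body_instrs: int, iterations: int,
                               extra: int = 3) -> float:
    """Fraction of executed instructions that are SPM bookkeeping.

    Assumption: the `extra` instructions run once per loop entry,
    while the loop body runs `iterations` times.
    """
    useful = body_instrs * iterations
    return extra / (useful + extra)


# Hypothetical 10-instruction loop body at several iteration counts.
for iterations in (1, 10, 100, 1000):
    normal = extra_instruction_fraction(10, iterations, extra=3)
    optimised = extra_instruction_fraction(10, iterations, extra=2)
    print(f"{iterations:5d} iterations: "
          f"normal {normal:.2%}, optimised {optimised:.2%}")
```

In this model the bookkeeping share shrinks as the body or the iteration count grows, which is why small loops with few iterations are the hard case.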

## Limitations

- Library functions
  - Cannot currently be copied to the SPM
  - Account for 30% of execution time (adpcm)
- Hand maintenance
  - Inserting instructions
  - Updating the Controller
  - Updating *jump* addresses
- Size of SPM
  - Optimised case only



## Future Work

- Software compiler
  - Automatically insert *smi* instructions $(13 \rightarrow 70)$
  - Automatically update the Controller
  - Automatically update *jump* addresses
- Better procedure to locate blocks and loops of interest
- Optimisation mark II
  - Modify the optimised *smi* placement scheme
  - Use an extra SPM register
  - Allows more than 1024 entries to be stored in the SPM

## Conclusion

- Not overly promising so far
- Potential room for improvement through automation
- Next step:
  - Calculate energy consumption
  - Energy profile of the hardware model