# Page Tables Revisited THE UNIVERSITY OF NEW SOUTH WALES

# **Learning Outcomes**

- An understanding of virtual linear array page tables, and their use on the MIPS R3000.
- Exposure to alternative page table structures beyond multi-level and inverted page tables.





**Two-level Translation**

[Figure: a virtual address split into 10 bits | 10 bits | 12 bits, translated by the paging mechanism to a location in main memory.]
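The 10 | 10 | 12 split can be illustrated with a short sketch. It assumes a 32-bit virtual address and 4 KiB pages, and it models each page table level as a Python dict. The names `split_virtual_address` and `translate` are hypothetical, chosen only for this illustration.

```python
OFFSET_BITS = 12   # assumed: 4 KiB pages
LEVEL_BITS = 10    # assumed: 10 index bits per page table level


def split_virtual_address(vaddr):
    """Return (level-1 index, level-2 index, page offset) for a 32-bit address."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    l2_index = (vaddr >> OFFSET_BITS) & ((1 << LEVEL_BITS) - 1)
    l1_index = (vaddr >> (OFFSET_BITS + LEVEL_BITS)) & ((1 << LEVEL_BITS) - 1)
    return l1_index, l2_index, offset


def translate(root, vaddr):
    """Walk a two-level table held as nested dicts: root[l1][l2] -> frame number."""
    l1_index, l2_index, offset = split_virtual_address(vaddr)
    second_level = root.get(l1_index)
    if second_level is None or l2_index not in second_level:
        raise LookupError(f"page fault at {vaddr:#010x}")
    return (second_level[l2_index] << OFFSET_BITS) | offset
```

Every translation in this scheme touches two tables, which is the cost the virtual linear array tries to avoid in the common case.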

**Virtual Linear Array page table**

- Uses a page table array indexed by page number.
- The page table array is in virtual memory, with only the used parts of the array allocated in physical memory.
- A second page table (the root node) has translations for the page table itself.

**Virtual Linear Array Operation**

- Index into the page table array without referring to the root PT.
  - Simply use the full page number as the PT index.
- Leave unused parts of the PT unmapped.
- If access is attempted to an unmapped part of the PT, a secondary page fault is triggered.
  - This will load the mapping for the PT from the root PT.
- The root PT is kept in physical memory (cannot trigger page faults).
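The operation described above can be sketched as a small simulation. This is a toy model under stated assumptions, not the R3000 mechanism itself: it assumes 1024 PTEs per page table page, uses a dict named `mapped` to stand in for the page table pages whose translations are currently loaded, and uses a dict named `root_pt` to stand in for the root PT held in physical memory. The class and method names are hypothetical.

```python
PTES_PER_PAGE = 1024  # assumed: 4 KiB pages holding 4-byte PTEs


class VirtualLinearPageTable:
    def __init__(self):
        self.root_pt = {}   # PT page number -> list of PTEs (always resident)
        self.mapped = {}    # PT pages whose mapping is currently loaded
        self.secondary_faults = 0

    def map_page(self, vpn, frame):
        """Record vpn -> frame, allocating the containing PT page on demand."""
        pt_page = self.root_pt.setdefault(
            vpn // PTES_PER_PAGE, [None] * PTES_PER_PAGE
        )
        pt_page[vpn % PTES_PER_PAGE] = frame

    def lookup(self, vpn):
        """Index the linear array directly with the full page number."""
        pt_page_no = vpn // PTES_PER_PAGE
        if pt_page_no not in self.mapped:
            # Secondary fault: this part of the PT array is not mapped yet,
            # so consult the root PT to load its mapping.
            self.secondary_faults += 1
            if pt_page_no not in self.root_pt:
                raise LookupError(f"page fault: no PT page for vpn {vpn:#x}")
            self.mapped[pt_page_no] = self.root_pt[pt_page_no]
        frame = self.mapped[pt_page_no][vpn % PTES_PER_PAGE]
        if frame is None:
            raise LookupError(f"page fault: vpn {vpn:#x} is not mapped")
        return frame
```

In this model the root PT is consulted only on the first access to each page table page; later lookups in the same region index the array directly.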
















### Software-loaded TLB

- Pros
  - Can simplify hardware design
  - Provide greater flexibility in page table structure
- Cons
  - Typically have slower refill times than hardware-managed TLBs
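The flexibility point can be illustrated with a toy software-managed TLB. The sketch assumes a fully associative TLB with FIFO replacement, which is a simplification for illustration and not a model of any particular processor. `SoftwareTLB` and `refill_handler` are hypothetical names; the handler stands in for the OS miss routine and may walk any page table structure it likes.

```python
class SoftwareTLB:
    def __init__(self, slots, refill_handler):
        self.slots = slots
        self.refill_handler = refill_handler  # OS-supplied: vpn -> frame
        self.entries = {}                     # vpn -> frame, in insertion order
        self.misses = 0

    def translate(self, vpn):
        if vpn in self.entries:
            return self.entries[vpn]          # TLB hit: no software involved
        # TLB miss: trap to software, which chooses how to find the PTE.
        self.misses += 1
        frame = self.refill_handler(vpn)
        if len(self.entries) >= self.slots:
            oldest = next(iter(self.entries)) # FIFO victim (assumed policy)
            del self.entries[oldest]
        self.entries[vpn] = frame
        return frame
```

Swapping in a different `refill_handler` changes the page table structure without touching the TLB model. The price is that every miss runs handler code, which is where the slower refill time comes from.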




# Design Tradeoffs for Software-Managed TLBs

David Nagle, Richard Uhlig, Tim Stanley, Stuart Sechrest, Trevor Mudge & Richard Brown

ISCA '93: Proceedings of the 20th Annual International Symposium on Computer Architecture


# Trends at the time

- Operating systems
  - moving functionality into user processes
  - making greater use of virtual memory for mapping data structures held within the kernel.
- RAM is increasing
  - TLB capacity is relatively static
- Statement:
  - Trends place greater stress upon the TLB by increasing miss rates and hence, decreasing overall system performance.
  - True/False? How to evaluate?






[Figure labels: Software Trap on TLB Miss; Tapeworm; Kernel Code (Unmapped Space); TLB Miss Handlers; Simulated TLB (128 Slots); TLB (64 Slots); Mapped Space.]

Figure 1: Tapeworm. The Tapeworm TLB simulator is built into the operating system and is invoked whenever there is a real TLB miss. The simulator uses the real TLB misses to simulate its own TLB configuration(s). Because the simulator resides in the operating system, Tapeworm captures the dynamic nature of the system and avoids the problems associated with simulators driven by static traces.

Table 3: Costs for Different TLB Miss Types

[Table residue: only the headers "TLB Miss Type" and "OSF/1" and the values 511, 375 and 436 survive.]

- L1U: TLB miss on a level 1 user PTE.
- L1K: TLB miss on a level 1 kernel PTE.
- L2: TLB miss on a level 2 PTE. This can only occur after a miss on a level 1 user PTE.
- L3: TLB miss on a level 3 PTE. Can occur after either a level 2 miss or a level 1 kernel miss.
- Invalid: An access to a page marked as invalid (page fault).

## Note the TLB miss costs

- What is expected to be the common case?









## Specialising the L2/L1K miss vector

New Total Cost (sec): 0.79, 2.99, 11.85, 3.81, 7.29, 40.99, 0.00, 0.00

**ULTRIX & OSF/1**

[Figure: ULTRIX & OSF/1 kernel structure; surviving labels include File system and Networking.]

File system, networking, scheduling and Unix interface reside inside a monolithic kernel. Kernel text resides in unmapped space. Ultrix places most kernel data structures in unmapped space, while OSF/1 uses mapped space for many of its kernel data structures.


**Measurement Results**

[Table residue: number of TLB misses and time spent handling them, by miss type (L1U, L1K, L2, L3). Surviving values: 9,177,401; 9,817,502; 34,972; 11,691,398; 24,349,121; 5.81%; 30,123,212; 2,493,283; 43.98; 11.85; 127,245; 3.81; 33,933,413.]

These tables show the number of TLB misses and amount of time spent handling TLB misses for each of the operating systems studied. In Ultrix, most of the TLB misses and TLB miss time is spent servicing L1U TLB misses. However, for OSF/1 and various versions of Mach 3.0, L1K and L2 misses can overshadow the L1U miss time. The increase in Modify misses is due to OSF/1 and …

Recomputed Cost of TLB Misses Given Additional Miss Vectors (Mach 3.0): Supplying a separate interrupt vector for L2 misses and allowing the uTLB handler to service L1K misses reduces their cost to 40 and 20 cycles, respectively. Their contribution to TLB miss time drops from 8.08 and 43.98 seconds down to 0.79 and 2.99 seconds, respectively.
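The time saved by the extra miss vectors follows directly from the figures quoted above. The sketch below only redoes that arithmetic; it reads "respectively" as pairing 8.08 s and 0.79 s with L2 misses and 43.98 s and 2.99 s with L1K misses, and the variable names are illustrative.

```python
from decimal import Decimal

# Seconds spent handling each miss type, as quoted in the text.
# Decimal keeps the two-decimal values exact (no float rounding noise).
old_cost = {"L2": Decimal("8.08"), "L1K": Decimal("43.98")}
new_cost = {"L2": Decimal("0.79"), "L1K": Decimal("2.99")}

saved = {kind: old_cost[kind] - new_cost[kind] for kind in old_cost}
total_saved = sum(saved.values())

for kind, seconds in saved.items():
    print(f"{kind}: saved {seconds} s")
print(f"total saved: {total_saved} s")
```

The per-type savings come to 7.29 s and 40.99 s, which match two of the values that survive from the slide's table column, for a total of 48.28 s.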







