# Page Tables Revisited

THE UNIVERSITY OF NEW SOUTH WALES

# **Learning Outcomes**

- An understanding of virtual linear array page tables, and their use on the MIPS R3000.
- Exposure to alternative page table structures beyond multi-level and inverted page tables.






### Two-level Translation

Figure: two-level address translation between Program, Paging Mechanism and Main Memory. The virtual address is split into fields of 10 bits, 10 bits and 12 bits. The root page table contains 1024 PTEs.
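
The address split used by two-level translation can be sketched as follows. This is a minimal illustration, assuming a 32-bit virtual address with the 10/10/12 field widths given in the figure; the name `split_vaddr` is hypothetical.

```python
ROOT_BITS, SECOND_BITS, OFFSET_BITS = 10, 10, 12  # field widths from the figure

def split_vaddr(vaddr):
    """Return (root index, second-level index, page offset) for a 32-bit address."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    second = (vaddr >> OFFSET_BITS) & ((1 << SECOND_BITS) - 1)
    root = (vaddr >> (OFFSET_BITS + SECOND_BITS)) & ((1 << ROOT_BITS) - 1)
    return root, second, offset

print(split_vaddr(0x00403ABC))  # (1, 3, 2748)
```

The root index selects one of the 1024 PTEs in the root page table, the second index selects a PTE in the second-level table, and the offset is carried over unchanged.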


### Virtual Linear Array page table

- Uses a page table array indexed by page number
- The page table array is in virtual memory, with only the used pages of the array allocated in physical memory
- A second, root page table holds the translations for the page table array itself

### Virtual Linear Array Operation

Figure: 4-kbyte root page table.

• Index into the page table array without referring to root PT!

• Simply use the full page number as the PT index!

• Leave unused parts of PT unmapped!

• If access is attempted to unmapped part of PT, a secondary page fault is triggered

• This will load the mapping for the PT from the root PT

• Root PT is kept in physical memory (cannot trigger page faults)
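
The lookup sequence in the bullets above can be modelled with a toy simulation. This is a sketch under stated assumptions, not the R3000 mechanism itself: the class `VirtualLinearArray`, its fields, and the use of dictionaries for the root PT and PT pages are all hypothetical, and page and PTE sizes are taken as 4 KB and 4 bytes.

```python
PAGE_SIZE = 4096
PTE_SIZE = 4
PTES_PER_PAGE = PAGE_SIZE // PTE_SIZE  # 1024 PTEs per page of the PT array

class VirtualLinearArray:
    """Toy model of a page table array that lives in virtual memory."""

    def __init__(self):
        self.root = {}             # root PT: always resident, never faults
        self.mapped = set()        # pages of the PT array that are currently mapped
        self.secondary_faults = 0  # faults taken on the PT array itself

    def map_page(self, vpn, frame):
        # Record the PTE; only the PT page that holds it needs to exist.
        self.root.setdefault(vpn // PTES_PER_PAGE, {})[vpn % PTES_PER_PAGE] = frame

    def translate(self, vaddr):
        vpn = vaddr // PAGE_SIZE        # the full page number is the PT index
        pt_page = vpn // PTES_PER_PAGE  # which page of the PT array holds the PTE
        if pt_page not in self.mapped:
            # Secondary fault: load the mapping for this PT page from the root PT.
            self.secondary_faults += 1
            if pt_page not in self.root:
                raise LookupError("page fault: PT page not allocated")
            self.mapped.add(pt_page)
        frame = self.root[pt_page].get(vpn % PTES_PER_PAGE)
        if frame is None:
            raise LookupError("page fault: no PTE for this page")
        return frame * PAGE_SIZE + vaddr % PAGE_SIZE

vla = VirtualLinearArray()
vla.map_page(0x403, frame=7)
print(hex(vla.translate(0x403ABC)), vla.secondary_faults)  # 0x7abc 1
```

A second translation in the same PT page takes no further secondary fault, which is the point of leaving only the unused parts of the PT unmapped.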




### R3000 TLB Refill

- Dedicated exception handler
- Can be optimised for TLB refill only
  - Does not need to check the exception type
  - Does not need to save any registers
    - It uses a specialised assembly routine that only uses k0 and k1.
  - Does not check if PTE exists
    - Assumes virtual linear array (see extended OS notes)
- With careful data structure choice, exception handler can be made very fast

How does this work?

    mfc0 k1,C0_CONTEXT
    mfc0 k0,C0_EPC      # mfc0 delay slot
    lw   k1,0(k1)       # may double fault (k0 = orig EPC)
    nop
    mtc0 k1,C0_E
    nop
    tlbwr
    jr   k0
    rfe


### c0\_Context Register

| Bits 31..21 | Bits 20..2 | Bits 1..0 |
|-------------|------------|-----------|
| PTEBase     | Bad VPN    | 0         |

- c0\_Context = PTEBase + 4 \* PageNumber
  - PTEs are 4 bytes
  - PTEBase is the base address of the page table array (note: aligned on 4 MB boundary)
  - PTEBase is (re)initialised by the OS whenever the page table array is changed
    - E.g. on a context switch
- After an exception, c0\_Context contains the address of the PTE required to refill the TLB.
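
How the register value leads to a refill can be sketched in a few lines. This is an illustrative model only: it assumes the field layout shown above (PTEBase in bits 31..21, Bad VPN in bits 20..2), a 4 KB page, and a dictionary standing in for memory. The names `context_register` and `tlb_refill`, and the example base `0xC0000000`, are hypothetical. The alignment check enforces only what the bit layout implies; the 4 MB note above is stricter.

```python
import random

PAGE_SHIFT = 12
BADVPN_MASK = 0x7FFFF  # 19-bit Bad VPN field, bits 20..2 of c0_Context

def context_register(pte_base, bad_vaddr):
    """Compose c0_Context = PTEBase + 4 * PageNumber for a faulting address."""
    if pte_base & ((1 << 21) - 1):
        raise ValueError("PTEBase must leave bits 20..0 clear")
    vpn = (bad_vaddr >> PAGE_SHIFT) & BADVPN_MASK
    return pte_base | (vpn << 2)  # vpn << 2 is 4 * PageNumber (PTEs are 4 bytes)

def tlb_refill(tlb, memory, pte_base, bad_vaddr):
    """Load the PTE at the address in c0_Context and write it to a random TLB slot."""
    pte = memory[context_register(pte_base, bad_vaddr)]  # KeyError stands in for a double fault
    slot = random.randrange(len(tlb))
    tlb[slot] = (bad_vaddr >> PAGE_SHIFT, pte)
    return slot

print(hex(context_register(0xC0000000, 0x00403004)))  # 0xc000100c
```

Because the hardware has already formed the PTE address, the handler needs no arithmetic of its own: one load and one TLB write are enough.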




### Software-loaded TLB

- Pros
  - Can simplify hardware design
  - Provides greater flexibility in page table structure
- Cons
  - Typically has slower refill times than hardware-managed TLBs




### Trends at the time

- Operating systems
  - moving functionality into user processes
  - making greater use of virtual memory for mapping data structures held within the kernel.
- RAM is increasing
  - TLB capacity is relatively static
- Statement:
  - Trends place greater stress upon the TLB by increasing miss rates and hence, decreasing overall system performance.
  - True/False? How to evaluate?






Design Tradeoffs for Software-Managed TLBs. David Nagle, Richard Uhlig, Tim Stanley, Stuart Sechrest, Trevor Mudge and Richard Brown. ISCA '93: Proceedings of the 20th Annual International Symposium on Computer Architecture.




The Tapeworm TLB simulator is built into the operating system and is invoked whenever there is a real TLB miss. The simulator uses the real TLB misses to simulate its own TLB configuration(s). Because the simulator resides in the operating system, Tapeworm captures the dynamic nature of the system and avoids the problems associated with simulators driven by static traces.
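
The idea of driving a simulated TLB from real miss events can be illustrated with a toy model. This is not Tapeworm's implementation: the class `SimulatedTLB`, the LRU replacement policy, and the `on_real_miss` hook are assumptions made for the sketch, and a real trap-driven simulator must also keep the hardware TLB consistent with the simulated one, which is omitted here.

```python
from collections import OrderedDict

class SimulatedTLB:
    """Fully associative simulated TLB with LRU replacement, fed by miss events."""

    def __init__(self, entries):
        self.entries = entries
        self.resident = OrderedDict()  # vpn -> True, least recently seen first
        self.misses = 0

    def on_real_miss(self, vpn):
        """Called for each real TLB miss; returns True if the simulated TLB also misses."""
        if vpn in self.resident:
            self.resident.move_to_end(vpn)  # refresh recency
            return False
        self.misses += 1
        self.resident[vpn] = True
        if len(self.resident) > self.entries:
            self.resident.popitem(last=False)  # evict the least recently seen page
        return True

sim = SimulatedTLB(entries=2)
for vpn in [1, 2, 1, 3, 2]:
    sim.on_real_miss(vpn)
print(sim.misses)  # 4
```

Several such objects with different sizes could be fed the same event stream to compare configurations in one run.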




Table 3: Costs for Different TLB Miss Types

| TLB Miss Type | Ultrix | OSF/1 | Mach 3.0 |
|---------------|--------|-------|----------|
| L1K           | 333    | 355   | 294      |

Further entries whose row and column labels did not survive: 511, 375, 436, 499, 336.

This table shows the number of machine cycles (at 60 ns/cycle) required to service different types of TLB misses. To determine these costs, Monster was used to collect a 128K-entry histogram of timings for each type of miss. We separate TLB miss types into the six categories described below. Note that Ultrix does not have L3 misses because it implements a 2-level page table.

| Miss type | Description                                                                               |
|-----------|-------------------------------------------------------------------------------------------|
| L1U       | TLB miss on a level 1 user PTE.                                                           |
| L1K       | TLB miss on a level 1 kernel PTE.                                                         |
| L2        | TLB miss on a level 2 PTE. This can only occur after a miss on a level 1 user PTE.        |
| L3        | TLB miss on a level 3 PTE. Can occur after either a level 2 miss or a level 1 kernel miss. |
| Modify    | A page protection violation.                                                              |
| Invalid   | An access to a page marked as invalid (page fault).                                       |

## Note the TLB miss costs

- What is expected to be the common case?








Mach 3.0 + AFSout

Figure: a Unix Server and an AFS Cache Manager running in user mode, above the Mach 3.0 kernel running in kernel mode.

Same as standard Mach 3.0, but with increased functionality provided by a server task. The AFS Cache Manager is either inside the Unix Server or in its own, user-level server (as pictured above).

| System                                | Total Run Time (sec)                    | L1U           | L1K           | L2         | L3      | Invalid         | Modify         | Total                  |
|---------------------------------------|-----------------------------------------|---------------|---------------|------------|---------|-----------------|----------------|------------------------|
| Ultrix                                | 583                                     | 9,021,420     | 135,847       | 3,828      |         | 16,191          | 115            | 9,177,401              |
| OSF/1                                 | 892                                     | 9,817,502     | 1,509,973     | 34,972     | 207,163 | 79,299          | 42,490         | 11,691,398             |
| Mach3                                 | 975                                     | 21,466,165    | 1,682,722     | 352,713    | 556,264 | 165,849         | 125,409        | 24,349,121             |
| Mach3+AFSIn                           | 1,371                                   | 30,123,212    | 2,493,283     | 330,803    | 690,441 | 168,429         | 127,245        | 33,933,413             |
| Mach3+AFSOut                          | 1,517                                   | 31,611,047    | 2,712,979     | 1,042,527  | 987,648 | 168,128         | 127,505        | 36,649,834             |

| System      | Total TLB Service Time (sec) | L1U   | L1K   | L2   | L3   | Invalid | Modify | % of Total Run Time |
|-------------|------------------------------|-------|-------|------|------|---------|--------|---------------------|
| Ultrix      | 11.82                        | 8.66  | 2.71  | 0.11 |      | 0.33    | 0.00   | 2.03%               |
| OSF/1       | 51.85                        | 11.78 | 32.16 |      | 4.40 |         |        | 5.81%               |
| Mach3       |                              |       |       |      |      |         |        |                     |
| Mach3+AFSin |                              |       |       |      |      |         |        |                     |
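
The service times in this table follow from the miss counts and the per-miss costs in Table 3. As a worked check, assuming service time is simply count times cycles times the 60 ns cycle time, the Ultrix L1K entry can be reproduced; the helper name `service_time_sec` is hypothetical.

```python
CYCLE_NS = 60  # cycle time stated in the Table 3 caption

def service_time_sec(miss_count, cycles_per_miss, cycle_ns=CYCLE_NS):
    """Total time spent servicing one type of TLB miss, in seconds."""
    return miss_count * cycles_per_miss * cycle_ns * 1e-9

# Ultrix, L1K: 135,847 misses at 333 cycles each
print(f"{service_time_sec(135_847, 333):.2f}")  # 2.71
```

The result agrees with the 2.71 s listed for Ultrix L1K misses. Although L1K misses are far rarer than L1U misses, their higher per-miss cost makes them a visible share of total service time.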

Mach3+AFSin

| Type of PTE Miss | Counts     | Previous Total Cost from Table 6 (sec) | New Total Cost (sec) | Time Saved (sec) |
|------------------|------------|----------------------------------------|----------------------|------------------|
| L1U              | 30,123,212 | 36.15                                  | 36.15                | 0.00             |
| L2               | 330,803    | 8.08                                   | 0.79                 | 7.29             |
| L1K              | 2,493,283  | 43.98                                  | 2.99                 | 40.99            |
| L3               | 690,441    | 11.85                                  | 11.85                | 0.00             |
| Modify           | 127,245    | 3.81                                   | 3.81                 | 0.00             |
| Invalid          | 168,429    | 2.70                                   | 2.70                 | 0.00             |
| Total            | 33,933,413 | 106.56                                 | 58.29                | 48.28            |

Slide note, partially recovered: "… (Mach 3.0) … for L2 misses and allowing the uTLB … L1K misses reduces their cost to 40 and 20 … TLB miss time drops from …"
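
The per-miss cost implied by each row can be recovered by dividing the total cost by the miss count and the cycle time. The sketch below does this for the two rows that change, assuming the 60 ns cycle time from Table 3; the helper name `implied_cycles` is hypothetical.

```python
CYCLE_NS = 60  # cycle time stated in the Table 3 caption

def implied_cycles(total_sec, miss_count, cycle_ns=CYCLE_NS):
    """Average cycles per miss implied by a total service time."""
    return total_sec / (miss_count * cycle_ns * 1e-9)

# (miss count, previous total cost, new total cost) taken from the table
rows = {"L2": (330_803, 8.08, 0.79), "L1K": (2_493_283, 43.98, 2.99)}
for name, (count, old, new) in rows.items():
    print(name, round(implied_cycles(old, count)), round(implied_cycles(new, count)))
```

The L1K row works out to about 294 cycles before the change, matching the Mach 3.0 L1K entry in Table 3, and about 20 cycles after it. The L2 row drops to about 40 cycles.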




Figure: VHPT entry formats. Short format (per-region VHPT): a 64-bit entry indexed by VPN, holding the PPN. Long format (global VHPT): a 4 x 64-bit entry holding PPN, PKEY, psize, Tag and Chain. The Tag is used for matching; psize is the page size.
