
 RAMpage Road Map



RAMpage Open Questions

This page covers areas of work that need to be completed to evaluate and implement the RAMpage model. First, a checklist of future work is presented here; next comes work in progress.

The biggest current areas of research are context switch effects, more general operating system issues, and comparison with a wider range of cache organizations. A longer-range issue is the effect of migrating main memory to the top of the hierarchy. Some parts of the first two areas have already been addressed by published work; running larger traces or more realistic CPU simulations (e.g. using SimpleScalar) would be the obvious next step. Implementing OS simulation on top of SimpleScalar could be challenging, though others are working on the problem. SimOS is another possibility, but implementing RAMpage on top of SimOS may also be a challenge, given that some OS code has to change.

  • context switches – three major issues need to be resolved:
    • the impact of the cost of a context switch to the page-fault handler
    • gains from kernel data structures in SRAM in normal context switches
    • gains from context switches on misses (i.e. to another process)
  • OS simulation – aside from context switches, other operating systems issues include
    • TLB miss handling
    • value of TLB management in hardware
    • caching recently replaced pages in SRAM
    • better implementation of paging
    • more accurate OS simulation
  • alternative cache organizations – a more complete picture of cache alternatives requires data on costs of greater associativity (hit costs, extra logic)
    • will more associativity in L1 make RAMpage more of a win? The assumption here is that a more associative L1 cache will send less traffic to L2, so replacing L2 by RAMpage will be an even bigger win (current simulations only use direct-mapped L1) – this seems to be supported by work on larger L1s, with context switches on misses in the RAMpage model
    • what impact does size of cache / SRAM main memory have (e.g., could RAMpage work on the relatively small SRAM sizes common today, typically at most 1Mbyte on mass-market systems -- some with quite fast CPUs)?
    • what impact would a large TLB have (e.g. on the optimal SRAM page size)? This is interesting because the PowerPC 750, for example, although a mass-market design, has a relatively large TLB (128 entries each for instructions and data). Some results now show that the effect would be to increase the range of viable RAMpage page sizes, though a wider range of workloads needs to be evaluated
  • first-level main memory – while current transistor counts are too low to consider an on-chip main memory, fast off-chip cache interfaces (including the HP PA-8000, which implements a large single-level cache off-chip) indicate the possibility that main memory could be the first level of the hierarchy; main issues to investigate:
    • with no on-chip L1 or L2 tags, and no need to implement associativity in hardware, a first-level main memory reduces the requirement for expensive chip real-estate currently used for implementing caches – what wins can we score here?
    • the effect of simple fast hits in the fastest level of memory
    • how small a main memory can we get away with, if the next level down is relatively fast?
  • multiprocessor effects – a multiprocessor system would increase the number of competing reference streams, but is this necessarily a problem? Possibilities to consider include
    • a more aggressive interconnect, permitting multiple independent DRAM references
    • the extent to which competing references are likely to coincide
    • how to handle multiple main memories, e.g., using ideas from distributed shared memory, or semi-distributed models, like the Stanford DASH machine
    • whether any multiprocessor effects would be worse or better with RAMpage
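
The associativity question in the checklist above rests on the idea that a more associative L1 sends less traffic to the next level. The sketch below is only an illustration of that idea, not the RAMpage simulator: it counts misses for a tiny LRU cache with a hypothetical trace, block size and capacity, all chosen for the example.

```python
from collections import OrderedDict


def count_misses(trace, num_blocks, ways, block_size=32):
    """Count misses for a set-associative cache with LRU replacement.

    All parameters are illustrative assumptions: num_blocks is the total
    number of blocks in the cache, ways is the associativity (1 means
    direct-mapped), block_size is in bytes.
    """
    num_sets = num_blocks // ways
    # One ordered dict per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in trace:
        block = addr // block_size
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1                    # miss: this reference goes to the next level
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
    return misses


# Hypothetical trace: two blocks that collide in an 8-block direct-mapped cache.
conflict_trace = [0, 256] * 4

direct_mapped_misses = count_misses(conflict_trace, num_blocks=8, ways=1)
two_way_misses = count_misses(conflict_trace, num_blocks=8, ways=2)
```

With this deliberately adversarial trace, the direct-mapped configuration misses on every reference while the 2-way configuration misses only on the first touch of each block. Real workloads will show a much smaller difference, and the extra hit cost of associativity is not modelled here.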


RAMpage Work in Progress

Results already published include

  • the effect of switching to SDRAM or Rambus to replace ordinary DRAM, with a generally more aggressive implementation of the DRAM level
  • TLB effects of RAMpage vs. a more conventional hierarchy
  • the effect of context switches on misses
  • scalability of RAMpage vs. a more conventional hierarchy as the CPU-DRAM speed gap grows

Results already measured but not yet published include

  • the effect of a larger L1 cache
  • the effect of a larger TLB

The context switch area is the most radical departure from caching assumptions. Initial emphasis is on modelling the performance wins (if any) of taking a context switch on a miss. The assumption here is that the relatively long page fault times to DRAM (e.g. of the order of 10,000 instruction issues) that result from large page sizes (e.g. 4 Kbytes) can be amortized by switching to another process. Our preliminary data show a significant reduction in misses with larger page sizes, but the reduction in the number of misses does not compensate for the high cost of each miss for a 4 Kbyte page. On the other hand, a high miss cost allows time for a context switch. The latest simulations show that taking a context switch on a miss is a clear win, and makes RAMpage much more scalable than a conventional cache-based hierarchy as the CPU-DRAM speed gap increases.
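
The amortization argument can be put as a back-of-envelope model. The sketch below is an idealized illustration, not the simulation used for the results: it assumes one instruction issue per time unit, that another process is always ready to run, and that the DRAM transfer is fully hidden behind that process. Only the 10,000-issue miss penalty comes from the text; the instruction count, miss count and switch cost are hypothetical.

```python
def time_stall_on_miss(instructions, misses, miss_penalty):
    """Total time, in instruction issues, if the CPU waits out every miss."""
    return instructions + misses * miss_penalty


def time_switch_on_miss(instructions, misses, miss_penalty, switch_cost):
    """Total time if each miss triggers a context switch instead of a stall.

    Idealized assumption: useful work from another process overlaps the
    DRAM transfer, so a miss costs only the switch overhead (never more
    than the full miss penalty).
    """
    per_miss = min(switch_cost, miss_penalty)
    return instructions + misses * per_miss


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 issues and 50 misses to DRAM.
    stall = time_stall_on_miss(1_000_000, 50, miss_penalty=10_000)
    switch = time_switch_on_miss(1_000_000, 50, miss_penalty=10_000,
                                 switch_cost=500)
    print("stall on miss: ", stall)
    print("switch on miss:", switch)
```

The model is an upper bound on the benefit: it ignores cache and TLB pollution caused by the switch, and the case where no other process is ready to run.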



Figure: early measurements of context-switching benefits, relative to a standard 2-level cache-based hierarchy (0 = no improvement; 0.n = 1.n times faster).
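
One way to read the scale above (0 = no improvement; 0.n = 1.n times faster) is as speedup minus one. The helper below is a small sketch under that reading; the function name and the sample run times are hypothetical, not measured values.

```python
def relative_improvement(baseline_time, rampage_time):
    """Improvement over the baseline hierarchy, expressed as speedup minus one.

    0.0 means no improvement; 0.25 means 1.25 times faster than the baseline.
    """
    if rampage_time <= 0:
        raise ValueError("run time must be positive")
    return baseline_time / rampage_time - 1.0


# Hypothetical run times in arbitrary units, for illustration only.
improvement = relative_improvement(baseline_time=125.0, rampage_time=100.0)
```

A negative value under this definition would mean the configuration is slower than the 2-level cache baseline.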

