The University of Queensland Homepage
School of ITEE ITEE Main Website

 Memory background

Memory efficiency

The general idea

A processor running at 2GHz (1 clock tick = 0.5ns) capable of completing multiple instructions per clock can complete several instructions per nanosecond. Cache misses from DRAM take tens of nanoseconds to complete, meaning a very big slowdown of cache misses are frequent.

There are various strategies for reducing cache misses, taking to account and enhancing two key properties of programs:

  • spatial locality – if a particular memory location is used, there’s a high probability that others close to it will be used
  • temporal locality – if a particular memory location is used, there’s a high probability that it will be used again

For example, spatial locality can be enhanced by organizing data so all related items are close together. Temporal locality can be enhanced by reorganizing computation so that as much work as possible is done on the same part of the data before moving on.

The challenge

Compilers are less likely to be good at this sort of optimization than pipeline optimizations. You will therefore need to focus on program transformation, and not look at machine code generated by the compiler. Some things to look for:

  • algorithms that can be easily rewritten in more than one order of operation
  • some examples:
    • matrix arithmetics
    • graphics operations
    • sorting algorithms
  • and of course you can find others

It is highly recommended that you use a language reasonably close to machine code, e.g., C or C++ for your examples, otherwise it will be hard to be sure that memory allocation is doing what you expect it to do.