Memory Subsystems

Important Terminologies

  • Locality of Memory References - If a program accesses a particular memory address, it is likely that the next few accesses will be to nearby addresses (spatial locality), and also that the same address is likely to be accessed again within a short time (temporal locality). This is true for instruction fetches, and also for data reads and writes.
  • Block - The minimum amount of data transferred between two adjacent memory levels
  • Memory Levels - The top level (registers) is the fastest but smallest, while the bottom level (hard disk) is the slowest but largest.
  • Hit - If the data requested by the processor is found at a memory level, we have a hit.
  • Miss - If not, we have a miss; the request is propagated to the next level down, and the block containing the requested data is copied into this level when the data is found. This ensures that the next time this (or nearby) data is accessed there will be a hit at this level.
  • Hit Rate - The fraction of memory references found at a level is called the hit rate. [3]
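As a concrete illustration of these terms, the sketch below models a single memory level as a small set of blocks and counts hits and misses for a sequential access pattern. The block size, capacity, access pattern, and eviction policy are all made-up toy values, not from the source.

```python
# Toy single-level cache: illustrates blocks, hits, misses, and hit rate.
BLOCK_SIZE = 16          # bytes per block (the unit transferred between levels)
NUM_BLOCKS = 4           # capacity of this toy level

cache = set()            # block numbers currently resident at this level
hits = misses = 0

def access(address):
    """Look up the block containing `address`; count a hit or a miss."""
    global hits, misses
    block = address // BLOCK_SIZE
    if block in cache:
        hits += 1                       # found at this level: hit
    else:
        misses += 1                     # propagate down, then copy block here
        if len(cache) >= NUM_BLOCKS:
            cache.pop()                 # evict an arbitrary block (toy policy)
        cache.add(block)

# Sequential word accesses show spatial locality: one miss per block, then hits.
for addr in range(0, 64, 4):
    access(addr)

print(f"hit rate = {hits / (hits + misses):.2f}")   # 12 hits / 16 accesses = 0.75
```

Each block's first access misses and the next three accesses to the same block hit, which is exactly the benefit the block-at-a-time copying above is designed to capture.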

Hierarchy

(Figure: the memory hierarchy, from registers at the top to hard disk at the bottom)

1. Top Level: Registers, very fast but small in terms of storage.

2. First Level: First-level cache, small (perhaps 64 KB), a very fast cache on the processor chip.

3. Second Level: Second-level cache, larger (perhaps 512 KB), either on the processor chip or on separate chips, and intermediate in speed between the on-chip cache and the main memory.

4. Third Level: Main memory.

5. Bottom: Hard disk, slow but large in terms of storage. [3]

Cache

The main issue in designing a cache is how to determine whether a data item is present in the cache and where it is stored. As the storage space of a cache is much smaller than that of the main memory, each cache location can hold the contents of a number of different memory locations.

  • Tag - Since data items are identified by their memory address, in order to ensure that a specific location in the cache indeed holds the required data, the address must be stored in a special field at the cache location together with the data. This special field is called the tag.

When a cache location is unoccupied, its leftover tag field could still happen to match a requested memory address, causing the program to malfunction.

  • Valid Bits - A single bit valid field is also added in each location signifying whether the data held there is valid or not.

Cache Performance

AMAT's three parameters, hit time (or hit latency), miss rate, and miss penalty, provide a quick analysis of memory systems. Hit latency (H) is the time to hit in the cache. Miss rate (MR) is the frequency of cache misses, while average miss penalty (AMP) is the cost of a cache miss in terms of time. Concretely, AMAT can be defined as follows. [1]

AMAT = H + MR * AMP

  • Register: 250 ps, 1 clock cycle
  • L1 cache: 1 ns, 1 to 4 clock cycles
  • L2 cache: a few ns, 7-23 clock cycles
  • Main memory: 30-50 ns, 50-100 clock cycles [4]
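Plugging numbers into the formula: the sketch below computes AMAT for an L1 cache with the 1 ns hit time listed above. The 5% miss rate and the 40 ns miss penalty are assumed figures for illustration, not values from the source.

```python
# AMAT = H + MR * AMP with illustrative numbers.
H   = 1.0     # L1 hit time in ns (from the latency list above)
MR  = 0.05    # fraction of accesses that miss in L1 (assumed)
AMP = 40.0    # average miss penalty in ns, roughly a main-memory access (assumed)

amat = H + MR * AMP
print(f"AMAT = {amat} ns")   # 1 + 0.05 * 40 = 3.0 ns
```

Even with a 95% hit rate, the slow lower level triples the average access time relative to a pure hit, which is why miss rate and miss penalty both matter.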

Four Questions

1. Where can a block be placed in the upper level? (Block Placement)

  • Cache structure
    • Fully associative
    • Direct mapped
    • 2-way associative

2. How is a block found if it is in the upper level? (Block Identification)

  • Tag on each block
    • No need to check index and block offset
  • Given an address, we can determine whether the data at that memory location is in the cache. To do so, we use the following procedure:
    • Use the set index to determine which cache set the address should reside in.
    • For each block in the corresponding cache set, compare the tag associated with that block to the tag from the memory address. If there is a match, proceed to the next step. Otherwise, the data is not in the cache.
    • For the block whose tag matched, look at the valid bit. If it is 1, the data is in the cache; otherwise it is not.
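The procedure above can be sketched for a hypothetical 2-way set-associative cache with 16-byte blocks and 4 sets; all sizes and addresses are illustrative assumptions.

```python
# Block identification: split an address into tag / set index / block offset,
# then compare tags (with valid bits) within the selected set.
BLOCK_BITS = 4            # 16-byte blocks -> 4 offset bits
SET_BITS   = 2            # 4 sets         -> 2 index bits

def split(address):
    """Split an address into (tag, set index, block offset)."""
    offset = address & ((1 << BLOCK_BITS) - 1)
    index  = (address >> BLOCK_BITS) & ((1 << SET_BITS) - 1)
    tag    = address >> (BLOCK_BITS + SET_BITS)
    return tag, index, offset

# Each set holds one (valid, tag) pair per way; data payload omitted.
sets = [[(False, 0), (False, 0)] for _ in range(1 << SET_BITS)]

def lookup(address):
    """Return True on a hit: a valid block in the right set with a matching tag."""
    tag, index, _ = split(address)
    return any(valid and t == tag for valid, t in sets[index])

# Install the block containing address 0x1234 into way 0 of its set.
tag, index, _ = split(0x1234)
sets[index][0] = (True, tag)

print(lookup(0x1234))   # True:  same block
print(lookup(0x1230))   # True:  same block, different offset
print(lookup(0x2234))   # False: same set index, different tag
```

Note that only the tag is stored and compared; the index and offset never need to be checked, since they are implied by where the block sits.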

3. Which block should be replaced on a miss? (Block Replacement)

  • Easy for direct mapped: there is no choice, since each block maps to exactly one location, (Block address) MOD (Number of blocks in the cache)
  • Set associative or fully associative:
    • Random
    • Least Recently Used (LRU)
      • Hardware keeps track of the access history
      • Replace the entry that has not been used for the longest time
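A minimal sketch of LRU replacement for a single cache set, assuming 2 ways; real hardware approximates this with a few status bits rather than maintaining a full recency ordering.

```python
from collections import OrderedDict

class LRUSet:
    """One set of a set-associative cache with LRU replacement."""

    def __init__(self, ways=2):
        self.ways = ways
        self.blocks = OrderedDict()   # block -> None, ordered oldest to newest

    def access(self, block):
        """Return True on hit; on miss, evict the least recently used block."""
        if block in self.blocks:
            self.blocks.move_to_end(block)     # record the use (most recent now)
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)    # evict LRU (front of the order)
        self.blocks[block] = None
        return False

s = LRUSet(ways=2)
print(s.access("A"))  # False: miss, A loaded
print(s.access("B"))  # False: miss, B loaded
print(s.access("A"))  # True:  hit, A becomes most recently used
print(s.access("C"))  # False: miss, evicts B (least recently used)
print(s.access("B"))  # False: miss, B was evicted
```

The third access "saves" A: because it was touched again, the later miss evicts B instead, which is exactly the behavior the access-history tracking buys.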

4. What happens on a write? (Write Strategy)

  • Write through
    • The information is written to both the block in the cache and to the block in the lower-level memory.
  • Write back
    • The information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  • Pros and Cons:
    • Write through: read misses cannot result in writes
    • Write back: no repeated writes to same location
  • Write through is always combined with write buffers, so that the processor does not have to wait for the lower-level memory
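The difference between the two write policies can be sketched with a toy one-block model; the dictionaries stand in for a real cache and main memory.

```python
# Write-through updates both levels on every store; write-back updates only the
# cache, marks the block dirty, and defers the memory write until eviction.
memory = {0x10: 0}
cache  = {}              # address -> (value, dirty)

def write(addr, value, policy):
    if policy == "write-through":
        cache[addr] = (value, False)   # update the cache...
        memory[addr] = value           # ...and the lower level immediately
    else:                              # write-back
        cache[addr] = (value, True)    # update the cache only, mark dirty

def evict(addr):
    """On replacement, a dirty write-back block must be written down first."""
    value, dirty = cache.pop(addr)
    if dirty:
        memory[addr] = value

write(0x10, 1, "write-back")
print(memory[0x10])      # 0: main memory is stale until eviction
evict(0x10)
print(memory[0x10])      # 1: dirty block written back on replacement
```

Repeated writes to 0x10 before the eviction would cost only one memory write under write-back, but one memory write each under write-through, which is the trade-off listed above.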


  • A write buffer is needed between the cache and memory, or between the cache and the next cache ("buffered write-through")
    • Processor: writes data into the cache and the write buffer
    • Memory controller: writes the contents of the buffer to memory
    • The buffer is drained first in, first out (FIFO)
    • Write buffer saturation: store frequency > 1 / DRAM write cycle
      • Remedy: write buffer merging, which combines writes to consecutive destination addresses into one buffer entry
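Write buffer merging can be sketched as follows; the 4-byte entry size and the addresses are illustrative assumptions.

```python
# Stores to consecutive byte addresses that fall within one entry-sized,
# aligned region are merged into a single write-buffer entry.
ENTRY_BYTES = 4   # one buffer entry covers 4 consecutive byte addresses

buffer = {}       # entry base address -> {offset: byte}; dict keeps FIFO order

def buffered_write(addr, byte):
    """Merge the store into an existing entry covering this address,
    or allocate a new entry at the back of the buffer."""
    base = addr - addr % ENTRY_BYTES
    buffer.setdefault(base, {})[addr % ENTRY_BYTES] = byte

for a, b in [(0x100, 1), (0x101, 2), (0x102, 3), (0x200, 9)]:
    buffered_write(a, b)

print(len(buffer))   # 2: the three consecutive stores merged into one entry
```

Without merging, the four stores would occupy four entries; with it, the buffer fills (and saturates) far more slowly under bursty sequential stores.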
  • Write miss policy
    • Write allocate (fetch on write)
      • The block is loaded on a write miss
    • No-write allocate (write-around)
      • The block is modified in the lower level and not loaded into the cache

Virtual Memory Address Translation

Virtual memory

Virtual memory makes it possible to keep the code and data of active processes in main memory while moving those of other processes to backing store, i.e. disk.

This is transparent to programmers; the operating system takes care of it.

Benefits:

  1. Programs can run on computers with less memory
  2. Memory can be used more efficiently

Address translation

Addresses output by the processor must be translated from virtual addresses to the physical addresses. This process is called "Address Translation."

The memory management unit (MMU) handles this translation.

Each process has its own address space. This is important for security, and it also makes it easy to write programs that use absolute addresses.

Here, 32-bit virtual addresses consist of a 22-bit "page number" and a 10-bit "offset," i.e. the page size is 1 KB.

First, the MMU accesses the *page table entry* in the *page table* that describes the location of the page.

If the page is in main memory, the MMU gets the frame number and concatenates it with the lower 10 bits (the offset) to obtain the corresponding physical address.

If the page is not in main memory, the MMU gets its location on disk from the entry, and the data is fetched from there.
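The translation above can be sketched as follows; the page table contents and the addresses are made-up values for illustration.

```python
# Translate a 32-bit virtual address with a 22-bit page number and a
# 10-bit offset (1 KB pages) via a toy page table.
OFFSET_BITS = 10

# page number -> physical frame number (a missing key would mean "on disk")
page_table = {0x00004: 0x0002A}

def translate(vaddr):
    page   = vaddr >> OFFSET_BITS               # upper 22 bits
    offset = vaddr & ((1 << OFFSET_BITS) - 1)   # lower 10 bits
    frame  = page_table[page]                   # the MMU reads the page table entry
    return (frame << OFFSET_BITS) | offset      # concatenate frame and offset

vaddr = (0x00004 << OFFSET_BITS) | 0x155        # page 4, offset 0x155
print(hex(translate(vaddr)))                    # frame 0x2A with the same offset
```

The offset passes through unchanged; only the page number is replaced by the frame number, which is why page and frame sizes must match.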

Reference

1.) https://en.wikipedia.org/wiki/Average_memory_access_time

2.) https://www2.cs.duke.edu/courses/fall06/cps220/lectures/PPT/lect11.pdf

3.) http://www.inf.ed.ac.uk/teaching/courses/inf2c/lectures/CS12_13_notes.pdf

4.) http://web.sfc.keio.ac.jp/~rdv/keio/sfc/teaching/architecture/computer-architecture-2016/lec07-cache.html

5.) https://courses.cs.washington.edu/courses/cse378/09wi/lectures/lec15.pdf

6.) https://www.csie.ntu.edu.tw/~yangc/lecture8.pdf

7.) http://www.ele.uri.edu/faculty/sendag/ele594/lec08.pdf



Last-modified: 2018-07-16 (Mon) 14:43:02