ISCA Day 3


DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric

Low-power, high-density reconfigurable RAM.

A low-power, high-capacity FPGA is desirable, so use DRAM instead of SRAM lookup tables to provide both.

Challenges of building DRAM-based FPGA

A DRAM LUT is slower than an SRAM one, and access is destructive (data is lost after access)

Narrower MAT (DRAM array): from 1K -> 8~16 bits

The destructive DRAM read is solved by PRE(charge), ACT(ivation), RST(restore),
followed by a wire transfer. These steps are sequential, but they can be overlapped:
RST can be overlapped with the wire transfer, and so on.
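
A rough sketch of what the overlap buys, assuming only the RST/wire-transfer overlap mentioned above. The function name and the step durations are made up for illustration and are not numbers from the talk.

```python
def access_latency(pre, act, rst, wire, overlap=True):
    """Latency of one LUT access, given the duration of each step.

    With overlap, RST and the wire transfer run side by side after ACT,
    so only the longer of the two is on the critical path.
    """
    if overlap:
        return pre + act + max(rst, wire)
    return pre + act + rst + wire

# Hypothetical step durations (arbitrary units), for illustration only.
print(access_latency(2, 3, 4, 5, overlap=False))  # 14
print(access_latency(2, 3, 4, 5, overlap=True))   # 10
```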

Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs

Resistive Memory Write:

Slower writes have endurance benefits (knife sharpening example)

Adaptively choose the write speed.

Make use of idle memory(?) to write back slowly

Bank-aware mellow writes: Choose banks with fewer blocks to write back in the wb queue of the memory controller. For those relatively free banks, issue slow writes.

Performance degradation is not noticeable, endurance improved by 87%
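
The bank-aware policy could look roughly like the sketch below. It is a toy model, not the paper's controller: the queue layout, the `threshold` parameter and the "slow"/"normal" labels are assumptions made for illustration.

```python
from collections import Counter

def choose_write_speeds(write_queue, threshold=1):
    """Tag each queued write-back as "slow" or "normal".

    write_queue is a list of (bank, block) pairs. A bank with at most
    `threshold` pending write-backs counts as relatively free, so its
    writes are issued slowly; `threshold` is a hypothetical knob.
    """
    pending = Counter(bank for bank, _ in write_queue)
    return [
        (block, "slow" if pending[bank] <= threshold else "normal")
        for bank, block in write_queue
    ]

queue = [(0, "A"), (0, "B"), (0, "C"), (1, "D")]
print(choose_write_speeds(queue))
# [('A', 'normal'), ('B', 'normal'), ('C', 'normal'), ('D', 'slow')]
```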

Eager Mellow Writes:

Predict which dirty LLC lines will not be dirtied again, and write those back slowly to ReRAM

Does some epoch counting? to find cachelines…

Add an eager mellow write queue: lowest priority, but it uses memory bandwidth to write back eagerly.

Eager writeback also improves performance as it reduces write queue congestion!! 
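
A minimal sketch of the "lowest priority" part, assuming three queues in the memory controller. Only the position of the eager queue comes from the notes; the relative order of reads and regular writes is my assumption.

```python
from collections import deque

def next_request(reads, writes, eager):
    """Pop from the highest-priority non-empty queue.

    Eager mellow write-backs are served only when nothing else is
    waiting. Returns (queue_name, request), or None if all are empty.
    """
    for name, queue in (("read", reads), ("write", writes), ("eager", eager)):
        if queue:
            return name, queue.popleft()
    return None

reads, writes, eager = deque(["r0"]), deque(), deque(["e0"])
print(next_request(reads, writes, eager))  # ('read', 'r0')
print(next_request(reads, writes, eager))  # ('eager', 'e0')
print(next_request(reads, writes, eager))  # None
```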

Also employs a lifetime quota, where a lifetime is enforced.

Slower writes use more energy!

MITTS: Memory Inter-arrival Time Traffic Shaping

CPU initiated memory bandwidth provisioning.

IaaS can charge the users on memory bandwidth usage (and arrival time)

HW mechanism to provision memory BW (Bulky vs. Bursty bandwidth)

Take the relative memory inter-arrival times and make them into a histogram.

Credits are kept per inter-arrival time, in bins. Thus if you use all the credit in a bin, you need to wait and use the credit bin for the next inter-arrival time

An array of registers represents the credits in each bin. Also, a replenisher refills the bins
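
As I understood it, the credit mechanism works something like the sketch below. This is a software toy, not the hardware: the class name, `bin_width`, the handling of the first request, and replenishing all bins at once are assumptions for illustration.

```python
class TrafficShaper:
    """Toy model of per-bin credits indexed by inter-arrival time."""

    def __init__(self, credits, bin_width):
        self.initial = list(credits)   # credits[i] covers gaps in bin i
        self.credits = list(credits)
        self.bin_width = bin_width     # cycles covered by one bin
        self.last_issue = None

    def replenish(self):
        """Refill every bin (called periodically by the replenisher)."""
        self.credits = list(self.initial)

    def issue(self, now):
        """Return the cycle at which a request arriving at `now` can
        issue, or None if no usable credit is left."""
        if self.last_issue is None:
            self.last_issue = now      # assumption: first request is free
            return now
        gap = max(0, now - self.last_issue)
        first_bin = min(gap // self.bin_width, len(self.credits) - 1)
        for b in range(first_bin, len(self.credits)):
            if self.credits[b] > 0:
                self.credits[b] -= 1
                # Falling through to a later bin means waiting until
                # the gap is long enough to belong to that bin.
                when = max(now, self.last_issue + b * self.bin_width)
                self.last_issue = when
                return when
        return None

shaper = TrafficShaper(credits=[1, 1], bin_width=10)
print(shaper.issue(0))   # 0  (first request)
print(shaper.issue(2))   # 2  (uses the short-gap bin)
print(shaper.issue(3))   # 12 (short-gap bin empty, waits for the next bin)
print(shaper.issue(13))  # None (all credits used until replenish)
```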

Reliability 2

All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory

Command/Address transfer failed to speed up with newer DRAM specs.

CA is more prone to errors than data! Due to DDR DIMM technology

CA-parity was introduced in DDR4.

Read address error! (The wrong codeword is read! Data & data ECC are valid within themselves, but it is the wrong codeword!!)
Extended data ECC -> the address is also encoded into the ECC.
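
To see why folding the address into the check bits catches this, here is a small sketch. It uses a CRC32 as a stand-in for the real ECC (detection only, no correction), and `encode`, `write`, `read` and the `faulty_addr` parameter are hypothetical names for illustration.

```python
import zlib

def encode(addr, data):
    """Check bits computed over the address AND the data."""
    return zlib.crc32(addr.to_bytes(8, "little") + data)

memory = {}

def write(addr, data):
    memory[addr] = (data, encode(addr, data))

def read(addr, faulty_addr=None):
    """Read `addr`; `faulty_addr` models a CA error that makes the
    DRAM return the codeword stored at a different address."""
    actual = addr if faulty_addr is None else faulty_addr
    data, check = memory[actual]
    # The controller checks against the address it intended to read.
    if check != encode(addr, data):
        raise ValueError("codeword does not match the requested address")
    return data

write(0x10, b"aaaa")
write(0x20, b"bbbb")
print(read(0x10))  # b'aaaa'
try:
    read(0x10, faulty_addr=0x20)
except ValueError as err:
    print("detected:", err)
```

With a data-only check, the second read would have passed, since the codeword at 0x20 is internally consistent.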

Write address errors are even more severe!

Lots of problems possible with command/address


GPUs 2

ActivePointers: Software Address Translation Layer on GPUs

Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit


Agile Paging: Exceeding…

Virtualization has two problems:

Nested paging-> slower walks

Shadow paging-> fast walks, slow page table updates

Use shadow paging most of the time, and switch to a nested page walk partway through the walk when needed.


Nested paging is better if the address space is being frequently switched! However, only a fraction of the address space is dynamic.

Only a few entries of the page table are modified frequently!

Start page walk in shadow mode, optionally switch to nested mode.

One bit of the page table entry signifies that the lower levels require nested paging mode. (Do in nested mode!)
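
The mode switching during a walk can be sketched as below. It only tracks which mode each level is walked in; the function name and the list-of-bits representation are assumptions for illustration, and no real page table format is modelled.

```python
def walk_modes(switch_bits):
    """Return the mode used at each level of one page walk.

    switch_bits[i] is the switch bit read from the entry at level i
    (root first). The walk starts in shadow mode; once an entry with
    the bit set is seen, all lower levels are walked in nested mode.
    """
    mode = "shadow"
    modes = []
    for bit in switch_bits:
        modes.append(mode)
        if bit:
            mode = "nested"
    return modes

print(walk_modes([0, 0, 0, 0]))  # ['shadow', 'shadow', 'shadow', 'shadow']
print(walk_modes([0, 0, 1, 0]))  # ['shadow', 'shadow', 'shadow', 'nested']
```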

CASH: Supporting IaaS Customers with a Sub-core Configurable Architecture

Sharing architecture -> for each 10-million-cycle execution window, the maxima were different when the number of cache banks and the number of slices (cores) were different.

Learning optimizer, has a feedback mechanism that goes through a Kalman filter(?)

