DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric
Low-power, high-density reconfigurable RAM.
A low-power, high-capacity FPGA is desirable; use DRAM instead of SRAM lookup tables to provide both low power and high capacity
Challenges of building DRAM-based FPGA
A DRAM LUT is slower than an SRAM LUT, and access is destructive (data is lost after access)
Narrower MAT (DRAM array): from 1K -> 8~16 bits
Destructive DRAM read is solved by a PRE(charge), ACT(ivation), RST(restore) sequence
followed by a wire transfer. These phases are sequential, but can be overlapped.
For example, RST can be overlapped with the wire transfer, among other overlaps.
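A minimal sketch of why the overlap matters, assuming a simple additive latency model. The function name `access_time` and all four stage latencies are hypothetical illustration values, not figures from the DRAF paper.

```python
def access_time(pre, act, rst, wire, overlap_rst_with_wire=False):
    """Latency of one DRAM LUT access, in arbitrary time units.

    pre, act, rst, wire are hypothetical stage latencies (assumptions).
    """
    if overlap_rst_with_wire:
        # Restore and wire transfer run in parallel after activation,
        # so only the longer of the two is on the critical path.
        return pre + act + max(rst, wire)
    # Fully sequential: every phase waits for the previous one.
    return pre + act + rst + wire


sequential = access_time(pre=2, act=3, rst=3, wire=2)
overlapped = access_time(pre=2, act=3, rst=3, wire=2, overlap_rst_with_wire=True)
print(sequential, overlapped)
```

With these made-up numbers the overlapped access hides the wire transfer entirely behind the restore phase.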
Mellow Writes: Extending Lifetime in Resistive Memories through Selective Slow Write Backs
Resistive Memory Write:
Slower writes have endurance benefits (knife sharpening example)
Choose the write speed adaptively.
Make use of idle memory(?) to write back slowly
Bank-aware mellow writes: pick banks with fewer blocks waiting in the memory controller's writeback queue. For those relatively free banks, issue slow writes.
Performance degradation is not noticeable, endurance improved by 87%
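The bank-aware policy above can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the function name `choose_write_speeds` and the `threshold` cutoff are hypothetical, and the write queue is modeled as a plain list of bank ids.

```python
from collections import Counter


def choose_write_speeds(write_queue, num_banks, threshold=1):
    """Pick a write speed per bank from the pending writeback queue.

    write_queue: list of bank ids, one entry per queued writeback.
    threshold: hypothetical cutoff; a bank with at most this many
               queued writebacks is treated as relatively free.
    Returns {bank: "slow" | "normal"} for banks that have pending writes.
    """
    pending = Counter(write_queue)
    speeds = {}
    for bank in range(num_banks):
        count = pending.get(bank, 0)
        if count == 0:
            continue  # nothing to write back to this bank
        # Lightly loaded banks can afford the longer slow write.
        speeds[bank] = "slow" if count <= threshold else "normal"
    return speeds


speeds = choose_write_speeds([0, 0, 0, 1, 3, 3], num_banks=4)
print(speeds)
```

Here bank 1 has a single queued block and gets a slow write, while the busier banks 0 and 3 keep normal-speed writes.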
Eager Mellow Writes:
Predict which dirty LLC lines will not be dirtied again, and write them back slowly to ReRAM
Does some epoch counting(?) to find cachelines…
Add an eager mellow write queue: lowest priority, but it uses memory bandwidth to write back eagerly.
Eager writeback also improves performance as it reduces write queue congestion!!
Also employs a lifetime quota, where a target lifetime is enforced.
A slower write uses more energy!
MITTS: Memory Inter-arrival Time Traffic Shaping
CPU initiated memory bandwidth provisioning.
IaaS providers can charge users for memory bandwidth usage (and arrival time)
HW mechanism to provision memory BW (Bulky vs. Bursty bandwidth)
Measure relative memory request inter-arrival times and bin them into a histogram.
Credits per inter-arrival-time bin. If you use up the credits in a bin, you need to wait and use the credit bin for the next (longer) inter-arrival time
An array of registers represents the credits in each bin. Also, a replenisher refills the bins
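A minimal software sketch of the credit-bin idea, under stated assumptions. The class `CreditShaper`, its method names, the fixed `bin_width`, and the policy of measuring the gap from the last issued request are all hypothetical; the real mechanism is hardware registers and may differ in detail.

```python
class CreditShaper:
    """Credit bins indexed by inter-arrival time (illustrative only)."""

    def __init__(self, bin_width, credits_per_bin):
        self.bin_width = bin_width            # time span covered by one bin
        self.initial = list(credits_per_bin)  # values the replenisher restores
        self.credits = list(credits_per_bin)  # the "array of registers"
        self.last_issue = 0                   # time of the last issued request

    def replenish(self):
        """Refill every bin to its configured credit count."""
        self.credits = list(self.initial)

    def try_issue(self, now):
        """Issue a request at time `now` if its bin still has credit."""
        gap = now - self.last_issue
        # Gaps beyond the last bin all fall into the last bin.
        idx = min(gap // self.bin_width, len(self.credits) - 1)
        if self.credits[idx] == 0:
            return False  # wait: a longer gap maps to a later bin
        self.credits[idx] -= 1
        self.last_issue = now
        return True


shaper = CreditShaper(bin_width=10, credits_per_bin=[1, 1])
results = [shaper.try_issue(t) for t in (3, 5, 14, 30)]
print(results)
```

In this run the second request arrives too soon after the first and finds the short-gap bin empty, so it must wait until its gap falls into the next bin.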
All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory
Command/Address (CA) transfer failed to speed up with newer DRAM specs.
CA is more prone to errors than data, due to DDR DIMM technology
CA-parity was introduced in DDR4.
Read address error! (The wrong codeword is read: data and data ECC are valid within themselves, but it is the wrong codeword!)
Extended data ECC -> the address is also encoded into the ECC.
Write address errors are even more severe!
Lots of problems possible with command/address
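To illustrate why folding the address into the check bits catches a read address error, here is a small sketch. It uses CRC-32 from Python's `zlib` as a stand-in for a real ECC code (an assumption: CRC only detects, it does not correct), and the helper names `check_bits`, `make_word`, and `verify` are hypothetical.

```python
import zlib


def check_bits(address, data, include_address):
    """Compute check bits over the data, optionally covering the address."""
    payload = data
    if include_address:
        payload = address.to_bytes(8, "little") + data
    return zlib.crc32(payload)


def make_word(address, data, include_address):
    """What gets stored at `address`: the data plus its check bits."""
    return data, check_bits(address, data, include_address)


def verify(requested_address, word, include_address):
    """Check a returned word against the address the CPU asked for."""
    data, stored_check = word
    return check_bits(requested_address, data, include_address) == stored_check


# Read address error: the CPU asks for 0x100 but DRAM returns the word at 0x140.
plain = make_word(0x140, b"payload!", include_address=False)
extended = make_word(0x140, b"payload!", include_address=True)

missed = verify(0x100, plain, include_address=False)      # data-only check
caught = not verify(0x100, extended, include_address=True)
print(missed, caught)
```

The data-only check passes because the returned codeword is internally consistent, while the address-covering check fails because the requested address does not match the one the word was encoded with.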
ActivePointers: Software Address Translation Layer on GPUs
Virtual Thread: Maximizing Thread-Level Parallelism beyond GPU Scheduling Limit
Agile Paging: Exceeding…
Virtualization has two problems:
Nested paging -> slower walks
Shadow paging -> fast walks, slow page table updates
Use shadow paging for most of the walk, then switch to a nested page walk partway through the walk.
Nested paging is better if the address space is being frequently switched! However, only a fraction of the address space is dynamic.
Only a few entries of the page table are modified frequently!
One bit of the page table entry signifies that the lower levels require nested paging mode. (Do in nested mode!)
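The switching bit can be sketched as a mode change during the walk. This is a simplified illustration under assumptions: the function `walk_modes` is hypothetical, the walk is modeled only as a list of per-level switch bits (top level first), and no walk cost is modeled.

```python
def walk_modes(switch_bits):
    """Return the paging mode used to read the entry at each level.

    switch_bits[i] is the switch bit found in the entry read at level i.
    The walk starts in shadow mode; once an entry has the bit set,
    all lower levels are walked in nested mode.
    """
    modes = []
    mode = "shadow"
    for bit in switch_bits:
        modes.append(mode)
        if bit:
            mode = "nested"  # applies to the levels below this entry
    return modes


static_region = walk_modes([False, False, False, False])
dynamic_leaf = walk_modes([False, False, True, False])
print(static_region)
print(dynamic_leaf)
```

A translation in a stable part of the address space stays in shadow mode for the whole walk, while one whose lowest level changes often only pays the nested cost for that level.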
CASH: Supporting IaaS Customers with a Sub-core Configurable Architecture
Sharing architecture -> across 10-million-cycle execution intervals, the maxima differed as the number of cache banks and slices (cores) changed.
Learning optimizer with a feedback mechanism that goes through a Kalman filter(?)