LAP: Loop-Block Aware Inclusion Properties for Energy-Efficient Asymmetric Last Level Caches
Adaptive selection between inclusive and exclusive cache based on the LLC misses and memory traffic?
This work is of interest, and I should take a look at it.
Loop blocks are used to guide the adaptive inclusive/exclusive policy.
Short-Circuit Dispatch: Accelerating Virtual Machine Interpreters on Embedded Processors
Scripting languages offer ease of programming and natural support for the event-driven programming model. However, they are too slow.
The recurring inefficiency is the bytecode dispatch loop.
Each iteration fetches a bytecode, decodes it, does bounds checking, calculates the jump address, jumps, and executes the bytecode.
This dispatcher code takes 10-30% of the total instruction count.
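To make the dispatch steps concrete, here is a minimal sketch of an interpreter loop for a toy stack machine. The opcode numbering, the `(opcode, operand)` encoding, and the handler names are all made up for illustration and are not taken from the paper.

```python
# Toy stack-machine interpreter. Opcodes, encoding, and handlers are
# hypothetical; they only illustrate the dispatch steps from the notes.
PUSH, ADD, MUL, HALT = 0, 1, 2, 3


def op_push(stack, operand):
    stack.append(operand)


def op_add(stack, _operand):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def op_mul(stack, _operand):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


# The "jump table": index = opcode, value = handler.
HANDLERS = [op_push, op_add, op_mul]


def run(code):
    """Execute a list of (opcode, operand) pairs and return the top of stack."""
    stack, pc = [], 0
    while True:
        word = code[pc]                      # fetch the bytecode
        opcode, operand = word               # decode it
        pc += 1
        if opcode == HALT:
            return stack[-1]
        if not 0 <= opcode < len(HANDLERS):  # bounds check
            raise ValueError(f"bad opcode {opcode}")
        handler = HANDLERS[opcode]           # jump address calculation
        handler(stack, operand)              # jump + execute the bytecode


program = [(PUSH, 2), (PUSH, 3), (ADD, 0), (PUSH, 4), (MUL, 0), (HALT, 0)]
result = run(program)  # computes (2 + 3) * 4
```

Everything between the fetch and the handler call is repeated for every bytecode, which is the overhead these notes refer to.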
There are a few problems:
- Indirect jumps are hard to predict.
- Redundant calculations
This work solves the two problems above by using the BTB with the bytecode as the key (not the PC!)
- A hit short-circuits to the correct bytecode handler
- A miss falls back to the original slow path
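The hit/miss behavior above can be sketched in software as a small table keyed by opcode. This is only a rough model under my own assumptions: the direct-mapped organization, the table size, and the names `BytecodeBTB`, `slow_path`, and `dispatch` are hypothetical and do not describe the paper's hardware design.

```python
class BytecodeBTB:
    """Tiny direct-mapped table mapping opcode -> handler (hypothetical model)."""

    def __init__(self, entries=4):
        self.entries = entries
        self.table = [None] * entries  # each slot holds (opcode, handler)
        self.hits = 0
        self.misses = 0

    def lookup(self, opcode):
        slot = self.table[opcode % self.entries]
        if slot is not None and slot[0] == opcode:
            self.hits += 1
            return slot[1]
        self.misses += 1
        return None

    def insert(self, opcode, handler):
        self.table[opcode % self.entries] = (opcode, handler)


def slow_path(opcode, handlers):
    """Original dispatch: bounds check plus jump-table lookup."""
    if not 0 <= opcode < len(handlers):
        raise ValueError(f"bad opcode {opcode}")
    return handlers[opcode]


def dispatch(opcode, handlers, btb):
    handler = btb.lookup(opcode)      # keyed by the bytecode, not the PC
    if handler is None:               # miss: fall back to the slow path
        handler = slow_path(opcode, handlers)
        btb.insert(opcode, handler)   # so the next occurrence can hit
    return handler


# Handlers are plain strings here to keep the sketch short.
handlers = ["handle_0", "handle_1", "handle_2"]
btb = BytecodeBTB()
trace = [1, 1, 2, 1, 2]
targets = [dispatch(op, handlers, btb) for op in trace]
```

On a hit, both the bounds check and the table lookup are skipped, which is the sense in which one mechanism addresses both problems listed above.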
A Measurement Study of ARM Virtualization Performance
KVM & Xen were used to compare ARM & x86
ARM on Xen can be 4x faster for hypercalls
but 4x slower on KVM
This is because Xen is a bare-metal hypervisor.
KVM is a type 2 (hosted) hypervisor. It runs apps & virtual machines side by side.
ARM EL2 (hypervisor privilege) is designed for a simple hypervisor (Xen)
KVM does a lot better for virtualized I/O: the hosted hypervisor runs at the same level as Linux, which is sophisticated enough to execute I/O
Xen requires switching VM -> Xen -> Dom0 -> Xen -> HW, thus a lot of traps!
VHE (Virtualization Host Extensions) allows KVM ARM to run in EL2 (hypervisor privilege)