Electrical Engineering and Computer Science


Defense Event & Computer Engineering Seminar

Physically Dense Server Architectures

Anthony Thomas Gutierrez


 
Tuesday, January 20, 2015
11:00am - 1:00pm
3725 Beyster Bldg.

About the Event

Distributed, in-memory key-value stores have emerged as one of today’s most important data center workloads. Because they are critical to the scalability of modern web services, vast resources are dedicated solely to key-value stores to ensure that quality-of-service guarantees are met. These resources include many server racks to store terabytes, possibly petabytes, of key-value data; the power necessary to run all of the machines; networking equipment and bandwidth; and the data center warehouses used to house the racks. There is, however, a mismatch between the key-value store software and the commodity servers on which it runs, leading to inefficient use of resources. The primary cause of this inefficiency is the overhead incurred from processing individual network packets, which typically carry small payloads of less than a few kilobytes and require minimal compute resources. Thus, one of the key challenges as we enter the peta-scale era is how best to adjust to the paradigm shift from compute-centric data centers to storage-centric data centers.

This dissertation presents a hardware/software solution that addresses the inefficiencies of the modern data centers on which key-value stores are currently deployed. First, it proposes two physical server designs, both of which use 3D-stacking technology and low-power CPUs to improve density and efficiency. The first 3D architecture, Mercury, consists of stacks of low-power CPUs with 3D-stacked DRAM, as well as NICs. The second architecture, Iridium, replaces DRAM with 3D NAND Flash to improve density.

The second portion of this dissertation proposes an enhanced version of the Mercury server design, called KeyVault, that incorporates integrated, zero-copy network interfaces along with an integrated switching fabric. To fully utilize the integrated networking hardware and to reduce the response time of requests, a custom networking protocol is proposed.
Unlike prior work on accelerating key-value stores (e.g., by completely bypassing the CPU and OS when processing requests), this work bypasses the CPU and OS only when placing network payloads into a process’s memory. The insight is that, because most of the overhead comes from processing packets in the OS kernel and not from the request processing itself, direct placement of a packet’s payload is sufficient to provide higher throughput and lower latency than prior approaches. The need for complex hardware or software is also eliminated.

Additional Information

Sponsor(s): Professor Trevor N. Mudge

Open to: Public