EPFL IC Systems Seminar

Rethinking the GPU Execution Model



Abstract

Graphics processing units (GPUs) have become the architecture of choice for achieving high throughput in general-purpose computing. Thread-level parallelism (TLP) in GPUs is exploited by concurrently executing a large number of threads. However, GPUs often cannot reach their theoretical peak performance. I found that the critical performance bottlenecks on GPUs are 1) limited memory system performance and 2) limited thread scheduling resources and register file capacity.
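
As background for these bottlenecks, the following is a minimal CUDA sketch of the GPU execution model the abstract refers to: a kernel is launched with far more threads than there are cores, and the hardware hides memory latency by switching among them. The kernel name, data, and launch sizes here are illustrative choices, not taken from the talk.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Each thread processes one element. TLP comes from launching many more
    // threads than cores, so the scheduler can swap in ready warps while
    // others wait on memory.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;  // one million elements (illustrative size)
        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        vecAdd<<<blocks, threadsPerBlock>>>(a, b, c, n);  // ~1M threads in flight
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);  // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Whether such a launch actually reaches peak throughput depends on exactly the two factors the abstract names: how well the memory system serves the threads' loads, and how many threads the scheduler and register file can keep resident.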

In this talk, I will describe the GPU execution model and the two performance bottlenecks above in detail. I will then present two solutions that address these challenges. The first is a new GPU architecture, called Adaptive PREfetching and Scheduling (APRES), that overcomes limited memory system performance by improving cache efficiency on GPUs. The second, called FineReg, provides a way to schedule more threads than the GPU's scheduling resources and register file would otherwise allow.
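
To make the second bottleneck concrete, the sketch below uses the standard CUDA occupancy query to show how per-thread register usage, rather than core count, can cap the number of resident threads; this is the general limit FineReg targets, and the kernel heavyKernel and its register pressure are illustrative assumptions, not part of FineReg itself.

    #include <cstdio>
    #include <cuda_runtime.h>

    // A register-hungry kernel: the unrolled accumulator array keeps many
    // values live at once, so the compiler assigns many registers per thread.
    __global__ void heavyKernel(float* out, const float* in) {
        int tid = blockIdx.x * blockDim.x + threadIdx.x;
        float r[32];
        #pragma unroll
        for (int i = 0; i < 32; ++i) r[i] = in[tid * 32 + i] * 1.5f;
        float s = 0.0f;
        #pragma unroll
        for (int i = 0; i < 32; ++i) s += r[i] * r[i];
        out[tid] = s;
    }

    int main() {
        int blockSize = 256, residentBlocks = 0;
        // Ask the runtime how many blocks of this kernel fit on one SM.
        // With high register use per thread, the register file becomes the
        // limiting resource, so fewer threads can be scheduled concurrently.
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&residentBlocks,
                                                      heavyKernel, blockSize, 0);
        printf("Resident blocks per SM at %d threads/block: %d\n",
               blockSize, residentBlocks);
        return 0;
    }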

Biography

Yunho completed his Ph.D. in the School of Electrical and Electronic Engineering at Yonsei University in August 2018. His research interests include high-performance GPU architecture, in-storage processing architecture, energy-efficient large-scale system architectures, and database processing systems. From 2016 to 2017, Yunho was a visiting graduate scholar at the University of Southern California. From 2011 to 2014, he worked as a software engineer in the Mobile Communications Business at Samsung Electronics.