Rethinking the GPU Execution Model

Yunho Oh
School of Electrical and Electronic Engineering at Yonsei University
Thursday, November 15, 2018 @ 11:15 am
BC 420
Hosted by: Prof. Babak Falsafi
Coffee, tea and croissants will be available before the talk, from 11:00 am.

Abstract

Graphics processing units (GPUs) have become the architecture of choice for achieving high throughput in general-purpose computing. GPUs exploit thread-level parallelism (TLP) by executing a large number of threads concurrently. However, GPUs often fail to reach their theoretical peak performance. I found that the critical performance bottlenecks on GPUs are 1) limited memory system performance and 2) limited thread-scheduling resources and register file capacity.

In this talk, I will describe the GPU execution model and these two performance bottlenecks in detail. Then, I will introduce two solutions that address them. First, I will introduce a new GPU architecture, called Adaptive PREfetching and Scheduling (APRES), that overcomes the limited memory system performance by improving cache efficiency on GPUs. Second, I will introduce another work, called FineReg, that allows GPUs to schedule threads beyond the limits of their scheduling resources and register file.

Biography

Yunho completed his Ph.D. in the School of Electrical and Electronic Engineering at Yonsei University in August 2018. His research interests include high-performance GPU architecture, in-storage processing architecture, energy-efficient large-scale system architectures, and database processing systems. From 2016 to 2017, Yunho was a visiting graduate scholar at the University of Southern California. From 2011 to 2014, he worked as a software engineer in the Mobile Communications Business at Samsung Electronics.