Parallel memory prediction for fused linear algebra kernels

Overview

abstract

The performance of many scientific programs is limited by data movement. Loop fusion is one optimization used to increase the speed of memory-bound operations. To automate loop fusion for matrix computations, we developed the Build to Order (BTO) compiler. Within BTO, an analytic memory model efficiently and accurately reduces the number of serial loop fusion options considered. In this paper, we extend the model to shared-memory parallel machines. We detail the differences between parallel and serial memory use and runtime prediction, and explain the changes made to include parallel machines in the model. Analysis of the parallel model's predictions shows that, when it is included in BTO, it will reduce the search space of considered routines.

CU Boulder Authors

Jessup, Elizabeth R

publication date

March 29, 2011

has restriction

closed

Date in CU Experts

October 17, 2013 10:46 AM

Full Author List

Karlin I; Jessup E; Belter G; Siek JG

author count

4

published in

Performance Evaluation Review Journal

Other Profiles

International Standard Serial Number (ISSN)

0163-5999

Digital Object Identifier (DOI)

https://doi.org/10.1145/1964218.1964226

Additional Document Info

start page

43

end page

49

volume

38

issue

4
