Papers
A. R. Brodtkorb, C. Dyken, T. R. Hagen, J. M. Hjelmervik and O. O. Storaasli, State-of-the-Art in Heterogeneous Computing,
Accepted for publication in Journal of Scientific Programming.
[BibTeX]
[Draft (PDF)]
Abstract: Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to
traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained
parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a
good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing,
focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and
field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art
techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give
our view on the future of heterogeneous computing.

A. R. Brodtkorb, An Asynchronous API for Numerical Linear Algebra, Scalable Computing: Practice and Experience, special issue on Recent Developments in Multi-Core Computing Systems, 9(3) (2008), pp. 153--163.
[BibTeX]
[Paper (on SCPE)]
Abstract: We present a task-parallel asynchronous API for numerical linear algebra that utilizes multiple CPUs, multiple GPUs, or a combination of both. Furthermore, we present a wrapper of this interface for use in MATLAB. Our API imposes only small overheads, scales perfectly to two processor cores, and shows even better performance when utilizing computational resources on the GPU.
A. R. Brodtkorb and T. R. Hagen,
A Comparison of Three Commodity-Level Parallel Architectures: Multi-core CPU, the Cell BE and the GPU, Seventh International Conference on Mathematical Methods for Curves and Surfaces, 2008
[BibTeX]
[Draft (PDF)]
Abstract: We explore three commodity parallel architectures: multi-core CPUs, the Cell BE processor, and graphics processing units. We have implemented four algorithms on these three architectures: solving the heat equation, inpainting using the heat equation, computing the Mandelbrot set, and MJPEG movie compression. We use these four algorithms to exemplify the benefits and drawbacks of each parallel architecture.
A. R. Brodtkorb, The Graphics Processor as a Mathematical Coprocessor in MATLAB, CISIS 2008 The Second International Conference on Complex, Intelligent and Software Intensive Systems, pages 822-827, March 2008.
[BibTeX]
[Paper (DOI link)]
[Draft (PDF)]
Abstract: We present an interface to the graphics processing unit (GPU) from MATLAB, and four algorithms from numerical linear algebra available through this interface: matrix-matrix multiplication, Gauss-Jordan elimination, PLU factorization, and tridiagonal Gaussian elimination. In addition to being a high-level abstraction to the GPU, the interface offers background processing, enabling computations to be executed on the CPU simultaneously. The algorithms are shown to be up to 31 times faster than highly optimized CPU code. The algorithms have only been tested on single precision hardware, but will easily run on new double precision hardware.
A. R. Brodtkorb, A MATLAB Interface to the GPU, Master’s thesis,
Department of Informatics, Faculty of Mathematics and Natural Sciences, University of Oslo, May 2007.
[BibTeX]
[Thesis (on DUO)]
[Thesis (PDF)]
Abstract: This thesis delves into the field of general purpose computation on graphics processing units (GPGPU). A MATLAB interface for solving numerical linear algebra on the graphics processing unit (GPU), and three algorithms from numerical linear algebra are presented. The algorithms are shown to be faster than the highly efficient ATLAS implementations used in MATLAB. In addition, the interface allows background processing on the GPU, enabling it to be used as a mathematical coprocessor. The computations are shown to be sufficiently accurate, and an implicit solver for the shallow water equations is demonstrated in which the CPU and the GPU are both utilized for maximum performance. A comparison of the interface and other high-level languages for GPGPU is also presented.
A. R. Brodtkorb, T. Fladby and M. L. Sætra, PLU factorization on a Cluster of GPUs Using Fast Ethernet, White paper, 2007.
[BibTeX]
[Paper (PDF)]
Abstract: In this white paper, we present a novel approach to solving linear systems of equations on a cluster using the PLU factorization. We use the graphics processing unit (GPU) as the main computational engine at each node, and a block-cyclic data distribution to solve the system. The local computation is a new way of computing the PLU factorization on the GPU. It utilizes the full four-way vectorized arithmetic found in most GPUs, and a new pivoting strategy. The global algorithm uses the message passing interface (MPI) for communication between nodes. We show that our algorithm is highly efficient on the local nodes, but bounded by the relatively slow network. A faster network will eliminate this bottleneck, and the speed of the local computations shows promising results.
A. R. Brodtkorb, Matrix-Matrix Multiplication in MATLAB using the GPU, White paper, 2006.
[BibTeX]
[Paper (PDF)]
Abstract: The use of GPUs as the main computing resource has yielded great speed-up factors in several fields including solving differential equations, linear algebra, signal processing and database queries. There have been several attempts at implementing efficient algorithms for matrix-matrix products with varying results. In-depth analysis of the algorithms has been presented as well. In this paper I review the work done in the field, and present a crude implementation of matrix-matrix products using the GPU. The implementation is run in MATLAB.
Posters
A. R. Brodtkorb, T. R. Hagen, K.-A. Lie, and J. R. Natvig
Efficient GPU-based Algorithms for Solving Systems of Conservation Laws, VERDIKT Program Conference, November 2009.
A. R. Brodtkorb,
Efficiency of Commodity-Level Parallel Architectures, VERDIKT Program Conference, October 2008.
A. R. Brodtkorb, A MATLAB Interface to the GPU, Poster, VERDIKT Program Conference, October 2007.