Many-core Algorithms for High-order Finite Element Methods: When Time to Solution Matters

SCAIM Seminar
September 12, 2013 7:30 pm

Speaker:  Tim Warburton, Department of Computational and Applied Mathematics, Rice University

Location:  ESB 4133

Intended Audience:  Public

The ultimate success of many modeling applications depends on time to solution. I will illustrate the critical nature of time to solution by describing a joint project between my group at Rice University and Dr David Fuentes at the MD Anderson Cancer Center. The project goal is to evaluate the role and viability of using finite element modeling as part of the treatment planning process for MR-Guided Laser-Induced Thermal Therapy. The success of this project will depend in great part on the ability to model individual treatments with calculations that take mere seconds.

Modern many-core processing units, including graphics processing units (GPUs), presage a new era of on-chip massively parallel computing. The advent of processors with O(1000) floating point units (FPUs) raises new issues that challenge conventional measures of “optimality” of numerical methods. Over the past four years, the ramp-up in FPU count with each new GPU generation has been accompanied by a slower increase in GPU memory bandwidth. For example, a few hundred US dollars currently buys a parallel computer capable of performing O(4 · 10^12) floating point operations per second, but of reading only O(5 · 10^10) values from memory per second, roughly 80 operations for every value loaded. From the point of view of numerical analysis, this means that the traditional approach of comparing the optimality of alternative numerical methods by their floating point operation count per degree of freedom has become mostly irrelevant. Claims of optimality derived from this measure therefore need to be reevaluated, and the formulation of numerical methods in general needs to be revisited given the changing computational landscape.
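To make the imbalance concrete, here is a short Python sketch that computes the machine balance implied by the order-of-magnitude figures quoted above, together with a simple roofline bound on attainable performance. The roofline model, the helper names `machine_balance` and `attainable_flops`, and the sample intensities are illustrative assumptions introduced here, not material from the talk.

```python
# Rough flop-to-memory balance for the inexpensive GPU described above.
# The two peak figures are the order-of-magnitude values quoted in the abstract,
# not measurements of any particular device.
PEAK_FLOPS = 4e12  # floating point operations per second
PEAK_LOADS = 5e10  # values read from memory per second


def machine_balance(peak_flops: float, peak_loads: float) -> float:
    """Flops the chip can perform for each value it reads from memory."""
    return peak_flops / peak_loads


def attainable_flops(intensity: float, peak_flops: float, peak_loads: float) -> float:
    """Simple roofline bound (an assumed model): a kernel that performs
    `intensity` flops per value loaded runs no faster than the smaller of
    the compute limit and the memory-traffic limit."""
    return min(peak_flops, intensity * peak_loads)


if __name__ == "__main__":
    balance = machine_balance(PEAK_FLOPS, PEAK_LOADS)
    print(f"machine balance: {balance:.0f} flops per value loaded")

    # Hypothetical kernels with increasing arithmetic intensity.
    for intensity in (1, 10, 80, 200):
        frac = attainable_flops(intensity, PEAK_FLOPS, PEAK_LOADS) / PEAK_FLOPS
        print(f"intensity {intensity:>3}: at most {frac:.2%} of peak")
```

With these figures, a kernel that performs one floating point operation per value it reads is capped at about 1.25% of peak, and only kernels doing roughly 80 or more operations per value can in principle keep the FPUs busy. This is one way to see why flop count per degree of freedom alone is a poor proxy for time to solution on such hardware.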

In 2009 we demonstrated that the nodal discontinuous Galerkin time-domain method for computational electromagnetics can achieve a high percentage of the peak performance of NVIDIA GPUs [1]. In subsequent articles we demonstrated that the time-stepping efficiency of these methods can be improved by using local time-stepping methods on GPUs [2, 3]. We also demonstrated scalability on multiple GPUs in a workstation [4] and on over 400 GPUs in a larger-scale cluster [5]. Finally, we created a new variant of the discontinuous Galerkin method that enables the use of curvilinear elements to accurately model non-planar domain boundaries within the restricted memory budget of current GPUs [6]. The most challenging aspect of this effort is not, in fact, the implementation but rather the analysis of the new method [7, 8].
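The memory pressure that curvilinear elements create can be illustrated with a back-of-the-envelope estimate. This sketch assumes, for illustration only, a conventional curvilinear nodal DG scheme on tetrahedra that stores one dense N_p x N_p double-precision matrix (for example, an element-specific mass-matrix inverse) for every curved element, and compares it with a scheme that shares a single reference matrix across all elements. The polynomial order N = 5, the count of one million curved elements, and the function names are hypothetical choices; matrix symmetry is ignored.

```python
# Back-of-the-envelope storage estimate for curved tetrahedral elements.
# Assumption: a conventional curvilinear DG scheme keeps one dense
# N_p x N_p double-precision matrix per curved element.
BYTES_PER_DOUBLE = 8


def nodes_per_tet(order: int) -> int:
    """Number of nodes N_p of a degree-`order` nodal basis on a tetrahedron."""
    return (order + 1) * (order + 2) * (order + 3) // 6


def per_element_matrix_bytes(order: int, n_curved: int) -> int:
    """Storage if every curved element keeps its own dense N_p x N_p matrix."""
    n_p = nodes_per_tet(order)
    return n_curved * n_p * n_p * BYTES_PER_DOUBLE


def shared_matrix_bytes(order: int) -> int:
    """Storage if all elements share a single reference N_p x N_p matrix."""
    n_p = nodes_per_tet(order)
    return n_p * n_p * BYTES_PER_DOUBLE


if __name__ == "__main__":
    order, n_curved = 5, 1_000_000  # hypothetical discretization
    print(f"N_p = {nodes_per_tet(order)}")
    print(f"per-element matrices: {per_element_matrix_bytes(order, n_curved) / 2**30:.1f} GiB")
    print(f"shared reference matrix: {shared_matrix_bytes(order) / 2**10:.1f} KiB")
```

Under these assumptions N_p = 56, the per-element matrices occupy roughly 23 GiB, and a single shared reference matrix needs 24.5 KiB. The gap suggests why element-specific operators sit uneasily with the memory budget of a single GPU, which is the constraint the low-storage curvilinear variant in [6] is described as working within.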

The presentation will touch on several important and interlinked issues that impacted the development of high-order finite element methods, including spectral element and discontinuous Galerkin based solvers, for a moving many-core architecture target. We will discuss on-chip scalability, multi-GPU scalability, inter-generational GPU scaling, specialization for different element types, and how we modified the solver memory requirements. Finally, we will discuss programming tools that we are currently developing to enhance the productivity of programmers engaged in this type of implementation task.

References

[1]  A. Klöckner, T. Warburton, J. Bridge, and J.S. Hesthaven. Nodal discontinuous Galerkin methods on graphics processors. J. Comp. Phys., 228:7863-7882, 2009.

[2]  N. Gödel, S. Schomann, T. Warburton, and M. Clemens. GPU Accelerated Adams-Bashforth Multirate Discontinuous Galerkin Simulation of High Frequency Electromagnetic Fields. IEEE Transactions on Magnetics, 46(8):2735-2738, 2010.

[3]  S. Schomann, N. Gödel, T. Warburton, and M. Clemens. Local Time-stepping Techniques Using Taylor Expansion for Modeling Wave Propagation with Discontinuous Galerkin FEM. IEEE Transactions on Magnetics, 46(8):3504-3507, 2010.

[4]  N. Gödel, N. Nunn, T. Warburton, and M. Clemens. Accelerating Multi GPU Based Discontinuous Galerkin FEM Computations for Electromagnetic Radio Frequency Problems. Applied Computational Electromagnetics Society Journal, 25(4):331-338, 2010.

[5]  C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, A. Klöckner, G. Stadler, T. Warburton, and L. C. Wilcox. Extreme-Scale AMR. ACM/IEEE SC Conference Series, 2010.

[6]  T. Warburton. A low storage curvilinear discontinuous Galerkin time-domain method for electromagnetics. 2010 URSI International Symposium on Electromagnetic Theory (EMTS), pp. 996-999, IEEE, 2010.

[7]  T. Warburton. Aspects of the a priori convergence analysis for the low storage curvilinear discontinuous Galerkin method. Extended abstract submitted to Oberwolfach Workshop on the Theory and Applications of Discontinuous Galerkin Methods, MFO Technical Report 1208a, 2012.

[8]  T. Warburton. A Low Storage Curvilinear Discontinuous Galerkin Method for Wave Problems. SIAM Journal on Scientific Computing, 35(4):A1987-A2012, 2013.