
(1) Numerical algorithms

[PIs: Griebel, Klein, Lippert, Luu, Meißner, Mukherjee, Neese, Reuter, Schweitzer, Urbach]

With the advent of exascale systems in the coming years, there is a substantial need for improved or completely new parallel numerical algorithms. Conventional parallelization techniques cannot simply be scaled up on such future compute systems to achieve fast execution times: either existing approaches must be substantially modified, or completely new parallelization paradigms must be developed to cope with memory access times, fault tolerance and resilience. Also, since the cost of data movement will not improve as much as the cost of floating point operations, exascale computing will in addition require new algorithmic approaches that minimize the number and volume of data movements, even at the expense of an increased operation count. Moreover, a substantial relative reduction in memory per processor is to be expected on exascale systems, which renders many current parallel scaling approaches useless. Within the cluster we will address these algorithmic challenges for a number of specific application algorithms. To this end, we will consider lattice QCD algorithms on the one hand and, on the other hand, so-called multigrid or multilevel techniques for the efficient solution of large sparse linear systems arising in lattice simulations in physics, in DFT calculations in computational chemistry, and in partial differential equation solvers in the life sciences.

First, besides fault-tolerant and resilience-aware Monte Carlo (MC) and quasi-MC methods, we will develop parallel sparse grid methods for lattice QCD and path integration on exascale systems. Exploiting the U(3) and SU(3) structure of many lattice problems will make it possible to obtain exponential convergence rates, provided that certain additional prerequisites, such as a small correlation length or analytic smoothness, are present. Moreover, for such high-dimensional integration problems, the so-called combination technique, a variant of the sparse grid method, introduces a second, numerically decoupled level of parallelism that ensures high scalability beyond conventional domain decomposition (see the first sketch below). In this way it becomes possible to deal with faults in an algorithm-based manner, without the need for expensive checkpoint-restart. This extends fault tolerance for higher-dimensional problems to all levels of parallelization and to the detection and treatment even of silent failures due to data corruption.

Second, for the fast solution of large sparse linear systems we will address multigrid and multilevel methods, which provide scalable solvers of optimal complexity in the first place. Again, new parallelization techniques will be developed and studied that account for memory access times, fault tolerance, resilience and the minimization of data movement in multilevel and (algebraic) multigrid methods. An important contribution here will be the view of a multilevel method as an additive subspace iteration over a generating system (see the second sketch below). In this way, even if some update calculations are missing or corrupted due to faults in the exascale hardware, the overall iterative parallel solver can still run, albeit at a slightly reduced convergence rate.
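To illustrate the decoupled second level of parallelism and the algorithm-based fault tolerance of the combination technique, the following minimal Python sketch applies the classical two-dimensional combination formula Q_n = sum_{l1+l2=n} Q_{l1,l2} - sum_{l1+l2=n-1} Q_{l1,l2} to a quadrature toy problem. The midpoint rule, the function names, the failure model and all parameters are illustrative assumptions, not project code; a lattice QCD path integral would replace these component quadratures with far more elaborate component jobs.

```python
# Sketch: 2D sparse grid combination technique for quadrature on [0,1]^2,
# with a simulated component failure and algorithm-based recovery.
import math
import random

def midpoint_2d(f, l1, l2):
    """Tensor-product midpoint rule on an anisotropic 2^l1 x 2^l2 grid."""
    n1, n2 = 2 ** l1, 2 ** l2
    h1, h2 = 1.0 / n1, 1.0 / n2
    return h1 * h2 * sum(f((i + 0.5) * h1, (j + 0.5) * h2)
                         for i in range(n1) for j in range(n2))

def combination_quadrature(f, n, fail_prob=0.0, seed=0):
    """Q_n = sum_{l1+l2=n} Q_{l1,l2} - sum_{l1+l2=n-1} Q_{l1,l2}.
    Each component grid is an independent job (second level of
    parallelism), so a component lost to a fault is simply recomputed
    instead of restarting the whole computation."""
    rng = random.Random(seed)
    total = 0.0
    for level, coeff in ((n, 1.0), (n - 1, -1.0)):
        for l1 in range(level + 1):
            l2 = level - l1
            q = None if rng.random() < fail_prob else midpoint_2d(f, l1, l2)
            if q is None:
                # Algorithm-based fault tolerance: only the small lost
                # component is redone; no global checkpoint-restart.
                q = midpoint_2d(f, l1, l2)
            total += coeff * q
    return total

if __name__ == "__main__":
    g = lambda x, y: math.exp(x + y)     # smooth integrand
    exact = (math.e - 1.0) ** 2          # its integral over [0,1]^2
    approx = combination_quadrature(g, n=8, fail_prob=0.2)
    print(f"combined {approx:.8f}, exact {exact:.8f}, "
          f"error {abs(approx - exact):.2e}")
```

Because every component grid is orders of magnitude cheaper than a single full-resolution grid, recomputing a lost component is far cheaper than restoring global state; fault-tolerant variants of the combination technique can alternatively adjust the combination coefficients over the surviving components instead of recomputing.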
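The second sketch illustrates the view of a multilevel method as an additive subspace iteration over a generating system, here in the form of a BPX-type additively preconditioned Richardson iteration for a 1D Poisson model problem. Randomly dropped level corrections mimic lost or corrupted updates; the iteration still converges, only at a reduced rate. The damping factor omega, the fault model and all names are illustrative assumptions, not project code.

```python
# Sketch: additive multilevel subspace iteration for -u'' = f on (0,1)
# with zero Dirichlet data; level updates may be lost to simulated faults.
import numpy as np

def apply_A(u, h):
    """Matrix-free 1D Laplacian stencil (-1, 2, -1)/h^2."""
    r = 2.0 * u
    r[:-1] -= u[1:]
    r[1:] -= u[:-1]
    return r / (h * h)

def restrict(r):
    """Full-weighting restriction from 2^l - 1 to 2^(l-1) - 1 points."""
    return 0.25 * r[0:-1:2] + 0.5 * r[1::2] + 0.25 * r[2::2]

def prolongate(e):
    """Linear interpolation from 2^(l-1) - 1 to 2^l - 1 points."""
    pad = np.concatenate(([0.0], e, [0.0]))
    ef = np.zeros(2 * e.size + 1)
    ef[1::2] = e
    ef[0::2] = 0.5 * (pad[:-1] + pad[1:])
    return ef

def additive_multilevel_solve(f, L, omega, iters, drop_prob=0.0, seed=0):
    """u <- u + omega * sum_l P_l D_l^{-1} R_l (f - A u): one scaled Jacobi
    correction per level of the generating system, computed independently
    (in parallel). A dropped correction models a lost or corrupted update."""
    rng = np.random.default_rng(seed)
    h = 1.0 / 2 ** L
    u = np.zeros(2 ** L - 1)
    history = []
    for _ in range(iters):
        r = f - apply_A(u, h)
        history.append(np.linalg.norm(r))
        residuals = [r]                      # levels L, L-1, ..., 1
        for _ in range(L - 1):
            residuals.append(restrict(residuals[-1]))
        correction = np.zeros_like(u)
        for depth, r_l in enumerate(residuals):
            if rng.random() < drop_prob:
                continue                     # fault: this update is lost
            h_l = 1.0 / 2 ** (L - depth)
            e_l = 0.5 * h_l * h_l * r_l      # D_l^{-1} r_l, D_l = 2/h_l^2
            for _ in range(depth):           # interpolate back to finest grid
                e_l = prolongate(e_l)
            correction += e_l
        u += omega * correction
    return u, history

if __name__ == "__main__":
    L = 6
    x = np.linspace(0.0, 1.0, 2 ** L + 1)[1:-1]
    f = np.pi ** 2 * np.sin(np.pi * x)       # exact solution: sin(pi x)
    for p in (0.0, 0.3):
        # omega = 0.5/L is a deliberately conservative damping (assumption).
        _, hist = additive_multilevel_solve(f, L, omega=0.5 / L,
                                            iters=60, drop_prob=p)
        print(f"drop_prob={p:.1f}: residual {hist[0]:.2e} -> {hist[-1]:.2e}")
```

The additive structure is what makes the method fault tolerant: the level corrections are mutually independent, so a missing subspace update merely removes one positive semidefinite term from the preconditioner for that iteration, and the run comparison above shows convergence at a correspondingly reduced rate rather than failure.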
