Do We Really Understand the Conjugate Gradient Algorithm?

The conjugate gradient algorithm (CG) is a widely used iterative method for solving large symmetric positive definite linear systems Ax=b. Beginning in the 1980’s (and, even earlier, in the thesis of Paige dating back to 1971), a great deal of research has been aimed at explaining the behavior of the Lanczos and conjugate gradient algorithms in finite precision arithmetic. The algorithms do not behave the way exact arithmetic theory predicts, yet they are widely used and often deliver impressive results. Explanations were derived, under the assumption that the finite precision computation satisfied certain properties that could be shown to hold for particular (now standard) implementations. Names associated with this early work include Paige, Druskin and Knizhnerman, Greenbaum and Strakos.

With the advent of parallel computing, different variants of the conjugate gradient algorithm were proposed to take better advantage of parallelism; see, e.g, Chronopoulas and Gear, On the efficient implementation of preconditioned s-step conjugate gradient methods on multiprocessors with memory hierarchy, Parallel Comput. 11 (1989), pp. 37-53, and Ghysels and Vanroose, Hiding global synchronization latency in the preconditioned conjugate gradient algorithm, Parallel Comput. 40 (2014), pp. 224-238.

All of these are mathematically equivalent to the original Hestenes and Stiefel algorithm, but none have been shown to satisfy the hypotheses used in the analysis of the 1980’s. Does this mean that they perform poorly? Or does it simply mean that the assumptions of the early analysis were sufficient but not necessary conditions for good behavior? Lower precision arithmetic is now becoming more popular. What will be the effect of this on CG codes? I will discuss these issues and what insight might be gained from the analysis in order to devise stable and efficient finite precision implementations.