Coding for Reliable Computing - Institute of Applied Mathematics

Abstract:

Coding theory has been associated mainly with maintaining reliability in communication and storage systems. Yet work published already in the 1950s by Von Neumann, Moore, and Shannon also considered the use of error handling techniques to maintain reliability of computation devices. Still, until very recently, the prevailing paradigm of designing computing devices appeared to put all of the effort of the reliability guarantee on the hardware design. Distributed computing, as well as the introduction of new nanoscale accelerator devices, are two examples of recent developments that have revived the interest in coding for reliable computation.

In this talk, we focus on accelerators that perform vector–matrix multiplication. We describe code designs that guarantee computation reliability under two paradigms: exact computation, which is needed when accelerating ordinary ALU computations, and approximate computation, which suits learning applications. In the latter, we view the computation to be over the real field, and introduce new tools for analyzing and synthesizing coding schemes for this model. In particular, we show connections between the code design problem and certain extremal problems of convex polygons