The Maximum Entropy on the Mean Method for Linear Inverse Problems (and Beyond)

The principle of maximum entropy states that the probability distribution that best represents the current state of knowledge about a system is the one with the largest entropy with respect to a given prior (data) distribution. It was first formulated in the context of statistical physics in two seminal papers by E. T. Jaynes (Physical Review, Series II, 1957), and can be viewed as an information-theoretic manifestation of Occam’s razor. We bring the idea of maximum entropy to bear on linear inverse problems: we solve for the probability measure that is close to the (learned or chosen) prior and whose expectation has a small residual with respect to the observation. Duality leads to tractable, finite-dimensional (dual) problems. A core tool, which we then show to be useful beyond the linear inverse problem setting, is the “MEMM functional”: it is the infimal projection of the Kullback-Leibler divergence under a linear constraint on the expectation, it coincides in most cases with Cramér’s function (ubiquitous in the theory of large deviations), and it is paired in duality with the cumulant generating function of the prior measure. Numerical examples underline the efficacy of the presented framework.
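
For orientation, the objects named above can be written out as follows. This is only a sketch in assumed notation: the symbols μ (prior), C (forward operator), b (observation), and α (fidelity weight), as well as the choice of a squared-norm residual term, are illustrative assumptions and are not fixed by the abstract.

```latex
% Assumed notation (not from the abstract): prior \mu on \mathbb{R}^d,
% forward operator C \in \mathbb{R}^{m \times d}, observation b \in \mathbb{R}^m,
% fidelity weight \alpha > 0, Q ranging over probability measures.
\begin{align*}
  \kappa_\mu(y) &:= \inf_{Q \ll \mu}
      \bigl\{ \mathrm{KL}(Q \,\|\, \mu) \;:\; \mathbb{E}_Q[X] = y \bigr\}
      && \text{(MEMM functional)} \\
  L_\mu(\theta) &:= \log \int \exp\langle \theta, x \rangle \, d\mu(x)
      && \text{(cumulant generating function)} \\
  L_\mu^*(y) &:= \sup_{\theta} \bigl\{ \langle \theta, y \rangle - L_\mu(\theta) \bigr\}
      && \text{(Cram\'er's function)} \\[4pt]
  \text{primal:}\quad & \min_{y \in \mathbb{R}^d} \;
      \kappa_\mu(y) + \frac{\alpha}{2}\, \| C y - b \|^2 \\
  \text{dual:}\quad & \max_{\lambda \in \mathbb{R}^m} \;
      \langle b, \lambda \rangle - \frac{1}{2\alpha}\, \| \lambda \|^2
      - L_\mu(C^{\top} \lambda)
\end{align*}
```

In this sketch the dual is a smooth concave problem in only m variables whenever the cumulant generating function of the prior is available in closed form. Under suitable regularity assumptions, a primal solution would be recovered from a dual maximizer λ̄ as the gradient of L_μ evaluated at Cᵀλ̄.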

This is joint work with Yakov Vaisbourd (McGill), Rustum Choksi (McGill), Ariel Goodwin (McGill), and Carola-Bibiane Schönlieb (Cambridge).