Unlike most deep learning settings, we focus on the situation where only one or a few large-scale examples are available, and we assume that only partial labels are available for them. Examples include video segmentation and the segmentation of 3D geophysical imaging results, where ground-truth labels are available only in a small number of boreholes.
First, we discuss how partial (projected) loss functions, combined with regularization of the neural-network output, allow training on partially labeled examples and mitigate poor prediction quality away from the labeled pixels/voxels. Second, we present a fully reversible network that enables training on large-scale data. We induce and exploit the reversibility of networks based on certain partial differential equations. As a result, the network states do not need to be stored to compute a gradient, and the memory required for training is independent of network depth. A fully reversible network can train directly from video to video without resorting to slice-by-slice methods, which simplifies previous approaches.
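A minimal NumPy sketch of both ingredients, under stated assumptions. The layer uses a leapfrog discretization of a hyperbolic PDE, Y_{j+1} = 2 Y_j - Y_{j-1} - h^2 K_j^T tanh(K_j Y_j), as one example of a PDE-based reversible network; the tanh activation, the start Y_{-1} = Y_0, and all function names (layer_force, forward, reconstruct, projected_loss, smoothness_penalty) are hypothetical choices for illustration, not necessarily the architecture used in this work. Because the update can be solved for Y_{j-1}, earlier states can be recomputed from the last two instead of being stored. The masked loss and the squared-difference smoothness penalty are likewise illustrative stand-ins for a projected loss and an output regularizer.

```python
import numpy as np

# Hypothetical hyperbolic (leapfrog) layer:
#   Y_{j+1} = 2 Y_j - Y_{j-1} - h^2 K_j^T tanh(K_j Y_j)
# Y has shape (channels, pixels); each K_j has shape (channels, channels).
def layer_force(y, K):
    return K.T @ np.tanh(K @ y)

def forward(y0, Ks, h):
    """Run the network, keeping only the last two states in memory."""
    y_prev, y_curr = y0, y0  # assumption: Y_{-1} = Y_0
    for K in Ks:
        y_prev, y_curr = y_curr, 2.0 * y_curr - y_prev - h**2 * layer_force(y_curr, K)
    return y_prev, y_curr  # (Y_{n-1}, Y_n)

def reconstruct(y_pen, y_last, Ks, h):
    """Recover earlier states by solving the same recurrence for Y_{j-1}."""
    y_curr, y_next = y_pen, y_last  # start from (Y_{n-1}, Y_n)
    for K in reversed(Ks):
        y_prev = 2.0 * y_curr - y_next - h**2 * layer_force(y_curr, K)
        # In a full backward pass, the gradient w.r.t. K would be accumulated here.
        y_curr, y_next = y_prev, y_curr
    return y_curr, y_next  # (Y_{-1}, Y_0)

def projected_loss(pred, labels, mask):
    """Least-squares misfit evaluated only where mask == 1 (e.g. borehole locations)."""
    r = mask * (pred - labels)
    return 0.5 * np.sum(r**2) / max(mask.sum(), 1.0)

def smoothness_penalty(pred, alpha):
    """Hypothetical output regularizer: squared first differences along the last axis."""
    d = np.diff(pred, axis=-1)
    return 0.5 * alpha * np.sum(d**2)

# Usage: run 20 layers forward, then recover the input from the final two states only.
rng = np.random.default_rng(0)
Ks = [rng.standard_normal((4, 4)) for _ in range(20)]
y0 = rng.standard_normal((4, 100))
y_pen, y_out = forward(y0, Ks, h=0.1)
y_m1, y_0 = reconstruct(y_pen, y_out, Ks, h=0.1)
print("max reconstruction error:", np.abs(y_0 - y0).max())
```

The reconstruction is exact in exact arithmetic; in floating point it carries small round-off errors, which is the practical price paid for not storing the intermediate states. The gradient of projected_loss with respect to pred is nonzero only at labeled locations, which is why a regularizer on the output is useful to control predictions elsewhere.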
Throughout the talk, we highlight similarities and differences with well-known computational techniques from PDE-constrained optimization.
Joint work with Eldad Haber, Justin Granek, and Keegan Lensink.