NOTE: This seminar will be on a TUESDAY rather than the usual Monday.
The choice of nonlinear activation function in deep learning architectures is crucial and heavily impacts the performance of a neural network. We introduce rational neural networks, which are neural networks with trainable rational activation functions. A composition of low-degree rational functions has good approximation power while maintaining a relatively small number of trainable parameters. Hence, we show that rational neural networks require fewer nodes and exponentially smaller depth than ReLU networks to approximate smooth functions to within a given accuracy. This improved approximation power has practical consequences for large neural networks, since deep neural networks are computationally expensive to train due to costly gradient evaluations and slow convergence.
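For readers unfamiliar with the idea, the sketch below shows one possible way to implement a trainable rational activation in PyTorch. The degree choice (an illustrative type (3, 2) rational), the initialization, the denominator safeguard, and the class name are assumptions made for illustration; they are not taken from the talk.

    # Minimal sketch, assuming PyTorch: a trainable rational activation of
    # illustrative type (3, 2), i.e. a degree-3 numerator over a degree-2
    # denominator. All design choices here are hypothetical, for illustration.
    import torch
    import torch.nn as nn

    class RationalActivation(nn.Module):
        def __init__(self, num_degree=3, den_degree=2):
            super().__init__()
            # Numerator coefficients a_0..a_p and denominator coefficients
            # b_1..b_q (the denominator's constant term is handled below).
            self.num_coeffs = nn.Parameter(torch.randn(num_degree + 1) * 0.1)
            self.den_coeffs = nn.Parameter(torch.randn(den_degree) * 0.1)

        def forward(self, x):
            # Numerator p(x) = a_0 + a_1 x + ... + a_p x^p via Horner's rule.
            p = torch.zeros_like(x) + self.num_coeffs[-1]
            for a in reversed(self.num_coeffs[:-1]):
                p = p * x + a
            # Denominator 1 + |b_1 x + ... + b_q x^q| stays strictly positive;
            # this is one common safeguard, not necessarily the speakers'.
            q = torch.zeros_like(x) + self.den_coeffs[-1]
            for b in reversed(self.den_coeffs[:-1]):
                q = q * x + b
            q = q * x  # polynomial with no constant term; the +1 is added next
            return p / (1.0 + torch.abs(q))

    # Usage: place the trainable activation between linear layers, where a
    # ReLU would normally go.
    model = nn.Sequential(
        nn.Linear(2, 32),
        RationalActivation(),
        nn.Linear(32, 32),
        RationalActivation(),
        nn.Linear(32, 1),
    )

Because the rational coefficients are ordinary parameters, they are updated by the same optimizer as the network weights, which is what makes the activation "trainable" in this setting.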
Refreshments will be served before the talk, starting at 2:45.