We present a framework based on bilevel optimization for learning multilayer, deep data representations. On the one hand, the lower-level problem finds a representation by successively minimizing layer-wise objectives made of the sum of a prescribed regularizer, a fidelity term, and a linear function depending on the representation found at the previous layer. On the other hand, the upper-level problem optimizes over the linear functions to yield a linearly separable final representation. We show that, by choosing the fidelity term as the quadratic distance between two successive layer-wise representations, the bilevel problem reduces to the training of a feedforward neural network. In contrast, by elaborating on Bregman distances, we devise a novel neural network architecture additionally involving the inverse of the activation function, reminiscent of the skip connections used in ResNets. Numerical experiments suggest that the proposed Bregman variant benefits from better learning properties and more robust prediction performance.
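The abstract contrasts the standard feedforward update x_{k+1} = sigma(W_k x_k + b_k) with a Bregman variant that additionally involves the inverse activation. Below is a minimal PyTorch sketch, assuming the Bregman update takes the form x_{k+1} = sigma(sigma^{-1}(x_k) + W_k x_k + b_k) with the sigmoid as sigma; the class name BregmanLayer and the specific choice of activation are illustrative assumptions, not the authors' reference implementation.

import torch
import torch.nn as nn

class BregmanLayer(nn.Module):
    """Hypothetical Bregman layer: x_next = sigma(sigma^{-1}(x) + W x + b).

    The inverse-activation term sigma^{-1}(x) plays the role of a skip
    connection in pre-activation space, as the abstract suggests. With
    sigma = sigmoid, sigma^{-1} is the logit, so representations must
    stay in the open interval (0, 1).
    """

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # eps clamps x away from {0, 1} so that logit(x) stays finite
        return torch.sigmoid(torch.logit(x, eps=1e-6) + self.linear(x))

# Usage: stack a few layers and feed an input mapped into (0, 1).
net = nn.Sequential(BregmanLayer(16), BregmanLayer(16), BregmanLayer(16))
x0 = torch.sigmoid(torch.randn(8, 16))  # batch of 8 samples in (0, 1)
print(net(x0).shape)  # torch.Size([8, 16])

Under the quadratic-fidelity choice described in the abstract, the logit term would drop out and each layer would reduce to the usual feedforward update torch.sigmoid(self.linear(x)).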
Publication details
2022, Proceedings of the 39th International Conference on Machine Learning, Pages - (volume: 162)
Bregman Neural Networks (04b Conference paper in volume)
Frecon Jordan, Gasso Gilles, Pontil Massimiliano, Salzo Saverio
Research group: Continuous Optimization