ICANNGA'07 tutorial on
Learning from Data with Generalization as an Inverse Problem
Academy of Sciences of the Czech Republic
Generalization capability in
learning from data can be investigated in terms of regularization, which has
been used in many branches of applied mathematics to obtain stable solutions
of inverse problems, i.e., problems of finding unknown causes (such as
shapes of functions) of known consequences (such as measured data). It will
be shown that supervised learning modeled as minimization of error
functionals, the expected and the empirical one, can be reformulated in
terms of inverse problems with solutions in spaces of functions defined by
kernels. Mathematical results from the theory of inverse problems, applied to
the construction of optimal solutions of learning tasks, can be used to design
learning algorithms based on solving systems of linear equations.
Learning from data:
minimization of the empirical error functional defined by a sample of data;
minimization of the expected error functional defined by a probability
distribution; optimization of error functionals as best approximation; tools
from approximation theory.
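As a small illustration of the first item (the sample values and the two candidate hypotheses below are my own, chosen only for this sketch), the empirical error functional assigns to a hypothesis f its mean squared deviation from the data sample:

```python
import numpy as np

# Hypothetical sample of data points (x_i, y_i), chosen for illustration.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([0.1, 0.9, 1.8, 3.2, 4.1])

def empirical_error(f, x, y):
    """Empirical error functional: mean squared error of f on the sample."""
    return np.mean((f(x) - y) ** 2)

# Two candidate hypotheses evaluated on the same sample.
linear = lambda t: 2.0 * t                        # f(t) = 2t
constant = lambda t: np.full_like(t, np.mean(y))  # best constant predictor

print(empirical_error(linear, x, y))    # small: the linear fit tracks the data
print(empirical_error(constant, x, y))  # larger: the constant cannot
```

A learning algorithm then searches a hypothesis space for a minimizer of this functional; the expected error functional replaces the sample average with an integral over the underlying probability distribution.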
Generalization: the philosophical concept of generalization; generalization in
learning as stability of solutions with respect to small changes of data;
penalization of solutions with high-frequency oscillations.
Inverse problems: well-posed and ill-posed problems; well-conditioned and
ill-conditioned problems; the Moore-Penrose pseudosolution; measures of
stability; regularization as an improvement of stability; properties of
optimal and regularized solutions.
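A minimal numerical sketch of these notions (the matrix, the noise level, and the regularization parameter are illustrative assumptions, not from the tutorial): an ill-conditioned linear inverse problem whose Moore-Penrose pseudosolution is destabilized by a tiny data perturbation, and Tikhonov regularization restoring stability at the cost of a small bias:

```python
import numpy as np

# An ill-conditioned operator: nearly linearly dependent columns.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
x_true = np.array([1.0, 1.0])   # unknown cause
b = A @ x_true                  # known consequence (exact data)

print(np.linalg.cond(A))        # large condition number: ill-conditioned

# A small perturbation of the measured data.
b_noisy = b + np.array([0.0, 1e-4])

# Moore-Penrose pseudosolution: unstable under the perturbation.
x_pinv = np.linalg.pinv(A) @ b_noisy

# Tikhonov regularization: solve (A^T A + gamma I) x = A^T b instead.
gamma = 1e-3
x_reg = np.linalg.solve(A.T @ A + gamma * np.eye(2), A.T @ b_noisy)

print(np.linalg.norm(x_pinv - x_true))  # large deviation from the true cause
print(np.linalg.norm(x_reg - x_true))   # much smaller deviation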
Learning as an inverse problem: typical operators defining inverse problems;
tomography and the Radon transform; operators defining inverse problems that
model learning; characterization of optimal and regularized solutions (the
Representer Theorem).
Three reasons for using
kernels in machine learning: kernels define a class of hypothesis spaces
where the theory of inverse problems can be applied; kernels define stabilizers
penalizing various types of high-frequency oscillations; kernels define
transformations of input-space geometries allowing more types of data to be
separated.
Learning algorithms based
on the Representer Theorem: neural network learning as the solution of a
system of linear equations; approximate optimization for complexity
reduction; comparison with algorithms operating on networks with a smaller
number of units than the size of the data sample.
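A sketch of such an algorithm (the Gaussian kernel, the sample, and the regularization parameter gamma are my own illustrative choices): by the Representer Theorem, the regularized solution is a kernel expansion over the data points, and its coefficients are obtained from a single system of linear equations:

```python
import numpy as np

def gaussian_kernel(u, v, width=1.0):
    """Gaussian (RBF) kernel defining the hypothesis space."""
    return np.exp(-np.subtract.outer(u, v) ** 2 / (2 * width ** 2))

# Hypothetical training sample.
x = np.linspace(0, 2 * np.pi, 20)
y = np.sin(x)

# Representer Theorem: the minimizer of the regularized empirical error has
# the form f(t) = sum_i c_i K(t, x_i), so learning reduces to solving
# the linear system (K + gamma I) c = y for the coefficient vector c.
gamma = 1e-3
K = gaussian_kernel(x, x)
c = np.linalg.solve(K + gamma * np.eye(len(x)), y)

def f(t):
    """Learned function: kernel expansion over the training points."""
    return gaussian_kernel(t, x) @ c

print(np.max(np.abs(f(x) - y)))  # small training error
```

Note that the network has as many kernel units as data points; the complexity-reduction results mentioned above concern approximating this optimal solution by networks with fewer units.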
Prof. Věra Kůrková,
scientist at the Institute of Computer Science, Academy of Sciences of the
Czech Republic, Prague; since 2002 Head of the Department of Theoretical
Computer Science; research in the mathematical theory of neurocomputing and
learning, nonlinear approximation, and optimization; author of many research
papers, several book chapters, and encyclopedia articles; coeditor of a book
and a proceedings volume published by Springer-Verlag; member of the Editorial
Board of Neural Processing Letters (Kluwer).
Selected recent publications:
V. Kurkova: Supervised learning with generalization as an inverse problem.
Logic Journal of IGPL 13: 551--559, 2005.
P. C. Kainen, V. Kurkova, M. Sanguineti: Rates of approximate minimization
of error functionals over Boolean variable-basis functions. Journal of
Mathematical Modelling and Algorithms 4: 355--368, 2005.
V. Kurkova, M. Sanguineti: Learning with generalization capability by kernel
methods of bounded complexity. Journal of Complexity 21: 350--367, 2005.
V. Kurkova, M. Sanguineti: Error estimates for approximate optimization by
the extended Ritz method. SIAM Journal on Optimization 15: 461--487, 2005.
V. Kurkova: High-dimensional approximation and optimization by neural
networks. Chapter 4 in Advances in Learning Theory: Methods, Models and
Applications. (Eds. J. Suykens et al.) (pp. 69--88). Amsterdam: IOS Press.
P. C. Kainen, V. Kurkova, A. Vogt: Best approximation by linear combinations
of characteristic functions of half-spaces. Journal of Approximation Theory 122:
P. C. Kainen, V. Kurkova, M. Sanguineti: Minimization of error functionals
over neural networks. SIAM Journal on Optimization 14: 732--742, 2003.
V. Kurkova, M. Sanguineti: Comparison of worst-case errors in linear and
neural network approximation. IEEE Transactions on Information Theory 48:
P. C. Kainen, V. Kurkova, A. Vogt: Continuity of approximation by neural
networks in Lp-spaces. Annals of Operations Research 101: 143--147, 2001.
V. Kurkova, M. Sanguineti: Bounds on rates of variable-basis and neural
network approximation. IEEE Transactions on Information Theory 47: