ICANNGA'07 tutorial on
Integration of Statistical and Neural Approaches
to Train Classification and Prediction Algorithms
Institute of Mathematics and Informatics, Vilnius
Classification and prediction algorithms are
among the major topics in machine learning and data mining applications. Very
often the algorithms are trained either by statistical methods or by
neural-network-based methods. In spite of the similarity between the two
approaches, there is a certain rivalry between their followers. The
objective of the tutorial is to explain how both approaches can be used
simultaneously, so as to exploit the strengths of each, and to consider the
relationships between sample size, algorithm complexity and performance.
We explain that, when the single layer perceptron (SLP) is trained under
special conditions, we may obtain seven linear classification algorithms
well known in statistical data analysis (the Euclidean distance classifier,
regularized discriminant analysis, the standard Fisher classifier or the
Fisher classifier with matrix pseudo-inversion, and the robust, minimum
empirical error and support vector (SV) classifiers) or six regressions
(naive, regularized, two types of mean square, robust, minimax, or SV
regressions). The evolution of the SLP during its iterative training
process opens the possibility of integrating the statistical and neural
approaches to designing classification and prediction algorithms. For this
purpose we should use all our knowledge about the real-world problem at
hand and apply one of the known statistical models to estimate the inputs'
covariance matrix, described by a small number of parameters. Then, instead
of using the original covariance matrix to construct the classification or
prediction algorithm directly, we decorrelate and scale the multivariate
data, and obtain this statistical algorithm just after the first training
iteration. Such an approach provides a two-fold benefit: first, it utilizes
the researcher's a priori statistical knowledge about the data, helping to
reduce the generalization error; second, it decorrelates the multivariate
data and equates its variances, speeding up the iterative training process.
In comparison with the author's earlier tutorials on the subject, the cases
of a) unequal covariance matrices and b) many pattern classes are also
considered.
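To make the first-iteration claim concrete, here is a minimal NumPy sketch (our own illustration under simplified assumptions, not code from the tutorial) for two synthetic Gaussian classes. It checks that one batch gradient step of a zero-initialised SLP with a sum-of-squares cost and targets -1/+1 reproduces the Euclidean distance classifier, and that the same step applied to data whitened with a simple shrinkage covariance estimate (one possible stand-in for the structured covariance models mentioned above) yields the corresponding regularized-Fisher weight vector in the original feature space.

import numpy as np

rng = np.random.default_rng(0)
p, n = 10, 50                                   # dimensionality, vectors per class

# Two synthetic Gaussian classes sharing a non-spherical covariance matrix.
A = rng.normal(size=(p, p))
cov = A @ A.T / p + np.eye(p)
X1 = rng.multivariate_normal(np.zeros(p), cov, n)
X2 = rng.multivariate_normal(np.full(p, 1.0), cov, n)
X = np.vstack([X1, X2])
t = np.hstack([-np.ones(n), np.ones(n)])        # targets -1 / +1

def first_slp_step(Z, t, eta=1.0):
    """One batch gradient step on the sum-of-squares cost, starting from w = 0, w0 = 0."""
    # gradient of (1/N) * sum_i (t_i - w'z_i - w0)^2 at w = 0, w0 = 0 is -(2/N) Z't
    return eta * 2.0 / len(t) * Z.T @ t

def direction(w):
    return w / np.linalg.norm(w)

# 1) Raw (centred) data: the first step points along mean2 - mean1, i.e. it is
#    the Euclidean distance classifier.
w_slp = first_slp_step(X - X.mean(0), t)
w_edc = X2.mean(0) - X1.mean(0)
print(np.allclose(direction(w_slp), direction(w_edc)))          # True

# 2) Whitened data: inject prior knowledge through a structured covariance
#    estimate (a simple shrinkage model here; a tree-type dependency model or
#    another parametric model could be used instead), then take the same step.
lam = 0.1
S = 0.5 * (np.cov(X1.T) + np.cov(X2.T))         # pooled sample covariance
Sigma = (1 - lam) * S + lam * np.eye(p)         # shrinkage estimate
L = np.linalg.cholesky(np.linalg.inv(Sigma))    # L L' = Sigma^{-1}; y = L'(x - mean) whitens
Y = (X - X.mean(0)) @ L
w_white = L @ first_slp_step(Y, t)              # map the weights back to the original space
w_rda = np.linalg.solve(Sigma, w_edc)           # regularized-Fisher direction Sigma^{-1}(m2 - m1)
print(np.allclose(direction(w_white), direction(w_rda)))        # True

Subsequent iterations move the weights away from this statistical starting point, towards the minimum empirical error and SV-type solutions, which is exactly the evolution the tutorial exploits.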
The last part of the tutorial is devoted to explaining the relationships
between the complexity, the learning set size and the generalization error
of statistical classifiers. In contrast to reviews of the complexity and
sample size relation that are based on the analysis of empirical risk
minimization algorithms and classification error bounds, the present survey
considers the generalization errors of a variety of statistical pattern
classification algorithms, mostly those that can be obtained while training
the non-linear single layer perceptron, together with the multinomial
classifier used when features assume categorical values (a method strongly
related to the popular decision tree classifiers). Using the methods of
multivariate statistical analysis to examine statistical classification
rules sometimes allows exact analytical expressions for the generalization
error to be obtained, provided the true models of the data are known. The
survey also contains two new, still unpublished, results: a) the effect of
unequal learning set sizes of the distinct pattern classes, and b) an
examination of the losses that arise from sample-based feature selection,
used to reduce the input feature space, compared with the losses that arise
from estimating distinct parameters of the classifiers in the original
high-dimensional space. It is shown that for some non-optimal (in the Bayes
sense) yet popular statistical classification algorithms (the standard
multivariate Gaussian density based quadratic classifier and the multinomial
classifier used when features assume categorical values), and for some data
models, the expected generalization error unexpectedly starts increasing as
the number of learning vectors of the minority class grows.
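The kind of experiment behind this last observation can be sketched with a short Monte Carlo simulation. The fragment below is our own hedged illustration (not the author's experiment): it estimates the expected test error of the plug-in Gaussian quadratic classifier with equal priors while the learning set of one class is kept fixed and that of the other is grown; whether and where the counter-intuitive increase appears depends on the chosen data model, the dimensionality and the sample size ratio.

import numpy as np

rng = np.random.default_rng(1)
p, n1, n_test, reps = 8, 200, 2000, 30         # dims, fixed class size, test size, repetitions
mu1, mu2 = np.zeros(p), np.full(p, 0.6)        # two Gaussian classes, identity covariances

def plug_in_qda_error(n2):
    """Average test error of the plug-in quadratic classifier over `reps` learning sets."""
    errs = []
    for _ in range(reps):
        X1 = rng.normal(mu1, 1.0, (n1, p))     # learning set of the fixed class
        X2 = rng.normal(mu2, 1.0, (n2, p))     # learning set of the varied class
        stats = [(X.mean(0), np.cov(X.T)) for X in (X1, X2)]

        def g(T, m, S):
            # Gaussian discriminant score with equal priors:
            # g(x) = -0.5*ln|S| - 0.5*(x - m)' S^{-1} (x - m)
            d = T - m
            _, logdet = np.linalg.slogdet(S)
            return -0.5 * logdet - 0.5 * np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)

        err = 0.0
        for true_class, mu in enumerate((mu1, mu2)):
            T = rng.normal(mu, 1.0, (n_test, p))
            scores = np.stack([g(T, m, S) for m, S in stats], axis=1)
            err += np.mean(scores.argmax(axis=1) != true_class)
        errs.append(err / 2)
    return float(np.mean(errs))

for n2 in (12, 25, 50, 100, 200):              # grow the second ("minority") learning set
    print(n2, round(plug_in_qda_error(n2), 4))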
Outline of the tutorial:
Introduction (objectives and overview of the tutorial).
The single layer perceptron and its training
Model and traditional applications of SLP.
The cost function and gradient descent training.
The weights’ growth during training.
Parameters used to control the training
process (target values, learning step, noise injection, regularization, etc.).
SLP and statistical algorithms
SLP as 7 statistical classifiers.
SLP as 6 statistical regression algorithms.
Integration of statistical and neural
approaches to train classification and prediction rules.
The case of K pattern classes.
Complexity and sample size relationships
Normal density based classifiers.
Nonparametric (local) classifiers (Parzen
window, k-NN, multinomial, decision tree).
Effects of unequal training set sizes on generalization error.
Effect of sample-based feature selection/extraction on generalization error.
General conclusions and discussions with the audience.
The author's main publications on the subject
S. Raudys (2004). Integration of statistical
and neural methods to design classifiers in case of unequal covariance
matrices. In: S. Biundo, T. Frühwirth and G. Palm (Eds.), KI 2004, Lecture
Notes in Artificial Intelligence, Vol. 3238. Springer-Verlag.
Š. Raudys (2001). Statistical and Neural
Classifiers: An Integrated Approach to Design. Springer, London. 312 pages.
Š. Raudys, A. Saudargienė (2001). Tree type
dependency model and sample size - dimensionality properties. IEEE Trans.
on Pattern Analysis and Machine Intelligence, 23, 233-239.
M. Skurichina, Š. Raudys, R.P.W. Duin (2000).
K-nearest neighbors directed noise injection in multilayer perceptron
training. IEEE Trans. on Neural Networks 11:504-511.
Š. Raudys (2000). How good are support vector
machines? Neural Networks 13:9-11.
Š. Raudys (2000). Evolution and generalization
of a single neurone. III. Primitive, regularized, standard, robust and
minimax regressions. Neural Networks 13 (3/4):507-523.
Š. Raudys (1998). Evolution and generalization
of a single neurone. I. SLP as seven statistical classifiers. Neural
Networks 11(2):283-296.
S. Raudys and D. Young (2004). Results in
statistical discriminant analysis: A review of the former Soviet Union
literature. Journal of Multivariate Analysis, 89, 1-35.
Š. Raudys (1998). Evolution and generalization
of a single neurone. II. Complexity of statistical classifiers and sample
size considerations. Neural Networks 11(2):297-313.
S. Raudys and A.K. Jain (1991). Small sample
size effects in statistical pattern recognition: recommendations for
practitioners. IEEE Transactions on Pattern Analysis and Machine
Intelligence, PAMI-13 (3), 252-264.
S. Raudys and V. Pikelis (1980). On
dimensionality, sample size, classification error and complexity of
classification algorithm in pattern recognition. IEEE Transactions on
Pattern Analysis and Machine Intelligence, PAMI-2 (3), 242-252.
S. Raudys (1976). Limitation of Sample Size in
Classification Problems. Inst. of Physics and Math. Press, Vilnius (a
monograph, 186 pages, in Russian).
S. Raudys (1972). On the amount of a priori
information in designing the classification algorithms. Proceedings of the
Academy of Sciences of the USSR, Technical Cybernetics, No. 4, 168-174 (in
Russian).
S. Raudys (1967). On determining the training
sample size of a linear classifier. Computing Systems, Novosibirsk, Nauka,
28, 79-87 (in Russian).
Three new papers: "Generalization Error of
Multinomial Classifier" and "Feature Over-Selection", accepted to the
SPR+SSPR Conference, Hong Kong, August 2006, and "A Pool of Classifiers by
SLP: A multi-class case", accepted to the ICIAR 2006 Conference (September
2006) in Portugal.
Expected audience: the tutorial is targeted at
graduate students, researchers and practitioners in data mining,
machine learning, pattern recognition, artificial neural networks,
bioinformatics and related areas.
Background knowledge expected of the participants: no special theoretical
background is required beyond the elements of probability and statistics,
linear algebra and calculus at the master's degree level. Some prior
acquaintance with the main statistical and neural-network-based
classification and prediction methods would be useful.
Prof. Sarunas Raudys
is Head of the Data Analysis Department at the Institute of Mathematics and
Informatics (Vilnius). He started his research in statistical pattern
recognition and multivariate statistical analysis, then moved to artificial
neural networks, and presently applies his knowledge in the artificial
intelligence and artificial life disciplines. Throughout his career he has
been solving practical data mining tasks for a great number of researchers
and practitioners in diverse areas of activity. He teaches (or has taught)
artificial neural networks and data mining courses for master's degree
students at various Lithuanian universities in Vilnius, Kaunas and Klaipeda.
He has been an invited speaker at 30 major international conferences,
including a four-hour tutorial at the 15th International Conference on
Pattern Recognition in Barcelona, 2000. During the last few years he has
presented cycles of lectures (tutorials) on the integration of statistical
and neural approaches at a number of universities in Italy, Spain, The
Netherlands, Malaysia, Japan, the USA and Canada. He has worked as a
visiting scientist at Michigan State University (1989 and 1990), Baylor
University, Waco, TX, USA (1990), Delft University of Technology (1991),
the Energy Research Centre of the Netherlands (1992), University Paris 6
(1993 and 1994), Bosphorus University, Istanbul (1995), the
interdisciplinary research centre RIKEN, Tokyo (1996), Ford Motors
Scientific Research Laboratories, Detroit, USA (1999), the National
University of Malaysia (2004), and the Institute of Bio-diagnostics,
Winnipeg, Canada (2004). He is a member of the editorial boards of the
international journals Informatica (Vilnius), Pattern Recognition and Image
Processing (Moscow), Pattern Recognition (Washington, DC), Pattern Analysis
and Applications (London) and others. He has published 2 books and over 150
research papers in various scientific journals and conference proceedings.
The number of citations of his work in the ISI Web of Science database
exceeds 600.