ICANNGA'07 tutorial on

Integration of Statistical and Neural Approaches

to Train Classification and Prediction Algorithms

Sarunas Raudys

Institute of Mathematics and Informatics

Vilnius, Lithuania
 

Classification and prediction algorithms are among the major topics in machine learning and data mining applications. Very often these algorithms are trained either by statistical methods or by neural-network-based methods. In spite of the similarity between the two approaches, there is a certain rivalry between their followers. The objective of the tutorial is to explain how both approaches can be utilized simultaneously in order to exploit the positive features of each of them, and to consider the relationships between sample size, algorithm complexity, and performance.
We explain that by training the single layer perceptron (SLP) under special conditions, we may obtain seven linear classification algorithms well known in statistical data analysis (the Euclidean distance classifier, regularized discriminant analysis, the standard Fisher classifier or the Fisher classifier with matrix pseudo-inversion, and the robust, minimum empirical error, and support vector (SV) classifiers) or six regressions (naive, regularized, two types of mean square, robust, minimax, or SV regression). The evolution of the SLP during its iterative training opens the possibility of integrating the statistical and neural approaches to designing classification and prediction algorithms. For this purpose we should use all our knowledge about the real-world problem at hand and apply one of the known statistical models, described by a small number of parameters, to estimate the inputs' covariance matrix. Then, instead of using the estimated covariance matrix to construct the classification or prediction algorithm directly, we decorrelate and scale the multivariate data and obtain the statistical algorithm just after the first training iteration. Such an approach provides a two-fold benefit: first, it utilizes the researcher's a priori statistical knowledge about the data, helping to reduce the generalization error; second, it decorrelates the multivariate data and equalizes its variances, speeding up the iterative training process. In comparison with the author's earlier tutorials on the subject, the cases of a) unequal covariance matrices and b) many pattern classes are considered.
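As a rough sketch of how this integration can look in practice, the following Python/NumPy fragment (an illustration only; the toy data, the two-parameter covariance structure, and all variable names are assumptions of this example, not material from the tutorial) fits a simple common-variance/common-correlation covariance model, whitens the data with it, and then trains an SLP by gradient descent:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: two Gaussian pattern classes with a shared covariance matrix.
    d, n = 10, 100
    mu1, mu2 = np.zeros(d), np.full(d, 0.6)
    true_cov = 0.7 * np.eye(d) + 0.3 * np.ones((d, d))
    X = np.vstack([rng.multivariate_normal(mu1, true_cov, n),
                   rng.multivariate_normal(mu2, true_cov, n)])
    y = np.hstack([-np.ones(n), np.ones(n)])        # class targets -1 / +1

    # Statistical part: describe the covariance by a small number of
    # parameters -- here a common variance and a common correlation.
    S = np.cov(X, rowvar=False)
    var = S.diagonal().mean()
    rho = (S.sum() - np.trace(S)) / (d * (d - 1)) / var
    S_model = var * ((1.0 - rho) * np.eye(d) + rho * np.ones((d, d)))

    # Decorrelate and scale: with L such that L @ L.T = inv(S_model),
    # the transformed data Z = Xc @ L has (approximately) unit covariance.
    Xc = X - X.mean(axis=0)
    L = np.linalg.cholesky(np.linalg.inv(S_model))
    Z = Xc @ L

    # Neural part: gradient-descent training of the SLP on the
    # sum-of-squares cost with tanh activation.
    w, b, eta = np.zeros(d), 0.0, 0.1
    for epoch in range(50):
        out = np.tanh(Z @ w + b)
        delta = (out - y) * (1.0 - out ** 2)        # gradient factor
        w -= eta * Z.T @ delta / len(y)
        b -= eta * delta.mean()
        # After the first epoch the weights already realize the linear
        # statistical rule implied by S_model; later epochs let the
        # perceptron evolve further (toward robust or SV-type rules).

In the whitened space the first gradient step reproduces the linear statistical rule implied by the covariance model, and the remaining iterations let the perceptron evolve beyond it.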
The last part of the tutorial is devoted to explaining the relationships between the complexity, learning set size, and generalization error of statistical classifiers. In contrast to reviews of complexity and sample size relations based on the analysis of empirical risk minimization algorithms and classification error bounds, the present survey considers the generalization errors of a variety of statistical pattern classification algorithms, mostly those that can be obtained while training the non-linear single layer perceptron, as well as the multinomial classifier used when features take categorical values (a method strongly related to the popular decision tree classifiers). Using the methods of multivariate statistical analysis to examine statistical classification rules sometimes allows exact analytical expressions for the generalization error to be obtained, provided the true models of the data are known. The survey also contains two new, still unpublished results: a) the effect of unequal learning set sizes of the distinct pattern classes, and b) an examination of the losses that arise from sample-based feature selection used to reduce the input feature space, versus the losses that arise from estimating the additional parameters of the classifiers in the original high-dimensional space. It is shown that for some popular statistical classification algorithms that are non-optimal in the Bayes sense (the standard multivariate Gaussian density based quadratic classifier and the multinomial classifier for categorical features), and for some data models, the expected generalization error unexpectedly starts to increase as the number of learning vectors of the minority class increases.
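The sample-size effect discussed here can be made concrete with a small Monte Carlo experiment. The sketch below (again illustrative Python; the spherical Gaussian data model and all parameter values are my own choices, not the tutorial's) estimates the expected generalization error of the Euclidean distance classifier and compares it with the Bayes error as the learning set grows:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    d, delta = 20, 2.0                    # dimensionality, distance between means
    mu1 = np.zeros(d)
    mu2 = np.zeros(d); mu2[0] = delta
    bayes = norm.cdf(-delta / 2)          # error of the ideal (Bayes) rule

    for n in (10, 30, 100, 300):          # learning vectors per class
        errors = []
        for _ in range(200):              # Monte Carlo repetitions
            m1 = rng.multivariate_normal(mu1, np.eye(d), n).mean(axis=0)
            m2 = rng.multivariate_normal(mu2, np.eye(d), n).mean(axis=0)
            # Euclidean distance classifier: assign x to the nearer sample
            # mean, i.e. the linear rule w'x > c with w = m2 - m1.
            w = m2 - m1
            c = 0.5 * (m1 + m2) @ w
            # The conditional error has a closed form for Gaussian classes.
            e1 = norm.cdf((mu1 @ w - c) / np.linalg.norm(w))
            e2 = norm.cdf((c - mu2 @ w) / np.linalg.norm(w))
            errors.append(0.5 * (e1 + e2))
        print(f"n = {n:4d}: expected error ~ {np.mean(errors):.3f}"
              f"  (Bayes error {bayes:.3f})")

With d = 20 the expected error is strongly inflated at n = 10 and shrinks toward the Bayes error as n grows; the tutorial derives such curves analytically for a variety of classifiers.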

CONTENTS

  1. Introduction (objectives and overview of the tutorial)

  2. The single layer perceptron and its training algorithm

    1. Model and traditional applications of SLP.

    2. The cost function and gradient descent training.

    3. The weights’ growth during training.

    4. Parameters used to control the training process (target values, learning step, noise injection, regularization, etc.).

  3. SLP and statistical algorithms

    1. SLP as 7 statistical classifiers.

    2. SLP as 6 statistical regression algorithms.

    3. Integration of statistical and neural approaches to train classification and prediction rules.

    4. The case of K pattern classes.

    5. Software problems.

  4. Practical examples

    1. Complexity and sample size relationships.

    2. Normal density based classifiers.

    3. Nonparametric (local) classifiers (Parzen window, k-NN, multinomial, decision tree).

    4. Effects of unequal training set sizes on generalization error.

    5. Effect of sample-based feature selection/extraction on generalization error.

  5. General conclusions and discussion with the audience.

The author's main publications on the subject

  • S. Raudys (2004). Integration of statistical and neural methods to design classifiers in case of unequal covariance matrices. In: S. Biundo, T. Frühwirth, and G. Palm (Eds.), KI 2004, Lecture Notes in Artificial Intelligence, Vol. 3238, Springer-Verlag, pp. 270-280.

  • Š. Raudys (2001). Statistical and Neural Classifiers: An Integrated Approach to Design. Springer, London. 312 pages.

  • Š. Raudys, A. Saudargienė (2001). Tree type dependency model and sample size-dimensionality properties. IEEE Trans. on Pattern Analysis and Machine Intelligence, 23:233-239.

  • M. Skurichina, Š. Raudys, R. P. W. Duin (2000). K-nearest neighbors directed noise injection in multilayer perceptron training. IEEE Trans. on Neural Networks, 11:504-511.

  • Š. Raudys (2000). How good are support vector machines? Neural Networks, 13:9-11.

  • Š. Raudys (2000). Evolution and generalization of a single neurone. III. Primitive, regularized, standard, robust and minimax regressions. Neural Networks, 13(3/4):507-523.

  • Š. Raudys (1998). Evolution and generalization of a single neurone. I. SLP as seven statistical classifiers. Neural Networks, 11(2):283-296.

  • S. Raudys and D. Young (2004). Results in statistical discriminant analysis: A review of the former Soviet Union literature. Journal of Multivariate Analysis, 89:1-35.

  • Š. Raudys (1998). Evolution and generalization of a single neurone. II. Complexity of statistical classifiers and sample size considerations. Neural Networks, 11(2):297-313.

  • S. Raudys and A. K. Jain (1991). Small sample size effects in statistical pattern recognition: recommendations for practitioners. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-13(3):252-264.

  • S. Raudys and V. Pikelis (1980). On dimensionality, sample size, classification error and complexity of classification algorithm in pattern recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-2(3):242-252.

  • S. Raudys (1976). Limitation of Sample Size in Classification Problems. Inst. of Physics and Math. Press, Vilnius (a monograph, 186 pages, in Russian).

  • S. Raudys (1972). On the amount of a priori information in designing the classification algorithms. Proceedings of the Academy of Sciences of the USSR, Technical Cybernetics, No. 4, 168-174 (in Russian).

  • S. Raudys (1967). On determining training sample size of a linear classifier. Computing Systems, Novosibirsk, Nauka, 28:79-87 (in Russian).

  • Three new papers: "Generalization Error of Multinomial Classifier" and "Feature Over-Selection", accepted to the SPR+SSPR Conference, Hong Kong, August 2006, and "A Pool of Classifiers by SLP: A Multi-class Case", accepted to the ICIAR 2006 Conference (September 2006) in Portugal.

Expected audience: the tutorial is targeted at graduate students, researchers, and practitioners in data mining, machine learning, pattern recognition, artificial neural networks, bioinformatics, and related areas.
Background knowledge expected of the participants: no special theoretical background is required beyond elements of probability and statistics, linear algebra, and calculus at the master's degree level. Some familiarity with the main statistical and neural-network-based classification and prediction methods would be useful.



Prof. Sarunas Raudys is Head of the Data Analysis Department at the Institute of Mathematics and Informatics (Vilnius). He started his research in statistical pattern recognition and multivariate statistical analysis, then moved to artificial neural networks, and presently applies his knowledge in the disciplines of artificial intelligence and artificial life. Throughout his career he has solved practical data mining tasks for a great number of researchers and practitioners in diverse areas of activity. He teaches (or has taught) artificial neural networks and data mining courses for master's degree students at various Lithuanian universities in Vilnius, Kaunas, and Klaipeda. He has been an invited speaker at 30 major international conferences, including a 4-hour tutorial at the 15th International Conference on Pattern Recognition in Barcelona, 2000. During the last few years he has presented cycles of lectures (tutorials) on the integration of statistical and neural approaches at a number of universities in Italy, Spain, the Netherlands, Malaysia, Japan, the USA, and Canada. He has worked as a visiting scientist at Michigan State University (1989 and 1990), Baylor University, Waco, TX, USA (1990), Delft University of Technology (1991), the Energy Research Centre of the Netherlands (1992), University Paris 6 (1993 and 1994), Bosphorus University, Istanbul (1995), the interdisciplinary research centre RIKEN, Tokyo (1996), Ford Motor Scientific Research Laboratories, Detroit, USA (1999), the National University of Malaysia (2004), and the Institute for Biodiagnostics, Winnipeg, Canada (2004). He is a member of the editorial boards of the international journals Informatica (Vilnius), Pattern Recognition and Image Processing (Moscow), Pattern Recognition (Washington, DC), Pattern Analysis and Applications (London), and others. He has published 2 books and over 150 research papers in various scientific journals and conference proceedings. The number of citations in the ISI Web of Science database exceeds 600.
 
