Aspects of the numerical analysis of neural networks

Published online by Cambridge University Press: 07 November 2008

S.W. Ellacott
Affiliation:
Department of Mathematical Sciences, University of Brighton, Moulsecoomb, Brighton BN2 4GJ, England. E-mail: [email protected]

Abstract

This article starts with a brief introduction to neural networks for those unfamiliar with the basic concepts, together with a very brief overview of mathematical approaches to the subject. This is followed by a more detailed look at three areas of research which are of particular interest to numerical analysts.

The first area is approximation theory. If K is a compact set in ℝ^n, for some n, then it is proved that a semilinear feedforward network with one hidden layer can uniformly approximate any continuous function in C(K) to any required accuracy. A discussion of known results and open questions on the degree of approximation is included. We also consider the relevance of radial basis functions to neural networks.
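
The uniform approximation statement above can be written out as a formula. The notation here is an assumption made for illustration and is not taken from the article: σ denotes the activation function of the hidden units, N the number of hidden units, and c_j, w_j, θ_j the output weights, input weights and thresholds. The abstract does not state the conditions imposed on σ, so they are left unspecified.

```latex
% Assumed notation: \sigma is the hidden-unit activation, N the number of hidden units.
% For every f \in C(K) and every \varepsilon > 0 there exist N \in \mathbb{N},
% c_j, \theta_j \in \mathbb{R} and w_j \in \mathbb{R}^n (j = 1, \dots, N) such that
\[
  \sup_{x \in K} \Bigl| f(x) - \sum_{j=1}^{N} c_j \, \sigma\bigl( w_j \cdot x + \theta_j \bigr) \Bigr| < \varepsilon .
\]
```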

The second area considered is that of learning algorithms. A detailed analysis of one popular algorithm (the delta rule) will be given, indicating why one implementation leads to a stable numerical process, whereas an initially attractive variant (essentially a form of steepest descent) does not. Similar considerations apply to the backpropagation algorithm. The effect of filtering and other preprocessing of the input data will also be discussed systematically.
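
For readers unfamiliar with the delta rule, a minimal sketch of one common form is given here: a single linear unit whose weights are corrected after each training pattern. The function name `delta_rule`, the learning rate `eta`, the epoch count and the toy data are all assumptions chosen for illustration. The sketch does not reproduce the article's stability analysis, nor does it indicate which of the two implementations mentioned above is the stable one.

```python
import numpy as np

def delta_rule(X, y, eta=0.1, epochs=200):
    """Pattern-by-pattern delta rule for one linear unit (illustrative sketch)."""
    w = np.zeros(X.shape[1])              # start from zero weights
    for _ in range(epochs):
        for x_p, y_p in zip(X, y):
            error = y_p - w @ x_p         # target minus current output
            w = w + eta * error * x_p     # correct weights along the input pattern
    return w

# Toy data (assumed): three patterns generated exactly by a known weight vector.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w_true = np.array([2.0, -1.0])
y = X @ w_true

w = delta_rule(X, y)
print(w)
```

On this small consistent data set the iterates approach the generating weights; behaviour on inconsistent or badly scaled data depends on `eta` and on the input patterns, which is the kind of question the analysis described above addresses.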

Finally, some applications of neural networks to numerical computation are considered.

Type
Research Article
Copyright
Copyright © Cambridge University Press 1994
