9 - Deep Learning

from Part II - Cognitive Modeling Paradigms

Published online by Cambridge University Press: 21 April 2023

Edited by Ron Sun, Rensselaer Polytechnic Institute, New York

Summary

This chapter introduces deep learning (DL) within the framework of experimentalism, taking inspiration from Pierre Oléron’s explanation of human intellectual activities in terms of long (or deep) circuits. A history of DL is presented, from its origins in the mid-twentieth century to the breakthrough of deep neural networks (DNNs) in recent decades. Architectural and representational issues are then discussed in depth. Convolutional neural networks, the most popular and successful DL architecture to date, are reviewed in detail. Finally, adaptive activation functions in DNNs are presented in the context of homeostatic neuroplasticity, surveyed, and analyzed.
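Since adaptive activation functions and convolutional networks are the chapter’s central technical topics, a minimal sketch may help fix ideas before reading on. The snippet below, written against PyTorch (Paszke et al., 2019), implements a tanh unit with a trainable per-channel amplitude in the spirit of Trentin (2001) and drops it into a small LeNet-style convolutional network (LeCun et al., 1998). All class and variable names are illustrative, not taken from the chapter.

```python
import torch
import torch.nn as nn

class TrainableAmplitudeTanh(nn.Module):
    """Tanh activation with a learnable per-channel amplitude,
    in the spirit of Trentin (2001). Name is illustrative."""
    def __init__(self, num_channels: int):
        super().__init__()
        # One amplitude per channel, initialised to 1 so that
        # training starts from the plain tanh.
        self.amplitude = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Broadcast the (C,) amplitudes over an (N, C, H, W) input.
        return self.amplitude.view(1, -1, 1, 1) * torch.tanh(x)

class TinyConvNet(nn.Module):
    """A LeNet-style network (LeCun et al., 1998) using the
    adaptive activation above; purely illustrative."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5),   # 28x28 -> 24x24
            TrainableAmplitudeTanh(8),
            nn.MaxPool2d(2),                  # 24x24 -> 12x12
            nn.Conv2d(8, 16, kernel_size=5),  # 12x12 -> 8x8
            TrainableAmplitudeTanh(16),
            nn.MaxPool2d(2),                  # 8x8 -> 4x4
        )
        self.classifier = nn.Linear(16 * 4 * 4, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = TinyConvNet()
logits = model(torch.randn(4, 1, 28, 28))  # batch of four 28x28 images
print(logits.shape)  # torch.Size([4, 10])
```

Because the amplitudes enter the forward pass differentiably, ordinary backpropagation (Rumelhart et al., 1986a) adapts them together with the weights; this is the basic mechanism behind the adaptive-activation literature surveyed in the chapter.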

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2023

References

Abadi, M., Barham, P., Chen, J., et al. (2016). TensorFlow: a system for large-scale machine learning. In Keeton, K., & Roscoe, T. (Eds.), Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (pp. 265–283). USENIX Association.
Agostinelli, F., Hoffman, M. D., Sadowski, P. J., & Baldi, P. (2015). Learning activation functions to improve deep neural networks. In Bengio, Y., & LeCun, Y. (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, Workshop Track Proceedings.
Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). Deep reinforcement learning: a brief survey. IEEE Signal Processing Magazine, 34(6), 26–38.
Bellman, R. (1961). Adaptive Control Processes: A Guided Tour. Princeton, NJ: Princeton University Press.
Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Schölkopf, B., Platt, J., & Hoffman, T. (Eds.), Advances in Neural Information Processing Systems 19 (pp. 153–160). Cambridge, MA: MIT Press.
Bengio, Y., & LeCun, Y. (2007). Scaling Learning Algorithms Towards AI. Cambridge, MA: MIT Press.
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.
Bianchini, M., Frasconi, P., & Gori, M. (1995). Learning in multilayered networks used as autoassociators. IEEE Transactions on Neural Networks, 6(2), 512–515.
Bodyanskiy, Y., Deineko, A., Pliss, I., & Slepanska, V. (2019). Formal neuron based on adaptive parametric rectified linear activation function and its learning. In Kryvinska, N., Izonin, I., Gregus, M., Poniszewska-Maranda, A., & Dronyuk, I. (Eds.), Proceedings of the 1st International Workshop on Digital Content & Smart Multimedia (DCSMart 2019), vol. 2533 of CEUR Workshop Proceedings (pp. 14–22). CEUR-WS.org.
Bohn, B., Griebel, M., & Rieger, C. (2019). A representer theorem for deep kernel learning. Journal of Machine Learning Research, 20, 1–32.
Boring, E. (1950). A History of Experimental Psychology. New York, NY: Appleton-Century-Crofts.
Castelli, I., & Trentin, E. (2011). Supervised and unsupervised co-training of adaptive activation functions in neural nets. In Schwenker, F., & Trentin, E. (Eds.), Partially Supervised Learning – First IAPR TC3 Workshop, PSL 2011, Revised Selected Papers, vol. 7081 of Lecture Notes in Computer Science (pp. 52–61). New York, NY: Springer.
Castelli, I., & Trentin, E. (2014). Combination of supervised and unsupervised learning for training the activation functions of neural networks. Pattern Recognition Letters, 37, 178–191.
Cho, K., Courville, A., & Bengio, Y. (2015). Describing multimedia content using attention-based encoder-decoder networks. IEEE Transactions on Multimedia, 17(11), 1875–1886.
Clevert, D., Unterthiner, T., & Hochreiter, S. (2016). Fast and accurate deep network learning by exponential linear units (ELUs). In Bengio, Y., & LeCun, Y. (Eds.), Proceedings of the 4th International Conference on Learning Representations (ICLR 2016).
Cortes, C., Gonzalvo, X., Kuznetsov, V., Mohri, M., & Yang, S. (2017). AdaNet: adaptive structural learning of artificial neural networks. In Precup, D., & Teh, Y. W. (Eds.), Proceedings of the 34th International Conference on Machine Learning (vol. 70, pp. 874–883).
Cover, T. M. (1965). Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, 14(3), 326–334.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 303–314.
Dasgupta, S., Stevens, C. F., & Navlakha, S. (2017). A neural algorithm for a fundamental computing problem. Science, 358(6364), 793–796.
Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., & Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., & Weinberger, K. Q. (Eds.), Advances in Neural Information Processing Systems, vol. 27. New York, NY: Curran Associates, Inc.
Dechter, R. (1986). Learning while searching in constraint-satisfaction-problems. In Proceedings of the AAAI Conference on Artificial Intelligence (pp. 178–183).
Delahunt, C. B., Riffell, J. A., & Kutz, J. N. (2018). Biological mechanisms for learning: a computational model of olfactory learning in the Manduca sexta moth, with applications to neural nets. Frontiers in Computational Neuroscience, 12, 102.
Ducoffe, M., & Precioso, F. (2018). Adversarial active learning for deep networks: a margin based approach. arXiv:1802.09841.
Duda, R. O., & Hart, P. E. (1973). Pattern Classification and Scene Analysis. New York, NY: Wiley.
Dushkoff, M., & Ptucha, R. (2016). Adaptive activation functions for deep networks. Electronic Imaging, XVI(5), 1–5.
Elsayed, G. F., Shankar, S., Cheung, B., et al. (2018). Adversarial examples that fool both computer vision and time-limited humans. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (pp. 3914–3924). Red Hook, NY: Curran Associates.
Fiori, S. (2000). Blind signal processing by the adaptive activation function neurons. Neural Networks, 13, 597–611.
Flennerhag, S., Yin, H., Keane, J., & Elliot, M. (2018). Breaking the activation function bottleneck through adaptive parameterization. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., & Garnett, R. (Eds.), Advances in Neural Information Processing Systems 31 (pp. 7739–7750). New York, NY: Curran Associates.
Fuchs, E., & Flügge, G. (2014). Adult neuroplasticity: more than 40 years of research. Neural Plasticity, 541870, 1–10.
Fukushima, K. (1975). Cognitron: a self-organizing multilayered neural network. Biological Cybernetics, 20(3–4), 121–136.
Fukushima, K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
Fukushima, K. (2019). Recent advances in the deep CNN neocognitron. Nonlinear Theory and Its Applications, IEICE, 10(4), 304–321.
Godfrey, L. B. (2019). An evaluation of parametric activation functions for deep learning. In Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics (pp. 3006–3011).
Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al. (2014a). Generative adversarial nets. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., & Weinberger, K. Q. (Eds.), Advances in Neural Information Processing Systems, vol. 27. New York, NY: Curran Associates.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., et al. (2014b). Generative adversarial nets. In Ghahramani, Z., et al. (Eds.), Advances in Neural Information Processing Systems, 27, 2672–2680.
Gori, M., & Scarselli, F. (1998). Are multilayer perceptrons adequate for pattern recognition and verification? IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1121–1132.
Håstad, J. (1987). Computational Limitations of Small-Depth Circuits. Cambridge, MA: MIT Press.
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Hoboken, NJ: Prentice Hall.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (pp. 1026–1034). IEEE Computer Society.
He, X., Zhao, K., & Chu, X. (2021). AutoML: a survey of the state-of-the-art. Knowledge-Based Systems, 212, 106622.
Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York, NY: Wiley.
Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (pp. 2261–2269).
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat’s striate cortex. The Journal of Physiology, 148(3), 574–591.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160(1), 106–154.
Hubel, D. H., & Wiesel, T. N. (1977). Ferrier lecture: functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London. Series B. Biological Sciences, 198(1130), 1–59.
Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man, and Cybernetics, 1(4), 364–378.
Ivakhnenko, A. G., & Lapa, V. G. (1965). Cybernetic Predicting Devices. New York, NY: CCM Information Corporation.
Jagtap, A. D., Kawaguchi, K., & Karniadakis, G. E. (2020). Adaptive activation functions accelerate convergence in deep and physics-informed neural networks. Journal of Computational Physics, 404, 109136.
Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3), 630–644.
Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations. OpenReview.net.
Klambauer, G., Unterthiner, T., Mayr, A., & Hochreiter, S. (2017). Self-normalizing neural networks. In Guyon, I., et al. (Eds.), Advances in Neural Information Processing Systems 30 (pp. 971–980).
Kriegeskorte, N. (2015). Deep neural networks: a new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1(1), 417–446.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097–1105).
Kunc, V., & Kléma, J. (2019). On transformative adaptive activation functions in neural networks for gene expression inference. bioRxiv.
LeCun, Y., Boser, B., Denker, J. S., et al. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lee, H., & Fu, K. (1974). Grammatical inference for syntactic pattern recognition. In Tou, J. (Ed.), Information Systems (pp. 425–449). Boston, MA: Springer.
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 609–616).
LeNail, A. (2019). NN-SVG: publication-ready neural network architecture schematics. The Journal of Open Source Software, 4(33), 747.
Li, D., Chen, X., Becchi, M., & Zong, Z. (2016). Evaluating the energy efficiency of deep convolutional neural networks on CPUs and GPUs. In 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (pp. 477–484).
Lippmann, R. P., & Gold, B. (1987). Neural classifiers useful for speech recognition. In Proceedings of the First IEEE International Conference on Neural Networks, vol. IV (pp. 417–422). San Diego, CA.
Liu, B., Yu, X., Yu, A., Zhang, P., Wan, G., & Wang, R. (2018). Deep few-shot learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 57(4), 2290–2304.
Marra, G., Zanca, D., Betti, A., & Gori, M. (2018). Learning neuron non-linearities with kernel-based deep neural networks. arXiv:1807.06302.
Michels, F., Uelwer, T., Upschulte, E., & Harmeling, S. (2019). On the vulnerability of capsule networks to adversarial attacks. arXiv:1906.03612.
Minsky, M., & Papert, S. A. (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press.
Mozzachiodi, R., & Byrne, J. (2010). More than synaptic plasticity: role of nonsynaptic plasticity in learning and memory. Trends in Neurosciences, 33(1), 17–26.
Oléron, P. (1963). Les activités intellectuelles [Intellectual activities]. In Oléron, P., Piaget, J., Inhelder, B., & Gréco, P. (Eds.), Traité de psychologie expérimentale VII. L’Intelligence (pp. 1–70). Paris: Presses Universitaires de France.
Oléron, P., Piaget, J., Inhelder, B., & Gréco, P. (1963). Traité de psychologie expérimentale VII. L’Intelligence [Treatise on experimental psychology VII: Intelligence]. Paris: Presses Universitaires de France.
Olson, R. S., Cava, W. G. L., Orzechowski, P., Urbanowicz, R. J., & Moore, J. H. (2017). PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining, 10(1), 36:1–36:13.
Paszke, A., Gross, S., Massa, F., et al. (2019). PyTorch: an imperative style, high-performance deep learning library. In Wallach, H., et al. (Eds.), Advances in Neural Information Processing Systems 32 (pp. 8024–8035). New York, NY: Curran Associates.
Peterson, J. C., Abbott, J. T., & Griffiths, T. L. (2018). Evaluating (and improving) the correspondence between deep neural networks and human representations. Cognitive Science, 42(8), 2648–2669.
Qian, S., Liu, H., Liu, C., Wu, S., & Wong, H.-S. (2018). Adaptive activation functions in convolutional neural networks. Neurocomputing, 272, 204–212.
Roy, S., Unmesh, A., & Namboodiri, V. P. (2018). Deep active learning for object detection. In 29th British Machine Vision Conference (p. 91).
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986a). Learning representations by back-propagating errors. Nature, 323, 533–536.
Rumelhart, D. E., McClelland, J. L., & the PDP Research Group (1986b). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press.
Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. In Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 3859–3869).
Scardapane, S., Vaerenbergh, S. V., & Uncini, A. (2019). Kafnets: kernel-based non-parametric activation functions for neural networks. Neural Networks, 110, 19–32.
Shawahna, A., Sait, S. M., & El-Maleh, A. (2019). FPGA-based accelerators of deep learning networks for learning and classification: a review. IEEE Access, 7, 7823–7859.
Shen, Y., Dasgupta, S., & Navlakha, S. (2020). Habituation as a neural algorithm for online odor discrimination. Proceedings of the National Academy of Sciences, 117(22), 12402–12410.
Siddoway, B., Hou, H., & Xia, H. (2014). Molecular mechanisms of homeostatic synaptic downscaling. Neuropharmacology, 78, 38–44.
Siu, K.-Y., Roychowdhury, V., & Kailath, T. (1995). Discrete Neural Networks. Hoboken, NJ: Prentice Hall.
Solazzi, M., & Uncini, A. (2004). Regularising neural networks using flexible multivariate activation function. Neural Networks, 17(2), 247–260.
Steinkrau, D., Simard, P. Y., & Buck, I. (2005). Using GPUs for machine learning algorithms. In Proceedings of the 8th International Conference on Document Analysis and Recognition (pp. 1115–1119). IEEE Computer Society.
Szegedy, C., Zaremba, W., Sutskever, I., et al. (2014). Intriguing properties of neural networks. In 2nd International Conference on Learning Representations.
Tanay, T., & Griffin, L. (2016). A boundary tilting perspective on the phenomenon of adversarial examples. arXiv:1608.07690.
Tramèr, F., Papernot, N., Goodfellow, I., Boneh, D., & McDaniel, P. (2017). The space of transferable adversarial examples. arXiv:1704.03453.
Trentin, E. (1998). Learning the amplitude of activation functions in layered networks. In Marinaro, M., & Tagliaferri, R. (Eds.), Neural Nets – WIRN Vietri-98 (pp. 138–144). Berlin: Springer.
Trentin, E. (2001). Networks with trainable amplitude of activation functions. Neural Networks, 14(4–5), 471–493.
Turrigiano, G. G., & Nelson, S. B. (2000). Hebb and homeostasis in neuronal plasticity. Current Opinion in Neurobiology, 10(3), 358–364.
Vanschoren, J., van Rijn, J. N., Bischl, B., & Torgo, L. (2013). OpenML: networked science in machine learning. SIGKDD Explorations, 15(2), 49–60.
Vecci, L., Piazza, F., & Uncini, A. (1998). Learning and approximation capabilities of adaptive spline activation function neural networks. Neural Networks, 11(2), 259–270.
Viroli, C., & McLachlan, G. J. (2019). Deep Gaussian mixture models. Statistics and Computing, 29(1), 43–51.
Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244.
Werbos, P. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. thesis, Department of Applied Mathematics, Harvard University.
Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4), 339–356.
Wiener, N. (1958). Nonlinear Problems in Random Theory. New York, NY: John Wiley.
Xian, Y., Lampert, C. H., Schiele, B., & Akata, Z. (2018). Zero-shot learning: a comprehensive evaluation of the good, the bad and the ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9), 2251–2265.
Xu, B., Wang, N., Chen, T., & Li, M. (2015). Empirical evaluation of rectified activations in convolutional network. arXiv:1505.00853v2.
Yang, M., Sheth, S. A., Schevon, C. A., McKhann, G. M., & Mesgarani, N. (2015). Speech reconstruction from human auditory cortex with deep neural networks. In Proceedings of INTERSPEECH 2015, ISCA (pp. 1121–1125).
Zhang, L., Xiang, T., & Gong, S. (2017). Learning a deep embedding model for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2021–2030).
