Hierarchically organised behaviour and its neural foundations: a reinforcement-learning perspective

doi:10.1017/CBO9780511731525.017

13 - Hierarchically organised behaviour and its neural foundations: a reinforcement-learning perspective

from Part II - Computational neuroscience models

Published online by Cambridge University Press: 05 November 2011

Matthew M. Botvinick, Yael Niv and

Edited by Anil K. Seth (University of Sussex), Tony J. Prescott (University of Sheffield) and Joanna J. Bryson (University of Bath)

Summary

Research on human and animal behaviour has long emphasised its hierarchical structure – the divisibility of ongoing behaviour into discrete tasks, which are comprised of subtask sequences, which in turn are built of simple actions. The hierarchical structure of behaviour has also been of enduring interest within neuroscience, where it has been widely considered to reflect prefrontal cortical functions. In this chapter, we re-examine behavioural hierarchy and its neural substrates from the point of view of recent developments in computational reinforcement learning. Specifically, we consider a set of approaches known collectively as hierarchical reinforcement learning, which extend the reinforcement learning paradigm by allowing the learning agent to aggregate actions into reusable subroutines or skills. A close look at the components of hierarchical reinforcement learning suggests how they might map onto neural structures, in particular regions within the dorsolateral and orbital prefrontal cortex. It also suggests specific ways in which hierarchical reinforcement learning might provide a complement to existing psychological models of hierarchically structured behaviour. A particularly important question that hierarchical reinforcement learning brings to the fore is that of how learning identifies new action routines that are likely to provide useful building blocks in solving a wide range of future problems. Here and at many other points, hierarchical reinforcement learning offers an appealing framework for investigating the computational and neural underpinnings of hierarchically structured behaviour.

In recent years, it has become increasingly common within both psychology and neuroscience to explore the applicability of ideas from machine learning. Indeed, one can now cite numerous instances where this strategy has been fruitful. Arguably, however, no area of machine learning has had as profound and sustained an impact on psychology and neuroscience as that of computational reinforcement learning (RL). The impact of RL was initially felt in research on classical and instrumental conditioning (Barto and Sutton, 1981; Sutton and Barto, 1990; Wickens et al., 1995). Soon thereafter, its impact extended to research on midbrain dopaminergic function, where the temporal-difference learning paradigm provided a framework for interpreting temporal profiles of dopaminergic activity (Barto, 1995; Houk et al., 1995; Montague et al., 1996; Schultz et al., 1997). Subsequently, actor–critic architectures for RL have inspired new interpretations of functional divisions of labour within the basal ganglia and cerebral cortex (see Joel et al., 2002, for a review), and RL-based accounts have been advanced to address issues as diverse as motor control (e.g., Miyamoto et al., 2004), working memory (e.g., O’Reilly and Frank, 2006), performance monitoring (e.g., Holroyd and Coles, 2002), and the distinction between habitual and goal-directed behaviour (e.g., Daw et al., 2005).

Type: Chapter
Information: Modelling Natural Action Selection, pp. 264-299

DOI: https://doi.org/10.1017/CBO9780511731525.017

Publisher: Cambridge University Press

Print publication year: 2011

References

Agre, P. E. (1988).

Aldridge, J. W., Berridge, K. C. (1998). Coding of serial order by neostriatal neurons: a ‘natural action’ approach to movement sequence. J. Neurosci. 18, 2777.

Aldridge, J. W., Berridge, K. C., Rosen, A. R. (2004). Basal ganglia neural mechanisms of natural movement sequences. Can. J. Physiol. Pharmacol. 82, 732.

Alexander, G. E., Crutcher, M. D., DeLong, M. R. (1990). Basal ganglia-thalamocortical circuits: parallel substrates for motor, oculomotor, ‘prefrontal’ and ‘limbic’ functions. Prog. Brain Res. 85, 119.

Alexander, G. E., DeLong, M. R., Strick, P. L. (1986). Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annu. Rev. Neurosci. 9, 357.

Allport, A., Wylie, G. (2000). Task-switching, stimulus-response bindings and negative priming. In Control of Cognitive Processes: Attention and Performance, XVIII, ed. Monsell, S., Driver, J. Cambridge, MA: MIT Press, 35.

Anderson, J. R. (2004). An integrated theory of mind. Psychol. Rev., 1036.

Andre, D., Russell, S. J. (2001). Programmable reinforcement learning agents. Adv. Neural Inf. Proc. Syst. 13, 1019.

Andre, D., Russell, S. J. (2002).

Ansuini, C., Santello, M., Massaccesi, S., Castiello, U. (2006). Effects of end-goal on hand shaping. J. Neurophysiol. 95, 2456.

Arbib, M. A. (1985). Schemas for the temporal organization of behaviour. Hum. Neurobiol. 4, 63.

Asaad, W. F., Rainer, G., Miller, E. K. (2000). Task-specific neural activity in the primate prefrontal cortex. J. Neurophysiol. 84, 451.

Averbeck, B. B., Lee, D. (2007). Prefrontal neural correlates of memory for sequences. J. Neurosci. 27, 2204.

Badre, D. (2008). Cognitive control, hierarchy, and the rostro–caudal organization of the frontal lobes. Trends Cogn. Sci. 12, 193.

Balleine, B. W., Dickinson, A. (1998). Goal-directed instrumental action: contingency and incentive learning and their cortical substrates. Neuropharmacology 37, 407.

Barto, A. G. (1995). Adaptive critics and the basal ganglia. In Models of Information Processing in the Basal Ganglia, ed. Houk, J. C., Davis, J. L., Beiser, D. G. Cambridge, MA: MIT Press, 215.

Barto, A. G., Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dyn. S. 13, 343.

Barto, A. G., Singh, S., Chentanez, N. (2004).

Barto, A. G., Sutton, R. S. (1981). Toward a modern theory of adaptive networks: expectation and prediction. Psychol. Rev. 88, 135.

Barto, A. G., Sutton, R. S., Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE T. Syst. Man and Cyb. 13, 834.

Berlyne, D. E. (1960). Conflict, Arousal and Curiosity. New York: McGraw-Hill.

Bhatnagar, S., Panigrahi, J. R. (2006). Actor-critic algorithms for hierarchical Markov decision processes. Automatica 42, 637.

Bogacz, R., Brown, E., Moehlis, J., Holmes, P., Cohen, J. D. (2006). The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychol. Rev. 113, 700.

Bor, D., Duncan, J., Wiseman, R. J., Owen, A. M. (2003). Encoding strategies dissociate prefrontal activity from working memory demand. Neuron 37, 361.

Botvinick, M., Plaut, D. C. (2002). Representing task context: proposals based on a connectionist model of action. Psychol. Res., 298.

Botvinick, M., Plaut, D. C. (2004). Doing without schema hierarchies: a recurrent connectionist approach to normal and impaired routine sequential action. Psychol. Rev. 111, 395.

Botvinick, M., Plaut, D. C. (2006). Such stuff as habits are made on: a reply to Cooper and Shallice (2006). Psychol. Rev. 113, 917.

Botvinick, M. M. (2007). Multilevel structure in behaviour and the brain: a model of Fuster's hierarchy. Phil. Trans. Roy. Soc. B 362, 1615.

Botvinick, M. M. (2008). Hierarchical models of behavior and prefrontal function. Trends Cogn. Sci. 12, 201.

Bruner, J. (1973). Organization of early skilled action. Child Dev. 44, 1.

Bunge, S. A. (2004). How we use rules to select actions: a review of evidence from cognitive neuroscience. Cogn. Affect. Behav. Ne. 4, 564.

Bunzeck, N., Duzel, E. (2006). Absolute coding of stimulus novelty in the human substantia nigra/VTA. Neuron 51, 369.

Cohen, J. D., Braver, T. S., O’Reilly, R. C. (1996). A computational approach to prefrontal cortex, cognitive control and schizophrenia: recent developments and current challenges. Phil. Trans. Roy. Soc. B 351, 1515.

Cohen, J. D., Dunbar, K., McClelland, J. L. (1990). On the control of automatic processes: a parallel distributed processing account of the Stroop effect. Psychol. Rev. 97, 332.

Conway, C. M., Christiansen, M. H. (2001). Sequential learning in non-human primates. Trends Cogn. Sci. 5, 539.

Cooper, R., Shallice, T. (2000). Contention scheduling and the control of routine activities. Cogn. Neuropsychol. 17, 297.

Courtney, S. M., Roth, J. K., Sala, J. B. A hierarchical biased-competition model of domain-dependent working memory maintenance and executive control. In Working Memory: Behavioural and Neural Correlates, ed. Osaka, N., Logie, R., D’Esposito, M. Oxford: Oxford University Press, 369.

D’Esposito, M. (2007). From cognitive to neural models of working memory. Phil. Trans. Roy. Soc. B 362, 761.

Daw, N. D., Courville, A. C., Touretzky, D. S. (2003). Timing and partial observability in the dopamine system. In Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 99.

Daw, N. D., Niv, Y., Dayan, P. (2005). Uncertainty-based competition between prefrontal and striatal systems for behavioral control. Nat. Neurosci. 8, 1704.

Daw, N. D., Niv, Y., Dayan, P. (2006). Actions, policies, values and the basal ganglia. In Recent Breakthroughs in Basal Ganglia Research, ed. Bezard, E. New York: Nova Science Publishers, 369.

De Pisapia, N., Goddard, N. H. (2003). A neural model of frontostriatal interactions for behavioral planning and action chunking. Neurocomputing 52.

Dehaene, S., Changeux, J.-P. (1997). A hierarchical neuronal network for planning behavior. Proc. Nat. Acad. Sci. 94, 13293.

Dell, G. S., Berger, L. K., Svec, W. R. (1997). Language production and serial order. Psychol. Rev. 104, 123.

Dietterich, T. G. (1998).

Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res. 13, 227.

Elfwing, S., Uchibe, K., Christensen, H. I. (2007). Evolutionary development of hierarchical learning structures. IEEE Trans. Evol. Comput. 11, 249.

Estes, W. K. (1972). An associative basis for coding and organization in memory. In Coding Processes in Human Memory, ed. Melton, A. W., Martin, E. Washington DC: V. H. Winston and Sons, 161.

Ribas-Fernandes, J., Solway, A., Diuk, C.

Fischer, K. W. (1980). A theory of cognitive development: the control and construction of hierarchies of skills. Psychol. Rev. 87, 477.

Fischer, K. W., Connell, M. W. (2003). Two motivational systems that shape development: epistemic and self-organizing. B. J. Educ. Psychol. 2, 103.

Frank, M. J., Claus, E. D. (2006). Anatomy of a decision: striato-orbitofrontal interactions in reinforcement learning, decision making, and reversal. Psychol. Rev. 113, 300.

Fujii, N., Graybiel, A. M. (2003). Representation of action sequence boundaries by macaque prefrontal cortical neurons. Science 301, 1246.

Fuster, J. M. (1997). The Prefrontal Cortex: Anatomy, Physiology, and Neuropsychology of the Frontal Lobe. Philadelphia, PA: Lippincott-Raven.

Fuster, J. M. (2001). The prefrontal cortex – an update: time is of the essence. Neuron 30, 319.

Fuster, J. M. (2004). Upper processing stages of the perception-action cycle. Trends Cogn. Sci. 8, 143.

Gergely, G., Csibra, G. (2003). Teleological reasoning in infancy: the naive theory of rational action. Trends Cogn. Sci. 7, 287.

Gopnik, A., Glymour, C., Sobel, D. (2004). A theory of causal learning in children: causal maps and Bayes nets. Psychol. Rev. 111, 1.

Gopnik, A., Schulz, L. (2004). Mechanisms of theory formation in young children. Trends Cogn. Sci. 8, 371.

Grafman, J. (2002). The human prefrontal cortex has evolved to represent components of structured event complexes. In Handbook of Neuropsychology, ed. Grafman, J. Amsterdam: Elsevier, 157.

Graybiel, A. M. (1995). Building action repertoires: memory and learning functions of the basal ganglia. Curr. Opin. Neurobiol. 5, 733.

Graybiel, A. M. (1998). The basal ganglia and chunking of action repertoires. Neurobiol. Learn. Mem. 70, 119.

Greenfield, P. M. (1984). A theory of the teacher in the learning activities of everyday life. In Everyday Cognition: Its Development in Social Context, ed. Rogoff, B., Lave, J. Cambridge, MA: Harvard University Press, 117.

Greenfield, P. M., Nelson, K., Saltzman, E. (1972). The development of rulebound strategies for manipulating seriated cups: a parallel between action and grammar. Cogn. Psychol. 3, 291.

Greenfield, P. M., Schneider, L. (1977). Building a tree structure: the development of hierarchical complexity and interrupted strategies in children's construction activity. Dev. Psychol. 13, 299.

Grossberg, S. (1986). The adaptive self-organization of serial order in behavior: speech, language, and motor control. In Pattern Recognition by Humans and Machines, Volume 1: Speech, Perception, ed. Schwab, E. C., Nusbaum, H. C. New York: Academic Press, 187.

Hamilton, A. F. d. C., Grafton, S. T. (2008). Action outcomes are represented in human inferior frontoparietal cortex. Cereb. Cortex 18, 1160.

Harlow, H. F., Harlow, M. K., Meyer, D. R. (1950). Learning motivated by a manipulation drive. J. Exp. Psychol. 40, 228.

Haruno, M., Kawato, M. (2006). Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Networks 19, 1242.

Hayes-Roth, B., Hayes-Roth, F. (1979). A cognitive model of planning. Cogn. Sci. 3, 275.

Hengst, B. (2002). Discovering hierarchy in reinforcement learning with HEXQ. P. Int. C. Mach. Learn. 19, 243.

Holroyd, C. B., Coles, M. G. H. (2002). The neural basis of human error processing: reinforcement learning, dopamine, and the error-related negativity. Psychol. Rev. 109, 679.

Hoshi, E., Shima, K., Tanji, J. (1998). Task-dependent selectivity of movement-related neuronal activity in the primate prefrontal cortex. J. Neurophysiol. 80, 3392.

Houk, J. C., Adams, C. M., Barto, A. G. (1995). A model of how the basal ganglia generate and use neural signals that predict reinforcement. In Models of Information Processing in the Basal Ganglia, ed. Houk, J. C., Davis, J. L., Beiser, D. G. Cambridge, MA: MIT Press, 249.

Joel, D., Niv, Y., Ruppin, E. (2002). Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Networks 15, 535.

Johnston, K., Everling, S. (2006). Neural activity in monkey prefrontal cortex is modulated by task context and behavioral instruction during delayed-match-to-sample and conditional prosaccade–antisaccade tasks. J. Cogn. Neurosci. 18, 749.

Jonsson, A., Barto, A. (2001). Automated state abstraction for options using the U-tree algorithm. In Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1054.

Jonsson, A., Barto, A. (2005).

Kambhampati, S., Mali, A. D., Srivastava, B. (1998). Hybrid planning for partially hierarchical domains. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98). Madison, WI: AAAI Press, 882.

Kaplan, F., Oudeyer, P.-Y. (2004). Maximizing learning progress: an internal reward system for development. In Embodied Artificial Intelligence, ed. Iida, F., Pfeifer, R., Steels, L. Berlin: Springer-Verlag, 259.

Kearns, M., Singh, S. (2002). Near-optimal reinforcement learning in polynomial time. Mach. Learn. 49, 209.

Koechlin, E., Ody, C., Kouneiher, F. (2003). The architecture of cognitive control in the human prefrontal cortex. Science 302, 1181.

Krueger, K. A., Dayan, P. (2008). Flexible shaping. Presented at Cosyne (Computational and Systems Neuroscience), Salt Lake City, Utah.

Laird, J. E., Rosenbloom, P. S., Newell, A. (1986). Chunking in Soar: the anatomy of a general learning mechanism. Mach. Learn. 1, 11.

Landrum, E. R. (2005). Production of negative transfer in a problem-solving task. Psychol. Rep. 97, 861.

Lashley, K. S. (1951). The problem of serial order in behavior. In Cerebral Mechanisms in Behavior: The Hixon Symposium, ed. Jeffress, L. A. New York, NY: Wiley, 112.

Lee, I. H., Seitz, A. R., Assad, J. A. (2006). Activity of tonically active neurons in the monkey putamen during initiation and withholding of movement. J. Neurophysiol. 95, 2391.

Lee, F. J., Taatgen, N. A. (2003). Production compilation: a simple mechanism to model complex skill acquisition. Hum. Factors, 61.

Lehman, J. F., Laird, J., Rosenbloom, P. (1996). A gentle introduction to Soar, an architecture for human cognition. In Invitation to Cognitive Science, ed. Sternberg, S., Scarborough, D. Cambridge, MA: MIT Press, 212.

Li, L., Walsh, T. J. (2006).

Logan, G. D. (2003). Executive control of thought and action: in search of the wild homunculus. Curr. Dir. Psychol. Sci. 12, 45.

Luchins, A. S. (1942). Mechanization in problem solving. Psychol. Monogr. 248, 1.

MacDonald, A. W., Cohen, J. D., Stenger, V. A., Carter, C. S. (2000). Dissociating the role of the dorsolateral prefrontal and anterior cingulate cortex in cognitive control. Science 288, 1835.

MacKay, D. G. (1987). The Organization of Perception and Action: A Theory for Language and Other Cognitive Skills. New York: Springer-Verlag.

Mannor, S., Menache, I., Hoze, A., Klein, U. (2004). Dynamic abstraction in reinforcement learning via clustering. In Proceedings of the Twenty-First International Conference on Machine Learning. New York: ACM Press, 560.

Marthi, B., Russell, S. J., Wolfe, J. (2007).

McGovern, A. (2002).

Mehta, S., Ray, P., Tadepalli, P., Dietterich, T. (2008). Automatic discovery and transfer of MAXQ hierarchies. Paper presented at International Conference on Machine Learning, Helsinki, Finland.

Meltzoff, A. N. (1995). Understanding the intentions of others: re-enactment of intended acts by 18-month-old children. Dev. Psychol. 31, 838.

Menache, I., Mannor, S., Shimkin, N. (2002).

Middleton, F. A., Strick, P. L. (2002). Basal-ganglia ‘projections’ to the prefrontal cortex of the primate. Cereb. Cortex 12, 926.

Miller, E. K., Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annu. Rev. Neurosci., 167.

Miller, G. A., Galanter, E., Pribram, K. H. (1960). Plans and the Structure of Behavior. New York: Holt, Rinehart and Winston.

Minton, S., Hayes, P. J., Fain, J. (1985). Controlling Search in Flexible Parsing.

Miyamoto, H., Morimoto, J., Doya, K., Kawato, M. (2004). Reinforcement learning with via-point representation. Neural Networks 17, 299.

Monsell, S. (2003). Task switching. Trends Cogn. Sci. 7, 134.

Monsell, S., Yeung, N., Azuma, R. (2000). Reconfiguration of task-set: is it easier to switch to the weaker task? Psychol. Res. 63, 250.

Montague, P. R., Dayan, P., Sejnowski, T. J. (1996). A framework for mesencephalic dopamine based on predictive Hebbian learning. J. Neurosci. 16, 1936.

Morris, G., Arkadir, D., Nevet, A., Vaadia, E., Bergman, H. (2004). Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43, 133.

Muhammad, R., Wallis, J. D., Miller, E. K. (2006). A comparison of abstract rules in the prefrontal cortex, premotor cortex, inferior temporal cortex, and striatum. J. Cogn. Neurosci. 18, 974.

Nason, S., Laird, J. E. (2005). Soar-RL: integrating reinforcement learning with Soar. Cogn. Syst. Res. 6, 51.

Newell, A., Simon, H. A. (1963). GPS, a program that simulates human thought. In Computers and Thought, ed. Feigenbaum, E. A., Feldman, J. New York: McGraw-Hill, 279.

Newtson, D. (1976). Foundations of attribution: the perception of ongoing behavior. In New Directions in Attribution Research, ed. Harvey, J. H., Ickes, W. J., Kidd, R. F. Hillsdale, NJ: Erlbaum, 223.

O’Doherty, J., Critchley, H., Deichmann, R., Dolan, R. J. (2003). Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. J. Neurosci. 79.

O’Doherty, J., Dayan, P., Schultz, P. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304, 452.

O’Reilly, R. C., Frank, M. J. (2006). Making working memory work: a computational model of learning in prefrontal cortex and basal ganglia. Neural Comput. 18, 283.

Oudeyer, P.-Y., Kaplan, F., Hafner, V. (2007). Intrinsic motivation systems for autonomous development. IEEE T. Evol. Comput. 11, 265.

Parent, A., Hazrati, L. N. (1995). Functional anatomy of the basal ganglia. I. The cortico-basal ganglia-thalamo-cortical loop. Brain Res. Rev. 20, 91.

Parr, R., Russell, S. (1998). Reinforcement learning with hierarchies of machines. Adv. Neural Inf. Proc. Syst. 10, 1043.

Pashler, H. (1994). Dual-task interference in simple tasks: data and theory. Psychol. Bull. 116, 220.

Petrides, M. (1995). Impairments on nonspatial self-ordered and externally ordered working memory tasks after lesions to the mid-dorsal part of the lateral frontal cortex in the monkey. J. Neurosci. 15, 359.

Piaget, J. (1936). The Origins of Intelligence in Children. New York: International Universities Press.

Pickett, M., Barto, A. G. (2002). PolicyBlocks: an algorithm for creating useful macro-actions in reinforcement learning. In Machine Learning: Proceedings of the Nineteenth International Conference on Machine Learning, ed. Sammut, C., Hoffmann, A. San Francisco: Morgan Kaufmann, 506.

Postle, B. R. (2006). Working memory as an emergent property of the mind and brain. Neurosci. 139, 23.

Ravel, S., Sardo, P., Legallet, E., Apicella, P. (2006). Influence of spatial information on responses of tonically active neurons in the monkey striatum. J. Neurophysiol. 95, 2975.

Rayman, W. E. (1982). Negative transfer: a threat to flying safety. Aviat. Space Envir. Md., 1224.

Reason, J. T. (1992). Human Error. Cambridge: Cambridge University Press.

Redgrave, P., Gurney, K. (2006). The short-latency dopamine signal: a role in discovering novel actions. Nat. Rev. Neurosci. 7, 967.

Roesch, M. R., Taylor, A. R., Schoenbaum, G. (2006). Encoding of time-discounted rewards in orbitofrontal cortex is independent of value. Neuron 51, 509.

Rolls, E. T. (2004). The functions of the orbitofrontal cortex. Brain Cogn. 55, 11.

Rougier, N. P., Noelle, D. C., Braver, T. S., Cohen, J. D., O’Reilly, R. C. (2005). Prefrontal cortex and flexible cognitive control: rules without symbols. Proc. Nat. Acad. Sci. 102, 7338.

Ruh, N. (2007).

Rumelhart, D., Norman, D. A. (1982). Simulating a skilled typist: a study of skilled cognitive-motor performance. Cogn. Sci. 6, 1.

Rushworth, M. F. S., Walton, M. E., Kennerley, S. W., Bannerman, D. M. (2004). Action sets and decisions in the medial frontal cortex. Trends Cogn. Sci. 8, 410.

Ryan, R. M., Deci, E. L. (2000). Intrinsic and extrinsic motivation. Contemp. Edu. Psychol. 25, 54.

Saffran, J. R., Aslin, R. N., Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science 13, 1926.

Saffran, J. R., Wilson, D. P. (2003). From syllables to syntax: multilevel statistical learning by 12-month-old infants. Infancy 4, 273.

Salinas, E. (2004). Fast remapping of sensory stimuli onto motor actions on the basis of contextual modulation. J. Neurosci. 24, 1113.

Schank, R. C., Abelson, R. P. (1977). Scripts, Plans, Goals and Understanding. Hillsdale, NJ: Erlbaum.

Schmidhuber, J. (1991). A possibility for implementing curiosity and boredom in model-building neural controllers. In From Animals to Animats: Proceedings of the First International Conference on Simulation of Adaptive Behavior. Cambridge: MIT Press, 222.

Schneider, D. W., Logan, G. D. (2006). Hierarchical control of cognitive processes: switching tasks in sequences. J. Exp. Psychol. 135, 623.

Schoenbaum, G., Chiba, A. A., Gallagher, M. (1999). Neural encoding in orbitofrontal cortex and basolateral amygdala during olfactory discrimination learning. J. Neurosci. 19, 1876.

Schultz, W., Apicella, P., Ljungberg, T. (1993). Responses of monkey dopamine neurons to reward and conditioned stimuli during successive steps of learning a delayed response task. J. Neurosci. 13, 900.

Schultz, W., Dayan, P., Montague, P. R. (1997). A neural substrate of prediction and reward. Science 275, 1593.

Schultz, W., Tremblay, K. L., Hollerman, J. R. (2000). Reward processing in primate orbitofrontal cortex and basal ganglia. Cereb. Cortex 10, 272.

Shallice, T., Burgess, P. W. (1991). Deficits in strategy application following frontal lobe damage in man. Brain 114, 727.

Shima, K., Isoda, M., Mushiake, H., Tanji, J. (2007). Categorization of behavioural sequences in the prefrontal cortex. Nature 445, 315.

Shima, K., Tanji, J. (2000). Neuronal activity in the supplementary and presupplementary motor areas for temporal organization of multiple movements. J. Neurophysiol. 84, 2148.

Shimamura, A. P. (2000). The role of the prefrontal cortex in dynamic filtering. Psychobiol. 28, 207.

Simsek, O., Wolfe, A., Barto, A. (2005). Identifying useful subgoals in reinforcement learning by local graph partitioning. In Proceedings of the Twenty-Second International Conference on Machine Learning (ICML 05). New York: ACM, 816.

Singh, S., Barto, A. G., Chentanez, N. (2005). Intrinsically motivated reinforcement learning. In Advances in Neural Information Processing Systems 17: Proceedings of the 2004 Conference, ed. Saul, L. K., Weiss, Y., Bottou, L. Cambridge, MA: MIT Press.

Sirigu, A., Zalla, T., Pillon, B. (1995). Selective impairments in managerial knowledge in patients with pre-frontal cortex lesions. Cortex 31, 301.

Sommerville, J., Woodward, A. L. (2005). Pulling out the intentional structure of action: the relation between action processing and action production in infancy. Cognition, 1.

Sommerville, J. A., Woodward, A. L. (2005). Infants’ sensitivity to the causal features of means–end support sequences in action and perception. Infancy 8, 119.

Suri, R. E., Bargas, J., Arbib, M. A. (2001). Modeling functions of striatal dopamine modulation in learning and planning. Neurosci. 103, 65.

Sutton, R. S., Barto, A. G. (1990). Time-derivative models of Pavlovian reinforcement. In Learning and Computational Neuroscience: Foundations of Adaptive Networks, ed. Gabriel, M., Moore, J. Cambridge, MA: MIT Press, 497.

Sutton, R. S., Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Sutton, R. S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112, 181.

Tenenbaum, J. B., Saxe, R. R. (2006). Bayesian Models of Action Understanding. Cambridge, MA: MIT Press.

Thrun, S. B., Schwartz, A. (1995). Finding structure in reinforcement learning. In Advances in Neural Information Processing Systems: Proceedings of the 1994 Conference, ed. Tesauro, G., Touretzky, D. S., Leen, T. Cambridge, MA: MIT Press, 385.

Wallis, J. D., Anderson, K. C., Miller, E. K. (2001). Single neurons in prefrontal cortex encode abstract rules. Nature 411, 953.

Wallis, J. D., Miller, E. K. (2003). From rule to response: neuronal processes in the premotor and prefrontal cortex. J. Neurophysiol. 90, 1790.

Ward, G., Allport, A. (1997). Planning and problem-solving using the five-disc Tower of London task. Q. J. Exp. Psychol. 50.

White, I. M. (1999). Rule-dependent neuronal activity in the prefrontal cortex. Exp. Brain Res. 126, 315.

White, R. W. (1959). Motivation reconsidered: the concept of competence. Psychol. Rev. 66, 297.

Wickens, J., Kotter, R., Houk, J. C. (1995). Cellular models of reinforcement. In Models of Information Processing in the Basal Ganglia, ed. Houk, J. C., Davis, J. L., Beiser, D. G. Cambridge, MA: MIT Press, 187.

Wolpert, D., Flanagan, J. (2001). Motor prediction. Curr. Biol. 18, R729.

Wood, J. N., Grafman, J. (2003). Human prefrontal cortex: processing and representational perspectives. Nature Rev. Neurosci. 4, 139.

Woodward, A. L., Sommerville, J. A., Guajardo, J. J. (2001). How infants make sense of intentional action. In Intentions and Intentionality: Foundations of Social Cognition, ed. Malle, B. F., Moses, L. J., Baldwin, D. A. Cambridge, MA: MIT Press, 149.

Yamada, S., Tsuji, S. (1989).

Yan, Z., Fischer, K. (2002). Always under construction: dynamic variations in adult cognitive microdevelopment. Hum. Dev. 45, 141.

Zacks, J. M., Braver, T. S., Sheridan, M. A. (2001). Human brain activity time-locked to perceptual event boundaries. Nature Neurosci. 4, 651.

Zacks, J. M., Speer, N. K., Swallow, K. M., Braver, T. S., Reynolds, J. R. (2007). Event perception: a mind/brain perspective. Psychol. Bull. 133, 273.

Zacks, J. M., Tversky, B. (2001). Event structure in perception and conception. Psychol. Bull. 127, 3.

Zalla, T., Pradat-Diehl, P., Sirigu, A. (2003). Perception of action boundaries in patients with frontal lobe damage. Neuropsychologia 41, 1619.

Zhou, W., Coggins, R. (2002). Computational models of the amygdala and the orbitofrontal cortex: a hierarchical reinforcement learning system for robotic control. In Lecture Notes AI: LNAI 2557, ed. McKay, I., Slaney, J. Berlin: Springer-Verlag, 419.

Zhou, W., Coggins, R. (2004). Biologically inspired reinforcement learning: reward-based decomposition for multi-goal environments. In Biologically Inspired Approaches to Advanced Information Technology, ed. Ijspeert, A. J., Murata, M., Wakamiya, N. Berlin: Springer-Verlag.