
Toll-based reinforcement learning for efficient equilibria in route choice

Published online by Cambridge University Press: 05 March 2020

Gabriel de O. Ramos
Affiliation:
Graduate Program in Applied Computing, Universidade do Vale do Rio dos Sinos, São Leopoldo, Brazil, e-mail: [email protected]; Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium, e-mails: [email protected], [email protected]
Bruno C. Da Silva
Affiliation:
Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, e-mails: [email protected], [email protected]
Roxana Rădulescu
Affiliation:
Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium, e-mails: [email protected], [email protected]
Ana L. C. Bazzan
Affiliation:
Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil, e-mails: [email protected], [email protected]
Ann Nowé
Affiliation:
Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium, e-mails: [email protected], [email protected]

Abstract

The problem of traffic congestion incurs numerous social and economic repercussions and has thus become a central issue in every major city in the world. In this work, we look at the transportation domain from a multiagent system perspective, where every driver can be seen as an autonomous decision-making agent. We explore how learning approaches can help achieve an efficient outcome even when agents interact competitively while sharing common resources. To this end, we consider the route choice problem, where self-interested drivers need to independently learn which routes minimise their expected travel costs. Such selfish behaviour results in the so-called user equilibrium, which is inefficient from the system’s perspective. In order to mitigate the impact of selfishness, we present Toll-based Q-learning (TQ-learning, for short). TQ-learning employs the idea of marginal-cost tolling (MCT), where each driver is charged according to the cost it imposes on others. The use of MCT leads agents to behave in a socially desirable way such that the system optimum is attainable. In contrast to previous works, our tolling scheme is distributed (i.e., each agent can compute its own toll), is charged a posteriori (i.e., at the end of each trip), and is fairer (i.e., agents pay exactly their marginal costs). Additionally, we provide a general formulation of the toll values for univariate, homogeneous polynomial cost functions. We present a theoretical analysis of TQ-learning, proving that in the limit it converges to a system-efficient equilibrium (i.e., an equilibrium aligned with the system optimum). Furthermore, we perform an extensive empirical evaluation on realistic road networks to support our theoretical findings, showing that TQ-learning indeed converges to the optimum, which translates into an average reduction in congestion levels of 9.1%.
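To make the mechanism concrete, the following is a minimal Python sketch of the idea described above, not the authors' implementation. It assumes stateless Q-learning over a fixed route set and costs of the form c(f) = k·f^p (univariate, homogeneous polynomials), for which the marginal-cost toll f·c'(f) reduces to p·c(f), so each driver can compute its own toll a posteriori from the cost it just experienced. All names and parameters here are illustrative.

```python
import random

# Illustrative sketch of toll-based Q-learning for stateless route choice.
# Assumption: route costs are homogeneous polynomials c(f) = k * f**p, so the
# marginal-cost toll f * c'(f) equals p * c(f) and each driver can derive its
# own toll, a posteriori, from the cost it experienced on its trip.

class TQLearner:
    """One self-interested driver learning over a fixed set of routes."""

    def __init__(self, routes, alpha=0.5, epsilon=1.0, decay=0.995):
        self.q = {r: 0.0 for r in routes}  # stateless: one Q-value per route
        self.alpha = alpha                 # learning rate
        self.epsilon = epsilon             # exploration probability
        self.decay = decay                 # epsilon decay per episode

    def choose_route(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.q))   # explore
        return max(self.q, key=self.q.get)       # exploit

    def update(self, route, travel_cost, toll):
        # Reward is the negated sum of the experienced travel cost and the
        # a-posteriori marginal-cost toll.
        reward = -(travel_cost + toll)
        self.q[route] += self.alpha * (reward - self.q[route])
        self.epsilon *= self.decay


def simulate(n_drivers=100, episodes=5000):
    # Pigou-style two-route network: 'var' costs f/n (degree-1 polynomial,
    # so its toll equals its cost); 'fix' costs 1.0 (constant, toll 0).
    # The user equilibrium puts everyone on 'var'; the system optimum
    # splits the flow roughly in half.
    drivers = [TQLearner(['var', 'fix']) for _ in range(n_drivers)]
    share = 0.0
    for _ in range(episodes):
        choices = [d.choose_route() for d in drivers]
        share = choices.count('var') / n_drivers
        cost = {'var': share, 'fix': 1.0}
        toll = {'var': 1 * cost['var'], 'fix': 0.0}  # toll = p * c(f)
        for d, r in zip(drivers, choices):
            d.update(r, cost[r], toll[r])
    return share  # ~0.5 once tolls align the learners with the optimum


if __name__ == '__main__':
    print(f"final share on the variable route: {simulate():.2f}")
```

Under these assumptions, setting the toll term to zero recovers plain Q-learning, and the flow drifts towards the inefficient user equilibrium; this is precisely the gap the tolling scheme is designed to close.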

Type
Adaptive and Learning Agents
Copyright
© Cambridge University Press 2020

