
A spacecraft attitude manoeuvre planning algorithm based on improved policy gradient reinforcement learning

Published online by Cambridge University Press:  14 December 2021

Bing Hua*, Shenggang Sun, Yunhua Wu and Zhiming Chen
Affiliation: School of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing, China
*Corresponding author. E-mail: [email protected]

Abstract

To solve the problem of spacecraft attitude manoeuvre planning under dynamic multiple mandatory pointing constraints and prohibited pointing constraints, a systematic attitude manoeuvre planning approach based on improved policy gradient reinforcement learning is proposed. This paper presents a concise model of dynamic multiple constraints that reflects the situation faced by an in-orbit spacecraft. By introducing a return baseline and an adaptive policy-exploration scheme, the proposed method overcomes the large variance and slow convergence of the standard policy gradient, while markedly reducing the required computation time. The method determines a near-optimal attitude manoeuvre path, making it suitable for the control of micro spacecraft. Simulation results demonstrate that the planned manoeuvres fully satisfy all constraints, including six prohibited pointing constraints and two mandatory pointing constraints, and that the spacecraft maintains high pointing accuracy towards the Earth and the Sun throughout the manoeuvres.
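
The core idea summarised above, a policy gradient update with a return baseline for variance reduction and annealed (adaptive) exploration, can be illustrated with a toy sketch. The code below is not the authors' implementation: the discrete 5x5 "attitude grid", the keep-out cells, the reward shaping and all hyper-parameters are assumptions made purely for illustration of the generic REINFORCE-with-baseline update.

```python
# Minimal sketch of policy gradient (REINFORCE) with a return baseline and
# annealed exploration, in the spirit of the method described in the abstract.
# NOT the authors' implementation: the 5x5 "attitude grid", the keep-out cells,
# the rewards and all hyper-parameters below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 25, 4              # toy 5x5 grid of orientations, 4 slew directions
START, GOAL = 0, 24                      # initial and target orientation cells
FORBIDDEN = {7, 12, 17}                  # "prohibited pointing" cells to be avoided
theta = np.zeros((N_STATES, N_ACTIONS))  # softmax policy parameters

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(s, a):
    """Move one cell up/down/left/right; entering a forbidden cell is penalised."""
    r, c = divmod(s, 5)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    s2 = min(max(r + dr, 0), 4) * 5 + min(max(c + dc, 0), 4)
    if s2 in FORBIDDEN:
        return s, -1.0, False            # constraint violated: stay put, pay a penalty
    if s2 == GOAL:
        return s2, 10.0, True            # target orientation reached
    return s2, -0.1, False               # small step cost favours short manoeuvres

alpha, gamma = 0.1, 0.95
for episode in range(2000):
    temp = max(0.1, 1.0 - episode / 1500)     # adaptive exploration: anneal temperature
    s, traj, done = START, [], False
    for _ in range(50):                       # roll out one episode
        probs = softmax(theta[s] / temp)
        a = int(rng.choice(N_ACTIONS, p=probs))
        s2, r, done = step(s, a)
        traj.append((s, a, r, probs))
        s = s2
        if done:
            break

    # Monte-Carlo return G_t for every step of the episode
    G, returns = 0.0, []
    for (_, _, r, _) in reversed(traj):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    baseline = np.mean(returns)               # return baseline: reduces gradient variance

    # REINFORCE update: theta += alpha * (G_t - baseline) * grad log pi(a_t | s_t)
    for (s_t, a_t, _, probs), G_t in zip(traj, returns):
        grad_log = -probs / temp              # d/dtheta_b log softmax(theta/temp)[a_t]
        grad_log[a_t] += 1.0 / temp
        theta[s_t] += alpha * (G_t - baseline) * grad_log
```

Subtracting the mean return as a baseline leaves the gradient estimate unbiased while lowering its variance, and annealing the softmax temperature shifts the policy from exploration towards exploitation as training progresses, which is the general mechanism behind the faster convergence claimed in the abstract.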

Type
Research Article
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press on behalf of The Royal Institute of Navigation

