
A spacecraft attitude manoeuvre planning algorithm based on improved policy gradient reinforcement learning

Published online by Cambridge University Press: 14 December 2021

Bing Hua*, Shenggang Sun, Yunhua Wu and Zhiming Chen
Affiliation: School of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing, China
*Corresponding author. E-mail: [email protected]

Abstract

To solve the problem of spacecraft attitude manoeuvre planning under dynamic multiple mandatory pointing constraints and prohibited pointing constraints, a systematic attitude manoeuvre planning approach based on improved policy gradient reinforcement learning is proposed. This paper presents a succinct model of dynamic multiple constraints that closely resembles the situation faced by an in-orbit spacecraft. By introducing a return baseline and an adaptive policy exploration method, the proposed approach overcomes issues such as large gradient variance and slow convergence, while the required computation time is markedly reduced. Using the proposed method, a near-optimal attitude manoeuvre path can be determined, making the method suitable for the control of micro spacecraft. Simulation results demonstrate that the planning results fully satisfy all constraints, including six prohibited pointing constraints and two mandatory pointing constraints, and that the spacecraft maintains high pointing accuracy towards the Earth and the Sun throughout all attitude manoeuvres.
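As a rough illustration of the variance-reduction idea mentioned in the abstract, the sketch below shows a plain REINFORCE-style policy gradient update with a return baseline, using a softmax temperature as a simple stand-in for adaptive exploration. This is not the paper's implementation: the tabular state/action discretisation, the policy parameterisation and all names (`softmax_policy`, `update`, `temperature`) are assumptions introduced purely for illustration.

```python
# Minimal sketch of a policy gradient update with a return baseline
# (REINFORCE with baseline). All dimensions and names are illustrative,
# not taken from the paper.
import numpy as np

n_states, n_actions = 100, 8                 # assumed discretisation of the attitude space
theta = np.zeros((n_states, n_actions))      # parameters of a tabular softmax policy


def softmax_policy(s, temperature=1.0):
    """Action probabilities in state s; a higher temperature explores more."""
    logits = theta[s] / temperature
    p = np.exp(logits - logits.max())
    return p / p.sum()


def update(episodes, lr=0.01, temperature=1.0):
    """One batch update. episodes: list of (states, actions, rewards) trajectories."""
    returns = [sum(rewards) for (_, _, rewards) in episodes]
    baseline = np.mean(returns)              # return baseline: reduces gradient variance
    for (states, actions, rewards), G in zip(episodes, returns):
        advantage = G - baseline             # only better-than-average episodes are reinforced
        for s, a in zip(states, actions):
            p = softmax_policy(s, temperature)
            grad_log = -p
            grad_log[a] += 1.0
            grad_log /= temperature          # d/d(theta[s]) of log softmax(theta[s]/T)[a]
            theta[s] += lr * advantage * grad_log
```

Decaying `temperature` over training is one simple way to make the exploration adaptive; the paper's adaptive policy exploration scheme may differ in detail.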

Type: Research Article
Copyright: © The Author(s), 2021. Published by Cambridge University Press on behalf of The Royal Institute of Navigation

