1. Introduction
Underactuated systems are a class of systems whose number of Degrees of Freedom (DoF) exceeds the number of control inputs. Such systems are common in practice, for example, wheeled robots [Reference Chen, Liu, He, Qiao and Ji1], underwater vehicles [Reference Heshmati-alamdari, Nikou and Dimarogonas2], and flexible robot systems [Reference Liu, Zhan, Xing, Wu, Xu and Wu3]. One critical advantage of underactuated systems over fully actuated and over-actuated systems is that they cost less and are less complex, because they have fewer control inputs. For exactly the same reason, however, they are harder to control, and the control of underactuated systems has been an active research direction. Among all underactuated systems, the cartpole system is a classic benchmark that combines uncertainty, coupling, nonlinearity, non-minimum phase behaviour, multiple variables and instability, features shared by most other underactuated systems. Research into the cartpole system is therefore of fundamental significance for gaining insight into the dynamics of other systems [Reference Messikh, Guechi and Blai4].
Over the decades, various methods have been proposed for the stabilization of the cartpole system. Some early work linearized the nonlinear cartpole model near the equilibrium point and then applied linear controllers. This simplification usually ensures stability near the equilibrium. Well-known examples include the Proportional-Integral-Derivative (PID) controller and the Linear-Quadratic Regulator (LQR) [Reference Banerjee and Pal5–Reference Eizadiyan and Naseriyan7]. Nonetheless, linearization impairs the accuracy of the dynamics and therefore cannot guarantee stability far from the equilibrium [Reference Slotine and Li8]. Backstepping has been one of the most widely researched methods for the cartpole control problem. Shao et al. adopted a state-feedback-based backstepping controller for the tracking and switching control of cartpole systems [Reference Shao and Li9]. Jiang et al. proposed an underactuated backstepping method for a class of underactuated systems [Reference Jiangand and Astolfi10]. Compared with conventional backstepping, this method offers a systematic solution for a class of systems. However, the tuning and selection of its control matrices remain an open question. Adaptive and robust control methods have also received considerable attention. An adaptive optimal fuzzy controller based on feedback linearization and sliding mode control was proposed in ref. [Reference Lakmesari, Mahmoodabadi and Ibrahim11] for cartpole systems. A fuzzy logic system and gradient descent were combined to tune the parameters, and a multi-objective optimization algorithm was used to adjust the sliding mode control gain. In ref. [Reference Dao and Liu12], adaptive output-feedback optimal control was combined with integral sliding mode control for a wheeled inverted pendulum under disturbance. The integral sliding mode controller was responsible for finite-time convergence, and adaptive dynamic programming dealt with the coupled uncertainties. In ref. [Reference Ordaz and Poznyak13], an adaptive control scheme based on the Adaptive Ellipsoid Method (AEM) was proposed to tune the gain matrices of the observer and controller. Experiments on an underactuated vertical double pendulum with uncertainties illustrated its superiority over the conventional AEM controller. Many other control methods have been applied case by case to cartpole systems, including energy-based control [Reference Kennedy, King and Tran14], state-feedback control [Reference Ranasinghe, Manoharan, Pallegedara and Kodithuwakku15], neural networks [Reference Ratolikar and Kumar16], and so on. In addition, methods developed for other underactuated systems could readily be extended to the cartpole system, for example, event-triggered dynamic surface control [Reference Peng, Jiang and Wang17] and fast terminal sliding mode control [Reference Rojsiraphisal, Mobayen, Asad, Vu, Chang and Puangmalai18]. Nonetheless, most of the above-mentioned methods are tedious to design, which hinders their application in industry.
A large class of methods that has emerged recently is learning-based control. The common feature of these approaches is that a large amount of data is used to train a controller that optimizes a designed objective function. A milestone in this direction was the work by Google DeepMind, which extended deep Q-learning to continuous control tasks including cartpole [Reference Lillicrap, Hunt, Pritzel, Heess, Erez, Tassa, Silver and Wierstra19]. A feedforward neural network was used to approximate the Q-values of state-action pairs. Shi et al. combined a type-1 Fuzzy Logic System (FLS) with Reinforcement Learning (RL) to achieve robust cartpole control [Reference Shi, Lam, Xuan and Chen20]. The FLS served as an encoder to cope with the uncertainty of the system, and RL was used to find the optimal policy that minimizes the tracking error. Hiremath et al. applied a deep neural network method based on gated recurrent units (GRUs) to the stabilization and tracking problem of a constrained stochastic cartpole system [Reference Hiremath and Bajçinca21]. Nevertheless, this class of methods usually requires a large amount of training data. Besides, the trained models are black boxes, which makes their inner dynamics hard to interpret.
Therefore, borrowing ideas from the conventional PID controller, this paper proposes a control method for the underactuated cartpole system. A conventional PID controller is simple to design and intuitive to understand. Borrowing ideas from the backstepping method, this work extends the conventional PID controller to a cascaded version, which helps establish internal control targets and increases the order of the controller. In this way, the merits of the PID controller are retained, while the controller can be applied directly without any linearization procedure. In contrast to the widely accepted backstepping approach, the proposed method does not suffer from exploding terms or complex coordinate transformations. Moreover, while most previous PID analyses rely on linearization and transfer functions, this paper uses a Jacobian matrix-based stability analysis, which is applicable to any differentiable nonlinear system dynamics. The contributions of this article are summarized as follows:
• Propose a unique cascaded PD controller, which transforms the pole dynamics into a virtual PD controller and uses the coupling term as the design variable for the second PD controller. This model-based method retains the simplicity and intuitiveness of a conventional PID controller while exploiting the system dynamics.
• Introduce a stability analysis method for the fourth-order cascaded PD controller using the Jacobian matrix of the residual system, although it establishes only local asymptotic stability. This is a novel way to approach stability analysis in this context, with the potential to be used for parameter design.
• Automate the complex derivations and design processes through symbolic calculation on a PC, allowing for more efficient design and validation.
• Provide a comprehensive analysis of both the linearized and the original nonlinear models, thoroughly examining the applicability of the proposed method.
• Demonstrate in simulation the advantages of the proposed method in stabilizing the cart and the pole simultaneously, with superior performance over the widely used double-loop PD controller. The robustness against Coulomb friction and random noise is also demonstrated.
The rest of the paper is organized as follows. Section 2 introduces the background. Firstly, the dynamic model of the cartpole system is given in both nonlinear and linear form. Secondly, the conventional PD controller is presented, which serves as the basis of the proposed method. Section 3 describes the design process of the proposed method. The overall framework and workflow are described first. Then, the design process for the linear and nonlinear dynamics is presented, followed by the stability analysis procedures. Section 4 presents the simulation results. The system responses are depicted, and further analysis is carried out using the Jacobian matrix. Section 5 concludes the article and points out potential directions for further research.
2. Preliminaries
2.1. Dynamic model description
This section introduces the structure and dynamics of the cartpole system investigated in this research. Both the nonlinear and the linear versions of the dynamics are presented, and the controller design is carried out on both. The linear model is included to present the method more clearly, since the nonlinear cartpole model is so complicated that the mathematics may distract readers from the workflow of the proposed control method.
Figure 1 illustrates the conceptual structure of the cartpole system. As the name suggests, the system is composed of a cart with a pole mounted on top of it. The control goal is to keep the pole upright for as long as possible while limiting the movement of the cart. The (nonlinear) dynamics of the system can be expressed as follows [Reference Lam and Leung22]. In Eq. (1) and Fig. 1, $u$ is the control force (N), $x_1,x_2$ are the angular position and angular velocity of the pole $(\text{rad}, \text{rad/s})$, $x_3,x_4$ are the position and linear velocity of the cart $(\text{m, m/s})$, $l$ is the length of the pole (m), $M_1$ is the mass of the cart (kg), $J$ is the moment of inertia $(\text{kg m}^2)$, and $M_2$ is the mass of the pole (kg). Besides, $F_0,F_1$ are the friction factors of the cart and the pole, respectively $(\text{N/m/s})$, and $g$ is the gravitational acceleration.
To simplify the controller design, the model is frequently linearized near the equilibrium point, namely when $x_1 \approx 0, x_3 \approx 0$. Therefore, the following approximations hold:
Substituting these approximations into Eq. (1), a linearized model of the cartpole is derived:
where
Clearly, the cartpole system is highly nonlinear. High-order trigonometric terms appear in both the numerators and the denominators. Besides, the coupling effect is significant: $x_1,x_2,x_4$ and $u$ influence the dynamics of both the cart and the pole. Therefore, the control problem of the cartpole system is challenging and of fundamental importance in the control field.
2.2. Conventional PD controller
Figure 2 shows the conceptual structure of a conventional PD controller. $R$ is the reference signal, $e(t)$ is the error at time $t$, $U$ is the control signal and $Y$ is the output. The core modules of a PD controller are the proportional and derivative modules, and the mathematical expression is [Reference Chen23]
where $k_p,k_d$ are the proportional and derivative gains. The PD controller is a simplified version of the PID controller, which is widely adopted in industry [Reference Tomei24].
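As a minimal sketch of the PD law described above, the snippet below assumes the standard form $U = k_p e(t) + k_d \dot{e}(t)$. The class name `PDController`, its method and the numerical gains are illustrative choices and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class PDController:
    """Textbook PD law U = kp * e + kd * de/dt (assumed form)."""
    kp: float  # proportional gain
    kd: float  # derivative gain

    def control(self, error: float, error_rate: float) -> float:
        # Proportional action on the error plus derivative action on its rate.
        return self.kp * error + self.kd * error_rate


# Illustrative gains and error values only.
pd = PDController(kp=2.0, kd=0.5)
u = pd.control(error=1.0, error_rate=-0.2)
print(u)
```

Passing the error rate explicitly, rather than differencing inside the controller, keeps the sketch independent of the sampling time.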
3. Controller design and analysis
3.1. Controllability analysis around equilibrium point
In this section, the controllability of the system near the unstable equilibrium point is analysed, which serves as the foundation for controller design. Consider system dynamics (5) and write down the system matrices as follows:
According to the controllability theorem, if the matrix $[Z,YZ,Y^2Z,Y^3Z]$ has full rank, then the system is controllable at the equilibrium point. Substituting the system parameters in Table II, the following controllability matrix is verified to have full rank.
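The rank test can be sketched numerically as follows. The entries of `Y` and `Z` below are placeholder numbers with the sparsity pattern of a generic linearized cartpole in the state order $[x_1,x_2,x_3,x_4]$. They are not the Table II parameters, and the helper name `controllability_matrix` is hypothetical.

```python
import numpy as np


def controllability_matrix(Y: np.ndarray, Z: np.ndarray) -> np.ndarray:
    """Stack [Z, YZ, Y^2 Z, ..., Y^(n-1) Z] column-wise."""
    n = Y.shape[0]
    blocks = [Z]
    for _ in range(n - 1):
        blocks.append(Y @ blocks[-1])
    return np.hstack(blocks)


# Placeholder system matrices (assumed values, NOT from Table II).
Y = np.array([[0.0, 1.0, 0.0, 0.0],
              [20.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [-1.0, 0.0, 0.0, 0.0]])
Z = np.array([[0.0], [-2.0], [0.0], [1.0]])

Q = controllability_matrix(Y, Z)
rank = np.linalg.matrix_rank(Q)
print("rank =", rank, "controllable:", rank == Y.shape[0])
```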
3.2. Framework overview
Figure 3 shows the conceptual framework of the proposed semi-implicit cascaded PD controller design. The original fourth-order cartpole system is considered as two coupled second-order systems. This paper uses “subplant1” to denote the pole dynamics, namely $x_1,x_2$, and “subplant2” to denote the cart dynamics, namely $x_3,x_4$. A direct reading of the proposed method is to use one PD controller for each of the two subplants. It is expected that if both subplants can be stabilized separately, the overall system can be stable. Nonetheless, because of the coupling inside the model, a direct realization of this idea yields unsatisfactory performance. Besides, while the reference signal for the first PD controller, namely “PD1,” is given, the tracking target for the second PD controller, “PD2,” is not available and must be determined in some way. The proposed method solves these problems and makes a cascaded PD controller feasible in this situation. It does so by (1) transforming the subplant1 dynamics into an equivalent virtual PD controller, with the coupling term of subplant2 treated as a design variable, and (2) deriving the desired tracking target for subplant2 in a semi-implicit manner from the coupling term and the feedback linearization design of PD2. Finally, a cascaded PD controller can be implemented in which the coupling effect is both exploited and resolved.
The workflow of the proposed controller is as follows. Firstly, the reference signal $x_{1d}$ is fed into PD1, where the desired virtual torque for subplant1 is computed. The coupling term from subplant2 is treated as a design variable. In this way, the coupling term represents the desired tracking target for subplant2, $x_{4d}$, which makes the dynamics of subplant1 approximate a PD-controlled system and thus stabilizes subplant1. At this point, the control $u$ has not been calculated, so the target for subplant2 cannot be calculated explicitly and requires further information from subplant2. Focusing on subplant2 only, a PD controller with feedback linearization can be easily designed. Combining the controller expression for subplant2 with that from subplant1 gives an equation set, whose solution provides the expressions for $u$ and $x_{4d}$. The target $x_{4d}$ is then fed into PD2 to complete the control of subplant2. Notice that, when the equation set is solved by Gaussian elimination [Reference Higham25], $x_{4d}$ appears on both sides of the equations, which makes the process semi-implicit, resembling the semi-implicit Euler integration method [Reference Deng and Liu26]. Besides, PD1 is designed and used on top of PD2 during the control process, which forms a cascaded relationship.
3.3. Semi-implicit cascaded PD controller design for linear approximated model
This section illustrates the design process of the proposed method using the linearized version of dynamic model (5). The purpose of using a linearized simple model is to make the derivation process tractable, thus enabling a clearer presentation of the idea and rationale. The design process based on original nonlinear model (1) will be given in Section 3.5.
Based on Eq. (5), the dynamics for two subplants can be written explicitly as:
In Eq. (18), the coupling term from subplant2 is $Cx_4$ , which will be used as design variable. Borrowing ideas from a serial integrator under the control of a PD controller, if the following equation always holds,
then there must exist certain parameters $k_{p1}, k_{d1}$ that make subplant1 stable. Here, $k_{p1}, k_{d1}$ are the proportional and derivative gains of the PD1 controller, and $e_1=-x_1,e_2=-x_2$ are the angle and angular velocity errors of subplant1. With this assumption satisfied, Eq. (18) becomes a normal second-order system controlled by a PD controller as follows:
Noticing the coupling term $x_4$ can be controlled by a first-order system in Eq. (19), Eq. (20) is rewritten as
and that
where $x_{4d}$ defines the desired tracking target for subplant2. With the tracking target calculated, the attention shall be shifted to subplant2, where a PD2 controller with feedback linearization is designed as
where $k_{p2}, k_{d2}$ are the proportional and derivative gains of the PD2 controller, and $e_3,e_4$ are the position and velocity errors of subplant2, chosen as
in which $x_r$ is the user-defined constant target for the cart position. Notice that $x_{4d}$ is a velocity signal. In order to implement a PD controller, it must be converted to a position signal, hence the manipulation in Eq. (25), where $\Delta t$ represents the sampling time. With that done, the information of $x_{4d}$ is fully reflected in $e_3$, which enables the assignment in Eq. (26). In other words, $x_{4d}$ is used to construct discrete position targets, with the reference velocity of the cart being $0$. The subplant2 dynamics now become a second-order system controlled by the PD2 controller with a reference signal related to $x_1,x_2$:
Up to now, neither $u$ nor $x_{4d}$ has been expressed explicitly. Combining Eqs. (23) and (24) to eliminate $u$ gives the following:
Consequently, the expression for $x_{4d}$ is derived:
Similarly, the full expression for $u$ is available by substituting Eq. (29) into Eq. (24).
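The elimination step can be sketched symbolically with SymPy. Since the display equations are not reproduced here, the sketch assumes a generic linear structure: subplant1 acceleration $Ax_1+Bx_2+Cx_4+Du$, subplant2 acceleration $Ex_1+Fx_2+Gx_4+Hu$, $e_3 = x_r + x_{4d}\Delta t - x_3$ and $e_4=-x_4$. The placement of the letters $A$ to $H$ is an assumption made for illustration and may differ from the paper's Eq. (5).

```python
import sympy as sp

x1, x2, x3, x4, xr, dt = sp.symbols("x1 x2 x3 x4 x_r Delta_t")
A, B, C, D, E, F, G, H = sp.symbols("A B C D E F G H")  # assumed coefficient layout
kp1, kd1, kp2, kd2 = sp.symbols("k_p1 k_d1 k_p2 k_d2")
u, x4d = sp.symbols("u x_4d")

# Errors as described in the text (exact form of e3 is an assumption).
e1, e2 = -x1, -x2
e3 = xr + x4d * dt - x3
e4 = -x4

# Subplant1 behaves like a PD-controlled system when x4 tracks x4d.
eq_pd1 = sp.Eq(A * x1 + B * x2 + C * x4d + D * u, kp1 * e1 + kd1 * e2)
# PD2 with feedback linearization on subplant2.
eq_pd2 = sp.Eq(E * x1 + F * x2 + G * x4 + H * u, kp2 * e3 + kd2 * e4)

# Solve the coupled (semi-implicit) pair for u and x4d together.
sol = sp.solve([eq_pd1, eq_pd2], [u, x4d], dict=True)[0]
x4d_expr = sp.simplify(sol[x4d])
u_expr = sp.simplify(sol[u])

# Coefficient of x4d after eliminating u from the first equation.
u_from_pd2 = sp.solve(eq_pd2, u)[0]
residual = (eq_pd1.lhs - eq_pd1.rhs).subs(u, u_from_pd2)
p = sp.simplify(sp.diff(residual, x4d))
print(p)
```

Under these assumptions, the coefficient multiplying $x_{4d}$ after elimination is $C+\frac{D}{H}k_{p2}\Delta t$, which has the same form as the quantity $p$ used in the stability analysis.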
Remark 1 (Joint stability): If subplant1 and subplant2 can be stabilized separately, the whole system would be expected to be stable. Indeed, it can be proved that system (21) and (27) can be stabilized separately [Reference Zhao and Guo27] under natural assumptions. However, a joint analysis is still required to ensure stability of the fourth-order system, which will be detailed in Sections 3.4 and 3.6.
Remark 2 (Cascaded PD controller): The cascaded PD controller in this paper is different from the conventional one. Conventional cascaded PD controller works in adjacent order of the system, for example, one PD controller to assign desired velocity and the other PD controller to control the acceleration [Reference Andrade, Guedes, Carvalho, Zachi, Haddad, Almeida, de Melo and Pinto28]. In contrast, the PD1 controller in this paper serves as the acceleration controller for subplant1, as well as the calculator of the reference signal for subplant2. And PD2 controller is the acceleration controller for subplant2.
Remark 3 (Semi-implicitness): The proposed cascaded PD controller is semi-implicit on two levels. On the surface, $x_{4d}$ appears on both sides of the equations in Eq. (29), which means that an unknown term is used to calculate the same unknown term. This is similar to the manipulation in the semi-implicit Euler method for numerical integration. At a deeper level, the semi-implicitness arises because two sources of information are used to derive $x_{4d}$. The first is the transformation of subplant1 into a virtual PD controller, and the second is the PD controller design for subplant2. Both design processes finally point to the same target, $x_{4d}$.
Remark 4 (Transforming PI to PD): In Eq. (26), the authors transform the velocity target for $x_4$ into a position target for $x_3$ using the discrete approximation $x_{4d}\Delta t$, so as to implement a PD controller for subplant2. The more commonly used controller for a first-order system is the Proportional-Integral (PI) controller. Nevertheless, this handling is chosen in this paper for the consistency and simplicity of the stability analysis. The challenge in implementing a PI controller instead is how to formulate a unified stability analysis together with the PD controller.
3.4. Stability analysis using Jacobian matrix for linear approximated model
Following the controller design procedure of Section 3.3, this section presents the stability analysis method for the whole system. Although this is a linearized model, instead of using widely used transfer function, a Jacobian matrix-based method is implemented for wider applicability. The error vectors of the system are constructed, along with their dynamics. The Jacobian matrix is then calculated. By analysing the eigenvalues and eigenvectors of the Jacobian matrix, the stability of the system can be determined.
The error vector for the linear approximated model (5) is
Differentiating (30) and replacing all state variables using $e_i,i=1,2,3,4$ :
In the following, the term $\dot{e}_3$ is managed separately due to its complexity in calculation. Rewriting Eq. (29) using $e_i,i=1,2,3,4$ :
Differentiating $x_{4d}$ and substituting Eq. (31):
Integrating back to Eq. (31) yields the final expression of $\dot{e}_3$ :
Denote the vector field of Eq. (31) as $F_{\text{linear}}(e_1,e_2,e_3,e_4)$ , that is,
Then, the Jacobian matrix can be derived easily, denoting $p=C+\frac{D}{H}k_{p2}\Delta t$ :
Note that $[e_1,e_2,e_3,e_4]=[0,0,0,0]$ is a fixed point of the system (31). Under most conditions, the system is deemed locally asymptotically stable around the fixed point if all the eigenvalues of Eq. (36) have negative real parts, since each such eigenvalue corresponds to an exponentially decaying term in the time domain. However, the explicit solution for the eigenvalues of Eq. (36) is too complex, so the Jacobian is better used to check stability through numerical calculation.
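A numerical version of this eigenvalue check might look like the sketch below. The function `toy_error_dynamics` is a placeholder vector field (two decoupled PD-controlled double integrators), not the closed-loop error dynamics of the paper, and the helper names are hypothetical.

```python
import numpy as np


def numerical_jacobian(f, e0, h=1e-6):
    """Central-difference Jacobian of the vector field f at the point e0."""
    e0 = np.asarray(e0, dtype=float)
    n = e0.size
    J = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        J[:, j] = (f(e0 + step) - f(e0 - step)) / (2.0 * h)
    return J


def is_locally_stable(J):
    """True if every eigenvalue of J has a strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(J).real < 0.0))


def toy_error_dynamics(e):
    # Placeholder dynamics with illustrative gains (assumed, not from the paper).
    e1, e2, e3, e4 = e
    return np.array([e2, -4.0 * e1 - 2.0 * e2, e4, -1.0 * e3 - 1.5 * e4])


J = numerical_jacobian(toy_error_dynamics, np.zeros(4))
print(np.linalg.eigvals(J))
print("locally stable:", is_locally_stable(J))
```

Replacing `toy_error_dynamics` with the actual vector field and the chosen gains would give the check described in the text.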
3.5. Semi-implicit cascaded PD controller design for nonlinear model
Following the same design procedure as in Section 3.3, the proposed method is applied to the nonlinear model (1) in this section. The derivation for the nonlinear cartpole model is far more complicated than for its linear counterpart. Therefore, only the necessary derivation is completed by hand, for example, the derivatives of the error vectors, and the authors then use SymPy [Reference Meurer, Smith, Paprocki, Čertík, Kirpichev, Rocklin, Kumar, Ivanov, Moore, Singh, Rathnayake, Vig, Granger, Muller, Bonazzi, Gupta, Vats, Johansson, Pedregosa, Curry, Terrel, Roučka, Saboo, Fernando, Kulal, Cimrman and Scopatz29] to complete the final symbolic and numerical calculations.
Splitting Eq. (1) into two subplants as follows, denoting $p_2 = (M_1+M_2)(J+M_2l^2)-M_2^2l^2 \cos ^2(x_1)$ :
By transforming the dynamics of subplant1 into a virtual PD controller, and assigning the coupling term $x_4$ as the reference signal for subplant2, the following can be derived
where $k_{p1},k_{d1}$ are the proportional and derivative gains of PD1 controller, and $e_1=-x_1,e_2=-x_2$ are the angular and angular velocity errors of the pole dynamics. If this equation holds, subplant1 is equivalent to Eq. (21). Thus, the expression of $x_{4d}$ can be initially given:
Turning to subplant2, a PD2 controller with feedback linearization is designed as
where $e_3,e_4$ are the position and velocity errors of the cart, defined as
so that the original subplant2 becomes a second-order system manipulated by a PD controller as in Eq. (27). Substituting Eq. (41) into Eq. (40) and solving the semi-implicit equation of $x_{4d}$ , the closed-form expression becomes:
The explicit expression for $u$ can then be obtained by substituting Eq. (44), together with Eqs. (42) and (43), back into Eq. (41).
3.6. Stability analysis using Jacobian matrix for nonlinear model
This section provides the stability analysis for the system and controller designed in Section 3.5. Compared with the stability analysis of the linear model in Section 3.4, the derivation in this section is daunting and cannot reasonably be completed by hand. For example, calculating $\dot{x}_{4d}$ requires differentiating composite functions with trigonometric terms in both the numerators and the denominators. It is even more challenging to derive the partial derivatives of $\dot{x}_{4d}$ w.r.t. the error variables. Thus, we first derive all the components required for the final result and then pass everything to the PC to complete the final symbolic and numerical evaluations.
The error vector for the nonlinear model (1) is
Differentiating Eq. (45) and replacing all state variables using $e_i,i=1,2,3,4$ :
In the following, the term $\dot{e}_3$ is handled separately because of its computational complexity. Rewriting Eq. (44) using $e_i,i=1,2,3,4$ gives
where $A_2$ and $B_2$ are the numerator and denominator, respectively, expressed as below:
Firstly, the term $p_2 = (M_1+M_2)(J+M_2l^2)-M_2^2l^2 \cos ^2(x_1)$ should be processed. It is rewritten using the error variables and differentiated based on Eq. (45):
So the derivative of $A_2$ is
Noticeably, $\dot{x}_{4d}$ appears again in $\dot{A}_2$ , which means one more implicit equation to solve in order to calculate $\dot{x}_{4d}$ . Similarly, the derivative of $B_2$ is
Finally, the derivative of $x_{4d}$ can be formulated using the quotient rule:
Solving the above implicit equation yields the final expression:
Denote the vector field of Eq. (46) as $F_{\text{nonlinear}}(e_1,e_2,e_3,e_4)$
Then, the Jacobian matrix can be derived easily:
In Eq. (58), $\frac{\partial \dot{e}_i}{\partial e_j},i=2,3;j=1,2,3,4$ are the partial derivatives of the corresponding terms w.r.t. the error variables. These are too complicated to derive manually and are instead computed symbolically by the PC. Note that $[e_1,e_2,e_3,e_4]=[0,0,0,0]$ is a fixed point of the system (46). If all eigenvalues of Eq. (58) have negative real parts, then the system is locally asymptotically stable around the fixed point.
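The symbolic workflow can be illustrated on a much smaller stand-in problem. The sketch below uses a single inverted pendulum under PD control, with assumed dynamics $\ddot{x}_1 = (g/l)\sin x_1 + u$ and error $e_1=-x_1$, rather than the full cartpole vector field $F_{\text{nonlinear}}$. The model and all numerical values are illustrative assumptions.

```python
import numpy as np
import sympy as sp

e1, e2 = sp.symbols("e1 e2")
g, l, kp, kd = sp.symbols("g l k_p k_d", positive=True)

# Stand-in error dynamics with u = kp*e1 + kd*e2 (NOT the cartpole model):
#   e1' = e2
#   e2' = (g/l)*sin(e1) - kp*e1 - kd*e2
F = sp.Matrix([e2, (g / l) * sp.sin(e1) - kp * e1 - kd * e2])

# Symbolic Jacobian, then evaluation at the fixed point e = 0.
J_sym = F.jacobian(sp.Matrix([e1, e2]))
J_fixed = J_sym.subs({e1: 0, e2: 0})

# Substitute illustrative numbers and check the eigenvalues numerically.
values = {g: 9.81, l: 0.5, kp: 30.0, kd: 5.0}
J_num = np.array(J_fixed.subs(values).tolist(), dtype=float)
eigs = np.linalg.eigvals(J_num)
stable = bool(np.all(eigs.real < 0.0))
print(J_fixed)
print(eigs, stable)
```

For this stand-in, the fixed point is locally stable when $k_p > g/l$ and $k_d > 0$, which the numerical eigenvalues reflect for the chosen values.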
Remark 5 (Parameter selection): This paper does not provide an analytical range of parameters that stabilizes the system, which is a potential research direction. However, a critical advantage of the proposed controller is that its tuning process is as intuitive as that of a conventional PID controller. Moreover, based on the derived Jacobian matrix, any optimization method can be used to find an approximate range of stabilizing parameters by examining the eigenvalues of the resulting Jacobian matrix. In addition, to further understand the influence of each parameter on the system performance, an ablation study is carried out (around the chosen parameters) on both the baseline and the proposed controller. The observations are summarized in Table I. In the table, “+” means increased overshoot, increased oscillation or increased convergence rate as the corresponding parameter increases, and “−” means the opposite. The results (see Appendix) also show that the chosen parameters are Pareto optimal. That is, perturbing the current parameters yields no further simultaneous improvement in the performance of $x_1$ and $x_3$. This is also important to ensure that the baseline method has achieved its optimal performance.
Remark 6 (Stability analysis using eigenvalues): In control theory, the Jacobian matrix of the closed-loop system can be used to establish stability by checking that all eigenvalues have strictly negative real parts. Under this condition, the Jacobian matrix is called a stable matrix (or sometimes a Hurwitz matrix), and the system is asymptotically stable around the equilibrium point [Reference Khalil30].
4. Simulation
This section applies the proposed method to the cartpole system and reports the numerical simulation results as well as the stability analysis. Firstly, the necessary parameters for the dynamic model, the simulation environment and the cascaded PD controller are specified. Secondly, the simulation results are presented, followed by the stability analysis. The results for the linear approximated model are presented first, followed by those for the nonlinear model.
4.1. Parameters specification
Table II shows the parameters of the dynamic models (1) and (5). Table III lists the parameters of the simulation environment setup. The simulation is carried out in the OpenAI Gym [Reference Brockman, Cheung, Pettersson, Schneider, Schulman, Tang and Zaremba31] environment. Gym is a popular simulation platform written in Python, with both continuous and discrete environment setups. Table IV lists the parameters of the proposed method and the baseline. The initial states are set as $[x_1,x_2,x_3,x_4]=[0.5,0,0,0]$. The reference signal for the cartpole system is $[x_{1d},x_{2d},x_{d},x_{4r}]=[0,0,0.5,0]$, that is, the desired angle and angular velocity of the pole are both 0. Naturally, it is hoped that the final position of the cart is not too far from its initial location, which also means that the linear velocity of the cart should converge to 0 over time. Under these considerations, the cost of an episode is defined as
With this cost, many optimization algorithms can be used to find the optimal parameters for the cascaded PD controller. In this paper, Bayesian optimization is used to obtain initial parameters, which are then fine-tuned manually to determine the final values. Bayesian optimization is chosen because of its data efficiency [Reference Neumann-Brosig, Marco, Schwarzmann and Trimpe32]. By constructing a surrogate probabilistic model, it selects the next query location with a high probability of yielding a better result. However, one disadvantage is that it may only find a sub-optimal solution [Reference Solis and Thomas33]. Thus, Bayesian optimization helps to get close to the optimal parameters quickly, and the parameters are then fine-tuned manually for best performance.
4.2. Double-loop PD controller as baseline
The double-loop PD controller is a well-established method for the control of cartpole systems and is also one of the most fundamental controllers, having inspired many other methods [Reference Wang, Sun and Zai34]. The idea of the double-loop PD controller is to use one PD controller each for the stabilization of the pole and the cart. The final control input is a direct summation of the outputs of those two PD controllers. The formula is
where $e_1,e_2,e_3,e_4$ are the errors for $x_1,x_2,x_3,x_4$, respectively, and $\bar{k}_{p1}, \bar{k}_{d1}, \bar{k}_{p2}, \bar{k}_{d2}$ are the controller gains. Although this controller has proved effective in practice, it is purely heuristic and does not exploit the model information that the dynamical system provides.
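A minimal sketch of the baseline, assuming the control input is the plain sum $u = \bar{k}_{p1}e_1 + \bar{k}_{d1}e_2 + \bar{k}_{p2}e_3 + \bar{k}_{d2}e_4$ as described in the text. The sign conventions, the function name `double_loop_pd` and the numbers are assumptions for illustration and are not the Table IV gains.

```python
def double_loop_pd(errors, gains):
    """Sum of two PD terms: one acting on the pole errors, one on the cart errors."""
    e1, e2, e3, e4 = errors
    kp1, kd1, kp2, kd2 = gains
    pole_term = kp1 * e1 + kd1 * e2  # PD action on the pole angle error
    cart_term = kp2 * e3 + kd2 * e4  # PD action on the cart position error
    return pole_term + cart_term


# Illustrative errors and gains only.
u = double_loop_pd(errors=(0.1, 0.0, -0.5, 0.2), gains=(10.0, 1.0, 2.0, 0.5))
print(u)
```

Because the two terms are simply added, a large cart error can pull the input away from what the pole loop requests, which is the competition between loops discussed in the results.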
4.3. Results and evaluation
4.3.1. Results of linear approximated model
The results of the linear approximated model are depicted in Figs. 4–8. Figure 4 shows the angle of the pole. Compared with the baseline, the proposed method produces oscillations of slightly higher magnitude and frequency, but with a shorter settling time. Figure 5 is consistent with Fig. 4, showing the corresponding oscillation at the velocity level. Figure 6 depicts the position of the cart, where the baseline method exhibits a very slow asymptotic convergence. Figure 7 shows the outputs of $x_4$. Lastly, Fig. 8 presents how the control input changes over the episode. It also oscillates at first and gradually converges to 0. A severe overshoot is observed for the baseline controller at the very beginning, but the oscillation is alleviated afterwards.
Next, the Jacobian matrix of this linear system is to be investigated to understand some phenomena that happened in the simulation results. Substituting the controller parameters into Eq. (36), the Jacobian matrix can be calculated out:
and the eigenvalues with corresponding eigenvectors are
In Eqs. (62) and (63), $\mathrm{j}$ is the imaginary unit and $T$ denotes the transpose of a vector. All the eigenvalues have negative real parts, which ensures local stability near the equilibrium point.
Remark 7 (Global stability): The system is at least locally asymptotically stable but is not guaranteed to be globally stable. The attempt to elevate the stability conclusion to a global one is intuitive and has its background in Markus-Yamabe’s theorem [Reference Feßler35]. However, Markus-Yamabe’s theorem holds only for second-order systems, and many counterexamples have been discovered for higher-order systems [Reference Kuznetsov, Kuznetsova, Koznov, Mokaev and Andrievsky36]. That said, in the simulation, the system can be stabilized from any initial state, as long as the pole starts within the upper half plane.
4.3.2. Results of nonlinear model
The results of the original nonlinear model are depicted in Figs. 9–20, which are similar to the linear case. Figure 9 is the output of the angle of the pole. It converges to 0 soon after some oscillation of decaying magnitude. Figure 10 shows the profile of the angular velocity of the pole, which shares a similar pattern with Fig. 9. In comparison, the oscillation magnitude of the baseline controller is similar to that of the proposed method, but with a lower frequency and therefore a slower convergence rate. Figure 11 depicts the position of the cart. The proposed method converges much faster than the baseline controller without compromising the convergence performance of $x_1$ , while the baseline controller shows a very slow asymptotic convergence of $x_3$ . Figure 12 shows the output of $x_4$ . Lastly, Fig. 13 presents how the torque input evolves over the episode. It also oscillates at first and gradually converges to 0. A large overshoot is produced by the baseline controller.
Next, the Jacobian matrix of this nonlinear system is investigated for local stability analysis. Substituting the controller parameters into Eq. (58), the Jacobian matrix near the fixed point $[e_1,e_2,e_3,e_4]=[0,0,0,0]$ can be calculated. The results are exactly the same as Eqs. (36)–(63). This also verifies the derivation process, since at the equilibrium point the linear model should be equivalent to the nonlinear model.
4.3.3. Performance indices overview
To conclude the results section, a performance overview of both the linear and nonlinear models under the two controllers is shown in Table V. $t_1,t_2$ are the convergence times of the pole and the cart, respectively. $\text{MAE}_1, \text{MAE}_2$ are the mean absolute errors of the pole and the cart, respectively. “energy” means the mean square sum of the control input, which represents the energy consumption of the controller. The proposed method outperforms the double-loop PD controller in terms of the convergence rates and tracking errors of $x_1,x_3$ . However, this superiority comes at the cost of increased control effort and slightly more severe oscillation. The advantages of the proposed controller originate from the exploitation of the internal dynamics of the model through a semi-implicit process, from which a system-level consistent intermediate target is derived. For the double-loop PD controller, in contrast, the control efforts required by the cart and the pole compete with each other, resulting in a compromise between the two that limits the overall performance.
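The three kinds of indices described above can be computed from sampled trajectories as in the following minimal sketch. It is an illustration under stated assumptions rather than the authors' evaluation script: the function names are hypothetical, and the tolerance band used to define convergence time (`tol`) is an assumption, since the threshold used for Table V is not stated here.

```python
def mean_absolute_error(errors):
    """Mean of |e| over all samples of one error signal."""
    return sum(abs(e) for e in errors) / len(errors)


def control_energy(inputs):
    """Mean of u^2 over all samples of the control input."""
    return sum(u * u for u in inputs) / len(inputs)


def convergence_time(times, errors, tol):
    """Earliest sample time after which |e| stays within tol (assumed
    definition). Returns None if the last sample is still outside the band."""
    last_violation = -1
    for i, e in enumerate(errors):
        if abs(e) > tol:
            last_violation = i
    if last_violation == len(errors) - 1:
        return None
    return times[last_violation + 1]


# Hypothetical samples, for illustration only.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
pole_error = [1.0, -0.5, 0.2, 0.01, -0.01]
control = [2.0, -2.0, 0.0, 0.0]

t1 = convergence_time(times, pole_error, tol=0.05)
mae1 = mean_absolute_error(pole_error)
energy = control_energy(control)
```

Applying the same three functions to the cart error and to each controller's input signal would yield one row of a table with the layout of Table V.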
4.3.4. Robust performance
To illustrate the robustness of the proposed controller, this subsection presents the simulation results under both Coulomb friction and random noise. Coulomb friction is an approximation of dry friction in practice, including both static friction and kinetic friction, with different coefficients. According to Coulomb’s law of friction, the magnitude of the friction between two dry sliding surfaces is independent of the magnitude of the relative velocity, while its direction is opposite to the relative velocity. Therefore, Coulomb friction is a highly nonlinear type of disturbance [Reference Lötstedt37]. Accordingly, the cartpole system dynamics with disturbance is
where $f_{\text{cart}}$ is the friction acting on the cart because of rolling. This force counteracts the control input $u$ directly, and therefore $f_{\text{cart}}$ is subtracted from $u$ . $f_{\text{pole}}$ is the friction acting on the revolute joint that connects the cart and the pole. To convert it into angular acceleration, it is multiplied by the radius of the joint $R_{\text{joint}}=0.01\,\text{m}$ and then divided by the inertia $J$ . $-1 \leq d_1,d_2 \leq 1$ are bounded random total disturbances added to the accelerations. The force analysis is plotted in Fig. 14.
According to Coulomb friction theory, the friction is proportional to the normal force and cannot reverse the relative motion between two surfaces. Firstly, $f_{\text{cart}}$ is considered. The contacting surfaces are the wheels and the ground. This is a rolling motion, so the friction coefficients are chosen slightly smaller than those of sliding friction. The static friction coefficient is $c_{\text{cart_static}}=0.2$ , and the kinetic friction coefficient is $c_{\text{cart_kine}}=0.05$ . The normal force is affected by both gravity and the lifting force generated by the centrifugal force of the pole, but it cannot be negative. Accordingly, the normal force of the cart is:
On the other hand, $f_{\text{cart}}$ cannot reverse the influence of $u$ , which means that if $u-f_{\text{cart}}$ has a different sign from $u$ , then $f_{\text{cart}}=u$ . Therefore, the full expression of $f_{\text{cart}}$ is:
The friction $f_{\text{pole}}$ is modelled as follows. The normal force of the pole is the vector sum of the centrifugal force and the force generated by the cart acceleration. Therefore, the normal force should be expressed as:
Similarly, the full expression of $f_{\text{pole}}$ is:
where the static friction coefficient $c_{\text{pole_static}}=0.5$ , and the kinetic one $c_{\text{pole_kine}}=0.3$ .
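The clipping logic described for $f_{\text{cart}}$ can be sketched as follows. This is a hedged illustration of the verbal rules only, not a transcription of the paper's friction equations: the function name `cart_friction`, the velocity threshold `v_eps` used to separate the static and kinetic regimes, and the choice to pass the normal force in as an argument are all assumptions. The coefficients 0.2 and 0.05 are the cart values quoted in the text.

```python
import math


def cart_friction(u, velocity, normal_force,
                  c_static=0.2, c_kinetic=0.05, v_eps=1e-6):
    """Coulomb-type friction acting against the control input u.

    Sign convention (assumed): the cart is driven by u - f, so a positive f
    opposes motion in the positive direction.
    """
    # The normal force cannot be negative.
    normal_force = max(normal_force, 0.0)

    if abs(velocity) < v_eps:
        # Static regime (assumed threshold): friction opposes the applied force.
        f = math.copysign(c_static * normal_force, u) if u != 0.0 else 0.0
    else:
        # Kinetic regime: magnitude independent of speed, direction opposes motion.
        f = math.copysign(c_kinetic * normal_force, velocity)

    # Friction cannot reverse the influence of u: if u - f and u have
    # different signs, the friction is clipped to f = u.
    if (u - f) * u < 0.0:
        f = u
    return f


# Hypothetical operating points, for illustration only.
f_stuck = cart_friction(u=1.0, velocity=0.0, normal_force=10.0)
f_breakaway = cart_friction(u=5.0, velocity=0.0, normal_force=10.0)
f_sliding = cart_friction(u=5.0, velocity=1.0, normal_force=10.0)
```

In the first call the available static friction exceeds the applied force, so the friction is clipped to $u$ and the net force on the cart is zero; in the other two calls the friction saturates at the static and kinetic limits, respectively. The same structure would apply to $f_{\text{pole}}$ with its own coefficients and normal force.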
Figures 15–20 are the comparative results of the proposed controller and double-loop PD controller under added disturbance. In comparison with previous sections without disturbance, the results here are similar, only with some oscillation and chattering near the equilibrium point. This is due to the existence of friction and random noise, which slightly impairs the control performance. Nonetheless, the system is still stable under both controllers. Besides, Fig. 20 illustrates the Coulomb friction profile, which features abrupt change, nonlinearity and clipping as the theory suggests.
5. Conclusion
A control method for underactuated cartpole systems based on a cascaded PD controller is proposed in this article. The gist is to transform the pole dynamics into a virtual PD controller, with the coupling term exploited as the design variable. The desired value of the coupling term $x_{4d}$ is then fed into the cart dynamics for the realization of a second PD controller. The expressions of the control input as well as $x_{4d}$ are derived by solving a semi-implicit equation. This method retains the benefits of the conventional PID controller (i.e., it is very simple to design and relatively intuitive to understand) and can be carried out on the original state-space equations without coordinate transformation, thereby avoiding the assumptions that such a transformation entails. Besides, in contrast to much other PID controller research, a stability analysis method for the fourth-order cascaded PD controller is proposed using the Jacobian matrix of the residual system, although it only concludes local asymptotic stability for this system and has some drawbacks. The simulation results illustrate the advantages of the proposed method over the widely used double-loop PD controller in terms of stabilizing the cart and the pole simultaneously. In addition, the robustness against Coulomb friction and random noise is verified through simulation. The superiority is derived from the exploitation of the internal dynamical structure of the system through solving a semi-implicit equation.
Since this is a preliminary study of a control method for underactuated cartpole systems using a cascaded PD controller, several problems remain to be solved. Firstly, a stability analysis approach is required that can reach the conclusion of global stability. For example, a Lyapunov-based stability theorem may be a good alternative to the Jacobian matrix-based method in this article. That said, in the numerical simulation, the cartpole system can be stabilized from a wide range of initial states. Notably, for some systems, the Jacobian matrix-based analysis can actually conclude global stability using the relevant theorem proposed by Markus and Yamabe [Reference Markus and Yamabe38] for high-dimensional systems. Moving one step forward, how to ensure that all eigenvalues of a high-dimensional (>2) Jacobian matrix have negative real parts everywhere is an open question. A closed-form calculation is infeasible for a complicated matrix such as the one in Eq. (58). Secondly, although this paper targets the cartpole system only, the authors envision that the proposed method can be applied to other kinds of underactuated systems and extended to a class of underactuated systems. Last but not least, a systematic and theoretical way of parameter selection should be investigated. The tuning method in this article is still a combination of Bayesian optimization and trial and error. To achieve this, a more capable method for stability proof should be employed, for example, the Lyapunov stability theorem.
Appendix A: Ablation study of semi-implicit cascaded PD controller
This appendix presents the ablation study of the proposed method, where the parameters of the controller are perturbed one by one and the outputs of $x_1,x_3$ are plotted in order to see the influence of each parameter. The presented results not only reflect the process of manual tuning but also indicate that the chosen parameters are nearly Pareto optimal with respect to the convergence of $x_1,x_3$ : perturbing a parameter improves the performance of one of $x_1,x_3$ only at the expense of the other. In all figures, the green line is the closest to the actual performance and lies in the middle of the perturbation bounds.
Figures 21 and 22 show the $x_1,x_3$ outputs, respectively, under the perturbation of $k_{p1}$ . When $k_{p1}$ increases, the convergence of $x_1$ is accelerated, and its oscillation is suppressed. However, the convergence rate of $x_3$ is decreased. Figures 23 and 24 show the $x_1,x_3$ outputs, respectively, under the perturbation of $k_{p2}$ , for which the same discussion as for $k_{p1}$ applies.
Figures 25 and 26 show the $x_1,x_3$ outputs, respectively, under the perturbation of $k_{d1}$ . When $k_{d1}$ increases, the convergence of $x_1$ slows down, with smaller oscillation. Meanwhile, the convergence of $x_3$ also deteriorates. Note that excessively high-frequency oscillation is undesirable, and the chosen parameter strikes a balance that leans towards convergence performance. Figures 27 and 28 show the $x_1,x_3$ outputs, respectively, under the perturbation of $k_{d2}$ , for which the discussion is similar to that of $k_{d1}$ .
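The notion of Pareto optimality used in this appendix can be made concrete with a small dominance check. The sketch below is illustrative only: the function names are hypothetical, and the convergence times in `candidates` are made-up numbers rather than values read from Figs. 21–28. Each candidate is scored by the pair (convergence time of $x_1$, convergence time of $x_3$), where lower is better for both.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better
    in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(candidates):
    """Return the candidates that are not dominated by any other candidate."""
    return {
        name: score
        for name, score in candidates.items()
        if not any(dominates(other, score) for other in candidates.values())
    }


# Hypothetical (t_x1, t_x3) pairs in seconds, for illustration only.
candidates = {
    "nominal": (2.0, 3.0),
    "kp1_increased": (1.5, 3.6),   # faster pole, slower cart
    "kp1_decreased": (2.6, 2.7),   # slower pole, faster cart
    "poorly_tuned": (2.5, 3.5),    # worse than nominal in both objectives
}

front = pareto_front(candidates)
```

With these made-up numbers, the nominal setting and both perturbed settings remain on the front because each improves one objective at the expense of the other, whereas the last setting is dominated by the nominal one. A parameter set is (nearly) Pareto optimal in this sense when no perturbation dominates it.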
Appendix B: Ablation study of double-loop PD controller
This appendix presents the ablation study of the baseline method, where the parameters of the controller are perturbed one by one and the outputs of $x_1,x_3$ are plotted in order to see the influence of each parameter. The presented results not only reflect the process of manual tuning but also indicate that the chosen parameters are nearly Pareto optimal with respect to the convergence of $x_1,x_3$ : perturbing a parameter improves the performance of one of $x_1,x_3$ only at the expense of the other.
Figures 29 and 30 show the $x_1,x_3$ outputs, respectively, under the perturbation of $k_{p1}$ . When $k_{p1}$ increases, the convergence of $x_1$ is accelerated, and its oscillation is suppressed. However, the convergence rate of $x_3$ is slightly decreased. Figures 31 and 32 show the $x_1,x_3$ outputs, respectively, under the perturbation of $k_{p2}$ , which has the opposite effect to $k_{p1}$ . When $k_{p2}$ increases, the convergence of $x_3$ improves at the cost of worse performance of $x_1$ .
Figures 33 and 34 show the $x_1,x_3$ outputs, respectively, under the perturbation of $k_{d1}$ . A decrease of $k_{d1}$ leads to more precise tracking of $x_1$ . However, it takes longer for $x_3$ to reach the reference location. Figures 35 and 36 show the $x_1,x_3$ outputs, respectively, under the perturbation of $k_{d2}$ , which illustrates a trade-off between overshoot and settling time for $x_1$ and $x_3$ . The selected parameters achieve an intermediate performance.