1. Introduction
The applications of human–robot interaction (HRI) range from material assembly [Reference Rovedal, Pallucca, Pedrocchi, Brafhin and Tosatti1] and rehabilitation [Reference Zhang, Mcdaid, Veale, Peng and Xie2] to elderly care [Reference Murata, Yamazoe, Chung and Lee3]. Because the environment is variable, the robot needs to adapt to uncertainty automatically, and improving the flexibility of the robot’s interaction with the environment remains a challenging problem.
In the field of HRI control, impedance control [Reference Sharifi, Behzadipour and Vossoighi4] and admittance control [Reference Dohring and Newman5] are two common methods that adjust the robot impedance parameters. In the 1980s, the concept of impedance control was proposed by Hogan [Reference Hogan6]. In this method, a damping-spring-mass model is used to represent the dynamic relationship between the environment and the robot position, and the force command of the robot is obtained from the measured robot position with appropriate impedance parameters. Admittance control realizes compliant control by measuring the interaction force and generating the desired trajectory. In early research, the model parameters of the two methods were usually prescribed [Reference Colbaugh and N. Hogan7–Reference Buerger and Hogan9]. In ref. [Reference Newman8], a guaranteed-stable interaction controller was derived for arbitrary linear, time-invariant systems. In ref. [Reference Buerger and Hogan9], a loop-shaping method was developed to design actuator controllers for physically interactive machines. However, inappropriate model parameters can cause large errors. For the interaction between the robot and the human/environment, both the varying environmental factors and the robot model parameters need to be considered, which makes it difficult to obtain appropriate parameters.
To solve the above problems, it is necessary to introduce variable impedance methods into the field of robot control. The development of reinforcement learning provides the conditions for adaptive dynamic programming (ADP) [Reference Vrabie and Lewis10–Reference Roveda, Maskani, Franceschi, Arash and Pedrocchi12], which is broadly applied to optimal control problems with uncertain dynamics. An adaptive optimization algorithm was proposed for continuous-time systems based on a partially known dynamic model [Reference Vrabie and Lewis10]. An improved algorithm [Reference Jiang and Jiang11] was developed for systems with completely unknown dynamics. A model-based reinforcement learning variable impedance controller was proposed, in which model predictive control (MPC) was applied to optimize the impedance control parameters online [Reference Roveda, Maskani, Franceschi, Arash and Pedrocchi12]. However, state constraints and system stability analysis were not considered. For systems whose states are only partially measurable, output feedback algorithms are an effective way to estimate the state information [Reference Rizvi and Lin13–Reference Liu, Ge, Zhao and Mei15]. Rizvi established a state observer to estimate the state of the system [Reference Rizvi and Lin13]. Gao et al. reconstructed the state based on a discrete state-space model [Reference Gao, Jiang, Jiang and Chai14]. An output feedback (OPFB) ADP algorithm was adopted to obtain the optimal solution while accommodating an arbitrary environment position/trajectory [Reference Liu, Ge, Zhao and Mei15]. The OPFB ADP algorithm is an effective method to determine the optimal impedance parameters of the human–robot system under unknown environment dynamics.
In HRI tasks, humans collaborate directly with robots, so safety is vital in HRI systems. However, in admittance control the desired trajectory guided by the interaction force may exceed the regular working space. In the past decades, a large body of work has addressed safety control [Reference Al-Tamimi, Lewis and Abu-Khalaf16–Reference Vicentini, Pedrocchi, Beschi, Giussani and Fogliazza20]. An integrated HRI strategy was proposed to ensure safety through a coordinated suite of safety strategies [Reference Al-Tamimi, Lewis and Abu-Khalaf16]. A cooperative fuzzy-impedance control with embedded velocity and force rules was proposed to enhance safety [Reference Roveda, Haghshenas, Caimmi, Pedrocchi and Tosatti17]. In order to improve the safety performance, an actuation mechanism was built in ref. [Reference Ahmed and Kalaykov18]. A task-space admittance controller was proposed in which the inertia matrix was conditioned online [Reference Fonseca, Adorno and Fraisse19]. Human–robot collaboration and control were addressed in ref. [Reference Vicentini, Pedrocchi, Beschi, Giussani and Fogliazza20], including safety assessment, advanced force control, human-aware motion planning, gesture recognition, and task scheduling. The concept of the barrier Lyapunov function (BLF) was put forward to guarantee that states stay within certain bounds [Reference Ngo, Mahony and Jiang21]. The BLF method has been widely applied in the control field [Reference Ames, Coogan, Egerstedt, Notomista, Sreenath and Tabuada22–Reference He, Chen and Yin24]; it was employed to prevent violation of the output constraints in ref. [Reference Zhang, Xiao and Ge23] and was used in the design of an adaptive impedance control [Reference He, Chen and Yin24]. Owing to its ability to fit arbitrary nonlinear curves [Reference He, Chen and Yin24–Reference He and Dong26], the neural network (NN) is considered an effective tool in control systems and requires little prior information about the system dynamics [Reference Yu, Xie, Paszczynski and Wilamowski25]. BLF-based controllers with NN approaches have been studied by many researchers to ensure the safety and stability of robots [Reference He and Dong26–Reference Liu, Lu, C. Tong, Chen, Chen and Li28].
In order to ensure the high accuracy of the impedance model while guaranteeing the trajectory tracking performance, a double-loop control system was proposed [Reference Li, Liu, Huang, Peng, Pu and Ding29, Reference Modares, Ranatunga, Lewis and Popa30]. This control method decouples the robot-specific inner-loop controller from the task-specific outer-loop control, which considers the safety of tracking and the environment model. Inspired by this double-loop system, a new two-loop control structure based on force feedback is proposed in this paper. Different from ref. [Reference Modares, Ranatunga, Lewis and Popa30], the environment and robot impedance models are integrated through linear quadratic regulation (LQR). Considering the partially unmeasurable states, an output feedback algorithm is presented to estimate the optimal control policy of the LQR problem in the outer loop. By constructing an asymmetric barrier Lyapunov function (ABLF) controller in the inner loop, the stability of the system is ensured under flexible constraints in the HRI system. The NN designed in the adaptive controller compensates for the dynamic uncertainties. Simulations and experimental results validate the effectiveness of the inner and outer loops. The primary contributions of this paper are summarized as follows:
1) A two-loop control structure is designed to assist human operators in performing cooperative tasks with the robot and to optimize the tracking performance. The inner and outer loops are decoupled.
2) Considering the unknown dynamic and position parameters when interacting with the environment, an optimal impedance adaptation method is used to obtain the optimal control law in the outer loop. The discrete output feedback algorithm realizes optimal control of the complete robot–environment interaction system, and a discount factor is added to the algorithm to improve the convergence speed.
3) The robot is controlled in the inner loop to track the desired trajectory produced by the admittance control. An ABLF is proposed in the design of the inner-loop controller to restrict the motion space and the tracking errors. Compared to symmetrical controllers [Reference Liu, Lu, C. Tong, Chen, Chen and Li28], this method is able to adjust the upper and lower bounds to adapt to different task requirements. Meanwhile, a radial basis function neural network (RBFNN) is designed to compensate for the unknown dynamics of the robot, guaranteeing the precision of the admittance model.
The rest of this paper is organized as follows. In Section 2, the HRI system is described. In Section 3, the overall system structure with outer and inner loops is proposed, and the two loops are described in turn. In Section 4, simulations of the inner and outer loops are given to verify the proposed methods. In Section 5, experimental results of a practical task conducted on a Franka robot are shown. The conclusion is presented in Section 6.
2. System description
The HRI system includes a rigid robot manipulator and a human, where the end-effector of the robot manipulator interacts with the human physically. The dynamics of the robot manipulator are described by
where
Here $\boldsymbol{{q}}$ , $\dot{{\boldsymbol{{q}}}}$ , $\ddot{{\boldsymbol{{q}}}}\in \mathbb{R}^n$ are joint angles, velocities, and accelerations, respectively. $\boldsymbol{{x}}$ , $\dot{{\boldsymbol{{x}}}}$ , $\ddot{{\boldsymbol{{x}}}}\in \mathbb{R}^m$ are the Cartesian positions, velocities, and accelerations of robot end effector. ${\boldsymbol{{M}}}({\boldsymbol{{q}}})\in \mathbb{R}^{n \times n}$ is the inertia matrix, ${\boldsymbol{{C}}}({\boldsymbol{{q}}}, \dot{{\boldsymbol{{q}}}})\in \mathbb{R}^{n \times n}$ is the Coriolis and centripetal coupling matrix, ${\boldsymbol{{G}}}({\boldsymbol{{q}}})\in \mathbb{R}^n$ is the gravity loading, ${\boldsymbol{{J}}}({\boldsymbol{{q}}})\in \mathbb{R}^{n \times n}$ is the Jacobian matrix. $\boldsymbol{\tau }$ is the vector of control input, ${\boldsymbol{{F}}}\in \mathbb{R}^n$ denotes the interaction force with the environment. $\varphi ({\cdot})$ represents the inverse kinematics. $n$ denotes the number of joints, and $m$ denotes the degree of freedom (DOF).
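For readability, a standard rigid-manipulator form consistent with the symbols defined above is sketched below; it is given only as a reference for the notation, and the exact equations follow the original formulation.

$$\boldsymbol{M}(\boldsymbol{q})\ddot{\boldsymbol{q}}+\boldsymbol{C}(\boldsymbol{q},\dot{\boldsymbol{q}})\dot{\boldsymbol{q}}+\boldsymbol{G}(\boldsymbol{q})=\boldsymbol{\tau }+\boldsymbol{J}^{\mathrm{T}}(\boldsymbol{q})\boldsymbol{F},\qquad \boldsymbol{q}=\varphi (\boldsymbol{x}),\qquad \dot{\boldsymbol{x}}=\boldsymbol{J}(\boldsymbol{q})\dot{\boldsymbol{q}}$$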
The robot dynamic model has the following properties:

1. The matrix ${\boldsymbol{{M}}}({\boldsymbol{{q}}})$ is symmetric and positive definite.
2. The matrix $2{\boldsymbol{{C}}}({\boldsymbol{{q}}},\dot{{\boldsymbol{{q}}}})-\dot{{\boldsymbol{{M}}}}({\boldsymbol{{q}}})$ is skew-symmetric.
From the perspective of the robot manipulator, the following impedance model is presented for the safety of HRI.
where ${\boldsymbol{{M}}_{\boldsymbol{{d}}}}\in \mathbb{R}^{n\times n}$ , ${\boldsymbol{{C}}_{\boldsymbol{{d}}}}\in \mathbb{R}^{n\times n}$ , ${\boldsymbol{{G}}_{\boldsymbol{{d}}}}\in \mathbb{R}^{n\times n}$ are the desired inertia, damping, and stiffness matrices, respectively, and $\boldsymbol{{x}}_{\boldsymbol{{d}}}$ denotes the desired trajectory of robot manipulator.
Remark 1: $\boldsymbol{{M}}_{\boldsymbol{{d}}}$ , $\boldsymbol{{C}}_{\boldsymbol{{d}}}$ , and $\boldsymbol{{G}}_{\boldsymbol{{d}}}$ are defined under different interaction behavior requirements.
To discuss the interaction behavior, it is necessary to take the environment dynamics into consideration. Two forms of environment models are presented.
where $\boldsymbol{{M}}_{\boldsymbol{{e}}}$, $\boldsymbol{{C}}_{\boldsymbol{{e}}}$, $\boldsymbol{{G}}_{\boldsymbol{{e}}}$ are the unknown mass, damping, and stiffness of the environment model, respectively. Eq. (6) represents a mass-damping-stiffness system and (7) represents a damping-stiffness system; the choice between them depends on how much the inertia of the interactive environment contributes to its dynamics. For the sake of generality, we take the mass into consideration in the robot–environment model. Thus, from (5) and (6), we have
3. Adaptive optimal control with discrete output feedback
3.1. Overall structure
The overall system structure, shown in Fig. 1, consists of an outer loop and an inner loop. The outer loop generates the virtual desired trajectory $\boldsymbol{{x}}_{\boldsymbol{{r}}}$ according to the interaction force $\boldsymbol{{f}}$ and the impedance model ${\boldsymbol{{Z}}}({\cdot})$. The inner loop with an ABLF controller is designed to track the virtual desired trajectory precisely. The main purpose of the controller is to make the robot comply with the human’s objective while guaranteeing safety and tracking accuracy.
In general, the desired impedance model in the Cartesian space is written as
where $\boldsymbol{{x}}_{\boldsymbol{{d}}}$ is the desired trajectory and $\boldsymbol{{x}}_{\boldsymbol{{r}}}$ is the reference trajectory in Cartesian space. The virtual desired reference trajectory $\boldsymbol{{q}}_{\boldsymbol{{r}}}$ is then obtained by inverse kinematics. The OPFB ADP method is applied in the outer loop to determine the optimized impedance parameters of the unknown robot–environment dynamics model.
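As an illustration of how the outer loop can realize the desired impedance model above in practice, the following minimal sketch (Python, with hypothetical names; $\boldsymbol{{M}}_{\boldsymbol{{d}}}$, $\boldsymbol{{C}}_{\boldsymbol{{d}}}$, $\boldsymbol{{G}}_{\boldsymbol{{d}}}$ given, simple Euler integration and sign convention assumed) integrates the impedance model driven by the measured force to produce the reference trajectory:

```python
import numpy as np

def admittance_step(x_r, dx_r, x_d, dx_d, ddx_d, f, M_d, C_d, G_d, h=1e-3):
    """One Euler step of an impedance model of the form
    M_d*(ddx_r - ddx_d) + C_d*(dx_r - dx_d) + G_d*(x_r - x_d) = f
    (sign convention assumed). Returns the updated reference x_r and dx_r."""
    ddx_r = ddx_d + np.linalg.solve(
        M_d, f - C_d @ (dx_r - dx_d) - G_d @ (x_r - x_d))
    dx_r = dx_r + h * ddx_r
    x_r = x_r + h * dx_r
    return x_r, dx_r
```

The resulting $\boldsymbol{{x}}_{\boldsymbol{{r}}}$ is converted to $\boldsymbol{{q}}_{\boldsymbol{{r}}}$ by inverse kinematics and passed to the inner loop.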
The main purpose of the inner loop is to guarantee the tracking performance of the trajectory $\boldsymbol{{q}}_{\boldsymbol{{r}}}$. For the safety of the robot arm, an ABLF-based controller is designed to constrain the position and velocity of the robot. Then, the RBFNN is employed to fit the unknown robot dynamics in the constrained controller.
3.2. OPFB ADP method in outer loop
To optimize the robot–environment interaction, an impedance adaptation method that accounts for the environment dynamics is developed. In this paper, the robot and environment system are considered together to achieve the desired interaction performance by minimizing the following cost function:
where $\boldsymbol{{Q}}_{\textbf{1}}$, $\boldsymbol{{Q}}_{\textbf{2}}$ are positive definite, describing the weights of the velocity and tracking errors, respectively, and $\boldsymbol{{R}}$ is the weight of the interaction force/torque. The state variable is defined as
where $\boldsymbol{{s}}_{\textbf{1}}$ and $\boldsymbol{{s}}_{\textbf{2}}$ are two states constructed in the model, which can be written as
where $\boldsymbol{{F}}_{\textbf{1}}$ and $\boldsymbol{{G}}_{\textbf{1}}$ are known matrices, and $\boldsymbol{{F}}_{\textbf{2}}$ and $\boldsymbol{{G}}_{\textbf{2}}$ are unknown matrices. The two linear systems in (12) are utilized to generate the desired trajectory $\boldsymbol{{x}}_{\boldsymbol{{d}}}$ and the varying position of the contact trajectory $\boldsymbol{{x}}_{\boldsymbol{{e}}}$, respectively.
According to (12), the cost function (10) can be rewritten as
where ${{{\boldsymbol{{Q}}}}}=\left [ \begin{array}{c@{\quad}c}{\boldsymbol{{Q}}_{\textbf{1}}} &{\textbf{0}} \\{\textbf{0}} &{{\boldsymbol{{Q}}}_{\boldsymbol{{2}}}^{\prime}} \end{array} \right ]$ with ${{\boldsymbol{{Q}}}_{\boldsymbol{{2}}}^{\prime}}=\left [ \begin{array}{c@{\quad}c}{\boldsymbol{{Q}}_{\textbf{2}}} & -{\boldsymbol{{Q}}_{\textbf{2}}}{{\boldsymbol{{G}}_{\textbf{1}}}} \\ -{\boldsymbol{{G}}_{\textbf{1}}}^{T}{\boldsymbol{{Q}}_{\textbf{2}}} &{\boldsymbol{{G}}_{\textbf{1}}}^{T}{\boldsymbol{{Q}}_{\textbf{2}}}{\boldsymbol{{G}}_{\textbf{1}}} \end{array}\right ]$ .
For the convenience of calculation, a discrete output feedback method is introduced to solve the adaptive dynamic programming problem. First, the continuous-time state-space function can be written as
where ${\boldsymbol{\xi }}\in \mathbb{R}^{n}$ is the state, ${\boldsymbol{{y}}}\in \mathbb{R}^p$ is the measured output, and ${\boldsymbol{{u}}}\in \mathbb{R}^m$ is the control input. In the real application, $\boldsymbol{{u}}$ represents the interaction force $\boldsymbol{{f}}$. Assume that $({{{\boldsymbol{{A}}}}},{{{\boldsymbol{{B}}}}})$ is controllable and $({{{\boldsymbol{{A}}}}},{\boldsymbol{{C}}})$ is observable. The matrices ${{{\boldsymbol{{A}}}}}\in \mathbb{R}^{n\times n}$ and ${{{\boldsymbol{{B}}}}}\in \mathbb{R}^{n\times m}$ include the unknown environment dynamics due to the unmeasurable state $\boldsymbol{{s}}_{\textbf{2}}$, where
The output matrix $\boldsymbol{{C}}$ is defined as
Then, a discrete model of (14) using the zero-order hold method is obtained by taking periodic sampling
where ${\boldsymbol{{A}}_{\boldsymbol{{d}}}}=e^{\,{{{\boldsymbol{{A}}}}h}}$, ${\boldsymbol{{B}}_{\boldsymbol{{d}}}}=\left(\int _{0}^{h}e^{\,{{{\boldsymbol{{A}}}}\boldsymbol{\tau }}}\mathrm{d}{\boldsymbol{\tau }}\right){{{\boldsymbol{{B}}}}}$, and $h\gt 0$ is a small sampling period. The stabilizing control policy is given as ${\boldsymbol{{u}}}_k={\boldsymbol{\mu }}({\boldsymbol{\xi }}_k)$, which minimizes the performance index function:
with weighting matrices ${\boldsymbol{{Q}}_{\boldsymbol{{d}}}}\geq 0$ and ${\boldsymbol{{R}}_{\boldsymbol{{d}}}}\gt 0$, where $({\boldsymbol{{A}}_{\boldsymbol{{d}}}},\sqrt{{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}}{\boldsymbol{{C}}})$ is observable and $({\boldsymbol{{A}}_{\boldsymbol{{d}}}},{\boldsymbol{{B}}_{\boldsymbol{{d}}}})$ is controllable.
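The zero-order-hold discretization used in (15) can be computed exactly with the augmented matrix exponential. The sketch below (Python, hypothetical helper name) illustrates this step for a known pair $({{{\boldsymbol{{A}}}}},{{{\boldsymbol{{B}}}}})$, although in the proposed method these matrices are never identified explicitly.

```python
import numpy as np
from scipy.linalg import expm

def zoh_discretize(A, B, h):
    """Zero-order-hold discretization: A_d = e^{A h}, B_d = (int_0^h e^{A tau} d tau) B.
    The augmented exponential expm([[A, B], [0, 0]] * h) returns both blocks at once."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n], M[:n, n:] = A, B
    Md = expm(M * h)
    return Md[:n, :n], Md[:n, n:]   # A_d, B_d
```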
Since the solution of (16) is similar to the LQR problem, we can find a positive definite matrix ${\boldsymbol{{P}}_{\boldsymbol{{d}}}}\in \mathbb{R}^{n\times n}$ to rewrite (16) into the following form.
Then, the LQR Bellman equation is obtained as follows:
To find the optimal control, combining (15) and (18) yields
To determine the minimizing control, setting the derivative with respect to ${\boldsymbol{{u}}}_k$ to zero yields
According to ref. [Reference Lewis and Vamvoudakis31], the Riccati equation can be derived as
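When the model is fully known, a discrete-time Riccati equation of the standard output-weighted form can be solved numerically. The following sketch (assuming the state cost is $\boldsymbol{{C}}^{\mathrm{T}}{\boldsymbol{{Q}}_{\boldsymbol{{d}}}}\boldsymbol{{C}}$) serves only as a model-based baseline, analogous to the comparison matrix ${\boldsymbol{{P}}_{\boldsymbol{{d}}}}^{*}$ computed from model information in Section 4.2.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def lqr_baseline(A_d, B_d, C, Q_d, R_d):
    """Model-based baseline: solve the discrete-time algebraic Riccati equation
    with state weight C^T Q_d C and return P_d together with the LQR gain K,
    so that u_k = -K @ xi_k."""
    P_d = solve_discrete_are(A_d, B_d, C.T @ Q_d @ C, R_d)
    K = np.linalg.solve(R_d + B_d.T @ P_d @ B_d, B_d.T @ P_d @ A_d)
    return P_d, K
```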
For the unmeasurable states, an output feedback method is introduced to parameterize the internal state in terms of the filtered inputs and outputs of the system.
Based on the discretized system model (15), the state ${\boldsymbol{\xi }}_k$ is reconstructed with the previous $N$ inputs and outputs.
where $\overline{{\boldsymbol{{u}}}}_{k-1,k-N}$ and $\overline{{\boldsymbol{{y}}}}_{k-1,k-N}$ are the measured input and output sequences over the time interval $[k-N,k-1]$, respectively, $\overline{{\boldsymbol{{z}}}}_{k-1,k-N}=[\overline{{\boldsymbol{{u}}}}_{k-1,k-N}^{\mathrm{T}},\overline{{\boldsymbol{{y}}}}_{k-1,k-N}^{\mathrm{T}}]^{\mathrm{T}}\in \mathbb{R}^l$ with $l=N(m+p)$, and ${\boldsymbol{\Theta }}=[{\boldsymbol{{E}}}_{\boldsymbol{{u}}},{\boldsymbol{{E}}}_{\boldsymbol{{y}}}]$, where $\boldsymbol{{E}}_{\boldsymbol{{u}}}$, $\boldsymbol{{E}}_{\boldsymbol{{y}}}$ are the parameter matrices of the two sequences, which are derived in ref. [Reference Lewis and Vamvoudakis31].
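A minimal sketch of assembling the data vector $\overline{{\boldsymbol{{z}}}}_{k-1,k-N}$ from stored input/output samples is given below (Python; the ordering of the stacked samples is an assumption and must match the definition of ${\boldsymbol{\Theta }}$):

```python
import numpy as np

def stack_io(u_hist, y_hist, k, N):
    """Stack the previous N inputs and outputs into
    z_bar = [u_{k-1}, ..., u_{k-N}, y_{k-1}, ..., y_{k-N}]   (ordering assumed).
    u_hist and y_hist are sequences of measured vectors indexed by time step."""
    u_seq = np.concatenate([np.atleast_1d(u_hist[k - j]) for j in range(1, N + 1)])
    y_seq = np.concatenate([np.atleast_1d(y_hist[k - j]) for j in range(1, N + 1)])
    return np.concatenate([u_seq, y_seq])   # length N*(m + p)
```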
Substituting (22) into (17) yields
where ${\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}={\boldsymbol{\Theta }}^{\mathrm{T}}{\boldsymbol{{P}}_{\boldsymbol{{d}}}}{\boldsymbol{\Theta }}\in \mathbb{R}^{(m+p)N\times (m+p)N}$ . Then, the LQR Bellman equation is written in the form of measured data:
The optimal control can be determined online using the temporal difference (TD) reinforcement learning method, which minimizes the following Bellman TD error [Reference Lewis and Vamvoudakis31]. The Bellman TD error is defined from the measured-data Bellman equation, which can be rewritten as
The optimal policy is obtained by minimizing the value function in terms of the measured data.
Decomposing ${{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}^{\mathrm{T}}{{\overline{\boldsymbol{{P}}}}_d}{{\overline{\boldsymbol{{z}}}}}_{k,k-N+1}$ , we have
where ${\boldsymbol{{p}}}_0\in \mathbb{R}^{m\times m}$, ${\boldsymbol{{p}}}_u\in \mathbb{R}^{m\times m(N-1)}$, ${\boldsymbol{{p}}}_y\in \mathbb{R}^{m\times pN}$. Then, differentiating (27) with respect to ${\boldsymbol{{u}}}_k$ and performing the minimization yields
Theorem 1. [Reference Liu, Ge, Zhao and Mei15]: Let $\delta ^j,(\,j=1,2,\ldots )$ be the maximum difference between ${{{\boldsymbol{{V}}}}}^j({\boldsymbol{\xi }}_k)$ and ${{{\boldsymbol{{V}}}}}^*({\boldsymbol{\xi }}_k)$ . When adding a discount factor $\gamma (0\leq \gamma \leq 1)$ to the VI algorithm, the convergence rate of the algorithm to the optimal value ${\boldsymbol{{P}}}^*$ is
Considering the convergence speed in the iteration process of the control law, a discount factor is added to the controller. Then the LQR Bellman equation with the discount factor is expressed as
The optimal control law with discount factor is written as follows:
Thus, the optimal interaction force is obtained as ${\boldsymbol{{f}}}={\boldsymbol{{u}}}_k^{*}$ .
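To make the policy extraction concrete, the sketch below shows how the blocks $\boldsymbol{{p}}_{\textbf{0}}$, $\boldsymbol{{p}}_{\boldsymbol{{u}}}$, $\boldsymbol{{p}}_{\boldsymbol{{y}}}$ of the learned matrix $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ yield the control. It is the minimizing control implied by the decomposition above, ignoring the discount factor; the discounted law is the one given in the equation above, so this is only an illustrative sketch with hypothetical names.

```python
import numpy as np

def opfb_control(P_bar, u_past, y_recent, m):
    """Extract u_k from the learned matrix P_bar partitioned as [p0, p_u, p_y]
    along its first m rows: u_k = -p0^{-1} (p_u @ u_past + p_y @ y_recent),
    where u_past stacks the previous N-1 inputs and y_recent the last N outputs."""
    p0 = P_bar[:m, :m]
    p_u = P_bar[:m, m:m + u_past.size]
    p_y = P_bar[:m, m + u_past.size:]
    return -np.linalg.solve(p0, p_u @ u_past + p_y @ y_recent)
```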
3.3. ABLF-based NN inner loop controller
Referring to (1), and taking ${\boldsymbol{{x}}}_1={\boldsymbol{{q}}}$ , ${\boldsymbol{{x}}}_2=\dot{{\boldsymbol{{q}}}}$ , the robot dynamic system can be rewritten as
where ${\boldsymbol{{x}}}_1=[x_{11},x_{12},\cdots,x_{1n}]^{\mathrm{T}}$, and the desired trajectory is ${\boldsymbol{{x}}_{\boldsymbol{{d}}}}(t)=[q_{d1}(t),q_{d2}(t),\cdots,q_{dn}(t)]^{\mathrm{T}}$. We need to ensure that all states and outputs remain within their limits when tracking the desired trajectory, $i.e.$, $|x_{1i}|\lt k_{c1i}$, $|x_{2i}|\lt k_{c2i}$, $\forall{t}\geq 0$, where $k_{c1i}$, $k_{c2i}$ are positive constants and ${\boldsymbol{{k}}_{c1}}=[k_{c11},k_{c12},\cdots,k_{c1n}]^{\mathrm{T}}$, ${\boldsymbol{{k}}_{c2}}=[k_{c21},k_{c22},\cdots,k_{c2n}]^{\mathrm{T}}$ are positive constant vectors.
Lemma 1: For any positive constant $k_{b}$ and any $x\in \mathbb{R}$ satisfying $|x|\lt k_{b}$, the following inequality holds
where $p$ is a positive integer.
For convenience of presentation, the inner-loop controller is derived in continuous time. In the real experiment, the torque computed by the asymmetric constrained controller is applied to the robot in discrete time, which is consistent with the discrete outer loop.
In order to meet the tracking constraint requirements, an ABLF-based controller is employed in the system. The tracking errors are defined as ${\boldsymbol{{z}}}_1=[z_{11},z_{12},\cdots,z_{1n}]^{\mathrm{T}}={\boldsymbol{{x}}}_1-{\boldsymbol{{x}}_{\boldsymbol{{d}}}}$, ${\boldsymbol{{z}}}_2=[z_{21},z_{22},\cdots,z_{2n}]^{\mathrm{T}}={\boldsymbol{{x}}}_2-{\boldsymbol{\alpha }}$, and we have $\dot{{\boldsymbol{{z}}}}_1={\boldsymbol{{x}}}_2-{\dot{\boldsymbol{{x}}}_{\boldsymbol{{d}}}}$. The virtual control $\boldsymbol{\alpha }$ is defined as
where
Positive constant vectors ${{\boldsymbol{{k}}}_{\boldsymbol{{a}}}}$ and ${{\boldsymbol{{k}}}_{\boldsymbol{{b}}}}$ denote the lower and upper bounds of the tracking errors, i.e., $-{\boldsymbol{{k}}_{\boldsymbol{{a}}}}\lt{{{\boldsymbol{{z}}}_{\textbf{1,2}}}}\lt{\boldsymbol{{k}}_{\boldsymbol{{b}}}}$, and $k_i$ $(i=1,2,\ldots,n)$ denotes a positive constant. Taking the derivatives of $\boldsymbol{{z}}_{\textbf{1}}$ and $\boldsymbol{{z}}_{\textbf{2}}$ with respect to time, we have
The ABLF is defined as follows:
where $k_{ai}=x_{di}-\underline{k}_{ci}$ , $k_{bi}=\overline{k}_{ci}-x_{di}$ , $\underline{k}_{ai}\leq k_{ai}\leq \overline{k}_{ai}$ , $\underline{k}_{bi}\leq k_{bi}\leq \overline{k}_{bi}$ , and $h_1\!(z_1)$ is given as
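For reference, a commonly used asymmetric log-barrier candidate of this type, consistent with the bounds $k_{ai}$, $k_{bi}$ above, is sketched here; the exact definitions of the ABLF (39) and the switching function $h_1$ follow the original display equations.

$$V_{1}=\sum _{i=1}^{n}\left[\frac{h_{1}(z_{1i})}{2p}\log \frac{k_{bi}^{2p}}{k_{bi}^{2p}-z_{1i}^{2p}}+\frac{1-h_{1}(z_{1i})}{2p}\log \frac{k_{ai}^{2p}}{k_{ai}^{2p}-z_{1i}^{2p}}\right],\qquad h_{1}(z_{1i})=\begin{cases}1, & z_{1i}\gt 0\\ 0, & z_{1i}\leq 0\end{cases}$$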
The time derivative of (39) is
Then, a second Lyapunov function is designed as
Taking the time derivative of $\boldsymbol{{V}}_{\textbf{2}}$, we obtain
When ${\boldsymbol{{z}}_{\textbf{2}}}=[0,0,\ldots,0]^{\mathrm{T}}$, ${\dot{\boldsymbol{{V}}}_{\textbf{2}}}=-\sum _{i=1}^nk_iz_{1i}^{2p}\lt 0$. According to Lyapunov stability theory, the system is asymptotically stable. Otherwise, if ${\boldsymbol{{z}}_{\textbf{2}}}\neq [0,0,\ldots,0]^{\mathrm{T}}$, then by the property of the Moore-Penrose pseudoinverse ${\boldsymbol{{z}}_{\textbf{2}}}^{\mathrm{T}}({\boldsymbol{{z}}_{\textbf{2}}}^{\mathrm{T}})^+=1$. $\boldsymbol{{K}}_{\textbf{2}}$ is a designed positive gain matrix with $\boldsymbol{{K}}_{\textbf{2}}={\boldsymbol{{K}}_{\textbf{2}}}^{\mathrm{T}}\gt 0$. Then the model-based controller is designed as
Substituting (44) into (43) yields
where ${\boldsymbol{{K}}_{\textbf{2}}}={\boldsymbol{{K}}_{\textbf{2}}}^{\mathrm{T}}=diag([k_1,k_2,\ldots,k_n])\gt 0$ is a diagonal matrix.
Eq. (44) is model-based; in real applications, the dynamic parameters ${\boldsymbol{{C}}}({\boldsymbol{{x}}_{\textbf{1}}},{\boldsymbol{{x}}_{\textbf{2}}})$, ${\boldsymbol{{G}}}({\boldsymbol{{x}}_{\textbf{1}}})$, ${\boldsymbol{{M}}}({\boldsymbol{{x}}_{\textbf{1}}})$, and $\boldsymbol{{f}}$ are difficult to obtain. To address the uncertainties in the dynamic model, an adaptive NN method is employed to fit the uncertain terms. The adaptive NN control is proposed as
where ${\boldsymbol{{S}}}({\cdot})$ is the basis function, $\hat{{\boldsymbol{{W}}}}\in{\mathbb{R}^{l\times n}}$ is the estimated weight matrix in which $l$ is the number of NN nodes, and ${\boldsymbol{{Z}}}={{\left [{\boldsymbol{{x}}_{\textbf{1}}}^{T},{\boldsymbol{{x}}_{\textbf{2}}}^{T},{{{\boldsymbol{\alpha }}}^{T}},{{{\dot{{\boldsymbol{\alpha }}}}}^{T}} \right ]}^{T}}$ is the input of the NN basis function. Owing to the NN's ability to approximate any nonlinearity, ${{\hat{{\boldsymbol{{W}}}}}^{T}}{\boldsymbol{{S}}}({\boldsymbol{{Z}}})$ is the estimate of ${{{\boldsymbol{{W}}}}^{*T}}{\boldsymbol{{S}}}({\boldsymbol{{Z}}})$, where ${{{\boldsymbol{{W}}}}^{*T}}{\boldsymbol{{S}}}({\boldsymbol{{Z}}})$ represents the true value of the unknown term in the model-based control (44), with
The adaptive law is designed as
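For illustration, a minimal sketch of the Gaussian basis ${\boldsymbol{{S}}}({\boldsymbol{{Z}}})$ and a $\sigma$-modified gradient update of the type used in (48) is given below (Python; the exact law, gains, and signs follow the paper, so this is only an assumed form with hypothetical names):

```python
import numpy as np

def rbf_basis(Z, centers, width):
    """Gaussian radial basis functions S(Z); 'centers' has shape (l, dim(Z))."""
    d2 = np.sum((centers - Z) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * width ** 2))          # shape (l,)

def adapt_weights(W_hat_i, S_Z, z2_i, Gamma_i, sigma_i, h=1e-3):
    """One Euler step of an assumed sigma-modified adaptive law of the form
    dW_hat_i/dt = -Gamma_i @ (S(Z) * z_{2i} + sigma_i * W_hat_i)."""
    dW = -Gamma_i @ (S_Z * z2_i + sigma_i * W_hat_i)
    return W_hat_i + h * dW
```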
Then, the BLF $V_3$ is designed as
where ${{\tilde{{\boldsymbol{{W}}}}}_{i}}={{\hat{{\boldsymbol{{W}}}}}_{i}}-{\boldsymbol{{W}}}_{i}^{*}$ and ${\tilde{{\boldsymbol{{W}}}}}_{i}$ , ${\hat{{\boldsymbol{{W}}}}}_{i}$ , ${\boldsymbol{{W}}}_{i}^{*}$ are the NN weight error, approximation, and ideal value, respectively. Differentiating (49), we have
If ${{\boldsymbol{{z}}_{\textbf{2}}}}\ne{{\left [ 0,0,\ldots,0 \right ]}^{T}}$ , according to (46)–(48), we have
where
Here ${{\lambda }_{\min }}({\bullet})$ and ${{\lambda }_{\max }}({\bullet})$ denote the minimum and maximum eigenvalues of the matrix, respectively, and
with $\left \|{\boldsymbol{{S}}}_i({\boldsymbol{{Z}}}) \right \| \le \left \|{\boldsymbol{{s}}}_i \right \|$ and ${\boldsymbol{{s}}}_i\gt 0$ .
Theorem 2: Consider the dynamic system described in (1). If the initial conditions satisfy $\left |{{x}_{1i}}(0) \right |\lt{{k}_{c1i}}$, $\left |{{x}_{2i}}(0) \right |\lt{{k}_{c2i}}$, the control law (46) ensures that all error signals are semi-globally uniformly bounded. The position and velocity constraints are then not violated, that is, $\forall t\gt 0$, $\left |{{x}_{1i}}(t) \right |\le{{k}_{c1i}}$, $\left |{{x}_{2i}}(t) \right |\le{{k}_{c2i}}$. The closed-loop error signals $\boldsymbol{{z}}_1$, $\boldsymbol{{z}}_2$, and $\tilde{{\boldsymbol{{W}}}}$ remain in the compact sets ${\boldsymbol{\Omega }}_{{\boldsymbol{{z}}_{\textbf{1}}}}$, ${\boldsymbol{\Omega }}_{{\boldsymbol{{z}}_{\textbf{2}}}}$, ${\boldsymbol{\Omega }}_{{\boldsymbol{{w}}}}$, respectively.
where ${\boldsymbol{{D}}}=2\left ({\boldsymbol{{V}}_{\textbf{3}}}(0)+{\boldsymbol{{C}}}/\rho \right )$ .
Proof: Multiplying both sides of (51) by $e^{\rho t}$, we have
and
Then, we have
According to (58) and the property of inequality in ref. [Reference Zhang, Dong, Ouyang, Yin and Peng32], we have
4. Simulations
In the simulations, a two-link robotic manipulator simulation platform is constructed as shown in Fig. 2, and the parameters of the robot are defined in Table I. The simulations of the inner loop and the outer loop are conducted separately to validate their respective effectiveness.
4.1. Inner loop
In order to validate the effectiveness and feasibility of the designed asymmetric constrained controller, simulations using the proposed controller, a PD controller, and the controller in ref. [Reference Zheng and Fang33] are carried out in MATLAB.
The initial positions of the robot are given as ${{\boldsymbol{{q}}_{\textbf{1}}}}(0)=[0.4,-0.1](rad)$ , ${{\boldsymbol{{q}}_{\textbf{2}}}}(0)=[0,0](rad/s)$ . The interaction force with end-effector is ${\boldsymbol{{f}}}(t)=[\!\sin\!(t)+1,\cos\!(t)+0.5](N)$ , where $t\in [0,{{t}_{f}}]$ and $t_f=20s$ . The desired trajectory is set as ${{{\boldsymbol{{q}}}_{\boldsymbol{{r}}}}}=[0.1\sin\!(t)+0.5\cos\!(t),0.1\cos\!(t)+0.5\sin\!(t)](rad)$ . The limits of tracking errors are ${{{\boldsymbol{{k}}_{\boldsymbol{{a}}}}}}=[\!-\!1.0,-1.0](rad)$ , ${{{{\boldsymbol{{k}}}_{\boldsymbol{{b}}}}}}=[1.2,1.2](rad)$ , and the physical limit of robot joints is ${{{\boldsymbol{{k}}}_{\boldsymbol{{c}}}}}=[1.8,1.8](rad)$ .
For PD control, the controller is designed as ${\boldsymbol{\tau }} =-{{\boldsymbol{{G}}_{\textbf{1}}}}({\boldsymbol{{q}}_{\textbf{1}}}-{{\boldsymbol{{q}}}_{\boldsymbol{{r}}}})-{\boldsymbol{{G}}_{\textbf{2}}}({\boldsymbol{{q}}_{\textbf{2}}}-{\dot{\boldsymbol{{q}}}_{\textbf{r}}})$ where ${\boldsymbol{{G}}_{\textbf{1}}}=diag(50,30)$ and ${\boldsymbol{{G}}_{\textbf{2}}}=diag(30,3)$ .
For the proposed controller, Eq. (46) is the control law applied in the inner loop. ${\boldsymbol{{S}}}({\cdot})$ is defined as a Gaussian function. For the NN, $\sigma$ is chosen as 0.2, the number of NN nodes is $2^8$, and the centers are selected in the area of $[\!-\!1,1]\times 5$. The variance of the centers is chosen as 0.5, and the initial value of the NN weight matrix is set to 0. The control parameters are set as ${\boldsymbol{{K}}_{\textbf{2}}}=diag(3,3)$, ${\boldsymbol{{k}}_{\textbf{1}}}={\boldsymbol{{k}}_{\textbf{2}}}=50$, ${\boldsymbol{{\Gamma }}_{\textbf{1}}}={\boldsymbol{{\Gamma }}_{\textbf{2}}}=100{{{{\boldsymbol{{I}}}}}_{256\times 256}}$ after tuning.
The tracking results are shown in Figs. 3 and 4. From Fig. 4, the tracking errors of the PD controller remain relatively large, while the tracking errors of the controller in ref. [Reference Zheng and Fang33] are smaller. The tracking errors of our proposed method are almost zero, which is better than the other methods and shows that the proposed method improves the tracking performance compared with the PD controller and the controller in ref. [Reference Zheng and Fang33]. The control inputs of the three methods are shown in Fig. 5. The torque of the proposed controller is relatively stable.
4.2. Outer loop
In the outer loop, a robot manipulator with two revolute joints physically interacting with the human/environment is considered. The simulation is implemented in MATLAB.
In the simulation, the sample interval is set to 1 ms. The adaptation procedure includes three steps. First, exploration noise is added to the initial control input to guarantee the persistent excitation (PE) condition [Reference Vrabie and Lewis10]; the noise is chosen as ${{e}_{k}}=\sum \nolimits _{\omega =1}^{10}{(0.06/\omega )\sin\!(k\omega )}$. Then, optimal impedance learning is conducted until the stopping criterion $|{\bar{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}^{\,\,\,j+1}-{\bar{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}^{\,\,\,j}|\lt 0.001$ is satisfied. Finally, the optimized impedance model is acquired through the iterations.
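For reproducibility, the exploration signal quoted above can be generated as follows (Python sketch; $k$ is the discrete step index):

```python
import numpy as np

def exploration_noise(k, n_terms=10, amp=0.06):
    """Sum-of-sinusoids probing signal e_k = sum_{w=1}^{10} (0.06/w) * sin(k*w),
    added to the initial control input to satisfy persistent excitation."""
    w = np.arange(1, n_terms + 1)
    return float(np.sum((amp / w) * np.sin(k * w)))
```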
The parameters in (10) are defined as $Q_1=1$ , $Q_2=0.3$ , $R=0.1$ . In the simulation, the environment model is selected as $\ddot{x}+10\dot{x}+100(x-{{x}_{e}})=-F$ . The environment position and robot desired trajectory in the Cartesian space are given by (12), with $F_1=-1$ , $G_1=0.3$ , $F_2=-1$ , $G_2=1$ . All the elements of the initial matrix $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ are selected as 0.01, and $\gamma$ is set as 0.3. Figures 6 and 7 show the convergence of $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ . ${\boldsymbol{p}}_{\textbf{0}}$ , ${{\boldsymbol{{p}}}_{\boldsymbol{{y}}}}$ , $\boldsymbol{{p}}_{\boldsymbol{{u}}}$ reach convergence after 17 steps.
The optimal values of $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ are shown as follows:
Then, the optimal matrix $\boldsymbol{{P}}_{\boldsymbol{{d}}}$ is obtained from $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ according to (24) as follows:
For comparison, the ideal optimal matrix ${\boldsymbol{{P}}_{\boldsymbol{{d}}}}^{*}$ is calculated by using the model information as follows:
The matrix ${\boldsymbol{{P}}_{\boldsymbol{{d}}}}^{*}$ is the optimal solution that minimizes the cost function. Comparing the two matrices, the optimal matrix obtained by our method is close to the true solution, which validates the correctness of the proposed algorithm.
Remark 2: Since this method is model-free and data-based, the parameters have little influence on the results, and there is almost no need to tune them. To further improve the system performance in practical applications, the control parameters can be fine-tuned by adjusting the parameters of the inner and outer loops separately.
5. Experiment
The experiments are carried out on the robot (Franka Emika Panda, Franka Emika GmbH, Germany) shown in Fig. 8. The robot has 7 degrees of freedom (DOFs), with seven flexible joints equipped with angle and torque sensors. The communication frequency is 1000 Hz. The robot has three control modes: velocity mode, position mode, and torque mode. The operating system is Ubuntu 16.04 with the ROS Kinetic platform. When the human moves the robot, a desired trajectory is generated, and the virtual trajectory is then calculated through the impedance model in real time. Two independent ROS control nodes are employed for computation and robot control, and the communication frequency between the two nodes is 200 Hz. The first node computes the adaptive constraint controller and publishes the control result to the second node. The second node subscribes to the information from the robot and publishes the control signals to control the robot.
In order to verify the effectiveness of the proposed controller, a trajectory tracking task is conducted. For the convenience of the experiment, a desired trajectory is defined by the human in the outer loop. As shown in Fig. 1, the virtual desired trajectory $\boldsymbol{{x}}_{\boldsymbol{{r}}}$, which is calculated from the impedance model, is the input of the inner loop. It is adapted online on the basis of the human interaction with the robot. The parameters of the controller are chosen as follows: ${\boldsymbol{{S}}}({\cdot})$ is the Gaussian function, $l=2^8$ is the number of NN nodes, the centers of each node are selected in the area of $[\!-\!1,1]$, and the center variance is set as 1.5. The initial NN weight $\hat{{\boldsymbol{{W}}}}$ is 0, the controller parameter is ${\boldsymbol{{K}}_{\textbf{2}}}=diag(10,10,10,10,10,10,10)$, and the tracking error constraints are set as ${{k}_{b1}}=2$, ${{k}_{b2}}=2$, ${{k}_{b3}}=2$, ${{k}_{b4}}=3$, ${{k}_{b5}}=2$, ${{k}_{b6}}=2.6$, ${{k}_{b7}}=1.5$, ${{k}_{a1}}=-1.2$, ${{k}_{a2}}=-1.2$, ${{k}_{a3}}=-1.5$, ${{k}_{a4}}=-2$, ${{k}_{a5}}=-2$, ${{k}_{a6}}=-2$, ${{k}_{a7}}=-1.8$, ${{\sigma }_{i}}=0.02$, ${{\boldsymbol{\Gamma }}_{i}}=200{{{{\boldsymbol{{I}}}}}_{16\times 16}}$ $\left ( i=1,2,\ldots,7 \right )$.
Figure 9 shows the tracking trajectories of the three methods; it is obvious that the tracking performance of the proposed controller is better than that of the others. Figure 10 shows the tracking errors in each direction, where the green line is the zero baseline. Compared with the other two methods, the trajectory error of our proposed controller is smaller. The main verification metric for the accuracy of the controller is the tracking error. The average errors of the three controllers are provided in Fig. 11, and Table II shows the average error and error variance of the three methods. It is obvious that the average error and error variance of the proposed controller are smaller than those of the other two methods. Moreover, we present the smoothness index (as described in ref. [Reference Roveda, Haghshenas, Caimmi, Pedrocchi and Tosatti17]) of the three controllers in Fig. 12, which demonstrates that the proposed controller is smoother than the other two methods. The input torque computed by the adaptive constraint controller is shown in Fig. 13.
The calculated control torque is applied to the robot when an HRI task is conducted on the Franka robot. The outer loop aims at optimizing the parameters of the impedance model. For the cost function (10), the parameters $Q_1$, $Q_2$, and $R$ are all set to 1. The stopping criterion is $\left |{\bar{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}^{\,\,\,j+1}-{\bar{\boldsymbol{{P}}}_{\boldsymbol{{d}}}}^{\,\,\,j} \right |\lt 0.01$, and the discount factor $\gamma$ is selected as 0.3. The initial value of $p_0$ is set as 2, the initial value of ${{\boldsymbol{{p}}}_{\boldsymbol{{y}0}}}$ is set as $\left [ \begin{matrix} \!-2 &\quad 10 &\quad 10 \\ \end{matrix} \right ]$, the initial values of $\boldsymbol{{p}}_{\boldsymbol{{u}}}$ are set as 1, and all the other elements of the initial matrix $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ are 0. The sampling interval is set as 10 ms.
Figures 14 and 15 show that the parameters ${{\boldsymbol{{p}}}_{\boldsymbol{{y}}}}$, ${{\boldsymbol{{p}}}_{\textbf{0}}}$, and $\boldsymbol{{p}}_{\boldsymbol{{u}}}$ reach convergence after 1000 iterations. The optimal values of $\overline{\boldsymbol{{P}}}_{\boldsymbol{{d}}}$ are shown as follows:
Figure 16 shows the interaction force, where the interval 0–10 s is the data collection stage and the interval 10–20 s is the calculation stage. After 20 s, the optimal control is exerted at the end-effector of the robot, enabling it to complete the given task more stably.
6. Conclusion
In this paper, to realize safe HRI, a novel impedance adaptation method is proposed for unknown environments. A double-loop system is constructed, in which the inner loop improves the tracking accuracy while ensuring the safety of interaction and the outer loop handles the uncertainty of the model. Adaptive dynamic programming with discrete output feedback is proposed in the outer loop to solve the problem of uncertain environment position, and a discount factor is added for faster convergence. In the interaction process, the robot is confined to a safe space through the ABLF-based inner-loop controller. Meanwhile, the RBFNN compensates for the unknown dynamics in the design of the controller. Simulation and experimental results verify the effectiveness of the proposed controller in the HRI system. In future work, a reduced-order observer will be added to reduce the dimensionality of the states so as to increase the speed of calculation.
Author contributions statement
Xinyi Yu and Huizhen Luo conceived and designed the study. Huizhen Luo and Shuanwu Shi performed the experiments. Wei Yan and Linlin Ou revised the article and verified its correctness.
Financial support
This work was supported by the National Key R&D Program of China (2018YFB1308400) and the Natural Science Foundation of Zhejiang Province (LY21F030018).
Conflicts of interest
The authors declare no conflicts of interest.
Ethical approval
This research did not require any ethical approval.