Self-Learning Optimal Control of Nonlinear Systems: Adaptive Dynamic Programming Approach (English Edition)
- Price: CNY 120
- Series: "13th Five-Year Plan" Planned Textbook for Regular Higher Education; Engineering Practice Planned Textbook Series for Regular Institutions of Higher Education
- Authors: Qinglai Wei et al.
- Publication date: July 1, 2018
- ISBN: 9787030520609
- Publisher: Science Press
- CLC classification: TP271
- Format: B5
Contents
1 Principle of Adaptive Dynamic Programming 1
1.1 Dynamic Programming 1
1.1.1 Discrete-Time Systems 1
1.1.2 Continuous-Time Systems 2
1.2 Original Forms of Adaptive Dynamic Programming 3
1.2.1 Principle of Adaptive Dynamic Programming 4
1.3 Iterative Forms of Adaptive Dynamic Programming 9
1.3.1 Value Iteration 9
1.3.2 Policy Iteration 10
1.4 About This Book 11
References 14
2 An Iterative ε-Optimal Control Scheme for a Class of Discrete-Time Nonlinear Systems with Unfixed Initial State 19
2.1 Introduction 19
2.2 Problem Statement 20
2.3 Properties of the Iterative Adaptive Dynamic Programming Algorithm 21
2.3.1 Derivation of the Iterative ADP Algorithm 21
2.3.2 Properties of the Iterative ADP Algorithm 23
2.4 The ε-Optimal Control Algorithm 28
2.4.1 The Derivation of the ε-Optimal Control Algorithm 28
2.4.2 Properties of the ε-Optimal Control Algorithm 32
2.4.3 The ε-Optimal Control Algorithm for Unfixed Initial State 34
2.4.4 The Expressions of the ε-Optimal Control Algorithm 37
2.5 Neural Network Implementation for the ε-Optimal Control Scheme 37
2.5.1 The Critic Network 38
2.5.2 The Action Network 39
2.6 Simulation Study 40
2.7 Conclusions 42
References 43
3 Discrete-Time Optimal Control of Nonlinear Systems via Value Iteration-Based Q-Learning 47
3.1 Introduction 47
3.2 Preliminaries and Assumptions 49
3.2.1 Problem Formulations 49
3.2.2 Derivation of the Discrete-Time Q-Learning Algorithm 50
3.3 Properties of the Discrete-Time Q-Learning Algorithm 52
3.3.1 Non-Discount Case 52
3.3.2 Discount Case 59
3.4 Neural Network Implementation for the Discrete-Time Q-Learning Algorithm 64
3.4.1 The Action Network 65
3.4.2 The Critic Network 67
3.4.3 Training Phase 69
3.5 Simulation Study 70
3.5.1 Example 1 70
3.5.2 Example 2 76
3.6 Conclusion 81
References 82
4 A Novel Policy Iteration-Based Deterministic Q-Learning for Discrete-Time Nonlinear Systems 85
4.1 Introduction 85
4.2 Problem Formulation 86
4.3 Policy Iteration-Based Deterministic Q-Learning Algorithm for Discrete-Time Nonlinear Systems 87
4.3.1 Derivation of the Policy Iteration-Based Deterministic Q-Learning Algorithm 87
4.3.2 Properties of the Policy Iteration-Based Deterministic Q-Learning Algorithm 89
4.4 Neural Network Implementation for the Policy Iteration-Based Deterministic Q-Learning Algorithm 93
4.4.1 The Critic Network 93
4.4.2 The Action Network 95
4.4.3 Summary of the Policy Iteration-Based Deterministic Q-Learning Algorithm 96
4.5 Simulation Study 97
4.5.1 Example 1 97
4.5.2 Example 2 100
4.6 Conclusion 107
References 107
5 Nonlinear Neuro-Optimal Tracking Control via Stable Iterative Q-Learning Algorithm 111
5.1 Introduction 111
5.2 Problem Statement 112
5.3 Policy Iteration Q-Learning Algorithm for Optimal Tracking Control 114
5.4 Properties of the Policy Iteration Q-Learning Algorithm 114
5.5 Neural Network Implementation for the Policy Iteration Q-Learning Algorithm 119
5.5.1 The Critic Network 120
5.5.2 The Action Network 120
5.6 Simulation Study 121
5.6.1 Example 1 122
5.6.2 Example 2 125
5.7 Conclusions 129
References 129
6 Model-Free Multiobjective Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems with General Performance Index Functions 133
6.1 Introduction 133
6.2 Preliminaries 134
6.3 Multiobjective Adaptive Dynamic Programming Method 135
6.4 Model-Free Incremental Q-Learning Method 145
6.5 Neural Network Implementation for the Incremental Q-Learning Method 147
6.5.1 The Critic Network 148
6.5.2 The Action Network 149
6.5.3 The Procedure of the Model-Free Incremental Q-Learning Method 150
6.6 Convergence Proof 150
6.7 Simulation Study 153
6.7.1 Example 1 153
6.7.2 Example 2 155
6.8 Conclusion 157
References 157
7 Multiobjective Optimal Control for a Class of Unknown Nonlinear Systems Based on Finite-Approximation-Error ADP Algorithm 159
7.1 Introduction 159
7.2 General Formulation 160
7.3 Optimal Solution Based on Finite-Approximation-Error ADP 162
7.3.1 Data-Based Identifier of Unknown System Dynamics 162
7.3.2 Derivation of the ADP Algorithm with Finite Approximation Errors 166
7.3.3 Convergence Analysis of the Iterative ADP Algorithm 168
7.4 Implementation of the Iterative ADP Algorithm 173
7.4.1 Critic Network 174
7.4.2 The Action Network 174
7.4.3 The Procedure of the ADP Algorithm 175
7.5 Simulation Study 175
7.5.1 Example 1 176
7.5.2 Example 2 179
7.6 Conclusions 182
References 182
8 A New Approach for Optimal Control of a Class of Continuous-Time Chaotic Systems by Online ADP Algorithm 185
8.1 Introduction 185
8.2 Problem Statement 185
8.3 Optimal Control Based on Online ADP Algorithm 187
8.3.1 Design Method of the Critic Network and the Action Network 188
8.3.2 Stability Analysis 191
8.3.3 Online ADP Algorithm Implementation 195
8.4 Simulation Examples 195
8.4.1 Example 1 196
8.4.2 Example 2 197
8.5 Conclusions 199
References 200
9 Off-Policy IRL Optimal Tracking Control for Continuous-Time Chaotic Systems 201
9.1 Introduction 201
9.2 System Description and Problem Statement 201
9.3 Off-Policy IRL ADP Algorithm 203
9.3.1 Convergence Analysis of IRL ADP Algorithm 204
9.3.2 Off-Policy IRL Method 206
9.3.3 Methods for Updating Weights 208
9.4 Simulation Study 209
9.4.1 Example 1 209
9.4.2 Example 2 211
9.5 Conclusion 213
References 213
10 ADP-Based Optimal Sensor Scheduling for Target Tracking in Energy Harvesting Wireless Sensor Networks 215
10.1 Introduction 215
10.2 Problem Formulation 216
10.2.1 NN Model Description of Solar Energy Harvesting 216
10.2.2 Sensor Energy Consumption 217
10.2.3 KF Technology 218
10.3 ADP-Based Sensor Scheduling for Maximum WSNs Residual Energy and Minimum Measuring Accuracy 219
10.3.1 Optimization Problem of the Sensor Scheduling 219
10.3.2 ADP-Based Sensor Scheduling with Convergence Analysis 220
10.3.3 Critic Network 223
10.3.4 Implementation Process 224
10.4 Simulation Study 224
10.5 Conclusion 226
References 227
Index 229