Optimal Frequency Control for Virtual Synchronous Generator Based AC Microgrids via Adaptive Dynamic Programming

This paper proposes a novel virtual synchronous generator (VSG) controller for converters in AC microgrids (MGs). The controller reduces the control cost and the DC-side energy requirement while accounting for the system nonlinearity in frequency support. First, the frequency dynamics of the MG, which are studied analytically based on the VSG with a secondary frequency controller, are formulated as a nonlinear state-space representation in which the reciprocal of the inertia is modeled as the control input. Correspondingly, a cost function is defined in terms of the angular deviation, frequency deviation, rate of change of frequency (RoCoF), and a discount factor, which maintains a tradeoff between the critical frequency bounds and the required control energy. Next, the optimal frequency regulation problem is solved using an online adaptive dynamic programming (ADP) method, where a single echo state network (ESN) is constructed to approximate both the optimal cost function and the optimal control input, which reduces the computational burden and facilitates real-time computation. Finally, simulation results demonstrate that the frequency response of the system is improved while more DC-side energy is retained.


I. INTRODUCTION
MODERN electric power systems are evolving toward the so-called distributed generation (DG) structure as traditional fossil energy is depleted [1], [2]. DG units, such as solar photovoltaic (PV), wind turbine, storage, and diesel generator units, are connected to form an autonomous microgrid (MG) system. However, unlike traditional synchronous generators (SGs), DGs lack the ability to provide inertia [3], and together with the increasing proportion of renewable energy sources (RES), this deteriorates the frequency stability of MGs. Therefore, a certain frequency support capability of the RES, along with proper control approaches, is necessary to maintain safe and stable operation of the power system [4].
One of the most promising control methods under development is the virtual synchronous generator (VSG), an inertia emulation technology that applies the mechanical and electromagnetic equations of the SG to mitigate transient system dynamics [5]. The implementation usually relies on the assumption that unlimited power can be generated or absorbed by the generator within a short period, whereas the DC-side capacitor is limited in practice [6]. This problem is addressed in [7] with a distributed virtual power system inertia scheme that regulates the DC-link voltages of power converters, where relatively large capacitor units are aggregated to provide frequency support. In addition, a set of so-called frequency disturbance-based regulators [8], [9], [10] has appeared, in which the sign of the product of the frequency deviation and the rate of change of frequency (RoCoF) is used to indicate whether the unit is in the "acceleration" or "deceleration" stage. The principle is to set the size of the virtual inertia tentatively, in a gain-tuning manner. Subsequently, similar or improved adaptive inertia control methods have been proposed, such as the dual-adaptivity inertia control that improves the overall power and frequency performance [11], and the methods in [12] and [13], where the adaptive virtual inertia is developed by referring to the physical meaning of the SG rotor inertia and the power angle curve. Nonetheless, all of the above concepts focus only on the overall frequency and power improvement; the control cost and the energy resources required for such regulation are ignored.
Therefore, researchers have begun to employ optimization-based methods to set the parameters of the VSG. For example, [14] introduces the particle swarm method to tune the control parameters of the VSG units, which simultaneously realizes a smooth transition after disturbances and keeps the voltage angle deviation within the allowable range. A parameter design rule for the VSG is developed in [15] based on the modal proximity-based method, which avoids the adverse effects of the VSG on small-signal angular stability. The linear quadratic regulator-based optimization technique is employed to obtain the optimal virtual inertia gain for a single inverter in [16], and is further extended to a uniform multi-machine frequency model in [17]. These optimization-based methods have shown clear advantages in the parameter design of VSGs while meeting the stability requirements of the power system. Nevertheless, these studies rely on small-signal modeling accompanied by linearization, although the VSG-based MG model is nonlinear in general. The stability of a power system under linear control can be affected in the case of large disturbances. Based on these analyses, how to adaptively obtain the optimal virtual inertia gain for VSGs using the originally constructed system model, i.e., without linearization, is still an open problem.
For the optimal control of nonlinear systems, the main challenge is usually to solve a nonlinear partial differential equation, the so-called Hamilton-Jacobi-Bellman (HJB) equation. To confront this challenge, adaptive dynamic programming (ADP) has been proposed to solve the HJB equation iteratively by critic-actor techniques and neural network (NN) approximation [18]. ADP has successfully solved many control problems, including optimal tracking control [19], robust control [20] and multi-agent consensus control [21]. Furthermore, some improved ADP-based control strategies have been extended to cope with discrete-time nonlinear systems [22], [23]. In particular, ADP has been applied to the practical problems of wireless connected vehicles [24], power systems [25] and wastewater treatment [26], which shows its strong prospects for practical application [27].
In this paper, an online ADP method is developed to solve the optimal adaptive virtual inertia controller design, which realizes real-time control of the frequency. Because the equilibrium point of the angular deviation may not be zero when the system is stabilized, this paper proposes a cost function that includes a discount factor. Considering that the AC MG is a large-scale system with many state variables, it is difficult to design an appropriate activation function for a polynomial NN. The echo state network (ESN) is an improved recurrent NN [28]. In an ESN, the activation function is called the reservoir, and it is generated randomly. Here, a single ESN is designed to implement the ADP method by approximating the optimal cost function and the optimal control input simultaneously, which yields better control performance. In other words, the single ESN is not only easy to construct, but also effectively reduces the computational burden and facilitates real-time computation [29]. The main contributions of this paper are listed as follows.
1) A uniform AC MG frequency dynamic model based on the VSG control with a secondary frequency controller is derived, where a nonlinear state space representation is formulated and the reciprocal of the inertia is modeled as the control input.
2) The frequency control problem is solved in an adaptive optimal manner by introducing a cost function, which is defined by using the angular deviation, frequency deviation, RoCoF, and a discount factor. Unlike the conventional VSG control, the proposed approach adaptively provides the optimal inertia online while retaining a tradeoff between the critical frequency bounds and the required control energy. Then, an online ADP method is designed to obtain the optimal frequency controller from the derived HJB equation.
3) The single ESN is constructed to approximate the optimal cost function and the optimal control policy simultaneously, which can reduce the computational burden and improve the real-time computation. The stability of the closed-loop system is analyzed by using the Lyapunov theorem to guarantee that the MG states and the NN output weights are uniformly ultimately bounded (UUB).
The remainder of this paper is organized as follows. The frequency dynamics of the MG are investigated in Section II. Section III presents the optimal frequency controller design approach based on the ADP. Section IV investigates the stability of the AC MG with the adaptive algorithm using the Lyapunov theorem. Simulation results are presented in Section V to demonstrate the effectiveness of the proposed control scheme. The conclusions are drawn in Section VI.

II. SYSTEM FREQUENCY DYNAMICS
Fig. 1 shows the generalized structure of a balanced three-phase AC MG, where an AC load and an inverter-coupled DG are simultaneously connected with each AC bus. The output of each inverter is determined by the VSG with secondary frequency regulation control, as described by (1) and (2), where $\omega_i$ represents the frequency deviation; $K_{Pi}$ and $K_{Ii}$ represent the proportional and integral coefficients of the secondary frequency regulation, respectively; $P_{g,i}$ represents the variation of the mechanical turbine power; $R_i$ and $T_{g,i}$ are the droop gain and time constant of the turbine governor dynamics, respectively; $J_i$ and $D_i$ are the virtual moment of inertia and the active frequency droop damping coefficient, respectively; $P_{l,i}$ represents the load disturbance in the network; and $P_{e,i}$ represents the change in the electrical power output of the $i$-th DG, which is given by (3) according to the AC power flow calculation [30]. To simplify (3), auxiliary shorthand variables are introduced.

Then, (3) is rewritten as (4), where $\delta$ represents the angle deviation vector; $A_{\delta,1}$ is an $N \times 1$ column vector with all elements equal to 1; $A_{\delta,2}$ is an identity matrix of size $N \times N$; $D_{\delta i}$ is a $1 \times N$ row vector with the $i$-th element equal to 1 and all remaining elements 0; $n$ is the number of bus nodes in the AC MG; $G_{ij}$ and $B_{ij}$ are the real and imaginary parts of the node admittance matrix; and the superscript 0 denotes the initial value of a variable. The internal emf $E_i$ is assumed constant because of the action of the excitation system. Combining (1) and (2) gives (5). Substituting (4) into (5), the second-order differential equation of $\omega_i$ is described by (6). Considering a constant step-change disturbance $P_{l,i}$ and collecting the states and control inputs into vectors, the complete nonlinear model of the AC MG under study is represented as (7). During the frequency regulation, the inverters' inertia parameters are regulated through state feedback control. The dynamics-based frequency system model is presented in Fig. 2.
Remark 1: Once (6) is established, a mathematical expression of the frequency response in the time domain can be obtained by the inverse Laplace transform, where $P_{l,i}$ is considered as a step change in the active power balance. Examining the second derivative of $\omega_i$ with respect to time shows that the maximum RoCoF typically occurs at $t = 0$ and is directly determined by $P_{l,i}$ and $J_i$, which indicates that regulating the inertia can significantly improve the overall system dynamics.
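The dependence of the initial RoCoF on the inertia noted in Remark 1 can be illustrated with a small numerical sketch. It does not use the model (6); it assumes a much simpler single-unit swing equation $J\dot{\omega} = -D\omega - P_l$ with $\omega(0) = 0$, and the damping value and load step are hypothetical. The inertia values are the ones used for the constant-inertia cases in Section V plus one intermediate value.

```python
def initial_rocof(delta_p_load, inertia):
    """RoCoF at t = 0+ for the ASSUMED model J*dw/dt = -D*w - dP with w(0) = 0."""
    return -delta_p_load / inertia


def simulate(delta_p_load, inertia, damping, dt=1e-3, t_end=5.0):
    """Forward-Euler response of the assumed first-order swing model.

    Returns a list of (frequency deviation, RoCoF) samples.
    """
    w, trace = 0.0, []
    for _ in range(int(t_end / dt)):
        dw = (-damping * w - delta_p_load) / inertia
        trace.append((w, dw))
        w += dt * dw
    return trace


# Hypothetical 1 p.u. load step and damping of 5; only the trend matters here.
for J in (0.1, 0.5, 2.0):
    trace = simulate(delta_p_load=1.0, inertia=J, damping=5.0)
    worst = max(abs(dw) for _, dw in trace)
    print(f"J={J}: RoCoF(0+)={initial_rocof(1.0, J):.2f}, max|RoCoF|={worst:.2f}")
```

In this simplified model the largest RoCoF magnitude is the value at the instant of the step, and it scales as $1/J$, which is the motivation for treating the reciprocal of the inertia as the control input.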

III. ADP-BASED OPTIMAL FREQUENCY REGULATION
In this section, an adaptive optimal inertia regulation strategy is designed by using the ADP method, where a single ESN is used to implement it. As shown in Fig. 2, the optimal adaptive frequency controller aims to derive the effective control information from the system data. The states of the AC MG, $\delta(t)$, $\omega(t)$ and $\dot{\omega}(t)$, are used to train the optimal cost function. Then, the optimal inertia $J$ is derived by using the optimal cost function to control the VSGs. In the following, the design and implementation of the optimal frequency control strategy are presented.
Fig. 2. System frequency dynamics model associated with the optimal adaptive frequency controller.

A. ADP-Based Control Design
The main goal of this work is to obtain an optimal control policy $u(x)$ such that the VSG makes an optimal contribution to the frequency response while also meeting the DC-side energy requirements. To strike this balance, a common cost function $\Upsilon$ is defined as the integral over $[t, \infty)$ of the utility function $r(x, u(x)) = x^T Q x + u^T R u$.
Here, $Q$ and $R$ are positive definite symmetric matrices with proper dimensions. The utility function is thus quadratic in the angle deviation $\delta$, the frequency deviation $\omega$, the RoCoF $\dot{\omega}$, and the control effort $u$. However, the equilibrium point of the angular deviation may not be zero when the system is stabilized, in which case the cost function $\Upsilon$ fails to converge and cannot be used to solve for the optimal controller. To address this problem, a modified cost function $V(x)$ with a discount factor $\lambda > 0$ is defined in (9), in which the utility is weighted by $e^{-\lambda(\tau - t)}$ to guarantee that $V(x)$ converges. According to the Bellman optimality principle, the Hamiltonian function of the problem is obtained as (10), where $\nabla V(x)$ is the partial derivative of the cost function $V(x)$ with respect to $x$, and $V(0) = 0$. The optimal cost function $V^*(x)$ is then given by (11) as the minimum of the cost over the admissible control set, and the optimal control policy $u^*(x)$ satisfies the HJB equation (12). Supposing that (12) holds, the optimal frequency control policy for the benchmark system (7) is given by (13). Substituting (13) into (12) yields the HJB equation (14), expressed in terms of $\nabla V^*(x)$ under the optimal control policy $u^*(x)$.
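For reference, the standard discounted optimal-control relations that are consistent with the definitions in this subsection can be sketched as follows. The sketch assumes that (7) has the control-affine form $\dot{x} = f(x) + g(x)u$, which is suggested by the terms $f$, $g$ and $G = gR^{-1}g^T$ used in Section III-B; the tags follow the equation numbers cited in the text, and the exact typeset form may differ.

```latex
\begin{align}
V(x(t)) &= \int_t^{\infty} e^{-\lambda(\tau - t)}\, r\big(x(\tau), u(\tau)\big)\, d\tau,
  \qquad r(x,u) = x^T Q x + u^T R u, \tag{9}\\
H(x, u, \nabla V) &= r(x,u) - \lambda V(x) + \nabla V(x)^T \big(f(x) + g(x)u\big), \tag{10}\\
0 &= \min_{u} H\big(x, u, \nabla V^*\big), \tag{12}\\
u^*(x) &= -\tfrac{1}{2} R^{-1} g(x)^T \nabla V^*(x), \tag{13}\\
0 &= x^T Q x - \lambda V^*(x) + \nabla V^*(x)^T f(x)
     - \tfrac{1}{4} \nabla V^*(x)^T g(x) R^{-1} g(x)^T \nabla V^*(x). \tag{14}
\end{align}
```

The term $-\lambda V$ in (10) follows from differentiating (9) along a trajectory, which gives $\dot{V} = \lambda V - r$; (13) is the stationarity condition $\partial H / \partial u = 0$, and (14) results from substituting (13) into (12).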

B. Single ESN Implementation of the ADP Method
Because of the nonlinear nature of (14), the HJB equation is almost impossible to solve directly. Thus, the ADP method is adopted to iteratively approximate the optimal cost function $V^*(x)$ and the optimal control policy $u^*(x)$ based on policy evaluation and policy improvement [31]. A polynomial NN is usually adopted to implement the ADP method. However, designing and selecting an appropriate activation function for the polynomial NN is a challenging task, especially when the dimension of the system state is large. In this paper, the system state vector has $3N$ dimensions, which makes it hard to design an activation function that achieves good approximation accuracy with a polynomial NN. The ESN is a promising NN that possesses the universal uniform approximation property. To further reduce the training burden, a single ESN is used to approximate both the cost function and the control policy. Accordingly, a continuous-time leaky-integrator ESN is defined as in (15), where $\xi \in \mathbb{R}^p$ and $\psi(\cdot)$ denote the reservoir neurons and the input activation function, respectively; $W_{in} \in \mathbb{R}^{p \times n}$ and $W \in \mathbb{R}^{p \times p}$ are the input and internal connection weight matrices, which are generated randomly; and $\rho > 0$ and $\alpha > 0$ are constant parameters.
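A continuous-time leaky-integrator reservoir of this kind can be sketched in a few lines. The sketch below assumes the commonly used form $\rho\,\dot{\xi} = -\alpha\,\xi + \psi(W_{in}x + W\xi)$, which may differ from (15) in how $\rho$ and $\alpha$ enter, and it uses $\tanh$ for $\psi$ so that the reservoir stays bounded without any weight scaling. The linear readout on the stacked vector $[x; \xi]$, the sizes $n = 6$ and $p = 20$, the values $\rho = 0.5$ and $\alpha = 0.01$, and the weight range $[0, 1]$ follow the setup reported in Section V; the state sample, step size and helper names are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng):
    """Random matrix with entries drawn uniformly from [0, 1]."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def esn_step(xi, x, W_in, W, rho, alpha, dt, psi=math.tanh):
    """One forward-Euler step of the ASSUMED reservoir dynamics
    rho * d(xi)/dt = -alpha * xi + psi(W_in x + W xi)."""
    drive = [a + b for a, b in zip(matvec(W_in, x), matvec(W, xi))]
    return [s + dt / rho * (-alpha * s + psi(d)) for s, d in zip(xi, drive)]


def costate_estimate(W_out, x, xi):
    """Linear readout applied to the stacked vector [x; xi]."""
    return matvec(W_out, x + xi)


rng = random.Random(0)
n, p = 6, 20                                  # state and reservoir sizes
W_in = rand_matrix(p, n, rng)                 # fixed random input weights
W = rand_matrix(p, p, rng)                    # fixed random internal weights
W_out = rand_matrix(n, n + p, rng)            # only these weights are trained
x = [0.1, -0.2, 0.0, 0.05, 0.0, -0.1]         # hypothetical state sample
xi = [0.0] * p
for _ in range(100):
    xi = esn_step(xi, x, W_in, W, rho=0.5, alpha=0.01, dt=0.01)
grad_V_hat = costate_estimate(W_out, x, xi)   # one value per state dimension
```

Only `W_out` would be adapted online; `W_in` and `W` stay fixed after random generation, which is what keeps the training burden low compared with a hand-designed polynomial basis.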
The ESN is used to approximate the derivative of the cost function (i.e., the costate) $\nabla V(x)$. It is reasonable to assume that there exist ideal output weights $W_{out}$ such that $\nabla V^*(x)$ can be expressed as (16), where $\delta_c$ is the ideal approximation error. Substituting (16) into (14) yields (17), where $G = g R^{-1} g^T$ and $\delta_{HJB}$ is the HJB approximation error given by (18). Substituting (16) into (13), the optimal controller is obtained as (19), where $\delta_a = -\frac{1}{2} R^{-1} g^T \delta_c$. It is assumed that the ESN has the echo state property, which is a condition guaranteeing stability and convergence when the ESN is trained. Besides, assuming that the vector $[x; \xi]$ is bounded by a constant $b_{x\xi}$, the approximation error is bounded as $\|\delta_c\| < b_{\delta c}$, and the HJB approximation error satisfies $\|\delta_{HJB}\| < b_{HJB}$.
However, the ideal weights $W_{out}$ of the designed ESN are unknown, so estimated weights $\hat{W}_{out}$ are used instead. The estimated costate function is then expressed as (20), the estimation error of the ideal weights is defined as (21), and the estimated controller is given by (22). Substituting (20) and (22) into (10), the approximate Hamiltonian function is derived as (23). Assuming that $\hat{V}(x)$ can be obtained from the integral $\int (\nabla \hat{V})^T (f + g\hat{u})\, d\tau$, and by using (17), (21) and (23), the Hamiltonian error $e$ is formulated as (24). Thus, it is reasonable to define an error function $E$ of $e$, given by (25), to regulate the critic network output weights. By minimizing (25) and considering the stability analysis, the tuning law for the critic ESN output weights is designed as (26), where $\eta$ is the primary learning rate of the output weights. Further, using the estimation error of the ideal weights (21), the dynamics of the output weight estimation error are derived as (27). Therefore, the output weights of the ESN can be obtained by using the updating law (26). The optimal frequency control law can then be obtained from (22), which stabilizes the frequency of the benchmark system (7) under load disturbances.
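The idea of tuning critic weights by driving the Hamiltonian error toward zero can be illustrated on a scalar toy problem. The sketch below is not the tuning law (26): it assumes a hypothetical linear plant $\dot{x} = ax + bu$, a one-parameter critic $\hat{V} = \hat{w}x^2$, the discounted Hamiltonian $r - \lambda V + \nabla V (ax + bu)$, and a plain gradient step on $E = \frac{1}{2}e^2$ without the normalization or stabilizing terms a practical law would include. For this toy case the discounted HJB equation has a closed-form solution, so convergence can be checked directly.

```python
import math

# Hypothetical scalar plant, weights and discount factor (not from the paper).
a, b, q, r, lam = -1.0, 1.0, 1.0, 1.0, 0.1


def hamiltonian_residual(w, x):
    """e = r(x, u) - lam*V + dV/dx * (a*x + b*u) with V = w*x^2, u = -(b/r)*w*x."""
    u = -(b / r) * w * x
    return q * x**2 + r * u**2 - lam * w * x**2 + 2.0 * w * x * (a * x + b * u)


def residual_grad(w, x):
    """de/dw, from the closed form e = x^2 * (q + (2a - lam)*w - (b^2/r)*w^2)."""
    return x**2 * ((2.0 * a - lam) - 2.0 * (b**2 / r) * w)


w_hat, eta = 0.0, 0.01
for k in range(2000):
    x = 1.0 + 0.5 * math.sin(0.3 * k)           # varying samples stand in for PE
    e = hamiltonian_residual(w_hat, x)
    w_hat -= eta * e * residual_grad(w_hat, x)  # gradient step on E = 0.5 * e^2

# Positive root of (b^2/r)*w^2 - (2a - lam)*w - q = 0, the scalar discounted HJB.
c = 2.0 * a - lam
w_star = (c + math.sqrt(c**2 + 4.0 * q * b**2 / r)) / (2.0 * b**2 / r)
print(w_hat, w_star)
```

The learned weight settles at the positive root of the scalar HJB equation, and the corresponding feedback $\hat{u} = -(b/r)\hat{w}x$ plays the role of the estimated controller (22).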

IV. STABILITY ANALYSIS OF THE PROPOSED FREQUENCY SUPPORT SCHEME
The stability of the whole closed-loop AC MG system is established by constructing a composite Lyapunov function of the MG system states and the ESN output weights.
Theorem 1: Consider the complete nonlinear model of the AC MG (7) with the HJB equation (14), the controller designed by (22), and the adaptive tuning law of the ESN output weights given by (26). Then, the closed-loop AC MG system state $x$ and the output weight estimation error $\tilde{W}_{out}$ are uniformly ultimately bounded (UUB).
Proof: A composite Lyapunov function $L$ is constructed as in (28), where $V^*(x)$ is defined in (11), is constructed from the AC MG system states, and satisfies $V^*(x) \ge 0$, $V^*(0) = 0$. The derivative of (28) is given by (29), whose first term can be derived as (30). According to the HJB approximation error (17), we have (31). Substituting (31) into (30) gives (32). The specific expansion of the trace term $\mathrm{tr}(\dot{\tilde{W}}_{out}^T \lambda^{-1} \tilde{W}_{out})$ in (33) is given in the Appendix. Combining (32) and (33), (29) can be rewritten accordingly. According to the frequency dynamics (7), it is reasonable to assume that the corresponding boundedness conditions hold, where $b_f$, $b_g$, $b_{x\xi}$ and $b_{HJB}$ are positive constants, which leads to (39). Choosing the parameters of the AC MG model appropriately so that the required positivity condition is met, (37) can be further bounded. When the resulting condition holds, we have $\dot{L} < 0$; that is, if $L$ exceeds a certain bound, then $\dot{L} < 0$. According to the standard Lyapunov extension theorem, the MG system states and the ESN output weights are all UUB. Therefore, the closed-loop control of the studied AC MG is guaranteed to be stable under the designed adaptive controller (22) and the ESN output weight tuning law (26).

V. SIMULATION RESULTS
Simulations based on MATLAB/Simulink are implemented to illustrate the proposed optimal frequency control method. The system parameters are detailed in Table I, and the test system follows the structure in Fig. 1.
At the beginning, the AC MG is in the steady state, i.e., $P_l = 0$. At $t = 2$ s, the small load changes $P_{l,1} = 1$ p.u. and $P_{l,2} = 4$ p.u. are applied. Moreover, the large load changes $P_{l,1} = -10$ p.u. and $P_{l,2} = -5$ p.u. are also considered. To better show the control performance, six virtual inertia regulation methods are applied: (a) the no-feedback control with constant inertia ($J_0 = 0.1$); (b) the non-adaptive control with large constant inertia ($J_0 = 2$); (c) the LQR-based control; (d) the adaptive control proposed in [11] ($J_{max} = 1.2$, $J_{min} = 0.5$, $k_g = 1\mathrm{e}7$); (e) the adaptive control proposed in [12] ($J_0 = 0.5$, $k = 4$); and (f) the ADP-based adaptive inertia control proposed in this paper.

A. Under Small Load Changes
The LQR control is selected as $u = -kx$ with the same quadratic objective as (9), from which the control feedback gain $k$ is solved. For the ADP-based control, the predefined parameters are $Q = I_{6\times 6}$, $R = I_{2\times 2}$, $\lambda = 0.01$ and $k = 2$, where $I$ is the identity matrix. The ESN parameters are set as $\alpha = 0.01$ and $\rho = 0.5$. The input function $\psi$ is selected as the identity function. The size of the reservoir is selected as $p = 20$, so the output weights form a matrix in $\mathbb{R}^{6\times 26}$. The input weights, internal weights and initial output weights are generated randomly within the range $[0, 1]$. During the online training, a probing signal satisfying the persistence of excitation (PE) condition is added to the control input to better capture the system features. After training, the weights of the ESN are obtained, of which only ten dimensions are presented in Fig. 3 for clarity. Fig. 4 presents the evolution of the system states during the online training. Then, by using the converged ESN weights and (22), the optimal frequency controller can be obtained.
1) Angle, Frequency, and Inertia Tests: Fig. 5 depicts the waveforms of the output angular deviation/difference, operating frequency, RoCoF and emulated inertia of the two DGs. In the first case, with the no-feedback control, both the returning time and the deviating time are relatively short, which means that the system has a very fast response. However, this comes at the cost of a large frequency deviation and RoCoF. In the second case, with the non-adaptive control, the frequency nadir is arrested at a higher level and the RoCoF is lower than with the no-feedback control, but the returning and deviating times are relatively long and more oscillation appears.
In the third case, with the LQR-based control, the control effects are similar to those of the proposed ADP-based control under small load changes. Under large load changes, however, its performance becomes worse than that of the ADP-based control, because it is a linear optimal control method. As observed, the system has a shorter returning time and a longer deviating time than with the non-adaptive control, which can also be understood from the curves of the emulated inertia (see Fig. 5(d)). At the beginning of the disturbance, relatively large inertia is provided to restrain the frequency drop, while during the "rebound period", relatively small inertia is provided to speed up the return to steady state.
Next, the proposed method is compared with the existing adaptive methods recently published in [11] and [12]. The results are shown in Fig. 6. The frequency nadir and RoCoF obtained with the methods in [11] and [12] are similar to those of the ADP-based adaptive virtual inertia control proposed in this paper, whereas the emulated inertia varies greatly. For the method in [11], the virtual inertia is adjusted to the maximum when the frequency nadir is reached, reduced until the rebound period, and then recovered to the initial value until the stable operating point is reached, which increases the risk of system interference. For the method in [12], because the inertia cannot be negative, an excessively large value of the compensation coefficient $k$ cannot be used, which hinders the inertia adjustment in other cases. For the strategy proposed in this paper, when the disturbance occurs, the virtual inertia is adjusted to the maximum, then gradually decreases, and finally returns to a stable value, which improves the frequency, RoCoF and rebound speed.
2) Active Power and Energy Utilization Tests: Figs. 7 and 8 provide the output active power and the energy utilization of the two DGs. As the two DGs have the same control parameters, the active power responses of the six methods are similar (see Figs. 7(a) and 8(a)), and accurate active power sharing is always achieved. Furthermore, it is worth noting that the active power under the non-adaptive control exhibits some oscillations, as depicted in Fig. 7(a). With the proposed ADP-based adaptive inertia control, the power oscillation is reduced and the output power of the two DGs is smoothed, which indicates that the controller provides power oscillation damping. In addition, the six control methods can be compared in terms of energy utilization, with the respective energy term calculated as $E_J = \int J\omega\, dt$. As observed in Figs. 7(b) and 8(b), the energy consumption of the non-adaptive method is the largest, followed by the method in [11], the method in [12], the LQR-based control and the ADP-based control, which further reflects the role of the developed control strategy in saving DC-side energy.

B. Under Large Load Changes
The case of the AC MG under small load changes has been studied in Section V-A. However, fluctuations caused by large load integrations also happen frequently. Here, the effects of such large disturbances are presented. Specifically, the gain $k$ for the LQR-based control is solved by following the same procedure as before. For the ADP-based control, the weights shown in Fig. 9 converge to the optimal values. During the online training, with an effective PE condition, the system states converge as needed, as shown in Fig. 10. Then, by using the converged ESN weights and (22), the optimal controller can be obtained.
From Figs. 11 and 12, the ADP-based optimal frequency control scheme performs better than the no-feedback control, the non-adaptive control, the LQR-based control, and the methods in [11] and [12] in this large-disturbance scenario. Specifically, Fig. 11 shows that both the no-feedback control and the non-adaptive control still regulate the frequency poorly under large load changes. Because the LQR-based control method is designed for linear systems, large load disturbances may lead to a smaller improvement in frequency regulation than with the ADP-based control method. The methods in [11] and [12] can work, but in the former the RoCoF even exceeds the acceptable operating range. Meanwhile, the corresponding waveforms of the real output power and the energy utilization under large load changes are provided in Figs. 13 and 14, respectively. Similar to the results in Section V-A, the proposed ADP-based control provides strong power oscillation damping and simultaneously saves more DC-side energy. Large load disturbances may have a great impact on the stability and security of the system, so the adaptability and robustness of the controller are particularly important. The ADP-based adaptive inertia controller can cope with various disturbances and is suitable for real-world application.

VI. CONCLUSION
This paper proposed an optimal frequency regulation method for the AC MG-connected VSG, together with a frequency dynamics model of the latter. An ADP-based optimal feedback gain was computed to adaptively adjust the emulated inertia according to the predefined cost function. The angular deviation, frequency deviation, RoCoF, and a discount factor were considered in the design of the cost function, which preserves a trade-off between the critical frequency limits and the required control energy. The performance of the proposed method was examined under two different load changes, namely small load changes and large load changes. Moreover, the proposed control design was compared to the no-feedback control, the non-adaptive control, the LQR-based control, the method in [11], and the method in [12] in terms of the angular deviation/difference, operating frequency, RoCoF, emulated inertia, active power, and, more importantly, energy cost. The advantages of the proposed control include:
1) Lower frequency deviation and RoCoF, matching the advantage of the non-adaptive control.
2) Smoother output power, matching the advantage of the LQR-based control, the method in [11], and the method in [12].
3) Lower energy cost, matching the advantage of the no-feedback control.
On the whole, the proposed method supports the frequency dynamics and is suitable for real-world application.

APPENDIX
The matrix of equation (33).