Control optimization of energy flexibility using Reinforcement Learning algorithms (Recurrent Neural Network + Advantage Actor-Critic + Long Short-Term Memory) for a nearly zero-energy office building

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

The energy flexibility that buildings can provide by controlling their heating and cooling systems is important for balancing the grid. In this paper, a model-free Advantage Actor-Critic (A2C) Reinforcement Learning (RL) control strategy is developed to regulate heating and cooling systems using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN). The controller is trained to optimize a combined performance objective of cost, energy flexibility and thermal comfort. The office building is modelled in EnergyPlus. Through the Building Control Virtual Test Bed (BCVTB), the set-point schedule output by the RL controller at each time step interacts with the building model. The controller takes information about the building conditions, weather data and the price signal from the grid as its state (including the operative temperature, ventilation rate and internal heat gains in the different thermal zones of the building, as well as solar radiation and outdoor air temperature) to maximize a reward that combines cost, energy flexibility and thermal comfort.
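
The abstract describes the architecture only at a high level, so the following is a minimal sketch of an LSTM-based actor-critic, assuming a PyTorch implementation with a discrete set of candidate set-points; the layer sizes, the number of actions, the LSTMActorCritic name and the reward weights are illustrative assumptions, not values from the paper.

    import torch
    import torch.nn as nn

    class LSTMActorCritic(nn.Module):
        def __init__(self, n_state: int, n_actions: int, hidden: int = 64):
            super().__init__()
            # The LSTM encodes the history of building, weather and price observations.
            self.lstm = nn.LSTM(n_state, hidden, batch_first=True)
            self.actor = nn.Linear(hidden, n_actions)  # logits over candidate set-points
            self.critic = nn.Linear(hidden, 1)         # state-value estimate

        def forward(self, obs_seq, hx=None):
            out, hx = self.lstm(obs_seq, hx)
            last = out[:, -1, :]                       # hidden state at the latest time step
            return self.actor(last), self.critic(last), hx

    def reward(cost, flexibility, discomfort, w=(1.0, 1.0, 1.0)):
        # Hypothetical weighted combination: the paper states that the reward
        # combines cost, energy flexibility and thermal comfort, but not its form.
        return -w[0] * cost + w[1] * flexibility - w[2] * discomfort

Conditioning both the policy head and the value head on the LSTM hidden state lets the controller exploit temporal patterns in the weather and price signals that a purely feed-forward network could not represent.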
More than 1500 cases with rule-based control strategies that take weather predictions into account are also modelled and simulated. A comparison between the RL control and the rule-based control cases shows that the RL controller outperforms all of them after around 200 training episodes (200 yearly operations in the BCVTB platform). This demonstrates that RL algorithms can readily reach an optimized trade-off between cost, energy flexibility and thermal comfort that is very difficult to achieve through analysis and optimization based purely on knowledge and experience.
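
To make the training setup concrete, here is a sketch of the episodic loop implied by the abstract, reusing the LSTMActorCritic sketch above. BCVTBEnv is a hypothetical stand-in for the BCVTB socket exchange with the EnergyPlus model (its reset/step interface follows the common Gym convention, not an interface documented in the paper), and the one-step advantage update is a textbook A2C formulation rather than the authors' exact procedure.

    import torch

    class BCVTBEnv:
        # Hypothetical stand-in for the BCVTB/EnergyPlus co-simulation: it emits
        # random observations and a zero reward so the loop runs end to end.
        def __init__(self, n_state=12, steps_per_year=8760):
            self.n_state, self.horizon, self.t = n_state, steps_per_year, 0
        def reset(self):
            self.t = 0
            return torch.randn(self.n_state)
        def step(self, action):
            self.t += 1
            return torch.randn(self.n_state), 0.0, self.t >= self.horizon

    env = BCVTBEnv()
    net = LSTMActorCritic(n_state=env.n_state, n_actions=5)
    opt = torch.optim.Adam(net.parameters(), lr=1e-4)
    gamma = 0.99

    for episode in range(200):                  # ~200 yearly runs, as reported
        obs, hx, done = env.reset(), None, False
        while not done:
            logits, value, hx = net(obs.view(1, 1, -1), hx)
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()              # pick one candidate set-point
            obs, r, done = env.step(action.item())
            with torch.no_grad():               # one-step bootstrap target
                _, next_value, _ = net(obs.view(1, 1, -1), hx)
                target = r + gamma * next_value * (not done)
            advantage = target - value
            loss = (-dist.log_prob(action) * advantage.detach()
                    + advantage.pow(2)).mean()  # policy loss + value loss
            opt.zero_grad()
            loss.backward()
            opt.step()
            hx = tuple(h.detach() for h in hx)  # truncate backprop between steps

Running the full 8760-step year for 200 episodes matches the training budget reported above; in practice the environment step would block on the BCVTB socket until EnergyPlus advances one simulation time step.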
Original language: English
Journal: Applied Energy
ISSN: 0306-2619
Publication status: In preparation - 2019

Keywords

  • energy flexibility
  • demand response
  • thermal activation
  • recurrent neural network (RNN)
  • advantage actor-critic (A2C)
  • long short-term memory (LSTM)

Cite this

@article{d2752ef07c254ff88ad49981dfc752d2,
title = "Control optimization of energy flexibility using Reinforcement Learning algorithms (Recurrent Neural Network + Advantage Actor-Critic + Long Short-Term Memory) for a nearly zero-energy office building",
abstract = "The energy flexibility that buildings can provide by controlling their heating and cooling systems is important for balancing the grid. In this paper, a model-free Advantage Actor-Critic (A2C) Reinforcement Learning (RL) control strategy is developed to regulate heating and cooling systems using a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN). The controller is trained to optimize a combined performance objective of cost, energy flexibility and thermal comfort. The office building is modelled in EnergyPlus. Through the Building Control Virtual Test Bed (BCVTB), the set-point schedule output by the RL controller at each time step interacts with the building model. The controller takes information about the building conditions, weather data and the price signal from the grid as its state (including the operative temperature, ventilation rate and internal heat gains in the different thermal zones of the building, as well as solar radiation and outdoor air temperature) to maximize a reward that combines cost, energy flexibility and thermal comfort. More than 1500 cases with rule-based control strategies that take weather predictions into account are also modelled and simulated. A comparison between the RL control and the rule-based control cases shows that the RL controller outperforms all of them after around 200 training episodes (200 yearly operations in the BCVTB platform). This demonstrates that RL algorithms can readily reach an optimized trade-off between cost, energy flexibility and thermal comfort that is very difficult to achieve through analysis and optimization based purely on knowledge and experience.",
keywords = "energy flexibility, demand response, thermal activation, recurrent neural network (RNN), advantage actor-critic (A2C), long short-term memory (LSTM)",
author = "Mingzhe Liu and Sandijs Vasilevskis and Per Heiselberg and Anna Marszal-Pomianowska and Hicham Johra",
year = "2019",
language = "English",
journal = "Applied Energy",
issn = "0306-2619",
publisher = "Pergamon Press",

}
