Dynamic Reward in DQN for Autonomous Navigation of UAVs using Object Detection

Adam Lagoda, Seyedeh Fatemeh Mahdavi Sharifi, Thomas Aagaard Pedersen, Daniel Ortiz Arroyo, Shi Chang, Petar Durdevic

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review


Abstract

This paper discusses the implementation of a Deep
Reinforcement Learning policy, based on DQN, that optimizes
the navigation of a UAV to the front of wind turbine blades.
The UAV was trained in simulation using Unreal Engine 4.27
coupled with AirSim. The UAV's action space was discretized
into six actions. A YOLOv5 network trained on images of
simulated wind turbines was used for detection and tracking,
providing the state information on which the DQN policy was
trained. In addition, a dynamic reward was implemented that
combines navigation and inspection objectives in the final
evaluation of actions. Our tests showed that after 7500
time-steps the exploration rate reached near 0, the mean
episode length increased from 10 to 30, and the mean reward
increased from around -60 and stabilized at 26. These results
suggest that the proposed method is a promising approach to
optimizing the autonomous inspection of wind turbines with
UAVs.
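
The abstract names two mechanisms but gives neither the exploration schedule nor the reward formula. The sketch below illustrates them under stated assumptions: epsilon-greedy selection over a six-action discrete space with epsilon annealed toward 0 over 7500 time-steps, and a hypothetical dynamic reward that shifts weight from a navigation term to an inspection term as the detected blade fills more of the camera frame. The linear schedule, the EPS_START and EPS_END values, and every input and weight of the hypothetical `dynamic_reward` function are illustrative assumptions, not the authors' method.

```python
import random

# Hypothetical constants: only NUM_ACTIONS (6) and DECAY_STEPS (7500) come from the abstract.
NUM_ACTIONS = 6      # six discrete UAV actions
EPS_START = 1.0      # assumed initial exploration rate
EPS_END = 0.01       # assumed final rate ("near 0")
DECAY_STEPS = 7500   # time-steps after which exploration is near 0


def epsilon_at(step: int) -> float:
    """Linearly anneal epsilon from EPS_START to EPS_END over DECAY_STEPS (assumed schedule)."""
    frac = min(max(step, 0) / DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)


def select_action(q_values, step, rng=random):
    """Epsilon-greedy choice: a random action with probability epsilon, else argmax Q."""
    if len(q_values) != NUM_ACTIONS:
        raise ValueError(f"expected {NUM_ACTIONS} Q-values, got {len(q_values)}")
    if rng.random() < epsilon_at(step):
        return rng.randrange(NUM_ACTIONS)
    return max(range(NUM_ACTIONS), key=lambda a: q_values[a])


def dynamic_reward(progress, detected, bbox_area_frac, center_offset, miss_penalty=-1.0):
    """Hypothetical dynamic reward blending navigation and inspection terms.

    progress:       decrease in distance to the blade this step (assumed input)
    detected:       True if the detector reports a blade in the frame
    bbox_area_frac: box area / image area, clipped to [0, 1]
    center_offset:  normalised distance of the box centre from the image centre, in [0, 1]
    """
    if not detected:
        return miss_penalty  # assumed penalty when the blade is out of view
    # The weight moves from navigation to inspection as the blade fills the frame.
    alpha = min(max(bbox_area_frac, 0.0), 1.0)
    nav_term = progress
    insp_term = 1.0 - center_offset  # higher when the blade is centred
    return (1.0 - alpha) * nav_term + alpha * insp_term
```

In this sketch, far from the turbine (small bounding box) the reward is dominated by progress toward the blade, while close up (a box covering most of the image) it mainly rewards keeping the blade centred for inspection.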

Original language: English
Title of host publication: 9th 2023 International Conference on Control, Decision and Information Technologies, CoDIT 2023
Number of pages: 6
Publisher: IEEE (Institute of Electrical and Electronics Engineers)
Publication date: Oct 2023
Pages: 2372-2377
Article number: 10284087
ISBN (Print): 979-8-3503-1141-9
ISBN (Electronic): 979-8-3503-1140-2
DOIs
Publication status: Published - Oct 2023
Event: 9th International Conference on Control, Decision and Information Technologies (CoDIT) - Rome, Italy
Duration: 3 Jul 2023 to 6 Jul 2023
https://codit2023.com/

Conference

Conference: 9th International Conference on Control, Decision and Information Technologies (CoDIT)
Country/Territory: Italy
City: Rome
Period: 03/07/2023 to 06/07/2023
Series: International Conference on Control, Decision and Information Technologies (CoDIT)
ISSN: 2576-3555

Keywords

  • Condition Monitoring
  • Deep Q-network
  • Deep Reinforcement Learning
  • Dynamic Reward
  • Inspection
  • Path-planning
  • Simulation
  • UAV
