Dynamic Reward in DQN for Autonomous Navigation of UAVs using Object Detection

Adam Lagoda, Seyedeh Fatemeh Mahdavi Sharifi, Thomas Aagaard Pedersen, Daniel Ortiz Arroyo, Petar Durdevic

Research output: Contribution to book/anthology/report/conference proceeding; Article in proceeding; Research; peer-review



This paper discusses the implementation of a Deep
Reinforcement Learning policy, based on DQN, that optimizes
the navigation of a UAV to the front of wind turbine blades.
The UAV was trained in simulation using Unreal Engine 4.27
coupled with AirSim. The UAV's action space was discretized
into six actions. A YOLOv5 network trained on images of
simulated wind turbines was used for detection and tracking,
providing the state information on which the DQN policy was
trained. In addition, a dynamic reward was implemented that
combines both navigation and inspection objectives in the final
evaluation of actions. Our tests showed that after 7500 time-steps
the exploration rate approached 0, the mean episode length
increased from 10 to 30, and the mean reward rose from around
-60 and stabilized at 26. These results suggest that the proposed
method is a promising approach to optimizing the autonomous
inspection of wind turbines with …
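The abstract combines a DQN over a small discrete action set, an exploration rate that decays to near 0 by about 7500 time-steps, a state derived from YOLOv5 detections, and a dynamic reward mixing navigation and inspection objectives. The sketch below illustrates how those pieces typically fit together in plain Python. It is not the authors' implementation: the six action labels, the linear ε schedule, the bounding-box state features, the `target_area` threshold, and the weighting inside `dynamic_reward` are all hypothetical assumptions, and `q_values` stands in for the output of the Q-network, which is omitted.

```python
import random
from dataclasses import dataclass

# Hypothetical labels: the abstract says only that there are six discrete actions.
ACTIONS = ["forward", "backward", "left", "right", "up", "down"]


def epsilon_at(step: int, start: float = 1.0, end: float = 0.01,
               decay_steps: int = 7500) -> float:
    """Assumed linear exploration schedule: anneal from `start` to `end`
    over `decay_steps` steps, then hold at `end`."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


def select_action(q_values: list[float], epsilon: float,
                  rng: random.Random) -> int:
    """Epsilon-greedy choice over the discrete action indices."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit: argmax Q


@dataclass
class Detection:
    """One detector bounding box in normalized [0, 1] image coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float


def detection_to_state(det: Detection) -> tuple[float, float, float]:
    """Hypothetical state: box-centre offset from the image centre, plus box area."""
    cx = (det.x1 + det.x2) / 2
    cy = (det.y1 + det.y2) / 2
    area = (det.x2 - det.x1) * (det.y2 - det.y1)
    return cx - 0.5, cy - 0.5, area


def dynamic_reward(state: tuple[float, float, float],
                   target_area: float = 0.3) -> float:
    """Hypothetical dynamic reward: the weight shifts from a navigation term
    (grow the box toward target_area) to an inspection term (centre the box)
    as the blade fills more of the frame."""
    dx, dy, area = state
    w = min(area / target_area, 1.0)       # 0 when far away, 1 when close enough
    navigation = -abs(target_area - area)  # penalize distance from target box size
    inspection = -(abs(dx) + abs(dy))      # penalize off-centre framing
    return (1.0 - w) * navigation + w * inspection


def td_target(reward: float, next_q: list[float], done: bool,
              gamma: float = 0.99) -> float:
    """Standard DQN regression target: r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q)


if __name__ == "__main__":
    rng = random.Random(0)
    state = detection_to_state(Detection(0.4, 0.3, 0.8, 0.9))
    q_values = [0.1, 0.5, 0.2, 0.0, -0.3, 0.05]  # stand-in for network output
    for step in (0, 3750, 7500):
        a = select_action(q_values, epsilon_at(step), rng)
        print(step, round(epsilon_at(step), 3), ACTIONS[a],
              round(dynamic_reward(state), 3))
```

The state-dependent weight `w` is one simple way to make a reward "dynamic": while the blade is small in the frame the agent is rewarded mainly for approaching, and once it is close the reward switches toward keeping the blade centred for inspection. In a full DQN loop, `td_target` would provide the regression target for Q(s, a) on batches sampled from a replay buffer.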
Original language: English
Title of host publication: International Conference on Control, Decision and Information Technologies
Publication date: Oct 2023
ISBN (Print): 979-8-3503-1141-9
ISBN (Electronic): 979-8-3503-1140-2
Publication status: Published - Oct 2023

