Dynamic Reward in DQN for Autonomous Navigation of UAVs using Object Detection

Adam Lagoda; Seyedeh Fatemeh Mahdavi Sharifi; Thomas Aagaard Pedersen; Daniel Ortiz Arroyo; Shi Chang; Petar Durdevic

doi:10.1109/CoDIT58514.2023.10284087

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

Abstract

This paper presents a Deep Reinforcement Learning policy, based on DQN,
that navigates a UAV to the front of wind turbine blades. The UAV was
trained in simulation using Unreal Engine 4.27 coupled with AirSim. The
action space of the UAV was discretized into 6 actions. A YOLOv5 network
trained on images of simulated wind turbines was used for detection and
tracking, providing the state information on which the DQN policy was
trained. In addition, a dynamic reward was implemented that combines
navigation and inspection objectives in the evaluation of actions. Our
tests showed that after 7500 time-steps the exploration rate approached 0,
the mean episode length increased from 10 to 30, and the mean reward
increased from around -60 and stabilized at 26. These results suggest that
the proposed method is a promising solution for optimizing the autonomous
inspection of wind turbines with UAVs.
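
The abstract names three ingredients without giving their details: a 6-action discrete action space, an exploration rate that is near 0 by 7500 time-steps, and a dynamic reward mixing navigation and inspection objectives. The sketch below is one possible reading, not the authors' implementation. The action labels, the linear decay schedule with its start and end values, the `Detection` fields, and the reward formula with its `target_area` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    # Hypothetical labels: the abstract only states that there are 6 actions.
    FORWARD = 0
    BACKWARD = 1
    LEFT = 2
    RIGHT = 3
    UP = 4
    DOWN = 5


def epsilon(step: int, start: float = 1.0, end: float = 0.01,
            decay_steps: int = 7500) -> float:
    """Assumed linear epsilon-greedy schedule: start -> end over decay_steps."""
    frac = min(max(step, 0) / decay_steps, 1.0)
    return start + frac * (end - start)


@dataclass
class Detection:
    # Assumed summary of a detector bounding box, normalised to the image.
    cx: float    # box centre x in [0, 1]
    cy: float    # box centre y in [0, 1]
    area: float  # box area as a fraction of the image, in [0, 1]


def dynamic_reward(det: Optional[Detection], target_area: float = 0.4) -> float:
    """Hypothetical dynamic reward: the weight moves from a navigation term
    (get closer) to an inspection term (keep the target centred) as the
    detected box grows towards target_area."""
    if det is None:
        return -1.0  # assumed penalty when the target is not detected
    closeness = min(det.area / target_area, 1.0)  # navigation term, [0, 1]
    centring = 1.0 - 2.0 * max(abs(det.cx - 0.5), abs(det.cy - 0.5))  # [0, 1]
    w = closeness  # the weight itself changes with the state
    return (1.0 - w) * closeness + w * centring
```

With these assumed defaults, `epsilon(7500)` evaluates to about 0.01, and a centred detection at the target size yields the maximum reward of 1.0. The reward scale in the paper (from around -60 to 26 per episode) is different and is not reproduced here.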

Original language	English
Title	9th 2023 International Conference on Control, Decision and Information Technologies, CoDIT 2023
Number of pages	6
Publisher	IEEE
Publication date	Oct 2023
Pages	2372-2377
Article number	10284087
ISBN (Print)	979-8-3503-1141-9
ISBN (Electronic)	979-8-3503-1140-2
DOI	https://doi.org/10.1109/CoDIT58514.2023.10284087
Status	Published - Oct 2023
Event	9th International Conference on Control, Decision and Information Technologies (CoDIT) - Rome, Italy. Duration: 3 Jul 2023 → 6 Jul 2023 https://codit2023.com/

Conference

Conference	9th International Conference on Control, Decision and Information Technologies (CoDIT)
Country/Territory	Italy
City	Rome
Period	03/07/2023 → 06/07/2023
Internet address	https://codit2023.com/

Name	International Conference on Control, Decision and Information Technologies (CoDIT)
ISSN	2576-3555

Citation formats

Lagoda, A., Fatemeh Mahdavi Sharifi, S., Pedersen, T. A., Ortiz Arroyo, D., Chang, S., & Durdevic, P. (2023). Dynamic Reward in DQN for Autonomous Navigation of UAVs using Object Detection. In 9th 2023 International Conference on Control, Decision and Information Technologies, CoDIT 2023 (pp. 2372-2377). Article 10284087 IEEE. https://doi.org/10.1109/CoDIT58514.2023.10284087

Lagoda, Adam ; Fatemeh Mahdavi Sharifi, Seyedeh ; Pedersen, Thomas Aagaard et al. / Dynamic Reward in DQN for Autonomous Navigation of UAVs using Object Detection. 9th 2023 International Conference on Control, Decision and Information Technologies, CoDIT 2023. IEEE, 2023. pp. 2372-2377 (International Conference on Control, Decision and Information Technologies (CoDIT)).

@inproceedings{894fd1ef06ad4b0480138b3c506a71bb,

title = "Dynamic Reward in DQN for Autonomous Navigation of UAVs using Object Detection",

abstract = "This paper presents a Deep Reinforcement Learning policy, based on DQN, that navigates a UAV to the front of wind turbine blades. The UAV was trained in simulation using Unreal Engine 4.27 coupled with AirSim. The action space of the UAV was discretized into 6 actions. A YOLOv5 network trained on images of simulated wind turbines was used for detection and tracking, providing the state information on which the DQN policy was trained. In addition, a dynamic reward was implemented that combines navigation and inspection objectives in the evaluation of actions. Our tests showed that after 7500 time-steps the exploration rate approached 0, the mean episode length increased from 10 to 30, and the mean reward increased from around -60 and stabilized at 26. These results suggest that the proposed method is a promising solution for optimizing the autonomous inspection of wind turbines with UAVs.",

keywords = "Condition Monitoring, Deep Q-network, Deep Reinforcement Learning, Dynamic Reward, Inspection, Path-planning, Simulation, UAV",

author = "Adam Lagoda and {Fatemeh Mahdavi Sharifi}, Seyedeh and Pedersen, {Thomas Aagaard} and {Ortiz Arroyo}, Daniel and Shi Chang and Petar Durdevic",

year = "2023",

month = oct,

doi = "10.1109/CoDIT58514.2023.10284087",

language = "English",

isbn = "979-8-3503-1141-9",

series = "International Conference on Control, Decision and Information Technologies (CoDIT)",

pages = "2372--2377",

booktitle = "9th 2023 International Conference on Control, Decision and Information Technologies, CoDIT 2023",

publisher = "IEEE",

address = "United States",

note = "9th International Conference on Control, Decision and Information Technologies (CoDIT) , CoDIT 2023 ; Conference date: 03-07-2023 Through 06-07-2023",

url = "https://codit2023.com/",

}

Lagoda, A, Fatemeh Mahdavi Sharifi, S, Pedersen, TA, Ortiz Arroyo, D, Chang, S & Durdevic, P 2023, Dynamic Reward in DQN for Autonomous Navigation of UAVs using Object Detection. in 9th 2023 International Conference on Control, Decision and Information Technologies, CoDIT 2023., 10284087, IEEE, International Conference on Control, Decision and Information Technologies (CoDIT), pp. 2372-2377, 9th International Conference on Control, Decision and Information Technologies (CoDIT), Rome, Italy, 03/07/2023. https://doi.org/10.1109/CoDIT58514.2023.10284087

Dynamic Reward in DQN for Autonomous Navigation of UAVs using Object Detection. / Lagoda, Adam; Fatemeh Mahdavi Sharifi, Seyedeh; Pedersen, Thomas Aagaard et al.
9th 2023 International Conference on Control, Decision and Information Technologies, CoDIT 2023. IEEE, 2023. pp. 2372-2377 10284087 (International Conference on Control, Decision and Information Technologies (CoDIT)).

TY - GEN

T1 - Dynamic Reward in DQN for Autonomous Navigation of UAVs using Object Detection

AU - Lagoda, Adam

AU - Fatemeh Mahdavi Sharifi, Seyedeh

AU - Pedersen, Thomas Aagaard

AU - Ortiz Arroyo, Daniel

AU - Chang, Shi

AU - Durdevic, Petar

PY - 2023/10

Y1 - 2023/10

N2 - This paper presents a Deep Reinforcement Learning policy, based on DQN, that navigates a UAV to the front of wind turbine blades. The UAV was trained in simulation using Unreal Engine 4.27 coupled with AirSim. The action space of the UAV was discretized into 6 actions. A YOLOv5 network trained on images of simulated wind turbines was used for detection and tracking, providing the state information on which the DQN policy was trained. In addition, a dynamic reward was implemented that combines navigation and inspection objectives in the evaluation of actions. Our tests showed that after 7500 time-steps the exploration rate approached 0, the mean episode length increased from 10 to 30, and the mean reward increased from around -60 and stabilized at 26. These results suggest that the proposed method is a promising solution for optimizing the autonomous inspection of wind turbines with UAVs.

AB - This paper presents a Deep Reinforcement Learning policy, based on DQN, that navigates a UAV to the front of wind turbine blades. The UAV was trained in simulation using Unreal Engine 4.27 coupled with AirSim. The action space of the UAV was discretized into 6 actions. A YOLOv5 network trained on images of simulated wind turbines was used for detection and tracking, providing the state information on which the DQN policy was trained. In addition, a dynamic reward was implemented that combines navigation and inspection objectives in the evaluation of actions. Our tests showed that after 7500 time-steps the exploration rate approached 0, the mean episode length increased from 10 to 30, and the mean reward increased from around -60 and stabilized at 26. These results suggest that the proposed method is a promising solution for optimizing the autonomous inspection of wind turbines with UAVs.

KW - Condition Monitoring

KW - Deep Q-network

KW - Deep Reinforcement Learning

KW - Dynamic Reward

KW - Inspection

KW - Path-planning

KW - Simulation

KW - UAV

UR - http://www.scopus.com/inward/record.url?scp=85177474986&partnerID=8YFLogxK

U2 - 10.1109/CoDIT58514.2023.10284087

DO - 10.1109/CoDIT58514.2023.10284087

M3 - Article in proceeding

SN - 979-8-3503-1141-9

T3 - International Conference on Control, Decision and Information Technologies (CoDIT)

SP - 2372

EP - 2377

BT - 9th 2023 International Conference on Control, Decision and Information Technologies, CoDIT 2023

PB - IEEE

T2 - 9th International Conference on Control, Decision and Information Technologies (CoDIT)

Y2 - 3 July 2023 through 6 July 2023

ER -

Lagoda A, Fatemeh Mahdavi Sharifi S, Pedersen TA, Ortiz Arroyo D, Chang S, Durdevic P. Dynamic Reward in DQN for Autonomous Navigation of UAVs using Object Detection. In 9th 2023 International Conference on Control, Decision and Information Technologies, CoDIT 2023. IEEE. 2023. pp. 2372-2377. 10284087. (International Conference on Control, Decision and Information Technologies (CoDIT)). doi: 10.1109/CoDIT58514.2023.10284087
