Reducing Annotation Efforts in Electricity Theft Detection through Optimal Sample Selection

Wenlong Liao; Birgitte Bak-Jensen; Jayakrishnan Radhakrishna Pillai; Xiaofang Xia; Guangchun Ruan; Zhe Yang

doi:10.1109/TIM.2024.3352696

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Supervised machine learning models are receiving increasing attention in electricity theft detection due to their high detection accuracy. However, their performance depends on a massive amount of labeled training data, which comes from time-consuming and resource-intensive annotations. To maximize model performance within a limited annotation budget, this article aims to reduce the annotation effort in electricity theft detection through optimal sample selection. In particular, a general framework and three new strategies are proposed to select the most valuable and representative samples from different perspectives, including uncertainty, class imbalance, and diversity of samples. In-depth simulations and analyses are conducted to evaluate the effectiveness of the proposed strategies on commonly used machine learning models and a real-world dataset. Simulation results show that the proposed strategies significantly outperform baselines on datasets of different sizes and fraudulent ratios. Besides, the proposed strategies are effective in improving detection performance across a range of classifiers.

Original language	English
Article number	3508911
Journal	IEEE Transactions on Instrumentation and Measurement
Volume	73
Pages (from-to)	1-11
Number of pages	11
ISSN	0018-9456
DOIs	https://doi.org/10.1109/TIM.2024.3352696
Publication status	Published - 2024

Keywords

Annotations
Costs
Data annotation
Data models
Electricity theft
Games
Machine Learning
Power distribution
Sample selection
Smart grid
Training
Training data

Access to Document

10.1109/TIM.2024.3352696

final version: Accepted author manuscript, 1.03 MB, Licence: CC BY 4.0

Cite this

@article{79da336cbab646fdb122aabcd16833e6,
title = "Reducing Annotation Efforts in Electricity Theft Detection through Optimal Sample Selection",
abstract = "Supervised machine learning models are receiving increasing attention in electricity theft detection due to their high detection accuracy. However, their performance depends on a massive amount of labeled training data, which comes from time-consuming and resource-intensive annotations. To maximize model performance within a limited annotation budget, this article aims to reduce the annotation effort in electricity theft detection through optimal sample selection. In particular, a general framework and three new strategies are proposed to select the most valuable and representative samples from different perspectives, including uncertainty, class imbalance, and diversity of samples. In-depth simulations and analyses are conducted to evaluate the effectiveness of the proposed strategies on commonly used machine learning models and a real-world dataset. Simulation results show that the proposed strategies significantly outperform baselines on datasets of different sizes and fraudulent ratios. Besides, the proposed strategies are effective in improving detection performance across a range of classifiers.",
keywords = "Annotations, Costs, Data annotation, Data models, Electricity theft, Games, Machine Learning, Power distribution, Sample selection, Smart grid, Training, Training data",
author = "Wenlong Liao and Birgitte Bak-Jensen and Pillai, {Jayakrishnan Radhakrishna} and Xiaofang Xia and Guangchun Ruan and Zhe Yang",
year = "2024",
doi = "10.1109/TIM.2024.3352696",
language = "English",
volume = "73",
pages = "1--11",
journal = "IEEE Transactions on Instrumentation and Measurement",
issn = "0018-9456",
publisher = "IEEE",
}

TY  - JOUR
T1  - Reducing Annotation Efforts in Electricity Theft Detection through Optimal Sample Selection
AU  - Liao, Wenlong
AU  - Bak-Jensen, Birgitte
AU  - Pillai, Jayakrishnan Radhakrishna
AU  - Xia, Xiaofang
AU  - Ruan, Guangchun
AU  - Yang, Zhe
PY  - 2024
Y1  - 2024
N2  - Supervised machine learning models are receiving increasing attention in electricity theft detection due to their high detection accuracy. However, their performance depends on a massive amount of labeled training data, which comes from time-consuming and resource-intensive annotations. To maximize model performance within a limited annotation budget, this article aims to reduce the annotation effort in electricity theft detection through optimal sample selection. In particular, a general framework and three new strategies are proposed to select the most valuable and representative samples from different perspectives, including uncertainty, class imbalance, and diversity of samples. In-depth simulations and analyses are conducted to evaluate the effectiveness of the proposed strategies on commonly used machine learning models and a real-world dataset. Simulation results show that the proposed strategies significantly outperform baselines on datasets of different sizes and fraudulent ratios. Besides, the proposed strategies are effective in improving detection performance across a range of classifiers.
AB  - Supervised machine learning models are receiving increasing attention in electricity theft detection due to their high detection accuracy. However, their performance depends on a massive amount of labeled training data, which comes from time-consuming and resource-intensive annotations. To maximize model performance within a limited annotation budget, this article aims to reduce the annotation effort in electricity theft detection through optimal sample selection. In particular, a general framework and three new strategies are proposed to select the most valuable and representative samples from different perspectives, including uncertainty, class imbalance, and diversity of samples. In-depth simulations and analyses are conducted to evaluate the effectiveness of the proposed strategies on commonly used machine learning models and a real-world dataset. Simulation results show that the proposed strategies significantly outperform baselines on datasets of different sizes and fraudulent ratios. Besides, the proposed strategies are effective in improving detection performance across a range of classifiers.
KW  - Annotations
KW  - Costs
KW  - Data annotation
KW  - Data models
KW  - Electricity theft
KW  - Games
KW  - Machine Learning
KW  - Power distribution
KW  - Sample selection
KW  - Smart grid
KW  - Training
KW  - Training data
UR  - http://www.scopus.com/inward/record.url?scp=85182917020&partnerID=8YFLogxK
U2  - 10.1109/TIM.2024.3352696
DO  - 10.1109/TIM.2024.3352696
M3  - Journal article
SN  - 0018-9456
VL  - 73
SP  - 1
EP  - 11
JO  - IEEE Transactions on Instrumentation and Measurement
JF  - IEEE Transactions on Instrumentation and Measurement
M1  - 3508911
ER  -