TY - JOUR
T1 - Doing More with Less
T2 - A Survey of Data Selection Methods for Mathematical Modeling
AU - Weinreich, Nicolai A.
AU - Oshnoei, Arman
AU - Teodorescu, Remus
AU - Larsen, Kim G.
N1 - Publisher Copyright: © 1989-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Big Data applications such as Artificial Intelligence (AI) and the Internet of Things (IoT) have in recent years led to many technological breakthroughs in system modeling. However, these applications are typically data intensive and therefore incur increasing resource costs. In this paper, a first-of-its-kind comprehensive review of data selection methods across different engineering disciplines is given in order to analyze how effectively these methods improve the data efficiency of mathematical modeling algorithms. Eight distinct selection methods were identified and subsequently analyzed and discussed on the basis of the relevant literature. In addition, the selection methods were classified according to three dichotomies established by the survey. A comparative analysis of these methods was conducted, along with a discussion of potentials, challenges, and future research directions for the research area. Data selection was found to be widely used in many engineering applications and has the potential to play an important role in making Big Data applications more sustainable, especially those that require transmitting data across large distances. Furthermore, making resource-aware decisions about the use of data has been shown to be highly effective in reducing energy costs while maintaining high model performance.
AB - Big Data applications such as Artificial Intelligence (AI) and the Internet of Things (IoT) have in recent years led to many technological breakthroughs in system modeling. However, these applications are typically data intensive and therefore incur increasing resource costs. In this paper, a first-of-its-kind comprehensive review of data selection methods across different engineering disciplines is given in order to analyze how effectively these methods improve the data efficiency of mathematical modeling algorithms. Eight distinct selection methods were identified and subsequently analyzed and discussed on the basis of the relevant literature. In addition, the selection methods were classified according to three dichotomies established by the survey. A comparative analysis of these methods was conducted, along with a discussion of potentials, challenges, and future research directions for the research area. Data selection was found to be widely used in many engineering applications and has the potential to play an important role in making Big Data applications more sustainable, especially those that require transmitting data across large distances. Furthermore, making resource-aware decisions about the use of data has been shown to be highly effective in reducing energy costs while maintaining high model performance.
KW - Artificial intelligence
KW - Big Data
KW - communication
KW - data selection
KW - mathematical modeling
UR - http://www.scopus.com/inward/record.url?scp=85218952260&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2025.3545965
DO - 10.1109/TKDE.2025.3545965
M3 - Journal article
AN - SCOPUS:85218952260
SN - 1041-4347
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
ER -