Abstract
Representation- or embedding-based machine learning models, such as language models or convolutional neural networks, have shown great potential for improved performance. However, for complex models on large datasets, training time can be extensive, approaching weeks, which is often infeasible in practice. In this work, we present a method that substantially reduces training time by selecting the training instances that provide relevant information for training. Selection is based on the similarity of the learned representations of the input instances, which allows a non-trivial weighting scheme to be learned from multi-dimensional representations. We demonstrate the efficiency and effectiveness of our approach on several text classification tasks using recursive neural networks. Our experiments show that removing approximately one fifth of the training data lets the objective function converge up to six times faster without sacrificing accuracy.
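The abstract does not spell out the selection criterion, so as an illustration only, a minimal sketch of similarity-based instance selection over learned representations might look like the following. All names, the cosine-similarity measure, and the nearest-neighbour redundancy score are assumptions for this sketch, not the authors' actual method:

```python
import numpy as np

def select_instances(embeddings: np.ndarray, keep_fraction: float = 0.8) -> np.ndarray:
    """Keep the least redundant training instances (illustrative sketch).

    The idea: an instance whose learned representation is nearly identical to
    another instance's contributes little new information. We score each
    instance by its highest cosine similarity to any other instance and drop
    the most redundant ones, keeping roughly `keep_fraction` of the data.
    """
    # Normalise rows so dot products become cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)      # ignore self-similarity
    redundancy = sim.max(axis=1)        # nearest-neighbour similarity per instance
    n_keep = max(1, int(keep_fraction * len(embeddings)))
    # Keep the instances with the LOWEST redundancy scores.
    return np.argsort(redundancy)[:n_keep]

# Toy example: five two-dimensional "learned representations".
rng = np.random.default_rng(0)
reps = rng.normal(size=(5, 2))
kept = select_instances(reps, keep_fraction=0.8)
print(kept)  # indices of the ~80% least redundant instances
```

With `keep_fraction=0.8` this mirrors the abstract's "removing approximately one fifth of the training data"; in practice the representations would come from the recursive neural network being trained, not random vectors.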
Original language | English |
---|---|
Title of host publication | Advances in Knowledge Discovery and Data Mining - 23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, April 14-17, 2019, Proceedings, Part III |
Editors | Qiang Yang, Min-Ling Zhang, Zhiguo Gong, Sheng-Jun Huang, Zhi-Hua Zhou |
Number of pages | 14 |
Publisher | Springer |
Publication date | 2019 |
Pages | 40-53 |
ISBN (Print) | 978-3-030-16141-5 |
DOIs | |
Publication status | Published - 2019 |
Event | Pacific-Asia Conference on Knowledge Discovery and Data Mining (23rd), Macau, China · Duration: 14 Apr 2019 → 17 Apr 2019 |
Conference
Conference | Pacific-Asia Conference on Knowledge Discovery and Data Mining |
---|---|
Number | 23rd |
Country/Territory | China |
City | Macau |
Period | 14/04/2019 → 17/04/2019 |
Series | Lecture Notes in Computer Science |
---|---|
Volume | 11441 |
ISSN | 0302-9743 |
Keywords
- Machine learning
- Neural network
- Recursive models
- Selective training