Utilising Tree-Based Ensemble Learning for Speaker Segmentation

Mohamed Abou-Zleikha; Zheng-Hua Tan; Mads Græsbøll Christensen; Søren Holdt Jensen

doi:10.1007/978-3-662-44654-6_5

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

Abstract

In audio and speech processing, accurate detection of the change points between speakers in a speech segment is an important stage for several applications, such as speaker identification and tracking. Bayesian Information Criterion (BIC)-based approaches are the most widely used, as they have proved very effective for this task. The main criticism levelled against BIC-based approaches is the use of a penalty parameter in the BIC function. The use of this parameter means that fine tuning is required for each variation of the acoustic conditions. When tuned for a certain condition, the model becomes biased towards the data used for training, which limits its generalisation ability.
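For orientation, the penalty parameter criticised above can be seen in the commonly used Gaussian form of ΔBIC. The abstract does not give the authors' exact formulation, so the expression below is the standard textbook version and the symbols are assumptions: a window of N frames with feature dimension d is split at frame t into N_1 and N_2 frames, Σ, Σ_1 and Σ_2 are the maximum-likelihood covariance estimates of the whole window and of the two parts, and λ is the penalty weight.

```latex
\[
\Delta\mathrm{BIC}(t) = \frac{N}{2}\log|\Sigma|
  - \frac{N_1}{2}\log|\Sigma_1|
  - \frac{N_2}{2}\log|\Sigma_2|
  - \lambda P,
\qquad
P = \frac{1}{2}\left(d + \frac{d(d+1)}{2}\right)\log N,
\qquad N = N_1 + N_2 .
\]
```

In this form a change is hypothesised at t when ΔBIC(t) > 0. In theory λ = 1, but in practice it is usually adjusted to the acoustic conditions, which is the tuning step referred to above.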
In this paper, we propose a BIC-based, tuning-free approach for speaker segmentation through the use of ensemble-based learning. A forest of segmentation trees is constructed in which each tree is trained on a sampled version of the speech segment. During tree construction, a set of randomly selected points in the input sequence is examined as potential segmentation points. The point that yields the highest ΔBIC is chosen, and the same process is repeated for the resulting left and right segments. Each node of the tree thus stores the highest ΔBIC found and the index of the associated point. After the forest is built, the ΔBIC values are accumulated for each point over all trees, and the positions of the local maxima are taken as speaker change points. The proposed approach is tested on artificially created conversations from the TIMIT database. It shows very accurate results, comparable to those achieved by state-of-the-art methods, with a 9% (absolute) higher F1 score than the standard ΔBIC with an optimally tuned penalty parameter.
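The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (`delta_bic`, `grow_tree`, `forest_scores`, `change_points`) and all parameter values are hypothetical, and several details that the abstract leaves open are filled with assumptions. In particular, the sketch uses the penalty-free Gaussian log-likelihood ratio as ΔBIC, subsamples frames without replacement for each tree, stops splitting at a minimum segment length and maximum depth, and smooths the accumulated scores before keeping local maxima above a fraction of the highest peak.

```python
import numpy as np


def delta_bic(x, t):
    """Penalty-free Delta-BIC (Gaussian log-likelihood ratio) for splitting x at row t."""
    def half_n_logdet(seg):
        # ML covariance plus a small ridge so short segments stay invertible.
        cov = np.cov(seg, rowvar=False, bias=True) + 1e-6 * np.eye(seg.shape[1])
        return 0.5 * len(seg) * np.linalg.slogdet(cov)[1]
    return half_n_logdet(x) - half_n_logdet(x[:t]) - half_n_logdet(x[t:])


def grow_tree(x, idx, lo, hi, scores, rng, n_candidates, min_len, depth):
    """Recursively split x[lo:hi]; add each node's best Delta-BIC to scores."""
    if depth == 0 or hi - lo < 2 * min_len:
        return  # assumed stopping rule: depth limit or segment too short
    # Randomly selected candidate points, leaving min_len frames on each side.
    candidates = rng.integers(lo + min_len, hi - min_len + 1, size=n_candidates)
    values = [delta_bic(x[lo:hi], t - lo) for t in candidates]
    best = int(np.argmax(values))
    t = int(candidates[best])
    scores[idx[t]] += values[best]  # accumulate at the original frame index
    grow_tree(x, idx, lo, t, scores, rng, n_candidates, min_len, depth - 1)
    grow_tree(x, idx, t, hi, scores, rng, n_candidates, min_len, depth - 1)


def forest_scores(features, n_trees=50, sample_frac=0.8, n_candidates=10,
                  min_len=20, max_depth=4, seed=0):
    """Accumulate Delta-BIC per frame over a forest of segmentation trees."""
    rng = np.random.default_rng(seed)
    n = len(features)
    scores = np.zeros(n)
    for _ in range(n_trees):
        # Each tree sees an order-preserving random subsample of the frames.
        idx = np.sort(rng.choice(n, size=int(sample_frac * n), replace=False))
        grow_tree(features[idx], idx, 0, len(idx), scores, rng,
                  n_candidates, min_len, max_depth)
    return scores


def change_points(scores, width=25, rel_height=0.5):
    """Local maxima of the smoothed scores above rel_height times the top peak."""
    smooth = np.convolve(scores, np.ones(width) / width, mode="same")
    return [i for i in range(1, len(smooth) - 1)
            if smooth[i] > smooth[i - 1] and smooth[i] >= smooth[i + 1]
            and smooth[i] >= rel_height * smooth.max()]


# Toy usage: two synthetic "speakers" with different means, change at frame 300.
data_rng = np.random.default_rng(1)
features = np.vstack([data_rng.normal(0.0, 1.0, size=(300, 2)),
                      data_rng.normal(5.0, 1.0, size=(300, 2))])
scores = forest_scores(features)
peaks = change_points(scores)
print(peaks)
```

On this toy input the reported peaks should cluster around the true change at frame 300. The `rel_height` filter is a convenience of the sketch and reintroduces a threshold; how the paper treats weak local maxima is not stated in the abstract.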

Original language	English
Title of host publication	Artificial Intelligence Applications and Innovations: 10th IFIP WG 12.5 International Conference, AIAI 2014, Rhodes, Greece, September 19-21, 2014. Proceedings
Volume	436
Publisher	Springer
Publication date	2014
Pages	50-59
ISBN (Print)	978-3-662-44653-9
ISBN (Electronic)	978-3-662-44654-6
DOIs	https://doi.org/10.1007/978-3-662-44654-6_5
Publication status	Published - 2014
Event	10th Conference on Artificial Intelligence Applications & Innovations 2014, Rodos, Greece, 19 Sept 2014 → 21 Sept 2014

Conference

Conference	10th Conference on Artificial Intelligence Applications & Innovations 2014
Country/Territory	Greece
City	Rodos
Period	19/09/2014 → 21/09/2014

Series	IFIP AICT - Advances in Information and Communication Technology
ISSN	1868-4238

Cite this

Abou-Zleikha, M., Tan, Z.-H., Christensen, M. G., & Jensen, S. H. (2014). Utilising Tree-Based Ensemble Learning for Speaker Segmentation. In Artificial Intelligence Applications and Innovations: 10th IFIP WG 12.5 International Conference, AIAI 2014, Rhodes, Greece, September 19-21, 2014. Proceedings (Vol. 436, pp. 50-59). Springer. https://doi.org/10.1007/978-3-662-44654-6_5

@inproceedings{32b1b6cc990f427f9da4cdf7248639ac,

title = "Utilising Tree-Based Ensemble Learning for Speaker Segmentation",

author = "Mohamed Abou-Zleikha and Zheng-Hua Tan and Christensen, {Mads Gr{\ae}sb{\o}ll} and Jensen, {S{\o}ren Holdt}",

year = "2014",

doi = "10.1007/978-3-662-44654-6_5",

language = "English",

isbn = "978-3-662-44653-9",

volume = "436",

series = "IFIP AICT - Advances in Information and Communication Technology",

publisher = "Springer",

pages = "50--59",

booktitle = "Artificial Intelligence Applications and Innovations: 10th IFIP WG 12.5 International Conference, AIAI 2014, Rhodes, Greece, September 19-21, 2014. Proceedings",

address = "Germany",

note = "10th Conference on Artificial Intelligence Applications & Innovations 2014, AIAI 2014 ; Conference date: 19-09-2014 Through 21-09-2014",

}
