Towards Comparing Recommendation to Multiple-Query Search Sessions for Talent Search

Mesut Kaya*, Toine Bogers

*Corresponding author for this work

Research output: Contribution to journal › Conference article in Journal › Research › peer-review


Abstract

Query-level evaluation metrics such as nDCG that originate from the field of Information Retrieval (IR) have seen widespread adoption in the Recommender Systems (RS) community for comparing the quality of different ranked lists of recommendations with different levels of relevance to the user. However, the traditional (offline) RS evaluation paradigm is typically restricted to evaluating a single results list. In contrast, IR researchers have also developed evaluation metrics over the past decade for the session-based evaluation of more complex search tasks. Here, the sessions consist of multiple queries and multi-round search interactions, and the metrics evaluate the quality of the session as a whole. Despite the popularity of the more traditional single-list evaluation paradigm, RS can also be used to assist users with complex information access tasks. In this paper, we explore the usefulness of session-level evaluation metrics for evaluating and comparing the performance of both recommender systems and search engines. We show that, despite possible misconceptions that comparing both scenarios is akin to comparing apples to oranges, it is indeed possible to compare recommendation results from a single ranked list to the results from a whole search session. In doing so, we address the following questions: (1) how can we fairly and realistically compare the quality of an individual list of recommended items to the quality of an entire manual search session? (2) how can we measure the contribution that the RS makes to the entire search session? We contextualize our claims by focusing on a particular complex search scenario: the problem of talent search. An example of professional search, talent search involves recruiters searching for relevant candidates given a specific job posting by issuing multiple queries in the course of a search session. We show that it is possible to compare the search behavior and success of recruiters to that of a matchmaking recommender system that generates a single ranked list of relevant candidates for a given job posting. In particular, we adopt a session-based metric from IR and motivate how it can be used to perform valid and realistic comparisons of recommendation lists to multiple-query search sessions.
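To make the comparison concrete, the sketch below contrasts query-level nDCG for a single recommendation list with a session-level metric computed over a multi-query search session. It uses sDCG (Järvelin et al., 2008) as an illustrative session metric with made-up relevance judgements; this is an assumption for illustration only and not necessarily the exact metric or setup adopted in the paper. The function names (dcg, ndcg, sdcg) and the patience parameter bq are hypothetical.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a single ranked list of graded relevances."""
    if k is not None:
        relevances = relevances[:k]
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """Normalised DCG: DCG of the list divided by the DCG of its ideal ordering."""
    ideal_dcg = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def sdcg(session, bq=4, k=None):
    """Session DCG in the spirit of Järvelin et al. (2008): the DCG of each
    query's result list is further discounted by the query's position in the
    session, reflecting the extra effort of reformulating queries.
    `session` is a list of relevance lists, one per query; `bq` models the
    searcher's patience to issue further queries (illustrative default)."""
    return sum(
        dcg(rels, k) / (1 + math.log(pos, bq))
        for pos, rels in enumerate(session, start=1)
    )


# Made-up graded relevance judgements (0-3):
# a single ranked list produced by a matchmaking recommender ...
recommender_list = [3, 2, 3, 0, 1, 2]

# ... versus a recruiter's three-query search session for the same job posting.
recruiter_session = [
    [1, 0, 2, 0, 0],  # query 1
    [3, 1, 0, 2, 0],  # query 2
    [2, 3, 1, 0, 0],  # query 3
]

print(f"nDCG@5 of recommendation list: {ndcg(recommender_list, k=5):.3f}")
print(f"DCG@5  of recommendation list: {dcg(recommender_list, k=5):.3f}")
print(f"sDCG@5 of recruiter session:   {sdcg(recruiter_session, k=5):.3f}")
```

Because sDCG accumulates (discounted) gain across queries while nDCG scores a single list, any direct comparison between the two has to be set up carefully, which is exactly the methodological question the paper addresses.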

Original language: English
Article number: 6
Journal: CEUR Workshop Proceedings
Volume: 3228
ISSN: 1613-0073
Publication status: Published - 2022
Event: 2022 Perspectives on the Evaluation of Recommender Systems Workshop, PERSPECTIVES 2022 - Seattle, United States
Duration: 22 Sept 2022 → …

Conference

Conference: 2022 Perspectives on the Evaluation of Recommender Systems Workshop, PERSPECTIVES 2022
Country/Territory: United States
City: Seattle
Period: 22/09/2022 → …

Bibliographical note

Funding Information:
This research was supported by the Innovation Fund Denmark, grant no. 0175-000005B.

Publisher Copyright:
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

Keywords

  • evaluation
  • job recommendation
  • recruitment
  • search
  • session-based recommendation
