A Vector Worth a Thousand Counts: A Temporal Semantic Similarity Approach to Patent Impact Prediction

Research output: Working paper

Bibtex

@techreport{855d9758d0174b4abaf58b7e72a1c223,
title = "A Vector Worth a Thousand Counts: A Temporal Semantic Similarity Approach to Patent Impact Prediction",
abstract = "Patent data has long been used as a widely accessible measure of the rate and direction of technological change. However, this long tradition of research has so far focused on producing and analyzing measures of patent quantity, assuming the number of patents produced to accurately capture the rate of progress and innovation output. Existing attempts to measure patent quality are mostly limited to the use of forward- and backward citation pattern. In contrast, in this paper, we derive a patent quality indicator by leveraging the rich but up to now under-utilized textual information in patent abstracts. We employ vector space modeling techniques to create a high-dimensional vector representation of the patents to capture their technological signature. Using almost near linear-scaling approximate nearest neighbor matching techniques, we are able to compute dyadic similarity scores across large bodies of patent data. Based on the temporal distribution of a patent's similarity scores, we compute ex-ante indicators of a patent's technological novelty and ex-post indicators of technological impact and significance. At the case of circa 132.000 electro-mobility patents, we demonstrate the proposed indicators' ability to map, analyze, and predict patent quality on individual, firm, and country level, and its development over time.",
keywords = "Technological change, patent data, natural language processing, vector space modeling, quality indicators",
author = "Daniel Hain and Roman Jurowetzki and Tobias Buchmann and Patrick Wolf",
year = "2018",
language = "English",
type = "WorkingPaper",

}

RIS

TY - UNPB

T1 - A Vector Worth a Thousand Counts

T2 - A Temporal Semantic Similarity Approach to Patent Impact Prediction

AU - Hain, Daniel

AU - Jurowetzki, Roman

AU - Buchmann, Tobias

AU - Wolf, Patrick

PY - 2018

Y1 - 2018

N2 - Patent data has long been used as a widely accessible measure of the rate and direction of technological change. However, this long tradition of research has so far focused on producing and analyzing measures of patent quantity, assuming the number of patents produced to accurately capture the rate of progress and innovation output. Existing attempts to measure patent quality are mostly limited to the use of forward- and backward citation pattern. In contrast, in this paper, we derive a patent quality indicator by leveraging the rich but up to now under-utilized textual information in patent abstracts. We employ vector space modeling techniques to create a high-dimensional vector representation of the patents to capture their technological signature. Using almost near linear-scaling approximate nearest neighbor matching techniques, we are able to compute dyadic similarity scores across large bodies of patent data. Based on the temporal distribution of a patent's similarity scores, we compute ex-ante indicators of a patent's technological novelty and ex-post indicators of technological impact and significance. At the case of circa 132.000 electro-mobility patents, we demonstrate the proposed indicators' ability to map, analyze, and predict patent quality on individual, firm, and country level, and its development over time.

AB - Patent data has long been used as a widely accessible measure of the rate and direction of technological change. However, this long tradition of research has so far focused on producing and analyzing measures of patent quantity, assuming the number of patents produced to accurately capture the rate of progress and innovation output. Existing attempts to measure patent quality are mostly limited to the use of forward- and backward citation pattern. In contrast, in this paper, we derive a patent quality indicator by leveraging the rich but up to now under-utilized textual information in patent abstracts. We employ vector space modeling techniques to create a high-dimensional vector representation of the patents to capture their technological signature. Using almost near linear-scaling approximate nearest neighbor matching techniques, we are able to compute dyadic similarity scores across large bodies of patent data. Based on the temporal distribution of a patent's similarity scores, we compute ex-ante indicators of a patent's technological novelty and ex-post indicators of technological impact and significance. At the case of circa 132.000 electro-mobility patents, we demonstrate the proposed indicators' ability to map, analyze, and predict patent quality on individual, firm, and country level, and its development over time.

KW - Technological change

KW - patent data

KW - natural language processing

KW - vector space modeling

KW - quality indicators

M3 - Working paper

BT - A Vector Worth a Thousand Counts

ER -

ID: 282564570