A Vector Worth a Thousand Counts: A Temporal Semantic Similarity Approach to Patent Impact Prediction

Daniel Hain; Roman Jurowetzki; Tobias Buchmann; Patrick Wolf

Research output: Working paper/Preprint › Working paper › Research

Abstract

Patent data has long been used as a widely accessible measure of the
rate and direction of technological change. However, this long tradition of research has
so far focused on producing and analyzing measures of patent quantity, assuming that the
number of patents produced accurately captures the rate of progress and innovation
output. Existing attempts to measure patent quality are mostly limited to the use of
forward and backward citation patterns. In contrast, in this paper, we derive a patent
quality indicator by leveraging the rich but up to now under-utilized textual information
in patent abstracts. We employ vector space modeling techniques to create a high-dimensional
vector representation of the patents to capture their technological signature.
Using near-linear-scaling approximate nearest neighbor matching techniques, we
are able to compute dyadic similarity scores across large bodies of patent data. Based on
the temporal distribution of a patent's similarity scores, we compute ex-ante indicators
of a patent's technological novelty and ex-post indicators of technological impact and
significance. Using the case of circa 132,000 electro-mobility patents, we demonstrate the
proposed indicators' ability to map, analyze, and predict patent quality at the individual, firm, and
country level, and its development over time.
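
The abstract does not spell out how the indicators are computed, so the following is only a minimal sketch of the general idea, under several assumptions: TF-IDF stands in for the vector space model, exhaustive pairwise cosine similarity replaces the approximate nearest neighbor matching, and the names `tfidf_vectors`, `cosine`, and `temporal_similarity` are hypothetical. Mean similarity to earlier documents is one possible reading of an ex-ante (novelty-related) signal, and mean similarity to later documents one possible reading of an ex-post (impact-related) signal; the paper's actual definitions may differ.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    tokenized = [text.lower().split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: in how many texts each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def temporal_similarity(texts, years):
    """For each text, mean similarity to earlier and to later texts.

    Returns a list of (sim_to_past, sim_to_future) tuples. An entry is None
    when no earlier (or later) text exists. Brute force, O(n^2) comparisons;
    an approximate nearest neighbor index would replace this at scale.
    """
    vectors = tfidf_vectors(texts)
    results = []
    for vec, year in zip(vectors, years):
        past = [cosine(vec, vectors[j]) for j, y in enumerate(years) if y < year]
        future = [cosine(vec, vectors[j]) for j, y in enumerate(years) if y > year]
        results.append((
            sum(past) / len(past) if past else None,
            sum(future) / len(future) if future else None,
        ))
    return results


# Toy data, invented for illustration only (not from the paper's data set).
abstracts = [
    "battery cell cooling system for electric vehicle",
    "electric vehicle battery cell cooling plate",
    "wireless charging coil for electric vehicle",
]
filing_years = [2010, 2012, 2014]
scores = temporal_similarity(abstracts, filing_years)
```

In this toy setup, a low similarity to earlier texts would hint at novelty and a high similarity to later texts would hint at subsequent impact; how such scores are normalized or thresholded is left open here.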

Original language	English
Publication status	Published - 2022

Keywords

Technological change
patent data
natural language processing
vector space modeling
quality indicators

Access to Document

vector-worth-thousand (Submitted manuscript, 1.33 MB)

OECD IPSDM “Big Data Analytics” Challenge
Hain, Daniel S. (Recipient), Jurowetzki, Roman (Recipient), Buchmann, Tobias (Recipient), Wolf, Patrick (Recipient) & Simmering, Paul (Recipient), 13 Sept 2018
Prize: Other prizes

Cite this

@techreport{855d9758d0174b4abaf58b7e72a1c223,

title = "A Vector Worth a Thousand Counts: A Temporal Semantic Similarity Approach to Patent Impact Prediction",

abstract = "Patent data has long been used as a widely accessible measure of the rate and direction of technological change. However, this long tradition of research has so far focused on producing and analyzing measures of patent quantity, assuming that the number of patents produced accurately captures the rate of progress and innovation output. Existing attempts to measure patent quality are mostly limited to the use of forward and backward citation patterns. In contrast, in this paper, we derive a patent quality indicator by leveraging the rich but up to now under-utilized textual information in patent abstracts. We employ vector space modeling techniques to create a high-dimensional vector representation of the patents to capture their technological signature. Using near-linear-scaling approximate nearest neighbor matching techniques, we are able to compute dyadic similarity scores across large bodies of patent data. Based on the temporal distribution of a patent's similarity scores, we compute ex-ante indicators of a patent's technological novelty and ex-post indicators of technological impact and significance. Using the case of circa 132,000 electro-mobility patents, we demonstrate the proposed indicators' ability to map, analyze, and predict patent quality at the individual, firm, and country level, and its development over time.",

keywords = "Technological change, patent data, natural language processing, vector space modeling, quality indicators",

author = "Daniel Hain and Roman Jurowetzki and Tobias Buchmann and Patrick Wolf",

year = "2022",

language = "English",

type = "WorkingPaper",

}

TY - UNPB

T1 - A Vector Worth a Thousand Counts

T2 - A Temporal Semantic Similarity Approach to Patent Impact Prediction

AU - Hain, Daniel

AU - Jurowetzki, Roman

AU - Buchmann, Tobias

AU - Wolf, Patrick

PY - 2022

Y1 - 2022

AB - Patent data has long been used as a widely accessible measure of the rate and direction of technological change. However, this long tradition of research has so far focused on producing and analyzing measures of patent quantity, assuming that the number of patents produced accurately captures the rate of progress and innovation output. Existing attempts to measure patent quality are mostly limited to the use of forward and backward citation patterns. In contrast, in this paper, we derive a patent quality indicator by leveraging the rich but up to now under-utilized textual information in patent abstracts. We employ vector space modeling techniques to create a high-dimensional vector representation of the patents to capture their technological signature. Using near-linear-scaling approximate nearest neighbor matching techniques, we are able to compute dyadic similarity scores across large bodies of patent data. Based on the temporal distribution of a patent's similarity scores, we compute ex-ante indicators of a patent's technological novelty and ex-post indicators of technological impact and significance. Using the case of circa 132,000 electro-mobility patents, we demonstrate the proposed indicators' ability to map, analyze, and predict patent quality at the individual, firm, and country level, and its development over time.

KW - Technological change

KW - patent data

KW - natural language processing

KW - vector space modeling

KW - quality indicators

M3 - Working paper

BT - A Vector Worth a Thousand Counts

ER -
