A Vector Worth a Thousand Counts

A Temporal Semantic Similarity Approach to Patent Impact Prediction

Daniel Hain, Roman Jurowetzki, Tobias Buchmann, Patrick Wolf

Research output: Working paper › Research

Abstract

Patent data has long been used as a widely accessible measure of the
rate and direction of technological change. However, this long tradition of research has
so far focused on producing and analyzing measures of patent quantity, assuming that the
number of patents produced accurately captures the rate of progress and innovation
output. Existing attempts to measure patent quality are mostly limited to the use of
forward and backward citation patterns. In contrast, in this paper we derive a patent
quality indicator by leveraging the rich but so far under-utilized textual information
in patent abstracts. We employ vector space modeling techniques to create a high-dimensional
vector representation of the patents that captures their technological signature.
Using approximate nearest neighbor matching techniques that scale almost linearly, we
are able to compute dyadic similarity scores across large bodies of patent data. Based on
the temporal distribution of a patent's similarity scores, we compute ex-ante indicators
of a patent's technological novelty and ex-post indicators of technological impact and
significance. Using the case of circa 132,000 electro-mobility patents, we demonstrate the
proposed indicators' ability to map, analyze, and predict patent quality at the individual,
firm, and country level, and its development over time.
Original language: English
Publication status: In preparation - 2018
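
The pipeline outlined in the abstract (text to vectors, pairwise similarity, then a split of each patent's similarities by time) can be illustrated with a minimal sketch. This is not the authors' implementation: the TF-IDF weighting, the exact brute-force search (used here in place of approximate nearest neighbor matching), the top-k averaging, the function names, and the toy abstracts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse, L2-normalized TF-IDF vector (a dict) per text."""
    docs = [Counter(t.lower().split()) for t in texts]
    n = len(docs)
    # Document frequency: in how many texts each term occurs.
    df = Counter(term for d in docs for term in d)
    vectors = []
    for d in docs:
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in d.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def mean_top_k(scores, k):
    """Mean of the k largest scores, or None if there are no scores."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top) if top else None


def temporal_indicators(patents, k=2):
    """patents: list of (patent_id, year, abstract_text) tuples.

    Assumed definitions (hypothetical, for illustration only):
      novelty = 1 - mean similarity to the k most similar EARLIER patents
      impact  =     mean similarity to the k most similar LATER patents
    """
    vecs = tfidf_vectors([text for _, _, text in patents])
    result = {}
    for i, (pid, year, _) in enumerate(patents):
        past, future = [], []
        for j, (_, other_year, _) in enumerate(patents):
            if i == j:
                continue
            score = cosine(vecs[i], vecs[j])  # exact search, not approximate
            if other_year < year:
                past.append(score)
            elif other_year > year:
                future.append(score)
        past_mean = mean_top_k(past, k)
        result[pid] = {
            "novelty": None if past_mean is None else 1.0 - past_mean,
            "impact": mean_top_k(future, k),
        }
    return result


# Toy data, invented for this example.
patents = [
    ("A", 2000, "battery cell cooling for electric vehicle"),
    ("B", 2005, "battery cell cooling system for electric vehicle"),
    ("C", 2005, "wireless charging coil for electric vehicle"),
    ("D", 2010, "wireless charging coil alignment"),
]
indicators = temporal_indicators(patents, k=2)
```

In this toy setup, patent B closely repeats the earlier patent A and so receives a lower novelty score than C, while C shares vocabulary with the later patent D and so receives a higher impact score than B. At the scale mentioned in the abstract, the quadratic inner loop would have to be replaced by an approximate nearest neighbor index.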

Fingerprint

Prediction
Patents
Semantic similarity
Patent quality
Patent data
Nearest neighbor
Modeling
Technological change
Scaling
Citations
Novelty

Keywords

  • Technological change
  • patent data
  • natural language processing
  • vector space modeling
  • quality indicators

Cite this

@techreport{855d9758d0174b4abaf58b7e72a1c223,
title = "A Vector Worth a Thousand Counts: A Temporal Semantic Similarity Approach to Patent Impact Prediction",
abstract = "Patent data has long been used as a widely accessible measure of the rate and direction of technological change. However, this long tradition of research has so far focused on producing and analyzing measures of patent quantity, assuming that the number of patents produced accurately captures the rate of progress and innovation output. Existing attempts to measure patent quality are mostly limited to the use of forward and backward citation patterns. In contrast, in this paper we derive a patent quality indicator by leveraging the rich but so far under-utilized textual information in patent abstracts. We employ vector space modeling techniques to create a high-dimensional vector representation of the patents that captures their technological signature. Using approximate nearest neighbor matching techniques that scale almost linearly, we are able to compute dyadic similarity scores across large bodies of patent data. Based on the temporal distribution of a patent's similarity scores, we compute ex-ante indicators of a patent's technological novelty and ex-post indicators of technological impact and significance. Using the case of circa 132,000 electro-mobility patents, we demonstrate the proposed indicators' ability to map, analyze, and predict patent quality at the individual, firm, and country level, and its development over time.",
keywords = "Technological change, patent data, natural language processing, vector space modeling, quality indicators",
author = "Daniel Hain and Roman Jurowetzki and Tobias Buchmann and Patrick Wolf",
year = "2018",
language = "English",
type = "WorkingPaper",

}

TY - UNPB

T1 - A Vector Worth a Thousand Counts

T2 - A Temporal Semantic Similarity Approach to Patent Impact Prediction

AU - Hain, Daniel

AU - Jurowetzki, Roman

AU - Buchmann, Tobias

AU - Wolf, Patrick

PY - 2018

Y1 - 2018

N2 - Patent data has long been used as a widely accessible measure of the rate and direction of technological change. However, this long tradition of research has so far focused on producing and analyzing measures of patent quantity, assuming that the number of patents produced accurately captures the rate of progress and innovation output. Existing attempts to measure patent quality are mostly limited to the use of forward and backward citation patterns. In contrast, in this paper we derive a patent quality indicator by leveraging the rich but so far under-utilized textual information in patent abstracts. We employ vector space modeling techniques to create a high-dimensional vector representation of the patents that captures their technological signature. Using approximate nearest neighbor matching techniques that scale almost linearly, we are able to compute dyadic similarity scores across large bodies of patent data. Based on the temporal distribution of a patent's similarity scores, we compute ex-ante indicators of a patent's technological novelty and ex-post indicators of technological impact and significance. Using the case of circa 132,000 electro-mobility patents, we demonstrate the proposed indicators' ability to map, analyze, and predict patent quality at the individual, firm, and country level, and its development over time.

AB - Patent data has long been used as a widely accessible measure of the rate and direction of technological change. However, this long tradition of research has so far focused on producing and analyzing measures of patent quantity, assuming that the number of patents produced accurately captures the rate of progress and innovation output. Existing attempts to measure patent quality are mostly limited to the use of forward and backward citation patterns. In contrast, in this paper we derive a patent quality indicator by leveraging the rich but so far under-utilized textual information in patent abstracts. We employ vector space modeling techniques to create a high-dimensional vector representation of the patents that captures their technological signature. Using approximate nearest neighbor matching techniques that scale almost linearly, we are able to compute dyadic similarity scores across large bodies of patent data. Based on the temporal distribution of a patent's similarity scores, we compute ex-ante indicators of a patent's technological novelty and ex-post indicators of technological impact and significance. Using the case of circa 132,000 electro-mobility patents, we demonstrate the proposed indicators' ability to map, analyze, and predict patent quality at the individual, firm, and country level, and its development over time.

KW - Technological change

KW - patent data

KW - natural language processing

KW - vector space modeling

KW - quality indicators

M3 - Working paper

BT - A Vector Worth a Thousand Counts

ER -