PatentExplorer Proof-of-concept: Text-embedding-based Patent Search and Exploration

Hain, Daniel S. (PI (principal investigator))
Jurowetzki, Roman (PI (principal investigator))

Description

Textual information found in patents, such as the title, abstract, claims, and description, is a rich source of information on the technological and functional features of an invention. This information is critical for patent attorneys, who use it to identify prior art and spot intellectual property infringements, and for businesses, technology analysts, and policymakers, who use it to map technological development and identify trends and opportunities.

Existing approaches to searching patent databases and identifying prior art are in most cases based solely on string search (typically augmented with Boolean operators), meaning they let users search for patents that contain (or do not contain) certain search strings. While this simple method has been applied and proven over decades, it has substantial shortcomings. Most severely, string-based approaches aim for exact or approximate string matches but do not account for synonyms (different words that share the same contextual meaning), homonyms (the same word with different contextual meanings), or the changing meaning of certain words and phrases over time. Consequently, formulating search strings requires knowledge of domain-specific technical jargon, which is particularly cumbersome and prone to false positives and false negatives in interdisciplinary domains.
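The shortcomings described above can be illustrated with a minimal sketch of Boolean string search. The function `boolean_search` and the three toy patent texts are hypothetical and serve only as an illustration; they do not reproduce any existing patent search tool.

```python
def boolean_search(documents, must_contain, must_not_contain=()):
    """Return ids of documents containing every required term and no excluded term."""
    hits = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        has_all = all(term.lower() in lowered for term in must_contain)
        has_none = not any(term.lower() in lowered for term in must_not_contain)
        if has_all and has_none:
            hits.append(doc_id)
    return hits


# Hypothetical toy documents, invented for illustration only.
patents = {
    "P1": "A rechargeable battery for an automobile.",
    "P2": "An energy storage cell for a motor vehicle.",       # synonym of P1
    "P3": "A battery of tests for screening job applicants.",  # homonym
}

hits = boolean_search(patents, must_contain=["battery"])
print(hits)
```

In this toy case the query misses P2, which describes the same kind of invention in different words (a false negative), and returns P3, which uses "battery" in an unrelated sense (a false positive).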

Our award-winning method solves the problem of representing complex technological descriptions in patent applications as numerical vectors that can be processed by algorithms. We do so by using state-of-the-art text embedding techniques from natural language processing (NLP) and deep learning. This is a necessary step to enable functionalities such as semantic search, technological similarity calculation, prediction of patent value, and patent landscaping. Most importantly, it enables semantic search by calculating similarity scores between texts of arbitrary length and patents.
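As a rough sketch of how similarity scores between a query text and patents might be computed once both are available as vectors, the example below ranks patents by cosine similarity. Cosine similarity is an assumption here (the description does not name the similarity measure used), and the three-dimensional vectors are hand-made placeholders, not the output of the project's embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vector, patent_vectors):
    """Return (patent_id, score) pairs sorted from most to least similar."""
    scored = [
        (patent_id, cosine_similarity(query_vector, vector))
        for patent_id, vector in patent_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical placeholder vectors; a real system would obtain these
# from a trained text embedding model.
toy_patent_vectors = {
    "P1": [0.9, 0.1, 0.0],
    "P2": [0.8, 0.2, 0.1],
    "P3": [0.0, 0.1, 0.9],
}
toy_query_vector = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(toy_query_vector, toy_patent_vectors)
for patent_id, score in ranking:
    print(patent_id, round(score, 3))
```

Because the query is itself just a vector, the same ranking step works whether the input is a few keywords, an abstract, or a full document.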

Layman's description

offices, private law firms, and technology companies. Current software enabling prior art search is usually based on string matching, meaning manually defined search terms are matched with patent documents containing these keywords. However, patent descriptions such as abstracts, claims, or their full text are full of technical jargon, which on the one hand requires domain expert knowledge and on the other hand is often inconsistent across disciplines and over time.
To ease prior art search and patent exploration, we use state-of-the-art text embedding techniques from the field of natural language processing (NLP) and deep learning to represent patent text in a numerical format that is easily searchable and can be compared with a user's own search terms or comparison text. Embedding techniques follow the intuition that words that often appear in similar contexts should be seen as related and should therefore be represented in a way that expresses this similarity. As a consequence, different words that share the same meaning (synonyms), for example, will be represented by very similar embeddings. This eases prior art search by enabling search by meaning rather than by fixed search strings. At the same time, this numerical vector representation of patents enables a host of additional use cases in a scalable manner, such as patent landscaping, technology forecasting, and patent value prediction.
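The intuition that words appearing in similar contexts end up with similar representations can be sketched with simple co-occurrence counts. This is a deliberately simplified, count-based stand-in for the neural embedding techniques mentioned above, and the toy sentences and function names are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences):
    """Map each word to counts of the other words it shares a sentence with."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j, context_word in enumerate(tokens):
                if i != j:
                    vectors[word][context_word] += 1
    return vectors


def cosine(counts_a, counts_b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a if w in counts_b)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus: "car" and "automobile" occur in the same contexts.
corpus = [
    "the car has an engine",
    "the automobile has an engine",
    "the car needs fuel",
    "the automobile needs fuel",
    "the claim cites a document",
]

vectors = cooccurrence_vectors(corpus)
synonym_score = cosine(vectors["car"], vectors["automobile"])
unrelated_score = cosine(vectors["car"], vectors["document"])
print(round(synonym_score, 3), round(unrelated_score, 3))
```

In this toy corpus the two synonyms receive near-identical vectors even though they never appear in the same sentence, which is the property that lets a search match on meaning rather than on the literal search string.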

Short title	PatentExplorer
Status	Finished
Effective start/end date	01/09/2020 → 28/02/2021
