Abstract
This study presents an efficient approach for utilizing text data to calculate patent-to-patent (p2p) technological similarity and proposes a hybrid framework for leveraging the resulting p2p similarity in applications such as semantic search and automated patent classification. To achieve this, we create embeddings using Sentence-BERT (SBERT) on patent claims. For domain adaptation of the general SBERT model, we implement an augmented approach to fine-tune SBERT using in-domain supervised patent claims data. The study utilizes SBERT's efficiency in creating embedding distance measures to map p2p similarity in large sets of patent data. We demonstrate applications of the framework for the use case of automated patent classification with a simple K Nearest Neighbors (KNN) model that predicts assigned Cooperative Patent Classification (CPC) based on the class assignment of the K patents with the highest p2p similarity. The results show that p2p similarity captures technological features in terms of CPC overlap, and the approach is useful for automatic patent classification based on text data. Moreover, the presented classification framework is simple, and the results are easy to interpret and evaluate by end-users via instance-based explanations. The study performs an out-of-sample model validation, predicting all assigned CPC classes on the subclass (663) level with an F1 score of 66 %, outperforming the current state-of-the-art in text-based multi-label patent classification. The study also discusses the applicability of the presented framework for semantic intellectual property (IP) search, patent landscaping, and technology mapping. Finally, the study outlines a future research agenda to leverage multi-source patent embeddings, evaluate their appropriateness across applications, and improve and validate patent embeddings by creating domain-expert curated Semantic Textual Similarity (STS) benchmark datasets.
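The classification step described in the abstract can be illustrated with a minimal sketch: rank the corpus patents by cosine similarity to a query embedding, take the K most similar, and predict every CPC subclass carried by a sufficient share of those neighbours. The toy three-dimensional vectors stand in for SBERT claim embeddings. The function name `knn_predict_cpc`, the unweighted voting rule, and the `min_share` threshold are assumptions made for illustration and are not taken from the study.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict_cpc(query, corpus, k=3, min_share=0.5):
    """Predict CPC subclasses for `query` from its k most similar patents.

    corpus: list of (embedding, set_of_cpc_subclasses) pairs.
    Returns (predicted_labels, neighbours), where neighbours is a list of
    (corpus_index, similarity) pairs that can serve as an instance-based
    explanation of the prediction.
    """
    scored = sorted(
        ((idx, cosine(query, vec)) for idx, (vec, _) in enumerate(corpus)),
        key=lambda pair: pair[1],
        reverse=True,
    )[:k]

    # Unweighted vote (an assumption): count how many neighbours carry each label.
    votes = defaultdict(int)
    for idx, _ in scored:
        for label in corpus[idx][1]:
            votes[label] += 1

    predicted = {label for label, n in votes.items() if n / len(scored) >= min_share}
    return predicted, scored


# Hypothetical toy data: embeddings paired with assigned CPC subclasses.
corpus = [
    ([1.0, 0.0, 0.0], {"G06F"}),
    ([0.9, 0.1, 0.0], {"G06F", "G06N"}),
    ([0.8, 0.2, 0.1], {"G06N"}),
    ([0.0, 1.0, 0.0], {"A61K"}),
    ([0.0, 0.9, 0.2], {"A61K", "A61P"}),
]
query = [1.0, 0.05, 0.0]

predicted, neighbours = knn_predict_cpc(query, corpus, k=3)
print(sorted(predicted))
print(neighbours)
```

Returning the neighbour indices alongside the labels mirrors the instance-based explanations mentioned in the abstract: a user can inspect which patents drove each predicted class. In practice the embeddings would come from a fine-tuned SBERT model and the neighbour search would use an approximate index rather than a full sort.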
Field | Value |
---|---|
Original language | English |
Article number | 123536 |
Journal | Technological Forecasting and Social Change |
Volume | 206 |
ISSN | 0040-1625 |
DOIs | |
Publication status | Published - Sept 2024 |
Bibliographical note
Publisher Copyright: © 2024 Elsevier Inc.
Keywords
- Augmented SBERT
- Deep NLP
- Hybrid model
- Model explainability
- Patent classification
- Technological distance