Last updated: 09 August, 2021

Patent Database: Accuracy Is the Key

Category: Articles

Abstract

Patent data analytics is valuable only as long as it provides actionable insights. Such insights can be derived only from accurate analysis, and accurate patent analytics in turn requires high-quality data.
A major reason for low-quality patent data is that it is collected from multiple sources. Extensive patent data coverage requires collecting data from national patent offices as well as from international organizations that maintain patent data, and these multiple sources introduce variations and inconsistencies in some data fields.
Identifying inconsistent or incorrect data is critical for accurate analysis. The key data points affected are patent owner data, family data, missing or incorrect translations, and legal status information.
XLSCOUT has developed unique algorithms to clean these key data points in order to provide accurate data analysis to our users.

Introduction

Patent data analytics revolves around a few key fields, such as patent ownership, legal status, and family coverage. Incorrect or incomplete data in these fields leads to inaccurate analysis. The same field often appears in different formats across countries, because patent offices do not share common rules for owner names, application numbering systems, or legal status titles. Some patent offices also have missing data. It is therefore imperative to identify the data fields that are critical for patent data analytics yet prone to such inconsistencies.

Problem

Patent data from multiple patent offices worldwide and other open sources is raw and inconsistent. Analytics performed on such incorrect, inconsistent, or incomplete data is inaccurate and does not provide actionable insights.

Solution

XLSCOUT has developed algorithms for automated data enrichment and cleaning to ensure that high-quality data is available to users. Targeted algorithms have been developed for the following data fields and use cases:

  • Patent applicant/assignee name normalization
  • The original and corporate owner of the patent
  • Extended patent family
  • Normalizing legal status information

Applicant/assignee names commonly appear in varied formats, for example with different entity types (e.g. Sumitomo chemical corporation, Sumitomo Chemical Co. Ltd.) or with misspellings (e.g. Fitbit Inc., Freebit, Inc.). Because of these inconsistencies, the most important step in any analysis of patent ownership data is extracting a relevant and accurate dataset. In addition to normalizing such variations in applicant/assignee information, XLSCOUT has also addressed the unavailability of translated names. Many patent offices do not maintain translated applicant names and provide them only in their native language. Such records are often missed in an analysis, because most users search for an assignee/applicant using only the English name as input. XLSCOUT has developed a proprietary database that maintains correct translations of company names.
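
The kind of name normalization described above can be sketched roughly as follows. This is only an illustration, not XLSCOUT's algorithm: the suffix list, the one-entry translation table, and the function names are all assumptions made for the example, and a production system would need far larger reference data and a tuned matching strategy.

```python
import re
from difflib import SequenceMatcher

# Assumed, deliberately short list of legal-entity words to strip.
ENTITY_SUFFIXES = {
    "corporation", "corp", "company", "co", "ltd", "limited",
    "inc", "incorporated", "llc", "gmbh",
}

# Assumed stand-in for a translation database: native-language name -> English name.
TRANSLATIONS = {
    "住友化学株式会社": "Sumitomo Chemical Co. Ltd.",
}


def normalize_name(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and strip trailing entity-type words."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    while tokens and tokens[-1] in ENTITY_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def canonical_name(raw: str) -> str:
    """Translate a native-language name if it is known, then normalize it."""
    return normalize_name(TRANSLATIONS.get(raw.strip(), raw))


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two canonical names, to flag likely misspellings."""
    return SequenceMatcher(None, canonical_name(a), canonical_name(b)).ratio()


if __name__ == "__main__":
    print(canonical_name("Sumitomo chemical corporation"))
    print(canonical_name("Sumitomo Chemical Co. Ltd."))
    print(canonical_name("住友化学株式会社"))
    print(round(name_similarity("Fitbit Inc.", "Freebit, Inc."), 2))
```

Entity-type variants and the translated name collapse to the same canonical string, while misspellings only produce a similarity score. Where to set the cut-off for treating two names as the same owner is a judgment call that would need checking against real data.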

Another valuable data field for patent analytics is the original and current owner of a patent, as both are critical for tracking competitors in any technology. They can be identified by tracing all ownership transactions during the term of the patent, along with any historical name changes. The XLSCOUT algorithm processes all legal transactions of a patent, as well as corporate name changes, to provide the current and original owner as enriched data fields.
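
As a rough illustration of the idea, the sketch below replays a patent's assignment records in date order and then follows a table of corporate name changes. The `Assignment` record, the `NAME_CHANGES` table, and the company names are hypothetical, and real assignment data is considerably messier (partial assignments, security interests, undated records).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Assignment:
    executed: date   # date the transfer was executed
    assignor: str    # party giving up ownership
    assignee: str    # party receiving ownership


# Hypothetical rename history: old corporate name -> newer corporate name.
NAME_CHANGES = {"Acme Widgets Inc": "Acme Global Inc"}


def resolve_name(name: str) -> str:
    """Follow recorded name changes until the latest known name is reached."""
    seen = set()
    while name in NAME_CHANGES and name not in seen:  # 'seen' guards against cycles
        seen.add(name)
        name = NAME_CHANGES[name]
    return name


def original_and_current_owner(first_applicant: str, assignments: list) -> tuple:
    """Return (original owner, current owner) for one patent."""
    current = first_applicant
    for record in sorted(assignments, key=lambda r: r.executed):
        current = record.assignee          # the latest transfer wins
    return first_applicant, resolve_name(current)


if __name__ == "__main__":
    history = [
        Assignment(date(2019, 1, 1), "BigCo", "Acme Widgets Inc"),
        Assignment(date(2015, 1, 1), "Startup A", "BigCo"),
    ]
    print(original_and_current_owner("Startup A", history))
```

The records are sorted before replay because transactions are not guaranteed to arrive in chronological order. The original owner is kept exactly as filed, while the current owner is reported under its latest known corporate name.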

Family data is another significant data point for patent analytics, as it indicates the geographical coverage of the protection for an invention. Patent family analysis is mostly based on the extended INPADOC family, which has some drawbacks. Priority numbers and application numbers from different patent offices come in different formats. When these numbers are not normalized, the required links between priority and/or application numbers are not formed, so many patents have incomplete family information or show no family members at all. The XLSCOUT family overcomes this problem of missing links and provides accurate family information. We have normalized the number formats and created direct and indirect links to identify all family members of a patent.
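
One way to picture the effect of normalization and of direct and indirect links is a union-find grouping over normalized numbers, sketched below. The normalization rule here (uppercase, separators removed) is a simplifying assumption, the sample numbers are invented, and the sketch is not a description of how XLSCOUT actually builds its families.

```python
import re
from collections import defaultdict


def normalize_number(raw: str) -> str:
    """Assumed rule: uppercase and drop every non-alphanumeric character."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def build_families(records):
    """Group applications into families.

    records: iterable of (application_number, [priority_numbers]).
    Two applications end up in the same family if they are connected,
    directly or through a chain of shared priority/application numbers.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for application, priorities in records:
        app = normalize_number(application)
        find(app)                            # register applications without priorities
        for priority in priorities:
            union(app, normalize_number(priority))

    families = defaultdict(set)
    for number in list(parent):
        families[find(number)].add(number)
    return [sorted(members) for members in families.values()]


if __name__ == "__main__":
    records = [
        ("US 16/123,456", ["JP2018-000111"]),
        ("EP19700001", ["jp 2018 000111"]),    # same priority, different formatting
        ("CN201910000001", ["EP19700001"]),    # indirect link via the EP application
        ("US 15/000,001", []),
    ]
    for family in build_families(records):
        print(family)
```

Without the normalization step, `JP2018-000111` and `jp 2018 000111` would be treated as different priorities, and the EP and CN applications would be split from the US one, which is the missing-link problem described above.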

Legal status is another critical data field for patent analytics. Legal events are denoted by different titles in different countries. The same legal event for a US patent and a CN patent may therefore be described by different text, which can be interpreted differently. This makes it very difficult to standardize legal event data across countries. XLSCOUT provides standardized legal status information as two separate fields: the simple legal status, and the actual legal state of the patent, which gives the reason the patent is alive or dead. For example, if the simple legal status is dead, the legal state could be expired, revoked, non-payment, etc.
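
The two-field output described above could be modelled with a lookup table along the following lines. The event texts in the table are invented for illustration and are not actual patent office event titles, and the "granted", "pending", and "unknown" values are assumptions added to make the example complete.

```python
# Hypothetical mapping: (country code, office-specific event text) -> legal state.
EVENT_TO_STATE = {
    ("US", "expired due to failure to pay maintenance fee"): "non-payment",
    ("CN", "termination of patent right due to non-payment of annual fee"): "non-payment",
    ("US", "patent expired after full term"): "expired",
    ("EP", "patent revoked"): "revoked",
    ("US", "patent granted"): "granted",
    ("EP", "application published"): "pending",
}

# Legal states that mean the patent is no longer in force.
DEAD_STATES = {"expired", "revoked", "non-payment"}


def standardize_status(country: str, event_text: str) -> tuple:
    """Return (simple legal status, legal state) for one office-specific event."""
    key = (country.strip().upper(), event_text.strip().lower())
    state = EVENT_TO_STATE.get(key, "unknown")
    if state == "unknown":
        return "unknown", "unknown"
    simple = "dead" if state in DEAD_STATES else "alive"
    return simple, state


if __name__ == "__main__":
    print(standardize_status("US", "Expired due to failure to pay maintenance fee"))
    print(standardize_status("CN", "Termination of patent right due to non-payment of annual fee"))
    print(standardize_status("EP", "Application published"))
```

The US and CN events are worded differently but map to the same pair of values, which is what makes cross-country comparison possible. Unmapped events are reported as unknown rather than guessed.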

Conclusion

High-quality data is the foundation of accurate data analytics. XLSCOUT recognizes the importance of making informed business decisions from patent data analytics and provides verified, accurate, and enriched data to its customers. This enables users to derive actionable insights from analytics.

About the Author

This article was written by Stuti Misra. Stuti currently works as a Product Manager at XLPAT LABS, where she works with a team of data engineers and data scientists to create algorithms for patent data cleaning and enrichment. She also has prior experience as a Patent Analyst, which helped her gain expertise in understanding patent data points.