Patent data analytics is valuable only as long as it provides actionable insights. Such insights depend on correct analysis, and correct analysis depends on high-quality patent data. One of the primary reasons for poor data quality is that the data is compiled from a variety of sources. Collecting data from national patent offices as well as international organizations that maintain patent data gives extensive coverage, but it also introduces variations and inconsistencies in some data fields. Detecting these inconsistencies and errors is critical for accurate data analysis.
Patent data analytics revolves around a few key fields, such as patent ownership, legal status, and family coverage. Incorrect or incomplete data in these fields leads to inaccurate analysis. Data for the same field appears in different formats across countries because patent offices do not share common rules for owner names, application numbering systems, or legal status titles. Some patent offices also have missing data. It is therefore imperative to identify the data fields that are critical for patent data analytics but are prone to such inconsistencies.
Patent data from multiple patent offices worldwide and other open sources is raw and inconsistent. Patent data analytics on such incorrect, inconsistent, or incomplete data is inaccurate and does not provide actionable insights.
XLSCOUT has developed algorithms for automated data enrichment and cleaning to ensure that high-quality data is available to its users. Targeted algorithms have been developed for the following data fields and use cases:
- Patent applicant/assignee name normalization
- The original and corporate owner of the patent
- Extended patent family
- Normalizing legal status information
Applicant and assignee names commonly appear in varied formats, such as with different entity types (e.g., Sumitomo Chemical Corporation, Sumitomo Chemical Co. Ltd.) or misspelled names (e.g., Fitbit, Inc., Freebit, Inc.). Because of these inconsistencies, the most important step in any analysis of patent ownership data is extracting a relevant and accurate dataset. In addition to normalizing such variations in applicant and assignee information, XLSCOUT has also addressed the challenge of unavailable translated names. Many patent offices do not maintain translated applicant names and provide applicant names only in their native language. Such records are often missed in an analysis because most users search for an assignee or applicant using only the English name. XLSCOUT has developed a proprietary database that maintains correct translations of company names.
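The entity-type and translation cases described above can be illustrated with a minimal sketch. This is not XLSCOUT's algorithm, which is proprietary and not described here. The suffix list, the `TRANSLATIONS` table, and the function names are all hypothetical, and the sketch does not attempt to handle misspellings.

```python
import re
import unicodedata

# Hypothetical, deliberately short list of legal-entity suffixes.
# A production list would be much longer and country-specific.
ENTITY_SUFFIXES = {
    "corporation", "corp", "company", "co", "limited", "ltd",
    "incorporated", "inc", "llc", "gmbh", "ag",
}

# Hypothetical stand-in for a translation database:
# native-language name -> English name.
TRANSLATIONS = {
    "住友化学株式会社": "Sumitomo Chemical Co. Ltd.",
}


def normalize_name(raw: str) -> str:
    """Reduce an English applicant name to a comparison key."""
    text = unicodedata.normalize("NFKC", raw).casefold()
    # Replace punctuation with spaces so "Co." and "Co" compare equal.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = text.split()
    # Drop trailing entity-type tokens such as "co ltd" or "inc".
    while tokens and tokens[-1] in ENTITY_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def canonical_name(raw: str) -> str:
    """Translate a native-language name if known, then normalize it."""
    english = TRANSLATIONS.get(raw.strip(), raw)
    return normalize_name(english)
```

With this key, the two Sumitomo Chemical variants and the native-language name all map to the same string, so a search on the English name would also retrieve the untranslated record.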
The original and current owners of a patent are another important pair of data fields for patent analytics, as both are critical for tracking competitors in any technology. They can be identified by tracing all ownership transactions during the term of the patent, along with historical name changes. The XLSCOUT algorithm processes all the legal transactions of a patent, as well as corporate name changes, to provide the current and original owner as enriched data fields.
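One way to picture this is to replay a patent's recorded assignments in date order and then follow any corporate name changes. The sketch below makes several assumptions: the `Assignment` record, the `name_changes` mapping, and the company names are hypothetical, and real legal-event data is considerably messier than this.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assignment:
    """Hypothetical record of one ownership transfer for a patent."""
    recorded: date
    assignee: str


def resolve_owners(
    first_applicant: str,
    assignments: list[Assignment],
    name_changes: dict[str, str],
) -> tuple[str, str]:
    """Return (original_owner, current_owner) for one patent."""
    current = first_applicant
    # Replay transfers chronologically; the last assignee holds the patent.
    for assignment in sorted(assignments, key=lambda a: a.recorded):
        current = assignment.assignee
    # Follow corporate renames (old name -> new name), guarding against cycles.
    seen: set[str] = set()
    while current in name_changes and current not in seen:
        seen.add(current)
        current = name_changes[current]
    return first_applicant, current
```

For example, a patent first filed by "Alpha Labs", later assigned to "Beta Corp", where "Beta Corp" was afterwards renamed "Beta Holdings", resolves to the original owner "Alpha Labs" and the current owner "Beta Holdings".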
Another important data point for patent analytics is family data, which indicates the geographical coverage of the invention. Patent family analysis is mostly based on the extended INPADOC family, which has some drawbacks. Priority numbers and application numbers from different patent offices are in different formats. When the numbers are not normalized, the necessary links between priority and application numbers are not formed. As a result, many patent numbers have missing family information or show no family members at all. The XLSCOUT family overcomes this problem of missing links and provides accurate family information. We have normalized the number formats and created direct and indirect links to identify all the family members of a patent.
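The idea of direct and indirect links can be sketched as a connected-components problem: two publications belong to the same family if they share a normalized application or priority number, either directly or through a chain of other publications. The code below is an illustration under that assumption, using a union-find structure. The `normalize_number` rule is a hypothetical simplification, since real number formats differ by office and by year.

```python
import re
from collections import defaultdict


def normalize_number(raw: str) -> str:
    """Hypothetical normalization: uppercase and drop separators."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def build_families(records):
    """Group publications into families.

    `records` maps a publication id to the application and priority
    numbers listed on it, in whatever format the source office uses.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link each publication to every number it cites. Publications that
    # share a number end up in the same component.
    for publication, numbers in records.items():
        for number in numbers:
            union(("pub", publication), ("num", normalize_number(number)))

    families = defaultdict(set)
    for publication in records:
        families[find(("pub", publication))].add(publication)
    return list(families.values())
```

In this sketch, a record citing "US 14/123,456" and another citing "us14123456" are linked directly once the formats are normalized, and a third record that shares only a different number with the second one joins the same family indirectly.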
Legal status is another critical data field for patent analytics. Legal events are denoted by different titles in different countries. This means that a US patent and a CN patent can undergo the same legal event while the text describing that event differs and can be interpreted in various ways. This makes it very difficult to standardize the legal event data of patents across countries. XLSCOUT provides standardized legal status information as two separate fields. The first is the patent’s simple legal status, and the second is its actual legal state. Together, these fields indicate whether the patent is alive or dead and why. For example, if the simple legal status is dead, then the legal state could be expired, revoked, non-payment, etc.
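A minimal sketch of this two-field scheme is a lookup from a country-specific event title to a standardized legal state, with the simple status derived from that state. The event titles, the mapping table, and the set of alive states below are illustrative assumptions, not XLSCOUT's actual taxonomy. Only the dead states (expired, revoked, non-payment) are taken from the description above.

```python
# Legal states that imply a dead patent, as named in the text above.
DEAD_STATES = {"expired", "revoked", "non-payment"}
# Hypothetical legal states for patents that are still alive.
ALIVE_STATES = {"granted", "pending"}

# Illustrative mapping: (country code, event title) -> legal state.
# Different titles in different countries map to the same state.
EVENT_TO_STATE = {
    ("US", "expired due to failure to pay maintenance fee"): "non-payment",
    ("CN", "termination of patent right due to non-payment of annual fee"): "non-payment",
    ("US", "patent granted"): "granted",
    ("CN", "granted publication date"): "granted",
}


def standardize(country: str, event_title: str) -> tuple[str, str]:
    """Return (simple_legal_status, legal_state) for one legal event."""
    key = (country.strip().upper(), event_title.strip().casefold())
    state = EVENT_TO_STATE.get(key)
    if state in DEAD_STATES:
        return "dead", state
    if state in ALIVE_STATES:
        return "alive", state
    # Unmapped titles are flagged rather than guessed.
    return "unknown", "unknown"
```

Returning "unknown" for unmapped titles is a design choice in this sketch: it keeps unrecognized events visible for review instead of silently classifying them.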
High-quality data is the foundation of accurate data analytics. XLSCOUT recognizes the importance of making informed business decisions from patent data analytics and has therefore built verified, accurate, and enriched data for its customers. This enables users to derive actionable insights from their analytics.