Last updated: 31 August 2022

Noise Filtering


Abstract

Analyzing the patents in a technology domain is harder than it looks. The standard method is to formulate multiple search queries, extract the resulting patent data sets, and then filter the patents manually. Filtering the result set to remove noise is critical for accurate analysis, but it requires a lot of manual effort and consumes a significant amount of time.

With advances in NLP and Machine Learning, much of this manual analysis can be automated. XLSCOUT has developed an algorithm that removes noise from result sets based on what it has learned from previous data.

Introduction

Patents hold an abundance of information about the advancements in a technology domain and about a company’s strategy. Business strategies are decided based on this information, so it is very important to gather and extract what is relevant. Because patents are the source of this information, extracting the correct set of patents for the technology is of paramount importance.

Problem

Manually reading and extracting the relevant patents for a technology domain is time-consuming and laborious. Different researchers reading the same patents can understand the technology concepts differently, which leads to noisy output.

Solution: XLSCOUT Noise Filtering Algorithm

At XLSCOUT, we have combined our experience in patent research with our technical expertise in NLP and Machine Learning to develop a noise filtering algorithm that learns from previous data and removes noise from new data sets.

With this algorithm, we can train a model for a technology domain using previous (historic) data sets of relevant and noise patents in that domain. Once the model is trained, it can be used to predict the relevant patents in future data sets for the same technology domain and reduce the noise in those data sets to a minimum.

Technology

To develop the noise filtering algorithm, we used BERT as the NLP model. We fine-tuned the standard BERT model on patent data so that it can understand the concepts and semantics in patents. The fine-tuned BERT model is then used to transform patent text into a vector representation that a machine can process.
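To make "vector representation" concrete, here is a minimal sketch of one common way to turn per-token encoder output into a single document vector: mean pooling over the non-padding tokens. This is an assumption for illustration only; the post does not say which pooling strategy the XLSCOUT pipeline uses, and the toy numbers below stand in for real BERT output. The names `mean_pool`, `token_vectors`, and `attention_mask` are hypothetical.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the vectors of real (non-padding) tokens into one document vector."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token vector")
    # Keep only positions whose mask value is 1 (real tokens, not padding).
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy stand-in for encoder output: 3 token positions, 2 dimensions each.
# The last position is padding and is ignored by the mask.
token_vectors = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]
doc_vector = mean_pool(token_vectors, attention_mask)
```

With a real encoder, the vectors would have hundreds of dimensions, but the pooling step works the same way.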

The second part of developing the algorithm is training a Machine Learning model. The model is trained on a labeled data set for a domain, that is, a set of patent documents labeled as relevant or not relevant (noise) for that domain. The model creates associations among the patent documents of the relevant set and among those of the not-relevant set. This helps it learn the important concepts in relevant and not-relevant patents and distinguish between the two.
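The idea of learning from labeled vectors can be sketched with a nearest-centroid classifier: each class is summarized by the average of its document vectors, and a new patent gets the label of the closest centroid by cosine similarity. This is a deliberately simple stand-in chosen for illustration, not the model XLSCOUT uses (the post does not name it). The two-dimensional vectors and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a list of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train(vectors: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Group the training vectors by label and store one centroid per label."""
    grouped: Dict[str, List[Vector]] = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in grouped.items()}


def predict(model: Dict[str, List[float]], vector: Vector) -> str:
    """Return the label whose centroid is most similar to the vector."""
    return max(model, key=lambda label: cosine(model[label], vector))


# Toy labeled data set: two "relevant" and two "noise" document vectors.
train_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["relevant", "relevant", "noise", "noise"]
model = train(train_vectors, train_labels)
prediction = predict(model, [0.7, 0.3])
```

In this toy example, the new vector `[0.7, 0.3]` lies closer to the centroid of the relevant set, so it is predicted as relevant.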

Approach

The setup of the noise filtering algorithm has the following steps:

Data Collection
A labeled data set related to a technology domain is curated.

Data Structuring
The data set is split into two parts, training data and test data (usually in the ratio of 80:20). The labels of the test data are withheld from the model and used only to validate its output.

Training of Machine Learning model
The labeled data set is first transformed into a vector representation and then fed to the Machine Learning model for training.

Output Validation and Feedback
Once the model is trained, it is used to predict the relevant patents in the test data. The false predictions are fed back to the model so that it can learn again and refine its understanding. Multiple iterations are done to ensure that the output is correct and that the model captures all the relevant patents in the data set.
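The steps above can be sketched roughly as follows: an 80:20 split, a training step, and a loop that feeds false predictions back into training. This is a minimal sketch under stated assumptions, not the actual XLSCOUT pipeline. The one-dimensional "vectors" and the threshold model `fit_threshold` are hypothetical stand-ins for real document vectors and a real classifier, and all function names are made up for the example.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[List[float], str]          # (document vector, label)
Predictor = Callable[[List[float]], str]   # maps a vector to a predicted label


def split_dataset(examples: List[Example], train_ratio: float = 0.8, seed: int = 0):
    """Shuffle reproducibly, then split into training and test sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(train_set: List[Example]) -> Predictor:
    """Toy model (assumption): threshold halfway between the two class means."""
    relevant = [vec[0] for vec, label in train_set if label == "relevant"]
    noise = [vec[0] for vec, label in train_set if label == "noise"]
    threshold = (sum(relevant) / len(relevant) + sum(noise) / len(noise)) / 2
    return lambda vec: "relevant" if vec[0] >= threshold else "noise"


def false_predictions(predict: Predictor, test_set: List[Example]) -> List[Example]:
    """Compare predictions against the withheld test labels."""
    return [(vec, label) for vec, label in test_set if predict(vec) != label]


def feedback_loop(fit, train_set: List[Example], test_set: List[Example],
                  max_rounds: int = 5) -> Predictor:
    """Retrain with the misclassified examples until none remain or rounds run out."""
    train_set = list(train_set)
    predict = fit(train_set)
    for _ in range(max_rounds):
        errors = false_predictions(predict, test_set)
        if not errors:
            break
        train_set.extend(errors)   # feed the false predictions back
        predict = fit(train_set)
    return predict


# Toy labeled data set: scores below 0.5 are noise, the rest are relevant.
examples = [([i / 10], "relevant" if i >= 5 else "noise") for i in range(10)]
train_set, test_set = split_dataset(examples)
predictor = feedback_loop(fit_threshold, train_set, test_set)
```

One design point worth noting: once test examples have been fed back into training, that test set no longer gives an unbiased estimate of quality, so a fresh held-out set is needed for the final measurement.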

The most important parameter to consider is the accuracy of the algorithm. 100% accuracy would mean that all noise patents are removed and only relevant patents are extracted. No algorithm can be 100% accurate, but our feedback mechanism is tuned so that relevant patents are not excluded from the final predicted data set. The trade-off is that some noise patents (significantly fewer than before) remain in the final data set.
We have set up this algorithm for multiple clients, and they have validated that the noise is reduced by 85-90%.
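The trade-off described here is easier to see with numbers. The sketch below computes recall (share of relevant patents kept), precision (share of kept patents that are relevant), and noise reduction (share of noise patents removed) from a confusion matrix. The counts are hypothetical, chosen only to illustrate the arithmetic, and reading "noise reduction" as removed noise divided by total noise is our assumption about how the figure is defined.

```python
from typing import Dict


def filtering_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Metrics for a relevant-vs-noise filter.

    tp: relevant patents kept      fp: noise patents kept
    tn: noise patents removed      fn: relevant patents removed
    Assumes every denominator is non-zero.
    """
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "noise_reduction": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical result set: 200 relevant and 800 noise patents.
# The filter keeps all 200 relevant patents and lets 100 noise patents through.
metrics = filtering_metrics(tp=200, fp=100, tn=700, fn=0)
```

In this made-up case no relevant patent is lost (recall of 1.0), noise is reduced by 87.5%, and a third of the remaining set is still noise, which is the kind of outcome the paragraph above describes.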

Use Cases

Precise Technology Tracking
The algorithm assists in extracting relevant patents so that the technology domain can be understood precisely.
Precise Competitor Tracking
Competitors’ patents can be extracted and precisely segmented by technology sector to remove noise and get accurate insights into competitor strategies.
Accurate Landscape Insights
Extracting relevant results and reducing noise helps to gain accurate insights from landscape searches.

Conclusion

Purely manual analysis is becoming a thing of the past, and developing trustworthy applications using NLP and Machine Learning is the need of the hour. Automated solutions can assist manual research teams and make their work much easier.
Based on our experience in this field and the feedback we have received, the XLSCOUT Noise Filtering algorithm saves a lot of time and manual effort. The saved time can then be used to innovate and improve the technology.

 
