Last updated: 17 November 2022

Noise Filtering



Analyzing patents related to a technology domain is not as easy as it seems. The standard method is to formulate multiple search queries to extract patent data sets and then filter the patents manually. Filtering the result set to remove noise is critical for accurate analysis, but it requires a lot of manual effort and consumes a significant amount of time. With advancements in NLP and Machine Learning, this manual analysis of patents can be automated. XLSCOUT has developed a Noise Filtering algorithm that removes noise from result sets based on what it has learned from previous data.


Patents hold an abundance of information about the advancements in a technology domain and about a company’s strategy. Business strategies are decided based on this information, so it is very important to gather and extract the relevant information. Because patents are the source of this information, extracting the correct set of patents related to the technology is of paramount importance.


Manually reading and extracting relevant patents related to a technology domain is time-consuming and requires a lot of manual grunt work. Different researchers reading the same patents may understand the technology concepts differently, which leads to noisy output.

Solution: XLSCOUT Noise Filtering Algorithm

At XLSCOUT, we have experience in patent research, and we have combined it with our technical expertise in NLP and Machine Learning to develop a noise filtering algorithm that can learn from previous data and remove noise from new data sets.

With this algorithm, we can train a model for a technology domain using previous (historic) data sets of relevant and noise patents in that domain. The algorithm learns from this data, and once it is ready, it can be used to predict relevant patents in future data sets related to the same technology domain and reduce noise in those data sets to a minimum.


To develop the noise filtering algorithm, we used BERT as the NLP model. We fine-tuned the standard BERT model on patent data so that it can understand the concepts and the semantics in patents. The trained BERT model is then used to transform the patent text into a vector representation that a machine can process.
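To make the idea of "text to vector" concrete, here is a minimal sketch. It does not use BERT and is not XLSCOUT's implementation: a simple hashed bag-of-words encoder stands in for the fine-tuned model so that the example runs with the standard library alone. The function names `embed` and `cosine` and the vector size `DIM` are assumptions made for illustration.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical vector size chosen for this sketch


def embed(text: str, dim: int = DIM) -> list[float]:
    """Map text to a fixed-length, L2-normalised vector (stand-in for a BERT encoder)."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # A stable hash keeps the token-to-bucket mapping identical across runs.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; the inputs are unit length, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))


battery_a = embed("Lithium-ion battery cell with a solid electrolyte layer")
battery_b = embed("Solid electrolyte for a lithium-ion battery cell")
unrelated = embed("Method for brewing coffee using a pressurised capsule")

print(len(battery_a))
print(cosine(battery_a, battery_b) > cosine(battery_a, unrelated))
```

The property that matters downstream is the one shown at the end: texts about the same concept land closer together in vector space than unrelated texts. A hashed encoder only captures shared words, whereas a model fine-tuned on patents is meant to capture meaning as well.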

The second part of developing the algorithm is training a Machine Learning model. The model is trained on a labeled data set for a domain, that is, a set of patent documents labeled as either relevant or non-relevant (noise) for that domain. The model creates associations among the patent documents in the relevant set and among those in the non-relevant set. This allows it to distinguish between relevant and non-relevant patents by learning and identifying the important concepts in each.
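One simple way to picture "associations within each set" is a nearest-centroid classifier: average the vectors of each class, then label a new document by the average it sits closer to. This is an illustrative stand-in, not the model XLSCOUT uses (the post does not name it). The class `CentroidClassifier` and the toy 3-dimensional vectors are assumptions.

```python
import math

Vector = list[float]


def centroid(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class CentroidClassifier:
    """Labels a vector by whichever class centroid it is more similar to."""

    def fit(self, vectors: list[Vector], labels: list[int]) -> "CentroidClassifier":
        # Label 1 = relevant, label 0 = noise.
        self.relevant = centroid([v for v, y in zip(vectors, labels) if y == 1])
        self.noise = centroid([v for v, y in zip(vectors, labels) if y == 0])
        return self

    def score(self, vector: Vector) -> float:
        """Positive when the vector is more similar to the relevant centroid."""
        return cosine(vector, self.relevant) - cosine(vector, self.noise)

    def predict(self, vector: Vector) -> int:
        return 1 if self.score(vector) > 0 else 0


# Toy vectors standing in for patent embeddings.
train_vectors = [
    [0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.7, 0.3, 0.0],  # relevant
    [0.1, 0.9, 0.2], [0.0, 0.8, 0.3], [0.2, 0.7, 0.4],  # noise
]
train_labels = [1, 1, 1, 0, 0, 0]

model = CentroidClassifier().fit(train_vectors, train_labels)
print(model.predict([0.85, 0.15, 0.05]))  # query close to the relevant examples
print(model.predict([0.05, 0.85, 0.30]))  # query close to the noise examples
```

The `score` method returns a continuous value rather than a hard label, which is useful later when deciding how aggressively to filter.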


The setup of the noise filtering algorithm has the following steps:

Data Collection

A labeled dataset related to a technology domain is curated.

Data Structuring

The dataset is split into two parts, training data and test data (usually in an 80:20 ratio). The labels are withheld from the test data so that the predictions can later be validated against them.

Training of Machine Learning model

The labeled dataset is first transformed into a vector representation and then fed to the Machine Learning model for training.

Output Validation and Feedback

Once the model is trained, it is used to predict relevant patents from the test data. False predictions are fed back to the model, allowing it to learn and improve its understanding. Multiple iterations are run to ensure that the output is correct and that the model captures all the relevant patents in the data set.
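The steps above can be sketched end to end on toy data. This is a simplified reading of the workflow, not the production pipeline: a perceptron stands in for the Machine Learning model, two-dimensional random points stand in for patent vectors, and "feedback" is interpreted as adding wrongly predicted test examples to the training data before retraining. All function names and parameters are assumptions.

```python
import random

Example = tuple[list[float], int]  # (vector, label) with 1 = relevant, 0 = noise


def split(data: list[Example], train_ratio: float = 0.8, seed: int = 0):
    """Shuffle and split labeled examples into training and test parts."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def predict(weights: list[float], bias: float, vector: list[float]) -> int:
    return 1 if sum(w * x for w, x in zip(weights, vector)) + bias > 0 else 0


def train(data: list[Example], epochs: int = 50):
    """Perceptron training: weights change only on wrongly predicted examples."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for vector, label in data:
            error = label - predict(weights, bias, vector)  # -1, 0 or +1
            if error:
                weights = [w + error * x for w, x in zip(weights, vector)]
                bias += error
    return weights, bias


def false_predictions(weights, bias, data: list[Example]) -> list[Example]:
    """Predict from the vectors only, then compare against the withheld labels."""
    return [(v, y) for v, y in data if predict(weights, bias, v) != y]


# Toy labeled data: relevant examples cluster around (1, 0), noise around (0, 1).
rng = random.Random(42)
dataset = (
    [([1 + rng.uniform(-0.3, 0.3), rng.uniform(-0.3, 0.3)], 1) for _ in range(50)]
    + [([rng.uniform(-0.3, 0.3), 1 + rng.uniform(-0.3, 0.3)], 0) for _ in range(50)]
)

train_set, test_set = split(dataset)  # 80:20 split
for iteration in range(5):
    weights, bias = train(train_set)
    misses = false_predictions(weights, bias, test_set)
    print(f"iteration {iteration}: {len(misses)} false predictions on test data")
    if not misses:
        break
    train_set = train_set + misses  # feedback: retrain with the corrected examples
```

A perceptron was chosen here because its update rule is driven entirely by false predictions, which mirrors the feedback step in miniature.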

The most important parameter to consider is the accuracy of the algorithm. 100% accuracy would mean that all noise patents are removed and only relevant patents are extracted. No algorithm can be completely accurate. However, our approach is designed to ensure that no relevant patents are excluded from the final predicted data set, at the cost of leaving some noise patents (significantly fewer than before) in it. We have set up this algorithm for multiple clients, and they have validated that the noise is reduced by 85-90%.
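The trade-off described here corresponds to the standard notions of recall (the share of relevant patents that are kept) and precision (the share of kept patents that are relevant). The sketch below shows the arithmetic on made-up scores: the cut-off is placed at the lowest score of any relevant patent so that none is dropped, and the noise reduction is measured afterwards. The scores, labels, and function names are hypothetical and do not come from any client data.

```python
def choose_threshold(scores: list[float], labels: list[int]) -> float:
    """Highest cut-off that still keeps every relevant patent (recall = 1)."""
    return min(s for s, y in zip(scores, labels) if y == 1)


def evaluate(scores: list[float], labels: list[int], threshold: float) -> dict:
    """Keep every patent scoring at or above the threshold and measure the result."""
    kept = [y for s, y in zip(scores, labels) if s >= threshold]
    relevant_total = sum(labels)
    noise_total = len(labels) - relevant_total
    relevant_kept = sum(kept)
    noise_kept = len(kept) - relevant_kept
    return {
        "recall": relevant_kept / relevant_total,
        "precision": relevant_kept / len(kept),
        "noise_reduction": 1 - noise_kept / noise_total,
    }


# Hypothetical model scores (higher = more likely relevant) with their true labels.
scores = [0.95, 0.90, 0.80, 0.62, 0.60,                                   # relevant
          0.70, 0.65, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05]     # noise
labels = [1, 1, 1, 1, 1,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

threshold = choose_threshold(scores, labels)
metrics = evaluate(scores, labels, threshold)
print(threshold)
print(metrics)
```

In practice the threshold would be chosen on labeled validation data and then applied to new result sets, so full recall on unseen patents is a target rather than a guarantee.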

Use Cases

Precise Technology Tracking
The algorithm assists in extracting relevant patents to precisely understand the technology domain.
Precise Competitor Tracking
Competitors’ patents can be extracted and precisely segmented by technology sector to remove noise and get accurate insights into competitor strategies.
Accurate Landscape Insights
Extracting relevant results and reducing noise helps to gain accurate insights from landscape searches.


Manual analysis is a thing of the past, and developing trustworthy applications using NLP and Machine Learning is the need of the hour. Automated solutions can assist manual research teams and make their lives much easier.
With our experience in this field and the feedback that we have received, we have seen that the XLSCOUT Noise Filtering algorithm saves a lot of time and manual effort. The saved time can then be utilized to innovate and improve the technology.


To learn more, get in touch with us.