Last updated: 09 August, 2021

How XLSCOUT’s Proprietary Corpus Is Leading the Way to Explainable AI in R&D

Category: Articles


To make AI more explainable, XLSCOUT came up with a unique approach: a corpus of technical concepts built from more than 3 billion words and 100 GB of pre-processed data. The corpus has been developed using a machine learning model.


IP professionals constantly face the challenge of finding related keywords and semantic equivalents for a particular technical word. Research documents published worldwide use different terminology depending on the country of origin and the preferences of the writer. As a result, a single technical concept appears under many term variations around the world. Rapidly evolving technology also keeps introducing new jargon that did not exist before.


Online dictionaries currently do not cover technical terms well and are mostly limited to everyday English words. This makes locating the semantics of technical words a time-consuming and arduous task.


XLSCOUT Corpus addresses this global problem. It is developed on a dataset comprising:

  1. Research Publication Data
  2. Global Patent Data
  3. Examiner Datasets

The machine learning model is concurrently trained with inputs from researchers across different technological backgrounds, such as electronics, mechanical engineering, computer science, biotech, and more.


XLSCOUT Corpus is a large lexical database of technology terms. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms, each expressing a distinct concept. These sets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be retrieved using the XLSCOUT Corpus weblink. Its structure makes XLSCOUT Corpus a useful tool for computational linguistics and Natural Language Processing.

XLSCOUT Corpus superficially resembles a thesaurus, in that it groups words together based on their meanings. However, there are a few important distinctions. 

First, XLSCOUT Corpus interlinks not just word forms (strings of letters) but specific senses of words. As a result, words that are found in close proximity to one another in the network are semantically disambiguated.

Second, XLSCOUT Corpus labels the semantic relations among words, whereas the groupings of words in a thesaurus do not follow any explicit pattern other than meaning similarity.
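The two distinctions above can be illustrated with a small data structure. The following is a minimal sketch only, not XLSCOUT's actual schema: the class names, sense identifiers, relation label `is_a`, and example terms are all hypothetical. It shows one word form ("cell") mapping to two separate senses, each with its own synonym set, and a relation between senses that carries an explicit label.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Sense:
    """One specific meaning, holding the set of terms that express it."""
    sense_id: str
    gloss: str
    terms: Set[str]
    # Relation label -> ids of the senses it points to (labels are explicit).
    relations: Dict[str, Set[str]] = field(default_factory=dict)


class LexicalNetwork:
    """Hypothetical sense-level network; not the real XLSCOUT Corpus API."""

    def __init__(self):
        self.senses = {}      # sense_id -> Sense
        self.term_index = {}  # word form -> set of sense_ids

    def add_sense(self, sense_id, gloss, terms):
        self.senses[sense_id] = Sense(sense_id, gloss, set(terms))
        for term in terms:
            self.term_index.setdefault(term, set()).add(sense_id)

    def relate(self, source_id, label, target_id):
        self.senses[source_id].relations.setdefault(label, set()).add(target_id)

    def senses_of(self, term):
        """All senses a word form can take (the disambiguation step)."""
        return sorted(self.term_index.get(term, set()))

    def synonyms(self, term, sense_id):
        """Synonyms of a term within one chosen sense only."""
        return sorted(self.senses[sense_id].terms - {term})

    def related(self, sense_id, label):
        """Senses reachable from `sense_id` through a relation with this label."""
        return sorted(self.senses[sense_id].relations.get(label, set()))


# Illustrative entries (assumed, not taken from the real corpus).
network = LexicalNetwork()
network.add_sense("cell.battery", "device that stores electrical energy",
                  ["cell", "battery", "accumulator"])
network.add_sense("cell.biology", "basic structural unit of an organism",
                  ["cell", "biological cell"])
network.add_sense("power_source", "component that supplies electrical power",
                  ["power source", "power supply"])
network.relate("cell.battery", "is_a", "power_source")

print(network.senses_of("cell"))
print(network.synonyms("cell", "cell.battery"))
print(network.related("cell.battery", "is_a"))
```

A plain thesaurus would list "battery" and "biological cell" side by side under "cell". Keying synonyms by sense keeps them apart, and the labeled edge records why two senses are connected.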

Custom Training Option:

XLPAT Corpus is trained on bulk, generic technology data without reference to any particular technology. When the system predicts synonyms, it returns all possible synonyms and relations, which customers might find overwhelming.

To make the results more focused and precise, XLSCOUT Corpus offers the option of custom training the ML models with a bias toward the customer's technologies of interest. This verticalizes the learning of the ML models around those specific technologies. In turn, the system returns more focused synonyms with more accurate inter-relations.

For Example:
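As a rough illustration of the effect described above, the sketch below re-ranks candidate synonyms according to a customer's technology of interest. It is a simplified stand-in and not XLSCOUT's method: real custom training would retrain the model, whereas here the bias is approximated by boosting in-domain candidates and penalising the rest. The candidate terms, scores, domain tags, and the `boost` and `min_score` parameters are all hypothetical.

```python
# Hypothetical generic predictions for the word "cell":
# (candidate synonym, generic score, technology domains it belongs to)
CANDIDATES = [
    ("battery", 0.80, {"electronics"}),
    ("biological cell", 0.78, {"biotech"}),
    ("base station sector", 0.60, {"telecom"}),
    ("accumulator", 0.55, {"electronics", "mechanical"}),
]


def focused_synonyms(candidates, interest_domains=None, boost=0.5, min_score=0.0):
    """Rank candidates, optionally biased toward the domains of interest.

    Without `interest_domains` the generic ranking is returned unchanged.
    With it, in-domain scores are scaled up by (1 + boost) and out-of-domain
    scores are scaled down by (1 - boost) before the threshold is applied.
    """
    ranked = []
    for term, score, domains in candidates:
        if interest_domains is not None:
            factor = 1 + boost if domains & interest_domains else 1 - boost
            score = score * factor
        if score >= min_score:
            ranked.append((term, round(score, 3)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Generic model: every sense of "cell" is returned.
generic = focused_synonyms(CANDIDATES)

# Biased toward electronics: only the relevant synonyms survive the threshold.
focused = focused_synonyms(CANDIDATES, {"electronics"}, min_score=0.7)

print([term for term, _ in generic])
print([term for term, _ in focused])
```

With no bias, all four candidates come back, mixing battery, biology, and telecom senses. With an electronics bias and a threshold, only "battery" and "accumulator" remain.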

Use Cases:

Explainable Taxonomy (Corpus Assisted)
The corpus assists in creating a comprehensive taxonomy that breaks a technology down into clusters.

Explainable Categorization
Rule-based categorization backed by the corpus, with the possibility of training on expert-validated data.

Context Capturing in Novelty & Invalidation Searches
The corpus captures more semantic variations of a term, which leads to better prior art searches.
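To make the use cases above more concrete, here is a minimal sketch of corpus-backed, rule-based categorization that also reports why each category was assigned. Everything in it is assumed for illustration: the `VARIANTS` mapping stands in for term variations supplied by a corpus, and the `RULES` table and category names are hypothetical, not XLSCOUT's actual rules.

```python
import re

# Hypothetical corpus excerpt: canonical concept -> known term variations.
VARIANTS = {
    "battery": ["battery", "accumulator", "electrochemical cell"],
    "wireless": ["wireless", "radio frequency", "over the air"],
}

# Hypothetical rules: category -> concepts that must all appear in the text.
RULES = {
    "Wireless charging": ["battery", "wireless"],
    "Energy storage": ["battery"],
}


def find_variants(text, concept):
    """Return the variations of `concept` that occur in `text` as whole phrases."""
    lowered = text.lower()
    return [variant for variant in VARIANTS[concept]
            if re.search(r"\b" + re.escape(variant) + r"\b", lowered)]


def categorize(text):
    """Assign categories and keep the matched terms as the explanation."""
    results = {}
    for category, concepts in RULES.items():
        evidence = {concept: find_variants(text, concept) for concept in concepts}
        # A rule fires only if every required concept matched at least once.
        if all(evidence.values()):
            results[category] = evidence
    return results


abstract = "An accumulator is recharged over the air using a radio frequency field."
result = categorize(abstract)
for category, evidence in result.items():
    print(category, "<-", evidence)
```

The abstract never uses the words "battery" or "wireless", yet both rules fire because the variations are matched. The returned evidence lists exactly which terms triggered each category, which is what makes the assignment explainable. The same expansion step could feed a prior art search query instead of a rule table.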