
Information Extraction

The amount of digital text is huge and expanding at a rapid pace. With the growth of Web 2.0, online communities are contributing content to the web, causing an explosion of online media. Who are the consumers of this content? How can they benefit from it? A typical web surfer consumes such information to carry out a daily task or for infotainment. Businesses, on the other hand, can build a great deal of intelligence from textual data, whether from the WWW or from an intranet. Information Extraction (IE) technologies help businesses do exactly that.

IE technologies are designed to make sense of huge amounts of content that is relevant to the organization. So what is Information Extraction? It is the process of identifying relevant information, where the criteria for relevance are predefined by the user in the form of a template to be filled. Typically, the template pertains to events or situations and contains slots that denote who did what to whom, when, where, and possibly why, or which event occurred and what its details are. The choice of templates depends on the use case, that is, on what the user is looking for. Whoever defines a template has to anticipate what data will be of interest to the user and define its slots and selection criteria accordingly. If successful, IE delivers the template, filled with the appropriate values, as found in the text(s).
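To make the idea of a template with slots concrete, here is a minimal Python sketch. It is a toy illustration only and not how SETU's engine works: the `AcquisitionTemplate` class, its slot names, the `fill_template` function, and the single regular expression are all assumptions made for this example. A production IE engine would rely on far richer linguistic analysis than one pattern.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcquisitionTemplate:
    """Hypothetical template for one event type; each field is a slot."""
    acquirer: Optional[str] = None   # who
    target: Optional[str] = None     # whom
    date: Optional[str] = None       # when
    location: Optional[str] = None   # where


# Toy selection criteria: capitalized name, a trigger verb, another
# capitalized name, then an optional date and an optional location.
_NAME = r"[A-Z][\w&]*(?: [A-Z][\w&]*)*"
_PATTERN = re.compile(
    rf"(?P<acquirer>{_NAME}) (?:acquired|bought) (?P<target>{_NAME})"
    r"(?: on (?P<date>\d{1,2} [A-Z][a-z]+ \d{4}))?"
    r"(?: in (?P<location>[A-Z][a-z]+(?: [A-Z][a-z]+)*))?"
)


def fill_template(text: str) -> Optional[AcquisitionTemplate]:
    """Return a filled template, or None if the text has no such event."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    # Slots that the text does not mention stay None.
    return AcquisitionTemplate(**match.groupdict())


if __name__ == "__main__":
    print(fill_template("Acme Corp acquired Widget Inc on 12 March 2010 in London."))
```

Slots the text does not mention are left empty rather than guessed, which mirrors the point above: the template is filled only with values found in the text.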


For example, you may want to keep track of all the news about your business area and automatically alert the right people in your organization. If you are a stock trading firm, you may be interested in specific events such as mergers, acquisitions, and product launches. If you are a pharmaceutical company, you may be interested in new drug launches, IPR issues, and legal battles in your space. Such use cases can be found in every industry these days, since most industries are knowledge driven. IE engines can help address them by providing structured information from large amounts of periodic data, such as news or blog feeds.

Simply put, an IE engine helps you generate structured, unambiguous information that is ready for machine processing from unstructured, ambiguous text written in human language. SETU's IE engine does exactly this: it fills templates automatically and identifies concepts and events, along with their associated details, in the given text. Once such templates are filled at a very large scale (millions of concepts), one can say the IE engine enables Web 3.0.

SETU's IE engine can currently tag millions of concepts and make their metadata (ontological information) available for further system building. This ontological information can be used to link the tagged data into your organizational ontology, so our engine fits very well with any Web 3.0 initiative in your organization. The engine performs all of this deep text processing in a fraction of a second for an average article. (Note: if you have specific performance requirements, please talk to us. We may have the right solution for you.) What's more, we tag thousands of classes of entities (out-of-vocabulary words) and also handle co-references and entity spelling variations in text. If you want to explore our product further, go for the demo.
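As a rough illustration of what handling entity spelling variations can involve, the Python sketch below groups surface forms of a company name under one normalized key. It is a simplified assumption for this page, not a description of SETU's engine: the function names, the suffix list, and the normalization rules are hypothetical, and a real system would need much more than string cleanup (for example, context and ontology lookups).

```python
import re
import unicodedata

# Hypothetical list of corporate suffixes to ignore when comparing names.
_SUFFIXES = {"inc", "corp", "corporation", "ltd", "limited", "co", "company"}


def normalize_entity(name: str) -> str:
    """Reduce a mention to a comparison key: no accents, case, or suffixes."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Keep only lowercase alphanumeric tokens, discarding punctuation.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Strip trailing corporate suffixes, but never the last remaining token.
    while len(tokens) > 1 and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_variants(mentions):
    """Map each normalized key to the list of mentions that share it."""
    groups = {}
    for mention in mentions:
        groups.setdefault(normalize_entity(mention), []).append(mention)
    return groups


if __name__ == "__main__":
    print(group_variants(["Acme Corp.", "ACME Corporation", "Acmé, Inc.", "Widget Ltd"]))
```

Once variants share a key, the key can serve as the handle that links every mention to a single entry in an organizational ontology.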