Our OneCalais solutions use Natural Language Processing (NLP), text analytics and data mining technologies to derive meaning from unstructured information, including news articles, blog posts, research reports and more.
Here's how it works:
OneCalais categorizes each piece of content using both IPTC news codes and ‘social tags.’ (For instance, if a story compares the racing performance of Ferraris vs. Porsches, it will suggest tags such as auto racing, motorsport and sports cars.)
It then identifies and tags the people, places, companies, facts and events in the content, and returns those tags in Resource Description Framework (RDF), the W3C's Semantic Web standard for metadata.
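To give a feel for what working with RDF-formatted tags can look like, here is a minimal Python sketch that reads entity tags out of a small RDF/XML document. It is an illustration only: the `http://example.org/tagging#` vocabulary, the entity URIs and the sample document are invented for this example and are not the actual OneCalais output format.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
EX_NS = "http://example.org/tagging#"  # hypothetical vocabulary, not OneCalais's

# Hypothetical RDF/XML for a story that mentions two companies.
SAMPLE_RDF = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="http://example.org/entity/ferrari">
    <rdf:type rdf:resource="{EX_NS}Company"/>
    <ex:name>Ferrari</ex:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/entity/porsche">
    <rdf:type rdf:resource="{EX_NS}Company"/>
    <ex:name>Porsche</ex:name>
  </rdf:Description>
</rdf:RDF>
"""


def extract_entities(rdf_xml):
    """Return (entity_type, name) pairs for each rdf:Description element."""
    root = ET.fromstring(rdf_xml)
    entities = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        type_el = desc.find(f"{{{RDF_NS}}}type")
        name_el = desc.find(f"{{{EX_NS}}}name")
        if type_el is None or name_el is None:
            continue  # skip descriptions that are not typed, named entities
        type_uri = type_el.get(f"{{{RDF_NS}}}resource", "")
        # Keep only the fragment after '#', e.g. "Company".
        entities.append((type_uri.rsplit("#", 1)[-1], name_el.text))
    return entities


if __name__ == "__main__":
    for entity_type, name in extract_entities(SAMPLE_RDF):
        print(f"{entity_type}: {name}")
```

Because RDF expresses each tag as a typed resource with its own URI, a consumer can filter by entity type or join tags across documents without parsing free text.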
It also returns a unique document identifier that makes it easy to share content with others, as well as links to related assets in the Linking Open Data (LOD) cloud, a rapidly growing ecosystem of open data that includes Wikipedia, the CIA World Factbook, GeoNames, BBC News, The New York Times and more.
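As a rough sketch of how links into the LOD cloud might be consumed, the Python snippet below groups an entity's outbound links by the dataset host they point to. It assumes the links are expressed with the standard `owl:sameAs` property, which the text above does not confirm; the entity URIs, the `name` predicate and the target URIs (including the placeholder GeoNames ID) are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # standard OWL property

# Hypothetical (subject, predicate, object) triples; URIs are illustrative only.
TRIPLES = [
    ("http://example.org/entity/ferrari", OWL_SAME_AS,
     "https://en.wikipedia.org/wiki/Ferrari"),
    ("http://example.org/entity/ferrari",
     "http://example.org/tagging#name", "Ferrari"),
    ("http://example.org/entity/maranello", OWL_SAME_AS,
     "http://sws.geonames.org/0000000/"),  # placeholder ID, not a real record
]


def linked_data_hosts(triples):
    """Map each entity URI to the hosts of the datasets it links to."""
    links = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            links[subject].append(urlparse(obj).netloc)
    return dict(links)


if __name__ == "__main__":
    for entity, hosts in linked_data_hosts(TRIPLES).items():
        print(entity, "->", ", ".join(hosts))
```

Grouping by host is just one convenient view; the same triples could equally be followed one by one to pull in descriptions from each linked dataset.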
This linked-data aspect of OneCalais makes Thomson Reuters one of the first media companies to publish a set of data assets for public use, providing developers with open access to information on publicly traded companies, including company descriptions, stock tickers, management teams and more.