Automated analysis of text - Isis Project No 3025
A new linguistic tool efficiently identifies the grammatical context and meaning of words, which dramatically improves automated language analysis.
MARKETING OPPORTUNITY
The use of computers to “understand” human languages lies at the heart of computational linguistics. Just as computers perform mathematical calculations faster and more accurately than humans, intelligent computational analysis could speed up the interpretation and extraction of information from vast amounts of text (e.g. the World Wide Web, corporate databases). Language processing tools have an enormous range of applications, including translation, spam filtering, document comparison (e.g. patent prior-art searches), marketing (opinion mining), next-generation search engines (beyond keyword search), customer relationship management (automated handling of customer queries), and more.
A key first step in “understanding” any text is to assign a grammatical label to each word, a task known as part-of-speech (POS) tagging. Once the text has been tagged, other tools can process the sequence of tags to determine the grammatical structure. Tagging accuracy is critical to the success of components further down the language processing pipeline. Taggers can also assign semantic labels to words, for example marking whether a word is part of a sequence denoting a “named entity” such as a person, location or organisation.
THE OXFORD INVENTION
The Oxford researchers have developed an accurate and highly efficient tagger using state-of-the-art statistical techniques. The tagger works by taking a large collection of manually tagged text and learning the contexts in which particular tags occur. It can then generalise and assign labels to words in sentences it has never seen before. The labels can indicate grammatical type, such as whether the word is a particular kind of verb or noun, or more “meaning-related” information, such as whether the word refers to a person or location.
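The general idea of learning tags from manually tagged text can be illustrated with a toy example. The sketch below is not the Oxford tagger, whose statistical model is not described here; it is a simple count-based baseline. The corpus, the tag names, and the functions `train` and `tag` are all invented for illustration.

```python
from collections import Counter, defaultdict

# Tiny hand-tagged corpus, invented for this example.
TRAINING = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("run", "NOUN"), ("helps", "VERB")],
    [("they", "PRON"), ("run", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]

START = "<s>"  # pseudo-tag standing in for "beginning of sentence"


def train(corpus):
    """Count which tags occur in which contexts."""
    by_context = defaultdict(Counter)  # (previous tag, word) -> tag counts
    by_word = defaultdict(Counter)     # word -> tag counts
    by_prev = defaultdict(Counter)     # previous tag -> tag counts
    for sentence in corpus:
        prev = START
        for word, tag_label in sentence:
            by_context[(prev, word)][tag_label] += 1
            by_word[word][tag_label] += 1
            by_prev[prev][tag_label] += 1
            prev = tag_label
    return by_context, by_word, by_prev


def tag(words, model):
    """Tag left to right, backing off from specific to general evidence."""
    by_context, by_word, by_prev = model
    prev, output = START, []
    for word in words:
        candidates = (
            by_context.get((prev, word)),  # word seen after this tag
            by_word.get(word),             # word seen anywhere
            by_prev.get(prev),             # unseen word: use previous tag only
        )
        for counts in candidates:
            if counts:
                best = counts.most_common(1)[0][0]
                break
        else:
            best = "NOUN"  # arbitrary last-resort default
        output.append((word, best))
        prev = best
    return output


model = train(TRAINING)
print(tag(["the", "run", "sleeps"], model))  # "run" after a determiner
print(tag(["they", "run"], model))           # "run" after a pronoun
print(tag(["the", "bird", "runs"], model))   # "bird" never appears in TRAINING
```

In this toy corpus the word "run" receives a different tag depending on the tag before it, and the unseen word "bird" is labelled from its context alone. Real taggers use far larger corpora and richer statistical models.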
Used in conjunction with other downstream language processing tools, the Oxford tagger has demonstrated spectacular increases in speed and accuracy. For example, the Oxford language processing tools have successfully analysed one billion words of text in less than five days using only 18 processors, an order-of-magnitude improvement on existing processing tools. The Oxford tagger opens the door to information processing on an unprecedented scale.
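To put the quoted figures in perspective, the short calculation below converts them into words per second. It assumes the full five days were used and that the work was shared equally across the 18 processors; neither detail is stated, so the results are best read as rough lower bounds.

```python
WORDS = 1_000_000_000   # one billion words, as quoted
MAX_DAYS = 5            # "less than five days", so this is an upper bound on time
PROCESSORS = 18

seconds = MAX_DAYS * 24 * 60 * 60      # 432,000 seconds in five days
aggregate = WORDS / seconds            # words per second across all processors
per_processor = aggregate / PROCESSORS # assumes an equal share per processor

print(f"{aggregate:.0f} words/s overall, {per_processor:.0f} words/s per processor")
# prints "2315 words/s overall, 129 words/s per processor"
```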
COMMERCIAL OPPORTUNITY
Isis would like to talk to companies interested in developing the commercial opportunity that this technology represents. Please contact the Isis Project Manager to discuss this further.

