Text processing

Overview

Intelligent text processing is a major area of interest in natural language processing. Our research focuses on developing new technologies to understand, recover, and analyse documents in printed form, speech form, and electronic ink. We are working on machine-learning methods for processing natural language, including methods for question answering and grammar checking. Currently we are concentrating on Asian language processing, information retrieval, and question answering. Special emphasis is given to Chinese-language projects.

Our research focuses on the development of more advanced and intelligent computer systems through the exploitation of statistical methods in machine learning and computer vision.

Statistical models and machine learning for information retrieval

We seek to develop and refine models that guide the design of information retrieval systems, and to implement and evaluate information retrieval functions. Our major focus is on statistical models and experimentation. The models include basic probabilistic models for retrieval, models that address characteristics of language or of users, and statistical machine learning models. We make extensive use of large experimental data sets for this work; experiments may involve, for example, topic-specific learning methods such as relevance feedback, or cross-topic learning.
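As an illustration of what a basic probabilistic retrieval function can look like, the following is a minimal sketch of BM25 ranking. BM25 is chosen here only as a widely known example; the text above does not name a specific model, and the function name `bm25_scores`, the parameter defaults, and the toy documents are all assumptions made for this example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy collection (hypothetical data, not from any experimental data set).
docs = [
    "statistical models for information retrieval".split(),
    "chinese text classification software".split(),
    "relevance feedback improves retrieval".split(),
]
scores = bm25_scores("information retrieval".split(), docs)
```

In this toy collection the first document matches both query terms and ranks highest, the third matches one term, and the second matches none and scores zero.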

Text Classification and Text Clustering for Knowledge Management

We have created technology that helps site administrators build and maintain category hierarchies for documents. The text-classification component of the system automatically assigns or suggests category labels to new (unlabeled) documents, based on the word content of the documents. The text-clustering component suggests a hierarchically organized set of categories when no such structure exists. Applications of text classification include junk-mail detection, auto-classification of email into folders, and auto-classification of URLs into favorites. These techniques have been integrated into our Chinese Text Classification Software.
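To make the idea of assigning labels from word content concrete, here is a minimal sketch of a multinomial naive Bayes classifier applied to junk-mail detection. This is an assumed stand-in technique, not a description of the classifier used in the software mentioned above; the function names `train_naive_bayes` and `classify` and the training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in class_counts.items():
        logp = math.log(n / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for t in tokens:
            if t not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing over the vocabulary.
            logp += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical training messages for illustration only.
training = [
    ("win free prize now".split(), "junk"),
    ("free offer click now".split(), "junk"),
    ("meeting agenda for monday".split(), "inbox"),
    ("project report draft attached".split(), "inbox"),
]
model = train_naive_bayes(training)
label_a = classify("free prize offer".split(), model)
label_b = classify("monday project meeting".split(), model)
```

With these four training messages, `label_a` comes out as `"junk"` and `label_b` as `"inbox"`. A production system for Chinese text would also need word segmentation before counting, which this sketch leaves out.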

Interactive Information Tracking

We recognize the importance of the user's ability to discover ideas expressed through concepts, and relationships among concepts, in text and multimedia documents. We are interested in designing, implementing, and evaluating techniques that will facilitate analysis and assimilation of information by the user. We intend to combine advanced linguistic analyses and knowledge resources with statistical and learning methods to create prototype tools.

Text Summarization

Text summarization is one of the key technologies for addressing the information overload problem. With the explosion of the WWW, managing unstructured information is a big challenge. The goal of text summarization is to take an information source as input, extract content from it, and present the most important content to the user in a condensed form and in a manner sensitive to the user's or application's needs.
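The extract-and-condense goal described above can be sketched with a simple frequency-based extractive summarizer. This is only one assumed approach among many, not the method used in this project: it scores each sentence by the average corpus frequency of its non-stopword terms and keeps the top-scoring sentences in their original order. The name `summarize`, the small stopword list, and the sample text are hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "for", "on", "with"}

def content_words(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average term frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)

text = (
    "Summarization condenses documents. "
    "Summarization systems select important sentences from documents. "
    "The weather was pleasant."
)
one_sentence = summarize(text, max_sentences=1)
two_sentences = summarize(text, max_sentences=2)
```

Sensitivity to a user's or application's needs, as mentioned above, could be approximated by weighting query terms more heavily in `score`, though that extension is not shown here.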

Opinion Analysis and Mining

Patent Mining

Question-Answer System

Cross-Language Information Tracking

Spoken Document Retrieval

Text Filtering

Web Information Retrieval

Image Retrieval





 
Please send your questions and comments to <nlplab@ics.neu.edu.cn>.
Last Update: 29 March, 2008