Text processing
Overview
Intelligent text processing is a major interest in Natural Language
Processing. Our research focuses on developing new technologies to
understand, recover, and analyse documents in printed form, speech
form, and electronic ink. We are working on machine-learning
methods for processing natural language, including methods for
question answering and grammar checking. Currently we are concentrating
on Asian language processing, information
retrieval and question answering.
Special emphasis is given to Chinese language related projects.
Our research focuses on the development of more advanced and intelligent
computer systems through the exploitation of statistical methods
in machine learning and computer vision.
Statistical models and machine learning for information retrieval
We seek to develop and refine models to guide the design of information
retrieval systems and implement and evaluate information retrieval
functions. Our major focus is on statistical models and experimentation.
The models include basic probabilistic models for retrieval, models
that address characteristics of language or of users, and statistical
machine learning models. We make extensive use of large experimental
data sets for this work; experiments may involve, for example, topic-specific
learning methods such as relevance feedback, or cross-topic learning.
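As a concrete illustration of a basic probabilistic retrieval model, the sketch below scores documents against a query with Okapi BM25. BM25 is chosen here only as a well-known example; the text does not say which models the group uses. The function name bm25_scores, the parameter values k1 and b, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection, tokenized by whitespace.
docs = [
    "statistical models for information retrieval".split(),
    "text summarization condenses long documents".split(),
    "probabilistic retrieval models rank documents".split(),
]
scores = bm25_scores("retrieval models".split(), docs)
# Document indices ordered from best to worst match.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The second document shares no terms with the query, so it receives a score of zero and ranks last. A relevance-feedback experiment could build on this by re-weighting query terms from documents the user marks as relevant.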
Text Classification and Text Clustering for Knowledge Management
We have created technology that helps site administrators build
and maintain category hierarchies for documents. The text-classification
component of the system automatically assigns or suggests category
labels to new (unlabeled) documents, based on the word content of
the documents. The text-clustering component suggests a hierarchically
organized set of categories when no such structure exists. Applications
of text classification include junk-mail detection, auto-classification
of email into folders, and auto-classification of URLs into favorites.
These techniques have been integrated into our Chinese Text Classification
Software.
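To make the idea of assigning category labels from word content concrete, here is a minimal sketch of a multinomial Naive Bayes classifier applied to a toy junk-mail example. It is not the method used in the software described above, which the text does not specify; the class name NaiveBayesTextClassifier, the labels, and the training messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior of the class plus log likelihood of each known word.
            logp = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical training data for junk-mail detection.
train_texts = [
    "win a free prize now",
    "cheap pills free offer",
    "meeting agenda for monday",
    "project report attached for review",
]
train_labels = ["junk", "junk", "normal", "normal"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
junk_guess = clf.predict("free prize offer")
normal_guess = clf.predict("monday project meeting")
```

The sketch splits on whitespace, which is adequate for English examples only. Chinese text would first need word segmentation, a step omitted here.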
Interactive Information Tracking
We recognize the importance of the user's ability to discover ideas
expressed through concepts and relationships among concepts in the
text and multi-media documents. We are interested in designing,
implementing, and evaluating techniques that will facilitate analysis
and assimilation of information by the user. We intend to combine
advanced linguistic analyses and knowledge resources with statistical
and learning methods to create prototype tools.
Text Summarization
Text summarization is one of the key technologies for addressing the information
overload problem. With the explosion of the WWW, managing
unstructured information has become a major challenge. The goal of text
summarization is to take an information source as input, extract
content from it, and present the most important content to the user
in a condensed form and in a manner sensitive to the user's or application's
needs.
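The pipeline described above (take a source, extract content, present the most important part in condensed form) can be sketched with a simple frequency-based extractive summarizer. This is one simple approach among many and is not claimed to be the group's method; the function name summarize, the small stopword list, and the sample text are assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "was"}


def content_words(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        # Average corpus frequency of the sentence's content words.
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]),
                 reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(top)]


sample = (
    "Text summarization condenses documents. "
    "Summarization systems score sentences in documents. "
    "The weather was pleasant yesterday."
)
summary = summarize(sample, max_sentences=2)
```

On this sample the off-topic third sentence is dropped, because its words occur only once in the text. Sensitivity to a user's needs could be approximated by boosting sentences that share words with a user query, which this sketch does not do.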
Opinion Analysis and Mining
Patent Mining
Question-Answer System
Cross-Language Information Tracking
Spoken Document Retrieval
Text Filtering
Web Information Retrieval
Image Retrieval