NiuTrans: A Statistical Machine Translation System – Version 0.3.0

Introduction
NiuTrans is an open-source statistical machine translation system developed by the Natural Language Processing Group at Northeastern University, China. It is written entirely in C++, so it runs fast and uses relatively little memory. It currently supports phrase-based and hierarchical phrase-based models for research-oriented studies, and syntax-based (tree-to-string) models are coming soon.


Features
1. Written in C++, so it runs fast.
2. Multi-thread support
3. Easy-to-use APIs for feature engineering
4. Competitive performance for Chinese-foreign translation tasks
5. A compact but efficient embedded n-gram language model, so no external software (such as SRILM) is required
6. Supports multiple SMT models
   a) Phrase-based model
   b) Hierarchical phrase-based model
   c) Syntax-based models (coming soon)
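The feature list does not say how the embedded n-gram language model computes probabilities. As an illustration only, the C++ sketch below shows the standard ARPA-style backoff scoring that n-gram language models commonly use. `BackoffLM`, `NGramEntry`, and the `kUnknownLogProb` penalty are hypothetical names, not part of NiuTrans, and the system's own compact data structures are likely to differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical log10 penalty used when even the unigram is unknown.
constexpr double kUnknownLogProb = -99.0;

// One ARPA-style entry: log10 probability and log10 backoff weight.
struct NGramEntry {
    double logProb;     // log10 P(w_n | w_1 .. w_{n-1})
    double logBackoff;  // log10 backoff weight when this n-gram is a history
};

class BackoffLM {
public:
    // Registers an n-gram given as space-separated words, e.g. "the cat".
    void add(const std::string& ngram, double logProb, double logBackoff = 0.0) {
        assert(!ngram.empty() && "n-gram must not be empty");
        table_[ngram] = NGramEntry{logProb, logBackoff};
    }

    // Returns log10 P(word | context) with standard backoff: use the longest
    // matching n-gram, adding the backoff weight of every history that had
    // to be shortened on the way down.
    double score(std::vector<std::string> context, const std::string& word) const {
        double backoffSum = 0.0;
        while (true) {
            std::vector<std::string> ngram = context;
            ngram.push_back(word);
            auto it = table_.find(join(ngram));
            if (it != table_.end()) return backoffSum + it->second.logProb;
            if (context.empty()) return backoffSum + kUnknownLogProb;
            // Unseen n-gram: add the history's backoff weight (0 if the
            // history itself is unseen), then drop its oldest word.
            auto hist = table_.find(join(context));
            if (hist != table_.end()) backoffSum += hist->second.logBackoff;
            context.erase(context.begin());
        }
    }

private:
    // Joins words with single spaces to form the lookup key.
    static std::string join(const std::vector<std::string>& words) {
        std::string out;
        for (std::size_t i = 0; i < words.size(); ++i) {
            if (i > 0) out += ' ';
            out += words[i];
        }
        return out;
    }

    std::map<std::string, NGramEntry> table_;
};
```

For example, after adding the unigrams "the" and "cat" and the bigram "the cat", `score({"the"}, "cat")` returns the bigram's log probability directly, while `score({"cat"}, "the")` falls back to the unigram "the" plus the backoff weight of "cat". A `std::map` keyed by joined strings keeps the sketch short; a memory-conscious implementation would more plausibly use integer word IDs with a trie or sorted arrays.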


Download
The system is open-source and available under the GNU General Public License (see more information about the GPL here).
To download its source code and the sample data, please click here.


Requirements
For Windows users, Visual Studio 2008, Cygwin, and Perl (version 5.10.0 or higher) are required. We recommend installing Cygwin under its default location on "C:\".

For Linux users, gcc (version 4.1.2 or higher), g++ (version 4.1.2 or higher), GNU Make (version 3.81 or higher) and Perl (version 5.8.8 or higher) are required.

NOTE: Running the system requires at least 2GB of memory and 10GB of disk space. More memory and disk space help when training the system on a large-scale corpus. To support large data and models (such as an n-gram LM), a 64-bit OS is recommended.


Installation
Please unpack the downloaded package (suppose that the target directory is "NiuTrans") and follow the instructions below to install the system.

For Windows users,
   - open "NiuTrans.sln" in "NiuTrans\src\"
   - set configuration mode to "Release"
   - set platform mode to "Win32" (for 32bit OS) or "x64" (for 64bit OS)
   - build the whole solution
 You will then find that all binaries are generated in "NiuTrans\bin\".

For Linux users,
   - cd NiuTrans/src/
   - chmod a+x install.sh
   - ./install.sh -m32 (for 32bit OS) or ./install.sh (for 64bit OS)
   - source ~/.bashrc
 You will then find that all binaries are generated in "NiuTrans/bin/".


Step-by-Step Usage

  • NiuTrans.Phrase: A phrase-based SMT system. It can be regarded as an instance of the general framework of phrase-based translation. It includes two reordering models: a Maximum Entropy-based reordering model and an MSD lexicalized reordering model.
    Click here to learn how to use NiuTrans.Phrase!
    If you still use version 0.1.0 or version 0.2.0, please click here to see the old usage page!

  • NiuTrans.Hierarchy: A hierarchical phrase-based SMT system that adopts Synchronous Context-Free Grammars (SCFGs) for both grammar induction and decoding.
    Click here to learn how to use NiuTrans.Hierarchy!

  • NiuTrans.Syntax (coming soon): A syntax-based SMT system that uses syntactic information on the source-language side, the target-language side, or both.
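The MSD lexicalized reordering model mentioned for NiuTrans.Phrase labels each phrase as monotone (M), swap (S), or discontinuous (D) relative to the previously translated phrase. The C++ sketch below illustrates that classification using only the source-side spans of phrases listed in target order. `Span`, `classifyMSD`, and `orientationsOf` are hypothetical names, not NiuTrans APIs, and the real system may instead determine orientations from word alignments during rule extraction.

```cpp
#include <cassert>
#include <vector>

// Orientation of a phrase relative to the previously translated phrase.
enum class Orientation { Monotone, Swap, Discontinuous };

// Source-side span [start, end] of a phrase (0-based, inclusive).
struct Span {
    int start;
    int end;
};

// Monotone: the current source phrase immediately follows the previous one.
// Swap: it immediately precedes the previous one.
// Discontinuous: any other position.
Orientation classifyMSD(const Span& prev, const Span& cur) {
    assert(cur.start <= cur.end && "span must be non-empty");
    if (cur.start == prev.end + 1) return Orientation::Monotone;
    if (cur.end + 1 == prev.start) return Orientation::Swap;
    return Orientation::Discontinuous;
}

// Classifies every phrase of a derivation, given in target-side order.
// A virtual phrase ending at position -1 marks the sentence start, so a
// first phrase that begins at source position 0 counts as monotone.
std::vector<Orientation> orientationsOf(const std::vector<Span>& phrases) {
    std::vector<Orientation> result;
    Span prev{-1, -1};
    for (const Span& cur : phrases) {
        result.push_back(classifyMSD(prev, cur));
        prev = cur;
    }
    return result;
}
```

For the source spans [0,1], [4,5], [2,3] translated in that order, the orientations are monotone, discontinuous, and swap. A lexicalized reordering model would then estimate, for each phrase pair, how often each orientation occurs in the training data.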


Manual
The package also includes a manual that describes the system in more detail, along with various tricks for building better MT engines with NiuTrans. (The current version covers only the phrase-based engine; full documentation for all the engines will be released soon.) Click here to download the manual in PDF.

Advanced Usage
A more detailed description of various settings can be found here. In general, these advanced features can further improve the BLEU score. We hope it helps!


Team Members
Tong Xiao (Co-PI)
Jingbo Zhu (Co-PI)
Hao Zhang
Qiang Li
Rushan Chen
Shujie Yao
Muhua Zhu
Feiliang Ren
Ji Ma


How To Cite NiuTrans
If you use NiuTrans in your research and would like to acknowledge this project, please cite the following paper:
Tong Xiao, Jingbo Zhu, Hao Zhang and Qiang Li. 2012. NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation. In Proc. of ACL, demonstration session (to appear).


Get Support
For any questions about NiuTrans, please e-mail us directly at niutrans@mail.neu.edu.cn.


History
NiuTrans version 0.3.0 - April 27, 2012 (hierarchical phrase-based model is supported)
NiuTrans version 0.2.0 - October 29, 2011 (bug-fixing, 32bit OS supported)
NiuTrans version 0.1.0 - July 5, 2011 (first version)


Acknowledgements
This project is supported in part by the National Science Foundation of China (60873091; 61073140), Specialized Research Fund for the Doctoral Program of Higher Education (20100042110031), and the Fundamental Research Funds for the Central Universities.
