Inactive
Analyzed about 7 hours ago, based on code collected about 7 hours ago.

Project Summary

This site offers a set of Bash scripts and Windows executable add-ins that, together, create a basic translation chain prototype capable of processing very large corpora. It uses Moses, a widely known statistical machine translation system.

The idea is to help build a translation chain for the real world, but it should also enable a quick evaluation of Moses for actual translation work and guide users in their first steps of using Moses. The scripts cover the installation, the creation of representative test files, the training, the translation, the scoring and the transfer of trainings between persons or between several Moses installations.

A Help/Short Tutorial (http://moses-for-mere-mortals.googlecode.com/files/Help.odt) and a demonstration corpus are available. The corpus is too small to do justice to the quality of results that can be achieved with Moses, but it gives a realistic view of the relative duration of the steps involved.

Two Windows add-ins allow the creation of Moses input files from *.TMX translation memories (Extract_TMX_Corpus.exe), as well as the creation of *.TMX files from Moses output files (Moses2TMX.exe). A synergy between machine translation and translation memories is therefore created.
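To illustrate the kind of conversion the first add-in performs, here is a minimal Python sketch that pulls sentence pairs out of a TMX document. It is not the code of Extract_TMX_Corpus.exe, whose internals are not described here. The function name `tmx_to_pairs` is hypothetical, and the sketch assumes that language codes match exactly (ignoring case) and that each translation unit holds at most one segment per language.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) sentence pairs found in a TMX document.

    Hypothetical helper: language codes are compared exactly, ignoring case,
    so "en" does not match "en-US".
    """
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.iter("tuv"):
            # Older TMX files use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Drop inline markup and collapse whitespace so that each
                # segment fits on a single line.
                segments[lang] = " ".join("".join(seg.itertext()).split())
        # Keep only units that have non-empty text in both languages.
        if segments.get(src_lang) and segments.get(tgt_lang):
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs
```

Writing the first element of each pair to one file and the second to another, one segment per line and in UTF-8, gives two files with the same number of lines, which is the line-aligned layout that parallel corpus files normally use.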

The scripts were tested in Ubuntu 9.04 (64-bit version). Documents used for corpora training should be perfectly aligned and saved in UTF-8 character encoding. Documents to be translated should also be in UTF-8 format. One would expect the users of these scripts, perhaps after having tried the provided demonstration corpus, to immediately use and get results with the real corpora they are interested in.
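The two file requirements above can be partly checked before training. The sketch below is not part of the project's scripts, and the function name `check_parallel` is hypothetical. It verifies that both files decode as UTF-8, that they have the same number of lines, and that no line is empty on either side. Matching line counts are only a necessary condition: the check cannot tell whether line N of one file really translates line N of the other.

```python
def _lines(text):
    """Split on newlines only, ignoring a single trailing newline."""
    if text.endswith("\n"):
        text = text[:-1]
    return text.split("\n") if text else []


def check_parallel(src_bytes, tgt_bytes):
    """Return a list of problems found in a pair of corpus files.

    Hypothetical helper: takes the raw bytes of the source and target files
    and returns an empty list when no problem is detected.
    """
    problems = []
    texts = []
    for name, data in (("source", src_bytes), ("target", tgt_bytes)):
        try:
            texts.append(data.decode("utf-8"))
        except UnicodeDecodeError as exc:
            problems.append(f"{name}: invalid UTF-8 at byte {exc.start}")
    if problems:
        return problems

    src_lines, tgt_lines = _lines(texts[0]), _lines(texts[1])
    if len(src_lines) != len(tgt_lines):
        return [f"line count mismatch: {len(src_lines)} vs {len(tgt_lines)}"]

    # An empty line on either side usually points to a misaligned pair.
    for number, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        if not src.strip() or not tgt.strip():
            problems.append(f"line {number}: empty segment")
    return problems
```

A typical use would read both files in binary mode, for example `check_parallel(open(src_path, "rb").read(), open(tgt_path, "rb").read())`, and stop before training if the returned list is not empty.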

Though already tested and used in actual work, this should be considered a work in progress. So as to protect the users not yet completely acquainted with Moses, these scripts try to avoid mistakes that would cost them dearly in terms of time and/or results, but do not completely insulate them (especially from the consequences of malformed corpora files).

Tags

bash corpora irstlm mgiza moses mt nlp python randlm scripts smt tmx

Quick Reference

GNU General Public License v3.0 or later
Permitted: Commercial Use, Modify, Distribute, Place Warranty, Use Patent Claims

Forbidden: Sub-License, Hold Liable

Required: Distribute Original, Disclose Source, Include Copyright, State Changes, Include License, Include Install Instructions

These details are provided for information only. Nothing here is legal advice, and it should not be used as such.

Project Security

Vulnerabilities per Version (last 10 releases)

There are no reported vulnerabilities


Languages

Perl: 51%
Shell script: 38%
7 other languages: 11%
