Inactive

Project Summary

This site offers a set of Bash scripts and Windows executable add-ins that, together, create a basic translation chain prototype capable of processing very large corpora. It uses Moses, a widely known statistical machine translation system.

The idea is to help build a translation chain for the real world, but it should also enable a quick evaluation of Moses for actual translation work and guide users through their first steps with Moses. The scripts cover installation, the creation of representative test files, training, translation, scoring, and the transfer of trainings between users or between several Moses installations.

A Help/Short Tutorial (http://moses-for-mere-mortals.googlecode.com/files/Help.odt) and a demonstration corpus are available. The corpus is too small to do justice to the quality of results that can be achieved with Moses, but it gives a realistic view of the relative duration of the steps involved.

Two Windows add-ins allow the creation of Moses input files from *.TMX translation memories (Extract_TMX_Corpus.exe), as well as the creation of *.TMX files from Moses output files (Moses2TMX.exe). A synergy between machine translation and translation memories is therefore created.
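As a rough illustration of the TMX-to-corpus direction described above, the following Python sketch pulls source/target segment pairs out of a TMX document. It is not the implementation of Extract_TMX_Corpus.exe; the function name `tmx_to_pairs` is hypothetical, and the sketch assumes a simple TMX layout (`tu` / `tuv` / `seg`) with language codes that match exactly after lower-casing.

```python
import xml.etree.ElementTree as ET

# Fully qualified name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text, source_lang, target_lang):
    """Return (source, target) segment pairs from a TMX document string.

    Hypothetical helper: translation units lacking either language are skipped.
    """
    source_lang = source_lang.lower()
    target_lang = target_lang.lower()
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Prefer xml:lang; fall back to a plain "lang" attribute if present.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() drops inline tags; split/join collapses whitespace
                # so that each segment fits on a single line.
                segments[lang] = " ".join("".join(seg.itertext()).split())
        if segments.get(source_lang) and segments.get(target_lang):
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs
```

Writing the first elements of the pairs to one file and the second elements to another, one segment per line, yields two line-aligned text files of the kind a Moses training run takes as input.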

The scripts were tested on Ubuntu 9.04 (64-bit version). Documents used for corpus training should be perfectly aligned and saved in UTF-8 character encoding. Documents to be translated should also be in UTF-8. Users of these scripts, perhaps after having tried the provided demonstration corpus, are expected to be able to use them immediately and get results with the real corpora they are interested in.
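The two requirements above (UTF-8 encoding and line-by-line alignment) can be checked before training. The sketch below is not part of the project's scripts; the function names are hypothetical, and it only verifies that both files decode as UTF-8, have the same number of lines, and have no line that is empty on one side only. It cannot tell whether the sentences are actually translations of each other.

```python
from pathlib import Path


def read_utf8_lines(path):
    """Return the lines of `path`; raises UnicodeDecodeError if not valid UTF-8."""
    text = Path(path).read_bytes().decode("utf-8")
    lines = text.split("\n")
    # A trailing newline produces one empty final element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def check_parallel_corpus(source_path, target_path):
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        src = read_utf8_lines(source_path)
        tgt = read_utf8_lines(target_path)
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]
    problems = []
    if len(src) != len(tgt):
        problems.append(f"line counts differ: {len(src)} vs {len(tgt)}")
    # Compare only the overlapping lines; a count mismatch is reported above.
    for number, (s, t) in enumerate(zip(src, tgt), start=1):
        if bool(s.strip()) != bool(t.strip()):
            problems.append(f"line {number}: one side is empty")
    return problems
```

Calling `check_parallel_corpus("corpus.en", "corpus.pt")` (file names are placeholders) before a long training run is a cheap way to catch the malformed corpus files that would otherwise waste hours.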

Though already tested and used in actual work, this should be considered a work in progress. To protect users who are not yet fully familiar with Moses, these scripts try to prevent mistakes that would cost them dearly in time and/or results, but they do not completely insulate them (especially from the consequences of malformed corpus files).

Tags

bash corpora irstlm mgiza moses mt nlp python randlm scripts smt tmx

License

GNU General Public License v3.0 or later

Permitted: Commercial Use, Modify, Distribute, Place Warranty, Use Patent Claims

Forbidden: Sub-License, Hold Liable

Required: Distribute Original, Disclose Source, Include Copyright, State Changes, Include License, Include Install Instructions

These details are provided for information only. Nothing here is legal advice, and it should not be used as such.

Project Security

Vulnerabilities per Version (last 10 releases)

There are no reported vulnerabilities


Languages

Perl: 51%
Shell script: 38%
7 other languages: 11%
