
Machine Translation
The amount of multilingual information that needs to be available in parallel form today is far too large for the human translation industry to manage. Even so, both Machine Translation and Human Translation may be required, depending on the type of translation needed and the ultimate use of the translated text.
1. What is Machine Translation?
Machine Translation is the use of computer software to translate text from one natural language into another. In this approach, the software takes into account the grammatical structure of each language and uses rules and assumptions to transfer the grammatical structure of the source language (the text to be translated) into the target language (the translated text).
Translation is anything but simple. It is not a mere word-for-word substitution; the translator must know all of the words in a given sentence or phrase and how each one may influence the others. Human languages involve morphology (the way words are built up from small meaning-bearing units), syntax (sentence structure), semantics (meaning), and countless ambiguities.
2. What is Human Translation?
Human Translation is performed by individuals who translate text from one natural language into another. These translators must have extensive knowledge of both the source language (the text intended for translation) and the target language (the text of the translation), as well as a thorough understanding of the specific subject matter they translate. It is often difficult to find people who combine comprehensive knowledge of the required languages with expert-level insight into a specific domain.
3. How do Machine Translation and Human Translation compare?
Capacity and Speed
A human translator can translate about 2,000 to 3,000 words in a day. SYSTRAN's
MT system translates 3,700 words per minute.
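To put the two throughput figures on the same scale, here is a small worked calculation. The 8-hour working day is an assumption made only for the comparison; the per-day and per-minute figures are the ones quoted above.

```python
# Figures quoted in the comparison above.
HUMAN_WORDS_PER_DAY = (2_000, 3_000)   # low and high estimates for a human translator
MT_WORDS_PER_MINUTE = 3_700            # SYSTRAN's stated MT throughput

# Assumption (not from the text): an 8-hour working day of continuous MT output.
MINUTES_PER_WORKDAY = 8 * 60

mt_words_per_day = MT_WORDS_PER_MINUTE * MINUTES_PER_WORKDAY
print(f"MT in one 8-hour day: {mt_words_per_day:,} words")

for human_words in HUMAN_WORDS_PER_DAY:
    ratio = mt_words_per_day / human_words
    print(f"  vs. {human_words:,} words/day for a human: about {ratio:,.0f}x")
```

Under that assumption the MT system produces 1,776,000 words a day, roughly 592 to 888 times a human translator's daily output; a single minute of MT output (3,700 words) already exceeds the upper human estimate for a full day.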
Cost Savings
Human translation costs anywhere from US$0.20 to US$0.60 per word. A high-end MT solution pays for itself within the first year of use.
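Whether an MT investment pays for itself within a year depends on translation volume. The following is a minimal break-even sketch: every word sent through MT saves the human per-word rate minus whatever post-editing still costs. The MT cost and post-editing rate are hypothetical placeholders, not SYSTRAN prices, and the function name is illustrative.

```python
def break_even_words(mt_cost_usd: float, human_rate: float, post_edit_rate: float = 0.0) -> float:
    """Words to translate before the MT investment is recovered.

    Each word sent through MT saves the human rate minus whatever
    post-editing still costs per word (both in US dollars per word).
    """
    saving_per_word = human_rate - post_edit_rate
    if saving_per_word <= 0:
        raise ValueError("post-editing must cost less per word than full human translation")
    return mt_cost_usd / saving_per_word


# Hypothetical inputs for illustration only; they are not SYSTRAN prices.
MT_COST_USD = 50_000.0
POST_EDIT_RATE = 0.05

for human_rate in (0.20, 0.60):   # the per-word range quoted above
    words = break_even_words(MT_COST_USD, human_rate, POST_EDIT_RATE)
    print(f"At US${human_rate:.2f}/word: break-even after about {words:,.0f} words")
```

Comparing the resulting word counts with an organization's yearly translation volume shows whether a first-year payback is plausible in that particular case.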
Return on Investment
Human-translated texts are difficult to re-use systematically. A powerful MT system such as SYSTRAN's, however, allows previously translated documentation to be stored and re-used, in whole or in part, so corporations that purchase SYSTRAN's technology can capitalize on their investment.
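Storing and re-using earlier translations is commonly implemented as a translation memory. The sketch below shows the generic exact-match technique under simplifying assumptions (whitespace- and case-insensitive matching, whole sentences only); it is not a description of SYSTRAN's internal storage, and `TranslationMemory`, `translate_document`, and the `machine_translate` callable are hypothetical names.

```python
from typing import Callable, Dict, List, Optional


class TranslationMemory:
    """Minimal exact-match translation memory (generic technique, not SYSTRAN's design)."""

    def __init__(self) -> None:
        self._segments: Dict[str, str] = {}

    @staticmethod
    def _key(segment: str) -> str:
        # Normalise whitespace and case so trivially different copies still match.
        return " ".join(segment.split()).lower()

    def store(self, source: str, target: str) -> None:
        self._segments[self._key(source)] = target

    def lookup(self, source: str) -> Optional[str]:
        return self._segments.get(self._key(source))


def translate_document(sentences: List[str], tm: TranslationMemory,
                       machine_translate: Callable[[str], str]) -> List[str]:
    """Re-use stored translations; send only unseen sentences to MT, then store them."""
    output = []
    for sentence in sentences:
        translation = tm.lookup(sentence)
        if translation is None:
            translation = machine_translate(sentence)
            tm.store(sentence, translation)
        output.append(translation)
    return output
```

Only sentences the memory has not seen reach the MT engine, and each new result is stored, so repeated material (boilerplate, revised manuals) costs nothing to translate again. A production memory would also offer fuzzy matches for partially changed sentences.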
Productivity
The most efficient way to translate and handle multilingual documents that must be 100% accurate is to run them through the MT system and then have a human translator post-edit the output. Documents that require 100% accuracy include legal materials, user manuals, marketing collateral, and complex reports. Statistics show that this method reduces costs for corporations and shortens turnaround time per project.
Accuracy
The accuracy required of a translation is determined by the author of the text, who understands its end use and sets the expectation level. But the accuracy of every machine-translated text, at least when using SYSTRAN, can be improved.
First of all, there are certain guidelines that help users write in preparation for MT. For example, MT systems expect all documents to include correct spelling and proper punctuation.
Depending on the product's robustness, users can also create their own proprietary dictionaries to incorporate their own terminology. Because these dictionaries are incorporated into the translation process, they yield higher translation quality.
Customization
Accuracy is a key component of the translation process, and it requires customization. SYSTRAN gets involved in the entire documentation and communications process with customers. We help structure terminology and style, improve readability, and recycle well-translated material; as a result, we can anticipate future translation issues. In other words, we support the maintenance of multilingual corporate information, databases, services, websites, and documentation.
4. Research and Case Studies
SYSTRAN continually researches new processes and ideas and studies the ways customers use the system. Its published papers show some of the research and work that has been done.
For further information, please contact info@omegafirst.co.uk
Overview
The general framework that SYSTRAN uses in all its Machine Translation (MT) systems has proven powerful and effective. Over its long history, many improvements have been made to the original design, resulting in a highly modular architecture.
Reusing existing modules, and applying similar methods consistently across different languages where applicable, allows a functional prototype system for any new language pair to be developed quickly and efficiently.
SYSTRAN's architecture is also very flexible and allows the introduction of innovative methods. In fact, with every new language added to the SYSTRAN inventory, new techniques have been tried in response to the particular challenges of that language, and such innovations are often later found to apply to other language pairs as well.
Methodology
SYSTRAN's methodology is a sentence-by-sentence approach: it first concentrates on individual words and their dictionary data, then on the parse of the sentence unit, and finally on the translation of the parsed sentence.
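The three-stage, sentence-by-sentence flow can be pictured as a simple loop. This is a minimal skeleton under stated assumptions: the stage functions are passed in as callables, and the toy English-to-French stages (`TOY_DICT`, `toy_lookup`, `toy_parse`, `toy_translate`) are hypothetical stand-ins that only make the skeleton runnable, not SYSTRAN components.

```python
from typing import Callable, List


def translate_text(sentences: List[str],
                   lookup: Callable[[str], list],
                   parse: Callable[[list], object],
                   translate_parse: Callable[[object], str]) -> List[str]:
    """Translate one sentence at a time: words and dictionary data, then parse, then translation."""
    results = []
    for sentence in sentences:                   # each sentence is an independent unit
        entries = lookup(sentence)               # 1. individual words and their dictionary data
        tree = parse(entries)                    # 2. parse of the sentence unit
        results.append(translate_parse(tree))    # 3. translation of the parsed sentence
    return results


# Toy stages (hypothetical, English -> French) just to make the skeleton runnable.
TOY_DICT = {"the": "le", "cat": "chat", "sleeps": "dort"}


def toy_lookup(sentence: str) -> list:
    return [(w, TOY_DICT.get(w.lower())) for w in sentence.rstrip(".").split()]


def toy_parse(entries: list) -> dict:
    return {"tokens": entries}                   # a real parser would build structure here


def toy_translate(tree: dict) -> str:
    words = [target or source for source, target in tree["tokens"]]
    return " ".join(words).capitalize() + "."


print(translate_text(["The cat sleeps."], toy_lookup, toy_parse, toy_translate))
```

Keeping the stages as separate functions mirrors the modular design described below: any stage can be replaced without touching the others.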
Modularity
The SYSTRAN architecture is made up of three major groups: Dictionary, Systems Software, and Linguistic Software. Each consists of a large number of modules that work together to create a fully automatic MT system.
5. Dictionary
SYSTRAN traditionally employs three distinct, but interconnected types of dictionaries for the MT systems of all languages.
Stem Dictionary
The basic dictionary is a single-word Stem Dictionary. Words are entered in their basic form, with codes indicating inflectional patterns, part of speech, syntactic behavior, semantic properties, and target language meanings, together with the codes needed for target word generation. Homographs with part-of-speech ambiguity are entered separately for each part of speech, cross-referenced to the basic entries, and indexed by type of part-of-speech ambiguity. The source-language portion of the Stem Dictionary is complemented by transfer and target information for each word in several target languages.
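A Stem Dictionary entry of the kind described above can be modeled as a record. The following is a simplified sketch: the field names, the code values (`"N-s"`, `"V-regular"`, and so on), and the string key for ambiguity type are hypothetical, since SYSTRAN's actual coding scheme is not given here. It shows one entry per part of speech for a homograph, per-language target meanings, and an index by ambiguity type.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StemEntry:
    stem: str                                               # basic (uninflected) form
    pos: str                                                # part of speech
    inflection: str                                         # hypothetical inflectional-pattern code
    syntax: List[str] = field(default_factory=list)         # syntactic behavior codes
    semantics: List[str] = field(default_factory=list)      # semantic property codes
    targets: Dict[str, str] = field(default_factory=dict)   # target language -> meaning


# A homograph with part-of-speech ambiguity gets one entry per part of speech.
STEM_DICTIONARY: Dict[str, List[StemEntry]] = {
    "book": [
        StemEntry("book", "noun", "N-s", semantics=["concrete"],
                  targets={"fr": "livre", "de": "Buch"}),
        StemEntry("book", "verb", "V-regular", syntax=["transitive"],
                  targets={"fr": "réserver", "de": "buchen"}),
    ],
    "red": [StemEntry("red", "adjective", "A-er-est", targets={"fr": "rouge", "de": "rot"})],
}


def entries_for(word: str) -> List[StemEntry]:
    return STEM_DICTIONARY.get(word.lower(), [])


def ambiguity_type(word: str) -> str:
    """Label such as 'noun/verb' describing a word's part-of-speech ambiguity."""
    return "/".join(sorted(e.pos for e in entries_for(word)))


def build_ambiguity_index(dictionary: Dict[str, List[StemEntry]]) -> Dict[str, List[str]]:
    """Index homographs by type of part-of-speech ambiguity."""
    index: Dict[str, List[str]] = {}
    for form, entries in dictionary.items():
        if len(entries) > 1:
            key = "/".join(sorted(e.pos for e in entries))
            index.setdefault(key, []).append(form)
    return index
```

Grouping homographs by ambiguity type lets a parser module that resolves, say, every noun/verb ambiguity find all candidate words in one place.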
Expression Dictionary
This is the dictionary of multiple-word expressions. These include co-occurrence-based and rule-based expressions, and they range from simple noun phrases to expressions containing translation rules based on the syntactic or semantic link between individual words or entire classes of words. Words in the Expression Dictionary are given in their basic form, and indexing to the Stem Dictionary allows a rule to be executed for all inflected forms or alternate spellings recognized in the Stem Dictionary.
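The benefit of storing expressions in basic form can be shown with a small matcher. In this sketch, assuming a simple form-to-stem table stands in for the Stem Dictionary, text is first reduced to stems and then compared against expression entries, so a single entry covers every inflected form and spelling variant. The tables and function name are hypothetical; only the idea of indexing expressions through stems comes from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for Stem Dictionary recognition: inflected or variant form -> basic form.
FORM_TO_STEM: Dict[str, str] = {
    "kick": "kick", "kicks": "kick", "kicked": "kick", "kicking": "kick",
    "the": "the", "bucket": "bucket",
    "colour": "color", "color": "color", "blind": "blind",
}

# Expressions stored once, in basic form, with a French meaning.
EXPRESSIONS: Dict[Tuple[str, ...], str] = {
    ("kick", "the", "bucket"): "casser sa pipe",
    ("color", "blind"): "daltonien",
}


def match_expressions(tokens: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, meaning) for every expression found in the token list."""
    stems = [FORM_TO_STEM.get(t.lower(), t.lower()) for t in tokens]
    matches = []
    for expr, meaning in EXPRESSIONS.items():
        n = len(expr)
        for i in range(len(stems) - n + 1):
            if tuple(stems[i:i + n]) == expr:
                matches.append((i, i + n, meaning))
    return matches


print(match_expressions("He kicked the bucket".split()))
print(match_expressions("a colour blind user".split()))
```

Because matching happens on stems, "kicked the bucket" and "colour blind" are found without separate entries for each form or spelling.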
Customer Specific Dictionary (CSD)
A PC/Windows-based CSD allows the user to enter terms (words and a set of predefined types of expressions) that are not found in the main dictionaries. The user may also change meanings found in the main dictionaries, either globally or conditionally. The CSD is designed for individual or industrial users with limited needs.
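The lookup order implied by a CSD (customer terms first, then the main dictionaries) can be sketched as follows. This is a simplified model with hypothetical names (`CustomerDictionary`, `MAIN_DICTIONARY`). It assumes that a conditional change is a predicate over the sentence's tokens and that conditional entries take precedence over global ones; the PC/Windows user interface is not modeled.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical fragment of a main dictionary (English -> French).
MAIN_DICTIONARY: Dict[str, str] = {"mouse": "souris", "driver": "conducteur", "bug": "insecte"}

Condition = Callable[[List[str]], bool]   # predicate over the tokens of the current sentence


class CustomerDictionary:
    def __init__(self) -> None:
        self.global_terms: Dict[str, str] = {}
        self.conditional: Dict[str, List[Tuple[Condition, str]]] = {}

    def add(self, word: str, meaning: str, when: Optional[Condition] = None) -> None:
        """Add a new term, or override a main-dictionary meaning globally or conditionally."""
        if when is None:
            self.global_terms[word] = meaning
        else:
            self.conditional.setdefault(word, []).append((when, meaning))

    def meaning(self, word: str, sentence: List[str]) -> Optional[str]:
        for when, meaning in self.conditional.get(word, []):
            if when(sentence):                  # conditional change applies in this context
                return meaning
        if word in self.global_terms:           # global change or new customer term
            return self.global_terms[word]
        return MAIN_DICTIONARY.get(word)        # fall back to the main dictionaries


csd = CustomerDictionary()
csd.add("firmware", "micrologiciel")                           # term missing from main dictionary
csd.add("bug", "bogue")                                        # global override
csd.add("driver", "pilote", when=lambda s: "printer" in s)     # conditional override
print(csd.meaning("driver", ["install", "the", "printer", "driver"]))
```

With these entries, "driver" becomes "pilote" only in sentences that mention a printer and keeps the main-dictionary meaning "conducteur" elsewhere.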
System Software
A body of systems software, consistent across the various SYSTRAN language pairs, handles formatting, character conversion, the user interface, sentence and word boundary determination, dictionary and morphology lookup, and the treatment of words not found in the dictionary. It controls the flow of the linguistic modules and creates the final formatted output. It also supports a variety of tools for dictionary preparation, quality assurance, corpus manipulation, and parsing diagnostics.
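Sentence and word boundary determination, dictionary lookup, and not-found word treatment can be sketched in a few lines. The regular expressions are deliberately naive (they ignore abbreviations, numbers with periods, and similar cases), and flagging a not-found word while passing it through unchanged is only one possible treatment, chosen here as an assumption; `segment` and `lookup` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Naive sentence boundary: terminal punctuation, whitespace, then an upper-case letter.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
# Words (allowing internal hyphens or apostrophes) or single punctuation marks.
WORD = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def segment(text: str) -> List[List[str]]:
    """Split text into sentences, then each sentence into word tokens."""
    return [WORD.findall(s) for s in SENTENCE_END.split(text.strip()) if s]


def lookup(tokens: List[str], dictionary: Dict[str, str]) -> List[Tuple[str, str, bool]]:
    """Return (token, translation, found) for each token."""
    out = []
    for tok in tokens:
        if not tok[0].isalnum():                 # punctuation passes through unchanged
            out.append((tok, tok, True))
            continue
        entry = dictionary.get(tok.lower())
        if entry is not None:
            out.append((tok, entry, True))
        else:
            # Not-found treatment (simplified): keep the source token and flag it
            # so later modules can decide how to handle it.
            out.append((tok, tok, False))
    return out


for sentence in segment("The printer is ready. Press Start!"):
    print(lookup(sentence, {"the": "l'", "printer": "imprimante", "press": "appuyez"}))
```

Flagging rather than dropping unknown words keeps them visible to the parser and to the final output, where they can be left untranslated or highlighted.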
6. Linguistic Software
Parser
The most challenging part of any MT system is the parser, the module that analyzes each sentence and attempts to build a representation of it. SYSTRAN parses with a battery of procedural modules that resolve, step by step, various syntactic and semantic relationships and assign structure within the sentence. The SYSTRAN parser is deterministic: each module makes firm decisions and passes its results on to the next module. The advantage is that every sentence, even an incomplete or malformed one, will be parsed and therefore translated. The disadvantage of such determinism is that incorrect decisions may be passed on and compounded from module to module. SYSTRAN mitigates this with several mechanisms that flag uncertain decisions. The final step in this checking process is a Filter program that identifies the major parse errors.
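The trade-off of a deterministic module chain is easy to see in a toy version. In this sketch, which assumes a tiny tag set and two illustrative modules (`resolve_homographs`, `link_subject_verb`) plus a one-rule filter, none of which are SYSTRAN's actual modules, each module commits to a decision and hands the state on. Guesses are flagged, and the filter reports a major error after the fact.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ParseState:
    tokens: List[str]
    tags: List[str]                                   # "N/V" marks an unresolved noun/verb homograph
    links: List[tuple] = field(default_factory=list)
    flags: List[str] = field(default_factory=list)    # notes on uncertain decisions


def resolve_homographs(state: ParseState) -> ParseState:
    tags = list(state.tags)
    for i, tag in enumerate(tags):
        if tag == "N/V":
            prev = tags[i - 1] if i > 0 else None
            if prev in ("DET", "V"):
                tags[i] = "N"                         # "the show", "books flights"
            elif prev in ("N", "PRON"):
                tags[i] = "V"                         # "pilot books"
            else:
                tags[i] = "N"                         # firm decision, but flagged as a guess
                state.flags.append(f"uncertain: {state.tokens[i]} -> N")
    state.tags = tags
    return state


def link_subject_verb(state: ParseState) -> ParseState:
    nouns = [i for i, t in enumerate(state.tags) if t in ("N", "PRON")]
    verbs = [i for i, t in enumerate(state.tags) if t == "V"]
    if nouns and verbs and nouns[0] < verbs[0]:
        state.links.append(("subject", nouns[0], verbs[0]))
    return state


def run_parser(state: ParseState, modules: List[Callable[[ParseState], ParseState]]) -> ParseState:
    for module in modules:                            # deterministic: no backtracking
        state = module(state)
    return state


def filter_errors(state: ParseState) -> List[str]:
    """Post-parse check for major errors (here only: no main verb)."""
    return [] if "V" in state.tags else ["no main verb found"]


MODULES = [resolve_homographs, link_subject_verb]
good = run_parser(ParseState(["The", "pilot", "books", "flights"], ["DET", "N", "N/V", "N/V"]), MODULES)
bad = run_parser(ParseState(["Watch", "the", "show"], ["N/V", "DET", "N/V"]), MODULES)
print(good.tags, good.links, filter_errors(good))
print(bad.tags, bad.flags, filter_errors(bad))
```

In the second sentence the first module wrongly commits to reading "Watch" as a noun. The next module inherits that error and finds no verb, but the flag records the guess and the filter catches the resulting parse failure.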
Target Language Translation Modules
Once a parse of the input sentence has been constructed, the algorithms that build the translation are invoked. Translation information at both the word and expression levels is derived during the dictionary lookup and parsing phases, for use by two distinct modules: Transfer and Synthesis. The Transfer component performs situation-specific restructuring, depending on the degree of difference between the source and target languages. Apart from the dictionary, it is the only module that relates to both the source and the target language, and it is relatively small when the two languages are closely related.
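As an illustration of pair-specific restructuring, the sketch below applies one transfer rule: English adjective-noun order becomes noun-adjective order for French. The rule table keyed by language pair, the function names, and the simplification that every French adjective follows its noun (many common ones, such as "petit", do not) are assumptions for illustration. The empty rule list for English-German reflects the point that transfer stays small for closely related languages.

```python
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, str]   # (part of speech, word) in a flattened parse


def adjective_after_noun(phrase: List[Node]) -> List[Node]:
    """Toy rule: English ADJ NOUN -> French NOUN ADJ (ignores pre-nominal French adjectives)."""
    out = list(phrase)
    i = 0
    while i < len(out) - 1:
        if out[i][0] == "ADJ" and out[i + 1][0] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return out


# Hypothetical per-pair rule sets: related languages need fewer restructuring rules.
TRANSFER_RULES: Dict[Tuple[str, str], List[Callable[[List[Node]], List[Node]]]] = {
    ("en", "fr"): [adjective_after_noun],
    ("en", "de"): [],      # German keeps the English adjective-noun order
}


def transfer(phrase: List[Node], pair: Tuple[str, str]) -> List[Node]:
    for rule in TRANSFER_RULES[pair]:
        phrase = rule(phrase)
    return phrase


source = [("DET", "the"), ("ADJ", "red"), ("NOUN", "car")]
print(transfer(source, ("en", "fr")))
print(transfer(source, ("en", "de")))
```

Word choice and inflection are left to the dictionaries and to Synthesis; Transfer only changes structure.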
Synthesis Module
Following this, the Synthesis module generates the target language strings that correspond to the information provided by all previous modules. Synthesis is independent of the source language. The Synthesis modules contain sophisticated algorithms for creating specialized target language constructs, such as negation, questions, verbs with complete morphology, adverb placement, and articles.
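A few of the constructs listed above (verb morphology, negation, and articles) can be illustrated for French. The sketch assumes a hypothetical mini-lexicon and a flat input of subject, verb lemma, definite object, and a negation flag. It ignores h-muet, accented initial letters, and the change of article after negation with indefinite objects, so it is a sketch rather than a complete generator. Nothing in it refers to the source language, matching the statement that Synthesis is source-language independent.

```python
# Hypothetical mini-lexicon; a real Synthesis module would draw on full target dictionaries.
NOUN_GENDER = {"voiture": "f", "livre": "m", "arbre": "m"}
PRESENT_3SG = {"voir": "voit", "avoir": "a"}   # third-person singular present forms
VOWELS = "aeiou"                               # simplification: ignores h-muet and accented initials


def definite_article(noun: str) -> str:
    if noun[0] in VOWELS:
        return "l'"                                     # elision: l'arbre
    return "la " if NOUN_GENDER[noun] == "f" else "le "


def synthesize(subject: str, verb: str, obj: str, negated: bool = False) -> str:
    verb_form = PRESENT_3SG[verb]                       # verb morphology
    if negated:
        ne = "n'" if verb_form[0] in VOWELS else "ne "
        verb_phrase = f"{ne}{verb_form} pas"            # negation wraps the verb: ne ... pas
    else:
        verb_phrase = verb_form
    return f"{subject} {verb_phrase} {definite_article(obj)}{obj}."


print(synthesize("Il", "voir", "voiture", negated=True))
print(synthesize("Elle", "avoir", "livre", negated=True))
print(synthesize("Il", "voir", "arbre"))
```

Negation in French is discontinuous (the verb is wrapped by "ne ... pas") and interacts with elision, which is why it is generated by dedicated logic rather than by word substitution.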
7. Development of Additional Languages
Developing translation capability for a new language pair is easiest when SYSTRAN already has source and target modules for both languages. Only a new Transfer module and the Transfer/Target dictionaries need to be created.
Developing additional target language capability for each source system is possible and quite economical because SYSTRAN systems are set up as multi-target systems. Adding another target language requires only a new Transfer module and a new Synthesis module, plus building up the Transfer/Target dictionaries.
Developing additional source language capability for each target system is more difficult if a completely new parser has to be created. However, if the new source language is closely related to one of the existing SYSTRAN source languages, the new parser can take advantage of rules common to a language family through existing Trunk Parsers (such as the Romance Trunk and the Slavic Trunk).
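The three development scenarios can be summarized as a small planning function. This is a sketch built on assumptions: a parser per source language, a Synthesis module per target language, and a Transfer module with Transfer/Target dictionaries per pair, as described above. The `Inventory` contents and the family table used to suggest a trunk parser are hypothetical examples, not SYSTRAN's actual language coverage.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Language-family membership used to suggest reuse of a trunk parser (illustrative subset).
FAMILY = {"fr": "Romance", "es": "Romance", "it": "Romance", "pt": "Romance",
          "ru": "Slavic", "pl": "Slavic", "cs": "Slavic"}


@dataclass
class Inventory:
    parsers: Set[str] = field(default_factory=set)                # source-language analysis
    synthesis: Set[str] = field(default_factory=set)              # target-language generation
    transfer: Set[Tuple[str, str]] = field(default_factory=set)   # one per language pair


def work_needed(inv: Inventory, source: str, target: str) -> List[str]:
    tasks = []
    if source not in inv.parsers:
        family = FAMILY.get(source)
        if family and any(FAMILY.get(p) == family for p in inv.parsers):
            tasks.append(f"parser for {source} (can reuse the {family} trunk parser)")
        else:
            tasks.append(f"new parser for {source}")
    if target not in inv.synthesis:
        tasks.append(f"synthesis module for {target}")
    if (source, target) not in inv.transfer:
        tasks.append(f"transfer module and transfer/target dictionaries for {source}->{target}")
    return tasks


inv = Inventory(parsers={"en", "fr"}, synthesis={"en", "fr", "de"},
                transfer={("en", "fr"), ("fr", "en"), ("en", "de")})
for src, tgt in [("fr", "de"), ("en", "it"), ("es", "en"), ("ru", "en")]:
    print(f"{src}->{tgt}:", work_needed(inv, src, tgt))
```

The output ordering mirrors the text: a pair between existing languages needs only transfer work, a new target adds a Synthesis module, and a new source adds a parser, which is cheaper when a related trunk parser already exists.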