Installing apertium on Debian
Installing apertium and lttoolbox
It is important to install both the apertium and the lttoolbox packages:
root@feynman:~# apt-get install apertium lttoolbox
Select and install language packs
Language pairs can be searched for and installed in the usual way, with apt-cache and apt-get:
root@feynman:~# apt-cache search apertium language pair
apertium - Shallow-transfer machine translation engine
apertium-es-pt - Apertium language pair: Spanish<->Portuguese
apertium-fr-ca - Apertium language pair: French<->Catalan
apertium-es-ca - Apertium language pair: Spanish<->Catalan
root@feynman:~# apt-get install apertium-fr-ca
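In the fr-ca example later in this guide, the part of the package name after "apertium-" (fr-ca) reappears as both the pair directory name and the translation direction. The short POSIX sh function below pulls those codes out of "apt-cache search" output. It is a hypothetical helper, not something apertium provides, and it assumes that pair packages follow the apertium-XX-YY naming seen in the listing above, with two- or three-letter language codes.

```shell
# list_pair_codes: hypothetical helper, not part of apertium.
# Reads "apt-cache search" output on stdin and prints the pair code
# (e.g. "fr-ca") of every package named apertium-XX-YY.
list_pair_codes() {
    # Keep only the package-name column, then print just the "XX-YY" part
    # of names that look like apertium-<lang>-<lang>. Other apertium
    # packages (e.g. the engine itself, "apertium") are skipped.
    awk '{ print $1 }' |
        sed -n 's/^apertium-\([a-z]\{2,3\}-[a-z]\{2,3\}\)$/\1/p'
}

# Usage:
#   apt-cache search apertium language pair | list_pair_codes
```

Given the listing above, this would print es-pt, fr-ca and es-ca, one per line.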
Using apertium
The language packs are installed into "/usr/share/apertium-1.0/pairs/". Create a file containing some source-language text (press Ctrl-D on an empty line to finish):
spectre@feynman:~$ cat > /tmp/french
La traduction est le fait d'interpréter le sens d'un texte dans une langue (langue
source, ou langue de départ), et de produire un texte ayant un sens et un effet équivalents
sur un lecteur ayant une langue et une culture différentes (langue cible, ou langue d'arrivée).
Use "apertium-translator" to translate the text. The basic format of the command is as follows (for further details, see "apertium-translator --help" or "man apertium-translator"):
apertium-translator <path to language pair data> <translation direction>
For example, to translate the text from French to Catalan, use:
spectre@feynman:~$ apertium-translator /usr/share/apertium-1.0/pairs/fr-ca fr-ca < /tmp/french
La traducció és el fet d'interpretar el sentit d'un text en una llengua (llengua font,
o llengua de sortida), i de produir un text havent-hi un sentit i un efecte *équivalents
sobre un lector havent-hi una llengua i una cultura diferents (llengua *cible, o llengua
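d'arribada).
Words marked with an asterisk, such as "*équivalents" and "*cible", are not in the pair's dictionaries and are left untranslated.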
Common problems
Unsupported locale
The following warning indicates that no ISO-8859-1 compatible locale is installed:
spectre@feynman:~$ apertium-translator /usr/share/apertium-1.0/pairs/fr-ca fr-ca < /tmp/french
Warning: unsupported locale, fallback to "C"
Warning: unsupported locale, fallback to "C"
At the time of writing, apertium uses the ISO-8859-1 encoding. If your Debian installation has no locale using this encoding, you can add one by reconfiguring the "locales" package, e.g.:
root@feynman:~# dpkg-reconfigure locales
Enable an ISO-8859-1 locale, such as "es_ES ISO-8859-1". The next screen asks you to choose the default locale; you can leave this unchanged. Select "OK" and the new locales will be generated. The output will look similar to this:
Generating locales (this might take a while)...
en_GB.ISO-8859-1... done
en_GB.UTF-8... done
es_ES.ISO-8859-1... done
Generation complete.
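To check whether a suitable locale is already installed before reconfiguring anything, the list printed by "locale -a" can be searched. The function below is a hypothetical helper, not part of apertium or Debian. It reads that list on stdin and succeeds if any entry uses ISO-8859-1. It assumes "locale -a" spells the charset in its usual normalized form (e.g. "es_ES.iso88591"), but also accepts "ISO-8859-1", and it takes care not to mistake ISO-8859-15 (the @euro locales) for ISO-8859-1.

```shell
# has_latin1_locale: hypothetical helper, not part of apertium or Debian.
# Reads locale names (one per line, as printed by "locale -a") on stdin and
# exits 0 if at least one of them uses the ISO-8859-1 character set.
has_latin1_locale() {
    # Match "iso88591" or "ISO-8859-1" case-insensitively. The trailing
    # ($|[^0-9]) stops "iso885915" (ISO-8859-15) from counting as a match.
    grep -Eiq 'iso-?8859-?1($|[^0-9])'
}

# Usage:
#   if locale -a | has_latin1_locale; then
#       echo "An ISO-8859-1 locale is installed"
#   else
#       echo "No ISO-8859-1 locale: run dpkg-reconfigure locales as root"
#   fi
```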
Nonsense characters
If nonsense characters appear in the translation, e.g. "La traducció és el fet de *interpré*ter ...", first check that the file you are trying to translate is in the right encoding (ISO-8859-1). You can use "file" to check:
spectre@feynman:~$ file /tmp/french
/tmp/french: UTF-8 Unicode text
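"file" guesses the encoding from the file's contents and prints a description meant for people. For use in a script, a yes/no test can be built on iconv instead, because converting from UTF-8 to UTF-8 fails on any byte sequence that is not valid UTF-8. The function below is a hypothetical helper, not part of apertium, and it assumes the GNU iconv found on Debian systems, which checks its input even when the source and target encodings are the same.

```shell
# looks_like_utf8: hypothetical helper, not part of apertium.
# Exits 0 if the file named by $1 decodes cleanly as UTF-8.
# Plain ASCII also passes, which is harmless: ASCII text is byte-for-byte
# identical in UTF-8 and ISO-8859-1.
looks_like_utf8() {
    # A UTF-8 -> UTF-8 conversion fails on any invalid byte sequence;
    # the converted text itself is discarded.
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Usage:
#   if looks_like_utf8 /tmp/french; then echo "convert this file first"; fi
```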
If the file is in UTF-8, non-ASCII characters such as "é" can come out as nonsense, because apertium expects ISO-8859-1 input. To convert the file to ISO-8859-1, use the GNU "iconv" program:
spectre@feynman:~$ iconv -f UTF-8 -t ISO_8859-1 /tmp/french > /tmp/french.txt
Run "file" again to check the result:
spectre@feynman:~$ file /tmp/french.txt
/tmp/french.txt: ISO-8859 text
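If you would rather keep your files and terminal in UTF-8, the same iconv conversion can be done on the fly around apertium-translator: UTF-8 to ISO-8859-1 on the way in, and back to UTF-8 on the way out. This is a minimal sketch, not something shipped with apertium. The function name translate_utf8 and the TRANSLATOR variable are hypothetical; TRANSLATOR exists only so the encoding steps can be tried with a stand-in command when apertium is not installed. It assumes the translator reads ISO-8859-1 on stdin and writes ISO-8859-1 on stdout, as described above.

```shell
# translate_utf8: hypothetical wrapper, not part of apertium.
# Usage: translate_utf8 <path to language pair data> <translation direction> < input
# Converts UTF-8 input to ISO-8859-1 for apertium-translator, then converts
# the ISO-8859-1 output back to UTF-8.
translate_utf8() {
    # TRANSLATOR defaults to apertium-translator; it may name any command
    # or shell function that filters stdin to stdout.
    iconv -f UTF-8 -t ISO-8859-1 |
        "${TRANSLATOR:-apertium-translator}" "$1" "$2" |
        iconv -f ISO-8859-1 -t UTF-8
}

# Example:
#   translate_utf8 /usr/share/apertium-1.0/pairs/fr-ca fr-ca < /tmp/french
```

Note that iconv stops with an error on characters that have no ISO-8859-1 equivalent, such as typographic quotes. With GNU iconv, using "-t ISO-8859-1//TRANSLIT" in the first step replaces such characters with approximations instead.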
If nonsense characters still appear, check that your terminal is set to the correct character encoding. In GNOME Terminal, go to "Terminal" -> "Set character encoding" and choose an ISO-8859-1 compatible setting. In PuTTY, go to "Change settings" -> "Translation" and make sure it is set to "ISO-8859-1".