Rosetta and TMs (translation memory) #249

Open
bittner opened this issue Dec 10, 2020 · 6 comments

@bittner

bittner commented Dec 10, 2020

Hi there!

We're using Rosetta 0.9.4 on Django 2.2.17, and all is good. Apart from the skepticism of professional translators, of course. Their main objection is, "The tool doesn't provide a TM, hence we can't use it."

I need some help to understand this topic better.

Note that my wife is a professional translator and project manager in the translation industry, so I am largely informed about the concepts of "traditional translation" of documents (e.g. SDL Trados, Across, OmegaT) but also about the approach that emerged from the software development industry (e.g. Transifex, Crowdin), which I have hands-on experience with.

Where is Rosetta's TM?

From my understanding, Rosetta is more or less a nice front-end to manipulate .po files, extracted by Python's gettext module integrated in Django. There are no models, yet still Rosetta does "automatic translation", which is visible by fuzzy matches (which I assume is also a feature coming from gettext again, really).

So in essence, the .po files themselves are that TM already. There is no additional or separate component, but as the entire "document" is identical to all (successful) translations that have been done in the past, there is not even a need for a separate TM. It's all read into "Rosetta's memory" in its entirety. There is no disadvantage of having "no TM", given we only deal with our domain specific vocabulary.

Is this view correct?

External TMs?

A related question, after having clarified whether Rosetta has a TM or not: is there a way to

  • download Rosetta's TM and/or
  • attach (or upload) an external TM

to add, say, more flexibility to the translation process?

@mbi
Owner

mbi commented Dec 10, 2020

Hi Peter!

> From my understanding, Rosetta is more or less a nice front-end to manipulate .po files, extracted by Python's gettext module integrated in Django.

That's correct: Rosetta's main task is offering a user-friendly interface to interact with (read/write) the .po catalogs produced by Django's makemessages, and compiling them into .mo files that Django then reads at runtime.
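As a rough illustration of the compile step, the sketch below writes a tiny catalog in the binary .mo layout and reads it back with Python's gettext module. `build_mo` is a hypothetical helper written for this example; it is not Rosetta's or Django's code, and a real project would rely on msgfmt or a PO library rather than hand-rolling this.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Serialize a {msgid: msgstr} mapping into the binary GNU .mo layout."""
    # The empty msgid holds the catalog header; gettext reads the charset from it.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n", **messages}
    ids = strs = b""
    index = []
    for msgid in sorted(entries):  # .mo files keep msgids in sorted order
        raw_id = msgid.encode("utf-8")
        raw_str = entries[msgid].encode("utf-8")
        index.append((len(raw_id), len(ids), len(raw_str), len(strs)))
        ids += raw_id + b"\0"
        strs += raw_str + b"\0"

    count = len(index)
    id_table = 7 * 4                   # the file header is seven 32-bit integers
    str_table = id_table + count * 8   # each table row is (length, offset)
    ids_start = str_table + count * 8
    strs_start = ids_start + len(ids)

    out = struct.pack("<Iiiiiii", 0x950412DE, 0, count, id_table, str_table, 0, 0)
    for id_len, id_off, _, _ in index:
        out += struct.pack("<ii", id_len, ids_start + id_off)
    for _, _, str_len, str_off in index:
        out += struct.pack("<ii", str_len, strs_start + str_off)
    return out + ids + strs


# Sample data invented for the example.
mo_bytes = build_mo({"Save": "Speichern", "Cancel": "Abbrechen"})
catalog = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(catalog.gettext("Save"))
```

Strings that are missing from the catalog fall back to the original msgid, which is why an untranslated entry still renders in the source language at runtime.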

> So in essence, the .po files themselves are that TM already.

Yes, but as every "project" only manages its own gettext catalog, if you manage different projects, then you cannot access translations you've already provided in other projects.

From my limited understanding of what a TM is, such a tool should maintain a database of everything a translator has ever produced, so that when a new string needs to be translated, the tool can provide suggestions based on a (possibly fuzzy) match against that database.

So if that's what you're expecting, then no, Rosetta doesn't provide that kind of feature at the moment, because again: the only datasource is the po catalog itself.
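The lookup described above (a store of past translations queried with a fuzzy match) can be sketched with the standard library's difflib. This is only an illustration, not something Rosetta ships; the `suggest` function, its threshold, and the sample memory are made up for the example.

```python
from difflib import SequenceMatcher


def suggest(memory, source, threshold=0.6, limit=3):
    """Return (score, known_source, translation) tuples that resemble `source`."""
    scored = []
    for known_source, translation in memory.items():
        # Compare case-insensitively; ratio() is 1.0 for identical strings.
        score = SequenceMatcher(None, source.lower(), known_source.lower()).ratio()
        if score >= threshold:
            scored.append((round(score, 2), known_source, translation))
    # Best matches first, capped at `limit` suggestions.
    return sorted(scored, reverse=True)[:limit]


# Hypothetical memory: source strings mapped to past translations.
memory = {
    "Save changes": "Änderungen speichern",
    "Discard changes": "Änderungen verwerfen",
    "Log out": "Abmelden",
}
results = suggest(memory, "Save all changes")
for score, known, translation in results:
    print(score, known, translation)
```

A real TM would index its corpus instead of scanning every entry, but the shape of the result (ranked candidates with a similarity score) is the same.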

> There are no models, yet still Rosetta does "automatic translation"

This is provided through a series of interfaces to online translation services, such as Google Translate, Bing Translate, Yandex Translate and such.

But a professional translator will probably frown upon these services and rather prefer their own TM corpus. 🤷🏼‍♂️

> A related question, after having clarified whether Rosetta has a TM or not, is there a way to
> download Rosetta's TM and/or

No, you can download the PO catalog for the current project, but that's it.

> attach (or upload) an external TM.

Ah well, now: if any such thing exists and is well documented (if there is a catalog to upload and / or an API to query) then I don't see why that wouldn't be doable.

Hope this helps, further discussion and PRs welcome 😉

@mondeja
Contributor

mondeja commented Dec 10, 2020

Rosetta is simply a Django app that processes pofiles and compiles their corresponding mofiles. Rosetta does not impose any particular way of pre- or post-processing your files. You could emulate a translation memory using a pofile compendium. But keep in mind that this process of discovering new fuzzy matches is not managed by Rosetta, but by the scripts written by the developer of the project.

I understand the lack of use of Rosetta in the translation industry, because, for example, if you need to go back for translations removed from the files, these will not be found in a separate database.

If I'm not wrong, you are asking for a pofile compendium that could be added as another pofile of the project and a button (or whatever other system) that could discover new matches, then another button that could download pofiles in different formats. Is this correct?

@bittner
Author

bittner commented Dec 11, 2020

Alright, so a PO file compendium, which is a concept of GNU gettext, corresponds to what other translation tools maintain as a TM?

> you are asking for a pofile compendium that could be added as another pofile of the project and a button (or whatever other system) that could discover new matches, then another button that could download pofiles in different formats.

Exactly. Basically, I want to satisfy the expectations of translation agencies. They can

  1. attach a TM to (or create a TM with) a translation project
  2. download the TM created or updated by the translation project

According to the Transifex docs, downloading a TMX is possible. I wouldn't be surprised if that was actually a PO file compendium converted to XML. (You need a paid plan to do this, from what I can see on Transifex.)

As for Rosetta, in theory, the simplest approach (as a concept) might be

  • to use the existing translations from all INSTALLED_APPS in a Django-based project
  • and combine them to a PO file compendium.

That compendium could then be used for automatic pre-translation or assisted translation (suggestions). It would be an automatic, fully integrated TM that doesn't need any separate management effort by the user. Allowing users to download a TMX could be an optional feature.
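A minimal sketch of that idea, assuming each app's catalog is available as PO text (in a Django project these would be read from each app's locale/<lang>/LC_MESSAGES/django.po). `parse_po` and `build_compendium` are hypothetical helpers; the parser is deliberately simplified, ignores plurals and msgctxt, and expects entries to be separated by blank lines. GNU gettext's msgcat can build a compendium from real files.

```python
import ast


def parse_po(text):
    """Parse msgid/msgstr pairs from PO text (simplified: no plurals, no msgctxt)."""
    entries, field, msgid, msgstr = {}, None, "", ""
    for line in text.splitlines() + [""]:  # trailing blank flushes the last entry
        line = line.strip()
        if line.startswith("msgid "):
            field, msgid = "msgid", ast.literal_eval(line[6:])
        elif line.startswith("msgstr "):
            field, msgstr = "msgstr", ast.literal_eval(line[7:])
        elif line.startswith('"'):         # continuation of a multi-line string
            if field == "msgid":
                msgid += ast.literal_eval(line)
            elif field == "msgstr":
                msgstr += ast.literal_eval(line)
        elif not line:                     # blank line ends the current entry
            if msgid and msgstr:           # skip the header and untranslated entries
                entries[msgid] = msgstr
            field, msgid, msgstr = None, "", ""
    return entries


def build_compendium(catalogs):
    """Merge {app_name: po_text}; the first translation seen for a msgid wins."""
    compendium, conflicts = {}, []
    for app, text in catalogs.items():
        for msgid, msgstr in parse_po(text).items():
            existing = compendium.setdefault(msgid, msgstr)
            if existing != msgstr:
                conflicts.append((app, msgid, msgstr))  # keep for manual review
    return compendium, conflicts


# Sample catalogs invented for the example.
SHOP_PO = '''
msgid "Save"
msgstr "Speichern"

msgid "Cancel"
msgstr ""
'''
BLOG_PO = '''
msgid "Save"
msgstr "Sichern"

msgid "Publish"
msgstr "Veröffentlichen"
'''
compendium, conflicts = build_compendium({"shop": SHOP_PO, "blog": BLOG_PO})
print(compendium)
print(conflicts)
```

Recording conflicts instead of silently overwriting is a design choice: two apps translating the same string differently is exactly the kind of inconsistency a translator would want to see.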

Would that be realistic?

@mbi
Owner

mbi commented Dec 11, 2020

Having a compendium is only the first step. We'd also need an intelligent way of matching past translations from the compendium and producing fuzzy suggestions in the PO catalog being translated.
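One naive way to approach that matching step is sketched below: untranslated entries are filled from the compendium, and anything that is not an exact match is flagged as fuzzy so a translator reviews it (in a PO file this corresponds to the `#, fuzzy` flag). The `pretranslate` function, its cutoff, and the sample data are assumptions for illustration; difflib's string similarity is a stand-in for whatever "intelligent" matching would really be needed.

```python
from difflib import get_close_matches


def pretranslate(catalog, compendium, cutoff=0.75):
    """Fill untranslated entries of `catalog` from `compendium`.

    `catalog` maps msgid -> msgstr ("" means untranslated). The result maps
    msgid -> (msgstr, is_fuzzy); inexact matches are flagged fuzzy.
    """
    result = {}
    for msgid, msgstr in catalog.items():
        if msgstr:                        # already translated: keep as-is
            result[msgid] = (msgstr, False)
        elif msgid in compendium:         # exact match: reuse without a flag
            result[msgid] = (compendium[msgid], False)
        else:                             # closest source string above the cutoff
            close = get_close_matches(msgid, list(compendium), n=1, cutoff=cutoff)
            result[msgid] = (compendium[close[0]], True) if close else ("", False)
    return result


# Sample data invented for the example.
compendium = {"Save changes": "Änderungen speichern", "Log out": "Abmelden"}
catalog = {"Save changes": "", "Save your changes": "", "Delete account": ""}
result = pretranslate(catalog, compendium)
for msgid, (msgstr, is_fuzzy) in result.items():
    print(repr(msgid), repr(msgstr), "fuzzy" if is_fuzzy else "")
```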

@bittner
Author

bittner commented Dec 11, 2020

True. If we had that, though, we could address one side of the criticism already: "It doesn't have a TM" would cease to be true. And converting a PO file compendium to TMX seems to be a thing that is already addressed by free projects. – Just saying.
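For a sense of what such a conversion involves, here is a minimal sketch that renders a source-to-translation mapping as a TMX 1.4 document using only the standard library. `compendium_to_tmx` and the `creationtool` value are hypothetical; existing converters handle far more (plurals, context, notes, the DOCTYPE), so this only shows the basic structure.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def compendium_to_tmx(compendium, source_lang, target_lang):
    """Render {source: translation} as a TMX 1.4 document (returned as a string)."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "compendium-export-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "po",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, translation in compendium.items():
        unit = ET.SubElement(body, "tu")   # one translation unit per msgid
        for lang, text in ((source_lang, source), (target_lang, translation)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Sample data invented for the example.
tmx_text = compendium_to_tmx({"Save": "Speichern"}, "en", "de")
print(tmx_text)
```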

@mbi
Owner

mbi commented Dec 11, 2020

The Translate Toolkit seems to be a very good candidate to manage, import, export and possibly search TMX documents in Python.
