
Correct title and author name for 2024.wmt-1.89 #4077

Status: Open. This pull request wants to merge 1 commit into master.
data/xml/2024.wmt.xml: 4 changes (2 additions, 2 deletions)
@@ -1103,10 +1103,10 @@
 <bibkey>de-gibert-etal-2024-hybrid</bibkey>
 </paper>
 <paper id="89">
-  <title>Robustness of Fine-Tuned <fixed-case>LLM</fixed-case>s for Machine Translation with Varying Noise Levels: Insights for <fixed-case>A</fixed-case>sturian, <fixed-case>A</fixed-case>ragonese and Aranese</title>
+  <title>Robustness of Fine-Tuned Models for Machine Translation with Varying Noise Levels: Insights for <fixed-case>A</fixed-case>sturian, <fixed-case>A</fixed-case>ragonese and Aranese</title>
   <author><first>Martin</first><last>Bär</last><affiliation>University of the Basque Country</affiliation></author>
   <author><first>Elisa</first><last>Forcada Rodríguez</last><affiliation>University of the Basque Country</affiliation></author>
-  <author><first>Maria</first><last>Garcia-Abadillo</last><affiliation>University of the Basque Country</affiliation></author>
+  <author><first>María</first><last>García-Abadillo Velasco</last><affiliation>University of the Basque Country</affiliation></author>
   <pages>918-924</pages>
   <abstract>We present the LCT-LAP proposal for the shared task on Translation into Low-Resource Languages of Spain at WMT24 within the constrained submission category. Our work harnesses encoder-decoder models pretrained on higher-resource Iberian languages to facilitate MT model training for Asturian, Aranese and Aragonese. Furthermore, we explore the robustness of these models when fine-tuned on datasets with varying levels of alignment noise. We fine-tuned a Spanish-Galician model using Asturian data filtered by BLEU score thresholds of 5, 15, 30 and 60, identifying BLEU 15 as the most effective. This threshold was then applied to the Aranese and Aragonese datasets. Our findings indicate that filtering the corpora reduces computational costs and improves performance compared to using nearly raw data or data filtered with language identification. However, it still falls short of the performance achieved by the rule-based system Apertium in Aranese and Aragonese.</abstract>
   <url hash="df247c4c">2024.wmt-1.89</url>
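
For anyone who wants to sanity-check the corrected entry locally, here is a minimal sketch using only Python's standard-library xml.etree. It is an illustration under the XML layout shown in the hunk above, not part of this PR; the file path and field values come from the diff.

import xml.etree.ElementTree as ET

# Parse the Anthology volume file touched by this PR.
tree = ET.parse("data/xml/2024.wmt.xml")
root = tree.getroot()

# Locate the paper by its Anthology URL slug from the diff.
paper = next(p for p in root.iter("paper")
             if p.findtext("url") == "2024.wmt-1.89")

# <title> mixes plain text with <fixed-case> children, so join all text nodes.
title = "".join(paper.find("title").itertext())
assert title.startswith("Robustness of Fine-Tuned Models"), title

# The corrected author is the last of the three listed for this paper.
author = paper.findall("author")[-1]
assert author.findtext("first") == "María"
assert author.findtext("last") == "García-Abadillo Velasco"

print("2024.wmt-1.89 carries the corrected title and author name.")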