A specialized fine-tuning of Google's Gemma 2B model to understand and generate culturally appropriate idioms across 71 languages. This project supports cross-cultural communication by matching figurative meanings with suitable idioms while preserving cultural context.
- Multilingual Support: Covers 71 languages from diverse cultural backgrounds
- Cultural Preservation: Maintains cultural context and nuances in translations
- Efficient Fine-tuning: Uses LoRA to reduce training parameters by 99.9%
- High Accuracy: Achieves 79.01% accuracy in idiom matching
Our custom dataset includes:
- 71 languages
- 10 idioms per language
- 710 total examples
Each entry contains:
```json
{
  "idiom": "Original idiom",
  "literal_meaning": "Word-for-word translation",
  "figurative_meaning": "Actual meaning",
  "example": "Usage example",
  "language": "Source language"
}
```
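As a minimal sketch, a dataset entry can be validated against this schema before training (the field names come from the entry format above; the helper and sample values are illustrative, not part of the project code):

```python
# Expected fields, taken from the dataset entry format above.
REQUIRED_FIELDS = {"idiom", "literal_meaning", "figurative_meaning", "example", "language"}

def validate_entry(entry: dict) -> bool:
    """Check that an entry has exactly the expected, non-empty string fields."""
    return set(entry) == REQUIRED_FIELDS and all(
        isinstance(v, str) and v.strip() for v in entry.values()
    )

# Hypothetical sample entry built from the example output later in this README.
sample = {
    "idiom": "Bıçakta kalamak",
    "literal_meaning": "To be stuck in a knife",
    "figurative_meaning": "To be stuck in a difficult situation",
    "example": "You might find yourself stuck in a knife if you don't find a solution quickly",
    "language": "Turkish",
}
print(validate_entry(sample))  # True
```

A check like this catches missing or empty fields early, before they surface as malformed prompts during fine-tuning.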
- Base Model: Gemma 2B (`gemma_2b_en`)
- Fine-tuning Method: Low-Rank Adaptation (LoRA)
- Rank: 4
- Original Parameters: 2.6B (9.74 GB)
- Trainable Parameters: 2.9M (11.17 MB)
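The parameter reduction follows from the LoRA factorization: instead of updating a full `d_in × d_out` weight matrix, only two low-rank factors of shapes `d_in × r` and `r × d_out` are trained. A rough sketch of the arithmetic (the layer dimensions below are illustrative, not Gemma's actual shapes):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter on one dense layer."""
    return d_in * rank + rank * d_out

# Illustrative layer size; Gemma's real projection shapes differ.
d_in, d_out, rank = 2048, 2048, 4

full = d_in * d_out          # 4,194,304 frozen weights
adapter = lora_params(d_in, d_out, rank)  # 16,384 trainable weights
print(adapter / full)        # ~0.0039, i.e. ~99.6% fewer trainable weights
```

Summed over all adapted layers, this is what brings the trainable parameter count down from 2.6B to roughly 2.9M.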
```python
import keras

# Model settings
sequence_length = 256
batch_size = 1
epochs = 2

# Optimizer
optimizer = keras.optimizers.AdamW(
    learning_rate=5e-5,
    weight_decay=0.01,
)
```
Training Metrics:
Epoch 1: Loss: 0.5887, Accuracy: 63.51%
Epoch 2: Loss: 0.3230, Accuracy: 79.01%
- Installation
```bash
pip install -q -U keras-nlp
pip install -q -U "keras>=3"
```
- Environment Setup
```python
import os

# The backend must be set before Keras is imported.
os.environ["KERAS_BACKEND"] = "jax"
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00"
```
- Load Model and Generate
```python
import keras_nlp

# Load the model (shown here from the base preset; load your
# LoRA fine-tuned weights in practice).
model = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

test_meaning = "to be stuck in a difficult situation"
prompt = (
    "Instruction:\n"
    "Find a suitable idiom for this situation: {}\n\n"
    "Response:\n"
).format(test_meaning)

# Attach the sampler via compile() so generate() uses it.
sampler = keras_nlp.samplers.TopKSampler(k=7, seed=2)
model.compile(sampler=sampler)

print(model.generate(prompt, max_length=512))
```
```
Instruction:
Find a suitable idiom for this situation: to be stuck in a difficult situation

Response:
Idiom: Bıçakta kalamak
Literal Meaning: To be stuck in a knife
Example Use: You might find yourself stuck in a knife if you don't find a solution quickly
Cultural Context: This idiom comes from the Turkish culture
```
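The generated response follows a regular `Label: value` layout, so it can be split into fields for downstream use. A minimal sketch (the labels are taken from the sample output above; this parser is an illustration, not part of the project):

```python
def parse_response(text: str) -> dict:
    """Split 'Label: value' lines from a generated response into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

response = (
    "Idiom: Bıçakta kalamak\n"
    "Literal Meaning: To be stuck in a knife\n"
    "Cultural Context: This idiom comes from the Turkish culture"
)
print(parse_response(response)["Idiom"])  # Bıçakta kalamak
```

Note that `partition` splits on the first colon only, so values containing colons survive intact.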
- ✅ Cross-cultural idiom matching
- ✅ Preservation of cultural context
- ✅ Support for 71 languages
- ✅ Efficient fine-tuning with LoRA
- ✅ Easy-to-use inference API
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google's Gemma team for the base model
- Kaggle for the competition platform