A specialized fine-tuning of Google's Gemma 2B model to understand and generate culturally appropriate idioms across 71 languages. This project supports cross-cultural communication by matching figurative meanings with suitable idioms while preserving cultural context.
- Multilingual Support: Covers 71 languages from diverse cultural backgrounds
- Cultural Preservation: Maintains cultural context and nuances in translations
- Efficient Fine-tuning: Uses LoRA to reduce training parameters by 99.9%
- High Accuracy: Achieves 79.01% accuracy in idiom matching
Our custom dataset includes:
- 71 languages
- 10 idioms per language
- 710 total examples
Each entry contains:
```json
{
  "idiom": "Original idiom",
  "literal_meaning": "Word-for-word translation",
  "figurative_meaning": "Actual meaning",
  "example": "Usage example",
  "language": "Source language"
}
```
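As a minimal sketch, a dataset entry can be validated against this schema before training (the field names come from the entry format above; the helper and sample values are illustrative, not part of the project code):

```python
# Expected fields, taken from the dataset entry format above.
REQUIRED_FIELDS = {"idiom", "literal_meaning", "figurative_meaning", "example", "language"}

def validate_entry(entry: dict) -> bool:
    """Check that an entry has exactly the expected, non-empty string fields."""
    return set(entry) == REQUIRED_FIELDS and all(
        isinstance(v, str) and v.strip() for v in entry.values()
    )

# Hypothetical sample entry built from the example output later in this README.
sample = {
    "idiom": "Bıçakta kalamak",
    "literal_meaning": "To be stuck in a knife",
    "figurative_meaning": "To be stuck in a difficult situation",
    "example": "You might find yourself stuck in a knife if you don't find a solution quickly",
    "language": "Turkish",
}
print(validate_entry(sample))  # True
```

A check like this catches missing or empty fields early, before they surface as malformed prompts during fine-tuning.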
- Base Model: Gemma 2B (`gemma_2b_en`)
- Fine-tuning Method: Low-Rank Adaptation (LoRA)
- Rank: 4
- Original Parameters: 2.6B (9.74 GB)
- Trainable Parameters: 2.9M (11.17 MB)
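The parameter reduction follows from the LoRA factorization: instead of updating a full `d_in × d_out` weight matrix, only two low-rank factors of shapes `d_in × r` and `r × d_out` are trained. A rough sketch of the arithmetic (the layer dimensions below are illustrative, not Gemma's actual shapes):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a rank-r LoRA adapter on one dense layer."""
    return d_in * rank + rank * d_out

# Illustrative layer size; Gemma's real projection shapes differ.
d_in, d_out, rank = 2048, 2048, 4

full = d_in * d_out          # 4,194,304 frozen weights
adapter = lora_params(d_in, d_out, rank)  # 16,384 trainable weights
print(adapter / full)        # ~0.0039, i.e. ~99.6% fewer trainable weights
```

Summed over all adapted layers, this is what brings the trainable parameter count down from 2.6B to roughly 2.9M.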
```python
import keras

# Model settings
sequence_length = 256
batch_size = 1
epochs = 2

# Optimizer
optimizer = keras.optimizers.AdamW(
    learning_rate=5e-5,
    weight_decay=0.01,
)
```
Training Metrics:
Epoch 1: Loss: 0.5887, Accuracy: 63.51%
Epoch 2: Loss: 0.3230, Accuracy: 79.01%
- Installation
```bash
pip install -q -U keras-nlp
pip install -q -U "keras>=3"
```
- Environment Setup
```python
import os

# The backend must be set before Keras is imported.
os.environ["KERAS_BACKEND"] = "jax"
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00"
```
- Load Model and Generate
```python
import keras_nlp

# Load the model (shown here from the base preset; load your
# LoRA fine-tuned weights in practice).
model = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

test_meaning = "to be stuck in a difficult situation"
prompt = (
    "Instruction:\n"
    "Find a suitable idiom for this situation: {}\n\n"
    "Response:\n"
).format(test_meaning)

# Attach the sampler via compile() so generate() uses it.
sampler = keras_nlp.samplers.TopKSampler(k=7, seed=2)
model.compile(sampler=sampler)

print(model.generate(prompt, max_length=512))
```
```
Instruction:
Find a suitable idiom for this situation: to be stuck in a difficult situation

Response:
Idiom: Bıçakta kalamak
Literal Meaning: To be stuck in a knife
Example Use: You might find yourself stuck in a knife if you don't find a solution quickly
Cultural Context: This idiom comes from the Turkish culture
```
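The generated response follows a regular `Label: value` layout, so it can be split into fields for downstream use. A minimal sketch (the labels are taken from the sample output above; this parser is an illustration, not part of the project):

```python
def parse_response(text: str) -> dict:
    """Split 'Label: value' lines from a generated response into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

response = (
    "Idiom: Bıçakta kalamak\n"
    "Literal Meaning: To be stuck in a knife\n"
    "Cultural Context: This idiom comes from the Turkish culture"
)
print(parse_response(response)["Idiom"])  # Bıçakta kalamak
```

Note that `partition` splits on the first colon only, so values containing colons survive intact.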
- ✅ Cross-cultural idiom matching
- ✅ Preservation of cultural context
- ✅ Support for 71 languages
- ✅ Efficient fine-tuning with LoRA
- ✅ Easy-to-use inference API
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google's Gemma team for the base model
- Kaggle for the competition platform