Create a Bambara-French translation model
I fine-tuned facebook/nllb-distilled-600M on my Bambara-French CSV dataset. Everything works, but the model is licensed for non-commercial use only, so I cannot use it on my website because the website shows ads. Can anyone help me, either with another pretrained model or with how to get permission to use this one on an ad-supported website? I also tried mT5-small, but its BLEU score is 0.0. Here is the link: [https://www.kaggle.com/code/hallohallomali/helsinki-fine-tuned]. Thanks in advance.
4 Answers
It appears that the model you fine-tuned, facebook/nllb-distilled-600M, is licensed for non-commercial use only, which restricts its deployment on a website that displays ads. You attempted to use mT5-small but encountered a BLEU score of 0.0, indicating poor performance. Additionally, you referenced a Kaggle notebook for fine-tuning, which may not have yielded satisfactory results. To address these challenges, consider the following options:
1. Explore Alternative Pretrained Models
Several pretrained models have demonstrated better performance for the Bambara-French language pair:
MBART: A multilingual sequence-to-sequence model that has shown effectiveness in low-resource languages.
M2M-100: A multilingual model capable of translating directly between 100 languages; note that Bambara is not in its official language list, so verify coverage before relying on it.
NLLB-200: Meta’s No Language Left Behind model supports 200 languages and has been fine-tuned for various African languages.
These models are available on Hugging Face, but check each model card's license before commercial deployment: MBART and M2M-100 are MIT-licensed, while the NLLB-200 checkpoints carry the same CC-BY-NC restriction you ran into.
2. Obtain Permission for Commercial Use
If you prefer to use the facebook/nllb-distilled-600M model, consider reaching out to Meta to request permission for commercial use. Clearly explain your intended application and how it aligns with their usage policies.
3. Utilize Open-Source Tools for Fine-Tuning
To improve your model's performance, consider using tools like fairseq or Hugging Face Transformers for fine-tuning.
4. Leverage High-Quality Datasets
Enhance your model's training by utilizing high-quality parallel datasets:
Bayelemabaga: A comprehensive dataset with 47,000 aligned Bambara-French sentences from diverse sources (described in a paper in the ACL Anthology).
MAFAND-MT: A news-domain parallel corpus covering African languages, commonly used to fine-tune models such as MBART, mT5, and M2M-100.
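As a minimal sketch, the datasets above could be merged into one deduplicated pool and split for training; the CSV layout assumed here (Bambara in the first column, French in the second) is an assumption, so adapt the column indices to your files:

```python
import csv
import random

def load_parallel_csvs(paths, src_col=0, tgt_col=1):
    """Read (Bambara, French) sentence pairs from several CSV files
    and deduplicate them across sources."""
    seen = set()
    pairs = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if len(row) <= max(src_col, tgt_col):
                    continue
                src, tgt = row[src_col].strip(), row[tgt_col].strip()
                if src and tgt and (src, tgt) not in seen:
                    seen.add((src, tgt))
                    pairs.append((src, tgt))
    return pairs

def train_dev_split(pairs, dev_fraction=0.1, seed=42):
    """Shuffle reproducibly and hold out a development set."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * (1 - dev_fraction))
    return pairs[:cut], pairs[cut:]
```

With, say, `bayelemabaga.csv` and `mafand.csv` on disk (hypothetical file names), `load_parallel_csvs([...])` gives one combined pool to split before fine-tuning.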
5. Consider Commercial Translation Services
If model development is not feasible, explore commercial translation services that support Bambara-French translation:
Merlin AI: Offers advanced AI-powered Bambara to French translation.
Cesco Linguistic Services: Provides professional Bambara translation services.
Hi Gaoussou Maiga,
I’m glad my previous suggestions helped! Since you’re working with MBART and currently have a BLEU score of 8.0, there are a few strategies you can try to improve translation quality:
1. Increase and Improve your Dataset
Data quantity matters: MBART performs better with more high-quality parallel sentences. Consider combining multiple datasets like Bayelemabaga, MAFAND-MT, and aligned sentences from ACL Anthology.
Clean your data: Remove noisy or misaligned sentences. Low-quality data can severely hurt BLEU scores.
Data augmentation: You could use back-translation — translating French to Bambara using an existing model and adding it to your training data.
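The cleaning and filtering steps above can be sketched like this; the thresholds (length limits, length ratio) are illustrative defaults, not tuned values:

```python
def clean_parallel(pairs, min_words=1, max_words=200, max_char_ratio=3.0):
    """Filter noisy (source, target) pairs: drop empty or overlong
    sentences, pairs with wildly mismatched lengths, and duplicates."""
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue
        if not (min_words <= len(src.split()) <= max_words):
            continue
        if not (min_words <= len(tgt.split()) <= max_words):
            continue
        # A large character-length mismatch often signals misalignment.
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_char_ratio:
            continue
        key = (src.lower(), tgt.lower())
        if key not in seen:
            seen.add(key)
            kept.append((src, tgt))
    return kept
```

Running this over the combined corpus before fine-tuning removes the misaligned pairs that tend to drag BLEU down the most.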
2. Fine-Tuning Strategies
Longer training or more epochs: Sometimes models like MBART need more steps for low-resource languages.
Learning rate adjustment: A smaller learning rate often stabilizes training for low-resource languages.
Use the correct tokenizer: MBART requires the proper tokenizer for source/target languages. Misalignment can cause poor translations.
Gradual unfreezing: Fine-tune the top layers first, then progressively unfreeze more layers for better adaptation.
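As a rough illustration of the gradual-unfreezing schedule, assuming PyTorch-style parameter names containing `layers.<i>.` (the naming pattern is an assumption; adapt it to what your model's `named_parameters()` actually returns):

```python
import re

def trainable_at_epoch(param_names, num_layers, epoch, layers_per_stage=2):
    """Return the parameter names to train at a given epoch: only the top
    `layers_per_stage` layers at epoch 0, progressively more each epoch.
    Parameters without a layer index (embeddings, output head) stay trainable."""
    lowest_unfrozen = max(0, num_layers - (epoch + 1) * layers_per_stage)
    selected = []
    for name in param_names:
        match = re.search(r"layers\.(\d+)\.", name)
        if match is None or int(match.group(1)) >= lowest_unfrozen:
            selected.append(name)
    return selected
```

In a training loop you would then set `param.requires_grad = (name in trainable_at_epoch(...))` at the start of each epoch.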
3. Evaluation Metrics
BLEU is useful, but it can be harsh for low-resource languages. You might also track chrF or METEOR, which can reflect translation quality more fairly.
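For intuition, a simplified sentence-level chrF (character n-gram F-score with β=2) fits in a few lines of standard-library Python; for real evaluation you would use the `sacrebleu` implementation, and this sketch omits some of its details (such as the word n-grams added in chrF++):

```python
from collections import Counter

def char_ngrams(text, n):
    """Character n-gram counts, ignoring whitespace."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hypothesis, reference, max_order=6, beta=2.0):
    """Simplified chrF in [0, 1]: average character n-gram precision and
    recall over orders 1..max_order, combined with an F-beta score."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())
        if sum(hyp.values()) > 0:
            precisions.append(overlap / sum(hyp.values()))
        if sum(ref.values()) > 0:
            recalls.append(overlap / sum(ref.values()))
    if not precisions or not recalls:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because chrF matches at the character level, it gives partial credit for near-miss word forms, which matters for morphologically rich, low-resource languages.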
4. Experiment with Other Models
M2M-100 (many-to-many model) often works better for low-resource African languages.
NLLB-200 models sometimes outperform MBART on African language pairs, but keep their CC-BY-NC license in mind, since it rules out ad-supported commercial use.
5. Iterative Testing
Translate small batches and manually review errors. Focus on frequent mistakes and consider post-processing for common issues.
If you combine these techniques — better data, careful fine-tuning, and evaluation — you should see BLEU improve significantly over 8.0. Good luck!
🙏 Thank you Riyadh JS, you helped me a lot.
I tried MBART and I'm able to create a machine translation model, but the translations are not accurate, so I'm trying to improve it.
Its BLEU score is 8.00; I hope I'll be able to make it work the way I want.
Thank you again.
OK, I understand. Thank you.