Family planning education is most effective when it reaches young people in the language they actually speak. While working at Dimagi, I fine-tuned a machine translation layer to support a family planning chatbot operating in Sheng, a Swahili-English slang spoken by young people across Kenya.
The core technical challenge: LLMs are pretrained on whatever data is available at scale, and low-resource languages like Sheng are dramatically underrepresented. Zero-shot and few-shot prompting produced translations that native speakers flagged as awkward and unnatural. Injecting hundreds of example sentences into the prompt as a style guide improved quality, but at significant cost in latency and tokens. We implemented a solution that isolated machine translation into its own dedicated layer, fine-tuned separately from the health education chatbot. This kept language quality evaluation tractable and allowed independent optimization of each component.
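That separation can be sketched as two independently swappable functions: one for translation (backed by the fine-tuned model) and one for the health chatbot itself. The function and type names below are illustrative, not Dimagi's actual code; the point is that each layer sits behind its own interface so it can be evaluated and optimized on its own.

```python
from typing import Callable

# Each layer is a plain function so it can be tested, evaluated, and
# swapped independently. In production, `translate` would call the
# fine-tuned translation model and `chatbot` the health-education model.
Translator = Callable[[str, str], str]  # (text, direction) -> translated text
Chatbot = Callable[[str], str]          # (english message) -> english reply

def handle_message(user_text: str, translate: Translator, chatbot: Chatbot) -> str:
    """Route a Sheng message: translate in, generate reply, translate out."""
    english_in = translate(user_text, "sheng->en")
    english_out = chatbot(english_in)
    return translate(english_out, "en->sheng")
```

Because the translation model is injected rather than hard-coded, language-quality evaluation can run against the translator alone, with the chatbot stubbed out.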
Fine-tuning GPT-4o mini on 1,300 domain-specific sentences translated by native speakers raised the spBLEU score from 22.21 to 65.23. Two factors likely explain the magnitude of that jump: the application's scope was narrow enough that overfitting the fine-tuned model to it was acceptable, and Sheng, as a derivative of Swahili and English, two languages already well represented in LLM pretraining data, gave the model a much stronger foundation than a truly isolated low-resource language would. In a parallel effort for Chichewa, spoken by roughly 7 million people, predominantly in Malawi, I fine-tuned GPT-4o mini and matched the performance of the base GPT-4o model at a tenth of the inference cost.
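For reference, OpenAI's fine-tuning endpoint expects training data as JSONL in the chat-message format, one example per line. The sketch below shows how parallel sentence pairs could be assembled into that format; the system prompt and the example pair are illustrative assumptions, not the actual training set.

```python
import json

# Illustrative system prompt; the real prompt would match the one used
# at inference time so training and serving conditions agree.
SYSTEM = "Translate the user's English sentence into natural Sheng."

def to_finetune_record(english: str, sheng: str) -> str:
    """Serialize one sentence pair as a JSONL line in the chat fine-tuning format."""
    return json.dumps({
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": english},
            {"role": "assistant", "content": sheng},
        ]
    }, ensure_ascii=False)

# Hypothetical pair for illustration only, not from the real dataset.
pairs = [("How are you?", "Niaje?")]
jsonl = "\n".join(to_finetune_record(en, sh) for en, sh in pairs)
```

Keeping the system prompt identical between the training file and the deployed translation layer is what lets a small dataset of 1,300 pairs steer the model's register so strongly.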
I presented this work at OpenAI DevDay 2024 in San Francisco. Watch the talk.