In rural Neno district, Malawi, Medic's community health worker app and the facility-based electronic medical record system were operating in parallel: they saw the same patients but didn't talk to each other. When a community health worker flagged a patient for follow-up, that flag needed a reliable link to the patient's clinic record so the patient actually received adequate care. We needed a better handoff between the two systems so that patients in HIV and non-communicable disease programs would get a true continuum of care.
The technical challenge was record linkage: matching patient records across two systems that were designed independently, used different identifiers, and were maintained by different people in different places. The standard approach is probabilistic record linkage with the Fellegi-Sunter method, and the question we asked was whether machine learning could outperform it.
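For readers unfamiliar with the baseline, here is a minimal sketch of Fellegi-Sunter scoring in Python. It is not the study's implementation: the field names, the m- and u-probabilities, the thresholds, and the example records are all hypothetical. Each field comparison contributes a log-likelihood-ratio weight, log2(m/u) when the field agrees and log2((1 - m)/(1 - u)) when it disagrees, and the summed score is compared against an upper and a lower threshold.

```python
import math

# Hypothetical m- and u-probabilities, for illustration only:
#   m = P(field agrees | the two records are the same patient)
#   u = P(field agrees | the two records are different patients)
FIELD_PROBS = {
    "given_name": (0.95, 0.02),
    "family_name": (0.90, 0.05),
    "birth_year": (0.85, 0.10),
    "village": (0.80, 0.20),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 likelihood ratio contributed by a single field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Sum field weights, assuming fields agree independently given match status."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # assumed convention: a missing value contributes no evidence
        total += field_weight(a == b, m, u)
    return total


def classify(score: float, upper: float = 6.0, lower: float = 0.0) -> str:
    """Two-threshold Fellegi-Sunter decision rule (threshold values are hypothetical)."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "possible match"


if __name__ == "__main__":
    # Hypothetical records: same patient, birth year off by one.
    chw_record = {"given_name": "Grace", "family_name": "Phiri",
                  "birth_year": 1985, "village": "Village A"}
    emr_record = {"given_name": "Grace", "family_name": "Phiri",
                  "birth_year": 1986, "village": "Village A"}
    score = match_score(chw_record, emr_record)
    print(f"{score:.2f} {classify(score)}")  # prints 9.15 match
```

With these made-up parameters, the two records disagree only on birth year and still score about 9.15, above the upper threshold. Pairs that land between the thresholds are the "possible matches" that Fellegi-Sunter routes to manual review. In practice, m and u are usually estimated from the data (for example with the EM algorithm) rather than set by hand.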
Working with Medic and PIH Malawi, my team built and evaluated an ML-based record linkage model as part of the eTrace workflow. The model achieved 0.74 recall at 0.90 precision, a 12% improvement over the industry-standard Fellegi-Sunter probabilistic baseline. In other words, the ML model was meaningfully more reliable at finding the right patient match without burdening health workers with excessive false matches.
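A figure like "0.74 recall at 0.90 precision" comes from sweeping the model's decision threshold and reporting the best recall among thresholds that keep precision at or above 0.90. The sketch below shows that calculation in plain Python; the function name and the toy scores and labels are hypothetical, and the study's exact evaluation protocol may differ.

```python
from typing import Sequence


def recall_at_precision(
    scores: Sequence[float],
    labels: Sequence[int],
    min_precision: float = 0.90,
) -> float:
    """Best recall over all score thresholds whose precision is >= min_precision.

    scores: match scores for candidate record pairs (higher means more likely a match)
    labels: 1 if the pair is a true match, 0 otherwise
    """
    total_matches = sum(labels)
    if total_matches == 0:
        return 0.0

    # Rank pairs from most to least confident; each cut point is a threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = false_pos = 0
    best_recall = 0.0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        # Skip cut points inside a run of tied scores: a threshold
        # cannot separate pairs that have the same score.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        precision = true_pos / (true_pos + false_pos)
        if precision >= min_precision:
            best_recall = max(best_recall, true_pos / total_matches)
    return best_recall


if __name__ == "__main__":
    # Hypothetical scores and ground-truth labels for 14 candidate pairs.
    scores = [0.99, 0.97, 0.95, 0.93, 0.91, 0.89, 0.87,
              0.85, 0.83, 0.81, 0.79, 0.77, 0.75, 0.73]
    labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1]
    print(f"{recall_at_precision(scores, labels):.2f}")  # prints 0.82
```

On the toy data, the highest recall that keeps precision at or above 0.90 is 9/11 (about 0.82); the two remaining true matches can only be recovered at thresholds where precision falls below 0.90. That is the tradeoff a fixed-precision operating point makes explicit.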
The full study, published in the ACM Journal on Computing and Sustainable Societies, is available here.
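Precision and recall aren't just evaluation metrics; they're tradeoffs with real consequences. Here, a false match can attach a community health worker's follow-up to the wrong clinic record, while a missed match leaves a flagged patient unlinked to their care. Knowing which direction to err matters as much as knowing the numbers.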