ROCR: Turning State-of-the-art OCR into Automated Form Processing


Riders for Health offers medical transportation and logistics services in Africa, especially in rural areas, using fleets of motorcycles to handle rougher terrains. However, their riders don't just get to be awesome riding motorcycles around — they also have to spend considerable time keeping careful paper logbooks of when, where, and what they collect and transport.

To assist with this, we built ROCR. ROCR (Riders for Health OCR) is a prototype automated form processing and handwriting prediction tool to aid community health workers with more efficient digital data collection. The ultimate goal of the project is to let Riders for Health do what they do best — manage the logistics of getting into remote healthcare outposts and picking up medical samples — by minimizing the time spent in the field writing in paper logbooks.

ROCR extracts and predicts the key handwritten information from regularly used forms. The figure below shows an example of this process. A photo of the World Health Organization case report for confirmed COVID-19 cases is passed through the ROCR tool, which extracts and predicts two key fields: the reporting country and the unique case identifier.

ROCR tool results on the WHO case report for confirmed COVID-19 cases. Actual results using the Azure Form Recognizer OCR model; the form data shown was filled out by our team and is not real.

This task presents several challenges: How do you find the key field you are searching for in each new image? How do you predict handwriting, which is much harder to read than printed text? Can you use domain knowledge of the form's use case to improve the predictions?

How ROCR works

To address these challenges, ROCR processes forms using the following steps:

Image Alignment — Warp the input form image to align with the blank template form
OCR / Handwriting Prediction — Predict the form's text
Key Field Extraction — Find the OCR-predicted text that falls within each region of interest
Post Processing — Apply domain knowledge to improve OCR prediction outputs

ROCR pipeline — image alignment, OCR prediction, post processing
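The image alignment step depends on separating good feature matches from bad ones before fitting the warp. The sketch below is an illustration of that idea only, not ROCR's code: it assumes the feature matches are already available as point pairs, fits a simplified similarity transform (scale, rotation, shift) rather than a full perspective warp, and uses hypothetical names and values throughout (`ransac_similarity`, `alignment_is_suspect`, `threshold=3.0`, `min_inlier_ratio=0.5`). Points are stored as complex numbers `x + yj` so the example needs only the standard library.

```python
import cmath
import random


def fit_similarity(p1, p2, q1, q2):
    """Solve q = a * p + b (complex a, b) from two point correspondences."""
    a = (q2 - q1) / (p2 - p1)
    b = q1 - a * p1
    return a, b


def ransac_similarity(matches, threshold=3.0, iterations=200, seed=0):
    """RANSAC-style fit of a similarity transform to noisy matches.

    matches:   list of (src, dst) pairs, each point a complex number x + yj.
    threshold: largest reprojection error (in pixels) that still counts
               as an inlier. This is the knob being "calibrated".
    Returns (a, b, inliers), where inliers is a list of booleans.
    """
    if len(matches) < 2:
        raise ValueError("need at least two matches")
    rng = random.Random(seed)
    best = (1 + 0j, 0j, [False] * len(matches))
    for _ in range(iterations):
        (p1, q1), (p2, q2) = rng.sample(matches, 2)
        if p1 == p2:
            continue  # degenerate sample, cannot fit a transform
        a, b = fit_similarity(p1, p2, q1, q2)
        inliers = [abs(a * p + b - q) <= threshold for p, q in matches]
        if sum(inliers) > sum(best[2]):
            best = (a, b, inliers)
    return best


def alignment_is_suspect(inliers, min_inlier_ratio=0.5):
    """Flag an alignment for human review when too few matches agree."""
    if not inliers:
        return True
    return sum(inliers) / len(inliers) < min_inlier_ratio


# Synthetic demo: 20 matches from a known transform, two of them corrupted.
true_a = cmath.rect(1.1, 0.05)  # scale by 1.1, rotate by 0.05 rad
true_b = 12 - 7j                # shift by (12, -7)
src = [complex(x, y) for x in range(0, 100, 20) for y in range(0, 100, 25)]
matches = [(p, true_a * p + true_b) for p in src]
matches[3] = (src[3], 500 + 500j)    # bad match
matches[11] = (src[11], -300 + 40j)  # bad match

a, b, inliers = ransac_similarity(matches)
print(sum(inliers), "of", len(matches), "matches kept")
print("needs review:", alignment_is_suspect(inliers))
```

Raising `threshold` keeps more matches at the cost of admitting noisier ones, and the inlier ratio gives a simple signal for deciding when an alignment should go to a person instead of straight to OCR.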

ROCR uses two off-the-shelf OCR engines, Google Cloud Vision and Azure Form Recognizer. On their own they are not enough: getting reasonable results out of them takes "glue" code for image alignment, key field extraction, handling of the OCR cloud responses, and post processing.
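To give a feel for one piece of that glue, here is a minimal sketch of key field extraction. It assumes the OCR response has already been reduced to a list of `(text, box)` words in the blank template's coordinate system, and that each key field has a hand-drawn region of interest on that template. The `Box` class, the `extract_field` function, the `min_overlap` value, and all coordinates are hypothetical; neither cloud engine's actual response format is represented here.

```python
from dataclasses import dataclass


@dataclass
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def area(self):
        # Clamp at zero so an "inverted" box (no overlap) has no area.
        width = max(0.0, self.right - self.left)
        height = max(0.0, self.bottom - self.top)
        return width * height

    def intersection(self, other):
        return Box(
            max(self.left, other.left),
            max(self.top, other.top),
            min(self.right, other.right),
            min(self.bottom, other.bottom),
        )


def extract_field(words, roi, min_overlap=0.5):
    """Join the OCR words that mostly fall inside a region of interest.

    words: list of (text, Box) pairs in template coordinates.
    roi:   Box marking where the field's handwriting is expected.
    """
    hits = []
    for text, box in words:
        if box.area() == 0:
            continue
        inside = box.intersection(roi).area() / box.area()
        if inside >= min_overlap:
            hits.append((box.left, text))
    # Assumes a single-line field, so reading order is left to right.
    hits.sort(key=lambda hit: hit[0])
    return " ".join(text for _, text in hits)


# Made-up OCR words for one line of a form, plus a neighbouring line.
words = [
    ("Reporting", Box(10, 10, 80, 25)),
    ("country:", Box(85, 10, 140, 25)),
    ("United", Box(150, 10, 200, 25)),
    ("Kingdom", Box(205, 10, 270, 25)),
    ("Case", Box(10, 40, 45, 55)),
]
country_roi = Box(145, 5, 300, 30)
print(extract_field(words, country_roi))
```

Using the fraction of each word box that lies inside the region, rather than requiring full containment, leaves some slack for alignment that is slightly off.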

Key technical takeaways

ROCR uses off-the-shelf tools like Google Cloud Vision and Azure Form Recognizer, and consequently benefits from a wealth of prior research and expertise in handwriting recognition. While we experimented with custom OCR models, we ultimately found these cloud solutions worked best.
Image alignment is hard. We perform image alignment here with feature matching. It took a fair amount of tuning to get working, and we found that calibrating the outlier detection for feature matches (to a threshold much larger than the conventional settings) gave us the most improvement. ROCR also detects bad image alignments and flags them to the user when human intervention is required.
We improved ROCR's performance by applying post processing techniques to the OCR model outputs. These techniques let ROCR use domain knowledge about a form, or about the known issues of a particular OCR engine, to improve accuracy. One technique applied in ROCR is to compare the OCR prediction to a list of known possible values in order to determine the most likely output. In the WHO case report example above, we know that the reporting country has to be one of the 194 WHO member countries, so if the OCR prediction is "United Bingdom", we can compare it to that list and update the output to "United Kingdom".
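The known-values comparison can be sketched in a few lines. The post does not say which similarity measure ROCR uses, so this example swaps in Python's standard `difflib` as a stand-in. The function name `snap_to_known_value`, the `cutoff` of 0.8, and the short country list (a small illustrative subset, not the full WHO list) are all assumptions.

```python
import difflib


def snap_to_known_value(prediction, known_values, cutoff=0.8):
    """Return the known value closest to an OCR prediction.

    Returns None when nothing is similar enough, so the caller can
    leave the field for a person to review instead of guessing.
    """
    # Compare case-insensitively, but return the canonical spelling.
    by_lowercase = {value.lower(): value for value in known_values}
    candidates = difflib.get_close_matches(
        prediction.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[candidates[0]] if candidates else None


# Illustrative subset only; a real list would hold every allowed value.
countries = [
    "United Kingdom",
    "United States of America",
    "Uganda",
    "Kenya",
    "Zambia",
]

print(snap_to_known_value("United Bingdom", countries))  # United Kingdom
print(snap_to_known_value("xxxxxx", countries))          # None
```

The cutoff matters: set too low, it will confidently "correct" a garbled prediction to the wrong value, which is why this sketch returns `None` rather than always picking the nearest entry.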

About the project

ROCR was built for Riders for Health, an international nonprofit that offers medical transportation and logistics services in Africa. It's difficult to overstate the challenge of bringing healthcare to rural villages in developing countries. Riders for Health meets that challenge mostly with a fleet of motorcycles that can handle rough terrain effectively and efficiently. Working with Riders for Health is basically like working with superheroes riding motorcycles, with kindness and humility to boot.

Riders for Health health worker on motorcycle — Photo courtesy of www.riders.org

This work is driven by DataKind, a nonprofit committed to the application of data science for social good. DataKind brings together pro-bono data scientists and social impact organizations who benefit from their time and expertise. Alex Fried, Karry Lu, Amy Roberts, Alexander Sack, and Anna Dixon formed the team that built and tested the ROCR tool. ROCR is part of a portfolio of projects designed to impact community health workers.

Tags: Computer Vision, OCR, Data Science, Digital Data Collection