
OCR on Edge Compute

Cover photo: a server room with cables and lights, by Taylor Vick.

As advanced technology becomes commonplace, computer vision has become a crucial tool with many applications.

It can be used to identify objects, patterns, people, plants and pets, providing valuable insights for users, aiding in decision-making and making it easier to sift through massive data sets.

Couple this with the increasing demand for real-time analysis and low-latency responses, and edge compute becomes the best place to deploy your trained model.

What is OCR? #

Optical Character Recognition (OCR) is the process of converting images of text into machine-encoded text. It’s commonly used to convert images of documents into text that can be searched and indexed.

You can use this to scan documents, receipts, business cards, and more. You can see why this would be useful at Blinq.

What is Edge Compute? #

Edge compute is the practice of running your code as close to the user as possible. This is typically done by running your code on a CDN, such as Cloudflare Workers or AWS Lambda@Edge.

This is different to traditional cloud compute, where your code is run in a data center that is typically far away from the user.
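
As a rough illustration, a Cloudflare Worker is little more than a fetch handler that runs in the edge location closest to the user. A minimal sketch:

// A minimal Cloudflare Worker: this handler runs at the edge location
// nearest the user who made the request.
export default {
  async fetch(request) {
    return new Response('Hello from the edge!');
  },
};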

Why Edge Compute? #

Although it’s possible to run your model on device, keeping it on the edge has a few advantages:

  • Low latency: The model is closer to the user, so the response time is faster. It feels like it’s on the device.
  • Low cost: You don’t need to pay for the compute resources to run the model; you only pay for the requests.
  • Easy to update: You can update the model without having to update the app. This is especially useful if you have a lot of users on older versions of your app.

At Blinq, we maintain a long tail of mobile app versions, so being able to update the model without having to update the app is a huge win for us.

How to run your model on the edge #

Edge runtimes like Cloudflare Workers and Lambda@Edge are very different to traditional runtimes like Node.js and Golang. They’re designed to be fast and lightweight, so they don’t have access to the same libraries and APIs.

They’re very similar to running your code in a browser, so you can’t just copy and paste your code and expect it to work.
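
For example, Node-specific modules like fs and the built-in crypto module aren’t available; you work with Web-standard APIs instead. A small sketch using fetch() and Web Crypto, both of which Workers support:

// No require('fs') or Node's crypto module here - Workers expose
// Web-standard APIs such as fetch() and crypto.subtle instead.
export default {
  async fetch(request) {
    // Fetch a remote resource with the standard fetch() API.
    const upstream = await fetch('https://example.com/');
    const body = await upstream.arrayBuffer();

    // Hash it with Web Crypto rather than Node's crypto module.
    const digest = await crypto.subtle.digest('SHA-256', body);
    const hex = [...new Uint8Array(digest)]
      .map((byte) => byte.toString(16).padStart(2, '0'))
      .join('');

    return new Response(hex);
  },
};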

The model #

For this example, I’m using a simple OCR model - Keras OCR. It’s a TensorFlow model that takes an image and returns the text it finds in the image.

I’m also relying on Cloudflare’s Constellation feature (in beta), which allows you to run ONNX models on the edge. This is a great new option as it allows you to run models that aren’t supported by TensorFlow.js and aren’t limited by Cloudflare’s 10MB Worker quota.

Lambda@Edge has much higher limits, but it’s still a good idea to keep your model as small as possible to reduce the cold start time.

For this example, I have used the crnn_kurapan model available here.

Converting Keras to ONNX #

The first step is to convert the Keras model to ONNX. This is a fairly simple process, but it does require a few steps. I’m assuming that Conda is installed and you’re using Python 3.10.

First, create a conda environment.

$ conda create -n edge-onnx python=3.10 pip
$ conda activate edge-onnx

Next, install dependencies.

$ python -m pip install tensorflow tf2onnx keras-ocr

If you are on Apple Silicon, you will need to install tensorflow-macos instead of tensorflow.

$ python -m pip install tensorflow-macos

Finally, convert the model using this snippet.

import tf2onnx
import onnx
import keras_ocr

# Load the keras-ocr recognizer with the pretrained 'kurapan' weights.
recognition = keras_ocr.recognition.Recognizer(
    alphabet='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
    weights='kurapan')

# Convert the underlying Keras model to ONNX and write it to disk.
output_path = 'crnn_kurapan.onnx'
model_proto, _ = tf2onnx.convert.from_keras(
    recognition.model, opset=None, output_path=output_path)

onnx.save(model_proto, output_path)

This will create an ONNX model in the current directory at crnn_kurapan.onnx.

Running the model on the edge #

To run the ONNX model on edge compute, I’ve used Cloudflare Constellation, as the model is much too large to bundle into a Cloudflare Worker directly.

You can do this using the Cloudflare dashboard or the CLI. I’m using the CLI as it’s easier to automate.

First, we need to create a project.

$ npx wrangler constellation project create "ocr" ONNX
$ npx wrangler constellation project list

Take note of the project ID.

Next, upload the model:

$ npx wrangler constellation model upload ocr crnn_kurapan crnn_kurapan.onnx
$ npx wrangler constellation model list ocr

Take note of the model ID.

Then, set up the project and initialize wrangler.

$ npm create cloudflare@2

I named the project ocr and used the ‘Hello World’ template.

In the wrangler.toml file, add the constellation binding and enable node_compat.

name = "ocr"
main = "src/index.js"
node_compat = true
workers_dev = true
compatibility_date = "2023-07-01"

constellation = [
    {binding = 'OCR', project_id = '{{ project ID from earlier }}'},
]

Finally, install the Constellation client and the dependencies used to stream images into the model.

npm install --save @cloudflare/constellation string-to-stream pngjs

Now, we can write the code to run the model.

import str from 'string-to-stream';
import { PNG } from 'pngjs/browser';

import { Tensor, run } from '@cloudflare/constellation';

const MODEL_ID = '{{ model ID from earlier }}';

function normalizeImage(data) {
  return new Promise((resolve, reject) => {
    const stream = str(data);

    const png = new PNG({ filterType: 4 });

    stream
      .pipe(png)
      .on('parsed', function () {
        // pngjs parses the PNG into interleaved RGBA bytes; drop the alpha
        // channel and scale each value into the [0, 1] range.
        const pixels = [];

        for (let i = 0; i < this.data.length; i += 4) {
          pixels.push(
            this.data[i] / 255.0,
            this.data[i + 1] / 255.0,
            this.data[i + 2] / 255.0
          );
        }

        resolve({
          input: pixels,
          // Assumes the exported model takes NHWC input
          // ([batch, height, width, channels]); check your model's input
          // signature and adjust if it differs.
          shape: [1, this.height, this.width, 3],
        });
      })
      .on('error', function (error) {
        reject({ err: error.toString() });
      });
  });
}

export default {
  async fetch(request, env) {
    if (request.method !== 'POST') {
      return new Response('Method not allowed', { status: 405 });
    }

    const formData = await request.formData();
    const image = formData.get('image');

    if (!image) {
      return new Response('No image found', { status: 400 });
    }

    const buffer = await image.arrayBuffer();

    try {
      const normalized = await normalizeImage(buffer);

      if (!normalized) {
        return new Response('Unable to normalize image', { status: 500 });
      }

      const input = new Tensor('float32', normalized.shape, normalized.input, 'input_1');

      const output = await run(
        env.OCR,
        MODEL_ID,
        input
      );

      return new Response(JSON.stringify(output), { status: 200 });
    } catch (err) {
      return new Response(String(err), { status: 500 });
    }
  },
};

While in beta, you may need to disable the constellation warnings with export NO_CONSTELLATION_WARNING=true.

Once you’re ready to deploy, just run wrangler deploy.

Testing the model #

You can now make an HTTP request to the Worker to test the model.

$ curl -X POST -F "image=@/path/to/image.png" https://ocr.example.workers.dev

This will return a JSON response with the model output.

{
  "output_1": [...]
}

The output from the model will be an array of probabilities for each character. To convert this to text, you can use something similar to this snippet.

const charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

// output_1 is the array of per-timestep class probabilities shown above.
const text = output.output_1.map((probabilities) => {
  // Pick the most likely character at each timestep.
  const max = Math.max(...probabilities);
  const index = probabilities.indexOf(max);

  return charset[index];
}).join('');
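
The snippet above is a naive greedy decode: it keeps every timestep, including repeats. Because crnn_kurapan is a CRNN trained with CTC, a slightly better greedy decode collapses consecutive duplicates and drops the blank class. Here’s a minimal sketch, assuming the blank token is the final class index (the usual TensorFlow/Keras convention; verify against your exported model):

const charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
// Assumption: the class after the last alphabet character is the CTC blank.
const BLANK_INDEX = charset.length;

function ctcGreedyDecode(stepProbabilities) {
  let previous = -1;
  let text = '';

  for (const probabilities of stepProbabilities) {
    // Most likely class at this timestep.
    let index = 0;
    for (let i = 1; i < probabilities.length; i += 1) {
      if (probabilities[i] > probabilities[index]) index = i;
    }

    // Collapse repeated classes and skip the blank token.
    if (index !== previous && index !== BLANK_INDEX) {
      text += charset[index];
    }

    previous = index;
  }

  return text;
}

const decoded = ctcGreedyDecode(output.output_1);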

Why do this? #

I wanted to see if it was possible to run a large-ish model on the edge, and how well it would perform.

Although this model is unlikely to be used in production, it’s a good example of what’s possible with Cloudflare Workers and Constellation. Larger OCR models typically perform better, but it’s possible to use smaller models if you’re willing to sacrifice accuracy, and for that you can always use Tesseract.js.

Performance #

The model takes around 1.5 seconds to run on the edge. This is much slower than running it locally, but still fast enough to be used in production. In a real-world scenario you could also take advantage of the managed OCR offerings from Google and Amazon Web Services, which are much more accurate. However, they are also much more expensive.

Accuracy #

The model is fairly accurate, but it’s not perfect. It struggles with some fonts and it’s not very good at recognizing text in images with a lot of noise. It also only handles small images, though this can be improved by using a sliding window, as sketched below.
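
A sliding window pass might look something like this: a hypothetical helper that yields overlapping crop positions, each of which would be cropped out of the decoded PNG, normalized, and run through the model separately. The window size and overlap here are illustrative placeholders, not values taken from the model.

// Yield the top-left corner of each window, with some horizontal and
// vertical overlap so characters on a boundary aren't cut in half.
// Right/bottom remainders are ignored in this sketch.
function* slidingWindows(imageWidth, imageHeight, windowWidth = 200, windowHeight = 31, overlap = 0.25) {
  const strideX = Math.max(1, Math.floor(windowWidth * (1 - overlap)));
  const strideY = Math.max(1, Math.floor(windowHeight * (1 - overlap)));

  for (let y = 0; y + windowHeight <= imageHeight; y += strideY) {
    for (let x = 0; x + windowWidth <= imageWidth; x += strideX) {
      yield { x, y, width: windowWidth, height: windowHeight };
    }
  }
}

// Example: enumerate the windows for a 1000x62 image, then crop, normalize,
// and run each one through the model, stitching the results back together.
const windows = [...slidingWindows(1000, 62)];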

Try it yourself #

Cloudflare is actively developing Constellation and it’s currently in beta. If you’d like to try it yourself, you can sign up for the beta here.

There’s also a handy Discord channel where you can ask questions and get help from the Cloudflare team.

Discussion on Hacker News

Author
Will Hackett
I’m Will, a software engineer and architect based in Melbourne, Australia. I’m currently working at Blinq.