TensorFlow Object Detection API, ML Engine, and Swift

Note: As of this writing there is no official TensorFlow library for Swift. I used Swift to build the client app that makes prediction requests against my model. That may change in the future.

The TensorFlow Object Detection API demo helps you identify the location of objects in an image, which can lead to some super cool applications. But because I spend more time taking photos of people than of things, I wanted to see if the same technique could be applied to identifying faces. Turns out it worked well! I used it to build the Taylor Swift detector in the picture above.

In this post I'll outline the steps for going from T-Swift images to an iOS app that makes predictions against the trained model:

Preprocess the images: resize and label them, split them into training and test sets, and convert them to Pascal VOC format

Convert the images to TFRecords to be fed into the Object Detection API

Train the model on Cloud ML Engine using MobileNet

Export the trained model and deploy it to ML Engine for serving

Build an iOS frontend that makes prediction requests against the trained model (in Swift, obviously)

And if you just want the code, you can find it on GitHub.

Looking at it now, it all seems so easy

Before I dive into the steps, it helps to explain some of the techniques and terms we'll be using: the TensorFlow Object Detection API is a framework built on top of TensorFlow for identifying objects in images. For example, you can train it with many photos of cats, and once it's trained you can pass it an image of a cat and it will return a list of rectangles where it thinks the cats are in the image. And while it has API in its name, you can think of it as a set of useful utilities for transfer learning.

Training a model to identify objects in images takes tons of data, though. The best part of the Object Detection API is that it supports five pre-trained models for transfer learning. An analogy helps explain how transfer learning works: when a child is learning their first language, they learn from lots of examples and get corrected when they identify something incorrectly. For example, the first time they learn to recognize a cat, their parents point to the cat and say the word "cat", and this repetition strengthens pathways in their brain. When they learn how to recognize a dog, the child doesn't need to start from scratch: they can use a recognition process similar to the one they used for the cat, just applied to a slightly different task. Transfer learning works the same way.

I don't have time to find and label thousands of TSwift images, but I can use the features extracted by models trained on millions of images by modifying the last few layers and applying them to my specific classification task (identifying TSwift).

Step 1: Preprocessing the images

Many thanks to Dat Tran for writing an excellent post on training a raccoon detector with the TensorFlow Object Detection API. I followed his blog post to label my images and convert them to the correct format for TensorFlow. His post has the details; I'll summarize my steps here.

My first step was downloading 200 images of Taylor Swift from Google Images. There's a Chrome extension that will download all the results of a Google Image search. Before labeling my images I split them into two datasets: train and test. I held out the test set to check the accuracy of my model on images it hadn't seen during training. Following Dat's recommendations, I wrote a resize script to make sure no image was larger than 600px on either dimension.
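My exact resize script isn't reproduced here, but a minimal sketch of one might look like this (the 600px limit comes from the recommendation above; the use of Pillow and the directory-walking logic are my assumptions, not the original script):

```python
# Sketch of a batch resize script: scale every image in a directory so its
# longest edge is at most 600px, preserving aspect ratio.
import os
import sys

MAX_DIM = 600

def scaled_size(width, height, max_dim=MAX_DIM):
    """Return (width, height) scaled so the longest edge is at most max_dim."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    return int(width * scale), int(height * scale)

if __name__ == "__main__":
    from PIL import Image  # third-party: pip install Pillow
    image_dir = sys.argv[1]
    for name in os.listdir(image_dir):
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            path = os.path.join(image_dir, name)
            img = Image.open(path)
            img.resize(scaled_size(*img.size), Image.LANCZOS).save(path)
```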

Because the Object Detection API tells us where our object is in the image, you can't just pass it images and labels as training data. You need to pass it a bounding box identifying where the object is in the image, along with the label associated with that bounding box (we'll only have one label in our dataset: swift).

I used LabelImg to generate bounding boxes for my images, as recommended in Dat's raccoon detector blog post. LabelImg is a Python program that lets you hand-label images and returns an XML file for each image with the bounding boxes and associated labels (I spent a whole morning labeling Swift images, which earned me some confused looks from people passing my desk). Here's how it works: I define the bounding box in the image and label it:

Now I have the images, bounding boxes, and labels, but I need to convert them to a format that TensorFlow accepts: a binary representation of this data called TFRecord. I wrote this script based on the guidelines provided in the object detection repo. To use my script, you need to clone the tensorflow/models repo locally and package the Object Detection API:

# From tensorflow/models/research/
python setup.py sdist
(cd slim && python setup.py sdist)

Now you're ready to run the TFRecord script. Run the following command from the tensorflow/models/research directory, passing it the following flags (run it twice: once for training data, once for test data):

python convert_labels_to_tfrecords.py \
--output_path=train.record \
--images_dir=path/to/your/training/images/ \
--labels_dir=path/to/training/labels/xml/

Step 2: Training the TSwift detector on Cloud ML Engine

I could train this model on my laptop, but it would take a lot of time and resources, and if I had to put my computer away the training would stop abruptly. That's what the cloud is for! We can take advantage of running on many cores in the cloud to get the whole job done in a few hours. And by using Cloud ML Engine I can also speed up training with GPUs (Graphical Processing Units), special silicon chips that excel at the kind of computation our model performs. Using this processing power, I can kick off a training job and then jam out to some T-Swift for a few hours while my model trains.

Setting up Cloud ML Engine

With all my data in TFRecord format, I'm ready to upload it to the cloud and kick off training. First I created a project in the Google Cloud Console and enabled the Cloud ML Engine.

Then I created a Cloud Storage bucket to package all the resources for my model. Be sure to specify a region for the bucket (don't select multi-regional):
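If you prefer the command line to the console, the same bucket can be created with gsutil; the bucket name and region below are placeholders, not values from the original post:

```shell
# Create a regional (not multi-regional) Cloud Storage bucket
gsutil mb -l us-central1 gs://${YOUR_GCS_BUCKET}
```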

I created a data/ subdirectory inside this bucket to hold the training and test TFRecord files.

The Object Detection API also requires a pbtxt file that maps labels to integer IDs. Since I only have one label, it's very short:
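For reference, a single-entry label map looks like the following (the display name string is an assumption; use whatever label you applied in LabelImg):

```
item {
  id: 1
  name: 'Taylor Swift'
}
```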

Adding MobileNet checkpoints for transfer learning

I'm not training this model from scratch, so when I run the training I need to point it at the pre-trained model I'm building on. I chose to use a MobileNet model; MobileNets are a series of small models optimized for mobile. While I won't be serving my model directly on a mobile device, MobileNet trains quickly and allows for fast prediction requests. I downloaded this MobileNet checkpoint to use for my training. A checkpoint is a binary file that contains the state of a TensorFlow model at a particular point in the training process. After downloading and unzipping the checkpoint, you'll see that it contains three files:

I need all of those files for training, so I put them in the same data/ directory in my Cloud Storage bucket.

There's one more file to add before running the training. The object detection training script needs a way to find our model checkpoint, label map, and training data. We do this with a config file. The TF Object Detection repo has sample config files for each of the five pre-trained model types. I used the one for MobileNet here and updated the PATH_TO_BE_CONFIGURED placeholders with the corresponding paths in my Cloud Storage bucket. In addition to pointing my model at the data in Cloud Storage, this file also configures several hyperparameters for my model, such as the batch size, activation functions, and number of steps.

Here are all the files that should be in your data/ Cloud Storage bucket before you kick off training:

I also created train/ and eval/ subdirectories in my bucket; this is where TensorFlow will write my model checkpoint files while running the training and evaluation jobs.

Now I'm ready to run the training, which I can do via the gcloud command-line tool. Note that you need to clone tensorflow/models/ locally and run this training script from that directory.
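The exact command isn't reproduced here; a sketch of what such a training job submission looked like for the Object Detection API of that era follows. The job name, region, bucket paths, and config filename are all placeholders or assumptions, not values from the original post:

```shell
# From tensorflow/models/research/ - submit a training job to Cloud ML Engine
gcloud ml-engine jobs submit training ${JOB_NAME} \
    --job-dir=gs://${YOUR_GCS_BUCKET}/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train \
    --region us-central1 \
    -- \
    --train_dir=gs://${YOUR_GCS_BUCKET}/train \
    --pipeline_config_path=gs://${YOUR_GCS_BUCKET}/data/ssd_mobilenet_v1.config
```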

While training was running, I also kicked off an evaluation job. It evaluates the accuracy of my model using data it hasn't seen before:

You can verify that your jobs are running correctly and inspect the logs for a specific job by navigating to the Jobs section of ML Engine in your Cloud Console:

Step 3: Deploying the model for serving predictions

To deploy the model to ML Engine I need to convert my model checkpoint to a ProtoBuf. In my train/ bucket, I can see checkpoint files saved at several points throughout my training process:

The first line of the checkpoint file tells me the latest checkpoint path, and I download the files for that checkpoint locally. There should be a .index, .meta, and .data file for each checkpoint. With these saved in a local directory, I can use the Object Detection API's export_inference_graph script to convert them to a ProtoBuf. To run the script below, you need to point it at the local path to your MobileNet config file, the checkpoint number of the model checkpoint you downloaded from the training job, and the name of the directory you'd like the graph to be exported to:
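A sketch of that export invocation follows; the config path, checkpoint number, and output directory name are placeholders you'd substitute with your own values:

```shell
# From tensorflow/models/research/ - export the checkpoint as a ProtoBuf.
# encoded_image_string_tensor lets the served model accept base64-encoded images.
python object_detection/export_inference_graph.py \
    --input_type encoded_image_string_tensor \
    --pipeline_config_path path/to/ssd_mobilenet_v1.config \
    --trained_checkpoint_prefix model.ckpt-CHECKPOINT_NUMBER \
    --output_directory exported_graph
```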

After running this script, you should see a saved_model/ directory inside your specified output directory. Upload the saved_model.pb file (don't worry about the other generated files) to the data/ directory of your Cloud Storage bucket.

Now you're ready to deploy the model to ML Engine for serving. First, create your model with gcloud:

gcloud ml-engine models create tswift_detector

Then create the first version of your model by pointing it at the saved model ProtoBuf you just uploaded to Cloud Storage:

gcloud ml-engine versions create v1 --model=tswift_detector --origin=gs://${YOUR_GCS_BUCKET}/data --runtime-version=1.4

Once the model is deployed, I'm ready to use ML Engine's online prediction API to generate predictions on new images.

Step 4: Building a prediction client with Firebase Functions and Swift

I wrote an iOS client in Swift to make prediction requests against my model (because why would you write a TSwift detector in any other language?). The Swift client uploads an image to Cloud Storage, which triggers a Firebase Function that makes the prediction request in Node.js and saves the resulting prediction image and data back to Cloud Storage and Firestore.

First, in my Swift client, I added a button that gives users access to their device's photo library. Once a user selects a photo, it triggers an action that uploads the image to Cloud Storage:

Next, I wrote the Firebase Function that is triggered by an upload to my project's Cloud Storage bucket. It takes the image, base64 encodes it, and sends it to ML Engine for prediction. You can find the full function code here. Below I've included excerpts from the function where I make the request to the ML Engine prediction API (thanks to Bret McGowen for his expert cloud functions help!):

In the ML Engine response, we get back:

detection_boxes, which we can use to draw a bounding box around Taylor if she was found in the image

detection_scores, which gives a confidence value for each detection box. I only include detections whose score clears a confidence threshold.

detection_classes, which tells us the label ID associated with the detection. In this case it will always be 1, since there's only one label
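The response handling above can be sketched as follows. This isn't the post's actual (Node.js) function code; it's a minimal Python illustration, and the 0.7 threshold is an assumption rather than the post's actual cutoff:

```python
# Sketch: keep only the detections from an ML Engine response whose
# confidence score clears a threshold.
CONFIDENCE_THRESHOLD = 0.7  # assumed cutoff

def filter_detections(prediction, threshold=CONFIDENCE_THRESHOLD):
    """Return the boxes, scores, and classes of sufficiently confident detections."""
    results = []
    for box, score, cls in zip(prediction["detection_boxes"],
                               prediction["detection_scores"],
                               prediction["detection_classes"]):
        if score >= threshold:
            results.append({"box": box, "score": score, "class": int(cls)})
    return results
```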

In the function, I use detection_boxes to draw a box on the image if Taylor was found, along with the confidence score. Then I save the newly boxed image to Cloud Storage and write the image's file path to Cloud Firestore so I can read the path and download the new image (with the rectangle) in my iOS app:

Finally, in my iOS app, I can listen for updates to the Firestore path for the image. If a detection is found, I download the image and display it in my app along with the detection confidence score. This function replaces the comment in the first Swift snippet above.

Phew! We have a working Taylor Swift detector. Note that the focus here wasn't on accuracy (I only had 140 images in my training set), so the model incorrectly identified some images as Swift that you probably wouldn't mistake for her. But if I get time to label more images, I'll update the model and publish the app to the App Store :)

What's next

This post covered a lot of information. Want to build your own? Here is a breakdown of the steps with links to the sources:

Preprocessing data: I followed Dat's blog post, using LabelImg to generate XML files with labels and bounding box data. Then I wrote this script to convert the labeled images to TFRecords

Training and evaluating an object detection model: Using the approach from this blog post, I uploaded the training and test data to Cloud Storage and used ML Engine to run training and evaluation.

Deploying the model to ML Engine: I used the gcloud CLI to deploy my model to ML Engine.

Making prediction requests: I used the Firebase SDK for Cloud Functions to make an online prediction request to my ML Engine model. The request was triggered by an upload to Firebase Storage from my Swift app. In my function, I wrote the prediction metadata to Firestore.


TensorFlow Object Detection on GitHub: https://goo.gl/QYThDb

Building a pet detector with the Object Detection API: https://goo.gl/cxIquA

Building a raccoon detector with the Object Detection API: https://goo.gl/A8Sykp

Pascal VOC format: https://goo.gl/m2yT6N

Firebase iOS SDK: https://goo.gl/hnbrva

Cloud Functions for Firebase: https://goo.gl/1qBuce