Getting started with AutoML Vision alpha

One of the top questions I get asked in conversation is about Google Cloud AutoML. So let's use AutoML Vision alpha to build a machine learning model that recognizes different chairs, as well as a few other items sprinkled in for good measure. We'll do it all, from raw data all the way to a served model, and everything in between!

A lot of people are waiting for access to AutoML Vision alpha, so I'd like to walk through the workflow and show you what it's like to use, in case you haven't gotten off the waitlist yet. In this first part, we'll get our data into the format AutoML Vision expects. Then, in part two, we'll use the trained model to figure out what style of chair is in a picture. But before we dive in... what is AutoML?

The Cloud Vision API can identify this as a chair, but the label is generic

One thing that makes AutoML so attractive is custom models. Existing models and services like the Cloud Vision API have no problem identifying that a given picture contains a chair, but what if you design and build chairs, and need a way to tell apart the different chairs in your product line? Wouldn't it be nice to have a "custom" Vision API that recognizes your particular chairs? That is what AutoML Vision aims to do.

This is a yellow chair

Here are some more chairs. AutoML Vision takes as its input a set of labeled photos. How many photos, you ask? Ideally, a few hundred per item. So get out there and start snapping pictures. Or, if you're tired of clicking that shutter button, you can use the alternative approach I took.

Taking photos - with video!

To make it easier to capture data for AutoML Vision, I collected my training data by taking videos of the chairs I was interested in and using FFmpeg to extract the frames.

I went out on Google Sunnyvale campus and took a variety of videos of different outdoor chairs. I also took some videos of the tables that were placed around them, as well as the bikes, just to make things a little more interesting.

Let's look at an example of that scene.

There are chairs of different sizes, styles, and colors, and no video is longer than 30 seconds. We also have a short clip of the table and another of the bikes. This is the data we'll be working with.

The end state we want is a CSV file with one row per image and two columns: the first holding the location of the image in Google Cloud Storage, and the second holding its label, such as 'red chair', 'blue chair', or 'table'.
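Concretely, the CSV ends up looking something like this (the bucket and file names here are made-up placeholders, just to show the shape):

```
gs://my-demo-bucket/dataset/blue_chair/blue_chair001.jpg,blue chair
gs://my-demo-bucket/dataset/table/table001.jpg,table
```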

To make things easier to organize, I put each video in its own folder, so we can run FFmpeg on each video file in turn.

Once the frames are extracted, you end up with one folder per label, each filled with images of that label. This is an easy way to keep your images organized, and much simpler than maintaining one huge folder of all the images.

ffmpeg -i chair.mp4 chair%03d.jpg

(%03d in the filename gives us zero-padded 3-digit numbering, such as chair300.jpg, chair073.jpg, etc. If you have more than 999 images, use %04d or another width as appropriate.)
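If you have a lot of videos, the per-folder extraction is easy to script. Here's a minimal sketch, with the `all_data` folder layout and file names as my own assumptions, that shells out to FFmpeg for every video it finds and names the frames after the label folder:

```python
import subprocess
from pathlib import Path

def build_ffmpeg_cmd(video, out_dir):
    """Build the FFmpeg command that dumps frames named with a
    zero-padded 3-digit counter, e.g. blue_chair001.jpg."""
    label = Path(out_dir).name
    return ["ffmpeg", "-i", str(video), str(Path(out_dir) / f"{label}%03d.jpg")]

def extract_all(root="all_data"):
    # Assumed layout: one folder per label, each holding one or more .mp4 clips.
    for video in Path(root).glob("*/*.mp4"):
        subprocess.run(build_ffmpeg_cmd(video, video.parent), check=True)

if __name__ == "__main__":
    extract_all()
```

The frames land next to their source video, which keeps the one-folder-per-label structure intact for the upload step.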

Next, we can upload the images to Google Cloud Storage using gsutil, preserving the folder structure of one folder per label.

gsutil -m cp -r all_data gs://cloudml-demo-vcm/dataset

(This command recursively copies/uploads the entire all_data folder structure, and the -m flag parallelizes the transfer over multiple streams.)

Structuring your data

AutoML needs to know where to find all your photos, and what is in each image. So we need to create a CSV file that lists, for every image I wanted in my dataset, its path and its label. There are many ways to achieve this, but I chose to spin up a local Jupyter notebook and build a pandas DataFrame to export as a CSV file. We can see this in the video below.
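As a sketch of what that notebook does (the folder and bucket names below are my own placeholders, not from the post), you can walk the local one-folder-per-label tree, map each file to its Cloud Storage path, and export with pandas:

```python
from pathlib import Path
import pandas as pd

BUCKET = "gs://my-demo-bucket/dataset"  # hypothetical bucket path

def rows_for(image_paths, bucket=BUCKET):
    """Map local 'label_dir/file.jpg' paths to (gcs_path, label) rows.
    The label is the folder name, with underscores turned into spaces."""
    rows = []
    for p in image_paths:
        p = Path(p)
        label = p.parent.name.replace("_", " ")  # blue_chair -> "blue chair"
        rows.append((f"{bucket}/{p.parent.name}/{p.name}", label))
    return rows

if __name__ == "__main__":
    paths = sorted(Path("all_data").glob("*/*.jpg"))
    df = pd.DataFrame(rows_for(paths), columns=["image_path", "label"])
    # AutoML Vision expects the CSV without a header row.
    df.to_csv("all_data.csv", header=False, index=False)
```

Writing the CSV without a header or index keeps it in the two-column shape described above.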

Looking for the code/notebook shown above? It's here!

Okay, so now we have a CSV file describing the label for every image in our dataset. We're ready to train our model!

Here's what it looks like once you've loaded the images into AutoML Vision. The CSV file tells the platform the correct label for each image. If you haven't labeled all your images yet, that's fine: there's a tool built into the UI that guides you through the labeling process and shows which images are still unlabeled.

Part 2: Training and deploying AutoML vision

Model training

Training a model is as simple as clicking "Train"! This is what all our setup was for. AutoML Vision takes your data and trains a sophisticated image model on it, automatically determining suitable hyperparameters such as the network structure.

But before you go off and train fancier models, I recommend starting with the simple model first and seeing how it performs. This gives you a baseline against which you can compare the relative performance of the other models.

Once training starts, go take a hike or grab a coffee. Given how much data we've provided, it will take some time.

Evaluate your model

Once training is complete, you'll get all sorts of statistics about your model, which you can use to see how it performed, spot images that were mislabeled or misclassified, correct whatever needs correcting, and then retrain.

In our case, we deliberately gathered very specific, clean data, so the model posts some very high metrics. What really counts, though, is how it performs on new, unseen data.

Prediction time!

To challenge the model, I took some new pictures and fed them in to see what it would make of them.

Let's try this picture, which includes the bike along with the yellow and blue chairs.

Sure enough, the picture is recognized primarily as a bike, but the yellow and blue chairs register as well; they're in the background and take up less of this photo.

Let's try again.

This picture is mostly of a yellow chair, but there are also some blue chairs in it. The model decided it looks mostly like a yellow chair, with a bit of blue chair mixed in.

What about this picture of a mostly blue chair?

Hmm, the model is less decisive on this one. Not everything is going to be perfect, but so far the top prediction has proven to be good enough.

Finally, what about this picture, very similar to the last one, but with the front chair being yellow? What will the model make of it?

Wow, the yellow chair at the front wins out big time! Finding gaps like these in your model and dataset can be illuminating, guiding your experiments and showing where you might want to collect more rigorous, representative data.


It's worth pointing out that at this stage, the model is available to call through its REST API. The service takes advantage of the online prediction capabilities of Cloud ML Engine to provide an optimized, auto-scaling prediction service trained on our dataset.

You can call your service via the REST API from any server or Internet-connected device
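As an illustration, a call to that REST API might look like the sketch below. The project ID, model ID, and access token are placeholders you'd supply yourself, and the request shape follows the alpha-era AutoML `v1beta1` predict endpoint, so check the current API reference before relying on it:

```python
import base64
import json
from urllib import request

def build_predict_request(image_bytes):
    """Request body for the AutoML Vision :predict endpoint:
    the image is sent base64-encoded inside a 'payload' object."""
    return {
        "payload": {
            "image": {"imageBytes": base64.b64encode(image_bytes).decode("utf-8")}
        }
    }

def predict(project, model_id, image_bytes, access_token):
    # access_token can come from e.g. `gcloud auth print-access-token`.
    url = (f"https://automl.googleapis.com/v1beta1/projects/{project}"
           f"/locations/us-central1/models/{model_id}:predict")
    req = request.Request(
        url,
        data=json.dumps(build_predict_request(image_bytes)).encode("utf-8"),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)  # predicted labels with confidence scores
```

Any device that can make an authenticated HTTPS request can use the model this way; no serving infrastructure of your own is needed.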

The neat part about all of this is that once your data pipeline is sorted out, the training process and the task of operating the machine learning model are completely hands-free! That lets you focus on getting your data into better shape, rather than on the challenges of building a suitable computer vision machine learning model.

Note the "AutoDeployed" annotation under the model name

Now, if you'll excuse me, I'm off to take a few more videos of fun colored chairs so I can expand the dataset for my AutoML Vision model!

Happy p̵i̵c̵t̵u̵r̵e̵ video taking and AutoML Vision model training!

Thanks for reading this episode of Cloud AI Adventures. If you're enjoying the series, please clap for the article. If you want more machine learning action, be sure to follow me on Medium or subscribe to the YouTube channel to catch future episodes as they come out. More episodes coming at you soon!

AutoML data preparation notebook →