Quick Draw: the world’s largest doodle dataset

A team at Google set out to make the game of Pictionary more fun and ended up with the world's largest doodling dataset, and a powerful machine learning model to boot. How did they do it?

What is Quick, Draw!?

"Quick, Draw!" was initially featured at Google I/O in 2016 as a game where one player is prompted to draw a picture of an object and the other player must guess what it is. Just like Pictionary.

In 2017, the Magenta team at Google Research took that idea a step further by using this labeled dataset to train the Sketch-RNN model, which tries to predict what the player is drawing instead of relying on a second player to guess. The game is available online and has now collected over 1 billion hand-drawn doodles!

Let's take a look at some of the drawings that have come out of Quick, Draw!. Here we see broccoli as drawn by many players.

How would you draw broccoli?

Notice how differently the players drew oceans.

It can be fun to browse the dataset. If you find something that looks out of place, you can fix it right there on the page. This makes the data better for everyone!

The Quick, Draw! dataset

The team has open-sourced this data in a variety of formats. You can find out more on their GitHub page.

There are four formats. First up are the raw files, stored in (.ndjson) format. These files encode the full set of information for each doodle, including timing information for every stroke of every drawing.

There is also a simplified version, stored in the same format (.ndjson), with some preprocessing applied to normalize the data. The simplified version is also available in a binary format for more efficient storage and transfer. There are examples of how to read the files using both Python and NodeJS.
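
As a rough sketch of what reading the .ndjson files could look like in Python: each line of the file is one JSON object describing a single doodle. The field names used here ("word", "drawing") follow the dataset's GitHub documentation; the helper names iter_doodles and count_points are made up for this illustration and are not part of any official reader.

```python
import json
from typing import Iterator, TextIO


def iter_doodles(lines: TextIO) -> Iterator[dict]:
    """Yield one doodle record per non-empty line of an .ndjson stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_points(doodle: dict) -> int:
    """Return the total number of points across all strokes of a doodle."""
    # Each stroke is a list of parallel arrays: [xs, ys] in the simplified
    # files and [xs, ys, ts] in the raw files, so stroke[0] is the x array.
    return sum(len(stroke[0]) for stroke in doodle["drawing"])
```

To try it on a downloaded file, something like `with open("chair.ndjson") as f: first = next(iter_doodles(f))` should work, where the file name is only a placeholder.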

The fourth format takes the simplified data and renders it into a 28x28 grayscale bitmap in the NumPy .npy format, which can be loaded using np.load().

Why 28x28? Well, it makes a perfect drop-in replacement for any existing code you may have for processing MNIST data. So if you're looking for something fancier than 10 handwritten digits, you can try processing more than 300 different categories of doodles.
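
Here is a minimal sketch of loading one of the bitmap files and getting it into an MNIST-like shape. It assumes each row of the array is one flattened image of 784 values, which is how these files are usually described; the function names load_bitmaps and to_unit_range are invented for this example.

```python
import numpy as np


def load_bitmaps(path: str) -> np.ndarray:
    """Load a bitmap .npy file and return images shaped (N, 28, 28)."""
    flat = np.load(path)
    # Assumption: each row holds one flattened 28x28 image (784 values).
    if flat.ndim != 2 or flat.shape[1] != 28 * 28:
        raise ValueError(f"unexpected array shape {flat.shape}")
    return flat.reshape(-1, 28, 28)


def to_unit_range(images: np.ndarray) -> np.ndarray:
    """Scale 0-255 pixel values to float32 values between 0 and 1."""
    return images.astype(np.float32) / 255.0
```

Scaling to the 0 to 1 range is a common preprocessing choice in MNIST-style pipelines, not something the dataset requires.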

Using RNNs with Quick, Draw!

If you want to be fancy and use the full dataset (fair warning, it's pretty large!), you'll probably want to use a recurrent neural network (RNN), as it will learn from the sequence of strokes. In a wonderful turn of events, there is a tutorial specifically on using RNNs with the Quick, Draw! dataset, so check it out if you are interested in trying that. Maybe run it on just a subset of the data the first time, given the training time :)
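
To give a feel for what "learning from the sequence of strokes" means in practice, here is a small sketch that flattens a doodle into a single sequence of pen movements, the kind of input a sequence model can consume. It is only an illustration of the idea, not the tutorial's actual preprocessing (which, for example, also normalizes the drawing's size); the function name strokes_to_deltas and the (dx, dy, end_of_stroke) layout are assumptions made for this example.

```python
from typing import List, Tuple


def strokes_to_deltas(drawing: List[List[List[int]]]) -> List[Tuple[int, int, int]]:
    """Flatten strokes into (dx, dy, end_of_stroke) steps between points."""
    # First collect absolute points, flagging the last point of each stroke.
    points = []
    for stroke in drawing:
        xs, ys = stroke[0], stroke[1]
        for i, (x, y) in enumerate(zip(xs, ys)):
            end = 1 if i == len(xs) - 1 else 0
            points.append((x, y, end))

    # Then turn absolute positions into offsets from the previous point.
    # The very first point has no predecessor, so it is dropped.
    deltas = []
    for prev, cur in zip(points, points[1:]):
        deltas.append((cur[0] - prev[0], cur[1] - prev[1], cur[2]))
    return deltas
```

Working with offsets rather than absolute coordinates means the sequence looks the same wherever on the canvas the doodle was drawn.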

Data exploration and visualization with Quick, Draw!

If you want to explore the dataset a bit more, you can visualize the Quick, Draw! dataset using Facets. The Facets team has even taken the liberty of hosting it online and giving us some presets to play with! You can access the page here. We can load some random chairs and see how players around the world drew them.

All the chairs

We can also see which drawings were recognized as chairs and which ones didn't quite make the cut. There are a number of preset views that are also worth playing around with, and they serve as interesting starting points for further exploration and analysis.
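
If you would rather poke at the same question in code, a rough sketch might look like the following. It assumes each doodle record is a dict with the "countrycode" and "recognized" fields described in the dataset's documentation; the function name recognition_rate_by_country is made up for this example.

```python
from collections import Counter
from typing import Dict, Iterable


def recognition_rate_by_country(doodles: Iterable[dict]) -> Dict[str, float]:
    """Return the fraction of recognized doodles for each country code."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for doodle in doodles:
        # Records without a country code are grouped under "unknown".
        country = doodle.get("countrycode", "unknown")
        totals[country] += 1
        if doodle.get("recognized"):
            hits[country] += 1
    return {country: hits[country] / totals[country] for country in totals}
```

On a small sample the rates will be noisy, so treat them as a starting point for exploration rather than a conclusion.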

Quick, Draw! Dataset on GitHub: https://goo.gl/dl3n8S 

Quick, Draw! in Facets: https://goo.gl/fLpEDR

Recurrent Neural Network Tutorial: https://goo.gl/hBqhwQ