TensorFlow Object Detection API, ML Engine, and Swift



Note: As of this writing there is no official TensorFlow library for Swift, so I used Swift to build the client app that makes prediction requests against my model. That may change in the future, but Taylor has the final say.


The TensorFlow Object Detection API demo lets you identify the location of objects in an image, which can lead to some super cool applications. But because I spend more time taking photos of people than of things, I wanted to see if the same technique could be applied to identifying faces. Turns out it worked pretty well! I used it to build the Taylor Swift detector pictured above.





In this post I will outline the steps I took to get from a collection of T-Swift images to an iOS app that makes predictions against a trained model:

Preprocess images: resize, label, split them into training and test sets, and convert to Pascal VOC format

Convert images to TFRecords to feed into the Object Detection API

Train the model on Cloud ML Engine using MobileNet

Export the trained model and deploy it to ML Engine for serving

Build an iOS frontend that makes prediction requests against the trained model (in Swift, obviously)

And if you would rather skip to the code, you can find it on GitHub.

Looking at it now, it all seems so easy

Before I dive into the steps, it helps to explain some of the technology and terms we will be using: The TensorFlow Object Detection API is a framework built on top of TensorFlow for identifying objects in images. For example, you can train it with lots of photos of cats, and once it is trained you can pass in an image of a cat and it will return a list of rectangles where it thinks there is a cat in the image. And while it has API in its name, you can think of it more as a set of handy utilities for transfer learning.

But training a model to identify objects in an image takes time and tons of data. The best part of the Object Detection API is that it supports five pre-trained models for transfer learning. Here is an analogy to help understand how transfer learning works: when a child is learning their first language, they are exposed to many examples and corrected if they misidentify something. For example, the first time they learn to recognize a cat, their parents point to the cat and say the word "cat", and this repetition strengthens pathways in their brain. When they learn how to recognize a dog, the child does not have to start from scratch. They can use the same recognition process as they did for the cat, but apply it to a slightly different task. Transfer learning works the same way.




I don't have time to find and label thousands of TSwift images, but I can use the features extracted by models that were trained on millions of images by modifying the last few layers and applying them to my specific classification task (detecting TSwift).

Step 1: Preprocessing images


Many thanks to Dat Tran for writing this excellent post on training a raccoon detector with TF Object Detection. I followed that blog post to label the images and convert them to the correct format for TensorFlow. The post has the details; I will summarize my steps here.

My first step was downloading 200 images of Taylor Swift from Google Images. Turns out there is a Chrome extension for that - it downloads all the results from a Google Image search. Before labeling my images I split them into two datasets: train and test. I reserved the test set to check the accuracy of my model on images it had not seen during training. Per Dat's recommendation, I wrote a resize script to make sure no image was wider than 600px.
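The split and the resize rule can be sketched in a few lines of Python. This is only an illustration, not the script used for the post: the function names, the 20% test fraction, the fixed seed, and the default width limit are all assumptions, and the actual pixel resizing (for example with an imaging library) is left out so the sketch stays dependency-free.

```python
import random


def split_train_test(filenames, test_fraction=0.2, seed=42):
    """Shuffle filenames reproducibly and split them into (train, test)."""
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def scaled_size(width, height, max_width=600):
    """Return (width, height) scaled down so width <= max_width, keeping aspect ratio."""
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, round(height * scale)


# Hypothetical file names standing in for the downloaded images.
images = [f"tswift_{i:03d}.jpg" for i in range(200)]
train, test = split_train_test(images)
print(len(train), len(test))
print(scaled_size(1200, 800))
```

Fixing the seed keeps the split stable between runs, so an image never drifts from the test set into the training set when the script is rerun.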

Because the Object Detection API tells us where our object is in the image, you can't just pass it images and labels as training data. You need to pass it a bounding box identifying where the object is in your image and the label associated with that bounding box (we will only have one label in our dataset, swift).
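To make "a bounding box plus a label" concrete, here is a small sketch that reads one Pascal VOC style annotation with Python's standard library. The element names follow the Pascal VOC layout; the sample file name, image size, and coordinates are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout (values are illustrative).
SAMPLE_XML = """
<annotation>
  <filename>tswift_001.jpg</filename>
  <size><width>600</width><height>400</height><depth>3</depth></size>
  <object>
    <name>swift</name>
    <bndbox><xmin>150</xmin><ymin>40</ymin><xmax>330</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Extract the image size and every labeled bounding box from one annotation."""
    root = ET.fromstring(xml_text.strip())
    size = root.find("size")
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        boxes.append({
            "label": obj.findtext("name"),
            "xmin": int(bndbox.findtext("xmin")),
            "ymin": int(bndbox.findtext("ymin")),
            "xmax": int(bndbox.findtext("xmax")),
            "ymax": int(bndbox.findtext("ymax")),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "boxes": boxes,
    }


print(parse_voc_annotation(SAMPLE_XML))
```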

I used LabelImg to generate bounding boxes for my images, as recommended in Dat's raccoon detector blog post. LabelImg is a Python program that lets you hand-label images and returns an XML file for each image with the bounding box and associated label (I spent a whole morning labeling swift images while people walked by my desk with concerned looks). Here's how it works - I define the bounding box in the image and give it a label:




Now I have an image, a bounding box, and a label, but I need to convert them to a format that TensorFlow accepts - a binary representation of this data called a TFRecord. I wrote this script to do the conversion, based on the guide provided in the object detection repo. To use my script, you need to clone the tensorflow/models repo locally and package the Object Detection API:

# From tensorflow/models/research/
python setup.py sdist
(cd slim && python setup.py sdist)

You are now ready to run the TFRecord script. Run the following command from the tensorflow/models/research directory, passing it the following flags (run it twice: once for training data, once for test data):

python convert_labels_to_tfrecords.py \
--output_path=train.record \
--images_dir=path/to/your/training/images/ \
--labels_dir=path/to/training/label/xml/
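To make the conversion concrete, here is a rough sketch of the per-image fields such a script typically assembles before serializing. The feature keys follow the Object Detection API's naming, with box coordinates normalized to the 0-1 range; the function name and sample values are hypothetical, and a real script would also embed the encoded image bytes and wrap everything in a tf.train.Example.

```python
def to_feature_dict(filename, width, height, boxes, label_ids):
    """Build the per-image fields, with box coordinates normalized to [0, 1]."""
    return {
        "image/filename": filename,
        "image/width": width,
        "image/height": height,
        "image/object/bbox/xmin": [b["xmin"] / width for b in boxes],
        "image/object/bbox/xmax": [b["xmax"] / width for b in boxes],
        "image/object/bbox/ymin": [b["ymin"] / height for b in boxes],
        "image/object/bbox/ymax": [b["ymax"] / height for b in boxes],
        "image/object/class/text": [b["label"] for b in boxes],
        "image/object/class/label": [label_ids[b["label"]] for b in boxes],
    }


# Illustrative values only: one 600x400 image with a single labeled box.
example_boxes = [{"label": "swift", "xmin": 150, "ymin": 40, "xmax": 330, "ymax": 260}]
features = to_feature_dict("tswift_001.jpg", 600, 400, example_boxes, {"swift": 1})
print(features["image/object/bbox/xmin"], features["image/object/class/label"])
```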

Step 2: Training a TSwift detector on Cloud Machine Learning Engine


I could train this model on my laptop, but it would take time and a lot of resources, and if I had to put my computer away the training would stop abruptly. That's what the cloud is for! We can take advantage of running on multiple cores in the cloud to get the whole job done in a few hours. And when I use Cloud ML Engine I can run training even faster using GPUs (Graphics Processing Units), which are specialized silicon chips that excel at the type of computation our model performs. Using this processing power, I can kick off a training job and then go jam out to TSwift for a few hours while my model trains.

Setting up Cloud ML Engine


With all my data in TFRecord format, I am ready to upload it to the cloud and start training. First I created a project in the Google Cloud Console and enabled Cloud ML Engine.


Then I create a Cloud Storage bucket to hold all the resources for my model. Be sure to specify a region for the bucket (do not select multi-regional):

I create a data/ subdirectory inside this bucket to hold the training and test TFRecord files.

The Object Detection API also requires a pbtxt file that maps labels to integer IDs. Because I only have one label, it will be very short:
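As a minimal sketch, a label map in this format can be generated with a few lines of Python. The helper name is hypothetical; the `item { id, name }` layout is the label map format the Object Detection API reads, and IDs start at 1 because 0 is reserved for the background class.

```python
def label_map_pbtxt(labels):
    """Render a label map with one item per label, numbering IDs from 1."""
    items = []
    for idx, name in enumerate(labels, start=1):
        items.append(f"item {{\n  id: {idx}\n  name: '{name}'\n}}\n")
    return "\n".join(items)


print(label_map_pbtxt(["swift"]))
```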



Adding MobileNet checkpoints for transfer learning


I am not training this model from scratch, so when I run training I need to point to the pre-trained model I am building on. I chose to use a MobileNet model - MobileNets are a series of small models optimized for mobile. While I won't be serving my model directly on a mobile device, MobileNet will train quickly and allow for faster prediction requests. I downloaded this MobileNet checkpoint for use in my training. A checkpoint is a binary file that contains the state of a TensorFlow model at a specific point in the training process. After downloading and unzipping the checkpoint, you will see that it contains three files:


I need all of them to train the model, so I put them in the same data/ directory in my Cloud Storage bucket.


There is one more file to add before running the training job. The object detection script needs a way to find our model checkpoint, label map, and training data. We do this with a config file. The TF Object Detection repo has sample config files for each of the five pre-trained model types. I used the one for MobileNet here and updated all the PATH_TO_BE_CONFIGURED placeholders with the corresponding paths in my Cloud Storage bucket. In addition to connecting my model to the data in Cloud Storage, this file configures several hyperparameters for my model, such as convolution size, activation function, and number of steps.
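Filling in the placeholders is plain text substitution, which can be sketched as follows. The config fragment, file names, and bucket path below are hypothetical stand-ins; a real sample config has more fields, and its default file names usually need editing too.

```python
def fill_config_paths(config_text, data_dir):
    """Replace every PATH_TO_BE_CONFIGURED placeholder with the data directory."""
    return config_text.replace("PATH_TO_BE_CONFIGURED", data_dir.rstrip("/"))


# Hypothetical fragment of a pipeline config; the real file is much longer.
SAMPLE_CONFIG = '''fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
input_path: "PATH_TO_BE_CONFIGURED/train.record"
'''

# "gs://my-bucket/data/" is a placeholder bucket path, not a real one.
print(fill_config_paths(SAMPLE_CONFIG, "gs://my-bucket/data/"))
```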


Here are all the files that should be in the data/ directory of my Cloud Storage bucket before I start training:


I also create train/ and eval/ subdirectories in my bucket - this is where TensorFlow writes my model checkpoint files while running the training and evaluation jobs.


Now I am ready to run training, which I can do through the gcloud command-line tool. Note that you must clone tensorflow/models locally and run the training script from the tensorflow/models/research directory.
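As a hedged sketch, the job submission can be assembled as an argument list in Python. The flags mirror the Object Detection API's Cloud ML Engine instructions from around the time of this post; the job name, bucket, region, package file names, and config file name are all assumptions, and newer gcloud releases may have renamed or retired this command group. The sketch only prints the command and does not run it.

```python
import shlex


def build_training_command(job_name, bucket, config_name="pipeline.config",
                           region="us-central1", runtime_version="1.4"):
    """Assemble the gcloud arguments for submitting a training job (not executed)."""
    return [
        "gcloud", "ml-engine", "jobs", "submit", "training", job_name,
        f"--job-dir={bucket}/train",
        # Package names assumed to be what `python setup.py sdist` produced.
        "--packages", "dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz",
        "--module-name", "object_detection.train",
        "--region", region,
        "--config", "object_detection/samples/cloud/cloud.yml",
        f"--runtime-version={runtime_version}",
        "--",  # everything after this is passed to the training module itself
        f"--train_dir={bucket}/train",
        f"--pipeline_config_path={bucket}/data/{config_name}",
    ]


# Hypothetical job name and bucket.
command = build_training_command("tswift_train_001", "gs://my-bucket")
print(" ".join(shlex.quote(arg) for arg in command))
```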


While training runs, I also kick off an evaluation job. It evaluates the accuracy of my model using data it has not seen before:


You can verify that your job is running correctly and inspect the logs for a specific job by navigating to the Jobs section of ML Engine in your Cloud console:



Step 3: Deploying the model to serve predictions


To deploy the model to ML Engine I need to convert my model checkpoints to a ProtoBuf. In my train/ directory, I can see checkpoint files saved from a few points throughout my training process:


The first line of the checkpoint file tells me the latest checkpoint path - I download the files for that checkpoint locally. There should be a .index, .meta, and .data file for each checkpoint. With these saved in a local directory, I can use the Object Detection API's export_inference_graph script to convert them to a ProtoBuf. To run the script below, you need to define the local path to your MobileNet config file, the checkpoint number of the model checkpoint you downloaded from the training job, and the name of the directory you want the exported graph to be written to:
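Reading the latest checkpoint out of that file can be sketched in Python as below. The sketch assumes the usual TensorFlow layout, where the first line looks like `model_checkpoint_path: "model.ckpt-<step>"`; the step numbers in the sample are invented.

```python
import re

# Hypothetical contents of the `checkpoint` file; step numbers are made up.
SAMPLE_CHECKPOINT_FILE = '''model_checkpoint_path: "model.ckpt-12345"
all_model_checkpoint_paths: "model.ckpt-10000"
all_model_checkpoint_paths: "model.ckpt-12345"
'''


def latest_checkpoint(checkpoint_text):
    """Return (path, step) for the checkpoint named on the first line."""
    first_line = checkpoint_text.splitlines()[0]
    match = re.match(r'model_checkpoint_path:\s*"(.+)"', first_line)
    if match is None:
        raise ValueError("first line does not name a checkpoint path")
    path = match.group(1)
    step_match = re.search(r"-(\d+)$", path)
    step = int(step_match.group(1)) if step_match else None
    return path, step


path, step = latest_checkpoint(SAMPLE_CHECKPOINT_FILE)
# The three files to download share this prefix: .index, .meta, and .data-*
print(path, step)
```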


After running this script, you should see a saved_model/ directory inside your .pb output directory. Upload the saved_model.pb file (don't worry about the other generated files) to the data/ directory in your Cloud Storage bucket.


Now you are ready to deploy the model to ML Engine for serving. First, use gcloud to create your model:


gcloud ml-engine models create tswift_detector


Then create the first version of your model by pointing it to the saved model ProtoBuf you just uploaded to Cloud Storage:


gcloud ml-engine versions create v1 --model=tswift_detector --origin=gs://${YOUR_GCS_BUCKET}/data --runtime-version=1.4


Once the model is deployed I am ready to use ML Engine's online prediction API to generate a prediction on a new image.


Step 4: Building a prediction client with Firebase Functions and Swift


I wrote an iOS client in Swift to make prediction requests against my model (because why write a TSwift detector in any other language?). The Swift client uploads an image to Cloud Storage, which triggers a Firebase Function that makes the prediction request in Node.js and saves the resulting prediction image and data to Cloud Storage and Firestore.


First, in my Swift client, I added a button for users to access their device's photo library. Once a user selects a photo, this triggers an action that uploads the image to Cloud Storage:


Next, I wrote the Firebase Function that is triggered on uploads to the Cloud Storage bucket for my project. It takes the image, base64 encodes it, and sends it to ML Engine for prediction. You can find the full function code here. Below I have included the part of the function where I make the request to the ML Engine prediction API (thanks to Brett McGowan for his expert Cloud Functions help!):
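The function itself is written in Node.js; purely to illustrate the encoding step, here is a Python sketch of how such a request body can be assembled. The `{"b64": ...}` wrapper is ML Engine's convention for binary data in JSON requests, while the `inputs` key is an assumption that depends on how the model's serving signature was exported.

```python
import base64
import json


def build_prediction_request(image_bytes, input_key="inputs"):
    """Wrap base64-encoded image bytes in an online prediction request body."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"instances": [{input_key: {"b64": encoded}}]}


# Placeholder bytes standing in for the contents of an uploaded JPEG file.
body = build_prediction_request(b"\xff\xd8\xff\xe0 fake image bytes")
print(json.dumps(body)[:60])
```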


In the ML Engine response, we get:


detection_boxes, which we can use to define the bounding box around Taylor if she is found in the image


detection_scores, which returns a confidence value for each detection box. I only include detections with a score higher than 70%.


detection_classes, which tells us the label ID associated with our detection. In this case, it will always be 1 because there is only one label
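How these three fields fit together can be sketched in Python. The sketch assumes a single prediction dictionary with parallel lists, boxes given as normalized [ymin, xmin, ymax, xmax] (the Object Detection API's output order), and an illustrative score threshold; the function name and sample values are hypothetical.

```python
def confident_boxes(prediction, image_width, image_height, min_score=0.7):
    """Keep detections scoring above min_score and convert boxes to pixels."""
    results = []
    for box, score, class_id in zip(prediction["detection_boxes"],
                                    prediction["detection_scores"],
                                    prediction["detection_classes"]):
        if score <= min_score:
            continue
        ymin, xmin, ymax, xmax = box  # normalized to the 0-1 range
        results.append({
            "left": round(xmin * image_width),
            "top": round(ymin * image_height),
            "right": round(xmax * image_width),
            "bottom": round(ymax * image_height),
            "score": score,
            "class_id": int(class_id),
        })
    return results


# Made-up response values for a 600x400 image.
sample_prediction = {
    "detection_boxes": [[0.1, 0.25, 0.65, 0.55], [0.0, 0.0, 0.2, 0.2]],
    "detection_scores": [0.92, 0.31],
    "detection_classes": [1.0, 1.0],
}
print(confident_boxes(sample_prediction, 600, 400))
```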


In the function, I use detection_boxes to draw a box on the image if Taylor is detected, along with the confidence score. Then I save the new boxed image to Cloud Storage, and write the image's file path to Cloud Firestore so I can read the path and download the new image (with the rectangle) in my iOS app:


Finally, in my iOS app, I can listen for updates to the Firestore path for the image. If a detection is found, I download the image and display it in my app along with the detection confidence score. This function replaces the comment in the first Swift snippet above.


Woohoo! We have a working Taylor Swift detector. Note that the focus here was not on accuracy (I only had 140 images in my training set), so the model did incorrectly identify some images of people you might mistake for Swift. But if I have time to hand-label more images I will update the model and publish the app in the App Store :)


What's next?


This post covered a lot of information. Want to build your own? Here is a breakdown of the steps with links to the sources:


Preprocessing data: I followed Dat's blog post to hand-label images with LabelImg and generate XML files with bounding box data. Then I wrote this script to convert the labeled images to TFRecords


Training and evaluating an Object Detection model: Using the approach from this blog post, I uploaded the training and test data to Cloud Storage and used ML Engine to run training and evaluation.


Deploying the model to ML Engine: I used the gcloud CLI to deploy my model to ML Engine.

Making prediction requests: I used the Firebase SDK for Cloud Functions to make an online prediction request to my ML Engine model. This request was triggered by an upload to Firebase Storage from my Swift app. In my function, I wrote the prediction metadata to Firestore.

Links


TensorFlow Object Detection on GitHub: https://goo.gl/QYThDb

Building a pet detector with the Object Detection API: https://goo.gl/cxIquA

Building a raccoon detector with the Object Detection API: https://goo.gl/A8Sykp

Pascal VOC format: https://goo.gl/m2yT6N

Firebase iOS SDK: https://goo.gl/hnbrva

Cloud Functions for Firebase: https://goo.gl/1qBuce


