Steps of Machine Learning

7 steps of machine learning


From skin cancer detection to sorting out crabs, to finding escalators in need of maintenance, machine learning has given computer systems completely new capabilities.

But how does it really work under the hood? Let's walk through a basic example, and use it as an excuse to talk about the process of getting answers from your data using machine learning.

Let's pretend that we have been asked to create a system that answers the question of whether a drink is wine or beer. The question-answering system we build is called a "model", and this model is created through a process called "training". The goal of training is to create an accurate model that answers our questions correctly most of the time. But to train a model, we need to collect data to train on. This is where we start.




Wine or beer?


Our data will be collected from glasses of wine and beer. There are many aspects of the drinks that we could collect data on, from the amount of foam to the shape of the glass.

For our purposes, we will choose only two simple ones: color (as a wavelength of light) and alcohol content (as a percentage). The hope is that we can separate our two types of drinks using these two factors alone. From here on, we will call these our "features".

The first step in our process is to run out to the local grocery store and buy a bunch of different beers and wines, as well as some equipment to do our measurements: a spectrometer for measuring color, and a hydrometer for measuring alcohol content. Apparently our grocery store has an electronics hardware section.

Collecting data


Once we have our tools and drinks, it's time for the first real step of machine learning: collecting data. This step is very important because the quality and quantity of data you collect directly determine how good your predictive model can be. In this case, the data we collect will be the color and alcohol content of each drink.


This yields a table of color, alcohol%, and whether the drink is beer or wine. This will be our training data.

Data preparation


After a few hours of measurement, we have gathered our training data. Now it's time for the next step of machine learning: data preparation, where we load our data into a suitable place and prepare it for use in training.
We will first put all our data together and then randomize the order. We don't want the order of our data to affect what the model learns, because order is not part of determining whether a drink is beer or wine. In other words, our determination of what a drink is should not depend on which drink came before or after it.

This is also a good time to visualize your data, to see whether there are any relevant relationships between the variables that you can take advantage of, and whether there are any data imbalances. For example, if we collected far more data points about beer than wine, the model we train would be biased toward guessing that nearly everything it sees is beer, since it would be right most of the time. However, in the real world, the model may see beer and wine in equal proportions, which means that guessing "beer" would be wrong half the time.
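As a minimal sketch of such an imbalance check, the snippet below counts how often each label appears. The `drinks` rows and the `class_proportions` helper are hypothetical names, and the measurements are made up for illustration only.

```python
from collections import Counter

# Hypothetical measurements: (color in nm, alcohol %, label).
# The numbers are made up for illustration, not real data.
drinks = [
    (590, 4.5, "beer"),
    (600, 5.0, "beer"),
    (585, 5.5, "beer"),
    (595, 6.0, "beer"),
    (650, 12.5, "wine"),
    (660, 13.5, "wine"),
]


def class_proportions(rows):
    """Return the fraction of rows carrying each label."""
    counts = Counter(label for _, _, label in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


proportions = class_proportions(drinks)
print(proportions)
```

Here beer makes up two thirds of the rows, which is the kind of skew worth noticing before training.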

We also need to split the data into two parts. The first part, used for training our model, will be the majority of the dataset. The second part will be used for evaluating our trained model's performance. We don't want to evaluate the model on the same data it was trained on, since it could then simply memorize the "questions", just as you wouldn't reuse the questions from your math homework on the exam.

Sometimes the data we collect needs other forms of adjustment and manipulation, such as de-duping, normalization, error correction, and more. These would all happen at the data preparation step. In our case, we don't have any further data preparation needs, so let's move on.
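The shuffle and the split can be sketched together in a few lines. This is one possible approach, not a prescribed one: `shuffle_and_split`, its `train_fraction=0.8` default, and the fixed `seed` are assumptions, and the rows are made-up values.

```python
import random


def shuffle_and_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows, then split it into (train, evaluation) lists."""
    shuffled = list(rows)                  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows: (color in nm, alcohol %, label); values are made up.
drinks = [(580 + i, 4.0 + 0.1 * i, "beer") for i in range(10)] + [
    (640 + i, 12.0 + 0.1 * i, "wine") for i in range(10)
]

train_rows, eval_rows = shuffle_and_split(drinks)
print(len(train_rows), len(eval_rows))
```

Shuffling before cutting matters here: the rows start out sorted by label, so an unshuffled split would put only wine in the evaluation part.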

Choosing a model


The next step in our workflow is choosing a model. Researchers and data scientists have created many models over the years. Some are well suited to image data, others to sequences (such as text or music), some to numerical data, and others to text-based data. In our case, since we have only two features, color and alcohol%, we can use a small linear model, which is simple enough to get the job done.


Training


Now we move on to what is often considered the bulk of machine learning: training. In this step, we will use our data to incrementally improve our model's ability to predict whether a given drink is wine or beer.


In some ways, this is similar to someone first learning to drive. At first, they don't know how any of the pedals, knobs, and switches work, or when any of them should be used. However, after a lot of practice and correcting their mistakes, a licensed driver emerges. Moreover, after a year of driving, they have become quite adept. The act of driving and reacting to real-world data has adapted their driving abilities, honing their skills.


We will do this on a much smaller scale with our drinks. Specifically, the formula for a straight line is y = m * x + b, where x is the input, m is the slope of the line, b is the y-intercept, and y is the value of the line at position x. The values available to us for adjusting, or "training", are m and b. There is no other way to affect the position of the line, since the only other variables are x, our input, and y, our output.
In machine learning, there are many m's because there can be many features. These m values are usually collected into a matrix, which we denote W, for the "weights" matrix. Similarly, we arrange the b values together and call them the biases.
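To make the many-feature version of y = m * x + b concrete, here is a small sketch with one weight per feature and a single bias. The function name `linear_score` and all the numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def linear_score(features, weights, bias):
    """Compute w1*x1 + w2*x2 + ... + b, the many-feature form of y = m*x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
weights = [0.5, -2.0]   # one weight (an "m") per feature
bias = 1.0              # the "b"
features = [4.0, 3.0]   # e.g. two already-scaled measurements of one drink

score = linear_score(features, weights, bias)
print(score)  # 0.5*4.0 + (-2.0)*3.0 + 1.0 = -3.0
```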




The training process involves initializing W and b with some random values and attempting to predict the output with those values. As you might imagine, it does pretty poorly at first. But we can compare our model's predictions with the output it should have produced, and adjust the values in W and b so that the next predictions are more accurate.

This process is then repeated. Each repetition or cycle of updating weights and biases is called a training “step”.

Let's see what this means for our dataset. When we first start training, it is as if we drew a random line through the data. Then, as each step of training progresses, the line moves, step by step, closer to an ideal separation of wine and beer.
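The article does not say which update rule is used, so the sketch below assumes one common choice: logistic regression trained with stochastic gradient descent, one example per training step. The function names, the `steps` and `learning_rate` defaults, and the pre-scaled data are all hypothetical.

```python
import math
import random


def predict_probability(features, weights, bias):
    """Probability that a drink is wine, from a linear score squashed to (0, 1)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def train(rows, steps=2000, learning_rate=0.5, seed=0):
    """Fit weights and bias with one gradient-descent update per training step."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in rows[0][0]]  # random starting line
    bias = rng.uniform(-1, 1)
    for _ in range(steps):
        features, label = rng.choice(rows)  # pick one training example
        error = predict_probability(features, weights, bias) - label
        # Nudge each parameter against the error, which moves the line.
        weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
        bias -= learning_rate * error
    return weights, bias


# Hypothetical, already-scaled data: ([color, alcohol], label), 1 = wine, 0 = beer.
rows = [
    ([0.10, 0.20], 0), ([0.20, 0.25], 0), ([0.15, 0.30], 0), ([0.25, 0.20], 0),
    ([0.80, 0.70], 1), ([0.90, 0.75], 1), ([0.85, 0.65], 1), ([0.75, 0.80], 1),
]

weights, bias = train(rows)
for features, label in rows:
    print(label, round(predict_probability(features, weights, bias), 2))
```

The features are scaled to a 0 to 1 range here because raw wavelengths in the hundreds would make this simple update rule unstable at the same learning rate.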

Evaluation


Once training is complete, it's time to see if the model is any good, using evaluation. This is where the dataset that we set aside earlier comes into play. Evaluation allows us to test our model against data that has never been used for training. The resulting metric shows how the model might perform against data it has not yet seen, which is meant to be representative of how it will perform in the real world.

A good rule of thumb I use for the training-evaluation split is somewhere on the order of 80/20 or 70/30. Much of this depends on the size of the original source dataset. If you have a lot of data, you may not need as large a fraction for the evaluation dataset.
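One simple evaluation metric is accuracy: the fraction of held-out drinks the model labels correctly. In this sketch, `accuracy`, `toy_model`, and the evaluation rows are hypothetical; `toy_model` is a fixed threshold standing in for a trained model, and the measurements are made up.

```python
def accuracy(model, eval_rows):
    """Fraction of held-out rows whose predicted label matches the true label."""
    correct = sum(1 for features, label in eval_rows if model(features) == label)
    return correct / len(eval_rows)


def toy_model(features):
    """Hypothetical stand-in for a trained model: a fixed alcohol% threshold."""
    _color, alcohol = features
    return "wine" if alcohol > 9.0 else "beer"


# Hypothetical held-out rows: ((color in nm, alcohol %), label); values are made up.
eval_rows = [
    ((590, 5.0), "beer"),
    ((600, 9.5), "beer"),   # a strong beer that the threshold gets wrong
    ((650, 12.5), "wine"),
    ((655, 13.0), "wine"),
]

eval_accuracy = accuracy(toy_model, eval_rows)
print(eval_accuracy)  # 3 of 4 correct -> 0.75
```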

Parameter tuning


Once you have done evaluation, you may want to see if you can further improve your training in any way. We can do this by tuning our parameters. There were a few parameters that we implicitly assumed when we did our training, and now is a good time to go back, test those assumptions, and try other values.

One example is how many times we run through the training dataset during training. What I mean by that is we can show the model our full dataset multiple times, rather than just once. This can sometimes lead to higher accuracy, at the cost of longer training.




Another parameter is the "learning rate". It defines how far we shift the line during each step, based on information from the previous training step. All of these values play a role in how accurate our model can become, and how long training takes.

For more complex models, initial conditions can play a significant role in determining the outcome of training. Differences can be seen depending on whether a model starts training with its values initialized to zeros or drawn from some distribution, which leads to the question of which distribution to use.
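To get a feel for how these two settings interact, the sketch below trains the same small model under a few combinations of passes over the data (here called `epochs`) and `learning_rate`, then compares the resulting loss. It assumes logistic loss and full-batch gradient descent, which the article does not specify; all names and data values are hypothetical.

```python
import math


def log_loss(rows, weights, bias):
    """Average logistic loss of a linear model over rows (lower is better)."""
    total = 0.0
    for features, label in rows:
        score = sum(w * x for w, x in zip(weights, features)) + bias
        p = 1.0 / (1.0 + math.exp(-score))
        total -= math.log(p) if label == 1 else math.log(1.0 - p)
    return total / len(rows)


def train(rows, epochs, learning_rate):
    """Full-batch gradient descent, starting from all-zero weights and bias."""
    weights, bias = [0.0] * len(rows[0][0]), 0.0
    for _ in range(epochs):  # one epoch = one pass over the whole dataset
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in rows:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            error = 1.0 / (1.0 + math.exp(-score)) - label
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


# Hypothetical, already-scaled data: ([color, alcohol], label), 1 = wine, 0 = beer.
rows = [
    ([0.10, 0.20], 0), ([0.20, 0.25], 0), ([0.15, 0.30], 0), ([0.25, 0.20], 0),
    ([0.80, 0.70], 1), ([0.90, 0.75], 1), ([0.85, 0.65], 1), ([0.75, 0.80], 1),
]

losses = {}
for epochs in (10, 100):
    for learning_rate in (0.1, 1.0):
        weights, bias = train(rows, epochs, learning_rate)
        losses[(epochs, learning_rate)] = log_loss(rows, weights, bias)

for setting, loss in sorted(losses.items()):
    print(setting, round(loss, 3))
```

For brevity the loss is measured on the training rows themselves; in practice the comparison would be made on the held-out evaluation data.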

The potentially long journey of parameter tuning


As you can see, there are many considerations at this phase of training, and it is important that you define what makes a model "good enough"; otherwise you may find yourself tweaking parameters for a very long time.

These parameters are commonly known as "hyperparameters". The adjustment, or tuning, of these hyperparameters remains a bit of an art, and is more of an experimental process that depends heavily on the specifics of your dataset, model, and training process.

Once you are happy with your training and hyperparameters, guided by this evaluation step, it is finally time to use your model for something useful!


Prediction


Machine learning is using data to answer questions. So prediction, or inference, is the step where we finally get to answer some questions. This is the point of all this work, where the value of machine learning is realized.


We can finally use our model to predict whether a given drink is wine or beer, given its color and alcohol percentage.
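At prediction time, the trained weights and bias are simply applied to a new drink's measurements. In the sketch below, `predict` is a hypothetical helper and the parameter values are placeholders picked by hand, not the result of any real training run.

```python
def predict(color_nm, alcohol_pct, weights, bias):
    """Label a drink from a linear score: positive means wine, otherwise beer."""
    score = weights[0] * color_nm + weights[1] * alcohol_pct + bias
    return "wine" if score > 0 else "beer"


# Hypothetical parameters standing in for the result of training; not fitted values.
weights = [0.01, 1.0]
bias = -15.0

print(predict(600, 5.0, weights, bias))    # score = 6.0 + 5.0 - 15.0 = -4.0 -> beer
print(predict(650, 13.0, weights, bias))   # score = 6.5 + 13.0 - 15.0 = 4.5 -> wine
```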


Big picture


The power of machine learning is that we were able to work out how to distinguish between wine and beer using our model, rather than using human judgment and manual rules. You can extend the ideas presented today to other problem domains as well, where the same principles apply:


Collecting data

Preparing that data

Choosing a model

Training

Evaluation

Hyperparameter tuning

Prediction

TensorFlow Playground


Check out TensorFlow Playground for more ways to play with training and parameters. It is a completely browser-based machine learning sandbox where you can try different parameters and run training against mock datasets.

