“42 is the answer to the question of life, the universe, and everything.” In Douglas Adams’ book The Hitchhiker’s Guide to the Galaxy, a supercomputer named Deep Thought computes for 7.5 million years and then reveals that the ultimate answer is 42. Deep Thought then says, “You have to know what the question actually IS in order to know what the answer MEANS.”
Machine learning works the same way: you must ask the right question in the right way. Which machine learning algorithm is best for your query depends on what kind of answer you are seeking.
Linear Regression Algorithm
When the outcome (the dependent variable) falls on a continuum, you can use linear regression. For example, suppose you want to predict a vehicle’s miles per gallon (the dependent variable) from how heavy it is (the independent variable, or predictor). The answer lies on a continuum: somewhere between 10 mpg and 40 mpg.
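To make the weight-to-mpg idea concrete, here is a minimal sketch of simple linear regression using ordinary least squares in plain Python. The vehicle weights and mpg values are made-up illustration data, not real measurements, and `fit_line` is a hypothetical helper name. The fit finds the straight line `mpg = slope * weight + intercept` that minimizes the sum of squared errors.

```python
# Minimal sketch: ordinary least-squares fit of mpg = slope * weight + intercept.
# The weights and mpg values below are made-up illustration data, not real measurements.

def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical vehicles: weight in thousands of pounds -> miles per gallon
weights = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
mpg = [34.0, 30.0, 26.5, 22.0, 18.5, 15.0]

slope, intercept = fit_line(weights, mpg)
prediction = slope * 3.2 + intercept  # predicted mpg for a 3,200 lb vehicle
print(f"mpg = {slope:.2f} * weight + {intercept:.2f}")
print(f"predicted mpg at 3,200 lb: {prediction:.1f}")
```

The negative slope captures “heavier vehicle, lower mpg,” and the prediction is a number on the continuum rather than a category.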
Logistic Regression Algorithm
In contrast, when the answer you are looking for is categorical, logistic regression is a good choice. Suppose I want to predict buyer’s remorse from the price of a product: will a customer keep the product or return it? That keep-or-return outcome is the dependent variable, and the cost of their new iPhone is the independent variable.
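A minimal sketch of the buyer’s remorse example follows. It is one-feature logistic regression trained with batch gradient descent on log-loss. The prices (in hundreds of dollars) and keep/return labels are hypothetical illustration data. The learning rate and epoch count are assumptions chosen for this toy dataset, and `train_logistic` is a hypothetical helper. Instead of a number on a continuum, the model outputs a probability of a return, which is then thresholded at 0.5 into a category.

```python
import math

# Minimal sketch: one-feature logistic regression trained with batch gradient descent.
# Prices and keep/return labels are hypothetical illustration data.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.05, epochs=20000):
    """Fit P(return = 1 | price) = sigmoid(w * price + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: price in hundreds of dollars; 1 = returned, 0 = kept
prices = [3, 4, 5, 6, 7, 9, 10, 11, 12, 13]
returned = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = train_logistic(prices, returned)
for price in (4, 6, 10, 12):
    p = sigmoid(w * price + b)
    label = "return" if p >= 0.5 else "keep"
    print(f"${price * 100}: P(return) = {p:.2f} -> {label}")
```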
Recurrent Neural Network (RNN) Algorithm
If the answer you are looking for depends on sequential information, an RNN is the way to go. Predictive text is a good example. Texting apps try to predict your next word to save you typing, so the machine learning model must learn how words follow one another in order to predict what you’re most likely to type next.
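The key idea behind an RNN is that a hidden state carries information from earlier words forward to later ones. Below is a minimal sketch of an untrained Elman-style RNN forward pass in plain Python. The five-word vocabulary, hidden size, and random weights are hypothetical, and bias terms are omitted for brevity. A real predictive-text model would learn its weights from large amounts of text, so the probabilities printed here are meaningless. The point is the structure: each step combines the current word with the previous hidden state.

```python
import math
import random

# Minimal sketch of an untrained Elman-style RNN for next-word prediction.
# The tiny vocabulary, hidden size, and random weights are hypothetical;
# a real texting model would learn its weights from large amounts of text.

VOCAB = ["how", "are", "you", "doing", "today"]
HIDDEN = 8
random.seed(0)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

W_xh = rand_matrix(HIDDEN, len(VOCAB))   # input word -> hidden
W_hh = rand_matrix(HIDDEN, HIDDEN)       # previous hidden -> hidden (the "memory")
W_hy = rand_matrix(len(VOCAB), HIDDEN)   # hidden -> score for each vocabulary word

def one_hot(word):
    return [1.0 if w == word else 0.0 for w in VOCAB]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_probs(words):
    """Feed words one at a time, carrying the hidden state forward."""
    h = [0.0] * HIDDEN
    for word in words:
        # h_t = tanh(W_xh x_t + W_hh h_{t-1}): each step sees the current word and the past
        x_part = matvec(W_xh, one_hot(word))
        h_part = matvec(W_hh, h)
        h = [math.tanh(a + b) for a, b in zip(x_part, h_part)]
    # Turn the final hidden state into a probability for each possible next word
    return softmax(matvec(W_hy, h))

probs = next_word_probs(["how", "are"])
for word, p in sorted(zip(VOCAB, probs), key=lambda t: -t[1]):
    print(f"{word}: {p:.3f}")
```

Training would adjust `W_xh`, `W_hh`, and `W_hy` so that, after “how are,” the word “you” receives a high probability.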
Feed-Forward Neural Network Algorithm
If you are trying to classify data, you can build a Feed-Forward Neural Network. Building one is among the first things you learn about machine learning models in Google’s Data Engineering course. The course shows a graph of orange and blue dots.
Based on where each dot is located (its coordinates), you want the model to predict whether the dot is blue or orange. As humans, we can easily see the pattern: a dot in the center region is probably blue. To train a model to make this prediction, the Feed-Forward Neural Network takes the inputs, passes them through layers of neurons (ML borrows this concept from how our brains work), and ultimately predicts blue or orange. Each neuron looks at the data from a different perspective.
The first neuron looks at the top left quadrant and draws an approximate line between blue and orange.
The second neuron examines the top right quadrant and does the same.
The third and fourth neurons view the data from the bottom quadrants.
Combining all views, the model can predict with high accuracy which dots will be blue and which will be orange based on the independent variables (the coordinates).
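To show how several simple neurons combine into one decision, here is a minimal sketch of a feed-forward pass with 2 inputs, 4 hidden neurons, and 1 output. The weights are picked by hand for illustration and are not learned. Each hidden neuron draws one straight line, and together the four lines box in a central square that roughly approximates the blue cluster. A trained network, such as the one in TensorFlow Playground, would learn its own weights from the labeled dots, and its boundaries need not line up neatly with quadrants like this.

```python
import math

# Minimal sketch of a feed-forward pass: 2 inputs (x, y) -> 4 hidden neurons -> 1 output.
# The weights are hand-picked for illustration (hypothetical), not learned from data.

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Each hidden neuron (wx, wy, bias) draws one straight line and fires only on its far side.
HIDDEN = [
    (1.0, 0.0, -2.0),   # fires when x > 2   (right of the center)
    (-1.0, 0.0, -2.0),  # fires when x < -2  (left of the center)
    (0.0, 1.0, -2.0),   # fires when y > 2   (above the center)
    (0.0, -1.0, -2.0),  # fires when y < -2  (below the center)
]
OUTPUT_WEIGHTS = [-10.0, -10.0, -10.0, -10.0]  # any hidden activity pushes toward orange
OUTPUT_BIAS = 1.0                              # with no hidden activity, lean blue

def predict(x, y):
    hidden = [relu(wx * x + wy * y + b) for wx, wy, b in HIDDEN]
    z = sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS
    p_blue = sigmoid(z)
    return "blue" if p_blue >= 0.5 else "orange"

for point in [(0.0, 0.0), (1.5, -1.0), (3.0, 0.5), (-4.0, 4.0)]:
    print(point, predict(*point))
```

Any single hidden neuron can only separate the plane with one line. Stacking them and combining their outputs is what lets the network carve out a closed center region.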
Does ML involve a lot of statistical math? Oh yeah. However, Google has made creating and tweaking ML models much easier, without requiring ninja-like math skills. You can play around with its TensorFlow Playground tool (including this exact example) for free in your browser.
Go to http://playground.tensorflow.org to check it out.