You may have heard that Harvard Business Review called data scientist the ‘sexiest job of the 21st century’. According to a recent study, machine learning algorithms are also expected to replace about 25% of jobs in the next decade.
Machine Learning is all about teaching computers to learn from data and to make decisions and predictions using the patterns found in that data. The computer learns patterns that it was not explicitly programmed to recognize.
We are living at the start of this new era. So, for those who are thinking of starting out in the field of Machine Learning, we have compiled a list of the machine learning algorithms you should know.
Machine learning algorithms can be broadly grouped into 3 main categories.
Supervised Learning Algorithms: These algorithms require labelled data for training the model. A labelled data set has inputs as well as the desired output for each input. During training (when the model learns), the model adjusts its parameters to map inputs to their corresponding outputs.
Unsupervised Learning Algorithms: These algorithms do not require labelled data for training the model. There is no target outcome. They are mainly used for tasks such as clustering the data into different groups.
Reinforcement Learning Algorithms: These algorithms are trained to make decisions. The algorithm trains itself based on the success or failure of each decision. The model records which actions led to which rewards and uses that experience for future decisions.
We’ll be covering the following algorithms in this article.
- Linear Regression
- Logistic Regression
- SVM (Support Vector Machine)
- Decision Tree
- Random Forest
- K-Means
- KNN (K-Nearest Neighbors)
- Naive Bayes
- Gradient Boosting Algorithms
- Dimensionality Reduction Algorithms
1. Linear Regression
The Linear Regression algorithm takes the data points and finds the line that fits them best, that is, the line with the smallest total squared distance to the points. We can represent a simple line with the equation y = m*x + c, where y is our dependent variable and x is our independent variable. The values of m and c can be found from the given data set by minimizing the squared error (the least squares method).
There are 2 types of Linear Regression.
- Simple Linear Regression, where only 1 independent variable is used
- Multiple Linear Regression, where multiple independent variables are used
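To make the line-fitting idea concrete, here is a minimal sketch of simple linear regression using the closed-form least squares solution. The function name `fit_line` and the data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return slope m and intercept c of the least squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


# Made-up data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)  # prints 2.0 1.0
```

With noisy real-world data the points will not lie exactly on the line, but the same formulas give the line with the smallest squared error.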
2. Logistic Regression
Logistic Regression is used where a discrete output is expected, such as whether some event occurs. Unlike linear regression, which gives us a continuous value, logistic regression tells us whether an event will occur or not. To do this, it passes its output through a function (e.g. Sigmoid) that squeezes values into a fixed range.
“Sigmoid” (the logistic function) has an “S”-shaped curve and is commonly used for binary classification. It takes any continuous value and maps it into the range between 0 and 1, which can be interpreted as the probability of the event occurring. The same function is also widely used as an activation function in neural networks.
y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))
Above is a simple logistic regression equation, where b0 and b1 are coefficients learned from the data. You can see that it is just the simple linear regression equation passed through a sigmoid function.
During training, an optimization algorithm adjusts b0 and b1 so that the error between the predictions and the actual values is minimized.
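The sketch below shows one way this could look in code: a sigmoid function, the prediction from the equation above, and plain gradient descent as the optimization algorithm. The choice of gradient descent, the learning rate, the number of epochs and the data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Squeeze any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(b0, b1, x):
    """Probability that the event occurs for input x."""
    return sigmoid(b0 + b1 * x)


def train(xs, ys, lr=0.1, epochs=2000):
    """Fit b0 and b1 with gradient descent on the log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Average gradient of the log loss with respect to b0 and b1.
        g0 = sum(predict(b0, b1, x) - y for x, y in zip(xs, ys)) / n
        g1 = sum((predict(b0, b1, x) - y) * x for x, y in zip(xs, ys)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1


# Made-up data: the event (1) happens only for larger x.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = train(xs, ys)
print(predict(b0, b1, 1), predict(b0, b1, 6))
```

A prediction above 0.5 is usually read as "the event occurs" and a prediction below 0.5 as "it does not".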
3. SVM (Support Vector Machine)
SVM is mainly used as a classification algorithm. In Linear SVM the algorithm divides the data points using a line. The line is chosen so that it is as far as possible from the nearest data points of both categories.
Consider a 2D space with 3 candidate lines, where the red line is the best one because it is farthest from the nearest points. Based on the separating line, data points are classified into 2 groups (Black and White).
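The idea of "the line farthest from the nearest points" can be checked numerically. The sketch below does not train an SVM; it only compares 3 hand-picked candidate lines and keeps the one with the largest margin. The points, labels and lines are made up for illustration.

```python
import math


def margin(line, points, labels):
    """Smallest signed distance from the line w1*x + w2*y + b = 0 to any point.

    The result is negative if at least one point is on the wrong side.
    """
    w1, w2, b = line
    norm = math.hypot(w1, w2)
    return min(
        label * (w1 * x + w2 * y + b) / norm
        for (x, y), label in zip(points, labels)
    )


# Two made-up groups of points, labelled -1 and +1.
points = [(1, 1), (2, 1), (4, 4), (5, 4)]
labels = [-1, -1, 1, 1]

# Three candidate separating lines, written as (w1, w2, b).
candidates = {
    "A": (1, 0, -2.5),  # vertical line x = 2.5
    "B": (1, 1, -5.5),  # diagonal line x + y = 5.5
    "C": (0, 1, -1.5),  # horizontal line y = 1.5
}

best = max(candidates, key=lambda name: margin(candidates[name], points, labels))
print(best)  # prints B
```

All 3 lines separate the two groups, but line B leaves the most room on both sides, which is the property a linear SVM optimizes for.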
4. Decision Tree
The Decision Tree algorithm splits the population into several sets based on some chosen properties (independent variables) of the population. This algorithm is mainly used for classification problems. The splits are chosen using measures such as entropy. Since it is a supervised algorithm, it requires labelled data.
Let’s consider a population of people and use the decision tree algorithm to identify who is most likely to purchase a credit card. For example, take age and marital status as properties of the population. If age > 30 or the person is married, they are more likely to prefer credit cards, and less likely otherwise.
This algorithm can be extended further by identifying suitable properties to define more categories. The whole population is broken into different chunks based on the conditions. In this example, if a person is married and over 30, they are the most likely to have a credit card (100% preference). Training data is used to generate this decision tree.
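To show how entropy can guide the choice of a split, the sketch below computes the information gain of splitting on age versus splitting on marital status. The 6 people and their labels are made up for illustration, and `information_gain` is a hypothetical helper name.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, test):
    """Reduction in entropy after splitting the rows with a yes/no test."""
    left = [label for row, label in zip(rows, labels) if test(row)]
    right = [label for row, label in zip(rows, labels) if not test(row)]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder


# Made-up population: (age, married) and whether the person bought a card.
rows = [(22, False), (25, False), (28, True), (35, False), (40, True), (45, True)]
labels = ["no", "no", "no", "yes", "yes", "yes"]

gain_age = information_gain(rows, labels, lambda row: row[0] > 30)
gain_married = information_gain(rows, labels, lambda row: row[1])
print(gain_age, gain_married)
```

In this toy data the age split separates buyers from non-buyers perfectly, so it has the higher gain and would be chosen as the first split.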
5. Random Forest
Random forest can be simply explained as a collection of decision trees. As the name suggests, it is a forest of decision trees. Each tree tries to classify the data point, and its answer is called a “vote”. The logic is simple: we collect the vote from each tree and choose the classification with the most votes.
Each individual tree makes its decision in the same way as in the decision tree algorithm.
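The voting step can be sketched in a few lines. The 3 "trees" below are hand-written rules standing in for trained decision trees, so this only illustrates the majority vote, not how a real forest is trained.

```python
from collections import Counter


# Hypothetical stand-ins for trained trees: each one returns a vote.
def tree_a(person):
    return "yes" if person["age"] > 30 else "no"


def tree_b(person):
    return "yes" if person["married"] else "no"


def tree_c(person):
    return "yes" if person["age"] > 30 and person["married"] else "no"


def forest_predict(trees, sample):
    """Collect one vote per tree and return the most common one."""
    votes = [tree(sample) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


trees = [tree_a, tree_b, tree_c]
print(forest_predict(trees, {"age": 40, "married": True}))   # prints yes
print(forest_predict(trees, {"age": 35, "married": False}))  # prints no
```

Using an odd number of trees for a two-class problem avoids tied votes.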
6. K-Means
This is the first unsupervised algorithm that we will talk about. It provides a solution to the clustering problem. The algorithm forms clusters that contain similar data points. Each data point ends up in the cluster whose centroid is closest to it.
You need to provide the value of k as an input to the algorithm. The algorithm then selects k centroids. Each data point is assigned to its nearest centroid, and this creates the clusters. A new centroid is then computed for each cluster as the mean of its points, and the data points are reassigned to the nearest new centroid. This process is repeated until the centroids no longer change.
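The steps above can be sketched as follows. Using the first k points as the starting centroids is an assumption made to keep the example deterministic (real implementations usually pick them randomly), and the data is made up. `math.dist` needs Python 3.8 or newer.

```python
import math


def kmeans(points, k, max_iters=100):
    """Group points into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # assumption: first k points as initial centroids
    for _ in range(max_iters):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 2: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        # Step 3: stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up data with two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Even though both starting centroids sit in the same group here, the repeated assign-and-update steps pull one of them over to the second group.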
7. KNN (K-Nearest Neighbors)
KNN is a simple algorithm that predicts the class of an unknown data point from its k nearest neighbors. The value of k affects the accuracy of the prediction. The value of k is chosen by the data scientist, and you will learn how to find a good value as you gain experience. The distance between 2 points is calculated using basic distance functions such as Euclidean distance.
Computing the distance from a point to all the others makes KNN a computation-intensive process. In addition, because the prediction depends on distances, variables with large values can dominate the result, so we need to normalize the data first to bring every variable into the same range.
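Here is a small sketch of KNN with normalization. Min-max scaling is used as the normalization method, which is one common choice and an assumption here; the ages, incomes and labels are made up. `math.dist` needs Python 3.8 or newer.

```python
import math
from collections import Counter


def min_max_scale(rows):
    """Scale every column to the range 0..1; also return the scaling function."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(row):
        return tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )

    return [scale(row) for row in rows], scale


def knn_predict(train_rows, train_labels, query, k=3):
    """Return the most common label among the k nearest training rows."""
    order = sorted(
        range(len(train_rows)),
        key=lambda i: math.dist(train_rows[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Made-up data: (age, yearly income) and whether the person owns a card.
rows = [(25, 30000), (30, 35000), (28, 32000), (50, 90000), (55, 95000), (52, 88000)]
labels = ["no", "no", "no", "yes", "yes", "yes"]

scaled_rows, scale = min_max_scale(rows)
print(knn_predict(scaled_rows, labels, scale((48, 85000))))  # prints yes
print(knn_predict(scaled_rows, labels, scale((27, 31000))))  # prints no
```

Without scaling, the income column (tens of thousands) would outweigh the age column (tens) in every distance calculation.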
8. Naive Bayes
This algorithm is based on “Bayes’ Theorem” in probability. Bayes’ Theorem itself does not require independent features, but the Naive Bayes algorithm makes the “naive” assumption that the features are independent of each other given the class. The closer a data set comes to satisfying this condition, the better the algorithm tends to work, although it often gives useful results even when the assumption holds only approximately. If we try to predict a flower type from its petal length and width, the Naive Bayes approach treats those two features as independent for each flower type.
Naive Bayes is also a classification algorithm. It is often used for multi-class problems.
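The sketch below is a minimal Naive Bayes classifier for categorical features. It multiplies the class prior by one probability per feature, which is exactly where the independence assumption is used. Add-one (Laplace) smoothing, the flower names and the data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count how often each feature value appears with each class."""
    class_counts = Counter(labels)
    feature_counts = Counter()  # key: (feature index, value, class label)
    values = defaultdict(set)   # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, value, label)] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def predict_nb(model, row):
    """Return the class with the highest prior * product of feature likelihoods."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # P(feature value | class) with add-one smoothing.
            score *= (feature_counts[(i, value, label)] + 1) / (count + len(values[i]))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up flowers described by (petal length, petal width).
rows = [
    ("short", "narrow"), ("short", "narrow"), ("short", "wide"),
    ("long", "wide"), ("long", "wide"), ("long", "narrow"),
]
labels = ["setosa", "setosa", "setosa", "virginica", "virginica", "virginica"]

model = train_nb(rows, labels)
print(predict_nb(model, ("short", "narrow")))  # prints setosa
print(predict_nb(model, ("long", "wide")))     # prints virginica
```

For numeric features such as real petal measurements, a Gaussian variant that models each feature with a mean and variance per class is a common alternative.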
9. Gradient Boosting Algorithms
A Gradient Boosting Algorithm combines multiple weak models into a more powerful and accurate one. The weak models are added one after another, and each new model tries to correct the errors left by the previous ones. Using multiple estimators instead of a single one gives a more stable and robust algorithm.
There are several Gradient Boosting Algorithms.
- XGBoost: uses linear and tree-based algorithms
- LightGBM: uses only tree-based algorithms
The USP of Gradient Boosting Algorithms is their high accuracy. In addition, implementations like LightGBM are also very fast.
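To illustrate the "many weak models" idea without any library, the sketch below boosts very simple regression stumps (a single threshold split) on 1D data with squared error. It is not how XGBoost or LightGBM are implemented; the learning rate, number of rounds and data are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split that best fits the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def gradient_boost(xs, ys, n_rounds=50, lr=0.1):
    """Add stumps one by one, each fitted to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data with a jump between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

model = gradient_boost(xs, ys)
baseline = lambda x: sum(ys) / len(ys)
print(mse(baseline, xs, ys), mse(model, xs, ys))
```

Each stump on its own is a poor model, but the sum of many small corrections fits the training data far better than the mean baseline.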
10. Dimensionality Reduction Algorithms
Some datasets contain so many variables that they become very hard to handle. Nowadays data is often collected at a very detailed level because plenty of resources are available to do so. In such cases, the data sets may contain thousands of variables, and many of them contribute little to the prediction.
In this situation, it is almost impossible to identify by hand the variables that have the most impact on our prediction. This is where Dimensionality Reduction Algorithms come into the picture. Tree-based models such as Decision Tree and Random Forest can be used to find the most important variables.
PCA (Principal Component Analysis) also helps by merging the information of correlated variables into fewer new variables, thus reducing the number of variables we need to work with.
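As a small illustration of merging 2 variables into 1, the sketch below runs PCA on 2D points and projects them onto the first principal component. It uses the closed-form angle of the main axis of a 2x2 covariance matrix, which only works for the 2-variable case; the function name and the data are made up.

```python
import math


def pca_2d_to_1d(points):
    """Project 2D points onto their first principal component."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the direction with the largest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, projected


# Made-up data where the second variable is always twice the first.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
direction, projected = pca_2d_to_1d(points)
print(direction)
print(projected)
```

Because the 2 variables here are perfectly correlated, the single projected value keeps all of the variation in the data. With real data some information is lost, and the trade-off is fewer variables to handle.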
So we have come to the end of the post. These are the basic and most important algorithms for anyone who is serious about a career in Machine Learning. Thanks for reading. Follow our website to learn the latest technologies and concepts. Xpertup with us.
You can also check out our post on: Unsupervised learning with Python