Unsupervised learning is used when the training data is neither classified nor labelled. The system is not told the correct output; instead, it explores the data and draws inferences from data-sets to describe hidden structures in unlabelled data.
The goal is to have the computer learn how to do something without being told how. There are two approaches to unsupervised learning.
There are no correct answers and no teacher.
The algorithms discover and present interesting structures in the data.
EXAMPLE OF UNSUPERVISED LEARNING
It is like learning without a teacher
The machine learns through observation & finds structures in data
There are two main types of unsupervised learning
Clustering: A clustering problem is where you want to discover the inherent groupings in the data
Association: An association rule learning problem is where you want to discover rules that describe large portions of your data
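As a minimal sketch of the clustering case, the block below groups unlabelled numbers with k-means. The choice of k-means, the function name `kmeans_1d` and the age data are assumptions made for illustration; the text above does not prescribe a particular algorithm.

```python
import random


def kmeans_1d(values, k, iterations=20, seed=0):
    """Group 1-D values into k clusters; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # start from k of the data points
    assignments = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


# Hypothetical unlabelled data: customer ages with no class attached.
ages = [21, 22, 23, 25, 61, 63, 64, 67]
centroids, assignments = kmeans_1d(ages, k=2)
print(sorted(centroids))
print(assignments)
```

No labels are supplied anywhere: the two groups (younger and older customers) emerge from the distances between the values alone.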
Machine Learning (ML) is the development of computer programs that can access data and use it to learn.
Traditional software development takes data and a program as input and produces some form of output. By comparison, ML takes data and the desired output and processes them to produce a program.
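The contrast can be sketched in a few lines. In the hedged example below, `is_hot_rule` is a hand-written program, while `learn_threshold` derives an equivalent rule from data and desired outputs. Both names, the temperature data and the threshold-search method are assumptions chosen only to illustrate the idea.

```python
def is_hot_rule(temperature_c):
    """Traditional software: a person writes the rule (the program) by hand."""
    return temperature_c >= 30


def learn_threshold(temperatures, labels):
    """ML-style: derive the rule from data plus desired outputs.

    Tries each midpoint between neighbouring temperatures as a threshold
    and keeps the one that reproduces the most labels.
    """
    points = sorted(set(temperatures))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def correct(threshold):
        return sum((t >= threshold) == y for t, y in zip(temperatures, labels))

    best = max(candidates, key=correct)
    return lambda temperature_c: temperature_c >= best


# Hypothetical data (inputs) and desired outputs (labels).
temperatures = [12, 18, 24, 29, 31, 35, 38]
labels = [False, False, False, False, True, True, True]

is_hot_learned = learn_threshold(temperatures, labels)  # the produced "program"
print(is_hot_learned(33), is_hot_learned(20))
```

Here the output of the learning step is itself a function, which is the sense in which ML "produces a program".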
There are three main categories of ML
Supervised learning applies what has been learned from labelled examples to new data in order to predict future events. Starting with a known training data-set, the algorithm produces an inferred function which is used for predicting output values. The algorithm can compare its output with the intended, correct output and find the errors. These errors can then be used to modify the model.
Supervised learning is the most common technique for training neural networks and decision trees. Both of these techniques are highly dependent on the information given by the predetermined classifications. In the case of neural networks, the classification is used to determine the error of the network and then adjust the network to minimise it, and in decision trees, the classifications are used to determine what attributes provide the most information that can be used to solve the classification problem.
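For the decision-tree case, one common way to measure which attribute "provides the most information" is information gain based on entropy. The text does not name a specific measure, so treat this choice, the helper names and the toy data as assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: should we play outside?
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["yes", "yes", "no", "no"]

for attribute in ("outlook", "windy"):
    print(attribute, information_gain(rows, labels, attribute))
```

In this toy data, splitting on `outlook` separates the classes completely while `windy` tells us nothing, so a tree builder using this measure would split on `outlook` first.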
In the classification problem, the goal of the learning algorithm is to minimise the error with respect to the given inputs. These inputs, often called the "training set", are the examples from which the agent tries to learn. But learning the training set well is not necessarily the best thing to do. A common problem is over-fitting the data and essentially memorising the training set rather than learning a more general classification technique.
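A small sketch can make the over-fitting problem concrete. Below, a "memoriser" model stores the training set verbatim, while a simpler model learns a single cut-off. The model names, the exam data and the fallback guess of `False` for unseen inputs are all assumptions for illustration.

```python
def train_memoriser(examples):
    """Over-fitted 'model': a lookup table of the training set."""
    table = dict(examples)
    # Unseen inputs fall back to a fixed guess; nothing general was learned.
    return lambda x: table.get(x, False)


def train_threshold(examples):
    """Simpler model: one cut-off halfway between the two classes.

    Assumes the classes are separable by a single threshold.
    """
    highest_negative = max(x for x, y in examples if not y)
    lowest_positive = min(x for x, y in examples if y)
    cutoff = (highest_negative + lowest_positive) / 2
    return lambda x: x >= cutoff


def accuracy(model, examples):
    """Fraction of examples the model classifies correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)


# Hypothetical data: (hours studied, passed the exam?)
training_set = [(1, False), (2, False), (3, False), (6, True), (7, True), (8, True)]
test_set = [(0, False), (4, False), (5, True), (9, True), (10, True)]

for name, train in (("memoriser", train_memoriser), ("threshold", train_threshold)):
    model = train(training_set)
    print(name, accuracy(model, training_set), accuracy(model, test_set))
```

Both models are perfect on the training set, but only the threshold model carries over to the unseen test examples, which is why performance should be judged on data that was held back from training.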
The correct answers are known. The algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.
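The predict, correct and stop cycle described above can be sketched with a single-input perceptron. The perceptron itself, the learning rate, the stopping threshold and the centred data are assumptions; any supervised learner with an error signal follows the same outline.

```python
def train_perceptron(examples, target_accuracy=1.0, max_epochs=100, rate=0.1):
    """Train a one-input perceptron; returns (weight, bias, epochs_used)."""
    weight, bias = 0.0, 0.0
    for epoch in range(1, max_epochs + 1):
        correct = 0
        for x, expected in examples:
            predicted = 1 if weight * x + bias > 0 else 0
            error = expected - predicted  # the "teacher" supplies expected
            if error == 0:
                correct += 1
            else:
                # Correction step: nudge the model towards the right answer.
                weight += rate * error * x
                bias += rate * error
        # Accuracy is measured during the epoch, while the weights may change.
        if correct / len(examples) >= target_accuracy:
            return weight, bias, epoch  # acceptable performance reached
    return weight, bias, max_epochs


# Hypothetical labelled data: a feature centred on zero and its class (0 or 1).
examples = [(-3, 0), (-2, 0), (-1, 0), (1, 1), (2, 1), (3, 1)]
weight, bias, epochs_used = train_perceptron(examples)
print(weight, bias, epochs_used)
```

Training ends as soon as a full pass over the training data reaches the target accuracy, or after `max_epochs` passes if it never does.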
EXAMPLE OF SUPERVISED LEARNING
SUMMARY OF SUPERVISED LEARNING
It is like learning with a teacher.
The training data-set is like a teacher.
The training data-set is used to train the machine.
There are two main types of supervised learning
Classification: Machine is trained to classify something into some class.
Regression: Machine is trained to predict some value like price, weight or height.
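The difference between the two types shows up in what the trained model returns: a class label or a number. The sketch below uses a nearest-neighbour classifier and a least-squares line as stand-ins. Both techniques and both data-sets are assumptions picked for brevity.

```python
def nearest_neighbour_classify(training, x):
    """Classification: return the class label of the closest training example."""
    _, label = min(training, key=lambda pair: abs(pair[0] - x))
    return label


def fit_line(training):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(training)
    mean_x = sum(x for x, _ in training) / n
    mean_y = sum(y for _, y in training) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in training) / sum(
        (x - mean_x) ** 2 for x, _ in training
    )
    return slope, mean_y - slope * mean_x


# Hypothetical labelled data.
fruit_by_weight_g = [(110, "apple"), (130, "apple"), (900, "melon"), (1100, "melon")]
price_by_area_m2 = [(50, 150_000), (70, 210_000), (90, 270_000)]

print(nearest_neighbour_classify(fruit_by_weight_g, 150))  # a class label
slope, intercept = fit_line(price_by_area_m2)
print(slope * 80 + intercept)  # a predicted value
```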
WHAT IS SEMI-SUPERVISED LEARNING
This is a method that leverages both unsupervised and supervised learning.
In essence, it is a hybrid approach.
The biggest difference between supervised and unsupervised machine learning is whether the training data is labelled.
Semi-supervised learning algorithms are trained on a combination of labelled and unlabelled data. This is useful for a few reasons. First, the process of labelling massive amounts of data for supervised learning is often prohibitively time-consuming and expensive. What's more, too much labelling can impose human biases on the model. Including plenty of unlabelled data during training can therefore improve the accuracy of the final model while reducing the time and cost spent building it.
Semi-supervised learning is well suited to use cases like web-page classification, speech recognition, or even genetic sequencing. In all of these cases, data scientists can access large volumes of unlabelled data, but assigning supervision information to all of it would be impractical.
Taking web-page classification as the example, assume you want to classify any given web-page into one of several categories (e.g. "Travel", "Entertainment", "Business news", etc.). It is prohibitively expensive to have a human workforce go through tens of thousands of web-pages and annotate them. Unlabelled web-pages, however, are available in almost unlimited supply.
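One simple semi-supervised technique is self-training: fit a model on the few labelled pages, let it label the unlabelled pages it is confident about, and retrain on the enlarged set. The text does not name a technique, so self-training, the word-overlap scoring, the `min_margin` confidence rule and the page texts below are all assumptions for illustration.

```python
from collections import Counter


def train(labelled):
    """Count how often each word appears under each category."""
    counts = {}
    for text, category in labelled:
        counts.setdefault(category, Counter()).update(text.lower().split())
    return counts


def predict(counts, text):
    """Return (best_category, margin) using simple word-overlap scores."""
    words = text.lower().split()
    scores = {c: sum(wc[w] for w in words) for c, wc in counts.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    margin = scores[ranked[0]] - (scores[ranked[1]] if len(ranked) > 1 else 0)
    return ranked[0], margin


def self_train(labelled, unlabelled, min_margin=1, rounds=5):
    """Repeatedly pseudo-label confident pages and retrain on them."""
    labelled, unlabelled = list(labelled), list(unlabelled)
    for _ in range(rounds):
        counts = train(labelled)
        guesses = [(text, predict(counts, text)) for text in unlabelled]
        confident = [(t, c) for t, (c, m) in guesses if m >= min_margin]
        if not confident:
            break  # nothing left that the model is sure about
        labelled.extend(confident)
        newly_labelled = {t for t, _ in confident}
        unlabelled = [t for t in unlabelled if t not in newly_labelled]
    return train(labelled)


# Hypothetical pages: two carry human labels, three do not.
labelled_pages = [
    ("cheap flights and hotel deals", "Travel"),
    ("stock markets and company earnings", "Business news"),
]
unlabelled_pages = [
    "hotel booking and beach resorts",
    "company shares and earnings report",
    "beach resorts for families",
]

supervised_only = train(labelled_pages)
semi_supervised = self_train(labelled_pages, unlabelled_pages)
new_page = "beach resorts near the sea"
print(predict(supervised_only, new_page))
print(predict(semi_supervised, new_page))
```

The model trained on the two labelled pages alone has never seen "beach" or "resorts", so it cannot separate the categories for the new page. After self-training, those words have been picked up from the unlabelled pages. A wrong pseudo-label would be reinforced in the same way, which is why a confidence threshold is used.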
We will introduce three approaches to machine learning