What is classification?
In this chapter, we will discuss supervised classification techniques. Classification is the process of sorting data into a given number of classes. Arranging data into a fixed number of categories lets us use it more effectively and efficiently. In machine learning, classification solves the problem of identifying the category to which a new data point belongs.
We build the classification model from a training dataset containing data points and their corresponding labels. For example, let's say that we want to check whether a given image contains a person's face. We would build a training dataset containing samples for each of the two classes: face and no-face. We then train the model on these training samples. The trained model is then used for inference, that is, to predict the class of new images.
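
The train-then-infer workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a nearest-centroid classifier (chosen here for brevity, not a method prescribed by this chapter), and the two-number feature vectors are hypothetical stand-ins for whatever features would be extracted from real images. The names `train` and `predict` are likewise made up for this example.

```python
from collections import defaultdict
from math import dist


def train(samples, labels):
    """Compute one centroid (mean feature vector) per class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the new data point."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical feature vectors; real face detection would use image features.
train_samples = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.9),
                 (0.1, 0.2), (0.2, 0.1), (0.3, 0.2)]
train_labels = ["face", "face", "face", "no-face", "no-face", "no-face"]

# Training: build the model from labeled samples.
model = train(train_samples, train_labels)

# Inference: classify data points the model has not seen before.
print(predict(model, (0.85, 0.75)))  # prints "face"
print(predict(model, (0.15, 0.25)))  # prints "no-face"
```

The model here is nothing more than a dictionary mapping each class to its centroid, which makes the split between the training step and the inference step easy to see.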
A good classification system makes it easy to find and retrieve data. Classification is used extensively in face recognition, spam identification, recommendation engines, and so on.
Data classification algorithms learn criteria that separate the given data into the given number of classes. We need to provide a sufficiently large number of samples so that the algorithm can generalize those criteria. If there are too few samples, the algorithm tends to overfit the training data. This means that the model won't perform well on unknown data because it has been tuned too closely to the patterns observed in the training data. This is a very common problem in machine learning, so it's good to keep it in mind when you build machine learning models.
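
One common way to notice overfitting is to compare accuracy on the training data with accuracy on data the model has never seen. The sketch below is a toy illustration, not a method from this chapter: it assumes a 1-nearest-neighbor classifier, which simply memorizes its training samples, and a hypothetical one-dimensional dataset with only five training samples, one of which is deliberately mislabeled.

```python
def predict(train_set, x):
    """1-nearest-neighbor: copy the label of the closest training sample."""
    _, nearest_label = min(train_set, key=lambda sample: abs(sample[0] - x))
    return nearest_label


def accuracy(train_set, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(train_set, x) == label for x, label in samples)
    return correct / len(samples)


# Hypothetical data. The underlying rule is: label 1 when x >= 0.5, else 0.
# The sample (0.45, 1) deliberately breaks the rule to mimic a noisy label.
train_set = [(0.1, 0), (0.3, 0), (0.45, 1), (0.6, 1), (0.9, 1)]
unseen = [(0.05, 0), (0.25, 0), (0.4, 0), (0.48, 0), (0.55, 1), (0.8, 1)]

train_acc = accuracy(train_set, train_set)
unseen_acc = accuracy(train_set, unseen)

print(f"training accuracy: {train_acc:.2f}")  # prints 1.00
print(f"unseen accuracy:   {unseen_acc:.2f}")  # prints 0.67
```

The classifier reproduces every training label perfectly, yet it misclassifies the unseen points near the mislabeled sample. With so few training samples, a single odd sample shapes the decision criteria, and the gap between the two accuracy figures is the symptom of overfitting.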