Label encoding
When we perform classification,
we usually deal with a lot of labels. These labels can be in the form of words,
numbers, or something else.
The machine learning functions
in sklearn expect them to be numbers. So
if they are already numbers, then we can use them directly to start training.
But this is not usually the case.
In the real world, labels are
in the form of words, because words are human readable. We label our training
data with words so that the mapping can be tracked.
To convert word labels into
numbers, we need to use a label encoder.
Label encoding refers to the
process of transforming the word labels into numerical form. This enables the
algorithms to operate on our data.
Create a new Python file (name: label_encoder.py )and import the following
packages:
import numpy as np
from sklearn import preprocessing
|
Define some sample labels:
# Sample input labels
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow',
'white']
|
Create the label encoder object and train
it:
# Create label encoder and fit the labels
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)
|
Print the mapping between words and
numbers:
# Print the mapping
print("\nLabel mapping:")
for i, item in enumerate(encoder.classes_):
print(item, '-->', i)
|
Let's encode a set of randomly ordered
labels to see how it performs:
# Encode a set of labels using the encoder
test_labels = ['green', 'red', 'black']
encoded_values = encoder.transform(test_labels)
print("\nLabels =", test_labels)
print("Encoded values =", list(encoded_values))
|
Let's decode a random set of numbers:
# Decode a set of values using the encoder
encoded_values = [3, 0, 4, 1]
decoded_list = encoder.inverse_transform(encoded_values)
print("\nEncoded values =", encoded_values)
print("Decoded labels =", list(decoded_list))
|
If you run the code, you will see the following
output:
Goto Index