Creating an E-Commerce Product Category Classifier using Deep Learning — Part 2

Feb 13, 2022

This part focuses on building the machine learning models; the first part (here) covered data understanding, data preprocessing, and analysis.

Problem Description :

We aim to create a product category API that uses machine learning and deep learning to predict the likely categories (classes) for any given product name and description. The problem is set in the e-commerce domain, and the dataset used to train our models contains products with their labeled categories.

Machine Learning Pipeline :

To solve multi-label problems, we mainly have two approaches:

Binary classification: This strategy divides the problem into several independent binary classification tasks, one per label. It resembles the one-vs-rest method: each classifier decides only whether its own label applies, so the labels are treated as independent of one another and any correlation between them is ignored.
Multi-class classification: Each distinct combination of labels is treated as a single class of one multi-class classifier, a transformation known as label powerset. For instance, with the targets A, B, and C taking 0 or 1 as outputs, the powerset transformation predicts the combination A B C -> [0 1 0] as one class, while the binary classification transformation predicts it as three separate outputs, A B C -> [0] [1] [0].
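
To make the difference concrete, here is a minimal sketch of both transformations on a toy label matrix. The four rows and the names binary_targets and powerset_targets are made up for illustration and are not taken from the project's dataset.

```python
# Toy multi-label targets: each row marks which of A, B, C apply to one product.
labels = ["A", "B", "C"]
Y = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 1],
]

# Binary classification transformation: one independent 0/1 target per label.
binary_targets = {
    label: [row[i] for row in Y] for i, label in enumerate(labels)
}

# Label powerset transformation: each distinct combination becomes one class id.
combo_to_class = {}
powerset_targets = []
for row in Y:
    combo = tuple(row)
    if combo not in combo_to_class:
        combo_to_class[combo] = len(combo_to_class)
    powerset_targets.append(combo_to_class[combo])

print(binary_targets)    # three separate binary problems
print(powerset_targets)  # one multi-class problem
```

The first form trains one model per label, while the second trains a single model whose number of classes grows with the number of distinct label combinations seen in the data.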

Data Splitting

The next step, after preprocessing the data and converting it into a form the machine learning models can understand, is to split it into train and test sets.

The information column is chosen as the independent variable, denoted by X, and the product labels are the dependent variable y.
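
A split along these lines could look like the sketch below. It assumes scikit-learn is used; the eight sample rows, the 25% test size, and the use of MultiLabelBinarizer to encode the labels are assumptions for illustration, not the project's actual data or settings.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical rows: product text ("information") and its category labels.
information = [
    "samsung galaxy smartphone 128gb",
    "duracell aa batteries pack of 8",
    "kung fu panda video game",
    "stainless steel frying pan",
    "apple iphone case",
    "wireless game controller",
    "kitchen knife set",
    "usb phone charger",
]
labels = [
    ["cell phone"], ["houseware"], ["video game"], ["houseware"],
    ["cell phone"], ["video game"], ["houseware"],
    ["cell phone", "houseware"],
]

# Turn the label lists into a 0/1 indicator matrix, one column per category.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)
X = information

# Hold out 25% of the rows for testing; the seed makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(len(X_train), len(X_test), list(mlb.classes_))
```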

It is highly recommended to use TF-IDF, a very common technique for transforming text into a meaningful numeric representation that can be used to fit machine learning algorithms for prediction. The working of TF-IDF can be studied here.

Fig 2. A mathematical explanation of TF-IDF
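
As a rough illustration of the idea, the sketch below computes the textbook variant tf(t, d) * log(N / df(t)) by hand on three made-up product strings. This is an assumption about which variant is meant; library implementations differ in detail (scikit-learn's TfidfVectorizer, for example, smooths the idf term and L2-normalizes each row by default), so the numbers will not match theirs exactly.

```python
import math
from collections import Counter

# Three hypothetical product texts, already lowercased.
docs = [
    "samsung galaxy phone",
    "duracell battery pack",
    "samsung phone battery",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term.
df = Counter(term for doc in tokenized for term in set(doc))

def tfidf(doc):
    """Return {term: tf * idf} for one tokenized document."""
    counts = Counter(doc)
    return {
        term: (count / len(doc)) * math.log(n_docs / df[term])
        for term, count in counts.items()
    }

weights = [tfidf(doc) for doc in tokenized]
print(weights[0])
```

In the first document, "galaxy" appears in no other text and so gets a higher weight than "samsung", which also occurs in the third one.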

Binary Classification Technique :

We will first use the binary classification technique explained above. We create a separate classifier for each product category; in machine learning this technique is called one-vs-all (or one-vs-rest). We have used a simple linear model as the classifier for each product category. Other models worth trying are Naive Bayes, SVC, and Random Forest.
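
The per-category training loop can be sketched roughly as follows. This is not the original code: it assumes scikit-learn, uses LogisticRegression as the linear model (an assumption, being the usual linear choice for 0/1 targets), and trains on a handful of invented product strings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

categories = ["cell phone", "houseware", "video game"]

# Hypothetical toy data: texts plus one 0/1 target list per category.
train_text = [
    "samsung galaxy smartphone", "apple iphone smartphone",
    "android phone charger", "steel frying pan", "kitchen knife set",
    "duracell battery pack", "kung fu panda video game",
    "racing video game disc", "game controller console",
]
y_train = {
    "cell phone": [1, 1, 1, 0, 0, 0, 0, 0, 0],
    "houseware":  [0, 0, 0, 1, 1, 1, 0, 0, 0],
    "video game": [0, 0, 0, 0, 0, 0, 1, 1, 1],
}
test_text = ["samsung smartphone case", "kitchen pan", "panda game"]
y_test = {
    "cell phone": [1, 0, 0],
    "houseware":  [0, 1, 0],
    "video game": [0, 0, 1],
}

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_text)
X_test = vectorizer.transform(test_text)

models, scores = {}, {}
for category in categories:
    # One independent binary classifier per product category.
    clf = LogisticRegression()
    clf.fit(X_train, y_train[category])
    pred = clf.predict(X_test)
    proba = clf.predict_proba(X_test)[:, 1]
    scores[category] = {
        "accuracy": accuracy_score(y_test[category], pred),
        "roc_auc": roc_auc_score(y_test[category], proba),
    }
    models[category] = clf
    print(category, scores[category])

# Overall accuracy taken here as the mean of the per-category accuracies.
overall_accuracy = sum(s["accuracy"] for s in scores.values()) / len(scores)
print("overall accuracy:", overall_accuracy)
```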

With this approach, we create a classifier for each product category and print its individual accuracy and ROC AUC, as well as the overall accuracy of the combined model.

Fig 3. Snippet showing the accuracy/ROC of each individual product classification model and, finally, the accuracy/ROC of the overall model.

The prediction API takes the product name and its description, performs the data preprocessing and conversion required by the models, and then runs a prediction with each trained per-category model. If an individual model predicts 1, its product category is considered a probable class for that product.
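
A prediction helper of this kind might be sketched as below. The function names preprocess and predict_categories, the cleaning rule, and the tiny training set are all hypothetical; LogisticRegression again stands in for the linear model, and scikit-learn is assumed.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def preprocess(name, description):
    """Join name and description, lowercase, and replace punctuation."""
    text = f"{name} {description}".lower()
    return re.sub(r"[^a-z0-9]+", " ", text).strip()

def predict_categories(name, description, vectorizer, models):
    """Return every category whose binary model predicts 1."""
    features = vectorizer.transform([preprocess(name, description)])
    return [
        category
        for category, clf in models.items()
        if clf.predict(features)[0] == 1
    ]

# Tiny hypothetical training set, only so the sketch runs end to end.
train_text = [
    "samsung galaxy smartphone", "apple iphone smartphone",
    "steel frying pan", "duracell battery pack",
    "kung fu panda video game", "racing video game disc",
]
train_labels = {
    "cell phone": [1, 1, 0, 0, 0, 0],
    "houseware":  [0, 0, 1, 1, 0, 0],
    "video game": [0, 0, 0, 0, 1, 1],
}
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_text)
models = {
    category: LogisticRegression(class_weight="balanced").fit(X_train, y)
    for category, y in train_labels.items()
}

result = predict_categories(
    "Samsung Galaxy", "Android smartphone, 128GB", vectorizer, models
)
print(result)
```

Because each model votes independently, the returned list can hold zero, one, or several categories.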

Let us test on a few products and try to predict their possible categories.

Fig 4. Predictions on a few samples. We can observe how it predicts Samsung as cell phone and Duracell battery as houseware.

Deep Learning-Based Models :

In this section, we will start creating deep learning-based models that follow the multi-class classification approach. The data first needs to be preprocessed, tokenized, and split, much as we did for the previous model. The deep learning-based model using a vanilla neural net is shown below.

Fig 5. Simple neural net-based multi-classification product category prediction model
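
For readers without the figure, a plain feed-forward model in this spirit might look like the following sketch. It assumes Keras via TensorFlow; the layer sizes, vocabulary size, sequence length, number of categories, and the softmax output are all assumptions rather than the architecture in the figure, and the random arrays only demonstrate the expected shapes.

```python
import numpy as np
from tensorflow import keras

vocab_size = 1000   # assumed vocabulary size after tokenization
max_len = 50        # assumed padded sequence length
num_classes = 10    # assumed number of product categories

model = keras.Sequential([
    keras.Input(shape=(max_len,), dtype="int32"),
    keras.layers.Embedding(vocab_size, 32),
    keras.layers.GlobalAveragePooling1D(),   # average the token vectors
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(num_classes, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[keras.metrics.AUC(name="auc")],
)

# Random stand-in data, used only to show input and output shapes.
x = np.random.randint(0, vocab_size, size=(8, max_len))
y = keras.utils.to_categorical(
    np.random.randint(0, num_classes, size=8), num_classes
)
history = model.fit(x, y, epochs=1, verbose=0)
probs = model.predict(x, verbose=0)
print(probs.shape)
```

If a product can belong to several categories at once, a sigmoid output with binary cross-entropy would be the usual alternative to softmax.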

This model is trained for 30 epochs, and the losses and AUC are plotted below.

Fig 6. Performance tracking for neural net model

The model provided some decent predictions on new data, which can be seen below.

Fig 7. Prediction by the neural net multi-classification model. We can observe how it has classified Kung Fu Panda as a Video Game, which is quite good.

We can still improve on this neural net model by using a more powerful technique called convolution, which is especially popular for images. A convolution-based model is designed below.

Fig 8. Conv neural net-based multi-classification product category prediction model
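
A one-dimensional convolution over the token embeddings is the usual way to apply this idea to text. The sketch below assumes Keras via TensorFlow; the filter count, kernel size, and all dimensions are assumptions and not the values from the figure.

```python
import numpy as np
from tensorflow import keras

vocab_size = 1000   # assumed vocabulary size
max_len = 50        # assumed padded sequence length
num_classes = 10    # assumed number of product categories

conv_model = keras.Sequential([
    keras.Input(shape=(max_len,), dtype="int32"),
    keras.layers.Embedding(vocab_size, 64),
    # Each filter slides over windows of 5 consecutive tokens.
    keras.layers.Conv1D(128, kernel_size=5, activation="relu"),
    # Keep the strongest response of every filter across the sequence.
    keras.layers.GlobalMaxPooling1D(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(num_classes, activation="softmax"),
])
conv_model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[keras.metrics.AUC(name="auc")],
)

# Random token ids, used only to check the output shape.
sample = np.random.randint(0, vocab_size, size=(4, max_len))
conv_probs = conv_model.predict(sample, verbose=0)
print(conv_probs.shape)
```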

This model is also trained for 30 epochs, and its performance across epochs is quite stable compared to the previous model.

Fig 9. Performance tracking for conv neural net model

This model provides more concrete predictions than the previous one but can still be improved by a more powerful LSTM and GloVe-based model.

Fig 10. LSTM + GloVe neural net-based multi-classification product category prediction model
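
The key step in such a model is loading pretrained GloVe vectors into a frozen embedding layer ahead of the LSTM. The sketch below assumes Keras via TensorFlow and shows only that wiring: the embedding matrix is random stand-in data rather than real GloVe vectors (in practice each row would be filled from a GloVe file for the matching vocabulary word), and all sizes are assumptions.

```python
import numpy as np
from tensorflow import keras

vocab_size = 1000    # assumed vocabulary size
max_len = 50         # assumed padded sequence length
num_classes = 10     # assumed number of product categories
embedding_dim = 50   # GloVe is published in several sizes; 50 is assumed

# Stand-in for the GloVe lookup table: row i holds the vector of token id i.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(
    size=(vocab_size, embedding_dim)
).astype("float32")

embedding = keras.layers.Embedding(
    vocab_size,
    embedding_dim,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=False,  # keep the pretrained vectors fixed during training
)

lstm_model = keras.Sequential([
    keras.Input(shape=(max_len,), dtype="int32"),
    embedding,
    keras.layers.LSTM(64),
    keras.layers.Dense(num_classes, activation="softmax"),
])
lstm_model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=[keras.metrics.AUC(name="auc")],
)

# Random token ids, used only to check the output shape.
sample = np.random.randint(0, vocab_size, size=(4, max_len))
lstm_probs = lstm_model.predict(sample, verbose=0)
print(lstm_probs.shape)
```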

Fig 11. Performance tracking for LSTM + GloVe neural net model

Finally, we can expose this API as a REST endpoint, which can look something like this:

GET /product/categoryPrediction
{
  "name":"product name",
  "description":"product description"
}
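
On the server side, the request handling could be sketched without any web framework as below. The function handle_category_prediction, the stubbed predict_categories, and the {"categories": [...]} response shape are all hypothetical; in a real service the stub would call the trained model. Note also that some HTTP clients and proxies do not forward a body on GET requests, so a POST variant of the same route may be more practical.

```python
import json

def predict_categories(name, description):
    """Hypothetical stand-in for the trained model."""
    text = f"{name} {description}".lower()
    return ["cell phone"] if "phone" in text else []

def handle_category_prediction(raw_body):
    """Validate the JSON body and return (status_code, response_json)."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body must be valid JSON"})

    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "body must be a JSON object"})

    name = payload.get("name")
    description = payload.get("description")
    if not isinstance(name, str) or not isinstance(description, str):
        return 400, json.dumps(
            {"error": "'name' and 'description' must be strings"}
        )

    categories = predict_categories(name, description)
    return 200, json.dumps({"categories": categories})

status, body = handle_category_prediction(
    '{"name": "Samsung Galaxy", "description": "Android phone"}'
)
print(status, body)
```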

I have also attached a few predictions of the final LSTM + GloVe-based model, which looks the most stable and least overfitted compared to the earlier models.