How the Multinomial Logistic Regression Model Works
ModelingPredictive AnalyticsStatisticsAlgorithmsMachine LearningPythonposted by Saimadhu Polamuri March 31, 2017 Saimadhu Polamuri
In the pool of supervised classification algorithms, the logistic regression model is the first most algorithm to play with. This classification algorithm again categorized into different categories. These categories purely based on the number of target classes.
If the logistic regression model used for addressing the binary classification kind of problems it’s known as the binary logistic regression classifier. Whereas the logistic regression model used for multiclassification kind of problems, it’s called the multinomial logistic regression classifier.
As we discussed each and every block of binary logistic regression classifier in our previous article. Now we use the binary logistic regression knowledge to understand in details about, how the multinomial logistic regression classifier works.
I recommend first to check out the how the logistic regression classifier works article and the Softmax vs Sigmoid functions article before you read this article.
Learn each and every stage of multinomial logistic regression classifier.CLICK TO TWEET
Before we start the drive, let’s look at the table of contents of this article.
Table of Contents:
What is Logistic Regression?
The logistic regression model is a supervised classification model. Which use the techniques of the linear regression model in the initial stages to calculate the logits (Score). So technically we can call the logistic regression model as the linear model. In the later stages uses the estimated logits to train a classification model. The trained classification model perform the multiclassification task.
If you were not aware of the logits and the basic linear regression model techniques don’t be frightened. In the coming sections. We are going to learn each block of multinomial logistic regression from inputs to the target output representation.
As we discussed earlier the logistic regression model are categorized based on the number of target classes and uses the proper functions like sigmoid or softmax functions to predict the target class.
In short:
 Sigmoid function: used in the logistic regression model for binary classification.
 Softmax function: used in the logistic regression model for multiclassification.
To learn more about sigmoid and softmax functions checkout difference between softmax and sigmoid functions article.
What is Multinomial Logistic Regression?
Multinomial logistic regression is also a classification algorithm same like the logistic regression for binary classification. Whereas in logistic regression for binary classification the classification task is to predict the target class which is of binary type. Like Yes/NO, 0/1, Male/Female. When it comes to multinomial logistic regression, the idea is to use the logistic regression techniques to predicted the target class from
When it comes to multinomial logistic regression. The idea is to use the logistic regression techniques to predicted the target class for more than 2 target classes.
The underline technique will be same like the logistic regression for binary classification until calculating the probabilities for each target. Once the probabilities were calculated. We need to transfer them into one hot encoding and uses the cross entropy methods in the training process for getting the proper weights.
We are going to learn the whole process multinomial logistic regression from giving inputs to the final one hot encoding in the upcoming sections of this article. Before that lets quickly check few examples to understand what kind of problems we can solve using the multinomial logistic regression.
Before that lets quickly check few examples to understand what kind of problems we can solve using the multinomial logistic regression.
Multinomial Logistic Regression Example
Using the multinomial logistic regression. We can address different types of classification problems. Where the trained model is used to predict the target class from more than 2 target classes. Below are few examples to understand what kind of problems we can solve using the multinomial logistic regression.
 Predicting the Iris flower species type
 Targets: different species
 Evaluating the acceptability of car using its given features
 Targets: Good,vGood, Acc, unAcc
 Predicting the animal category using the given animal features
 Targets: Dog, Cat, Tiger, Lion
Now let’s move on to the key part this article “understand how the multinomial logistic regression model works’
How Multinomial Logistic Regression model works
The above image illustrates the workflow of multinomial logistic regression classifier. To simply the understanding process let’s split the multinomial logistic classifier techniques into different stages from inputs to the outputs. Then we can discuss each stage of the classifier in detail.
Multinomial Logistic Regression Workflow/ Stages:
 Inputs
 Linear model
 Logits
 Softmax Function
 Cross Entropy
 OneHotEncoding
Inputs
The inputs to the multinomial logistic regression are the features we have in the dataset. Suppose if we are going to predict the Iris flower species type, the features will be the flower sepal length, width and petal length and width parameters will be our features. These features will treat as the inputs for the multinomial logistic regression.
The keynote to remember here is the features values always numerical. If the features are not numerical, we need to convert them into numerical values using the proper categorical data analysis techniques.
Just a simple example: If the feature is color and having different attributes of the color features are RED, BLUE, YELLOW, ORANGE. Then we can assign an integer value to each attribute of the features like for RED we can assign 1. For BLUE we can assign the value 2 likewise of the other attributes for the color feature. Later we can use the numerically converted values as the inputs for the classifier.
Linear Model
The linear model equation is the same as the linear equation in the linear regression model. You can see this linear equation in the image. Where the X is the set of inputs, Suppose from the image we can say X is a matrix. Which contains all the feature( numerical values) X = [x1,x2,x3]. Where W is another matrix includes the same input number of weights W = [w1,w2,w3].
In this example, the linear model output will be the w1*x1, w2*x2, w3*x3
The weights w1, w2, w3, w4 will update in the training phase. We will learn about this in the parameters optimization section of this article.
Logits
The Logits also called as scores. These are just the outputs of the linear model. The Logits will change with the changes in the calculated weights.
Softmax Function
The Softmax function is a probabilistic function which calculates the probabilities for the given score. Using the softmax function return the high probability value for the high scores and fewer probabilities for the remaining scores. This we can observe from the image. For the logits 0.5, 1.5, 0.1 the calculated probabilities using the softmax function are 0.2, 0.7, 0.1
For the Logit 1.5, we are getting the high probability value 0.7 and very less probability value for the remaining Logits 0.5 and 0.1
Keynote: The Logits and the probabilities in the image were just for the purpose of understanding the multinomial logistic regression model. The values are not the real values computed using the softmax function.
Softmax Function Properties
Below are the few properties of softmax function.
 The calculated probabilities will be in the range of 0 to 1.
 The sum of all the probabilities is equals to 1.
Implementing Softmax Function In Python
Now let’s implement the softmax function in Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

# Required Python Package
import numpy as np
def softmax(inputs):
“”
Calculate the softmax for the give inputs (array)
:param inputs:
:return:
“”
return np.exp(inputs) / float(sum(np.exp(inputs)))
softmax_inputs = [2, 3, 5, 6]
print “Softmax Function Output :: {}”.format(softmax(softmax_inputs))

Script Output
1

Softmax Function Output :: [ 0.01275478 0.03467109 0.25618664 0.69638749]

If we observe the function output for the input value 6 we are getting the high probabilities. This is what we can expect from the softmax function. Later in classification task, we can use the high probability value for predicting the target class for the given input features.
Cross Entropy
The cross entropy is the last stage of multinomial logistic regression. Uses the crossentropy function to find the similarity distance between the probabilities calculated from the softmax function and the target onehotencoding matrix.
Before we learn more about Cross Entropy, let’s understand what it is mean by OneHotEncoding matrix.
OneHotEncoding:
OneHot Encoding is a method to represent the target values or categorical attributes into a binary representation. From this article main image, where the input is the dog image, the target having 3 possible outcomes like bird, dog, cat. Where you can find the onehotencoding matrix like [0, 1, 0].
The onehotencoding matrix is so simple to create. For every input features (x1, x2, x3) the onehotencoding matrix is with the values of 0 and the 1 for the target class. The total number of values in the onehotencoding matrix and the unique target classes are the same.
Suppose if we have 3 input features like x1, x2, and x3 and one target variable (With 3 target classes). Then the onehotencoding matrix will have 3 values. Out of 3 values, one value will be 1 and all other will be 0s.
You will know where to place the 1 and where to place the 0 value from the training dataset. Let’s take one observation from the training dataset which contains values for x1, x2, x3 and what will be the target class for that observation. The onehotencoding matrix will be having 1 for the target class for that observation and 0s for other.
Cross Entropy
The Crossentropy is a distance calculation function which takes the calculated probabilities from softmax function and the created onehotencoding matrix to calculate the distance. For the right target class, the distance value will be less, and the distance values will be larger for the wrong target class.
With this, we discussed each stage of the multinomial logistic regression. All the stages/ workflow will happen for each observation in the training set.
Observation: Single observation is just the single row values from the traging set. Which contains the features and the correspoinding target class.
For each observation in the training set will pass through the all the stages of multinomial logistic regression and the proper weights W (w1, w2, w3) values will be computed. How the weights calculated and update the weights is know as the Parameters Optimization.
Parameters Optimization
The expected output after training the multinomial logistic regression classifier is the calculated weights. Later the calculated weights will be used for the prediction task.
This Parameters optimization is an iteration process where the calculated weights for each observation used to calculate the cost function which is also known as the Loss function.
The Iteration process ends when the loss function value is less or significantly negligible. Now let’s learn about this loss function to sign off from this lengthy article ?
Loss function
The input parameters for the loss function is the calculated weights and all the training observations. The function calculates the distance between the predicted class using the calculated weights for all the features in the training observation and the actual target class.
If the loss function value is fewer means with the estimated weights, we are confident to predict the target classes for the new observations (From test set). In the case of high loss function value, the process of calculating the weights will start again with derivated weights of the previously calculated weights.
The process will continue until the loss function value is less.
This is the whole process of multinomial logistic regression. If you are thinking, it will be hard to implement the loss function and coding the entire workflow. Don’t frighten. We were so lucky to have the machine learning libraries like scikitlearn. Which performs all this workflow for us and returns the calculated weights.
In the next article, we are going to implement the logistic regression model using the scikitlearn library to perform the multiclassification task.
Conclusion
This article gives the clear explanation on the each stage of multinomial logistic regression. Below are the discussed workflow or stages.
 Inputs
 Linear model
 Logits
 Softmax Function
 Cross Entropy
 OneHotEncoding
I hope you like this post. If you have any questions, then feel free to navigate to the original here and comment. If you want me to write on one particular topic, then do tell it to me in that comment area.
Originally posted at dataaspirant.com