Humor Detection in Sentences using Binary Classification
Introduction
In this blog post, we will explore the implementation of a binary classification model for humor detection using a neural network. In today's world, where humor plays a significant role in entertainment, communication, and even marketing, the ability to detect and understand humor in text has become increasingly important. Humor detection in sentences is an interesting problem that can be solved using machine learning techniques, which is why we chose this topic. By accurately detecting humor in sentences, we can automate processes that require humor recognition, such as content moderation, sentiment analysis, and personalization. This automation can save time and resources while providing more engaging and entertaining experiences for users. We used a dataset of 200,000 samples obtained from Kaggle. The model will be trained on a dataset of sentences labeled with humor tags, and its performance will be evaluated on a test set.
The objective of this project is to develop a binary classification model that can effectively detect humor in sentences. By training this model on a large dataset of labeled humorous and non-humorous sentences, we aim to create a robust and reliable model for identifying humor in text. The resulting model can be integrated into various applications to enhance their capabilities and provide users with a more enjoyable and engaging experience. In the following sections we will discuss the problem to be addressed and its solution, data preparation, data pre-processing, model design, hyperparameter selection, the training process, optimization of the hyperparameters, the final optimized model, test results, and a discussion of the findings.
Problem to be addressed and Solution
The
problem we aim to address is the lack of an automated and accurate method for
detecting humor in sentences. While humans can effortlessly recognize humor
through contextual cues, wordplay, and linguistic patterns, developing a
computational model that can mimic this ability poses several challenges.
Existing approaches often fall short in accurately identifying humor, leading
to inaccurate classifications and missed opportunities for utilizing humor in
various applications. Our goal is to overcome these limitations and build a
robust classification model that can reliably detect humor in a wide range of
sentence structures and contexts.
To address the challenge of
detecting humor in sentences, we propose developing a binary classification
model. By training the model on a diverse
dataset of labeled humorous and non-humorous sentences, we can enable it to
learn the patterns and features indicative of humor. Leveraging techniques such
as feature extraction, text preprocessing, and model optimization, we will
fine-tune the model's ability to accurately classify sentences into humorous or
non-humorous categories. The resulting model can be utilized in various
applications, including social media content moderation, chatbots, sentiment
analysis, and recommendation systems, to enhance user experiences and
engagement.
Data Preparation
The first step in our implementation is to load the dataset.
We obtained the dataset from Kaggle. We use the pandas library to read the data from a
CSV file. The dataset contains two columns: 'text', which contains the
sentences, and 'humor', which contains the humor tags. We split the dataset
into sentences and tags and store them in separate variables. We stored the
dataset file in our Drive as 'Humor1.csv'.
Here are the libraries we used:
| Library | Used for |
| --- | --- |
| 'pandas' | Data manipulation and analysis |
| 'numpy' | Numerical computations |
| 'Tokenizer' from 'tensorflow.keras.preprocessing.text' and 'pad_sequences' from 'tensorflow.keras.preprocessing.sequence' | Tokenizing and padding the input text data |
| 'train_test_split' from 'sklearn.model_selection' | Splitting the dataset into training and testing sets |
| 'Sequential' from 'tensorflow.keras.models'; 'Embedding', 'Dense', and 'GlobalMaxPooling1D' from 'tensorflow.keras.layers' | Constructing the neural network model |
| 'drive' from 'google.colab' | Mounting the Google Drive |
The "drive.mount('/gdrive')" command mounts the Google Drive so that the
dataset stored in the Drive can be accessed. When loading the dataset, the
line "df = pd.read_csv('/gdrive/My Drive/Colab Notebooks/Humor1.csv')" reads
the CSV file containing the dataset into a pandas DataFrame named "df". The
path to the file is specified as "/gdrive/My Drive/Colab Notebooks/Humor1.csv",
assuming that the dataset is stored at that location on the mounted
Google Drive.
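The loading steps above can be sketched as follows. Since the actual Kaggle CSV is not reproduced here, this sketch builds a tiny stand-in DataFrame with the same two columns; the Colab-specific mounting lines are shown as comments.

```python
import pandas as pd

# In Colab, the Drive is mounted first so the CSV can be read:
#   from google.colab import drive
#   drive.mount('/gdrive')
#   df = pd.read_csv('/gdrive/My Drive/Colab Notebooks/Humor1.csv')

# A tiny stand-in with the same two columns, for illustration only:
df = pd.DataFrame({
    'text': ["why did the chicken cross the road? to get to the other side",
             "the meeting is scheduled for 3 pm on friday"],
    'humor': [True, False],
})

sentences = df['text'].values  # the sentences
tags = df['humor'].values      # the humor labels
print(df.shape)                # (number of rows, 2)
```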
Data Pre-processing
To prepare the text data for training, we perform text
cleaning and tokenization. We use the Tokenizer class from the Keras library to
convert the sentences into sequences of integers. The tokenizer is fitted on
the sentences, and then the sentences are converted into sequences using the
tokenizer. We also apply padding to make all sequences of equal length, using
the pad_sequences function from Keras. This ensures that all input sequences
have the same shape, which is required by the neural network model.
We used the following code for data pre-processing [7].
The code begins by importing the necessary libraries; in this case, the "LabelEncoder" class from the "sklearn.preprocessing" module, which is used to encode the target variable into numerical values.
This code creates an instance of the “LabelEncoder” called “lab”. The “LabelEncoder” is used to convert categorical labels into numerical values.
Here, the “fit_transform()” method is applied to the “humor”
column in the DataFrame “df”. It encodes the labels in the “humor” column into
numerical values and assigns them back to the “humor” column in “df”. The code
extracts the sentences from the “text” column of the DataFrame “df” and assigns
them to the variable “sentences”. It also extracts the encoded “humor” labels
from the DataFrame “df” and assigns them to the variable “tags”.
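A minimal sketch of the label-encoding step described above, using a couple of stand-in rows in place of the real dataset:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in rows mimicking the Humor1 columns:
df = pd.DataFrame({'text': ['a funny one', 'a plain one'],
                   'humor': [True, False]})

lab = LabelEncoder()
df['humor'] = lab.fit_transform(df['humor'])  # True/False -> 1/0

sentences = df['text'].values  # the raw sentences
tags = df['humor'].values      # the encoded labels
print(tags)                    # [1 0]
```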
This code initializes a “Tokenizer” object called “tokenizer”.
The “Tokenizer” class is used to tokenize text data, converting sentences into
sequences of integers. The “fit_on_texts()” method is applied to the
“sentences” variable, which fits the tokenizer on the text data. The
“texts_to_sequences()” method is then used to convert the tokenized sentences
into sequences of integers. These tokenized sequences are assigned to the
variable “tokenized_sentences”.
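The tokenization step can be sketched like this, with two stand-in sentences in place of the real data:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

sentences = ["the cat sat on the mat", "the dog sat"]  # stand-in sentences

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)          # builds the word index
tokenized_sentences = tokenizer.texts_to_sequences(sentences)
print(tokenized_sentences)                 # each word replaced by its integer index
```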
Here, the code specifies the maximum length of the padded
sequences to be 100, which can be adjusted based on the specific data. The
“pad_sequences()” function is applied to the “tokenized_sentences” variable,
padding or truncating the sequences to the specified maximum length. The padded
sequences are stored in the variable “padded_sentences”.
This code converts the “tags” array into a NumPy array using “np.array()”, which prepares the target labels for the classification model and assigns them to the variable “labels”.
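A sketch of the padding step and the label conversion, assuming two short stand-in sequences; in the real pipeline "tokenized_sentences" comes from the tokenizer above:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenized_sentences = [[1, 3, 2, 4], [1, 6, 2]]  # stand-in sequences
tags = [1, 0]                                    # stand-in encoded labels

max_length = 100  # same cap as used in the post
padded_sentences = pad_sequences(tokenized_sentences, maxlen=max_length)
labels = np.array(tags)          # target labels as a NumPy array

print(padded_sentences.shape)    # (2, 100): all rows now have equal length
```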
Model Design
The
neural network architecture is designed to understand the patterns and
connections between input features and the target variable to effectively
predict humor in sentences. The selection of the number of layers, neurons, and
activation functions depends on the problem's complexity and
the dataset size.
Here is the code we used to build the model.
For our humor detection model, we use a Sequential model from the Keras library. The model consists of an embedding layer, a global max pooling layer, and two dense layers: a hidden layer with a ReLU activation function and an output layer with a sigmoid activation function, with 32 and 1 nodes respectively. The embedding layer learns dense vector representations of the words in the input sequences, and the global max pooling layer extracts the most important features from the embeddings. A dense layer is fully connected and performs a non-linear transformation, and the final dense layer outputs the probability that the input sentence is humorous. We use the ReLU activation function to mitigate the vanishing gradient problem, and in the last layer we use the sigmoid activation function because it is useful for estimating probabilities, as in binary classification: the sigmoid squashes the output between 0 and 1.
The code compiles the model with binary cross-entropy loss and the SGD optimizer. We used the binary_crossentropy loss function because it is specifically designed for binary classification tasks, where the goal is to predict one of two classes (in our case, humor or non-humor). We used 200,000 samples for training and testing; for a large-scale dataset like this, SGD performs well and is computationally efficient. The model is trained on the training data for 10 epochs, where an epoch is one complete pass of the entire dataset through the network, with a batch size of 1500, meaning the network processes 1500 samples at a time before updating the weights. The validation data is used to evaluate the model's performance during training.
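The architecture described above can be sketched as follows. The vocabulary size and embedding dimension here are assumptions for illustration; in the real pipeline the vocabulary size comes from the fitted tokenizer.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalMaxPooling1D

vocab_size = 10000   # assumed; in practice len(tokenizer.word_index) + 1
embedding_dim = 16   # assumed embedding size

model = Sequential([
    Embedding(vocab_size, embedding_dim),  # learns word vectors
    GlobalMaxPooling1D(),                  # keeps the strongest feature per dimension
    Dense(32, activation='relu'),          # hidden layer, 32 nodes
    Dense(1, activation='sigmoid'),        # outputs the probability of "humor"
])

model.compile(loss='binary_crossentropy', optimizer='sgd',
              metrics=['accuracy'])

# Training as described above (X_train etc. come from the data split):
# model.fit(X_train, y_train, batch_size=1500, epochs=10,
#           validation_data=(x_val, y_val))
```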
The following figure shows an overview of the structure of the model we built;
Hyperparameter Selection
Here are the hyperparameters used in our code;
- max_length: The maximum length of the padded sentences, used for padding and truncating sentences to a fixed length. We set the maximum length to 100.
- embedding_dim: The dimensionality of the word embedding vectors.
- batch_size: The number of samples per gradient update during training. Larger batch sizes can lead to faster training. We used a batch size of 1500.
- epochs: The number of times the model will iterate over the entire training dataset. One epoch is a complete pass through the entire dataset. Here we used 10 epochs.
- optimizer: It determines the optimization algorithm used to update the model's weights during training. We used the SGD (Stochastic Gradient Descent) optimizer.
- loss: We used binary_crossentropy loss function to measure the difference between the predicted and actual labels.
These hyperparameters were tuned for our problem and dataset
to potentially improve our model's performance.
Overview of the Implementation Platform
The implementation is done using Python programming language.
We use various libraries such as pandas, NumPy, TensorFlow, and scikit-learn
for data handling, model creation, and evaluation. The code is executed in an
environment that supports these libraries, such as Jupyter Notebook or Google
Colab. Overall, the implementation platform leverages the power of tensorflow.keras
and other libraries to preprocess the data, build a binary classification model
for humor detection, train the model, and evaluate its performance.
Training
To train the model, we split the pre-processed data into
training and testing sets using the train_test_split function from
scikit-learn. We set aside 25% of the data for testing. We further split the
training set into training and validation sets to monitor the model's
performance during training. The model is compiled with the binary
cross-entropy loss function and the SGD optimizer. Then we fit the model to the
training data, specifying the batch size, number of epochs, and validation
data. The training process updates the model's parameters iteratively to
minimize the loss function and improve accuracy.
The
code splits the data into training and testing sets using train_test_split from
scikit-learn. 75% of the data is used for training (X_train, y_train) and 25%
for testing (X_test, y_test). Additionally, a validation set (x_val, y_val) is
created by further splitting the training set.
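The splitting described above can be sketched as follows. The arrays here are small stand-ins for the real padded data, and the validation split fraction and random seeds are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the shapes used in the post (real data: 200,000 rows).
padded_sentences = np.zeros((1000, 100), dtype=int)
labels = np.random.randint(0, 2, size=1000)

# 75% train / 25% test, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    padded_sentences, labels, test_size=0.25, random_state=42)

# Carve a validation set out of the training set (fraction assumed here).
X_train, x_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42)

print(len(X_train), len(x_val), len(X_test))  # 600 150 250
```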
Optimizing the Hyperparameters
After training the initial model, we optimized the
hyperparameters to improve its performance. It’s done through a process called
hyperparameter tuning. Hyperparameters such as the learning rate, batch size,
and number of epochs were adjusted to find the optimal values that maximize the
model's accuracy. The following table shows how we optimized the
hyperparameters:
Final Optimized Model
Hyperparameter optimization was carried out in several rounds. At the
beginning the model overfit and the training curves were irregular. After
changing the hyperparameters, most of the curves became regular, but the model
still overfit. One attempt did succeed in avoiding the persistent overfitting,
but its accuracy was low, so we did not choose it as the optimized model. Based
on the evaluation of all the attempts we made, we selected the improved model,
which is better than the other attempts: its overfitting is small, its loss
curve decreases continuously, and its accuracy curve increases continuously
compared with the other attempts. This figure shows our final optimized model;
After training the final model, we evaluated its performance
on the test set. Then we calculated the loss and accuracy of the model on the
test data using the evaluate function.
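The evaluation step can be sketched as below. The untrained stand-in model and random test data are placeholders, so the printed numbers will not match the real results; the point is only the evaluate call itself:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalMaxPooling1D

# Tiny stand-in model; the real model and test set come from the steps above.
model = Sequential([
    Embedding(50, 8),
    GlobalMaxPooling1D(),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='sgd',
              metrics=['accuracy'])

X_test = np.random.randint(0, 50, size=(20, 100))
y_test = np.random.randint(0, 2, size=20)

# evaluate() returns the loss and any compiled metrics on the test data.
score, acc = model.evaluate(X_test, y_test, verbose=0)
print('Test score:', score, 'Test accuracy:', acc)
```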
Here is the output of the train and test loss and accuracy;
Train loss: 0.6783716082572937
Train accuracy: 0.7980800271034241
1563/1563 [==============================] - 3s 2ms/step - loss: 0.6786 - accuracy: 0.7935
Test score: 0.6785876750946045
Test accuracy: 0.7934600114822388
The statement "1563/1563" means that the evaluation on the test set consisted of 1563 steps or iterations: each step processes one batch of data, making predictions and calculating the loss (with 50,000 test samples and Keras's default evaluation batch size of 32, the data is consumed in 1563 batches). Our model achieved a test accuracy of almost 0.79. This accuracy is the proportion of correctly predicted labels out of all the samples in the test set; a value of 0.793 means that our model correctly classified approximately 79.3% of the test samples. This test accuracy is slightly lower than the train accuracy, so our model shows a small amount of overfitting.
Discussion
In this blog post, we have implemented a binary
classification model for humor detection in sentences using a neural network.
We discussed the problem to be addressed and solution, data preparation, data pre-processing, model design,
hyperparameter selection, training process, optimization of hyperparameters,
final optimized model, and test results.
We selected the final optimized model based on the evaluation of
different attempts. Overall, the output indicates that our developed model has
achieved reasonably good performance on both the training and test data.
However, there is still some room for improvement: the loss values could be
lower, and the test accuracy could be closer to the train accuracy. It could be
worth experimenting with different model architectures, hyperparameters, or
regularization techniques to further enhance the model's performance.
References
1. Pandas Documentation. https://pandas.pydata.org/docs/
2. NumPy Documentation. https://numpy.org/doc/
3. TensorFlow Documentation. https://www.tensorflow.org/api_docs
4. scikit-learn Documentation. https://scikit-learn.org/stable/documentation.html
5. Brownlee, J. (2019). Deep Learning for Natural Language Processing. Machine Learning Mastery.
6. Chollet, F. (2017). Deep Learning with Python. Manning Publications.
7. Label Encoding in Python. GeeksforGeeks.