Ponder: A Quotes Bot for Crisis

Kumar Brahmbhatt
17 min read · Dec 11, 2020

An exploration of training machines to help motivate users with quotes.

Abstract

Motivation: Famous quotes come from people who have suffered through pain and from the realization of something that helped them through a crisis, or issues, in life. In scenarios where people have nowhere to go with their feelings and emotions, can they talk to a machine and stay motivated on their journey? Our hypothesis is that machines can motivate people just as people do.

Modeling: To test our hypothesis, I designed six different models that use a combination of n-grams (uni-, bi-, and tri-) with different similarity calculation techniques, including Jaccard Similarity and Cosine Similarity. In addition to these models, I also experiment with two extension models: spaCy embeddings using Wu-Palmer Similarity, and the BERT language model using Cosine Similarity. Because different models use different similarity methods, I only compare models that score sentences with the same similarity method: n-gram models that use Jaccard Similarity are compared only with each other, and likewise for models that use Cosine Similarity. I then evaluate the performance of each model by calculating its respective similarity score between a set of sample user inputs and the quote predicted by each algorithm. In our analysis, I discover that unigram models perform best when quantitatively comparing the scores (Jaccard Similarity or Cosine Similarity). However, higher n-gram model predictions are closer to a human response.

Data: I use a Kaggle dataset, which contains approximately 500,000 quotes from over 100,000 authors, spanning 360,000 categories ranging from love and happiness to war and darkness. For the purposes of our experiment, I required a large dataset of quotes so that the quote returned by the algorithm is diverse and matches various user inputs, and for that reason I selected this dataset.

Application: To demo this work, I built an application called Ponder using Streamlit.

I. Introduction

Can we share our feelings with a machine and expect a relevant, helpful response? In this paper, I explore this idea in greater detail by building several models that help a computer understand human feelings and emotions, and then respond with a famous quote based on the user input: a feeling. An illustration of this concept is the interaction shown below:

Human: I’m feeling really sad today. My friends and family do not understand me.

Machine: I think if I’ve learned anything about friendship, it’s to hang in, stay connected, fight for them, and let them fight for you. Don’t walk away, don’t be distracted, don’t be too busy or tired, don’t take them for granted. Friends are part of the glue that holds life and faith together. Powerful stuff.

In the illustration above, the machine returns a quote related to the user’s feelings. In this paper, I try to understand how a machine interprets user input, evaluate various algorithms that match user input to a relevant response, and discuss future extensions of this work.

II. Review of Related Work

Crisis Text Line

Crisis Text Line is a global not-for-profit organization providing a free mental-health texting service. This service is an apt motivation: it helps people gain emotional support during difficult times through messaging. Most crisis lines respond to texts in the order in which they arrive. Crisis Text Line instead acts more like a hospital emergency room, where a person with a gunshot wound gets helped before a person with a broken leg. They call it texter triage. An algorithm runs in the background and assesses a texter’s suicide risk based on their first few messages. Texters at high risk are marked “code orange” and moved to the front of the queue. Their motivation: shorter wait times mean lives saved. Our motivation to build a machine learning algorithm that uses natural language processing came from the realization that our efforts, too, can save lives and help people.

Similarity Methods

To test our hypothesis (a machine matching user input, a feeling, to an appropriate quote), I need to find the similarity between two sets of text, where the first set is the user input and the second set is a quote. Finding similarity between words is a necessary part of text similarity: since both the user input and the quotes are lists of words, we must compare words across the two texts, and word similarity is the primary stage for sentence and document similarity. Words can be similar in two ways: lexically or semantically. Words are lexically similar if they have a similar character sequence. In contrast, words are semantically similar if they share the same theme, or if they have different themes but are used in the same context.

When documents are represented as term vectors, the similarity of two documents corresponds to the correlation between the vectors. This is quantified as the cosine of the angle between the vectors, which is one method I used, called Cosine Similarity. Cosine similarity is one of the most popular similarity measures applied to text documents, appearing in numerous information retrieval and clustering applications. In this paper, the author states: “By this model (cosine similarity), the similarity between two documents can be derived by calculating cosine value between two documents’ term vectors”.

Another similarity method that I use is called Jaccard Distance, a measure of how dissimilar two sentences are: the lower the Jaccard Distance, the higher the similarity. For example, given the two strings “cat” and “cats”, the intersection of the two character sets is 3 (c, a, t) and the union is 4 (c, a, t, s). So the Jaccard Similarity is 3/4 = 0.75, and the Jaccard Distance is 1 - 0.75 = 0.25. In our modeling, I use Jaccard Similarity for simplicity. In this paper, the author states: “Jaccard similarity measures the similarity between two nominal attributes by taking the intersection of both and dividing it by their union.”

spaCy Embeddings

spaCy is an open-source natural language processing library used to understand large volumes of text. It can be used to build information extraction or natural language understanding systems, or to pre-process text for deep learning. I chose spaCy embeddings for its linguistic features such as tokenization, part-of-speech (POS) tagging, lemmatization, and text classification. During processing, spaCy first tokenizes the text (segments it into words, punctuation, and so on). After tokenization, spaCy can parse and tag a given document. I use the similarity method that ships with spaCy by default; the spaCy documentation discusses (with examples) how this similarity method works.

BERT with Cosine Similarity

Bidirectional Encoder Representations from Transformers (“BERT”) is a popular technique in natural language processing that requires minimal task-specific fine-tuning. It has outperformed benchmarks by leveraging pre-trained models that can easily be applied to produce state-of-the-art results. In our analysis, I discover that a dataset as large as the one I use is computationally expensive to process with such pre-trained models. In the original paper, the authors state that “BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.”

III. Model

I build six models using a combination of various n-grams and similarity methods. These models are:

  1. Unigram with Jaccard Similarity
  2. Bigram with Jaccard Similarity
  3. Trigram with Jaccard Similarity
  4. Unigram with Cosine Similarity
  5. Bigram with Cosine Similarity
  6. Trigram with Cosine Similarity

Along with the models above, I also try two extension models that rely on pre-trained model libraries. These extended models are:

  1. spaCy Embeddings with Wu-Palmer Similarity
  2. BERT Language Model

Model 1: Unigram with Jaccard Similarity

The unigram model compares two sentences word by word to find similarities between the two sentences. Similarity in this model is defined by calculating the Jaccard Similarity, given by the following equation:

J(a, b) = |a ∩ b| / |a ∪ b|

Our Jaccard Similarity function takes in two sentences (e.g., a and b), and calculates the Jaccard Similarity by taking sets of our two sentences, and applying the formula above as such:
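A minimal sketch of such a function (the name jaccard_similarity is illustrative; the original implementation is not shown):

```python
def jaccard_similarity(a, b):
    """Jaccard Similarity between two collections of n-gram tuples."""
    a, b = set(a), set(b)
    union = a | b
    # |a ∩ b| / |a ∪ b|; an empty union means both sentences were empty
    return len(a & b) / len(union) if union else 0.0
```

For the unigram model, a and b are lists of one-word tuples rather than raw sentences.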

Note that for the unigram model, I send tuples of one word from our sentences, and not the raw sentences.

Model 2: Bigram with Jaccard Similarity

In this model, I compare two sentences in sets of two words to find similarities between the two sentences. Again, I calculate the Jaccard Similarity between the sets of two words to find similarity between two sentences. For example, if a sentence is “Hi, I’m feeling sad today.”, then the two-word sets (tuples) are: (Hi, I’m), (I’m, feeling), (feeling, sad), (sad, today), and (today, .). Note that for the bigram model, I send these tuples from both sentences to our jaccard_distance function, and not the raw sentences.

Model 3: Trigram with Jaccard Similarity

In this model, I compare two sentences in sets of three words to find similarities between the two sentences. Again, I calculate the Jaccard Similarity between the sets of three words to find similarity between two sentences. For example, if a sentence is “Hi, I’m feeling sad today.”, then the three-word sets (tuples) are: (Hi, I’m, feeling), (I’m, feeling, sad), (feeling, sad, today), and (sad, today, .). Note that for the trigram model, I send these tuples from both sentences to our jaccard_distance function, and not the raw sentences.
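The n-gram tuples in these examples can be produced with a small helper; a minimal sketch using a simple whitespace split (the actual models use NLTK-style tokenization, which separates punctuation into its own tokens):

```python
def ngrams(sentence, n):
    """Split a sentence into overlapping n-gram tuples of words."""
    tokens = sentence.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, ngrams("Hi, I'm feeling sad today.", 2) yields [('Hi,', "I'm"), ("I'm", 'feeling'), ('feeling', 'sad'), ('sad', 'today.')].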

Model 4: Unigram with Cosine Similarity

As described earlier, the unigram model compares two sentences word by word to find similarity between the two sentences. The difference between this model and Model 1 is the similarity calculation, which in this case is Cosine Similarity. The formula for calculating Cosine Similarity between two count vectors a and b is:

cos(a, b) = (a · b) / (||a|| ||b||)

Our Cosine Similarity function takes in two sentences (e.g., a and b), and calculates the Cosine Similarity by taking the sets of our two sentences, and applying the formula above. First, I get tuples from our sentence. For example, if our sentence is “Hi, I’m feeling sad today.”, our tuples for unigram are:

[('Hi,',), ("I'm",), ('feeling',), ('sad',), ('today.',)]

Then I count each tuple by applying the Counter function, and take the intersection of the tuples across both sentence vectors. The numerator of our Cosine Similarity is the sum, over tuples in the intersection, of the products of their counts in the two vectors. The denominator is the square root of the sum of squared counts in vector1, multiplied by the square root of the sum of squared counts in vector2. Our Cosine Similarity function is shown below:
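A minimal sketch of the steps just described (the name cosine_similarity is illustrative; the original implementation is not shown):

```python
import math
from collections import Counter

def cosine_similarity(sent1_tuples, sent2_tuples):
    """Cosine Similarity between two lists of n-gram tuples."""
    vec1, vec2 = Counter(sent1_tuples), Counter(sent2_tuples)
    intersection = set(vec1) & set(vec2)
    # numerator: sum of products of counts for shared tuples
    numerator = sum(vec1[t] * vec2[t] for t in intersection)
    # denominator: product of the two vector magnitudes
    denominator = (math.sqrt(sum(c * c for c in vec1.values()))
                   * math.sqrt(sum(c * c for c in vec2.values())))
    return numerator / denominator if denominator else 0.0
```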

Model 5: Bigram with Cosine Similarity

In this model, I compare two sentences in sets of two words to find similarities between the two sentences. I use Cosine Similarity between the sets of two words to find similarity between the two sentences.

Model 6: Trigram with Cosine Similarity

In this model, I compare two sentences in sets of three words to find similarities between the two sentences. I use Cosine Similarity between the sets of three words to find similarity between the two sentences.

Extension Model 1: spaCy Embeddings

For our spaCy embeddings model, I download a pre-trained model called en_core_web_md. Then, I apply the model to both our sentences (the user input and, in a loop, each quote in the database), and find the similarity between the two sentences.
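A pseudo-code sketch of this step (one quote shown; the variables user_input and quote are placeholders):

```python
import spacy

# en_core_web_md ships with word vectors, which Doc.similarity relies on
nlp = spacy.load("en_core_web_md")

user_doc = nlp(user_input)
quote_doc = nlp(quote)     # in the actual code, this runs inside a loop
score = user_doc.similarity(quote_doc)
```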

Note that the pseudo code above shows only one quote object, but in actual code, this is a loop over all quotes. Note that due to computational constraints, I cannot evaluate the performance of this model.

Extension Model 2: BERT Language Model

For this model, I use the BERT Language Model to find similarity between two sentences. First, I use sent2vec.vectorizer to vectorize each sentence. Then, I compute distance among our sentences by using their vector. For example, if I have two sentences, s1 and s2, I calculate their vectors v1 and v2, respectively. Then, the distance between v1 and v2 is computed by Cosine Similarity. The following snippet of code is pseudo code for this model:
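A pseudo-code sketch of this model (assuming the sent2vec package's Vectorizer API; the sentences list is a placeholder holding the user input at index 0 and a quote at index 1):

```python
from scipy.spatial import distance
from sent2vec.vectorizer import Vectorizer  # BERT-based sentence vectorizer

vectorizer = Vectorizer()
vectorizer.run(sentences)      # computes one BERT vector per sentence
vectors = vectorizer.vectors

cos_dist = distance.cosine     # cosine distance = 1 - cosine similarity
dist = cos_dist(vectors[0], vectors[1])
```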

As shown above, the cos_dist call will calculate the cosine distance between the computed vectors of sentence 0 and 1. Also note that due to computational constraints, I cannot evaluate the performance of this model.

IV. Experimental Results

The dataset I chose for this experiment is provided by Kaggle and is called Quotes-500k. The dataset, as the name suggests, contains approximately 500,000 quotes from various authors. The dataset has three columns: quote, author, and category.

Pre-Processing: The first step was to clean the dataset, so I started by removing rows where the quote is null. Our quotes dataset contains many rows where the author, quote, or category is not available, but I only remove records where the quote is missing.

I also perform some sentence pre-processing, such as:

  • Manual Pre-Processing: “won’t” to “will not”, “can’t” to “can not”, “n’t” to “ not”, “’re” to “ are”, “’s” to “ is”, “’d” to “ would”, “’ll” to “ will”, “’t” to “ not”, “’ve” to “ have”, and “’m” to “ am”.
  • Lemmatization: an example here is the word “cats”, where the lemmatization pre-processing will convert it to “cat”.
  • Lower Case: “Cats” → “cats”
  • Removing Stop Words + Punctuation: “Cats!” → “Cats”
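These steps can be sketched as a single function; a minimal version assuming straight apostrophes and a small illustrative stop-word list, with lemmatization left as a comment (the actual pipeline uses a full stop-word list and a lemmatizer):

```python
import string

# Contraction map from the manual pre-processing step above
CONTRACTIONS = {
    "won't": "will not", "can't": "can not", "n't": " not",
    "'re": " are", "'s": " is", "'d": " would", "'ll": " will",
    "'t": " not", "'ve": " have", "'m": " am",
}

STOP_WORDS = {"i", "am", "the", "a", "an", "to", "and"}  # illustrative subset

def preprocess(sentence):
    """Expand contractions, lower-case, strip punctuation and stop words."""
    text = sentence.lower()
    for pattern, replacement in CONTRACTIONS.items():
        text = text.replace(pattern, replacement)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Lemmatization (e.g. "cats" -> "cat") would run here, via NLTK or spaCy.
    return " ".join(w for w in text.split() if w not in STOP_WORDS)
```

For example, preprocess("I'm feeling sad!") produces "feeling sad".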

Given our modeling approach, our main task is comparing sentence similarities and returning the score for each similarity between two given sentences.
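Concretely, each model loops over the quotes and returns the one with the highest similarity score; a minimal sketch (the function and parameter names are illustrative; similarity and ngrams stand in for any of the similarity methods and n-gram extractors described above):

```python
def predict_quote(user_input, quotes, similarity, ngrams):
    """Return the (quote, score) pair most similar to the user input."""
    user_grams = ngrams(user_input)
    best_quote, best_score = None, -1.0
    for quote in quotes:
        score = similarity(user_grams, ngrams(quote))
        if score > best_score:
            best_quote, best_score = quote, score
    return best_quote, round(best_score, 2)
```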

Performance: To evaluate the performance of our models, I compare models based on the similarity method used: models using Jaccard Similarity are compared only with each other, and similarly for Cosine Similarity. Note that spaCy uses Wu-Palmer Similarity, so I cannot quantitatively compare this model to our other models, but the output from our experiments is shown. To evaluate performance, I designed a set of 5 sample user inputs, as shown below:

1. I’m feeling really sad today. My friends and family do not understand me.

2. I would like more love in my life.

3. I’m scared to go to school because I’ll get bullied.

4. I just feel awful. I’m in the bathroom crying.

5. I’m lonely and depressed and looking for someone to talk to.

Given each of the five user inputs, I predict the best quote using the six models. For the models built to calculate similarity using Jaccard Similarity, the performance is shown below.

Table 1 — Jaccard Models Performance

The best performance for each user input is highlighted in light green.

As shown above, the best performance across all user inputs is from the unigram models. Note that our function outputs Jaccard Similarity and not Jaccard Distance. A full list of predicted output is available in the appendix section Jaccard Model Output.

Table 2 — Cosine Models Performance

Table 2 shows the performance of each of the cosine models. The best performance for each user input is highlighted in light green.

As shown in Table 2 — Cosine Model Performance, the best performance across all user inputs is from the unigram models. A full list of predicted output is available in the appendix section Cosine Model Output.

V. Discussion

In our experiments, I discovered that quantitative performance must be compared within the similarity method used between two sentences. The two similarity methods used in our primary models were Jaccard Similarity and Cosine Similarity; scores from the two methods are not comparable to each other, so I performed a comparative performance analysis per method. In that analysis, I discovered that with both similarity methods, the unigram model quantitatively outperformed the bigram and trigram models.

However, from a qualitative perspective, the higher n-gram models are much closer to a human response. For example, for the third user input, “I’m scared to go to school because I’ll get bullied.”, the Unigram Model using Jaccard predicted “Go to hell, I’m reading!”, whereas the Bigram Model using Jaccard predicted “If you are scared to go to the brink you are lost.”. From a user-experience standpoint, the latter prediction is more positive and appropriate, even though the former scores higher on the quantitative distance measure. This was true for the Cosine models as well. For example, for the fifth user input, “I’m lonely and depressed and looking for someone to talk to.”, the Unigram Model using Cosine predicted “I believe in living healthily and sustainably, and looking to nature for renewal and inspiration.”, whereas the bigram predicted “I’m looking for someone to quench my thirst-for all eternity” and the trigram predicted “We are all looking for someone to save us.”. From a sense-disambiguation perspective, the bigram and trigram outputs make much more sense. Hence, in our experiments, the higher the n-gram order, the higher the quality of the prediction.

In the future, I would like to evaluate the performance of the extended models. Those models use sentence vectorization to calculate the distance between two sentences, and rely on libraries of pre-trained models. The benefit of using those models for prediction is the pre-trained knowledge they carry; the disadvantage is training time and computational cost.

Key Message

Making yourself vulnerable about what is going on in your life is the ultimate sign of bravery, but may be difficult.

Whether communicating with a real human or a machine, being vulnerable could help you work through any crisis and help you strategize ways to get out of a dark place in the long term.

VI. Appendix

Jaccard Model Output

User Input 1: I’m feeling really sad today. My friends and family do not understand me.

Algorithm 1: Unigram Model with Jaccard Similarity

(‘To love me, my family does not need to understand me.’, 0.27)

Algorithm 2: Bigram Model with Jaccard Similarity

(“I don’t understand boys.”, 0.14)

Algorithm 3: Trigram Model with Jaccard Similarity

(‘I like to spend as much time with my friends and family as possible.’, 0.09)

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

User Input 2: I would like more love in my life.

Algorithm 1: Unigram Model with Jaccard Similarity

(‘I am in love with life.’, 0.40)

Algorithm 2: Bigram Model with Jaccard Similarity

(‘You enlighten the light of love in my life.’, 0.25)

Algorithm 3: Trigram Model with Jaccard Similarity

(‘You enlighten the light of love in my life.’, 0.18)

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

User Input 3: I’m scared to go to school because I’ll get bullied.

Algorithm 1: Unigram Model with Jaccard Similarity

(“Go to hell, I’m reading!”, 0.33)

Algorithm 2: Bigram Model with Jaccard Similarity

(‘If you are scared to go to the brink you are lost.’, 0.16)

Algorithm 3: Trigram Model with Jaccard Similarity

(‘If you are scared to go to the brink you are lost.’, 0.11)

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

User Input 4: I just feel awful. I’m in the bathroom crying.

Algorithm 1: Unigram Model with Jaccard Similarity

(‘I am the soul in limbo.’, 0.36)

Algorithm 2: Bigram Model with Jaccard Similarity

(‘I am in the world to change the world’, 0.23)

Algorithm 3: Trigram Model with Jaccard Similarity

(‘I am in the world to change the world’, 0.15)

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

User Input 5: I’m lonely and depressed and looking for someone to talk to.

Algorithm 1: Unigram Model with Jaccard Similarity

(“I’m looking forward to the future, and feeling grateful for the past.”, 0.35)

Algorithm 2: Bigram Model with Jaccard Similarity

(“I’m looking for someone to quench my thirst-for all eternity”, 0.21)

Algorithm 3: Trigram Model with Jaccard Similarity

(‘We are all looking for someone to save us.’, 0.13)

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Cosine Model Output

User Input 1: I’m feeling really sad today. My friends and family do not understand me.

Algorithm 4: Unigram Model with Cosine Similarity

(“I don’t like hobbies. I read and travel and see my friends before they disown me.”, 0.42)

Algorithm 5: Bigram Model with Cosine Similarity

(“I don’t understand boys.”, 0.28)

Algorithm 6: Trigram Model with Cosine Similarity

(“I don’t understand boys.”, 0.17)

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

User Input 2: I would like more love in my life.

Algorithm 4: Unigram Model with Cosine Similarity

(‘If I had my life to live over I would start barefoot earlier in the spring and stay that way later in the fall. I would go to more dances. I would ride more merry-go-rounds. I would pick more daisies.’, 0.57)

Algorithm 5: Bigram Model with Cosine Similarity

(‘You enlighten the light of love in my life.’, 0.40)

Algorithm 6: Trigram Model with Cosine Similarity

(‘You enlighten the light of love in my life.’, 0.30)

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

User Input 3: I’m scared to go to school because I’ll get bullied.

Algorithm 4: Unigram Model with Cosine Similarity

(“I tell people I won’t vote to go to war unless I’m ready to go or send my kids.”, 0.67)

Algorithm 5: Bigram Model with Cosine Similarity

(“I tell people I won’t vote to go to war unless I’m ready to go or send my kids.”, 0.32)

Algorithm 6: Trigram Model with Cosine Similarity

(‘If you are scared to go to the brink you are lost.’, 0.19)

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

User Input 4: I just feel awful. I’m in the bathroom crying.

Algorithm 4: Unigram Model with Cosine Similarity

(“I feel I’m living in parentheses”, 0.67)

Algorithm 5: Bigram Model with Cosine Similarity

(‘Once I am in the square circle, I am in my home.’, 0.43)

Algorithm 6: Trigram Model with Cosine Similarity

(‘Once I am in the square circle, I am in my home.’, 0.30)

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

User Input 5: I’m lonely and depressed and looking for someone to talk to.

Algorithm 4: Unigram Model with Cosine Similarity

(‘I believe in living healthily and sustainably, and looking to nature for renewal and inspiration.’, 0.58)

Algorithm 5: Bigram Model with Cosine Similarity

(‘I’m looking for someone to quench my thirst-for all eternity’, 0.34)

Algorithm 6: Trigram Model with Cosine Similarity

(‘We are all looking for someone to save us.’, 0.23)

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —


Kumar Brahmbhatt

A data scientist working to better understand how consumer activity motivates and shapes customer behavior.