Natural language processing technologies have become quite sophisticated over the past few years. From tech giants to hobbyists, many are rushing to build rich interfaces that can analyze, understand, and respond to natural language. Amazon’s Alexa, Microsoft’s Cortana, Google’s Google Home, and Apple’s Siri all aim to change the way we interact with computers.

Sentiment analysis, a subfield of natural language processing, consists of techniques that determine the tone of a text or speech. Today, with machine learning and large amounts of data harvested from social media and review sites, we can train models to identify the sentiment of a natural language passage with fair accuracy.

Email sentiment analysis bot tutorial

In this tutorial, you will learn how to build a bot that analyzes the sentiment of the emails it receives and notifies you about emails that may require your immediate attention.

Analyzing Sentiment in Emails

The bot will be built using a mix of Java and Python, and the two processes will communicate with each other using Thrift. If you are not familiar with one or both of these languages, you can still read on, as the fundamental concepts of this article apply to other languages as well.

To decide whether an email needs your attention, the bot will parse it and check it for a strong negative tone. If it finds one, it will send out a text alert.

We will use Sendgrid to connect to our mailbox and Twilio to send out text alerts.

Sentiment Analysis: A Deceptively Simple Problem

There are words that we associate with positive emotions, such as love, joy, and pleasure, and words that we associate with negative emotions, such as hate, sadness, and pain. Why not train the model to recognize these words and count the relative frequency and strength of each positive and negative word?

Well, there are a couple of problems with that.

First, there is the problem of negation. For example, a sentence like “The peach is not bad” expresses a positive emotion using a word that we most often associate with negativity. A simple bag-of-words model will not be able to recognize the negation in this sentence.

Furthermore, mixed sentiments prove to be yet another problem with naive sentiment analysis. For example, a sentence like “The peach is not bad, but the apple is truly terrible” contains mixed sentiments of mixed intensities that interact with each other. A simple approach will not be able to resolve the combined sentiments, the different intensity, or the interactions between the sentiments.
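To make these problems concrete, here is a minimal sketch of the naive word-counting approach in Python. The tiny lexicon and its scores are hypothetical, chosen only for illustration; a real lexicon would be far larger and weighted by strength.

```python
import re

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "love": 1, "joy": 1, "pleasure": 1, "good": 1,
    "hate": -1, "sadness": -1, "pain": -1, "bad": -1, "terrible": -1,
}


def naive_score(text):
    """Sum the lexicon score of every word, ignoring word order entirely."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


print(naive_score("The peach is not bad"))  # -1: "not" is ignored, so "bad" wins
print(naive_score("The peach is not bad, but the apple is truly terrible"))  # -2
```

Both sentences come out negative: the first because the negation is invisible to a word count, and the second because two clauses of different polarity and intensity are simply added together.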

Sentiment Analysis Using Recursive Neural Tensor Network

The Stanford Natural Language Processing library for sentiment analysis resolves these issues using a Recursive Neural Tensor Network (RNTN).

RNTN on a sentence

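The RNTN algorithm first parses a sentence into a binary tree whose leaves are the individual words and whose internal nodes represent phrases. Each word is represented as a vector, and the vector for every phrase is computed from its two children by the same neural network. That network includes a tensor layer, which lets the model capture interactions between the words and phrases it combines. A classifier then predicts the sentiment at every node of the tree, so the label at the root is the sentiment of the whole sentence.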
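To make the tensor layer concrete, here is a toy sketch of a single composition step in plain Python, following the composition formula published for the RNTN: the parent vector is tanh(xᵀ V x + W x), where x stacks the two child vectors. The function name and the hand-picked weights are hypothetical, the bias term is omitted, and this is not how CoreNLP's Java implementation is structured; it only shows the arithmetic.

```python
import math


def rntn_compose(a, b, V, W):
    """Compute a parent vector from two child vectors, RNTN-style.

    a, b: child vectors (lists of length d)
    V:    tensor with d slices, each a 2d x 2d matrix
    W:    d x 2d weight matrix
    Each output component is p[k] = tanh(x^T V[k] x + (W x)[k]), with x = [a; b].
    The bias term is omitted for brevity.
    """
    x = a + b  # list concatenation gives the stacked vector [a; b]
    n = len(x)
    parent = []
    for k in range(len(a)):
        tensor_term = sum(x[i] * V[k][i][j] * x[j] for i in range(n) for j in range(n))
        linear_term = sum(W[k][i] * x[i] for i in range(n))
        parent.append(math.tanh(tensor_term + linear_term))
    return parent


# Toy, hand-picked weights with d = 2 (a trained model learns V and W from data)
d = 2
identity = [[1.0 if i == j else 0.0 for j in range(2 * d)] for i in range(2 * d)]
zeros = [[0.0] * (2 * d) for _ in range(2 * d)]
V = [identity, zeros]  # slice 0 is the identity, slice 1 is all zeros
W = [[0.0] * (2 * d) for _ in range(d)]  # no linear contribution

p = rntn_compose([1.0, 0.0], [0.0, 1.0], V, W)
print(p)  # p[0] equals math.tanh(2.0), p[1] equals 0.0
```

In the full model, the same function is applied bottom-up at every internal node of the binarized parse tree, and a softmax classifier on each node's vector predicts one of five sentiment classes, from very negative to very positive.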

You can find a visual demonstration of the algorithm on their official website.

The Stanford NLP group trained the RNTN on manually labeled movie review sentences from Rotten Tomatoes (the Stanford Sentiment Treebank) and found that the model predicts sentiment with good accuracy.

Bot That Receives Emails

The first thing you want to do is set up email integration so that data can be piped to your bot.

There are many ways to accomplish this, but for simplicity, let’s set up a small web server and use Sendgrid’s inbound parse webhook to pipe emails to it. Emails forwarded to Sendgrid’s inbound parse address are turned into POST requests to our web server, where we can process the data.

To build the server, we’ll use Flask, a simple web framework for Python.

In addition to building the web server, we will want to connect the web service to a domain. For brevity, this article skips that step, but you can read more about it here.

Building a web server in Flask is straightforward.

Create an app.py file and add the following:

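from flask import Flask, request
import datetime

app = Flask(__name__)


@app.route('/analyze', methods=['POST'])
def analyze():
    # Append a timestamped line every time the endpoint receives a POST request
    with open('logfile.txt', 'a') as fp_log:
        fp_log.write('endpoint hit %s \n' % datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
    return "Got it"


if __name__ == '__main__':
    # Listen on all interfaces (Flask's default port is 5000)
    app.run(host='0.0.0.0')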

If you deploy this app behind a domain name and send a POST request to the “/analyze” endpoint, you should see something like this:

>>> requests.post('http://sentiments.shanglunwang.com:5000/analyze').text
'Got it'

Next, we want to send emails to this endpoint.

You can find more documentation here, but essentially you want to set up Sendgrid as your email processor and have it forward incoming emails to your web server.

Here is my setup on Sendgrid. It forwards emails sent to addresses ending in @sentibot.shanglunwang.com as POST requests to “http://sentiments.shanglunwang.com/analyze”:

Sendgrid configuration

You can use any other service that supports sending inbound emails over webhooks.
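If you want to exercise the endpoint before Sendgrid is configured, or while testing changes later, you can send it a fake inbound email yourself. Here is a minimal sketch using only Python’s standard library. The helper name is hypothetical, and it assumes the Flask app is listening on localhost:5000 (Flask’s default port). Later in this tutorial the bot reads the email body from the “text” form field that Sendgrid posts, so the sketch sets that field.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_fake_inbound_email(url, text, subject="Test email"):
    """Build a POST request carrying the form fields the bot reads.

    Hypothetical helper. Sendgrid's inbound parse posts multipart/form-data;
    this sketch uses a URL-encoded body instead, which Flask also exposes
    through request.form.
    """
    body = urlencode({"subject": subject, "text": text}).encode("utf-8")
    return Request(url, data=body, method="POST")


# To actually send it (requires the Flask app to be running):
# from urllib.request import urlopen
# print(urlopen(build_fake_inbound_email("http://localhost:5000/analyze", "Hello.")).read())
```

Because the body is URL-encoded rather than multipart, it only approximates what Sendgrid sends, but it is enough to check that the route receives and logs the text.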

After setting everything up, try sending an email to your Sendgrid address. You should see something like this in the logs:

endpoint hit 2017-05-25 14:35:46

That’s great! You now have a bot that is able to receive emails. That is half of what we are trying to do.

Now, you want to give this bot the ability to analyze sentiments in emails.

Email Sentiment Analysis with Stanford NLP

Since the Stanford NLP library is written in Java, we will want to build the analysis engine in Java.

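Let’s start by pulling in the Stanford NLP library and its trained models with Maven. Create a new Java project, add the following two dependencies (the library itself and the separate jar that contains its models), and reload the project so Maven downloads them:

<dependency>
   <groupId>edu.stanford.nlp</groupId>
   <artifactId>stanford-corenlp</artifactId>
   <version>3.6.0</version>
</dependency>
<dependency>
   <groupId>edu.stanford.nlp</groupId>
   <artifactId>stanford-corenlp</artifactId>
   <version>3.6.0</version>
   <classifier>models</classifier>
</dependency>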

Stanford NLP’s sentiment analysis engine can be accessed by specifying the sentiment annotator in the pipeline initialization code. The annotation can then be retrieved as a tree structure.

For the purposes of this tutorial, we only want the overall sentiment of each sentence, so we won’t need to walk the tree. We just need the label at its root node.

This makes the main code relatively simple:

package seanwang;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
 
 
import java.util.*;
 
public class App
{
    public static void main( String[] args )
    {
        Properties pipelineProps = new Properties();
        Properties tokenizerProps = new Properties();
        pipelineProps.setProperty("annotators", "parse, sentiment");
        pipelineProps.setProperty("parse.binaryTrees", "true");
        pipelineProps.setProperty("enforceRequirements", "false");
        tokenizerProps.setProperty("annotators", "tokenize ssplit");
        StanfordCoreNLP tokenizer = new StanfordCoreNLP(tokenizerProps);
        StanfordCoreNLP pipeline = new StanfordCoreNLP(pipelineProps);
        String line = "Amazingly grateful beautiful friends are fulfilling an incredibly joyful accomplishment. What an truly terrible idea.";
        Annotation annotation = tokenizer.process(line);
        pipeline.annotate(annotation);
        // normal output
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            String output = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
            System.out.println(output);
        }
    }
}

Try some sentences and you should see the appropriate annotations. Running the example code outputs:

Very positive
Negative

Integrating the Bot and the Analysis Engine

So we have a sentiment analyzer program written in Java and an email bot written in Python. How do we get them to talk to each other?

There are many possible solutions to this problem, but here we will use Thrift. We will spin up the Sentiment Analyzer as a Thrift server and the email bot as a Thrift client.

Thrift is a code generator and RPC framework that lets two applications, often written in different languages, communicate with each other through a shared interface definition. Polyglot teams use Thrift to build networks of microservices that leverage the strengths of each language they use.

To use Thrift, we need two things: a .thrift file that defines the service endpoints, and code generated from that .thrift file that implements the protocol. For the analyzer service, sentiment.thrift looks like this:

namespace java sentiment
namespace py sentiment
 
service SentimentAnalysisService
{
        string sentimentAnalyze(1:string sentence),
}

We can generate client and server code using this .thrift file. Run:

thrift-0.10.0.exe --gen py sentiment.thrift
thrift-0.10.0.exe --gen java sentiment.thrift

Note: I generated the code on a Windows machine. You will want to use the appropriate path to the Thrift executable in your environment.

Now, let’s make the appropriate changes to the analysis engine to create a server. Your Java program should look like this:

SentimentHandler.java

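package seanwang;

// Generated by Thrift from sentiment.thrift ("namespace java sentiment")
import sentiment.SentimentAnalysisService;

public class SentimentHandler implements SentimentAnalysisService.Iface {
    SentimentAnalyzer analyzer;
    SentimentHandler() {
        analyzer = new SentimentAnalyzer();
    }

    // Called by the Thrift server for each sentimentAnalyze request
    public String sentimentAnalyze(String sentence) {
        System.out.println("got: " + sentence);
        return analyzer.analyze(sentence);
    }

}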

This handler is where we receive the analysis request over the Thrift protocol.

SentimentAnalyzer.java

package seanwang;
 
// ...
 
public class SentimentAnalyzer {
    StanfordCoreNLP tokenizer;
    StanfordCoreNLP pipeline;
 
    public SentimentAnalyzer() {
        Properties pipelineProps = new Properties();
        Properties tokenizerProps = new Properties();
        pipelineProps.setProperty("annotators", "parse, sentiment");
        pipelineProps.setProperty("parse.binaryTrees", "true");
        pipelineProps.setProperty("enforceRequirements", "false");
        tokenizerProps.setProperty("annotators", "tokenize ssplit");
        tokenizer = new StanfordCoreNLP(tokenizerProps);
        pipeline = new StanfordCoreNLP(pipelineProps);
    }
 
    public String analyze(String line) {
        Annotation annotation = tokenizer.process(line);
        pipeline.annotate(annotation);
        String output = "";
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            output += sentence.get(SentimentCoreAnnotations.SentimentClass.class);
            output += "\n";
        }
        return output;
    }
}

The analyzer uses the Stanford NLP library to determine the sentiment of the text and returns a string with one sentiment annotation per sentence, each on its own line.
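Because the analyzer returns one label per line, a Python caller can turn its output back into a list. Here is a minimal sketch; the helper names are hypothetical and not part of the generated Thrift code.

```python
from collections import Counter


def parse_labels(output):
    """Split the analyzer's output into one sentiment label per sentence."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def label_counts(output):
    """Count how many sentences received each label."""
    return Counter(parse_labels(output))


labels = parse_labels("Very positive\nNegative\n")
print(labels)  # ['Very positive', 'Negative']
```

This also opens up an alternative to what the tutorial does later: sending a whole email in one call and letting CoreNLP’s own sentence splitter (the ssplit annotator) decide where sentences end, rather than splitting on periods in Python.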

SentimentServer.java

package seanwang;
 
// ...
 
public class SentimentServer {
    public static SentimentHandler handler;
 
    public static SentimentAnalysisService.Processor processor;
 
    public static void main(String [] args) {
        try {
            handler = new SentimentHandler();
            processor = new SentimentAnalysisService.Processor(handler);
 
            Runnable simple = new Runnable() {
                public void run() {
                    simple(processor);
                }
            };
 
            new Thread(simple).start();
        } catch (Exception x) {
            x.printStackTrace();
        }
    }
 
    public static void simple(SentimentAnalysisService.Processor processor) {
        try {
            TServerTransport serverTransport = new TServerSocket(9090);
            TServer server = new TSimpleServer(new Args(serverTransport).processor(processor));
 
            System.out.println("Starting the simple server...");
            server.serve();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Note that I have not included SentimentAnalysisService.java here, since it is a generated file. You will want to put the generated code somewhere the rest of your code can access it.

Now that we have the server up, let’s write a Python client to use the server.

client.py

from sentiment import SentimentAnalysisService
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol

class SentimentClient:
    def __init__(self, server='localhost', port=9090):
        # Open a buffered, binary-protocol connection to the Java Thrift server
        transport = TSocket.TSocket(server, port)
        transport = TTransport.TBufferedTransport(transport)
        protocol = TBinaryProtocol.TBinaryProtocol(transport)
        self.transport = transport
        self.client = SentimentAnalysisService.Client(protocol)
        self.transport.open()

    def __del__(self):
        # Close the connection when the client is garbage-collected
        self.transport.close()

    def analyze(self, sentence):
        # Returns one sentiment label per sentence, each followed by a newline
        return self.client.sentimentAnalyze(sentence)

if __name__ == '__main__':
    client = SentimentClient()
    print(client.analyze('An amazingly wonderful sentence'))

Run this and you should see:

Very positive

Great! Now that we have the server running and talking to the client, let’s integrate it with the email bot by instantiating a client and piping the email into it.

import client

# ...

@app.route('/analyze', methods=['POST'])
def analyze():
    sentiment_client = client.SentimentClient()
    text = str(request.form.get('text'))  # plain-text body of the email posted by Sendgrid
    with open('logfile.txt', 'a') as fp_log:
        fp_log.write(text + '\n')
        fp_log.write(sentiment_client.analyze(text))  # the analyzer ends each label with a newline
    return "Got it"

Now deploy your Java service to the same machine where you’re running the web server, start the service, and restart the app. Send an email to the bot with a test sentence and you should see something like this in the log file:

Amazingly wonderfully positive and beautiful sentence.
Very positive

Analyzing the Email

All right! Now we have an email bot that is able to perform sentiment analysis! We can send an email over and receive a sentiment tag for each sentence we sent. Now, let’s explore how we can make the intelligence actionable.

To keep things simple, let’s focus on emails with a high concentration of negative and very negative sentences. We’ll use a simple scoring rule: if more than 75% of an email’s sentences are negative or very negative, we will mark it as a potential alarm email that may require an immediate response. Let’s implement the scoring logic in the analyze route:

@app.route('/analyze', methods=['POST'])
def analyze():
    raw_text = str(request.form.get('text'))
    sentiment_client = client.SentimentClient()
    text = raw_text.replace('\n', ' ').strip()  # str.replace returns a new string
    sentences = [s.strip() for s in text.rstrip('.').split('.')]  # remove the last period before splitting
    sentences = [s for s in sentences if s]  # skip empty fragments
    negative_sentences = [
        sentence for sentence in sentences
        if sentiment_client.analyze(sentence).rstrip() in ['Negative', 'Very negative']  # remove newline char
    ]
    # float() keeps this a true division on Python 2; an empty email is never urgent
    urgent = bool(sentences) and len(negative_sentences) / float(len(sentences)) > 0.75
    with open('logfile.txt', 'a') as fp_log:
        fp_log.write("Received: %s\n" % raw_text)
        fp_log.write("urgent = %s\n" % urgent)

    return "Got it"

The code above makes a few simplifying assumptions (for example, that every sentence ends with a period) but will work for demonstration purposes. Send a couple of emails to your bot and you should see the email analysis in the logs:

Received: Here is a test for the system. This is supposed to be a non-urgent request.
It's very good! For the most part this is positive or neutral. Great things
are happening!
urgent = False
 
Received: This is an urgent request. Everything is truly awful. This is a disaster.
People hate this tasteless mail.
urgent = True
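To check the threshold logic without the Java service running, the scoring rule from the route can be pulled out into pure functions. This is a minimal sketch for Python 3; the function names are hypothetical, and fake_classify is a keyword-based stand-in for the Thrift client, not a sentiment model.

```python
# Labels the route treats as negative (the CoreNLP label strings it checks for).
NEGATIVE_LABELS = {"Negative", "Very negative"}


def split_sentences(text):
    """Naive splitter: newlines become spaces, then split on periods."""
    text = text.replace("\n", " ").strip().rstrip(".")
    return [s.strip() for s in text.split(".") if s.strip()]


def is_urgent(text, classify, threshold=0.75):
    """Return True if strictly more than `threshold` of the sentences are negative.

    `classify` maps one sentence to a sentiment label; in the bot it would be
    SentimentClient().analyze, whose result ends with a newline.
    """
    sentences = split_sentences(text)
    if not sentences:
        return False
    negative = sum(1 for s in sentences if classify(s).strip() in NEGATIVE_LABELS)
    return negative / len(sentences) > threshold


def fake_classify(sentence):
    """Hypothetical stand-in for the Thrift client, keyed on a few words."""
    lowered = sentence.lower()
    if "awful" in lowered:
        return "Very negative\n"
    if "bad" in lowered:
        return "Negative\n"
    return "Neutral\n"


print(is_urgent("Awful. Bad. Awful.", fake_classify))        # True: 3 of 3 sentences are negative
print(is_urgent("Awful. Bad. Awful. Fine.", fake_classify))  # False: exactly 75% is not more than 75%
```

Note that the comparison is strict: an email with three negative sentences out of four sits exactly at 75% and is not flagged. Use >= instead if that case should trigger an alert.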

Sending Out an Alert

We’re almost done!

We have built an email bot that is able to receive emails, perform sentiment analysis, and determine if an email requires immediate attention. Now, we just have to send out a text alert when an email is particularly negative.

We will use Twilio to send out text alerts. Its Python API, which is documented here, is pretty straightforward. Let’s modify the analyze route to send a text message whenever it receives an urgent email.

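import os

from flask import Flask, request
from twilio.rest import Client

import client

# Assumed environment variable names: TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN and ON_CALL_NUMBER
twilio_client = Client(os.getenv('TWILIO_ACCOUNT_SID'), os.getenv('TWILIO_AUTH_TOKEN'))
on_call = os.getenv('ON_CALL_NUMBER')  # phone number that receives the alerts


def send_message(body):
    # Send an SMS from your Twilio number to the on-call phone
    twilio_client.messages.create(
        to=on_call,
        from_=os.getenv('TWILIO_PHONE_NUMBER'),
        body=body
    )

app = Flask(__name__)


@app.route('/analyze', methods=['POST'])
def analyze():
    raw_text = str(request.form.get('text'))
    sentiment_client = client.SentimentClient()
    text = raw_text.replace('\n', ' ').strip()  # str.replace returns a new string
    sentences = [s.strip() for s in text.rstrip('.').split('.')]  # remove the last period before splitting
    sentences = [s for s in sentences if s]  # skip empty fragments
    negative_sentences = [
        sentence for sentence in sentences
        if sentiment_client.analyze(sentence).rstrip() in ['Negative', 'Very negative']  # remove newline char
    ]
    # float() keeps this a true division on Python 2; an empty email is never urgent
    urgent = bool(sentences) and len(negative_sentences) / float(len(sentences)) > 0.75
    if urgent:
        send_message('Highly negative email received. Please take action')
    with open('logfile.txt', 'a') as fp_log:
        fp_log.write("Received: %s\n" % raw_text)
        fp_log.write("urgent = %s\n" % urgent)

    return "Got it"


if __name__ == '__main__':
    app.run(host='0.0.0.0')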

You will need to set environment variables with your Twilio account credentials and set the on-call number to a phone that you can check. Once you have done that, send a strongly negative email to the bot, and you should see a text arrive at that phone number.
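A missing variable otherwise surfaces only when the first alert fails, so it can help to check the configuration at startup. Here is a minimal sketch; TWILIO_PHONE_NUMBER is the name the route uses, while TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, ON_CALL_NUMBER and the helper itself are assumptions for illustration.

```python
import os

# TWILIO_PHONE_NUMBER comes from the route; the other three names are assumptions.
REQUIRED_VARS = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN", "TWILIO_PHONE_NUMBER", "ON_CALL_NUMBER")


def load_alert_config(environ=None):
    """Return the alert settings, raising a clear error if any are missing or empty."""
    if environ is None:
        environ = os.environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling load_alert_config() once when app.py starts makes the server refuse to run with an incomplete setup instead of failing silently when an urgent email arrives.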

And we’re done!

Natural Language Processing Made Easy with Stanford NLP

In this article, you learned how to build an email sentiment analysis bot using the Stanford NLP library. The library helps abstract away all the nitty-gritty details of natural language processing and allows you to use it as a building block for your NLP applications.

I hope this post has demonstrated one of the many amazing potential applications of sentiment analysis, and that this inspires you to build an NLP application of your own.

You can find the code for the email sentiment analysis bot from this NLP tutorial on GitHub.

Understanding the Basics

What is natural language processing?

Natural language processing is the use of algorithms to analyze and understand ordinary human speech to determine metrics such as sentiment.

About the author

Shanglun (Sean) Wang, United States
member since April 1, 2016
Sean is a passionate C/C++ and Python developer with extensive experience in full-stack web development, system administration, and data science. He is capable of working in both Linux and Windows environments and has developed everything from machinery interface to market intelligence software. Sean is also an excellent communicator and spends his spare time coaching speech and debate.