How I used NLP to find users I would never have targeted before.

Over the better part of a decade, Twitter has become an extremely useful tool for startups and entrepreneurs. With over 600 million users and an average of 58 million tweets on any given day, it lets early-stage startups find and get in touch with their initial target audience more easily than ever before.

For example, looking at these tweets, there’s room for user acquisition depending on the product or service you’re selling. Most of these people are asking “Where can I find ____?” rhetorically, sarcastically, or while quoting someone, but there are a few asking serious questions because they’re actually trying to acquire some type of good or service.

What if there was an easy way to programmatically find tweets pertaining to your idea or startup and respond to them in a non-spammy way? It would let you target users who have a high chance of actually using your product. This was the problem I set out to solve at Outpost.

The first obstacle was gathering a sample of tweets that might relate to what we’re doing. Twitter’s Streaming API has three endpoints that could work in this situation: filtered, firehose, and sample. Both firehose and sample returned more unrelated tweets than related ones, so I decided to use filtered streaming because it let me hone in on key words and phrases. The catch with filtered streaming is that it requires you to supply nearly every combination of phrases you’d like to track, which meant I had to generate those combinations somehow. A simple for loop and string replacement solved that.
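A minimal sketch of that generation step, assuming a set of question templates and product keywords (the templates and keywords below are illustrative, not the actual phrases we tracked):

```python
from itertools import product

# Hypothetical question templates with a slot to fill in.
templates = [
    "where can i find {}",
    "where do i get {}",
    "looking for {}",
    "anyone know a good {}",
]

# Hypothetical keywords describing the product or service.
keywords = ["coworking space", "office space", "desk to rent"]

# Expand every template/keyword combination into a phrase for the
# filtered stream's `track` parameter.
phrases = [t.format(k) for t, k in product(templates, keywords)]

print(len(phrases))  # 4 templates x 3 keywords = 12 tracked phrases
```

The resulting list is what gets handed to the filtered streaming endpoint as its tracking terms.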

Next up was dissecting the tweets and stripping the stop words. Processing the tweets let us categorize and rank chunks of text to decide whether the tweet in question was truly relevant to what we’re offering. If it wasn’t relevant, we skipped it and moved on to the next tweet. If it was, I used Named Entity Recognition (NER) to try to identify locations in the tweet. We tried to develop an NER process in-house but realized we’d probably need a third-party service such as OpenCalais, AlchemyAPI, or Yahoo. In the end, 7% of the entities were extracted by our in-house service, 13% by the Stanford NER library, and the remaining 80% by third-party services.
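The stop-word stripping and relevance ranking can be sketched like this. The stop-word list, keyword set, and scoring rule here are simplified stand-ins for the production versions, not the actual implementation:

```python
import re

# A tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"i", "a", "an", "the", "to", "is", "can", "where", "do",
              "does", "in", "for", "of", "on", "and", "or", "my"}

# Hypothetical keywords describing what the startup offers.
KEYWORDS = {"coworking", "office", "desk", "rent", "space"}

def tokenize(tweet: str) -> list:
    """Lowercase, drop URLs and punctuation, then strip stop words."""
    tweet = re.sub(r"https?://\S+", "", tweet.lower())
    return [t for t in re.findall(r"[a-z']+", tweet) if t not in STOP_WORDS]

def relevance(tweet: str) -> float:
    """Score a tweet as the fraction of content words hitting our keywords."""
    tokens = tokenize(tweet)
    if not tokens:
        return 0.0
    return sum(t in KEYWORDS for t in tokens) / len(tokens)

print(relevance("Where can I find a coworking space to rent?"))  # 0.75
```

Tweets scoring below some threshold get skipped; the rest move on to the NER stage.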

Once NER is finished and the relevant data is extracted, it’s time to generate a response and pass it off for someone to reply to. Relevant tweets and their proposed replies are pushed into a Firebase database, where one of our interns would review them and reply.
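A hedged sketch of that hand-off step, assuming a queue node in Firebase. The project URL, node name, and payload fields below are assumptions rather than the actual schema; Firebase’s REST API does accept a JSON POST to any database path ending in `.json`:

```python
import json
import urllib.request

# Hypothetical Firebase path for the review queue.
FIREBASE_URL = "https://example-project.firebaseio.com/pending_replies.json"

def build_entry(tweet_id, tweet_text, proposed_reply):
    """Shape the record a reviewer will see in the queue."""
    return {
        "tweet_id": tweet_id,
        "tweet_text": tweet_text,
        "proposed_reply": proposed_reply,
        "status": "awaiting_review",  # flipped once the reply is sent
    }

def queue_reply(tweet_id, tweet_text, proposed_reply):
    """POST the entry to Firebase; returns the HTTP status code."""
    payload = json.dumps(build_entry(tweet_id, tweet_text, proposed_reply))
    req = urllib.request.Request(
        FIREBASE_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Keeping a human in the loop here is what makes the replies non-spammy: nothing goes out until a person approves it.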

The end result: around 9,000 tweets an hour processed from Twitter’s API, around 100 tweets an hour ranked semi-relevant, and about 15 tweets an hour that we replied to. About 80% of those users visited our site and used it for more than 10 minutes. This produced a 10% increase in unique visitors and almost tripled our referral traffic from Twitter.