Learning from chatterbot

Last week I started writing code for an open-source chat bot program. My goal was to create something that was basically an open-source version of Cleverbot. There isn't much to choose from for anyone looking for a program like this, so I thought it would be a cool project to build.

I wrote a program that could respond to user input by finding the closest match in its record of past conversations. This worked well when a match existed; however, when presented with a completely new conversation, the program failed to reply coherently. The simplest solution seemed to be to give the program more "conversation experience" so that it had a larger selection of responses to choose from. To expand the program's database of conversations, I hooked it up to Cleverbot's API and set it loose chatting with Cleverbot.
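The closest-match idea can be sketched in a few lines of Python. This is a minimal illustration, not the code in the repository: it assumes past conversations are stored as a hypothetical dictionary mapping a statement to the reply that followed it, and it uses the standard library's `difflib.SequenceMatcher` as the similarity measure. The `threshold` cutoff is also an assumption; it models the "completely new conversation" case where nothing in memory is close enough to answer.

```python
from difflib import SequenceMatcher

# Hypothetical conversation memory: each known statement maps to a reply
# that followed it in some past conversation.
KNOWN_RESPONSES = {
    "hello": "Hi there!",
    "how are you?": "I'm doing well, thanks.",
    "do you like music?": "I like seeing movies.",
}


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] describing how alike two strings are."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def respond(user_input: str, known=KNOWN_RESPONSES, threshold: float = 0.6):
    """Return the reply attached to the closest known statement.

    Returns None when no statement scores at least `threshold`, i.e. the
    input is too unlike anything the bot has seen before.
    """
    best_statement = max(known, key=lambda s: similarity(user_input, s))
    if similarity(user_input, best_statement) < threshold:
        return None
    return known[best_statement]


print(respond("How are you"))  # closest known statement is "how are you?"
```

Lowering `threshold` makes a bot like this answer more often, at the cost of replies that are more likely to be off-topic; raising it makes the bot give up sooner on unfamiliar input.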

The communication between my program and Cleverbot was really interesting to observe. The conversation started out awful and was totally incoherent. However, after about ten exchanges back and forth, my program started replying with some of the new responses it had "learned" from talking with Cleverbot. This progressed to the point where the two could hold an almost completely coherent conversation. The only problem was that my program could still only reply with statements that Cleverbot had said to it.
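The learning loop can be sketched the same way. In this hypothetical `LearningBot`, every line the partner says is stored as a response to the message that prompted it, and the standard library's `difflib.get_close_matches` picks the nearest stored statement. The `partner` callable is a stand-in for a real chat service (no Cleverbot API calls are shown), and repeating the partner's line when nothing matches is an arbitrary fallback chosen for the sketch. Because every stored reply comes from the partner, the sketch also shows why such a bot can only ever say what its partner has said.

```python
from difflib import get_close_matches
from typing import Callable, Dict, Optional


class LearningBot:
    """Toy closest-match bot that learns (statement -> reply) pairs.

    Illustrative sketch only; not ChatterBot's actual implementation.
    """

    def __init__(self, cutoff: float = 0.6) -> None:
        self.memory: Dict[str, str] = {}
        self.cutoff = cutoff  # minimum similarity ratio for a match

    def learn(self, statement: str, reply: str) -> None:
        # Remember that `reply` was given in response to `statement`.
        self.memory[statement.lower()] = reply

    def respond(self, text: str) -> Optional[str]:
        # Find the single closest known statement above the cutoff, if any.
        matches = get_close_matches(text.lower(), list(self.memory), n=1, cutoff=self.cutoff)
        return self.memory[matches[0]] if matches else None


def converse(bot: LearningBot, partner: Callable[[str], str], opener: str, turns: int) -> None:
    """Let `bot` chat with `partner` for `turns` exchanges, learning as it goes."""
    message = opener
    for _ in range(turns):
        partner_reply = partner(message)
        # The partner's line answers our message, so store that pair.
        bot.learn(message, partner_reply)
        # Reply with something learned, or (arbitrary fallback) repeat the partner.
        message = bot.respond(partner_reply) or partner_reply


if __name__ == "__main__":
    # Hypothetical scripted partner standing in for a real chat service.
    script = iter(["Are you a dog?", "Who was the who?", "Do you like music?"])
    bot = LearningBot()
    converse(bot, lambda _msg: next(script), "hey, doing well", turns=3)
    for statement, reply in bot.memory.items():
        print(f"{statement!r} -> {reply!r}")
```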

I decided to extend the range of sources my program could get conversation data from. This is where I made my mistake. I decided that Twitter would be the ideal source to have the program pull data from. Tweets are short and sentence-like, and there are plenty of people on Twitter constantly creating new messages and conversations, all publicly available. Naturally, this seemed like the perfect source of conversation data.

salvius: "hey mc, doing well. Thanks for sharing nmirz new video."
cleverbot: "Are you a dog?"
salvius: "lmao are you a dog person?"
cleverbot: "Who was the who?"
salvius: "Who? Who is but a form following the function of what and what I am is a man in a mask."
cleverbot: "It's not your powers of observation I doubt, but merely the paradoxical nature of asking a masked man who is. But tell me, do you like music?"
salvius: "I like seeing movies."
cleverbot: "I wish I was The Mad Hatter."
salvius: "You're entirely bonkers. But I'll tell you a secret. All the best people are."

The conversations that resulted from pulling data from Twitter's API were not what I had expected. On the positive side, my chat bot was now amazingly up to date with popular culture, even able to converse about television episodes that had just premiered (such as the latest Doctor Who episode). At one point while talking with Cleverbot, the two stopped speaking English and addressed each other in French instead. While the responses the program returned were relevant and coherent, they were also extremely prone to reflecting what is probably best described as the chatter of the internet's most profound trolls. A flood of profanity regularly ensued whenever a conversation turned to sports, various actors, or politics.

For anyone who is interested, I have published the code for my chat bot program in a repository on GitHub, https://github.com/gunthercox/ChatterBot.