PyGaze

Tutorial: creating a Twitterbot

Edwin Dalmaijer — Fri, 04 Mar 2016 11:15:28 +0000

TL;DR

Although it sounds like a lot of effort, creating a Twitter bot is actually really easy! This tutorial, along with some simple tools, can help you create Twitter bots that respond when they see certain phrases, or that periodically post a tweet. These bots work with Markov chains, which can generate text that looks superficially good, but is actually quite nonsensical. You can make the bots read your favourite texts, and they will produce new random text in the same style!

All code is available on GitHub

The examples on this page use a custom Python library, written by Edwin Dalmaijer (that’s me). This library is open source, and available on GitHub for free. You are very welcome to download and use it, but I would like to kindly ask you to not use it for doing evil stuff. So don’t start spamming or harassing people!

Step 1: Create a Twitter account

This is an easy step, no code here yet. Simply follow the instructions to create a new Twitter account.

Go to Twitter’s sign-up page.
Fill out all details, and make sure to include your phone number. This is a requirement for remote access, and you will need that to make the Twitter bot work.

Step 2: Create a Twitter app

Not all Twitter accounts are created equal. You will need developer access to yours, if you want to use it via a Python script.

Go to apps.twitter.com
Click on the ‘Create New App’ button.
Fill out the details on the form. You have to give your app a name, description, and website (this can be a simple placeholder, like http://www.example.com).
Read the Developer Agreement, and check the box at the bottom if you agree. Then click on the ‘Create your Twitter application’ button.

Step 3: Keys and access tokens

This is an important step, as you will need the keys and access tokens for your app. They allow you to sign in to your account via a Python script.

After creating your new app, you were redirected to its own page. If you weren’t, go to apps.twitter.com and click on your app’s name.
On the app’s page, click on the ‘Keys and Access Tokens’ tab.
At the bottom of this page, click on the ‘Create my access token’ button.
Make sure you make note of the following four keys, as you will need these later. (Just leave the tab open in your browser, or copy them to a text file or something. Make sure nobody else can access them, though!)

Consumer Key (API Key)	[ copy this value from Consumer Settings ]
Consumer Secret (API Secret)	[ copy this value from Consumer Settings ]
Access Token	[ copy this value from Your Access Token ]
Access Token Secret	[ copy this value from Your Access Token ]

Getting a Python Twitter library

Before being able to log into Twitter via Python, you need a Python wrapper for the Twitter API. There are several options out there, and I haven’t extensively researched all of them. This guide will go with Mike Verdone’s (@sixohsix on Twitter) Python Twitter Tools library, which is nice and elegant. You can also find its source code on GitHub.

If you don’t have it already, you need to install setuptools. Download the ez_setup.py script, and run it with the Python installation you want to have it installed in.
BONUS STEP: If you don’t know how to run a Python script, read this step. On Linux and OS X, open a terminal. Then type cd DIR and hit Enter, but replace DIR by the path to the folder that contains the ez_setup.py script you just downloaded. Next, type python ez_setup.py and hit Enter. This will run the script that installs setuptools. On Windows, the easiest thing to do is to make a new batch file. To do this, open a text editor (Notepad is fine). Now write "C:\Python27\python.exe" "ez_setup.py" on the first line, and pause on the second line. (Replace "C:\Python27" with the path to your Python installation!) Save the file as "run_ez_setup.bat", and make sure to save it in the same folder as the ez_setup.py script. The .bat extension is very important, as it will make your file into a batch file. Double-click the batch file to run it. This will open a Command Prompt in which the ez_setup.py script will be run.
After installing setuptools, you can use it to easily install the Twitter library. On Linux and OS X, open a terminal and type easy_install twitter. On Windows, create another batch file, write "C:\Python27\Scripts\easy_install.exe" twitter on the first line, and pause on the second line. (Replace "C:\Python27" with the path to your Python installation!)
If you did everything correctly, the Twitter library should now be installed. Test it by opening a Python console, and typing import twitter. If you don’t see any errors, that means it works.
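If you would rather run that check from a script than from the console, a small sketch like the one below does the same thing. The helper name can_import is made up for this example; it is not part of the twitter library.

```python
def can_import(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if can_import("twitter"):
        print("The twitter library is installed.")
    else:
        print("Could not import twitter; check the installation steps above.")
```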

Setting up

Time to get things ready for your Twitter bot script. You need two things: a dedicated folder to store things in, and the markovbot Python library. The markovbot library is written by me, and you can easily grab it off GitHub. You’re also very welcome to post any questions, issues, or additions to the code on GitHub.

Create a new folder, and give it a name. In this example, we will use the name ‘TweetBot’ for this folder.
Go to the markovbot GitHub page.
Click on the ‘Download ZIP’ button, or use this direct download link
Unzip the ‘markovbot-master.zip’ file you just downloaded.
Copy the ‘markovbot’ folder into the ‘TweetBot’ folder.

Getting data

To establish a Markov chain, you need data. And lots of it. You also need the data to be in machine-readable form, e.g. in a plain text file. Fortunately, Project Gutenberg offers an excellent online library, with free books that also come in the form of text files. (Please do note that Project Gutenberg is intended for humans to read, and not for bots to crawl through. If you want to use their books, make sure you download and read them yourself. Also, make sure that you have the right to download a book before you do it. Not all countries have the same copyright laws as the United States, where Gutenberg is based.)

Download a book of your choice, for example Sigmund Freud’s Dream Psychology. Make sure to download the utf-8 encoded text file! (At the bottom of the list.)
Copy (or move) the text file you just downloaded to the ‘TweetBot’ folder, and name it ‘Freud_Dream_Psychology.txt’.
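Project Gutenberg text files usually open and close with licence text, and your bot will happily learn from that too unless you cut it out. The sketch below is one way to do that, assuming the file contains marker lines that begin with "*** START OF" and "*** END OF" (check your own file, as the exact wording varies). The function name is made up for this example, and it is not part of markovbot.

```python
def strip_gutenberg_boilerplate(text):
    """Return only the text between the assumed START and END marker lines.

    If no markers are found, the whole text is returned unchanged
    (apart from leading and trailing whitespace).
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            # The book starts on the line after the marker
            start = i + 1
        elif line.startswith("*** END OF"):
            # The book ends on the line before the marker
            end = i
            break
    return "\n".join(lines[start:end]).strip()
```

You could run this once on the downloaded file and save the result under the same name, so the rest of the tutorial works unchanged.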

Writing your script

You should have everything now: A Twitter account with developer’s access, a Twitter library for Python, the custom markovbot library, and some data to read. Best of all: You’ve organised all of this in a folder, in precisely the way it is described above. Now, it’s finally time to start coding the actual Twitterbot!

Start your favourite code editor, and open a new Python script.
Save the script in the ‘TweetBot’ folder, for example as ‘my_first_bot.py’.
Start by importing what you need, which is the os module and the MarkovBot class from the markovbot module:

import os
from markovbot import MarkovBot

I’m assuming you are familiar enough with Python that you know what importing a library means. If you are not, it might be good to read up on that first.
The next step is to initialise your bot. The MarkovBot class requires no input arguments, so creating an instance is as simple as this:

# Initialise a MarkovBot instance
tweetbot = MarkovBot()

The next step is important! Before he can generate any text, the MarkovBot needs to read something. You can make it read your Freud example book.
The bot expects you to give him a full path to the file, so you need to construct that first:

# Get the current directory's path
dirname = os.path.dirname(os.path.abspath(__file__))
# Construct the path to the book
book = os.path.join(dirname, 'Freud_Dream_Psychology.txt')
# Make your bot read the book!
tweetbot.read(book)

At this point, your bot is clever enough to generate some text. You can try it out, by using its generate_text method. This takes one argument, and three (optional) keyword arguments (but only one of those is interesting).
generate_text’s argument is the number of words you want in your text. Let’s try 25 for now.
generate_text’s interesting keyword argument is seedword. You can use it to define one or more keywords that you would like the bot to try and start its sentence with:

my_first_text = tweetbot.generate_text(25, seedword=['dream', 'psychoanalysis'])
print("tweetbot says:")
print(my_first_text)

That’s cool, but what about the Twitter part? Remember you generated those keys and access tokens? You’ll need them now:

# ALL YOUR SECRET STUFF!
# Consumer Key (API Key)
cons_key = ''
# Consumer Secret (API Secret)
cons_secret = ''
# Access Token
access_token = ''
# Access Token Secret
access_token_secret = ''

Replace each set of empty quotes ('') with your own keys and tokens (these should also be between quotes).
First note on the codes here: It’s actually not very advisable to stick your crucial and secret information in a script. Your script can be read by humans and machines, so this is a highly unsafe procedure! It’s beyond the scope of this tutorial, but do try to find something better if you have the time.
Second note: Now that you are pasting your secret stuff into a plain script, make sure you paste it correctly! There shouldn’t be any spaces in the codes, and it’s really easy to miss a character while copying. If you run into any login error, make sure that your keys have been copied in correctly!
Time for the bot to log in to Twitter:

# Log in to Twitter
tweetbot.twitter_login(cons_key, cons_secret, access_token, access_token_secret)
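As the first note above says, keeping your keys in the script itself is unsafe. One common alternative is to store them in environment variables and read them when the script starts. The sketch below shows the idea; the variable names (TWITTER_CONS_KEY and so on) and the load_keys function are made up for this example, so pick whatever names you like.

```python
import os

# Hypothetical environment variable names, one per credential
KEY_NAMES = {
    "cons_key": "TWITTER_CONS_KEY",
    "cons_secret": "TWITTER_CONS_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_keys(environ=os.environ):
    """Read the four Twitter credentials from environment variables."""
    keys = {}
    for name, env_name in KEY_NAMES.items():
        # Strip stray spaces that sneak in while copying and pasting
        value = environ.get(env_name, "").strip()
        if not value:
            raise RuntimeError("Missing environment variable: " + env_name)
        keys[name] = value
    return keys
```

With this in place, you would call keys = load_keys() and pass keys['cons_key'], keys['cons_secret'], keys['access_token'], and keys['access_token_secret'] to twitter_login instead of the hard-coded strings.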

Only one more thing to do before you can start up the bot: You need to decide what your bot should do.
There are two different things the MarkovBot class can do. The first is to periodically post something. This is explained further down.
The second thing the MarkovBot class can do, is to monitor Twitter for specific things. You can specify a target string, which the bot will then use to track what happens on Twitter. For more information on the search string, see the Twitter API website.
The target string determines what tweets your bot will reply to, but it doesn’t determine what the bot says. For that, you need to specify keywords. These can go in a big list, which the bot will use whenever he sees a new tweet that matches the target string. Your bot will try to find any of your keywords in the tweets he reads, and will then attempt to use the keywords he found to start a reply with. For example, if your target string is ‘#MarryMeFreud’, and your keywords are [‘marriage’, ‘ring’, ‘flowers’, ‘children’], then your bot could find a tweet that reads I want your flowers and your children! #MarryMeFreud. In this case, the bot would read the tweet, find ‘flowers’ and ‘children’, and it will attempt to use those to start his reply. (Note: He won’t use both words, this is a very simple hierarchical thing, where the bot will try ‘flowers’ first, and ‘children’ if ‘flowers’ doesn’t work.)
In addition to the above, you can also have the MarkovBot add prefixes and suffixes to your tweets. This allows you to, for example, always start your tweet with a mention of someone (e.g. ‘@example’), or to always end with a hashtag you like (e.g. ‘#askFreud’).
Finally, the MarkovBot allows you to impose some boundaries on its behaviour. Specifically, it allows you to specify the maximal conversational depth at which it is still allowed to reply. If you are going to use your bot to reply to people, this is something you really should do. For example, if your bot always replies to people who mention ‘@example’, they are likely to wish to talk to Edward Xample. It’s funny to get one or two random responses, but as the conversation between people and Edward Xample continues, you really don’t want your bot to keep talking to them. For this purpose, you can set the maxconvdepth to 2. This will allow your bot to reply only in conversations with no more than two replies.

# Set some parameters for your bot
targetstring = 'MarryMeFreud'
keywords = ['marriage', 'ring', 'flowers', 'children', 'religion']
prefix = None
suffix = '#FreudSaysIDo'
maxconvdepth = None
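To make the keyword matching and the prefix and suffix handling a bit more concrete, here is a rough sketch of how such logic could look. This is not the markovbot source code; the function names are made up, and I am assuming that keywords are tried in the order in which they appear in your keyword list.

```python
import re


def find_keywords(tweet_text, keywords):
    """Return the keywords that occur in the tweet, in keyword-list order."""
    # Split the tweet into lowercase words, ignoring punctuation
    words = set(re.findall(r"\w+", tweet_text.lower()))
    return [kw for kw in keywords if kw.lower() in words]


def compose_reply(text, prefix=None, suffix=None):
    """Join an optional prefix, the generated text, and an optional suffix."""
    parts = [prefix, text, suffix]
    # Skip anything that is None or an empty string
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    tweet = "I want your flowers and your children! #MarryMeFreud"
    found = find_keywords(tweet, ["marriage", "ring", "flowers", "children"])
    print(found)
    print(compose_reply("Flowers are wishes.", suffix="#FreudSaysIDo"))
```

The first keyword in the returned list is the one to try first as a seed word; the next one is the fallback if no text can be generated from it.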

The MarkovBot’s twitter_autoreply_start method can start a Thread that will track tweets with your target string, and automatically reply to them using your chosen parameters.
If you want to stop your MarkovBot from automatically replying, you can call its twitter_autoreply_stop method.
How quickly your bot replies to tweets is highly dependent on how many tweets are posted that match your target string.

# Start auto-responding to tweets
tweetbot.twitter_autoreply_start(targetstring, keywords=keywords, prefix=prefix, suffix=suffix, maxconvdepth=maxconvdepth)

# Use the following to stop auto-responding
# (Don't do this directly after starting it, or your bot will do nothing!)
tweetbot.twitter_autoreply_stop()
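The warning about not stopping straight away makes more sense once you picture what a background thread does. The sketch below is not the markovbot code; it is a minimal, made-up class (BackgroundLoop) that runs an action repeatedly in a thread until you tell it to stop.

```python
import threading
import time


class BackgroundLoop:
    """Run an action repeatedly in a background thread until stopped."""

    def __init__(self, interval, action):
        self._interval = interval  # seconds to wait between actions
        self._action = action
        self._stop_event = threading.Event()
        self._thread = None

    def start(self):
        self._stop_event.clear()
        self._thread = threading.Thread(target=self._run)
        self._thread.daemon = True
        self._thread.start()

    def stop(self):
        # Signal the loop to end, then wait for the thread to finish
        self._stop_event.set()
        if self._thread is not None:
            self._thread.join()

    def is_running(self):
        return self._thread is not None and self._thread.is_alive()

    def _run(self):
        while not self._stop_event.is_set():
            self._action()
            # Wait for the interval, but wake up early if stop() is called
            self._stop_event.wait(self._interval)


if __name__ == "__main__":
    loop = BackgroundLoop(0.05, lambda: print("checking for tweets..."))
    loop.start()
    time.sleep(0.2)  # the main script has to keep itself busy in the meantime
    loop.stop()
```

In this sketch, calling stop right after start leaves the thread almost no time to do anything, which is why something (here a simple time.sleep) has to sit between the two calls.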

Another thing the MarkovBot can do, is to periodically post a new tweet. You can start this with the twitter_tweeting_start method.
The keywords, prefix, and suffix keywords are available for this function too. The keywords work a bit differently though: For every tweet, one of them is randomly selected. You can also pass None instead of a list of keywords, in which case your bot will just freestyle it.
One very important thing, is the timing. The twitter_tweeting_start method provides three keywords to control its timing: days, hours, and minutes. The time you specify here, is the time the bot waits between tweets. You can use all three keywords, or just a single one. If you don’t specify anything, the bot will use its default of one day.
If you want your bot to stop, you can use the twitter_tweeting_stop method.

# Start periodically tweeting
tweetbot.twitter_tweeting_start(days=0, hours=19, minutes=30, keywords=None, prefix=None, suffix='#PyGaze')

# Use the following to stop periodically tweeting
# (Don't do this directly after starting it, or your bot will do nothing!)
tweetbot.twitter_tweeting_stop()
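If you want to double-check how long a combination of days, hours, and minutes really is, you can let Python do the arithmetic. This is just a sanity-check sketch with a made-up helper name; it says nothing about how markovbot handles the timing internally.

```python
from datetime import timedelta


def interval_in_seconds(days=0, hours=0, minutes=0):
    """Convert a days/hours/minutes interval to a number of seconds."""
    return timedelta(days=days, hours=hours, minutes=minutes).total_seconds()


if __name__ == "__main__":
    # The example above: 19 hours and 30 minutes between tweets
    print(interval_in_seconds(days=0, hours=19, minutes=30))  # 70200.0
```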

Spamming and trolling

Although Twitter bots can easily be used to spam and troll people, I kindly ask you not to do it. You gain absolutely nothing by doing it, and Twitter’s API is built in such a way that it automatically blocks accounts that do too much, so you will be shut down for spamming. Nobody likes spammers and trolls, so don’t be one.

Conclusion

Creating a Twitter bot is easy! If you want to use my software to create your own, please do go ahead. It’s free and open, and this page lists the instructions on how to download it. I would love to hear about your projects, so please feel free to leave a comment with your story.

UPDATE: Join competitions with your bot!

Stefan Bohacek was kind enough to point out the existence of a monthly bot competition that he organises. Have a look at his Botwiki website to learn about this month’s theme, and how to compete.

Sigmund Freud Twitter Bot

Edwin Dalmaijer — Wed, 02 Mar 2016 17:09:32 +0000

TL;DR

Sigmund Freud is back! He returned in the form of a Twitter bot that replies when someone uses the hashtag #askFreud in their tweets. Not unlike the real Freud, Sigbot produces nonsensical but real-looking text, which it generates using a Markov chain. The bot can recognise and respond to specific keywords, and it can speak both German and English.

What does it say?

Sigbot Freud replies to every tweet that has the #askFreud hashtag. The bot can pick up on keywords that relate to psychoanalysis, and it will reply in Freud’s writing style. You can try for yourself by posting a tweet with the #askFreud hashtag, or see Sigbot’s timeline below:

As you might well have heard, Sigmund Freud (1856 – 1939) was a famous (and infamous) psychologist. He is the founder of psychoanalysis, which a lot of people consider to be at the basis of our current psychological therapy. Nowadays, you would be hard pressed to find a psychologist who actually believes in Freud’s theories, but that does not mean his legacy is worthless. Freud introduced or popularised several key psychological concepts, such as talk-based therapy, the unconscious, and childhood trauma (or regular development). In addition, he was an important voice against (theistic) religion, and overly prudish and restrictive societies. In sum, although Freud’s ideas are highly controversial, he did have an undeniable impact on philosophy and psychology.

Tweets by @SigbotFreud

How does it work?

Thanks to the efforts of Project Gutenberg, a lot of Sigmund Freud’s books are available online, for free (copyrights on his works have expired). Sigbot has read a lot of these books to learn about how Freud would phrase things. The bot was specifically interested in the superficial statistics of Freud’s texts, and asked the question: How often do words occur in each other’s vicinity?

After learning about the statistics of Freud’s writing, Sigbot uses a Markov chain to generate random text that is based on the statistics of Freud’s writing. The principle is as follows: If you feed Sigbot two words, it will check what other words in Freud’s writing are likely to follow your two words. The bot will then randomly choose a likely match (with more probable matches being selected more often). You now have three words: the two original ones, and the one generated by the bot. The bot will use the last two words (one original and one bot-generated one) to generate a fourth word. It will then go on to generate a fifth word, again based on the last two words. The cool thing is that the bot will continue to generate words until you are satisfied with the length of the produced text. (If you’re confused at this point, don’t worry: There is an example in the next paragraphs.)

Sigmund Freud and his cigar. Photo from WikiMedia.

The above theory might be a bit confusing, so here is an example. In the sentences “My mother works on Mondays” and “My mother cycles around town”, the words “My mother” are followed by “works” and “cycles”. A Markov-based bot that is learning about these sentences will notice and remember the co-occurrence of “My mother” and “cycles” / “works”. In other words, the bot learned about the statistics of the two sentences.

After the bot has learned about the sentences, you can feed it the words “My mother”. The bot will produce either “cycles” or “works”, because these were the words that co-occurred with “My mother” in the two training sentences. Let’s say the bot chooses “cycles”. It can then go on by itself, using the words “mother cycles” to generate another word. In the two sentences that the bot learned about, the only word that could follow “mother cycles” was “around”, which means that the bot can only generate that word.

The current sentence is “My mother cycles around”, and the bot can again use the last two words to generate a word that is likely to follow “cycles around”. In this case that word would be “town”, as this is the only word in the two learned sentences that follows “cycles around”.
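The “My mother” example can be written out in a few lines of Python. This is a minimal sketch of a two-word Markov chain, not the actual markovbot code; the function names are made up for this example.

```python
import random
from collections import defaultdict


def build_chain(sentences):
    """Map every pair of consecutive words to the words that followed it."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - 2):
            # Duplicates are kept on purpose: a word that follows a pair
            # more often is also more likely to be drawn later on
            chain[(words[i], words[i + 1])].append(words[i + 2])
    return chain


def generate(chain, first, second, max_words=10):
    """Extend two starting words until max_words or a dead end is reached."""
    words = [first, second]
    while len(words) < max_words:
        followers = chain.get((words[-2], words[-1]))
        if not followers:
            break
        words.append(random.choice(followers))
    return " ".join(words)


if __name__ == "__main__":
    chain = build_chain(["My mother works on Mondays",
                         "My mother cycles around town"])
    print(chain[("My", "mother")])        # ['works', 'cycles']
    print(chain[("cycles", "around")])    # ['town']
    print(generate(chain, "My", "mother"))
```

With only two training sentences, generate can do no better than reproduce one of them. The fun starts when several books share the same word pairs, and the chain can hop from one sentence into another.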

The example bot has only learned about two sentences, so it will only be able to produce a very limited amount of text. The Sigbot, on the other hand, had 6 English and 21(!) German books to learn from. This means it can produce an incredibly large number of different texts!

To make Sigbot more interactive, I calculated word frequencies in Freud’s books. The result is a very long list of words that occur very often. Some of these are obvious, such as ‘and’, ‘or’, and ‘the’, but I have filtered those out (that was the only manual labour; if you know of a way to automate it, I would love to hear about it!). What remains after filtering the boring words, is a large selection of hundreds of keywords that relate to Freud’s work. If you use these keywords in tweets with #askFreud, Sigbot will recognise them, and it will use them to generate its response to your tweets.
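One common way to automate the filtering is a stop word list: a set of boring words that you throw out before counting. The sketch below shows the idea with a tiny, made-up list; it is not what I actually ran on Freud’s books, and a real list would be much longer (and would need a German counterpart).

```python
import re
from collections import Counter

# A tiny illustrative stop word list; a real one would be far longer
STOPWORDS = {"and", "or", "the", "a", "of", "to", "in", "is"}


def top_keywords(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent words in text that are not stop words."""
    # Match runs of letters only, including letters such as German umlauts
    words = re.findall(r"[^\W\d_]+", text.lower(), re.UNICODE)
    counts = Counter(w for w in words if w not in stopwords)
    return [word for word, _ in counts.most_common(n)]
```

Calling top_keywords on the full text of a book gives you a candidate keyword list, which you can still prune by hand.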

Is Sigbot Intelligent?

After hearing about Sigbot’s efforts, you might think that it is quite a clever bot. After all, it managed to read books, it can find and remember patterns in language, and it can use its knowledge to talk. It can even reply to the specific things that people ask. However, this does not mean Sigbot is intelligent! The bot has no idea about what it does, it does not understand language, and it does not understand your questions. Everything Sigbot does, is purely probabilistic.

An xkcd comic on Twitter bots.

In Freud’s writing, sometimes words occurred more frequently in combination with other words. Sigbot simply uses those frequencies to produce text that matches the word-combination frequencies in Freud’s work. That’s it. Unlike the bot in the relevant xkcd comic above, Sigbot won’t become sentient, and it won’t try to harm you.

UPDATE (3 March 2016, 00:36): Having said all that stuff about Sigbot not becoming any cleverer, I was a bit surprised to see the tweet below. Sigbot seems to be trying to sell someone an eBook. Maybe it is becoming sentient!

@van_Vulpen Dream about the Mission of Project Gutenberg's Reflections on War and Death, by Sigmund Freud This eBook is for rest.

— Sigmund Freud (@SigbotFreud) March 2, 2016

Bilingualism

Freud wrote in German, and his books have been translated into English by others. Sigbot is completely agnostic to language, and has learned about the books of each language separately. This means he can produce both German and English responses to your questions. Twitter can automatically detect the language of your tweets, and Sigbot will use this information to select the right language.
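The language selection itself can be very simple. The sketch below assumes that a tweet arrives as a dictionary with a 'lang' field holding a language code such as 'en' or 'de', as in the tweet objects of Twitter’s API. The function name, and the choice to fall back to English for any other language, are my assumptions for this example rather than a description of Sigbot’s actual code.

```python
def pick_language(tweet, supported=("en", "de"), default="en"):
    """Choose a reply language based on a tweet's 'lang' field."""
    lang = tweet.get("lang")
    # Fall back to the default for missing or unsupported languages
    return lang if lang in supported else default


if __name__ == "__main__":
    print(pick_language({"text": "Was bedeutet mein Traum? #askFreud", "lang": "de"}))
```

The result can then be used to pick which of the two separately trained chains generates the reply.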