
Create markov chain ("_ebooks") accounts on Twitter

The audience for this library is those with at least basic Python experience. Before you set this up, you’ll need:

  • A Twitter account

  • A Twitter application (register at dev.twitter.com) with authentication keys for the account

  • A corpus for the bot to learn from, which can be a text file or a Twitter archive. Several thousand lines are needed to get decent results; with fewer than 100 or so, it won’t work at all.
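To see why corpus size matters, here is a minimal sketch of a word-level Markov chain in plain Python. It is an illustration only, not the model twitter_markov actually uses; the function names build_chain and generate are made up for this example. With only a handful of lines, most words have a single possible successor, so the output mostly repeats the input.

```python
import random
from collections import defaultdict


def build_chain(lines):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    starts = []
    for line in lines:
        words = line.split()
        if not words:
            continue
        starts.append(words[0])
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return starts, chain


def generate(starts, chain, max_len=140, rng=random):
    """Walk the chain from a random starting word until it dead-ends
    or the next word would push the text past max_len characters."""
    word = rng.choice(starts)
    text = word
    while chain.get(word):
        word = rng.choice(chain[word])
        if len(text) + 1 + len(word) > max_len:
            break
        text += " " + word
    return text


# Hypothetical three-line corpus, far too small for real use.
sample = ["I ate a sandwich", "I ate a pickle", "a sandwich was tasty"]
starts, chain = build_chain(sample)
print(generate(starts, chain, rng=random.Random(0)))
```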

Install

Run pip install twitter_markov, or download/clone the package and run python setup.py install. Feel free to use a virtualenv, if you’re into that.

Corpus Pocus

“Corpus” is just a fancy-schmancy word for “a bunch of text”. twittermarkov expects a corpus that’s a text file with one tweet per line.
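If your text comes from somewhere other than a Twitter archive, a file in this shape is easy to produce yourself. The sketch below assumes you already have the tweets as a list of strings; write_corpus is a hypothetical helper, not part of twitter_markov. It flattens line breaks inside a tweet so that each tweet stays on one line.

```python
import os
import tempfile


def write_corpus(tweets, path):
    """Write one tweet per line, flattening any line breaks inside a tweet."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for tweet in tweets:
            line = " ".join(tweet.split())
            if line:  # skip tweets that are empty after cleanup
                f.write(line + "\n")
                count += 1
    return count


# Usage: write to a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.txt")
    written = write_corpus(["I ate a sandwich", "Was it\ntasty?", "   "], path)
    with open(path, encoding="utf-8") as f:
        corpus_lines = f.read().splitlines()
```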

The twittermarkov corpus command will create such a file from a Twitter archive, with options to ignore replies or retweets, and to filter out mentions, urls, media, and/or hashtags.

When reading an archive, these arguments use the tweet’s metadata to precisely strip the offending content. This may not work well for tweets posted before 2011 or so. For text files or older tweets, a regular expression search is used.

# Usage is twittermarkov corpus archive output
# This creates the file corpus.txt
twittermarkov corpus twitter/archive/path corpus.txt

twittermarkov corpus --no-retweets --no-replies twitter/archive/path corpus-no-replies.txt
# Tweets like this will be ignored:
# RT @sample I ate a sandwich

# Tweets like this will be read in without the @sample:
# @sample Was it tasty?

All the filtering options:

  • --no-retweets - skip retweets

  • --no-replies - filter out replies (keeps the tweet, just removes the starting username)

  • --no-mentions - filter out mentions

  • --no-urls - filter out urls

  • --no-media - filter out media

  • --no-hashtags - filter out hashtags
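For plain text files, where no tweet metadata is available, the filters could be approximated with regular expressions along these lines. This is a sketch under assumptions: the patterns and the filter_tweet function are illustrative and are not the expressions twittermarkov itself uses. Media filtering is left out because it cannot be detected reliably from text alone.

```python
import re

# Illustrative patterns only; real tweets have more edge cases.
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#\w+")


def filter_tweet(text, no_retweets=False, no_replies=False,
                 no_mentions=False, no_urls=False, no_hashtags=False):
    """Return the filtered text, or None if the tweet should be skipped."""
    if no_retweets and text.startswith("RT @"):
        return None
    if no_replies:
        # Keep the tweet but drop the username(s) it starts with.
        text = re.sub(r"^(@\w+\s*)+", "", text)
    for enabled, pattern in ((no_mentions, MENTION),
                             (no_urls, URL),
                             (no_hashtags, HASHTAG)):
        if enabled:
            text = pattern.sub("", text)
    # Collapse the whitespace left behind by removed tokens.
    return " ".join(text.split())
```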

If you’re using a Twitter archive, the archive argument should be the tweets.csv file found in the archive folder (which usually has a long name like 16853453_3f21d17c73166ef3c77d7994c880dd93a8159c88).

Config

See the bots.yaml file for a full list of settings. Plug your settings in and save the file as bots.yaml to your home directory or ~/bots. You can also use JSON, if that’s your thing.

At a minimum, your config file will need to look like this:

apps:
    example_app_name:
        consumer_key: ...
        consumer_secret: ...

users:
    example_screen_name:

        key: ...
        secret: ...

        app: example_app_name

        # If you want your bot to continue to learn, include this
        parent: your_screen_name

See dev.twitter.com for details on obtaining authentication tokens.
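Before the first run, it can help to confirm that the minimum settings are present. The sketch below works on a config that has already been loaded into a dictionary (for example with json.load, if you chose JSON). The missing_settings function is a hypothetical helper, not part of twitter_markov, and it only checks the keys shown in the minimal example above.

```python
def missing_settings(config, screen_name):
    """List the minimum settings that are absent for one bot account."""
    user = (config.get("users") or {}).get(screen_name)
    if user is None:
        return ["users.%s" % screen_name]

    missing = []
    for key in ("key", "secret", "app"):
        if not user.get(key):
            missing.append("users.%s.%s" % (screen_name, key))

    app_name = user.get("app")
    if app_name:
        app = (config.get("apps") or {}).get(app_name)
        if app is None:
            missing.append("apps.%s" % app_name)
        else:
            for key in ("consumer_key", "consumer_secret"):
                if not app.get(key):
                    missing.append("apps.%s.%s" % (app_name, key))
    return missing


# Placeholder values; substitute your real tokens.
example_config = {
    "apps": {"example_app_name": {"consumer_key": "ck", "consumer_secret": "cs"}},
    "users": {"example_screen_name": {"key": "k", "secret": "s",
                                      "app": "example_app_name"}},
}
print(missing_settings(example_config, "example_screen_name"))
```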

First Tweet

Once a corpus is set up, the twittermarkov tweet command will send tweets out. By default, the command will learn recent tweets from the parent account and send one tweet.

Learning won’t happen if twittermarkov can’t find its previous tweets, which might happen if there are problems with the Twitter API or if your _ebooks account has never tweeted.

Since learning depends on the _ebooks account having an existing tweet, send a first tweet with the --no-learn flag.

twittermarkov tweet --no-learn example_screen_name

To have your bot reply to mentions, use:

twittermarkov tweet --reply example_screen_name

Automating

On a *nix system, set up a cron job like so:

0 10-20 * * * twittermarkov tweet example_screen_name
15,45 10-20 * * * twittermarkov tweet --reply example_screen_name
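To check what those two cron lines add up to, the following sketch expands the minute and hour fields into concrete run times. It handles only the field syntax used above ('*', ranges, and comma lists), not full cron syntax, and the function names are made up for this example.

```python
def matches(field, value):
    """Check one cron field limited to '*', 'a-b' ranges, and 'a,b' lists."""
    if field == "*":
        return True
    for part in field.split(","):
        if "-" in part:
            low, high = part.split("-")
            if int(low) <= value <= int(high):
                return True
        elif int(part) == value:
            return True
    return False


def daily_runs(minute_field, hour_field):
    """Return every (hour, minute) in a day at which the job would run."""
    return [(h, m) for h in range(24) for m in range(60)
            if matches(hour_field, h) and matches(minute_field, m)]


tweets = daily_runs("0", "10-20")        # first cron line
replies = daily_runs("15,45", "10-20")   # second cron line
print(len(tweets), tweets[0], tweets[-1])
print(len(replies), replies[0], replies[-1])
```

With this schedule the bot tweets once an hour from 10:00 to 20:00 (11 tweets a day) and checks for mentions to reply to twice an hour over the same window (22 runs a day).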

API

If you want to write a script to expand on twitter_markov, you’ll find a fairly simple set of tools.

class twitter_markov.TwitterMarkov(screen_name, corpus=None, config=None, api=None)

  • screen_name - Twitter user account

  • corpus - Path to a corpus file, or a list of paths. If omitted, twitter_markov looks in its config for corpus and/or corpora entries.

  • config - A dictionary of configuration settings. By default, twitter_markov will try to read this from the bots.yaml file (see above).

  • api - A tweepy-like API object. In the twitter_markov class, this is a twitter_bot_utils.API object.

  • blacklist - A list of words to avoid generating

The first corpus in the found corpora (or in the config file) will be the default. When using the class with more than one corpus, you can specify a corpus with the model keyword argument using the basename of the given file, e.g. “special.txt” for the corpus stored at “dir/special.txt”.
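The naming rule can be pictured with a short sketch. This is an illustration of the basename convention described above, not the library's internal code; index_corpora is a hypothetical helper.

```python
import os


def index_corpora(paths):
    """Map each corpus basename to its path; the first path is the default."""
    if isinstance(paths, str):
        paths = [paths]
    models = {os.path.basename(p): p for p in paths}
    default = os.path.basename(paths[0]) if paths else None
    return models, default


models, default = index_corpora(["dir/special.txt", "other/general.txt"])
print(default)           # name to pass as model= for the first corpus
print(sorted(models))    # all names usable as model=
```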

Properties:

  • recently_tweeted - A list of the 20 (or config['checkback']) most recent tweets from self.screen_name.

Methods:

  • check_tweet(text): Check if a string contains blacklisted words or is similar to a recent tweet.

  • reply(status, model=None): Compose a reply to the given tweepy.Status.

  • reply_all(model=None): Reply to all mentions since the last time self.screen_name sent a reply tweet.

  • compose(model=None, max_len=140): Returns a string generated from “model” (or the default model).

  • tweet(model=None): Post a tweet composed by “model” (or the default model).

  • learn_parent(corpus=None, model=None): Add recent tweets from the parent account (since the last time self.screen_name tweeted) to the corpus. This is subject to the filters described in bots.yaml.
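As a rough picture of the kind of test check_tweet describes, here is a standalone sketch. It makes several assumptions: it returns True when the text is safe to post, it uses difflib from the standard library to measure similarity, and the 0.8 threshold is arbitrary. The library's actual return value, similarity measure, and threshold may differ, and check_text is a made-up name.

```python
from difflib import SequenceMatcher


def check_text(text, blacklist=(), recent=(), threshold=0.8):
    """Return True if text has no blacklisted word and is not too close
    to any recent tweet (both rules are assumptions for illustration)."""
    # Compare whole words, ignoring case and surrounding punctuation.
    words = {w.strip(".,!?;:\"'").lower() for w in text.split()}
    if any(bad.lower() in words for bad in blacklist):
        return False
    for old in recent:
        ratio = SequenceMatcher(None, text.lower(), old.lower()).ratio()
        if ratio >= threshold:
            return False
    return True
```

A script could call a check like this in a loop, composing new text until one candidate passes.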

Example

This assumes a corpus file (corpus.txt) and config file (config.yaml).

from twitter_markov import TwitterMarkov

tm = TwitterMarkov('example_screen_name', 'corpus.txt', config_file='config.yaml')
tweet = tm.compose()

# do something more with tweet, or use the Tweepy API in a different way
tm.api.update_status(tweet)
