Script to store the tweets of a list of users in a database for NLP processing.

Tweet Archiveur

This project aims at storing tweets in a database, but you can also use it without a database.

  • Input : Twitter user IDs in a CSV file
  • Output : a database of tweets and hashtags
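
As a rough illustration of the input side, the sketch below reads account IDs from a CSV file using only the standard library. The `twitter_id` column name matches the attribute used in the usage example of this README; the `name` column, the sample rows and the `read_twitter_ids` helper are hypothetical and are not part of the package.

```python
import csv
import io


def read_twitter_ids(csv_file, column="twitter_id"):
    """Return the account IDs found in `column`, skipping blank cells."""
    reader = csv.DictReader(csv_file)
    return [
        (row.get(column) or "").strip()
        for row in reader
        if (row.get(column) or "").strip()
    ]


# Hypothetical sample data standing in for a real CSV file on disk.
sample = io.StringIO("name,twitter_id\nAlice,1001\nBob,1002\nCarol,\n")
ids = read_twitter_ids(sample)
print(ids)
```

With a real file, the same helper could be called on `open(path, newline="")` instead of the in-memory sample.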

Our goal is to store the tweets of all members of the French Parliament to get an idea of the trending topics.

But you could use the project for other purposes, with other people.

How to install the package

TODO : push it to PyPI. Once published, install it with :

pip install tweet-archiveur

How to use the package in your project

There are two classes :

  • A Scrapper() to use the Twitter API
  • A Database() to store tweets and hashtags

from tweet_archiveur.scrapper import Scrapper
from tweet_archiveur.database import Database

# Force some variables when running outside Docker
from os import environ
environ["DATABASE_PORT"] = '8479'
environ["DATABASE_HOST"] = 'localhost'
environ["DATABASE_USER"] = 'tweet_archiveur_user'
environ["DATABASE_PASS"] = '1234leximpact'
environ["DATABASE_NAME"] = 'tweet_archiveur'

scrapper = Scrapper()
df_users = scrapper.get_users_accounts('../tests/sample-users.csv')
users_id = df_users.twitter_id.tolist()
database = Database()
database.create_tables_if_not_exist()
database.insert_twitter_users(df_users)
scrapper.get_all_tweet_and_store_them(database, users_id[0:2])
del database
del scrapper
2021-03-22 10:21:59,837 -  tweet-archiveur INFO     Scrapper ready
2021-03-22 10:21:59,841 -  tweet-archiveur INFO     Loading database module...
2021-03-22 10:21:59,842 -  tweet-archiveur DEBUG    DEBUG : connect(user=tweet_archiveur_user, password=XXXX, host=localhost, port=8479, database=tweet_archiveur, url=None)
2021-03-22 10:22:03,915 -  tweet-archiveur INFO     Done scrapping, we got 400 tweets from 2 tweetos.
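
The `DATABASE_*` variables above are the only configuration shown for the database connection. As a minimal sketch of how such variables could be assembled into a PostgreSQL connection URL, consider the following. The `build_database_url` helper and the fallback defaults are assumptions made for illustration; this README does not confirm how `Database()` builds its connection internally.

```python
from os import environ
from urllib.parse import quote


def build_database_url(env=environ):
    """Assemble a PostgreSQL URL from the DATABASE_* variables (hypothetical helper)."""
    # Percent-encode credentials so characters such as '@' or ':' stay valid in a URL.
    user = quote(env["DATABASE_USER"], safe="")
    password = quote(env["DATABASE_PASS"], safe="")
    # Assumed defaults; the package may use different ones or none at all.
    host = env.get("DATABASE_HOST", "localhost")
    port = env.get("DATABASE_PORT", "5432")
    name = env["DATABASE_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


example_env = {
    "DATABASE_PORT": "8479",
    "DATABASE_HOST": "localhost",
    "DATABASE_USER": "tweet_archiveur_user",
    "DATABASE_PASS": "1234leximpact",
    "DATABASE_NAME": "tweet_archiveur",
}
url = build_database_url(example_env)
print(url)
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the process environment.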

How we use it

We get the tweets of the 577 members of the French Parliament every 8 hours and store them in a PostgreSQL database.

We then explore them with Apache Superset.

How we deploy it

Prepare the environment :

git clone https://github.com/leximpact/tweet-archiveur.git
cd tweet-archiveur
cp docker/docker.env .env

Edit the .env to your needs.

Run the application :

docker-compose up -d

To view what's going on :

docker logs tweet-archiveur_tweet_archiveur_1 -f

The script archiveur.py uses the package to get the parliament accounts from https://github.com/regardscitoyens/twitter-parlementaires

The parameters are read from a .env file.

It is launched by the entrypoint.sh script every 8 hours.
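
The 8-hour cycle itself is handled by entrypoint.sh, whose content is not shown here. Purely as an illustration of the idea, a similar loop could be sketched in Python as follows. The `run_periodically` function and its parameters are hypothetical and are not taken from the project.

```python
import time

EIGHT_HOURS = 8 * 60 * 60  # interval in seconds


def run_periodically(job, interval_seconds=EIGHT_HOURS, max_runs=None, sleep=time.sleep):
    """Call `job`, wait `interval_seconds`, and repeat.

    `max_runs=None` loops forever; a number stops after that many runs.
    `sleep` is injectable so the loop can be exercised without really waiting.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        # Only wait if another run is going to follow.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs


# Dry run: record the calls and the requested waits instead of sleeping.
calls = []
waits = []
completed = run_periodically(lambda: calls.append("archive"), max_runs=3, sleep=waits.append)
print(completed, calls, waits)
```

In a container, a shell loop or a cron job is usually simpler; the Python version mainly makes the schedule easy to test.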

To stop it :

docker-compose down

The data is kept in a Docker volume. To delete it as well :

docker-compose down -v

What to do with it ?

  • Most used hashtag (per period, per person)
  • Most/least active users
  • Timeline of
  • NLP Topic detection
  • Word cloud
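
The first idea in the list, the most used hashtags overall and per person, can be sketched with the standard library alone. The `(user, hashtag)` row shape and the sample values below are assumptions for illustration; the actual database schema is not described in this README.

```python
from collections import Counter, defaultdict


def most_used_hashtags(rows, top_n=3):
    """Return the `top_n` most frequent hashtags, ignoring case."""
    counts = Counter(tag.lower() for _user, tag in rows)
    return counts.most_common(top_n)


def most_used_hashtag_per_user(rows):
    """Return a dict mapping each user to their single most frequent hashtag."""
    per_user = defaultdict(Counter)
    for user, tag in rows:
        per_user[user][tag.lower()] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}


# Hypothetical rows, as they might come out of a query on the hashtag table.
rows = [
    ("user_a", "Budget"),
    ("user_a", "budget"),
    ("user_b", "Climat"),
    ("user_b", "budget"),
    ("user_a", "Climat"),
    ("user_c", "Sante"),
]
top = most_used_hashtags(rows, top_n=2)
by_user = most_used_hashtag_per_user(rows)
print(top)
print(by_user["user_a"])
```

Filtering the rows by date before counting would give the per-period variant.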

Annexes

Exit codes :

  • 1 : Unknown error when storing tweets
  • 2 : Unknown error getting tweets
  • 3 : Failed more than 3 consecutive times
  • 4 : no env

If one step fails, no tweets are saved.

status code = 429 : a 429 'Too Many Requests' error is returned when you exceed the maximum number of requests allowed by the API.
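
To illustrate how rate limiting and exit code 3 could fit together, here is a minimal sketch of a retry loop that waits after each 429-style failure and exits with code 3 once more than 3 consecutive attempts have failed. The `TooManyRequests` exception, the `fetch_with_retry` function and the waiting times are hypothetical; the package's real error handling may differ.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the API."""


MAX_CONSECUTIVE_FAILURES = 3
EXIT_TOO_MANY_FAILURES = 3  # matches exit code 3 in the list above


def fetch_with_retry(fetch, wait_seconds=60, sleep=time.sleep):
    """Call `fetch` until it succeeds, waiting longer after each failure.

    Exits the process with code 3 once more than 3 consecutive calls have failed.
    """
    failures = 0
    while True:
        try:
            return fetch()
        except TooManyRequests:
            failures += 1
            if failures > MAX_CONSECUTIVE_FAILURES:
                raise SystemExit(EXIT_TOO_MANY_FAILURES)
            # Linear backoff: 60 s, 120 s, 180 s with the default setting.
            sleep(wait_seconds * failures)


# Dry run: fail twice, then succeed, recording waits instead of sleeping.
attempts = {"count": 0}
waits = []


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise TooManyRequests()
    return ["tweet"]


result = fetch_with_retry(flaky_fetch, sleep=waits.append)
print(result, waits)
```

Injecting `sleep` keeps the example fast to run; in production the default `time.sleep` would apply the real delay.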
