Assign labels to emails in Google Mail based on their similarity to other emails assigned to the same label.

Sort your emails automatically

gmailsorter is a Python module that automates the filtering of emails on the Google Mail service using its API. It assigns labels to emails based on their similarity to other emails assigned to the same label.

Motivation

Many people struggle with increasing email volume, which leads to hundreds of unread emails. Because even the best search engine is of limited use with large numbers of emails, the only way to keep an overview is to file emails into folders. Filing emails manually is tedious, yet most people do not create email filters or keep them up to date. Finally, in the age of mobile computing, when most people access their emails from a smartphone, the challenge of sorting emails is more relevant than ever.

The solution to this challenge is to filter emails automatically, depending on their similarity to existing emails in a given folder. This approach has already been proposed in several research papers, ranging from the filtering of spam emails [1] to the specific case of sorting emails into folders [2]. A few open source prototypes have also been available, such as [3] and [4].

gmailsorter takes a similar approach, specific to the Google Mail API. It is a Python script that can be executed periodically, for example as a cron task, to sort emails for the user.

Installation

gmailsorter is available from the conda-forge and PyPI repositories and can be installed using either:

conda install -c conda-forge gmailsorter

or alternatively:

pip install gmailsorter

Configuration

gmailsorter requires two configuration steps:

  • Create a Google Mail API credentials file credentials.json, following the Google Mail API documentation.
  • Provide access to an SQL database as a connection string. If no connection string is given, gmailsorter uses a local SQLite database named email.db located in the current directory, which corresponds to the connection string sqlite:///email.db
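The relative form sqlite:///email.db and the absolute form sqlite:////absolute/path/to/email.db used in this document differ only in the number of slashes. A minimal sketch of how both can be built, assuming POSIX paths and assuming the URL format follows the common SQLAlchemy convention. The helper name sqlite_connection_string is hypothetical and not part of gmailsorter:

```python
def sqlite_connection_string(db_path: str = "email.db") -> str:
    """Build an SQLite connection string from a POSIX path (hypothetical helper)."""
    # The prefix always carries three slashes. An absolute POSIX path starts
    # with its own slash, which yields the four-slash form.
    return "sqlite:///" + db_path


relative = sqlite_connection_string()
absolute = sqlite_connection_string("/absolute/path/to/email.db")
print(relative)
print(absolute)
```

An absolute path is the safer choice when the script is started by a scheduler, because the current directory may then differ from the one used interactively.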

Python interface

Import the Gmail class and the function load_client_secrets_file from the gmailsorter module:

from gmailsorter import Gmail, load_client_secrets_file

Initialize gmailsorter

Create a gmail object from the Gmail() class:

gmail = Gmail(
    client_config=load_client_secrets_file(
        client_secrets_file="/absolute/path/to/credentials.json"
    ),
    connection_str="sqlite:////absolute/path/to/email.db",
)

Based on the configuration from the previous section, the function load_client_secrets_file loads the credentials.json file and provides its content as a Python dictionary to the client_config parameter of the Gmail() class. In addition to client_config, the Gmail() class requires a connection to an SQL database, which is provided as connection_str. The email_download_format can be specified as either metadata or full; the primary difference is whether the content of the email is stored. Finally, the optional parameter port sets the port used to authenticate with the Google Mail API via a web browser. The default is 8080.
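The optional parameters described above can be collected and checked before they are passed to Gmail(). The following is only a sketch: the helper gmail_options is hypothetical, and the check on email_download_format reflects the two values named in this document rather than any validation gmailsorter itself performs.

```python
def gmail_options(connection_str, email_download_format, port=8080):
    """Collect keyword arguments for Gmail() (hypothetical helper)."""
    # Only the two formats mentioned in this document are accepted here.
    if email_download_format not in ("metadata", "full"):
        raise ValueError("email_download_format must be 'metadata' or 'full'")
    # Any valid TCP port can serve the browser-based authentication.
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    return {
        "connection_str": connection_str,
        "email_download_format": email_download_format,
        "port": port,
    }


options = gmail_options("sqlite:///email.db", email_download_format="full")

# With gmailsorter installed and a credentials file available, the
# dictionary could be unpacked into the constructor:
# gmail = Gmail(
#     client_config=load_client_secrets_file(
#         client_secrets_file="/absolute/path/to/credentials.json"
#     ),
#     **options,
# )
```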

Sync local database with email account

To reduce the communication overhead, the emails are stored locally in an SQLite database.

gmail.update_database(quick=False)

If the optional flag quick is set to True, only new emails are downloaded and changes to existing emails are ignored.

Generate pandas dataframe for emails

Load all emails from the local SQLite database and combine them in a pandas DataFrame for further postprocessing:

df = gmail.get_all_emails_in_database()

Download specific label from email server

Download emails with the label "MyLabel" from the email server:

df = gmail.download_emails_for_label(label="MyLabel")

In this case the emails are not stored in the local SQLite database.

Filter emails using machine learning

Assign new email labels to the emails with the label "MyLabel":

gmail.filter_messages_from_server(
    label="MyLabel",
    recommendation_ratio=0.9,
)

This functionality is based on the download_emails_for_label() function above. It checks the server for new emails with the selected label "MyLabel", then reloads the machine learning model from the local SQLite database and tries to predict the correct labels for these emails. The recommendation_ratio defines the level of certainty required to actually move an email, with 0.9 corresponding to a certainty of 90%.
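The role of recommendation_ratio can be illustrated with a small thresholding sketch. This is not gmailsorter's implementation: the function recommend_label, the example probabilities, and the choice of "greater than or equal" for the comparison are all assumptions made for illustration.

```python
def recommend_label(probabilities, recommendation_ratio=0.9):
    """Return the most likely label, or None if the model is not certain enough.

    `probabilities` maps label names to predicted probabilities (hypothetical input).
    """
    if not probabilities:
        return None
    # Pick the label with the highest predicted probability.
    label, probability = max(probabilities.items(), key=lambda item: item[1])
    # Recommend it only when the certainty reaches the threshold.
    return label if probability >= recommendation_ratio else None


confident = recommend_label({"Invoices": 0.95, "Travel": 0.05})
uncertain = recommend_label({"Invoices": 0.6, "Travel": 0.4})
print(confident, uncertain)
```

A higher ratio leaves more emails unsorted but reduces the risk of moving an email to the wrong label.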

Command Line interface

The command line interface implements the same functionality as the Python interface. It supports the following options:

  • gmailsorter -c/--credentials: path to the credentials file provided by Google, e.g. credentials.json.
  • gmailsorter -d/--database: connection string for the database, e.g. sqlite:///email.db.
  • gmailsorter -u/--update: update the local email database and retrain the machine learning model.
  • gmailsorter -l/--label=MyLabel: assign new labels to the emails with the label MyLabel.
  • gmailsorter -p/--port: port for the authentication web server, e.g. 8080.
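For periodic execution from another Python script, the options listed above can be assembled into an argument list. This is a sketch under assumptions: the helper gmailsorter_command is hypothetical, and whether several options may be combined in one invocation, or passed in space-separated form, is not confirmed by this document.

```python
import shlex


def gmailsorter_command(credentials, database, update=False, label=None, port=None):
    """Build an argument list for the gmailsorter CLI (hypothetical helper)."""
    command = ["gmailsorter", "--credentials", credentials, "--database", database]
    if update:
        # Update the local database and retrain the model.
        command.append("--update")
    if label is not None:
        # Assign new labels to the emails carrying this label.
        command.append(f"--label={label}")
    if port is not None:
        command += ["--port", str(port)]
    return command


update_cmd = gmailsorter_command("credentials.json", "sqlite:///email.db", update=True)
label_cmd = gmailsorter_command("credentials.json", "sqlite:///email.db", label="MyLabel")
print(shlex.join(update_cmd))
print(shlex.join(label_cmd))

# With gmailsorter installed, a list could be executed with:
# import subprocess
# subprocess.run(update_cmd, check=True)
```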
