
Assign labels to emails in Google Mail based on their similarity to other emails assigned to the same label.


Sort your emails automatically


pygmailsorter is a Python module that automates the filtering of emails on the Google Mail service using its API. It assigns labels to emails based on their similarity to other emails assigned to the same label.

Motivation

Many people struggle with increasing email volume, which leads to hundreds of unread emails. Because even the best search engine has limited capabilities when it comes to large numbers of emails, the only way to keep an overview is to file emails into folders. Filing emails manually is tedious, yet most people are too lazy to create email filters and keep them up to date. Finally, in the age of mobile computing, when most people access their emails from a smartphone, the challenge of sorting emails is more relevant than ever.

The solution to this challenge is to automatically filter emails depending on their similarity to existing emails in a given folder. This solution has already been proposed in several research papers, ranging from the filtering of spam emails [1] to the specific case of sorting emails into folders [2]. A few open source prototypes were also available, such as [3] and [4].

pygmailsorter takes a similar approach, specific to the Google Mail API. It is a Python script that can be executed periodically, for example with a cron task, to sort emails for the user.

Installation

pygmailsorter is available from the conda-forge and PyPI repositories and can be installed using either:

conda install -c conda-forge pygmailsorter

or alternatively:

pip install pygmailsorter

After the installation, the user has to create a Google Mail API credentials file credentials.json following the Google Mail API documentation. This file is then stored in the configuration directory as ~/.pygmailsorter/credentials.json.

Configuration

pygmailsorter stores its configuration files in the user's home directory under ~/.pygmailsorter. This folder contains:

  • ~/.pygmailsorter/credentials.json: the authentication credentials for the Google API, which requires access to Gmail.
  • ~/.pygmailsorter/token_files: the token directory, which stores the active tokens for accessing the APIs. These are created automatically, so there should be no need for the user to modify them.
  • ~/.pygmailsorter/email.db: a local SQLite database that stores the emails and machine learning models to accelerate the sorting.
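
The layout listed above can be checked before the first run. The following is a minimal sketch using only the Python standard library; the helper names config_layout and missing_credentials are hypothetical and are not part of pygmailsorter.

```python
from pathlib import Path


def config_layout(config_folder="~/.pygmailsorter"):
    """Return the expected paths inside the configuration directory."""
    base = Path(config_folder).expanduser()
    return {
        "credentials": base / "credentials.json",
        "token_files": base / "token_files",
        "database": base / "email.db",
    }


def missing_credentials(config_folder="~/.pygmailsorter"):
    """Return True if credentials.json has not been placed in the directory."""
    return not config_layout(config_folder)["credentials"].is_file()


if __name__ == "__main__":
    # Print each expected path and whether it currently exists.
    for name, path in config_layout().items():
        print(f"{name}: {path} (exists: {path.exists()})")
```

Only credentials.json has to be provided by the user; the token directory and the database are expected to appear once the package has been used.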

Python interface

Import the pygmailsorter module

from pygmailsorter import Gmail

Initialize pygmailsorter

Create a gmail object from the Gmail() class:

gmail = Gmail()

For testing purposes you can use the optional client_service_file parameter to specify the location of the authentication credentials in case they are not stored in ~/.pygmailsorter/credentials.json. Alternatively, you can provide the path to the configuration directory with config_folder in case it is not located at ~/.pygmailsorter.

Sync local database with email account

To reduce the communication overhead, the emails are stored locally in an SQLite database.

gmail.update_database(quick=False)

By setting the optional flag quick to True, only new emails are downloaded, while changes to existing emails are ignored.

Generate pandas dataframe for emails

Load all emails from the local SQLite database and combine them in a pandas DataFrame for further postprocessing:

df = gmail.get_all_emails_in_database()

Download specific label from email server

Download emails with the label "MyLabel" from the email server:

df = gmail.download_emails_for_label(label="MyLabel")

In this case the emails are not stored in the local SQLite database.

Filter emails using machine learning

Assign new email labels to the emails with the label "MyLabel":

gmail.filter_messages_from_server(
    label="MyLabel",
    recommendation_ratio=0.9,
)

This functionality is based on the download_emails_for_label() function above. It checks the server for new emails with the selected label "MyLabel", then reloads the machine learning model from the local SQLite database and tries to predict the correct labels for these emails. The recommendation_ratio defines the level of certainty required to actually move an email, with 0.9 corresponding to a certainty of 90%.
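
To illustrate how a certainty threshold such as recommendation_ratio can work, here is a minimal sketch. It is not the package's actual implementation: the function labels_above_threshold, the example label names, and the choice of an inclusive comparison (>=) are assumptions made for illustration.

```python
def labels_above_threshold(probabilities, recommendation_ratio=0.9):
    """Return the labels whose predicted probability meets the threshold.

    probabilities: dict mapping a label name to a predicted probability
    between 0 and 1 (hypothetical model output).
    """
    return sorted(
        label
        for label, probability in probabilities.items()
        if probability >= recommendation_ratio
    )


if __name__ == "__main__":
    # Hypothetical predictions for a single email.
    predicted = {"Invoices": 0.95, "Newsletters": 0.40, "Travel": 0.90}
    print(labels_above_threshold(predicted, recommendation_ratio=0.9))
```

A higher ratio moves fewer emails but with more confidence, while a lower ratio moves more emails at the risk of more wrong assignments.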

Command Line interface

The command line interface is currently rather limited. It supports the following options:

  • pygmailsorter -c/--config=~/.pygmailsorter: specify the configuration directory manually.
  • pygmailsorter -u/--update: update the local email database and retrain the machine learning model.
  • pygmailsorter -l/--label=MyLabel: assign new labels to the emails with the label MyLabel.
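
For periodic execution, for example from a cron task, the options above could be wrapped in a small Python script. This is only a sketch under assumptions: the helpers build_commands and run_all are hypothetical, and it is assumed (not confirmed by this description) that --config can be combined with --update or --label in a single call.

```python
import os
import subprocess


def build_commands(label, config_folder="~/.pygmailsorter"):
    """Build two CLI invocations: update the database, then assign labels."""
    # Expand "~" here because no shell is involved when the command runs.
    config = f"--config={os.path.expanduser(config_folder)}"
    return [
        ["pygmailsorter", config, "--update"],
        ["pygmailsorter", config, f"--label={label}"],
    ]


def run_all(label, config_folder="~/.pygmailsorter"):
    """Run each command in order and stop at the first failure."""
    for command in build_commands(label, config_folder):
        subprocess.run(command, check=True)
```

Calling run_all("MyLabel") from a scheduled script would first refresh the local database and then sort the emails carrying the label MyLabel.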
