Provides central depot for incoming mail for use by applications.

=================
repoze.postoffice
=================

`repoze.postoffice` provides a centralized depot for collecting incoming email
for consumption by multiple applications. Incoming mail is sorted into queues
according to rules with the expectation that each application will then consume
its own queue. Each queue is a first-in-first-out (FIFO) queue, so messages
are processed in the order received.

ZODB is used for storage and also provides the client interface.
`repoze.postoffice` clients create a ZODB connection and manipulate models.
This makes consuming the message queue in the context of a transaction
relatively simple.

Setting up the depot
====================

`repoze.postoffice` assumes that a message transport agent (MTA), such as
Postfix, has been configured to deliver messages to a folder using the Maildir
format. Configuring the MTA is outside of the scope of this document.

Configuration File
++++++++++++++++++

The depot is configured via a configuration file in ini format. The ini file
consists of a single 'post office' section followed by one or more named
queue sections. The 'post office' section contains information about the ZODB
setup as well as the location of the incoming Maildir::

    [post office]
    # Required parameters
    zodb_uri = zconfig://%(here)s/zodb.conf#main
    maildir = %(here)s/incoming/Maildir

    # Optional parameters
    zodb_path = /postoffice
    ooo_loop_frequency = 60 # 1 Hertz
    ooo_loop_headers = To,Subject
    ooo_throttle_period = 300 # 5 minutes
    max_message_size = 500m

`zodb_uri` is interpreted using `repoze.zodbconn` and follows the format laid
out there. See: http://docs.repoze.org/zodbconn/narr.html

`zodb_path` is the path in the db to the postoffice queues. This parameter
is optional and defaults to '/postoffice'.

`maildir` is the path to the incoming Maildir format folder from which messages
are pulled.

`ooo_loop_frequency` specifies the threshold frequency of incoming messages
from the same user to the same queue, in messages per minute. When the
threshold is reached by a particular user, messages from that user will be
marked as rejected for a period of time in an attempt to break a possible out
of office auto-reply loop. If not specified, no check is performed on the
frequency of incoming messages.

`ooo_loop_headers` optionally causes loop detection to use the specified email
headers as discriminators. If specified, these headers must match for incoming
messages to trigger the ooo throttle. If not specified, no header matching is
done, and messages need only be sent from the same user to the same queue to
trigger the throttle.

`ooo_throttle_period` specifies the amount of time, in minutes, for which a
user's incoming mail will be marked as rejected if loop detection is in use
and the user reaches the `ooo_loop_frequency` threshold. Defaults to 5
minutes. If `ooo_loop_frequency` is not set, this setting has no effect.

`max_message_size` sets the maximum size, in bytes, of incoming messages.
Messages which exceed this limit will have their payloads discarded and will
be marked as rejected. The suffixes 'k', 'm' or 'g' may be used to specify
that the number of bytes is expressed in kilobytes, megabytes or gigabytes,
respectively. A number without suffix will be interpreted as bytes. If not
set, no limit will be imposed on incoming message size.
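
The suffix handling described above can be sketched as follows. This is only
an illustration, not the parser used by `repoze.postoffice`: the function name
`parse_size` is hypothetical, and treating 'k', 'm' and 'g' as powers of 1024
is an assumption that the text above does not confirm.

```python
# Hypothetical helper; not part of repoze.postoffice.
# Assumption: 'k', 'm' and 'g' are powers of 1024.
_MULTIPLIERS = {'k': 1024, 'm': 1024 ** 2, 'g': 1024 ** 3}


def parse_size(value):
    """Convert a string such as '500m' into a number of bytes."""
    value = value.strip().lower()
    if value and value[-1] in _MULTIPLIERS:
        return int(value[:-1]) * _MULTIPLIERS[value[-1]]
    # A number without a suffix is interpreted as bytes.
    return int(value)
```

Under these assumptions, the `500m` used in the example configuration would
correspond to 500 * 1024 * 1024 bytes.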

Each message queue is configured in a section with the prefix 'queue:'::

    [queue:Customer A]
    filters =
        to_hostname: app.customera.com app.aliasa.com

    [queue:Customer B]
    filters =
        to_hostname: .customerb.com

Filters
+++++++

Filters are used to determine which messages land in which queues. When a new
message enters the system, each queue is tried in the order specified in the
ini file until a match is found or until all of the queues have been tried.
For each queue, every filter for that queue is processed. To match a queue, a
message must match all of that queue's filters.
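
The matching order described above can be sketched roughly as follows. The
name `find_queue` and the representation of queues as `(name, filters)` pairs
with plain callables as filters are assumptions made for illustration. They
are not the data structures `repoze.postoffice` uses internally.

```python
def find_queue(message, queues):
    """Return the name of the first queue whose filters all match.

    `queues` is an ordered list of (name, filters) pairs, in the order
    the queue sections would appear in the ini file. Each filter is a
    callable which takes the message and returns True or False.
    """
    for name, filters in queues:
        # A message must match every filter of a queue to land in it.
        if all(matches(message) for matches in filters):
            return name
    return None  # no queue matched
```

Because the first matching queue wins, the order of the queue sections in the
ini file matters when more than one queue could accept the same message.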

At the time of writing, the following filters are implemented:

+ `to_hostname`: This filter matches the hostname of the email address in the
  'To' or 'CC' headers of the message. Hostnames which begin with a period will
  match any hostname that ends with the specified name, i.e. '.example.com'
  matches 'example.com' and 'app.example.com'. If the hostname does not begin
  with a period, it must match exactly. Multiple hostnames, delimited by
  whitespace, may be listed. If multiple hostnames are used, an incoming message
  need match only one.

+ `header_regexp`: This filter allows the matching of arbitrary regular
  expressions against the headers of a message. Only a single regular
  expression can be specified. An example::

    [queue:Parties]
    filters =
        header_regexp: Subject:.+[Pp]arty.+

+ `header_regexp_file`: This filter is the same as `header_regexp` except that
  multiple regular expressions can be written in a file. Regular expressions are
  newline delimited in the file. The argument to this filter is the path to the
  file::

    [queue:Weddings]
    filters =
        header_regexp_file: %(here)s/wedding_invitation_header_checks.txt

+ `body_regexp`: Like `header_regexp` except the regular expression must match
  some text in one of the message part bodies.

+ `body_regexp_file`: Like `header_regexp_file` except the regular expressions
  must match some text in one of the message part bodies.
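
The hostname rules given for `to_hostname` above can be sketched as follows.
The function name `hostname_matches` is hypothetical and this is not the
filter's actual implementation. Comparing hostnames case-insensitively is an
assumption that the description above does not state.

```python
def hostname_matches(hostname, patterns):
    """Return True if `hostname` matches any whitespace-delimited pattern.

    Hypothetical helper, not part of repoze.postoffice.
    """
    hostname = hostname.lower()  # assumption: case-insensitive comparison
    for pattern in patterns.lower().split():
        if pattern.startswith('.'):
            # '.example.com' matches 'example.com' and 'app.example.com'.
            if hostname == pattern[1:] or hostname.endswith(pattern):
                return True
        elif hostname == pattern:
            # Without a leading period the match must be exact.
            return True
    return False
```

Keeping the leading period in the `endswith` check means that a pattern such
as '.example.com' does not match an unrelated host like 'badexample.com'.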

Global Reject Filters
+++++++++++++++++++++

In addition to defining filters for queues, filters can be defined globally
for rejection of messages before they can be assigned to queues. Any filter
that can be used for a queue can be used here. The difference is that when a
queue filter matches, the message goes into the queue, whereas when a global
reject filter matches, the message is rejected::

    [post office]
    reject_filters =
        header_regexp_file: reject_headers.txt
        body_regexp_file: reject_body.txt
        to_hostname: *.partycentral.com # We need to get them to change their MX

Populating Queues
=================

Queues are populated using the `postoffice` console script that is provided
when the `repoze.postoffice` egg is installed. This script reads messages from
the incoming maildir and imports them into the ZODB-based depot. Messages are
matched and placed in appropriate queues. Messages which do not match any
queues are erased. The script takes no required arguments: if it can
find its .ini file, it will work::

    $ bin/postoffice

The `postoffice` script will search for an ini file named 'postoffice.ini'
first in the current directory, then in an 'etc' folder in the current
directory, then an 'etc' folder that is a sibling of the 'bin' folder which
contains the `postoffice` script and then, finally, in '/etc'. You can also
use a non-standard location for the ini file by passing the path as an
argument to the script::

    $ bin/postoffice -C path/to/config.ini

Use the '-h' or '--help' switch to see all of the options available.
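
The search order for 'postoffice.ini' described above can be sketched as
follows. The function names `candidate_ini_paths` and `find_ini` are
hypothetical. This only illustrates the documented order and is not the code
the `postoffice` script runs.

```python
import os


def candidate_ini_paths(cwd, script_path, name='postoffice.ini'):
    """List the locations searched for the ini file, in order.

    Hypothetical helper, not part of repoze.postoffice.
    """
    bin_dir = os.path.dirname(os.path.abspath(script_path))
    return [
        os.path.join(cwd, name),                            # current directory
        os.path.join(cwd, 'etc', name),                     # ./etc
        os.path.join(os.path.dirname(bin_dir), 'etc', name),  # sibling of bin
        os.path.join('/etc', name),                         # system-wide
    ]


def find_ini(cwd, script_path):
    """Return the first candidate path which exists, or None."""
    for path in candidate_ini_paths(cwd, script_path):
        if os.path.exists(path):
            return path
    return None
```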

Out of Office Loop Detection
============================

`repoze.postoffice` does attempt to address out of office loops. An out of
office loop can occur when `repoze.postoffice` is used to populate content in
an application which generates an email to alert users of the new content.
Essentially, a poorly behaved email client will respond to the new content
alert email with an out of office reply which in turn causes more content to
be created and another alert email to be sent. Without some form of loop
detection, this can lead to a large amount of junk content being generated
very quickly.

When a new email enters the system, `repoze.postoffice` first checks for some
headers that could be set by well behaved MTAs to indicate automated
responses and marks as rejected any messages which match these known
heuristics. First, the non-standard, but widely supported, 'Precedence' header
is checked and messages with a precedence of 'bulk', 'junk', or 'list' are
marked as rejected. Next, `repoze.postoffice` checks for the presence of the
'Auto-Submitted' header, which is described in RFC 3834 and is standard, but
not yet widely supported. Messages containing this header are also marked as
rejected. In either of these two cases, the incoming message is marked by
adding the header::

    X-Postoffice-Rejected: Auto-response
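
The two header checks described above can be sketched with the standard
library `email` package. The function names below are hypothetical and this
is not the library's own code. Following the description above, the mere
presence of 'Auto-Submitted' is treated as a match, whatever its value.

```python
from email import message_from_string


def auto_response_reason(message):
    """Return 'Auto-response' if the message looks automated, else None.

    Hypothetical helper, not part of repoze.postoffice.
    """
    precedence = (message.get('Precedence') or '').strip().lower()
    if precedence in ('bulk', 'junk', 'list'):
        return 'Auto-response'
    # As described above, only the presence of the header is checked.
    if message.get('Auto-Submitted') is not None:
        return 'Auto-response'
    return None


def mark_if_auto_response(message):
    """Add the 'X-Postoffice-Rejected' header when a check matches."""
    reason = auto_response_reason(message)
    if reason is not None:
        message['X-Postoffice-Rejected'] = reason
    return message
```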

Out of office messages sent by certain clients (Microsoft) will typically not
use either of the above standards to indicate an automated reply. As a last
line of defense, `repoze.postoffice` also tracks the frequency of incoming
mail by email address and, optionally, other headers specified by the
'ooo_loop_headers' configuration option. When the number of messages arriving
from the same user surpasses a particular, presumably inhuman, threshold, a
temporary block is placed on messages from that user, such that all messages
from that user are marked as rejected for a certain period of time, hopefully
breaking the auto-reply feedback loop. Messages which trigger or fall under a
throttle are marked with the header::

    X-Postoffice-Rejected: Throttled
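
The general idea of frequency-based throttling can be sketched as below. This
is not the algorithm `repoze.postoffice` uses. A simple sliding one-minute
window is substituted here for illustration. The class name `Throttle`, the
use of seconds for all times, and blocking only once the count exceeds the
threshold are all assumptions.

```python
import collections


class Throttle(object):
    """Block senders who exceed a messages-per-minute threshold.

    Hypothetical sketch using a sliding window; not repoze.postoffice code.
    """

    def __init__(self, max_per_minute, throttle_seconds):
        self.max_per_minute = max_per_minute
        self.throttle_seconds = throttle_seconds
        self._recent = collections.defaultdict(collections.deque)
        self._blocked_until = {}

    def is_throttled(self, key, now):
        """Record a message for `key` at time `now` (in seconds).

        `key` identifies the sender, e.g. a tuple of the sender address,
        the queue name and any discriminator header values. Returns True
        if the message should be marked as rejected.
        """
        if now < self._blocked_until.get(key, 0):
            return True  # still inside the throttle period
        recent = self._recent[key]
        recent.append(now)
        # Keep only the timestamps from the last 60 seconds.
        while recent and recent[0] <= now - 60:
            recent.popleft()
        if len(recent) > self.max_per_minute:
            self._blocked_until[key] = now + self.throttle_seconds
            recent.clear()
            return True
        return False
```

Passing the current time in as an argument, rather than reading the clock
inside the method, keeps the sketch easy to exercise with fixed timestamps.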

Messages marked with the 'X-Postoffice-Rejected' header are still conveyed to
the client. It is up to the client to check for this header and take
appropriate action, such as bouncing with a particular bounce message.

Message Size Limit
==================

If 'max_message_size' is specified in the configuration, messages which exceed
this size will have their payloads (body and any attachments) discarded and
will be marked with the header::

    X-Postoffice-Rejected: Maximum Message Size Exceeded

The trimmed message is still conveyed to the client, which should check for
the 'X-Postoffice-Rejected' header and take appropriate action, possibly
including bouncing the message with an appropriate bounce message.
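
On the client side, checking for the 'X-Postoffice-Rejected' header might look
like the following sketch. It assumes that messages behave like standard
library `email` message objects. The function names and the `accept` and
`reject` callbacks are hypothetical.

```python
from email import message_from_string


def rejection_reason(message):
    """Return the value of 'X-Postoffice-Rejected', or None if absent."""
    return message.get('X-Postoffice-Rejected')


def handle(message, accept, reject):
    """Dispatch a message to one of two hypothetical callbacks.

    `accept(message)` handles normal mail; `reject(message, reason)`
    decides what to do with rejected mail, e.g. send a bounce.
    """
    reason = rejection_reason(message)
    if reason is None:
        return accept(message)
    return reject(message, reason)
```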

Consuming Queues
================

Client applications consume message queues by establishing a connection to the
ZODB which houses the depot and interacting with queue and message objects.
`repoze.postoffice.queue` contains a helper, `open_queue`, which, given
connection information, can open the connection for you and return a Queue
instance::

    from my.example import process_message
    from my.example import validate_message
    from repoze.postoffice.queue import open_queue
    import sys
    import transaction

    # Placeholder path: %(here)s is only interpolated inside ini files.
    ZODB_URI = 'zconfig:///path/to/zodb.conf#main'
    queue_name = 'my queue'
    queue = open_queue(ZODB_URI, queue_name, path='/postoffice')
    while queue:
        message = queue.pop_next()
        if not validate_message(message):
            # Bounce invalid messages and move on to the next one.
            queue.bounce(message, 'Message is invalid.')
            transaction.commit()
            continue
        try:
            process_message(message)
            transaction.commit()
        except Exception:
            # Roll back partial work, then set the message aside.
            transaction.abort()
            queue.quarantine(message, sys.exc_info())
            transaction.commit()


0.21 (2012-09-05)
-----------------

- Make regexp filters case sensitive.

0.20 (2012-05-14)
-----------------

- Only close databases we create ourselves in open_queue.

0.19 (2012-05-10)
-----------------

- Allow calling application to provide an already open database to open_queue.
(LP #985546)

0.18 (2012-04-19)
-----------------

- Added graceful degradation when provided with an unknown character encoding in
a MIME text part.

0.17 (2011-09-26)
-----------------

- Added a header, 'X-Postoffice-Date', to queued messages. It records
the time each message was received (as seconds since the epoch). It
is set to the modified time of the maildir message file.

0.16 (2011-06-30)
-----------------

- Added better fault tolerance for insane date headers generated by spambots.
(LP #697033)

0.15 (2011-06-15)
-----------------

- Body checks are now multiline regexp checks. (LP #787573)

0.14 (2011-05-17)
-----------------

- Fixed problem where the zodb_uri could be unicode, which eventually breaks
the ZEO client for ZEO uris. zodb_uri is now converted to a UTF-8 string.

0.13 (2011-05-05)
-----------------

- Fixed problem with header filters not working properly with non-ASCII
characters in headers. (LP #777455)

- Fixed bug in regular expression body filter which improperly parsed the
'Content-Type' header in order to extract the character set of a MIME part.

0.12 (2011-04-25)
-----------------

- Respect leading and trailing whitespace in rules files.

0.11 (2011-04-25)
-----------------

- When a message is rejected by a filter, a message is logged showing which
filter triggered the rejection.

0.10 (2011-04-20)
-----------------

- Improved logging output now includes a timestamp.

- Worked around (probable) bug in stdlib email parser where a message part
might have a charset set in the 'Content-Type' header, but
message.get_charset() returns None.

0.9 (2011-04-15)
----------------

- Added greater fault tolerance for malformed email addresses. (Shakes fist at
spammers.)

- Added four new filter types based on regular expression matching:
`header_regexp`, `header_regexp_file`, `body_regexp`, `body_regexp_file`.
See README.txt for information on how to use these new filters.

- Added a new option to the global configuration: `reject_filters`. This allows
you to set up filters at a global level for rejecting certain messages. See
README.txt for more information.

0.8 (2011-01-14)
----------------

- The 'to_hostname' filter now parses multiple email addresses and checks the
'Cc' header as well as the 'To' header. (LP #659243)

- If multiple incoming messages in a 24 hour period have the same Message-Id,
they are presumed to be duplicates and all but the first are discarded.
(LP #659243)

0.7 (2010-09-15)
----------------

- Fixed another case where non-RFC 2047 compliant headers could cause an
exception to be raised. (LP #637484)

0.6 (2010-09-13)
----------------

- Added Queue.requeue_quarantined_messages() convenience method to API.

- Allow for multiple hosts in 'to_hostname' filter. (LP #614528)

- Added graceful degradation for non-RFC 2047 compliant headers, in order to
avoid crashing when spambots send us malformed messages. (LP #637484)

0.5 (2010-08-03)
----------------

- Added 'X-Postoffice: Bounced' header to outgoing bounce and quarantine
messages. The presence of this header is checked when importing messages and
any messages which contain it are discarded. This is to prevent possible
ricochets of bounce messages back into the system. (LP #612587)

- Incoming messages with a 'From' header which matches exactly its 'To' header
are now discarded as probable spam. (LP #612588)

0.4 (2010-07-30)
----------------

- Fixed bug in processing body of bounce messages when non-ascii unicode
characters are present.

0.3 (2010-07-20)
----------------

- Fixed divide by zero error when calculating instantaneous message frequency.

- Fixed bug in repoze.postoffice.queue.open_queue where a ZEO connection would
be left open if there was a KeyError on the queue name.

0.2 (2010-06-29)
----------------

- Fixed bug in parsing headers with no values.

- Added ability to use arbitrary message headers as discriminator values in
out of office loop detection.

- When messages exceed maximum message size, are throttled or are found to be
an auto-response, they are no longer discarded. Instead these messages get
an 'X-Postoffice-Rejected' header added where the value gives the reason for
rejection. These messages are then consumable by clients in the normal way.
It is up to the client to detect the 'X-Postoffice-Rejected' header and take
appropriate action. This change was made to allow the client to determine
what, if any, sort of bounce message should be generated if any of these
conditions are true.

0.1 (2010-06-03)
----------------

- Initial Release.
