Emails

Email handling is split into two parts:

  • the server part, dealing with the IMAP connection, fetching and adding mailbox folders, etc.
  • the object part, dealing with the email content per se.

Preamble¤

There are a couple of oddities with IMAP servers that you should be aware of.

Standards are for idiots who believe in them¤

Gmail and Outlook seem to take an ill-placed pleasure in ignoring IMAP standards (the RFCs), that are otherwise well supported by open-source server software like Dovecot. As such, there are several things we can’t do “the simple way” because we have to account for those discrepancies. Methods are provided in the imap_server class to get data (like the names of the default mailboxes) properly, no matter the server implementation. You are advised to always use the provided methods to get mailboxes data for your filters, because they take care of the discrepancies internally and allow you to write portable filters.

Mail account vs. mailbox¤

A mail account usually (always?) has a root, top-level, default mailbox named INBOX or Inbox, depending on the server (the name is case-sensitive). That’s where incoming emails end up. This mailbox then has subfolders, like Sent, Junk, etc., which the IMAP specification also calls mailboxes. That can be confusing, so I always refer to them here as “folders” and “subfolders”.

IMAP servers only let you grab emails from one mailbox at a time, in a non-recursive fashion. It means that we will need to iterate over the list of known folders and subfolders to fetch all emails from a mail account. This list can be found in protocols.imap_server.Server.folders.

Emails have no truly unique ID¤

The IMAP UID of an email is only the positional order of reception of the email in the current mailbox. When moving emails to another mailbox, their UID will actually change. But moving emails to another mailbox and back to their original mailbox will not give them back their original UID either, as it is an index that can only be incremented.

RFC 822 defines the Message-ID header, which is meant to be a unique identifier set when sending an email, like abcdef@mailserver.com, where abcdef is a random hash. The problem is that this ID is set at the discretion of the email sender, and spam/spoofed emails may lack one.

To circumvent this issue, the protocols.imap_object.EMail.create_hash method creates a truly unique and persistent hash, using the data available in the email, like its date, sender and Message-ID header, in order to identify emails in logs through their moves between mailboxes.
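To make the idea concrete, here is a minimal sketch of such a hash. It is not the actual create_hash implementation: the function name sketch_email_hash, the choice of SHA-256 and the exact set of headers are assumptions made for illustration only.

```python
import hashlib
from email import message_from_bytes, policy


def sketch_email_hash(raw: bytes) -> str:
    """Hypothetical stand-in for EMail.create_hash: digest of stable headers."""
    msg = message_from_bytes(raw, policy=policy.default)
    # A missing header becomes an empty string, so an email without
    # Message-ID still gets a hash from its date and sender.
    parts = [str(msg.get(name, "")) for name in ("Date", "From", "Message-ID")]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


RAW = (
    b"Date: Mon, 01 Jan 2024 10:00:00 +0000\r\n"
    b"From: Alice <alice@example.com>\r\n"
    b"Message-ID: <abcdef@mailserver.com>\r\n"
    b"Subject: Hello\r\n"
    b"\r\n"
    b"Body\r\n"
)
digest = sketch_email_hash(RAW)
```

Because the digest only depends on headers that travel with the message, it stays the same when the email is moved to another mailbox, unlike the IMAP UID.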

Unfortunately, IMAP actions still have to use the IMAP UID.

API¤

Server ¤

Server(logfile, secretary)

Bases: connectors.Server[imap_object.EMail], imaplib.IMAP4_SSL

IMAP server connector using mandatory SSL. Non-SSL connection is not implemented on purpose, because the Internet is a dangerous place and SSL only adds a little bit of safety, but it’s better than going out naked.

This class inherits from the Python standard class imaplib.IMAP4_SSL, so all its methods are available, although most of them are re-wrapped here for direct and higher-level data handling.

The connection credentials are passed by secretary.Secretary.load_connectors from the settings.ini file of the current config subfolder.

Examples:

Mandatory content of the settings.ini file to declare IMAP connection credentials:

[imap]
    user = me@server.com
    password = xyz
    server = mail.server.com
    entries = 20
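
As an illustration of how such a file maps to the attributes documented below, the following sketch reads the same content with the standard configparser module. Whether the project itself uses configparser is an assumption; the port fallback simply mirrors the documented default of 993.

```python
import configparser

# Same content as the settings.ini example, inlined for the demo.
SETTINGS = """
[imap]
    user = me@server.com
    password = xyz
    server = mail.server.com
    entries = 20
"""

parser = configparser.ConfigParser()
parser.read_string(SETTINGS)
imap = parser["imap"]

user = imap["user"]
server = imap["server"]
n_messages = imap.getint("entries")        # maps to Server.n_messages
port = imap.getint("port", fallback=993)   # not in the file: documented default
```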

Attributes¤

mailbox instance-attribute ¤

mailbox: str = ''

The currently-opened or last-opened mailbox (aka (sub)folder).

folders instance-attribute ¤

folders: list[str]

The list of all IMAP mailboxes (folders and subfolders) found on the current server. This attribute is auto-set when initializing a connection to a server. It gets refreshed when new folders are added programmatically at runtime.

inbox instance-attribute ¤

inbox: str

The case-sensitive name of the system top-level and default mailbox. Gmail and Dovecot comply with the standard and call it INBOX, but Outlook/Office365 gets creative and calls it Inbox. This attribute is properly set for the current server and should be used for portability instead of hard-coding "INBOX" in filters.

junk instance-attribute ¤

junk: str

The case-sensitive name of the server spam mailbox, typically called Junk or Spam.

trash instance-attribute ¤

trash: str

The case-sensitive name of the server trashbin mailbox.

sent instance-attribute ¤

sent: str

The case-sensitive name of the server mailbox where copies of sent emails are kept. Note that some clients store sent emails in the same folder as the email they reply to.

archive instance-attribute ¤

archive: str

The case-sensitive name of the server mailbox where old emails may be automatically archived. Not all servers use it.

drafts instance-attribute ¤

drafts: str

The case-sensitive name of the server mailbox where emails written but not yet sent may be saved.

flagged instance-attribute ¤

flagged: str

The case-sensitive name of the server mailbox where emails marked as important (having the standard flag \Flagged) may be moved or duplicated. Not all servers use it.

n_messages instance-attribute ¤

n_messages: int

Default number of emails to retrieve (starting from the most recent). Set from the entries config parameter.

server instance-attribute ¤

server: str

URL or IP of the mailserver. Set from the server config parameter.

user instance-attribute ¤

user: str

Username of the mail account on the mailserver.

password instance-attribute ¤

password: str

Password of the mail account on the mailserver.

port instance-attribute ¤

port: int = 993

Connection port on the mailserver. Defaults to 993 (IMAP SSL).

Functions¤

build_subfolder_name ¤

build_subfolder_name(path: list) -> str

Assemble a complete subfolder name using the separator of the server.

Path should be the complete list of parent folders, e.g. path = ["INBOX", "Money", "Taxes"] will be assembled as INBOX.Money.Taxes or INBOX/Money/Taxes, depending on server’s defaults.

Then, replace the INBOX marker with the actual case-sensitive inbox name. This is to deal with Outlook/Office365 discrepancies in folder names.

PARAMETER DESCRIPTION
path

the tree of parents folders

TYPE: list

RETURNS DESCRIPTION
path

IMAP-encoded UTF-8 path

TYPE: str

split_subfolder_path ¤

split_subfolder_path(folder: str) -> list[str]

Find out which separator the server uses for IMAP subfolders and split the folder path into a list of folders.

Most servers use dots, like INBOX.Money.Taxes, but Outlook/Office365 uses slashes, like INBOX/Money/Taxes.

PARAMETER DESCRIPTION
folder

IMAP folder path

TYPE: str

RETURNS DESCRIPTION
tree

list of parent folders

TYPE: list
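
The round trip between build_subfolder_name and split_subfolder_path can be sketched with two small pure functions. The names split_path and build_path, and the way the separator and inbox name are passed as arguments, are hypothetical: the real methods read them from the connected server.

```python
def split_path(folder: str, separators: str = "./") -> list[str]:
    """Split an IMAP folder path on whichever known separator it contains."""
    for sep in separators:
        if sep in folder:
            return folder.split(sep)
    return [folder]  # top-level folder, nothing to split


def build_path(path: list[str], separator: str, inbox: str) -> str:
    """Join parent folders and swap a leading INBOX marker for the real name."""
    parts = list(path)
    if parts and parts[0].upper() == "INBOX":
        parts[0] = inbox
    return separator.join(parts)


tree = split_path("INBOX.Money.Taxes")
outlook_name = build_path(tree, separator="/", inbox="Inbox")
dovecot_name = build_path(tree, separator=".", inbox="INBOX")
```

This naive split would misbehave on a slash-separated server whose folder names contain dots, which is one more reason to rely on the provided methods in real filters.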

encode_imap_folder ¤

encode_imap_folder(folder: str) -> bytes

Ensure the subfolders are properly separated using the actual server separator (. or /) and encode the names in IMAP-custom UTF-7, taking care of non-latin characters and enquoting strings containing whitespaces. The result is ready for direct use in IMAP server commands.

This function takes fully-formed IMAP mailbox folder paths, like INBOX.Money.Taxes or INBOX/Money/Taxes, and will replace the subfolder separators with the server separator. The main INBOX will also be replaced by the proper, case-sensitive, inbox name for the current server.

PARAMETER DESCRIPTION
folder

IMAP folder as Python string

TYPE: str

RETURNS DESCRIPTION
folder

IMAP folder as IMAP-custom UTF-7 bytes.

TYPE: bytes
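
For reference, the IMAP-custom ("modified") UTF-7 encoding itself can be sketched as follows, based on the rules in RFC 3501: printable ASCII is kept as-is, & becomes &-, and anything else is encoded as base64 of UTF-16BE with , instead of / and no padding. The function name encode_folder_sketch is hypothetical and the separator and inbox handling described above are left out.

```python
import base64


def encode_folder_sketch(name: str) -> bytes:
    """Encode a folder name to IMAP modified UTF-7, quoting it if it has spaces."""
    out = bytearray()
    pending = []  # run of characters waiting to be base64-encoded

    def flush() -> None:
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).rstrip(b"=").replace(b"/", b",")
            out.extend(b"&" + b64 + b"-")
            pending.clear()

    for char in name:
        if 0x20 <= ord(char) <= 0x7E:  # printable ASCII stays readable
            flush()
            out.extend(b"&-" if char == "&" else char.encode("ascii"))
        else:
            pending.append(char)
    flush()

    encoded = bytes(out)
    # Names containing whitespace must be quoted in IMAP commands.
    return b'"' + encoded + b'"' if b" " in encoded else encoded
```

For example, the German drafts folder INBOX.Entwürfe becomes INBOX.Entw&APw-rfe.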

get_imap_folders ¤

get_imap_folders()

List all inbox subfolders as plain text, to be reused by filter definitions. Update the Server.folders list.

get_email ¤

get_email(uid: str, mailbox=None) -> imap_object.EMail | None

Get an arbitrary email by its UID. If no mailbox is specified, use the one defined when we got objects in self.set_objects().

If a mailbox is specified, we select it temporarily and we restore the original mailbox used to get objects. If no mailbox is selected, we use the previously-selected one, typically in Server.get_objects.

PARAMETER DESCRIPTION
uid

the unique ID of the email in the mailbox. Be aware that this ID is unique only in the scope of one mailbox (aka IMAP (sub)folder) because it is defined as the positional order of reception of each email in the mailbox.

TYPE: str

RETURNS DESCRIPTION
message

the email object.

TYPE: imap_object.EMail

get_objects ¤

get_objects(mailbox: str, n_messages=-1)

Get the n last emails in a mailbox. Update the Server.objects list (inherited from connectors.Server).

Processed email get logged with the number

PARAMETER DESCRIPTION
mailbox

the full path of the mailbox. It will be sanitized for folder/subfolder separator and actual INBOX name internally.

TYPE: str

n_messages

number of messages to fetch, starting with the most recent. If -1, the preference set in settings.ini will be used. Any other value will set it temporarily.

TYPE: int DEFAULT: -1
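
The "n most recent" selection can be pictured with the kind of data imaplib returns for a UID SEARCH command, which is a list holding one space-separated bytes string of UIDs in ascending order. The helper last_uids below is hypothetical and only shows the slicing, not how get_objects is actually implemented.

```python
def last_uids(search_data: list[bytes], n: int) -> list[str]:
    """Return the n highest UIDs, most recent first, from a SEARCH response."""
    if not search_data or not search_data[0]:
        return []  # empty mailbox
    uids = search_data[0].decode("ascii").split()
    return uids[-n:][::-1] if n > 0 else []


# Shape of the data part of: typ, data = connection.uid("SEARCH", None, "ALL")
data = [b"1 2 3 5 8"]
recent = last_uids(data, 2)
```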

run_filters ¤

run_filters(filter, action, runs=1)

Run the function filter and execute the function action if the filtering condition is met

PARAMETER DESCRIPTION
filter

function checking arbitrary conditions on emails, returning True if the action should be performed. It will get an EMail object as argument.

TYPE: callable

action

function performing the actual action. It will get an EMail object as argument.

TYPE: callable

runs

how many times a filter should run at most on each email. -1 means no limit.

TYPE: int DEFAULT: 1
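
A filter/action pair is just two callables taking an email. The sketch below uses a stand-in FakeEMail class and a toy loop so that it runs without a server; both are hypothetical and much simpler than the real EMail and Server.run_filters, which also honour runs and expunge deleted emails.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEMail:
    """Hypothetical stand-in for EMail, for illustration only."""
    subject: str
    tags: list = field(default_factory=list)

    def tag(self, keyword: str) -> None:
        self.tags.append(keyword)


def invoice_filter(email) -> bool:
    """Filter: return True when the action should be performed."""
    return "invoice" in email.subject.lower()


def invoice_action(email) -> None:
    """Action: only called on emails accepted by the filter."""
    email.tag("accounting")


def toy_run_filters(emails, filter, action) -> None:
    """Toy loop, not the real Server.run_filters."""
    for email in emails:
        if filter(email):
            action(email)


emails = [FakeEMail("Invoice 2024-01"), FakeEMail("Lunch on Friday?")]
toy_run_filters(emails, invoice_filter, invoice_action)
```

With a live connection, the same two functions would be passed as server.run_filters(invoice_filter, invoice_action).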

close_connection ¤

close_connection()

High-level method to logout from a server

create_folder ¤

create_folder(folder: str)

Create an IMAP (sub)folder recursively if needed (create the parent(s) if missing, then create the child).

Calls Server.create at each recursion level, so IMAP folder names are fully sanitized.
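
The "parents first, then the child" order can be illustrated by listing the successive paths that would need to exist. The helper parent_chain is hypothetical and assumes a dot separator by default; it only computes names and creates nothing.

```python
def parent_chain(folder: str, separator: str = ".") -> list[str]:
    """List every folder to create, from the top-level parent to the child."""
    parts = folder.split(separator)
    return [separator.join(parts[: i + 1]) for i in range(len(parts))]


to_create = parent_chain("INBOX.Money.Taxes")
```

Each entry of the list would then be created in order, skipping those that already exist on the server.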

create ¤

create(mailbox: str)

Create a new mailbox. This is a wrapper over imaplib.IMAP4.create method, where mailbox is directly encoded to IMAP-custom UTF-7 format through Server.encode_imap_folder.

The direct use of this function is discouraged, see Server.create_folder for a nicer method. Namely, this will error if trying to create a subfolder of a non-existing parent folder, and does not update Server.folders.

append ¤

append(mailbox: str, flags: str, email: email.message.EmailMessage) -> str

Add an arbitrary email to the specified mailbox. The email doesn’t need to have been actually sent, it can be generated programmatically or be a copy of a sent email.

PARAMETER DESCRIPTION
mailbox

the name of the target mailbox (folder or subfolder).

TYPE: str

flags

IMAP flags, standard (like '(\Seen)' to mark as read, or '(\Flagged)' to mark as important) or custom (can be any string not starting with \).

TYPE: str

email

a proper EmailMessage object with initialized content, ready to send.

TYPE: email.message.EmailMessage

RETURNS DESCRIPTION
str

status (str)
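
Building a message suitable for append only requires the standard library. In this sketch the addresses, the subject and the target folder INBOX.Reminders are made up, and the final call is left as a comment because it needs a live connection.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "me@server.com"
msg["To"] = "me@server.com"
msg["Subject"] = "Reminder generated by a filter"
msg.set_content("This message was never sent, only stored on the server.")

# Raw string so the backslash of the standard IMAP flag is kept literally.
flags = r"(\Seen)"

# With a connected Server instance, the call would look like this
# (not executed here, folder name is hypothetical):
# status = server.append("INBOX.Reminders", flags, msg)
```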

EMail ¤

EMail(raw_message: list, server)

Bases: connectors.Content

Attributes¤

urls instance-attribute ¤

urls = []

list[tuple[str]]: List of URLs found in email body.

ips instance-attribute ¤

ips = []

list[str]: List of IPs found in the server delivery route (in Received headers)

domains instance-attribute ¤

domains = []

list[str]: List of domains found in the server delivery route (in Received headers)

server instance-attribute ¤

server: 'Server'

(Server): back-reference to the Server instance from which the current email is extracted.

msg instance-attribute ¤

msg: email.message.EmailMessage = email.message_from_bytes(
    self.raw, policy=policy.default
)

Standard Python email object

Functions¤

has_header ¤

has_header(header: str) -> bool

Check if the case-insensitive header exists in the email headers.

PARAMETER DESCRIPTION
header

the RFC 822 email header.

TYPE: str

RETURNS DESCRIPTION
bool

presence of the header

get_sender ¤

get_sender() -> list[list, list]

Get the full list of senders of the email, using the From header, splitting their name (if any) apart from their address.

RETURNS DESCRIPTION
list[list, list]

list[0] contains the list of names, rarely used, list[1] is the list of email addresses.
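
The shape of the returned value can be reproduced with email.utils.getaddresses from the standard library. This is only a sketch of the output format, under the assumption that names and addresses are kept in matching order; the sample addresses are made up.

```python
from email import message_from_bytes, policy
from email.utils import getaddresses

RAW = b"From: Alice Doe <alice@example.com>, bob@example.org\r\n\r\nHi\r\n"
msg = message_from_bytes(RAW, policy=policy.default)

# getaddresses returns (name, address) pairs; the name is "" when absent.
pairs = getaddresses(msg.get_all("From", []))
names = [name for name, _ in pairs]
addresses = [address for _, address in pairs]
sender = [names, addresses]
```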

parse_urls ¤

parse_urls(input: str) -> list[tuple]

Update self.urls with a list of all URLs found in input, split as (domain, page) tuples.

Examples:

Each result in the list is a tuple (domain, page), for example :

  • google.com/index.php is broken into ('google.com', '/index.php')
  • google.com/ is broken into ('google.com', '/')
  • google.com/login.php?id=xxx is broken into ('google.com', '/login.php')
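
The splitting rule shown in these examples can be approximated with a regular expression. The pattern below is an assumption, not the one used by parse_urls: it is deliberately naive (it would also match the domain of an email address), and falling back to "/" when a URL has no path is a choice made for this sketch.

```python
import re

# Optional scheme, then a dotted domain, then an optional path that stops
# at whitespace, query string (?) or fragment (#).
URL_PATTERN = re.compile(
    r"(?:https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,})(/[^\s?#\"'<>]*)?",
    re.IGNORECASE,
)


def parse_urls_sketch(text: str) -> list[tuple[str, str]]:
    """Return (domain, page) tuples for every URL-looking string in text."""
    return [
        (match.group(1), match.group(2) or "/")
        for match in URL_PATTERN.finditer(text)
    ]


found = parse_urls_sketch("Login at https://google.com/login.php?id=xxx now")
```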

get_body ¤

get_body(preferencelist=('related', 'html', 'plain')) -> str

Get the body of the email.

PARAMETER DESCRIPTION
preferencelist

sequence of candidate properties in which to pick the email body, by order of priority. If set to "plain", return either the plain-text variant of the email if any, or build one by removing (x)HTML markup from the HTML variant if no plain-text variant is available.

TYPE: tuple | str DEFAULT: ('related', 'html', 'plain')

Note

Emails using quoted-printable transfer encoding but not UTF-8 charset are not handled. This weird combination has been met only in spam messages written in Russian, so far, and should not affect legit emails.
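
The preferencelist argument mirrors the one of the standard email.message.EmailMessage.get_body method, which can be tried on its own. Note the difference: the standard method returns a message part, while the method documented here returns a string. The sample message is made up.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Demo"
msg.set_content("Plain-text variant")
msg.add_alternative("<p>HTML variant</p>", subtype="html")

# Default order of preference: the HTML variant wins over plain text.
html_part = msg.get_body(preferencelist=("related", "html", "plain"))
# Restricting the candidates to plain text picks the other variant.
plain_part = msg.get_body(preferencelist=("plain",))

html_body = html_part.get_content().strip()
plain_body = plain_part.get_content().strip()
```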

is_in ¤

is_in(
    query_list: list[str] | str,
    field: str,
    case_sensitive: bool = False,
    mode: str = "any",
) -> bool

Check if any or all of the elements in the query_list are in the email field.

PARAMETER DESCRIPTION
query_list

list of keywords or unique keyword to find in field.

TYPE: list[str] | str

field

any RFC 822 header or "body".

TYPE: str

case_sensitive

True if the search should be case-sensitive. This has no effect if field is an RFC 822 header, it only applies to the email body.

TYPE: bool DEFAULT: False

mode

"any" if any element in query_list should be found in field to return True. "all" if all elements in query_list should be found in field to return True.

TYPE: str DEFAULT: 'any'

RETURNS DESCRIPTION
bool

True if any or all elements (depending on mode) of query_list have been found in field.
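
The any/all logic can be sketched as a pure function. Unlike the real method, is_in_sketch is hypothetical and takes the text of the field directly instead of a field name, so it runs without an email object.

```python
def is_in_sketch(query_list, text: str, case_sensitive: bool = False,
                 mode: str = "any") -> bool:
    """Check if any or all keywords of query_list are found in text."""
    if isinstance(query_list, str):
        query_list = [query_list]  # accept a single keyword
    if not case_sensitive:
        text = text.lower()
        query_list = [query.lower() for query in query_list]
    hits = (query in text for query in query_list)
    if mode == "any":
        return any(hits)
    if mode == "all":
        return all(hits)
    raise ValueError(f"unknown mode: {mode!r}")


body = "Your Invoice for January is attached."
has_any = is_in_sketch(["invoice", "receipt"], body)
has_all = is_in_sketch(["invoice", "receipt"], body, mode="all")
```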

tag ¤

tag(keyword: str)

Add any arbitrary IMAP tag (aka label), standard or not, to the current email.

Warning

In Mozilla Thunderbird, labels/tags need to be configured first in the preferences (by mapping the label string to a color) to properly appear in the GUI. Otherwise, any undefined tag will be identified as “Important” (associated with red), no matter its actual string.

Horde, Roundcube and Nextcloud mail (based on Horde) treat those properly.

untag ¤

untag(keyword: str)

Remove any arbitrary IMAP tag (aka label), standard or not, from the current email.

delete ¤

delete()

Delete the current email directly without using the trash bin. It will not be recoverable.

Use EMail.move to move the email to the trash folder to get a last chance at reviewing what will be deleted.

Note

As per the IMAP standard, this only adds the \Deleted flag to the current email. Emails will be actually deleted when the expunge server command is launched, which is done automatically at the end of Server.run_filters.

spam ¤

spam(spam_folder='INBOX.spam')

Mark the current email as spam, adding Mozilla Thunderbird Junk flag, and move it to the spam/junk folder.

move ¤

move(folder: str)

Move the current email to the target folder, that will be created recursively if it does not exist. folder will be internally encoded to IMAP-custom UTF-7 with Server.encode_imap_folder.

mark_as_important ¤

mark_as_important(mode: str)

Flag or unflag an email as important

PARAMETER DESCRIPTION
mode

add to add the \Flagged IMAP tag to the current email, remove to remove it.

TYPE: str

mark_as_read ¤

mark_as_read(mode: str)

Flag or unflag an email as read (seen).

PARAMETER DESCRIPTION
mode

add to add the \Seen IMAP tag to the current email, remove to remove it.

TYPE: str

mark_as_answered ¤

mark_as_answered(mode: str)

Flag or unflag an email as answered.

PARAMETER DESCRIPTION
mode

add to add the \Answered IMAP tag to the current email, remove to remove it.

TYPE: str

Note

If you answer programmatically, you need to manually pass the Message-ID of the original email to the In-Reply-To and References headers of the answer to get threaded messages. In-Reply-To gets only the immediate previous email, References gets the whole thread.
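
Setting the threading headers by hand could look like the following sketch, built with the standard EmailMessage class. The message IDs and subjects are made up, and how the reply is then sent or stored is out of scope here.

```python
from email.message import EmailMessage

# Stand-in for the email being answered.
original = EmailMessage()
original["Message-ID"] = "<abcdef@mailserver.com>"
original["References"] = "<000001@mailserver.com>"
original["Subject"] = "Question"

reply = EmailMessage()
reply["Subject"] = "Re: " + str(original["Subject"])
# In-Reply-To: only the email immediately answered.
reply["In-Reply-To"] = str(original["Message-ID"])
# References: the whole thread so far, plus the email being answered.
previous = str(original.get("References", ""))
reply["References"] = (previous + " " + str(original["Message-ID"])).strip()
reply.set_content("Answer")
```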

is_read ¤

is_read() -> bool

Check if this email has been opened and read.

is_unread ¤

is_unread() -> bool

Check if this email has not yet been opened and read.

is_recent ¤

is_recent() -> bool

Check if this session is the first one to get this email. It doesn’t mean the user has read it.

Note

This flag cannot be set by the client, only by the server. It’s read-only app-wise.

is_draft ¤

is_draft() -> bool

Check if this email is marked as draft.

is_answered ¤

is_answered() -> bool

Check if this email has been answered.

is_important ¤

is_important() -> bool

Check if this email has been flagged as important.

is_mailing_list ¤

is_mailing_list() -> bool

Check if this email has the typical mailing-list headers.

Warning

The headers checked for hints here are not standard and not systematically used.

is_newsletter ¤

is_newsletter() -> bool

Check if this email has the typical newsletter headers.

Warning

The headers checked for hints here are not standard and not systematically used.

spf_pass ¤

spf_pass() -> int

Check if any of the servers listed in the Received email headers is authorized by the DNS SPF rules to send emails on behalf of the email address set in Return-Path.

RETURNS DESCRIPTION
score
  • = 0: neutral result, no explicit success or fail, or server configuration could not be retrieved/interpreted.
  • > 0: success, server is explicitly authorized or SPF rules are deliberately permissive.
  • < 0: fail, server is unauthorized.
  • = 2: explicit success, server is authorized.
  • = -2: explicit fail, server is forbidden, the email is a deliberate spoofing attempt.

TYPE: int

Note

The Return-Path header is set by any proper mail client to the mailbox collecting bounces (notice of undelivered emails), and, while it is optional, the RFC 4408 states that it is the one from which the SPF domain will be inferred. In practice, it is missing only in certain spam messages, so its absence is treated as an explicit fail.

Warning

Emails older than 6 months will at least get a score of 0 and will therefore never fail the SPF check. This is because DNS configuration may have changed since the email was sent, and it could have been valid at the time of sending.

dkim_pass ¤

dkim_pass() -> int

Check the authenticity of the DKIM signature.

Note

The DKIM signature uses an asymmetric key scheme, where the private key is set on the SMTP server and the public key is set in DNS records of the mailserver. The signature is a cryptographic hash of the email headers (not their content). A valid signature means the private key used to hash headers matches the public key in the DNS records AND the headers have not been tampered with since sending.

RETURNS DESCRIPTION
score
  • = 0: there is no DKIM signature.
  • = 1: the DKIM signature is valid but outdated. This means the public key in DNS records has been updated since the email was sent.
  • = 2: the DKIM signature is valid and up-to-date.
  • = -2: the DKIM signature is invalid. Either the headers have been tampered or the DKIM signature is entirely forged (happens a lot in spam emails).

TYPE: int

Warning

Emails older than 6 months will at least get a score of 0 and will therefore never fail the DKIM check. This is because DNS configuration (public key) may have changed since the email was sent, and it could have been valid at the time of sending.

arc_pass ¤

arc_pass() -> int

Check the authenticity of the ARC signature.

Note

The ARC signature is still experimental and not widely used. When an email is forwarded, by a user or through a mailing list, its DKIM signature will be invalidated and the email will appear forged/tampered. ARC authenticates the intermediate servers and aims at solving this issue.

RETURNS DESCRIPTION
score
  • = 0: there is no ARC signature,
  • = 2: the ARC signature is valid
  • = -2: the ARC signature is invalid. Typically, it means the signature has been forged.

TYPE: int

authenticity_score ¤

authenticity_score() -> int

Compute the score of authenticity of the email, summing the results of EMail.spf_pass, EMail.dkim_pass and EMail.arc_pass. The weighting is designed such that one valid check compensates one fail.

RETURNS DESCRIPTION
score
  • == 0: neutral, no explicit authentication is defined on DNS or no rule could be found
  • > 0: explicitly authenticated by at least one method,
  • == 6: maximal authenticity (valid SPF, DKIM and ARC)
  • < 0: spoofed, at least one of SPF or DKIM or ARC failed and

TYPE: int

is_authentic ¤

is_authentic() -> bool

Helper function for EMail.authenticity_score, checking if at least one authentication method succeeded.

RETURNS DESCRIPTION
bool

True if EMail.authenticity_score returns a score greater than or equal to zero.
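
The arithmetic behind the two methods can be shown with plain integers. The function names below are hypothetical and take the three individual scores as arguments, whereas the real methods compute them from the email.

```python
def authenticity_score_sketch(spf: int, dkim: int, arc: int) -> int:
    """Sum of the three individual scores, as described above."""
    return spf + dkim + arc


def is_authentic_sketch(score: int) -> bool:
    """True when the summed score is greater than or equal to zero."""
    return score >= 0


best = authenticity_score_sketch(spf=2, dkim=2, arc=2)        # all valid
balanced = authenticity_score_sketch(spf=2, dkim=-2, arc=0)   # one pass, one fail
spoofed = authenticity_score_sketch(spf=-2, dkim=0, arc=0)    # one fail only
```

The balanced case shows the weighting at work: one explicit success cancels one explicit fail, so the email still counts as authentic.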

age ¤

age() -> timedelta

Compute the age of an email at the time of evaluation

RETURNS DESCRIPTION
timedelta

time difference between current time and sending time of the email
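
Such a time difference can be computed from a Date header with the standard library. That the sending time is read from the Date header is an assumption; the helper age_sketch takes the current time as an argument so the result is reproducible.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def age_sketch(date_header: str, now: datetime) -> timedelta:
    """Time elapsed between the Date header and the given current time."""
    sent = parsedate_to_datetime(date_header)  # timezone-aware datetime
    return now - sent


now = datetime(2024, 1, 2, 10, 0, 0, tzinfo=timezone.utc)
age = age_sketch("Mon, 01 Jan 2024 10:00:00 +0000", now)
is_older_than_six_hours = age > timedelta(hours=6)
```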

now ¤

now() -> str

Helper to get access to date/time from within the email object when writing filters

query_referenced_emails ¤

query_referenced_emails() -> list[EMail]

Fetch the list of all emails referenced in the present message, aka the whole email thread in which the current email belongs.

The list is sorted from newest to oldest. Queries emails having a Message-ID header matching the ones contained in the References header of the current email.

RETURNS DESCRIPTION
list[EMail]

All emails referenced.

query_replied_email ¤

query_replied_email() -> EMail

Fetch the email being replied to by the current email.

RETURNS DESCRIPTION
EMail

The email being replied to.