Emails
Email handling is split into two parts:
- the server part, dealing with the IMAP connection, fetching/adding mailbox folders, etc.
- the object part, dealing with the email content per se.
Preamble¤
There are a couple of oddities with IMAP servers that you should be aware of.
Standards are for idiots who believe in them¤
Gmail and Outlook seem to take an ill-placed pleasure in ignoring the IMAP standards (the RFCs), which are otherwise well supported by open-source server software like Dovecot. As such, there are several things we can’t do “the simple way” because we have to account for those discrepancies. Methods are provided in the imap_server class to get data (like the names of the default mailboxes) properly, no matter the server implementation. You are advised to always use the provided methods to get mailbox data for your filters, because they take care of the discrepancies internally and allow you to write portable filters.
Mail account vs. mailbox¤
A mail account usually (always?) has a root, main, default, top-level mailbox named INBOX or Inbox, depending on the server (the name is case-sensitive). That’s where incoming emails end up. This mailbox then has subfolders, like Sent, Junk, etc., which the IMAP specification also calls mailboxes. That can be confusing, so I always refer to them here as “folders” and “subfolders”.
IMAP servers only let you grab emails from one mailbox at a time, in a non-recursive fashion. It means that we will need to iterate over the list of known folders and subfolders to fetch all emails from a mail account. This list can be found in protocols.imap_server.Server.folders.
Emails have no truly unique ID¤
The IMAP UID of an email is only the positional order of reception of the email in the current mailbox. When moving emails to another mailbox, their UID will actually change. But moving emails to another mailbox and back to their original mailbox will not give them back their original UID either, as it is an index that can only be incremented.
RFC 822 defines the Message-ID header, which is indeed a unique identifier set when sending an email, like abcdef@mailserver.com, where abcdef is a random hash. The problem is that this ID is set at the discretion of the email sender, and spam/spoofed emails may lack one.
To circumvent this issue, the protocols.imap_object.EMail.create_hash method creates a truly unique and persistent hash, using the data available in the email, like its date, sender and Message-ID header, in order to identify emails in logs through their moves between mailboxes.
Unfortunately, IMAP actions still have to use the IMAP UID.
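The idea of a persistent hash can be sketched as follows. This is only an illustration, not the actual create_hash implementation: the function name `stable_email_hash`, the choice of SHA-256 and the exact set of fields are assumptions made for the example.

```python
import hashlib
from email.message import EmailMessage


def stable_email_hash(msg: EmailMessage) -> str:
    """Hypothetical helper: hash headers that do not change when an email moves."""
    # A missing header (e.g. no Message-ID in some spam) becomes an empty string.
    parts = [str(msg.get(name, "")) for name in ("Date", "From", "Message-ID")]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


msg = EmailMessage()
msg["Date"] = "Mon, 01 Jan 2024 10:00:00 +0000"
msg["From"] = "alice@example.com"
msg["Message-ID"] = "<abcdef@mailserver.com>"
print(stable_email_hash(msg))
```

Because only header data goes into the hash, the value stays the same whatever mailbox the email sits in, unlike the IMAP UID.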
API¤
Server ¤
Bases: connectors.Server[imap_object.EMail], imaplib.IMAP4_SSL
IMAP server connector using mandatory SSL. Non-SSL connection is not implemented on purpose, because the Internet is a dangerous place and SSL only adds a little bit of safety, but it’s better than going out naked.
This class inherits from the Python standard class imaplib.IMAP4_SSL, so all its methods are available, although most of them are re-wrapped here for direct and higher-level data handling.
The connection credentials are passed by secretary.Secretary.load_connectors from the settings.ini file of the current config subfolder.
Examples:
Mandatory content of the settings.ini file to declare IMAP connection credentials:
Attributes¤
mailbox
instance-attribute
¤
The currently-opened or last-opened mailbox (aka (sub)folder).
folders
instance-attribute
¤
The list of all IMAP mailboxes (folders and subfolders) found on the current server. This attribute is auto-set when initializing a connection to a server. It gets refreshed when new folders are added programmatically at runtime.
inbox
instance-attribute
¤
The case-sensitive name of the system top-level and default mailbox. Gmail and Dovecot comply with the standard and call it INBOX, but Outlook/Office365 gets creative and calls it Inbox. This attribute is properly set for the current server and should be used for portability instead of hard-coding "INBOX" in filters.
junk
instance-attribute
¤
The case-sensitive name of the server spam mailbox, typically called Junk or Spam.
sent
instance-attribute
¤
The case-sensitive name of the server mailbox where copies of sent emails are kept. Note that some clients store sent emails in the same folder as the email they reply to.
archive
instance-attribute
¤
The case-sensitive name of the server mailbox where old emails may be automatically archived. Not all servers use it.
drafts
instance-attribute
¤
The case-sensitive name of the server mailbox where emails written but not yet sent may be saved.
flagged
instance-attribute
¤
The case-sensitive name of the server mailbox where emails marked as important (having the standard flag \Flagged) may be moved or duplicated. Not all servers use it.
n_messages
instance-attribute
¤
Default number of emails to retrieve (starting from the most recent). Set from the entries config parameter.
server
instance-attribute
¤
URL or IP of the mailserver. Set from the server config parameter.
port
instance-attribute
¤
Connection port on the mailserver. Defaults to 993 (IMAP SSL).
Functions¤
build_subfolder_name ¤
Assemble a complete subfolder name using the separator of the server.
Path should be the complete list of parent folders, e.g. path = ["INBOX", "Money", "Taxes"] will be assembled as INBOX.Money.Taxes or INBOX/Money/Taxes, depending on the server’s defaults.
Then, replace the INBOX marker with the actual case-sensitive inbox name. This is to deal with Outlook/Office365 discrepancies in folder names.
PARAMETER | DESCRIPTION |
---|---|
path | the tree of parent folders |
RETURNS | DESCRIPTION |
---|---|
path | IMAP-encoded UTF-8 path |
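As a rough sketch of the assembly step described above (not the method's actual code): `join_folder_path` and its `separator` and `inbox` parameters are hypothetical names, standing in for values the Server instance would know about. The sketch returns a plain string and leaves out the IMAP encoding.

```python
def join_folder_path(path: list[str], separator: str, inbox: str) -> str:
    """Hypothetical helper: join parent folders with the server separator."""
    # Only the first element can be the INBOX marker; swap in the server's spelling.
    parts = [
        inbox if index == 0 and name.upper() == "INBOX" else name
        for index, name in enumerate(path)
    ]
    return separator.join(parts)


# Dovecot-like server: dots and an upper-case inbox name.
print(join_folder_path(["INBOX", "Money", "Taxes"], ".", "INBOX"))
# Outlook-like server: slashes and a capitalized inbox name.
print(join_folder_path(["INBOX", "Money", "Taxes"], "/", "Inbox"))
```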
split_subfolder_path ¤
Find out what kind of separator is used on the server for IMAP subfolders and split the parent folders into a list of folders.
Most servers use dots, like INBOX.Money.Taxes, but Outlook/Office365 uses slashes, like INBOX/Money/Taxes.
PARAMETER | DESCRIPTION |
---|---|
folder | IMAP folder path |
RETURNS | DESCRIPTION |
---|---|
tree | list of parent folders |
encode_imap_folder ¤
Ensure the subfolders are properly separated using the actual server separator (. or /) and encode the names in IMAP-custom UTF-7, taking care of non-Latin characters and quoting strings that contain whitespace. The result is ready for direct use in IMAP server commands.
This function takes fully-formed IMAP mailbox folder paths, like INBOX.Money.Taxes or INBOX/Money/Taxes, and replaces the subfolder separators with the server separator. The main INBOX is also replaced by the proper, case-sensitive inbox name for the current server.
PARAMETER | DESCRIPTION |
---|---|
folder | IMAP folder as Python string |
RETURNS | DESCRIPTION |
---|---|
folder | IMAP folder as IMAP-custom UTF-7 bytes. |
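To make the encoding step more concrete, here is a minimal sketch of the modified UTF-7 scheme IMAP uses for mailbox names (RFC 3501, section 5.1.3), combined with the separator and inbox substitution described above. The names `encode_mutf7` and `encode_folder`, and the `separator` and `inbox` arguments, are illustrative assumptions and not the library's code. Quoting is simplified: embedded quotes or backslashes are not escaped.

```python
import base64
import re


def encode_mutf7(text: str) -> bytes:
    """Encode a string in IMAP modified UTF-7."""
    out = bytearray()
    pending: list[str] = []  # run of characters waiting for base64 encoding

    def flush() -> None:
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            # Modified base64: no '=' padding, ',' instead of '/'.
            b64 = base64.b64encode(raw).rstrip(b"=").replace(b"/", b",")
            out.extend(b"&" + b64 + b"-")
            pending.clear()

    for char in text:
        if 0x20 <= ord(char) <= 0x7E:
            flush()
            # Printable ASCII is kept as-is, except '&' which is escaped as '&-'.
            out.extend(b"&-" if char == "&" else char.encode("ascii"))
        else:
            pending.append(char)
    flush()
    return bytes(out)


def encode_folder(folder: str, separator: str, inbox: str) -> bytes:
    """Hypothetical helper: normalize separators and inbox name, then encode."""
    parts = re.split(r"[./]", folder)
    if parts[0].upper() == "INBOX":
        parts[0] = inbox
    encoded = encode_mutf7(separator.join(parts))
    # Names containing spaces must be quoted in IMAP commands.
    return b'"' + encoded + b'"' if b" " in encoded else encoded


print(encode_folder("INBOX.Money.Taxes", "/", "Inbox"))
print(encode_folder("INBOX/Entwürfe", ".", "INBOX"))
```

Splitting on both `.` and `/` is a simplification: it would wrongly split a folder whose name legitimately contains one of these characters.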
get_imap_folders ¤
List all inbox subfolders as plain text, to be reused by filter definitions. Update the Server.folders list.
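For context, the raw material for such a list is the response of the IMAP LIST command, which imaplib returns as one bytes line per mailbox. The sketch below parses one such line into flags, separator and name. The helper name `parse_list_line` and the regular expression are assumptions for illustration, and the sample lines are typical shapes, not captured from a real server.

```python
import re

# One LIST response line looks like: (\HasNoChildren) "." "INBOX.Sent"
LIST_LINE = re.compile(rb'\((?P<flags>[^)]*)\) (?P<sep>"[^"]*"|NIL) (?P<name>.+)')


def parse_list_line(line: bytes) -> tuple[list[str], str, str]:
    """Hypothetical helper: split a LIST line into (flags, separator, name)."""
    match = LIST_LINE.match(line)
    if match is None:
        raise ValueError(f"Unexpected LIST response: {line!r}")
    flags = match.group("flags").decode("ascii").split()
    separator = match.group("sep").decode("ascii").strip('"')
    # The name is still in IMAP modified UTF-7 at this point.
    name = match.group("name").decode("ascii").strip('"')
    return flags, separator, name


print(parse_list_line(b'(\\HasNoChildren) "." "INBOX.Sent"'))
print(parse_list_line(b'(\\HasChildren) "/" Inbox'))
```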
get_email ¤
Get an arbitrary email by its UID. If no mailbox is specified, use the one defined when we got objects in self.set_objects().
If a mailbox is specified, we select it temporarily, then restore the original mailbox used to get objects. If no mailbox is selected, we use the previously-selected one, typically in Server.get_objects.
PARAMETER | DESCRIPTION |
---|---|
uid | the unique ID of the email in the mailbox. Be aware that this ID is unique only in the scope of one mailbox (aka IMAP (sub)folder) because it is defined as the positional order of reception of each email in the mailbox. |
RETURNS | DESCRIPTION |
---|---|
message | the email object. |
get_objects ¤
Get the n last emails in a mailbox. Update the Server.objects list.
Processed emails get logged with the number
PARAMETER | DESCRIPTION |
---|---|
mailbox | the full path of the mailbox. It will be sanitized for folder/subfolder separator and actual |
n_messages | number of messages to fetch, starting with the most recent. If |
run_filters ¤
Run the function filter and execute the function action if the filtering condition is met.
PARAMETER | DESCRIPTION |
---|---|
filter | function checking arbitrary conditions on emails, returning |
action | function performing the actual action. It will get an EMail object as argument. |
runs | how many times a filter should run at most on each email. |
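A filter/action pair could look like the sketch below. To keep it runnable without an IMAP server, it uses a `FakeEmail` stand-in that only mimics the documented is_in and move methods; that class, the function names and the folder name are made up for the example, and the filter is assumed to return a boolean.

```python
class FakeEmail:
    """Stand-in for EMail so the sketch runs without an IMAP server."""

    def __init__(self, subject: str):
        self.subject = subject
        self.folder = "INBOX"

    def is_in(self, query_list, field, case_sensitive=False, mode="any"):
        # Simplified: only looks at the subject, whatever `field` is.
        queries = [query_list] if isinstance(query_list, str) else query_list
        text = self.subject if case_sensitive else self.subject.lower()
        hits = [(q if case_sensitive else q.lower()) in text for q in queries]
        return all(hits) if mode == "all" else any(hits)

    def move(self, folder):
        self.folder = folder


def invoice_filter(email) -> bool:
    """Condition: the subject mentions an invoice or a receipt."""
    return email.is_in(["invoice", "receipt"], "Subject")


def invoice_action(email) -> None:
    """Action: file the email in a dedicated subfolder."""
    email.move("INBOX.Money.Invoices")


emails = [FakeEmail("Your invoice for March"), FakeEmail("Lunch on Friday?")]
for email in emails:
    if invoice_filter(email):
        invoice_action(email)

print([email.folder for email in emails])
```

With a real connection, the two functions would presumably be handed to run_filters as its filter and action arguments instead of being looped over by hand.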
create_folder ¤
Create an IMAP (sub)folder recursively if needed (create the parent(s) if missing, then create the child).
Calls Server.create at each recursion level, so IMAP folder names are fully sanitized.
create ¤
Create a new mailbox. This is a wrapper over the imaplib.IMAP4.create method, where mailbox is directly encoded to IMAP-custom UTF-7 format through Server.encode_imap_folder.
The direct use of this function is discouraged, see Server.create_folder for a nicer method. Namely, this will raise an error when trying to create a subfolder of a non-existing parent folder, and it does not update Server.folders.
append ¤
Add an arbitrary email to the specified mailbox. The email doesn’t need to have been actually sent, it can be generated programmatically or be a copy of a sent email.
PARAMETER | DESCRIPTION |
---|---|
mailbox | the name of the target mailbox (folder or subfolder). |
flags | IMAP flags, standard (like |
email | a proper |
RETURNS | DESCRIPTION |
---|---|
str | status (str) |
EMail ¤
Bases: connectors.Content
Attributes¤
ips
instance-attribute
¤
list[str]: List of IPs found in the server delivery route (in Received headers)
domains
instance-attribute
¤
list[str]: List of domains found in the server delivery route (in Received headers)
server
instance-attribute
¤
(Server): back-reference to the Server instance from which the current email is extracted.
msg
instance-attribute
¤
Standard Python email object
Functions¤
has_header ¤
get_sender ¤
parse_urls ¤
Update self.urls with a list of all URLs found in input, split as (domain, page) tuples.
Examples:
Each result in the list is a tuple (domain, page), for example:
- google.com/index.php is broken into ('google.com', '/index.php')
- google.com/ is broken into ('google.com', '/')
- google.com/login.php?id=xxx is broken into ('google.com', '/login.php')
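The (domain, page) split shown in these examples can be reproduced with the standard library. This is a sketch of the splitting step only, with a hypothetical `split_url` helper; how parse_urls actually finds URLs in the text is not covered here.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> tuple[str, str]:
    """Hypothetical helper: split a URL into (domain, page), dropping the query."""
    # Without a scheme, urlsplit() would treat the domain as part of the path,
    # so prefix "//" to mark it as a network location.
    if "://" not in url:
        url = "//" + url
    parts = urlsplit(url)
    return parts.netloc, parts.path or "/"


for url in ("google.com/index.php", "google.com/", "google.com/login.php?id=xxx"):
    print(split_url(url))
```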
get_body ¤
Get the body of the email.
PARAMETER | DESCRIPTION |
---|---|
preferencelist | sequence of candidate properties in which to pick the email body, by order of priority. If set to |
Note
Emails using quoted-printable transfer encoding but not UTF-8 charset are not handled. This weird combination has been met only in spam messages written in Russian, so far, and should not affect legit emails.
is_in ¤
is_in(
query_list: list[str] | str,
field: str,
case_sensitive: bool = False,
mode: str = "any",
) -> bool
Check if any or all of the elements in the query_list are in the email field.
PARAMETER | DESCRIPTION |
---|---|
query_list | list of keywords or unique keyword to find in |
field | any RFC 822 header or |
case_sensitive | |
mode | |
RETURNS | DESCRIPTION |
---|---|
bool | |
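The matching logic implied by the signature can be sketched on a plain string. The function name `contains_keywords` is hypothetical, and "all" as the counterpart of the default "any" mode is an assumption drawn from the description, not from the source code.

```python
def contains_keywords(
    text: str,
    query_list: "list[str] | str",
    case_sensitive: bool = False,
    mode: str = "any",
) -> bool:
    """Hypothetical helper: check whether any/all keywords occur in `text`."""
    # Accept a single keyword as well as a list of keywords.
    queries = [query_list] if isinstance(query_list, str) else list(query_list)
    if not case_sensitive:
        text = text.lower()
        queries = [query.lower() for query in queries]
    hits = [query in text for query in queries]
    if mode == "any":
        return any(hits)
    if mode == "all":
        return all(hits)
    raise ValueError(f"Unknown mode: {mode!r}")


subject = "Your Invoice and Receipt for March"
print(contains_keywords(subject, ["invoice", "refund"]))
print(contains_keywords(subject, ["invoice", "refund"], mode="all"))
```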
tag ¤
Add any arbitrary IMAP tag (aka label), standard or not, to the current email.
Warning
In Mozilla Thunderbird, labels/tags need to be configured first in the preferences (by mapping the label string to a color) to properly appear in the GUI. Otherwise, any undefined tag will be identified as “Important” (associated with red), no matter its actual string.
Horde, Roundcube and Nextcloud mail (based on Horde) treat those properly.
untag ¤
Remove any arbitrary IMAP tag (aka label), standard or not, from the current email.
delete ¤
Delete the current email directly without using the trash bin. It will not be recoverable.
Use EMail.move to move the email to the trash folder to get a last chance at reviewing what will be deleted.
Note
As per the IMAP standard, this only adds the \Deleted flag to the current email. Emails will actually be deleted when the expunge server command is launched, which is done automatically at the end of Server.run_filters.
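In plain imaplib terms, the two-step deletion described in this note looks roughly like the sketch below. The helper names are hypothetical and `connection` is assumed to be an already authenticated imaplib connection with a mailbox selected; the sketch is not the library's own code.

```python
import imaplib


def flag_as_deleted(connection: imaplib.IMAP4, uid: str) -> None:
    """Step 1: only mark the email, it still exists on the server."""
    connection.uid("STORE", uid, "+FLAGS", r"(\Deleted)")


def purge_deleted(connection: imaplib.IMAP4) -> None:
    """Step 2: permanently remove every email flagged as deleted
    in the currently selected mailbox."""
    connection.expunge()
```

Until the second step runs, the flag can still be removed with a `-FLAGS` store command, which is the last chance to recover the email.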
spam ¤
Mark the current email as spam, adding the Mozilla Thunderbird Junk flag, and move it to the spam/junk folder.
move ¤
Move the current email to the target folder, which will be created recursively if it does not exist. folder will be internally encoded to IMAP-custom UTF-7 with Server.encode_imap_folder.
mark_as_important ¤
Flag or unflag an email as important
PARAMETER | DESCRIPTION |
---|---|
mode | |
mark_as_read ¤
Flag or unflag an email as read (seen).
PARAMETER | DESCRIPTION |
---|---|
mode | |
mark_as_answered ¤
Flag or unflag an email as answered.
PARAMETER | DESCRIPTION |
---|---|
mode | |
Note
If you answer programmatically, you need to manually pass the Message-ID of the original email to the In-Reply-To and References headers of the answer to get threaded messages. In-Reply-To gets only the immediate previous email, References gets the whole thread.
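A minimal sketch of that header bookkeeping, using only the standard email package. The `build_reply` helper and the sample addresses are made up for the example; sending or appending the reply is left out.

```python
from email.message import EmailMessage


def build_reply(original: EmailMessage, body: str) -> EmailMessage:
    """Hypothetical helper: create a reply that mail clients will thread."""
    reply = EmailMessage()
    message_id = str(original["Message-ID"])
    previous_refs = str(original.get("References", ""))

    reply["Subject"] = "Re: " + str(original.get("Subject", ""))
    # In-Reply-To: only the email being answered.
    reply["In-Reply-To"] = message_id
    # References: the whole thread so far, plus the email being answered.
    reply["References"] = f"{previous_refs} {message_id}".strip()
    reply.set_content(body)
    return reply


original = EmailMessage()
original["Subject"] = "Quote request"
original["Message-ID"] = "<second@example.com>"
original["References"] = "<first@example.com>"

reply = build_reply(original, "Thanks, here is the quote.")
print(reply["In-Reply-To"])
print(reply["References"])
```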
is_recent ¤
Check if this session is the first one to get this email. It doesn’t mean the user has read it.
Note
This flag cannot be set by the client, only by the server. It’s read-only app-wise.
is_mailing_list ¤
Check if this email has the typical mailing-list headers.
Warning
The headers checked for hints here are not standard and not systematically used.
is_newsletter ¤
Check if this email has the typical newsletter headers.
Warning
The headers checked for hints here are not standard and not systematically used.
spf_pass ¤
Check if any of the servers listed in the Received email headers is authorized by the DNS SPF rules to send emails on behalf of the email address set in Return-Path.
RETURNS | DESCRIPTION |
---|---|
score | |
Note
The Return-Path header is set by any proper mail client to the mailbox collecting bounces (notices of undelivered emails), and, while it is optional, RFC 4408 states that it is the one from which the SPF domain will be inferred. In practice, it is missing only in certain spam messages, so its absence is treated as an explicit fail.
Warning
Emails older than 6 months will at least get a score of 0 and will therefore never fail the SPF check. This is because DNS configuration may have changed since the email was sent, and it could have been valid at the time of sending.
dkim_pass ¤
Check the authenticity of the DKIM signature.
Note
The DKIM signature uses an asymmetric key scheme, where the private key is set on the SMTP server and the public key is set in DNS records of the mailserver. The signature is a cryptographic hash of the email headers (not their content). A valid signature means the private key used to hash headers matches the public key in the DNS records AND the headers have not been tampered with since sending.
RETURNS | DESCRIPTION |
---|---|
score | |
Warning
Emails older than 6 months will at least get a score of 0 and will therefore never fail the DKIM check. This is because DNS configuration (public key) may have changed since the email was sent, and it could have been valid at the time of sending.
arc_pass ¤
Check the authenticity of the ARC signature.
Note
The ARC signature is still experimental and not widely used. When an email is forwarded, by a user or through a mailing list, its DKIM signature will be invalidated and the email will appear forged/tampered. ARC authenticates the intermediate servers and aims at solving this issue.
RETURNS | DESCRIPTION |
---|---|
score | |
authenticity_score ¤
Compute the score of authenticity of the email, summing the results of EMail.spf_pass, EMail.dkim_pass and EMail.arc_pass. The weighting is designed such that one valid check compensates one fail.
RETURNS | DESCRIPTION |
---|---|
score | |
is_authentic ¤
Helper function for EMail.authenticity_score, checking if at least one authentication method succeeded.
RETURNS | DESCRIPTION |
---|---|
bool | True if EMail.authenticity_score returns a score greater than or equal to zero. |
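The scoring logic can be pictured with the sketch below. It is built on assumptions: the individual check scores are taken to be positive for a pass, zero for neutral and negative for a fail, "6 months" is approximated as 183 days, and all function names are hypothetical. The real weights are not documented here.

```python
from datetime import timedelta

SIX_MONTHS = timedelta(days=183)  # assumption: the exact cut-off is not documented


def floor_old_email(score: int, age: timedelta) -> int:
    """Old emails cannot fail a DNS-based check: their score is at least 0."""
    return max(score, 0) if age > SIX_MONTHS else score


def total_score(spf: int, dkim: int, arc: int, age: timedelta) -> int:
    """Sum the three checks, so one pass can compensate one fail."""
    return sum(floor_old_email(score, age) for score in (spf, dkim, arc))


def looks_authentic(score: int) -> bool:
    """Same threshold as documented: greater than or equal to zero."""
    return score >= 0


recent = timedelta(days=10)
print(looks_authentic(total_score(spf=1, dkim=-1, arc=0, age=recent)))
print(looks_authentic(total_score(spf=-1, dkim=-1, arc=0, age=recent)))
```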
age ¤
Compute the age of an email at the time of evaluation
RETURNS | DESCRIPTION |
---|---|
timedelta | time difference between current time and sending time of the email |
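Such an age can be computed from the Date header with the standard library alone. This is a sketch with a hypothetical `email_age` function; treating a Date header without timezone information as UTC is an assumption of the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def email_age(date_header: str, now: Optional[datetime] = None) -> timedelta:
    """Hypothetical helper: time elapsed since the email was sent."""
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # Assumption: a date without timezone information is treated as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - sent


reference = datetime(2024, 1, 2, 10, 0, tzinfo=timezone.utc)
age = email_age("Mon, 01 Jan 2024 10:00:00 +0000", now=reference)
print(age > timedelta(hours=12))
```

In a filter, comparing the result with a timedelta is enough to express conditions such as "older than 30 days".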
now ¤
Helper to get access to date/time from within the email object when writing filters
query_referenced_emails ¤
Fetch the list of all emails referenced in the present message, aka the whole email thread to which the current email belongs.
The list is sorted from newest to oldest. Queries emails having a Message-ID header matching the ones contained in the References header of the current email.
RETURNS | DESCRIPTION |
---|---|
list[EMail] | All emails referenced. |
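The first half of that lookup, extracting the individual IDs from a References header and turning each one into an IMAP search criterion, could be sketched as follows. Both helper names are hypothetical; `HEADER <field> <string>` is a standard IMAP SEARCH key, but whether the method builds its query this way is an assumption.

```python
import re


def referenced_ids(references_header: str) -> list[str]:
    """Hypothetical helper: list the Message-IDs found in a References header."""
    # IDs are wrapped in angle brackets and separated by whitespace,
    # possibly spread over several folded lines.
    return re.findall(r"<[^<>\s]+>", references_header)


def search_criterion(message_id: str) -> str:
    """Build an IMAP SEARCH criterion matching one Message-ID."""
    return f'(HEADER Message-ID "{message_id}")'


header = "<first@example.com> <second@example.com>\r\n <third@example.com>"
for message_id in referenced_ids(header):
    print(search_criterion(message_id))
```

Since IMAP searches run on one mailbox at a time, a full thread lookup would have to repeat each query over every folder of the account.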