Skip to content

core.database¤

core.database ¤

Create an SQLite database of web_pages to be used by a search engine.

© 2024 - Aurélien Pierre

Attributes¤

core.database.regex_starter module-attribute ¤

regex_starter = '(?<=^|\\s|\\[|\\(|\\{|\\<|\\\'|\\"|`|;|\\>)'

Start of line, or start of document, or start of markup

core.database.regex_stopper module-attribute ¤

regex_stopper = '(?=$|\\s|\\]|\\)|\\}|\\>|\\\'|\\"|`|;|\\<)'

End of line, or end of document, or end of markup

core.database.end_of_word module-attribute ¤

end_of_word = '(?=$|\\s|\\]|\\)|\\}|\\>|\\\'|\\"|`|;|:|,|\\?|\\!|\\.|\\<)'

End of word, or end of line, or end of document, or end of markup

core.database.regex_algebra module-attribute ¤

regex_algebra = '[\\+\\-\\=\\\\±]'

Algebraic signs

core.database.IP_PATTERN module-attribute ¤

IP_PATTERN = re.compile(
    "%s%s%s" % (regex_starter, regex_ip, regex_stopper), re.IGNORECASE
)

IPv4 and IPv6 patterns where the whole IP is captured in the first group.

core.database.EMAIL_PATTERN module-attribute ¤

EMAIL_PATTERN = re.compile(
    "<?([0-9a-z\\-\\_\\+\\.]+?@[0-9a-z\\-\\_\\+]+(\\.[0-9a-z\\_\\-]{2,})+)>?",
    re.IGNORECASE,
)

Emails patterns like <me@mail.com> or me@mail.com where the whole address is captured in the first group.

core.database.URL_PATTERN module-attribute ¤

URL_PATTERN = re.compile(
    "%s%s%s" % (regex_starter, regex_url, end_of_word), re.IGNORECASE
)

URL patterns like http(s)://domain.ext/page/subpage?q=x&r=0:1#anchor or //domain.ext/page. URL must follow RFC3986, meaning query parameters should be before anchors, if any. Relying on this assumption allows a faster regex parsing.

  • the protocol (ftp, ftps, http, https) is captured as the first group,
  • domain.ext is captured as the second group,
  • /page/etc is the third group, including leading and trailing /,
  • page query parameters ?s=x&r=0, including ?, is the fourth group if the URL declares ...?params#anchor,
  • anchor #anchor is the fifth group, including #, if the URL declares ...?params#anchor.

URLs are captured if they are:

  • alone on their own line,
  • enclosed in {}, [], ()
  • enclosed in whitespaces.

Warning: URLs enclosed in (), [] and {} may retain the closing sign as part of the page name since () and [] are valid in URL pathes and parameters. This pattern will work on plain text only: Markdown, XML, HTML and JSON will need to be parsed ahead.

core.database.MEMBERS_PATTERN module-attribute ¤

MEMBERS_PATTERN = re.compile('(?<=[a-z])(\\.)(?=[a-z])', re.IGNORECASE)

Domain patterns without leading protocol like cdn.company.com or class members in object-oriented programming languages like params.cookies.client.

core.database.DATE_PATTERN module-attribute ¤

DATE_PATTERN = re.compile(date_regex, re.IGNORECASE)

Dates like 2022-12-01, 01-12-2022, 01-12-22, 01/12/2022, 01/12/22 where the whole date is captured in the first group, then each group of digits is captured in the order of appearance, in the next 3 groups

core.database.TIME_PATTERN module-attribute ¤

TIME_PATTERN = re.compile(time_regex, re.IGNORECASE)

Identify more or less standard time patterns, like :

  • 12h15
  • 12:15
  • 12:15:00
  • 12am
  • 12 am
  • 12 h
  • 12:15:00Z
  • 12:15:00+01
  • 12:15:00 UTC+1
  • 11:27:45+0000
RETURNS DESCRIPTION
0

1- or 2-digits hour,

TYPE: str

1

hour/minutes separator or half-day marker among ["h", ":", "am", "pm"] (case-insensitive)

TYPE: str

2

2-digits minutes, if any, or None

TYPE: str

3

2-digits seconds, if any.

TYPE: str

4

hour marker (h or H), half-day marker (case-insensitive ["am", "pm"]), or time zone marker (case-sensitive ["Z", "UTC"])

TYPE: str

5

1-or 2-digits signed integer timezone shift (referred to UTC).

TYPE: str

Examples:

see https://regex101.com/r/QNtZAK/2

see src/tests/test-patterns.py

core.database.DOMAIN_PATTERN module-attribute ¤

DOMAIN_PATTERN = re.compile(
    "from ((?:[a-z0-9\\-_]{0,61}\\.)+[a-z]{2,})", re.IGNORECASE
)

Matches patterns like from (domain.ext) from RFC-822 Received header in emails.

core.database.UID_PATTERN module-attribute ¤

UID_PATTERN = re.compile('UID ([0-9]+)')

Matches email integer UID from IMAP headers.

core.database.FLAGS_PATTERN module-attribute ¤

FLAGS_PATTERN = re.compile('FLAGS \\((.*?)\\)')

Matches email flags from IMAP headers.

core.database.PATH_PATTERN module-attribute ¤

PATH_PATTERN = re.compile('%s%s%s' % (regex_starter, path_regex, end_of_word))

File path pattern like ~/file, /home/file, ./file or C:\windows

core.database.PARTIAL_PATH_REGEX module-attribute ¤

PARTIAL_PATH_REGEX = re.compile(
    "%s%s%s" % (regex_starter, partial_path_regex, end_of_word)
)

Partial, invalid path patterns missing the leading root, like home/user/stuff. We start capturing after at least two folder separators (slash or backslash).

Warning

this will collide with date detection, so run it after in the pipeline.

core.database.RESOLUTION_PATTERN module-attribute ¤

RESOLUTION_PATTERN = re.compile('\\d+(?:×|x|X)\\d+')

Pixel resolution like 10x20 or 10×20. Units are discarded.

core.database.NUMBER_PATTERN module-attribute ¤

NUMBER_PATTERN = re.compile(
    "%s%s%s" % (regex_starter, regex_number, regex_stopper)
)

Signed integers and decimals, fractions and numeric IDs with interal dashes and underscores. Numbers with starting or trailing units are not considered. Lazy decimals (.1 and 1.) are considered.

core.database.HASH_PATTERN module-attribute ¤

HASH_PATTERN = re.compile(
    "%s%s%s" % (regex_starter, regex_hash, end_of_word), re.IGNORECASE
)

Cryptographic hexadecimal hashes and fingerprints, of a min length of 8 characters.

core.database.MULTIPLE_LINES module-attribute ¤

MULTIPLE_LINES = re.compile('(?: ?[\\t\\r\\n]{2,} ?)+')

Detect more than 2 newlines and tab, possibly mixed with spaces

core.database.MULTIPLE_NEWLINES module-attribute ¤

MULTIPLE_NEWLINES = re.compile('(?: ?[\\t\\r\\n]+ ?){2,}')

Detect broken sequences of newlines and spaces.

core.database.INTERNAL_NEWLINE module-attribute ¤

INTERNAL_NEWLINE = re.compile('(?<=\\w)[\\n\\t\\r]{1}(?=\\w)')

Detect single newline characters nested inside text. Mostly useful for parsed PDF where line wrapping is quite literal ( used instead of space).

core.database.EXPOSURE module-attribute ¤

EXPOSURE = re.compile(
    "%s%s%s" % (regex_starter, exposure_regex, end_of_word), flags=re.IGNORECASE
)

Exposure values in EV or IL

core.database.PHOTOSPEED module-attribute ¤

PHOTOSPEED = re.compile(
    "%s%s%s" % (regex_starter, photospeed_regex, end_of_word),
    flags=re.IGNORECASE,
)

Exposure values in EV or IL

core.database.SENSIBILITY module-attribute ¤

SENSIBILITY = re.compile(
    "%s%s%s" % (regex_starter, sensibility_regex, end_of_word),
    flags=re.IGNORECASE,
)

Photographic sensibility in ISO or ASA

core.database.LUMINANCE module-attribute ¤

LUMINANCE = re.compile(
    "%s%s%s" % (regex_starter, luminance_regex, end_of_word),
    flags=re.IGNORECASE,
)

Luminance/radiance in nits or Cd/m²

core.database.DIAPHRAGM module-attribute ¤

DIAPHRAGM = re.compile(
    "%s%s" % (regex_starter, diaphragm_regex), flags=re.IGNORECASE
)

Photographic diaph aperture values like f/2.8 or f/11

core.database.GAIN module-attribute ¤

GAIN = re.compile(
    "%s%s%s" % (regex_starter, gain_regex, end_of_word), flags=re.IGNORECASE
)

Gain, attenuation and PSNR in dB

core.database.FILE_SIZE module-attribute ¤

FILE_SIZE = re.compile(
    "%s%s%s" % (regex_starter, filesize_regex, end_of_word), flags=re.IGNORECASE
)

File and memory size in bit, byte, or octet and their multiples

core.database.DISTANCE module-attribute ¤

DISTANCE = re.compile(
    "%s%s%s" % (regex_starter, distance_regex, end_of_word), flags=re.IGNORECASE
)

Distance in meter, inch, foot and their multiples

core.database.PERCENT module-attribute ¤

PERCENT = re.compile('%s%s%s' % (regex_starter, percent_regex, end_of_word))

Number followed by %

core.database.WEIGHT module-attribute ¤

WEIGHT = re.compile(
    "%s%s%s" % (regex_starter, weight_regex, end_of_word), flags=re.IGNORECASE
)

Weight (mass) in British and SI units and their multiples

core.database.ANGLE module-attribute ¤

ANGLE = re.compile(
    "%s%s%s" % (regex_starter, angle_regex, end_of_word), flags=re.IGNORECASE
)

Angles in radians, degrees and steradians

core.database.TEMPERATURE module-attribute ¤

TEMPERATURE = re.compile(
    "%s%s%s" % (regex_starter, temperature_regex, end_of_word),
    flags=re.IGNORECASE,
)

Temperatures in °C, °F and K

core.database.FREQUENCY module-attribute ¤

FREQUENCY = re.compile(
    "%s%s%s" % (regex_starter, frequency_regex, end_of_word),
    flags=re.IGNORECASE,
)

Frequencies in hertz and multiples

core.database.TEXT_DATES module-attribute ¤

TEXT_DATES = re.compile(
    "([0-9]{1,2})? (jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec|jan|fév|mar|avr|mai|jui|jui|aou|sep|oct|nov|déc|janvier|février|mars|avril|mai|juin|juillet|août|septembre|octobre|novembre|décembre|january|february|march|april|may|june|july|august|september|october|november|december)\\.?( [0-9]{1,2})?( [0-9]{2,4})(?!\\:)",
    flags=re.IGNORECASE | re.MULTILINE,
)

Find textual dates formats:

  • English dates like 01 Jan 20 or 01 Jan. 2020 but avoid capturing adjacent time like 12:08.
  • French dates like 01 Jan 20 or 01 Jan. 2020 but avoid capturing adjacent time like 12:08.
RETURNS DESCRIPTION
0

2 digits (day number or year number, depending on language)

TYPE: str

1

month (full-form or abbreviated)

TYPE: str

2

2 digits (day number or year number, depending on language)

TYPE: str

3

4 digits (full year)

TYPE: str

core.database.BASE_64 module-attribute ¤

BASE_64 = re.compile(
    "((?:[A-Za-z0-9+\\/]{4}){64,}(?:[A-Za-z0-9+\\/]{2}==|[A-Za-z0-9+\\/]{3}=)?)"
)

Identifies base64 encoding

core.database.BB_CODE module-attribute ¤

BB_CODE = re.compile('\\[(img|quote)[a-zA-Z0-9 =\\"]*?\\].*?\\[\\/\\1\\]')

Identifies left-over BB code markup [img] and [quote]

core.database.MARKUP module-attribute ¤

MARKUP = re.compile('(?:\\[|\\{|\\<)([^\\n\\r]+?)(?:\\]|\\}|\\>)')

Identifies left-over HTML and Markdown markup, like <...>, {...}, [...]

core.database.USER module-attribute ¤

USER = re.compile('([\\w\\-\\+\\.]+)?@([\\w\\-\\+\\.]+)|(user\\-?\\d+)')

Identifies user handles or emails

core.database.REPEATED_CHARACTERS module-attribute ¤

REPEATED_CHARACTERS = re.compile('(.)\\1{9,}')

Identifies any character repeated more than 9 times

core.database.UNFINISHED_SENTENCES module-attribute ¤

UNFINISHED_SENTENCES = re.compile('(?<![?!.;:])\\n\\n|\\r\\n')

Identifies sentences finishing with 2 newlines characters without having ending punctuations

core.database.MULTIPLE_DOTS module-attribute ¤

MULTIPLE_DOTS = re.compile('\\.{2,}')

Identifies dots repeated more than twice

core.database.MULTIPLE_DASHES module-attribute ¤

MULTIPLE_DASHES = re.compile('[-~]{1,}')

Identifies dashes repeated more than once

core.database.MULTIPLE_QUESTIONS module-attribute ¤

MULTIPLE_QUESTIONS = re.compile('\\?{1,}')

Identifies question marks repeated more than once

core.database.ORDINAL_FR module-attribute ¤

ORDINAL_FR = re.compile('n° ?([0-9]+)')

French ordinal numbers (numéros n°)

core.database.FRANCAIS module-attribute ¤

FRANCAIS = re.compile(
    "%s(j|t|s|d|qu|lorsqu|quelqu|jusqu|m|c|n)\\'(?=[aeiouyéèàêâîôûïüäëöh][\\w\\s])"
    % regex_starter,
    flags=re.IGNORECASE,
)

French contractions of pronouns and determinants

core.database.DASHES module-attribute ¤

DASHES = re.compile('(?<=\\w)(-|_|=)+(?=\\w)', re.IGNORECASE)

Dashes in the middle of ASCII/Latin compounded words. Will not work if accented or Unicode characters are immediately surrounding the dash.

core.database.ALTERNATIVES module-attribute ¤

ALTERNATIVES = re.compile('(?<=[a-z])(\\/)(?=[a-z])', re.IGNORECASE)

Slash-separated word alternatives like and/or mr/mrs

core.database.PLURAL_S module-attribute ¤

PLURAL_S = re.compile('(?<=[a-zA-Z]{4,})s?e{0,2}s%s' % end_of_word)

Identify plural form of nouns (French and English), adjectives (French) and third-person present verbs (English) and second-person verbs (French) in -s.

core.database.FEMININE_E module-attribute ¤

FEMININE_E = re.compile('(?<=\\w{4,})e{1,2}%s' % end_of_word)

Identify feminine form of adjectives (French) in -e.

core.database.DOUBLE_CONSONANTS module-attribute ¤

DOUBLE_CONSONANTS = re.compile(
    "(?<=\\w{2,})([bcfghjklmnpqrstvwxz])\\1", re.IGNORECASE
)

Identify double consonants in the middle of words.

core.database.FEMININE_TRICE module-attribute ¤

FEMININE_TRICE = re.compile('(?<=\\w{4,})t(rice|eur|or)%s' % end_of_word)

Identify French feminine nouns in -trice.

core.database.ADVERB_MENT module-attribute ¤

ADVERB_MENT = re.compile('(?<=\\w{4,})e?ment%s' % end_of_word)

Identify French adverbs and English nouns ending en -ment

core.database.SUBSTANTIVE_TION module-attribute ¤

SUBSTANTIVE_TION = re.compile('(?<=\\w{4,})(t|s)ion%s' % end_of_word)

Identify French and English substantives formed from verbs by adding -tion and -sion

core.database.SUBSTANTIVE_AT module-attribute ¤

SUBSTANTIVE_AT = re.compile('(?<=\\w{4,})at%s' % end_of_word)

Identify French and English substantives formed from other nouns by adding -at

core.database.PARTICIPLE_ING module-attribute ¤

PARTICIPLE_ING = re.compile('(?<=\\w{4,})ing%s' % end_of_word)

Identify English substantives and present participles formed from verbs by adding -ing

core.database.ADJECTIVE_ED module-attribute ¤

ADJECTIVE_ED = re.compile('(?<=\\w{4,})ed%s' % end_of_word)

Identify English adjectives formed from verbs by adding -ed

core.database.ADJECTIVE_TIF module-attribute ¤

ADJECTIVE_TIF = re.compile('(?<=\\w{2,})ti(f|v)%s' % end_of_word)

Identify English and French adjectives formed from verbs by adding -tif or -tive

core.database.SUBSTANTIVE_Y module-attribute ¤

SUBSTANTIVE_Y = re.compile('(?<=\\w{3,})y%s' % end_of_word)

Identify English substantives ending in -y

core.database.VERB_IZ module-attribute ¤

VERB_IZ = re.compile('(?<=\\w{4,})(i|y)z%s' % end_of_word)

Identify American verbs ending in -iz that French and Brits write in -is

core.database.STUFF_ER module-attribute ¤

STUFF_ER = re.compile('(?<=\\w{5,})er%s' % end_of_word)

Identify French 1st group verb (infinitive) and English substantives ending in -er

core.database.BRITISH_OUR module-attribute ¤

BRITISH_OUR = re.compile('(?<=\\w{3,})our%s' % end_of_word)

Identify British spelling ending in -our (colour, behaviour).

core.database.SUBSTANTIVE_ITY module-attribute ¤

SUBSTANTIVE_ITY = re.compile('(?<=\\w{4,})it(y|e)%s' % end_of_word)

Identify substantives in -ity (English) and -ite (French).

core.database.SUBSTANTIVE_IST module-attribute ¤

SUBSTANTIVE_IST = re.compile('(?<=\\w{3,})is(t|m)%s' % end_of_word)

Identify substantives in -ist and -ism.

core.database.SUBSTANTIVE_IQU module-attribute ¤

SUBSTANTIVE_IQU = re.compile('(?<=\\w{3,})i(qu|c)%s' % end_of_word)

Identify French substantives in -iqu

core.database.SUBSTANTIVE_EUR module-attribute ¤

SUBSTANTIVE_EUR = re.compile('(?<=\\w{3,})eur%s' % end_of_word)

Identify French substantives -eur

core.database.HYPHENIZED module-attribute ¤

HYPHENIZED = re.compile('(?<=\\w{3,})[-–—]+ *[\\n\\r]{1,2}(?=\\w)')

Detect hyphenized words at the end of a PDF text line.

core.database.WAYBACK_RE module-attribute ¤

WAYBACK_RE = re.compile('https?://web\\.archive\\.org/web/[^/]+/(https?://.+)')

Find the canonical URL from web.archive.org (Wayback Machine) URLs

Classes¤

core.database.SQLitePageCorpus ¤

SQLitePageCorpus(
    db,
    query,
    params=(),
    atomic_types=(str, bytes),
    max_depth=None,
    yield_rows=False,
)

Lazily stream rows from an SQLite request, avoiding full copy.

Example

    corpus = SQLitePageCorpus(
        db,
        """
        SELECT tokenized
        FROM pages
        WHERE lang IN ('fr', 'en')
        """,
        max_depth=0
    )
- max_depth=0 will not flatten the content, so it will return the original list[list[str]] (list of sentences, aka list of list of words), - max_depth=1 flattens documents, to it will return list[str] (list of words)

Functions¤

core.database.split_url ¤

split_url(url: str) -> tuple[str, str, str, str, str] | None

Split a well-formed URL following RFC3986 into base elements.

RETURNS DESCRIPTION
tuple[str, str, str, str, str] | None

a tuple of (protocol, domain, page, parameters, anchor).

tuple[str, str, str, str, str] | None

Empty/missing fields are inited with empty strings so there is no need for individual None checks.

tuple[str, str, str, str, str] | None

If the url input doesn’t match an URL format, return None.

core.database.adapt_array ¤

adapt_array(arr: np.ndarray)

core.database.create_db ¤

create_db(name: str) -> sqlite3.Connection

Create the pages table if needed and add any missing columns. This doesn’t destroy existing tables, rows or columns, so it’s safe to run on any database.

Warning

Columns are inferred directly from web_page.__annotations__. Existing columns are preserved unchanged.

The url column is used as the PRIMARY KEY.

core.database.create_temp_db ¤

create_temp_db(
    min_free: float = 2.0, filename: str | None = None
) -> sqlite3.Connection

Create a temporary SQLite database file (in /dev/shm when available) and initialize the pages table according to web_page annotations.

PARAMETER DESCRIPTION
min_free

minimum available disk space in GiB required to create the temporary database. This is checked at runtime and the function will raise an error if the condition is not met.

TYPE: float DEFAULT: 2.0

filename

the full path and filename to save the temporary database, if it needs to be reused at some point.

TYPE: str | None DEFAULT: None

RETURNS DESCRIPTION
sqlite3.Connection

the sqlite3.Connection opened in bulk mode.

WARNING

the temporary SQLite database doesn’t use web_page URL as primary key, to allow later deduplication.

core.database.delete_temp_db ¤

delete_temp_db(db: sqlite3.Connection)

Close and delete a temporary database in one shot.

core.database.open_db ¤

open_db(name: str, mode: str = 'rw') -> sqlite3.Connection

Open an SQLite database with workload-specific optimizations.

PARAMETER DESCRIPTION
name

Database identifier/path passed to get_models_folder().

TYPE: str

mode
  • “rw”: Generic read/write mode.
  • “ro”: Read-only immutable mode optimized for serving/search workloads.
  • “bulk”: Bulk-ingestion mode optimized for large batch writes.

TYPE: str DEFAULT: 'rw'

RETURNS DESCRIPTION
sqlite3.Connection

sqlite3.Connection

core.database.compress_db ¤

compress_db(
    db: sqlite3.Connection,
    delete_query: str | None = None,
    delete_params: tuple | None = None,
    delete_columns: list[str] | None = None,
)

Optionally delete rows, then reclaim SQLite disk space.

PARAMETER DESCRIPTION
db

SQLite connection

TYPE: sqlite3.Connection

delete_query

full DELETE SQL query

TYPE: str | None DEFAULT: None

delete_params

optional SQL parameters

TYPE: tuple | None DEFAULT: None

core.database.is_primary_key ¤

is_primary_key(db: sqlite3.Connection, table: str, column: str) -> bool

Check whether column is part of the PRIMARY KEY of table.

core.database.populate_db ¤

populate_db(
    db: sqlite3.Connection, pages: list[web_page], batch_size: int = 4096
)

Insert or update web_page records into the SQLite database.

Existing rows are matched using the PRIMARY KEY url.

Warning

Array-like Python values are converted to bytearray then to bytes in order to be handled as BLOB by SQLite.

core.database.db_to_list ¤

db_to_list(db: sqlite3.Connection) -> list[web_page]

Extract all web_page rows from the pages table in db as a list of web_page

core.database.migrate_url_to_primary_key ¤

migrate_url_to_primary_key(db: sqlite3.Connection)

Rebuild the pages table using url as PRIMARY KEY for older databases that didn’t use a primary key.

core.database.merge_databases ¤

merge_databases(old_db: sqlite3.Connection, new_db: sqlite3.Connection)

Merge two pages databases.

Rows from old_db are inserted into new_db only if their URL does not already exist.

Existing rows in new_db are preserved unchanged.

Only columns existing in BOTH databases are copied.

core.database.update_pages_from_database ¤

update_pages_from_database(
    target_db: sqlite3.Connection, source_db: sqlite3.Connection
) -> list[str]

Update rows in target_db.pages from source_db.pages using url as PRIMARY KEY.

Only shared columns are updated.

Returns missing_urls: URLs present in target_db but absent from source_db.

core.database.import_pages ¤

import_pages(
    source_db: str | sqlite3.Connection,
    destination_db: str | sqlite3.Connection,
    where_clause: str = "1=1",
    params: tuple = (),
) -> int

Import rows from one SQLite database into another.

Both source_db and destination_db may be either a filesystem path (str) or an active sqlite3.Connection handle. Passing a Connection is the only way to target a :memory: database, since those cannot be addressed by path.

Connection lifecycle - Path supplied – the function opens, commits, and closes the connection itself (original behaviour). - Connection supplied – the caller retains full control; the connection is neither committed nor closed here, so the import can participate in a larger transaction.

Rows are copied from source.pages into destination.pages. Existing rows are updated on conflict of the url primary key. Columns present in the destination but absent from the source receive NULL. Both schemas are discovered at runtime, so the function adapts automatically if either evolves.

PARAMETER DESCRIPTION
source_db

Path to, or an open connection for, the source SQLite database.

TYPE: str | sqlite3.Connection

destination_db

Path to, or an open connection for, the destination SQLite database.

TYPE: str | sqlite3.Connection

where_clause

SQL WHERE clause applied to source.pages. Example: "domain = ? AND date >= ?"

TYPE: str DEFAULT: '1=1'

params

Positional parameters bound to where_clause.

TYPE: tuple DEFAULT: ()

RETURNS DESCRIPTION
int

Number of affected rows.

Examples::

# File → file (unchanged from before)
import_pages("old.db", "new.db", "domain = ?", ("example.com",))

# In-memory source → file destination
import_pages(mem_conn, "new.db")

# File source → in-memory destination (e.g. for tests)
import_pages("prod.db", mem_conn, "date >= ?", ("2024-01-01",))

# Both in-memory
import_pages(src_conn, dst_conn)

core.database.inspect_db ¤

inspect_db(db: sqlite3.Connection, message: str = '') -> None

Print useful metadata and statistics about a SQLite database.

PARAMETER DESCRIPTION
db

active database connection

TYPE: sqlite3.Connection

message

optional additional message to indentify several inspections if any.

TYPE: str DEFAULT: ''