Postgres 14 Internals: Data Organisation

Posted on 5 June, 2023

  • A single PostgreSQL instance can serve several databases at a time; together they are called a database cluster.

  • The directory that contains all the files related to the cluster is usually called PGDATA.

  • After a cluster is intialised 3 databases are available.

    1. template0: used for restoring data from a logical backup or creating a database with a different encoding; it must never be modified.

    2. template1: serves as a template for all the other databases that a user can create in the cluster.

    3. postgres: a regular database that you can use at your discretion.

    You can find the list of dbs using select * from pg_database.

System Catalog

  • Metadata of all cluster objects (such as tables, indexes, data types, or functions) is stored in tables that belong to the system catalog.

  • All system catalog tables begin with pg_, like in pg_database.

Schemas

  • Schemas are namespaces that store all objects of a database.

  • Some predefined schemas include

    • public: default schema for user objects.

    • pg_catalog: used for system catalog tables.

    • information_schema: provides an alternative view for the system catalog as defined by the SQL standard.

    • pg_toast: for objects related to TOAST.

    • pg_temp: comprises temporary tables.

  • If the schema is not specified explicitly when an object is accessed, Postgres selects the first suitable schema from the search path.

Tablespaces

  • Tablespaces define physical data layout.

  • A tablespace is a directory in a file system.

  • One and the same tablespace can be used by different databases, and each database can store data in several tablespaces.

  • Each database has the so-called default tablespace.

  • You can find all tablespaces using the following query.

    select * from pg_tablespace;

    Sample output:

Files, Forks & Pages

  • All information associated with a relation is stored in several different files called forks, each containing data of a particular type.

  • Its filename consists of a numeric ID (oid), which can be extended by a suffix that corresponds to the fork’s type.

  • This file can grow over time, and when its size reaches 1 GB, another file of this fork is created (such files are sometimes called segments).

  • A single relation (table) can be represented on disk by several files.

  • All files are logically split into pages (or blocks) which is the minimum amount of data that can be read or written (8KB)

Types of Forks

TODO

TOAST

  • The Oversized Attributes Storage Technique (TOAST) is used to store large values when their size overheads the maximum page size.

  • If the main table contains potentially long attributes, a separate TOAST table is created for it right away, one for all the attributes. For example, if a table has a column of the numeric or text type, a TOAST table will be created even if this column will never store any long values.

    SELECT
      attname,
      atttypid::regtype,
      attstorage,
      CASE attstorage
        WHEN 'p' THEN 'plain'
        WHEN 'e' THEN 'external'
        WHEN 'm' THEN 'main'
        WHEN 'x' THEN 'extended'
      END AS storage
    FROM
      pg_attribute
    WHERE
      attrelid = 'tablename'::regclass
      AND attnum > 0;

    Here's a glimpse of the output of the above query.

    • plain means that TOAST is not used (applied to “short” data-types such as the integer type).

    • extended allows both compressing attributes and storing them in a separate TOAST table.

    • external implies that long attributes are stored in the TOAST table in an uncompressed state.

    • main requires long attributes to be compressed first; they will be moved to the TOAST table only if compression version did not help.

  • TOAST tables reside in a separate schema called pg_toast, but they are usually hidden. To find the TOAST table for a particular table, you can use the following query.

    SELECT
      reltoastrelid::regclass
    FROM
      pg_class
    WHERE
      oid = 'tablename'::regclass;

    Sample output:

Processes & Memory

  • First process to start is the postmaster process, which is responsible for managing the cluster. If any process dies, the postmaster will restart it.

  • To enable process interaction, postmaster allocates shared memory, which is avail- able to all the processes.

  • postmaster process also listens for incoming connections. Once a new client appears, postmaster spawns a separate backend process. This involves authentication, then taking in SQL queries as text, parsing them, and executing them.

Last updated