Postgres 14 Internals: Data Organisation β
Posted on 5 June, 2023
A single PostgreSQL instance can serve several databases at a time; together they are called a database cluster.
The directory that contains all the files related to the cluster is usually called
PGDATA
.After a cluster is intialised 3 databases are available.
template0
: used for restoring data from a logical backup or creating a database with a different encoding; it must never be modified.template1
: serves as a template for all the other databases that a user can create in the cluster.postgres
: a regular database that you can use at your discretion.
You can find the list of dbs using
select * from pg_database
.
System Catalog β
- Metadata of all cluster objects (such as tables, indexes, data types, or functions) is stored in tables that belong to the system catalog.
- All system catalog tables begin with
pg_
, like inpg_database
.
Schemas β
- Schemas are namespaces that store all objects of a database.
- Some predefined schemas include
public
: default schema for user objects.pg_catalog
: used for system catalog tables.information_schema
: provides an alternative view for the system catalog as defined by the SQL standard.pg_toast
: for objects related toTOAST
.pg_temp
: comprises temporary tables.
- If the schema is not specified explicitly when an object is accessed, Postgres selects the first suitable schema from the search path.
Tablespaces β
Tablespaces define physical data layout.
A tablespace is a directory in a file system.
One and the same tablespace can be used by different databases, and each database can store data in several tablespaces.
Each database has the so-called default tablespace.
You can find all tablespaces using the following query.
sqlselect * from pg_tablespace;
Sample output:
oid spcname spcowner spcacl spcoptions 1663 pg_default 10 1664 pg_global 10
Files, Forks & Pages β
- All information associated with a relation is stored in several different files called forks, each containing data of a particular type.
- Its filename consists of a numeric ID (oid), which can be extended by a suffix that corresponds to the forkβs type.
- This file can grow over time, and when its size reaches 1 GB, another file of this fork is created (such files are sometimes called segments).
- A single relation (table) can be represented on disk by several files.
- All files are logically split into pages (or blocks) which is the minimum amount of data that can be read or written (8KB)
Types of Forks β
TODO
TOAST β
The Oversized Attributes Storage Technique (TOAST) is used to store large values when their size overheads the maximum page size.
If the main table contains potentially long attributes, a separate TOAST table is created for it right away, one for all the attributes. For example, if a table has a column of the numeric or
text
type, a TOAST table will be created even if this column will never store any long values.sqlSELECT attname, atttypid::regtype, attstorage, CASE attstorage WHEN 'p' THEN 'plain' WHEN 'e' THEN 'external' WHEN 'm' THEN 'main' WHEN 'x' THEN 'extended' END AS storage FROM pg_attribute WHERE attrelid = 'tablename'::regclass AND attnum > 0;
Here's a glimpse of the output of the above query.
attname atttypid attstorage storage a uuid p plain b text x extended c jsonb x extended d timestamp with time zone p plain e text[] x extended - plain means that TOAST is not used (applied to βshortβ data-types such as the integer type).
- extended allows both compressing attributes and storing them in a separate TOAST table.
- external implies that long attributes are stored in the TOAST table in an uncompressed state.
- main requires long attributes to be compressed first; they will be moved to the TOAST table only if compression version did not help.
TOAST tables reside in a separate schema called
pg_toast
, but they are usually hidden. To find the TOAST table for a particular table, you can use the following query.sqlSELECT reltoastrelid::regclass FROM pg_class WHERE oid = 'tablename'::regclass;
Sample output:
reltoastrelid pg_toast.pg_toast_16415
Processes & Memory β
- First process to start is the postmaster process, which is responsible for managing the cluster. If any process dies, the postmaster will restart it.
- To enable process interaction, postmaster allocates shared memory, which is avail- able to all the processes.
- postmaster process also listens for incoming connections. Once a new client appears, postmaster spawns a separate backend process. This involves authentication, then taking in SQL queries as text, parsing them, and executing them.