Introduction: Why PostgreSQL Time-Series Partitioning Matters
In every PostgreSQL system I’ve managed, time-series data has been the first thing to grow out of control. Logs, metrics, events, IoT readings, financial ticks – they all share the same pattern: endless inserts, mostly read by recent time ranges, and only occasionally scanned historically. Without PostgreSQL time-series partitioning, these tables quietly turn into multi-gigabyte monsters that slow down queries, choke autovacuum, and make backups painful.
When I first started dealing with this, I tried to fix performance with more indexes and bigger hardware. It helped for a while, but the core issue remained: the database was doing too much work on old, low-value rows that should have been archived or deleted. That’s where a proper partitioning and retention strategy completely changed the game.
By the end of this tutorial, you’ll be able to:
- Design a practical PostgreSQL time-series partitioning scheme based on your data’s time column.
- Create partitions automatically so you’re not manually adding tables every month or week.
- Apply automatic retention policies that drop old partitions cleanly and quickly, instead of running massive DELETE operations.
- Keep queries fast and predictable, even as the total dataset grows into the hundreds of millions of rows.
The goal is to give you a setup I’d be comfortable running in production myself: simple enough to maintain, but robust enough to handle real-world time-series workloads.
Prerequisites and Assumptions for This PostgreSQL Time-Series Setup
PostgreSQL Version and Features
For the approach I use in production, you’ll need PostgreSQL 11 or later. Declarative partitioning itself arrived in PostgreSQL 10, but we also rely on 11’s support for defining primary keys and indexes directly on the partitioned parent. If you’re on PostgreSQL 14+ you’ll get better planner behavior and quality-of-life improvements, but everything I describe will still work on 11 and 12.
Required Extensions and Environment
I’ll assume you have superuser or equivalent privileges to create schemas, tables, functions, and background jobs. The core solution uses only built-in features, but I’ll briefly mention options like pg_cron (see the citusdata/pg_cron GitHub repository) or an external scheduler for automating retention.
Baseline SQL and psql Familiarity
You should be comfortable creating tables, indexes, and functions, and running commands in psql or a GUI like pgAdmin. In my own teams, I’ve found that anyone who can read and write basic SQL can follow this pattern, especially once they’ve seen the partitioning DDL a couple of times.
psql -h your_host -U your_user -d your_db
Designing a Time-Series Schema Optimized for Partitioning
Before wiring up PostgreSQL time-series partitioning, I always spend a bit of time on table design. A clean schema makes partitioning simpler, queries faster, and retention rules easier to reason about.
Choosing the Right Time Column and Data Types
You need a single, authoritative time column that all partitioning will be based on. In my experience, timestamptz (timestamp with time zone) is the safest default, because it avoids confusion when data comes from multiple time zones.
Here’s a typical events table structure I like to start with:
CREATE TABLE metrics_events (
    event_time  timestamptz NOT NULL,
    source_id   bigint NOT NULL,
    metric_name text NOT NULL,
    value       double precision NOT NULL,
    tags        jsonb,
    PRIMARY KEY (event_time, source_id)
) PARTITION BY RANGE (event_time);
The composite primary key keeps the partition key (event_time) first, which plays nicely with index-only scans and partition pruning.
Primary Keys, Indexes, and Query Patterns
When I design these tables, I start from real queries: recent time-window reads, plus filters on source or metric. That usually leads to:
- A primary key on (event_time, source_id) for uniqueness and time-ordered storage.
- A supporting index for common filters, for example (source_id, event_time DESC).
CREATE INDEX ON metrics_events (source_id, event_time DESC);
This index keeps recent data for a given source clustered together, which has helped me a lot when debugging hot sources or narrow time ranges.
Keeping the Table Narrow and Partition-Friendly
For high-ingest time-series, I try to keep the table as narrow as possible: numeric values, short identifiers, and only essential JSONB. Wide text columns and large blobs quickly become a liability when you’re inserting thousands of rows per second.
One thing I learned the hard way was to avoid putting rarely used, heavy fields directly in the hot time-series table. Instead, I move that data into a companion table joined by key when needed. That keeps partition operations (like ATTACH, DETACH, and DROP) lightweight and predictable as the dataset grows.
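As a sketch of that companion-table pattern (the table and column names here are illustrative, not part of the schema above):

```sql
-- Heavy, rarely-read fields live in a small non-partitioned table,
-- keyed by the same identifier as the hot time-series table.
CREATE TABLE source_details (
    source_id   bigint PRIMARY KEY,
    description text,
    config      jsonb
);

-- Join only when the heavy fields are actually needed
SELECT e.event_time, e.value, d.description
FROM metrics_events e
JOIN source_details d USING (source_id)
WHERE e.source_id = 42
  AND e.event_time >= now() - interval '1 hour';
```

The hot table stays narrow, so every partition remains cheap to create, attach, and drop.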
Creating a Partitioned Time-Series Table in PostgreSQL
With the schema in place, the next step in PostgreSQL time-series partitioning is to turn that logical table into a real partitioned structure. I like to start with a clear, consistent partitioning strategy and then codify it directly in the DDL so future maintenance is predictable.
Defining the Parent Partitioned Table
The parent table holds the schema and constraints, but no actual data. In my setups, I always mark it as PARTITION BY RANGE on the time column we chose earlier, typically event_time.
CREATE TABLE metrics_events (
    event_time  timestamptz NOT NULL,
    source_id   bigint NOT NULL,
    metric_name text NOT NULL,
    value       double precision NOT NULL,
    tags        jsonb,
    PRIMARY KEY (event_time, source_id)
) PARTITION BY RANGE (event_time);
I keep all the core constraints (like NOT NULL and the primary key) on the parent so every partition inherits the same rules automatically. That’s saved me from subtle bugs more than once.
Creating Time-Based RANGE Partitions
Next, we create child tables for specific time windows. In production, I usually choose daily or monthly partitions, depending on data volume. Here’s an example using monthly partitions:
CREATE TABLE metrics_events_2024_01
    PARTITION OF metrics_events
    FOR VALUES FROM ('2024-01-01 00:00:00+00') TO ('2024-02-01 00:00:00+00');

CREATE TABLE metrics_events_2024_02
    PARTITION OF metrics_events
    FOR VALUES FROM ('2024-02-01 00:00:00+00') TO ('2024-03-01 00:00:00+00');

CREATE TABLE metrics_events_2024_03
    PARTITION OF metrics_events
    FOR VALUES FROM ('2024-03-01 00:00:00+00') TO ('2024-04-01 00:00:00+00');
The key rule I follow is: every time range is contiguous, non-overlapping, and fully covered for the data I expect. Overlaps or gaps here will bite you later when inserting or querying data.
If you have very high ingest rates, daily partitions follow the same pattern (as an alternative to the monthly layout above, since ranges must never overlap):

CREATE TABLE metrics_events_2024_03_01
    PARTITION OF metrics_events
    FOR VALUES FROM ('2024-03-01 00:00:00+00') TO ('2024-03-02 00:00:00+00');
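Whichever granularity you pick, you can verify that the ranges are contiguous and non-overlapping by listing every partition’s bounds straight from the system catalogs:

```sql
SELECT c.relname AS partition_name,
       pg_get_expr(c.relpartbound, c.oid) AS bounds
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'metrics_events'::regclass
ORDER BY c.relname;
```

Reading the FROM/TO values top to bottom makes any gap or overlap obvious before data starts arriving.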
Verifying Inserts and Query Behavior
Once the partitions are in place, I always smoke-test inserts and queries before wiring up anything more advanced. That catches misaligned ranges or timezone mistakes early.
INSERT INTO metrics_events (event_time, source_id, metric_name, value, tags)
VALUES (
'2024-03-01 12:34:00+00',
42,
'cpu_usage',
0.87,
'{"host": "api-1", "region": "eu-west"}'::jsonb
);
SELECT tableoid::regclass AS physical_table, *
FROM metrics_events
WHERE event_time >= '2024-03-01 00:00+00'
  AND event_time < '2024-03-02 00:00+00';
The tableoid trick is something I use a lot when validating partitioning: it shows which child table actually stored the row. If the insert fails with a constraint error or lands in the wrong partition, I know my time boundaries are off.
To confirm that PostgreSQL is pruning partitions correctly, I also run an EXPLAIN on a time-filtered query:
EXPLAIN
SELECT avg(value)
FROM metrics_events
WHERE event_time >= '2024-03-01 00:00+00'
  AND event_time < '2024-03-02 00:00+00';
In my experience, if you see only the relevant child tables in the plan, you’re on the right track: partition pruning is working, and queries will scale much better as data grows. Getting this part right upfront makes the later automation and retention logic far simpler to reason about.
Automating New Partition Creation for PostgreSQL Time-Series Data
Once basic PostgreSQL time-series partitioning is in place, the real quality-of-life improvement comes from automation. I don’t want to be on call just to create next month’s partition, so I always add a small function plus a scheduler job that keeps future partitions ready ahead of time.
Strategy: Stay Ahead of Incoming Data
The goal is simple: always have partitions created a bit into the future (for example, 7–30 days), so inserts never fail with “no partition found” errors. In my setups, I usually:
- Pick a partition granularity (daily or monthly).
- Write a function that calculates missing future ranges.
- Run that function on a schedule (cron, pg_cron, or another scheduler).
This keeps operations hands-off while still being easy to inspect and debug.
PL/pgSQL Function to Create Missing Partitions
Here’s a simplified example function that maintains daily partitions for the next 7 days. You can adapt the date math if you prefer monthly ranges.
CREATE OR REPLACE FUNCTION ensure_metrics_events_partitions()
RETURNS void LANGUAGE plpgsql AS $$
DECLARE
    start_date date := current_date;
    end_date   date := current_date + 7;  -- how far ahead to create
    d          date;
    part_name  text;
BEGIN
    d := start_date;
    WHILE d < end_date LOOP
        part_name := format('metrics_events_%s', to_char(d, 'YYYY_MM_DD'));
        IF NOT EXISTS (
            SELECT 1
            FROM pg_class c
            JOIN pg_namespace n ON n.oid = c.relnamespace
            WHERE c.relkind = 'r'
              AND n.nspname = 'public'
              AND c.relname = part_name
        ) THEN
            -- use a plain quoted string here; nesting another $$ inside
            -- the $$-quoted function body would end the body early
            EXECUTE format(
                'CREATE TABLE public.%I PARTITION OF public.metrics_events
                 FOR VALUES FROM (%L) TO (%L)',
                part_name,
                d::timestamptz,
                (d + 1)::timestamptz);
        END IF;
        d := d + 1;
    END LOOP;
END;
$$;
In my experience, keeping this function small and focused makes it much easier to maintain than trying to cram retention and partition creation into a single mega-function.
Scheduling the Partition Maintenance Job
To actually run this regularly, you can call it from a system cron job or use a database-side scheduler. For example, a simple cron entry might look like:
*/30 * * * * psql -d your_db -c "SELECT ensure_metrics_events_partitions();"
If you prefer a fully in-database approach, an extension like pg_cron (citusdata/pg_cron on GitHub) lets you schedule the function directly from SQL, which I’ve found handy in environments where OS-level cron access is locked down.
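Assuming pg_cron is installed and preloaded (it must be listed in shared_preload_libraries), the schedule can live entirely inside the database:

```sql
CREATE EXTENSION IF NOT EXISTS pg_cron;

-- Run the partition-maintenance function every 30 minutes
SELECT cron.schedule('*/30 * * * *', 'SELECT ensure_metrics_events_partitions();');
```

Job runs and errors then show up in the cron.job_run_details table, which is easier to audit than scattered OS-level cron logs.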
Implementing Automatic Retention Policies on Time-Series Partitions
Partitioning only solves half of the problem; the real payoff in PostgreSQL time-series partitioning comes when you attach a clear retention policy. In my own systems, the most effective pattern has been to treat entire partitions as disposable units: once they’re older than the retention window, I either archive them or drop them in one fast operation.
Choosing a Retention Strategy: Drop vs Archive
Before writing any code, I always sit down with stakeholders and answer two questions:
- How long do we truly need this data online? (for example, 30 days, 6 months, 1 year)
- What must happen to data after that? (hard delete, cheap storage, or compliance archive)
If the answer is “we just don’t need it,” I simply DROP old partitions. If compliance or analytics need longer history, I first move the partition to cheaper storage (for example, another database or an S3-based warehouse) and then drop it. I’ve found that clarifying this up front prevents panicked calls later when someone realizes last year’s data is gone.
PL/pgSQL Function to Drop Old Partitions by Age
With a retention window decided, we can codify it in a small function. Here’s an example that keeps 90 days of data and drops anything older, based on partition names that encode dates (like metrics_events_2024_03_01):
CREATE OR REPLACE FUNCTION drop_old_metrics_events_partitions(
    keep_days integer DEFAULT 90
) RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    cutoff_date date := current_date - keep_days;
    rel         record;
    part_date   date;
BEGIN
    FOR rel IN
        SELECT n.nspname, c.relname
        FROM pg_class c
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE c.relkind = 'r'
          AND n.nspname = 'public'
          AND c.relname LIKE 'metrics\_events\_%'
    LOOP
        BEGIN
            -- parse YYYY_MM_DD from the table name; names that don't match
            -- the regex yield NULL and are skipped below
            part_date := to_date(
                substring(rel.relname from 'metrics_events_(\d{4}_\d{2}_\d{2})'),
                'YYYY_MM_DD');
        EXCEPTION WHEN others THEN
            -- skip tables whose suffix isn't a valid date
            CONTINUE;
        END;

        IF part_date IS NOT NULL AND part_date < cutoff_date THEN
            RAISE NOTICE 'Dropping old partition: %.%', rel.nspname, rel.relname;
            EXECUTE format('DROP TABLE %I.%I', rel.nspname, rel.relname);
        END IF;
    END LOOP;
END;
$$;
One thing I learned the hard way was to always test this sort of function in a staging database first, with a very small keep_days value and plenty of logging, so I can see exactly which partitions would be dropped.
Scheduling and (Optional) Archiving Workflow
Just like with partition creation, retention only works when it runs regularly. I typically schedule the function once per day during a quiet period:
# Run daily at 02:00
0 2 * * * psql -d your_db -c "SELECT drop_old_metrics_events_partitions(90);"
If you need to archive before dropping, you can extend the function to call an archive helper instead of immediately dropping. For example, you might:
- Export the partition to a file using COPY or logical export.
- Ship it to object storage or a data lake.
- Only after a successful export, DROP the original partition.
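A minimal sketch of that flow for a single daily partition (the file path is illustrative, and server-side COPY TO requires superuser or the pg_write_server_files role; from a client you would use psql’s \copy instead):

```sql
-- 1. Export the partition's rows before touching it
COPY metrics_events_2024_03_01
TO '/var/backups/metrics_events_2024_03_01.csv'
WITH (FORMAT csv, HEADER true);

-- 2. Detach so the parent no longer routes inserts to it,
--    then drop once the export has been verified
ALTER TABLE metrics_events DETACH PARTITION metrics_events_2024_03_01;
DROP TABLE metrics_events_2024_03_01;
```

Detaching first also means the final DROP only needs a lock on the standalone table, not on the busy parent.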
To keep that logic manageable, I usually separate concerns: one function identifies candidate partitions and hands them off to an external archiving process, and a second function (or flag) confirms they can be safely dropped. AWS’s documentation on scheduling maintenance with the pg_cron extension is a helpful reference when you want retention jobs, archive exports, and health checks to run in a predictable sequence without a lot of custom orchestration.
Once this is in place, you get the real benefit of PostgreSQL time-series partitioning: the database automatically sheds old weight, keeps storage costs in check, and avoids those painful multi-hour DELETE jobs that lock up your busiest tables.
Querying Partitioned PostgreSQL Time-Series Data Efficiently
Once PostgreSQL time-series partitioning is wired up, how you query the data determines whether you actually see the performance gains. In my own workloads, the biggest wins have come from writing time filters that allow partition pruning and aligning them with the indexes I created earlier.
Use Sargable Time Filters for Partition Pruning
Partition pruning works best when PostgreSQL can clearly see the time range in the WHERE clause. I always stick to simple, sargable predicates on the partition key:
SELECT avg(value)
FROM metrics_events
WHERE event_time >= '2024-03-01 00:00+00'
  AND event_time < '2024-03-02 00:00+00'
  AND source_id = 42;
What I avoid is wrapping the time column in functions, which can block pruning:
-- Anti-pattern: may prevent partition pruning
WHERE date_trunc('day', event_time) = date '2024-03-01';
Instead, I push the function to the constant side or precompute boundaries in the application.
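For example, the date_trunc filter above can be rewritten as a half-open range on the bare column, which the planner can prune on:

```sql
-- Same rows as the date_trunc version, but prunable
WHERE event_time >= timestamptz '2024-03-01 00:00+00'
  AND event_time <  timestamptz '2024-03-02 00:00+00'
```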
Align Queries with Indexes and Aggregations
Because I usually index (source_id, event_time DESC), I write filters and orderings that match:
SELECT event_time, value
FROM metrics_events
WHERE source_id = 42
  AND event_time >= now() - interval '1 hour'
ORDER BY event_time DESC
LIMIT 1000;
In my experience, this pattern gives very fast index-only scans on just the relevant partitions. When I need aggregates across many sources, I still keep the time filter tight so PostgreSQL can skip old partitions entirely.
Inspect Plans and Avoid Cross-Partition Row-by-Row Work
Any time a new query hits production, I quickly sanity-check it with EXPLAIN to make sure it touches only the expected partitions:
EXPLAIN
SELECT max(value)
FROM metrics_events
WHERE event_time >= now() - interval '7 days';
When I see scans over every partition, it’s usually because the time predicate is missing, too wide, or hidden behind a function. I also try to avoid row-by-row functions that need to inspect each record across many partitions—batching work in SQL (for example using GROUP BY with good filters) almost always scales better for time-series workloads.
Monitoring and Troubleshooting Your Time-Series Partitioning Setup
Once PostgreSQL time-series partitioning and retention are live, I treat them like any other critical component: I monitor them. A bit of visibility up front has saved me from nasty surprises like missing partitions or runaway table counts.
Checking That Partitions Exist and Are Up to Date
The first thing I verify regularly is that new partitions are being created on schedule and old ones are being removed. A simple inventory query helps me spot gaps:
SELECT inhrelid::regclass AS partition_table,
       reltuples::bigint AS est_rows
FROM pg_inherits
JOIN pg_class ON inhrelid = pg_class.oid
WHERE inhparent = 'metrics_events'::regclass
ORDER BY partition_table;
In my own setups, I’ll dump this to a dashboard or even just review it weekly. If I see future dates missing, I know my partition-creation job isn’t running. If I see very old partitions still present, retention may be failing.
Validating Partition Pruning and Query Performance
To confirm that queries actually benefit from partitioning, I lean heavily on EXPLAIN and EXPLAIN ANALYZE. I want to see only the relevant child tables touched:
EXPLAIN
SELECT avg(value)
FROM metrics_events
WHERE event_time >= now() - interval '1 day'
  AND event_time < now();
If the plan shows scans over many or all partitions, I know something is off: either the time predicate is missing, too broad, or wrapped in a function. One habit that’s helped me is to re-check plans whenever I or a teammate changes index definitions or query patterns.
Common Operational Issues and How to Fix Them
Over time I’ve run into a few recurring issues:
- “No partition found for row” errors: Usually means future partitions don’t exist. I verify the automation function, check scheduler logs, and manually run the function once to catch errors.
- Retention not kicking in: I check that the drop function runs without exceptions, and that its date parsing logic still matches the actual partition naming convention.
- System slowly getting slower: Often tied to too many small partitions or queries without proper time filters. In those cases I either coarsen partition granularity (e.g., from daily to monthly) or tighten application-side query windows.
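For the retention issue in particular, a quick dry run of the same name-parsing logic the drop function uses shows immediately whether the naming convention has drifted:

```sql
SELECT relname,
       to_date(substring(relname from 'metrics_events_(\d{4}_\d{2}_\d{2})'),
               'YYYY_MM_DD') AS parsed_date
FROM pg_class
WHERE relname LIKE 'metrics\_events\_%';
-- A NULL parsed_date means that table no longer matches the expected pattern
```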
When I first started doing this, I underestimated how often naming conventions or schedules drift over time. Now I treat a small set of health queries like the ones above as part of my regular database checkup, which keeps PostgreSQL time-series partitioning predictable instead of mysterious.
Conclusion and Next Steps for Scaling PostgreSQL Time-Series Partitioning
By this point, you’ve put together a solid PostgreSQL time-series partitioning setup: a range-partitioned parent table, automated creation of future partitions, and a retention job that trims old data at the partition level. In my experience, this combination alone transforms a fragile “giant table” into something you can run confidently in production.
What You’ve Built and Key Best Practices
Along the way, you’ve seen how to:
- Design a partitioned schema around a clear time key.
- Use simple, sargable time filters so PostgreSQL can prune partitions efficiently.
- Automate both partition creation and retention with small, focused PL/pgSQL functions.
- Monitor partitions and query plans so you catch issues early instead of during an outage.
From my own work with these systems, the biggest long-term win is discipline: consistent naming, predictable schedules, and regular checks keep everything boring and reliable.
Where to Go Next: Compression, Extensions, and Beyond
Once this foundation feels stable, you can start exploring more advanced options: built-in TOAST and index tuning for heavy columns, external cold-storage archives for very old partitions, and even specialized extensions like TimescaleDB for automatic chunking and compression. If you want to compare the native approach you just implemented with an extension-based one, AWS’s guide on designing high-performance time series data tables on Amazon RDS for PostgreSQL can help you decide when it’s worth layering on extra tooling as your dataset and performance requirements grow.

Hi, I’m Cary Huang — a tech enthusiast based in Canada. I’ve spent years working with complex production systems and open-source software. Through TechBuddies.io, my team and I share practical engineering insights, curate relevant tech news, and recommend useful tools and products to help developers learn and work more effectively.





