Posts

Mod Logs: Save every change, thank yourself later

By Scott Robinson·

After over twelve years of building web and desktop apps, I finally added a modification log to one. One table. Only a few columns. It's some of the simplest infrastructure I've ever built, and the one I wish I'd had from day one.

At its core, a modification log is just a table that tracks all changes to your data. Although far from novel, it's truly a simple idea, and it's incredibly powerful.

There are a ton of use cases for this, to name a few:

  • Audit trail
  • Display historical trends
  • Track user activity
  • Replay data to replicate bugs
  • See the state of your database at any point in time

The nice thing about modification logs is that all of these use-cases can come later. You don't need to build them on day one. All you need to do is log changes and then at some point in the future you can extract the data you need from the modification log table.

They're also surprisingly simple to implement in any app. It's just a single table with a few fields:

  • parent_model
  • parent_id
  • field_name
  • old_value
  • new_value
  • user_id
  • created_at

This allows the modification log to be flexible enough to track changes to any model in the app. To start, I'd recommend only tracking changes to your core application models - users, orders, or whatever drives the business - and expanding from there. Any time a tracked model is updated, a new record is created in the modification log table for each changed field.

Once you have the table, you need to populate it whenever data changes. Two common approaches:

  • Explicit logging in API routes: After each create, update, or delete, write one row per changed field to the modification log.
  • Model hooks: Use your ORM's callbacks (before_save, after_update, etc.) to log changes automatically, so you don't have to touch every route.
  • Write-Ahead Logging (WAL): If your database has one, inspect the WAL to log the changes to the modification log table.

Pick whichever fits your codebase. The important part is consistency - every change to a tracked model should land in the log.

The harder part here is actually reading the data back. Normal tables store current state: one row per user, one row per post, one row per order. A modification log stores history: one row per field change. To answer questions like "what did this record look like last Tuesday?" you need to replay those changes up to a point in time.

An Example

Let's say we have a table called line_items that tracks the items in an order. These line items can change over time, with quantities increasing, decreasing, or even being deleted.

CREATE TABLE line_items (
    id               INT PRIMARY KEY,
    product_id       INT,
    customer_id      INT,
    quantity_ordered INT,
    year             INT,
    is_deleted       BOOLEAN DEFAULT FALSE
);

We then want to answer the question: As of end of day on January 10th, what was total quantity_ordered for 2026 orders?

You'll quickly realize these queries can be expensive at scale, so they're usually a better fit for scheduled reports than live dashboards. Here's one approach that worked for us in Postgres:

WITH latest_modifications AS (
    -- Step 1: For each line item + field, keep only the most recent
    -- change that happened on or before the cutoff date.
    SELECT
        ml.parent_id AS line_item_id,
        ml.field_name,
        ml.new_value,
        ROW_NUMBER() OVER (
            PARTITION BY ml.parent_id, ml.field_name
            ORDER BY ml.created_at DESC
        ) AS row_num
    FROM modification_logs ml
    WHERE ml.parent_model = 'line_items'
      AND ml.created_at <= 1768024800000   -- Jan 10 cutoff
),

line_items_snapshot AS (
    -- Step 2: Pivot row-oriented log entries into column-oriented state.
    -- One row per line item, with the latest known value for each field.
    SELECT
        lm.line_item_id,
        COALESCE(
            MAX(CASE
                WHEN lm.field_name = 'quantity_ordered'
                 AND TRIM(lm.new_value) != ''
                THEN lm.new_value::DOUBLE PRECISION
            END), 0
        ) AS quantity_ordered,
        MAX(CASE
            WHEN lm.field_name = 'year'
            THEN lm.new_value::INT
        END) AS year,
        COALESCE(
            MAX(CASE
                WHEN lm.field_name = 'is_deleted'
                 AND TRIM(lm.new_value) = '1'
                THEN 1 ELSE 0
            END), 0
        )::BOOLEAN AS is_deleted
    FROM latest_modifications lm
    WHERE lm.row_num = 1   -- latest change per field only
    GROUP BY lm.line_item_id
),

line_items_with_details AS (
    -- Step 3: Join to the live table for fields that were set at creation
    -- and never modified — and therefore never logged.
    SELECT
        lis.line_item_id,
        lis.quantity_ordered,
        lis.year,
        lis.is_deleted,
        li.product_id
    FROM line_items_snapshot lis
    LEFT JOIN line_items li
        ON lis.line_item_id = li.id::TEXT
)

-- Step 4: Aggregate with query filters
SELECT
    COALESCE(SUM(lit.quantity_ordered), 0) AS total_quantity_ordered
FROM line_items_with_details lit
WHERE lit.year = 2026
  AND lit.is_deleted = FALSE;

From a high level, here is how it works:

  1. Rank changes: For each line item and field, find the most recent log entry on or before the cutoff date.
  2. Pivot to columns: Turn one-row-per-change into one-row-per-line-item, with the latest value for each field.
  3. Join: Pull fields from the live line_items table that were never modified and therefore never logged.
  4. Aggregate: Sum the quantity ordered.

The query uses Common Table Expressions (CTEs) - which is basically a virtual table defined with WITH name AS (...). Each CTE behaves like a regular table while the query runs, but nothing is actually written to disk. When the query finishes, the virtual table is gone. Here we use three of them to break the reconstruction into steps you can read top to bottom: filter for the latest changes, pivot log rows into columns, join back to the live table, then aggregate. You could write this as one giant nested subquery, but CTEs make each stage explicit and much easier to debug.

For any app with reasonable activity, this will result in millions (or even billions!) of lines being logged. With each line representing a single field change, the table grows fast. Even with proper indexing, reconstructing historical state is slow enough that you'll usually want to run these queries in a scheduled job rather than on every page load.

That seems like a fair tradeoff to me. You get a complete audit trail and the ability to answer historical questions, all from one table you can add on day one. Start logging changes to your core models now, and the audit trails, trend charts, and bug replays can come whenever you need them.

Follow along for more

Get 1-2 posts per month that optimize for the overlap of engineering and curiosity.

No spam or ads. Unsubscribe anytime. Privacy Policy