Exclude optional fields from database schema #1923

sgillespie · 2024-12-16T21:04:29Z

We have several optional database columns, which we may not fill in, depending on the configuration. Currently, we have 2 mechanisms to deal with that:

Course grained schema variants
Fill in null.

We should explore more general ways to solve this.

The text was updated successfully, but these errors were encountered:

sgillespie · 2024-12-16T21:40:31Z

In the case that we do not fill in the optional column, we usually fill in null. There are potentially two problems with this:

Null storage is not free (explained below)
Tables with large number of number columns can be difficult to read

The Effect of NULL in PostgreSQL

In PostgreSQL, table row is physically represented as specified in the manual:

All table rows are structured in the same way. There is a fixed-size header (occupying 23 bytes on most machines), followed by an optional null bitmap, an optional object ID field, and the user data.

The per-row null bitmap is present only if the row contains least one null value. If it is not present, all values are assumed to be non-null. The null bitmap occupies enough bytes to have one bit per data column.

In effect, there is no storage overhead for tables with 8 columns or less (because 8 bits equals the padding of one byte). Otherwise, the per-row overhead will most likely use 8 bytes (enough for up to 64 columns).

Solutions

Wait for Persistent v3

There is a proposed redesign that would solve this. It would generate separate, extensible types for inserting. Unfortunately, there has been no update for at least 4 years.

Course Grained Schema Variants

Another approach, which is already in use, is to define the schema in small pieces, and then stitch them together at runtime.

This works quite well in practice, but it may be a bit heavyhanded for dealing with one or two columns. Common fields have to be duplicated across each variant. This also means combining options will be difficult.

Write NULLs

The easiest solution would be to ignore all the problems above and continue to write nullss. There likely wouldn't be any improvement in records that have ANY nulls.

Consider the case tx_out, where most records don't contain an inline_datum_id (about 2% by my count).

Custom Backend

Yet another option would be extend the PostgreSQL backend, to override its insert implementation. Then we could redact any column we shouldn't be writing to.

sgillespie · 2024-12-16T21:58:03Z

The current persistent way to exclude columns from inserts is to define separate entities, very similar to our schema variants.

I propose we use a combination of our existing strategies:

Course grained schema variants
Write nulls

Schema variants could be useful when dealing with whole tables or a large number of fields (say, three or more). When dealing with one or two columns, we can live with nulls. If we find we are accumulating a lot of them, we can group them and create configuration options.

sgillespie added the bug Something isn't working label Dec 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exclude optional fields from database schema #1923

Exclude optional fields from database schema #1923

sgillespie commented Dec 16, 2024

sgillespie commented Dec 16, 2024 •

edited

Loading

sgillespie commented Dec 16, 2024

Exclude optional fields from database schema #1923

Exclude optional fields from database schema #1923

Comments

sgillespie commented Dec 16, 2024

sgillespie commented Dec 16, 2024 • edited Loading

The Effect of NULL in PostgreSQL

Solutions

Wait for Persistent v3

Course Grained Schema Variants

Write NULLs

Custom Backend

sgillespie commented Dec 16, 2024

sgillespie commented Dec 16, 2024 •

edited

Loading