Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude optional fields from database schema #1923

Open
sgillespie opened this issue Dec 16, 2024 · 2 comments
Open

Exclude optional fields from database schema #1923

sgillespie opened this issue Dec 16, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@sgillespie
Copy link
Contributor

We have several optional database columns, which we may not fill in, depending on the configuration. Currently, we have 2 mechanisms to deal with that:

  • Course grained schema variants
  • Fill in null.

We should explore more general ways to solve this.

@sgillespie sgillespie added the bug Something isn't working label Dec 16, 2024
@sgillespie
Copy link
Contributor Author

sgillespie commented Dec 16, 2024

In the case that we do not fill in the optional column, we usually fill in null. There are potentially two problems with this:

  • Null storage is not free (explained below)
  • Tables with large number of number columns can be difficult to read

The Effect of NULL in PostgreSQL

In PostgreSQL, table row is physically represented as specified in the manual:

All table rows are structured in the same way. There is a fixed-size header (occupying 23 bytes on most machines), followed by an optional null bitmap, an optional object ID field, and the user data.

The per-row null bitmap is present only if the row contains least one null value. If it is not present, all values are assumed to be non-null. The null bitmap occupies enough bytes to have one bit per data column.

In effect, there is no storage overhead for tables with 8 columns or less (because 8 bits equals the padding of one byte). Otherwise, the per-row overhead will most likely use 8 bytes (enough for up to 64 columns).

Solutions

Wait for Persistent v3

There is a proposed redesign that would solve this. It would generate separate, extensible types for inserting. Unfortunately, there has been no update for at least 4 years.

Course Grained Schema Variants

Another approach, which is already in use, is to define the schema in small pieces, and then stitch them together at runtime.

This works quite well in practice, but it may be a bit heavyhanded for dealing with one or two columns. Common fields have to be duplicated across each variant. This also means combining options will be difficult.

Write NULLs

The easiest solution would be to ignore all the problems above and continue to write nullss. There likely wouldn't be any improvement in records that have ANY nulls.

Consider the case tx_out, where most records don't contain an inline_datum_id (about 2% by my count).

Custom Backend

Yet another option would be extend the PostgreSQL backend, to override its insert implementation. Then we could redact any column we shouldn't be writing to.

@sgillespie
Copy link
Contributor Author

The current persistent way to exclude columns from inserts is to define separate entities, very similar to our schema variants.

I propose we use a combination of our existing strategies:

  • Course grained schema variants
  • Write nulls

Schema variants could be useful when dealing with whole tables or a large number of fields (say, three or more). When dealing with one or two columns, we can live with nulls. If we find we are accumulating a lot of them, we can group them and create configuration options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant