Sort out domains #20

DPeterK · 2020-02-26T11:28:43Z

The multi-attr tiledb writer currently uses a different set of domains from the one calculated by the data model. The data model provides only the 'super-domains' (the minimum set of highest-dimensionality domains that enclose the maximum number of input datasets), whereas the multi-attr writer makes one domain for each unique set of dimensions and writes into it all the datasets that match that set of dimensions.

For example, taking the following datasets:

a --> [x, y, z, t]
b --> [x, y, t]
c --> [x, y, t]
d --> [x, y, t1]
e --> [x1, y1, z, t1]
f --> [x1, y1, t1]
g --> [x, y, z, t]

This is the set of domains that would be made by the data model:

x,y,z,t --> domain_0
x1,y1,z,t1 --> domain_1
x,y,t1 --> domain_2

domain_0 --> a, b, c, g
domain_1 --> e, f
domain_2 --> d
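
For illustration, here is a minimal Python sketch that reproduces the data-model assignment above. It is one reading of the 'super-domain' definition, not the actual data model code: a super-domain is taken to be a dimension set that is not strictly contained in any other, and each dataset goes to the first super-domain that encloses its dimensions. The function name `super_domains` and the `datasets` mapping are hypothetical.

```python
# Hypothetical input: dataset name -> tuple of dimension names (from the example).
datasets = {
    "a": ("x", "y", "z", "t"),
    "b": ("x", "y", "t"),
    "c": ("x", "y", "t"),
    "d": ("x", "y", "t1"),
    "e": ("x1", "y1", "z", "t1"),
    "f": ("x1", "y1", "t1"),
    "g": ("x", "y", "z", "t"),
}


def super_domains(datasets):
    """Assumed interpretation of the data model's domain assignment."""
    # Unique dimension tuples, in first-seen order.
    unique = list(dict.fromkeys(tuple(dims) for dims in datasets.values()))
    # Keep only tuples whose dimension set is not a strict subset of another.
    maximal = [d for d in unique if not any(set(d) < set(o) for o in unique)]
    # Highest dimensionality first; sorted() is stable, so ties keep their order.
    maximal = sorted(maximal, key=len, reverse=True)
    domains = {f"domain_{i}": dims for i, dims in enumerate(maximal)}

    # Assign each dataset to the first super-domain that encloses its dimensions.
    members = {name: [] for name in domains}
    for ds_name, dims in datasets.items():
        for dom_name, dom_dims in domains.items():
            if set(dims) <= set(dom_dims):
                members[dom_name].append(ds_name)
                break
    return domains, members


domains, members = super_domains(datasets)
for name, dims in domains.items():
    print(name, dims, members[name])
```

Note that a dataset whose dimensions fit inside more than one super-domain would simply land in the first match here. How the real data model resolves that case is not covered by this sketch.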

And this is the set of domains that would be made by the multi-attr writer:

x,y,z,t --> a, g
x,y,t --> b, c
x,y,t1 --> d
x1,y1,z,t1 --> e
x1,y1,t1 --> f
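
The multi-attr grouping above amounts to keying datasets on their exact tuple of dimensions. A minimal Python sketch, assuming datasets are described by a plain name-to-dimensions mapping (the `datasets` dict and the `group_by_dims` function are hypothetical, not the writer's real API):

```python
from collections import defaultdict

# Hypothetical input: dataset name -> tuple of dimension names (from the example).
datasets = {
    "a": ("x", "y", "z", "t"),
    "b": ("x", "y", "t"),
    "c": ("x", "y", "t"),
    "d": ("x", "y", "t1"),
    "e": ("x1", "y1", "z", "t1"),
    "f": ("x1", "y1", "t1"),
    "g": ("x", "y", "z", "t"),
}


def group_by_dims(datasets):
    """Make one domain per unique dimension tuple and collect matching datasets."""
    groups = defaultdict(list)
    for name, dims in datasets.items():
        groups[tuple(dims)].append(name)
    return dict(groups)


groups = group_by_dims(datasets)
for dims, names in groups.items():
    print(",".join(dims), "-->", ", ".join(names))
```

Because the key is the exact dimension tuple, no subset or superset reasoning is needed, which is what makes this rule simpler than the super-domain calculation.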

At some point we should tidy up this discrepancy. Assuming that multi-attr append goes in (see #19), the multi-attr case should become the default, and the data model's domain assignment algorithm should be updated to match what the multi-attr writer does.

Here's the TODO list:

decide on a single writing strategy (multi-attr is potentially preferable, as it seems to be the best approach for storing multiple data vars)
commonalise the domain algorithm between the data model and multi-attr writer
commonalise to a single writer
