Commit

data_sources -> data_tables & source -> data
brynpickering committed Sep 25, 2024
1 parent 683a5b7 commit a512239
Showing 59 changed files with 480 additions and 480 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
## 0.7.0.dev5 (Unreleased)

### User-facing changes

|changed| `data_sources` -> `data_tables` and `data_sources.source` -> `data_tables.data`.
This change has occurred to avoid confusion between data "sources" and model energy "sources" (#673).

## 0.7.0.dev4 (2024-09-10)

### User-facing changes
74 changes: 37 additions & 37 deletions docs/creating/data_sources.md → docs/creating/data_tables.md
@@ -1,17 +1,17 @@
# Loading tabular data (`data_sources`)
# Loading tabular data (`data_tables`)

We have chosen YAML syntax to define Calliope models as it is human-readable.
However, when you have a large dataset, the YAML files can become large and ultimately not as readable as we would like.
For instance, for parameters that vary in time we would have a list of 8760 values and timestamps to put in our YAML file!

Therefore, alongside your YAML model definition, you can load tabular data from CSV files (or from in-memory [pandas.DataFrame][] objects) under the `data_sources` top-level key.
Therefore, alongside your YAML model definition, you can load tabular data from CSV files (or from in-memory [pandas.DataFrame][] objects) under the `data_tables` top-level key.
As of Calliope v0.7.0, this tabular data can be of _any_ kind.
Prior to this, loading from file was limited to timeseries data.

The full syntax for loading tabular data can be found in the associated [schema][data-source-schema].
The full syntax for loading tabular data can be found in the associated [schema][data-table-schema].
In brief it is:

* **source**: path to file or reference name for an in-memory object.
* **data**: path to file or reference name for an in-memory object.
* **rows**: the dimension(s) in your table defined per row.
* **columns**: the dimension(s) in your table defined per column.
* **select**: values within dimensions that you want to select from your tabular data, discarding the rest.
@@ -126,9 +126,9 @@ In this section we will show some examples of loading data and provide the equiv
YAML definition to load data:

```yaml
data_sources:
data_tables:
  pv_capacity_factor_data:
    source: data_sources/pv_resource.csv
    data: data_tables/pv_resource.csv
    rows: timesteps
    add_dims:
      techs: pv
@@ -181,9 +181,9 @@ In this section we will show some examples of loading data and provide the equiv
YAML definition to load data:

```yaml
data_sources:
data_tables:
  tech_data:
    source: data_sources/tech_data.csv
    data: data_tables/tech_data.csv
    rows: [techs, parameters]
```

@@ -224,9 +224,9 @@ In this section we will show some examples of loading data and provide the equiv
YAML definition to load data:

```yaml
data_sources:
data_tables:
  tech_data:
    source: data_sources/tech_data.csv
    data: data_tables/tech_data.csv
    rows: [techs, parameters]
    add_dims:
      costs: monetary
@@ -272,7 +272,7 @@ In this section we will show some examples of loading data and provide the equiv
1. To limit repetition, we have defined [templates](templates.md) for our costs.

!!! info "See also"
    Our [data source loading tutorial][loading-tabular-data] has more examples of loading tabular data into your model.
    Our [data table loading tutorial][loading-tabular-data] has more examples of loading tabular data into your model.

## Selecting dimension values and dropping dimensions

@@ -290,9 +290,9 @@ Data in file:
YAML definition to load only data from nodes 1 and 2:

```yaml
data_sources:
data_tables:
  tech_data:
    source: data_sources/tech_data.csv
    data: data_tables/tech_data.csv
    rows: [techs, parameters]
    columns: nodes
    select:
@@ -312,22 +312,22 @@ You will also need to `drop` the dimension so that it doesn't appear in the fina
YAML definition to load only data from scenario 1:

```yaml
data_sources:
data_tables:
  tech_data:
    source: data_sources/tech_data.csv
    data: data_tables/tech_data.csv
    rows: [techs, parameters]
    columns: scenarios
    select:
      scenarios: scenario1
    drop: scenarios
```
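
As a rough illustration of what `select` followed by `drop` does to a table, here is a pandas-only sketch. It does not call Calliope, the values and variable names are made up for illustration, and Calliope's internal implementation may differ.

```python
import pandas as pd

# Hypothetical long-form table with three dimensions (illustrative values only).
index = pd.MultiIndex.from_product(
    [["tech1", "tech2"], ["source_use_max"], ["scenario1", "scenario2"]],
    names=["techs", "parameters", "scenarios"],
)
data = pd.Series([100, 150, 200, 250], index=index)

# "select": keep only scenario1, discarding the other scenario.
selected = data.xs("scenario1", level="scenarios", drop_level=False)

# "drop": remove the scenarios dimension so it does not appear in the final data.
dropped = selected.droplevel("scenarios")

print(dropped)
```

After the two steps, the data is indexed by `techs` and `parameters` only, which is the shape the rest of the model definition expects.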

You can then also tweak just one line of your data source YAML with an [override](scenarios.md) to point to your other scenario:
You can then also tweak just one line of your data table YAML with an [override](scenarios.md) to point to your other scenario:

```yaml
override:
  switch_to_scenario2:
    data_sources.tech_data.select.scenarios: scenario2  # (1)!
    data_tables.tech_data.select.scenarios: scenario2  # (1)!
```

1. We use dot notation as a shorthand to [abbreviate nested dictionaries](yaml.md#abbreviated-nesting).
@@ -348,9 +348,9 @@ For example, to define costs for the parameter `cost_flow_cap`:
| tech3 | monetary | cost_flow_cap | 20 | 45 | 50 |

```yaml
data_sources:
data_tables:
  tech_data:
    source: data_sources/tech_data.csv
    data: data_tables/tech_data.csv
    rows: [techs, costs, parameters]
    columns: nodes
```
@@ -364,9 +364,9 @@ For example, to define costs for the parameter `cost_flow_cap`:
| tech3 | 20 | 45 | 50 |

```yaml
data_sources:
data_tables:
  tech_data:
    source: data_sources/tech_data.csv
    data: data_tables/tech_data.csv
    rows: techs
    columns: nodes
    add_dims:
@@ -384,9 +384,9 @@ Or to define the same timeseries source data for two technologies at different n
| 2005-01-01 01:00 | 200 | 200 |

```yaml
data_sources:
data_tables:
  tech_data:
    source: data_sources/tech_data.csv
    data: data_tables/tech_data.csv
    rows: timesteps
    columns: [nodes, techs, parameters]
```
@@ -401,16 +401,16 @@ Or to define the same timeseries source data for two technologies at different n
| 2005-01-01 01:00 | 200 |

```yaml
data_sources:
data_tables:
  tech_data_1:
    source: data_sources/tech_data.csv
    data: data_tables/tech_data.csv
    rows: timesteps
    add_dims:
      techs: tech1
      nodes: node1
      parameters: source_use_max
  tech_data_2:
    source: data_sources/tech_data.csv
    data: data_tables/tech_data.csv
    rows: timesteps
    add_dims:
      techs: tech2
@@ -420,10 +420,10 @@ Or to define the same timeseries source data for two technologies at different n

## Loading CSV files vs `pandas` dataframes

To load from CSV, set the filepath in `source` to point to your file.
To load from CSV, set the filepath in `data` to point to your file.
This filepath can either be relative to your `model.yaml` file (as in the above examples) or an absolute path.
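
The relative-versus-absolute rule above can be sketched with `pathlib`. This is only an illustration of the stated behaviour: `resolve_data_path` is a hypothetical helper, not a Calliope function, and Calliope's own path handling may differ in detail.

```python
from pathlib import Path


def resolve_data_path(model_yaml: str, data: str) -> Path:
    """Hypothetical helper: resolve a `data` entry against the model YAML location."""
    data_path = Path(data)
    if data_path.is_absolute():
        # Absolute paths are used as given.
        return data_path
    # Relative paths are interpreted relative to the directory holding model.yaml.
    return Path(model_yaml).parent / data_path


print(resolve_data_path("path/to/model.yaml", "data_tables/pv_resource.csv"))
```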

To load from a [pandas.DataFrame][], you can specify the `data_source_dfs` dictionary of objects when you initialise your model:
To load from a [pandas.DataFrame][], you can specify the `data_table_dfs` dictionary of objects when you initialise your model:

```python
import calliope
@@ -433,19 +433,19 @@ df2 = pd.DataFrame(...)
model = calliope.Model(
    "path/to/model.yaml",
    data_source_dfs={"data_source_1": df1, "data_source_2": df2}
    data_table_dfs={"data_source_1": df1, "data_source_2": df2}
)
```

And then you point to those dictionary keys in the `source` for your data source:
And then you point to those dictionary keys in the `data` for your data table:

```yaml
data_sources:
data_tables:
  ds1:
    source: data_source_1
    data: data_source_1
    ...
  ds2:
    source: data_source_2
    data: data_source_2
    ...
```

@@ -454,7 +454,7 @@ data_sources:
Rows correspond to your dataframe index levels and columns to your dataframe column levels.

You _cannot_ specify [pandas.Series][] objects.
Ensure you convert them to dataframes (`to_frame()`) before adding them to your data source dictionary.
Ensure you convert them to dataframes (`to_frame()`) before adding them to your data table dictionary.
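
A minimal pandas sketch of the conversion, assuming a timeseries held as a `pandas.Series`. The values, the `source_use_max` column and the `data_source_1` key are illustrative only, and the dictionary is what you would then pass when initialising the model.

```python
import pandas as pd

# Hypothetical timeseries held as a Series (illustrative values only).
timesteps = pd.DatetimeIndex(
    ["2005-01-01 00:00", "2005-01-01 01:00"], name="timesteps"
)
series = pd.Series([100, 200], index=timesteps, name="source_use_max")

# A Series has no column axis, so convert it to a one-column dataframe first.
df = series.to_frame()

# The index level maps to `rows` and the column level to `columns` in the YAML definition.
data_table_dfs = {"data_source_1": df}

print(df)
```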

## Important considerations

@@ -468,8 +468,8 @@ This could be defined in `rows`, `columns`, or `add_dims`.
3. `add_dims` to add dimensions.
This means you can technically select value "A" from dimensions `nodes`, then drop `nodes`, then add `nodes` back in with the value "B".
This effectively replaces "A" with "B" on that dimension.
3. The order of tabular data loading is in the order you list the sources.
If a new table has data which clashes with preceding data sources, it will override that data.
3. The order of tabular data loading is in the order you list the tables.
If a new table has data which clashes with preceding tables, it will override that data.
This may have unexpected results if the files have different dimensions as the dimensions will be broadcast to match each other.
4. CSV files must have `.csv` in their filename (even if compressed, e.g., `.csv.zip`).
If they don't, they won't be picked up by Calliope.
@@ -481,7 +481,7 @@ E.g.,
nodes:
  node1.techs: {tech1, tech2, tech3}
  node2.techs: {tech1, tech2}
data_sources:
data_tables:
  ...
```
6. We process dimension data after loading it in according to a limited set of heuristics:
8 changes: 4 additions & 4 deletions docs/creating/index.md
@@ -35,7 +35,7 @@ We distinguish between:
- the model **definition** (your representation of a physical system in YAML).

Model configuration is everything under the top-level YAML key [`config`](config.md).
Model definition is everything else, under the top-level YAML keys [`parameters`](parameters.md), [`techs`](techs.md), [`nodes`](nodes.md), [`templates`](templates.md), and [`data_sources`](data_sources.md).
Model definition is everything else, under the top-level YAML keys [`parameters`](parameters.md), [`techs`](techs.md), [`nodes`](nodes.md), [`templates`](templates.md), and [`data_tables`](data_tables.md).

It is possible to define alternatives to the model configuration/definition that you can refer to when you initialise your model.
These are defined under the top-level YAML keys [`scenarios` and `overrides`](scenarios.md).
@@ -52,7 +52,7 @@ The layout of that directory typically looks roughly like this (`+` denotes dire
+ model_definition
    - nodes.yaml
    - techs.yaml
+ data_sources
+ data_tables
    - solar_resource.csv
    - electricity_demand.csv
- model.yaml
@@ -63,7 +63,7 @@ In the above example, the files `model.yaml`, `nodes.yaml` and `techs.yaml` toge
This definition could be in one file, but it is more readable when split into multiple.
We use the above layout in the example models.

Inside the `data_sources` directory, tabular data are stored as CSV files.
Inside the `data_tables` directory, tabular data are stored as CSV files.

!!! note
    The easiest way to create a new model is to use the `calliope new` command, which makes a copy of one of the built-in example models:
@@ -85,4 +85,4 @@ The rest of this section discusses everything you need to know to set up a model
- More details on the [model configuration](config.md).
- The key parts of the model definition, first, the [technologies](techs.md), then, the [nodes](nodes.md), the locations in space where technologies can be placed.
- How to use [technology and node templates](templates.md) to reduce repetition in the model definition.
- Other important features to be aware of when defining your model: defining [indexed parameters](parameters.md), i.e. parameters which are not indexed over technologies and nodes, [loading tabular data](data_sources.md), and defining [scenarios and overrides](scenarios.md).
- Other important features to be aware of when defining your model: defining [indexed parameters](parameters.md), i.e. parameters which are not indexed over technologies and nodes, [loading tabular data](data_tables.md), and defining [scenarios and overrides](scenarios.md).