More sophisticated sampling techniques #7

oxinabox · 2020-12-07T19:29:11Z

The idea behind randomly/periodically selecting weekly blocks is that it allows us to adequately sample a 2-year period with some statistical guarantees about the proportional representation of weekdays/weekends and seasons within the validation and holdout sets.
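As a rough illustration of the weekly-block idea, here is a minimal Python sketch. It is not the actual `RandomSelector` implementation: the function names, the 50/50 split, the seed, and the date range are all assumptions made for the example.

```python
import random
from collections import Counter
from datetime import date, timedelta


def weekly_blocks(start, end):
    """Split the inclusive range [start, end] into consecutive 7-day blocks.

    The final block may be shorter if the range is not a multiple of 7 days.
    """
    blocks = []
    current = start
    while current <= end:
        block = []
        for offset in range(7):
            day = current + timedelta(days=offset)
            if day <= end:
                block.append(day)
        blocks.append(block)
        current += timedelta(days=7)
    return blocks


def random_block_split(start, end, holdout_fraction=0.5, seed=0):
    """Assign each weekly block, as a whole, to validation or holdout at random."""
    rng = random.Random(seed)
    validation, holdout = [], []
    for block in weekly_blocks(start, end):
        target = holdout if rng.random() < holdout_fraction else validation
        target.extend(block)
    return validation, holdout


# Hypothetical 2-year period.
validation, holdout = random_block_split(date(2019, 1, 1), date(2020, 12, 31))

# Every full block carries exactly 2 weekend days, so the weekday/weekend
# ratio is preserved. The per-year counts, however, depend on the random draw.
holdout_per_year = Counter(d.year for d in holdout)
print(holdout_per_year)
```

Because whole weeks are assigned together, the weekday/weekend proportions are preserved by construction, while the number of holdout dates per year is left to chance.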

The implication is that this provides (albeit somewhat weaker) guarantees about the distribution of the underlying grid state, seasonality effects, and our performance over the period.

In the ensembling squad this assumption was undermined by one problem: our model performance varied a lot between years, and one year happened to be sampled more than the other.

The current RandomSelector is therefore not robust enough to provide any guarantees about the statistics of our returns, which are necessary to provide a reliable baseline against which we can compare optimised models.

This issue is more to document the concern and some possible avenues for taking this forward with different, and more sophisticated, selectors in future. Namely:

As a simple remedy to the above, we might have instead done something like:

1. Cluster the dates by season
2. Within each cluster, sort the dates by some difficulty measure
3. Systematically select dates (e.g. every second date or in blocks of 7) to ensure (roughly) proportional statistics

This would retain the same seasonal guarantees as before and somewhat weaken the weekday/weekend guarantee, but with the benefit of more similar return statistics. This is just a simple example; perhaps there's an easier/better way of doing it.
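The three steps could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `season_of`, `stratified_systematic_split`, and the `difficulty` mapping are hypothetical names, the seasons are plain northern-hemisphere meteorological seasons, and the difficulty values are synthetic (one year made deliberately "harder" than the other). It implements the "every second date" variant, not the "blocks of 7" one.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def season_of(d):
    """Meteorological season, used here as a simple stand-in for clustering."""
    if d.month in (12, 1, 2):
        return "winter"
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "autumn"


def stratified_systematic_split(dates, difficulty, step=2):
    """Cluster by season, sort by difficulty, then take every `step`-th date."""
    clusters = defaultdict(list)
    for d in dates:
        clusters[season_of(d)].append(d)

    validation, holdout = [], []
    for season in sorted(clusters):
        # Sort by difficulty, breaking ties by date for determinism.
        ranked = sorted(clusters[season], key=lambda d: (difficulty[d], d))
        for i, d in enumerate(ranked):
            (holdout if i % step == 0 else validation).append(d)
    return sorted(validation), sorted(holdout)


# Hypothetical 2-year period with synthetic difficulty: 2020 is "harder".
start, end = date(2019, 1, 1), date(2020, 12, 31)
dates = [start + timedelta(days=i) for i in range((end - start).days + 1)]
difficulty = {
    d: (1.0 if d.year == 2019 else 3.0) + 0.1 * (d.toordinal() % 7)
    for d in dates
}

validation, holdout = stratified_systematic_split(dates, difficulty)

# Compare the mean difficulty of the two sets.
print(mean(difficulty[d] for d in validation))
print(mean(difficulty[d] for d in holdout))
```

Since neighbouring dates in the difficulty ranking are sent to different sets, both sets see a similar spread of difficulty within every season, at the cost of no longer keeping whole weeks together.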

Moreover, if we ever wish to discriminate by other criteria, e.g. grid regimes, the example gets more complicated but the same principle applies.

sdl1 mentioned this issue Sep 27, 2021

Add rejection sampling capability #16

Open
