The idea behind randomly/periodically selecting weekly blocks is that it allows us to adequately sample a 2 year period with some statistical guarantees about the proportional representation of weekdays/weekends and seasons within the validation and holdout sets.
The implication is that this provides (albeit somewhat weaker) guarantees about the distribution of the underlying grid state, seasonality effects, and our performance over the period.
In the ensembling squad this assumption was undermined by one Problem: our model performance varied a lot between years, and one year happened to be sampled more than the other.
The current RandomSelector is therefore not robust enough to provide any guarantees about the statistics of our returns, and we need those guarantees to establish a reliable baseline against which we can compare optimised models.
This issue is more to document the concern and some possible avenues for taking this forward with different, and more sophisticated, selectors in future. Namely:
As a simple remedy to the above, we might instead have done something like:
1. Cluster the dates by season
2. Within each cluster, sort the dates by some difficulty measure
3. Systematically select dates (e.g. every second date or in blocks of 7) to ensure (roughly) proportional statistics
This would retain the same seasonal guarantees as before and somewhat weaken the weekday/weekend guarantee, but with the benefit of more similar return statistics. This is just a simple example; perhaps there's an easier/better way of doing it.
Moreover, if we ever wish to discriminate by other criteria, e.g. grid regimes, the example gets more complicated but the same principle applies.
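To make the three steps concrete, here is a minimal Python sketch of the "every second date" variant. It is only an illustration, not the existing selector API: `season_of`, `systematic_split`, the meteorological northern-hemisphere season boundaries, and the made-up `difficulty` scores are all assumptions introduced for this example.

```python
from collections import defaultdict
from datetime import date, timedelta


def season_of(d: date) -> str:
    """Map a date to a season (assumed: meteorological, northern hemisphere)."""
    if d.month in (12, 1, 2):
        return "winter"
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "autumn"


def systematic_split(dates, difficulty, step=2):
    """Split dates into (selected, remainder).

    1. Cluster the dates by season.
    2. Within each cluster, sort by the difficulty measure.
    3. Select every `step`-th date from the sorted cluster.
    """
    clusters = defaultdict(list)
    for d in dates:
        clusters[season_of(d)].append(d)

    selected, remainder = [], []
    for season_dates in clusters.values():
        # Ties in difficulty are broken by date so the result is deterministic.
        ranked = sorted(season_dates, key=lambda d: (difficulty[d], d))
        for i, d in enumerate(ranked):
            (selected if i % step == 0 else remainder).append(d)
    return sorted(selected), sorted(remainder)


# Hypothetical inputs: 730 consecutive days and placeholder difficulty scores.
start = date(2019, 1, 1)
dates = [start + timedelta(days=i) for i in range(730)]
difficulty = {d: (d.toordinal() * 37 % 101) / 100 for d in dates}

selected, remainder = systematic_split(dates, difficulty, step=2)
```

Because selection alternates along the difficulty ranking within each season, both sets see roughly the same spread of easy and hard dates per season, regardless of which year those dates fall in. Selecting in blocks of 7 would need the ranking to be done over weeks rather than individual dates, which is not shown here.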
from @glennmoy