Objective
Create two real-world reference projects that showcase Ibis and IbisML at scale.
Outcomes
Documented end-to-end ML projects, including:
data ingestion
data exploration (using Ibis; stretch: produce visualizations using existing Ibis integrations)
data processing (including feature engineering using Ibis)
train-test split (manually using Ibis)
last-mile feature preprocessing (using IbisML)
handoff to model (approach TBD)
modeling (one using Dask-XGBoost on GPU, another using PyTorch)
stretch: real-time inference
Ideally, these can be written up as (series of) blog posts in the future.
They can also be submitted to conferences.
It could be useful to track approximate time needed for each stage of the project (e.g. to confirm whether most time really is spent on feature engineering).
Lessons learned on model handoff that can inform future IbisML work in that area (if any is necessary)
Feedback is also expected across the rest of the pipeline, but model handoff is where we have the most uncertainty
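For the manual train-test split step, one common approach is a deterministic hash-based split on a unique key, so that the same row always lands in the same partition regardless of execution order or backend. The sketch below shows the idea in plain Python only; the `assign_split` helper, the `game-<n>` keys, the seed, and the 80/20 ratio are all illustrative assumptions, not part of the planned projects. With Ibis, the same logic would presumably be expressed on a key column of the table rather than row by row.

```python
import hashlib


def assign_split(key: str, test_fraction: float = 0.2,
                 num_buckets: int = 100, seed: int = 42) -> str:
    """Assign a row to 'train' or 'test' from a stable hash of its key.

    Hypothetical helper for illustration: the key is hashed together with a
    seed, mapped to one of `num_buckets` buckets, and the lowest buckets
    (covering `test_fraction` of them) go to the test set.
    """
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_buckets
    return "test" if bucket < test_fraction * num_buckets else "train"


# Illustrative keys only; a real project would use a unique ID column.
game_ids = [f"game-{i}" for i in range(1000)]
splits = {game_id: assign_split(game_id) for game_id in game_ids}

train_ids = [g for g, s in splits.items() if s == "train"]
test_ids = [g for g, s in splits.items() if s == "test"]
```

Because the assignment depends only on the key and the seed, rerunning the pipeline or adding new rows does not move existing rows between train and test, which helps avoid leakage when the dataset grows.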
Projects
Lichess live win probability using distributed XGBoost
Full dataset size: >12TB
TBD using PyTorch
(Backup option) NYC taxi dataset
(Backup option) Bureau of Transportation Statistics full airline dataset