Available databases on ClickHouse

If you need a new database, please reach out to us via https://fb.workplace.com/groups/4571909969591489 (for metamates), or create an issue and book an office hours (OH) slot with us at https://github.com/pytorch/pytorch/wiki/Dev-Infra-Office-Hours (for external partners).

The default database

The default database includes all GitHub events, for example workflow_run. These tables include the same information as the webhook payload https://docs.github.com/en/webhooks/webhook-events-and-payloads. The list includes:

It also includes several non-GitHub tables migrated from Rockset. These are custom tables created to serve different use cases:

failed_test_runs includes information about failed tests. It's populated by the upload_test_stats.py script.
job_annotation is used in HUD to manually annotate a failure with a category such as INFRA_FLAKE or BROKEN_TRUNK.
merge_bases contains the merge base of each pull request. The information is populated by TD.
merges contains information about merges from mergebot. This is used to compute the percentage of force merges, an important KPI.
queue_times_historical stores the historical queue time by runner type, as populated by the updateQueueTimes.mjs script.
rerun_disabled_tests is used by the rerun disabled tests bot to confirm whether a disabled test is still failing in trunk.
servicelab_torch_dynamo_perf_stats stores the internal service lab benchmark results. This should be in the benchmark database instead. It ended up here by mistake during the migration.
test_run_s3 keeps the test time for individual tests on, well, S3. This information is used later to build CI features that depend on test times, for example marking slow tests.
test_run_summary aggregates the information in test_run_s3 by test class and provides the aggregated test time per class when computing CI test shards.
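
As a rough illustration of how the tables listed above might be sampled, the sketch below builds a small read-only query in Python. The helper names (`qualified`, `sample_rows_query`, `run`) are hypothetical. Using the third-party clickhouse-connect client, and whatever connection parameters you pass to it, is an assumption and not something this page prescribes.

```python
import re

# The database name comes from this page; table names are passed in by the caller.
DEFAULT_DB = "default"

# Accept plain identifiers only, so a table name cannot smuggle in extra SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def qualified(database: str, table: str) -> str:
    """Return `database.table`, rejecting anything that is not a plain identifier."""
    for part in (database, table):
        if not _IDENT.match(part):
            raise ValueError(f"not a plain identifier: {part!r}")
    return f"{database}.{table}"


def sample_rows_query(table: str, limit: int = 10) -> str:
    """Build a read-only query that samples a few rows from a table in `default`."""
    return f"SELECT * FROM {qualified(DEFAULT_DB, table)} LIMIT {int(limit)}"


def run(sql: str, **connection_kwargs):
    """Execute `sql` with clickhouse-connect (assumed client, imported lazily).

    Connection details such as host, username and password are not given on
    this page and must be supplied by the caller.
    """
    import clickhouse_connect  # third-party: pip install clickhouse-connect

    client = clickhouse_connect.get_client(**connection_kwargs)
    return client.query(sql).result_rows


if __name__ == "__main__":
    print(sample_rows_query("workflow_run", 5))
```

Starting with `SELECT * ... LIMIT` is a cheap way to discover a table's actual columns before writing a narrower query, since the exact schemas are not listed here.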

The benchmark database

The benchmark database holds all benchmark and metric data. Its tables power the HUD benchmark dashboards. They are being consolidated into oss_ci_benchmark_v3 so that all benchmark data can be found in one place. Until that happens, the list of benchmark tables includes:

inductor_torch_dynamo_perf_stats stores inductor benchmark data from inductor-perf-test-nightly.yml.
inductor_torchao_perf_stats shares the same schema, but comes from torchao.yml. As the name implies, it's built for torchao.
oss_ci_benchmark_v2 is the generic benchmark table. It will be deprecated soon and replaced by oss_ci_benchmark_v3.
torchbench_userbenchmark keeps the results of the TorchBench user benchmarks, which are run by workflows like userbenchmark-a100.yml.

The misc database

aggregated_test_metrics - to be deleted
aggregated_test_metrics_with_preproc - to be deleted
external_contribution_stats - powers the weekly external PR count on the KPIs page of HUD
metrics_ci_wait_time - to be deleted
ossci_uploaded_metrics - populated by here
queue_times_24h_stats - populated by pytorch-gha-infra lambda
rate_limit - used in future PR (maybe)
runner_cost - powers cost_analysis page, populated by pytorch-gha-infra lambda
stable_pushes - powers historical strict lag on KPIs page
test_file_to_oncall_mapping - to be deleted
workflow_ids_from_test_aggregates - to be deleted

The fortesting database

This is a special playground database in which developers have write access from the console by default. It can be used for testing database schemas and syntax, as well as INSERT queries.
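
A schema experiment of the kind described above could look like the following sketch, which only generates the statements to paste into the console. The table name `scratch_schema_check`, its columns, and the choice of the `MergeTree` engine are all hypothetical. Only the `fortesting` database name comes from this page.

```python
# Hypothetical scratch table; only the `fortesting` database name is from this page.
SCRATCH_TABLE = "fortesting.scratch_schema_check"


def scratch_statements(table: str = SCRATCH_TABLE) -> list[str]:
    """Return statements that exercise a schema end to end, in execution order."""
    return [
        # 1. Create the table with the schema under test (columns are illustrative).
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(id UInt64, name String, created_at DateTime) "
        "ENGINE = MergeTree ORDER BY id",
        # 2. Check that an insert with the intended column list is accepted.
        f"INSERT INTO {table} (id, name, created_at) VALUES (1, 'example', now())",
        # 3. Read the data back to confirm the row landed.
        f"SELECT count() FROM {table}",
        # 4. Clean up so the playground does not accumulate stale tables.
        f"DROP TABLE IF EXISTS {table}",
    ]


if __name__ == "__main__":
    for statement in scratch_statements():
        print(statement + ";")
```

Ending with the `DROP TABLE` step keeps the shared playground tidy for other developers.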
