Row count checks on final parquet, _metadata and ancillary files #373

Open
nevencaplar opened this issue Aug 9, 2024 · 0 comments · May be fixed by #428
Labels
enhancement New feature or request

Comments

@nevencaplar
Member

This is one of the verification pipeline tickets and is connected with #344.

Implement row count checks on:

  1. Final Parquet Files
    ● Get row counts from file footers.
    ● Compare total with truth.
    ● Compare per partition with intermediate files.
  2. _metadata File
    ● Get row counts from _metadata file.
    ● Compare total with truth.
    ● Compare per partition with intermediate files.
  3. Ancillary Files
    ● Check the row counts reported in all ancillary files.
    ● Total row count in the README.
    ● Counts per file/partition in the CSV files.
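
A rough sketch of how the three checks above could be wired together, not a proposed implementation. The two Parquet readers use `pyarrow.parquet.read_metadata`, imported lazily so the comparison logic stays usable without pyarrow. All function names, the CSV column names (`partition`, `num_rows`), and the shape of the "truth" input (a dict of expected counts plus an expected total) are assumptions made for illustration; the real ancillary file layout may differ.

```python
import csv
import io
from pathlib import Path


def footer_row_counts(parquet_dir):
    """Map each parquet file (path relative to parquet_dir) to its footer row count."""
    import pyarrow.parquet as pq  # lazy import: only the readers need pyarrow

    root = Path(parquet_dir)
    return {
        str(path.relative_to(root)): pq.read_metadata(str(path)).num_rows
        for path in sorted(root.rglob("*.parquet"))
    }


def metadata_row_counts(metadata_path):
    """Map each file path recorded in a _metadata file to its summed row-group counts."""
    import pyarrow.parquet as pq

    meta = pq.read_metadata(str(metadata_path))
    counts = {}
    for i in range(meta.num_row_groups):
        group = meta.row_group(i)
        # Each row group in _metadata records which data file it belongs to.
        file_path = group.column(0).file_path
        counts[file_path] = counts.get(file_path, 0) + group.num_rows
    return counts


def csv_row_counts(csv_text, key_column="partition", count_column="num_rows"):
    """Read per-partition counts from CSV text (column names are hypothetical)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: int(row[count_column]) for row in reader}


def compare_counts(observed, expected, expected_total=None):
    """Return a list of human-readable mismatches; an empty list means the check passed."""
    problems = []
    for key in sorted(set(observed) | set(expected)):
        if key not in observed:
            problems.append(f"{key}: missing from observed counts")
        elif key not in expected:
            problems.append(f"{key}: not present in expected counts")
        elif observed[key] != expected[key]:
            problems.append(f"{key}: observed {observed[key]}, expected {expected[key]}")
    if expected_total is not None:
        total = sum(observed.values())
        if total != expected_total:
            problems.append(f"total: observed {total}, expected {expected_total}")
    return problems
```

With this split, the same `compare_counts` call serves all three items: pass it the footer counts, the `_metadata` counts, or the CSV counts as `observed`, and the intermediate-file counts and truth total as `expected` and `expected_total`.
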
@nevencaplar nevencaplar added the enhancement New feature or request label Aug 9, 2024
@nevencaplar nevencaplar moved this to Todo in HATS / LSDB Aug 9, 2024
@troyraen troyraen mentioned this issue Aug 14, 2024
14 tasks
@nevencaplar nevencaplar removed the status in HATS / LSDB Oct 11, 2024
@troyraen troyraen linked a pull request Nov 5, 2024 that will close this issue
11 tasks