Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support metadata table "Metadata Log Entries" #846

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

rshkv
Copy link

@rshkv rshkv commented Dec 25, 2024

Re #823. This adds support for the Metadata Log Entries metadata table.

This is building on @xxchan's unmerged #822. I'll update and rebase this PR when #822 merges.

Metadata Log Entries is the metadata log with the latest snapshot per metadata file.

Reference implementations:

@rshkv rshkv force-pushed the wr/metadata-log-entries branch from bdb6372 to 067ada8 Compare December 25, 2024 13:24
Comment on lines 155 to 167
fn snapshot_id_as_of_time(
table_metadata: &TableMetadataRef,
timestamp_ms_inclusive: i64,
) -> Option<&SnapshotRef> {
let mut snapshot_id = None;
// The table metadata snapshot log is chronological
for log_entry in table_metadata.history() {
if log_entry.timestamp_ms <= timestamp_ms_inclusive {
snapshot_id = Some(log_entry.snapshot_id);
}
}
snapshot_id.and_then(|id| table_metadata.snapshot_by_id(id))
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linking Java and Python implementations to compute the latest snapshot given a timestamp.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that history() is chronologically ordered because that gets asserted TableMetadata#try_normalize.

@rshkv rshkv force-pushed the wr/metadata-log-entries branch from 067ada8 to 8d76183 Compare December 25, 2024 13:43
Comment on lines 211 to 323
// Include the current metadata locaction and modification time in the table. This matches
// the Java implementation. Unlike the Java implementation, a current metadata location is
// optional here. In that case, we omit current metadata from the metadata log table.
if let Some(current_metadata_location) = &scan.metadata_location {
append_metadata_log_entry(
scan.metadata_ref.last_updated_ms(),
current_metadata_location,
);
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linking Java and Python implementations. As commented, the difference is that the metadata location is optional in Table.

Alternatively, we could expect here.

Comment on lines -1022 to +1034
.metadata_location(table_metadata1_location.as_os_str().to_str().unwrap())
.metadata_location(template_json_location)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change, the location in the metadata log and the current location would both be ../metadata/v1.json.

I wanted to have a distinction so we can assert that metadata_log_entries includes the current metadata location, even if not in the metadata log.

@Xuanwo
Copy link
Member

Xuanwo commented Dec 30, 2024

#822 has been merged, let's move on!

@rshkv rshkv force-pushed the wr/metadata-log-entries branch from 8d76183 to 395ff1f Compare December 30, 2024 16:59
@rshkv rshkv marked this pull request as ready for review December 30, 2024 17:04
@rshkv
Copy link
Author

rshkv commented Dec 30, 2024

Thank you, @Xuanwo. This is rebased and ready for review.

@rshkv rshkv force-pushed the wr/metadata-log-entries branch from 76c2bf2 to 439c529 Compare January 2, 2025 12:47
Comment on lines 454 to 543
-- validity:
-- validity:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was failing for me locally.

pub mod metadata_scan;
pub mod metadata_table;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up to this comment: #822 (comment)

Don't need to do in this PR. Up to you.

pub fn metadata_table(self) -> MetadataTable {
pub fn metadata_table(&self) -> MetadataTable<'_> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up to this comment: #822 (comment)

Also not necessary to do in this PR.

@Xuanwo
Copy link
Member

Xuanwo commented Jan 2, 2025

Let's get #863 merge first.

@liurenjie1024
Copy link
Contributor

Let's get #863 merge first.

+1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants