Improve downloading files logic to check if a file already exists #405

Open
ek-nyc opened this issue Nov 12, 2024 · 1 comment

ek-nyc commented Nov 12, 2024

When I rerun my tests, a long time is spent re-downloading the 3 files that have already been downloaded. The download steps should be skipped if the files already exist.

@alwayslove2013 (Collaborator)

@ek-nyc could you please provide more detailed information?

VectorDBBench already has logic to skip file downloads: if a local file with the same name has the same size as the remote file, the download is skipped.

def validate_file(self, remote: pathlib.Path, local: pathlib.Path) -> bool:
    info = self.bucket.get_object_meta(remote.as_posix())
    # check size equal
    remote_size, local_size = info.content_length, os.path.getsize(local)
    if remote_size != local_size:
        log.info(f"local file: {local} size[{local_size}] not match with remote size[{remote_size}]")
        return False
    return True

Additionally, please note that the default download location is the /tmp folder, which is typically cleared upon system reboot.
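
For illustration, the size-based skip described above could gate a download roughly as follows. This is a minimal standalone sketch, not the actual VectorDBBench code: `needs_download`, `fetch_if_needed`, and the `download` callback are hypothetical names, and the remote size is assumed to have been fetched already (for example from the object metadata).

```python
import os
import pathlib
from typing import Callable


def needs_download(local: pathlib.Path, remote_size: int) -> bool:
    """Return True when the local file is missing or its size differs from the remote size."""
    if not local.exists():
        return True
    return os.path.getsize(local) != remote_size


def fetch_if_needed(
    local: pathlib.Path,
    remote_size: int,
    download: Callable[[pathlib.Path], None],  # hypothetical callback that writes the file to `local`
) -> bool:
    """Call `download` only when the local copy is missing or stale. Returns True if it ran."""
    if not needs_download(local, remote_size):
        return False
    # Make sure the target directory exists before downloading into it.
    local.parent.mkdir(parents=True, exist_ok=True)
    download(local)
    return True
```

Note that a size comparison alone cannot detect a file whose content changed while its size stayed the same; a checksum comparison would be stricter, at the cost of reading the whole local file.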
