csvs-to-sqlite 2.0: dropping Pandas in favour of sqlite-utils #69
Comments
I need to be able to detect likely column types for this - see simonw/sqlite-utils#179 |
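As a rough illustration of what detecting likely column types could involve, here is a minimal sketch using only the Python standard library. It is not the approach from simonw/sqlite-utils#179; the function name `detect_column_types` and the `INTEGER`/`REAL`/`TEXT` labels are assumptions made for this example. Each column starts as a candidate for every numeric type, and a type is ruled out as soon as one non-empty value fails to parse.

```python
def _parses(converter, value):
    """Return True if converter(value) succeeds (e.g. int("12"))."""
    try:
        converter(value)
        return True
    except ValueError:
        return False


def detect_column_types(rows):
    """Guess a type per column from an iterable of dicts of strings.

    Hypothetical helper: a column is INTEGER if every non-empty value
    parses as int, REAL if every non-empty value parses as float, and
    TEXT otherwise (including columns with no non-empty values at all).
    """
    candidates = {}  # column -> numeric types not yet ruled out
    has_value = set()  # columns with at least one non-empty value
    for row in rows:
        for column, value in row.items():
            possible = candidates.setdefault(column, {"INTEGER", "REAL"})
            if value is None or value == "":
                continue  # blanks carry no type information
            has_value.add(column)
            if not _parses(int, value):
                possible.discard("INTEGER")
            if not _parses(float, value):
                possible.discard("REAL")
    types = {}
    for column, possible in candidates.items():
        if column not in has_value:
            types[column] = "TEXT"
        elif "INTEGER" in possible:
            types[column] = "INTEGER"
        elif "REAL" in possible:
            types[column] = "REAL"
        else:
            types[column] = "TEXT"
    return types


sample_rows = [
    {"id": "1", "score": "3.5", "name": "Cleo"},
    {"id": "2", "score": "4", "name": ""},
]
detected = detect_column_types(sample_rows)
print(detected)
```

Because the rows are consumed one at a time, a check like this can run over a stream without loading the whole file. Python's `int()` and `float()` are fairly permissive (they accept surrounding whitespace, for instance), so a real implementation would probably want stricter rules.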
@simonw it will also be needed to be able to specify the separator: I couldn't find a way to specify it with |
Loading large files quicker will be a welcome boost! I have use cases where I need to load large RNA sequencing data, of ~20-30 GBs, into sqlite tables, and using pandas is a bottleneck. I'm currently exploring EDIT: clarify what I mean |
Should I just use sqlite-utils and not this library? Especially when just beginning? |
For example, if you have a directory or subdirectories of CSVs that you want to bundle together:
AFAIK, |
If your needs are simple - just loading a single CSV file - then yes, I'd recommend
|
My sqlite-utils library has evolved to the point where I think it would make a good foundation for the next version of csvs-to-sqlite.
The main feature I'm excited about here is being able to handle giant CSV files - right now they have to be loaded into memory by Pandas, but sqlite-utils has similar functionality which handles them as streams, reducing the amount of memory needed to consume a huge file.
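To make the streaming idea concrete, here is a minimal sketch that uses only the standard library `csv` and `sqlite3` modules. It does not show how sqlite-utils or csvs-to-sqlite implement this; the function name `stream_csv_into_sqlite` and the `batch_size` parameter are assumptions made for this example. Rows are read lazily and inserted in fixed-size batches, so only one batch is held in memory at a time.

```python
import csv
import io
import itertools
import sqlite3


def _quote(identifier):
    """Quote a SQLite identifier, doubling any embedded double quotes."""
    return '"{}"'.format(identifier.replace('"', '""'))


def stream_csv_into_sqlite(conn, table, lines, batch_size=1000):
    """Insert CSV rows into `table` in batches and return the row count.

    Hypothetical helper: `lines` is any iterable of text lines (an open
    file works). The first line is treated as the header row, and every
    later row is assumed to have the same number of fields.
    """
    reader = csv.reader(lines)
    headers = next(reader)
    columns = ", ".join(_quote(h) for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    # Columns are created without declared types to keep the sketch short.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS {} ({})".format(_quote(table), columns)
    )
    insert_sql = "INSERT INTO {} VALUES ({})".format(_quote(table), placeholders)
    total = 0
    while True:
        # islice pulls at most batch_size rows from the reader per pass.
        batch = list(itertools.islice(reader, batch_size))
        if not batch:
            break
        conn.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()
    return total


conn = sqlite3.connect(":memory:")
sample = io.StringIO("id,name\n1,Cleo\n2,Pancakes\n3,Marnie\n")
inserted = stream_csv_into_sqlite(conn, "pets", sample, batch_size=2)
print(inserted)
```

With a real file, `lines` would be something like `open(path, newline="")`. Peak memory is then bounded by `batch_size` rather than by the size of the file.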
I intend to keep as much of the CLI API the same for the new version, but this is a big change so it's likely some cases will break. As such, I intend to keep the 1.x branch around (and maintained with bug fixes) for users who find that 2.0 doesn't work for them.
I'll pin this issue for a few weeks so people can comment on this plan before I start executing.