csvs-to-sqlite 2.0: dropping Pandas in favour of sqlite-utils #69

simonw · 2020-07-09T19:00:12Z

My sqlite-utils library has evolved to the point where I think it would make a good foundation for the next version of csvs-to-sqlite.

The main feature I'm excited about here is being able to handle giant CSV files - right now they have to be loaded into memory by Pandas, but sqlite-utils has similar functionality which handles them as streams, reducing the amount of memory needed to consume a huge file.
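To illustrate the difference between loading a whole file and streaming it, here is a minimal sketch using only the Python standard library (`csv`, `itertools`, `sqlite3`). It is not how sqlite-utils or csvs-to-sqlite is implemented; the function name `stream_csv_to_sqlite` and the `batch_size` parameter are made up for this example, and it assumes the first row holds the column names and that every row has the same number of fields.

```python
import csv
import itertools
import sqlite3


def stream_csv_to_sqlite(csv_path, db_path, table, batch_size=1000):
    """Insert rows from csv_path into table, holding one batch in memory at a time.

    Hypothetical helper for illustration only; returns the number of rows inserted.
    """
    conn = sqlite3.connect(db_path)
    total = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        headers = next(reader)  # assumption: first row is the header
        columns = ", ".join('"{}"'.format(h.replace('"', '""')) for h in headers)
        placeholders = ", ".join("?" for _ in headers)
        quoted_table = '"{}"'.format(table.replace('"', '""'))
        # Columns are created without declared types; everything is stored as text.
        conn.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(quoted_table, columns))
        insert_sql = "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders)
        while True:
            # islice pulls at most batch_size rows, so memory use stays bounded
            batch = list(itertools.islice(reader, batch_size))
            if not batch:
                break
            conn.executemany(insert_sql, batch)
            total += len(batch)
    conn.commit()
    conn.close()
    return total
```

Because only `batch_size` rows are held at once, peak memory depends on the batch size rather than on the size of the CSV file.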

I intend to keep as much of the CLI API as possible the same for the new version, but this is a big change so it's likely some cases will break. As such, I intend to keep the 1.x branch around (and maintained with bug fixes) for users who find that 2.0 doesn't work for them.

I'll pin this issue for a few weeks so people can comment on this plan before I start executing.


simonw · 2020-11-03T23:21:36Z

I need to be able to detect likely column types for this - see simonw/sqlite-utils#179
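As a rough idea of what detecting likely column types could involve, here is a small sketch in plain Python. It is not the approach taken in simonw/sqlite-utils#179; the function name `detect_type` and the rule set (INTEGER if every non-empty value parses as an int, REAL if every one parses as a float, TEXT otherwise) are assumptions made for this example.

```python
def detect_type(values):
    """Guess a SQLite column type from an iterable of strings.

    Hypothetical helper: empty strings are treated as missing values and
    ignored. Relies on Python's int()/float() parsing, so strings such as
    "1_000" or "nan" count as numeric, which may not be what you want.
    """
    could_be_int = True
    could_be_float = True
    seen_value = False
    for value in values:
        if value == "":
            continue
        seen_value = True
        if could_be_int:
            try:
                int(value)
            except ValueError:
                could_be_int = False
        if could_be_float:
            try:
                float(value)
            except ValueError:
                could_be_float = False
        if not could_be_int and not could_be_float:
            break  # nothing numeric is possible any more
    if not seen_value:
        return "TEXT"
    if could_be_int:
        return "INTEGER"
    if could_be_float:
        return "REAL"
    return "TEXT"
```

The function makes a single pass over its input, so it can be fed from a stream of rows without keeping the whole column in memory.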

victornoel · 2021-01-17T11:20:48Z

@simonw it will also need to be possible to specify the separator: I couldn't find a way to do so with sqlite-utils insert.

plpxsk · 2021-02-03T19:31:44Z

Loading large files quicker will be a welcome boost!

I have use cases where I need to load large RNA sequencing data, of ~20-30 GB, into sqlite tables, and using pandas is a bottleneck. I'm currently exploring sqlite-utils insert but would welcome the approachability of csvs-to-sqlite (e.g., super simple to dump a whole directory of files into a sqlite db)

EDIT: clarify what I mean

deverman · 2021-02-10T13:35:38Z

Should I just use sqlite-utils and not this library? Especially when just beginning?

plpxsk · 2021-02-10T15:45:50Z

> Should I just use sqlite-utils and not this library? Especially when just beginning?

csvs-to-sqlite still seems great for dumping a large number of (not too huge) CSVs into a single db (sqlite) file.

For example, if you have a directory or subdirectories of CSVs that you want to bundle together:

csvs-to-sqlite ~/Downloads/*.csv my-downloads.db
csvs-to-sqlite ~/path/to/directory all-my-csvs.db
etc

AFAIK, sqlite-utils would need you to write additional code to handle more than 1 CSV (or TSV/JSON) file at a time.
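For a sense of what that additional code might look like, here is a minimal sketch that walks a directory tree and loads each CSV into its own table. It uses only the Python standard library rather than the sqlite-utils API, and the function name `load_directory` is made up for this example. It assumes each file has a header row and consistent row lengths, and it names each table after the file stem, so two files with the same name in different subdirectories would end up in the same table.

```python
import csv
import sqlite3
from pathlib import Path


def load_directory(directory, db_path):
    """Load every *.csv under directory into db_path, one table per file.

    Hypothetical helper for illustration; returns the list of table names loaded.
    """
    conn = sqlite3.connect(db_path)
    loaded = []
    for csv_file in sorted(Path(directory).rglob("*.csv")):
        with open(csv_file, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            headers = next(reader, None)
            if headers is None:
                continue  # skip completely empty files
            table = '"{}"'.format(csv_file.stem.replace('"', '""'))
            columns = ", ".join('"{}"'.format(h.replace('"', '""')) for h in headers)
            placeholders = ", ".join("?" for _ in headers)
            conn.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(table, columns))
            # executemany consumes the reader lazily, row by row
            conn.executemany(
                "INSERT INTO {} VALUES ({})".format(table, placeholders), reader
            )
        loaded.append(csv_file.stem)
    conn.commit()
    conn.close()
    return loaded
```

Calling `load_directory("path/to/directory", "all-my-csvs.db")` would roughly mirror the second csvs-to-sqlite command above, without any of its type detection or other options.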

simonw · 2021-02-11T03:15:59Z

> Should I just use sqlite-utils and not this library? Especially when just beginning?

If your needs are simple - just loading a single CSV file - then yes, I'd recommend sqlite-utils instead - it has better performance as it works by streaming files rather than loading them all into memory.

csvs-to-sqlite is still better if you want to transform a folder full of files.

simonw added the enhancement label Jul 9, 2020

simonw pinned this issue Jul 9, 2020

simonw mentioned this issue Jul 9, 2020

Progress bars would be useful #66

Open

simonw added this to the 2.0 milestone Aug 9, 2020

ryancheley mentioned this issue Apr 2, 2022

Ignore lines are start and end of file #72

Open

BryantD mentioned this issue Dec 20, 2023

error_bad_lines argument has been deprecated #88

Open
