Can it replace the H2 database engine? #104
The functionality is very powerful. I have a question: if a CSV file is 1 TB in size, what would the query efficiency be? Does csvq read all 1 TB of content into memory before subsequent processing? Can it replace the H2 database engine?
No, csvq reads all the data into memory at runtime, so trying to handle a 1 TB file is nearly impossible.

The file size you can handle depends on the query you want to execute and on your system. csvq is not a DBMS, but an SQL interpreter that executes queries against text files such as CSV.
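For readers unfamiliar with csvq, a minimal sketch of such a query might look like the following; the file name `user.csv`, its columns `id` and `name`, and the filter are assumptions made only for illustration, not taken from this thread.

```sql
-- Run from the shell as, for example:  csvq 'SELECT id, name FROM `user.csv` WHERE id > 100'
-- csvq treats the CSV file itself as the table; no database is created.
SELECT id, name
FROM `user.csv`      -- user.csv and its columns are assumed for this example
WHERE id > 100;
```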
I want to use this SQL interpreter to process large text files, but I am not sure how efficient it is with large files. If it is very efficient, it could do many of the things that in-memory databases do.
Have you tried DuckDB?
I am not looking for a database, but for a solution that can process big data in memory. I feel that csvq is very powerful.
I agree that csvq is a very capable tool for querying CSV files using SQL, and I still use it where it makes sense. DuckDB can do many of the same things and can process as much data as will fit in memory without ever creating a single table. Like csvq, it can read this data directly from CSV files, but also from Parquet files, and it can write results to such files. You need not create a DuckDB database to query very large CSV files in memory!
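As a rough sketch of what that looks like in practice (the file names, column names, and filter below are assumptions for the example, not part of this thread), a DuckDB session can query a CSV file directly and write the result to Parquet without creating any table:

```sql
-- Query a CSV file directly; DuckDB infers the schema from the file.
SELECT category, count(*) AS n
FROM read_csv_auto('large_input.csv')
GROUP BY category;

-- Write a query result straight to a Parquet file, still without creating a table.
COPY (
    SELECT *
    FROM read_csv_auto('large_input.csv')
    WHERE amount > 0          -- 'amount' is an assumed column
) TO 'result.parquet' (FORMAT PARQUET);
```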
What type of database is DuckDB? I read the documentation and it seems similar to SQLite. What are the main scenarios in which DuckDB is used?
DuckDB is similar to SQLite in that it is an embedded database, but unlike SQLite, DuckDB is a column-oriented database: it stores data column-wise rather than row-wise. Column stores like DuckDB are designed for analysis of large data sets, while SQLite is designed more for transaction processing. Neither has a server component, so both are designed for local data processing on a personal computer. Column-oriented databases use various column-compression methods to store data more efficiently, scan and retrieve only the columns that the query selects (good for querying wide tables), and generally execute aggregate queries on columns more quickly.
DuckDB streams the input and the results of most query operations, so many DuckDB queries can process very large data sets, even 1 TB, on computers that have far less memory.
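A sketch of how that can be combined with an explicit memory cap and a spill directory is shown below; the limit, path, file name, and columns are illustrative values, not recommendations from this thread.

```sql
-- Cap DuckDB's memory usage and give it a directory to spill to,
-- so aggregations over files larger than RAM can still complete.
SET memory_limit = '4GB';
SET temp_directory = '/tmp/duckdb_spill';

SELECT category, sum(amount) AS total   -- columns are assumed for the example
FROM read_csv_auto('huge_input.csv')
GROUP BY category
ORDER BY total DESC;
```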
@hw2499, @derekmahar Maybe the end of this discussion should be moved to the DuckDB Discussion Forum? IMHO, the …
@kpym, I agree. I actually didn't realise that this discussion was about a csvq issue; I thought it was a csvq discussion topic. In any case, in the DuckDB discussion forums on GitHub or on the DuckDB Discord server, the DuckDB developers and other users could better answer @hw2499's questions.

Okay, thank you. @kpym @derekmahar