How to work with partitioned in-memory tables

In DolphinDB, all partition schemes for databases also apply to in-memory tables except composite domain. Working with partitioned in-memory tables can utilize parallel computing capacities of multi-core CPUs.

How to work with partitioned in-memory tables
- 1. Load data as in-memory partitioned tables
- 2. Data manipulation with in-memory partitioned tables

1. Load data as in-memory partitioned tables

DolphinDB offers various ways to load a data set as a partitioned in-memory table. We start with creating a text file for the examples below.

n=30000000
workDir = "C:/DolphinDB/Data"
if(!exists(workDir)) mkdir(workDir)
trades=table(rand(`IBM`MSFT`GM`C`FB`GOOG`V`F`XOM`AMZN`TSLA`PG`S,n) as sym, 2000.01.01+rand(365,n) as date, 10.0+rand(2.0,n) as price1, 100.0+rand(20.0,n) as price2, 1000.0+rand(200.0,n) as price3, 10000.0+rand(2000.0,n) as price4, 10000.0+rand(3000.0,n) as price5, 10000.0+rand(4000.0,n) as price6, rand(10,n) as qty1, rand(100,n) as qty2, rand(1000,n) as qty3, rand(10000,n) as qty4, rand(10000,n) as qty5, rand(10000,n) as qty6)
trades.saveText(workDir + "/trades.txt")

1.1 Use function `ploadText` to generate an in-memory table with sequential partition

This approach is the most simple way to create a partitioned in-memory table. However, it does not have certain benefits or flexibilities of other approaches. For examples, the input data file must be smaller than available memory; the partitioned in-memory table generated by this approach cannot use function sortBy! to perform meaningful within-partition sorting.

trades = ploadText(workDir + "/trades.txt");

1.2 Use function `loadTextEx` to generate an in-memory table with specified partition scheme

This approach is for the following use cases:

Frequently conduct within-partition sorting.
Frequently use the partitioning column as the "group by" or "context by" column.

The input data file must be smaller than available memory.

We need to use empty string ("") for parameter "tableName" of function loadTextEx and parameter "directory" of function database.

db = database("", VALUE, `IBM`MSFT`GM`C`FB`GOOG`V`F`XOM`AMZN`TSLA`PG`S)
trades = db.loadTextEx("", `sym, workDir + "/trades.txt");

Use function sortBy! for within-partition sorting.

trades.sortBy!(`qty1);

trades.sortBy!(`date`qty1, false true);

trades.sortBy!(<qty1 * price1>, false);

When the partitioning column is the grouping column, the grouping calculation has the optimal performance. The following table compares the performance of the following query on 3 tables: an unpatitioned table generated by trades=loadText(workDir+"/trades.txt"), the partitioned table with sequential partition in 1.1, and the partitioned table with value partition on "sym" in this section.

timer(10) select std(qty1) from trades group by sym;

Here "timer(10)" means the total time consumed when the query is executed 10 times.

Table	Generated by	Time Consumed
unpartitioned table	`loadText`	3.69 second
partitioned table with sequential domain	`ploadText`	2.51 second
partitioned table with value domain	`loadTextEx`	0.17 second

1.3 Use function `loadTable` to import selected or all partitions of a table on disk into memory

This approach is for the following use cases:

To import a text file larger than available memory and only a subset of the data are needed each time.
The same data set is used repeatedly. It is much faster to load a database table than to import a text file.

Use function loadTextEx to create a table in a partitioned database on disk: (can also use function createPartitionedTable and append!)

db = database(workDir+"/tradeDB", RANGE, ["A","G","M","S","ZZZZ"])
db.loadTextEx(`trades, `sym, workDir + "/trades.txt");

To load only 2 partitions (["A","G") and ["M","S")) into memory:

db = database(workDir+"/tradeDB")
trades=loadTable(db, `trades, ["A", "M"], 1);

Please note that we need to set the optional parameter "memoryMode" of function loadTable to 1. Otherwise only the metadata of the table are loaded.

1.4 Use function `loadTableBySQL` to import selected columns and/or rows of a partitioned table on disk

This is the most flexible way to create a partitioned in-memory table. It needs to be used after function loadTable.

db = database(workDir+"/tradeDB")
trades=loadTable(db, `trades);

sample=loadTableBySQL(<select * from trades where date between 2000.03.01 : 2000.05.01>);

sample=loadTableBySQL(<select sym, date, price1, qty1 from trades where date between 2000.03.01 : 2000.05.01>);

dates = 2000.01.16 2000.02.14 2000.08.01
st = sql(<select sym, date, price1, qty1>, trades, expr(<date>, in, dates))
sample = loadTableBySQL(st);

colNames =`sym`date`qty2`price2
st= sql(sqlCol(colNames), trades)
sample = loadTableBySQL(st);

1.5 Syntax of related functions

ploadText(fileName, [delimiter], [schema])
loadTextEx(dbHandle, tableName, partitionColumns, filename, [delimiter=','], [schema])
database(directory, [partitionType], [partitionScheme], [locations])
loadTable(database, tableName, [partitions], [memoryMode=false])
loadTableBySQL(meta code representing a SQL query)
sortBy!(table, sortColumns, [sortDirections])
update!(table, colNames, newValues, [filter])
drop!(table, colNames)
erase!(table, filter)
rename!(table, ColName, newColName)

2. Data manipulation with in-memory partitioned tables

2.1 Add new columns

trades = ploadText(workDir + "/trades.txt");

SQL update statement

update trades set logPrice1= log(price1), newQty1= double(qty1);

function update!

trades.update!(`logPrice1`newQty1, <[log(price1), double(qty1)]>);

assignment statement

trades[`logPrice1`newQty1] = <[log(price1), double(qty1)]>;

2.2 Update existing columns

SQL update statement

update trades set qty1=qty1+10;

update trades set qty1=qty1+10 where sym = `IBM;

function update!

trades.update!(`qty1, <qty1+10>);

trades.update!(`qty1, <qty1+10>, <sym=`IBM>);

assginment statement

trades[`qty1] = <qty1+10>;

trades[`qty1, <sym=`IBM>] = <qty1+10>;

2.3 Delete rows

SQL delete statement

delete from trades where qty3<20;

function erase!

trades.erase!(< qty3<30 >);

2.4 Drop columns

trades.drop!("qty1");

2.5 Rename columns

trades.rename!("qty2", "qty2New");

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

partitioned_in_memory_table.md

partitioned_in_memory_table.md

How to work with partitioned in-memory tables

1. Load data as in-memory partitioned tables

1.1 Use function `ploadText` to generate an in-memory table with sequential partition

1.2 Use function `loadTextEx` to generate an in-memory table with specified partition scheme

1.3 Use function `loadTable` to import selected or all partitions of a table on disk into memory

1.4 Use function `loadTableBySQL` to import selected columns and/or rows of a partitioned table on disk

1.5 Syntax of related functions

2. Data manipulation with in-memory partitioned tables

2.1 Add new columns

2.2 Update existing columns

2.3 Delete rows

2.4 Drop columns

2.5 Rename columns

Files

partitioned_in_memory_table.md

Latest commit

History

partitioned_in_memory_table.md

File metadata and controls

How to work with partitioned in-memory tables

1. Load data as in-memory partitioned tables

1.1 Use function ploadText to generate an in-memory table with sequential partition

1.2 Use function loadTextEx to generate an in-memory table with specified partition scheme

1.3 Use function loadTable to import selected or all partitions of a table on disk into memory

1.4 Use function loadTableBySQL to import selected columns and/or rows of a partitioned table on disk

1.5 Syntax of related functions

2. Data manipulation with in-memory partitioned tables

2.1 Add new columns

2.2 Update existing columns

2.3 Delete rows

2.4 Drop columns

2.5 Rename columns

1.1 Use function `ploadText` to generate an in-memory table with sequential partition

1.2 Use function `loadTextEx` to generate an in-memory table with specified partition scheme

1.3 Use function `loadTable` to import selected or all partitions of a table on disk into memory

1.4 Use function `loadTableBySQL` to import selected columns and/or rows of a partitioned table on disk