Skip to content

Latest commit

 

History

History
188 lines (145 loc) · 7.65 KB

partitioned_in_memory_table.md

File metadata and controls

188 lines (145 loc) · 7.65 KB

How to work with partitioned in-memory tables

In DolphinDB, all partition schemes for databases also apply to in-memory tables except composite domain. Working with partitioned in-memory tables can utilize parallel computing capacities of multi-core CPUs.

1. Load data as in-memory partitioned tables

DolphinDB offers various ways to load a data set as a partitioned in-memory table. We start with creating a text file for the examples below.

n=30000000
workDir = "C:/DolphinDB/Data"
if(!exists(workDir)) mkdir(workDir)
trades=table(rand(`IBM`MSFT`GM`C`FB`GOOG`V`F`XOM`AMZN`TSLA`PG`S,n) as sym, 2000.01.01+rand(365,n) as date, 10.0+rand(2.0,n) as price1, 100.0+rand(20.0,n) as price2, 1000.0+rand(200.0,n) as price3, 10000.0+rand(2000.0,n) as price4, 10000.0+rand(3000.0,n) as price5, 10000.0+rand(4000.0,n) as price6, rand(10,n) as qty1, rand(100,n) as qty2, rand(1000,n) as qty3, rand(10000,n) as qty4, rand(10000,n) as qty5, rand(10000,n) as qty6)
trades.saveText(workDir + "/trades.txt")

1.1 Use function ploadText to generate an in-memory table with sequential partition

This approach is the most simple way to create a partitioned in-memory table. However, it does not have certain benefits or flexibilities of other approaches. For examples, the input data file must be smaller than available memory; the partitioned in-memory table generated by this approach cannot use function sortBy! to perform meaningful within-partition sorting.

trades = ploadText(workDir + "/trades.txt");

1.2 Use function loadTextEx to generate an in-memory table with specified partition scheme

This approach is for the following use cases:

  • Frequently conduct within-partition sorting.
  • Frequently use the partitioning column as the "group by" or "context by" column.

The input data file must be smaller than available memory.

We need to use empty string ("") for parameter "tableName" of function loadTextEx and parameter "directory" of function database.

db = database("", VALUE, `IBM`MSFT`GM`C`FB`GOOG`V`F`XOM`AMZN`TSLA`PG`S)
trades = db.loadTextEx("", `sym, workDir + "/trades.txt");

Use function sortBy! for within-partition sorting.

trades.sortBy!(`qty1);

trades.sortBy!(`date`qty1, false true);

trades.sortBy!(<qty1 * price1>, false);

When the partitioning column is the grouping column, the grouping calculation has the optimal performance. The following table compares the performance of the following query on 3 tables: an unpatitioned table generated by trades=loadText(workDir+"/trades.txt"), the partitioned table with sequential partition in 1.1, and the partitioned table with value partition on "sym" in this section.

timer(10) select std(qty1) from trades group by sym;

Here "timer(10)" means the total time consumed when the query is executed 10 times.

Table Generated by Time Consumed
unpartitioned table loadText 3.69 second
partitioned table with sequential domain ploadText 2.51 second
partitioned table with value domain loadTextEx 0.17 second

1.3 Use function loadTable to import selected or all partitions of a table on disk into memory

This approach is for the following use cases:

  • To import a text file larger than available memory and only a subset of the data are needed each time.
  • The same data set is used repeatedly. It is much faster to load a database table than to import a text file.

Use function loadTextEx to create a table in a partitioned database on disk: (can also use function createPartitionedTable and append!)

db = database(workDir+"/tradeDB", RANGE, ["A","G","M","S","ZZZZ"])
db.loadTextEx(`trades, `sym, workDir + "/trades.txt");

To load only 2 partitions (["A","G") and ["M","S")) into memory:

db = database(workDir+"/tradeDB")
trades=loadTable(db, `trades, ["A", "M"], 1);

Please note that we need to set the optional parameter "memoryMode" of function loadTable to 1. Otherwise only the metadata of the table are loaded.

1.4 Use function loadTableBySQL to import selected columns and/or rows of a partitioned table on disk

This is the most flexible way to create a partitioned in-memory table. It needs to be used after function loadTable.

db = database(workDir+"/tradeDB")
trades=loadTable(db, `trades);

sample=loadTableBySQL(<select * from trades where date between 2000.03.01 : 2000.05.01>);

sample=loadTableBySQL(<select sym, date, price1, qty1 from trades where date between 2000.03.01 : 2000.05.01>);

dates = 2000.01.16 2000.02.14 2000.08.01
st = sql(<select sym, date, price1, qty1>, trades, expr(<date>, in, dates))
sample = loadTableBySQL(st);

colNames =`sym`date`qty2`price2
st= sql(sqlCol(colNames), trades)
sample = loadTableBySQL(st);

1.5 Syntax of related functions

  • ploadText(fileName, [delimiter], [schema])
  • loadTextEx(dbHandle, tableName, partitionColumns, filename, [delimiter=','], [schema])
  • database(directory, [partitionType], [partitionScheme], [locations])
  • loadTable(database, tableName, [partitions], [memoryMode=false])
  • loadTableBySQL(meta code representing a SQL query)
  • sortBy!(table, sortColumns, [sortDirections])
  • update!(table, colNames, newValues, [filter])
  • drop!(table, colNames)
  • erase!(table, filter)
  • rename!(table, ColName, newColName)

2. Data manipulation with in-memory partitioned tables

2.1 Add new columns

trades = ploadText(workDir + "/trades.txt");
  1. SQL update statement
update trades set logPrice1= log(price1), newQty1= double(qty1);
  1. function update!
trades.update!(`logPrice1`newQty1, <[log(price1), double(qty1)]>);
  1. assignment statement
trades[`logPrice1`newQty1] = <[log(price1), double(qty1)]>;

2.2 Update existing columns

  1. SQL update statement
update trades set qty1=qty1+10;

update trades set qty1=qty1+10 where sym = `IBM;
  1. function update!
trades.update!(`qty1, <qty1+10>);

trades.update!(`qty1, <qty1+10>, <sym=`IBM>);
  1. assginment statement
trades[`qty1] = <qty1+10>;

trades[`qty1, <sym=`IBM>] = <qty1+10>;

2.3 Delete rows

  1. SQL delete statement
delete from trades where qty3<20;
  1. function erase!
trades.erase!(< qty3<30 >);

2.4 Drop columns

trades.drop!("qty1");

2.5 Rename columns

trades.rename!("qty2", "qty2New");