diff --git a/docs/architecture/index.md b/docs/architecture/index.md index 7992e0f37..bf223221f 100644 --- a/docs/architecture/index.md +++ b/docs/architecture/index.md @@ -29,7 +29,7 @@ KX’s [Fusion interfaces](../interfaces/index.md#fusion-interfaces) connect kdb ### Tickerplant (TP) -A kdb+ processing acting as a TP (tickerplant) captures the initial data feed, writes it to the log file and [publishes](../kb/publish-subscribe.md) these messages to any registered subscribers. +A kdb+ process acting as a TP (tickerplant) captures the initial data feed, writes it to the log file and publishes these messages to any registered subscribers. Aims for zero-latency. Includes ingesting data in batch mode. @@ -37,6 +37,8 @@ Manages subscriptions: adds and removes subscribers, and sends subscriber table Handles end-of-day (EOD) processing. +[`tick.q`](tickq.md) represents a tickerplant and is provided as a starting point for most environments. + !!! tip "Best practices for tickerplants" Tickerplants should be lightweight, not capturing data and using very little memory. @@ -67,6 +69,8 @@ At startup, the RDB sends a message to the tickerplant and receives a reply cont At end of day usually writes intraday data to the Historical Database, and sends it a new EOD message. +[`r.q`](rq.md) represents a real-time database (RDB) and is provided as a starting point for most environments. + !!! tip "Best practices for real-time databases" RDBs queried intraday should exploit attributes in their tables. For example, a trade table might be marked as sorted by time (`` `s#time``) and grouped by sym (`` `g#sym``). @@ -157,26 +161,3 @@ Can connect both the real-time and historical data to allow users to query acros [Query Routing: A kdb+ framework for a scalable, load balanced system](../wp/query-routing/index.md) -## :fontawesome-solid-hand-point-right: What next? - -:fontawesome-regular-map: -[Building real-time tick subscribers](../wp/rt-tick/index.md) -<br>
-:fontawesome-regular-map: -[Data recovery for kdb+ tick](../wp/data-recovery.md) -
-:fontawesome-regular-map: -[Disaster-recovery planning for kdb+ tick systems](../wp/disaster-recovery/index.md) -
-:fontawesome-regular-map: -[Intraday writedown solutions](../wp/intraday-writedown/index.md) -
-:fontawesome-regular-map: -[Query routing: a kdb+ framework for a scalable load-balanced system](../wp/query-routing/index.md) -
-:fontawesome-regular-map: -[Order book: a kdb+ intraday storage and access methodology](../wp/order-book.md) -
-:fontawesome-regular-map: -[kdb+tick profiling for throughput optimization](../wp/tick-profiling.md) - diff --git a/docs/architecture/rq.md b/docs/architecture/rq.md new file mode 100644 index 000000000..7bd3164d0 --- /dev/null +++ b/docs/architecture/rq.md @@ -0,0 +1,126 @@ +--- +title: RDB (r.q) | Documentation for q and kdb+ +description: How to construct an RDB +keywords: kdb+, q, rdb, streaming +--- +# Real-time Database (RDB) using r.q + +`r.q` is available from :fontawesome-brands-github:[KxSystems/kdb-tick](https://github.com/KxSystems/kdb-tick) + +## Overview + +A kdb+ process acting as an RDB stores the current day’s data in-memory for client queries. +It can write its contents to disk at end-of-day, clearing out its in-memory data to prepare for the next day. +After writing data to disk, it instructs an HDB to load the written data. + +### Customization + +`r.q` provides a starting point for most environments. The source code is freely available and can be tailored to individual needs. For example: + +#### Memory use + +The default RDB stores all of a day’s data in memory before writing it to disk at end-of-day. The host machine should be configured so that all required resources +can handle the demands that may be made of them (both now and in the future). +If there are periods of low or no activity, [garbage collection](../ref/dotq.md#gc-garbage-collect) could be deployed after clearing tables at end-of-day, or a system for intraday writedowns. + +#### User queries + +A gateway process should control user queries and authorization/authentication, using RDBs/RTEs/HDBs to retrieve the required information. +If known/common queries can be designed, the RDB can load additional scripts to pre-define functions a gateway can call. + +### End-of-day + +The end-of-day event is governed by the tickerplant process. The tickerplant calls the RDB [`.u.end`](#uend) function when this event occurs.
+The main end-of-day event for an RDB is to save today’s data from memory to disk, clear its tables and make the HDB aware of a new day’s dataset for it to access. + +!!! Note "[.u.rep](#urep) sets the HDB directory to be the same as the tickerplant log file directory. This can be edited to use a different directory if required" + +### Recovery + +Using IPC, the RDB process can retrieve the current tickerplant log location via the [variables](tickq.md#variables) the tickerplant maintains. +The function [`.u.rep`](#urep) is then used to populate any tables from the log. + +!!! Note "The RDB should be able to access the tickerplant log from a directory on the same machine. The RDB/tickerplant can be changed to reside on different hosts but this increases the resources needed to transmit the log file contents over the network." + +## Usage + +```bash +q tick/r.q [host1]:port1[:usr:pwd] [host2]:port2[:usr:pwd] [-p 5020] +``` + +| Parameter Name | Description | Default | +| ---- | ---- | --- | +| host1 | host running the kdb+ instance that the RDB will subscribe to e.g. tickerplant host | localhost | +| port1 | port of the kdb+ instance that the RDB will subscribe to e.g. tickerplant port | 5010 | +| host2 | host of the kdb+ instance to inform at end-of-day, after data saved to disk e.g. HDB host | localhost | +| port2 | port of the kdb+ instance to inform at end-of-day, after data saved to disk e.g. HDB port | 5012 | +| usr | username | <none> | +| pwd | password | <none> | +| -p | [listening port](../basics/cmdline.md#-p-listening-port) for client communications | <none> | + +!!! Note "Standard kdb+ [command line options](../basics/cmdline.md) may also be passed" + +## Variables + +| Name | Description | +| ---- | ---- | +| .u.x | Connection list. First element populated by [`host1`](#usage) (tickerplant), and second element populated by [`host2`](#usage) (HDB) | + +## Functions + +Functions are open source and open to customisation.
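As a sketch of such customisation (the `msgcount` dictionary here is illustrative, not part of `r.q`), the default `upd` could be wrapped to count messages received per table while keeping the standard insert behaviour:

```q
/ illustrative customisation (not part of r.q): count messages received per table
msgcount:(`$())!`long$()
upd:{[t;x] msgcount[t]:1+0^msgcount t; t insert x}
```

Any such replacement must still insert the data, otherwise log replay during recovery will not rebuild the tables.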
+ +### upd + +Called by an external process to update table data. Defaults to [`insert`](../ref/insert.md) to insert/append data to a table. + +```q +upd[x;y] +``` +Where + +* `x` is a symbol atom naming a table +* `y` is table data to add to table `x`, which can contain one or more rows. + +### .u.end + +Performs the end-of-day actions of saving tables to disk, clearing tables and running a reload on the HDB instance to make it aware of the new day of data. + +```q +.u.end[x] +``` +Where `x` is the date that has ended. + +Actions performed: + +* finds all tables with the group attribute on the sym column +* calls [`.Q.hdpf`](../ref/dotq.md#hdpf-save-tables), with params: + * HDB connection from [`.u.x`](#variables) (second element) + * current directory (note: directory was changed in [`.u.rep`](#urep)) + * date passed to this function (`x`) i.e. day that is ending + * `` `sym `` column +* re-apply the group attribute to the sym column for those tables found in the first step (as clearing the tables removed the grouped attribute) + +### .u.rep + +Initialises the RDB by creating tables, which are then populated from any existing tickerplant log. Also sets the HDB directory to use at end-of-day. + +```q +.u.rep[x;y] +``` +Where + +* `x` is a list of table details, each element a two-item list + * symbol for table name + * schema table +* `y` is the tickerplant log details, comprising a two-item list: + * a long for the log count (null represents no log) + * a file symbol for the location of the current tickerplant log (null represents no log) + +Actions performed: + +* tables are created using `x` +* if a tickerplant log file has been provided + * log file is replayed using [`-11!`](../basics/internal.md#-11-streaming-execute) to populate tables + * set the HDB directory by changing the working directory to the same directory as used by the log file (see [`tick.q`](tickq.md#usage) for details on how to alter the log file directory).
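The replay step performed by `.u.rep` can be exercised by hand; a sketch, assuming a log file written as `tplog/sym2022.02.02` and tables already created from the schema:

```q
upd:insert                    / replay executes each logged (`upd;table;data) call
-11!`:tplog/sym2022.02.02     / stream-execute the tickerplant log, repopulating the tables
```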
diff --git a/docs/architecture/tickq.md b/docs/architecture/tickq.md new file mode 100644 index 000000000..8e5d5da64 --- /dev/null +++ b/docs/architecture/tickq.md @@ -0,0 +1,248 @@ +--- +title: tick.q | Documentation for q and kdb+ +description: How to construct a tickerplant process +keywords: hdb, kdb+, q, tick, tickerplant, streaming +--- +# Tickerplant (TP) using tick.q + +`tick.q` is available from :fontawesome-brands-github:[KxSystems/kdb-tick](https://github.com/KxSystems/kdb-tick) + +## Overview + +All incoming streaming data is processed by a kdb+ process acting as a tickerplant. +A tickerplant writes all data to a tickerplant log (to permit data recovery) and publishes data to subscribed clients, for example an RDB. + +### Customization + +`tick.q` provides a starting point for most environments. The source code is freely available and can be tailored to individual needs. + +### Schema file + +A tickerplant requires a schema file. +A schema file describes the data you plan to capture, by specifying the tables to be populated by the tickerplant environment. +The [datatypes](../basics/datatypes.md) and [attributes](../ref/set-attribute.md) are denoted within the file as shown in this example: +```q +quote:([]time:`timespan$(); sym:`g#`symbol$(); bid:`float$(); ask:`float$(); bsize:`long$(); asize:`long$(); mode:`char$(); ex:`char$()) +trade:([]time:`timespan$(); sym:`g#`symbol$(); price:`float$(); size:`int$(); side:`char$()) +``` + +The default setup requires the first two columns to be `time` and `sym`. + +### Real-time vs Batch Mode + +_The mode is controlled via the [`-t`](#usage) command line parameter._ +Batch mode can alleviate CPU use on both the tickerplant and its subscribers by grouping together multiple ticks within the timer interval prior to sending/writing. +This comes at the expense of tickerplant memory (required memory to hold several ticks) and increased latency that may occur between adding to the batch and sending.
+There is no ideal setting for all deployments as it depends on the frequency of the ticks received. +Real-time mode processes every tick as soon as it arrives. + +!!! note "A feedhandler can be written to send messages comprising multiple ticks to a tickerplant. In this situation real-time mode will already be processing batches of messages." + +### End-of-day + +The tickerplant watches for a change in the current day. +As the day ends, a new tickerplant log is created and the tickerplant informs all subscribed clients, via their `.u.end` function. +For example, an RDB may implement [`.u.end`](rq.md#uend) to write down all in-memory tables to disk which can then be consumed by an HDB. + +### Tickerplant Logs + +[Log files](../kb/logging.md) are created in the log directory, named by schema name and date, e.g. `tplog/sym2022.02.02`. +These record all published messages and permit recovery by downstream clients, allowing them to replay messages they have missed. +The directory used should have enough space to record all published data. + +As end-of-day causes a file roll, a process should be put in place to remove old log files that are no longer required. + +!!! note "The tickerplant does not replay log files for clients, but exposes [log file details](#variables) to clients so they can access the current log file" + +### Publishing to a tickerplant + +Feed handlers publish ticks to the tickerplant using [IPC](../basics/ipc.md). These can be a kdb+ process or clients written in any number of different languages that use one of the available client APIs. +Each feed sends data to the tickerplant by calling the [`.u.upd`](#uupd) function. The call can include one or many ticks.
For example, publishing from kdb+: + +```q +q)h:hopen 5010 / connect to TP on port 5010 of same host +q)neg[h](".u.upd";`trade;(.z.n;`APPL;35.65;100;`B)) / async publish single tick to a table called trade +q)neg[h](".u.upd";`trade;(10#.z.n;10?`MSFT`AMZN;10?10000f;10?100i;10?`B`S)) / async publish 10 ticks of some random data to a table called trade +... +``` + +### Subscribing to a tickerplant + +Clients, such as an RDB or RTE, can subscribe by calling [`.u.sub`](uq.md#usub) over [IPC](../basics/ipc.md). + +```q +q)h:hopen 5010 / connect to TP on port 5010 of same host +q)h".u.sub[`;`]" / subscribe to all updates +``` +```q +q)h:hopen 5010 / connect to TP on port 5010 of same host +q)h".u.sub[`trade;`MSFT.O`IBM.N]" / subscribe to updates to trade table that contain sym value of MSFT.O or IBM.N only +``` + +Clients should implement functions [`upd`](rq.md#upd) to receive updates, and [`.u.end`](rq.md#uend) to perform any end-of-day actions. + +## Usage + +```bash +q tick.q SRC DST [-p 5010] [-t 1000] [-o hours] +``` + +| Parameter Name | Description | Default | +| --- | --- | --- | +| SRC | schema filename, loaded using the format `tick/SRC.q` | sym | +| DST | directory to be used by tickerplant logs. _No tickerplant log is created if no directory specified_ | <none> | +| -p | [listening port](../basics/cmdline.md#-p-listening-port) for client communications | 5010 | +| -t | [timer period](../basics/cmdline.md#-t-timer-ticks) in milliseconds. Use zero value to enable real-time mode, otherwise will operate in batch mode. | real-time mode (with timer of 1000ms) | +| -o | [utc offset](../basics/cmdline.md#-o-utc-offset) | localtime | + +!!! Note "Standard kdb+ [command line options](../basics/cmdline.md) may also be passed" + +## Variables + +| Name | Description | +| ---- | ---- | +| .u.w | Dictionary of registered clients’ interest in data being processed i.e.
tables->(handle;syms) | +| .u.i | Msg count in log file | +| .u.j | Total msg count (log file plus those held in buffer) - used when in batch mode | +| .u.t | Table names | +| .u.L | TP log filename | +| .u.l | Handle to TP log file | +| .u.d | Current date | + +## Functions + +Functions are open source and open to customisation. + +### .u.endofday + +Performs end-of-day actions. + +```q +.u.endofday[] +``` + +Actions performed: + +* inform all subscribed clients (for example, RDB/RTE/etc) that the day is ending by calling [.u.end](uq.md#uend) +* increment current date ([`.u.d`](#variables)) to next day +* roll log if using tickerplant log, i.e. + * close current tickerplant log ([`.u.l`](#variables)) + * create a new tickerplant log file, i.e. set [`.u.l`](#variables) by calling [`.u.ld`](#uld) with the new date + +### .u.tick + +Performs initialisation actions for the tickerplant. + +```q +.u.tick[x;y] +``` + +Where + +* `x` is the name of the schema file without the `.q` file extension i.e. [`SRC`](#usage) command line parameter +* `y` is the directory used to store tickerplant logs i.e. [`DST`](#usage) command line parameter + +Actions performed: + +* call [`.u.init[]`](uq.md#uinit) to initialise table info, [`.u.t`](#variables) and [`.u.w`](#variables) +* check first two columns in all tables of provided schema are called `time` and `sym` (throw `timesym` error if not) +* apply [`grouped`](../ref/set-attribute.md#grouped-and-parted) attribute to the sym column of all tables in provided schema +* set [`.u.d`](#variables) to current local date, using [`.z.D`](../ref/dotz.md#zt-zt-zd-zd-timedate-shortcuts) +* if a tickerplant log filename was provided: + * set [`.u.L`](#variables) with a temporary value of `` `:/.......... `` (will have date added in next step) + * create/initialise the log file by calling [`.u.ld`](#uld), passing [`.u.d`](#variables) (current local date) + * set [`.u.l`](#variables) to log file handle + +### .u.ld + +Initialises or reopens an existing log file.
+ +```q +.u.ld[x] +``` + +Where `x` is the current date. Returns the handle of the log file for that date. + +Actions performed: + +* using [`.u.L`](#variables), change last 10 chars to provided date and create log file if it doesn’t yet exist +* set [`.u.i`](#variables) and [`.u.j`](#variables) to count of valid messages currently in log file +* if log file is found to be corrupt (its size is bigger than that implied by the number of valid messages) an error is returned +* open new/existing log file + +### .u.ts + +Given a date, runs end-of-day procedure if a new day has started. + +```q +.u.ts[x] +``` +Where `x` is a date. + +Compares date provided with [`.u.d`](#variables). If no change, no action taken. +If one day difference (i.e. a new day), [`.u.endofday`](#uendofday) is called. +More than one day results in an error and the kdb+ timer is cancelled. + +### .u.upd + +Update tickerplant with data to process/analyse. External processes call this to input data into the tickerplant. + +```q +.u.upd[x;y] +``` +Where + +* `x` is table name (sym) +* `y` is data for table `x` (list of column data, each element can be an atom or list) + +#### Batch Mode + +Add each received message to the batch and record the message to the tickerplant log. The batch is published on the running timer. + +Actions performed: +* If the first element of `y` is not a timespan (or list of timespan) + * inspect [`.u.d`](#variables), if a new day has occurred call [`.z.ts`](#batch-mode_1) + * add a new timespan column populated with the current local time ([`.z.P`](../ref/dotz.md#zp-local-timestamp)). If multiple rows of data, all rows receive the same time. +* Add data to current batch (i.e. new data `y` inserted into table `x`), which will be published on batch timer [`.z.ts`](#batch-mode_1). +* If a tickerplant log file was created, write the `upd` function call and params to the log and increment [`.u.j`](#variables) so that an RDB can execute what was originally called during recovery.
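The timestamping rule can be seen from the feed side; a sketch, with `h` an assumed open handle to the tickerplant and a simplified trade schema of time, sym, price and size:

```q
q)h:hopen 5010                                     / connect to TP
q)neg[h](".u.upd";`trade;(`MSFT;23.45;100i))       / no timespan: TP prepends one from .z.P
q)neg[h](".u.upd";`trade;(.z.n;`MSFT;23.45;100i))  / timespan supplied: used as-is
```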
+ + +#### Realtime Mode + +Publish each received message to all interested clients and record the message to the tickerplant log. + +Actions performed: + +* Checks if end-of-day procedure should be run by calling [`.u.ts`](#uts) with the current date +* If the first element of `y` is not a timespan (or list of timespan), add a new timespan column populated with the current local time ([`.z.P`](../ref/dotz.md#zp-local-timestamp)). If multiple rows of data, all rows receive the same time. +* Retrieves the column names of table `x` +* Publish data to all interested clients, by calling [`.u.pub`](uq.md#upub) with table name `x` and table generated from `y` and column names. +* If a tickerplant log file was created, write the `upd` function call and params to the log and increment [`.u.i`](#variables) so that an RDB can execute what was originally called during recovery + +### .z.ts + +Defines the action for the kdb+ timer callback function [`.z.ts`](../ref/dotz.md#zts-timer). + +The frequency of the timer was set on the [command line](#usage) ([`-t`](../basics/cmdline.md#-t-timer-ticks) command-line option or [`\t`](../basics/syscmds.md#t-timer) system command). + +#### Batch Mode + +Runs on system timer at specified interval. + +Actions performed: + +* For every table in [`.u.t`](#variables) + * publish its accumulated data to all interested clients, by calling [`.u.pub`](uq.md#upub) with the table name and table data + * reapply the grouped attribute to the sym column +* Update count of processed messages by setting [`.u.i`](#variables) to [`.u.j`](#variables) (the number of batched messages). +* Checks if end-of-day procedure should be run by calling [`.u.ts`](#uts) with the current date + +#### Realtime Mode + +If batch timer not specified, system timer is set to run every 1000 milliseconds to check if end-of-day has occurred. +End-of-day is checked by calling [`.u.ts`](#uts), passing current local date ([`.z.D`](../ref/dotz.md#zt-zt-zd-zd-timedate-shortcuts)).
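In real-time mode the timer callback therefore amounts to something like this sketch (not the verbatim `tick.q` source):

```q
.z.ts:{.u.ts .z.D}   / every timer tick: run the end-of-day check against the local date
```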
+ +### Pub/Sub functions + +`tick.q` also loads [`u.q`](uq.md) which enables all of its features within the tickerplant. + diff --git a/docs/architecture/uq.md b/docs/architecture/uq.md new file mode 100644 index 000000000..2499fc5a6 --- /dev/null +++ b/docs/architecture/uq.md @@ -0,0 +1,178 @@ +--- +title: u.q | Documentation for q and kdb+ +description: How to construct systems from kdb+ processes +keywords: kdb+, q, tick, tickerplant, streaming +--- +# u.q + +`u.q` is available from :fontawesome-brands-github:[KxSystems/kdb-tick](https://github.com/KxSystems/kdb-tick) + +## Overview + +Contains functions that allow clients to subscribe to all or subsets of available data, publish to interested clients, and alert clients to events, for example, end-of-day. Tracks client subscription interest and removes client subscription details on their disconnection. + +This script is loaded by other processes, for example [a tickerplant](tickq.md). + +## Usage + +To allow the ability to publish data to any process, do the following: + +- load `u.q` +- declare the tables to be published in the top level namespace. Each table must contain a column called `sym`, which acts as the single key field to which subscribers subscribe +- initialize by calling [`.u.init[]`](#uinit) +- publish data by calling [`.u.pub[table name; table data]`](#upub) + +The list of tables that can be published and the processes currently subscribed are held in [`.u.w`](#variables). + +Subscriber processes must open a connection to the publisher and call [`.u.sub[tablename;list_of_symbols_to_subscribe_to]`](#usub). + +If a subscriber calls `.u.sub` again, the current subscription is overwritten either for all tables (if a wildcard is used) or the specified table. +To add to a subscription, for example to add more `syms` to a current subscription, the subscriber can call [`.u.add`](#uadd).
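For example, from a subscriber process (`h` is an assumed handle to the publisher):

```q
q)h:hopen 5010                 / open handle to the publisher
q)h".u.sub[`trade;`MSFT.O]"    / subscribe to trade updates for MSFT.O only
q)h".u.add[`trade;`IBM.N]"     / widen the existing subscription to include IBM.N
```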
+ +Clients should define an [`upd`](rq.md#upd) function to receive updates, and a [`.u.end`](rq.md#uend) function for end-of-day events. + +## Variables + +| Name | Description | +| ---- | ---- | +| .u.w | Dictionary of registered client interest in data being processed (for example, tables->(handle;syms)) | +| .u.t | Table names | + +## Functions + +Functions are open source and open to customisation. + +### .u.init + +Initialise variables used to track registered clients. + +```q +.u.init[] +``` + +Initialises [variables](#variables) by retrieving all tables defined in the root namespace. Used to track client interest in data being published. + +### .u.del + +Delete subscriber from dictionary of known subscribers ([`.u.w`](#variables)) for a given table. + +```q +.u.del[x;y] +``` +Where + +* `x` is a table name +* `y` is the connection handle + +### .u.sel + +Select from table, given optional sym filter. Used to filter tables to clients who may not want everything from the table. + +```q +.u.sel[x;y] +``` +Where + +* `x` is a table +* `y` is a list of syms (can be empty list) + +returns the table `x`, which can be filtered by `y`. + +### .u.pub + +Publish updates to subscribers. + +```q +.u.pub[x;y] +``` +Where + +* `x` is table name (sym type) +* `y` is new data for table `x` (table type) + +Actions performed: + +* find interested client handles for table `x` and any filter they may have (using [`.u.w`](#variables)) +* for each client + * filter `y` using [`.u.sel`](#usel) (if client specified a filter at subscription time) + * publish [asynchronously](../basics/ipc.md#async-message-set) to client, calling their `upd` function with parameters _table name_ and _table data_. + + +### .u.add + +Add client subscription interest in table with optional filter.
+ +```q +.u.add[x;y] +``` +Where +* `x` is a table name (sym) +* `y` is a list of syms used to filter table data, with empty sym representing all table data + +Actions performed: + +* uses [`.z.w`](../ref/dotz.md#zw-handle) to get current client handle. +* find any existing subscriptions to table `x` for client (using [`.u.w`](#variables)) + * if existing, update filter with union on `y` + * else a new entry is added to [`.u.w`](#variables) with client handle, `x` and `y`. + +Returns a two-element list. The first element is the table name. The second element depends on whether `x` refers to a keyed table. + +* If `x` is a keyed table, [`.u.sel`](#usel) is used to select from the keyed table the required syms +* otherwise returns an empty table `x` (schema definition of table), with the [grouped attribute](../ref/set-attribute.md#grouped-and-parted) applied to the sym column. + +### .u.sub + +Used by clients to register subscription interest. + +```q +.u.sub[x;y] +``` +Where + +* `x` is a table name (sym) +* `y` is a list of syms used to filter table data, with empty sym representing all table data + +If `x` is an empty symbol, the client is subscribed to all known tables using `y` criteria. This is achieved by calling .u.sub for each table in [`.u.t`](#variables). +It then returns a list of all the return values provided by .u.sub (i.e. a list of pairs comprising table name and table definition). + +An error is returned if the table does not exist. + +For the subscribing client, any previous registrations for the given tables are removed prior to reinstating the new criteria, i.e. [`.u.del`](#udel) is called. + +Calls [`.u.add`](#uadd) to record the client subscription and passes the returned values to the client. + +### .u.end + +Inform all registered clients that end-of-day has occurred. + +```q +.u.end[x] +``` + +Where `x` is a date, representing the day that is ending.
+ +Iterates over all client handles via [`.u.w`](#variables) and asynchronously calls their `.u.end` function passing `x`. + +### .z.pc + +Implementation of [`.z.pc`](../ref/dotz.md#zpc-close) callback for connection close. + +Called when a client disconnects. The client handle provided is used to call [`.u.del`](#udel) for all tables. This ensures all subscriptions are removed for that client. + +## Example + +[`tick.q`](tickq.md) is an example of a tickerplant that uses `u.q` for pub/sub. + +In addition, the example scripts below demonstrate pub/sub in a standalone publisher and subscriber. +They can be downloaded from :fontawesome-brands-github:[KxSystems/cookbook/pubsub](https://github.com/KxSystems/cookbook/tree/master/pubsub). +Each script should be run from the OS command prompt as shown in the following example. + +```bash +$ q publisher.q +$ q subscriber.q +``` + +The publisher generates some random data and publishes it periodically on a timer. + +The subscriber receives data from the publisher and displays it on the screen. You can modify the subscription request and the `upd` function of the subscriber as required. You can run multiple subscribers at once. diff --git a/docs/basics/comparison.md b/docs/basics/comparison.md index 25c04acbb..b8a28995a 100644 --- a/docs/basics/comparison.md +++ b/docs/basics/comparison.md @@ -66,6 +66,8 @@ q)(1 + 1e-13) = 1 1b ``` + +`< > = >= <= <>` are [multithreaded primitives](../kb/mt-primitives.md). + !!! tip "For booleans, `<>` is the same as _exclusive or_ (XOR)."
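The XOR reading can be checked directly:

```q
q)0011b<>0101b
0110b
```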
diff --git a/docs/basics/datatypes.md b/docs/basics/datatypes.md index 864331017..16b293894 100644 --- a/docs/basics/datatypes.md +++ b/docs/basics/datatypes.md @@ -25,7 +25,7 @@ n c name sz literal null inf SQL Java .Net 10 c char 1 " " " " Character char 11 s symbol \` \` varchar 12 p timestamp 8 dateDtimespan 0Np 0Wp Timestamp DateTime (RW) -13 m month 4 2000.01m 0Nm +13 m month 4 2000.01m 0Nm 0Wm 14 d date 4 2000.01.01 0Nd 0Wd date Date 15 z datetime 8 dateTtime 0Nz 0wz timestamp Timestamp DateTime (RO) 16 n timespan 8 00:00:00.000000000 0Nn 0Wn Timespan TimeSpan diff --git a/docs/img/parallelism.jpg b/docs/img/parallelism.jpg deleted file mode 100644 index 53f02b97c..000000000 Binary files a/docs/img/parallelism.jpg and /dev/null differ diff --git a/docs/interfaces/c-client-for-q.md b/docs/interfaces/c-client-for-q.md index 5259d8364..cd340b48b 100644 --- a/docs/interfaces/c-client-for-q.md +++ b/docs/interfaces/c-client-for-q.md @@ -584,9 +584,8 @@ if(handle==-3){ } ``` -Prior to 4.1t 2023.11.10, SSL/TLS connections can be used from the initialization thread only, i.e. the thread which first calls any `khp` function since the start of the application. It can now be used for one-shot synchronous requests. - -The lib is sensitive to the same environment variables as kdb+, noted at [Knowledge Base: SSL/TLS](../kb/ssl.md) +The lib is sensitive to the same environment variables as kdb+, noted at [Knowledge Base: SSL/TLS](../kb/ssl.md). +SSL/TLS connections using `khpunc` can be made from the initialization thread only; see [SSL/TLS thread support](../kb/ssl.md#thread-support) for more details. The OpenSSL libs are loaded dynamically, the first time a TLS connection is requested.
It may be forced on startup with diff --git a/docs/interfaces/img/matlab.png b/docs/interfaces/img/matlab.png deleted file mode 100644 index 232263232..000000000 Binary files a/docs/interfaces/img/matlab.png and /dev/null differ diff --git a/docs/interfaces/index.md b/docs/interfaces/index.md index bced80459..2edde30a5 100644 --- a/docs/interfaces/index.md +++ b/docs/interfaces/index.md @@ -39,11 +39,10 @@ Our Fusion interfaces are [automl](https://github.com/KxSystems/automl)[Automate machine learning in kdb+](../ml.md) [cookbook](https://github.com/KxSystems/cookbook)Companion files to the Knowledge Base [help](https://github.com/KxSystems/help)Online **help** for q -[insights-assemblies](https://github.com/KxSystems/insights-assemblies) Deploy assemblies for **KX Insights** ==new== [jupyterq](https://github.com/KxSystems/jupyterq)**Jupyter** kernel for kdb+ [kdb](https://github.com/KxSystems/kdb)Companion files to **kdb+** [kdb-taq](https://github.com/KxSystems/kdb-taq)Processing **trade-and-quote** data -[kdb-tick](https://github.com/KxSystems/kdb-tick)[**Ticker**plant](../kb/kdb-tick.md) +[kdb-tick](https://github.com/KxSystems/kdb-tick)[Tickerplant](../architecture/index.md) [man](https://github.com/KxSystems/man)[man-style reference](../about/man.md) [ml](https://github.com/KxSystems/ml)[**Machine Learning** Toolkit](../ml.md) [mlnotebooks](https://github.com/KxSystems/mlnotebooks)Jupyter notebooks with ML examples diff --git a/docs/interfaces/matlab-client-for-q.md b/docs/interfaces/matlab-client-for-q.md index 0f01587b9..60475b023 100644 --- a/docs/interfaces/matlab-client-for-q.md +++ b/docs/interfaces/matlab-client-for-q.md @@ -1,24 +1,75 @@ --- -title: Working with Matlab | Interfaces | kdb+ and q documentation -description: How connect a Matlab client program to a kdb+ server process +title: Working with MATLAB | Interfaces | kdb+ and q documentation +description: How to connect a MATLAB client program to a kdb+ server process --- -#
![Matlab](img/matlab.png) Working with Matlab +# Working with MATLAB + - - -Support for Matlab is a part of [Datafeed Toolbox for Matlab](https://uk.mathworks.com/help/datafeed/kx-systems-inc-.html): since R2007a edition. - -MathWorks provides functions overview, usage instructions and some examples on the toolbox webpage. +## Installation !!! note "Versions" - As Matlab/datafeed toolbox evolves features or instruction below are subject to revisions. Please refer to toolbox documentation for latest version. + As the MATLAB Datafeed Toolbox evolves, the features and instructions below are subject to revision. Please refer to toolbox documentation for the latest version. Users have reported that this works with more recent versions (e.g. R2015b on RHEL 6.8/2016b and 2017a on macOS). See also community-supported native connector :fontawesome-brands-github: [dmarienko/kdbml](https://github.com/dmarienko/kdbml) -First, we start up a kdb+ process that we wish to communicate with from Matlab and load some sample data into it. +=== "MATLAB R2021a and later" + + Download and unzip [kx_kdbplus.zip](https://uk.mathworks.com/matlabcentral/answers/864350-trading-toolbox-functionality-for-kx-systems-inc-kdb-in-matlab-r2021a-and-later). + Add the resulting directory to your [MATLAB path](https://uk.mathworks.com/help/matlab/matlab_env/what-is-the-matlab-search-path.html), for example, in MATLAB: + ```matlab + >> addpath('/Users/Developer/matlabkx') + ``` + +=== "MATLAB prior to R2021a" + + Support for MATLAB is part of the [Datafeed Toolbox for MATLAB](https://uk.mathworks.com/help/releases/R2020b/datafeed/kx-systems-inc-.html) since the R2007a edition. + +The MATLAB integration depends on the two Java files `c.jar` and `jdbc.jar`.
+
+:fontawesome-brands-github:
+[KxSystems/kdb/c/c.jar](https://github.com/KxSystems/kdb/blob/master/c/c.jar)
+:fontawesome-brands-github:
+[KxSystems/kdb/c/jdbc.jar](https://github.com/KxSystems/kdb/blob/master/c/jdbc.jar)
+
+Add the JAR files to the classpath used by MATLAB. They can be added permanently by editing `classpath.txt` (type `edit classpath.txt` at the MATLAB prompt) or for the duration of a particular session using the `javaaddpath` function, for example
+
+```matlab
+>> javaaddpath /home/myusername/jdbc.jar
+>> javaaddpath /home/myusername/c.jar
+```
+
+!!! note "Installation directory"
+
+    In these examples change `/home/myusername` to the directory where `jdbc.jar` and `c.jar` are installed.
+
+Alternatively, this can be achieved in a MATLAB source file (i.e., a \*.m file) by adding the following two calls before calling any `kx` functions.
+
+```matlab
+javaaddpath('/home/myusername/jdbc.jar')
+javaaddpath('/home/myusername/c.jar')
+```
+
+Confirm they have been added successfully using the `javaclasspath` function.
+
+```matlab
+>> javaclasspath
+   STATIC JAVA PATH
+...
+   /opt/matlab/2015b/java/jar/toolbox/stats.jar
+   /opt/matlab/2015b/java/jar/toolbox/symbol.jar
+
+   DYNAMIC JAVA PATH
+
+   /home/myusername/jdbc.jar
+   /home/myusername/c.jar
+>>
+```
+
+## Connecting to a q process
+
+First, we start up a kdb+ process that we wish to communicate with from MATLAB and load some sample data into it.

Save the following as a file `tradedata.q`:

```q
@@ -48,7 +99,7 @@ totalvolume2:{[stock;minvolume] select sum(volume) from trade where sec = stock,

Then

-```powershell
+```bash
q tradedata.q -p 5001
```

@@ -80,54 +131,7 @@ XYZ 13.64856 2900 6.435824 2005.04.03
q)
```

-The Matlab integration depends on the two Java files `c.jar` and `jdbc.jar`.
-For the purposes of this recipe, we assume this is available on the machine Matlab is running on, at `C:\q\jdbc.jar`. 
- -:fontawesome-brands-github: -[KxSystems/kdb/c/c.jar](https://github.com/KxSystems/kdb/blob/master/c/c.jar) -:fontawesome-brands-github: -[KxSystems/kdb/c/jdbc.jar](https://github.com/KxSystems/kdb/blob/master/c/jdbc.jar) - -We then start a new Matlab session. From here on, `>>` represents the Matlab prompt. - - -## Connecting to a q process - -We assume a kdb+ process running on the local host on port 5001 and that the `jdbc.jar` is installed. - -First we need to add the JAR file to the classpath used by Matlab. We can either permanently add it by editing `classpath.txt` (type `edit classpath.txt` at the Matlab prompt) or for the duration of a particular session using the `javaaddpath` function. We’ll use the latter here. - -```matlab ->> javaaddpath /home/myusername/jdbc.jar ->> javaaddpath /home/myusername/c.jar -``` - -!!! note "Installation directory" - - In these examples change `/home/myusername` to the directory where `jdbc.jar` and `c.jar` are installed. - -Alternatively, this can be achieved in a Matlab source file (i.e., \*.m file) adding the following two functions before calling `kx` functions. - -```matlab -javaaddpath('/home/myusername/jdbc.jar') -javaaddpath('/home/myusername/c.jar') -``` - -We can confirm that we’ve added this successfully using the `javaclasspath` function. - -```matlab ->> javaclasspath - STATIC JAVA PATH -... - /opt/matlab/2015b/java/jar/toolbox/stats.jar - /opt/matlab/2015b/java/jar/toolbox/symbol.jar - - DYNAMIC JAVA PATH - - /home/myusername/jdbc.jar - /home/myusername/c.jar ->> -``` +We then start a new MATLAB session. From here on, `>>` represents the MATLAB prompt. We’re now ready to open a connection to the q process: @@ -145,7 +149,7 @@ q = We can also pass a username:password string as the third parameter to the `kx` function if it is required to log in to the q process. -The `q` value is a normal Matlab object and we can inspect the listed properties. We’ll use this value in all our communications with the q process. 
+The `q` value is a normal MATLAB object and we can inspect the listed properties. We’ll use this value in all our communications with the q process. We close a connection using the `close` function: @@ -171,13 +175,9 @@ We close a connection using the `close` function: java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(Unknown Source) - at java.net.SocketOutputStream.write(Unknown Source) - at c.w(c.java:99) - at c.k(c.java:107) - at c.k(c.java:108) Error in ==> kx.fetch at 65 @@ -222,28 +222,29 @@ Then we can fetch it: hundreds = -java.lang.Object[]: - [100x1 int32] - [100x1 int32] + java.lang.Object[]: + + [100×1 int64] + [100×1 int64] ``` -We can use the `cell` function to strip the Java array wrapper away: +We can use the [`cell`](https://uk.mathworks.com/help/matlab/ref/cell.html) function to strip the Java array wrapper away: ```matlab >> hundreds_as_cell = cell(hundreds) hundreds_as_cell = - [100x1 int32] - [100x1 int32] + 2×1 cell array ->> + {100×1 int64} + {100×1 int64} ``` Tables are returned as an object with an array property for each column. Taking the first 10 rows of the `trade` table as an example: ```q -q) 10 # trade +q)10#trade sec price volume exchange date ---------------------------------------- ACME 89.5897 1300 6.58303 2005.04.26 @@ -258,18 +259,18 @@ ABC 58.26731 2100 5.220929 2004.09.10 XYZ 74.14568 2900 5.075229 2004.08.24 ``` -Will be returned in Matlab: +Will be returned in MATLAB: ```matlab ->> ten = fetch(q, '10 # trade') +>> ten = fetch(q, '10#trade') ten = - sec: {10x1 cell} - price: [10x1 double] - volume: [10x1 int32] - exchange: [10x1 double] - date: [10x1 double] + sec: {10×1 cell} + price: [10×1 double] + volume: [10×1 int64] + exchange: [10×1 double] + date: [10×1 double] ``` With suitable computation in q, we can return data suitable for immediate plotting. 
Here we compute a 10-item moving average over the `` `ACME `` prices: @@ -279,13 +280,13 @@ q)mavg[10;exec price from trade where sec=`ACME] 89.5897 65.9259 61.50204 53.32677 54.74408 57.39743 57.15958 62.33525 56.8732.. ``` ```matlab ->> plot (fetch(q,'mavg[10;exec price from trade where sec=`ACME]')) +>> acme = fetch(q,'mavg[10;exec price from trade where sec=`ACME]') ``` ## Metadata -The q integration in Matlab provides the `tables` meta function. +The q integration in MATLAB provides the `tables` meta function. ```matlab >> tables(q) @@ -297,7 +298,7 @@ ans = 'trade' ``` -The experienced q user can use the `\v` command to see all values in the directory: +The experienced q user can use the [`\v`](../basics/syscmds.md#v-variables) command to see all values in the directory: ```matlab >> fetch(q,'\v') @@ -322,7 +323,8 @@ We can use the `fetch` function to cause side effects in the kdb+ process, such Given a table `b`: ```q -q)show b +q)b:([] a:1 2; b:1 2) +q)b a b --- 1 1 @@ -371,7 +373,7 @@ a b A more complicated row shows the potential advantage to better effect: ```matlab ->> insert(q,'TRADE',{'`ACME',100.45,400,.0453,'2005.04.28'}) +>> insert(q,'trade',{'`ACME',100.45,400,.0453,'2005.04.28'}) ``` Be warned though, that errors will not be detected very well. For example the following expression silently fails! @@ -384,114 +386,67 @@ whereas the equivalent `fetch` call provokes an error: ```matlab >> fetch(q,'b,:(1;2;3)') -??? Java exception occurred: -c$KException: length - - at c.k(c.java:106) - - at c.k(c.java:107) - - at c.k(c.java:108) - -Error in ==> kx.fetch at 65 - t = c.handle.k(varargin{1}); -``` - - -## Moving data from one source to another - -As an example of moving data from one source to another, let us get a MSFT quote from Yahoo! and insert it into our q table of data. - -First we connect to Yahoo! 
and get the quote: - -```matlab ->> y = yahoo - -y = - - url: 'http://quote.yahoo.com' - ip: [] - port: [] - ->> msft = fetch(y,'MSFT') - -msft = - - Symbol: {'MSFT'} - Last: 30.7200 - Date: 733064 - Time: 0.6674 - Change: -0.3900 - Open: 31.0600 - High: 31.1200 - Low: 30.5100 - Volume: 50928424 +Error using fetch (line 64) +Java exception occurred: +kx.c$KException: length + at kx.c.k(c.java:110) + at kx.c.k(c.java:111) + at kx.c.k(c.java:112) ``` -And then we insert it to a suitable table in q: +## Async commands to q -```matlab ->> fetch(q,'yahoo_data:([symbol:`symbol$()];high:`float$();low:`float$())') ->> insert(q,'yahoo_data',{'`MSFT', msft.High, msft.Low}) -``` +The `exec` function is used for sending asynchronous commands to q; ones we do not expect a response to, and which may be performed in the background while we continue interacting with the MATLAB process. -And do the same for an IBM quote: +Here we establish a large-ish data structure in the kdb+ process: ```matlab ->> ibm = fetch(y,'IBM') - -ibm = - - Symbol: {'IBM'} - Last: 97.0900 - Date: 733064 - Time: 0.6660 - Change: 0.9200 - Open: 96.4200 - High: 97.2300 - Low: 96.1200 - Volume: 10474800 - ->> insert(q,'yahoo_data',{'`IBM', ibm.High, ibm.Low}) +>> exec(q,'big_data:10000000?100') ``` -Finally, let’s check the average high for the data we’re tracking: +Then we take the average of the data, delete it from the namespace and close the connection: ```matlab ->> fetch(q,'select avg high from yahoo_data') +>> fetch(q,'avg big_data') ans = - high: 64.1750 -``` + 49.4976 +>> exec(q,'delete big_data from `.') +>> close(q) +``` -## Async commands to q +## Handling null -The `exec` function is used for sending asynchronous commands to q; ones we do not expect a response to, and which may be performed in the background while we continue interacting with the Matlab process. +kdb+ has the ability to set values to null. 
MATLAB doesn't have a corresponding null type, so if your data contains nulls you may wish to filter or detect them.

MATLAB has the ability to call static methods within Java. The `NULL` method can provide the null values for the different [data types](../basics/datatypes.md). For example

```matlab
NullInt=kx.c.NULL('i')
NullLong=kx.c.NULL('j')
NullDouble=kx.c.NULL('f')
NullDate=kx.c.NULL('d')
```

With this, you can test values for null. The following shows that the comparison will return true when requesting null values from a kdb+ connection named `conn`:

```matlab
fetch(conn,'0Ni')== NullInt
fetch(conn,'0N')== NullLong
fetch(conn,'0Nd')== NullDate
isequaln(fetch(conn,'0Ni'),NullInt)
isequaln(fetch(conn,'0N'), NullLong)
isequaln(fetch(conn,'0Nd'), NullDate)
isequaln(fetch(conn,'0Nf'), NullDouble)
```

An alternative is to have your query include a filter for nulls (if they are populated), so they aren't retrieved by MATLAB.


-## Getting more help

-Start with `help kx` in your Matlab session and also see `help kx.fetch` and so on for further details of the integration.
+## Getting more help

+Start with `help kx` in your MATLAB session and also see `help kx.fetch` and so on for further details of the integration.
+MathWorks provides an overview of the functions, usage instructions and some examples on the [toolbox webpage](https://uk.mathworks.com/help/releases/R2020b/datafeed/kx-systems-inc-.html).
diff --git a/docs/kb/file-compression.md b/docs/kb/file-compression.md
index bc9feb7f1..bcdab33cf 100644
--- a/docs/kb/file-compression.md
+++ b/docs/kb/file-compression.md
@@ -207,7 +207,7 @@ and on macOS, the OS command `purge` can be used. 
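Before tuning the parameters discussed next, it can help to confirm how, or whether, a given file on disk is actually compressed. A hedged sketch using the `-21!` internal function, which reports a file's compression statistics — the path is hypothetical and the numbers shown are illustrative only:

```q
/ hypothetical path to a column file in a partitioned database;
/ the statistics shown are illustrative, not from a real run
q)-21!`:db/2015.01.01/trade/price
compressedLength  | 2463
uncompressedLength| 8388608
algorithm         | 2i
logicalBlockSize  | 17i
zipLevel          | 6i
```

An empty dictionary indicates the file is not compressed.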
### Compression parameters

-The `logicalBlockSize` represents how much data is taken as a compression unit, and consequently the minimum size of a block to decompress. E.g. using a `logicalBlockSize` of 128kB, a file of size 128000kB would be cut into 100 blocks, and each block compressed independently of the others. Later, if a single byte is requested from that compressed file, a minimum of 128kB would be decompressed to access that byte. Fortunately those types of access patterns are rare, and typically you would be extracting clumps of data that make a logical block size of 128kB quite reasonable.
+The `logicalBlockSize` represents how much data is taken as a compression unit, and consequently the minimum size of a block to decompress. E.g. using a `logicalBlockSize` of 128kB, a file of size 128000kB would be cut into 1000 blocks, and each block compressed independently of the others. Later, if a single byte is requested from that compressed file, a minimum of 128kB would be decompressed to access that byte. Fortunately those types of access patterns are rare, and typically you would be extracting clumps of data that make a logical block size of 128kB quite reasonable.

Experiment to discover what suits your data, hardware and access patterns best. A good balance for TAQ data and typical TAQ queries is to use algorithm 1 (the same algorithm as used for IPC compression) with 128kB `logicalBlockSize`. To trade performance for better compression, choose gzip with compression level 6.
diff --git a/docs/kb/kdb-tick.md b/docs/kb/kdb-tick.md
index 7539f1ee7..9f4b4ea29 100644
--- a/docs/kb/kdb-tick.md
+++ b/docs/kb/kdb-tick.md
@@ -35,6 +35,7 @@ q chainedtick.q [host]:port[:usr:pwd] [-p 5110] [-t N]
```

!!! note
+
+    `chainedtick.q` contains `\l tick/u.q` and therefore has a dependency on [`u.q`](https://github.com/KxSystems/kdb-tick/blob/master/tick/u.q) existing within the directory `tick`. 
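The dependency just described can be illustrated with a hypothetical launch directory (paths invented for this sketch): `u.q` must sit in a `tick` subdirectory of wherever q is started.

```shell
# hypothetical layout: chainedtick.q loads tick/u.q relative to the launch directory
mkdir -p /tmp/chained-demo/tick
touch /tmp/chained-demo/chainedtick.q /tmp/chained-demo/tick/u.q
ls /tmp/chained-demo/tick    # prints: u.q
```

Starting q from `/tmp/chained-demo` would then let `\l tick/u.q` resolve.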
If the primary tickerplant is running on the same host (port 5010), the following starts a chained tickerplant on port 5110, sending bulk updates every 1000 milliseconds.
@@ -92,15 +93,15 @@ It would make sense to write the data to disk during the day so that it’s read
:fontawesome-brands-github:
[simongarland/tick/w.q](https://github.com/simongarland/tick/blob/master/w.q)

-is a potential replacement for the default RDB
-:fontawesome-brands-github:
-[KxSystems/kdb-tick/tick/r.q](https://github.com/KxSystems/kdb-tick/blob/master/tick/r.q).
+is a potential replacement for the default RDB ([`r.q`](../architecture/rq.md))

`w.q` connects to the tickerplant, but buffers requests. Each time the number of records in the buffer is equal to `MAXROWS`, it will write the records to disk. At day end, remaining data is flushed to disk, the database is sorted (on disk) and then moved to the appropriate date partition within the historical database.

-!!! Note It is not recommended to query the task running `w.q` as it contains a small (and variable-sized) selection of records.
-Although it wouldn’t be difficult to modify it to keep the last 5 minutes of data, for example, that sort of custom collection is probably better housed in a task running a [`c.q`](#cq)-like aggregation.
+!!! note
+
+    It is not recommended to query the task running `w.q` as it contains a small (and variable-sized) selection of records.
+    Although it wouldn’t be difficult to modify it to keep the last 5 minutes of data, for example, that sort of custom collection is probably better housed in a task running a [`c.q`](#cq)-like aggregation.

Syntax:
```bash
@@ -123,7 +124,7 @@ It’s easier to replay the log and (re)write the data. If the flag is provided

### `c.q`

-Another often-overlooked problem is users fetching vast amounts of raw data to calculate something that could much better be built once, incrementally updated, and then made available to all interested clients. 
+An often-overlooked problem is users fetching vast amounts of raw data to calculate something that could much better be built once, incrementally updated, and then made available to all interested clients. A simple example would be keeping a running Open/High/Low/Latest: much simpler to update incrementally with data from the TP each time something changes than to build from scratch. diff --git a/docs/kb/load-balancing.md b/docs/kb/load-balancing.md index b3778cde3..15ecf37f7 100644 --- a/docs/kb/load-balancing.md +++ b/docs/kb/load-balancing.md @@ -36,7 +36,7 @@ q)h "xs" 0 1 2 3 4 5 6 7 8 ``` -Asynchronous messages are forwarded to one of the secondary servers, transparently to the client. The code below issues an asynchronous request, then [blocks on the handle](../basics/ipc.md#async-blocking) waiting for a result to be returned. This is called _deferred synchronous_. +Asynchronous messages are forwarded to one of the secondary servers, transparently to the client. The code below issues an asynchronous request, then [blocks on the handle](../basics/ipc.md#async-blocking) waiting for a result to be returned. This is called [_deferred synchronous_](../basics/ipc.md#deferred-sync). ```q q)(neg h) "select sym,price from trade where size > 50000" ; h[] diff --git a/docs/kb/mt-primitives.md b/docs/kb/mt-primitives.md index bd856886d..55b793f32 100644 --- a/docs/kb/mt-primitives.md +++ b/docs/kb/mt-primitives.md @@ -6,10 +6,6 @@ date: March 2020 --- # :fontawesome-solid-bolt: Multithreaded primitives - -![Parallelism](../img/parallelism.jpg) - - To complement existing explicit parallel computation facilities ([`peach`](../ref/each.md)), kdb+ 4.0 introduces implicit, within-primitive parallelism. It is able to exploit internal parallelism of the hardware – in-memory, with modern multi-channel memory architectures, and on-disk, e.g. making use of SSD internal parallelism. 
```q @@ -22,6 +18,8 @@ q)(s;r[0]%r;r:st[;"\\t:100 f a"]each s:1 4 16 32) 1082 262 131 95 / time, ms ``` +## Supported Primitives + The following primitives now use multiple threads where appropriate: ```txt diff --git a/docs/kb/publish-subscribe.md b/docs/kb/publish-subscribe.md deleted file mode 100644 index dcc0d87f7..000000000 --- a/docs/kb/publish-subscribe.md +++ /dev/null @@ -1,137 +0,0 @@ ---- -title: Publish and subscribe – Knowledge Base – kdb+ and q documentation -description: KxSystems/kdb-tick contains functionality to allow processes to publish data and subscribe to it. It is worth highlighting how the publish-and-subscribe code can be used by any process on a standalone basis. The pubsub functionality is supplied in the u.q script of kdb+tick. -keywords: kdb+, publish, q, subscribe ---- -# Publish and subscribe - - - - -:fontawesome-brands-github: -[KxSystems/kdb-tick](https://github.com/KxSystems/kdb-tick) -contains functionality to allow processes to publish data and subscribe to it. It is worth highlighting how the publish-and-subscribe code can be used by any process on a standalone basis. The pubsub functionality is supplied in the `u.q` script of kdb+tick. - -To give the ability to publish data to any process, a few things need to be done: - -- load `u.q` -- declare the tables to be published in the top level namespace. Each table must contain a column called `sym`, which acts as the single key field to which subscribers subscribe -- initialize by calling `.u.init[]` -- publish data by calling `.u.pub[table name; table data]` - -The list of tables that can be published and the processes currently subscribed are held in `.u.w`. When a client process closes a connection, it is removed from `.u.w`. - -Subscriber processes must open a connection to the publisher and call `.u.sub[tablename;list_of_symbols_to_subscribe_to]`. - -`.u.sub` can be called synchronously or asynchronously. 
If `.u.sub` is called synchronously, the table schema is returned to the client. If the table being subscribed to is a keyed table, then the current value for each subscribed `sym` is returned, assuming it is stored. Otherwise, an empty schema definition is returned. Specifying `` ` `` for either parameter of `.u.sub` means _all_ – all tables, all syms, or all tables and all syms. - -If a subscriber calls `.u.sub` again, the current subscription will be overwritten either for all tables (if a wildcard is used) or the specified table. To add to a subscription (e.g. add more syms to a current subscription) the subscriber can call `.u.add`. - - -The example scripts below can be downloaded from GitHub. Each script should be run from the OS command prompt e.g. - -```bash -$ q publisher.q -$ q subscriber.q -``` - :fontawesome-brands-github: - [KxSystems/cookbook/pubsub](https://github.com/KxSystems/cookbook/tree/master/pubsub) - - -## Publisher - -The code below will generate some random data and publish it periodically on a timer. - -```q -\d .testdata - -// set the port -@[system;"p 6812";{-2"Failed to set port to 6812: ",x, - ". Please ensure no other processes are running on that port", - " or change the port in both the publisher and subscriber scripts."; - exit 1}] - -// create some test data to be published -// this could also be read from a csv file (for example) -meterdata:([]sym:10000?200; reading:10000?500i) -griddata:([]sym:2000?100?`3; capacity:2000?100f; flowrate:2000?3000i) - -// utility functions to get the next set of data to publish -// get the next chunk of data, return to start once data set is exhausted -counts:`.testdata.meterdata`.testdata.griddata!0 0 -getdata:{[table;n] - res:`time xcols update time:.z.p from (counts[table];n) sublist value table; - counts[table]+:n; - if[count[value table]<=counts[table]; counts[table]:0]; - res} -getmeter:getdata[`.testdata.meterdata] -getgrid:getdata[`.testdata.griddata] - -\d . 
- -// the tables to be published - all must be in the top level namespace -// tables to be published require a sym column, which can be of any type -// apart from that, they can be anything you like -meter:([]time:`timestamp$(); sym:`long$(); reading:`int$()) -grid:([]time:`timestamp$(); sym:`symbol$(); capacity:`float$(); flowrate:`int$()) - -// load in u.q from tick -upath:"tick/u.q" -@[system;"l ",upath;{-2"Failed to load u.q from ",x," : ",y, - ". Please make sure u.q is accessible.", - " kdb+tick can be downloaded from https://github.com/KxSystems/kdb-tick"; - exit 2}[upath]] - -// initialise pubsub -// all tables in the top level namespace (`.) become publish-able -// tables that can be published can be seen in .u.w -.u.init[] - -// functions to publish data -// .u.pub takes the table name and table data -// there is no checking to ensure that the table being published matches -// the table schema defined at the top level -// that is left up to the programmer! -publishmeter:{.u.pub[`meter; .testdata.getmeter[x]]} -publishgrid:{.u.pub[`grid; .testdata.getgrid[x]]} - -// create timer function to randomly publish -// between 1 and 10 meter records, and between 1 and 5 grid records -.z.ts:{publishmeter[1+rand 10]; publishgrid[1+rand 5]} - -/- fire timer every 1 second -\t 1000 -``` - - -## Subscriber - -```q -// define upd function -// this is the function invoked when the publisher pushes data to it -upd:{[tabname;tabdata] show tabname; show tabdata} - -// open a handle to the publisher -h:@[hopen;`::6812;{-2"Failed to open connection to publisher on port 6812: ", - x,". 
Please ensure publisher is running"; - exit 1}] - -// subscribe to the required data -// .u.sub[tablename; list of instruments] -// ` is wildcard for all -h(`.u.sub;`;`) - -\ -Could also do (for example) - -Subscribe to 10 syms of meter data: -h(`.u.sub;`meter;`long$til 10) - -Add subscriptions -h(`.u.add;`meter;20 21 22) -``` - - -## Running - -The subscriber will receive data from the publisher and output it to the screen. You can modify the subscription request and the `upd` function of the subscriber as required. You can run multiple subscribers at once. diff --git a/docs/kb/ssl.md b/docs/kb/ssl.md index 824179708..016353648 100644 --- a/docs/kb/ssl.md +++ b/docs/kb/ssl.md @@ -202,8 +202,8 @@ $ export KX_SSL_CA_CERT_FILE=$HOME/certs/ca-cert.pem with the client as: ```bash $ export SSL_CERT_FILE=$HOME/certs/client-cert.pem -$ export SSL_KEY_FILE=$HOME/certs/client-private-key.pep -$ export KX_SSL_CA_CERT_FILE=/tmp/new/ca-cert.pem +$ export SSL_KEY_FILE=$HOME/certs/client-private-key.pem +$ export KX_SSL_CA_CERT_FILE=$HOME/certs/ca-cert.pem ``` :fontawesome-brands-github: diff --git a/docs/learn/startingkdb/hdb.md b/docs/learn/startingkdb/hdb.md index d4b697848..27cbfeb56 100644 --- a/docs/learn/startingkdb/hdb.md +++ b/docs/learn/startingkdb/hdb.md @@ -36,17 +36,15 @@ These storage strategies give best efficiency for searching and retrieval. For e For example, a simple partitioning scheme on a single disk might be as shown right. Here, the daily and master tables are small enough to be written to single files, while the trade and quote tables are splayed and partitioned by date. -## Sample database +## Sample partitioned database -The script `buildhdb.q` will build a sample HDB. It builds a month’s random data in directory `start/db`, and takes a few seconds to run. 
- -:fontawesome-brands-github: -[KxSystems/cookbook/start/buildhdb.q](https://github.com/KxSystems/cookbook/blob/master/start/buildhdb.q) +The script :fontawesome-brands-github:[`KxSystems/cookbook/start/buildhdb.q`](https://github.com/KxSystems/cookbook/blob/master/start/buildhdb.q) builds a sample HDB. +It builds a month’s random data in directory `start/db`. Load q, then: ```q -q)\l start/buildhdb.q +q)\l buildhdb.q ``` To load the database, enter: @@ -86,8 +84,7 @@ date | x 2013.05.06| 14182 ... -q)select cnt:count i,sum size,last price, wprice:size wavg price - by 5 xbar time.minute from t +q)select cnt:count i,sum size,last price, wprice:size wavg price by 5 xbar time.minute from t minute| cnt size price wprice ------| ----------------------- 09:30 | 44 2456 47.83 47.60555 @@ -115,11 +112,11 @@ time price bid ask ## Sample segmented database -The `buildhdb.q` script can be customized to build a segmented database. In practice, database segments should be on separate drives, but for illustration, the segments are here written to a single drive. Both the database root, and the location of the database segments need to be specified. - -For example, edit the first few lines of the script as below. +The `buildhdb.q` script can be customized to build a segmented database. +In practice, database segments should be on separate drives, but for illustration, the segments are here written to a single drive. +Both the database root, and the location of the database segments need to be specified. -Ensure that the directory given in `dsp` is the full pathname, and that it is created, writeable and empty. +For example, edit the first few lines of the script :fontawesome-brands-github:[`KxSystems/cookbook/start/buildhdb.q`](https://github.com/KxSystems/cookbook/blob/master/start/buildhdb.q) as below. ```q dst:`:start/dbs / new database root @@ -131,9 +128,12 @@ end:2013.12.31 ... ``` -For Windows, `dsp` might be: ``dsp:`:c:/dbss``. 
+Ensure that the directory given in `dsp` is the full pathname, and that it is created, writeable and empty. For Windows, `dsp` might be: ``dsp:`:c:/dbss``.
+
+!!! warning "This example generates and writes approximately 7GB of data to disk."

-Load the modified script, which should now take a minute or so. This should write the partioned data to subdirectories of `dsp`, and create a `par.txt` file like:
+Load the modified script, which should now take a minute or so. This should write the partitioned data to subdirectories of the directory specified by `dsp`.
+A `par.txt` file, which lists the disks/directories containing the data of the segmented database, can be found within the `dsp` directory.

```txt
/dbss/d0
@@ -151,8 +151,7 @@ q)\l start/dbs
q)(count quote), count trade
61752871 12356516

-q)select cnt:count i,sum size,size wavg price from trade
- where date in 2012.09.17+til 5, sym=`IBM
+q)select cnt:count i,sum size,size wavg price from trade where date in 2012.09.17+til 5, sym=`IBM
cnt  size   price
--------------------
4033 217537 37.35015
diff --git a/docs/learn/startingkdb/index.md b/docs/learn/startingkdb/index.md
index 2e4c6b405..a68c6803d 100644
--- a/docs/learn/startingkdb/index.md
+++ b/docs/learn/startingkdb/index.md
@@ -5,15 +5,10 @@ keywords: kdb+, q, start, tutorial,
---
# Starting kdb+
-
-
-
-
This is a quick-start guide to kdb+, aimed primarily at those learning independently. It covers system installation, the kdb+ environment, IPC, tables and typical databases, and where to find more material. After completing this you should be able to follow the Borror textbook [Q for Mortals](/q4m3/), and the [Reference](../../ref/index.md).

One caution: you can learn kdb+ reasonably well by independent study, but for serious deployment of the product you need the help of a consultant. This is because kdb+ is typically used for very demanding applications that require experience to set up properly. Contact KX for help with such evaluations. 
- ## kdb+ The kdb+ system is both a database and a programming language. @@ -47,7 +42,7 @@ Several background articles and links can be found in the [Archive](../archive.m ### Discussion groups - The main discussion forum is the [k4 Topicbox](https://k4.topicbox.com/groups/k4). This is available only to licensed customers – please use a work email address to [apply for access](https://k4.topicbox.com/groups/k4?subscription_form=e1ca20f8-95f6-11e8-8090-9973fa3f0106). -- The [kdb+ Personal Developers](https://groups.google.com/forum/#!forum/personal-kdbplus) forum is an open Google discussion group for users of the free system. +- [KX Community discussion forum](https://learninghub.kx.com/forums/) is used to find answers, ask questions, and connect with our KX Community. ## :fontawesome-solid-download: Install free system @@ -55,39 +50,6 @@ Several background articles and links can be found in the [Archive](../archive.m If you do not already have access to a licensed copy, go to [Get started](../index.md) to download and install q. -## :fontawesome-solid-file: Example files - -Two sets of scripts are referenced in this guide: - -1. The free system is distributed with the following example scripts in the main directory: - - - `sp.q` – the Suppliers and Parts sample database - - `trade.q` – a stock trades sample database - - If you do not have these scripts, get them from - :fontawesome-brands-github: [KxSystems/kdb](https://github.com/KxSystems/kdb) - and save them in your `QHOME` directory. - -2. Other example files are in the :fontawesome-brands-github: [KxSystems/cookbook/start](https://github.com/KxSystems/cookbook/tree/master/start) directory. - - Move the `start` directory under your `QHOME` directory, e.g. `q/start`. For example, there should be a file: - - === ":fontawesome-brands-linux: Linux :fontawesome-brands-apple: macOS" - - ~/q/start/buildhdb.q - - === ":fontawesome-brands-windows: Windows" - - c:\q\start\buildhdb.q - - -!!! 
tip "Text editor for :fontawesome-brands-windows: Windows" - - Since q source is in plain text files, it is worth installing a good text editor such as Notepad++ or Notepad2. - - Some text editors have extensions to provide e.g. syntax highlighting for q. See the [list of editor integrations](../../interfaces/index.md#editor-integrations) - - ## Graphical user interface When q is run, it displays a console where you can enter commands and see the results. This is all you need to follow the tutorial, and if you just want to learn a little about q, then it is easiest to work in the console. diff --git a/docs/learn/startingkdb/language.md b/docs/learn/startingkdb/language.md index 50716fe3c..00d946767 100644 --- a/docs/learn/startingkdb/language.md +++ b/docs/learn/startingkdb/language.md @@ -64,7 +64,6 @@ You can confirm that you are in the `QHOME` directory by calling a directory lis q)\ls *.q ... "sp.q" - "trade.q" ... ``` @@ -74,7 +73,6 @@ You can confirm that you are in the `QHOME` directory by calling a directory lis q)\dir *.q ... "sp.q" - "trade.q" ... ``` @@ -360,7 +358,7 @@ cog 3 20 60 A q script is a plain text file with extension `.q`, which contains q expressions that are executed when loaded. -For example, load the script `sp.q` and display the `s` table that it defines: +For example, load the script :fontawesome-brands-github:[`KxSystems/kdb/sp.q`](https://github.com/KxSystems/kdb/blob/master/sp.q) and display the `s` table that it defines: ```q q)\l sp.q / load script @@ -416,7 +414,8 @@ q)b ## Q queries -Q queries are similar to SQL, though often much simpler. +Q queries are similar to SQL, though often much simpler. 
+Loading the script :fontawesome-brands-github:[`KxSystems/kdb/sp.q`](https://github.com/KxSystems/kdb/blob/master/sp.q) populates tables `s`, `p`, and `sp`, which we can use in some query examples:

```q
\l sp.q
diff --git a/docs/learn/startingkdb/tables.md b/docs/learn/startingkdb/tables.md
index 2bba8e565..a0c659f5c 100644
--- a/docs/learn/startingkdb/tables.md
+++ b/docs/learn/startingkdb/tables.md
@@ -51,7 +51,7 @@ Here table `t` is defined with column names $c_{1-n}$, and corresponding values

## 4.3 Suppliers and parts

-The script `sp.q` defines [C.J. Date’s Suppliers and Parts database](https://en.wikipedia.org/wiki/Suppliers_and_Parts_database). You can view this script in an editor to see the definitions. Load the script.
+The script :fontawesome-brands-github:[`KxSystems/kdb/sp.q`](https://github.com/KxSystems/kdb/blob/master/sp.q) defines [C.J. Date’s Suppliers and Parts database](https://en.wikipedia.org/wiki/Suppliers_and_Parts_database). You can view this script in an editor to see the definitions. Load the script.

```q
q)\l sp.q
@@ -163,15 +163,12 @@ s4 p5 100

## Stock data

-The following is a typical layout populated with random data. Load the `trades.q` script.
+The following is a typical layout populated with random data. Load the :fontawesome-brands-github:[`KxSystems/cookbook/start/trades.q`](https://github.com/KxSystems/cookbook/blob/master/start/trades.q) script.

```q
-q)\l start/trades.q
+q)\l trades.q
```

-:fontawesome-brands-github:
-[KxSystems/cookbook/start/trades.q](https://github.com/KxSystems/cookbook/blob/master/start/trades.q)
-
A trade table might include: date, time, symbol, price, size, condition code.

```q
diff --git a/docs/ref/abs.md b/docs/ref/abs.md
index e65d70b67..b89ab8993 100644
--- a/docs/ref/abs.md
+++ b/docs/ref/abs.md
@@ -16,7 +16,7 @@ abs x abs[x]
Where `x` is a numeric or temporal, returns the absolute value of `x`.
-Null is returned if `x` is null.
+Null is returned if `x` is null. 
```q q)abs -1.0 @@ -25,6 +25,8 @@ q)abs 10 -43 0N 10 43 0N ``` +`abs` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration @@ -53,4 +55,4 @@ Range: `ihjefpmdznuvt` [`signum`](signum.md)
:fontawesome-solid-book-open: -[Mathematics](../basics/math.md) \ No newline at end of file +[Mathematics](../basics/math.md) diff --git a/docs/ref/add.md b/docs/ref/add.md index 81607aacd..f2ba32738 100644 --- a/docs/ref/add.md +++ b/docs/ref/add.md @@ -35,6 +35,8 @@ msoft| 3005 103 Add is generally faster than [Subtract](subtract.md). +`+` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/aj.md b/docs/ref/aj.md index 445eefd80..a9a9c0bc9 100644 --- a/docs/ref/aj.md +++ b/docs/ref/aj.md @@ -57,7 +57,9 @@ time sym qty px 10:01:04 ge 150 ``` -!!! tip "There is no requirement for any of the join columns to be keys but the join will be faster on keys." +`aj` is a [multithreaded primitive](../kb/mt-primitives.md). + +!!! tip "There is no requirement for any of the join columns to be keys but the join is faster on keys." ## `aj`, `aj0` diff --git a/docs/ref/all-any.md b/docs/ref/all-any.md index 096d4140e..419a8f9af 100644 --- a/docs/ref/all-any.md +++ b/docs/ref/all-any.md @@ -60,6 +60,7 @@ domain: b g x h i j e f c s p m d z n u v t range: b . b b b b b b b . b b b b b b b b ``` +`all` is a [multithreaded primitive](../kb/mt-primitives.md). ## `any` @@ -99,6 +100,9 @@ q)if[any x in y;....] / use in control structure domain: b g x h i j e f c s p m d z n u v t range: b . b b b b b b b . b b b b b b b b ``` + +`any` is a [multithreaded primitive](../kb/mt-primitives.md). + ---- :fontawesome-solid-book: @@ -109,4 +113,4 @@ range: b . b b b b b b b . b b b b b b b b [`min`](min.md)
:fontawesome-solid-book-open:
-[Logic](../basics/by-topic.md#logic) \ No newline at end of file
+[Logic](../basics/by-topic.md#logic)
diff --git a/docs/ref/amend.md b/docs/ref/amend.md
index b869de3b6..ea2605faa 100644
--- a/docs/ref/amend.md
+++ b/docs/ref/amend.md
@@ -98,6 +98,25 @@ q).[(5 2.14; "abc"); 1 2; :; "x"] / replace at depth 2
"abx"
```

+### Amend At
+
+When an index is repeated, the function is applied repeatedly at that index, and the results accumulate:
+
+```q
+q)@[(0 1 2;1 2 3 4;7 8 9) ;1 1; 2*]
+0 1 2
+4 8 12 16 / equates to 2*2*1 2 3 4
+7 8 9
+q)@[(0 1 2;1 2 3 4;7 8 9) ;0 1 2 1; 100*]
+0 100 200 / equates to 100*0 1 2
+10000 20000 30000 40000 / equates to 100*100*1 2 3 4
+700 800 900 / equates to 100*7 8 9
+q)@[(0 1 2;1 2 3 4;7 8 9) ;0 1 2 1; {x*y};100]
+0 100 200 / equates to {x*100}0 1 2
+10000 20000 30000 40000 / equates to {x*100}{x*100}1 2 3 4
+700 800 900 / equates to {x*100}7 8 9
+```
+

### Cross sections


diff --git a/docs/ref/and.md b/docs/ref/and.md
index e8299ed1a..9840275fd 100644
--- a/docs/ref/and.md
+++ b/docs/ref/and.md
@@ -9,6 +9,8 @@ author: Stephen Taylor

_Lesser of two values, logical AND_

+`and` is a [multithreaded primitive](../kb/mt-primitives.md).
+
:fontawesome-solid-book:
[Lesser](lesser.md)
diff --git a/docs/ref/asof.md b/docs/ref/asof.md
index 2838fe913..259aca432 100644
--- a/docs/ref/asof.md
+++ b/docs/ref/asof.md
@@ -68,6 +68,7 @@ A 1996.05.23| 046298105 ASTRA AB CL-A ADS 1CL-ASEK2.50 0 N 100
A 2000.08.04| 00846U101 AGILENT TECHNOLOGIES INC 0 N 100
```

+`asof` is a [multithreaded primitive](../kb/mt-primitives.md).
----

:fontawesome-solid-book:
diff --git a/docs/ref/avg.md b/docs/ref/avg.md
index fc9ea8674..10d0a35bc 100644
--- a/docs/ref/avg.md
+++ b/docs/ref/avg.md
@@ -53,6 +53,8 @@ domain: b g x h i j e f c s p m d z n u v t
range: f . f f f f f f f . f f f f f f f f
```

+`avg` is a [multithreaded primitive](../kb/mt-primitives.md).
+
## `avgs`

_Running averages_

@@ -198,6 +200,8 @@ t | f . f f f f f f f . 
f f f f f f f f Range: `f` +`wavg` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/bin.md b/docs/ref/bin.md index 54f427be3..bf5bd680e 100644 --- a/docs/ref/bin.md +++ b/docs/ref/bin.md @@ -48,6 +48,8 @@ Essentially `bin` gives a half-open interval on the left. `bin` also operates on tuples and table columns and is the function used in [`aj`](aj.md) and [`lj`](lj.md). +`bin` and `binr` are [multithreaded primitives](../kb/mt-primitives.md). + !!! danger "If `x` is not sorted the result is undefined." diff --git a/docs/ref/cast.md b/docs/ref/cast.md index 61bc6839c..996644dc2 100644 --- a/docs/ref/cast.md +++ b/docs/ref/cast.md @@ -45,6 +45,7 @@ Where `x` is: Casting does not change the underlying bit pattern of the data, only how it is represented. +`$`(cast) is a [multithreaded primitive](../kb/mt-primitives.md). ## :fontawesome-solid-sitemap: Iteration diff --git a/docs/ref/ceiling.md b/docs/ref/ceiling.md index f62b4ccad..c4bd0815e 100644 --- a/docs/ref/ceiling.md +++ b/docs/ref/ceiling.md @@ -20,6 +20,7 @@ q)ceiling 01b 0 1i ``` +`ceiling` is a [multithreaded primitive](../kb/mt-primitives.md). ## :fontawesome-solid-sitemap: Implicit iteration @@ -74,4 +75,4 @@ Range: `hij` [`floor`](floor.md)
:fontawesome-solid-book-open: -[Mathematics](../basics/math.md) \ No newline at end of file +[Mathematics](../basics/math.md) diff --git a/docs/ref/cor.md b/docs/ref/cor.md index 8378f53d1..66eb57330 100644 --- a/docs/ref/cor.md +++ b/docs/ref/cor.md @@ -32,6 +32,8 @@ q)1000101000b cor 0010011001b `cor` is an aggregate function, equivalent to `{cov[x;y]%dev[x]*dev y}`. +`cor` is a [multithreaded primitive](../kb/mt-primitives.md). + ## Domain and range diff --git a/docs/ref/cos.md b/docs/ref/cos.md index 02c804df3..cd280a862 100644 --- a/docs/ref/cos.md +++ b/docs/ref/cos.md @@ -41,6 +41,8 @@ q)acos -0.4 / arccosine 1.982313 ``` +`cos` and `acos` are [multithreaded primitives](../kb/mt-primitives.md). + ## Domain and range diff --git a/docs/ref/cov.md b/docs/ref/cov.md index e5d22f531..38c1e68c6 100644 --- a/docs/ref/cov.md +++ b/docs/ref/cov.md @@ -56,7 +56,7 @@ t | f . f f f f f f f . f f f f f f f f Range: `f` - +`cov` is a [multithreaded primitive](../kb/mt-primitives.md). ## `scov` @@ -110,6 +110,8 @@ t | f . f f f f f f f . f f f f f f f f Range: `f` +`scov` is a [multithreaded primitive](../kb/mt-primitives.md). + ---- :fontawesome-solid-book: diff --git a/docs/ref/cut.md b/docs/ref/cut.md index e66af2ef9..43b1eccda 100644 --- a/docs/ref/cut.md +++ b/docs/ref/cut.md @@ -62,6 +62,8 @@ s4 p4 300 s1 p5 400 ``` +`_`(cut) is a [multithreaded primitive](../kb/mt-primitives.md). + !!! tip "Avoid confusion with underscores in names: separate the Cut operator with spaces." @@ -93,4 +95,4 @@ Otherwise `cut` behaves as `_` Cut. ---- :fontawesome-solid-book: -[Drop](drop.md) \ No newline at end of file +[Drop](drop.md) diff --git a/docs/ref/dev.md b/docs/ref/dev.md index fd3eccbd3..3467f77f9 100644 --- a/docs/ref/dev.md +++ b/docs/ref/dev.md @@ -50,6 +50,7 @@ a| 1.247219 b| 2 ``` +`dev` is a [multithreaded primitive](../kb/mt-primitives.md). ## `mdev` @@ -184,6 +185,8 @@ a| 1.527525 b| 2.828427 ``` +`sdev` is a [multithreaded primitive](../kb/mt-primitives.md). 
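+
+As a quick sketch of the two denominators (`dev` divides the sum of squared deviations by n, `sdev` by n-1):
+
+```q
+q)dev 1 2 3 4     / sqrt of 5%4
+1.118034
+q)sdev 1 2 3 4    / sqrt of 5%3
+1.290994
+```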
+ ---- :fontawesome-solid-book: [`var`, `svar`](var.md) diff --git a/docs/ref/differ.md b/docs/ref/differ.md index 3f9bfcb54..67b7f1516 100644 --- a/docs/ref/differ.md +++ b/docs/ref/differ.md @@ -59,6 +59,8 @@ domain: b g x h i j e f c s p m d z n u v t range: b b b b b b b b b b b b b b b b b b ``` +`differ` is a [multithreaded primitive](../kb/mt-primitives.md). + ??? warning "Binary use deprecated" As of V3.6 the keyword is [variadic](../basics/variadic.md). @@ -69,4 +71,4 @@ range: b b b b b b b b b b b b b b b b b b --- :fontawesome-regular-hand-point-right: -Basics: [Comparison](../basics/comparison.md) \ No newline at end of file +Basics: [Comparison](../basics/comparison.md) diff --git a/docs/ref/distinct.md b/docs/ref/distinct.md index e1a2883d9..25e018ac8 100644 --- a/docs/ref/distinct.md +++ b/docs/ref/distinct.md @@ -40,6 +40,8 @@ q)distinct 2 + 0f,10 xexp -13 2 2.0000000000001 ``` +`distinct` is a [multithreaded primitive](../kb/mt-primitives.md). + ## Errors @@ -55,4 +57,4 @@ type | `x` is an atom
:fontawesome-solid-book-open: [Precision](../basics/precision.md), -[Search](../basics/by-topic.md#search) \ No newline at end of file +[Search](../basics/by-topic.md#search) diff --git a/docs/ref/div.md b/docs/ref/div.md index 79aedbe32..2425e0d77 100644 --- a/docs/ref/div.md +++ b/docs/ref/div.md @@ -51,6 +51,8 @@ q)"\023" div 8 2i ``` +`div` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/divide.md b/docs/ref/divide.md index c7812811a..eab6a2a4e 100644 --- a/docs/ref/divide.md +++ b/docs/ref/divide.md @@ -45,6 +45,8 @@ q)2010.01.01 % 2005.01.01 1.999453 ``` +`%` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/dotq.md b/docs/ref/dotq.md index 02706b815..3570a49fb 100644 --- a/docs/ref/dotq.md +++ b/docs/ref/dotq.md @@ -992,7 +992,11 @@ q).Q.gz{0N!count x;x}[.Q.gz(9;10000#"helloworld")] .Q.hdpf[historicalport;directory;partition;`p#field] ``` -Saves all tables by calling `.Q.dpft`, clears tables, and sends reload message to HDB. +The function: + +* saves all tables to disk, by calling [`.Q.dpft`](#dpft-save-table) (saves as splayed tables to a partition) +* clears in-memory tables +* sends reload message to HDB, by opening a temporary connection and sending [`\l .`](../basics/syscmds.md#l-load-file-or-directory) ## `hg` (HTTP get) diff --git a/docs/ref/drop.md b/docs/ref/drop.md index f3b095801..a57711a41 100644 --- a/docs/ref/drop.md +++ b/docs/ref/drop.md @@ -15,6 +15,8 @@ _Drop items from a list, entries from a dictionary or columns from a table._ x _ y _[x;y] ``` +`_`(drop) is a [multithreaded primitive](../kb/mt-primitives.md). 
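+
+A quick sketch of Drop on a list:
+
+```q
+q)2 _ 0 1 2 3 4    / drop the first two items
+2 3 4
+q)-2 _ 0 1 2 3 4   / drop the last two items
+0 1 2
+```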
+ ## Drop leading or trailing items diff --git a/docs/ref/exp.md b/docs/ref/exp.md index f0eb372ce..3da94cc5e 100644 --- a/docs/ref/exp.md +++ b/docs/ref/exp.md @@ -39,6 +39,7 @@ q)exp 00:00:00 00:00:12 12:00:00 1 162754.8 0w ``` +`exp` is a [multithreaded primitive](../kb/mt-primitives.md). ### :fontawesome-solid-sitemap: Implicit iteration @@ -123,6 +124,7 @@ q)1.5 xexp -4.2 0 0.1 0n 0w 7.9999999999999982 ``` +`xexp` is a [multithreaded primitive](../kb/mt-primitives.md). ### :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/find.md b/docs/ref/find.md index 7e7e70013..2bafe508d 100644 --- a/docs/ref/find.md +++ b/docs/ref/find.md @@ -9,8 +9,6 @@ keywords: find, kdb+, q, query, search _Find the first occurrence of an item in a list._ - - ```syntax x?y ?[x;y] ``` @@ -38,6 +36,8 @@ q)"abcde"?"d" 3 ``` +`?`(find) is a [multithreaded primitive](../kb/mt-primitives.md). + ## Type-specific diff --git a/docs/ref/floor.md b/docs/ref/floor.md index bf7dd3618..6ec43e241 100644 --- a/docs/ref/floor.md +++ b/docs/ref/floor.md @@ -21,6 +21,8 @@ q)floor -2.1 0 2.1 -3 0 2 ``` +`floor` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration @@ -80,4 +82,4 @@ Range: `hijcs` [`ceiling`](ceiling.md)
:fontawesome-solid-book-open: -[Mathematics](../basics/math.md) \ No newline at end of file +[Mathematics](../basics/math.md) diff --git a/docs/ref/greater.md b/docs/ref/greater.md index daa99e5d2..3295c2b67 100644 --- a/docs/ref/greater.md +++ b/docs/ref/greater.md @@ -25,6 +25,8 @@ q)"sat"|"cow" "sow" ``` +`|` is a [multithreaded primitive](../kb/mt-primitives.md). + ## Flags diff --git a/docs/ref/ij.md b/docs/ref/ij.md index fe8d30ae4..5e01d7d04 100644 --- a/docs/ref/ij.md +++ b/docs/ref/ij.md @@ -59,6 +59,8 @@ k v s 4 400 c ``` +`ij` is a [multithreaded primitive](../kb/mt-primitives.md). + !!! detail "Changes in V3.0" Since V3.0, `ij` has changed behavior (similarly to `lj`): when there are nulls in `y`, `ij` uses the `y` null, where the earlier version left the corresponding value in `x` unchanged: diff --git a/docs/ref/in.md b/docs/ref/in.md index 7ad158587..36a332bec 100644 --- a/docs/ref/in.md +++ b/docs/ref/in.md @@ -62,6 +62,8 @@ q)(1 2;3 4) in ((1 2;3 4);9) / x is an item of y `in` uses [Find](find.md) to search for `x` in `y`. +`in` is a [multithreaded primitive](../kb/mt-primitives.md). + ## Queries diff --git a/docs/ref/join.md b/docs/ref/join.md index 6f9690ed3..fdda5f4d2 100644 --- a/docs/ref/join.md +++ b/docs/ref/join.md @@ -46,6 +46,8 @@ q)v,(type v)$0xab 1.00 2.34 -567.1 20.00 171e ``` +`,`(join) is a [multithreaded primitive](../kb/mt-primitives.md). + ## Dictionaries diff --git a/docs/ref/lesser.md b/docs/ref/lesser.md index 01ece948a..5e0c4520b 100644 --- a/docs/ref/lesser.md +++ b/docs/ref/lesser.md @@ -26,6 +26,8 @@ q)"sat"&"cow" "cat" ``` +`&` is a [multithreaded primitive](../kb/mt-primitives.md). + ## Flags diff --git a/docs/ref/lj.md b/docs/ref/lj.md index 93643f7cb..69dbb0f9b 100644 --- a/docs/ref/lj.md +++ b/docs/ref/lj.md @@ -59,6 +59,8 @@ c d 2 20 ``` +`lj` is a [multithreaded primitive](../kb/mt-primitives.md). 
+ ## Changes in V4.0 diff --git a/docs/ref/log.md b/docs/ref/log.md index 2d52f521e..c1bc384c2 100644 --- a/docs/ref/log.md +++ b/docs/ref/log.md @@ -35,6 +35,8 @@ q)log -2 0n 0 0.1 1 42 0n 0n -0w -2.302585 0 3.73767 ``` +`log` is a [multithreaded primitive](../kb/mt-primitives.md). + ### :fontawesome-solid-sitemap: Implicit iteration @@ -110,6 +112,8 @@ q)"A"xlog"C" 1.00726 ``` +`xlog` is a [multithreaded primitive](../kb/mt-primitives.md). + ### :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/max.md b/docs/ref/max.md index 7c7be8d5c..c8a236173 100644 --- a/docs/ref/max.md +++ b/docs/ref/max.md @@ -41,6 +41,7 @@ domain: b g x h i j e f c s p m d z n u v t range: b . x h i j e f c . p m d z n u v t ``` +`max` is a [multithreaded primitive](../kb/mt-primitives.md). ## `maxs` diff --git a/docs/ref/min.md b/docs/ref/min.md index e0e932f4e..8a8a556cf 100644 --- a/docs/ref/min.md +++ b/docs/ref/min.md @@ -37,6 +37,8 @@ q)select min price by sym from t / use in a select statement `min` is an aggregate function, equivalent to `&/`. +`min` is a [multithreaded primitive](../kb/mt-primitives.md). + ## `mins` diff --git a/docs/ref/mod.md b/docs/ref/mod.md index fdf85361d..643e747a3 100644 --- a/docs/ref/mod.md +++ b/docs/ref/mod.md @@ -28,6 +28,8 @@ q)-7 7 mod/:\:-2.5 -2 2 2.5 -0.5 -1 1 2 ``` +`mod` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/multiply.md b/docs/ref/multiply.md index 49fc6e955..c8f6a1933 100644 --- a/docs/ref/multiply.md +++ b/docs/ref/multiply.md @@ -40,6 +40,8 @@ price qty 34.5 17 ``` +`*` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/neg.md b/docs/ref/neg.md index bf3aa57aa..559bc11bb 100644 --- a/docs/ref/neg.md +++ b/docs/ref/neg.md @@ -34,6 +34,8 @@ q)neg 2000.01.01 2012.01.01 / negates the underlying data value An atomic function. 
+`neg` is a [multithreaded primitive](../kb/mt-primitives.md). + ## Domain and range diff --git a/docs/ref/not.md b/docs/ref/not.md index d232b20f0..5285df4be 100644 --- a/docs/ref/not.md +++ b/docs/ref/not.md @@ -46,6 +46,8 @@ q)not (0W;-0w;0N) An atomic function. +`not` is a [multithreaded primitive](../kb/mt-primitives.md). + --- :fontawesome-solid-book: [`neg`](neg.md) diff --git a/docs/ref/null.md b/docs/ref/null.md index 44c0ab6f4..8a11d7b46 100644 --- a/docs/ref/null.md +++ b/docs/ref/null.md @@ -52,4 +52,6 @@ q)\ts `=v 66 268435648 ``` +`null` is a [multithreaded primitive](../kb/mt-primitives.md). + diff --git a/docs/ref/or.md b/docs/ref/or.md index bb6b313bb..b7a4de491 100644 --- a/docs/ref/or.md +++ b/docs/ref/or.md @@ -8,6 +8,8 @@ keywords: and, greater, kdb+, logic, or, q _Greater of two values, logical OR_ +`or` is a [multithreaded primitive](../kb/mt-primitives.md). + :fontawesome-regular-hand-point-right: diff --git a/docs/ref/reciprocal.md b/docs/ref/reciprocal.md index 8949b311e..2af39afbf 100644 --- a/docs/ref/reciprocal.md +++ b/docs/ref/reciprocal.md @@ -25,6 +25,8 @@ q)reciprocal 1b 1f ``` +`reciprocal` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration `reciprocal` is an [atomic function](../basics/atomic.md). diff --git a/docs/ref/signum.md b/docs/ref/signum.md index cafe76299..59b5319f7 100644 --- a/docs/ref/signum.md +++ b/docs/ref/signum.md @@ -35,6 +35,8 @@ Find counts of price movements by direction: select count i by signum deltas price from trade ``` +`signum` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration @@ -81,4 +83,4 @@ Range: `i` [`abs`](abs.md)
:fontawesome-solid-book-open: -[Mathematics](../basics/math.md) \ No newline at end of file +[Mathematics](../basics/math.md) diff --git a/docs/ref/sin.md b/docs/ref/sin.md index 94dcc955e..170f1e0bd 100644 --- a/docs/ref/sin.md +++ b/docs/ref/sin.md @@ -35,6 +35,8 @@ q)asin 0.8 / arcsine 0.9272952 ``` +`sin` and `asin` are [multithreaded primitives](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/sqrt.md b/docs/ref/sqrt.md index d5524eb91..be813a433 100644 --- a/docs/ref/sqrt.md +++ b/docs/ref/sqrt.md @@ -35,6 +35,8 @@ q)sqrt 101b 1 0 1f ``` +`sqrt` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration @@ -83,4 +85,4 @@ Range: `fz` [`xlog`](log.md#xlog)
:fontawesome-solid-book-open: -[Mathematics](../basics/math.md) \ No newline at end of file +[Mathematics](../basics/math.md) diff --git a/docs/ref/subtract.md b/docs/ref/subtract.md index c1ea857ea..24bf1c726 100644 --- a/docs/ref/subtract.md +++ b/docs/ref/subtract.md @@ -21,6 +21,8 @@ q)2000.11.22 - 03:44:55.666 2000.11.21D20:15:04.334000000 ``` +`-` is a [multithreaded primitive](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/sum.md b/docs/ref/sum.md index 1ae364b9d..541b79460 100644 --- a/docs/ref/sum.md +++ b/docs/ref/sum.md @@ -80,6 +80,7 @@ q)sum each flip(0n 8;8 0n) /do this to fall back to vector case q)sum a 49999897.181933172 +`sum` is a [multithreaded primitive](../kb/mt-primitives.md). ## `sums` diff --git a/docs/ref/take.md b/docs/ref/take.md index 105c9dc0b..365ecb7f2 100644 --- a/docs/ref/take.md +++ b/docs/ref/take.md @@ -23,6 +23,8 @@ Where returns `y` as a list, dictionary or table described or selected by `x`. +`#` is a [multithreaded primitive](../kb/mt-primitives.md). + ## Atom or list diff --git a/docs/ref/tan.md b/docs/ref/tan.md index 8bd20441c..a23a50544 100644 --- a/docs/ref/tan.md +++ b/docs/ref/tan.md @@ -37,6 +37,8 @@ q)atan 42 1.546991 ``` +`tan` and `atan` are [multithreaded primitives](../kb/mt-primitives.md). + ## :fontawesome-solid-sitemap: Implicit iteration diff --git a/docs/ref/til.md b/docs/ref/til.md index c44a82a73..464e09c97 100644 --- a/docs/ref/til.md +++ b/docs/ref/til.md @@ -31,6 +31,8 @@ q)til 5f `til` and [`key`](key.md) are synonyms, but the above usage is conventionally reserved to `til`. +`til` is a [multithreaded primitive](../kb/mt-primitives.md). + ---- :fontawesome-solid-book-open: [Mathematics](../basics/math.md) diff --git a/docs/ref/uj.md b/docs/ref/uj.md index 798fba633..5b4df10aa 100644 --- a/docs/ref/uj.md +++ b/docs/ref/uj.md @@ -58,6 +58,8 @@ a b| c d 3 7| 30 C ``` +`uj` is a [multithreaded primitive](../kb/mt-primitives.md). + !!! 
note "`uj` generalizes the [`,` Join](join.md) operator." diff --git a/docs/ref/var.md b/docs/ref/var.md index 5bcc90103..d7ba9815d 100644 --- a/docs/ref/var.md +++ b/docs/ref/var.md @@ -8,9 +8,6 @@ author: Stephen Taylor _Variance, sample variance_ - - - ## `var` _Variance_ @@ -51,6 +48,8 @@ a| 1.555556 b| 4 ``` +`var` is a [multithreaded primitive](../kb/mt-primitives.md). + ## `svar` @@ -91,6 +90,8 @@ a| 2.333333 b| 8 ``` +`svar` is a [multithreaded primitive](../kb/mt-primitives.md). + ## Domain and range diff --git a/docs/ref/within.md b/docs/ref/within.md index 9479a9bcc..6a1073f0b 100644 --- a/docs/ref/within.md +++ b/docs/ref/within.md @@ -47,6 +47,8 @@ q)(1 3 10 6 4;"acyxmpu") within ((2;"b");(6;"r")) `within` uses [Find](find.md) to search for `x` in `y`. +`within` is a [multithreaded primitive](../kb/mt-primitives.md). + ---- :fontawesome-solid-book: diff --git a/docs/ref/xbar.md b/docs/ref/xbar.md index d81d9f350..5041b8b46 100644 --- a/docs/ref/xbar.md +++ b/docs/ref/xbar.md @@ -18,7 +18,7 @@ Where - `x` is a non-negative numeric atom - `y` is numeric or temporal -returns `y` rounded down to the nearest multiple of `x`. +returns `y` rounded down to the nearest multiple of `x`. `xbar` is a [multithreaded primitive](../kb/mt-primitives.md). ```q q)3 xbar til 16 diff --git a/docs/wp/capi/index.md b/docs/wp/capi/index.md index 856e567fd..548659976 100644 --- a/docs/wp/capi/index.md +++ b/docs/wp/capi/index.md @@ -1307,7 +1307,7 @@ q)csha256"kx tech" ### Subscribing to a kdb+ tickerplant -A kdb+ tickerplant is a kdb+ process specifically designed to handle incoming, high-frequency, data feeds from publishing processes. The primary responsibility of the tickerplant is to manage subscription requests and publish data quickly to its subscribers. In the vanilla kdb+ setup, illustrated below, the real-time database (RDB) and chained tickerplant kdb+ processes are the most common type of subscriber. However, C applications are also possible using the API. 
+A [tickerplant](../../architecture/index.md) is a kdb+ process specifically designed to handle incoming, high-frequency data feeds from publishing processes. The primary responsibility of the tickerplant is to manage subscription requests and publish data quickly to its subscribers. In the vanilla kdb+ setup, illustrated below, the real-time database (RDB) and chained tickerplant kdb+ processes are the most common type of subscriber. However, C applications are also possible using the API.

![Architecture](img/architecture.png)

@@ -1341,7 +1341,7 @@ The tickerplant process is started from the command line as follows.

$ q tick.q trade /logs/tickerplant/ -p 5010
```

-Above, the first argument following `tick.q` is the name of the table schema file to use. The second argument is the location where the tickerplant log file will be created and the value following the `-p` option is the port the tickerplant will listen on. The C process will use this port number when initializing the connection. The final step in this setup is to create a kdb+ mock feedhandler process which will act as the trade data source for the tickerplant. Below is a simple publishing process which is sufficient for the demonstration.
+Above, the first argument following [`tick.q`](../../architecture/tickq.md) is the name of the table schema file to use. The second argument is the location where the tickerplant log file is created and the value following the `-p` option is the port the tickerplant listens on. The C process uses this port number when initializing the connection. The final step in this setup is to create a kdb+ mock feedhandler process, which acts as the trade data source for the tickerplant. Below is a simple publishing process.

```q
/* File name: feed.q */
@@ -1368,10 +1368,7 @@ Once the tickerplant and feedhandler processes are up and running the C subscrib

Subscriber processes are required to make an initial subscribe request to the tickerplant in order to receive data. 
-:fontawesome-regular-hand-point-right:
-Knowledge Base: [Publish and subscribe](../../kb/publish-subscribe.md)
-
-This request involves calling the `.u.sub` function with two parameter arguments. The first argument is the table name, and the second is the list of symbols to subscribe to.
+This request involves calling the [`.u.sub`](../../architecture/uq.md#usub) function with two arguments. The first argument is the table name, and the second is the list of symbols to subscribe to.

Specifying a backtick character for either parameter of `.u.sub` means _all_, as in, all tables and/or all symbols. If the `.u.sub` function is called synchronously the tickerplant will return the table schema. The following program demonstrates how the initial subscribe request can be made and the column names of table schema extracted. In the case below, a request is made for all trade records.
@@ -1564,7 +1561,7 @@ Handle value

: The integer value returned by the `khpu` function when a socket connection is established.

-Update function name `.u.upd`
+Update function name [`.u.upd`](../../architecture/tickq.md#uupd)

: The function executed on the tickerplant which enables the data insertion. It may be called synchronously or asynchronously and takes two arguments, as follow.

@@ -1588,7 +1585,7 @@ k(handle,".u.upd",r1(tableName),mixedList,(K)0)

### Publishing a single row using a mixed-list object

-The next example shows how a `K` object, containing a mixed list corresponding to one row of data, can be passed to the `.u.upd` function. Below the `knk` function is used to create the mixed list containing a symbol, float and integer, constituting a single row. The function `.u.upd` is called in the `k` function with two parameters being passed, a symbol object corresponding to the table name and the singleRow object. 
+The next example shows how a `K` object, containing a mixed list corresponding to one row of data, can be passed to the [`.u.upd`](../../architecture/tickq.md#uupd) function. Below the `knk` function is used to create the mixed list containing a symbol, float and integer, constituting a single row. The function `.u.upd` is called in the `k` function with two parameters being passed, a symbol object corresponding to the table name and the singleRow object. ```c /* File name: singleRow.c */ diff --git a/docs/wp/data-recovery.md b/docs/wp/data-recovery.md index 0ae4244e1..6513ec095 100644 --- a/docs/wp/data-recovery.md +++ b/docs/wp/data-recovery.md @@ -39,16 +39,7 @@ accounts:([] time:`timespan$(); sym:`$(); curr:`$(); action:`$(); limit:`long$() ### Tick scripts -kdb+tick is freely available and contains a few short, yet powerful scripts. - -:fontawesome-brands-github: -[KxSystems/kdb-tick](https://github.com/KxSystems/kdb-tick) - -script | purpose ------------|-------- -`tick.q` | runs a standard tickerplant -`tick/r.q` | runs a standard real-time database -`tick/u.q` | contains functions for subscription and publication +:fontawesome-brands-github:[KxSystems/kdb-tick](https://github.com/KxSystems/kdb-tick) is freely available and contains a few short, yet powerful scripts. :fontawesome-regular-hand-point-right: Starting kdb+: [Realtime database](../learn/startingkdb/tick.md) @@ -72,24 +63,15 @@ Here, `functionname` and `tablename` are symbols, and `tabledata` is a row of da `upd `trade (0D14:56:01.122310000;`SGDUSD;"B";5000;98.14) ``` -In a standard tick system, each kdb+ message will call a function named `upd`. Each process may have a different definition of this function. A TP will publish each time the `upd` function is called (if its timer is not set to batch the data), and an RDB will simply insert the data into the relevant table. - +In a standard tick system, each kdb+ message calls a function named `upd`. 
Each process may have a different definition of this function. A TP publishes each time the `upd` function is called (if its timer is not set to batch the data), and an RDB inserts the data into the relevant table. ## Recovery - ### Writing a tplog Should the TP fail, or be shut down for any period of time, no downstream subscriber will receive any published data for the period of its downtime. This data typically will not be recoverable. Thus it is imperative that the TP remain always running and available. -Every message that the tickerplant receives is written to a kdb+ binary file, called the tickerplant log file, or _tplog_. The tickerplant maintains some key variables which are important in the context of data recovery for subscribers. - -variable | purpose ----------|----------------- -`.u.l` | A handle to the log file which is created at startup. This is used to write each message to disk. -`.u.L` | Path to the log file. In a standard tickerplant, the name of the log file will be a combination of the first parameter passed to `tick.q` (the name of the schema file, generally `sym`) and the current date. -`.u.i` | Total count of messages in the log file. -`.u.j` | Total count of messages in the tickerplant: `.u.i` plus what is buffered in memory. +Every message that the tickerplant receives is written to a kdb+ binary file, called the tickerplant log file, or _tplog_. The tickerplant maintains some key [variables](../architecture/tickq.md#variables) which are important in the context of data recovery for subscribers. ```bash # start tickerplant @@ -102,7 +84,6 @@ q).u.l 376i q).u.i 0 ``` - The `upd` function is called each time a TP receives a message. Within this function, the TP will write the message to the tplog. ```q @@ -117,9 +98,13 @@ if[l; l enlist(`upd;t;x); j+:1] ### Replaying a tplog -Recovery of an RDB typically involves simply restarting the process. 
On startup, an RDB will subscribe to a TP and will receive information about the message count (`.u.i`) and location of the tplog (`.u.L`) in return.
+Recovery of an RDB involves restarting the process. On startup, an RDB subscribes to a TP and receives the following information:
+
+- message count ([`.u.i`](../architecture/tickq.md#variables))
+- location of the tplog ([`.u.L`](../architecture/tickq.md#variables))
+
+It then replays this tplog to recover all the data that has passed through the TP up to that point in the day. The replay is achieved using [`-11!`](../basics/internal.md#-11-streaming-execute), the streaming replay function.
 
-It will then replay this tplog to recover all the data that has passed through the TP up to that point in the day. The replay is achieved using `-11!`, the streaming replay function. This is called within `.u.rep`, which is executed when the RDB connects to the TP.
+
+This is called within `.u.rep`, which is executed when the RDB connects to the TP.

```q
//from r.q
@@ -131,9 +116,9 @@ kdb+ messages were described above in [_kdb+ messages and upd function_](#kdb-me

### `-11!` functionality

-`-11!` is an internal function that will read each message in a tplog in turn, running the function `functioname` on the `(tablename;tabledata)` parameters.
+[`-11!`](../basics/internal.md#-11-streaming-execute) is an internal function that reads each message in a tplog in turn, running the function `functionname` on the `(tablename;tabledata)` parameters.

-In this section, we will focus on normal usage of the `-11!` function where the tplog is uncorrupted, and move on to discuss how to use it to recover from a corrupted tplog below in [_Replay errors_](#replay-errors).
+This section focuses on normal usage of the `-11!` function where the tplog is uncorrupted, and then discusses how to use it to recover from a corrupted tplog below in [_Replay errors_](#replay-errors). 
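+
+For example, the simplest replay form passes just the tplog path (a hypothetical path is shown; the replaying process must define `upd` before replaying):
+
+```q
+q)upd:{[t;x]t insert x}    / minimal upd: insert each replayed message into its table
+q)-11!`:sym2021.01.01      / stream-execute every message in the tplog
+```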
There are three distinct usages of `-11!` when passed a list of two parameters, depending on the value of the first parameter. diff --git a/docs/wp/index.md b/docs/wp/index.md index f9b4fae26..9c8ed10fc 100644 --- a/docs/wp/index.md +++ b/docs/wp/index.md @@ -26,7 +26,6 @@ White papers are flagged in the navigation menus. ## :fontawesome-solid-handshake: Interfaces - [**Internet of Things with MQTT**](iot-mqtt/index.md)
Rian Ó Cuinneagáin, 2021.06 -- [**Interprocess communication**](ipc/index.md)
Katrina McCormack, 2021.04 - [**Publish/subscribe with the Solace event broker**](solace/index.md)
Himanshu Gupta, 2020.11 - [**Lightning tickerplants**: pay-per-ticker with micropayments on the Lightning network](lightning-tickerplants/index.md)
Jeremy Lucid, 2019.05 - [**C API for kdb+**](capi/index.md)
Jeremy Lucid, 2018.12 diff --git a/docs/wp/ipc/img/tcp-diagram.png b/docs/wp/ipc/img/tcp-diagram.png deleted file mode 100644 index 11fd268f5..000000000 Binary files a/docs/wp/ipc/img/tcp-diagram.png and /dev/null differ diff --git a/docs/wp/ipc/index.md b/docs/wp/ipc/index.md deleted file mode 100644 index 97b6e9085..000000000 --- a/docs/wp/ipc/index.md +++ /dev/null @@ -1,834 +0,0 @@ ---- -title: Interprocess communications | Interfaces | q and kdb+ documentation -description: Core concepts related to IPC in kdb+, and the role of IPC in a kdb+ tick application -author: Katrina McCormack -date: April 2021 ---- -# Interprocess communications - -by [Katrina McCormack](#author) -{: .wp-author} - - -!!! summary - - A look at the mechanisms underlying interprocess communication in kdb+ begins with an overview of TCP connections and discusses applications for using Unix domain sockets across processes on a local host, and the use of TLS/SSL for more secure communication. It describes the main message handlers invoked in the process of opening a connection, running a query and closing a connection to a process, with various examples to illustrate each step. - - It then takes a closer look at how IPC is applied in a vanilla tickerplant application, from when the data is first received from an upstream feed process to how tables are saved at the end of the day. - - - -Microsoft defines Interprocess Communications (IPC) as the collective term for mechanisms facilitating communications and data sharing between applications. This paper explores some core concepts related to IPC in kdb+ before investigating the role of IPC in a kdb+ tick application. - - -## Core concepts - -### Set port - -A port is used by TCP as a communications endpoint: in other words, a point through which information flows to and from a process. - -A kdb+ process can be set to listen on a port in two ways. 
For this example, we will listen on port 1234 using either - -- `\p 1234` or `system"p 1234"` within the process -- `-p 1234` on the command line at process startup - -To stop listening on a port, set the server to listen on port 0: - -```q -q)\p -4567i -q).z.i -2493i -q)\p 5678 -q)\p -5678i -q)system"p 6789" -q)\p -6789i -q)\p 0 -``` - -It is possible to see this listening port using `lsof` (list all open files) to view TCP connections for this PID. If the port number is changed, using `lsof` will show the process listening on the new port only. - -```q -q).z.i -97082i -q)\p 4567 -q)\p 6789 -``` - -```bash -$ lsof -p 97082 -i tcp -a -$ lsof -p 97082 -i tcp -a -COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME -q 97082 katrina 4u IPv4 2140799 0t0 TCP *:4567 (LISTEN) - -$ lsof -p 97082 -i tcp -a -COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME -q 97082 katrina 4u IPv4 2149383 0t0 TCP *:6789 (LISTEN) -``` - -Port numbers between 0 and 1024 are called _system_ or _well known_ ports and are used by the application layer of the Internet Protocol suite for the establishment of host-to-host connectivity. Root privileges are required to open these ports. Ports 1024 to 49151 are called _user_ ports. Some of these may also be reserved; for example, port 8080 is commonly used as a server port. - -:fontawesome-brands-wikipedia-w: -[List of TCP and UDP port numbers](https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers "Wikipedia") - -Both system and user ports are used by transport protocols to indicate an application or service. - - -### Connecting to a process listening on a port - -When the client process executes the function `hopen` with a server’s details as its argument, it starts a connection and returns a positive integer used to identify the connection handle. This integer is assigned to a variable, `h` in the following examples demonstrating different connection protocols. 
- -### TCP - -`hopen` can be used to open a TCP handle to another process by passing the process host and port as argument: user and password can also be included. - -```q -q)h:hopen`:localhost:4567 -q)h"2+2" -4 -``` - -The hostname can be omitted when connecting to a process running on the same machine. - -```q -q)i:hopen`::4567 -q)i"2+2" -4 -``` - -This can also be written with the target process port number only: - -```q -q)j:hopen 4567 -q)j"2+2" -4 -``` - -It is also worth noting that for the purposes of this white paper we will be assigning open handles to variables; but this is not required. - -```q -q)hopen 4567 -4i -q)4"1+1" -2 -q)4"\\p" -4567i -``` - -Using `netstat` we can see the process listening on port 4567 and the established TCP connection between the client and server processes. - -```bash -$ netstat | grep 4567 -tcp 0 0 localhost:50254 localhost:4567 ESTABLISHED -tcp 0 0 localhost:4567 localhost:50254 ESTABLISHED -unix 3 [ ] DGRAM 14567 -``` - -It is also possible to open a ‘one-shot’ connection to another process. This will establish a connection for only as long as it takes to execute the query. Since V4.0 a one-shot query can be run with a timeout, as in the second example below. - -```q -q)`::4567"2+2" -4 -q)`::[(`::4567;100);"1+1"] -2 -``` - -It is possible for a kdb+ process to have too many open connections. -The max is defined by the system limit for protocol (operating system configurable). Prior to 4.1t 2023.09.15, the limit was hardcoded to 1022. -After the limit is reached, you see the error `'conn` on the server process. (All successfully opened connections remain open.) - - -```q -q)\p 5678 -q)'conn -``` - -```q -q)openCon:{hopen 5678} -q)do[2000;openCon[]] -'hop. OS reports: Connection reset by peer - [1] openCon:{hopen 5678} - ^ -``` - -`hopen` also accepts a timeout parameter in its argument. 
This will prevent the process hanging when attempting to connect to a process which is alive but unresponsive; for example, due to a long running query. - -```q -q)// system"sleep 30" running on port 4567 -q)l:hopen(`::4567;1) -'timeout - [0] l:hopen(`::4567;1) - ^ -``` - -Attempts to connect to a non-existent process signal an error. - -```q -q)l:hopen 5678 -'hop. OS reports: Connection refused - [0] l:hopen 5678 - ^ -``` - -Opening a connection to the current port will execute any queries on handle 0, the file descriptor for standard input. - -```q -q)\p 4567 -q)h:hopen 4567 -q)h -0i -q)h"0N!`hi" -`hi -`hi -``` - -`hopen` can also be used to open a handle to a file location. This is discussed below with regard to writing to a log file within a kdb+ tickerplant setup. - -:fontawesome-solid-book: -[`hopen`](../../ref/hopen.md) - - -### Unix Domain Socket - -A Unix Domain Socket ([UDS](../../basics/listening-port.md#unix-domain-socket)) can be used for connections between processes running on the same host. A UDS allows bidirectional data exchange between processes running on the same machine. They are similar to internet sockets (TCP/IP socket) but, rather than using a network protocol, all communication occurs entirely within the operating system kernel. Using a UDS to communicate when processes are on the same machine will avoid some checks and operations in the TCP/IP protocol, making them faster and lighter. There is no Windows equivalent. - -```q -q)h:hopen`:unix://4567 -q)h -4i -``` - -Use `netstat` to view this connection vs tcp - -```bash -// unix domain socket established -$ netstat |grep 4567 -unix 3 [ ] DGRAM 14567 -unix 3 [ ] STREAM CONNECTED 100811 @/tmp/kx.4567 -``` - -### TLS/SSL - -The TLS protocol is used to communicate across a network in a way designed to prevent eavesdropping and tampering. It is possible to initiate a TLS connection between kdb+ processes for secure communication. 
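A hedged sketch of the client-side syntax, assuming the server already has TLS enabled (e.g. started with `-E 1`) and certificates are configured; host and port are illustrative:

```q
q)h:hopen`:tcps://localhost:5000   / tcps:// requests a TLS connection
q)h"2+2"
4
```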
Note that a TLS connection requires certificates to be in place before initializing and it should currently only be considered for long-standing, latency-insensitive, low-throughput connections given high overheads. - -:fontawesome-solid-graduation-cap: -[Secure Socket Layer](../../kb/ssl.md) - - -### Negative ports - -A kdb+ process can be started in multithreaded input mode by setting a [negative port number](../../basics/listening-port.md#multi-threaded-port). A multithreaded process will use separate threads for every process that connects, which means that each client request can be executed on a separate CPU. - -Although secondary processes are used to farm queries out to multiple processes, they still have a single-threaded input queue. By using a negative port number, it is possible to multithread that queue too. - -```bash -$ q -p -4567 -KDB+ 4.0 2020.03.17 Copyright (C) 1993-2020 Kx Systems -l64/ 2(16)core 1959MB katrina ubuntu 127.0.1.1 EXPIRE 2021.05.05 kmccormack@kx.com .. - -q)\p --4567i -``` - -Connections can be opened to this process in the same way as described previously for positive port numbers. - -```q -q)h:hopen 4567 -q)h"1+1" -2 -q)h"a:2" -'noupdate: `. `a - [0] h"a:2" - ^ -``` - -:fontawesome-solid-graduation-cap: -[Multithreaded input](../../kb/multithreaded-input.md) - - -### IPC handlers - -The `.z` namespace is the main namespace used in kdb+ IPC programming. When a client sends a query via IPC, the message is [serialized](../../kb/serialization.md), sent, then deserialized on the server side after passing through a number of `.z` IPC handlers. This paper discusses the main handlers only. - -:fontawesome-solid-book: -[`.z` namespace](../../ref/dotz.md) - -When a process attempts to open a connection to a kdb+ process, two main handlers are invoked: `.z.pw` and `.z.po`. - - -### `.z.pw` - -If [`.z.pw`](../../ref/dotz.md#zpw-validate-user) is set, it is the first handler invoked on the server when a client attempts to connect.
This happens immediately after ['-u' checks](../../basics/cmdline.md#-u-usr-pwd) if this option has been specified in the process command line. `.z.pw` is simply a function that can be used to perform custom validation, so user and password info, as well as any other rules, can be validated as required by the application. - -By default `.z.pw` will return `1b`. The arguments passed to the function are user name (symbol) and password (string). These are optional in arguments to `hopen`. If the output of `.z.pw` is `1b`, the login can proceed (next stop `.z.po`). If it returns `0b`, login fails and the client gets an `access` error. - -:fontawesome-regular-map: -[Permissions with kdb+](../permissions/index.md "White paper") - -```q -q)// client process -q)h:hopen`::4567:katrina:password -``` - -```q -q)// server process -q)\p 4567 -q).z.pw:{[u;p] 0N!(u;p);1b} -q)(`katrina;"password") -``` - -If no username or password is passed to `hopen`, `u` and `p` are as below where the default username is the output of `.z.u` on the client process. - -```q -q)// client process -q).z.u -`katrina -q)hopen 4567 -4i -``` - -```q -q)// server process -q).z.pw:{[u;p] 0N!(u;p);1b} -q)(`katrina;"") -``` - - -### `.z.po` - -[`.z.po`](../../ref/dotz.md#zpo-open) (port open) is evaluated when a connection to a kdb+ process has been initialized and after it has been validated against `.z.pw` checks. Similar to `.z.pw`, `.z.po` will not be evaluated by default but only if it is assigned a user-defined function. Its argument is the handle to the connecting client process. This is typically used to build a dictionary of handles (`.z.w`) with session information such as [`.z.a`](../../ref/dotz.md#za-ip-address) (IP address) and [`.z.u`](../../ref/dotz.md#zu-user-id) (user). It is also commonly used, together with `.z.pc`, to track open connections to the process.
- -```q -q).z.po:{0N!(x;.z.w;.z.a;.z.u)} -q)(7i;7i;2130706433i;`katrina) -``` - - -### Synchronous vs asynchronous communication - -Once we have established a connection, the next step is to query data available on the server process. This query can be either [synchronous or asynchronous](../../learn/startingkdb/ipc.md#synchronousasynchronous). - - -### Synchronous queries - -If the client queries the server synchronously, the client will be unavailable until the server responds with the result of the query or an error. The handle, the positive integer assigned to the variable `h` in this example, is used to send the query. - -When a query is sent synchronously any messages queued on this handle are sent and no incoming messages will be processed on any handle until a response to this sync query is received. - -```q -q)h:hopen 4567 -q)h"2+2" -4 -``` - -The basic method used to execute a query via IPC is sending the query as a string as in the above example. A function can also be executed on the server by passing a [parse tree](../../basics/parsetrees.md) to the handle: a list with the function as first item, followed by its arguments. - -To execute a function defined on the client side, simply pass the function name so it will be resolved before sending. To execute a function defined on the server, pass the function name as a symbol. - -```q -q)add:{x+2*y} -q)h:hopen 4567 -q)h"add" -{x+y} -q) -q)h(add;2;3) -8 -q)h(`add;2;3) -5 -``` - -If a synchronous query is interrupted, for example if the server process is killed, the client process will receive an error. In the example below the process running on port 5678 was killed before the query completed successfully. The variable `h` will still be assigned to the handle number but any further attempts to communicate across this handle will fail. - -```q -q)h:hopen 5678 -q) -q)h"system\"sleep 10\"" -'os - [0] h"system\"sleep 10\"" - ^ -q)h -4i -q)h"1+1" -'Cannot write to handle 4. 
OS reports: Bad file descriptor - [0] h"1+1" - ^ -``` - -The stale handle will be recycled if a new connection is made: `4` and therefore `h` could now point to a completely different process. - -It is possible to interrupt a long-running sync query with `kill -s INT *PID*`. As with the previous example, any subsequent attempt to communicate across this handle will fail. - -```q -q)h"system\"sleep 30\"" -'rcv handle: 4. OS reports: Interrupted system call - [0] h"system\"sleep 30\"" - ^ -q) -q)h"a" -'Cannot write to handle 4. OS reports: Bad file descriptor - [0] h"a" - ^ -``` - -!!! tip "Deferred response" - - [Deferred response](../../kb/deferred-response.md) with [`-30!`](../../basics/internal.md#-30x-deferred-response) allows a server to defer the response to a synchronous query, allowing other messages to be processed before responding. This is useful where synchronous messaging is necessary on the client side. - - An example implementation of deferred sync message handling is discussed in the blog [kdb+/q Insights: Deferred Response](https://kx.com/blog/kdb-q-insights-deferred-response/). - - -### Asynchronous queries - -A query can be sent asynchronously using a negative handle. An async query will not return a result and the client process does not wait. Async messages can be serialized and queued for sending, but the messages will not necessarily be dispatched immediately. Since the process is not waiting for a response, async querying is critical in situations where waiting for an unresponsive subscriber is unacceptable, e.g. in a tickerplant. - -```q -q)h:hopen 4567 -q)neg[h]"a:2+2" -q) -q)h"a" -``` - -### Flushing - -Outgoing async messages are sent periodically on each iteration of the underlying process timer. This can cause messages to be queued such that it is necessary to flush all messages through a handle.
This can be achieved as below: - -- execute `neg[handle][]` or `neg[handle](::)` -- send a synchronous message on the same handle: this will confirm execution as all messages are processed in the order they are sent - -:fontawesome-solid-book-open: -[Flushing](../../basics/ipc.md#flushing) - - -### Deferred synchronous - -[Deferred sync](../../kb/load-balancing.md) is when a message is sent asynchronously to the server using the negative handle and executes a function which includes an instruction to return the result though the handle to the client process (`.z.w`), again asynchronously. After the client sends its async request it blocks on the handle waiting for a result to be returned. - -```q -q)h:hopen 4567 -q)h"add" -{x+y+z} -q)h"proc" -{r:add . x;0N!r;neg[.z.w]({0N!x};r)} -q) -q)neg[h](`proc;1 2 3);res:h[]; -q)res -q)6 -``` - - -### Asynchronous callback - -In a kdb+ application there is generally a gateway process that will manage queries from multiple clients. In this situation it is not practical or useful to query data synchronously. Instead, it will use async callbacks. The initial async query from the client invokes a function on the server which will use `.z.w` (handle of the client) to return the result asynchronously to the client process. - -```q -q)// server process on port 4567 -q)\p 4567 -q)getLastPriceBySym:{select last price by sym from quote} -q)quote:([]date:.z.d; size:100?250; price:100?100.; sym:100?`a`b`c) -``` - -```q -q)// server process on port 4567 -q)// client process -q)h:hopen 4567 -q)onPriceReceived:{[x] -1 "Received async callback"; show x}; -q)neg[h]({neg[.z.w](`onPriceReceived;getLastPriceBySym x)};`) -q)Received async callback -sym| price ----| -------- -a | 79.20702 -b | 35.29016 -c | 43.7764 -``` - -:fontawesome-solid-graduation-cap: -[Callbacks](../../kb/callbacks.md) - - -### Broadcast - -Much of the overhead of sending a message via IPC is in serializing the data before sending. 
It is possible to ‘async broadcast’ the same message to multiple handles using the internal [`-25!`](../../basics/internal.md#-25x-async-broadcast) function. This serializes the message once and sends to all handles to reduce CPU and memory load. An error in publishing to any handle results in the message not being sent to any of the handles, regardless of the handle’s position in the list. - -```q -q)h1:hopen 5551 -q)h2:hopen 5552 -q)h3:hopen 5553 -q) -q)h1"a:1" -q)h3"a:1" -q) -q)-25!((h1;h2;h3);({c::a+1};`)) -q)h1"c" -2 -q)h3"c" -2 -q)// close process on 5552, try to assign 'd' -q)-25!((h1;h2;h3);({d::a+1};`)) -'5 is not an ipc handle - [0] -25!((h1;h2;h3);({d::a+1};`)) - ^ -q)h1"d" -'d - [0] h1"d" - ^ -q)h3"d" -'d - [0] h3"d" - ^ -``` - -This can be applied to a tickerplant publishing asynchronously to multiple subscribers: RDBs, chained tickerplants, or other processes performing realtime calculations. - -### `.z.pg` (get) and `.z.ps` (set) - -When the query reaches the server, it invokes a different message handler according to whether the query was sent synchronously ([`.z.pg`](../../ref/dotz.md#zpg-get) – get) or asynchronously ([`.z.ps`](../../ref/dotz.md#zps-set) – set). The return value from `.z.pg` is sent as the response message and the return value from `.z.ps` is ignored unless it is an error. If it is an error, the message is printed to the console of the process the query is being executed on. The error will not be visible to the querying process (the client). - -By default, `.z.pg` and `.z.ps` are equivalent to `{value x}` but are commonly edited to implement user-level permissioning. The default behavior of these (or any `.z.p*` handlers defined before `.q.k` is loaded) can be restored by `\x`. - -:fontawesome-regular-map: -[Permissions with kdb+](../permissions/index.md "White paper") - -Note that outside of the common applications discussed above, any value can be sent across a handle and `.z.pg` or `.z.ps` defined to evaluate it. 
- -```q -q)// server process -q)\p 4567 -q).z.pg:{99h=type x} -``` - -```q -q)// client process -q)h:hopen 4567 -q)h"1+1" -0b -q)h`a`b`c!`d`e`f -1b -``` - - -### Closing a connection - -Now the client has connected to the server and run the query, the last step is to close the handle. - -To close the handle inside a process, use [`hclose`](../../ref/hopen.md#hclose). - -```q -q)h:hopen 4567 -q)h -4i -q)h"2+2" -4 -q)hclose h -q)h"2+2" -'Cannot write to handle 4. OS reports: Bad file descriptor - [0] h"2+2" - ^ -``` - -This can also be used to close a handle from the server side. - -```q -q)// server process -q)// use .z.po to store a list of opened handles -q)l:() -q).z.po:{`l set l,.z.w} -q)l -q) -q)// connection opened from client -q)l -,6i -q)hclose 6 -``` - - -### `.z.pc` (close) - -Running `hclose[handle]` on the client will cause [`.z.pc`](../../ref/dotz.md#zpc-close) to be invoked on the server. Unlike the message handlers we have seen before, it is not possible to obtain information relating to the client process in `.z.pc` because at this point that connection no longer exists. The integer identifying the handle that was just closed is passed as argument to `.z.pc` and can be used together with `.z.po`, to track connections to the server process. - -Besides when a handle is closed gracefully using `hclose`, `.z.pc` is also invoked on the server if the client process is killed. - -```q -q).z.po:{0N!"connection opened, handle: ",string[.z.w]} -q).z.pc:{0N!("handle closed";x;.z.w)} -q) -q)"connection opened, handle: 6" -("handle closed";6i;0i) -``` - - -### Tracking open connections - -[`.z.H`](../../ref/dotz.md#zh-active-sockets) can be used to view active sockets, augmented by the internal function [-38!](../../basics/internal.md#-38x-socket-table) to view details of active connections. - -In addition `.z.po` and `.z.pc` can be used to track all connections from client processes. 
In the below example, we create a table in memory on the server side, keyed on handle. When a new connection is opened, a new row is added to the table; and when the connection is closed, the row can be deleted or preferably updated to reflect the new status. - -```q -q)// on the server side -q)\p 4567 -q)trackConnections:([handle:()]ip:();user:();status:()) -q).z.po:{`trackConnections upsert (.z.w;.z.a;.z.u;`OPEN)} -q).z.pc:{`trackConnections set update status:`CLOSED from trackConnections where handle=x} -``` - -```q -q)// on the client side -q)h:hopen 4567 -q)h"trackConnections" -handle| ip user status -------| ------------------------- -892 | 2130706433 katrina OPEN -q)hclose h -q) -q)h:hopen 4567 -q)h"trackConnections" -handle| ip user status -------| ------------------------- -892 | 2130706433 katrina CLOSED -736 | 2130706433 katrina OPEN -``` - -If the handle is closed from the server side, `trackConnections` is not updated by default, and `.z.pc` must be invoked manually. - -```q -q)trackConnections -handle| ip user status -------| ------------------------- -6 | 2130706433 katrina OPEN -q) -q)hclose 6 -q)trackConnections -handle| ip user status -------| ------------------------- -6 | 2130706433 katrina OPEN -q)6"2+2" -'Cannot write to handle 6. OS reports: Bad file descriptor - [0] 6"2+2" - ^ -q).z.pc[6] -`trackConnections -q)trackConnections -handle| ip user status -------| ------------------------- -6 | 2130706433 katrina CLOSED -``` - - -## Application of IPC in a kdb+ tick system - -:fontawesome-solid-graduation-cap: -[Realtime database](../../learn/startingkdb/tick.md) -
-:fontawesome-regular-map: -[Building real-time tick subscribers](../rt-tick/index.md) - -The core elements of this kdb+ tick setup are - -- A tickerplant -- A realtime database (RDB) -- A historical database (HDB) -- A source of data (feed) - -The discussion uses the KX tick code to explore how IPC is used in a [vanilla tick application](../../architecture/index.md). - -:fontawesome-brands-github: -[KxSystems/kdb-tick](https://github.com/KxSystems/kdb-tick) - -In the following section, code taken from the kdb+ tick scripts is flagged. - - -### Initialize tickerplant - -When a vanilla tickerplant is first started up a number of variables are assigned, the process port is set, and the `u.q` utilities and the table schema scripts are loaded. - -```q -//tick.q extract -system"l ",tickdir,"/u.q" -``` - -The tickerplant process is set to listen on port 5010 unless a port is specified on the command line at startup. - -```q -//tick.q extract -if[not system"p";system"p 5010"] -``` - -All messages published to a tickerplant are immediately logged to a file. In the event of a process crashing, this file can be used to replay all messages up to the point of failure. A handle (`.u.l`) is opened to the desired file (`.u.L`) using `hopen`. Messages sent to this handle are appended to the file. - -:fontawesome-regular-map: -[Data recovery](../data-recovery.md#recovery) - -```q -q).u.L:`:sampleLog -q).u.L set () -`:sampleLog -q).u.l:hopen .u.L -q).u.l enlist (`upd;`tab;([]"first record")); -q)get .u.L -`upd `tab +(,`x)!,"first record" -q).u.l enlist (`upd;`tab;([]"second record")) -q)get .u.L -`upd `tab +(,`x)!,"first record" -`upd `tab +(,`x)!,"second record" -``` - - -### Initialize RDB - -During RDB initialization, the functions below are invoked to initialize table schemas and, if necessary, replay data from the tickerplant log (tplog) to catch up to the current state. 
- -```q -//r.q extract -/ connect to ticker plant for (schema;(logcount;log)) -.u.rep .(hopen `$tpport)"(.u.sub[`;`];`.u `i`L)" -``` - -The RDB process opens a handle to the tickerplant using `hopen` and runs a synchronous query to subscribe this RDB to all tables and all symbols available from the tickerplant, which returns the table names and schemas to the RDB. The result is passed to `.u.rep` in order to initialize these table schemas in memory. We can split this sequence out as below: - -```q -q)h:hopen `$tpport -q)x:h"(.u.sub[`;`];`.u `i`L)" -q).u.rep . x -``` - - -### Publish data to a tickerplant - -In a vanilla tickerplant `.z.pg` and `.z.ps` use the default `{value x}` and `.u.upd` is defined as below: - -```q -//tick.q extract -\d .u -upd:{[t;x]t insert x;if[l;l enlist (`upd;t;x);j+:1]} -``` - -Data is first inserted into a table in memory, then logged to the tplog on disk such that running `{value x}` on any line will invoke the `upd` function with the table name and data as arguments. - -Data is published synchronously from a feedhandler to the tickerplant. This is not recommended for time-critical feeds with a large number of updates but can be used for feeds where you need confirmation that each message was received correctly. - - -### Managing tickerplant subscriptions - -In a tickerplant, current subscriptions are maintained in `.u.w`, an in-memory dictionary which stores each subscriber’s handle and subscription information. When a process subscribes to the tickerplant, the process handle and any symbol filters are added to `.u.w`. - -Below is an example of `.u.w` in a tickerplant with two tables: `trade` and `quote`. In this example the RDB is the only subscriber and it is subscribed to all symbols for both tables. - -```q -q).u.w -quote| 7i ` -trade| 7i ` -``` - -`.z.pc` is invoked when a handle to the process is closed.
In the context of a tickerplant this means that a connection from a subscriber has closed and this subscription must be removed from `.u.w`. - -```q -//tick.q extract -\d .u -del:{w[x]_:w[x;;0]?y} -.z.pc:{del[;x]each t} -``` - -The function `.u.del` takes two arguments: `x`, the handle to the now closed subscriber, and `.u.t`, a global list of all tables defined in the tickerplant. This function will remove the handle from the subscription list for each of these tables. In this example, if the single RDB subscribing to the process is killed, `.u.w` will be empty. - -```q -q).u.w -quote| -trade| -``` - - -### Publishing from the tickerplant to subscribers - -Data is published from the tickerplant by invoking `.u.pub` on the tickerplant periodically on a timer. `.u.pub` takes two arguments: `t`, the table name, and `x`, the data (`value t`). - -```q -//tick.q extract -\d .u -pub:{[t;x]{[t;x;w]if[count x:sel[x]w 1;(neg first w)(`upd;t;x)]}[t;x]each w t} -``` - -If on a timer tick (whatever `\t` is set to, 1000ms by default) the count of the in-memory table `x` is greater than zero, these rows are published asynchronously to each handle subscribed to table `t` in `.u.w`. - -This publishing is considered the critical path of data in a low-latency tick system. As such, it is best practice to use async messaging so that no unnecessary delays in streaming data are caused by the process waiting for a response from a hanging or unresponsive subscriber. - -It is possible that a client subscribed to a tickerplant might not be processing the data it is being sent quickly enough, causing a backlog to form and the TCP buffer of the subscriber to fill. When this happens the pending messages will sit in an output queue in the memory space of the tickerplant process itself until the slow subscriber becomes available. It is possible to view a dictionary of open handles mapped to the size of messages queued for this handle using [`.z.W`](../../ref/dotz.md#zw-handles). 
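An illustrative sketch of what this might look like on a tickerplant with one healthy and one backed-up subscriber; the handle numbers and byte counts are invented:

```q
q)sum each .z.W   / total bytes queued per open handle
7| 0              / healthy subscriber, nothing queued
9| 12583904       / slow subscriber, ~12MB backed up
```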
In extreme cases the tickerplant memory footprint might grow to an unmanageable level, resulting in a [`wsfull` error](../../basics/errors.md). If you write logic for the tickerplant to disconnect a slow consumer and protect itself, `.z.pc` must be invoked manually to perform subscription cleanup after the client is kicked, as shown in the _Tracking open connections_ example above. - -:fontawesome-regular-map: -[Disaster recovery](../disaster-recovery/index.md) - - -### End of day - -The timer has another important function within the tickerplant: the end-of-day rollover. The [`.z.ts`](../../ref/dotz.md#zts-timer) timer function takes the system’s date as an argument and passes it to `.u.ts`. Below we can see the function definitions of `endofday` and `ts` within the `.u` namespace. - -```q -//tick.q extract -endofday:{ - end d; - d+:1; - if[l;hclose l;l::0(`.u.ld;d)]}; - -ts:{if[d<x;if[d<x-1;system"t 0";'"more than one day?"];endofday[]]} -``` - -`ts` compares the current date `x` to the stored date `d`: when the date has rolled over, it calls `endofday`, which notifies all subscribers (`end d`), increments the date and rolls the tplog by closing the old log handle and opening a new one via `.u.ld`. - - -## Author - -[:fontawesome-solid-envelope:](mailto:kmccormack@kx.com) -  -[:fontawesome-brands-linkedin:](https://www.linkedin.com/in/katrina-mccormack-35379359/) - - - -## Notes - -TCP/IP - -: The Internet Protocol Suite, commonly known as TCP/IP, is a conceptual model and set of communications protocols which specifies how data is exchanged. It provides end-to-end communications that identify how the data should be broken into packets, addressed, transmitted, routed and received at the destination with little central management. - -TCP/IP sockets - -: The term _socket_ usually refers to a TCP socket. A socket is one end point of a two-way communication link. These network sockets allow communication between two different processes on the same or on different machines. These sockets are assumed to be associated with a specific socket address: the combination of an IP address and port number. The local process can communicate with another (foreign) process by sending data to or receiving data from the foreign socket address which will have its own associated socket.
- -![TCP diagram](img/tcp-diagram.png) - -A process can refer to a socket using a file descriptor or handle, an abstract indicator used to access a file, or other resource. - - -### Major IPC-related releases - -version | date | feature ---------|------------|-------- -2.4 | 2012.03.31 | Added [Multi-threaded input](../../releases/ChangesIn2.4.md#multi-threaded-input) using a negative port number -| | [`.z.pw`](../../releases/ChangesIn2.4.md#zpw): username and password passed to `.z.pw` to enable option for custom validation -2.5 | 2012.03.31 | Added [`.z.W`](../../releases/ChangesIn2.5.md#zw) to return a dictionary of IPC handles with the total number of bytes in each output queue -2.6 | 2012.08.03 | Added [IPC compression](../../releases/ChangesIn2.6.md#ipc-compression) -| | [`.z.W`](../../releases/ChangesIn2.6.md#zw) updated to return the size in bytes of each messages in the input queue -2.7 | 2013.06.28 | Added an [IPC message validator](../../releases/ChangesIn2.7.md#ipc-message-validator) -[3.4](../../releases/ChangesIn3.4/) | 2019.06.03 | IPC message size limit raised from 2GB to 1TB -| | Added support for IPC via Unix Domain Sockets -| | Secure Sockets Layer(SSL)/Transport Layer Security (TLS) -| | Added async broadcast as `-25!(handles;msg)` -3.5 | 2020.02.13 | Added `hopen` timeout for [TLS](../../releases/ChangesIn3.5.md#ssltls) -3.6 | 2020.02.24 | [Deferred response](../../releases/ChangesIn3.6.md#deferred-response): a server process can now use `-30!x` to defer responding to a sync query diff --git a/docs/wp/rt-tick/index.md b/docs/wp/rt-tick/index.md index e891aa0e2..9289ff4ed 100644 --- a/docs/wp/rt-tick/index.md +++ b/docs/wp/rt-tick/index.md @@ -1,16 +1,16 @@ --- -title: Building real-time tick subscribers | kdb+ and q documentation -description: How to build a custom real-time tick subscriber +title: Building real-time tick engines | kdb+ and q documentation +description: How to build a custom real-time tick engine author: Nathan Perrem date: 
August 2014
keywords: kdb+, q, real-time, subscribe, tick
---

-# Building real-time tick subscribers
+# Building real-time tick engines

by [Nathan Perrem](#author)
{: .wp-author}

-The purpose of this white paper is to help q developers who wish to build their own custom real-time tick subscribers. KX provides kdb+tick, a tick capture system which includes the core q code for the tickerplant process (`tick.q`) and the vanilla real-time subscriber process (`r.q`), known as the real-time database. This vanilla real-time process subscribes to all tables and to all symbols on the tickerplant. This process has very simple behavior upon incoming updates – it simply inserts these records to the end of the corresponding table. This may be perfectly useful to some clients, however what if the client requires more interesting functionality? For example, the client may need to build or maintain their queries or analytics in real time. How would one take `r.q` and modify it to achieve said behavior? This white paper attempts to help with this task. It breaks down into the following broad sections:
+The purpose of this white paper is to help q developers who wish to build their own custom real-time engines. KX provides kdb+tick, a tick capture system which includes the core q code for the tickerplant process ([`tick.q`](../../architecture/tickq.md)) and the vanilla real-time engine process ([`r.q`](../../architecture/rq.md)), known as the real-time database (RDB). This vanilla real-time process subscribes to all tables and to all symbols on the tickerplant. This process has very simple behavior upon incoming updates – it simply inserts these records at the end of the corresponding table. This may be perfectly useful to some clients; however, what if the client requires more interesting functionality? For example, the client may need to build or maintain their queries or analytics in real time. How would one take `r.q` and modify it to achieve said behavior?
This white paper attempts to help with this task. It breaks down into the following broad sections: 1. Explain the existing code and principles behind `r.q`. 2. Use `r.q` as a template to build some sample real-time analytic @@ -19,19 +19,16 @@ The purpose of this white paper is to help q developers who wish to build their It is hoped this white paper will help dispel any notion of tick being a black box product which cannot be adapted to the requirements of the real-time data consumer. All tests were run using kdb+ V3.1 (2013.09.19) on Windows. -The tickerplant and real-time database scripts can be obtained from GitHub. - -:fontawesome-brands-github: -[KxSystems/kdb+tick](https://github.com/KxSystems/kdb-tick) +The tickerplant and real-time database scripts can be obtained from :fontawesome-brands-github:[KxSystems/kdb+tick](https://github.com/KxSystems/kdb-tick) The tickerplant and real-time database scripts used are dated 2014.03.12 and 2008.09.09 respectively. These are the most up-to-date versions as of the writing of this white paper. -This paper is focused on the real-time database and custom real-time subscribers. However, some background will be provided on the other key processes in this environment. +This paper is focused on the real-time database and custom real-time engines (RTEs). However, some background will be provided on the other key processes in this environment. ## The kdb+tick environment -The real-time database (RDB) and all other real-time subscribers (RTS) do not exist in isolation. Instead they sit downstream of the feedhandler (FH) and tickerplant (TP) processes. The feedhandler feeds data into the tickerplant, which in turns publishes certain records to the real-time database and other real-time subscribers. Today’s data can be queried on the RDB. The historical data resides on disk and can be read into memory upon demand by the historical database process (HDB). 
+The real-time database (RDB) and all other real-time engines (RTEs) do not exist in isolation. Instead, they sit downstream of the feedhandler (FH) and tickerplant (TP) processes. The feedhandler feeds data into the tickerplant, which in turn publishes certain records to the real-time database and other RTEs. Today’s data can be queried on the RDB. The historical data resides on disk and can be read into memory upon demand by the historical database process (HDB).

The incoming data feed could be from Reuters, Bloomberg, a particular exchange or some other internal data feed. The feedhandler receives this data and extracts the fields of interest. It will also perform some datatype casting and re-ordering of fields to normalize the data set with the corresponding table schemas present on the tickerplant. The feedhandler then pushes this massaged data to the tickerplant.

@@ -51,8 +48,7 @@ Although the inner workings of the tickerplant process are beyond the scope of t

: refers to the schema file (in this case called `sym.q`), assumed to reside in the subdirectory called `tick` (relative to `tick.q`). This schema file simply defines the tables that exist in the TP – here we define two tables, `trade` and `quote`, as follows.

```q
-quote:([]time:`timespan$();sym:`symbol$();bid:`float$();ask:`float$();
- bsize:`int$();asize:`int$())
+quote:([]time:`timespan$();sym:`symbol$();bid:`float$();ask:`float$();bsize:`int$();asize:`int$())
trade:([]time:`timespan$();sym:`symbol$();price:`float$();size:`int$())
```

@@ -96,7 +92,7 @@ getask:{[s] prices[s]+getmovement[s]} /generate ask price

Points to note from the above:

1. The data sent to the tickerplant is in columnar (column-oriented) list format. In other words, the tickerplant expects data as lists, not tables. This point will be relevant later when the RDB wishes to replay the tickerplant logfile.
-1. The function triggered on the tickerplant upon receipt of these updates is `.u.upd`.
+1. 
The function triggered on the tickerplant upon receipt of these updates is [`.u.upd`](../../architecture/tickq.md#uupd).
1. If you wish to increase the frequency of updates sent to the tickerplant for testing purposes, simply change the timer value at the end of this script accordingly.

@@ -113,7 +109,7 @@ hdb:.z.x 0

@[{system"l ",x};hdb;{show "Error message - ",x;exit 0}]
```

-Strictly speaking, an instance of the HDB is not required for this paper since all we really need is a tickerplant being fed data and then publishing this data downstream to the RDB and RTS. However, the RDB does communicate with the HDB at end of day once it has finished writing its records to the on-disk database.
+Strictly speaking, an instance of the HDB is not required for this paper since all we really need is a tickerplant being fed data and then publishing this data downstream to the RDB and RTEs. However, the RDB does communicate with the HDB at end of day once it has finished writing its records to the on-disk database.

## Real-time database (RDB)

@@ -132,7 +128,7 @@ argument | semantics

### Real-time updates

-Quite simply, the tickerplant provides the ability for a process (in this case the real-time database) to subscribe to certain tables, and for certain symbols (stock tickers, currency pairs etc.). Such a real-time subscriber will subsequently have relevant updates pushed to it by the tickerplant. The tickerplant asynchronously pushes the update as a 3-item list in the format `(upd;Table;data)`:
+Quite simply, the tickerplant provides the ability for a process (in this case the real-time database) to subscribe to certain tables, and for certain symbols (stock tickers, currency pairs etc.). Such an RTE will subsequently have relevant updates pushed to it by the tickerplant. 
The tickerplant asynchronously pushes the update as a 3-item list in the format `(upd;Table;data)`: item | semantics ---------|---------- @@ -161,7 +157,7 @@ Some example updates: size:1000 2000)) ``` -Such a list is received by the real-time subscriber and is implicitly passed to the [`value`](../../ref/value.md) function. Here is a simple example of `value` in action: +Such a list is received by the RTE and is implicitly passed to the [`value`](../../ref/value.md) function. Here is a simple example of `value` in action: ```q q)upd:{:x-y} @@ -169,9 +165,9 @@ q)value (`upd;3;2) 1 ``` -In other words, the real-time subscriber passes two inputs to the function called `upd`. In the above examples, the inputs are the table name `` `trade`` and the table of new records. +In other words, the RTE passes two inputs to the function called `upd`. In the above examples, the inputs are the table name `` `trade`` and the table of new records. -The `upd` function should be defined on the real-time subscriber according to how the process is required to act in the event of an update. Often `upd` is defined as a binary (2-argument) function, but it could alternatively be defined as a dictionary which maps table names to unary function definitions. This duality works because of a fundamental and elegant feature of kdb+: [executing functions and indexing into data structures are equivalent](../../ref/apply.md). For example: +The `upd` function should be defined on the RTE according to how the process is required to act in the event of an update. Often `upd` is defined as a binary (2-argument) function, but it could alternatively be defined as a dictionary which maps table names to unary function definitions. This duality works because of a fundamental and elegant feature of kdb+: [executing functions and indexing into data structures are equivalent](../../ref/apply.md). 
For example: ```q /define map as a dictionary @@ -190,7 +186,7 @@ q)map[`bar;10] So the developer of the process needs to define `upd` according to their desired behavior. -Perhaps the simplest definition of `upd` is to be found in the vanilla RTS – the RDB. The script for this process is called `r.q` and within this script, we find the definition: +Perhaps the simplest definition of `upd` is to be found in the vanilla RTE – the RDB. The script for this process is called `r.q` and within this script, we find the definition: ```q upd:insert @@ -238,22 +234,22 @@ q)MC 4 ``` -The main challenge in developing a custom real-time subscriber is rewriting `upd` to achieve desired real-time behavior. +The main challenge in developing a custom RTE is rewriting `upd` to achieve desired real-time behavior. ### Tickerplant log replay -An important role of the tickerplant is to maintain a daily logfile on disk for replay purposes. When a real-time subscriber starts up, they could potentially replay this daily logfile, assuming they have read access to it. Such a feature could be useful if the subscriber crashes intraday and is restarted. In this scenario, the process would replay this logfile and then be fully up-to-date. Replaying this logfile, particularly late in the day when the tickerplant has written many messages to it, can take minutes. The exact duration will depend on three factors: +An important role of the tickerplant is to maintain a daily logfile on disk for replay purposes. When a RTE starts up, they could potentially replay this daily logfile, assuming they have read access to it. Such a feature could be useful if the subscriber crashes intraday and is restarted. In this scenario, the process would replay this logfile and then be fully up-to-date. Replaying this logfile, particularly late in the day when the tickerplant has written many messages to it, can take minutes. The exact duration will depend on three factors: 1. How many messages are in the logfile 2. 
The disk read speed
3. How quickly the process can replay a given message

-The first and second factors are probably not controllable by the developer of the RTS. However the third factor is based on the efficiency and complexity of the particular replay function called `upd`. Defining this replay function efficiently is therefore of the upmost importance for quick intraday restarts.
+The first and second factors are probably not controllable by the developer of the RTE. However, the third factor is based on the efficiency and complexity of the particular replay function called `upd`. Defining this replay function efficiently is therefore of the utmost importance for quick intraday restarts.

!!! note "One daily logfile"

-    The tickerplant maintains just one daily logfile. It does _not_ maintain separate logfiles split across different tables and symbols. This means that an RTS replaying such a logfile may only be interested in a fraction of the messages stored within.
+    The tickerplant maintains just one daily logfile. It does _not_ maintain separate logfiles split across different tables and symbols. This means that an RTE replaying such a logfile may only be interested in a fraction of the messages stored within.

    Ultimately the developer must decide if the process truly requires these records from earlier in the day. Changing the tickerplant’s code to allow subscriber-specific logfiles should be technically possible, but is beyond the scope of this white paper.
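To illustrate the point about replaying only a fraction of the messages, a replay `upd` that keeps just the records a given RTE cares about might be sketched as follows. This is illustrative only and not part of the supplied scripts – `mysyms` and the `trade` schema are assumed to be defined:

```q
/replay sketch: keep only trade messages for a subset of symbols
mysyms:`MSFT.O`IBM.N
upd:{[t;d]
  if[not t=`trade;:()];       / discard messages for other tables
  d:flip (cols trade)!d;      / replayed columnar list -> table
  `trade insert select from d where sym in mysyms }
```

Everything else in the logfile is read but discarded cheaply, which keeps the per-message replay cost low.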
@@ -274,15 +270,15 @@ Focusing on the first message:

item | semantics
-----|-----------
-1 | the symbol `` `upd`` is the name of the update/replay function on RTS
+1 | the symbol `` `upd`` is the name of the update/replay function on the RTE
2 | the symbol `` `trade`` is the table name of the update
3 | a column-oriented (columnar) list containing the new records

-The format of the message in the tickerplant logfile is the same as the format of real-time updates sent to the RTS with one _critical_ difference – the data here is a list, _not_ a table. The RTS which wants to replay this logfile will need to define their `upd` to accommodate this list. This will mean in general that an RTS will have two different definitions of `upd` – one for tickerplant logfile replay and another for intraday updates via IPC (interprocess communication).
+The format of the message in the tickerplant logfile is the same as the format of real-time updates sent to the RTE, with one _critical_ difference – the data here is a list, _not_ a table. An RTE that wants to replay this logfile will need to define its `upd` to accommodate this list. This means that, in general, an RTE will have two different definitions of `upd` – one for tickerplant logfile replay and another for intraday updates via IPC (interprocess communication).

For example, a q process with suitable definitions for the tables `trade` and `quote`, as well as the function `upd`, could replay `sym2014.08.23`. Again, a suitable definition for `upd` will depend on the desired behavior, but the function will need to deal with incoming lists as well as tables.
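One way to satisfy both cases with a single definition is to normalize replayed columnar lists into tables up front. The following is a sketch, not taken from `r.q`, and assumes the target tables are already defined:

```q
/sketch: one upd for both logfile replay (lists) and IPC updates (tables)
upd:{[t;d]
  if[0h=type d; d:flip (cols t)!d];  / generic list from the logfile -> table
  t insert d }
```

For a plain insert this normalization is unnecessary (`insert` accepts either form), but it becomes useful as soon as `upd` needs to run qSQL over the incoming data.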
-In the RDB (vanilla RTS), `upd` for both replay purposes and intraday update purposes is simply defined as: +In the RDB (vanilla RTE), `upd` for both replay purposes and intraday update purposes is simply defined as: ```q upd:insert @@ -290,7 +286,7 @@ upd:insert In other words, when the RDB replays a given message, it simply inserts the record/s into the corresponding table. This is the same definition of `upd` used for intraday updates via IPC. These updates succeed because the second argument to `insert` can be either a columnar list or a table. -A q process replays a tickerplant logfile using the operator `-11!`. Although this operator can be used in different ways, the simplest syntax is: +A q process replays a tickerplant logfile using the operator [`-11!`](../../basics/internal.md#-11-streaming-execute). Although this operator can be used in different ways, the simplest syntax is: ```q -11! `:TPDailyLogfile @@ -350,8 +346,8 @@ For more information on tickerplant logfile replay, see [“Data Recovery for kd ### End of day -At end of day (EOD), the tickerplant sends messages to all its real-time subscribers, telling them to execute their unary end-of-day function called `.u.end`. The tickerplant supplies a date which is typically the previous day’s -date. When customizing your RTS, define `.u.end` to achieve whatever behavior you deem appropriate at EOD. On the RDB, `.u.end` is defined as follows: +At end of day (EOD), the tickerplant sends messages to all its RTEs, telling them to execute their unary end-of-day function called `.u.end`. The tickerplant supplies a date which is typically the previous day’s +date. When customizing your RTE, define `.u.end` to achieve whatever behavior you deem appropriate at EOD. 
On the RDB, `.u.end` is defined as follows:

```q
/ end of day: save, clear, hdb reload
@@ -366,7 +362,7 @@ To summarize this behavior: the RDB persists its tables to disk in date-partitio

### Understanding the code in `r.q`

-To help you modify `r.q` to create your own custom RTS, this section explains its inner workings. This script starts out with the following:
+To help you modify `r.q` to create your own custom RTE, this section explains its inner workings. This script starts out with the following:

```q
if[not "w"=first string .z.o;system "sleep 1"];
@@ -566,37 +562,212 @@ Reading this from the right, we obtain the location of the tickerplant process w

-## Examples of custom real-time subscribers
+## Examples of custom RTEs

-Two quite different RTS instances are described below.
+Two quite different RTE examples are described below.

+### Real-time VWAP engine

-### Real-time trade with as-of quotes

+#### Overview

-One of the most popular and powerful joins in the q language is the [`aj`](../../ref/aj.md) function. This keyword was added to the language to solve a specific problem – how to join trade and quote tables together in such a way that for each trade, we grab the prevalent quote _as of_ the time of that trade. In other words, what is the last quote at or prior to the trade?
+This section describes how to build an RTE which enriches trade records with VWAP (volume-weighted average price) information on a per-symbol basis.
+If this RTE is restarted, it recalculates VWAP from the TP log file. Upon an end-of-day event it clears the current records.
+
+A VWAP can be defined as:
+
+$$ VWAP = \frac{\sum_{i} (tradevolume_i)(tradeprice_i)}{\sum_{i} (tradevolume_i)}$$
+
+Consider the following sample trade table:
+
+```q
+q)trade /examine the first few trade records
+time                 sym    price     size
+------------------------------------------
+0D21:46:24.977505000 GS.N   178.56665 28
+0D21:46:24.977505000 IBM.N  191.22174 66
+0D21:46:25.977501000 MSFT.O 45.106284 584
+0D21:46:26.977055000 GS.N   178.563   563
+0D21:46:26.977055000 GS.N   178.57841 624
+0D21:46:27.977626000 GS.N   178.58783 995
+0D21:46:27.977626000 MSFT.O 45.110017 225
+..
+```
+
+An additional column called `rvwap` (running VWAP) will be added to this table. In any given row, `rvwap` will contain the VWAP up until and including that particular trade record (for that particular symbol):
+
+```q
+time                 sym    price     size rvwap
+----------------------------------------------------
+0D21:46:24.977505000 GS.N   178.56665 28   178.47318
+0D21:46:24.977505000 IBM.N  191.22174 66   191.17041
+0D21:46:25.977501000 MSFT.O 45.106284 584  45.147046
+0D21:46:26.977055000 GS.N   178.563   563  178.47366
+0D21:46:26.977055000 GS.N   178.57841 624  178.47428
+0D21:46:27.977626000 GS.N   178.58783 995  178.47533
+0D21:46:27.977626000 MSFT.O 45.110017 225  45.146982
+..
+```
+
+This `rvwap` column will need to be maintained as new trade records arrive from the TP. In order to achieve this, two additional columns need to be maintained per row – `v` and `s`:
+
+column | content
+-------|--------
+`v` | cumulative value of all trades for that symbol (value is the trade price multiplied by the trade size)
+`s` | cumulative size (quantity) of all trades for that symbol
+
+`v` and `s` are the numerator and denominator respectively in the formula given at the start of this section. `rvwap` can then be simply calculated as `v` divided by `s`.
The `trade` table would then become:
+
+```q
+time                 sym    price     size v         s      rvwap
+---------------------------------------------------------------------
+0D21:46:24.977505000 GS.N   178.56665 28   18687391  104707 178.47318
+0D21:46:24.977505000 IBM.N  191.22174 66   20497292  107220 191.17041
+0D21:46:25.977501000 MSFT.O 45.106284 584  5865278.4 129915 45.147046
+0D21:46:26.977055000 GS.N   178.563   563  18787922  105270 178.47366
+0D21:46:26.977055000 GS.N   178.57841 624  18899355  105894 178.47428
+0D21:46:27.977626000 GS.N   178.58783 995  19077050  106889 178.47533
+0D21:46:27.977626000 MSFT.O 45.110017 225  5875428.2 130140 45.146982
+..
+```
+
+A simple keyed table called `vwap` is also maintained. This table simply maps a symbol to its current VWAP. Based on the above sample data, `vwap` would look like:
+
+```q
+sym    | rvwap
+-------| --------
+GS.N   | 178.47533
+IBM.N  | 191.17041
+MSFT.O | 45.146982
+```
+
+#### Example script
+
+This solution comprises a heavily modified version of [`r.q`](../../architecture/rq.md), written by the author and named `real_time_vwap.q`.

-This function is relatively easy to use for one-off joins. However, what if you want to maintain trades with as-of quotes in real time? This section describes how to build an RTS with real-time trades and as-of quotes. This is a heavily modified version of `r.q`, written by the author and named `RealTimeTradeWithAsofQuotes.q`.

+```q
+/initialize schema function
+InitializeTrade:{[TradeInfo;logfile]
+  `trade set TradeInfo 1;
+  if[null first logfile;update v:0n,s:0Ni,rvwap:0n from `trade;:()];
+  -11!logfile;
+  update v:sums (size*price),s:sums size by sym from `trade;
+  update rvwap:v%s from `trade;
+  `vwap upsert select last rvwap by sym from trade;}

-One additional feature this script demonstrates is the ability of any q process to write to and maintain its own kdb+ binary logfile for replay/recovery purposes. In this case, the RTS maintains its own daily logfile for trade records.
This will be used for recovery in place of the standard tickerplant logfile as used by `r.q`.
+
+/this keyed table maps a symbol to its current vwap
+vwap:([sym:`$()] rvwap:`float$())
+
+/For TP logfile replay, upd is a simple insert for trades
+upd:{if[not `trade=x;:()];`trade insert y}
+
+/
+This intraday function is triggered upon incoming updates from TP.
+Its behavior is as follows:
+1. Add s and v columns to incoming trade records
+2. Increment incoming records with the last previous s and v values
+   (on per sym basis)
+3. Add rvwap column to incoming records (rvwap is v divided by s)
+4. Insert these enriched incoming records to the trade table
+5. Update vwap table
+\
+updIntraDay:{[t;d]
+  d:update s:sums size,v:sums size*price by sym from d;
+  d:d pj select last v,last s by sym from trade;
+  d:update rvwap:v%s from d;
+  `trade insert d;
+  `vwap upsert select last rvwap by sym from trade; }
+
+/end of day function - triggered by tickerplant at EOD
+/Empty tables
+.u.end:{{delete from x}each tables `. } /clear out trade and vwap tables
+
+args:.Q.opt .z.x
+args:`$args
+h:hopen hsym first args`tp /connect to tickerplant
+InitializeTrade . h "(.u.sub[`trade;",(.Q.s1 args`syms),"];`.u `i`L)"
+upd:updIntraDay /switch upd to intraday update mode
+```

This process should be started off as follows:

```bash
-q tick/RealTimeTradeWithAsofQuotes.q -tp localhost:5000 -syms MSFT.O IBM.N GS.N -p 5003
+q tick/real_time_vwap.q -tp localhost:5000 -syms MSFT.O IBM.N GS.N -p 5004
```

-This process will subscribe to both trade and quote tables for symbols `MSFT.O`, `IBM.N` and `GS.N` and will listen on port 5003. The author has deliberately made some of the q syntax more easily understandable compared to `r.q`.
+This process will subscribe only to the `trade` table for symbols `MSFT.O`, `IBM.N` and `GS.N` and will listen on port 5004. The structure and design philosophy behind `real_time_vwap.q` is very similar to that of `RealTimeTradeWithAsofQuotes.q`, described in the next section.
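The arithmetic behind the `v`, `s` and `rvwap` columns can be checked interactively on a toy table. This is a sketch only; the column names match the trade schema used above:

```q
q)t:([]sym:`A`B`A`A;price:10 20 11 12f;size:100 200 300 100)
q)t:update v:sums size*price,s:sums size by sym from t
q)update rvwap:v%s from t
sym price size v    s   rvwap
-----------------------------
A   10    100  1000 100 10
B   20    200  4000 200 20
A   11    300  4300 400 10.75
A   12    100  5500 500 11
```

Each symbol accumulates its own `v` and `s`, so `v%s` in any row is exactly the running VWAP for that symbol up to that trade.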
+
+The first section of the script simply parses the command-line
+arguments and uses these to update some default values – identical
+code to the start of `RealTimeTradeWithAsofQuotes.q`, described in the next section.
+
+
+#### Initialization
+
+`InitializeTrade` defines the behavior of this RTE after connecting to the TP and subscribing to the `trade` table.
+This RTE will replay the TP’s logfile, much like the RDB. The `InitializeTrade` function replaces [`.u.rep`](../../architecture/rq.md#urep).
+
+This binary function `InitializeTrade` will be executed upon startup. It is passed two arguments, just like `.u.rep`:
+
+argument | semantics
+------------|----------
+`TradeInfo` | pair: table name (`` `trade``); empty table definition
+`logfile` | pair: TP logfile record count and location

-The first section of the script simply parses the command-line arguments and uses these to update some default values:
+The `vwap` table is then simply defined as:
+
+```q
+/this keyed table maps a symbol to its current vwap
+vwap:([sym:`$()] rvwap:`float$())
+```
+
+When `InitializeTrade` is executed, the TP logfile will be replayed and its contents executed using [`-11!`](../../basics/internal.md#-11-streaming-execute).
+
+Replaying will cause the `upd` function to be executed, which in this script is defined to insert `trade` records into the `trade` table and ignore updates for any other table.
+
+#### Intraday update behavior
+
+`updIntraDay` is called whenever a trade update comes in: the VWAP for each affected symbol is updated and the new trades are enriched with this information.
+
+#### End of day
+
+At end of day, the tickerplant sends a message to all RTEs telling them to invoke their EOD function (`.u.end`). The function will clear the tables used on this RTE.
+
+#### Subscribe to TP
+
+The RTE connects to the TP and subscribes to the `trade` table for user-specified symbols.
+The RTE also requests TP logfile information (for replay purposes):
+
+```q
+h:hopen hsym first args`tp /connect to tickerplant
+InitializeTrade . h "(.u.sub[`trade;",(.Q.s1 args`syms),"];`.u `i`L)"
+upd:updIntraDay /switch upd to intraday update mode
+```
+
+The message returned from the TP is passed to the function `InitializeTrade`.
+Once the RTE has finished initializing or replaying the TP logfile, the definition of `upd` is then switched to `updIntraDay` so the RTE can deal with intraday updates appropriately.
+
+### Real-time trade with as-of quotes
+
+#### Overview
+
+One of the most popular and powerful joins in the q language is the [`aj`](../../ref/aj.md) function. This keyword was added to the language to solve a specific problem – how to join trade and quote tables together in such a way that for each trade, we grab the prevalent quote _as of_ the time of that trade. In other words, what is the last quote at or prior to the trade?
+
+This function is relatively easy to use for one-off joins. However, what if you want to maintain trades with as-of quotes in real time? This section describes how to build an RTE with real-time trades and as-of quotes.
+
+One additional feature this script demonstrates is the ability of any q process to write to and maintain its own kdb+ binary logfile for replay/recovery purposes. In this case, the RTE maintains its own daily logfile for trade records. This will be used for recovery in place of the standard tickerplant logfile.
+
+#### Example script
+
+This is a heavily modified version of an RDB ([`r.q`](../../architecture/rq.md)), written by the author and named `RealTimeTradeWithAsofQuotes.q`.

```q
/ The purpose of this script is as follows:
-1. Demonstrate how custom real-time subscribers can be created in q
+1. Demonstrate how custom RTEs can be created in q
-2. In this example, create an efficient engine for calculating
+2. 
In this example, create an efficient engine for calculating the prevalent quotes as of trades in real-time. This removes the need for ad-hoc invocations of the aj function. -3. In this example, this subscriber also maintains its own binary +3. In this example, this subscriber also maintains its own binary log file for replay purposes. This replaces the standard tickerplant log file replay functionality. \ @@ -604,132 +775,61 @@ show "RealTimeTradeWithAsofQuotes.q" /sample usage /q tick/RealTimeTradeWithAsofQuotes.q -tp localhost:5000 -syms MSFT.O IBM.N GS.N -/default command line arguments - tp is location of tickerplant. +/default command line arguments - tp is location of tickerplant. /syms are the symbols we wish to subscribe to -default:`tp`syms!("::5000";"") - +default:`tp`syms!("::5000";"") + args:.Q.opt .z.x /transform incoming cmd line arguments into a dictionary -args:`$default,args /upsert args into default +args:`$default,args /upsert args into default args[`tp] : hsym first args[`tp] -/drop into debug mode if running in foreground AND +/drop into debug mode if running in foreground AND /errors occur (for debugging purposes) -\e 1 +\e 1 if[not "w"=first string .z.o;system "sleep 1"] -``` - -The error flag above is set for purely testing purposes – when the developer runs this script in the foreground, if errors occur at runtime as a result of incoming IPC messages, the process will drop into debug mode. For example, if there is a problem with the definition of `upd`, then when an update is received from the tickerplant we will drop into debug mode and (hopefully) identify the issue. - -#### Initialize desired table schemas - -The next section of code defines the behavior of this RTS upon connecting and subscribing to the tickerplant’s trade and quote tables. This function replaces `.u.rep` in `r.q`: - -```q -/initialize schemas for custom real-time subscriber +/initialize schemas for custom RTE InitializeSchemas:`trade`quote! 
( {[x]`TradeWithQuote insert update bid:0n,bsize:0N,ask:0n,asize:0N from x}; {[x]`LatestQuote upsert select by sym from x} ); -``` -The RTS’s trade table (named `TradeWithQuote`) maintains `bid`, `bsize`, `ask` and `asize` columns of appropriate type. For the quote table, we just maintain a keyed table called `LatestQuote`, keyed on `sym` which will maintain the most recent quote per symbol. This table will be used when joining prevalent quotes to incoming trades. - - -#### Intraday update behavior - -The next code section defines the intraday behavior upon receiving new trades: - -```q /intraday update functions /Trade Update /1. Update incoming data with latest quotes -/2. Insert updated data to TradeWithQuote table -/3. Append message to custom logfile +/2. Insert updated data to TradeWithQuote table +/3. Append message to custom logfile updTrade:{[d] d:d lj LatestQuote; `TradeWithQuote insert d; LogfileHandle enlist (`replay;`TradeWithQuote;d); } -``` - -Besides inserting the new trades with prevalent quote information into the trade table, the above function also appends the new records to its custom logfile. This logfile will be replayed upon recovery/startup of the RTS. Note that the replay function is named `replay`. This differs from the conventional TP logfile where the replay function was called `upd`. - -The next section defines the intraday behavior upon receiving new quotes: -```q /Quote Update -/1. Calculate latest quote per sym for incoming data +/1. Calculate latest quote per sym for incoming data /2. Update LatestQuote table updQuote:{[d] `LatestQuote upsert select by sym from d; } -``` -The following dictionary `upd` acts as a case statement – when an update for the trade table is received, `updTrade` will be triggered with the message as argument. Likewise, when an update for the quote table is received, `updQuote` will be triggered. 
- -```q /upd dictionary will be triggered upon incoming update from tickerplant upd:`trade`quote!(updTrade;updQuote) -``` - -In `r.q`, `upd` is defined as a function, not a dictionary. However we can use this dictionary definition for reasons discussed previously. - -#### End of day - -At end of day, the tickerplant sends a message to all real-time subscribers telling them to invoke their EOD function – `.u.end`: - -```q /end of day function - triggered by tickerplant at EOD .u.end:{ - hclose LogfileHandle; /close the connection to the old log file + hclose LogfileHandle; /close the connection to the old log file /create the new logfile - logfile::hsym `$"RealTimeTradeWithAsofQuotes_",string .z.D; - .[logfile;();:;()]; /Initialize the new log file + logfile::hsym `$"RealTimeTradeWithAsofQuotes_",string .z.D; + .[logfile;();:;()]; /Initialize the new log file LogfileHandle::hopen logfile; - {delete from x}each tables `. /clear out tables } -``` + {delete from x}each tables `. /clear out tables + } -This function has been heavily modified from `r.q` to achieve the following desired behavior: - -`hclose LogfileHandle` - -: Close connection to the custom logfile. - -``logfile::hsym `$"RealTimeTradeWithAsofQuotes_",string .z.D`` - -: Create the name of the new custom logfile. This logfile is a daily logfile – meaning it only contains one day’s trade records and it has today’s date in its name, just like the tickerplant’s logfile. - -`.[logfile;();:;()]` - -: Initialize this logfile with an empty list. - -`LogfileHandle::hopen logfile` - -: Establish a connection (handle) to this logfile for streaming writes. - -``{delete from x}each tables `.`` - -: Empty out the tables. - - -#### Replay custom logfile - -This section concerns the initialization and replay of the RTS’s custom logfile. 
-
-```q
 /Initialize name of custom logfile
-logfile:hsym `$"RealTimeTradeWithAsofQuotes_",string .z.D
+logfile:hsym `$"RealTimeTradeWithAsofQuotes_",string .z.D;
 replay:{[t;d]t insert d} /custom log file replay function
-```
-
-At this point, the name of today’s logfile and the definition of the logfile replay function have been established. The replay function will be invoked when replaying the process’s custom daily logfile. It is defined to simply insert the on-disk records into the in-memory (`TradeWithQuote`) table. This will be a fast operation ensuring recovery is achieved quickly and efficiently.
-
-Upon startup, the process uses a try-catch to replay its custom daily logfile. If it fails for any reason (possibly because the logfile does not yet exist), it will send an appropriate message to standard out and will initialize this logfile. Replay of the logfile is achieved with the standard operator `-11!` as discussed previously.
-```q
 /attempt to replay custom log file
 @[{-11!x;show"successfully replayed custom log file"}; logfile;
   {[e]
@@ -737,30 +837,33 @@
     show m," - assume it does not exist. Creating it now";
     .[logfile;();:;()]; /Initialize the log file
   } ]
-```
-
-Once the logfile has been successfully replayed/initialized, a handle (connection) is established to it for subsequent streaming appends (upon new incoming trades from tickerplant):
-```q
- /open a connection to log file for writing
+/open a connection to log file for writing
 LogfileHandle:hopen logfile
+
+/ connect to tickerplant and subscribe to trade and quote for portfolio
+h:hopen args`tp; /connect to tickerplant
+InitializeSchemas . h(".u.sub";`trade;args`syms);
+InitializeSchemas . h(".u.sub";`quote;args`syms);
 ```

+This process should be started off as follows:

-#### Subscribe to TP
+```bash
+q tick/RealTimeTradeWithAsofQuotes.q -tp localhost:5000 -syms MSFT.O IBM.N GS.N -p 5003
+```

-The next part of the script is probably the most critical – the process connects to the tickerplant and subscribes to the trade and quote table for user-specified symbols.
+This process will subscribe to both trade and quote tables for symbols `MSFT.O`, `IBM.N` and `GS.N` and will listen on port 5003. The author has deliberately made some of the q syntax more easily understandable compared to `r.q`.

-```q
-/ connect to tickerplant and subscribe to trade and quote for portfolio
-h:hopen args`tp /connect to tickerplant
-InitializeSchemas . h(".u.sub";`trade;args`syms)
-InitializeSchemas . h(".u.sub";`quote;args`syms)
-```
+The first section of the script simply parses the command-line arguments and uses these to update some default values.

-The output of a subscription to a given table (for example `trade`) from the tickerplant is a 2-list, as discussed previously. This pair is in turn passed to the function `InitializeSchemas`.
+The error flag [`\e`](../../basics/syscmds.md#e-error-trap-clients) is set purely for testing purposes.
+When the developer runs this script in the foreground,
+if errors occur at runtime as a result of incoming IPC messages, the process will drop into debug mode.
+For example, if there is a problem with the definition of `upd`, then when an update is received from the tickerplant
+we will drop into debug mode and (hopefully) identify the issue.
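Command-line parsing of this kind is conventionally done with `.Q.opt` and `.Q.def`; a sketch, under the assumption that the defaults mirror the startup command above (the default values shown are illustrative, not the script's actual ones):

```q
/ sketch: build the args dictionary from the command line
/ .z.x is the list of command-line tokens after the script name
d:.Q.opt .z.x                                      / e.g. `tp`syms`p!...
args:.Q.def[`tp`syms!(`$"localhost:5000";`MSFT.O)] d
/ args`tp   -> `localhost:5000 unless -tp was supplied
/ args`syms -> symbol list parsed from -syms MSFT.O IBM.N GS.N
```

`.Q.def` casts each supplied string to the type of its default, so `-syms MSFT.O IBM.N GS.N` arrives as a symbol list without further conversion.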
@@ -773,190 +876,106 @@
-We can see this RTS in action by examining the five most recent trades for `GS.N`:
+We can see this RTE in action by examining the five most recent trades for `GS.N`:

 ```q
 q)-5#select from TradeWithQuote where sym=`GS.N
 time                 sym  price    size bid      bsize ask      asize
 ---------------------------------------------------------------------
 0D21:51:03.317152000 GS.N 178.8314 198  178.8296 915   178.8587 480
 ```

+#### Initialize desired table schemas

-### Real-time VWAP subscriber
-
-This section describes how to build an RTS which enriches trade with VWAP (volume-weighted average price) information on a per-symbol basis. A VWAP can be defined as:
-
-$$ VWAP = \frac{\sum_{i} (tradevolume_i)(tradeprice_i)}{\sum_{i} (tradevolume_i)} $$
-
-Consider the following sample trade table:
-
-```q
-q)trade /examine the first few trade records
-time                 sym    price     size
-------------------------------------------
-0D21:46:24.977505000 GS.N   178.56665 28
-0D21:46:24.977505000 IBM.N  191.22174 66
-0D21:46:25.977501000 MSFT.O 45.106284 584
-0D21:46:26.977055000 GS.N   178.563   563
-0D21:46:26.977055000 GS.N   178.57841 624
-0D21:46:27.977626000 GS.N   178.58783 995
-0D21:46:27.977626000 MSFT.O 45.110017 225
-..
-```
-
-An additional column called `rvwap` (running VWAP) will be added to this table. In any given row, `rvwap` will contain the VWAP up until and including that particular trade record (for that particular symbol):
-
-```q
-time                 sym    price     size rvwap
-----------------------------------------------------
-0D21:46:24.977505000 GS.N   178.56665 28   178.47318
-0D21:46:24.977505000 IBM.N  191.22174 66   191.17041
-0D21:46:25.977501000 MSFT.O 45.106284 584  45.147046
-0D21:46:26.977055000 GS.N   178.563   563  178.47366
-0D21:46:26.977055000 GS.N   178.57841 624  178.47428
-0D21:46:27.977626000 GS.N   178.58783 995  178.47533
-0D21:46:27.977626000 MSFT.O 45.110017 225  45.146982
-..
-```
-
-This `rvwap` column will need to be maintained as new trade records arrive from the TP. In order to achieve this, two additional columns need to be maintained per row – `v` and `s`:
-
-column | content
--------|--------
-`v`    | cumulative value of all trades for that symbol (define value as the trade price multiplied by the trade size)
-`s`    | cumulative size (quantity) of all trades for that symbol
+`InitializeSchemas` defines the behavior of this RTE upon connecting and subscribing to the tickerplant’s trade and quote tables.
+`InitializeSchemas` (defined as a dictionary which maps table names to unary function definitions) replaces [`.u.rep`](../../architecture/rq.md#urep) in `r.q`:

-`v` and `s` are the numerator and denominator respectively in the formula given at the start of this section. `rvwap` can then be simply calculated as `v` divided by `s`. The `trade` table would then become:
+The RTE’s trade table (named `TradeWithQuote`) maintains `bid`, `bsize`, `ask` and `asize` columns of appropriate type.
+For the quote table, we just maintain a keyed table called `LatestQuote`, keyed on `sym` which will maintain the most recent quote per symbol.
+This table will be used when joining prevalent quotes to incoming trades.

-```q
-time                 sym    price     size v         s      rvwap
----------------------------------------------------------------------
-0D21:46:24.977505000 GS.N   178.56665 28   18687391  104707 178.47318
-0D21:46:24.977505000 IBM.N  191.22174 66   20497292  107220 191.17041
-0D21:46:25.977501000 MSFT.O 45.106284 584  5865278.4 129915 45.147046
-0D21:46:26.977055000 GS.N   178.563   563  18787922  105270 178.47366
-0D21:46:26.977055000 GS.N   178.57841 624  18899355  105894 178.47428
-0D21:46:27.977626000 GS.N   178.58783 995  19077050  106889 178.47533
-0D21:46:27.977626000 MSFT.O 45.110017 225  5875428.2 130140 45.146982
-..
-```
-
-A simple keyed table called `vwap` is also maintained. This table simply maps a symbol to its current VWAP. Based on the above sample data, `vwap` would look like:
+#### Intraday update behavior

-```q
-sym    | rvwap
--------| ---------
-GS.N   | 178.47533
-IBM.N  | 191.17041
-MSFT.O | 45.146982
-```
+`updTrade` defines the intraday behavior upon receiving new trades.

-Just like the previous RTS example, this solution will comprise a heavily modified version of `r.q`, written by the author and named `real_time_vwap.q`.
+Besides inserting the new trades with prevalent quote information into the trade table, `updTrade`
+also appends the new records to its custom logfile. This logfile will be replayed upon recovery/startup of the RTE.
+Note that the replay function is named `replay`. This differs from the conventional TP logfile where the replay function was called `upd`.

-This process should be started off as follows:
+`updQuote` defines the intraday behavior upon receiving new quotes.

-```bash
-q tick/real_time_vwap.q -tp localhost:5000 -syms MSFT.O IBM.N GS.N -p 5004
-```
+The `upd` dictionary acts as a case statement – when an update for the trade table is received,
+`updTrade` will be triggered with the message as argument.
+Likewise, when an update for the quote table is received, `updQuote` will be triggered.

-This process will subscribe to only the `trade` table for symbols `MSFT.O`, `IBM.N` and `GS.N` and will listen on port 5004. The structure and design philosophy behind `real_time_vwap.q` is very similar to `RealTimeTradeWithAsofQuotes.q`.
+In `r.q`, `upd` is defined as a function, not a dictionary. However we can use this dictionary definition for reasons discussed previously.

-The first section of the script simply parses the command-line
-arguments and uses these to update some default values – identical
-code to the start of `RealTimeTradeWithAsofQuotes.q`.
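The dictionary-as-case-statement dispatch can be sketched as follows (toy handlers standing in for the real `updTrade` and `updQuote`):

```q
/ toy handlers, illustrative only
updTrade:{[d] -1"trade handler got ",string count d;}
updQuote:{[d] -1"quote handler got ",string count d;}
upd:`trade`quote!(updTrade;updQuote)
/ upd[`trade;d] indexes the dictionary at `trade, then applies updTrade to d
upd[`trade;([] price:100 101f)]
```

This works because applying a dictionary of functions at depth, `upd[t;d]`, first looks up the handler for `t` and then applies it to `d` – the same call shape the tickerplant uses, which is why a dictionary can stand in for the `upd` function of `r.q`.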
+#### End of day

-#### Initialize desired table schemas
+At end of day, the tickerplant sends a message to all RTEs telling them to invoke their EOD function (`.u.end`):

-The next section of code defines the behavior of this RTS upon connecting to the TP and subscribing to the `trade` table. This RTS will replay the TP’s logfile, much like the RDB. The following function replaces `.u.rep`.
+This function has been heavily modified from `r.q` to achieve the following desired behavior:

-```q
-/initialize schema function
-InitializeTrade:{[TradeInfo;logfile]
-  `trade set TradeInfo 1;
-  if[null first logfile;update v:0n,s:0Ni,rvwap:0n from `trade;:()];
-  -11!logfile;
-  update v:sums (size*price),s:sums size by sym from `trade;
-  update rvwap:v%s from `trade; }
-```
+* `hclose LogfileHandle`
+    * Close connection to the custom logfile.
+* ``logfile::hsym `$"RealTimeTradeWithAsofQuotes_",string .z.D``
+    * Create the name of the new custom logfile. This logfile is a daily logfile – meaning it only contains one day’s trade records and it has today’s date in its name, just like the tickerplant’s logfile.
+* `.[logfile;();:;()]`
+    * Initialize this logfile with an empty list.
+* `LogfileHandle::hopen logfile`
+    * Establish a connection (handle) to this logfile for streaming writes.
+* ``{delete from x}each tables `.``
+    * Empty out the tables.

-This binary function `InitializeTrade` will be executed upon startup. It is passed two arguments, just like `.u.rep`:
-
-argument    | semantics
-------------|----------
-`TradeInfo` | pair: table name (`` `trade``); empty table definition
-`Logfile`   | pair: TP logfile record count and location
+#### Replay custom logfile

-The `vwap` table is then simply defined as:
+This section concerns the initialization and replay of the RTE’s custom logfile.

 ```q
-/this keyed table maps a symbol to its current vwap
-vwap:([sym:`$()] rvwap:`float$())
+/Initialize name of custom logfile
+logfile:hsym `$"RealTimeTradeWithAsofQuotes_",string .z.D
-```
-
-When `InitializeTrade` is executed, the TP logfile will be replayed using `-11!`. For the purpose of this replay, the function `upd` is simply defined as:

-```q
-/For TP logfile replay, upd is a simple insert for trades
-upd:{if[not `trade=x;:()];`trade insert y}
+replay:{[t;d]t insert d} /custom log file replay function
 ```

-In other words, insert `trade` records into the `trade` table and ignore `quote` records.
-
-
-#### Intraday update behavior
+At this point, the name of today’s logfile and the definition of the logfile replay function have been established. The replay function will be invoked when replaying the process’s custom daily logfile. It is defined to simply insert the on-disk records into the in-memory (`TradeWithQuote`) table. This will be a fast operation, ensuring recovery is achieved quickly and efficiently.

-The next code section defines the intraday behavior upon receiving
-new trades:
+Upon startup, the process uses a try-catch to replay its custom daily logfile. If it fails for any reason (possibly because the logfile does not yet exist), it will send an appropriate message to standard out and will initialize this logfile. Replay of the logfile is achieved with the standard operator `-11!` as discussed previously.

 ```q
-/
-This intraday function is triggered upon incoming updates from TP.
-Its behavior is as follows:
-1. Add s and v columns to incoming trade records
-2. Increment incoming records with the last previous s and v values
-   (on per sym basis)
-3. Add rvwap column to incoming records (rvwap is v divided by s)
-4. Insert these enriched incoming records to the trade table
-5. Update vwap table
-\
-updIntraDay:{[t;d]
-  d:update s:sums size,v:sums size*price by sym from d;
-  d:d pj select last v,last s by sym from trade;
-  d:update rvwap:v%s from d;
-  `trade insert d;
-  `vwap upsert select last rvwap by sym from trade; }
+/attempt to replay custom log file
+@[{-11!x;show"successfully replayed custom log file"}; logfile;
+  {[e]
+    m:"failed to replay custom log file";
+    show m," - assume it does not exist. Creating it now";
+    .[logfile;();:;()]; /Initialize the log file
+  } ]
 ```

-So whenever a trade update comes in, the VWAP for each affected symbol is updated and the new trades are enriched with this information.
-
-
-#### End of day
-
-The EOD behavior on this RTS is very simple – clear out the tables:
+Once the logfile has been successfully replayed/initialized, a handle (connection) is established to it for subsequent streaming appends (upon new incoming trades from tickerplant):

 ```q
-/end of day function - triggered by tickerplant at EOD
-/Empty tables
-.u.end:{{delete from x}each tables `. } /clear out trade and vwap tables
+ /open a connection to log file for writing
+LogfileHandle:hopen logfile
 ```

-
 #### Subscribe to TP

-The RTS connects to the TP and subscribes to the `trade` table for user specified symbols. The RTS also requests TP logfile information (for replay purposes):
+The next part of the script is probably the most critical – the process connects to the tickerplant and subscribes to the trade and quote table for user-specified symbols.

 ```q
-h:hopen args`tp /connect to tickerplant
-InitializeTrade . h "(.u.sub[`trade;",(.Q.s1 args`syms),"];`.u `i`L)"
-upd:updIntraDay /switch upd to intraday update mode
+/ connect to tickerplant and subscribe to trade and quote for portfolio
+h:hopen args`tp; /connect to tickerplant
+InitializeSchemas . h(".u.sub";`trade;args`syms);
+InitializeSchemas . h(".u.sub";`quote;args`syms);
 ```

-The message returned from the TP is passed to the function `InitializeTrade`. Once the RTS has finished initializing or replaying the TP logfile, the definition of `upd` is then switched to `updIntraDay` so the RTS can deal with intraday updates appropriately.
+The output of a subscription to a given table (for example `trade`) from the tickerplant is a 2-list, as discussed previously. This pair is in turn passed to the function `InitializeSchemas`.

 ## Performance considerations

-The developer can build the RTS to achieve whatever real-time behavior is desired. However from a performance perspective, not all RTS instances are equal. The standard RDB is highly performant – meaning it should be able process updates at a very high frequency without maxing out CPU resources. In a real world environment, it is critical that the RTS can finish processing an incoming update before the next one arrives. The high level of RDB performance comes from the fact that its definition of `upd` is extremely simple:
+The developer can build the RTE to achieve whatever real-time behavior is desired. However from a performance perspective, not all RTE instances are equal. The standard RDB is highly performant – meaning it should be able to process updates at a very high frequency without maxing out CPU resources. In a real world environment, it is critical that the RTE can finish processing an incoming update before the next one arrives. The high level of RDB performance comes from the fact that its definition of `upd` is extremely simple:

 ```q
 upd:insert
 ```

 In other words, for both TP logfile replay and intraday updates, simply insert the records into the table. It doesn’t take much time to execute `insert` in kdb+. However, the two custom RTE instances discussed in this white paper have more complicated definitions of `upd` for intraday updates and will therefore be less performant. This section examines this relative performance.

 For this test, the TP log will be used. This particular TP logfile has the following characteristics:

@@ -1004,7 +1023,7 @@ upd:{[tblName;tblData]
 ```

 This transformed logfile will now be used to test performance on the
-RDB and two RTS instances.
+RDB and two RTE instances.

 On the RDB, we obtained the following performance:

 ```q
 q)\ts value each logs /execute each update
 ```

 It took 289 milliseconds to process over a quarter of a million updates, where each update had two records. Therefore, the average time taken to process a single two-row update is 1µs.

@@ -1020,7 +1039,7 @@
-In the first example RTS (Real-time Trade With As-of Quotes), we obtained the following performance:
+In the first example RTE (Real-time Trade With As-of Quotes), we obtained the following performance:

 ```q
 q)upd /custom real time update behavior
@@ -1038,12 +1057,12 @@ q)\ts value each logs /execute each update
 ```

 It took 2185 milliseconds to process over a quarter of a million updates, where each update had two records. Therefore, the average time taken to process a single two-row update is 7.7 µs – over seven times slower than RDB.

-In the second example RTS (Real-time VWAP), we obtained the following performance:
+In the second example RTE (Real-time VWAP), we obtained the following performance:

 ```q
 /
 Because there are trades and quotes in the logfile
-but this RTS is only designed to handle trades,
+but this RTE is only designed to handle trades,
 a slight change to upd is necessary
 for the purpose of this performance experiment
 \
@@ -1059,16 +1078,16 @@ q)\ts value each logs /execute each update
 ```

 It took 9639 milliseconds to process over a quarter of a million updates, where each update had two records. Therefore, the average time taken to process a single two-row update is 34 µs – over thirty times slower than RDB.

-We can conclude that there was a significant difference in performance in processing updates across the various real-time subscribers. However even in the worst case, assuming the TP updates arrive no more frequently than once every 100 µs, the process should still function well.
+We can conclude that there was a significant difference in performance in processing updates across the various RTEs. However even in the worst case, assuming the TP updates arrive no more frequently than once every 100 µs, the process should still function well.

 It should be noted that prior to this experiment being carried out on each process, all tables were emptied.

 ## Conclusions

-This white paper explained the inner workings of the standard real-time database subscriber as well as an overview of the rest of the kdb+tick environment. The white paper then detailed examples of customizing the RDB to achieve useful real-time analytical behavior.
+This white paper explained the inner workings of the standard RDB as well as an overview of the rest of the kdb+tick environment. The white paper then detailed examples of customizing the RDB to achieve useful real-time analytical behavior.

-It’s important when building a custom RTS to consider the performance implications of adding complexity to the update logic. The more complex the definition of `upd`, the longer it will take to process intraday updates or replay the TP logfile.
+It’s important when building a custom RTE to consider the performance implications of adding complexity to the update logic. The more complex the definition of `upd`, the longer it will take to process intraday updates or replay the TP logfile.
 In the case of intraday updates, it is important to know the frequency of TP updates in order to know how much complexity you can afford to build into your `upd` function.

 It is the aim of the author that the reader will now have the understanding of how a kdb+tick subscriber can be built and customized fairly easily according to the requirements of the system.

diff --git a/mkdocs.yml b/mkdocs.yml
index bbfc99057..34f261cf0 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -464,7 +464,6 @@ nav:
         - SSL/TLS: kb/ssl.md
         - HTTP: kb/http.md
         - WebSockets: kb/websockets.md
-        - Interprocess communication (WP): wp/ipc/index.md
       - Tools:
         - Code profiler: kb/profiler.md
         - Debugging: basics/debug.md
@@ -523,20 +522,24 @@ nav:
     - Developer tools: devtools.md
     - FAQ: kb/faq-listbox.md
   - Streaming:
-    - General architecture: architecture/index.md
+    - General architecture:
+      - Overview: architecture/index.md
+      - kdb-tick:
+        - Tickerplant (tick.q): architecture/tickq.md
+        - Tickerplant pub/sub (u.q): architecture/uq.md
+        - RDB (r.q): architecture/rq.md
     - Alternative architecture: kb/kdb-tick.md
     - Alternative in-memory layouts: kb/alternative-in-memory-layouts.md
     - Corporate actions: kb/corporate-actions.md
-    - Data recovery: wp/data-recovery.md
+    - TP Log (data recovery): wp/data-recovery.md
    - Disaster recovery: wp/disaster-recovery/index.md
    - Gateway design: wp/gateway-design/index.md
    - Profiling: wp/tick-profiling.md
    - Kubernetes: 'https://youtu.be/jqtkkCqBvr4'
    - Load balancing: kb/load-balancing.md
    - Order Book: wp/order-book.md
-    - Publish and subscribe: kb/publish-subscribe.md
    - Query Routing: wp/query-routing/index.md
-    - Real-time tick subscribers: wp/rt-tick/index.md
+    - Real-time engines: wp/rt-tick/index.md
    - Advanced:
      - Distributed systems: wp/query-interface.md
      - Intraday writedown: wp/intraday-writedown/index.md