Statistical Nodes API

github-actions[bot] edited this page Apr 10, 2024 · 1 revision

This page contains the documentation for the csp.stats library. The stats library contains functions to calculate statistics on time series data over rolling windows.

Table of Contents

Base Statistics:

  • count: counts the number of data ticks within a given interval
  • unique: counts the number of unique values within a given interval
  • sum: rolling sum of values within a given interval
  • prod: rolling product of values within a given interval
  • first: the earliest value still within the interval
  • last: the last value of the interval
  • mean: the mean of values within the interval
  • gmean: the geometric mean of values within the interval

Order Statistics:

  • max: the maximum value within the interval
  • min: the minimum value within the interval
  • median: the median value within the interval
  • quantile: the quantile value within the interval
  • argmin: the time at which the minimum interval value ticked
  • argmax: the time at which the maximum interval value ticked
  • rank: the time series rank of the most recent tick in the interval

Moment-Based Statistics:

  • var: variance of the time series within the interval
  • stddev: standard deviation within the interval
  • sem: standard error within the interval
  • cov: covariance between two in-sequence time series within the interval
  • corr: correlation between two in-sequence time series within the interval
  • skew: skewness of the time series within the interval
  • kurt: kurtosis (or excess kurtosis) of the time series within the interval

Exponential Moving Statistics:

  • ema: exponential moving average, with numerous different variations available
  • ema_var: exponential moving variance
  • ema_std: exponential moving standard deviation
  • ema_cov: exponential moving covariance between two in-sequence time series

NumPy Specific Statistics:

  • cov_matrix: covariance matrix between N time-series (in a NumPy array) over a rolling time interval
  • corr_matrix: normalized correlation matrix between N time-series (in a NumPy array) over a rolling time interval
  • list_to_numpy: converts a listbasket of time-series into a NumPy array
  • numpy_to_list: converts a NumPy array time-series into a listbasket

Cross-Sectional Statistics:

  • cross_sectional: receive all data within the current window for a cross-sectional calculation

Base Statistics

Count

count(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data.
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, ignores NaN values in the window (does not count them). If False, any NaN value in the window makes the count NaN.
    • By default, ignore_na is True
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
    • By default, there is no reset series.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
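
The three sampler rules above can be sketched in plain Python. This is an illustration only: it assumes ticks are held in a dict keyed by time, and apply_sampler is a hypothetical helper, not part of csp.stats.

```python
import math

def apply_sampler(x_ticks, sampler_times):
    """Return the observations a statistic would see after applying the sampler rules.

    x_ticks: dict mapping time -> value; sampler_times: iterable of times.
    """
    sampled = {}
    for t in sorted(sampler_times):
        # x and sampler both tick: the value is used as-is.
        # sampler ticks without x: the observation is treated as NaN.
        sampled[t] = x_ticks.get(t, math.nan)
    # x ticks without sampler: dropped, because that time is never visited.
    return sampled

# Times are plain day numbers here to keep the sketch short.
x = {1: 1.0, 2: 2.0, 3: 3.0, 4: math.nan, 5: 5.0}
sampler = [1, 2, 3, 5, 6]
result = apply_sampler(x, sampler)
print(result)
```

Day 4 is absent from the result because the sampler did not tick, and day 6 appears as NaN because the sampler ticked without x.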

Returns:

  • A time-series of how many data points are currently in the interval. If a tick count is used, then it is necessarily less than or equal to the interval.

Examples: count

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
count(x, interval=3)
# NaN is not counted
{'2020-01-03': 3, '2020-01-04': 2, '2020-01-05': 2}

2. Including NaN

count(x, interval=3, ignore_na=False)
{'2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}

3. Triggering

trigger = {'2020-01-03': True, '2020-01-05': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), ignore_na=True, trigger=trigger)
{'2020-01-03': 3, '2020-01-05': 2}

4. Sampling

sampler = {'2020-01-01': True, '2020-01-02': True, '2020-01-03': True, '2020-01-05': True, '2020-01-06': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), sampler=sampler)
{'2020-01-03': 3, '2020-01-05': 2}

Note: the x value at 2020-01-04 is ignored completely since sampler does not tick, while the value at 2020-01-06 is treated as NaN.

5. Reset

reset = {'2020-01-04': True}
count(x, interval=timedelta(days=3), min_window=timedelta(days=2), reset=reset)
{'2020-01-03': 3, '2020-01-04': 0, '2020-01-05': 1}

Note: the window data is reset at 2020-01-04, and its value is NaN, so the count is 0

6. NumPy

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,3]}
count(x_np, interval=3, min_window=1)
{'2020-01-01': [1,1], '2020-01-02': [2,1], '2020-01-03': [3,2]} # count is per element
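
As a cross-check of the triggering example above, here is a minimal pure-Python sketch of a time-based rolling count. It is not csp's implementation. rolling_count_time is a hypothetical helper, and it assumes that min_window is measured from the start time and that the window keeps ticks with now - interval < t <= now.

```python
import math
from datetime import datetime, timedelta

def rolling_count_time(ticks, interval, min_window, start, trigger_times, ignore_na=True):
    """Count the ticks in a time window, evaluated only at the trigger times.

    ticks: list of (time, value) pairs sorted by time.
    """
    out = {}
    for now in trigger_times:
        if now - start < min_window:
            continue  # assumption: no output until min_window has elapsed
        # The left endpoint is exclusive: keep now - interval < t <= now.
        window = [v for t, v in ticks if now - interval < t <= now]
        nan_count = sum(1 for v in window if math.isnan(v))
        if nan_count and not ignore_na:
            out[now] = math.nan
        else:
            out[now] = len(window) - nan_count
    return out

start = datetime(2020, 1, 1)
values = [1.0, 2.0, 3.0, math.nan, 5.0]
ticks = [(start + timedelta(days=i), v) for i, v in enumerate(values)]
triggers = [datetime(2020, 1, 3), datetime(2020, 1, 5)]
counts = rolling_count_time(ticks, timedelta(days=3), timedelta(days=2), start, triggers)
print(counts)
```

Under these assumptions the sketch gives 3 at 2020-01-03 and 2 at 2020-01-05, matching the triggering example.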

Unique

unique(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0,
    precision: int = 10
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
    • By default, there is no reset series.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
  • precision: the decimal place precision at which two floats are considered non-unique. For example, if precision=2, then 2.001 and 2.002 would be considered non-unique.
    • By default, precision is set to 10 decimal places.

Returns:

  • a time-series of how many unique (excluding nan) values are currently in the interval

Examples: unique

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 2, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 3}
unique(x, interval=3, min_window=2)
{'2020-01-02': 1, '2020-01-03': 2, '2020-01-04': 2, '2020-01-05': 1}

2. Triggering

trigger = {'2020-01-03': True, '2020-01-05': True}
unique(x, interval=timedelta(days=3), min_window=timedelta(days=2), trigger=trigger)
{'2020-01-03': 2, '2020-01-05': 1}

3. NumPy

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,1]}
unique(x_np, interval=3, min_window=1)
{'2020-01-01': [1,1], '2020-01-02': [2,1], '2020-01-03': [3,1]} # unique is per element
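
The effect of precision can be illustrated with a small reference sketch. It assumes values are rounded to precision decimal places before comparison; whether csp rounds or truncates internally is not stated here, so treat that as an assumption. rolling_unique is a hypothetical helper.

```python
import math
from collections import deque

def rolling_unique(values, interval, min_window=None, precision=10):
    """Tick-based count of distinct non-NaN values in the window."""
    min_window = interval if min_window is None else min_window
    window = deque(maxlen=interval)  # keeps only the last `interval` ticks
    out = []
    for v in values:
        window.append(v)
        if len(window) < min_window:
            continue  # not enough ticks yet, so no output
        # Round before comparing; NaN never counts as a unique value.
        distinct = {round(w, precision) for w in window if not math.isnan(w)}
        out.append(len(distinct))
    return out

x = [2.0, 2.0, 3.0, math.nan, 3.0]
print(rolling_unique(x, interval=3, min_window=2))              # [1, 2, 2, 1]
print(rolling_unique([2.001, 2.002], interval=2, precision=2))  # [1]
```

The first call reproduces the default-behavior example above. The second shows 2.001 and 2.002 collapsing to a single value at precision=2.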

Sum

sum(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    precise: bool = False,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data. Can be either a ts[float] or a ts[np.ndarray].
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • precise: if True we use a more numerically stable implementation (Kahan) which is less efficient
  • ignore_na: if True, does not include any nan values in the window. If False, nan values are included and will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted sum (optional).
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling sums over the interval

Examples: sum

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
sum(x, interval=3)
{'2020-01-03': 6, '2020-01-04': 5, '2020-01-05': 8}

2. Including NaNs

sum(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 3, '2020-01-03': 6, '2020-01-04': nan, '2020-01-05': nan}

3. Weighted single input

weights = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-04': 3}
sum(x, interval=3, weights=weights)
{'2020-01-03': 11, '2020-01-04': 10, '2020-01-05': 21} # 21 = 5x3 + 3x2

4. NumPy

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,1]}
sum(x_np, interval=3, min_window=1)
{'2020-01-01': [1,1], '2020-01-02': [3,1], '2020-01-03': [6,2]}

5. NumPy weighted sum

np_weights = {'2020-01-01': [1,2], '2020-01-02': [2,1]}
sum(x_np, interval=3, min_window=1, weights=np_weights)
{'2020-01-01': [1,2], '2020-01-02': [5,2], '2020-01-03': [11,3]} # weights applied elementwise
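
The precise flag refers to Kahan (compensated) summation. The sketch below shows that technique and uses it to reproduce the weighted single-input example. It is a reference illustration rather than csp's code: kahan_sum and rolling_weighted_sum are hypothetical helpers, and the sketch assumes each observation takes the most recent weight tick.

```python
import math

def kahan_sum(values):
    """Compensated (Kahan) summation: carries the rounding error lost at each step."""
    total = 0.0
    compensation = 0.0
    for v in values:
        y = v - compensation            # re-apply the error lost previously
        t = total + y
        compensation = (t - total) - y  # what was lost when adding y
        total = t
    return total

def rolling_weighted_sum(values, weights, interval):
    """Tick-based weighted rolling sum that skips NaN observations (ignore_na=True)."""
    out = []
    for i in range(interval - 1, len(values)):
        lo = i - interval + 1
        pairs = zip(values[lo:i + 1], weights[lo:i + 1])
        out.append(kahan_sum(v * w for v, w in pairs if not math.isnan(v)))
    return out

x = [1.0, 2.0, 3.0, math.nan, 5.0]
# Assumption: the weight ticks {01-01: 1, 01-02: 2, 01-04: 3} are carried forward,
# giving one weight per observation.
w = [1.0, 2.0, 2.0, 3.0, 3.0]
print(rolling_weighted_sum(x, w, interval=3))  # [11.0, 10.0, 21.0]
```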

Product

prod(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, does not include any nan values in the window. If False, nan values are included and will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling products over the interval. The computation is unstable for large products and windows.

Examples: prod

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
prod(x, interval=3, min_window=2, ignore_na=True)
{'2020-01-02': 2, '2020-01-03': 6, '2020-01-04': 6, '2020-01-05': 15}

2. NumPy

x_np = {'2020-01-01': [1,2], '2020-01-02': [3,4], '2020-01-03': [5,6]}
prod(x_np, 3, 2)
{'2020-01-02': [3,8], '2020-01-03': [15,48]}
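
The default-behavior example can be checked against a short reference sketch. It recomputes the product from the window on every tick for clarity, which is not necessarily how csp updates it. rolling_prod is a hypothetical helper.

```python
import math
from collections import deque

def rolling_prod(values, interval, min_window=None, ignore_na=True):
    """Tick-based rolling product, recomputed from the window on each tick."""
    min_window = interval if min_window is None else min_window
    window = deque(maxlen=interval)  # keeps only the last `interval` ticks
    out = []
    for v in values:
        window.append(v)
        if len(window) < min_window:
            continue  # not enough ticks yet, so no output
        if not ignore_na and any(math.isnan(w) for w in window):
            out.append(math.nan)  # a NaN in the window makes the result NaN
        else:
            out.append(math.prod(w for w in window if not math.isnan(w)))
    return out

x = [1.0, 2.0, 3.0, math.nan, 5.0]
print(rolling_prod(x, interval=3, min_window=2))  # [2.0, 6.0, 6.0, 15.0]
```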

First

first(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0,
    ignore_na: bool = True
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.
  • ignore_na: if True, will return the first non-nan value in the window. If False, will return the first value in the window

Returns:

  • a time-series of the earliest (non-nan) value still within the given interval

Examples: first

See last

Last

last(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, will return the last non-nan value in the window. If False, will return the last value in the window
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of the most recent value within the given interval

Examples: first and last

Starttime: 2020-01-01 00:00:00

1. Default - first

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
first(x, interval=3)
{'2020-01-03': 1, '2020-01-04': 2, '2020-01-05': 3}

2. Including NaN - last

last(x, interval=3, ignore_na=False)
{'2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}

3. Triggering - last

trigger = {'2020-01-03': True, '2020-01-04': True}
last(x, interval=timedelta(days=3), ignore_na=True, trigger=trigger)
{'2020-01-03': 3, '2020-01-04': 3}

4. NumPy - first

x_np = {'2020-01-01': [1,1], '2020-01-02': [2,np.nan], '2020-01-03': [3,3]}
first(x_np, interval=2)
# first non-nan value
{'2020-01-02': [1,1], '2020-01-03': [2,3]}
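
For the default ignore_na=True case, first and last reduce to picking the earliest and latest non-NaN value in the window. The sketch below covers only that case and is not csp's implementation. rolling_first_last is a hypothetical helper.

```python
import math
from collections import deque

def rolling_first_last(values, interval):
    """Return (first, last) non-NaN values per tick once the window is full."""
    window = deque(maxlen=interval)  # keeps only the last `interval` ticks
    out = []
    for v in values:
        window.append(v)
        if len(window) < interval:
            continue  # min_window defaults to the interval
        valid = [w for w in window if not math.isnan(w)]
        if valid:
            out.append((valid[0], valid[-1]))
        else:
            out.append((math.nan, math.nan))  # nothing valid in the window
    return out

x = [1.0, 2.0, 3.0, math.nan, 5.0]
print(rolling_first_last(x, interval=3))  # [(1.0, 3.0), (2.0, 3.0), (3.0, 5.0)]
```

The first element of each pair matches the output of the default first example above.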

Mean

mean(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data. Can be either a ts[float] or a ts[np.ndarray].
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted mean (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling means over the interval. The mean is updated incrementally rather than from a stored running sum, so overflow is not an issue.

Examples: mean

See gmean

Geometric Mean

gmean(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling geometric means over the interval. Requires a strictly positive-valued input.

Examples: mean and gmean

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
mean(x, interval=3, min_window=2)
{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': 2.5, '2020-01-05': 4.0}

2. Including NaN

mean(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': nan, '2020-01-05': nan}

3. Geometric mean

trigger = {'2020-01-03': True, '2020-01-05': True}
gmean(x, interval=timedelta(days=3), min_window=timedelta(days=2), ignore_na=True, trigger=trigger)
{'2020-01-03': 1.817, '2020-01-05': 3.873}

4. Weighted mean

weights = {'2020-01-01': 1, '2020-01-03': 2}
mean(x, interval=3, min_window=2, ignore_na=True, weights=weights)
{'2020-01-02': 1.5, '2020-01-03': 2.25, '2020-01-04': 2.667, '2020-01-05': 4.0}

Note: the first two observations get relative weight of 1, then the last three get relative weight of 2

5. NumPy weighted mean

x_np = {'2020-01-01': [1., 1., 1.], '2020-01-02': [2., 2., 2.], '2020-01-03': [3., 3., 3.]}
np_weights = {'2020-01-01': [1., 1., 1.], '2020-01-02': [2., 1., 2.], '2020-01-03': [3., 1., 3.]}
mean(x_np, 3, 2, weights=np_weights)
{'2020-01-02': [1.667, 1.5, 1.667], '2020-01-03': [2.333, 2.0, 2.333]}
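
The scalar weighted-mean and geometric-mean examples above can be reproduced with a few lines of plain Python. This is a reference sketch, not csp's implementation: rolling_weighted_mean and gmean_window are hypothetical helpers, weights are assumed to be carried forward to each observation, and the geometric mean is computed through logarithms.

```python
import math

def rolling_weighted_mean(values, weights, interval, min_window):
    """Tick-based weighted rolling mean that skips NaN observations."""
    out = []
    for i in range(min_window - 1, len(values)):
        lo = max(0, i - interval + 1)
        pairs = [(v, w) for v, w in zip(values[lo:i + 1], weights[lo:i + 1])
                 if not math.isnan(v)]
        total_weight = sum(w for _, w in pairs)
        # Weights need not be normalized: divide by their sum.
        out.append(sum(v * w for v, w in pairs) / total_weight if total_weight else math.nan)
    return out

def gmean_window(window):
    """Geometric mean of the non-NaN values, computed as exp(mean(log(v)))."""
    logs = [math.log(v) for v in window if not math.isnan(v)]
    return math.exp(sum(logs) / len(logs)) if logs else math.nan

x = [1.0, 2.0, 3.0, math.nan, 5.0]
w = [1.0, 1.0, 2.0, 2.0, 2.0]  # weight ticks {01-01: 1, 01-03: 2} carried forward
means = [round(m, 3) for m in rolling_weighted_mean(x, w, interval=3, min_window=2)]
print(means)                                                    # [1.5, 2.25, 2.667, 4.0]
print(round(gmean_window(x[0:3]), 3), round(gmean_window(x[2:5]), 3))  # 1.817 3.873
```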

Order Statistics

Maximum

max(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
    • By default, the min_window is equal to the interval
  • ignore_na: if True, does not include any nan values in the window. If False, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling maximums over the interval.

Examples: max

See min

Minimum

min(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling minimums over the interval.

Examples: max and min

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
min(x, interval=3, min_window=2)
{'2020-01-02': 1, '2020-01-03': 1, '2020-01-04': 2, '2020-01-05': 3}

2. Including NaN

max(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': nan}

3. NumPy example

x_np = {'2020-01-01': [2,3], '2020-01-02': [6,1], '2020-01-03': [1,9]}
min(x_np, interval=timedelta(days=3), min_window=timedelta(days=1))
{'2020-01-02': [2,1], '2020-01-03': [1,1]}
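The first two examples can be checked with a plain-Python reference. This is a minimal sketch of tick-count windows with NaN handling, not how csp computes the statistic; `rolling_extreme` is a hypothetical helper, and `None` stands in for ticks where no output would be produced.

```python
import math

def rolling_extreme(values, interval, min_window, fn, ignore_na=True):
    """Tick-count rolling min or max (fn is the builtin min or max)."""
    out = []
    for i in range(len(values)):
        if i + 1 < min_window:
            out.append(None)              # fewer than min_window ticks so far
            continue
        window = values[max(0, i + 1 - interval): i + 1]
        valid = [v for v in window if not math.isnan(v)]
        has_nan = len(valid) < len(window)
        if (has_nan and not ignore_na) or not valid:
            out.append(math.nan)          # NaN poisons the window
        else:
            out.append(fn(valid))
    return out

x = [1.0, 2.0, 3.0, math.nan, 5.0]
mins = rolling_extreme(x, interval=3, min_window=2, fn=min)
maxs = rolling_extreme(x, interval=3, min_window=2, fn=max, ignore_na=False)
```

`mins` matches example 1 and `maxs` matches example 2, with the leading `None` marking 2020-01-01.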

Median

median(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling medians over the interval. Uses midpoint interpolation if there is an even number of samples.

Examples: median

See quantile

Quantile

quantile(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    quant: Union[float, List[float]] = None,
    min_window: Union[timedelta, int] = None,
    interpolate: str = "linear",
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → Union[ts[Union[float, np.ndarray]], [ts[Union[float, np.ndarray]]]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • quant: the quantile to calculate, which must be between 0 and 1
    • If provided a list, then all quantiles will be calculated for the list.
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks.
  • interpolate: the interpolation method to use when the quantile does not correspond to an individual value. Must be one of the following options:
    • "linear": interpolates linearly between the two closest values. For example, the 0.333 quantile of (1,2) with linear interpolation is 1.333.
    • "lower": returns the lower of the two closest values.
    • "higher": returns the higher of the two closest values.
    • "midpoint": returns the midpoint between the two closest values. For example, the 0.333 quantile of (1,2) with midpoint interpolation is 1.5.
    • "nearest": returns the value at the nearest position.  For example, the 0.333 quantile of (1,2) with nearest interpolation is 1. In cases of ties, the higher value is returned.
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series or list-basket of time-series of rolling quantiles over the interval.
    • If the quant parameter is a list then a list-basket will be returned.
    • If it is a float then a time-series will be returned.
    • The order of quantiles in the list-basket is equal to the order of the input.
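The five interpolate options can be compared side by side with a small stand-alone function. This is a sketch under the assumption that the quantile position is q * (n - 1) in the sorted window, which agrees with the (1,2) figures quoted for the options; `quantile` here is a local helper, not the csp node.

```python
import math

def quantile(values, q, interpolate="linear"):
    """Quantile of a non-empty list for q in [0, 1]."""
    data = sorted(values)
    pos = q * (len(data) - 1)            # fractional index into sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    if interpolate == "linear":
        return data[lo] + frac * (data[hi] - data[lo])
    if interpolate == "lower":
        return data[lo]
    if interpolate == "higher":
        return data[hi]
    if interpolate == "midpoint":
        return (data[lo] + data[hi]) / 2
    if interpolate == "nearest":
        # a tie (frac == 0.5) resolves to the higher value
        return data[hi] if frac >= 0.5 else data[lo]
    raise ValueError(f"unknown interpolation method: {interpolate}")

methods = ("linear", "lower", "higher", "midpoint", "nearest")
results = {m: quantile([1, 2], 0.333, m) for m in methods}
```

For the 0.333 quantile of (1,2) this gives roughly 1.333 (linear), 1 (lower), 2 (higher), 1.5 (midpoint) and 1 (nearest).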

Examples: median and quantile

Starttime: 2020-01-01 00:00:00

1. Median

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
median(x, interval=3, min_window=2)
{'2020-01-02': 1.5, '2020-01-03': 2, '2020-01-04': 2.5, '2020-01-05': 4}

2. Quantile with multiple values

quantile(x, interval=3, quant=[0.25, 0.5, 0.75], min_window=2, ignore_na=False)
[
    {'2020-01-02': 1.25, '2020-01-03': 1.5, '2020-01-04': nan, '2020-01-05': nan},
    {'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': nan, '2020-01-05': nan},
    {'2020-01-02': 1.75, '2020-01-03': 2.5, '2020-01-04': nan, '2020-01-05': nan}
]

3. Quantile with trigger

trigger = {'2020-01-03': True, '2020-01-05': True}
quantile(x, interval=timedelta(days=3), quant=0.333, min_window=timedelta(days=2), interpolate="midpoint", ignore_na=True, trigger=trigger)
{'2020-01-03': 1.5, '2020-01-05': 4}

4. NumPy array with multiple quantiles

x_np = {'2020-01-01': [1,2,3], '2020-01-02': [2,3,4], '2020-01-03': [3,4,5]}
quantile(x_np, interval=3, quant=[0.25,0.5,0.75], min_window=1)
# this is a listbasket of NumPy array time series
[
    {'2020-01-01': [1,2,3], '2020-01-02': [1.25, 2.25, 3.25], '2020-01-03': [1.5, 2.5, 3.5]},
    {'2020-01-01': [1,2,3], '2020-01-02': [1.5, 2.5, 3.5], '2020-01-03': [2., 3., 4.]},
    {'2020-01-01': [1,2,3], '2020-01-02': [1.75, 2.75, 3.75], '2020-01-03': [2.5, 3.5, 4.5]}
]
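Example 3 combines a timedelta interval with an external trigger. The sketch below reproduces it in plain Python, assuming the window at time t covers (t - interval, t] and that output begins once min_window has elapsed since the start time. `triggered_quantile`, `midpoint_quantile` and `day` are hypothetical helpers, not csp functions.

```python
import math
from datetime import datetime, timedelta

def midpoint_quantile(values, q):
    """Quantile with "midpoint" interpolation over a non-empty list."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    return (data[math.floor(pos)] + data[math.ceil(pos)]) / 2

def triggered_quantile(x, trigger_times, interval, min_window, q, start):
    out = {}
    for now in trigger_times:
        if now - start < min_window:
            continue                      # window not long enough yet
        # keep ticks in (now - interval, now]; the left endpoint is excluded
        window = [v for t, v in x.items()
                  if now - interval < t <= now and not math.isnan(v)]
        out[now] = midpoint_quantile(window, q) if window else math.nan
    return out

def day(d):
    return datetime(2020, 1, d)

x = {day(1): 1.0, day(2): 2.0, day(3): 3.0, day(4): math.nan, day(5): 5.0}
result = triggered_quantile(
    x, [day(3), day(5)], timedelta(days=3), timedelta(days=2), 0.333, start=day(1)
)
```

At 2020-01-05 the tick from 2020-01-02 sits exactly on the left endpoint and is excluded, leaving 3 and 5 after the NaN is ignored, so the midpoint is 4.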

Argmin

argmin(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    return_most_recent: bool = True,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[datetime, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • return_most_recent: if True, in the case of a tie, the most recent time will be returned. If false, the least recent time will be returned.
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid.

Returns:

  • a time-series of rolling argmin values over the interval, returned as a datetime or NumPy array of np.datetime64 objects. If no data is present or NaN invalidation occurs, the default time '1970-1-1 00:00:00' is returned.

Examples: argmin

See argmax

Argmax

argmax(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    return_most_recent: bool = True,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[datetime, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • return_most_recent: if True, in the case of a tie, the most recent time will be returned. If false, the least recent time will be returned.
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling argmax values over the interval, returned as a datetime or NumPy array of np.datetime64 objects.  If no data is present or NaN invalidation occurs, the default time '1970-1-1 00:00:00' is returned.

Examples: argmax and argmin

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 1, '2020-01-04': nan, '2020-01-05': 4}
argmax(x, 3)
{'2020-01-03': '2020-01-02', '2020-01-04': '2020-01-02', '2020-01-05': '2020-01-05'}
argmin(x, 3)
{'2020-01-03': '2020-01-03', '2020-01-04': '2020-01-03', '2020-01-05': '2020-01-03'}

2. NumPy example

x_np = {'2020-01-01': [1,2], '2020-01-02': [2,1], '2020-01-03': [3,0]}
argmax(x_np, 3, 2)
{'2020-01-02': ['2020-01-02', '2020-01-01'], '2020-01-03': ['2020-01-03', '2020-01-01']}
argmin(x_np, 3, 2)
{'2020-01-02': ['2020-01-01', '2020-01-02'], '2020-01-03': ['2020-01-01', '2020-01-03']}

3. return_most_recent=False

argmin(x, 3, return_most_recent=False)
{'2020-01-03': '2020-01-01', '2020-01-04': '2020-01-03', '2020-01-05': '2020-01-03'} # Note how the first element is '2020-01-01', not '2020-01-03'
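The tie-breaking rule in examples 1 and 3 can be verified with a short plain-Python reference. This is only a sketch: `rolling_arg` is a hypothetical helper, and it assumes min_window equals the interval when omitted, which is what the outputs starting at the third tick suggest.

```python
import math

def rolling_arg(times, values, interval, pick, return_most_recent=True):
    """Tick-count rolling argmin (pick=min) or argmax (pick=max)."""
    out = {}
    # Assumption: output starts once `interval` ticks have arrived.
    for i in range(interval - 1, len(values)):
        lo = i + 1 - interval
        window = [(t, v) for t, v in zip(times[lo:i + 1], values[lo:i + 1])
                  if not math.isnan(v)]   # ignore_na=True behavior
        if not window:
            continue  # the documented default 1970 timestamp is omitted here
        best = pick(v for _, v in window)
        hits = [t for t, v in window if v == best]   # oldest tick first
        out[times[i]] = hits[-1] if return_most_recent else hits[0]
    return out

times = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"]
x = [1.0, 2.0, 1.0, math.nan, 4.0]
latest_max = rolling_arg(times, x, 3, max)
latest_min = rolling_arg(times, x, 3, min)
earliest_min = rolling_arg(times, x, 3, min, return_most_recent=False)
```

The only difference between `latest_min` and `earliest_min` is the entry for 2020-01-03, where the value 1 appears twice in the window.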

Rank

rank(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    method: str = "min",
    ignore_na: bool = True,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    min_data_points: int = 0,
    na_option: str = "keep"
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • method: the method to use to rank groups of records that have the same value
    • "min": the lowest rank in the group is returned, i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=1
    • "max": the highest rank in the group is returned, i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=2
    • "avg": the average rank in the group is returned, i.e. if the window data is [1,2,2,3] and the last tick is 2, then rank=1.5
    • By default, the "min" method is used.
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, nan is returned.
  • na_option: how to rank a nan value when it is the last value to be ranked
    • "keep": return a nan rank for a nan value
    • "last": rank the last non-nan value present in the interval
    • By default, the "keep" option is used.

Returns:

  • a time-series of rolling ranks over the interval, where a rank of 0 means that the current (last) ticked value is the smallest in the given interval.

Examples: rank

Starttime: 2020-01-01 00:00:00

1. Default behavior

x = {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 2, '2020-01-04': 5, '2020-01-05': 4}
rank(x, 5, min_window=3)
{'2020-01-03': 1, '2020-01-04': 3, '2020-01-05': 3}

2. NumPy example

x_np = {'2020-01-01': [1,2], '2020-01-02': [3,2], '2020-01-03': [2,1]}
rank(x_np, 3, 2)
# Note how the second element at '2020-01-02' is 0, not 1, as by default the "min" method is used
{'2020-01-02': [1, 0], '2020-01-03': [1, 0]}

3. "keep" vs "last" NaN option

x = {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 2, '2020-01-04': nan, '2020-01-05': 4}
rank(x, 5, min_window=3, na_option="keep")
{'2020-01-03': 1, '2020-01-04': nan, '2020-01-05': 3}
rank(x, 5, min_window=3, na_option="last")
# the last valid value, 2, is ranked at '2020-01-04'
{'2020-01-03': 1, '2020-01-04': 1, '2020-01-05': 3}
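The 0-based ranking and the three tie-handling methods can be illustrated with a small helper. This is a plain-Python sketch, not the csp implementation; `rank_last` is a hypothetical name, it expects the most recent tick as the final list element, and NaN handling is left out.

```python
def rank_last(window, method="min"):
    """0-based rank of the last value in `window` among all window values."""
    last = window[-1]
    smaller = sum(1 for v in window if v < last)
    ties = sum(1 for v in window if v == last)   # counts the last tick itself
    if method == "min":
        return smaller
    if method == "max":
        return smaller + ties - 1
    if method == "avg":
        return smaller + (ties - 1) / 2
    raise ValueError(f"unknown method: {method}")

# Window holding the values 1, 2, 2, 3 where the most recent tick is 2.
tied = [1, 2, 3, 2]
by_method = {m: rank_last(tied, m) for m in ("min", "max", "avg")}

# Example 1: interval=5, min_window=3 over x = 1, 3, 2, 5, 4.
x = [1, 3, 2, 5, 4]
ranks = [rank_last(x[max(0, i + 1 - 5): i + 1]) for i in range(2, len(x))]
```

With ranks starting at 0, the two tied values occupy ranks 1 and 2, so "min" gives 1, "max" gives 2 and "avg" gives 1.5.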

Moment-Based Statistics

Variance

var(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ddof: delta degrees of freedom. Example: if ddof=1, then normalization term is 1/(N-1). If ddof=0, then 1/N.
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted variance (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling variance over the interval. If insufficient samples for given ddof, then no value output is generated. Since the smart mean is being used, overflow is not a problem.

Examples: var

See Standard Error.

Standard Deviation

stddev(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ddof: delta degrees of freedom
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted standard deviation (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling standard deviations over the interval. If insufficient samples for given ddof, then no value output is generated.

Examples: stddev

See Standard Error.

Standard Error

sem(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ddof: delta degrees of freedom
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted standard error (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling standard errors

Examples: Variance, Standard Deviation, Standard Error

Starttime: 2020-01-01 00:00:00

1. Variance

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
var(x, interval=3, min_window=2)
{'2020-01-02': 0.5, '2020-01-03': 1.0, '2020-01-04': 0.5, '2020-01-05': 2.0}

2. Biased variance

var(x, interval=3, min_window=2, ddof=0, ignore_na=True) # biased
{'2020-01-02': 0.25, '2020-01-03': 0.666, '2020-01-04': 0.25, '2020-01-05': 1.0}

3. Standard deviation including NaNs

stddev(x, interval=3, min_window=2, ignore_na=False)
{'2020-01-02': 0.707, '2020-01-03': 1.0, '2020-01-04': nan, '2020-01-05': nan}

4. Standard error with triggering

trigger = {'2020-01-03': True, '2020-01-05': True}
sem(x, interval=timedelta(days=3), min_window=timedelta(days=2), trigger=trigger)
{'2020-01-03': 0.577, '2020-01-05': 1.0}
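Examples 1 to 3 can be reproduced with a direct two-pass calculation. This is a plain-Python sketch of the ddof normalization and NaN handling, not the incremental algorithm csp uses; `rolling_var` is a hypothetical helper, and it returns NaN where the documentation says no value is emitted for insufficient samples.

```python
import math

def rolling_var(values, interval, min_window, ddof=1, ignore_na=True):
    """Tick-count rolling variance, one entry per tick from min_window on."""
    out = []
    for i in range(min_window - 1, len(values)):
        window = values[max(0, i + 1 - interval): i + 1]
        valid = [v for v in window if not math.isnan(v)]
        n = len(valid)
        if (n < len(window) and not ignore_na) or n <= ddof:
            out.append(math.nan)   # NaN in window, or too few samples (placeholder)
            continue
        mean = sum(valid) / n
        out.append(sum((v - mean) ** 2 for v in valid) / (n - ddof))
    return out

x = [1.0, 2.0, 3.0, math.nan, 5.0]
unbiased = rolling_var(x, interval=3, min_window=2)
biased = rolling_var(x, interval=3, min_window=2, ddof=0)
std = [math.sqrt(v) for v in rolling_var(x, 3, 2, ignore_na=False)]
```

The biased values are the unbiased ones scaled by (n - 1) / n, which is why the two-point windows drop from 0.5 to 0.25.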

Covariance

cov(
    x: ts[Union[float, np.ndarray]],
    y: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: time-series data. If x is of type np.ndarray, then the covariance calculation is performed element-wise with the corresponding values in y.
  • y: time-series data that ticks in sequence with x
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ddof: delta degrees of freedom
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another optional time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned.
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted covariance (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling covariances between x and y

Examples: cov

See Correlation.

Correlation

corr(
    x: ts[Union[float, np.ndarray]],
    y: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: time-series data. If x is of type np.ndarray, then the correlation calculation is performed element-wise with the corresponding values in y.
  • y: time-series data that ticks in sequence with x
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted correlation (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling Pearson correlation coefficients between x and y

Examples: Covariance and Correlation

Starttime: 2020-01-01 00:00:00

1. Covariance

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
y = {'2020-01-01': 5, '2020-01-02': 4, '2020-01-03': 3, '2020-01-04': 2, '2020-01-05': 1}
cov(x, y, interval=3, min_window=2)
{'2020-01-02': -0.5, '2020-01-03': -1.0, '2020-01-04': -1.0, '2020-01-05': -1.0}

2. Correlation

corr(x, y, interval=3)
{'2020-01-03': -1.0, '2020-01-04': -1.0, '2020-01-05': -1.0}
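
The values above can be checked by hand with a short pure-Python sketch. It is not the csp implementation: the helper names (`rolling_windows`, `sample_cov`, `pearson_corr`) are hypothetical, time series are modelled as plain dicts, and it assumes that `min_window` defaults to the interval when omitted, as the correlation output above suggests.

```python
from math import sqrt

def rolling_windows(x, y, interval, min_window):
    """Yield (date, xs, ys) for the last `interval` ticks once `min_window` ticks exist."""
    dates = sorted(x)  # assumes x and y tick on the same ISO-formatted dates
    for i, date in enumerate(dates):
        if i + 1 < min_window:
            continue  # not enough ticks yet, so nothing is emitted
        window = dates[max(0, i + 1 - interval):i + 1]
        yield date, [x[d] for d in window], [y[d] for d in window]

def sample_cov(xs, ys, ddof=1):
    """Covariance with divisor n - ddof."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - ddof)

def pearson_corr(xs, ys):
    """Pearson correlation as covariance over the product of standard deviations."""
    return sample_cov(xs, ys) / sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
y = {'2020-01-01': 5, '2020-01-02': 4, '2020-01-03': 3, '2020-01-04': 2, '2020-01-05': 1}

# interval=3, min_window=2: the first output appears on the second tick
cov_out = {d: sample_cov(xs, ys) for d, xs, ys in rolling_windows(x, y, 3, 2)}
# interval=3 with min_window assumed equal to the interval
corr_out = {d: pearson_corr(xs, ys) for d, xs, ys in rolling_windows(x, y, 3, 3)}
print(cov_out)
print(corr_out)
```

Because y falls by exactly one whenever x rises by one, every full window has covariance -1.0 and correlation -1.0; only the two-tick window on 2020-01-02 differs.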

Skewness

skew(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    bias: bool = False,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • bias: if True, calculates a biased (unadjusted) skew. If false (default), calculates a Gaussian-unbiased measure.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted skew (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling sample skew measures, using the adjusted Fisher–Pearson standardized moment coefficient.

Examples: skew

See Kurtosis.

Kurtosis

kurt(
    x: ts[Union[float, np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    excess: bool = True,
    bias: bool = False,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • excess: if True (default) uses the definition of excess kurtosis (kurt - 3). If false, uses the standard definition.
  • bias: if True, calculates a biased (unadjusted) kurtosis. If false (default), calculates a Gaussian-unbiased measure.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted kurtosis (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the window statistic, and in doing so clears any accumulated floating-point error
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of rolling sample kurtosis measures, using the adjusted Fisher–Pearson standardized moment coefficient.

Examples: skew and kurt

Starttime: 2020-01-01 00:00:00

1. Skew

x = {'2020-01-01': 1, '2020-01-02': 2, ..., '2020-01-10': 10}
skew(x, interval=7)
{'2020-01-07': 0, '2020-01-08': 0, '2020-01-09': 0, '2020-01-10': 0}

2. Kurtosis

kurt(x, interval=7) # excess kurtosis
{'2020-01-07': -1.2, '2020-01-08': -1.2, '2020-01-09': -1.2, '2020-01-10': -1.2}
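
The following pure-Python sketch reproduces these numbers from the standard bias-corrected sample formulas. It is an illustration under assumptions, not the csp implementation: the helper names are hypothetical, the elided ticks are assumed to continue the pattern 1, 2, ..., 10, and returning the non-excess value as the corrected excess kurtosis plus 3 is also an assumption.

```python
def central_moment(xs, k):
    """k-th central moment with divisor n (the biased moment)."""
    m = sum(xs) / len(xs)
    return sum((v - m) ** k for v in xs) / len(xs)

def sample_skew(xs, bias=False):
    n = len(xs)
    g1 = central_moment(xs, 3) / central_moment(xs, 2) ** 1.5
    if bias:
        return g1
    # adjusted Fisher-Pearson standardized moment coefficient
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)

def sample_kurt(xs, excess=True, bias=False):
    n = len(xs)
    g2 = central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0
    if not bias:
        # standard small-sample correction for excess kurtosis
        g2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return g2 if excess else g2 + 3.0

# Assumed data: ten daily ticks with values 1..10
x = {f'2020-01-{day:02d}': float(day) for day in range(1, 11)}
dates = sorted(x)

# Rolling windows of 7 ticks, first output on the 7th tick
skew_out = {dates[i]: sample_skew([x[d] for d in dates[i - 6:i + 1]]) for i in range(6, 10)}
kurt_out = {dates[i]: sample_kurt([x[d] for d in dates[i - 6:i + 1]]) for i in range(6, 10)}
print(skew_out)
print(kurt_out)
```

Every window holds seven evenly spaced values, so the third central moment vanishes and the corrected excess kurtosis comes out at -1.2 regardless of where the window sits.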

Exponential Moving Statistics

Exponential Moving Average

ema(
    x: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[timedelta] = None,
    adjust: bool = True,
    horizon: int = None,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data

  • min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.

  • alpha: the EMA weight parameter specified directly. If adjust = True, EMA is calculated such that

    $$EMA(t) = \frac{\sum\limits_{i=0}^{n} (1-\alpha)^{i} x(t-i)}{\sum\limits_{i=0}^{n} (1-\alpha)^{i}}$$

    If adjust = False, EMA is calculated such that

    $$EMA(t) = (1-\alpha)EMA(t-1) + \alpha x(t)$$ $$EMA(t=0) = x(0)$$

    By default, adjust = True, to give better estimates for starting intervals.

    The following are alternative methods to specify the $\alpha$ parameter.

    • span: specify alpha in terms of span, such that

      $$\alpha = \frac{2}{span+1}$$

    • com: specify alpha in terms of centre of mass, such that

      $$\alpha = \frac{1}{1+com}$$

    • halflife: Halflife is different from the other parameters. Half-life is a timedelta argument that specifies the half-life of observation weights. Half-life is useful when observations are irregularly spaced and a better estimate is needed to properly weight more recent data. Let $t_{-1}$ be the time of the last observation.

      Then:

      $$\lambda(t)  = 1 - \exp(\frac{-(t-t_{-1})*\ln(2)}{halflife})$$ $$EMA(t) = \frac{ \lambda(t)*EMA(t-1) + x(t)}{\text{normalization constant}}$$

      Something to note is that the ignore_na flag does not matter if a halflife interval is specified. The behavior would be the same in both cases, since an absolute time interval is being used to re-weight the moving average, not a tick interval.

      Exactly one of alpha, span, com, halflife must be given.

  • adjust: if True, early observations are adjusted to give a more "smoothed" estimate of the EMA. The difference is that if adjust=True, then each new observation receives a relative weight of 1. If adjust = False, each new observation receives a relative weight of alpha.

    • adjust=True means that:

    $$EMA(t) = \frac{x(t)+(1-\alpha)x(t-1)+(1-\alpha)^2 x(t-2) + ... + (1-\alpha)^n x(t-n)}{1+(1-\alpha)+(1-\alpha)^ 2 + ... + (1-\alpha)^n}$$

    • adjust=False means that:

    $$EMA(t) = \frac{\alpha * x(t) + \alpha * (1-\alpha) * x(t-1) + \alpha * (1-\alpha)^2 * x(t-2) + ... + \boldsymbol{(1-\alpha)^n x(0)}}{\alpha+\alpha*(1-\alpha)+\alpha*(1-\alpha)^ 2 + ... + (1-\alpha)^n}$$

    $$\text{and thus } EMA(t=0) = x(0)$$

    Adjust only applies with tick specified intervals, not time specified intervals. Time specified intervals (i.e. half-life) do not need adjustment as they are, by definition, already adjusted.

  • horizon: the maximum number of ticks to use in the computation. For example, if horizon = 10, then only the 10 most recent data points are used. If not specified, all data points for x are used, with early ticks decaying exponentially in weighting. Horizon will be ignored with a half-life (time-based) interval.

    • If horizon is set to h, then even if x has more than h ticks, the EMA is computed over only the h most recent ticks when adjust=True:

    $$EMA(t) = \frac{\sum_{i=0}^{h-1} (1-\alpha)^{i} x(t-i)}{\sum_{i=0}^{h-1} (1-\alpha)^{i}}$$

    • The only difference if adjust=False is that the first ever tick, while in the window, receives weight 1 at the start instead of weight $\alpha$ like the rest of the values.
  • ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position, and renormalized as such.

    • For example, let us consider a dataset (1,nan,2) using adjust=True.
      • If ignore_na=True then the weighting is based on relative position as such: $$EMA(t=2) = \frac{(1-\alpha)*1 + 2}{(1-\alpha)+1}$$
      • If ignore_na=False then the weighting is based on global position as such: $$EMA(t=2) = \frac{(1-\alpha)^2*1 + 2}{(1-\alpha)^2+1}$$
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned

    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:

    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation.

  • recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.

    • Note: only valid when a finite-horizon EMA is used.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of exponentially-weighted moving averages over the interval.
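
The span and com conversions and the ignore_na weighting described in the arguments can be illustrated with a small pure-Python sketch. The helpers `alpha_from` and `adjusted_ema_last` are hypothetical names used only for this illustration and are not part of csp; the sketch covers the adjust=True case on the dataset (1, nan, 2).

```python
import math

def alpha_from(span=None, com=None):
    """Convert span or centre of mass to alpha using the formulas above."""
    if span is not None:
        return 2.0 / (span + 1.0)
    if com is not None:
        return 1.0 / (1.0 + com)
    raise ValueError("give span or com")

def adjusted_ema_last(values, alpha, ignore_na):
    """Adjusted EMA at the final tick of `values`, never averaging NaN itself."""
    num = den = 0.0
    age = 0  # number of decay steps between a value and the latest tick
    for v in reversed(values):
        if not math.isnan(v):
            w = (1.0 - alpha) ** age
            num += w * v
            den += w
            age += 1
        elif not ignore_na:
            age += 1  # global position: the NaN still occupies a slot
    return num / den

data = [1.0, math.nan, 2.0]
relative = adjusted_ema_last(data, alpha=0.5, ignore_na=True)    # ((1-a)*1 + 2) / ((1-a) + 1)
absolute = adjusted_ema_last(data, alpha=0.5, ignore_na=False)   # ((1-a)^2*1 + 2) / ((1-a)^2 + 1)
print(alpha_from(span=20), alpha_from(com=9), relative, absolute)
```

With alpha = 0.5 the older observation keeps weight 0.5 under relative positioning but only 0.25 under global positioning, so the second result sits closer to the latest value.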

Examples: ema

Starttime: 2020-01-01 00:00:00

1. Unadjusted EMA

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
ema(x, alpha=0.1, adjust=False) # unadjusted
{'2020-01-01': 1.0, '2020-01-02': 1.1, '2020-01-03': 1.29, '2020-01-04': 1.561, '2020-01-05': 1.9049}

2. Adjusted EMA

ema(x, alpha=0.1, adjust=True)  # adjusted, default method
{'2020-01-01': 1.0, '2020-01-02': 1.5263, '2020-01-03': 2.0701, '2020-01-04': 2.6313, '2020-01-05': 3.20971}

3. Finite horizon EMA

ema(x, alpha=0.1, adjust=True, horizon=2) # finite horizon
{'2020-01-01': 1.0, '2020-01-02': 1.5263, '2020-01-03': 2.5263, '2020-01-04': 3.5263, '2020-01-05': 4.5263}

4. Time-based decay EMA

ema(x, halflife=timedelta(days=1)) # time-based
{'2020-01-01': 1.0, '2020-01-02': 1.6666, '2020-01-03': 2.4286, '2020-01-04': 3.2666, '2020-01-05': 4.1613}

5. Unadjusted EMA for NumPy array

x_np = {'2020-01-01': [1,2], '2020-01-02': [4,5], '2020-01-03': [7,8]}
ema(x_np, alpha=0.1, adjust=False)
{'2020-01-01': [1,2], '2020-01-02': [1.3,2.3], '2020-01-03': [1.87,2.87] }
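
Examples 1 to 3 can be reproduced from the formulas in the argument descriptions with a few lines of plain Python. This is a sketch for checking the arithmetic, not the csp implementation; `ema_unadjusted` and `ema_adjusted` are hypothetical helper names, and the half-life variant is left out.

```python
def ema_unadjusted(values, alpha):
    """adjust=False recursion: EMA(0) = x(0), EMA(t) = (1-alpha)*EMA(t-1) + alpha*x(t)."""
    out = [float(values[0])]
    for v in values[1:]:
        out.append((1.0 - alpha) * out[-1] + alpha * v)
    return out

def ema_adjusted(values, alpha, horizon=None):
    """adjust=True weighting, optionally limited to the `horizon` most recent ticks."""
    out = []
    for t in range(len(values)):
        start = 0 if horizon is None else max(0, t + 1 - horizon)
        window = values[start:t + 1]
        # the newest tick gets weight 1, each older tick decays by (1 - alpha)
        weights = [(1.0 - alpha) ** age for age in range(len(window))]
        num = sum(w * v for w, v in zip(weights, reversed(window)))
        out.append(num / sum(weights))
    return out

x = [1, 2, 3, 4, 5]
unadjusted = ema_unadjusted(x, alpha=0.1)
adjusted = ema_adjusted(x, alpha=0.1)
finite = ema_adjusted(x, alpha=0.1, horizon=2)
print([round(v, 4) for v in unadjusted])
print([round(v, 4) for v in adjusted])
print([round(v, 4) for v in finite])
```

With horizon=2 each output after the first is a weighted average of just the last two ticks, which is why the results advance by exactly 1 per day once the window is full.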

Exponential Moving Variance

ema_var(
    x: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[Union[float, timedelta]] = None,
    adjust: bool = True,
    horizon: int = None,
    bias: bool = False,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
  • alpha, span, com, halflife: as described in EMA
  • adjust: as specified in EMA
  • horizon: as specified in EMA.
  • bias: if True, uses a biased population weighted variance. If false, normalized by a proper debiasing factor.
  • ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
    • Note: only valid when a finite-horizon EMA is used.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of exponentially-weighted moving variances over the interval.

Examples: ema_var

See Exponential Moving Standard Deviation

Exponential Moving Standard Deviation

ema_std(
    x: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[Union[float, timedelta]] = None,
    adjust: bool = True,
    horizon: int = None,
    bias: bool = False,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0,
) → ts[Union[float, np.ndarray]]

Args:

  • x: the time-series data
  • min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
  • alpha, span, com, halflife: as described in EMA
  • adjust: as specified in EMA
  • horizon: as specified in EMA.
  • bias: if True, uses a biased population weighted variance. If False, the result is normalized by a debiasing factor.
  • ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
    • Note: only valid when a finite-horizon EMA is used.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of exponentially-weighted moving standard deviations over the interval.

Examples: Exp. Moving Variance and Standard Deviation

Starttime: 2020-01-01 00:00:00

1. Exp. Moving Standard Deviation

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': nan, '2020-01-05': 5}
ema_std(x, min_periods=2, span=20, adjust=False, bias=False, ignore_na=False)
{'2020-01-02': 0.707, '2020-01-03': 1.11636, '2020-01-04': 1.11636, '2020-01-05': 1.937005}

2. Exp. Moving Variance

ema_var(x, min_periods=2, span=20, adjust=False, bias=True, ignore_na=False)
{'2020-01-02': 0.086168, '2020-01-03': 0.390588, '2020-01-04': 0.390588, '2020-01-05': 1.644124}
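
The first two outputs of each example can be checked with the sketch below, which covers only the ticks before the nan. It is not the csp implementation: `ew_var_unadjusted` is a hypothetical helper, and the debiasing factor (sum of weights squared over that quantity minus the sum of squared weights) is the common reliability-weights correction, assumed here because it reproduces the documented values.

```python
from math import sqrt

def ew_var_unadjusted(values, alpha, bias=False):
    """Exponentially weighted variance at the last tick with adjust=False weights.

    Needs at least two values when bias=False, otherwise the debiasing
    denominator is zero.
    """
    n = len(values)
    # adjust=False: the first tick keeps weight (1-alpha)^(n-1),
    # every later tick i gets alpha * (1-alpha)^(n-1-i)
    weights = [(1.0 - alpha) ** (n - 1)]
    weights += [alpha * (1.0 - alpha) ** (n - 1 - i) for i in range(1, n)]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    if bias:
        return var
    # assumed debiasing factor, see the note above
    return var * total ** 2 / (total ** 2 - sum(w * w for w in weights))

alpha = 2.0 / (20 + 1)  # span=20
std_day2 = sqrt(ew_var_unadjusted([1, 2], alpha, bias=False))
std_day3 = sqrt(ew_var_unadjusted([1, 2, 3], alpha, bias=False))
var_day2 = ew_var_unadjusted([1, 2], alpha, bias=True)
var_day3 = ew_var_unadjusted([1, 2, 3], alpha, bias=True)
print(round(std_day2, 5), round(std_day3, 5), round(var_day2, 6), round(var_day3, 6))
```

The nan tick on 2020-01-04 and the tick after it are deliberately left out, since their weights depend on how ignore_na repositions observations.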

Exponential Moving Covariance

ema_cov(
    x: ts[Union[float, np.ndarray]],
    y: ts[Union[float, np.ndarray]],
    min_periods: int = 1,
    alpha: Optional[float] = None,
    span: Optional[float] = None,
    com: Optional[float] = None,
    halflife: Optional[Union[float, timedelta]] = None,
    adjust: bool = True,
    horizon: int = None,
    bias: bool = False,
    ignore_na: bool = False,
    trigger: ts[object] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[Union[float, np.ndarray]]

Args:

  • x: time-series data. If x is of type np.ndarray, the exponential-moving covariance is calculated element-wise with the corresponding values in y.
  • y: time-series data which ticks in-sequence with x
  • min_periods: the minimum allowable number of ticks to use before outputting data. The default is 1 for any EMA function.
  • alpha, span, com, halflife: as described in EMA
  • adjust: as specified in EMA
  • horizon: as specified in EMA.
  • bias: if True, uses a biased population weighted covariance. If False, the result is normalized by a debiasing factor.
  • ignore_na: if True, nan values will be "ignored" meaning weights will be placed on relative position. If False (default), weights are based on global position.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the EMA, and in doing so clears any accumulated floating-point error.
    • Note: only valid when a finite-horizon EMA is used.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of exponentially-weighted moving covariance over the interval.

NumPy Specific Statistics

Covariance Matrix

cov_matrix(
    x: ts[np.ndarray],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ddof: int = 1,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[np.ndarray]

Args:

  • x: the time-series of dimension (N,) arrays which represent N variables
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ddof: delta degrees of freedom
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted covariance matrix (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the statistic, and in doing so clears any accumulated floating-point error.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of (potentially weighted) covariance matrices, each of which is a NumpyNDArray of dimensionality (N,N)

Correlation Matrix

corr_matrix(
    x: ts[np.ndarray],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    ignore_na: bool = True,
    trigger: ts[object] = None,
    weights: ts[Union[float, np.ndarray]] = None,
    sampler: ts[object] = None,
    reset: ts[object] = None,
    recalc: ts[object] = None,
    min_data_points: int = 0
) → ts[np.ndarray]

Args:

  • x: the time-series of dimension (N,) arrays which represent N variables
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until we have 50 ticks
  • ignore_na: if True, does not include any nan values in the window. If false, nan values in the window will make the entire window value nan.
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • weights: a time-series of weights for each observation in x, used to calculate a weighted correlation matrix (optional). Weights do not need to be normalized.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation
  • recalc: another optional time-series which triggers a clean recalculation of the statistic, and in doing so clears any accumulated floating-point error.
  • min_data_points: the minimum number of valid (non-nan) data points that must exist in the interval for a calculation to be valid. If there are fewer than min_data_points, NaN is returned.

Returns:

  • a time-series of (potentially weighted) correlation matrices, each of which is a NumpyNDArray of dimensionality (N,N)

Examples: Covariance and Correlation Matrices

Starttime: 2020-01-01 00:00:00

1. Covariance

x = {'2020-01-01': np.array([0., 0., 0.]), '2020-01-02': np.array([1., -1., 2.]), '2020-01-03': np.array([2., -2., 4.])}
cov_matrix(x, 3, ddof=1)
{'2020-01-03': np.array([[1, -1, 2],
                         [-1, 1, -2],
                         [2, -2, 4]])}

2. Correlation

corr_matrix(x, 3)
{'2020-01-03': np.array([[1, -1, 1],
                         [-1, 1, -1],
                         [1, -1, 1]])}
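
To see how ddof enters the calculation, the sketch below recomputes both matrices for the three ticks above using nested lists instead of NumPy arrays. The helper names `cov_matrix_sketch` and `corr_matrix_sketch` are hypothetical and the code is only an arithmetic check, not the library's algorithm.

```python
from math import sqrt

def cov_matrix_sketch(rows, ddof=1):
    """Covariance matrix of the columns of `rows`, with divisor n - ddof."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - ddof)
         for j in range(k)]
        for i in range(k)
    ]

def corr_matrix_sketch(rows):
    """Correlation matrix: each covariance scaled by the two standard deviations."""
    c = cov_matrix_sketch(rows)
    k = len(c)
    return [[c[i][j] / sqrt(c[i][i] * c[j][j]) for j in range(k)] for i in range(k)]

# One row per tick, one column per variable
rows = [[0., 0., 0.], [1., -1., 2.], [2., -2., 4.]]
cov_sample = cov_matrix_sketch(rows, ddof=1)      # divisor n - 1 = 2
cov_population = cov_matrix_sketch(rows, ddof=0)  # divisor n = 3
corr = corr_matrix_sketch(rows)
print(cov_sample)
print(cov_population)
print(corr)
```

For this data the divisor n - 1 gives the integer matrix [[1, -1, 2], [-1, 1, -2], [2, -2, 4]], while the divisor n scales every entry by 2/3. The correlation matrix is unaffected by ddof because the scaling cancels.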

NumPy Conversions

list_to_numpy(x: [ts[float]], fillna: bool = False) → ts[np.ndarray]

Args:

  • x: a listbasket of time series
  • fillna: If False, unticked elements are treated as NaN. If True, unticked elements will hold their previous value in the array.

Returns:

  • a NumPy 1D array where each value corresponds to the element of the listbasket with the same index

numpy_to_list(x: ts[np.ndarray], n: int) → [ts[float]]

Args:

  • x: a NumPy array valued time series
  • n: the number of output channels in the listbasket

Returns:

  • a listbasket where each value corresponds to the element of the array with the same index

Examples: NumPy Conversions

Starttime: 2020-01-01 00:00:00

1. List to NumPy

x1 = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3}
x2 = {'2020-01-01': 1.5, '2020-01-03': 3.5}
list_to_numpy([x1,x2], fillna=False)
{'2020-01-01': [1, 1.5], '2020-01-02': [2, np.nan], '2020-01-03': [3, 3.5]} # no x2 tick on day 2
list_to_numpy([x1,x2], fillna=True)
{'2020-01-01': [1, 1.5], '2020-01-02': [2, 1.5], '2020-01-03': [3, 3.5]} # holds x2 value for day 2

2. NumPy to list

x_np = {'2020-01-01': [1,2], '2020-01-02': [3,4], '2020-01-03': [5,6]}
numpy_to_list(x_np, 2)
[
    {'2020-01-01': 1, '2020-01-02': 3, '2020-01-03': 5},
    {'2020-01-01': 2, '2020-01-02': 4, '2020-01-03': 6}
]
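
The alignment logic behind these two conversions can be sketched in plain Python, using the same dict-of-timestamps notation as the examples. This is an illustration only, not the csp implementation: align_ticks and split_ticks are hypothetical helper names, plain lists stand in for NumPy arrays, and the sketch assumes the combined output ticks whenever any element ticks.

```python
import math

def align_ticks(series, fillna=False):
    """Combine dict-based series into one row per timestamp.

    series: list of {timestamp: value} dicts, one per basket element.
    """
    timestamps = sorted(set().union(*series))
    last_seen = [math.nan] * len(series)
    out = {}
    for t in timestamps:
        row = []
        for i, s in enumerate(series):
            if t in s:
                last_seen[i] = s[t]
                row.append(s[t])
            else:
                # Unticked element: hold the previous value or report NaN.
                row.append(last_seen[i] if fillna else math.nan)
        out[t] = row
    return out

def split_ticks(array_series, n):
    """Split {timestamp: sequence} into n per-index {timestamp: value} dicts."""
    return [{t: row[i] for t, row in array_series.items()} for i in range(n)]

x1 = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3}
x2 = {'2020-01-01': 1.5, '2020-01-03': 3.5}

held = align_ticks([x1, x2], fillna=True)
gaps = align_ticks([x1, x2], fillna=False)
channels = split_ticks(
    {'2020-01-01': [1, 2], '2020-01-02': [3, 4], '2020-01-03': [5, 6]}, 2
)
print(held)
print(gaps)
print(channels)
```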

Cross-Sectional Statistics

Cross Sectional

cross_sectional(
    x: ts[Union[float,np.ndarray]],
    interval: Union[timedelta, int] = None,
    min_window: Union[timedelta, int] = None,
    trigger: ts[object] = None,
    as_numpy: bool = False,
    sampler: ts[object] = None,
    reset: ts[object] = None
) → ts[Union[np.ndarray, List[float], List[np.ndarray]]]

Args:

  • x: the time-series data
  • interval: the rolling interval over which to use data. If unspecified or set to None, an expanding (unbounded) window will be used.
    • if an int, represents the number of ticks to use
    • if a timedelta, represents the time interval to keep data (non-inclusive at left endpoint)
  • min_window: the minimum allowable interval to use before outputting data
    • If the interval is a timedelta then this must also be a timedelta. Example: interval=60s, min_window=30s means to use a 60s rolling interval with no output for the first 30s.
    • If the interval is a tick count then this must also be a tick count. Example: interval=100, min_window=50 means to use a 100-tick rolling interval with no output until 50 ticks have arrived.
  • as_numpy: if True, the data will be returned as a NumPy array instead of a list.
    • For a single-valued time series, this is a one-dimensional NumPy array
    • For a NumPy array time series, this is a NumPy array of one extra dimension
  • trigger: another time-series which can be used to externally trigger computations. Whenever the trigger ticks, the given statistic will be updated and returned
    • By default, the trigger is the series itself.
  • sampler: another optional time-series which specifies when x should tick. The behavior is as follows:
    • If x ticks and sampler ticks, then the x tick is considered valid and is used.
    • If x ticks but sampler does not tick, then the x tick is considered invalid and is ignored.
    • If x does not tick but sampler ticks, then the x tick is considered NaN and is handled based on the ignore_na flag.
    • By default, the sampler is the series itself.
  • reset: another optional time-series which, when ticked, will clear all data in the interval and "reset" the calculation

Returns:

  • a time-series where each tick contains all the data of x currently within the interval. Use this for custom cross-sectional calculations
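
For a timedelta interval, the window semantics described in the arguments can be sketched in plain Python. This is a minimal illustration, not the csp implementation: time_windows is a hypothetical helper, and it assumes that output begins once min_window has elapsed since the start time (the exact boundary behavior in csp is not specified here). The left endpoint of the interval is treated as non-inclusive, as stated above.

```python
from collections import deque
from datetime import datetime, timedelta

def time_windows(ticks, interval, min_window, start):
    """Yield (time, values in window) for a time-based rolling interval.

    ticks: iterable of (datetime, value) pairs in time order.
    """
    window = deque()
    for t, value in ticks:
        window.append((t, value))
        # Left endpoint is exclusive: drop ticks at or before t - interval.
        while window[0][0] <= t - interval:
            window.popleft()
        # Assumption: output begins once min_window has elapsed since start.
        if t - start >= min_window:
            yield t, [v for _, v in window]

start = datetime(2020, 1, 1)
ticks = [(start + timedelta(seconds=s), float(s)) for s in (10, 20, 40, 70, 100)]
result = list(
    time_windows(ticks, timedelta(seconds=60), timedelta(seconds=30), start)
)
for t, values in result:
    print(t.time(), values)
```

In this run nothing is emitted at 10s or 20s because min_window has not yet elapsed. At 70s the tick from 10s sits exactly on the left endpoint and is dropped, and at 100s the tick from 40s is dropped for the same reason.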

Examples: Cross-sectional calculations

Starttime: 2020-01-01 00:00:00

x = {'2020-01-01': 1, '2020-01-02': 2, '2020-01-03': 3, '2020-01-04': 4, '2020-01-05': 5}
cs = cross_sectional(x, interval=3, min_window=2)
cs
{'2020-01-02': [1,2], '2020-01-03': [1,2,3], '2020-01-04': [2,3,4], '2020-01-05': [3,4,5]}

Calculate a cross-sectional mean

cs_mean = csp.apply(cs, lambda v: sum(v)/len(v), float)
cs_mean
{'2020-01-02': 1.5, '2020-01-03': 2.0, '2020-01-04': 3.0, '2020-01-05': 4.0}

Get the results as a NumPy array

cs = cross_sectional(x, interval=3, min_window=2, as_numpy=True)
cs
{'2020-01-02': np.array([1,2]), '2020-01-03': np.array([1,2,3]), '2020-01-04': np.array([2,3,4]), '2020-01-05': np.array([3,4,5])}
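
The tick-count windows and means in these examples can be reproduced without csp. The sketch below is a plain-Python cross-check under the stated parameters (interval=3, min_window=2), not the library's implementation; tick_windows is a hypothetical helper name.

```python
from collections import deque

def tick_windows(values, interval, min_window):
    """Yield the rolling window contents after each tick."""
    window = deque(maxlen=interval)  # the oldest tick falls out automatically
    for v in values:
        window.append(v)
        # No output until at least min_window ticks have arrived.
        if len(window) >= min_window:
            yield list(window)

windows = list(tick_windows([1, 2, 3, 4, 5], interval=3, min_window=2))
means = [sum(w) / len(w) for w in windows]
print(windows)
print(means)
```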