The sMAP Archiver Interface

liberpic edited this page Sep 13, 2014 · 3 revisions

#The sMAP Archiver Interface

The sMAP archiver is a streaming storage manager which adds tools for storing time-series data from sMAP sources, and for accessing both historical and real-time data. It may be used as an interface for developing applications which access sMAP data, or to retrieve data for offline analysis.

The archiver API is available over HTTP. To provide easier access, the smap.archiver.client class provides Python bindings for this API. Using this client interface may be faster than using the HTTP API directly because, if pycurl bindings are available, it attempts to conduct multiple parallel downloads using HTTP 1.1 sessions.

###Archiver API

The archiver exposes a RESTful interface for retrieving data and metadata. This interface is presented under the /api/ resource; you can use one of our public archivers to experiment with this interface:


$ curl http://ar1.openbms.org:8079/
{"Contents": ["add", "api", "republish"]}

Each of these corresponds to a resource you can access, as explained below. If you are running your own archiver, you can use its URL instead.

####add, api, and republish

The top-level resources are add, api, and republish. If you have an API key, you may post valid objects to “http://ar1.openbms.org:8079/add/[key]”. These objects are ordinarily generated by the sMAP source, and are described in manual-format. The archiver daemon will return an HTTP 200 OK code if the data and tags were successfully entered into the databases.

republish allows clients to access incoming real-time data being sent to the archiver.

Finally, the api resource is used to access stored data, and has several sub-resources:


$ curl ar1.openbms.org:8079/api
{"Contents": ["streams", "query", "data", "next", "prev", "tags"]}

####query

The query resource is used to discover what tags are present. Called with no arguments, it returns a list of the distinct tag names known to the archiver. If one of those tag names is appended to the path, it returns a list of the distinct values of that tag. Any number of tags may be specified in this manner.

The only special tag is the uuid tag; it will return the uuid of streams matching the query. Consider:


$ curl http://ar1.openbms.org:8079/api/query
["Description", ..., "Metadata/SourceName", "Path", "Properties/ReadingType", "Properties/Timezone", "Properties/UnitofMeasure"]

There are a number of tags which are required, such as the Properties ones. We can now add a restriction on the unit of measure:


$ curl http://ar1.openbms.org:8079/api/query/Properties__UnitofMeasure
["$", "A", "C", "deg", "Hz", "HZ", "kVA", "kVAh", "kVAR", "kVARh", "kVARh-", "kVARh+", "kVAR net", "kW", "kWh", "Lbs", "Lbs/hr", "mm", "m/s", "mVA", "mW", "mWh", "Pa", "pct", "pf", "PF", "rh", "second", "V"]

Tag names and values should be properly urlencoded, with slash (‘/’) characters replaced with ‘__’. You can continue to add tag names and restrictions to further filter the set of matched streams.
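The encoding rule above can be sketched in Python. This is a minimal illustration, not part of the sMAP client: the helper names `encode_tag_part` and `query_url` are made up for this example, and the base URL is the public archiver used elsewhere on this page.

```python
from urllib.parse import quote

ARCHIVER = "http://ar1.openbms.org:8079"  # public archiver named on this page


def encode_tag_part(part):
    """Encode one tag name or tag value for use as a URL path segment."""
    # Slashes become '__'; everything else is percent-encoded
    # (for example, a space becomes %20).
    return quote(part.replace("/", "__"), safe="")


def query_url(*parts, resource="query", base=ARCHIVER):
    """Build an /api/<resource>/... URL from alternating tag names and values."""
    path = "/".join(encode_tag_part(p) for p in parts)
    url = f"{base}/api/{resource}"
    return f"{url}/{path}" if path else url


print(query_url("Properties/UnitofMeasure"))
print(query_url("Metadata/Instrument/Manufacturer", "UC Berkeley", resource="prev"))
```

Passing the resulting URL to curl or to `urllib.request.urlopen` should behave like the hand-written requests shown above.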

More sophisticated queries can be constructed using the ArchiverQuery language, described under Query Language below.

####tags

Once a query is constructed you may return the tag set for all matching streams using the tags resource. To avoid generating a huge result set, it’s desirable to check the number of matching streams before trying this one. Alternatively, you can request the tags for a particular stream by specifying a uuid; that’s guaranteed to match only one stream:


$ curl http://ar1.openbms.org:8079/api/tags/uuid/87c395ee-5ee3-5713-8928-c29e32937877 | jprint
[
  {
    "Metadata": {
      "Extra": {
        "DentElement": "elt-B",
        "Driver": "smap.drivers.dent.Dent18",
        "MeterName": "5DPB",
        "Panel": "5DPB",
        "Phase": "ABC",
        "ServiceArea": "BLDG.",
        "ServiceDetail": "East Passenger Elevator",
        "System": "elevator",
        "SystemDetail": "ELEVATOR",
        "SystemType": "Electrical"
      },
      "Instrument": {
        "Manufacturer": "Dent Industries",
        "Model": "PowerScout 18",
        "SamplingPeriod": "20"
      },
      "Location": {
        "Building": "Cory Hall",
        "Campus": "UCB",
        "Floor": "4"
      },
      "SourceName": "Cory Hall Dent Meters"
    },
    "Path": "/5DPB/elt-B/ABC/true_power",
    "Properties": {
      "ReadingType": "double",
      "Timezone": "America/Los_Angeles",
      "UnitofMeasure": "kW"
    },
    "uuid": "87c395ee-5ee3-5713-8928-c29e32937877"
  } 
]

The result is a list of Timeseries objects with everything but data.
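The nested objects returned by tags appear to correspond to the slash-separated tag names reported by query (for example, "Metadata/SourceName"). Assuming that correspondence holds, a small helper can flatten a result into path-style tag names; `flatten_tags` is a hypothetical name used only for this sketch.

```python
def flatten_tags(obj, prefix=""):
    """Flatten nested tag objects into {'A/B/C': value} form."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into nested objects such as Metadata/Location.
            flat.update(flatten_tags(value, name))
        else:
            flat[name] = value
    return flat


# A trimmed copy of the stream shown above.
stream = {
    "Metadata": {
        "Location": {"Building": "Cory Hall", "Campus": "UCB", "Floor": "4"},
        "SourceName": "Cory Hall Dent Meters",
    },
    "Path": "/5DPB/elt-B/ABC/true_power",
    "Properties": {"UnitofMeasure": "kW"},
    "uuid": "87c395ee-5ee3-5713-8928-c29e32937877",
}

for name, value in sorted(flatten_tags(stream).items()):
    print(name, "=", value)
```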

####data, next, prev

These are used to retrieve data in the time series. Like the tags resource, they return a list of partial Timeseries objects, although these contain only Readings. They accept several query params:

| arg name | value |
| --- | --- |
| starttime | timestamp of the first reading (inclusive), in UNIX milliseconds |
| endtime | timestamp at which to end the query (only for data) |
| limit | maximum number of points to retrieve; “-1” is unlimited |
| streamlimit | maximum number of streams to query (default 10) |

data returns data within a range. prev and next retrieve up to limit points before or after the start time reference. These can be used to determine the next point after a known reference time without generating a large result set, or to efficiently locate the latest data.

Again, these requests can produce large result sets which are slow to generate, so test carefully and use limit to keep results manageable. By default you can only look up data from 10 streams; you may need to increase streamlimit if you are querying a number of streams in parallel.

For instance, you can use this to find the latest readings from all the ACme meters in room 465 at Berkeley:


$ curl 'http://new.openbms.org/backend/api/prev/Metadata__Instrument__Manufacturer/UC%20Berkeley/Metadata__Location__Room/465/Properties__UnitofMeasure/mW?starttime=1315272705000'
[{"uuid": "6fdde16d-d59a-5f38-84ad-6b04b26e0029", "Readings": [[1315272654000, 217.0]]}, {"uuid": "5ff3f108-eb71-531a-872c-e6e4c1aaa31f", "Readings": [[1315272695000, 0.0]]}, {"uuid": "87d1d01c-1358-5af2-b005-036e69c88832", "Readings": [[1315272701000, 3503.0]]}, {"uuid": "d02891b0-6cf0-5d69-a3da-418646c9b779", "Readings": [[1315272701000, 87.0]]}, {"uuid": "5f5ca043-6f34-5bbf-9a6e-6a0eff85f5ad", "Readings": [[1315272699000, 8179.0]]}, {"uuid": "250ba823-0d0c-5b75-906f-ac2f68288352", "Readings": [[1315272700000, 15378.0]]}, {"uuid": "df4c180d-3c78-568f-8ff2-5026c9f42d5d", "Readings": [[1315272692000, 0.0]]}]
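A response like the one above can be turned into Python values with the standard library. The following is a sketch only: `with_params` and `parse_series` are hypothetical helper names, and timestamps are converted to UTC on the assumption that they are UNIX milliseconds, as stated for starttime.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode


def with_params(url, **params):
    """Append query params such as starttime, limit or streamlimit."""
    return f"{url}?{urlencode(params)}" if params else url


def parse_series(body):
    """Map each stream uuid to a list of (UTC datetime, value) pairs."""
    result = {}
    for series in json.loads(body):
        result[series["uuid"]] = [
            # Timestamps arrive as UNIX milliseconds.
            (datetime.fromtimestamp(ms / 1000, tz=timezone.utc), value)
            for ms, value in series.get("Readings", [])
        ]
    return result


# One entry copied from the response above.
sample = '[{"uuid": "6fdde16d-d59a-5f38-84ad-6b04b26e0029", "Readings": [[1315272654000, 217.0]]}]'
for stream_id, points in parse_series(sample).items():
    for when, value in points:
        print(stream_id, when.isoformat(), value)
```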

###Query Language

To express more complicated queries, you can use a simple query language. The exact syntax is still in flux but the parts documented here should remain fairly constant. The core problem this language solves is that if we had a relational model for the set of tags, we could just query it using SQL where the tag names are columns, and tag values are rows. Since we don’t know all the column names ahead of time (you can tag your data with whatever you’d like), it’s tedious to construct queries on tags. This query language rewrites queries into SQL; this lets you pretend tag names are columns.

The language supports select, delete, and set operations; there is no need to refer to a particular table since there is only one flat datastore. The select operation may be performed by anyone, and by default queries all public streams; the mutation operations delete and set will only operate on streams for which the request includes the corresponding API key.

####Using the query language

You can execute queries by putting them in the body of a POST request to http://ar1.openbms.org:8079/api/query. If you have sMAP installed, there is an interactive tool, smap-query, which you can use to do this.

If you have received an API key, you may include it in your request using the “key” query param; multiple keys may be specified by repeating the param. For instance, the query string ?key=[key]&key=[k2] will pass along those two keys. This will (a) allow you to query those streams, if they are marked as private, and (b) allow you to mutate them using the set and delete operators.
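As a rough sketch of the request described above, the following builds the POST with Python's standard library. The helper names `build_query_request` and `run_query` are invented for this example, and the code assumes the archiver replies with JSON, as the examples on this page suggest.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

QUERY_URL = "http://ar1.openbms.org:8079/api/query"


def build_query_request(query, keys=(), url=QUERY_URL):
    """Build a POST request carrying the query text in its body."""
    if keys:
        # Repeat the "key" query param once per API key.
        url = f"{url}?{urlencode([('key', k) for k in keys])}"
    return Request(url, data=query.encode("utf-8"), method="POST")


def run_query(query, keys=()):
    """Send the query and decode the JSON reply (needs network access)."""
    with urlopen(build_query_request(query, keys), timeout=30) as resp:
        return json.load(resp)


req = build_query_request("select distinct", keys=["key1", "key2"])
print(req.get_method(), req.full_url)
```

Calling `run_query("select distinct Metadata/Location/Building")` would then send the request; the keys shown here are placeholders.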

####select

syntax: select selector where where-clause

The result of a distinct query is a JSON list of all matching strings, while the result of a tag name query is a list of sMAP Timeseries objects populated only with the requested fields.

####data selector

You can access stored data from multiple streams by specifying a data specification:

select data in (start reference, end reference) limit where where-clause

select data before reference limit where where-clause

select data after reference limit where where-clause

A limit is optional, and can have the form limit number, streamlimit number, or limit number streamlimit number. Limit controls the number of points returned per stream, and streamlimit controls the number of time series returned. If a limit is not specified, specifications using before or after will return one point per stream.

You can select the time region queried using a range query, or a query relative to a reference time stamp. In all these cases, the reference times must be either a timestamp in units of UNIX milliseconds, the string literal now, or a quoted time string. Valid time strings match a time format of either %m/%d/%Y, %m/%d/%Y %H:%M, or %Y-%m-%dT%H:%M:%S. For instance “10/16/1985” and “2/29/2012 20:00” are valid. These strings are interpreted relative to the timezone of the server.

The reference may be modified by appending a relative time string, using unix “at”-style specifications. You can for instance say now + 1hour or now -1h -5m for the last 1:05. Available relative time quantities are days, hours, minutes, and seconds.
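If you prefer to pass explicit UNIX-millisecond timestamps rather than time strings, the conversion can be done client-side. This sketch makes two assumptions that you should adjust: the hour precedes the minute in the second format (as the example “2/29/2012 20:00” suggests), and the strings are read as UTC rather than in the server's timezone. `to_unix_millis` and `shifted` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Formats listed above, with hour before minute (an assumption).
TIME_FORMATS = ("%m/%d/%Y", "%m/%d/%Y %H:%M", "%Y-%m-%dT%H:%M:%S")


def to_unix_millis(text, tz=timezone.utc):
    """Convert a time string in one of TIME_FORMATS to UNIX milliseconds."""
    for fmt in TIME_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next format
        return int(parsed.replace(tzinfo=tz).timestamp() * 1000)
    raise ValueError(f"unrecognized time string: {text!r}")


def shifted(ms, **delta):
    """Shift a millisecond timestamp, e.g. shifted(t, hours=-1, minutes=-5)."""
    return ms + int(timedelta(**delta).total_seconds() * 1000)


start = to_unix_millis("2/29/2012 20:00")
print(start, shifted(start, hours=-1, minutes=-5))
```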

####Examples

Get all tags in the system:

query> select distinct

Get entire tag database:

query> select *

Get all buildings in use:

query> select distinct Metadata/Location/Building

Get all buildings and cities:

query> select Metadata/Location/Building, Metadata/Location/City

Get the latest readings from two streams:

select data before now where uuid = 'd26f4650-329a-5e14-8e5a-73e820dff9f0' or uuid = '87c395ee-5ee3-5713-8928-c29e32937877'

Retrieve a week’s worth of data for matching streams:

select data in ("1/1/2012", "1/7/2012") streamlimit 50 where Metadata/SourceName ~ "^410"

Retrieve the last five minutes of outside air data:

select data in (now -5minutes, now) where Metadata/Extra/Type = 'oat'
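When the set of streams is known in advance, queries like the "latest readings" example can be generated rather than typed. A minimal sketch, assuming UUIDs contain no quote characters; `select_latest` is a hypothetical helper, not part of sMAP.

```python
def select_latest(uuids, limit=None):
    """Build a 'select data before now' query for the given stream UUIDs."""
    if not uuids:
        raise ValueError("at least one uuid is required")
    where = " or ".join(f"uuid = '{u}'" for u in uuids)
    # Without a limit, 'before' returns one point per stream.
    limit_part = f" limit {int(limit)}" if limit is not None else ""
    return f"select data before now{limit_part} where {where}"


print(select_latest([
    "d26f4650-329a-5e14-8e5a-73e820dff9f0",
    "87c395ee-5ee3-5713-8928-c29e32937877",
]))
```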

####Where Clause

You can filter your result set using several operators.

| operator | description |
| --- | --- |
| = | compare tag values; tagname = "tagval" |
| like | string matching with SQL LIKE; tagname like "pattern" |
| ~ | regular expression matching; tagname ~ "pattern" |
| has | assert the stream has a tag; has tagname |
| and | logical and of two queries |
| or | logical or of two queries |
| not | invert a match |

These statements can be grouped with parentheses. Tag values should be specified as quoted strings, while tag names should not be quoted.

See the postgres manual for more information on regular expression syntax.

####Examples

Find all the sources using Dent meters:

query> select distinct Metadata/SourceName where Metadata/Instrument/Manufacturer like 'Dent%'

Find all paths tagged as refrigerators, in units of milliwatts:

query> select distinct Path where Metadata/Extra/ProductType = 'Refrigerator' and Properties/UnitofMeasure = 'mW'

####delete

Form 1: delete where where-clause

Form 2: delete tag-list where where-clause

Form 1 deletes a stream, including all tags and data from the repository; it cannot be recovered. It returns a list of deleted UUIDs.

Form 2 deletes a list of tag names; data and other tag names are unchanged.

The where-clause has the same syntax as for select statements; the tag-list is a comma-separated list of (unquoted) tag names.

####Examples

Delete a stream, where we know its identifier:


query> delete where uuid = '39ba89fe-29f9-5f61-82b6-f5c8a6d5d923'
[
  "39ba89fe-29f9-5f61-82b6-f5c8a6d5d923"
]

Remove a single tag from a stream:

query> delete Metadata/Instrument/Model where uuid = 'a8bec5d1-dced-5a05-a938-41f618a92ac0'

####set

syntax: set set-list where where-clause

The set command applies tags to a list of streams. The set-list is a comma separated list of new tag names and values. The where-clause has the same syntax as previously discussed. This command is limited to operating on streams owned by the API keys passed with the request.

####Examples

Change a timezone by UUID:


query> set Properties/Timezone = 'America/Los_Angeles' where uuid = '3f4d3767-74df-5882-9fcc-4ab530f0f1af'

Mark two feeds as full-building feeds:


query> set Metadata/Extra/ServiceRegion = 'building' where uuid = '960075e9-fb89-5527-9044-cd4239513478' or uuid = '814ed855-0174-5ca9-8f01-53c244d8996f'

###Real-time data access

The sMAP archiver also provides near-real-time access to incoming data from sMAP sources through the /republish resource. After a client requests this resource, the server keeps writing incoming data to the client until the client closes the connection.

If the client requests the resource with a POST request and includes an ArchiverQuery Where Clause in the body, the server will only send time series which match that query. For instance, to access all real-time Outside Air Temperature feeds on our example server, you can use cURL:


$ curl -XPOST -d 'Metadata/Extra/Type = "oat"' http://new.openbms.org/backend/republish
{"/Bancroft/doe_ah-b1_ah-b2_rf-1/b_oat": {"uuid": "d64e8d73-f0e9-5927-bbeb-8d45ab927ca5", "Readings": [[1362946176000, 61.200000000000003]]}}

{"/versa_flame/oat": {"uuid": "5d8f73d5-0596-5932-b92e-b80f030a3bf7", "Readings": [[1362946168000, 61.300000000000004]]}}

Data consists of JSON sMAP objects separated by newlines; press Ctrl-C to stop receiving the data.
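A client can consume this stream one line at a time. The sketch below parses newline-separated objects of the shape shown above; `iter_republished` and `follow` are hypothetical names, and how promptly lines arrive depends on the server's and the HTTP library's buffering, which is not covered here.

```python
import json
from urllib.request import Request, urlopen


def iter_republished(lines):
    """Yield (path, uuid, timestamp_ms, value) from newline-separated JSON."""
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip the blank lines between objects
        for path, series in json.loads(raw).items():
            for ms, value in series.get("Readings", []):
                yield path, series["uuid"], ms, value


def follow(where, url="http://new.openbms.org/backend/republish"):
    """POST a where-clause and print readings until interrupted (needs network)."""
    req = Request(url, data=where.encode("utf-8"), method="POST")
    with urlopen(req) as resp:
        for record in iter_republished(resp):
            print(record)


# Offline check using one line of the output shown above.
sample_lines = [
    '{"/versa_flame/oat": {"uuid": "5d8f73d5-0596-5932-b92e-b80f030a3bf7", "Readings": [[1362946168000, 61.300000000000004]]}}',
    "",
]
print(list(iter_republished(sample_lines)))
```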

###Manual data publication (JSON Edition)

The sMAP library ordinarily takes care of reliably sending data to the archiver backend; however it’s sometimes desirable to add data using some other source. You can do this using an HTTP POST with a properly formatted JSON object in the body. The sMAP specification contains the necessary details for doing this; here are some simple examples.

A simple example of a valid sMAP object is:


{
  "/sensor0" : {
    "Metadata" : {
      "SourceName" : "Test Source",
      "Location" : { "City" : "Berkeley" }
    },
    "Properties": {
      "Timezone": "America/Los_Angeles",
      "UnitofMeasure": "Watt",
      "ReadingType": "double"
    },
    "Readings" : [[1351043674000, 0], [1351043675000, 1]],
    "uuid" : "d24325e6-1d7d-11e2-ad69-a7c2fa8dba61"
  }
}

Supposing this was in data.json, you could send it to the archiver using cURL:


$ curl -XPOST -d @data.json -H "Content-Type: application/json" http://localhost:8079/add/

####Notes

  • /sensor0 is the resource path of the sensor on the sMAP server. You can make something sensible up if you’re not actually running a web server.
  • The Metadata/SourceName field is needed if you want your time series to show up in the powerdb2 plotter; other than that, all of Metadata is optional.
  • Valid ReadingTypes are double and long; the timezone determines the conversion to be used for display times.
  • Readings can consist of any number of (timestamp, value) arrays. The timestamps should be UTC milliseconds. Readings are currently truncated to 1-second resolution.
  • The uuid should be globally unique for each timeseries. Use an appropriate algorithm to generate them.
  • In order to add data, only the uuid and Readings fields are needed; you need only send the metadata fields occasionally (e.g., on startup) to reduce the amount of data sent.
  • Be sure to set the Content-Type: application/json HTTP header when implementing your own sMAP support.
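
The notes above can be combined into a short Python sketch that builds a report and the matching POST request. The function names are hypothetical, and deriving a version-5 UUID from the source name and path is only one possible way to get a stable identifier; the default URL is the local archiver address from the cURL example.

```python
import json
import uuid
from urllib.request import Request


def stream_uuid(source_name, path):
    """Derive a stable UUID for a stream (one possible scheme, an assumption)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source_name}{path}"))


def make_report(path, stream_id, readings, metadata=None, properties=None):
    """Build a sMAP object; readings are (timestamp_ms, value) pairs."""
    series = {
        "uuid": stream_id,
        "Readings": [[int(ts_ms), value] for ts_ms, value in readings],
    }
    # Metadata and Properties may be omitted after the first report.
    if metadata:
        series["Metadata"] = metadata
    if properties:
        series["Properties"] = properties
    return {path: series}


def add_request(report, url="http://localhost:8079/add/"):
    """Build the POST request with the required Content-Type header."""
    return Request(
        url,
        data=json.dumps(report).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


report = make_report(
    "/sensor0",
    stream_uuid("Test Source", "/sensor0"),
    [(1351043674000, 0), (1351043675000, 1)],
    metadata={"SourceName": "Test Source"},
    properties={"Timezone": "America/Los_Angeles",
                "UnitofMeasure": "Watt", "ReadingType": "double"},
)
print(json.dumps(report, indent=2))
```

Passing `add_request(report)` to `urllib.request.urlopen` would send it to an archiver listening at that address.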

####Examples

You can find an example of a valid set of readings.

###CSV Edition

The archiver (as of r421) also supports receiving data (but not metadata) using CSV. This can be a good choice if you have very simple devices. The CSV format is very simple; see an example.

To add data using this file, you can again use cURL:


$ curl -XPOST -d @report.csv -H "Content-Type: text/csv" http://localhost:8079/add/