UPI Interface Update (#6)
Motivated by the need for a more flexible data-passing mechanism and a more
efficient interface definition for large messages, this PR introduces the
following changes:

### Introduction of `Table` data structure


[Table](https://github.com/caraml-dev/universal-prediction-interface/blob/ac3775c5d81b461ce29d75e84aed70739091e801/proto/caraml/upi/v1/table.proto)
represents a dataframe-like data structure in row-based format. A Table has
the following specifications:
- A Table consists of one or more columns, which can have different
types.
- All values within a column must have the same type.
- A cell value can be null.
- Each row within a table has a `row_id` (similar in purpose to the `row_id`
in `PredictionRow`).
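
The constraints above can be sketched in plain Python. This is only an illustration: the `Column`, `Row`, and `Table` dataclasses mirror the fields listed in the generated docs, but they are stand-ins rather than the generated protobuf classes. The string type tags and the `validate_table` helper are assumptions made for this sketch and are not part of the PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Assumed type tags, derived from the double_value / integer_value /
# string_value fields; the real proto uses a `Type` enum instead.
PY_TYPES = {"double": float, "integer": int, "string": str}


@dataclass
class Column:
    name: str
    type: str  # one of the keys of PY_TYPES


@dataclass
class Row:
    row_id: str
    # None stands in for a cell whose is_null flag is set
    values: List[Optional[Union[float, int, str]]]


@dataclass
class Table:
    name: str
    columns: List[Column]
    rows: List[Row] = field(default_factory=list)


def validate_table(table: Table) -> None:
    """Raise ValueError if the table breaks one of the listed constraints."""
    if not table.columns:
        raise ValueError("a table needs at least one column")
    for row in table.rows:
        if len(row.values) != len(table.columns):
            raise ValueError(
                f"row {row.row_id!r}: expected {len(table.columns)} values"
            )
        for column, value in zip(table.columns, row.values):
            if value is None:  # null cells are allowed in any column
                continue
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, PY_TYPES[column.type]):
                raise ValueError(
                    f"row {row.row_id!r}, column {column.name!r}: "
                    f"expected {column.type}"
                )
```

The check on the number of values per row reflects the note in the generated docs that the table's creator is responsible for keeping each row's length consistent with the column list.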

### Replace `prediction_rows` in the request and `prediction_result_rows` in the response with the `Table` definition

`prediction_rows` and `prediction_result_rows` are effectively
dataframe-like objects, so each can be represented as a `Table`.
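
The generated docs in this PR note that rows in the response may be ordered differently from the request, but that the row count must match and that `row_id` can be used to join the two. A minimal sketch of such a join follows. Each row is modelled as a `(row_id, values)` tuple, and `join_by_row_id` is a hypothetical helper that is not provided by this PR.

```python
from typing import Dict, List, Tuple

# Each row is modelled as (row_id, values); a stand-in for the proto `Row`.
RowT = Tuple[str, List[object]]


def join_by_row_id(
    request_rows: List[RowT], result_rows: List[RowT]
) -> Dict[str, Tuple[List[object], List[object]]]:
    """Pair each request row with its result row using row_id, not position."""
    if len(request_rows) != len(result_rows):
        raise ValueError("response must contain as many rows as the request")
    results = {row_id: values for row_id, values in result_rows}
    # A missing row_id in the response surfaces as a KeyError here
    return {row_id: (values, results[row_id]) for row_id, values in request_rows}
```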


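In the benchmark below, deserialization is roughly 1.6x to 1.8x faster by
mean time for the `[100-*]` and `[1000-*]` cases, while the `[1-1]` and
`[1-100]` cases are slower with the new interface.
NOTE: `0001_7467c7b` is the existing interface, whereas `0002_ac3775c`
includes the changes in this PR.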

```
--------------------------------------------------------------------------------------------------------- benchmark 'deserialize-request': 24 tests ----------------------------------------------------------------------------------------------------------
Name (time in us)                                                    Min                     Max                    Mean                 StdDev                  Median                    IQR            Outliers           OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_deserialize_proto_request[1-100] (0001_7467c7b)             41.9270 (16.13)      1,842.2130 (23.02)         52.1732 (17.22)        23.2068 (14.41)         45.3840 (15.89)         5.5505 (42.37)   1526;4056   19,166.9311 (0.06)      20015           1
test_deserialize_proto_request[1-100] (0002_ac3775c)             86.1700 (33.16)        377.9210 (4.72)          90.9080 (30.00)        11.1573 (6.93)          89.3640 (31.29)         3.1950 (24.39)     400;602   11,000.1373 (0.03)      10065           1
test_deserialize_proto_request[1-1] (0001_7467c7b)                2.5990 (1.0)          114.4450 (1.43)           3.0305 (1.0)           1.6895 (1.05)           2.8560 (1.0)           0.1310 (1.0)      332;6541  329,980.4986 (1.0)       49461           1
test_deserialize_proto_request[1-1] (0002_ac3775c)                4.6710 (1.80)          80.0310 (1.0)            5.1638 (1.70)          1.8626 (1.16)           5.0190 (1.76)          0.1610 (1.23)     564;1018  193,656.7518 (0.59)      42838           1
test_deserialize_proto_request[100-100] (0001_7467c7b)        4,181.3790 (>1000.0)    9,231.8540 (115.35)     5,351.1642 (>1000.0)     840.1283 (521.74)     5,403.3230 (>1000.0)   1,300.7730 (>1000.0)      60;4      186.8752 (0.00)        222           1
test_deserialize_proto_request[100-100] (0002_ac3775c)        3,081.5200 (>1000.0)    3,685.8110 (46.05)      3,180.9798 (>1000.0)      77.3511 (48.04)      3,162.0330 (>1000.0)      68.3450 (521.72)      35;16      314.3686 (0.00)        318           1
test_deserialize_proto_request[100-500] (0001_7467c7b)       22,122.4090 (>1000.0)   32,973.5540 (412.01)    26,359.4216 (>1000.0)   3,763.7983 (>1000.0)   24,061.2560 (>1000.0)   7,077.2085 (>1000.0)      10;0       37.9371 (0.00)         33           1
test_deserialize_proto_request[100-500] (0002_ac3775c)       15,255.4400 (>1000.0)   18,528.7330 (231.52)    16,450.6710 (>1000.0)     731.7893 (454.46)    16,373.3630 (>1000.0)     956.3830 (>1000.0)      16;2       60.7878 (0.00)         63           1
test_deserialize_proto_request[1000-100] (0001_7467c7b)      51,576.9360 (>1000.0)   84,187.4340 (>1000.0)   59,984.9779 (>1000.0)   8,331.1790 (>1000.0)   56,792.5575 (>1000.0)  12,483.7885 (>1000.0)       3;0       16.6708 (0.00)         20           1
test_deserialize_proto_request[1000-100] (0002_ac3775c)      33,100.3810 (>1000.0)   37,188.7500 (464.68)    35,079.6140 (>1000.0)   1,019.7075 (633.26)    35,332.1615 (>1000.0)   1,445.7335 (>1000.0)       9;0       28.5066 (0.00)         28           1
test_deserialize_proto_request[1000-500] (0001_7467c7b)     251,937.1810 (>1000.0)  327,753.4200 (>1000.0)  294,796.8826 (>1000.0)  36,910.2596 (>1000.0)  314,206.2790 (>1000.0)  67,611.3500 (>1000.0)       2;0        3.3922 (0.00)          5           1
test_deserialize_proto_request[1000-500] (0002_ac3775c)     162,492.4160 (>1000.0)  172,359.1360 (>1000.0)  166,427.1273 (>1000.0)   3,162.6573 (>1000.0)  166,304.3530 (>1000.0)   3,138.3735 (>1000.0)       2;1        6.0086 (0.00)          7           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```
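
One way to read the table is to divide the mean time of the existing interface by the mean time of the new one for each case. The values below are copied from the `Mean` column above. The `speedup` helper is only illustrative.

```python
# Mean deserialization time in microseconds, copied from the table above:
# case -> (0001_7467c7b, 0002_ac3775c)
MEAN_US = {
    "1-1": (3.0305, 5.1638),
    "1-100": (52.1732, 90.9080),
    "100-100": (5351.1642, 3180.9798),
    "100-500": (26359.4216, 16450.6710),
    "1000-100": (59984.9779, 35079.6140),
    "1000-500": (294796.8826, 166427.1273),
}


def speedup(old_us: float, new_us: float) -> float:
    """Ratio above 1.0 means the new interface is faster."""
    return old_us / new_us


for case, (old_us, new_us) in MEAN_US.items():
    print(f"{case:>9}: {speedup(old_us, new_us):.2f}x")
```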

Serialization performance also improves by mean time in the `[100-*]` and
`[1000-100]` cases, although less significantly than deserialization
performance; the remaining cases show no improvement.
```
---------------------------------------------------------------------------------------------------------- benchmark 'serialize-request': 24 tests ----------------------------------------------------------------------------------------------------------
Name (time in us)                                                  Min                     Max                    Mean                 StdDev                  Median                     IQR            Outliers           OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_serialize_proto_request[1-100] (0001_7467c7b)             73.3760 (21.72)        375.7880 (3.40)          78.3862 (21.26)        13.7893 (6.41)          75.6350 (21.40)          1.6200 (18.41)     527;662   12,757.3548 (0.05)      12398           1
test_serialize_proto_request[1-100] (0002_ac3775c)            126.0540 (37.32)     39,474.1100 (357.10)       215.1010 (58.35)       715.0832 (332.39)       138.9175 (39.31)        104.4490 (>1000.0)    66;193    4,648.9797 (0.02)       6382           1
test_serialize_proto_request[1-1] (0001_7467c7b)                3.3780 (1.0)          127.6300 (1.15)           3.6866 (1.0)           2.1514 (1.0)            3.5340 (1.0)            0.0880 (1.0)      385;1818  271,249.9283 (1.0)       50267           1
test_serialize_proto_request[1-1] (0002_ac3775c)                5.8470 (1.73)       1,560.9360 (14.12)         10.7064 (2.90)         13.6365 (6.34)          10.3590 (2.93)           4.9460 (56.20)    962;1241   93,402.3626 (0.34)      26766           1
test_serialize_proto_request[100-100] (0001_7467c7b)        7,148.5460 (>1000.0)    9,444.1250 (85.44)      7,525.8909 (>1000.0)     305.5168 (142.01)     7,450.3040 (>1000.0)      228.3147 (>1000.0)     21;10      132.8746 (0.00)        133           1
test_serialize_proto_request[100-100] (0002_ac3775c)        5,871.7720 (>1000.0)    7,544.0470 (68.25)      6,234.8434 (>1000.0)     264.2215 (122.82)     6,181.4380 (>1000.0)      216.3943 (>1000.0)     17;11      160.3890 (0.00)        155           1
test_serialize_proto_request[100-500] (0001_7467c7b)       36,230.4730 (>1000.0)   39,488.9470 (357.24)    37,301.1935 (>1000.0)     745.8722 (346.70)    37,284.1360 (>1000.0)      672.7235 (>1000.0)       4;2       26.8088 (0.00)         27           1
test_serialize_proto_request[100-500] (0002_ac3775c)       30,596.9580 (>1000.0)   33,150.7000 (299.90)    31,431.9774 (>1000.0)     571.1242 (265.47)    31,389.9560 (>1000.0)      741.8257 (>1000.0)       9;1       31.8147 (0.00)         33           1
test_serialize_proto_request[1000-100] (0001_7467c7b)      72,902.3020 (>1000.0)   78,092.1620 (706.46)    74,928.3109 (>1000.0)   1,500.6398 (697.53)    75,107.4010 (>1000.0)    2,256.1870 (>1000.0)       4;0       13.3461 (0.00)         13           1
test_serialize_proto_request[1000-100] (0002_ac3775c)      63,619.4170 (>1000.0)   75,790.0490 (685.63)    65,387.8999 (>1000.0)   3,043.8494 (>1000.0)   64,336.3970 (>1000.0)    1,927.8670 (>1000.0)       1;1       15.2933 (0.00)         16           1
test_serialize_proto_request[1000-500] (0001_7467c7b)     372,581.2870 (>1000.0)  402,702.5400 (>1000.0)  380,281.0462 (>1000.0)  12,622.1482 (>1000.0)  375,547.3720 (>1000.0)    9,360.2127 (>1000.0)       1;1        2.6296 (0.00)          5           1
test_serialize_proto_request[1000-500] (0002_ac3775c)     372,103.9150 (>1000.0)  535,268.8430 (>1000.0)  444,222.4522 (>1000.0)  67,463.8563 (>1000.0)  421,689.7070 (>1000.0)  109,535.8960 (>1000.0)       2;0        2.2511 (0.00)          5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

```


### Move `transformer_inputs` to a top-level field in the request

This PR extracts `transformer_inputs` from `prediction_rows` into a
top-level request field, so that clients can pass data more flexibly
without having to denormalize it beforehand. A new proto message,
`TransformerInput`, is introduced to store all tables and variables that
need to be passed to the standard transformer.

```
message TransformerInput {
   // List of tables
   // All tables must have unique name.
   // Each table doesn't need to have same number of row.
   repeated Table tables = 1;  
   // List of variables
   repeated NamedValue variables = 2;
}
```
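
To make the message above more concrete, here is a sketch of a transformer input written as plain Python dicts rather than proto messages. The table names, column names, and the `duplicate_table_names` helper are hypothetical and chosen only for illustration. The sketch shows two tables with different row counts and a check for the unique-name requirement.

```python
from collections import Counter
from typing import Dict, List


def duplicate_table_names(tables: List[Dict]) -> List[str]:
    """Return table names that appear more than once, in sorted order."""
    counts = Counter(table["name"] for table in tables)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical transformer input: two tables with different row counts
# plus one variable, written as plain dicts rather than proto messages.
transformer_input = {
    "tables": [
        {
            "name": "drivers",
            "columns": ["driver_id"],
            "rows": [["d1"], ["d2"], ["d3"]],
        },
        {
            "name": "customer",
            "columns": ["customer_id"],
            "rows": [["c1"]],
        },
    ],
    "variables": [{"name": "country_code", "string_value": "ID"}],
}

# All table names must be unique, so no duplicates are expected here
assert duplicate_table_names(transformer_input["tables"]) == []
```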


### Add utility package

Two utility functions are added to the package to provide a more
user-friendly API for working with `Table`:
- `df_to_table` --> converts a pandas DataFrame to a Table
- `table_to_df` --> converts a Table into a pandas DataFrame

Example usage is as follows:
```
import pandas as pd
from caraml.upi.utils import df_to_table, table_to_df

df = pd.DataFrame(...)
# Convert the DataFrame into a Table named "my-table", then back again
table = df_to_table(df, "my-table")
new_df = table_to_df(table)
```
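
The idea behind the two helpers is a conversion between column-oriented data and the row-based `Table` layout. The sketch below shows that round trip without pandas, using a dict of lists as a stand-in for a DataFrame. It is not the implementation in `utils.py`: `columns_to_table`, `table_to_columns`, the dict layout, and the choice of the row position as `row_id` are all assumptions made for illustration.

```python
from typing import Dict, List


def columns_to_table(data: Dict[str, List[object]], name: str) -> Dict:
    """Turn column-oriented data into a row-based, Table-like dict."""
    lengths = {len(values) for values in data.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    n_rows = lengths.pop() if lengths else 0
    return {
        "name": name,
        "columns": list(data),
        # row_id is taken from the row position here; purely an assumption
        "rows": [
            {"row_id": str(i), "values": [data[col][i] for col in data]}
            for i in range(n_rows)
        ],
    }


def table_to_columns(table: Dict) -> Dict[str, List[object]]:
    """Inverse of columns_to_table: rebuild the column-oriented dict."""
    return {
        col: [row["values"][j] for row in table["rows"]]
        for j, col in enumerate(table["columns"])
    }
```

A null cell can be carried through as `None` in this sketch, which loosely corresponds to the `is_null` flag on `Value`.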


### Note

The PR contains generated docs and code, so it is best to start the
review with the following source files:
- table.proto
- upi.proto
- value.proto
- utils.py  

<img width="352" alt="Screenshot 2022-09-06 at 11 26 23 AM"
src="https://user-images.githubusercontent.com/4023015/188540757-c466ab95-31a1-4fa9-af37-5b03dbb22a8e.png">
aria authored Sep 12, 2022
1 parent 2acf755 commit 8fbb12d
Showing 83 changed files with 4,757 additions and 1,575 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/ci.yaml
@@ -14,7 +14,8 @@ jobs:
go-version: '1.18.0'
- uses: bufbuild/buf-setup-action@v1
with:
- version: '1.6.0'
+ version: '1.7.0'
+ github_token: ${{ github.token }}
- uses: s4u/[email protected]
with:
java-version: 11
3 changes: 3 additions & 0 deletions .gitignore
@@ -191,3 +191,6 @@ cython_debug/

# End of https://www.toptal.com/developers/gitignore/api/go,python
**/*.jar


.idea/
323 changes: 240 additions & 83 deletions docs/api_html/caraml/upi/v1/index.html

Large diffs are not rendered by default.

148 changes: 110 additions & 38 deletions docs/api_markdown/caraml/upi/v1/index.md
@@ -6,16 +6,21 @@
- [caraml/upi/v1/value.proto](#caraml_upi_v1_value-proto)
- [NamedValue](#caraml-upi-v1-NamedValue)

- [NamedValue.Type](#caraml-upi-v1-NamedValue-Type)
- [Type](#caraml-upi-v1-Type)

- [caraml/upi/v1/table.proto](#caraml_upi_v1_table-proto)
- [Column](#caraml-upi-v1-Column)
- [Row](#caraml-upi-v1-Row)
- [Table](#caraml-upi-v1-Table)
- [Value](#caraml-upi-v1-Value)

- [caraml/upi/v1/upi.proto](#caraml_upi_v1_upi-proto)
- [ModelMetadata](#caraml-upi-v1-ModelMetadata)
- [PredictValuesRequest](#caraml-upi-v1-PredictValuesRequest)
- [PredictValuesResponse](#caraml-upi-v1-PredictValuesResponse)
- [PredictionResultRow](#caraml-upi-v1-PredictionResultRow)
- [PredictionRow](#caraml-upi-v1-PredictionRow)
- [RequestMetadata](#caraml-upi-v1-RequestMetadata)
- [ResponseMetadata](#caraml-upi-v1-ResponseMetadata)
- [TransformerInput](#caraml-upi-v1-TransformerInput)

- [UniversalPredictionService](#caraml-upi-v1-UniversalPredictionService)

@@ -41,7 +46,7 @@ Oneof types are avoided as these can be difficult to handle
| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| name | [string](#string) | | Name describing what the value represents. Uses include: - Ensuring ML models process columns in the correct order - Defining a Feast row entity name - Parsing metadata to apply traffic rules |
| type | [NamedValue.Type](#caraml-upi-v1-NamedValue-Type) | | |
| type | [Type](#caraml-upi-v1-Type) | | |
| double_value | [double](#double) | | |
| integer_value | [int64](#int64) | | |
| string_value | [string](#string) | | |
@@ -53,9 +58,9 @@ Oneof types are avoided as these can be difficult to handle



<a name="caraml-upi-v1-NamedValue-Type"></a>
<a name="caraml-upi-v1-Type"></a>

### NamedValue.Type
### Type


| Name | Number | Description |
@@ -74,94 +79,144 @@ Oneof types are avoided as these can be difficult to handle



<a name="caraml_upi_v1_upi-proto"></a>
<a name="caraml_upi_v1_table-proto"></a>
<p align="right"><a href="#top">Top</a></p>

## caraml/upi/v1/upi.proto

## caraml/upi/v1/table.proto


<a name="caraml-upi-v1-ModelMetadata"></a>

### ModelMetadata
<a name="caraml-upi-v1-Column"></a>

### Column
Column represent a column definition within a table


| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| name | [string](#string) | | Model name that produce prediction |
| version | [string](#string) | | Model version that produce prediction |
| name | [string](#string) | | Column&#39;s name |
| type | [Type](#caraml-upi-v1-Type) | | Column&#39;s type |






<a name="caraml-upi-v1-PredictValuesRequest"></a>
<a name="caraml-upi-v1-Row"></a>

### PredictValuesRequest
Represents a request to predict multiple values
### Row
Row represents list of value stored within a row of a table


| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| prediction_rows | [PredictionRow](#caraml-upi-v1-PredictionRow) | repeated | Collection of prediction instances to be predicted. Each prediction row correspond to one prediction instance. NOTE: the ordering of prediction_rows might differ with prediction_result_rows in the response |
| target_name | [string](#string) | | Name of the concept we wish to predict. For example in context of iris classification problem it can be &#34;iris-species&#34; |
| prediction_context | [NamedValue](#caraml-upi-v1-NamedValue) | repeated | Prediction context may contain additional data applicable to all prediction instances For example it can be used to store information for traffic rules, experimentation or tracking purposes. Eg. country_code, service_type, service_area_id |
| metadata | [RequestMetadata](#caraml-upi-v1-RequestMetadata) | | |
| row_id | [string](#string) | | Id of the particular row in a table. The row id should be at least locally unique within the table. Row ID must be populated for prediction_table |
| values | [Value](#caraml-upi-v1-Value) | repeated | List of values within a row. It is table&#39;s creator responsibility to ensure that the number of entry values matches with the length of columns in the table. |






<a name="caraml-upi-v1-PredictValuesResponse"></a>
<a name="caraml-upi-v1-Table"></a>

### Table
Table represents a 2D data structure that has one or more columns
with potentially different types


| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| name | [string](#string) | | Table&#39;s name |
| columns | [Column](#caraml-upi-v1-Column) | repeated | Columns stores schema informations of all columns in the table. |
| rows | [Row](#caraml-upi-v1-Row) | repeated | Rows stores list of row values in the table. |


### PredictValuesResponse




<a name="caraml-upi-v1-Value"></a>

### Value
Value of a cell within a table. Value is nullable.


| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| prediction_result_rows | [PredictionResultRow](#caraml-upi-v1-PredictionResultRow) | repeated | Prediction results corresponding to the prediction rows provided in the request. NOTE: the ordering of prediction_result_rows might differ with prediction_rows in the request |
| target_name | [string](#string) | | Target name as defined in the request metadata |
| prediction_context | [NamedValue](#caraml-upi-v1-NamedValue) | repeated | Extensible field to cover unforeseen requirements |
| metadata | [ResponseMetadata](#caraml-upi-v1-ResponseMetadata) | | Response metadata |
| double_value | [double](#double) | | One of following field will be set depending on the column&#39;s type |
| integer_value | [int64](#int64) | | |
| string_value | [string](#string) | | |
| is_null | [bool](#bool) | | Flag to be used to signify that the value is null |







<a name="caraml-upi-v1-PredictionResultRow"></a>

### PredictionResultRow






<a name="caraml_upi_v1_upi-proto"></a>
<p align="right"><a href="#top">Top</a></p>

## caraml/upi/v1/upi.proto



<a name="caraml-upi-v1-ModelMetadata"></a>

### ModelMetadata



| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| row_id | [string](#string) | | Row ID defined by the caller used to join a prediction result with a prediction row |
| values | [NamedValue](#caraml-upi-v1-NamedValue) | repeated | Represents the predicted values corresponding to a single prediction row. This will often be the output of an ML model. This field is repeated to support multi-task models with non-scalar outputs |
| name | [string](#string) | | Model name that produce prediction |
| version | [string](#string) | | Model version that produce prediction |






<a name="caraml-upi-v1-PredictionRow"></a>
<a name="caraml-upi-v1-PredictValuesRequest"></a>

### PredictionRow
Represents an single instance we wish to predict.
Eg. for Matchmaking a prediction row will typically
correspond to a proposed driver plan
### PredictValuesRequest
Represents a request to predict multiple values


| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| row_id | [string](#string) | | Row ID is defined by the client and can be used to join a prediction row with the prediction result, and to track predictions generated by multiple models. The user is expected include row ID (along with prediction ID) when calling the observations API so that predictions and observations can be joined. |
| model_inputs | [NamedValue](#caraml-upi-v1-NamedValue) | repeated | Model inputs contain all preprocessed feature that model use to perform prediction. The feature ordering in model_inputs must be the same as feature order expected by model. Model inputs can be populated via 3 ways: - By performing preprocessing in the client-side and sent as part of original request. - By transforming raw feature values stored in transformer_inputs. - By retrieving precomputed feature value from feature store. |
| transformer_inputs | [NamedValue](#caraml-upi-v1-NamedValue) | repeated | Transformer input contains raw values that can be used to enrich model_inputs using transformer. Typically transformer_inputs contains: - unprocessed/raw features that requires further processing. - list of entities for which their precomputed features are retrieved from feature store. |
| prediction_table | [Table](#caraml-upi-v1-Table) | | Prediction table contains instances to be predicted. Each row in the table correspond to one prediction instance. Prediction table should contain all preprocessed feature that model use to perform prediction. The column ordering in the prediction table must be the same as feature order expected by model in the case of standard model. Prediction table can be populated via 3 ways: - By performing preprocessing in the client-side and sent as part of original request. - By transforming feature values stored in transformer_inputs. - By retrieving precomputed feature value from feature store. Row ID of the prediction_table must be populated by the client and can be used to join a row in prediction_table with another row in the prediction_result_table, and to track predictions generated by multiple models. The user is expected to include row ID (along with prediction ID) when calling the observations API so that predictions and observations can be joined. NOTE: the ordering of rows might differ in the response but the number of row must remain the same. |
| transformer_input | [TransformerInput](#caraml-upi-v1-TransformerInput) | | Transformer input contains list of tables and variables that can be used to enrich prediction_table using transformer. Typically transformer_inputs contains: - unprocessed/raw features that requires further transformation. - list of entities for which their precomputed features are retrieved from feature store using standard transformer. |
| target_name | [string](#string) | | Name of the concept we wish to predict. For example in context of iris classification problem it can be &#34;iris-species&#34; |
| prediction_context | [NamedValue](#caraml-upi-v1-NamedValue) | repeated | Prediction context may contain additional data applicable to all prediction instances For example it can be used to store information for traffic rules, experimentation or tracking purposes. Eg. country_code, service_type, service_area_id |
| metadata | [RequestMetadata](#caraml-upi-v1-RequestMetadata) | | Request metadata |






<a name="caraml-upi-v1-PredictValuesResponse"></a>

### PredictValuesResponse



| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| prediction_result_table | [Table](#caraml-upi-v1-Table) | | Prediction results corresponding to the prediction rows provided in the request. NOTE: the ordering of prediction_result_rows might differ with prediction_table in the request but the number of row must match with the prediction_table |
| target_name | [string](#string) | | Target name as defined in the request metadata |
| prediction_context | [NamedValue](#caraml-upi-v1-NamedValue) | repeated | Extensible field to cover unforeseen requirements |
| metadata | [ResponseMetadata](#caraml-upi-v1-ResponseMetadata) | | Response metadata |



@@ -201,6 +256,23 @@ correspond to a proposed driver plan




<a name="caraml-upi-v1-TransformerInput"></a>

### TransformerInput
Transformer input contains additional information that can be used to enrich prediction_table using standard transformer.
All tables and variables within transformer input will be imported to the standard transformer runtime automatically.


| Field | Type | Label | Description |
| ----- | ---- | ----- | ----------- |
| tables | [Table](#caraml-upi-v1-Table) | repeated | List of tables All tables must have unique name. Each table doesn&#39;t need to have same number of row. |
| variables | [NamedValue](#caraml-upi-v1-NamedValue) | repeated | List of variables |








43 changes: 43 additions & 0 deletions docs/openapiv2/caraml/upi/v1/table.swagger.json
@@ -0,0 +1,43 @@
{
"swagger": "2.0",
"info": {
"title": "caraml/upi/v1/table.proto",
"version": "version not set"
},
"consumes": [
"application/json"
],
"produces": [
"application/json"
],
"paths": {},
"definitions": {
"protobufAny": {
"type": "object",
"properties": {
"@type": {
"type": "string"
}
},
"additionalProperties": {}
},
"rpcStatus": {
"type": "object",
"properties": {
"code": {
"type": "integer",
"format": "int32"
},
"message": {
"type": "string"
},
"details": {
"type": "array",
"items": {
"$ref": "#/definitions/protobufAny"
}
}
}
}
}
}