From 8fbb12ddbc0e7f88d822d36ef685ef6c56475ca3 Mon Sep 17 00:00:00 2001
From: aria
Date: Mon, 12 Sep 2022 11:51:20 +0800
Subject: [PATCH] UPI Interface Update (#6)

Motivated by the need for a more flexible data-passing mechanism and a more efficient interface definition for large messages, this PR introduces the following changes:

### Introduction of `Table` data structure
[Table](https://github.com/caraml-dev/universal-prediction-interface/blob/ac3775c5d81b461ce29d75e84aed70739091e801/proto/caraml/upi/v1/table.proto) represents a dataframe-like data structure in row-based format. `Table` has the following specifications:
- A table consists of one or more columns, which may have different types.
- All values within a column must have the same type.
- A cell value can be null.
- Each row in a table has a `row_id` (similar in purpose to `row_id` in `PredictionRow`).

### Replace `prediction_rows` in the request and `prediction_result_rows` in the response with the `Table` definition
`prediction_rows` and `prediction_result_rows` are effectively dataframe-like objects, which can be represented as a `Table`.

Deserialization is noticeably faster for the larger messages: in the benchmark output, mean deserialization time improves by roughly 1.6x to 1.8x for the `[100-100]` through `[1000-500]` cases, while the two smallest cases (`[1-1]` and `[1-100]`) are slower.

NOTE: in the benchmark output, `0001_7467c7b` is the existing interface and `0002_ac3775c` is the interface with the changes from this PR.

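To make the row-based layout described in this section concrete, here is a minimal plain-Python sketch of a table that enforces the specifications listed for `Table`. It is only an illustration: `Column`, `Row`, `Table`, `dtype` and `validate` are hypothetical names chosen for this example, not the generated protobuf classes, and the real message may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# A cell holds a float, int or str, or None for a null value.
Cell = Optional[Union[float, int, str]]


@dataclass
class Column:
    name: str
    dtype: type  # hypothetical: one of float, int, str


@dataclass
class Row:
    row_id: str
    values: List[Cell]


@dataclass
class Table:
    name: str
    columns: List[Column]
    rows: List[Row] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if the table breaks one of the listed rules."""
        if not self.columns:
            raise ValueError("a table needs at least one column")
        for row in self.rows:
            if len(row.values) != len(self.columns):
                raise ValueError(f"row {row.row_id!r} has the wrong number of cells")
            for column, value in zip(self.columns, row.values):
                # Null cells are allowed in any column.
                if value is not None and type(value) is not column.dtype:
                    raise ValueError(
                        f"row {row.row_id!r}, column {column.name!r}: "
                        f"expected {column.dtype.__name__}, got {type(value).__name__}"
                    )


# Columns may differ in type; each row carries its own row_id.
table = Table(
    name="my-table",
    columns=[Column("price", float), Column("city", str)],
    rows=[Row("row-1", [9.5, "Jakarta"]), Row("row-2", [None, "Singapore"])],
)
table.validate()
```

Storing the type once per column, rather than on every cell, is what lets a whole dataframe-like payload travel as a single message.
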
```
---------------------------------------------------------------------------------------------------- benchmark 'deserialize-request': 24 tests ----------------------------------------------------------------------------------------------------
Name (time in us)                                                           Min                     Max                    Mean                 StdDev                  Median                    IQR   Outliers                  OPS  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_deserialize_proto_request[1-100] (0001_7467c7b)            41.9270 (16.13)      1,842.2130 (23.02)         52.1732 (17.22)        23.2068 (14.41)         45.3840 (15.89)         5.5505 (42.37)  1526;4056   19,166.9311 (0.06)   20015           1
test_deserialize_proto_request[1-100] (0002_ac3775c)            86.1700 (33.16)         377.9210 (4.72)         90.9080 (30.00)         11.1573 (6.93)         89.3640 (31.29)         3.1950 (24.39)    400;602   11,000.1373 (0.03)   10065           1
test_deserialize_proto_request[1-1] (0001_7467c7b)                 2.5990 (1.0)         114.4450 (1.43)            3.0305 (1.0)          1.6895 (1.05)            2.8560 (1.0)           0.1310 (1.0)   332;6541   329,980.4986 (1.0)   49461           1
test_deserialize_proto_request[1-1] (0002_ac3775c)                4.6710 (1.80)           80.0310 (1.0)           5.1638 (1.70)          1.8626 (1.16)           5.0190 (1.76)          0.1610 (1.23)   564;1018  193,656.7518 (0.59)   42838           1
test_deserialize_proto_request[100-100] (0001_7467c7b)     4,181.3790 (>1000.0)     9,231.8540 (115.35)    5,351.1642 (>1000.0)      840.1283 (521.74)    5,403.3230 (>1000.0)   1,300.7730 (>1000.0)       60;4      186.8752 (0.00)     222           1
test_deserialize_proto_request[100-100] (0002_ac3775c)     3,081.5200 (>1000.0)      3,685.8110 (46.05)    3,180.9798 (>1000.0)        77.3511 (48.04)    3,162.0330 (>1000.0)       68.3450 (521.72)      35;16      314.3686 (0.00)     318           1
test_deserialize_proto_request[100-500] (0001_7467c7b)    22,122.4090 (>1000.0)    32,973.5540 (412.01)   26,359.4216 (>1000.0)   3,763.7983 (>1000.0)   24,061.2560 (>1000.0)   7,077.2085 (>1000.0)       10;0       37.9371 (0.00)      33           1
test_deserialize_proto_request[100-500] (0002_ac3775c)    15,255.4400 (>1000.0)    18,528.7330 (231.52)   16,450.6710 (>1000.0)      731.7893 (454.46)   16,373.3630 (>1000.0)     956.3830 (>1000.0)       16;2       60.7878 (0.00)      63           1
test_deserialize_proto_request[1000-100] (0001_7467c7b)   51,576.9360 (>1000.0)   84,187.4340 (>1000.0)   59,984.9779 (>1000.0)   8,331.1790 (>1000.0)   56,792.5575 (>1000.0)  12,483.7885 (>1000.0)        3;0       16.6708 (0.00)      20           1
test_deserialize_proto_request[1000-100] (0002_ac3775c)   33,100.3810 (>1000.0)    37,188.7500 (464.68)   35,079.6140 (>1000.0)    1,019.7075 (633.26)   35,332.1615 (>1000.0)   1,445.7335 (>1000.0)        9;0       28.5066 (0.00)      28           1
test_deserialize_proto_request[1000-500] (0001_7467c7b)  251,937.1810 (>1000.0)  327,753.4200 (>1000.0)  294,796.8826 (>1000.0)  36,910.2596 (>1000.0)  314,206.2790 (>1000.0)  67,611.3500 (>1000.0)        2;0        3.3922 (0.00)       5           1
test_deserialize_proto_request[1000-500] (0002_ac3775c)  162,492.4160 (>1000.0)  172,359.1360 (>1000.0)  166,427.1273 (>1000.0)   3,162.6573 (>1000.0)  166,304.3530 (>1000.0)   3,138.3735 (>1000.0)        2;1        6.0086 (0.00)       7           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

Serialization performance also improves for most of the larger messages, although not as significantly as deserialization performance.

```
--------------------------------------------------------------------------------------------------- benchmark 'serialize-request': 24 tests ----------------------------------------------------------------------------------------------------
Name (time in us)                                                         Min                     Max                    Mean                 StdDev                  Median                     IQR  Outliers                 OPS  Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_serialize_proto_request[1-100] (0001_7467c7b)            73.3760 (21.72)         375.7880 (3.40)         78.3862 (21.26)         13.7893 (6.41)         75.6350 (21.40)          1.6200 (18.41)   527;662  12,757.3548 (0.05)   12398           1
test_serialize_proto_request[1-100] (0002_ac3775c)           126.0540 (37.32)    39,474.1100 (357.10)        215.1010 (58.35)      715.0832 (332.39)        138.9175 (39.31)      104.4490 (>1000.0)    66;193   4,648.9797 (0.02)    6382           1
test_serialize_proto_request[1-1] (0001_7467c7b)                 3.3780 (1.0)         127.6300 (1.15)            3.6866 (1.0)           2.1514 (1.0)            3.5340 (1.0)            0.0880 (1.0)  385;1818  271,249.9283 (1.0)   50267           1
test_serialize_proto_request[1-1] (0002_ac3775c)                5.8470 (1.73)      1,560.9360 (14.12)          10.7064 (2.90)         13.6365 (6.34)          10.3590 (2.93)          4.9460 (56.20)  962;1241  93,402.3626 (0.34)   26766           1
test_serialize_proto_request[100-100] (0001_7467c7b)     7,148.5460 (>1000.0)      9,444.1250 (85.44)    7,525.8909 (>1000.0)      305.5168 (142.01)    7,450.3040 (>1000.0)      228.3147 (>1000.0)     21;10     132.8746 (0.00)     133           1
test_serialize_proto_request[100-100] (0002_ac3775c)     5,871.7720 (>1000.0)      7,544.0470 (68.25)    6,234.8434 (>1000.0)      264.2215 (122.82)    6,181.4380 (>1000.0)      216.3943 (>1000.0)     17;11     160.3890 (0.00)     155           1
test_serialize_proto_request[100-500] (0001_7467c7b)    36,230.4730 (>1000.0)    39,488.9470 (357.24)   37,301.1935 (>1000.0)      745.8722 (346.70)   37,284.1360 (>1000.0)      672.7235 (>1000.0)       4;2      26.8088 (0.00)      27           1
test_serialize_proto_request[100-500] (0002_ac3775c)    30,596.9580 (>1000.0)    33,150.7000 (299.90)   31,431.9774 (>1000.0)      571.1242 (265.47)   31,389.9560 (>1000.0)      741.8257 (>1000.0)       9;1      31.8147 (0.00)      33           1
test_serialize_proto_request[1000-100] (0001_7467c7b)   72,902.3020 (>1000.0)    78,092.1620 (706.46)   74,928.3109 (>1000.0)    1,500.6398 (697.53)   75,107.4010 (>1000.0)    2,256.1870 (>1000.0)       4;0      13.3461 (0.00)      13           1
test_serialize_proto_request[1000-100] (0002_ac3775c)   63,619.4170 (>1000.0)    75,790.0490 (685.63)   65,387.8999 (>1000.0)   3,043.8494 (>1000.0)   64,336.3970 (>1000.0)    1,927.8670 (>1000.0)       1;1      15.2933 (0.00)      16           1
test_serialize_proto_request[1000-500] (0001_7467c7b)  372,581.2870 (>1000.0)  402,702.5400 (>1000.0)  380,281.0462 (>1000.0)  12,622.1482 (>1000.0)  375,547.3720 (>1000.0)    9,360.2127 (>1000.0)       1;1       2.6296 (0.00)       5           1
test_serialize_proto_request[1000-500] (0002_ac3775c)  372,103.9150 (>1000.0)  535,268.8430 (>1000.0)  444,222.4522 (>1000.0)  67,463.8563 (>1000.0)  421,689.7070 (>1000.0)  109,535.8960 (>1000.0)       2;0       2.2511 (0.00)       5           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
```

### Move `transformer_inputs` to a top-level field in the request
This PR extracts `transformer_inputs` from within `prediction_rows` into a top-level request field, so that clients can pass data more flexibly without having to denormalize it beforehand. A new proto message `TransformerInput` is introduced to store all tables and variables that need to be passed to the standard transformer.
```
message TransformerInput {
  // List of tables
  // All tables must have unique name.
  // Each table doesn't need to have same number of row.
  repeated Table tables = 1;

  // List of variables
  repeated NamedValue variables = 2;
}
```

### Add utility package
Two utility functions are added to the package to provide a more user-friendly API for working with `Table`:
- `df_to_table`: converts a pandas DataFrame to a Table
- `table_to_df`: converts a Table into a pandas DataFrame

Example usage:
```
import pandas as pd

from caraml.upi.utils import df_to_table, table_to_df

df = pd.DataFrame(...)
table = df_to_table(df, "my-table")
new_df = table_to_df(table)
```

### Note
The PR contains generated docs and code, so it is best to start the review with the following source files:
- table.proto
- upi.proto
- value.proto
- utils.py

---
 .github/workflows/ci.yaml | 3 +-
 .gitignore | 3 +
 docs/api_html/caraml/upi/v1/index.html | 323 +++++++++---
 docs/api_markdown/caraml/upi/v1/index.md | 148 ++++--
 .../caraml/upi/v1/table.swagger.json | 43 ++
 docs/openapiv2/caraml/upi/v1/upi.swagger.json | 180 ++++---
 gen/go/grpc/caraml/upi/v1/table.pb.go | 439 ++++++++++++++++
 gen/go/grpc/caraml/upi/v1/upi.pb.go | 481 ++++++++----------
 gen/go/grpc/caraml/upi/v1/value.pb.go | 114 ++---
 gen/go/openapi/.openapi-generator/FILES | 18 +-
 gen/go/openapi/README.md | 9 +-
 gen/go/openapi/api/openapi.yaml | 284 +++++++----
 .../{V1NamedValueType.md => Upiv1Type.md} | 2 +-
 gen/go/openapi/docs/V1Column.md | 82 +++
 gen/go/openapi/docs/V1NamedValue.md | 8 +-
 gen/go/openapi/docs/V1PredictValuesRequest.md | 52 +-
 .../openapi/docs/V1PredictValuesResponse.md | 26 +-
 gen/go/openapi/docs/V1PredictionRow.md | 108 ----
 .../{V1PredictionResultRow.md => V1Row.md} | 32 +-
 gen/go/openapi/docs/V1Table.md | 108 ++++
 gen/go/openapi/docs/V1TransformerInput.md | 82 +++
 gen/go/openapi/docs/V1Value.md | 134 +++++
 gen/go/openapi/model_upiv1_type.go | 115 +++++
 gen/go/openapi/model_v1_column.go | 155 ++++++
 gen/go/openapi/model_v1_named_value.go | 16 +-
 gen/go/openapi/model_v1_named_value_type.go | 115 -----
 .../model_v1_predict_values_request.go | 72 ++-
 .../model_v1_predict_values_response.go | 36 +-
 gen/go/openapi/model_v1_prediction_row.go | 190 -------
 ...ediction_result_row.go => model_v1_row.go} | 61 +--
 gen/go/openapi/model_v1_table.go | 189 +++++++
 gen/go/openapi/model_v1_transformer_input.go | 152 ++++++
 gen/go/openapi/model_v1_value.go | 223 ++++++++
 gen/python/grpc/caraml/upi/utils.py | 132 +++++
 gen/python/grpc/caraml/upi/v1/table_pb2.py | 66 +++
 gen/python/grpc/caraml/upi/v1/table_pb2.pyi | 111 ++++
 .../grpc/caraml/upi/v1/table_pb2_grpc.py | 4 +
 .../grpc/caraml/upi/v1/table_pb2_grpc.pyi | 4 +
 gen/python/grpc/caraml/upi/v1/upi_pb2.py | 57 +--
 gen/python/grpc/caraml/upi/v1/upi_pb2.pyi | 148 +++---
 gen/python/grpc/caraml/upi/v1/value_pb2.py | 16 +-
 gen/python/grpc/caraml/upi/v1/value_pb2.pyi | 40 +-
 gen/python/grpc/setup.py | 3 +-
 gen/python/grpc/test/basic_test.py | 36 +-
 ...enchmark_test.py => benchmark_upi_test.py} | 35 +-
 gen/python/grpc/test/benchmark_utils_test.py | 39 ++
 gen/python/grpc/test/utils_test.py | 103 ++++
 gen/python/openapi/.openapi-generator/FILES | 27 +-
 gen/python/openapi/README.md | 91 +++-
 .../docs/UniversalPredictionServiceApi.md | 82 ++-
 .../{V1NamedValueType.md => Upiv1Type.md} | 2 +-
 .../{V1PredictionResultRow.md => V1Column.md} | 6 +-
 gen/python/openapi/docs/V1NamedValue.md | 2 +-
 .../openapi/docs/V1PredictValuesRequest.md | 3 +-
 .../openapi/docs/V1PredictValuesResponse.md | 2 +-
 gen/python/openapi/docs/V1PredictionRow.md | 14 -
 gen/python/openapi/docs/V1Row.md | 13 +
 gen/python/openapi/docs/V1Table.md | 14 +
 gen/python/openapi/docs/V1TransformerInput.md | 14 +
 gen/python/openapi/docs/V1Value.md | 16 +
 .../{v1_named_value_type.py => upiv1_type.py} | 6 +-
 .../openapi/openapi_client/model/v1_column.py | 273 ++++++++++
 .../openapi_client/model/v1_named_value.py | 10 +-
 .../model/v1_predict_values_request.py | 18 +-
 .../model/v1_predict_values_response.py | 12 +-
 ...{v1_prediction_result_row.py => v1_row.py} | 16 +-
 .../openapi/openapi_client/model/v1_table.py | 279 ++++++++++
 ...diction_row.py => v1_transformer_input.py} | 28 +-
 .../openapi/openapi_client/model/v1_value.py | 275 ++++++++++
 .../openapi/openapi_client/models/__init__.py | 9 +-
 ...named_value_type.py => test_upiv1_type.py} | 12 +-
 ...iction_result_row.py => test_v1_column.py} | 16 +-
 .../openapi/test/test_v1_named_value.py | 4 +-
 .../test/test_v1_predict_values_request.py | 6 +-
 .../test/test_v1_predict_values_response.py | 4 +-
 gen/python/openapi/test/test_v1_row.py | 37 ++
 gen/python/openapi/test/test_v1_table.py | 39 ++
 ...on_row.py => test_v1_transformer_input.py} | 14 +-
 gen/python/openapi/test/test_v1_value.py | 35 ++
 proto/buf.lock | 10 +-
 proto/caraml/upi/v1/table.proto | 50 ++
 proto/caraml/upi/v1/upi.proto | 81 ++-
 proto/caraml/upi/v1/value.proto | 15 +-
 83 files changed, 4757 insertions(+), 1575 deletions(-)
 create mode 100644 docs/openapiv2/caraml/upi/v1/table.swagger.json
 create mode 100644 gen/go/grpc/caraml/upi/v1/table.pb.go
 rename gen/go/openapi/docs/{V1NamedValueType.md => Upiv1Type.md} (94%)
 create mode 100644 gen/go/openapi/docs/V1Column.md
 delete mode 100644 gen/go/openapi/docs/V1PredictionRow.md
 rename gen/go/openapi/docs/{V1PredictionResultRow.md => V1Row.md} (61%)
 create mode 100644 gen/go/openapi/docs/V1Table.md
 create mode 100644 gen/go/openapi/docs/V1TransformerInput.md
 create mode 100644 gen/go/openapi/docs/V1Value.md
 create mode 100644 gen/go/openapi/model_upiv1_type.go
 create mode 100644 gen/go/openapi/model_v1_column.go
 delete mode 100644 gen/go/openapi/model_v1_named_value_type.go
 delete mode 100644 gen/go/openapi/model_v1_prediction_row.go
 rename gen/go/openapi/{model_v1_prediction_result_row.go => model_v1_row.go} (56%)
 create mode 100644 gen/go/openapi/model_v1_table.go
 create mode 100644 gen/go/openapi/model_v1_transformer_input.go
 create mode 100644 gen/go/openapi/model_v1_value.go
 create mode 100644 gen/python/grpc/caraml/upi/utils.py
 create mode 100644 gen/python/grpc/caraml/upi/v1/table_pb2.py
 create mode 100644 gen/python/grpc/caraml/upi/v1/table_pb2.pyi
 create mode 100644 gen/python/grpc/caraml/upi/v1/table_pb2_grpc.py
 create mode 100644 gen/python/grpc/caraml/upi/v1/table_pb2_grpc.pyi
 rename gen/python/grpc/test/{benchmark_test.py => benchmark_upi_test.py} (79%)
 create mode 100644 gen/python/grpc/test/benchmark_utils_test.py
 create mode 100644 gen/python/grpc/test/utils_test.py
 rename gen/python/openapi/docs/{V1NamedValueType.md => Upiv1Type.md} (95%)
 rename gen/python/openapi/docs/{V1PredictionResultRow.md => V1Column.md} (77%)
 delete mode 100644 gen/python/openapi/docs/V1PredictionRow.md
 create mode 100644 gen/python/openapi/docs/V1Row.md
 create mode 100644 gen/python/openapi/docs/V1Table.md
 create mode 100644 gen/python/openapi/docs/V1TransformerInput.md
 create mode 100644 gen/python/openapi/docs/V1Value.md
 rename gen/python/openapi/openapi_client/model/{v1_named_value_type.py => upiv1_type.py} (98%)
 create mode 100644 gen/python/openapi/openapi_client/model/v1_column.py
 rename gen/python/openapi/openapi_client/model/{v1_prediction_result_row.py => v1_row.py} (94%)
 create mode 100644 gen/python/openapi/openapi_client/model/v1_table.py
 rename gen/python/openapi/openapi_client/model/{v1_prediction_row.py => v1_transformer_input.py} (79%)
 create mode 100644 gen/python/openapi/openapi_client/model/v1_value.py
 rename gen/python/openapi/test/{test_v1_named_value_type.py => test_upiv1_type.py} (65%)
 rename gen/python/openapi/test/{test_v1_prediction_result_row.py => test_v1_column.py} (56%)
 create mode 100644 gen/python/openapi/test/test_v1_row.py
 create mode 100644 gen/python/openapi/test/test_v1_table.py
 rename gen/python/openapi/test/{test_v1_prediction_row.py => test_v1_transformer_input.py} (62%)
 create mode 100644 gen/python/openapi/test/test_v1_value.py
 create mode 100644 proto/caraml/upi/v1/table.proto

diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
index 62f7d6d..e053135 100644
--- a/.github/workflows/ci.yaml
+++ b/.github/workflows/ci.yaml
@@ -14,7 +14,8 @@ jobs:
           go-version: '1.18.0'
       - uses: bufbuild/buf-setup-action@v1
         with:
-          version: '1.6.0'
+          version: '1.7.0'
+          github_token: ${{ github.token }}
       - uses: s4u/setup-maven-action@v1.2.1
         with:
           java-version: 11
diff --git a/.gitignore b/.gitignore
index 1566e10..bb111f3 100644
--- a/.gitignore
+++ b/.gitignore
@@ -191,3 +191,6 @@ cython_debug/
 # End of https://www.toptal.com/developers/gitignore/api/go,python
 **/*.jar
+
+
+.idea/
diff --git a/docs/api_html/caraml/upi/v1/index.html b/docs/api_html/caraml/upi/v1/index.html
index 8b5712f..abe8767 100644
--- a/docs/api_html/caraml/upi/v1/index.html
+++ b/docs/api_html/caraml/upi/v1/index.html
@@ -184,7 +184,7 @@

Table of Contents

  • - ENamedValue.Type + EType
  • @@ -194,27 +194,46 @@

    Table of Contents

  • - caraml/upi/v1/upi.proto + caraml/upi/v1/table.proto +
  • + + +
  • + caraml/upi/v1/upi.proto +