Motivated by the need for a more flexible data-passing mechanism and a more efficient interface definition for large messages, this PR introduces the following changes.

### Introduction of `Table` data structure

[Table](https://github.com/caraml-dev/universal-prediction-interface/blob/ac3775c5d81b461ce29d75e84aed70739091e801/proto/caraml/upi/v1/table.proto) represents a dataframe-like data structure in a row-based format. `Table` has the following specifications:

- A table consists of one or more columns, which can have different types.
- All values within a column must have the same type.
- A cell value can be null.
- Each row within a table has a `row_id` (serving a similar purpose to `row_id` in `PredictionRow`).

### Replace `prediction_rows` in the request and `prediction_result_rows` in the response with `Table`

`prediction_rows` and `prediction_result_rows` are effectively dataframe-like objects, so they can be represented as a `Table`.

Deserialization is roughly 1.6x to 1.8x faster (by mean time) for the larger test cases (`[100-100]`, `[100-500]`, `[1000-100]` and `[1000-500]`), while the two smallest cases (`[1-1]` and `[1-100]`) deserialize more slowly than with the existing interface.

NOTE: `0001_7467c7b` is the existing interface, whereas `0002_ac3775c` includes the changes in this PR.
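The `Table` rules listed above (typed columns, nullable cells, a `row_id` per row) can be illustrated with a small pure-Python model. This is a hypothetical sketch, not the generated protobuf classes from `table.proto`: `Column`, `Row`, `Table` and `records_to_table` are illustrative names, and plain Python types stand in for the proto column types. It also shows how row-oriented records, similar to the old `prediction_rows`, can be packed into a single `Table`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Column:
    name: str
    dtype: type  # assumption: a Python type stands in for the proto column type


@dataclass
class Row:
    row_id: str
    values: List[Any]  # one cell per column; None marks a null cell


@dataclass
class Table:
    name: str
    columns: List[Column]
    rows: List[Row] = field(default_factory=list)

    def validate(self) -> None:
        """Raise if a row has the wrong width or a non-null cell has the wrong type."""
        for row in self.rows:
            if len(row.values) != len(self.columns):
                raise ValueError(f"row {row.row_id!r}: expected {len(self.columns)} cells")
            for col, value in zip(self.columns, row.values):
                # Null cells are allowed; every other cell must match the column type exactly.
                if value is not None and type(value) is not col.dtype:
                    raise TypeError(
                        f"row {row.row_id!r}, column {col.name!r}: "
                        f"expected {col.dtype.__name__}, got {type(value).__name__}"
                    )


def records_to_table(name: str, columns: List[Column], records: List[Dict[str, Any]]) -> Table:
    """Pack row-oriented records (each with a 'row_id' key) into a validated Table.

    Keys missing from a record become null cells.
    """
    rows = [
        Row(row_id=rec["row_id"], values=[rec.get(col.name) for col in columns])
        for rec in records
    ]
    table = Table(name=name, columns=columns, rows=rows)
    table.validate()
    return table


if __name__ == "__main__":
    cols = [Column("customer_id", str), Column("score", float)]
    table = records_to_table("customers", cols, [
        {"row_id": "r1", "customer_id": "c-1", "score": 0.5},
        {"row_id": "r2", "customer_id": "c-2"},  # no score -> null cell
    ])
    print([row.values for row in table.rows])  # [['c-1', 0.5], ['c-2', None]]
```

Storing the column schema once per table, rather than repeating names and types in every row, is one plausible reason a row-based `Table` is cheaper to decode for wide messages.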
Benchmark `deserialize-request` (24 tests; times in µs; parenthesized values are relative to the best result in each column):

| Name (time in µs) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| `test_deserialize_proto_request[1-100]` (0001_7467c7b) | 41.9270 (16.13) | 1,842.2130 (23.02) | 52.1732 (17.22) | 23.2068 (14.41) | 45.3840 (15.89) | 5.5505 (42.37) | 1526;4056 | 19,166.9311 (0.06) | 20015 | 1 |
| `test_deserialize_proto_request[1-100]` (0002_ac3775c) | 86.1700 (33.16) | 377.9210 (4.72) | 90.9080 (30.00) | 11.1573 (6.93) | 89.3640 (31.29) | 3.1950 (24.39) | 400;602 | 11,000.1373 (0.03) | 10065 | 1 |
| `test_deserialize_proto_request[1-1]` (0001_7467c7b) | 2.5990 (1.0) | 114.4450 (1.43) | 3.0305 (1.0) | 1.6895 (1.05) | 2.8560 (1.0) | 0.1310 (1.0) | 332;6541 | 329,980.4986 (1.0) | 49461 | 1 |
| `test_deserialize_proto_request[1-1]` (0002_ac3775c) | 4.6710 (1.80) | 80.0310 (1.0) | 5.1638 (1.70) | 1.8626 (1.16) | 5.0190 (1.76) | 0.1610 (1.23) | 564;1018 | 193,656.7518 (0.59) | 42838 | 1 |
| `test_deserialize_proto_request[100-100]` (0001_7467c7b) | 4,181.3790 (>1000.0) | 9,231.8540 (115.35) | 5,351.1642 (>1000.0) | 840.1283 (521.74) | 5,403.3230 (>1000.0) | 1,300.7730 (>1000.0) | 60;4 | 186.8752 (0.00) | 222 | 1 |
| `test_deserialize_proto_request[100-100]` (0002_ac3775c) | 3,081.5200 (>1000.0) | 3,685.8110 (46.05) | 3,180.9798 (>1000.0) | 77.3511 (48.04) | 3,162.0330 (>1000.0) | 68.3450 (521.72) | 35;16 | 314.3686 (0.00) | 318 | 1 |
| `test_deserialize_proto_request[100-500]` (0001_7467c7b) | 22,122.4090 (>1000.0) | 32,973.5540 (412.01) | 26,359.4216 (>1000.0) | 3,763.7983 (>1000.0) | 24,061.2560 (>1000.0) | 7,077.2085 (>1000.0) | 10;0 | 37.9371 (0.00) | 33 | 1 |
| `test_deserialize_proto_request[100-500]` (0002_ac3775c) | 15,255.4400 (>1000.0) | 18,528.7330 (231.52) | 16,450.6710 (>1000.0) | 731.7893 (454.46) | 16,373.3630 (>1000.0) | 956.3830 (>1000.0) | 16;2 | 60.7878 (0.00) | 63 | 1 |
| `test_deserialize_proto_request[1000-100]` (0001_7467c7b) | 51,576.9360 (>1000.0) | 84,187.4340 (>1000.0) | 59,984.9779 (>1000.0) | 8,331.1790 (>1000.0) | 56,792.5575 (>1000.0) | 12,483.7885 (>1000.0) | 3;0 | 16.6708 (0.00) | 20 | 1 |
| `test_deserialize_proto_request[1000-100]` (0002_ac3775c) | 33,100.3810 (>1000.0) | 37,188.7500 (464.68) | 35,079.6140 (>1000.0) | 1,019.7075 (633.26) | 35,332.1615 (>1000.0) | 1,445.7335 (>1000.0) | 9;0 | 28.5066 (0.00) | 28 | 1 |
| `test_deserialize_proto_request[1000-500]` (0001_7467c7b) | 251,937.1810 (>1000.0) | 327,753.4200 (>1000.0) | 294,796.8826 (>1000.0) | 36,910.2596 (>1000.0) | 314,206.2790 (>1000.0) | 67,611.3500 (>1000.0) | 2;0 | 3.3922 (0.00) | 5 | 1 |
| `test_deserialize_proto_request[1000-500]` (0002_ac3775c) | 162,492.4160 (>1000.0) | 172,359.1360 (>1000.0) | 166,427.1273 (>1000.0) | 3,162.6573 (>1000.0) | 166,304.3530 (>1000.0) | 3,138.3735 (>1000.0) | 2;1 | 6.0086 (0.00) | 7 | 1 |

Serialization improves less than deserialization: mean time is about 1.15x to 1.21x faster for `[100-100]`, `[100-500]` and `[1000-100]`, whereas `[1-1]`, `[1-100]` and `[1000-500]` are slower with the new interface (the new `[1000-500]` run also shows much higher variance).
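The deserialization speedups can be checked directly against the Mean column of the deserialization table above. This short script is plain arithmetic on the reported numbers (no new measurements); `DESERIALIZE_MEAN_US` and `speedups` are illustrative names.

```python
# Mean times in microseconds, copied from the deserialization table above:
# test case -> (existing interface 0001_7467c7b, this PR 0002_ac3775c)
DESERIALIZE_MEAN_US = {
    "1-1": (3.0305, 5.1638),
    "1-100": (52.1732, 90.9080),
    "100-100": (5351.1642, 3180.9798),
    "100-500": (26359.4216, 16450.6710),
    "1000-100": (59984.9779, 35079.6140),
    "1000-500": (294796.8826, 166427.1273),
}


def speedups(means):
    """Return old/new mean-time ratios; a value above 1.0 means the new interface is faster."""
    return {case: round(old / new, 2) for case, (old, new) in means.items()}


if __name__ == "__main__":
    for case, ratio in speedups(DESERIALIZE_MEAN_US).items():
        print(f"[{case}] {ratio:.2f}x")
```

Running it gives ratios between 1.60x and 1.77x for the four larger cases and below 0.6x for `[1-1]` and `[1-100]`.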
Benchmark `serialize-request` (24 tests; times in µs; parenthesized values are relative to the best result in each column):

| Name (time in µs) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| `test_serialize_proto_request[1-100]` (0001_7467c7b) | 73.3760 (21.72) | 375.7880 (3.40) | 78.3862 (21.26) | 13.7893 (6.41) | 75.6350 (21.40) | 1.6200 (18.41) | 527;662 | 12,757.3548 (0.05) | 12398 | 1 |
| `test_serialize_proto_request[1-100]` (0002_ac3775c) | 126.0540 (37.32) | 39,474.1100 (357.10) | 215.1010 (58.35) | 715.0832 (332.39) | 138.9175 (39.31) | 104.4490 (>1000.0) | 66;193 | 4,648.9797 (0.02) | 6382 | 1 |
| `test_serialize_proto_request[1-1]` (0001_7467c7b) | 3.3780 (1.0) | 127.6300 (1.15) | 3.6866 (1.0) | 2.1514 (1.0) | 3.5340 (1.0) | 0.0880 (1.0) | 385;1818 | 271,249.9283 (1.0) | 50267 | 1 |
| `test_serialize_proto_request[1-1]` (0002_ac3775c) | 5.8470 (1.73) | 1,560.9360 (14.12) | 10.7064 (2.90) | 13.6365 (6.34) | 10.3590 (2.93) | 4.9460 (56.20) | 962;1241 | 93,402.3626 (0.34) | 26766 | 1 |
| `test_serialize_proto_request[100-100]` (0001_7467c7b) | 7,148.5460 (>1000.0) | 9,444.1250 (85.44) | 7,525.8909 (>1000.0) | 305.5168 (142.01) | 7,450.3040 (>1000.0) | 228.3147 (>1000.0) | 21;10 | 132.8746 (0.00) | 133 | 1 |
| `test_serialize_proto_request[100-100]` (0002_ac3775c) | 5,871.7720 (>1000.0) | 7,544.0470 (68.25) | 6,234.8434 (>1000.0) | 264.2215 (122.82) | 6,181.4380 (>1000.0) | 216.3943 (>1000.0) | 17;11 | 160.3890 (0.00) | 155 | 1 |
| `test_serialize_proto_request[100-500]` (0001_7467c7b) | 36,230.4730 (>1000.0) | 39,488.9470 (357.24) | 37,301.1935 (>1000.0) | 745.8722 (346.70) | 37,284.1360 (>1000.0) | 672.7235 (>1000.0) | 4;2 | 26.8088 (0.00) | 27 | 1 |
| `test_serialize_proto_request[100-500]` (0002_ac3775c) | 30,596.9580 (>1000.0) | 33,150.7000 (299.90) | 31,431.9774 (>1000.0) | 571.1242 (265.47) | 31,389.9560 (>1000.0) | 741.8257 (>1000.0) | 9;1 | 31.8147 (0.00) | 33 | 1 |
| `test_serialize_proto_request[1000-100]` (0001_7467c7b) | 72,902.3020 (>1000.0) | 78,092.1620 (706.46) | 74,928.3109 (>1000.0) | 1,500.6398 (697.53) | 75,107.4010 (>1000.0) | 2,256.1870 (>1000.0) | 4;0 | 13.3461 (0.00) | 13 | 1 |
| `test_serialize_proto_request[1000-100]` (0002_ac3775c) | 63,619.4170 (>1000.0) | 75,790.0490 (685.63) | 65,387.8999 (>1000.0) | 3,043.8494 (>1000.0) | 64,336.3970 (>1000.0) | 1,927.8670 (>1000.0) | 1;1 | 15.2933 (0.00) | 16 | 1 |
| `test_serialize_proto_request[1000-500]` (0001_7467c7b) | 372,581.2870 (>1000.0) | 402,702.5400 (>1000.0) | 380,281.0462 (>1000.0) | 12,622.1482 (>1000.0) | 375,547.3720 (>1000.0) | 9,360.2127 (>1000.0) | 1;1 | 2.6296 (0.00) | 5 | 1 |
| `test_serialize_proto_request[1000-500]` (0002_ac3775c) | 372,103.9150 (>1000.0) | 535,268.8430 (>1000.0) | 444,222.4522 (>1000.0) | 67,463.8563 (>1000.0) | 421,689.7070 (>1000.0) | 109,535.8960 (>1000.0) | 2;0 | 2.2511 (0.00) | 5 | 1 |

### Move `transformer_inputs` to a top-level field in the request

This PR moves `transformer_inputs` out of `prediction_rows` into a top-level request field, so clients can pass data without having to denormalize it beforehand. A new proto message, `TransformerInput`, is introduced to hold all tables and variables that need to be passed to the standard transformer.

```proto
message TransformerInput {
  // List of tables.
  // All tables must have a unique name.
  // Tables don't need to have the same number of rows.
  repeated Table tables = 1;
  // List of variables.
  repeated NamedValue variables = 2;
}
```

### Add utility package

Two utility functions are added to provide a more user-friendly API for working with `Table`:

- `df_to_table`: converts a pandas DataFrame into a `Table`
- `table_to_df`: converts a `Table` into a pandas DataFrame

Example usage:

```python
import pandas as pd

from caraml.upi.utils import df_to_table, table_to_df

df = pd.DataFrame({"customer_id": ["c-1", "c-2"], "score": [0.5, 0.8]})
table = df_to_table(df, "my-table")  # DataFrame -> Table
new_df = table_to_df(table)          # Table -> DataFrame
```

### Note

The PR contains generated docs and code, so it's easier to start the review with the following source files:

- `table.proto`
- `upi.proto`
- `values.proto`
- `utils.py`

<img width="352" alt="Screenshot 2022-09-06 at 11 26 23 AM" src="https://user-images.githubusercontent.com/4023015/188540757-c466ab95-31a1-4fa9-af37-5b03dbb22a8e.png">
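The constraints documented in the `TransformerInput` message (unique table names, independent row counts per table) could be checked on the client before a request is sent. The following is a minimal sketch, assuming plain Python dataclasses as hypothetical stand-ins for the generated protobuf classes; only the fields needed for the check are modeled, and the table, column and variable names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical stand-ins for the generated protobuf messages; only the fields
# needed to illustrate the TransformerInput constraints are modeled.
@dataclass
class Table:
    name: str
    rows: List[Dict[str, Any]] = field(default_factory=list)


@dataclass
class NamedValue:
    name: str
    value: Any


@dataclass
class TransformerInput:
    tables: List[Table] = field(default_factory=list)
    variables: List[NamedValue] = field(default_factory=list)

    def validate(self) -> None:
        """Raise if two tables share a name; row counts may differ between tables."""
        seen = set()
        for table in self.tables:
            if table.name in seen:
                raise ValueError(f"duplicate table name: {table.name!r}")
            seen.add(table.name)


if __name__ == "__main__":
    transformer_input = TransformerInput(
        tables=[
            Table("customers", rows=[{"customer_id": "c-1"}, {"customer_id": "c-2"}]),
            Table("merchants", rows=[{"merchant_id": "m-1"}]),  # fewer rows is fine
        ],
        variables=[NamedValue("country", "ID")],
    )
    transformer_input.validate()  # passes: table names are unique
```

Keeping tables separate like this is what allows a client to send, for example, two customer rows and one merchant row without first joining them into a single denormalized row set.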