
[QST] Can I ignore the is_ragged property of the categorical features when exporting the Workflow? #386

Open
Azilyss opened this issue Oct 25, 2023 · 2 comments

Azilyss commented Oct 25, 2023

**Can I ignore the is_ragged property of the categorical features when exporting the Workflow?**

Setup:
nvtabular version: 23.6.0
merlin-systems version: 23.6.0

The NVTabular workflow is defined as follows:

import nvtabular as nvt
from merlin.schema import Tags
from nvtabular import ColumnSelector

input_features = ["item_id-list"]
max_len = 20
cat_features = (
    ColumnSelector(input_features)
    >> nvt.ops.Categorify()
    >> nvt.ops.AddMetadata(tags=[Tags.CATEGORICAL])
)
seq_feats_list = (
    cat_features["item_id-list"]
    >> nvt.ops.ListSlice(-max_len, pad=True, pad_value=0)
    >> nvt.ops.Rename(postfix="_seq")
    >> nvt.ops.AddMetadata(tags=[Tags.LIST])
)
features = seq_feats_list >> nvt.ops.AddMetadata(tags=[Tags.ITEM, Tags.ID])
workflow = nvt.Workflow(features)

The dataset typically contains item sequences of varying length, and the workflow slices and pads them to the specified max_len.
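
For context, the slicing and padding step behaves roughly like this pure-Python sketch (a sketch only; the pad side shown is an assumption):

def slice_and_pad(seq, max_len=20, pad_value=0):
    # ListSlice(-max_len, ...) keeps the last max_len items
    kept = seq[-max_len:]
    # pad=True fills the row out to max_len with pad_value
    return kept + [pad_value] * (max_len - len(kept))

slice_and_pad([28, 12, 44])  # -> [28, 12, 44, 0, 0, ...] (length 20)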

The workflow is exported as follows:

from merlin.systems.dag import Ensemble
from merlin.systems.dag.ops.workflow import TransformWorkflow

transform_workflow_op = workflow.input_schema.column_names >> TransformWorkflow(workflow)
ensemble = Ensemble(transform_workflow_op, workflow.input_schema)
ens_config, node_configs = ensemble.export(preprocessing_path)

When the workflow is exported through the Ensemble module, the generated NVTabular Triton config declares two tensors for each ragged feature, "feature_name___offsets" and "feature_name___values", for both the inputs and the outputs.

Is there a way to avoid creating these extra tensors and keep the inputs as they are?
Any workaround is appreciated.
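
For context, these two tensors follow the standard ragged-list encoding: the list column is flattened into a values array plus an offsets array that marks row boundaries. A minimal NumPy illustration (not Merlin code):

import numpy as np

sequences = [[28, 12, 44], [12, 28, 73], [24, 35, 6, 12]]
values = np.concatenate([np.asarray(s) for s in sequences])
offsets = np.cumsum([0] + [len(s) for s in sequences])
# values  -> [28 12 44 12 28 73 24 35  6 10][sic]: [28 12 44 12 28 73 24 35 6 12]
# offsets -> [0 3 6 10]; row i is values[offsets[i]:offsets[i + 1]]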

Code to reproduce
  import dask.dataframe as dd
  import nvtabular as nvt
  import pandas as pd
  from merlin.schema import Tags
  from merlin.systems.dag import Ensemble
  from merlin.systems.dag.ops.workflow import TransformWorkflow
  from nvtabular import ColumnSelector

  tmp_path = "tmp"

  d = {
      "item_id-list": [
          [28, 12, 44],
          [12, 28, 73],
          [24, 35, 6, 12],
          [74, 28, 9, 12, 44],
          [101, 102, 103, 104, 105],
      ],
  }

  df = pd.DataFrame(data=d)
  ddf = dd.from_pandas(df, npartitions=1)
  train_set = nvt.Dataset(ddf)

  input_features = ["item_id-list"]
  max_len = 20
  cat_features = (
      ColumnSelector(input_features)
      >> nvt.ops.Categorify()
      >> nvt.ops.AddMetadata(tags=[Tags.CATEGORICAL])
  )
  seq_feats_list = (
      cat_features["item_id-list"]
      >> nvt.ops.ListSlice(-max_len, pad=True, pad_value=0)
      >> nvt.ops.Rename(postfix="_seq")
      >> nvt.ops.AddMetadata(tags=[Tags.LIST])
  )
  features = seq_feats_list >> nvt.ops.AddMetadata(tags=[Tags.ITEM, Tags.ID])
  workflow = nvt.Workflow(features)

  workflow.fit(train_set)

  transform_workflow_op = workflow.input_schema.column_names >> TransformWorkflow(workflow)

  ensemble = Ensemble(transform_workflow_op, workflow.input_schema)
  ens_config, node_configs = ensemble.export(tmp_path)

  print(ens_config)
rnyak (Contributor) commented Oct 25, 2023

@Azilyss this is done so that we can train DL models with ragged inputs and then serve them on Triton accordingly. Does using pad=True not set is_ragged to False?
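
One quick way to verify is to inspect the fitted workflow's output schema (a sketch assuming merlin-core's ColumnSchema.is_ragged property and the value_count column property):

workflow.fit(train_set)
for name, col in workflow.output_schema.column_schemas.items():
    # a value_count with min == max would indicate a fixed-length list
    print(name, col.is_ragged, col.properties.get("value_count"))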

Azilyss (Author) commented Oct 25, 2023

Apologies, the outputs are actually the correct ones.

However, because the inputs are expected to be ragged, the tensors item_id-list_seq___offsets and item_id-list_seq___values are created for the reasons you mentioned. In my current setup, I am running Triton inference one request at a time, not with batched requests, so I was wondering whether it is possible to keep the input as is, without having to pad the training dataset before fitting the workflow.
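
For reference, the pre-padding workaround mentioned above could look like this pure-pandas sketch, mirroring the workflow's slice-and-pad behaviour (pad side and value are assumptions):

def pad_to_fixed(seq, max_len=20, pad_value=0):
    # keep the most recent max_len items, then right-pad to a fixed length
    kept = seq[-max_len:]
    return kept + [pad_value] * (max_len - len(kept))

df["item_id-list"] = df["item_id-list"].apply(pad_to_fixed)
train_set = nvt.Dataset(dd.from_pandas(df, npartitions=1))

Note that padding before Categorify means the pad value gets encoded as its own category, which may or may not be acceptable.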

Thank you for your help.
