rename columns if custom column input (#1648)
hawestra authored Nov 3, 2023
1 parent b5bbadd commit 0398758
Showing 3 changed files with 14 additions and 8 deletions.
@@ -4,7 +4,7 @@ type: spark
 name: gsq_annotation_compute_histogram
 display_name: Annotation - Compute Histogram
 description: Compute annotation histogram given a deployment's model data input.
-version: 0.4.6
+version: 0.4.7
 is_deterministic: false
 inputs:
   production_dataset:
@@ -4,7 +4,7 @@ type: pipeline
 name: generation_safety_quality_signal_monitor
 display_name: Generation Safety & Quality - Signal Monitor
 description: Computes the content generation safety metrics over LLM outputs.
-version: 0.4.8
+version: 0.4.9
 is_deterministic: true
 inputs:
   monitor_name:
@@ -97,7 +97,7 @@ outputs:
 jobs:
   compute_histogram:
     type: spark
-    component: azureml://registries/azureml/components/gsq_annotation_compute_histogram/versions/0.4.6
+    component: azureml://registries/azureml/components/gsq_annotation_compute_histogram/versions/0.4.7
     inputs:
       production_dataset:
         type: mltable
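This commit bumps the histogram component from 0.4.6 to 0.4.7. In the same change, it updates the pipeline job `compute_histogram` to pin `.../gsq_annotation_compute_histogram/versions/0.4.7`. The following is a hypothetical sketch of a consistency check for that pairing. It is not part of this commit or of the Azure ML SDK. It assumes both specs have already been loaded into plain dictionaries, and the helper names `parse_registry_component_uri` and `check_component_pin` are illustrative.

```python
import re
from typing import Dict, Tuple

# Hypothetical pattern for the registry URI shape used in the pipeline spec:
# azureml://registries/<registry>/components/<name>/versions/<version>
_URI_PATTERN = re.compile(
    r"^azureml://registries/(?P<registry>[^/]+)"
    r"/components/(?P<name>[^/]+)/versions/(?P<version>[^/]+)$"
)


def parse_registry_component_uri(uri: str) -> Tuple[str, str, str]:
    """Split a registry component URI into (registry, name, version)."""
    match = _URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not a registry component URI: {uri}")
    return match.group("registry"), match.group("name"), match.group("version")


def check_component_pin(component_spec: Dict, pipeline_spec: Dict, job_name: str) -> None:
    """Raise ValueError if the job does not pin the component's current name and version."""
    uri = pipeline_spec["jobs"][job_name]["component"]
    _, name, version = parse_registry_component_uri(uri)
    if name != component_spec["name"] or version != component_spec["version"]:
        raise ValueError(
            f"job {job_name!r} pins {name}@{version}, but the component spec is "
            f"{component_spec['name']}@{component_spec['version']}"
        )


# Values copied from the two specs after this commit.
component = {"name": "gsq_annotation_compute_histogram", "version": "0.4.7"}
pipeline = {
    "jobs": {
        "compute_histogram": {
            "type": "spark",
            "component": "azureml://registries/azureml/components/"
                         "gsq_annotation_compute_histogram/versions/0.4.7",
        }
    }
}

check_component_pin(component, pipeline, "compute_histogram")  # no exception: pin matches
```

With the pre-commit component spec (`version: 0.4.6`) and the post-commit pipeline, the same call would raise `ValueError`. This is the mismatch that bumping both files together avoids.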
@@ -1557,12 +1557,18 @@ def apply_annotation(
     if SIMILARITY in metric_names and ground_truth_column_name not in production_df.columns:
         raise ValueError(f"production_dataset must have column: {ground_truth_column_name}")
 
+    # rename columns to prompt, completion, context, ground truth to match metaprompt data
+    production_df = (production_df.withColumnRenamed(prompt_column_name, PROMPT)
+                     .withColumnRenamed(completion_column_name, COMPLETION)
+                     .withColumnRenamed(context_column_name, CONTEXT)
+                     .withColumnRenamed(ground_truth_column_name, GROUND_TRUTH))
+
     annotation_requirements = {
-        GROUNDEDNESS: [prompt_column_name, completion_column_name, context_column_name],
-        RELEVANCE: [prompt_column_name, completion_column_name, context_column_name],
-        FLUENCY: [prompt_column_name, completion_column_name],
-        COHERENCE: [prompt_column_name, completion_column_name],
-        SIMILARITY: [prompt_column_name, completion_column_name, ground_truth_column_name]
+        GROUNDEDNESS: [PROMPT, COMPLETION, CONTEXT],
+        RELEVANCE: [PROMPT, COMPLETION, CONTEXT],
+        FLUENCY: [PROMPT, COMPLETION],
+        COHERENCE: [PROMPT, COMPLETION],
+        SIMILARITY: [PROMPT, COMPLETION, GROUND_TRUTH]
     }
     # Sampling
     production_df_sampled = production_df.sample(withReplacement=False, fraction=sample_rate)
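The effect of this hunk can be modeled without Spark. Before the change, `annotation_requirements` listed the caller's custom column names. After it, the DataFrame is first renamed to canonical names, which the code comment says is done "to match metaprompt data", so the requirements refer to fixed names.

The sketch below is a minimal, hypothetical illustration. It represents the DataFrame schema as a plain list of column names. It also assumes the constants `PROMPT`, `COMPLETION`, `CONTEXT`, `GROUND_TRUTH` and the metric constants have the lowercase string values shown; the real values are defined elsewhere in the module and are not visible in this diff. The loop in `rename_columns` mirrors PySpark's documented behavior that `withColumnRenamed` is a no-op when the source column is absent. `rename_columns` and `missing_columns` are illustrative helpers, not functions from the codebase.

```python
from typing import Dict, List

# Assumed constant values (hypothetical); the real module defines these elsewhere.
PROMPT, COMPLETION, CONTEXT, GROUND_TRUTH = "prompt", "completion", "context", "ground_truth"
GROUNDEDNESS, RELEVANCE, FLUENCY, COHERENCE, SIMILARITY = (
    "groundedness", "relevance", "fluency", "coherence", "similarity"
)

# Same shape as annotation_requirements after the change: canonical names only.
ANNOTATION_REQUIREMENTS: Dict[str, List[str]] = {
    GROUNDEDNESS: [PROMPT, COMPLETION, CONTEXT],
    RELEVANCE: [PROMPT, COMPLETION, CONTEXT],
    FLUENCY: [PROMPT, COMPLETION],
    COHERENCE: [PROMPT, COMPLETION],
    SIMILARITY: [PROMPT, COMPLETION, GROUND_TRUTH],
}


def rename_columns(columns: List[str], mapping: Dict[str, str]) -> List[str]:
    """Apply renames in order, like chained withColumnRenamed calls."""
    renamed = list(columns)
    for old, new in mapping.items():
        # A source name that is not in the schema leaves the columns unchanged.
        renamed = [new if col == old else col for col in renamed]
    return renamed


def missing_columns(columns: List[str], metric_names: List[str]) -> Dict[str, List[str]]:
    """For each requested metric, list the required canonical columns not present."""
    present = set(columns)
    return {
        metric: [col for col in ANNOTATION_REQUIREMENTS[metric] if col not in present]
        for metric in metric_names
    }


# A dataset that uses custom column names (hypothetical example).
custom = ["question", "answer", "docs", "expected"]
mapping = {"question": PROMPT, "answer": COMPLETION, "docs": CONTEXT, "expected": GROUND_TRUTH}
renamed = rename_columns(custom, mapping)
print(renamed)  # ['prompt', 'completion', 'context', 'ground_truth']
print(missing_columns(renamed, [GROUNDEDNESS, SIMILARITY]))  # {'groundedness': [], 'similarity': []}
```

Because the rename happens once, up front, the per-metric requirement table no longer depends on which custom names the caller passed in.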