Spice.ai Demo App

This is a Spice.ai data and AI app.

Prerequisites

  • Spice.ai CLI installed
  • OpenAI API key
  • Hugging Face API token (optional, for LLaMA model)
  • curl and jq for API calls
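Spice resolves the ${ secrets:... } references used later in spicepod.yml through a secret store. A minimal environment sketch, assuming the default store reads each secret from an identically named environment variable (confirm against your Spice version's secrets documentation):

# Confirm the CLI and tooling are available.
spice version
command -v curl jq

# Assumption: ${ secrets:SPICE_OPENAI_API_KEY } and
# ${ secrets:SPICE_HUGGINGFACE_API_KEY } resolve from these variables.
export SPICE_OPENAI_API_KEY="sk-..."        # required
export SPICE_HUGGINGFACE_API_KEY="hf_..."   # optional, for the LLaMA model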

Learn More

To learn more about Spice.ai, take a look at the Spice.ai documentation at docs.spiceai.org.

Connect with us on Discord - your feedback is appreciated!


Demo Steps

Publishing a Spice App in the Cloud

Step 1: Forking and Using the Dataset

  1. Fork the repository https://github.com/jeadie/evals into your GitHub org.

Step 2: Creating a New App in the Cloud

  1. Log into the Spice.ai Cloud Platform and create a new app called evals. The app will start empty.
  2. Connect the app to your repository:
    • Go to the App Settings tab and select Connect Repository.
    • If the repository is not yet linked, follow the prompts to authenticate and link it.

Step 3: Deploying the App

  1. Set the app to Public:
    • Navigate to the app's settings and toggle the visibility to public.
  2. Redeploy the app:
    • Click Redeploy to load the datasets and configurations from the repository.

Step 4: Verifying and Testing

  1. Check the datasets in the Spice.ai Cloud:
    • Verify that the datasets are correctly loaded and accessible (a scripted check is sketched after this list).
  2. Test public access:
    • Log in with a different account to confirm the app is accessible to external users.
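If you prefer to script the dataset check in Step 4, a sketch against the Spice.ai Cloud SQL API follows. The data.spiceai.io endpoint, the X-API-Key header, and YOUR_API_KEY are assumptions and placeholders; consult the Cloud API docs for your account.

# Hypothetical verification call against the Spice.ai Cloud SQL API.
# Endpoint path and header name are assumptions -- check the Cloud docs.
curl -XPOST "https://data.spiceai.io/v1/sql" \
  -H "Content-Type: text/plain" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d "SHOW TABLES" | jq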

Initializing a Local Spice App

  1. Initialize a new local Spice app

    mkdir demo
    cd demo
    spice init
  2. Login to Spice.ai Cloud

    spice login
  3. Get the spicepod from Spicerack

    Navigate to spicerack.org and search for evals. Click on /evals, click Use this app, and copy the spice connect command. Paste it into the terminal:

    spice connect <username>/evals

    The spicepod.yml should be updated to:

    version: v1beta1
    kind: Spicepod
    name: demo

    dependencies:
      - Jeadie/evals
  4. Add a model to the spicepod

    models:
      - name: gpt-4o
        from: openai:gpt-4o
        params:
          openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }
  5. Start Spice

    spice run
  6. Run an eval

    curl -XPOST "http://localhost:8090/v1/evals/taxes"      -H "Content-Type: application/json"      -d '{
        "model": "gpt-4o"
      }' | jq
  7. Explore incorrect results (an HTTP variant of this query is sketched after this list)

    spice sql
    SELECT
      input,
      output,
      actual
    FROM eval.results
    WHERE value=0.0 LIMIT 5;
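The same inspection works over HTTP, which is useful for scripting. A sketch, assuming the runtime exposes its SQL API as POST /v1/sql on the same port and accepts the query as a plain-text request body (verify against your runtime's API docs):

# HTTP variant of the spice sql query above (an assumption-based sketch).
curl -XPOST "http://localhost:8090/v1/sql" \
  -H "Content-Type: text/plain" \
  -d "SELECT input, output, actual FROM eval.results WHERE value=0.0 LIMIT 5" | jq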

Optional: Create an Eval to Use a Smaller Model

  1. Track the outputs of all AI model calls:

    runtime:
      task_history:
        captured_output: truncated
  2. Define a new view and evaluation:

    views:
      - name: user_queries
        sql: |
          SELECT
            json_get_json(input, 'messages') AS input,
            json_get_str((captured_output -> 0), 'content') as ideal
          FROM runtime.task_history
          WHERE task='ai_completion'
      - name: latest_eval_runs
        sql: |
          SELECT model, MAX(created_at) as latest_run
             FROM eval.runs
             GROUP BY model
      - name: model_stats
        sql: |
          SELECT
            r.model,
            COUNT(*) as total_queries,
            SUM(CASE WHEN res.value = 1.0 THEN 1 ELSE 0 END) as correct_answers,
            AVG(res.value) as accuracy
          FROM eval.runs r
          JOIN latest_eval_runs lr ON r.model = lr.model AND r.created_at = lr.latest_run
          JOIN eval.results res ON res.run_id = r.id
          GROUP BY r.model
    
    evals:
      - name: mimic-user-queries
        description: |
          Evaluates how well a model can copy the exact answers already returned to a user. Useful for testing if a smaller/cheaper model is sufficient.
        dataset: user_queries
        scorers:
          - match
  3. Add a smaller model to the spicepod:

    models:
      - name: llama3
        from: huggingface:huggingface.co/meta-llama/Llama-3.2-3B-Instruct
        params:
          hf_token: ${ secrets:SPICE_HUGGINGFACE_API_KEY }
    
      - name: gpt-4o # Keep previous model.
  4. Verify models are loaded:

    spice models

    You should see both models listed:

    NAME    FROM                                                          STATUS
    gpt-4o  openai:gpt-4o                                                 ready
    llama3  huggingface:huggingface.co/meta-llama/Llama-3.2-3B-Instruct  ready
  5. Restart the Spice app:

    spice run
  6. Test the smaller model or run another eval (an HTTP chat example is sketched after the full configuration below):

    spice chat
  7. Run evaluations on both models (a combined script is sketched after this list):

    # Run eval with GPT-4o
    curl -XPOST "http://localhost:8090/v1/evals/mimic-user-queries" \
      -H "Content-Type: application/json" \
      -d '{"model": "gpt-4o"}' | jq
    
    # Run eval with LLaMA
    curl -XPOST "http://localhost:8090/v1/evals/mimic-user-queries" \
      -H "Content-Type: application/json" \
      -d '{"model": "llama3"}' | jq
  8. Compare model performance:

    spice sql
    SELECT
      model,
      total_queries,
      correct_answers,
      ROUND(accuracy * 100, 2) as accuracy_percentage
    FROM model_stats
    ORDER BY accuracy_percentage DESC;

    This query will show, for each model:

    • Total number of queries processed
    • Number of correct answers
    • Accuracy as a percentage

    You can use these metrics to decide if the smaller model provides acceptable performance for your use case.
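Steps 7 and 8 can be combined into one script. A sketch reusing the eval endpoint shown above, plus the assumed /v1/sql route for the final comparison:

# Run the eval against each model in turn, then pull the comparison from
# the model_stats view. The /v1/sql route and plain-text body are assumptions.
for model in gpt-4o llama3; do
  curl -XPOST "http://localhost:8090/v1/evals/mimic-user-queries" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\"}" | jq
done

curl -XPOST "http://localhost:8090/v1/sql" \
  -H "Content-Type: text/plain" \
  -d "SELECT model, total_queries, correct_answers,
        ROUND(accuracy * 100, 2) AS accuracy_percentage
      FROM model_stats ORDER BY accuracy_percentage DESC" | jq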


Full Spicepod Configuration

The complete spicepod.yml is included below for reference:

version: v1beta1
kind: Spicepod
name: demo

dependencies:
  - Jeadie/evals

runtime:
  task_history:
    captured_output: truncated

views:
  - name: user_queries
    sql: |
      SELECT
        json_get_json(input, 'messages') AS input,
        json_get_str((captured_output -> 0), 'content') as ideal
      FROM runtime.task_history
      WHERE task='ai_completion'

evals:
  - name: mimic-user-queries
    description: |
      Evaluates how well a model can copy the exact answers already returned to a user. Useful for testing if a smaller/cheaper model is sufficient.
    dataset: user_queries
    scorers:
      - match

models:
  - name: gpt-4o
    from: openai:gpt-4o
    params:
      openai_api_key: ${ secrets:SPICE_OPENAI_API_KEY }

  - name: llama3
    from: huggingface:huggingface.co/meta-llama/Llama-3.2-3B-Instruct
    params:
      hf_token: ${ secrets:SPICE_HUGGINGFACE_API_KEY }
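
As an alternative to spice chat in step 6, you can exercise a model directly over HTTP. A sketch, assuming the runtime serves an OpenAI-compatible chat completions route at /v1/chat/completions on the same port (verify against your runtime's API docs):

# Send one chat completion to the llama3 model defined above.
# The OpenAI-compatible /v1/chat/completions route is an assumption here.
curl -XPOST "http://localhost:8090/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3",
    "messages": [{"role": "user", "content": "In one sentence, what is Spice.ai?"}]
  }' | jq '.choices[0].message.content'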
