Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Teacher Tool: Copilot Criteria Support #9953

Merged
merged 32 commits into from
Apr 10, 2024

Conversation

thsparks
Copy link
Contributor

@thsparks thsparks commented Apr 9, 2024

Overview

This change adds the front-end support for copilot criteria. In essence, we call into the backend with the Share ID of the project and the question being asked, then the backend forwards that information to DeepPrompt and returns the result to us.

The backend changes will not be checked in until we get more feedback and a better sense for funding, so this change has a few workarounds to compensate while still enabling easy demos and experimentation. Notably, there is a new "copilot" flag that can be set to point the copilot backend request to a staging slot. For example, http://localhost:3232/eval?copilot=thsparks sets it to the thsparks staging slot.

Adjustments to Evaluation Flow

This change necessitated a few adjustments to how we run evaluation of the rule set.

Firstly, it introduces the idea of System Parameters. These are parameters that get passed into a validator plan, but which the user doesn't necessarily need to input as a part of the criteria template. Instead, the teacher tool auto-populates the value during evaluation. For this scenario, it's used to pass along the ShareId, though hypothetical future scenarios could involve things like the target or student vs teacher modes.

Secondly, I've introduced Teacher Tool Validator Plan Overrides which allow the teacher tool to "intercept" specific validator plans and run its own validation instead of calling into the iframe. This is used to keep the ai validation contained purely in the teacher tool (for now) because I'm not actually sure we want to expose it for anyone with a MakeCode iframe to call. These overrides are designed to mirror the context of the editor evaluation quite closely, so it's easy to move code from one to the other if needed.

Additional Changes

Since the "evaluating..." state is much longer with AI Eval, I made some adjustments to what that state looks like. Specifically, I removed the dropdown and the notes while the result is still pending. Instead, there is a three-dot loading animation.

I made some minor adjustments to our catalog loading order so that test criteria gets placed at the top (and copilot is at the top of those). This was based on feedback for demo-ing, but I think it also makes sense to put testing criteria at the top, since that is presumably what someone cares about, if they specifically included test criteria.

I made some changes to CriteriaResultNotes and Textarea so that they resize when the content is set outside of user input, and so they re-check their height when the width changes.

Follow Up Items

There are a few additional considerations I think we'll need to address in future changes, but I didn't want to include them here since it's already a sizable change. Those include:

  1. Changes to auto-run so we don't send requests to the AI on every change
  2. We need to tidy up the catalog, either with categories or an overhauled flow. It's pretty cluttered at the moment.

Try It

Upload Target: https://makecode.microbit.org/app/6323a494671dcf086f00c1aea6a2641e797cc3dc-47ef17645f--eval?copilot=thsparks
Old: https://makecode.microbit.org/app/2b32c0f2978128bb706a8eae85cfe3c680a8515c-8b3d39bfa7--eval?copilot=thsparks

thsparks added 26 commits March 27, 2024 18:46
…t sure if we really want to expose this to anyone with a MakeCode iframe yet. I've tried to preserve the structure somewhat so it's easy to move back if desired.
… to anyone with a MakeCode iframe yet. I've tried to preserve the structure somewhat so it's easy to move back if desired.
…ks/copilot_criteria_before_autorun_changes
…ks/copilot_criteria_before_autorun_changes
@thsparks thsparks requested a review from a team April 9, 2024 18:34
@thsparks thsparks changed the title Copilot Criteria Support Teacher Tool: Copilot Criteria Support Apr 9, 2024
@abchatra
Copy link
Collaborator

abchatra commented Apr 9, 2024

Great job!

common-docs/teachertool/test/catalog-shared.json Outdated Show resolved Hide resolved
teachertool/src/components/CriteriaResultEntry.tsx Outdated Show resolved Hide resolved
},
{
id: "fail",
title: lf("needs work"),
label: lf("needs work"),
label: lf("Needs work"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to ask about these. I don't really have a strong preference either way, but it was all lower case in the design which is why these were lowercase at the start. Do we know if there's a design pattern that we should follow for these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of any pattern, but I think it looks cleaner with capitalized first letters when shown in the page.

teachertool/src/services/backendRequests.ts Outdated Show resolved Hide resolved
const { state: teacherTool } = stateAndDispatch();

const url = `${
teacherTool.copilotEndpointOverride ? teacherTool.copilotEndpointOverride : pxt.Cloud.apiRoot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know that the override or the apiRoot will have a trailing slash?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe not. Changed it to remove this assumption.

const copilotSlot = url.match(/copilot=([^&]+)/);
const copilotEndpoint =
copilotSlot && copilotSlot[1]
? `https://makecode-app-backend-ppe-${copilotSlot[1]}.azurewebsites.net/api/`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a URL that's okay to have hardcoded here? Is there not a pxt.Cloud.apiRoot equivalent for this? If there isn't should we make one?

Copy link
Contributor Author

@thsparks thsparks Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is all just temporary code until we can check in backend changes, I personally think it's okay to hardcode in this way. I like to keep these sort of "will be removed later" changes as contained as possible in one place.

import { setEvalResult } from "./setEvalResult";

// This will set the outcome and notes for a given criteria instance id, but if the provided value is undefined, it will not change that value.
export function mergeEvalResult(criteriaInstanceId: string, outcome?: EvaluationStatus, notes?: string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good change, but I wonder if this will also make it important to have "clear evaluation" functionality. Like if a teacher added criteria and wanted to evaluate and have a clean slate for the same student's project, it might be tedious to do that. No action needed now, and it might not even be a needed feature, but I just wanted to comment on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could call into the setEvalResult transform for that code (or if you're clearing all results, make a new transform that just clears all of them). But yeah, we wouldn't want to use this transform for that.

if (catalogParam.type === "system" && catalogParam.key) {
param.value = getSystemParameter(catalogParam.key, teacherTool);
if (!param.value) {
param.value = catalogParam.default;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding something/can't find it. What is catalogParam.default? Is it guaranteed that a catalog criteria have a default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an optional field you can set on a parameter value in the json to define the initial (i.e. default) value when the criteria is added to the rubric. It's not guaranteed to be set, and if that's the case, it'll go into the if statement on line 60 and log an error, the same as if any other parameter is missing a value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense. Thanks for clarifying that.

const result = planResult.result ? EvaluationStatus.Pass : EvaluationStatus.Fail;
setEvalResultOutcome(criteriaInstance.instanceId, result);
const result =
planResult.result === undefined
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a nested ternary? This will always boil down to the result being pass or fail, right? Can the update here then be planResult.result === undefined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not always boil down to pass or fail. If planResult.result is undefined, then result is CompleteWithNoResult. If planResult.result is set, then we decide pass/fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, gotcha. Thanks!

logError(ErrorCode.askCopilotQuestion, e);
}

return result;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to indicate progress with the thinking? Because the other evaluations are so fast, I found myself impatient as I waited. Not something needed for this PR and I don't feel strongly about it, but might be something interesting to investigate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at present. Deep Prompt just tells us if it's "in progress" or not. We could see if streaming the response text as it comes in or if getting some kind of % completion is in their plans, but I don't think it's possible currently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. It would be interesting to find out if that's something they plan to do, but it can probably be something we inquire about after demoing.

Copy link
Contributor

@srietkerk srietkerk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Exciting stuff!

Copy link
Contributor

@eanders-ms eanders-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just one small question.

const { state: teacherTool } = useContext(AppStateContext);
const [value, setValue] = useState(teacherTool.evalResults[criteriaId]?.notes ?? "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having difficulty understanding why value state was introduced. Was it something to do with initializing DebouncedTextArea?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I genuinely can't recall. I'll undo it.

@thsparks thsparks merged commit 820f383 into master Apr 10, 2024
6 checks passed
@thsparks thsparks deleted the thsparks/copilot_criteria_before_autorun_changes branch April 10, 2024 21:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants