-
Notifications
You must be signed in to change notification settings - Fork 585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Teacher Tool: Copilot Criteria Support #9953
Teacher Tool: Copilot Criteria Support #9953
Conversation
…ks/copilot_criteria
…t sure if we really want to expose this to anyone with a MakeCode iframe yet. I've tried to preserve the structure somewhat so it's easy to move back if desired.
… to anyone with a MakeCode iframe yet. I've tried to preserve the structure somewhat so it's easy to move back if desired.
…ks/copilot_criteria_before_autorun_changes
…ks/copilot_criteria_before_autorun_changes
Great job! |
}, | ||
{ | ||
id: "fail", | ||
title: lf("needs work"), | ||
label: lf("needs work"), | ||
label: lf("Needs work"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to ask about these. I don't really have a strong preference either way, but it was all lower case in the design which is why these were lowercase at the start. Do we know if there's a design pattern that we should follow for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of any pattern, but I think it looks cleaner with capitalized first letters when shown in the page.
const { state: teacherTool } = stateAndDispatch(); | ||
|
||
const url = `${ | ||
teacherTool.copilotEndpointOverride ? teacherTool.copilotEndpointOverride : pxt.Cloud.apiRoot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know that the override or the apiRoot will have a trailing slash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe not. Changed it to remove this assumption.
const copilotSlot = url.match(/copilot=([^&]+)/); | ||
const copilotEndpoint = | ||
copilotSlot && copilotSlot[1] | ||
? `https://makecode-app-backend-ppe-${copilotSlot[1]}.azurewebsites.net/api/` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a URL that's okay to have hardcoded here? Is there not a pxt.Cloud.apiRoot
equivalent for this? If there isn't should we make one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is all just temporary code until we can check in backend changes, I personally think it's okay to hardcode in this way. I like to keep these sort of "will be removed later" changes as contained as possible in one place.
import { setEvalResult } from "./setEvalResult"; | ||
|
||
// This will set the outcome and notes for a given criteria instance id, but if the provided value is undefined, it will not change that value. | ||
export function mergeEvalResult(criteriaInstanceId: string, outcome?: EvaluationStatus, notes?: string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good change, but I wonder if this will also make it important to have "clear evaluation" functionality. Like if a teacher added criteria and wanted to evaluate and have a clean slate for the same student's project, it might be tedious to do that. No action needed now, and it might not even be a needed feature, but I just wanted to comment on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could call into the setEvalResult transform for that code (or if you're clearing all results, make a new transform that just clears all of them). But yeah, we wouldn't want to use this transform for that.
if (catalogParam.type === "system" && catalogParam.key) { | ||
param.value = getSystemParameter(catalogParam.key, teacherTool); | ||
if (!param.value) { | ||
param.value = catalogParam.default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be misunderstanding something/can't find it. What is catalogParam.default
? Is it guaranteed that a catalog criteria have a default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an optional field you can set on a parameter value in the json to define the initial (i.e. default) value when the criteria is added to the rubric. It's not guaranteed to be set, and if that's the case, it'll go into the if statement on line 60 and log an error, the same as if any other parameter is missing a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, makes sense. Thanks for clarifying that.
const result = planResult.result ? EvaluationStatus.Pass : EvaluationStatus.Fail; | ||
setEvalResultOutcome(criteriaInstance.instanceId, result); | ||
const result = | ||
planResult.result === undefined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be a nested ternary? This will always boil down to the result being pass or fail, right? Can the update here then be planResult.result === undefined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not always boil down to pass or fail. If planResult.result is undefined, then result is CompleteWithNoResult. If planResult.result is set, then we decide pass/fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, gotcha. Thanks!
logError(ErrorCode.askCopilotQuestion, e); | ||
} | ||
|
||
return result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to indicate progress with the thinking? Because the other evaluations are so fast, I found myself impatient as I waited. Not something needed for this PR and I don't feel strongly about it, but might be something interesting to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at present. Deep Prompt just tells us if it's "in progress" or not. We could see if streaming the response text as it comes in or if getting some kind of % completion is in their plans, but I don't think it's possible currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. It would be interesting to find out if that's something they plan to do, but it can probably be something we inquire about after demoing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Exciting stuff!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just one small question.
const { state: teacherTool } = useContext(AppStateContext); | ||
const [value, setValue] = useState(teacherTool.evalResults[criteriaId]?.notes ?? ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having difficulty understanding why value
state was introduced. Was it something to do with initializing DebouncedTextArea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. I genuinely can't recall. I'll undo it.
Overview
This change adds the front-end support for copilot criteria. In essence, we call into the backend with the Share ID of the project and the question being asked, then the backend forwards that information to DeepPrompt and returns the result to us.
The backend changes will not be checked in until we get more feedback and a better sense for funding, so this change has a few workarounds to compensate while still enabling easy demos and experimentation. Notably, there is a new "copilot" flag that can be set to point the copilot backend request to a staging slot. For example,
http://localhost:3232/eval?copilot=thsparks
sets it to the thsparks staging slot.Adjustments to Evaluation Flow
This change necessitated a few adjustments to how we run evaluation of the rule set.
Firstly, it introduces the idea of System Parameters. These are parameters that get passed into a validator plan, but which the user doesn't necessarily need to input as a part of the criteria template. Instead, the teacher tool auto-populates the value during evaluation. For this scenario, it's used to pass along the ShareId, though hypothetical future scenarios could involve things like the target or student vs teacher modes.
Secondly, I've introduced Teacher Tool Validator Plan Overrides which allow the teacher tool to "intercept" specific validator plans and run its own validation instead of calling into the iframe. This is used to keep the ai validation contained purely in the teacher tool (for now) because I'm not actually sure we want to expose it for anyone with a MakeCode iframe to call. These overrides are designed to mirror the context of the editor evaluation quite closely, so it's easy to move code from one to the other if needed.
Additional Changes
Since the "evaluating..." state is much longer with AI Eval, I made some adjustments to what that state looks like. Specifically, I removed the dropdown and the notes while the result is still pending. Instead, there is a three-dot loading animation.
I made some minor adjustments to our catalog loading order so that test criteria gets placed at the top (and copilot is at the top of those). This was based on feedback for demo-ing, but I think it also makes sense to put testing criteria at the top, since that is presumably what someone cares about, if they specifically included test criteria.
I made some changes to CriteriaResultNotes and Textarea so that they resize when the content is set outside of user input, and so they re-check their height when the width changes.
Follow Up Items
There are a few additional considerations I think we'll need to address in future changes, but I didn't want to include them here since it's already a sizable change. Those include:
Try It
Upload Target: https://makecode.microbit.org/app/6323a494671dcf086f00c1aea6a2641e797cc3dc-47ef17645f--eval?copilot=thsparks
Old:
https://makecode.microbit.org/app/2b32c0f2978128bb706a8eae85cfe3c680a8515c-8b3d39bfa7--eval?copilot=thsparks