Teacher Tool: Copilot Criteria Support #9953

thsparks · 2024-04-09T18:34:28Z

Overview

This change adds the front-end support for copilot criteria. In essence, we call into the backend with the Share ID of the project and the question being asked, then the backend forwards that information to DeepPrompt and returns the result to us.

The backend changes will not be checked in until we get more feedback and a better sense of funding, so this change includes a few workarounds that still enable easy demos and experimentation. Notably, there is a new "copilot" URL parameter that points the copilot backend request at a staging slot. For example, http://localhost:3232/eval?copilot=thsparks points it at the thsparks staging slot.

Adjustments to Evaluation Flow

This change necessitated a few adjustments to how we run evaluation of the rule set.

Firstly, it introduces the idea of System Parameters. These are parameters that get passed into a validator plan but that the user doesn't need to input as part of the criteria template. Instead, the teacher tool auto-populates the value during evaluation. In this scenario, a system parameter passes along the Share ID, though hypothetical future scenarios could involve things like the target or student vs. teacher modes.

Secondly, I've introduced Teacher Tool Validator Plan Overrides, which allow the teacher tool to "intercept" specific validator plans and run its own validation instead of calling into the iframe. This keeps the AI validation contained purely in the teacher tool (for now), because I'm not sure we want to expose it to anyone with a MakeCode iframe. These overrides are designed to mirror the context of the editor evaluation quite closely, so it's easy to move code from one to the other if needed.
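As a rough illustration of the override idea, here is a minimal sketch. All names in it (`ValidatorPlan`, `PlanRunner`, `planOverrides`, `resolveRunner`, `runInIframeAsync`, and the `ai_question` plan name) are assumptions made for the example and are not taken from the PR's code.

```typescript
// Hypothetical shapes; the real pxt types likely differ.
interface ValidatorPlan {
    name: string;
}

interface PlanResult {
    result?: boolean;
    notes?: string;
}

type PlanRunner = (plan: ValidatorPlan, params: Record<string, string>) => Promise<PlanResult>;

// Stand-in for the normal path that sends the plan to the MakeCode iframe.
async function runInIframeAsync(_plan: ValidatorPlan): Promise<PlanResult> {
    return { result: true };
}

// Plans registered here are evaluated inside the teacher tool itself.
const planOverrides = new Map<string, PlanRunner>([
    [
        "ai_question", // hypothetical plan name
        // Stand-in for the AI validation; a real override would call the backend.
        async (_plan, params) => ({ result: undefined, notes: `Asked: ${params["question"] ?? ""}` }),
    ],
]);

// Pick the override when one exists, otherwise fall back to the iframe path.
function resolveRunner(plan: ValidatorPlan): PlanRunner {
    return planOverrides.get(plan.name) ?? runInIframeAsync;
}
```

Because both paths share one runner signature in this sketch, moving a validator between the teacher tool and the editor would only mean adding or removing a map entry.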

Additional Changes

Since the "evaluating..." state lasts much longer with AI Eval, I made some adjustments to what that state looks like. Specifically, I removed the dropdown and the notes while the result is still pending. Instead, there is a three-dot loading animation.

I made some minor adjustments to our catalog loading order so that test criteria are placed at the top (with copilot at the top of those). This was based on demo feedback, but I think it also makes sense in general: if someone specifically included test criteria, that is presumably what they care about.

I made some changes to CriteriaResultNotes and Textarea so that they resize when the content is set outside of user input, and so they re-check their height when the width changes.

Follow Up Items

There are a few additional considerations I think we'll need to address in future changes, but I didn't want to include them here since this is already a sizable change. Those include:

Changing auto-run so we don't send requests to the AI on every change.
Tidying up the catalog, either with categories or an overhauled flow. It's pretty cluttered at the moment.

Try It

Upload Target: https://makecode.microbit.org/app/6323a494671dcf086f00c1aea6a2641e797cc3dc-47ef17645f--eval?copilot=thsparks
Old: ~~https://makecode.microbit.org/app/2b32c0f2978128bb706a8eae85cfe3c680a8515c-8b3d39bfa7--eval?copilot=thsparks~~

abchatra · 2024-04-09T21:59:46Z

Great job!

common-docs/teachertool/test/catalog-shared.json

teachertool/src/components/CriteriaResultEntry.tsx

srietkerk · 2024-04-09T21:23:27Z

teachertool/src/components/CriteriaEvalResultDropdown.tsx

    },
    {
        id: "fail",
        title: lf("needs work"),
-        label: lf("needs work"),
+        label: lf("Needs work"),


Just wanted to ask about these. I don't really have a strong preference either way, but it was all lower case in the design which is why these were lowercase at the start. Do we know if there's a design pattern that we should follow for these?

I'm not aware of any pattern, but I think it looks cleaner with capitalized first letters when shown in the page.

teachertool/src/services/backendRequests.ts

srietkerk · 2024-04-09T21:32:25Z

teachertool/src/services/backendRequests.ts

+    const { state: teacherTool } = stateAndDispatch();
+
+    const url = `${
+        teacherTool.copilotEndpointOverride ? teacherTool.copilotEndpointOverride : pxt.Cloud.apiRoot


Do we know that the override or the apiRoot will have a trailing slash?

Hmm, maybe not. Changed it to remove this assumption.
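The actual fix is not shown in this thread. One common way to drop the trailing-slash assumption is to normalize both sides before joining, as in this minimal sketch. The helper name `joinUrl` and the example path are hypothetical.

```typescript
// Join a base URL and a path with exactly one slash between them,
// regardless of whether either side already has one.
function joinUrl(base: string, path: string): string {
    const trimmedBase = base.replace(/\/+$/, ""); // drop trailing slashes
    const trimmedPath = path.replace(/^\/+/, ""); // drop leading slashes
    return `${trimmedBase}/${trimmedPath}`;
}
```

With this, `joinUrl("https://example.com/api/", "/copilot/question")` and `joinUrl("https://example.com/api", "copilot/question")` produce the same URL.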

srietkerk · 2024-04-09T21:40:57Z

teachertool/src/state/appStateContext.tsx

+    const copilotSlot = url.match(/copilot=([^&]+)/);
+    const copilotEndpoint =
+        copilotSlot && copilotSlot[1]
+            ? `https://makecode-app-backend-ppe-${copilotSlot[1]}.azurewebsites.net/api/`


Is this a URL that's okay to have hardcoded here? Is there not a pxt.Cloud.apiRoot equivalent for this? If there isn't should we make one?

Since this is all just temporary code until we can check in backend changes, I personally think it's okay to hardcode in this way. I like to keep these sort of "will be removed later" changes as contained as possible in one place.
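For reference, the temporary override could be kept in a single function along these lines. This is only a sketch: the function name `getCopilotEndpoint` is hypothetical, it uses the standard `URL` API rather than the regex in the diff, and the slot-name check is an added assumption, not something the PR is known to do. The hostname template is the one from the diff above.

```typescript
// Build the staging endpoint from a "copilot" query parameter, or return
// undefined when the parameter is missing or contains unexpected characters.
function getCopilotEndpoint(href: string): string | undefined {
    const slot = new URL(href).searchParams.get("copilot");
    // Assumption: only allow characters that are safe inside a hostname label.
    if (!slot || !/^[a-z0-9-]+$/i.test(slot)) {
        return undefined;
    }
    return `https://makecode-app-backend-ppe-${slot}.azurewebsites.net/api/`;
}
```

Keeping the hardcoded template in one place like this makes it easy to delete once the backend changes are checked in.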

srietkerk · 2024-04-09T21:46:14Z

teachertool/src/transforms/mergeEvalResult.ts

+import { setEvalResult } from "./setEvalResult";
+
+// This will set the outcome and notes for a given criteria instance id, but if the provided value is undefined, it will not change that value.
+export function mergeEvalResult(criteriaInstanceId: string, outcome?: EvaluationStatus, notes?: string) {


I think this is a good change, but I wonder if this will also make it important to have "clear evaluation" functionality. Like if a teacher added criteria and wanted to evaluate and have a clean slate for the same student's project, it might be tedious to do that. No action needed now, and it might not even be a needed feature, but I just wanted to comment on it.

I think you could call into the setEvalResult transform for that code (or if you're clearing all results, make a new transform that just clears all of them). But yeah, we wouldn't want to use this transform for that.
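The difference between the two transforms might look roughly like the sketch below. The `evalResults` store and the `CriteriaResult` shape are simplified stand-ins assumed for the example; the real transforms work against the teacher tool's app state.

```typescript
enum EvaluationStatus {
    Pass = "pass",
    Fail = "fail",
    CompleteWithNoResult = "completeWithNoResult",
}

interface CriteriaResult {
    outcome?: EvaluationStatus;
    notes?: string;
}

// Hypothetical in-memory store standing in for the teacher tool's state.
const evalResults: Record<string, CriteriaResult> = {};

// Overwrites the whole entry, so it can also be used to clear a result.
function setEvalResult(criteriaInstanceId: string, result: CriteriaResult): void {
    evalResults[criteriaInstanceId] = result;
}

// Updates only the fields that are provided; undefined leaves the old value alone.
function mergeEvalResult(criteriaInstanceId: string, outcome?: EvaluationStatus, notes?: string): void {
    const previous = evalResults[criteriaInstanceId] ?? {};
    setEvalResult(criteriaInstanceId, {
        outcome: outcome ?? previous.outcome,
        notes: notes ?? previous.notes,
    });
}
```

Under this sketch, a "clear evaluation" feature would call `setEvalResult(id, {})` for each criteria instance, since `mergeEvalResult` cannot reset a field to undefined.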

srietkerk · 2024-04-09T21:57:30Z

teachertool/src/transforms/runEvaluateAsync.ts

+        if (catalogParam.type === "system" && catalogParam.key) {
+            param.value = getSystemParameter(catalogParam.key, teacherTool);
+            if (!param.value) {
+                param.value = catalogParam.default;


I may be misunderstanding something/can't find it. What is catalogParam.default? Is it guaranteed that a catalog criteria have a default?

It's an optional field you can set on a parameter value in the json to define the initial (i.e. default) value when the criteria is added to the rubric. It's not guaranteed to be set, and if that's the case, it'll go into the if statement on line 60 and log an error, the same as if any other parameter is missing a value.

Ah, makes sense. Thanks for clarifying that.
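To make the fallback order concrete, here is a small sketch of how a system parameter might be resolved. The `CatalogParam` and `TeacherToolState` shapes and the `resolveParamValue` helper are assumptions for illustration; `SHARE_ID` is the key named in the commit list, but how the real `getSystemParameter` reads it is not shown in this thread.

```typescript
interface CatalogParam {
    name: string;
    type: string;
    key?: string;
    default?: string;
}

// Hypothetical slice of the teacher tool state.
interface TeacherToolState {
    projectMetadata?: { shareId?: string };
}

// Maps a system parameter key to a value the teacher tool already knows.
function getSystemParameter(key: string, state: TeacherToolState): string | undefined {
    switch (key) {
        case "SHARE_ID":
            return state.projectMetadata?.shareId;
        default:
            return undefined;
    }
}

// System parameters are auto-populated, falling back to the catalog default.
// Other parameters use whatever the user entered.
function resolveParamValue(param: CatalogParam, state: TeacherToolState, userValue?: string): string | undefined {
    if (param.type === "system" && param.key) {
        return getSystemParameter(param.key, state) || param.default;
    }
    return userValue;
}
```

A return value of undefined corresponds to the missing-value case described above, where the caller logs an error.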

srietkerk · 2024-04-09T22:02:08Z

teachertool/src/transforms/runEvaluateAsync.ts

-                    const result = planResult.result ? EvaluationStatus.Pass : EvaluationStatus.Fail;
-                    setEvalResultOutcome(criteriaInstance.instanceId, result);
+                    const result =
+                        planResult.result === undefined


Does this need to be a nested ternary? This will always boil down to the result being pass or fail, right? Can the update here then be planResult.result === undefined?

It does not always boil down to pass or fail. If planResult.result is undefined, then result is CompleteWithNoResult. If planResult.result is set, then we decide pass/fail.

Oh, gotcha. Thanks!
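The diff above is truncated, so the exact expression is not visible here. The three-way mapping described in the reply can be sketched as follows; the enum values and the helper name `toEvaluationStatus` are assumptions for the example.

```typescript
enum EvaluationStatus {
    Pass = "Pass",
    Fail = "Fail",
    CompleteWithNoResult = "CompleteWithNoResult",
}

// undefined means the plan finished without a pass/fail verdict;
// otherwise the boolean decides between pass and fail.
function toEvaluationStatus(result: boolean | undefined): EvaluationStatus {
    if (result === undefined) {
        return EvaluationStatus.CompleteWithNoResult;
    }
    return result ? EvaluationStatus.Pass : EvaluationStatus.Fail;
}
```

Writing it as an early return instead of a nested ternary keeps the behavior the same while making the third outcome easier to spot.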

srietkerk · 2024-04-09T22:09:19Z

teachertool/src/services/backendRequests.ts

+        logError(ErrorCode.askCopilotQuestion, e);
+    }
+
+    return result;


Is there any way to indicate progress with the thinking? Because the other evaluations are so fast, I found myself impatient as I waited. Not something needed for this PR and I don't feel strongly about it, but might be something interesting to investigate.

Not at present. Deep Prompt just tells us if it's "in progress" or not. We could ask whether streaming the response text as it comes in, or reporting some kind of % completion, is in their plans, but I don't think either is possible currently.

Sounds good. It would be interesting to find out if that's something they plan to do, but it can probably be something we inquire about after demoing.

srietkerk

Looks good to me! Exciting stuff!

eanders-ms

Looks good. Just one small question.

eanders-ms · 2024-04-10T16:11:45Z

teachertool/src/components/CriteriaResultEntry.tsx

    const { state: teacherTool } = useContext(AppStateContext);
+    const [value, setValue] = useState(teacherTool.evalResults[criteriaId]?.notes ?? "");


I'm having difficulty understanding why value state was introduced. Was it something to do with initializing DebouncedTextArea?

Hm. I genuinely can't recall. I'll undo it.

thsparks added 26 commits March 27, 2024 18:46

One approach to having a validator plan for AI questions, still several to do items.

2372ed4

Merge branch 'master' of https://github.com/microsoft/pxt into thsparks/copilot_criteria

eb7b97e

Move ai eval into teacher tool. (Does not remove from editor yet). Not sure if we really want to expose this to anyone with a MakeCode iframe yet. I've tried to preserve the structure somewhat so it's easy to move back if desired.

b210067

Remove ai eval from editor. Not sure if we really want to expose this to anyone with a MakeCode iframe yet. I've tried to preserve the structure somewhat so it's easy to move back if desired.

ed379e2

Fix Updating Result Display

fb952bb

Change "Not Evaluated" to "N/A"

7f77388

Capitalize

189e26f

Resize text area automatically when autoResize is true.

d78ce27

(Untested) Backend Request Changes to askCopilotQuestion

e5dce83

Remove target parameter, change SHAREID to SHARE_ID

3235655

Remove evaluating... from the dropdown.

806a65c

Use a "key" field for system parameters instead of "default"

ed5bc8c

Do not print N/A results

850989b

Vertical resize textarea when it's horizontally resized if autoresize is on.

b35f451

Hide notes while evaluation is in progress.

3ed5832

New loading indicator for results

84377d8

Merge branch 'master' of https://github.com/microsoft/pxt into thsparks/copilot_criteria_before_autorun_changes

7b6a85b

Make test criteria appear at the top and put shared before specific.

f09f969

Add copilot url parameter for setting the endpoint to use.

f31bd65

Rings instead of circles for loading

ffa278a

Make copilot url param implicitly enable test criteria

328b16d

Merge branch 'master' of https://github.com/microsoft/pxt into thsparks/copilot_criteria_before_autorun_changes

ba54a94

Remove newline change

59dd7de

Clarify the different set result transforms

d2a430c

Comment update

52dccca

Prettier

0d83f85

thsparks requested a review from a team April 9, 2024 18:34

thsparks changed the title ~~Copilot Criteria Support~~ Teacher Tool: Copilot Criteria Support Apr 9, 2024

srietkerk reviewed Apr 9, 2024

thsparks added 5 commits April 9, 2024 15:23

Less wordy description for AI question

38144f9

import useEffect

4b5cc24

askCopilotQuestion -> askCopilotQuestionAsync

e5bba55

Remove trailing slash assumption

68ccfcf

Remove redundant "React."

b862442

srietkerk approved these changes Apr 9, 2024

eanders-ms approved these changes Apr 10, 2024

Remove value state. I don't think this is necessary.

476a8e1

thsparks merged commit 820f383 into master Apr 10, 2024
6 checks passed

thsparks deleted the thsparks/copilot_criteria_before_autorun_changes branch April 10, 2024 21:48
