"L2 Agent" #37

Open
0x4007 opened this issue Aug 21, 2024 · 9 comments

Comments

@0x4007
Member

0x4007 commented Aug 21, 2024

I was reading a friend's blog post and was inspired to think about AI systems in a more structured way. The post defines a set of AI "level" designations.

It would be interesting to make an L2 agent according to the definition in the blog post:

L2 agents use LLMs selectively to decide how to handle key points in the program’s control flow.
Today, this often boils down to deciding which tool to invoke based on a set of tools which have been carefully curated by a human programmer.
The most common example of L2 agents today is invoking an LLM with access to tools in a while loop.
The majority of the program’s control flow still resides outside of the LLM’s purview and is controlled by a human programmer.

According to the blog, this is a stepping stone to L3, because L3 coordinates L2 agents and below.


We can make this a command interface where we tag the bot and make requests in plain language:

@ubiquity-os give me the wallet address of @0x4007


In the above example, we would pass the entire help menu to ChatGPT so that it can invoke the correct plugin based on each command's description.

I think this should be quite straightforward to implement, and it is a useful stepping stone towards a more advanced AI-powered system.

We can use GPT-4o mini, because reading the help menu and picking a command seems like a simple enough task.
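A minimal TypeScript sketch of this first step, under stated assumptions: the `HelpEntry` shape and the two sample entries are hypothetical stand-ins for each plugin's help-menu entry, and every command is assumed to take a single free-form argument string. It pulls the plain-language request out of a comment that tags `@ubiquity-os`, then converts the help menu into tool definitions in the shape the OpenAI Chat Completions `tools` parameter expects, leaving a small model such as GPT-4o mini to pick one.

```typescript
// Hypothetical shape of one help-menu entry; the real plugin manifest may differ.
interface HelpEntry {
  command: string; // e.g. "/wallet"
  description: string;
  example: string;
}

// Tool definition in the shape accepted by the OpenAI Chat Completions `tools` parameter.
interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: "string"; description: string }>;
      required: string[];
    };
  };
}

// Matches the bot mention, but not longer handles such as "@ubiquity-os-beta".
const MENTION = /@ubiquity-os(?![\w-])\s*/i;

// Returns the plain-language request that follows the mention, or null if the bot was not tagged.
function extractRequest(commentBody: string): string | null {
  const match = MENTION.exec(commentBody);
  if (!match) return null;
  const request = commentBody.slice(match.index + match[0].length).trim();
  return request.length > 0 ? request : null;
}

// Converts help-menu entries into tool definitions (one free-form "args" string per command).
function toTools(helpMenu: HelpEntry[]): ToolDefinition[] {
  return helpMenu.map(
    (entry): ToolDefinition => ({
      type: "function",
      function: {
        // Tool names only allow letters, digits, "_" and "-", so drop the leading slash.
        name: entry.command.replace(/^\//, ""),
        description: `${entry.description} Example: ${entry.example}`,
        parameters: {
          type: "object",
          properties: {
            args: { type: "string", description: "Arguments to pass to the command, if any." },
          },
          required: [],
        },
      },
    }),
  );
}

// Illustrative entries only.
const helpMenu: HelpEntry[] = [
  { command: "/wallet", description: "Register or update your payment wallet address.", example: "/wallet 0x0000...0000" },
  { command: "/start", description: "Assign yourself to the issue.", example: "/start" },
];

console.log(extractRequest("@ubiquity-os give me the wallet address of @0x4007")); // give me the wallet address of @0x4007
console.log(JSON.stringify(toTools(helpMenu), null, 2));
```

The model's chosen tool call would then be mapped back to `/wallet`, `/start`, and so on; dropping the slash is only needed because function names are restricted to letters, digits, underscores and hyphens.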

Advanced Version

As a more advanced version of this plugin, the bot could listen to every comment (no bot tag required) and jump in whenever it thinks it can help. For example, if somebody asks to be assigned to a task, perhaps the bot could invoke /start on behalf of that user, which would inherit all of the usual checks (e.g., whether they are already assigned to too many other open tasks).

This makes the bot's presence much more pronounced, and it will truly feel like a helpful, proactive member of the team instead of "a tool" that must be specifically called upon for help.
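A sketch of how the advanced version might decide whether to jump in, assuming a hypothetical `classifyIntent` step (a keyword heuristic here, standing in for an LLM call so the example runs offline) and a hypothetical `StartHandler` entry point for /start. The point it illustrates is that the commenter, not the bot, is passed as the sender, so /start's existing checks apply to the person who asked.

```typescript
// Hypothetical intents; a real set would be derived from the installed plugins' help menus.
type Intent = { kind: "none" } | { kind: "request-assignment" };

interface IssueComment {
  author: string;
  body: string;
}

// Hypothetical /start entry point that runs the normal checks for `sender`
// (e.g. whether they are already assigned to too many open tasks).
type StartHandler = (sender: string) => Promise<string>;

// Stand-in for an LLM classification call: a keyword heuristic so the sketch runs offline.
function classifyIntent(comment: IssueComment): Intent {
  const asksForTask = /\b(assign me|can i (take|work on)|i('d| would) like to work on)\b/i.test(comment.body);
  return asksForTask ? { kind: "request-assignment" } : { kind: "none" };
}

// Reads every comment (no bot tag required) and only acts when it recognizes a request.
async function handleComment(comment: IssueComment, start: StartHandler): Promise<string | null> {
  const intent = classifyIntent(comment);
  if (intent.kind === "request-assignment") {
    // Pass the commenter, not the bot, so /start's checks apply to the person asking.
    return start(comment.author);
  }
  return null; // stay quiet when there is nothing to help with
}

const demoStart: StartHandler = async (sender) => `ran /start for ${sender}`;
handleComment({ author: "alice", body: "Can I take this one?" }, demoStart).then(console.log); // ran /start for alice
```

Returning `null` by default keeps the bot silent on most comments; only a recognized request triggers an action.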

Remark

I suppose that if it calls other plugins that use LLMs (like conversation rewards, somehow), this would technically be considered an L3-class system.

@0x4007
Member Author

0x4007 commented Aug 21, 2024

It seems that L4 requires the bot to write a custom plugin at runtime to handle a novel task.

This could be really interesting (and feasible) with automated CI checking.

Running CI on every commit might be quite slow, but it would be incredible to see the bot build, save, and install a new plugin by itself, which the L2 agent described in this specification could then invoke on future runs.

If we could pull off L4, I'm sure we could go viral or trend in programmer news. Most of the infrastructure is in place, but building robust end-to-end CI tests seems like a month-long project.

@Keyrxng

Keyrxng commented Aug 21, 2024

I previously experimented with building an L2 agent using V1.

> This could be really interesting (and feasible) with automated CI checking.

I agree: very interesting and definitely feasible.

A "simple" V1 is possible if we map safe commands and safe direct actions that'll fire the intended plugins.


Concerns and Questions for V2:
Does V2 involve this plugin posting a slash command to GitHub issues, or does it operate by dispatching plugins directly? If it uses slash commands, there's a limitation: plugins will identify the bot as the sender, not the actual user, which will break most, if not all, plugins.

Implementation Strategy:
To include non-slash command capabilities we'd need to:

  1. Provide the LLM with the manifest of each installed plugin
  2. "Teach" the bot both our API and relevant parts of the GitHub API.
  3. Use GitHub's workflow and repo dispatch when feasible; for other cases, enable the bot to build and execute API calls.
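A sketch of the repository-dispatch path from step 3 above, building a request for GitHub's `POST /repos/{owner}/{repo}/dispatches` endpoint. The endpoint, headers and required `event_type` field are part of GitHub's REST API; the `client_payload` contents (`sender`, `args`), the event name and the `example-org/example-repo` names are hypothetical and do not reflect the kernel's real plugin input format. Carrying the requesting user in the payload is one possible way around the bot-as-sender problem.

```typescript
// A plain description of an HTTP request, so the sketch can be inspected without network access.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a GitHub repository_dispatch request (POST /repos/{owner}/{repo}/dispatches).
// `event_type` is required by the API; the fields inside `client_payload` are hypothetical.
function buildRepositoryDispatch(
  owner: string,
  repo: string,
  token: string,
  eventType: string,
  originalSender: string,
  args: Record<string, string>,
): HttpRequest {
  return {
    url: `https://api.github.com/repos/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}/dispatches`,
    method: "POST",
    headers: {
      Accept: "application/vnd.github+json",
      Authorization: `Bearer ${token}`,
      "X-GitHub-Api-Version": "2022-11-28",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      event_type: eventType,
      // Carry the human who asked, so the receiving workflow does not treat the bot as the sender.
      client_payload: { sender: originalSender, args },
    }),
  };
}

const request = buildRepositoryDispatch("example-org", "example-repo", "<token>", "hypothetical-wallet-query", "0x4007", {
  username: "0x4007",
});
console.log(request.url); // https://api.github.com/repos/example-org/example-repo/dispatches
```

The request can then be sent with any HTTP client, for example the global `fetch` in Node 18+; GitHub responds with `204 No Content` when the dispatch is accepted.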

Operational Flow:

  1. User queries are sent to OpenAI.
  2. OpenAI determines if the response should trigger a function call or a simple text reply.
  3. If a function is triggered, the arguments are sent to our tool handler.
  4. After execution, responses are either posted directly to GitHub or returned to LLM for further processing.
  5. The loop ends with the addCommentToIssue tool, which posts results back to GitHub (alternatively, we invoke it manually after the LLM interaction has ended, rather than as part of the LLM loop).
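The operational flow above is essentially the "LLM with access to tools in a while loop" from the L2 definition. A minimal sketch, assuming a simplified `Model` interface rather than the real OpenAI SDK types and a hypothetical `getWallet` tool; posting the final text is left to the caller, matching the second option in step 5. A `maxSteps` cap keeps a confused model from looping forever.

```typescript
// Simplified model output: either a tool call or a final text reply (not the OpenAI SDK types).
type ModelTurn =
  | { type: "tool_call"; name: string; args: Record<string, string> }
  | { type: "text"; content: string };

interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Hypothetical interfaces: a model wraps a chat call with tools; a tool is a handler the bot owns.
type Model = (history: Message[]) => Promise<ModelTurn>;
type Tool = (args: Record<string, string>) => Promise<string>;

// Steps 1-4: send the request, run any requested tool, feed its result back, and repeat
// until the model replies with text or maxSteps is reached. The caller posts the result (step 5).
async function runAgent(request: string, model: Model, tools: Map<string, Tool>, maxSteps = 5): Promise<string> {
  const history: Message[] = [{ role: "user", content: request }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = await model(history);
    if (turn.type === "text") return turn.content;
    const tool = tools.get(turn.name);
    // Unknown tools are reported back to the model instead of crashing the loop.
    const result = tool ? await tool(turn.args) : `Unknown tool: ${turn.name}`;
    history.push({ role: "assistant", content: `${turn.name}(${JSON.stringify(turn.args)})` });
    history.push({ role: "tool", content: result });
  }
  return "Stopped after too many tool calls without a final answer.";
}

// Scripted model for demonstration: calls the hypothetical getWallet tool once, then answers.
const scriptedModel: Model = async (history) => {
  const last = history[history.length - 1];
  return last.role === "user"
    ? { type: "tool_call", name: "getWallet", args: { username: "0x4007" } }
    : { type: "text", content: `Wallet of @0x4007: ${last.content}` };
};

const demoTools = new Map<string, Tool>([["getWallet", async () => "0x0000...0000"]]);

runAgent("give me the wallet address of @0x4007", scriptedModel, demoTools).then(console.log);
// Wallet of @0x4007: 0x0000...0000
```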

Challenges:

  • Automating the invocation of any installed plugin directly is complex as it requires detailed knowledge of all plugins and the ability to generate specific payloads for each.
  • Most slash commands posted by the bot will fail unless they are "safe".
  • Using the chat-api, we would "teach" the bot via the tools we write for it: streamlined, but not as smart.
  • Using the assistants-api, we could load it with entire API spec docs: less streamlined, but far smarter.

Potential Development Paths:

  1. V1 Safe Mode: Allow only pre-approved slash commands; convert all other commands into informative comments.
  2. V1 Direct Action: Enable direct actions on issues (e.g., adding/removing assignees/labels) using parameterized API calls constructed by the LLM, as seen in the old QA I linked above. These actions would trigger plugins that do not rely on slash commands, such as assistive-pricing and task-xp-guard.
  3. V1 Advanced Dispatch: Use workflow/repository dispatch, which might be very tricky. Calling the kernel directly is trickier still because of the handshake verification, etc., but it could probably be done.
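Paths 1 and 2 could be sketched as follows, under stated assumptions: `SAFE_COMMANDS` is a hypothetical allowlist (only /help here, assumed not to depend on who posts it), and `DirectAction` is a hypothetical set of actions the LLM may request. Each direct action maps onto GitHub's REST endpoints for issue labels and assignees, and anything outside the union is rejected.

```typescript
// V1 Safe Mode: hypothetical allowlist of commands assumed not to depend on who posts them.
const SAFE_COMMANDS = new Set(["/help"]);

type SafeModeOutcome = { type: "post_command"; command: string } | { type: "inform"; comment: string };

// Posts allowlisted commands as-is; turns everything else into an informative comment.
function applySafeMode(command: string): SafeModeOutcome {
  const trimmed = command.trim();
  const name = trimmed.split(/\s+/)[0];
  if (SAFE_COMMANDS.has(name)) return { type: "post_command", command: trimmed };
  return {
    type: "inform",
    comment: `I can't run ${name} on your behalf yet. Please post \`${trimmed}\` yourself.`,
  };
}

// V1 Direct Action: hypothetical set of actions the LLM may request.
type DirectAction =
  | { kind: "add_labels"; labels: string[] }
  | { kind: "remove_label"; label: string }
  | { kind: "add_assignees"; assignees: string[] }
  | { kind: "remove_assignees"; assignees: string[] };

interface ApiCall {
  method: "POST" | "DELETE";
  path: string;
  body?: unknown;
}

// Maps each action onto the matching GitHub REST endpoint for issue labels and assignees.
function toApiCall(owner: string, repo: string, issueNumber: number, action: DirectAction): ApiCall {
  const base = `/repos/${owner}/${repo}/issues/${issueNumber}`;
  switch (action.kind) {
    case "add_labels":
      return { method: "POST", path: `${base}/labels`, body: { labels: action.labels } };
    case "remove_label":
      return { method: "DELETE", path: `${base}/labels/${encodeURIComponent(action.label)}` };
    case "add_assignees":
      return { method: "POST", path: `${base}/assignees`, body: { assignees: action.assignees } };
    case "remove_assignees":
      return { method: "DELETE", path: `${base}/assignees`, body: { assignees: action.assignees } };
    default: {
      // LLM output is untrusted JSON: reject anything outside the allowed union.
      const unsupported: never = action;
      throw new Error(`Unsupported action: ${JSON.stringify(unsupported)}`);
    }
  }
}

console.log(applySafeMode("/start").type); // inform
console.log(toApiCall("example-org", "example-repo", 37, { kind: "add_labels", labels: ["Priority: 3 (High)"] }).path);
// /repos/example-org/example-repo/issues/37/labels
```

In both paths the LLM only chooses among options a human has curated, which keeps the design within the L2 definition quoted at the top of the issue.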

@0x4007
Member Author

0x4007 commented Aug 22, 2024

Intuitively, I believe that providing all the context and invoking plugins directly (rather than writing the slash command) is the best approach. However, this could get expensive because it would require the larger model and use a lot of context.

If that is the case, we would probably need to rely on tagging the bot, which is not as interesting.

I was under the impression that we have standardized payload interfaces for all of the plugins, and that we just need to understand the help menu of each plugin.

@0x4007
Member Author

0x4007 commented Dec 5, 2024

/start

! This task does not reflect a business priority at the moment. You may start tasks with one of the following labels: Priority: 3 (High), Priority: 4 (Urgent), Priority: 5 (Emergency)

@0x4007
Member Author

0x4007 commented Dec 5, 2024

/start

Warning! This task was created over 106 days ago. Please confirm that this issue specification is accurate before starting.
Deadline Fri, Dec 6, 3:45 PM UTC
Beneficiary 0x4007CE2083c7F3E18097aeB3A39bb8eC149a341d

Tip

  • Use /wallet 0x0000...0000 if you want to update your registered payment wallet address.
  • Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
  • Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the task.

@0x4007
Member Author

0x4007 commented Dec 5, 2024

/stop

! Adding a label to issue failed!

Projects
None yet
Development

No branches or pull requests

2 participants