
Proposal: remove supabase #12

Closed · Keyrxng opened this issue Jul 27, 2024 · 19 comments · Fixed by #13

@Keyrxng
Contributor

Keyrxng commented Jul 27, 2024

We could remove Supabase altogether, as everything could be done in memory using the GitHub API.

I assume it was used to cut down processing time versus parsing every open issue and its comments on every run, but if that is the only reason then it should definitely be removed.

As far as I can tell, Supabase is being used only to "watch" issues, when the same could be done by listing assigned, open issues and then quickly parsing the bot comments for the deadline.

This would resolve #10, given the error log regarding the table not existing.


If this is run strictly as a workflow, then it should be possible for us to use the run output to store a lastCheck value and fetch it on each new run.
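
For illustration, a rough sketch of that approach, assuming the plugin runs as an Action with an Octokit client; the workflow file name and the "Deadline:" comment marker are hypothetical, not the plugin's actual format:

```typescript
import { Octokit } from "@octokit/rest";

// Derive lastCheck from the previous successful run of this workflow instead of Supabase.
async function getLastCheck(octokit: Octokit, owner: string, repo: string): Promise<Date> {
  const { data } = await octokit.rest.actions.listWorkflowRuns({
    owner,
    repo,
    workflow_id: "user-activity-watcher.yml", // hypothetical workflow file name
    status: "success",
    per_page: 1,
  });
  // On the very first run there is no previous run, so scan everything once.
  return data.workflow_runs.length ? new Date(data.workflow_runs[0].created_at) : new Date(0);
}

// Scan assigned, open issues updated since lastCheck and parse bot comments for the deadline.
async function getDeadlines(octokit: Octokit, owner: string, repo: string, since: Date) {
  const issues = await octokit.paginate(octokit.rest.issues.listForRepo, {
    owner,
    repo,
    state: "open",
    assignee: "*",
    since: since.toISOString(),
  });

  const deadlines = new Map<number, Date>();
  for (const issue of issues.filter((i) => !i.pull_request)) {
    const comments = await octokit.paginate(octokit.rest.issues.listComments, {
      owner,
      repo,
      issue_number: issue.number,
    });
    // Assumed marker format in the bot's assignment comment, e.g. "Deadline: 2024-08-01T00:00:00Z".
    const match = comments
      .map((c) => c.body ?? "")
      .find((body) => body.includes("Deadline:"))
      ?.match(/Deadline:\s*(\S+)/);
    if (match) deadlines.set(issue.number, new Date(match[1]));
  }
  return deadlines;
}
```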

rfc @gentlementlegen

@gentlementlegen
Member

I am fine with the specification; however, how would you know which repositories to check? Would you set that through the configuration?

@Keyrxng
Contributor Author

Keyrxng commented Jul 28, 2024

> I am fine with the specification; however, how would you know which repositories to check? Would you set that through the configuration?

@gentlementlegen we can use the same approach the devpool does and just have a configurable optOut and include all other repos

@gentlementlegen
Member

So we would have a new item within the configuration like:

```yaml
watch:
  optIn:
    - ubiquity
    - ubiquibot
    - customOrg/some-repo
  optOut:
    - ubiquibot/user-activity-watcher
```

This way we can choose finely what we want to watch or ignore?
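
For illustration, a minimal sketch of how such a config could be applied, assuming the org/repo distinction is made on the presence of a `/`; the types and helper name are hypothetical, not the plugin's actual code:

```typescript
// Hypothetical shape matching the YAML above; entries with a "/" are single repos, the rest are orgs.
interface WatchConfig {
  optIn: string[];
  optOut: string[];
}

function shouldWatch(repoFullName: string, config: WatchConfig): boolean {
  const [org] = repoFullName.split("/");
  // optOut wins over optIn, whether it targets the whole org or a single repo.
  if (config.optOut.includes(repoFullName) || config.optOut.includes(org)) return false;
  return config.optIn.includes(repoFullName) || config.optIn.includes(org);
}

// shouldWatch("ubiquibot/user-activity-watcher", cfg) -> false (repo explicitly opted out)
// shouldWatch("ubiquity/devpool-directory", cfg)      -> true  (org opted in)
```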

@Keyrxng
Contributor Author

Keyrxng commented Jul 28, 2024

I think it should be handled such that you don't have to define all of the org repos though, because that's tedious. We can include optIn for third-party orgs or other user repos etc., but yeah, exactly like that seems ideal.

P.S. I removed another proposal in the spec. Why aren't we piggybacking on the workflow_requested event, or one in that category, instead of the wildcard event? Surely it would run less often, but still way more often than we actually need it to. For instance, the devpool workflow is a cron job, so we'd be guaranteed an hourly update with that alone.

@gentlementlegen
Member

Indeed, that's why in my example I included 2 orgs + one specific repo (I think we can guess that based on the presence of the /).

It is dangerous to use the workflow_requested event, as it will currently create an infinite loop of workflows with the kernel. It is also a debate within the automated-merging repo, where I created a spec.

@Keyrxng
Contributor Author

Keyrxng commented Jul 28, 2024

> It is dangerous to use the workflow_requested event

Very true, but if it could be scoped to a particular workflow, such as only running when it's a devpool update, that would solve it, and it is relevant to your comment. Something I'll keep in mind when making those changes.

I think for now this is the only use case we have, but it might come in handy later if there is something generic that we can add to events and filter them.

@whilefoo
Member

I've also thought about this, but the problem is that we'd have to check every repo's config; at a later stage that might mean fetching 1000 configs on every run, which is a lot of requests and might get rate limited. It might be fine for now though, since we don't have a lot of repos.

@Keyrxng
Contributor Author

Keyrxng commented Jul 28, 2024

> I've also thought about this, but the problem is that we'd have to check every repo's config; at a later stage that might mean fetching 1000 configs on every run, which is a lot of requests and might get rate limited. It might be fine for now though, since we don't have a lot of repos.

Some plugins seem like they'd be used in more of a global sense, and I feel like this is one of those, but I know what you mean. I guess we also have the same issue with repo fetching, issue scanning, and comment listing; it's quite call-heavy in that sense. Adding some rate-limit handling would be easy enough if it becomes a problem.
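
As a reference, a minimal sketch of the kind of rate-limit handling mentioned here, using Octokit's throttling plugin; the retry policy shown is illustrative, not what the plugin currently does:

```typescript
import { Octokit } from "@octokit/rest";
import { throttling } from "@octokit/plugin-throttling";

const ThrottledOctokit = Octokit.plugin(throttling);

const octokit = new ThrottledOctokit({
  auth: process.env.GITHUB_TOKEN,
  throttle: {
    onRateLimit: (retryAfter, options, _octokit, retryCount) => {
      // Primary rate limit hit: wait the suggested time and retry once.
      console.warn(`Rate limited, retrying after ${retryAfter}s`);
      return retryCount < 1;
    },
    onSecondaryRateLimit: (retryAfter, options) => {
      // Secondary (abuse-detection) limits: back off and retry.
      console.warn(`Secondary rate limit hit for ${options.method} ${options.url}`);
      return true;
    },
  },
});
```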

@0x4007
Member

0x4007 commented Jul 29, 2024

> I've also thought about this, but the problem is that we'd have to check every repo's config; at a later stage that might mean fetching 1000 configs on every run, which is a lot of requests and might get rate limited. It might be fine for now though, since we don't have a lot of repos.

I think it makes more sense to optimize for rate limiting later when it becomes a problem. From what I understand, the rate limits are enforced per installation (organization) and they are quite generous.

> GitHub Apps authenticating with an installation access token use the installation's minimum rate limit of 5,000 requests per hour

~83 requests per minute.

@gentlementlegen
Member

So far Supabase is required only when we have to query wallets, permits, or things that are stored in the DB and cannot be fetched from GitHub. I think it's fine to use mostly the GitHub API for the long-running plugins (Actions); however, within workers it is quite problematic as you can only do 50 requests per worker.

@0x4007
Member

0x4007 commented Jul 29, 2024

We should store these things in JSON files under the organization's .ubiquibot-config repository.
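
A rough sketch of what that could look like, assuming an Octokit client; the repository name follows the suggestion above, while the file path, commit message, and data shape are only illustrative:

```typescript
import { Octokit } from "@octokit/rest";

// Read a JSON "table" from the org's .ubiquibot-config repo, mutate it in memory, write it back.
async function updateJsonStore(octokit: Octokit, org: string, mutate: (db: Record<string, unknown>) => void) {
  const params = { owner: org, repo: ".ubiquibot-config", path: "user-activity-watcher.json" };

  const { data } = await octokit.rest.repos.getContent(params);
  if (Array.isArray(data) || !("content" in data)) throw new Error("expected a file, got something else");

  const db = JSON.parse(Buffer.from(data.content, "base64").toString("utf8"));
  mutate(db);

  await octokit.rest.repos.createOrUpdateFileContents({
    ...params,
    message: "chore: update user activity store",
    content: Buffer.from(JSON.stringify(db, null, 2)).toString("base64"),
    // Passing the current sha makes the write fail if someone else pushed in between,
    // which is exactly the conflict scenario discussed further down.
    sha: data.sha,
  });
}
```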

@gentlementlegen
Member

Storing them in the repo is not reliable, as hundreds of events are fired every minute. ACID guarantees could not be ensured.

@0x4007
Member

0x4007 commented Jul 30, 2024

I think we can make that optimization when needed. Besides, we have built-in revision history. Git is built to handle changes coming in from different contributors gracefully. But realistically I don't foresee hundreds of changes per second, and anything less should be perfectly fine.

When multiple contributors push to a branch simultaneously, Git handles this using a combination of commit history, merging, and conflict resolution:

  1. Commit History: Each commit in Git has a unique hash and a reference to its parent commit(s). This ensures that changes are tracked in a linear history.

  2. Merging: When multiple branches diverge, Git can merge them back together. If two branches change different parts of the code, Git will merge them automatically.

  3. Conflict Resolution: If two changes conflict (i.e., they modify the same line of code differently), Git will flag this and require manual resolution by the user.

In cases of concurrent pushes to the same branch:

  • Push Rejection: If someone pushes to a branch after you pulled but before you pushed, your push will be rejected. You must first pull the latest changes, merge them with your local changes, resolve any conflicts, and then push again.
  • Rebase and Force Push: An alternative is to rebase your changes on top of the latest commits from the branch, then force push if necessary.

This system ensures that even with many contributors, the repository's history remains consistent and conflicts are managed systematically.

In theory we can handle push rejections with a refresh and retry.
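
For example, a hypothetical retry wrapper around a write like the one sketched earlier: if the push is rejected because the file changed underneath us, re-read and try again (the status codes and attempt count here are assumptions):

```typescript
// Retries a write callback that re-reads the latest content + sha on every attempt.
async function withRetry(write: () => Promise<void>, attempts = 3): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      return await write();
    } catch (err) {
      const status = (err as { status?: number }).status;
      const isConflict = status === 409 || status === 422; // stale sha / rejected update
      if (!isConflict || i === attempts - 1) throw err;
      // Loop again; write() is expected to refetch the file and reapply the change.
    }
  }
}
```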

@gentlementlegen
Member

gentlementlegen commented Jul 30, 2024

There will be thousands of changes a day due to the nature of this plugin being triggered by a lot of events, and it only gets worse the bigger the organization is. Imagine the following scenario (to give a sense of scale, the Kernel received ~3.5k events a day):

  • two runs start nearly at the same time
  • run A pulls the json file
  • run B pulls the json file

At that point, both DBs are in the same state.

  • run A adds a record
  • run B removes a record
  • run A tries to push the db
  • run B tries to push the db

A conflict happens, but it cannot be resolved automatically and needs human attention. The result is that all the changes by run B are lost. This would be a very common scenario, given that a lot of different plugins would require access to that file across the whole organization.

In the case of this plugin specifically, a DB is not required anyway, so that will be fine. But I don't think this is reliable at an organization scale.

@0x4007
Member

0x4007 commented Jul 30, 2024

> Kernel received ~3.5k events a day

That is 2.43 events per minute lol we are very far from any problems.

> A conflict happens, but it cannot be resolved automatically and needs human attention.

In theory we can handle push rejections with a refresh and retry.

@gentlementlegen
Member

It's not about the amount technically, but the concurrency. We can handle them only if they can be merged automatically, which would not be the case in an add-and-delete scenario, or when the same content is changed on both sides.

Then comes the second problem of linking all the fragments of the DB together: when you query for a user, it checks the relation between the id and the wallet; if you spread that across different repositories, it becomes extremely complex to compose queries.

I see what problem we are trying to resolve, and decentralizing data would be great, but it is not as easy as "creating a json per repo" sadly.

@0x4007
Member

0x4007 commented Jul 30, 2024

You store it in a single json: #12 (comment)

Contributor

ubiquity-os bot commented Aug 24, 2024

! No price label has been set. Skipping permit generation.

@0x4007
Member

0x4007 commented Aug 24, 2024

Not sure how to credit this or if it makes sense to.
