
fix: adds scraper logic and token handling #190

Open · wants to merge 32 commits into development

Conversation

shiv810 commented Dec 27, 2024

Resolves #187

  • Adds the scraper logic
  • Fetches only if the data in the DB is older than 24 hours

shiv810 requested a review from 0x4007 as a code owner on December 27, 2024, 01:22
ubiquity-os-deployer bot commented Dec 27, 2024

0x4007 (Member) left a comment

The changes seem generally fine; I'm not sure how this can be QA'd.

I was wondering if another approach makes sense, such as only copying with high precision when an issue is closed as complete, so only that gets copied over. But if we are hunting down historical issues, then obviously this isn't a solution.

My concern is repeatedly finding the same historical issues and doing the same redundant search and scrape work. I am wondering if there is a better approach to this problem.

shiv810 (Author) commented Dec 28, 2024

@zugdev There are two lockfiles, bun.lockb and yarn.lock. Is that intended? Which one is supposed to be used?

shiv810 (Author) commented Dec 28, 2024

The changes seem generally fine; I'm not sure how this can be QA'd.

I'll create a demo video to walk through the flows.

Right now, we store the timestamp of the user's last scrape. It should be possible to use this timestamp as a filter to check for any newly closed issues marked as completed: if there are any, we proceed with the scrape; otherwise, we skip the process for that user.
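A rough sketch of what that filter could look like, assuming Octokit is available where the scraper runs; the helper name, query shape, and qualifiers are illustrative, not part of this PR:

import { Octokit } from "@octokit/rest";

// Hypothetical helper: re-scrape only if at least one of the user's issues was
// closed as completed after the stored last-scrape timestamp.
async function shouldScrape(octokit: Octokit, username: string, lastScrapeMs: number): Promise<boolean> {
  // Day granularity is enough for a ">= 24 hours" gate.
  const closedSince = new Date(lastScrapeMs).toISOString().split("T")[0];
  const { data } = await octokit.rest.search.issuesAndPullRequests({
    q: `assignee:${username} is:issue is:closed reason:completed closed:>=${closedSince}`,
    per_page: 1,
  });
  return data.total_count > 0;
}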

zugdev (Collaborator) commented Dec 29, 2024

@zugdev There are two lockfiles, bun.lockb and yarn.lock. Is that intended? Which one is supposed to be used?

The bun.lockb one is related to Cloudflare Wrangler; the main one is yarn.lock. Use the latter.

0x4007 (Member) commented Dec 30, 2024

Perhaps we should consolidate all to bun? @gentlementlegen rfc

gentlementlegen (Member):
Ideally we should use bun everywhere for consistency, except if there is a reason not to. It matters mostly because when testing packages locally, bun link and yarn link are not interchangeable (at least from my experience).

shiv810 (Author) commented Dec 30, 2024

Is there a way to add secrets in the final build without exposing them in dist? I’m thinking of creating a pg function to handle this or using Cloudflare KV.

gentlementlegen (Member):
Yes, you can store and pull them from the GitHub Actions environment during deployment.

shiv810 (Author) commented Dec 30, 2024

Yes, you can store and retrieve them from the GitHub Actions environment during deployment.

You’ll still be able to view the values in the home.js file after the final build. Is that fine? I think it’s fine for the SUPABASE URL and ANON KEY, but storing the VOYAGE API KEY in plain text doesn’t seem secure.

gentlementlegen (Member):
Why would it be in the final build? It should not be there.

shiv810 (Author) commented Dec 30, 2024

The scraper logic is executed when a user logs in on the client side, so we will need the Voyage API key there for embeddings.

gentlementlegen (Member):
Ah yes, I see now; I was missing the context, sorry. We should definitely not expose that API key on the client side. Sadly, we don't use SSR, so I don't know how we can handle this except by having some API that we host ourselves somewhere, or storing it in cookies?

0x4007 (Member) commented Dec 30, 2024

Make a worker endpoint. We have a setup like this for pay.ubq.fi that's related to the cards. Check its code.

@EresDev rfc

EresDev (Contributor) commented Dec 30, 2024

Make a worker endpoint. We have a setup like this for pay.ubq.fi that's related to the cards. Check its code.

@EresDev rfc

Yes, it looks like the VoyageAI part needs a backend. You can get started with Pages Functions.

We store API keys in Cloudflare Worker secrets. However, one thing that ubiquity-os-kernel does differently is store the API keys in GitHub and push them to Cloudflare during worker deployment. This helps with managing the secrets in one place (GitHub), and we will probably do the same soon in pay.ubq.fi if possible.
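For illustration, a minimal sketch of such a Pages Function in TypeScript (the PagesFunction type comes from @cloudflare/workers-types); the file name, the VOYAGEAI_API_KEY binding, and the model name are assumptions, not what this PR ships:

// functions/embeddings.ts (hypothetical): proxy embedding requests server-side so the
// VoyageAI key stays in the Cloudflare environment and never reaches the client bundle.
interface Env {
  VOYAGEAI_API_KEY: string;
}

export const onRequestPost: PagesFunction<Env> = async ({ request, env }) => {
  const { texts } = await request.json<{ texts: string[] }>();

  const upstream = await fetch("https://api.voyageai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${env.VOYAGEAI_API_KEY}`,
    },
    // Model name is a placeholder; the body follows VoyageAI's embeddings request shape.
    body: JSON.stringify({ input: texts, model: "voyage-large-2" }),
  });

  return new Response(await upstream.text(), {
    status: upstream.status,
    headers: { "Content-Type": "application/json" },
  });
};

The client would then call this endpoint instead of VoyageAI directly, so only the Supabase URL and anon key would remain visible in the built assets.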

shiv810 (Author) commented Dec 30, 2024

We store API keys in Cloudflare Worker secrets.

We can store API keys in Cloudflare KV, but I don't think this is the best approach, as we could quickly hit the read limit. A better option might be Supabase Functions, which would allow us to read directly from the Supabase Vault.

gentlementlegen (Member):
You can have environment secrets in Cloudflare that do not count against any quota. We can upload them from GitHub environment secrets, like we do for most plugins.

shiv810 (Author) commented Jan 4, 2025

QA:

[Screenshot: 2025-01-04 at 11:51:53 AM]

Mean Stats:

  • CPU wall time: 3.2 ms
  • Sub-requests: 16 (for about 61 issues)

@zugdev could you add the Supabase service role key to the secrets under SUPABASE_KEY? That's needed for adding the scraped results back to the table. Please ping me on Telegram for the value.

shiv810 (Author) commented Jan 6, 2025

@0x4007

QA:

https://scraperui.work-ubq-fi-50d.pages.dev/

You should be able to see the scraper function request in the Network tab of the browser's inspector.

0x4007 (Member) commented Jan 6, 2025

I'm on mobile these days, so I don't want to hold up the review; somebody else please check on their computer for me.

.github/workflows/deploy.yml (outdated; thread resolved)
functions/issue-scraper.ts (thread resolved)
): Promise<void> {
const { error } = await supabase.from("issues").upsert(issues);
if (error) {
throw new Error(`Error during batch upsert: ${error.message}`);
Member:

Perhaps we can try an exponential backoff? I wonder if this is worth implementing to make this more robust although I imagine that errors are quite rare for this.

Author:

It should be possible to implement, but if a request fails and we retry multiple times, the entire request will eventually be terminated once it hits the maximum CPU wall time.

Member:

Retries should be handled from the client so no timeouts on the worker side should be relevant.
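A small sketch of what client-side retries with exponential backoff could look like; the helper and endpoint names are illustrative:

// Retry a failing call from the browser with exponential backoff, so the Worker
// itself never spends CPU time on retries.
async function withBackoff<T>(fn: () => Promise<T>, maxRetries = 3, baseDelayMs = 500): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break;
      // Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage against a hypothetical endpoint:
// await withBackoff(() => fetch("/issue-scraper", { method: "POST" }));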

functions/issue-scraper.ts (two outdated threads, resolved)
src/home/authentication.ts (two outdated threads, resolved)

if (lastFetch) {
const lastFetchTimestamp = Number(lastFetch);
if (now - lastFetchTimestamp < 24 * 60 * 60 * 1000) {
Member:

I wonder if there is a better way to implement this. For example, instead of going by timing, we could do it based on an event.

Although I can't think of how that would be possible in outside-organization contexts.

src/home/scraper/issue-scraper.ts (thread resolved)
return md.plainText;
}

const SEARCH_ISSUES_QUERY = `
Member:

You can add /* GraphQL */ to help IDEs parse and format GraphQL queries.

Suggested change
const SEARCH_ISSUES_QUERY = `
const SEARCH_ISSUES_QUERY = /* GraphQL */`

package.json (outdated)
@@ -36,10 +36,14 @@
"@octokit/request-error": "^6.1.0",
"@octokit/rest": "^20.0.2",
"@supabase/supabase-js": "^2.39.0",
"@types/markdown-it": "^14.1.2",
Member:

Should be in devDependencies

yarn.lock (outdated)
Member:

Shouldn't this be deleted since you used bun?

Author:

I'm not entirely sure about this. @zugdev mentioned that Bun is related to Cloudflare, while Yarn is the main one. So, I've reverted the bun.lockb to match the repo version and added the dependencies to the yarn.lock.

Member:

I suppose it's okay to use either, because the frontend doesn't rely on plugins or the kernel, but having both lockfiles seems unnecessary; I would suggest removing one or the other 😄

Development

Successfully merging this pull request may close these issues.

Scraper: Post Login with Github
6 participants