Skip to content
This repository has been archived by the owner on Sep 19, 2024. It is now read-only.

Database Schema Design: October 2023 #787

Open
4 of 5 tasks
0x4007 opened this issue Sep 20, 2023 · 32 comments
Open
4 of 5 tasks

Database Schema Design: October 2023 #787

0x4007 opened this issue Sep 20, 2023 · 32 comments

Comments

@0x4007
Copy link
Member

0x4007 commented Sep 20, 2023

Overview

  • We need a robust schema to manage settlements, which include both credits and debits in our payments ledger.
  • Every table should start with the following columns: id, created, updated and have foreign keys in the last columns.
  • We also should split off the permits into a separate table, and associate them with credits as needed.
    • Perhaps in the future we can allow for credits without permits.
  • The bot is expected to be stable after these changes are implemented which implies that all of the functions that interact with the database should also be refactored to accomodate this new architecture.

Schema

my_modified_backup.sql.zip

supabase com_dashboard_project_wfzpewmlyiozupulbuur_database_schemas (1)

access

Access control now has a column for JSON arrays, so that any project can set access control for fully customizable labels.

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT now() NOT NULL
updated timestamp with time zone DEFAULT now() NOT NULL
user_id integer NOT NULL
location_id integer NOT NULL
multiplier smallint DEFAULT '1'::smallint NOT NULL
labels json

credits

Permits are optional. Perhaps we can assume XP calculations for credits without permits. Alternatively we need to think about other ways to compensate. For Gnosis Safe "permits", what data can do store instead of permits? rfc @rndquu

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
updated timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
amount numeric NOT NULL
permit_id integer

debits

No remarks

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT now() NOT NULL
updated timestamp with time zone DEFAULT now() NOT NULL
amount numeric NOT NULL
location_id integer

location

Represents where the event occurred. org/repo/issue/comment are all optional fields, because I anticipate that we will track only the most granular level of detail. For example, if we are tracking a comment, we will not need to track the issue or repository.

We are using node_ids because GitHub does not allow direct lookup of resources using numerical IDs. Instead, we would have to "search" for those results.

I'm unsure if it makes the most sense to consolidate to a single column for the node_id or if we should keep them separate. I'm leaning towards keeping them separate, because it will be easier to query for specific events. For example, if we want to see all events that occurred on a specific repository, we can query for node_id_repository.

I included a URL for easy auditing while we are in development.

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
updated timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
node_id_organization text
node_id_repository text
node_id_issue text
node_id_comment text
url text

logs

I did not put significant thought into this, but I figured we could use it for generic remarks/comments in the database and dump everything into log_entry.

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT now() NOT NULL
updated timestamp with time zone DEFAULT now() NOT NULL
log_entry text
location_id integer

partners

location_id is intended to save the partner organization.

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT now() NOT NULL
updated timestamp with time zone DEFAULT now() NOT NULL
wallet_id integer
location_id integer NOT NULL

permits

Foreign keys to recognized tokens, partners, and users (beneficiary)

Just realized but not sure if I need to store the network for the transactions as well or if we should derive from token_id.

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT now() NOT NULL
updated timestamp with time zone DEFAULT now() NOT NULL
amount numeric
nonce numeric NOT NULL
deadline numeric NOT NULL
signature character(132) NOT NULL
token_id integer
partner_id integer
beneficiary_id integer NOT NULL
transaction character(66)
location_id integer

tokens

Tokens take a lot of space. Let's make a recognized table of tokens. Perhaps we should also store the token symbol since they are unlikely to change.

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT now() NOT NULL
updated timestamp with time zone DEFAULT now() NOT NULL
network smallint NOT NULL
address character(42) NOT NULL

users

I am debating on if we should store all their GitHub data. We don't have any planned use for it (other than generating embeddings for their solved issues) but even that isn't included in the original user schema which just had their profile information.

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT now() NOT NULL
updated timestamp with time zone DEFAULT now() NOT NULL
wallet_id integer

wallets

No remarks.

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT now() NOT NULL
updated timestamp with time zone DEFAULT now() NOT NULL
address character(42)
CONSTRAINT wallets_address_check CHECK ((length(address) = 42))

settlements

No remarks.

Column Name Type/Constraints
id integer NOT NULL
created timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
updated timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
user_id integer NOT NULL
location_id integer
credit_id integer
debit_id integer

Old Spec (For Reference)

Tables

I think it might make sense to do them in reverse order. Here I listed parents then children, but creating the children tables in order to link to the parent tables might be easiest for implementation.

Implementation Notes

  • The Credits table now has a foreign key permitId that can link to the Permits table, allowing you to associate a credit with a permit if necessary. However, this is optional, so you can have credits without permits as well.
  • Set up foreign key constraints for creditId and debitId in the Settlements table.
  • Application logic should prevent a Settlements entry from having both a creditId and a debitId.
  • Adjust data types like BigInteger or Decimal for large amounts if needed.
@0x4007 0x4007 changed the title Database: creditId Database: settlementId Sep 20, 2023
@0x4007 0x4007 changed the title Database: settlementId Database Schema Design: Settlements Sep 20, 2023
@0x4007
Copy link
Member Author

0x4007 commented Sep 20, 2023

Debating on this but still thinking about when we eventually make a command to allow repository maintainers to override the contributor's XP: we will need to create a new column in the settlements table. xpId or something. It will be a foreign key to an "XP Modifiers" table. There we can store all the details around that modification. For example: id, created time, updated time, amount.

We can exclude modifiers from the global leaderboard calculation (globalXP) so that its more difficult to exploit (i.e. collusion between an org and a contributor). We could even make a whistleblower incentive to make sure that people police each other for exploiting the XP system possibly.

Since it would be available in the settlements table, that means the penalties parent table can associate the user, orgId, repoId, issueId and commentURL to see exactly what happened.

@wannacfuture
Copy link
Contributor

/start

@ubiquibot
Copy link

ubiquibot bot commented Sep 20, 2023

Too many assigned issues, you have reached your max of 2

@seprintour
Copy link
Contributor

/start

@ubiquibot
Copy link

ubiquibot bot commented Sep 21, 2023

Deadline Fri, 22 Sep 2023 05:23:00 UTC
Registered Wallet 0x3623338046b101ecEc741De9C3594CC2176f39E5
Tips:
  • Use /wallet 0x0000...0000 if you want to update your registered payment wallet address @user.
  • Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
  • Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the bounty.

    @seprintour
    Copy link
    Contributor

    @pavlovcik There was already a Permit table, guess we should use that instead..

    It has all of the fields here and more

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 22, 2023

    @pavlovcik There was already a Permit table, guess we should use that instead..

    It has all of the fields here and more

    Thanks for highlighting this. The only concern I have with the current implementation is that it has an issue ID associated with every permit, but I want to allow the flexibility for us to issue permits without necessarily associating them to issue IDs.

    I believe that my specification allows for more flexibility, but I also do not claim to be a database architecture expert.


    I also just realized that ChatGPT jumbled up some of the rows. It left some permit specific data inside of the credits table.

    @seprintour
    Copy link
    Contributor

    I also just realized that ChatGPT jumbled up some of the rows. It left some permit specific data inside of the credits table.

    Yea, network, signature and owner is permit data.

    Previous permit table contains repository data and issue id to pinpoint the exact origin of the permit. Are you saying we don't need that anymore.

    Because none of the tables here refers to a repository (origin) of the payment

    @seprintour
    Copy link
    Contributor

    Rfc

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 23, 2023

    The parent table does. #767

    @seprintour
    Copy link
    Contributor

    The parent table does. #767

    Okay then, this issue entail just creating the tables, moving permit data to the new table. Is creating functions to add these data to DB included? @pavlovcik

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 23, 2023

    The bot should be stable after the database is changed out so yes it includes modifying all the functions that interact with the database.

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 23, 2023

    /ask how long do you estimate that this task would take, in hours?

    @ubiquibot
    Copy link

    ubiquibot bot commented Sep 23, 2023

    ! Error: This model's maximum context length is 16385 tokens. However, you requested 20407 tokens (4023 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 23, 2023

    ! Error: This model's maximum context length is 16385 tokens. However, you requested 20407 tokens (4023 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

    @Keyrxng time for compression/prioritization? Not a great first real world attempt lol.

    Prioritization order:

    1. Current issue specification
    2. Linked issue specification (in order of linked, the first link taking higher priority than the next link)
    3. Current issue conversation
    4. Linked issue conversations (same ordering system)

    We should use a tokenization estimator to know how much we should exclude.

    @seprintour
    Copy link
    Contributor

    @pavlovcik I'm adding penalties to this too?

    I see the penalty issue was rolled onto this

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 23, 2023

    Most likely but let me just get a little time to refine this spec. Sorry I realized after filing the issue that there were some mistakes in my specification. I might break this up into separate tasks.

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 23, 2023

    /ask how long do you estimate that this task would take, in hours?

    @ubiquibot
    Copy link

    ubiquibot bot commented Sep 23, 2023

    ! Error: This model's maximum context length is 16385 tokens. However, you requested 21257 tokens (4873 in the messages, 16384 in the completion). Please reduce the length of the messages or completion.

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 23, 2023

    @seprintour I broke it down into four subtasks

    @seprintour
    Copy link
    Contributor

    @seprintour I broke it down into four subtasks

    Okay, credit depends on permit (because permit is going to be modified and it needs to be linked to the permit id after modification)

    Settlements depends on credit and debit (If its not merged in this order, the supabase migration will fail to run automatically)

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 23, 2023

    I noted that as well on the Tables section.

    @seprintour
    Copy link
    Contributor

    I also think the amount for debit and credit should be a float or big integer if we are converting decimal to bigint because permit amount is not always in whole numbers

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 23, 2023

    Lets chat under that issue instead.

    @Keyrxng
    Copy link
    Contributor

    Keyrxng commented Sep 23, 2023

    4023 in the messages, 16384 in the completion

    Same as other shout out, you've set the output to be the maximum token amount for that model. max_tokens represents output so you gotta drop the token limit to half or 2/3.

    the actual context fed into GPT was only 4k tokens

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 23, 2023

    I set to 8000 max tokens in the org config and it silent crashes

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 26, 2023

    supabase com_dashboard_project_wfzpewmlyiozupulbuur_database_schemas (1)


    just realized that ids are useless. need node_ids and graphql api

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 26, 2023

    image

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 27, 2023

    Screenshot 2023-09-27 at 09 12 18

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 27, 2023

    I finished the schema but i still need to verify the node id situation. I might need to change the ids under 'location' table to track references on github e.g. org/repo/issue/comment ids
    final-database.zip

    image

    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 27, 2023

    access

    Access control now has a column for JSON arrays, so that any project can set access control for fully customizable labels.

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT now() NOT NULL
    updated timestamp with time zone DEFAULT now() NOT NULL
    user_id integer NOT NULL
    location_id integer NOT NULL
    multiplier smallint DEFAULT '1'::smallint NOT NULL
    labels json

    credits

    Permits are optional. Perhaps we can assume XP calculations for credits without permits. Alternatively we need to think about other ways to compensate. For Gnosis Safe "permits", what data can do store instead of permits? rfc @rndquu

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
    updated timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
    amount numeric NOT NULL
    permit_id integer

    debits

    No remarks

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT now() NOT NULL
    updated timestamp with time zone DEFAULT now() NOT NULL
    amount numeric NOT NULL
    location_id integer

    location

    Represents where the event occurred. org/repo/issue/comment are all optional fields, because I anticipate that we will track only the most granular level of detail. For example, if we are tracking a comment, we will not need to track the issue or repository.

    We are using node_ids because GitHub does not allow direct lookup of resources using numerical IDs. Instead, we would have to "search" for those results.

    I'm unsure if it makes the most sense to consolidate to a single column for the node_id or if we should keep them separate. I'm leaning towards keeping them separate, because it will be easier to query for specific events. For example, if we want to see all events that occurred on a specific repository, we can query for node_id_repository.

    I included a URL for easy auditing while we are in development.

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
    updated timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
    node_id_organization text
    node_id_repository text
    node_id_issue text
    node_id_comment text
    url text

    logs

    I did not put significant thought into this, but I figured we could use it for generic remarks/comments in the database and dump everything into log_entry.

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT now() NOT NULL
    updated timestamp with time zone DEFAULT now() NOT NULL
    log_entry text
    location_id integer

    partners

    location_id is intended to save the partner organization.

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT now() NOT NULL
    updated timestamp with time zone DEFAULT now() NOT NULL
    wallet_id integer
    location_id integer NOT NULL

    permits

    Foreign keys to recognized tokens, partners, and users (beneficiary)

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT now() NOT NULL
    updated timestamp with time zone DEFAULT now() NOT NULL
    amount numeric
    nonce numeric NOT NULL
    deadline numeric NOT NULL
    signature character(132) NOT NULL
    token_id integer
    partner_id integer
    beneficiary_id integer NOT NULL
    transaction character(66)
    location_id integer

    tokens

    Tokens take a lot of space. Let's make a recognized table of tokens. Perhaps we should also store the token symbol since they are unlikely to change.

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT now() NOT NULL
    updated timestamp with time zone DEFAULT now() NOT NULL
    network smallint NOT NULL
    address character(42) NOT NULL

    users

    I am debating on if we should store all their GitHub data. We don't have any planned use for it (other than generating embeddings for their solved issues) but even that isn't included in the original user schema which just had their profile information.

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT now() NOT NULL
    updated timestamp with time zone DEFAULT now() NOT NULL
    wallet_id integer

    wallets

    No remarks.

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT now() NOT NULL
    updated timestamp with time zone DEFAULT now() NOT NULL
    address character(42)
    CONSTRAINT wallets_address_check CHECK ((length(address) = 42))

    settlements

    No remarks.

    Column Name Type/Constraints
    id integer NOT NULL
    created timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
    updated timestamp with time zone DEFAULT CURRENT_TIMESTAMP NOT NULL
    user_id integer NOT NULL
    location_id integer
    credit_id integer
    debit_id integer

    supabase com_dashboard_project_wfzpewmlyiozupulbuur_database_schemas (1)

    @0x4007 0x4007 changed the title Database Schema Design: Settlements Database Schema Design: October 2023 Sep 27, 2023
    @0x4007
    Copy link
    Member Author

    0x4007 commented Sep 27, 2023

    supabase com_dashboard_project_wfzpewmlyiozupulbuur_database_schemas (2)
    1695844219.zip

    Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
    Labels
    None yet
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants