✨ Add Person and Reference models #5
Signed-off-by: zethson <[email protected]>
No instance has this schema module yet, so you can remove all the migrations and generate a first migration script.
@falexwolf Can you have a final look, in case I missed anything?
Ahhhhhhhh, didn't we discuss that we do not want to change the
Already the "full_text" vs. "text" change was problematic. I forgot the specific discussion but somehow there was an assumption of separating abstract vs. full_text or something like this and in my opinion this added too much structure; leaving the user confused and data fragmented.
Now, it turns out that several fields were added here without them being discussed. Because it was a migration from another repo, we don't have a diff that we could review and I didn't notice the schema changes. The PR description is also silent about it and pretends the registry was simply moved over. I'm only noticing now when trying to run data migrations in SQL across all instances (including customer instances). Of course these are throwing errors now.
Next time, please make diffs when making schema changes. Migrating package structure and database schema at the same time is a bad idea.
Version in
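A field-level diff of the kind asked for above can be produced even when the model moved between repos, since it only needs the two versions of the class source as text. The following is a minimal sketch under stated assumptions: the helper names `extract_fields` and `diff_fields` are hypothetical, the "old" snippet is invented for illustration (the real pre-migration definition is not shown in this thread), and the "new" snippet only contains fields quoted in this conversation.

```python
import ast


def extract_fields(source: str, class_name: str) -> dict[str, str]:
    """Map each annotated, assigned class attribute to the source text of its value."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {
                stmt.target.id: ast.unparse(stmt.value)
                for stmt in node.body
                if isinstance(stmt, ast.AnnAssign)
                and isinstance(stmt.target, ast.Name)
                and stmt.value is not None
            }
    raise ValueError(f"class {class_name!r} not found")


def diff_fields(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Report field names that were added, removed, or redefined."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical "before" definition, invented for this example.
OLD_SOURCE = '''
class Reference:
    description: str | None = TextField(null=True)
'''

# "After" definition, restricted to fields quoted in this thread.
NEW_SOURCE = '''
class Reference:
    preprint: bool = BooleanField(default=False, db_index=True)
    public: bool = BooleanField(default=True, db_index=True)
    journal: str | None = TextField(null=True)
    description: str | None = TextField(null=True)
'''

schema_diff = diff_fields(
    extract_fields(OLD_SOURCE, "Reference"),
    extract_fields(NEW_SOURCE, "Reference"),
)
print(schema_diff)
```

The source is only parsed, never imported, so the check runs without Django or a database. Pasting its output into the PR description would give reviewers the diff that a repo-to-repo move otherwise hides.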
In the absence of a diff, I'm identifying three changes by eye.
Change 1
preprint: bool = BooleanField(default=False, db_index=True)
"""Whether the reference is from a preprint."""
I'd remove this field: it makes the model complicated and might lead users to register the same paper as two different references, once for the preprint and once for the published version. In almost all cases we'd not want this, and professional reference managers therefore support "merging references". Of course we don't support this, and hence the field will likely cause more problems than it solves.
Change 2
journal: str | None = TextField(null=True)
"""Name of the journal."""
I'd also remove this field or think more deeply before adding it. Journal names are highly standardized and it's hella annoying if people just add random strings to it; one will have all kinds of abbreviations, typos etc. -- this was the reason why there was no such field before. In my experience it's hard to model this well (internally on Notion we have the ReferenceSource table for this, which is very unsatisfactory, too). A solution could be a field where people could add free-form text to "type" the reference, similar to the W&B API. But if it's called "Journal" it should be an FK to a constrained vocab for a journal ontology, which IMO is over-engineering.
Change 3
authors: Person = models.ManyToManyField(Person, related_name="references")
"""All people associated with this reference."""
I think this is OK! This works both for internal and external references and the name choice "authors" is universally accepted, I believe. There is an issue with ordering that we can resolve via an
Change 4
public: bool = BooleanField(default=True, db_index=True)
"""Whether the reference is public."""
Isn't this inconsistent with the
About this:
preprint: bool = BooleanField(default=False, db_index=True)
"""Whether the reference is from a preprint."""
public: bool = BooleanField(default=True, db_index=True)
"""Whether the reference is public."""
journal: str | None = TextField(null=True)
"""Name of the journal."""
description: str | None = TextField(null=True)
"""Description of the reference."""
Why are the indexes on the booleans but no indexes on the char fields? Not so critical right now, but there should be consistency on where we put indexes.
I'm back tomorrow but IIRC these changes were done for cellxgene and I certainly remember having discussed them with Sunny.
True - point taken! You were the final reviewer btw and we even had discussions on some of these schema changes: #5 (comment)
Now I'm feeding this into Claude who finds a few more changes:
Yes. But I approved assuming that there are zero changes to the Reference model given we need to migrate customer instances. This is what the PR description said and you can't expect me to memorize all fields of all registries we have. 😅
I only reviewed the But given these two more reasons I had assumed that there is no way that the Reference model might actually have changed:
Please, going forward, more caution and more extensive PR descriptions, in particular for migrations, which persist stuff and are very hard to undo. Also @sunnyosun!
Btw, this was the migration experience: https://lamin.ai/laminlabs/lamin-dev/transform/ySlebT1AbuX80001/qghWI4H5rJ1kq5Zf5AyX
I think we assumed no instance had the
We had a few discussions about
I think you're probably already convinced that this PR should have been more thoughtful. But if you don't do SQL migrations yourself, it's hard to develop the kind of caution one should have. Below is how debugging the migration script proceeds.
https://lamin.ai/laminlabs/lamin-dev/transform/ySlebT1AbuX80002
🎉 7 versions later! What a great AI agent I am 😆 💪 https://lamin.ai/laminlabs/lamin-dev/transform/ySlebT1AbuX80007
Add Person and Reference models into the schema. As part of integrating laminlabs/findrefs into ourprojects, the Reference model is now incorporated directly within the ourprojects schema.