Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entity creation #128

Open
wants to merge 30 commits into
base: master
Choose a base branch
from
Open

Entity creation #128

wants to merge 30 commits into from

Conversation

jimkang
Copy link
Contributor

@jimkang jimkang commented Sep 7, 2022

This is an initial PR for the purposes of discussing the changes so far to support the creation of freestanding entities that aren't necessarily associated with documents.

Additions:

  • route that maps the /api/freestanding_entities endpoint to FreestandingEntityViewSet.
  • FreestandingEntityViewSet to handle GET and POSTs to the endpoint.
  • Unit test of existing _get_or_create_entities method.
  • Test for FreestandingEntityViewSet. Currently, this only runs when only the documentcloud/documents tests are run. When all of the tests are run with inv test this test fails.
  • EntityAdmin so that freestanding entities can be viewed in the Django admin UI.

@@ -62,8 +62,12 @@ You must first have these set up and ready to go:
- If you get an error on your console about signatures, fix minio as above.
- If you get an error on your console about tipofday not found, add the static page as above.
15. Develop DocumentCloud and its frontend!
16. You can run the tests with `inv test`.
- If you want to run a subset of the tests, you can specify the directory containing the test you want with the `path` switch like so: `inv test --path documentcloud/documents`.
- You can specify a single file in `--path` if you only want to run the tests in that file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition to the docs. Maybe add how to run a single class or method?

config/urls.py Outdated
@@ -67,6 +69,9 @@ class BulkNestedDefaultRouter(BulkRouterMixin, NestedDefaultRouter):
documents_router.register(
"modifications", ModificationViewSet, basename="modifications"
)
router.register(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike the freestanding name - I think it makes sense to just label everything "Entity". Since it is not nested under documents there is no clash, and although they can be freestanding, the idea is always to link them to a document - there is no point to them otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. I thought it might confuse existing users of the API because having the same entity either under documents or at the root level is a subtle distinction, but you know the users way better than I do.

class EntityAdmin(admin.ModelAdmin):
"""Entity Admin"""

list_display = ("name", "kind", "mid", "description", "wikipedia_url")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to read up on the admin options. I think description might be too much for the list display.

"""Entity Admin"""

list_display = ("name", "kind", "mid", "description", "wikipedia_url")
list_filter = ("kind", "description")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_filter makes sense for kind (allow one to filter by kind) but not for description


list_display = ("name", "kind", "mid", "description", "wikipedia_url")
list_filter = ("kind", "description")
search_fields = ("name", "kind", "mid", "description", "wikipedia_url")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a text search = I think name, mid and description make sense here (kind can be filtered by the list_filter options above, and searching through the wikipedia url does not make sense)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have the wikipedia url and want to know if there's an entity for it, how else would you find it? I'll drop kind.

)
)

def delete_model(self, request, obj):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documents require special clean up on deletion which the document destroy method handles. Entity's have no such requirement and do not have a destroy method. Leaving delete_model undefined so it uses the correct parent implementation.

add_perm("documents.view_entityoccurrence", always_allow)
add_perm("documents.add_entityoccurrence", is_authenticated)
add_perm("documents.change_entityoccurrence", always_deny)
add_perm("documents.delete_entityoccurrence", is_authenticated)
add_perm("documents.view_entity", always_allow)
add_perm("documents.add_entity", is_authenticated)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine for now, but I do think we need to think more about how the permissions should work.

"type_": 1,
"metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/Knight"},
"salience": 0.14932491,
"mentions": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe less mentions are necessary for testing purposes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that would be much more readable.

# TODO: Should everyone be able to view all entities?
return Entity.objects.all()

def perform_create(self, serializer):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to think through how entity creation works in more detail, but I am leaning away from having a POST do a get or create, and more toward multiple API calls. This can be abstracted in the Python SDK if necessary.

I am going to start a notion page to start a broader discussion about if entities should be all global as they are now, or if they should be per user or per organization and what the pros and cons are, and what analysis types they allow for and what we want to use entities for in the future.


def perform_create(self, serializer):
# pdb.set_trace()
"""Initiate asyncrhonous creation of entities"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing asynchronous here, you are not calling it through celery

Copy link
Member

@mitchelljkotler mitchelljkotler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I think while we wait to see who the users for this are and what features we want to support for them, you could start adding on to the EntityOccurenceViewSet, to allow people to add manually create, update and delete entity occurrences to their documents. This might be a little tricky since I have the two custom methods to create occurrences using the Google API and to bulk delete all occurrences, but should be possible without changing those API calls. If you run into issues around that, let me know.

"""Entity Admin"""

list_display = ("name", "kind", "mid", "wikipedia_url")
list_filter = ("kind")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a list or tuple.... one of python's downfalls is that ("string") == "string" and you need to do ("string",) with the trailing comma for a tuple of size 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa, very good to know.

readonly_fields = ()

@transaction.atomic
def save_model(self, request, obj, form, change):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method can be removed for now

@@ -911,3 +914,18 @@ def post_process(self, request, document_pk=None):
post_process.delay(document_pk, request.data)

return Response("OK", status=status.HTTP_200_OK)


class FreestandingEntityViewSet(viewsets.ModelViewSet):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename this to EntityViewSet and rename the current EntityViewSet to EntityOccurenceViewSet as that would be more accurate.

@jimkang
Copy link
Contributor Author

jimkang commented Sep 9, 2022

@mitchelljkotler Just added the latest requested changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants