Ogc 508 replace elastic search by postgres v3 #1559

Open
wants to merge 42 commits into
base: master

Tschuppi81
Contributor

@Tschuppi81 Tschuppi81 commented Oct 25, 2024

Search: Adds postgres search including views /search-postgres?q=test

TYPE: Feature
LINK: ogc-508

Checklist

  • I have performed a self-review of my code
  • I considered adding a reviewer
  • I made changes/features for both org and town6, agency, fsi, translator, winterthur, feriennet
  • I have tested my code thoroughly by hand
  • I have added tests for my changes/features

linear bot commented Oct 25, 2024

codecov bot commented Oct 25, 2024

Codecov Report

Attention: Patch coverage is 85.60606% with 38 lines in your changes missing coverage. Please review.

Project coverage is 88.55%. Comparing base (675267d) to head (775c48e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/onegov/org/models/search.py | 80.68% | 28 Missing ⚠️ |
| src/onegov/org/views/search.py | 68.18% | 7 Missing ⚠️ |
| src/onegov/fsi/views/search.py | 92.85% | 1 Missing ⚠️ |
| src/onegov/search/cli.py | 0.00% | 1 Missing ⚠️ |
| src/onegov/search/integration.py | 91.66% | 1 Missing ⚠️ |
Additional details and impacted files
| Files with missing lines | Coverage Δ |
|---|---|
| src/onegov/agency/views/search.py | 100.00% <100.00%> (ø) |
| src/onegov/directory/models/directory_entry.py | 95.23% <ø> (ø) |
| src/onegov/landsgemeinde/views/search.py | 100.00% <100.00%> (ø) |
| src/onegov/onboarding/app.py | 100.00% <100.00%> (ø) |
| src/onegov/onboarding/models/town_assistant.py | 93.47% <100.00%> (ø) |
| src/onegov/org/app.py | 97.81% <100.00%> (ø) |
| src/onegov/org/cronjobs.py | 92.81% <ø> (ø) |
| src/onegov/org/layout.py | 91.45% <100.00%> (+0.03%) ⬆️ |
| src/onegov/org/models/__init__.py | 100.00% <100.00%> (ø) |
| src/onegov/org/models/ticket.py | 88.64% <ø> (ø) |
... and 12 more

... and 4 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

func.setweight(
    func.to_tsvector(
        language,
        getattr(model.fts_idx_data, field, '')),
Contributor Author

The weighted vector is based on the static data in the fts_idx_data column, which is generated on update or reindex events.

@Tschuppi81
Contributor Author

With this approach no additional hybrid_properties are needed.

@Tschuppi81 Tschuppi81 marked this pull request as ready for review October 25, 2024 19:40
@Tschuppi81 Tschuppi81 requested a review from Daverball October 25, 2024 19:40
@Tschuppi81
Contributor Author

@Daverball Ready for final review: Postgres search on separate views, /search-postgres?q=test (not yet in production).

@Tschuppi81 Tschuppi81 force-pushed the ogc-508-replace-elastic-search-by-postgres-v3 branch from a95eef6 to 0eef94f Compare November 7, 2024 14:45
Member

@Daverball Daverball left a comment

It looks fairly close to a first version we can deploy. There are, however, some engineering decisions that don't make sense to me and harm performance significantly, so I would like you to revisit those problem areas.

else:
    results = self.generic_search()

return results[self.offset:self.offset + self.batch_size]
Member

It's not ideal that we always retrieve all the results and then filter them. I realize it may be difficult to do all the filtering and sorting in pure Postgres, and we'd still have to retrieve a full count of all the entries, so we would save less in query time than in object translation overhead. But the latter may be significantly larger than the former for large result sets.
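
A minimal sketch of the trade-off described here, using sqlite3 from the standard library as a stand-in for Postgres. The `entries` table, the `LIKE` match, and the `fetch_batch` helper are hypothetical and not part of this PR; the point is only that one query returns the full count while a second one translates just the requested batch into Python objects.

```python
import sqlite3


def fetch_batch(conn, term, offset, batch_size):
    """Return (total_count, rows) for one page of matches.

    The count still needs a full scan of the matches, but only
    `batch_size` rows are turned into Python objects.
    """
    pattern = f'%{term}%'
    total = conn.execute(
        'SELECT COUNT(*) FROM entries WHERE title LIKE ?',
        (pattern,),
    ).fetchone()[0]
    rows = conn.execute(
        'SELECT id, title FROM entries WHERE title LIKE ? '
        'ORDER BY id LIMIT ? OFFSET ?',
        (pattern, batch_size, offset),
    ).fetchall()
    return total, rows


# Hypothetical sample data: 25 matching rows and one non-matching row.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)')
conn.executemany(
    'INSERT INTO entries (title) VALUES (?)',
    [(f'test entry {n}',) for n in range(25)] + [('unrelated',)],
)

total, rows = fetch_batch(conn, 'test', offset=10, batch_size=10)
```

Whether this carries over depends on how much of the ranking and filtering can actually be expressed in SQL, which is the open question raised above.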

-def filter_non_base_models(models: 'set[type[T]]') -> 'set[type[T]]':
+def filter_non_base_models(
+    models: 'set[type[T]]'
+) -> 'set[type[T]]':
Member

This is the opposite of what we want. This will now potentially miss some models rather than lead to overlapping queries. We want to get rid of models that are already covered by their base class, not the other way around.
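
To illustrate the intended direction with plain Python classes (no SQLAlchemy involved), here is a rough sketch. `drop_covered_models` is a hypothetical name and `EventTicket` is a made-up subclass; the other classes are empty stand-ins for the real models.

```python
def drop_covered_models(models):
    """Keep only models that have no other model in the set as a base.

    A model is "covered" when one of its ancestors is also in the set,
    since a query on the ancestor would already include it.
    """
    return {
        model for model in models
        if not any(
            other is not model and issubclass(model, other)
            for other in models
        )
    }


# Empty stand-ins for the real model classes.
class Page:
    pass


class Topic(Page):
    pass


class News(Page):
    pass


class Ticket:
    pass


class EventTicket(Ticket):  # hypothetical ticket subclass
    pass


with_bases = drop_covered_models({Page, Topic, News, Ticket, EventTicket})
without_bases = drop_covered_models({Topic, News})
```

When the base class is not in the set, as in the second call, both subclasses are kept.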

Contributor Author

So instead of indexing Topic and News, we only index Page. The same goes for XYTicket: we just index Ticket.

Member

With your examples in the tests, yes. In reality Page is not searchable, so it will still include both Topic and News and not Page (since that was never included to begin with).

I've seen some cases where polymorphic queries weren't working properly and the query always included the base class. If that's what's happening here we always need to use the polymorphic base class, regardless of whether it's searchable or not and instead apply the searchable filter on the results.

Or we manually create a filter expression on the type column for all the searchable polymorphic identities. There may be some performance benefit to this approach, since we avoid generating too many individual queries this way.

Contributor Author

With the following statement in reindex_model, no rows will be indexed twice:

                if i.polymorphic_on is not None:
                    q = q.filter(i.polymorphic_on == i.polymorphic_identity)

Member

@Daverball Daverball Dec 11, 2024

I looked into it briefly:

>>> i = inspect(Topic)
>>> i.base_mapper.class_
<class 'onegov.page.model.Page'>
>>> {
...     m.polymorphic_identity
...     for m in i.base_mapper.self_and_descendants
...     if issubclass(m.class_, Searchable)
... }
{'topic', 'news'}

This should give you everything you need. Note that you only want to do this for models that have i.polymorphic_on is not None. So you want a get_base_models function which de-duplicates and gives you the base class for polymorphic models and just the model itself for everything else. And then you change the filter from a single polymorphic identity, to all the searchable ones.
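
A rough sketch of how those two steps could look, assuming SQLAlchemy 1.4 or later. The model definitions are simplified stand-ins for the real ones, `Searchable` is reduced to a bare marker class, and `searchable_identities` is a hypothetical helper name; only `get_base_models` is a name taken from this comment.

```python
from sqlalchemy import Column, Integer, Text, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Searchable:
    """Bare marker standing in for the real Searchable mixin."""


class Page(Base):  # polymorphic base, itself not searchable
    __tablename__ = 'pages'
    id = Column(Integer, primary_key=True)
    type = Column(Text)
    __mapper_args__ = {
        'polymorphic_on': type,
        'polymorphic_identity': 'generic',
    }


class Topic(Page, Searchable):
    __mapper_args__ = {'polymorphic_identity': 'topic'}


class News(Page, Searchable):
    __mapper_args__ = {'polymorphic_identity': 'news'}


class Person(Base, Searchable):  # non-polymorphic model
    __tablename__ = 'people'
    id = Column(Integer, primary_key=True)


def get_base_models(models):
    """Step 1: de-duplicate polymorphic models down to their base class."""
    result = set()
    for model in models:
        mapper = inspect(model)
        if mapper.polymorphic_on is not None:
            result.add(mapper.base_mapper.class_)
        else:
            result.add(model)
    return result


def searchable_identities(base_model):
    """Step 2: collect the identities of all searchable subclasses."""
    return {
        m.polymorphic_identity
        for m in inspect(base_model).self_and_descendants
        if issubclass(m.class_, Searchable)
    }


base_models = get_base_models({Topic, News, Person})
identities = searchable_identities(Page)
# Filter on all searchable identities instead of a single one.
criterion = inspect(Page).polymorphic_on.in_(sorted(identities))
```

The resulting `criterion` would take the place of the single-identity filter in the reindex query, applied only to models where `polymorphic_on` is not None.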

Contributor Author

I see, the base_mapper does the trick! I did a very similar thing, which led me to identities like ['topic', 'news', 'generic', 'extended', 'image', 'general', 'landsgemeinde', 'builtin', 'custom', None, 'FRM', 'RSV', 'EVN', 'DIR', 'CHT', 'FER', 'vacation', 'daypass', 'room', 'daily-item']

+            for model in models:
+                i = inspect(model)
+                idents = set(e.polymorphic_identity
+                             for e in list(i.self_and_descendants)[1:]
+                             if e.polymorphic_identity is not None)

Member

@Daverball Daverball left a comment

This is what I was hinting at: you need to do this in two steps in separate locations; you can't do it in one function.

2 participants