Ogc 508 replace elastic search by postgres v3 #1559
base: master
…o hybrid properties)
…c documents in python rather than psql
src/onegov/org/models/search.py
Outdated
func.setweight(
    func.to_tsvector(
        language,
        getattr(model.fts_idx_data, field, '')),
The weighted vector is based on the static data from the column `fts_idx_data`, which is generated upon update or reindex events.
With this approach no additional
@Daverball Final review for postgres searching on separate views
a95eef6 to 0eef94f
It looks fairly close to a first version we can deploy. There are, however, some engineering decisions that don't make sense to me and harm performance significantly, so I would like you to revisit those problem areas.
else:
    results = self.generic_search()

return results[self.offset:self.offset + self.batch_size]
It's not ideal that we always retrieve all the results and then filter them. But I realize it may be difficult to do all the filtering and sorting in pure postgres and we'd still have to retrieve a full count of all the entries, so we're not saving so much in query time as we would in object translation overhead. But the latter may be significantly larger than the former for large result sets.
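A rough sketch of the trade-off described above. It uses the standard-library `sqlite3` module as a stand-in for Postgres, and the `entries` table and all three helper functions are hypothetical names chosen for illustration; none of them come from the PR.

```python
import sqlite3


def make_db(n):
    """Create an in-memory table with n rows (hypothetical schema)."""
    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)')
    db.executemany(
        'INSERT INTO entries (title) VALUES (?)',
        [(f'entry {i}',) for i in range(n)],
    )
    return db


def page_in_python(db, offset, batch_size):
    # Every row is fetched and converted, then most of them are discarded.
    rows = db.execute('SELECT id, title FROM entries ORDER BY id').fetchall()
    return len(rows), rows[offset:offset + batch_size]


def page_in_database(db, offset, batch_size):
    # The full count still needs its own query, but only one batch of rows
    # is transferred and converted into Python objects.
    (total,) = db.execute('SELECT count(*) FROM entries').fetchone()
    rows = db.execute(
        'SELECT id, title FROM entries ORDER BY id LIMIT ? OFFSET ?',
        (batch_size, offset),
    ).fetchall()
    return total, rows
```

Both functions return the same total and the same batch. The second one still pays for the count, but it avoids materialising rows that are sliced away afterwards.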
Fix typo Co-authored-by: David Salvisberg <[email protected]>
Remove unnecessary call `all()` Co-authored-by: David Salvisberg <[email protected]>
src/onegov/search/utils.py
Outdated
def filter_non_base_models(models: 'set[type[T]]') -> 'set[type[T]]':
def filter_non_base_models(
    models: 'set[type[T]]'
) -> 'set[type[T]]':
This is the opposite of what we want. This will now potentially miss some models rather than lead to overlapping queries. We want to get rid of models that are already covered by their base class, not the other way around.
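A minimal sketch of the intended behaviour, assuming plain Python classes as stand-ins for the real models. The function name `drop_covered_models` is hypothetical and this is not the implementation in `src/onegov/search/utils.py`.

```python
def drop_covered_models(models):
    """Keep only models that have no other model in the set as a base."""
    return {
        model for model in models
        if not any(
            other is not model and issubclass(model, other)
            for other in models
        )
    }


# Stand-in classes, only here to illustrate the inheritance relationships.
class Ticket: ...
class XYTicket(Ticket): ...
class Page: ...
class Topic(Page): ...
class News(Page): ...
```

Given `{Ticket, XYTicket, Topic, News}`, this drops `XYTicket` because `Ticket` already covers it, and keeps both `Topic` and `News` because their base `Page` is not in the set.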
So instead of indexing `Topic` and `News` we only index `Page`. Same with `XYTicket`, just `Ticket`.
With your examples in the tests, yes. In reality `Page` is not searchable, so it will still include both `Topic` and `News` and not `Page` (since that was never included to begin with).
I've seen some cases where polymorphic queries weren't working properly and the query always included the base class. If that's what's happening here we always need to use the polymorphic base class, regardless of whether it's searchable or not and instead apply the searchable filter on the results.
Or we manually create a filter expression on the `type` column for all the searchable polymorphic identities. There may be some performance benefit to this approach, since we avoid generating too many individual queries this way.
With the following statement in `reindex_model`, no rows will be indexed twice:
if i.polymorphic_on is not None:
    q = q.filter(i.polymorphic_on == i.polymorphic_identity)
I looked into it briefly:
>>> i = inspect(Topic)
>>> i.base_mapper.class_
<class 'onegov.page.model.Page'>
>>> {
...     m.polymorphic_identity
...     for m in i.base_mapper.self_and_descendants
...     if issubclass(m.class_, Searchable)
... }
{'topic', 'news'}
This should give you everything you need. Note that you only want to do this for models where `i.polymorphic_on is not None`. So you want a `get_base_models` function which de-duplicates and gives you the base class for polymorphic models and just the model itself for everything else. And then you change the filter from a single polymorphic identity to all the searchable ones.
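One way the two steps could look, as a sketch only. It assumes SQLAlchemy is installed. The models below are simplified stand-ins (`Draft` and `File` are invented for the example), `Searchable` is reduced to an empty marker class, and `searchable_identities` and `searchable_query` are hypothetical helper names. Only `get_base_models` is a name taken from the comment above, and its body here is a guess at the described behaviour.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Searchable:
    """Empty marker class standing in for the real Searchable mixin."""


class Page(Base):  # polymorphic base, itself not searchable
    __tablename__ = 'pages'
    id = Column(Integer, primary_key=True)
    type = Column(String, nullable=False)
    __mapper_args__ = {
        'polymorphic_on': type, 'polymorphic_identity': 'generic'}


class Topic(Page, Searchable):
    __mapper_args__ = {'polymorphic_identity': 'topic'}


class News(Page, Searchable):
    __mapper_args__ = {'polymorphic_identity': 'news'}


class Draft(Page):  # hypothetical non-searchable subclass
    __mapper_args__ = {'polymorphic_identity': 'draft'}


class File(Base, Searchable):  # hypothetical non-polymorphic model
    __tablename__ = 'files'
    id = Column(Integer, primary_key=True)


def get_base_models(models):
    """Step 1: de-duplicate polymorphic models down to their base class."""
    bases = set()
    for model in models:
        mapper = inspect(model)
        if mapper.polymorphic_on is not None:
            bases.add(mapper.base_mapper.class_)
        else:
            bases.add(model)
    return bases


def searchable_identities(base_model):
    """Collect the polymorphic identities of all searchable mappers."""
    return {
        m.polymorphic_identity
        for m in inspect(base_model).self_and_descendants
        if issubclass(m.class_, Searchable)
    }


def searchable_query(session, base_model):
    """Step 2: one query per base, filtered to the searchable identities."""
    mapper = inspect(base_model)
    query = session.query(base_model)
    if mapper.polymorphic_on is not None:
        identities = sorted(searchable_identities(base_model))
        query = query.filter(mapper.polymorphic_on.in_(identities))
    return query


# Usage against an in-memory SQLite database.
engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([Page(), Topic(), News(), Draft(), File()])
session.commit()
found = {type(obj) for obj in searchable_query(session, Page)}
```

Here `get_base_models({Topic, News, File})` yields `Page` and `File`, and the query on `Page` only returns rows whose `type` is one of the searchable identities, so the `generic` and `draft` rows are skipped.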
I see, the `base_mapper` does the trick! I did a very similar thing, which led me to identities like ['topic', 'news', 'generic', 'extended', 'image', 'general', 'landsgemeinde', 'builtin', 'custom', None, 'FRM', 'RSV', 'EVN', 'DIR', 'CHT', 'FER', 'vacation', 'daypass', 'room', 'daily-item']
+ for model in models:
+     i = inspect(model)
+     idents = set(e.polymorphic_identity
+                  for e in list(i.self_and_descendants)[1:]
+                  if e.polymorphic_identity is not None)
This is what I was hinting at. You need to do this in two steps in separate locations; you can't do it in one function.
Filter polymorphic query by polymorphic identity for Searchable models Co-authored-by: David Salvisberg <[email protected]>
rework base model filter Co-authored-by: David Salvisberg <[email protected]>
Search: Adds postgres search including views
/search-postgres?q=test
TYPE: Feature
LINK: ogc-508
Checklist