-
-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(9586): implement freetext search in cht datasource #9625
base: master
Are you sure you want to change the base?
feat(9586): implement freetext search in cht datasource #9625
Conversation
eba7aac
to
43efbef
Compare
…text-search-in-cht-datasource
…text-search-in-cht-datasource
@jkuester PR is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick partial review and overall this is quite cool. I did leave some requests and questions inline.
|
||
module.exports = { | ||
v1: { | ||
get: serverUtils.doOrError(async (req, res) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of this callback style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been doing this pattern for all the REST endpoints that call cht-datasource
. What's the alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO doOrError
is a nice way to reduce duplicated code and ensure we are handling errors consistently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a try-catch block is not so much duplication, and it's more transparent than a nested callback.
I understand this already exists. I'm not a fan.
api/src/controllers/contact.js
Outdated
Object.assign(qualifier, Qualifier.byContactType(req.query.type)); | ||
} | ||
|
||
const limit = req.query.limit ? Number(req.query.limit) : req.query.limit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems strange that we assign a random non-truthy value (as in: whatever is in req.query.limit
) instead of being specific.
Same applies to the reports controller.
const limit = req.query.limit ? Number(req.query.limit) : req.query.limit; | |
const limit = req.query.limit ? Number(req.query.limit) : false; |
The false is a random pick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why req.query.limit
is being passed when the conditional is falsy is that in the cht-datasource
there is already a validation for this and also a default value. This is to ensure that the validation does not happen twice and also the default value.
Reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then why not have cht-datasource also do the Number conversion then? Why have this validation here? Is limit ever expected to not be a number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is limit ever expected to not be a number?
Yeah, in cases like the one above, where it is passed as a query param in REST API, it is expected to be a stringified number. However, cht-datasource
can also be used in non-REST API codes where end-users will have to pass in a number to cht-datasource
because PouchDB expects the limit
value to be a number. I think that's a reasonable approach to make the limit
variable an explicit Number type, as that would align with the expected input for the PouchDB Adapter. This would provide better type safety and clarity in the code. The validation being present in cht-datasource
still makes sense, and whether to apply the same validation elsewhere should be at the discretion of the end-user, based on their specific use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So even if it's a stringified number or a number, we still only ever evaluate it as a number. so it makes sense to only have validation in one spot, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkuester thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, I cannot see any reason why it would be a problem to just directly pass req.query.limit
to cht-datasource. We already thoroughly validate the limit
argument in the cht-datasource logic and JS is not going to have any issue auto-boxing any valid number (without needing to be wrapped in Number()
)....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In JS, the query parameters are almost all strings and when they are passed into other functions or classes, will still be strings. I've tried this before. Here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😅 😅 😅 I knew we had discussed this before. I was not able to find that thread you linked to and so I proceeded to repeat the exact same line of reasoning that lead me to start that thread in the first place... 🤦
sigh
Now I think I am 100% following what you are saying above and agree with this statement:
I'm not sure why the conversion of limit from string to number should be designated to cht-datasource, it's not its concern.
The cht-datasource apis are responsible for sanitizing the input to ensure it confirms to the specified expectations. Generally speaking, I do not think cht-datasource should need to include extra logic to "support" different ways that consumers decide to provide data (accidentally or intentionally). The type number | string
does not precisely communicate the valid range of values for limit
(which should only ever be numeric).
That being said, the cht-datasource interfaces should be designed to be convenient to consume as long as it does not compromise the clarity of the API. It turns out that TypeScript has a type that represents "a string containing a number value": `${number}`
! This means that we could specify the type of limit
to be number | `${number}`
and then handle converting any string to an actual number just inside the edge of the cht-datasource code flow. To me, that seems like the best of both worlds where we are not always fighting the weird non-auto-boxing behavior when calling cht-datasource from JS code, but at the same time, our TS APIs do not have to be unnecessarily promiscuous. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that we could specify the type of limit to be number |
${number}
Sounds good. As long as we don't have logic around limit in multiple places.
But, philosophically, I don't think a programming languages (typescript in this case) should dictate how software is written, and even if there wasn't a type
, then we should still have the option to design software however we want.
Please make the change.
getIds: serverUtils.doOrError(async (req, res) => { | ||
await checkUserPermissions(req); | ||
|
||
const qualifier = Qualifier.byFreetext(req.query.freetext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ... this endpoint .. if it doesn't get neither a freetext
query param or a limit
query param, it will end up returning ALL reports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -492,6 +494,12 @@ app.postJson('/api/v1/people', function(req, res) { | |||
app.get('/api/v1/person', person.v1.getAll); | |||
app.get('/api/v1/person/:uuid', person.v1.get); | |||
|
|||
app.get('/api/v1/contact/id', contact.v1.getIds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe /api/v1/contact/ids
is more suitable.
The idea is that the URL isn't suggestive at all, without reading the implementation, I would never guess what this endpoint does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, REST API conventions are to name the API endpoint in a plural way like /api/v1/contacts
or /api/v1/contacts/ids
but this design decision had already been taken even before this ticket. I couldn't find the link to the conversation for this though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That discussion happened in the parent ticket before we spun off the child isssue: #9544 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jkuester . Your argument here is that "we've already decided and your input is not welcome?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being aggressive and confrontational in the above comment.
I maintain my comment about /api/v1/contact/id
being quite unsuggestive, we shouldn't need thorough explanations and reasoning behind the naming choice in order for an api name to make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just trying to provide the context for the discussion that Sugat referenced. 😬
I am happy to continue the design discussion here to come to an agreed upon approach. It will just be most efficient if we all understand what was already said to get us here. When starting work on new REST endpoints for the cht-datasource code, we chose to go with the pattern of singular entity names (so /api/v1/person
instead of /api/v1/persons
). When the endpoint can return 0-n
entities I do not really see a compelling reason to prefer either singular or plural (since either might make more sense depending on the context). Two things seem clear to me though:
- Under normal circumstances, we should not duplicate endpoints for the same resource (e.g. having both
/api/v1/contact/id
and/api/v1/contact/ids
). - We should be consistent with our naming across our go-forward REST endpoints. Either using singular or plural, but not mixing both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with being consistent.
I personally can't recall seeing APIs in the world that used this singular form, so for me this seems quite unintuitive.
expect(getLineageDocsByIdOuter.calledOnceWithExactly(localContext.medicDb)).to.be.true; | ||
expect(getDocsByIdsOuter.calledOnceWithExactly(localContext.medicDb)).to.be.true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about assertions in afterEach and afterEach run order.
…alifier and fix eslint error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work here! This was a huge lift, but I think this will mean that cht-datasoure can support the majority of READ functionality we need! Got a bunch of nitpicks/suggestions, but also a few cases where I think we need to get aligned with the existing search code.
Also, we can either do it in this PR, or in a follow one (since we already have a ton of changes here), but ultimatly we need to refactor shared-libs/search
to call cht-datasource
instead of directly using the freetext indexes.
@@ -0,0 +1,14 @@ | |||
/** @ignore */ | |||
export const DEFAULT_CONTACT_PAGE_LIMIT = 10000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: for these page limit values, I don't think we need separate values for each resource. Instead we need separate values for when we are getting whole docs vs just the ids. Something like:
DEFAULT_DOCS_PAGE_LIMIT = 100
DEFAULT_IDS_PAGE_LIMIT = 10000
getIdsPage: ( | ||
freetext: Nullable<string> = null, | ||
type: Nullable<string> = null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I think it would be better to have move verbosely named functions here than try to allow overloading these getIdsPage
and getIds
functions... Similar to what we did for person.getPageByType/getByType
, we could have getIdsPageByFreetext
, getIdsPageByType
, and getIdsPageByTypeFreetext
. Then, the params for these functions should never be Nullable
.
This is going to mean a lot more small functions here, but if we keep things as you have them now, it will be challenging to add support for other ways to retrieve contact Ids. (E.g. if you wanted to add support for a get-by-parent functionality, would we just add another param to this function?)
The only other alternative I can think of is to basically just accept the qualifier object (similar to the underlying Contact
api functions). But IIRC we wanted to avoid qualifier objects in this api for the sake of simplicity and convenience of consumption. What do you think? Does one of these options sound good or can you think of another alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with combining the qualifiers part but not the naming part. Suppose we now name it getIdsPageByTypeFreetext
and add a param as you said. Would we rename it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought is that if we have separate functions for each type of search, then when we go to support another type of search (e.g. get-contact-ids-by-parent or something like that) we would just add a new function instead of adding a new parameter to the existing functions.
* @returns the report or `null` if no report is found for the UUID | ||
* @throws InvalidArgumentError if no UUID is provided | ||
*/ | ||
// eslint-disable-next-line compat/compat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: super annoying that eslint-plugin-compat is not smart enough to realize we are talking about a different Report
here... I do think we can avoid needing these disable
lines by updating our .eslintrc.js
to include a "polyfill" for report. Basically just add this to the settings
block:
polyfills: [
'Report',
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh! yes. This was supposed to be temporary then I forgot once the errors were silenced. My bad.
import { InvalidArgumentError } from './libs/error'; | ||
|
||
/** @ignore */ | ||
export namespace v1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Currently, we have really only bothered with versioned namespaces on apis that are exported external to cht-datasource (e.g. exported explicitly or implicitly in src/index.ts
).
Is there a reason to have the code in here as a separate file from src/contacts.ts
? If so, could we maybe put this file in src/libs
to help indicate that it does not contain any public apis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically, it seems like the Contact
and ContactWithLineage
might be able to go in src/contacts.ts
while the other stuff is just being used internally and could go in src/libs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file was separated from src/contacts.ts
due to a circular import issue. I don't remember the exact DAG but it's somewhat like src/contact.ts
-> src/person.ts
-> src/local/person.ts
-> src/local/index.ts
-> src/contact.ts
. I've separated the types
into src/contact-types.ts
whereas services(functionalities) lie in the src/contacts.ts
.
}; | ||
|
||
/** @ignore */ | ||
export const isContactType = (value: ContactTypeQualifier | FreetextQualifier): value is ContactTypeQualifier => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Is this function actually a duplicate of the Qualifier.isContactTypeQualifier
function? I get that the other function does a bit more processing, but eventually I think we should switch to a proper schema validation library that handles all of these kind of checks optimally. Seems best to try and keep logic like this in a single place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly a duplicate but a subset I'd say.
We have been using the following pattern in our code to validate the arguments into src/<entity>.ts
.
I've removed the use of the local isContactType
function for the screenshot in which case both type
and freetext
are turned into required params. We could write another assert*
function that does the combination of type
or freetext
like assertTypeOrFreetextQualifier
but for some reason, the current implementation seemed more appropriate because the control is in the caller's hands. I do not have a strong inclination towards the current implementation though. Both seem equally logical and valid. I still agree that functions like these would be more appropriate in a single file but this function seemed small enough to be kept here.
}); | ||
}); | ||
|
||
it('returns null when no contact is found for the UUID', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: really returning a 404 and not null...
path: `${endpoint}?${stringQueryParams}`, | ||
auth: adminUser | ||
}; | ||
const expectedContactIds = [contact0._id, contact1._id, contact2._id]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: ideally this case would not return exactly the same as the previous so that we could see the type filtering in action.
expect(secondPage.cursor).to.be.equal(null); | ||
}); | ||
|
||
it('returns a page of place type contact ids when limit and cursor is passed and cursor can be reused', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: I really appreciate the permutations of combinations of freetext
and type
(since that drives completely different code flows under the hood). IMHO, though, we don't need integration tests for the permutations of skip
and limit
with each of the qualifiers. To me, it seems enough to have this test which covers skip
and limit
with both freetext
and type
. We don't really need the other skip/limit tests for just freetext or just type...
await expect(utils.request(opts)).to.be.rejectedWith('403 - {"code":403,"error":"Insufficient privileges"}'); | ||
}); | ||
|
||
it('throws 400 error when contactType is invalid', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: did I miss it somewhere, or do we also need a test case where freetext is empty and no type is provided?
}); | ||
|
||
it('returns a page of people type contact ids ' + | ||
'when stringified limit and null cursor is passed', async () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: IMHO this does not need to be an integration test. This functionality is adequately covered by unit tests.
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Co-authored-by: Joshua Kuestersteffen <[email protected]>
Description
Closes: #9586
Code review checklist
Compose URLs
If Build CI hasn't passed, these may 404:
License
The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.