-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adapt latest changes for data-prep #712
base: main
Are you sure you want to change the base?
Conversation
We need dataprep related changes get merged into GenAIExamples first. Otherwise, there will be no image in CI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few text suggestions, otherwise looks fine.
initContainers: | ||
- name: wait-for-db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't helm wait
for the Chart be enough instead?
Or is this intended for cases where Helm is used just to generate manifest file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
helm wait is not enough. The dataprep code itself will create a redis/milvus client to connect to DB when it starts, if that fails, it will never retry to create the db client and hence fails all the incoming user requests later. This is actually a cloud native non-friendly bug in the dataprep/retriever source code itself. But we don't have time to fix that in this release cycle. Already created an issue in GenAIComps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. And I guess helm template
manifest generation is valid use-case too...
cee6420
to
7ed1a69
Compare
@@ -38,21 +44,23 @@ Then run the command `kubectl port-forward svc/data-prep 6007:6007` to expose th | |||
Open another terminal and run the following command to verify the service if working: | |||
|
|||
```console | |||
curl http://localhost:6007/v1/dataprep \ | |||
curl http://localhost:6007/v1/dataprep/ingest \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dataprep api has been changed for file uploading?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine. I have suggestion about the wait variable name though.
TESTHOST=$(MILVUS_HOST); | ||
TESTPORT=$(MILVUS_PORT); | ||
{{- end }} | ||
retry_count={{ .Values.retryCount | default 60 }}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO better name would be e.g. retryTimeoutSeconds
, because that way there's no dependency on whether checks are done at 1s, 2s or 5s interval...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting we do a single nc -z -w retryTimeoutSeconds <hostname> <port>
test, without the while loop? Sometime nc -z -w retryTimeoutSeconds
will fail early, even before retryTimeoutSeconds
time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, if that is not reliable, loop can still be made independent of sleep interval:
timeout={{ .Values.retryTimeoutSeconds | default 60 }};
j=1
while ! nc -z ${TESTHOST} ${TESTPORT}; do
if [[ $j -ge ${timeout} ]] && echo "ERROR: ${TESTHOST}:${TESTPORT} is NOT reachable within $j seconds!" && exit 1;
j=$((j+2)); sleep 2;
done;
EDIT: I think it's fine if few extra secs are waited when the last interval increase does not match the timeout exactly, and service is still unreacable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I adjust the wait time code a little bit to be more time accurate as much as I could. In a very nasty network environment where a normal successful tcp 3-way handshakes takes more than 1 seconds(which is very rarely I assume), users can set waitTimeout
to a larger value.
If this is ok, I'll apply this to all other helm chart with the same wait-fot-xxx logic.
Signed-off-by: Lianhao Lu <[email protected]>
Need to wait for opea-project/GenAIExamples#1408 to get merged in first. |
Description
Adapt latest changes for data-prep.
Issues
n/a
.Type of change
List the type of change like below. Please delete options that are not relevant.
Dependencies
List the newly introduced 3rd party dependency if exists.
Tests
Describe the tests that you ran to verify your changes.