Pneumonia SetupService.ipynb : Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function #256

Open
magafaterr opened this issue Apr 5, 2022 · 9 comments

@magafaterr

Hello @JimDaly and community,

While running SetupService.ipynb from my Azure workspace, I get the following error message for the line of code (LoC) below:

LoC
image

Error

Updating service pneumonia-detection-onnx
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2022-04-05 15:11:01+00:00 Creating Container Registry if not exists.
2022-04-05 15:11:01+00:00 Registering the environment.
2022-04-05 15:11:02+00:00 Use the existing image.
2022-04-05 15:11:02+00:00 Generating deployment configuration.
2022-04-05 15:11:03+00:00 Submitting deployment to compute.
2022-04-05 15:11:05+00:00 Checking the status of deployment pneumonia-detection-onnx.Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: c0714bf6-edd6-4f8b-97ae-df4ae75d8cfd
More information can be found using '.get_logs()'
Error:
{
"code": "AciDeploymentFailed",
"statusCode": 400,
"message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
1. Please check the logs for your container instance: pneumonia-detection-onnx. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
3. You can also try to run image c2ca784a051d4215b7af9e26ea9dbfe7.azurecr.io/azureml/azureml_17cae8c4aa5e2efda696b16b3d500c5d locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.",
"details": [
{
"code": "CrashLoopBackOff",
"message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.
1. Please check the logs for your container instance: pneumonia-detection-onnx. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
3. You can also try to run image c2ca784a051d4215b7af9e26ea9dbfe7.azurecr.io/azureml/azureml_17cae8c4aa5e2efda696b16b3d500c5d locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information."
},
{
"code": "AciDeploymentFailed",
"message": "Your container application crashed. Please follow the steps to debug:
1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.
2. If your container application crashed. This may be caused by errors in your scoring file's init() function. You can try debugging locally first. Please refer to https://aka.ms/debugimage#debug-locally for more information.
3. You can also interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
4. View the diagnostic events to check status of container, it may help you to debug the issue.
"RestartCount": 3
"CurrentState": {"state":"Waiting","startTime":null,"exitCode":null,"finishTime":null,"detailStatus":"CrashLoopBackOff: Back-off restarting failed"}
"PreviousState": {"state":"Terminated","startTime":"2022-04-05T15:21:29.431Z","exitCode":111,"finishTime":"2022-04-05T15:21:35.873Z","detailStatus":"Error"}
"Events": null
"
}
]
}


WebserviceException Traceback (most recent call last)
in
17 service = Model.deploy(ws, aci_service_name, [model], inference_config, deployment_config)
18
---> 19 service.wait_for_deployment(True)
20 print(service.state)

/anaconda/envs/azureml_py38/lib/python3.8/site-packages/azureml/core/webservice/webservice.py in wait_for_deployment(self, show_output, timeout_sec)
917 logs_response = 'Current sub-operation type not known, more logs unavailable.'
918
--> 919 raise WebserviceException('Service deployment polling reached non-successful terminal state, current '
920 'service state: {}\n'
921 'Operation ID: {}\n'

WebserviceException: WebserviceException:
Message: Service deployment polling reached non-successful terminal state, current service state: Failed
Operation ID: c0714bf6-edd6-4f8b-97ae-df4ae75d8cfd
More information can be found using '.get_logs()'
Error:
{
"code": "AciDeploymentFailed",
"statusCode": 400,
"message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
1. Please check the logs for your container instance: pneumonia-detection-onnx. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
3. You can also try to run image c2ca784a051d4215b7af9e26ea9dbfe7.azurecr.io/azureml/azureml_17cae8c4aa5e2efda696b16b3d500c5d locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information.",
"details": [
{
"code": "CrashLoopBackOff",
"message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.
1. Please check the logs for your container instance: pneumonia-detection-onnx. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
2. You can interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
3. You can also try to run image c2ca784a051d4215b7af9e26ea9dbfe7.azurecr.io/azureml/azureml_17cae8c4aa5e2efda696b16b3d500c5d locally. Please refer to https://aka.ms/debugimage#service-launch-fails for more information."
},
{
"code": "AciDeploymentFailed",
"message": "Your container application crashed. Please follow the steps to debug:
1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. Please refer to https://aka.ms/debugimage#dockerlog for more information.
2. If your container application crashed. This may be caused by errors in your scoring file's init() function. You can try debugging locally first. Please refer to https://aka.ms/debugimage#debug-locally for more information.
3. You can also interactively debug your scoring file locally. Please refer to https://docs.microsoft.com/azure/machine-learning/how-to-debug-visual-studio-code#debug-and-troubleshoot-deployments for more information.
4. View the diagnostic events to check status of container, it may help you to debug the issue.
"RestartCount": 3
"CurrentState": {"state":"Waiting","startTime":null,"exitCode":null,"finishTime":null,"detailStatus":"CrashLoopBackOff: Back-off restarting failed"}
"PreviousState": {"state":"Terminated","startTime":"2022-04-05T15:21:29.431Z","exitCode":111,"finishTime":"2022-04-05T15:21:35.873Z","detailStatus":"Error"}
"Events": null
"
}
]
}
InnerException None
2022-04-05 15:20:40+00:00 Checking the status of inference endpoint pneumonia-detection-onnx.
Failed
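
The error payload above repeatedly suggests running `print(service.get_logs())`. A minimal sketch of wrapping the wait call so the container logs are printed when the deployment fails might look like the following. It assumes `service` is the object returned by `Model.deploy(...)` in the notebook; the helper name `wait_and_dump_logs` and its `tail` parameter are hypothetical and are not part of the sample or the SDK.

```python
def wait_and_dump_logs(service, tail=50):
    """Wait for a deployment; on failure print the last `tail` log lines and re-raise.

    `service` is assumed to expose wait_for_deployment(), get_logs() and
    .state, as the Webservice object in the notebook does.
    """
    try:
        service.wait_for_deployment(show_output=True)
    except Exception:
        # get_logs() may return nothing if the container never started.
        logs = service.get_logs() or ""
        print("\n".join(logs.splitlines()[-tail:]))
        raise
    return service.state
```

In the notebook this would replace the bare `service.wait_for_deployment(True)` call, so the scoring container's own output (where an `init()` failure would normally show up) is printed next to the deployment error.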

@JimDaly
Member

JimDaly commented Apr 5, 2022

@magafaterr
Which sample are you trying to run?

@magafaterr
Author

magafaterr commented Apr 5, 2022

This one: https://github.com/microsoft/PowerApps-Samples/tree/master/ai-builder/BringYourOwnModelTutorial
I cloned the repository just yesterday.

@JimDaly
Member

JimDaly commented Apr 5, 2022

@JoeFernandezMS Are you able to help with this?

@JoeFernandezMS
Contributor

Thanks Jim - I believe @shankarrk should be able to help on this one.

JimDaly assigned shankak and unassigned JoeFernandezMS Apr 5, 2022
@JimDaly
Member

JimDaly commented Apr 5, 2022

@shankak
We can discuss this internally.
Let's find out whether there is a problem with the sample code and get a fix applied.
Keeping this open for now until we determine whether the sample code needs a change.

@JimDaly
Member

JimDaly commented Jun 12, 2022

@shankak Please take a look at this.

@iamramengirl

Hi, any updates on this? I ran into the same problem. I debugged the script.py file, changing the file path to point to the model path in my workspace, and found that the error occurs in the init() function, specifically when loading the ONNX model via ONNX Runtime.

image
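
One way to make this kind of `init()` failure easier to spot in the container logs is to check the model path and log the exception before re-raising. The sketch below is not the sample's actual script.py: the file name `model.onnx` and the helper `resolve_model_path` are hypothetical, and it assumes the model sits directly under the directory named by the `AZUREML_MODEL_DIR` environment variable, which may not match your deployment layout.

```python
import logging
import os

session = None  # populated by init()


def resolve_model_path(filename="model.onnx"):
    """Return the model file path, failing early with a clear message."""
    # AZUREML_MODEL_DIR is set inside the deployed container; "." is a
    # fallback for local debugging only (an assumption of this sketch).
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    path = os.path.join(model_dir, filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Model file not found: {path}")
    return path


def init():
    global session
    path = resolve_model_path()
    try:
        # Imported here so that a broken onnxruntime install is logged too.
        import onnxruntime
        session = onnxruntime.InferenceSession(path)
    except Exception:
        logging.exception("Failed to load ONNX model from %s", path)
        raise
```

With this shape, `service.get_logs()` should show either a missing-file message or the full onnxruntime traceback instead of only the generic CrashLoopBackOff text.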

@iamramengirl

Hi @shankak @JimDaly, are there any updates on this?

@iamramengirl

I've checked this issue and tried the workaround suggested in #231.

I downgraded the azureml-core package to 1.38.0 and added pandas to the environment yml file, which resolved the ACI deployment issue related to onnxruntime.
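
As a rough illustration of that workaround, the sketch below applies the two changes to a conda-style environment spec held as a Python dict (the same shape a conda environment yml file has once loaded). The function name `apply_workaround`, the example base spec, and the choice to add pandas under `pip` rather than as a conda dependency are all assumptions; the comment above does not say which section pandas was added to.

```python
import copy
import re


def _name(requirement):
    """Strip version specifiers: 'azureml-core==1.40.0' -> 'azureml-core'."""
    return re.split(r"[=<>!~\[ ]", requirement, maxsplit=1)[0].strip().lower()


def apply_workaround(env_spec, core_version="1.38.0"):
    """Return a copy of env_spec with azureml-core pinned and pandas present."""
    spec = copy.deepcopy(env_spec)
    deps = spec.setdefault("dependencies", [])

    # Find (or create) the nested pip section of the conda spec.
    pip_block = next((d for d in deps if isinstance(d, dict) and "pip" in d), None)
    if pip_block is None:
        pip_block = {"pip": []}
        deps.append(pip_block)

    # Replace any existing azureml-core entry with the pinned version.
    pip_block["pip"] = [p for p in pip_block["pip"] if _name(p) != "azureml-core"]
    pip_block["pip"].append(f"azureml-core=={core_version}")

    # Add pandas only if neither the conda nor the pip section already has it.
    conda_names = {_name(d) for d in deps if isinstance(d, str)}
    pip_names = {_name(p) for p in pip_block["pip"]}
    if "pandas" not in conda_names | pip_names:
        pip_block["pip"].append("pandas")
    return spec
```

For example, a hypothetical base spec with `pip: [azureml-core, onnxruntime]` would come back with `azureml-core==1.38.0` pinned and `pandas` appended, leaving the original dict untouched.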
