annotation input deadlock in train_kpi_extraction #174
Comments
Copying @JeremyGohBNP @andraNew for visibility.
Same problem, but it locks up differently each time I run it:
@MichaelTiemannOSC I was able to reproduce the same error with the new dataset that you tried. We didn't see this error when we ran the notebook with ESG reports.
Talked with David Beßlich this morning. He (1) suggested deleting the column of row_id data (the first column), and (2) noted that the CSV writer had protected double-quotes in the paragraph text by wrapping the overall cell text in curly double-quotes. I changed straight double-quotes to single-quotes, then curly double-quotes to straight double-quotes, and that got me through training relevance OK. But then KPI extraction threw an error about a missing "company" column, which likely means the row_id column should not have been discarded after all. I'm fairly sure the curly quotes were creating the condition that locked up kpi_extraction. I now just need to apply the fixes correctly so I can proceed.
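A minimal sketch of that two-step quote rewrite (the sample cell text below is illustrative, not taken from the actual annotation file):

```python
# Sketch of the two-step rewrite described above:
#   step 1: straight double-quotes -> single-quotes
#   step 2: curly double-quotes -> straight double-quotes
def normalize_quotes(text: str) -> str:
    text = text.replace('"', "'")                              # step 1
    return text.replace('\u201c', '"').replace('\u201d', '"')  # step 2

# Hypothetical cell text with curly-quote-protected content:
cell = '\u201cScope 1 emissions were "reported" in 2020\u201d'
print(normalize_quotes(cell))
```

Running this over each text cell before re-exporting the CSV reproduces the manual edit described above; the order matters, since doing step 2 first would let step 1 clobber the newly straightened quotes.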
Alas, the fixes I made to the file have not helped the cause.
When I interrupt the kernel, it looks like a deadlock situation:
I have updated my experiments to include the latest changes from 7/14. I tried some binary searches to narrow down the problem, and found:
Hunting around, I found this report: deepset-ai/FARM#119 (comment). Memory pressure can cause this deadlock, and that user reported the same sort of stack trace as above when using the Docker container's default shm size, which looks just like this:
The fix there was to use --ipc=host or --shm-size, but I don't see how to insert such options into the AICoE Dockerfile. Can we bump shm, especially for larger memory configurations?
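For reference, the two workarounds from the linked FARM issue look like this when running the container directly (image name is hypothetical; these flags belong to `docker run`, not to the Dockerfile itself, which is why they're hard to apply in the AICoE build):

```shell
# Option 1: enlarge the container's /dev/shm (size is illustrative)
docker run --shm-size=2g my-training-image

# Option 2: share the host's IPC namespace (and its full shm)
docker run --ipc=host my-training-image
```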
When I run kpi_extraction, I see that almost all shm is immediately used up:
Based on this response: elyra-ai/elyra#2838 (comment). Can somebody look at adapting the OpenShift pattern to the Kustomizations we use?
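For context, the usual Kubernetes-level workaround is to mount a memory-backed emptyDir at /dev/shm, along these lines (a sketch only; the container name and size limit are illustrative, and the exact placement would depend on our Kustomize overlays):

```yaml
# Illustrative pod spec fragment: memory-backed /dev/shm
spec:
  containers:
    - name: trainer           # hypothetical container name
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory
        sizeLimit: 2Gi        # tune per node memory configuration
```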
Describe the bug
I've been adding annotations to
s3://redhat-osc-physical-landing-647521352890/test_cdp2/pipeline_run/cdp/annotations/20220709 CDP aggregated_annotations_needs_correction.xlsx
and have now managed to lock up the train_kpi_extraction notebook. Here's the output cell where progress stops:
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expect train_kpi_extraction to complete
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
I did not see this error with previous versions of my annotation file (which initially contained Coca-Cola and PGE data). I later added Bayer AG and Apple, and that's when it locked up.