ERROR:example:Failed to ingest document due to exception Unable to get page count. #196

Open
grische opened this issue Sep 13, 2024 · 4 comments

Comments

@grische
Contributor

grische commented Sep 13, 2024

Followed the instructions from the README and started the example from GenerativeAIExamples/RAG/examples/basic_rag/langchain.

The docker logs of chain-server:

INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:faiss.loader:Loading faiss with AVX2 support.
INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
INFO:RAG.src.chain_server.utils:Using nvidia-ai-endpoints as model engine and nvidia/nv-embedqa-e5-v5 and model for embeddings
INFO:RAG.src.chain_server.utils:Using embedding model nvidia/nv-embedqa-e5-v5 hosted at api catalog
INFO:RAG.src.chain_server.utils:Using milvus collection: nvidia_api_catalog
INFO:RAG.src.chain_server.utils:Vector store created and saved.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8081 (Press CTRL+C to quit)
INFO:     172.18.0.6:48730 - "GET /documents HTTP/1.1" 200 OK
INFO:     172.18.0.6:40180 - "GET /documents HTTP/1.1" 200 OK
INFO:     172.18.0.6:60014 - "GET /documents HTTP/1.1" 200 OK
INFO:     172.18.0.6:60800 - "GET /documents HTTP/1.1" 200 OK
INFO:pikepdf._core:pikepdf C++ to Python logger bridge initialized
ERROR:example:Failed to ingest document due to exception Unable to get page count. Is poppler installed and in PATH?
ERROR:RAG.src.chain_server.server:Error from POST /documents endpoint. Ingestion of file: /tmp/gradio/b3131f976d42f2c5b2cab5027eeaabec73658e1423259694a7a7d107b65be0bd/test.pdf failed with error: Failed to upload document. Please upload an unstructured text document.
INFO:     172.18.0.6:60810 - "GET /documents HTTP/1.1" 200 OK
INFO:     172.18.0.6:60804 - "POST /documents HTTP/1.1" 500 Internal Server Error
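
The `Unable to get page count. Is poppler installed and in PATH?` text matches the message the pdf2image package raises when it cannot run poppler's `pdfinfo` binary. One quick check inside the chain-server container is therefore whether the poppler tools resolve on PATH. This is a minimal sketch, assuming that pdf2image is the code path involved here; the helper name `missing_poppler_tools` is hypothetical and not part of the repository.

```python
import shutil

# Poppler command-line tools that pdf2image shells out to.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def missing_poppler_tools(tools=POPPLER_TOOLS, which=shutil.which):
    """Return the names of the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without poppler.
    """
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Missing poppler tools:", ", ".join(missing))
    else:
        print("poppler tools found on PATH")
```

If the tools are reported missing, installing the distribution's poppler utilities package in the image would be the next thing to try; whether that alone resolves ingestion for this PDF is not confirmed in this thread.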

@shubhadeepd
Collaborator

Thanks for reporting this issue!
Are you trying to ingest files with images embedded?

@grische
Contributor Author

grische commented Sep 16, 2024

Yes, the PDF has images embedded: kb-terraform.pdf

@shubhadeepd
Collaborator

> Yes, the PDF has images embedded: kb-terraform.pdf

The basic RAG example does not support ingesting PDFs with embedded images. Please consider using https://github.com/NVIDIA/GenerativeAIExamples/tree/main/RAG/examples/advanced_rag/multimodal_rag, which does support them.

@grische
Contributor Author

grische commented Sep 17, 2024

Would it be possible to strip the pictures instead of throwing an error? Or at least to show a clearer error message?
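
To make the request concrete, here is a rough sketch of both options together: fall back to a text-only parse when the full parse fails, and raise an error that names the file and the likely cause if both fail. Everything in it is hypothetical (`IngestionError`, `ingest_with_fallback`, and the two loader callables are illustrative names, not the chain-server's actual API), so treat it as an outline of the behaviour rather than a patch.

```python
from typing import Callable, List


class IngestionError(Exception):
    """Raised with a message that names the file and the likely cause."""


def ingest_with_fallback(
    path: str,
    full_loader: Callable[[str], List[str]],
    text_only_loader: Callable[[str], List[str]],
) -> List[str]:
    """Try the full parser first, then a parser that ignores images."""
    try:
        return full_loader(path)
    except Exception as primary_error:
        try:
            # Hypothetical text-only path: skips embedded images entirely.
            return text_only_loader(path)
        except Exception as fallback_error:
            raise IngestionError(
                f"Could not ingest {path!r}: full parsing failed "
                f"({primary_error}); text-only parsing also failed "
                f"({fallback_error}). PDFs with embedded images may need "
                f"the multimodal_rag example."
            ) from fallback_error
```

With something like this, a PDF whose images cannot be processed would still have its text ingested, and a complete failure would point to the multimodal example instead of asking for an "unstructured text document".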
