Not able to process txt file uploaded #977

Open
peterchanws opened this issue Dec 27, 2024 · 5 comments

@peterchanws

I tried to generate a graph for a text file uploaded to the Graph Builder app and got a "Failed" status. Below is part of the backend log. I successfully generated a graph for web sources in the same environment.

[ERROR]{'message': 'Failed To Process File:message-20.txt or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for message-20.txt. Please re-upload file and try again.', 'file_name': 'message-20.txt', 'status': 'Failed', 'db_url': 'bolt://localhost:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-12-27 18:49:35 UTC'}
2024-12-27 10:49:35,223 - File Failed in extraction: {'message': 'Failed To Process File:message-20.txt or LLM Unable To Parse Content ', 'error_message': 'Chunks are not created for message-20.txt. Please re-upload file and try again.', 'file_name': 'message-20.txt', 'status': 'Failed', 'db_url': 'bolt://localhost:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-12-27 18:49:35 UTC'}
Traceback (most recent call last):
File "/Users/pchan3/llm-graph-builder/backend/score.py", line 209, in extract_knowledge_graph_from_file
uri_latency, result = await extract_graph_from_file_local_file(uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship, retry_condition)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/src/main.py", line 232, in extract_graph_from_file_local_file
return await processing_source(uri, userName, password, database, model, fileName, [], allowedNodes, allowedRelationship, True, merged_file_path, retry_condition)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/src/main.py", line 313, in processing_source
total_chunks, chunkId_chunkDoc_list = get_chunkId_chunkDoc_list(graph, file_name, pages, retry_condition)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/src/main.py", line 531, in get_chunkId_chunkDoc_list
raise Exception(f"Chunks are not created for {file_name}. Please re-upload file and try again.")
Exception: Chunks are not created for message-20.txt. Please re-upload file and try again.

@peterchanws
Author

I tried another file, book.txt. Somehow, the system deleted the file: "2024-12-27 19:16:31,758 - file book.txt deleted successfully"

2024-12-27 19:16:31,694 - File path:/Users/pchan3/llm-graph-builder/backend/merged_files/book.txt
2024-12-27 19:16:31,695 - Process file name :book.txt
2024-12-27 19:16:31,695 - file book.txt processing
2024-12-27 19:16:31,758 - Deleted File Path: /Users/pchan3/llm-graph-builder/backend/merged_files/book.txt and Deleted File Name : book.txt
2024-12-27 19:16:31,758 - file book.txt deleted successfully
[ERROR]{'message': 'Failed To Process File:book.txt or LLM Unable To Parse Content ', 'error_message': 'Error while reading the file content or metadata', 'file_name': 'book.txt', 'status': 'Failed', 'db_url': 'bolt://localhost:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-12-28 03:16:31 UTC'}
2024-12-27 19:16:31,758 - File Failed in extraction: {'message': 'Failed To Process File:book.txt or LLM Unable To Parse Content ', 'error_message': 'Error while reading the file content or metadata', 'file_name': 'book.txt', 'status': 'Failed', 'db_url': 'bolt://localhost:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-12-28 03:16:31 UTC'}
Traceback (most recent call last):
File "/Users/pchan3/llm-graph-builder/backend/src/document_sources/local_file.py", line 38, in get_documents_from_file_by_path
unstructured_pages = loader.load()
^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/langchain_core/document_loaders/base.py", line 31, in load
return list(self.lazy_load())
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/langchain_community/document_loaders/unstructured.py", line 107, in lazy_load
elements = self._get_elements()
^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/langchain_community/document_loaders/unstructured.py", line 228, in _get_elements
return partition(filename=self.file_path, **self.unstructured_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/unstructured/partition/auto.py", line 181, in partition
file_type = detect_filetype(
^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/unstructured/file_utils/filetype.py", line 98, in detect_filetype
return _FileTypeDetector.file_type(ctx)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/unstructured/file_utils/filetype.py", line 131, in file_type
return cls(ctx)._file_type
^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/unstructured/file_utils/filetype.py", line 141, in _file_type
if file_type := self._file_type_from_guessed_mime_type:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/unstructured/file_utils/filetype.py", line 181, in _file_type_from_guessed_mime_type
mime_type = self._ctx.mime_type
^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/unstructured/utils.py", line 154, in get
value = self._fget(obj)
^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/unstructured/file_utils/filetype.py", line 362, in mime_type
import magic
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/magic/__init__.py", line 209, in <module>
libmagic = loader.load_lib()
^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/envName/lib/python3.12/site-packages/magic/loader.py", line 49, in load_lib
raise ImportError('failed to find libmagic. Check your installation')
ImportError: failed to find libmagic. Check your installation

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/pchan3/llm-graph-builder/backend/score.py", line 209, in extract_knowledge_graph_from_file
uri_latency, result = await extract_graph_from_file_local_file(uri, userName, password, database, model, merged_file_path, file_name, allowedNodes, allowedRelationship, retry_condition)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/src/main.py", line 227, in extract_graph_from_file_local_file
file_name, pages, file_extension = get_documents_from_file_by_path(merged_file_path,fileName)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/pchan3/llm-graph-builder/backend/src/document_sources/local_file.py", line 41, in get_documents_from_file_by_path
raise Exception('Error while reading the file content or metadata')
Exception: Error while reading the file content or metadata
INFO: 127.0.0.1:61343 - "POST /extract HTTP/1.1" 200 OK
INFO: 127.0.0.1:61347 - "GET /update_extract_status/book.txt?url=bolt://localhost:7687&userName=neo4j&password=cGFzc3dvcmQ=&database=neo4j HTTP/1.1" 200 OK
2024-12-27 19:16:31,933 - SSE Client disconnected
2024-12-27 19:16:31,934 - update KNN graph

@NAMEs

NAMEs commented Dec 30, 2024

Same error here. Did you fix it?

@peterchanws
Author

I am waiting for the Neo4j Graph Builder team to respond.

@smartwhale8

Hi @peterchanws, I'm not from Neo4j, but a few things in the logs caught my attention:

  1. 'error_message': 'Error while reading the file content or metadata': This seems to be the root of the failure.
  2. ImportError: failed to find libmagic. Check your installation: Try installing it with "pip install python-libmagic". libmagic is the library used to identify file types, and this step fails because it cannot be found.
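To tell whether the problem is the Python package or the native library behind it, a small check like the one below can be run with the backend's virtualenv Python (`envName` in the logs). This is a sketch, not part of llm-graph-builder: `libmagic_status` is a hypothetical helper. It assumes, as the traceback suggests, that the `magic` module imported by unstructured is the python-magic binding, which loads the system libmagic C library at import time. On macOS that library is commonly provided by Homebrew (`brew install libmagic`), though that is worth verifying for your own setup.

```python
import ctypes.util


def libmagic_status() -> dict:
    """Report whether the native libmagic library and the `magic` Python binding can be loaded."""
    # Look for the native C library in the platform's standard search locations.
    # Returns a library name or path, or None if nothing is found.
    native = ctypes.util.find_library("magic")
    try:
        # python-magic loads libmagic at import time; a missing native library
        # surfaces here as "ImportError: failed to find libmagic", as in the traceback.
        import magic  # noqa: F401
        return {"native_library": native, "binding_importable": True, "error": None}
    except (ImportError, OSError) as exc:
        # Also covers "No module named 'magic'" when the Python package itself is absent.
        return {"native_library": native, "binding_importable": False, "error": str(exc)}


if __name__ == "__main__":
    print(libmagic_status())
```

If `native_library` comes back as None, installing another pip package alone is unlikely to help, because the binding still needs the C library to be findable.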

Additionally:
&userName=neo4j&password=: Please take care to remove your passwords when posting logs.
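A minimal sketch for scrubbing such values before pasting logs, assuming the secret always appears as a `password=` query parameter, as in the `GET /update_extract_status/...` line above. `redact_passwords` is a hypothetical helper; other secrets (API keys, tokens) would need their own patterns. Note that the logged value is only Base64-encoded, not encrypted: `base64.b64decode("cGFzc3dvcmQ=")` returns `b"password"`.

```python
import re

# Matches "password=" (any case) and captures it; the value runs up to the next '&' or whitespace.
_PASSWORD_PARAM = re.compile(r"(\bpassword=)[^&\s]*", re.IGNORECASE)


def redact_passwords(log_text: str) -> str:
    """Replace every password query-parameter value in log_text with REDACTED."""
    return _PASSWORD_PARAM.sub(r"\1REDACTED", log_text)


if __name__ == "__main__":
    line = ('INFO: 127.0.0.1:61347 - "GET /update_extract_status/book.txt?url=bolt://localhost:7687'
            '&userName=neo4j&password=cGFzc3dvcmQ=&database=neo4j HTTP/1.1" 200 OK')
    print(redact_passwords(line))
```

Because it works on raw text rather than parsed URLs, the same function can be applied to a whole pasted log block at once.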

@peterchanws
Author

Thanks, smartwhale8. I installed python-libmagic, but graph generation still fails.

I am wondering why the txt file gets deleted:

2024-12-31 08:24:08,745 - File path:/Users/pchan3/llm-graph-builder/backend/merged_files/message-2351.txt
2024-12-31 08:24:08,745 - Process file name :message-2351.txt
2024-12-31 08:24:08,745 - file message-2351.txt processing
2024-12-31 08:24:09,711 - Deleted File Path: /Users/pchan3/llm-graph-builder/backend/merged_files/message-2351.txt and Deleted File Name : message-2351.txt
2024-12-31 08:24:09,712 - file message-2351.txt deleted successfully
[ERROR]{'message': 'Failed To Process File:message-2351.txt or LLM Unable To Parse Content ', 'error_message': 'Error while reading the file content or metadata', 'file_name': 'message-2351.txt', 'status': 'Failed', 'db_url': 'bolt://localhost:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-12-31 16:24:09 UTC'}
2024-12-31 08:24:09,712 - File Failed in extraction: {'message': 'Failed To Process File:message-2351.txt or LLM Unable To Parse Content ', 'error_message': 'Error while reading the file content or metadata', 'file_name': 'message-2351.txt', 'status': 'Failed', 'db_url': 'bolt://localhost:7687', 'failed_count': 1, 'source_type': 'local file', 'source_url': None, 'wiki_query': None, 'logging_time': '2024-12-31 16:24:09 UTC'}
