Documents from DataFrame #3873
-
Hi team,
Replies: 2 comments 1 reply
-
Hey @stargeysir! It depends a lot on what your dataframe looks like. There's a node that does something similar with CSV files: you can take inspiration from it and create your own custom node that converts your dataframes into Documents: https://github.com/deepset-ai/haystack/blob/61f85b08435b51f9bad12b06fad1c3102ad84c4a/haystack/nodes/file_converter/csv.py
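To make the idea concrete, here is a minimal sketch of the conversion step, not the linked node's actual code. It assumes one column holds the text and every other column becomes metadata, and it emits plain dicts with `content` and `meta` keys, which is the dictionary shape Haystack 1.x document stores accept as far as I know. The column names and the `records_to_docs` helper are hypothetical, and the `records` list stands in for `df.to_dict(orient="records")` so the snippet runs without pandas.

```python
# Stand-in for `df.to_dict(orient="records")`; the column names are hypothetical.
records = [
    {"title": "Intro", "text": "Haystack builds search pipelines.", "year": 2022},
    {"title": "Setup", "text": "Install it with pip.", "year": 2023},
]


def records_to_docs(records, content_column):
    """Turn row dicts into document dicts: one column is content, the rest is meta."""
    docs = []
    for row in records:
        # Every column except the content column is kept as metadata.
        meta = {key: value for key, value in row.items() if key != content_column}
        docs.append({"content": str(row[content_column]), "meta": meta})
    return docs


docs = records_to_docs(records, content_column="text")
print(docs[0])
```

Wrapping this function in a custom node is then mostly boilerplate; the mapping from columns to `content` and `meta` is the part that depends on your data.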
-
Hello, just bumping this up. I tried the workaround described. `docs_dicts = filtered_df.to_dict(orient="records") docs = [] print(docs)` This generates the Document objects, with auto-assigned IDs.
When I write them, I get the following error: `DuplicateDocumentError: ID '93724f75f58c2f2583c9b756c8664555712354a80b5a8517c5ea2009f06d5f43' already exists.` Is there a way to regenerate the IDs, or to generate unique IDs only when going from dictionary to Document?
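One possible workaround, sketched under assumptions: as far as I know, Haystack 1.x derives the default Document ID from a hash of the content, so rows with identical text end up with the same ID. Setting an explicit `id` on each dictionary before it becomes a Document sidesteps that. The `add_unique_ids` helper and the sample rows below are hypothetical.

```python
import uuid


def add_unique_ids(doc_dicts):
    """Return copies of the document dicts, each with a fresh random id."""
    return [{**doc, "id": str(uuid.uuid4())} for doc in doc_dicts]


# Two rows with identical content, the case where a content-based hash collides.
doc_dicts = [
    {"content": "same text", "meta": {"row": 0}},
    {"content": "same text", "meta": {"row": 1}},
]

docs_with_ids = add_unique_ids(doc_dicts)
for doc in docs_with_ids:
    print(doc["id"], doc["meta"])
```

The trade-off is that random IDs change on every run, so re-indexing the same dataframe adds new copies instead of being detected as duplicates. If I remember the 1.x API correctly, `Document` also takes an `id_hash_keys` argument (for example `["content", "meta"]`) and `write_documents` takes a `duplicate_documents` option (`"skip"`, `"overwrite"`, or `"fail"`); both are worth checking against the docs for your version.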