my_directory_loader

This is a single-file GitHub repository containing a modified version of LangChain's DirectoryLoader class. It loads a directory of documents into a DocumentIndex object while working around an apparent bug in Unstructured's PDF support that causes kernel crashes in Jupyter and segmentation faults when run from .py files.

The only difference from LangChain's DirectoryLoader is that this version accepts a separate loader class for PDF files, pdfloader_cls, which defaults to PyPDFLoader (a wrapper around PyPDF2). All other file types go through UnstructuredFileLoader by default.
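
The per-file choice described above can be pictured with the minimal sketch below. It is an illustration only, not the code in this repository: `StubPDFLoader`, `StubUnstructuredLoader`, and `pick_loader_cls` are hypothetical stand-ins so the example runs without LangChain installed, and the case-insensitive match on the `.pdf` extension is an assumption.

```python
from pathlib import Path


class StubPDFLoader:
    """Hypothetical stand-in for PyPDFLoader (not the real class)."""

    def __init__(self, file_path: str):
        self.file_path = file_path


class StubUnstructuredLoader:
    """Hypothetical stand-in for UnstructuredFileLoader (not the real class)."""

    def __init__(self, file_path: str):
        self.file_path = file_path


def pick_loader_cls(path, loader_cls=StubUnstructuredLoader, pdfloader_cls=StubPDFLoader):
    """Return pdfloader_cls for PDF files and loader_cls for everything else.

    Assumption: PDFs are recognised by a case-insensitive ".pdf" suffix.
    """
    if Path(path).suffix.lower() == ".pdf":
        return pdfloader_cls
    return loader_cls


# Show which loader class each example file name would be routed to.
for name in ["report.pdf", "SCAN.PDF", "notes.txt", "README"]:
    print(name, "->", pick_loader_cls(name).__name__)
```

Passing a different class as `pdfloader_cls` in this sketch changes only how PDFs are handled; every other file still goes to `loader_cls`.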

For more details on the parameters and returned values of DirectoryLoader, please refer to LangChain's documentation: