remove anchor
thomasht86 committed Sep 24, 2024
1 parent 46684a4 commit 0a43e8c
Showing 2 changed files with 3 additions and 3 deletions.
@@ -490,7 +490,7 @@
"source": [
"### Processing PDFs with LangChain\n",
"\n",
"[LangChain](https://python.langchain.com/) has a rich set of [document loaders](https://python.langchain.com/docs/how_to/#document-loaders) that can be used to load and process various file formats. In this notebook, we use the [PyPDFLoader](https://python.langchain.com/docs/how_to/document_loader_pdf/#using-pypdf).\n",
"[LangChain](https://python.langchain.com/) has a rich set of [document loaders](https://python.langchain.com/docs/how_to/#document-loaders) that can be used to load and process various file formats. In this notebook, we use the [PyPDFLoader](https://python.langchain.com/docs/how_to/document_loader_pdf/).\n",
"\n",
"We also want to split the extracted text into _contexts_ using a [text splitter](https://python.langchain.com/docs/how_to/#text-splitters). Most text embedding models have limited input lengths (typically less than 512 language model tokens, so splitting the text\n",
"into multiple contexts that each fits into the context limit of the embedding model is a common strategy.\n",
@@ -1222,4 +1222,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
@@ -408,7 +408,7 @@
"source": [
"## Processing PDFs with LangChain\n",
"\n",
"[LangChain](https://python.langchain.com/) has a rich set of [document loaders](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/) that can be used to load and process various file formats. In this notebook, we use the [PyPDFLoader](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf#using-pypdf).\n",
"[LangChain](https://python.langchain.com/) has a rich set of [document loaders](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/) that can be used to load and process various file formats. In this notebook, we use the [PyPDFLoader](https://python.langchain.com/v0.1/docs/modules/data_connection/document_loaders/pdf).\n",
"\n",
"We also want to split the extracted text into _chunks_ using a [text splitter](https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/). Most text embedding models have limited input lengths (typically less than 512 language model tokens, so splitting the text\n",
"into multiple chunks that fits into the context limit of the embedding model is a common strategy.\n",
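The notebook text touched by this commit describes splitting extracted PDF text into chunks that fit an embedding model's input limit. A minimal sketch of that idea follows, assuming a plain character-based window with optional overlap. The function name `split_into_chunks` and its parameters are hypothetical and are not LangChain's API. Character count is only a rough proxy for the token limit the text mentions, so a real pipeline would likely use a token-aware splitter instead.

```python
def split_into_chunks(text: str, chunk_size: int = 1024, chunk_overlap: int = 0) -> list[str]:
    """Split text into fixed-size character windows (hypothetical helper).

    Consecutive chunks share `chunk_overlap` characters so that a sentence
    cut at a boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text,
        # otherwise the overlap would produce a redundant trailing chunk.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    page_text = "abcdefghij"  # stand-in for text extracted from one PDF page
    print(split_into_chunks(page_text, chunk_size=4, chunk_overlap=1))
```

Each returned chunk is at most `chunk_size` characters long, so it can be passed to an embedding model one at a time. The overlap trades some duplicated text for better continuity across chunk boundaries.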