forked from run-llama/LlamaIndexTS
Commit: Merge branch 'main' of github.com:run-llama/LlamaIndexTS into astra
Showing 85 changed files with 2,621 additions and 2,548 deletions.
@@ -0,0 +1,5 @@
---
"create-llama": patch
---

Add an option to provide a URL and chat with the website data
This file was deleted.
@@ -1,5 +1,5 @@
---
-sidebar_position: 2
+sidebar_position: 4
---

# Index
@@ -1,5 +1,5 @@
---
-sidebar_position: 1
+sidebar_position: 3
---

# Reader / Loader
@@ -0,0 +1,2 @@
label: "Document / Nodes"
position: 0
File renamed without changes.
apps/docs/docs/modules/documents_and_nodes/metadata_extraction.md (45 additions, 0 deletions)
@@ -0,0 +1,45 @@
# Metadata Extraction Usage Pattern

You can use LLMs to automate metadata extraction with our `Metadata Extractor` modules.

Our metadata extractor modules include the following "feature extractors":

- `SummaryExtractor` - automatically extracts a summary over a set of Nodes
- `QuestionsAnsweredExtractor` - extracts a set of questions that each Node can answer
- `TitleExtractor` - extracts a title over the context of each Node by document and combines them
- `KeywordExtractor` - extracts keywords over the context of each Node

You can then chain the `Metadata Extractors` with an `IngestionPipeline` to extract metadata from a set of documents.

```ts
import {
  IngestionPipeline,
  TitleExtractor,
  QuestionsAnsweredExtractor,
  Document,
} from "llamaindex";

async function main() {
  const pipeline = new IngestionPipeline({
    transformations: [
      new TitleExtractor(),
      new QuestionsAnsweredExtractor({
        questions: 5,
      }),
    ],
  });

  const nodes = await pipeline.run({
    documents: [
      new Document({ text: "I am 10 years old. John is 20 years old." }),
    ],
  });

  for (const node of nodes) {
    console.log(node.metadata);
  }
}

main().then(() => console.log("done"));
```
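Each extractor contributes key/value pairs that end up merged into a node's `metadata` record. The following is a toy plain-TypeScript sketch of that merging idea only; the stand-in extractors and the `documentTitle`/`keywords` field names are assumptions for illustration, not llamaindex internals (the real extractors call an LLM):

```typescript
// Toy sketch of metadata merging: each "extractor" returns a partial
// metadata record, and the results are merged in order.
type Metadata = Record<string, unknown>;
type Extractor = (text: string) => Metadata;

// Crude stand-in for a title extractor: use the first sentence as the title.
const toyTitleExtractor: Extractor = (text) => ({
  documentTitle: text.split(".")[0],
});

// Crude stand-in for a keyword extractor: collect capitalized words.
const toyKeywordExtractor: Extractor = (text) => ({
  keywords: text.match(/\b[A-Z][a-z]+\b/g) ?? [],
});

// Merge the output of each extractor into one metadata record.
function extractMetadata(text: string, extractors: Extractor[]): Metadata {
  return extractors.reduce((meta, ex) => ({ ...meta, ...ex(text) }), {});
}
```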
@@ -1,5 +1,5 @@
---
-sidebar_position: 1
+sidebar_position: 3
---

# Embedding
This file was deleted.
@@ -0,0 +1,2 @@
label: "Ingestion Pipeline"
position: 2
@@ -0,0 +1,99 @@
# Ingestion Pipeline

An `IngestionPipeline` applies a series of `Transformations` to your input data. The resulting nodes are either returned or, if a vector store is given, inserted into it.

## Usage Pattern

The simplest usage is to instantiate an `IngestionPipeline` like so:

```ts
import fs from "node:fs/promises";

import {
  Document,
  IngestionPipeline,
  MetadataMode,
  OpenAIEmbedding,
  TitleExtractor,
  SimpleNodeParser,
} from "llamaindex";

async function main() {
  // Load the essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";

  const essay = await fs.readFile(path, "utf-8");

  // Create a Document object with the essay
  const document = new Document({ text: essay, id_: path });
  const pipeline = new IngestionPipeline({
    transformations: [
      new SimpleNodeParser({ chunkSize: 1024, chunkOverlap: 20 }),
      new TitleExtractor(),
      new OpenAIEmbedding(),
    ],
  });

  // Run the pipeline
  const nodes = await pipeline.run({ documents: [document] });

  // Print out the result of the pipeline run
  for (const node of nodes) {
    console.log(node.getContent(MetadataMode.NONE));
  }
}

main().catch(console.error);
```
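To build intuition for the `chunkSize` and `chunkOverlap` parameters used above, here is a simplified character-based sliding-window chunker. This is an illustration only: the actual `SimpleNodeParser` splits on sentence boundaries and token counts, not raw characters.

```typescript
// Simplified sliding-window chunking over raw characters: each chunk is
// at most chunkSize characters long, and consecutive chunks share
// chunkOverlap characters so context is not lost at chunk boundaries.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  const chunks: string[] = [];
  const step = chunkSize - chunkOverlap; // advance by this much per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}
```

With `chunkSize: 1024` and `chunkOverlap: 20`, each window advances 1,004 characters, so neighbouring chunks share 20 characters.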
## Connecting to Vector Databases

When running an ingestion pipeline, you can also choose to automatically insert the resulting nodes into a remote vector store.

Then, you can construct an index from that vector store later on.

```ts
import fs from "node:fs/promises";

import {
  Document,
  IngestionPipeline,
  OpenAIEmbedding,
  TitleExtractor,
  SimpleNodeParser,
  QdrantVectorStore,
  VectorStoreIndex,
} from "llamaindex";

async function main() {
  // Load the essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";

  const essay = await fs.readFile(path, "utf-8");

  const vectorStore = new QdrantVectorStore({
    host: "http://localhost:6333",
  });

  // Create a Document object with the essay
  const document = new Document({ text: essay, id_: path });
  const pipeline = new IngestionPipeline({
    transformations: [
      new SimpleNodeParser({ chunkSize: 1024, chunkOverlap: 20 }),
      new TitleExtractor(),
      new OpenAIEmbedding(),
    ],
    vectorStore,
  });

  // Run the pipeline; the resulting nodes are inserted into the vector store
  const nodes = await pipeline.run({ documents: [document] });

  // Create an index from the vector store
  const index = VectorStoreIndex.fromVectorStore(vectorStore);
}

main().catch(console.error);
```
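Conceptually, a pipeline run just threads the nodes through each transformation in order. A minimal generic sketch of that control flow follows; it is an assumption about the idea, not the actual `IngestionPipeline` implementation:

```typescript
// Each transform is an async function from a list of items to a list of
// items; running the pipeline feeds the output of one into the next.
type Transform<T> = (items: T[]) => Promise<T[]>;

async function runPipeline<T>(
  items: T[],
  transforms: Transform<T>[],
): Promise<T[]> {
  let current = items;
  for (const t of transforms) {
    current = await t(current); // output of one stage is input to the next
  }
  return current;
}
```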
apps/docs/docs/modules/ingestion_pipeline/transformations.md (77 additions, 0 deletions)
@@ -0,0 +1,77 @@
# Transformations

A transformation takes a list of nodes as input and returns a list of nodes. Each component that implements the `TransformerComponent` class has a `transform` method responsible for transforming the nodes.

Currently, the following components are Transformation objects:

- [SimpleNodeParser](../../api/classes/SimpleNodeParser.md)
- [MetadataExtractor](../documents_and_nodes/metadata_extraction.md)
- Embeddings

## Usage Pattern

While transformations are best used with an `IngestionPipeline`, they can also be used directly.

```ts
import {
  SimpleNodeParser,
  TitleExtractor,
  Document,
  MetadataMode,
} from "llamaindex";

async function main() {
  let nodes = new SimpleNodeParser().getNodesFromDocuments([
    new Document({ text: "I am 10 years old. John is 20 years old." }),
  ]);

  const titleExtractor = new TitleExtractor();

  nodes = await titleExtractor.transform(nodes);

  for (const node of nodes) {
    console.log(node.getContent(MetadataMode.NONE));
  }
}

main().catch(console.error);
```

## Custom Transformations

You can implement any transformation yourself by extending `TransformerComponent`.

The following custom transformation removes any special characters or punctuation from the text.

```ts
import { TransformerComponent, Node } from "llamaindex";

class RemoveSpecialCharacters extends TransformerComponent {
  async transform(nodes: Node[]): Promise<Node[]> {
    for (const node of nodes) {
      node.text = node.text.replace(/[^\w\s]/gi, "");
    }

    return nodes;
  }
}
```
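The regex in `RemoveSpecialCharacters` can be tried in isolation with plain TypeScript (no llamaindex dependency): `/[^\w\s]/gi` matches every character that is neither a word character nor whitespace, i.e. punctuation and symbols.

```typescript
// Strip punctuation and special symbols, keeping letters, digits,
// underscores, and whitespace intact.
function removeSpecialCharacters(text: string): string {
  return text.replace(/[^\w\s]/gi, "");
}
```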
These can then be used directly or in any `IngestionPipeline`.

```ts
import { IngestionPipeline, Document, MetadataMode } from "llamaindex";
// RemoveSpecialCharacters is the custom transformation defined above

async function main() {
  const pipeline = new IngestionPipeline({
    transformations: [new RemoveSpecialCharacters()],
  });

  const nodes = await pipeline.run({
    documents: [
      new Document({ text: "I am 10 years old. John is 20 years old." }),
    ],
  });

  for (const node of nodes) {
    console.log(node.getContent(MetadataMode.NONE));
  }
}

main().catch(console.error);
```