Commit

Apply suggestions from code review
Signed-off-by: Naarcha-AWS <[email protected]>
Naarcha-AWS authored Dec 10, 2024
1 parent 4be8fb8 commit 1383d5a
Showing 1 changed file with 4 additions and 4 deletions.
@@ -8,9 +8,9 @@ grand_parent: User guide

# Expanding the data corpus of a workload

-This tutorial shows you how to use the [`expand-data-corpus.py`](https://github.com/opensearch-project/opensearch-benchmark/blob/main/scripts/expand-data-corpus.py) script to increase the size of the data corpus for an OpenSearch Benchmark workload. This is helpful when running time-series workloads like http_logs against a large scale OpenSearch cluster.
+This tutorial shows you how to use the [`expand-data-corpus.py`](https://github.com/opensearch-project/opensearch-benchmark/blob/main/scripts/expand-data-corpus.py) script to increase the size of the data corpus for an OpenSearch Benchmark workload. This is helpful when running the `http_logs` workload against a large OpenSearch cluster.

-Only the `http_logs` workload is currently supported.
+This script only works with the `http_logs` workload.
{: .warning}

## Prerequisites
@@ -32,7 +32,7 @@ To use `expand-data-corpus.py`, use the following syntax:
./expand-data-corpus.py [options]
```

-The script has several options for customization. The following are the most commonly-used customization options:
+The script has several options for customization. The following are the most commonly used options:

- `--corpus-size`: The desired corpus size, in GB.
- `--output-file-suffix`: The suffix for the output file name.
@@ -47,7 +47,7 @@ This example generates a 100 GB corpus.

The script will start generating documents. For a 100 GB corpus, it can take up to 30 minutes to generate the full corpus.
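As a rough planning aid, you can extrapolate the 30-minute figure for a 100 GB corpus to other corpus sizes. The following Python sketch assumes that generation time grows linearly with corpus size. The script's documentation does not confirm this, and actual run time depends heavily on the host machine. The helper name `estimate_max_minutes` is hypothetical and is not part of `expand-data-corpus.py`.

```python
# Reference point taken from the text: up to 30 minutes for a 100 GB corpus.
REFERENCE_SIZE_GB = 100
REFERENCE_MAX_MINUTES = 30


def estimate_max_minutes(corpus_size_gb: float) -> float:
    """Hypothetical helper: rough upper-bound run time for a corpus size in GB.

    ASSUMPTION: generation time scales linearly with corpus size. The real
    script may behave differently, and hardware strongly affects the result.
    """
    if corpus_size_gb <= 0:
        raise ValueError("corpus_size_gb must be positive")
    return corpus_size_gb * REFERENCE_MAX_MINUTES / REFERENCE_SIZE_GB


if __name__ == "__main__":
    for size in (50, 100, 500):
        print(f"{size} GB: up to ~{estimate_max_minutes(size):.0f} minutes")
```

Under the linear-scaling assumption, `estimate_max_minutes(500)` returns `150.0`, or about two and a half hours.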

-You can generate multiple corpora by running the script multiple times with different output suffixes.
+You can generate multiple corpora by running the script multiple times with different output suffixes. OpenSearch Benchmark uses all corpora generated by the script during ingestion, one at a time.
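The following Python sketch shows one way to script several runs, each with its own output suffix. It uses only the `--corpus-size` and `--output-file-suffix` options listed earlier. The helper names, the script path, and the example suffixes are hypothetical. By default, the sketch only prints the commands. Pass `dry_run=False` to `run_all` to execute them one after another.

```python
import shlex
import subprocess
from typing import List, Sequence

# Hypothetical path: assumes you run this from the directory containing the script.
SCRIPT = "./expand-data-corpus.py"


def build_commands(corpus_size_gb: int, suffixes: Sequence[str]) -> List[List[str]]:
    """Build one argument list per corpus, each with its own output suffix."""
    if len(set(suffixes)) != len(suffixes):
        # Assumption: reusing a suffix would target the same output file.
        raise ValueError("each run needs a distinct --output-file-suffix")
    return [
        [SCRIPT, "--corpus-size", str(corpus_size_gb), "--output-file-suffix", suffix]
        for suffix in suffixes
    ]


def run_all(commands: Sequence[List[str]], dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also run it and wait for it."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing run.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical suffixes for two 100 GB corpora.
    run_all(build_commands(100, ["100gb-a", "100gb-b"]))
```

The runs execute sequentially because `subprocess.run` waits for each process to finish. Generation can take a long time for large corpora, so running the commands one after another avoids competing for the same disk and CPU.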

## Verifying the documents
