Doc: Improve documentation (using Github alerts) (#46)
bpolaszek authored Nov 17, 2023
1 parent a926da9 commit 9a9f3ef
Showing 3 changed files with 26 additions and 14 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -48,10 +48,12 @@ Installation
composer require bentools/etl:^4.0@alpha
```

> **Warning #1**: Version 4.0 is a complete rewrite and introduces significant BC (backward compatibility) breaks.
> [!WARNING]
> Version 4.0 is a complete rewrite and introduces significant BC (backward compatibility) breaks.
> Avoid upgrading from `^2.0` or `^3.0` unless you're fully aware of the changes.
> **Warning #2**: Version 4.0 is still at an alpha stage. BC breaks might occur between alpha releases.
> [!IMPORTANT]
> Version 4.0 is still at an alpha stage. BC breaks might occur between alpha releases.
Usage
-----
8 changes: 4 additions & 4 deletions doc/advanced_usage.md
@@ -31,9 +31,10 @@ $etl = (new EtlExecutor())
$etl->process('file:///tmp/cities.csv', $pdo);
```

As you can see:
- Your transformer can _yield_ values, in case 1 extracted item becomes several items to load
- You can use `EtlState.destination` to retrieve the second argument you passed to `$etl->process()`.
> [!IMPORTANT]
> As you can see:
> - Your transformer can _yield_ values, in case 1 extracted item becomes several items to load
> - You can use `EtlState.destination` to retrieve the second argument you passed to `$etl->process()`.
The `EtlState` object contains all elements related to the state of your running ETL workflow.

@@ -54,7 +55,6 @@ But the last transformer of the chain (or your only one transformer) is determin
- If your transformer `yields` values, each yielded value will be passed to the loader (and the loader will be called for each yielded value).
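
The "one loader call per yielded value" behavior can be illustrated with plain PHP generators. This is a library-independent sketch, not the `EtlExecutor` implementation: `splitCities()`, `$load` and the sample rows are hypothetical stand-ins for a transformer, a loader and extracted items.

```php
<?php

// Hypothetical transformer: one extracted row becomes several items to load.
function splitCities(array $row): Generator
{
    foreach (explode('|', $row['cities']) as $city) {
        yield ['country' => $row['country'], 'city' => $city];
    }
}

// Hypothetical loader: invoked once for each yielded value.
$loaded = [];
$load = function (array $item) use (&$loaded): void {
    $loaded[] = $item;
};

// Hypothetical extracted items.
$extracted = [
    ['country' => 'FR', 'cities' => 'Paris|Lyon'],
    ['country' => 'US', 'cities' => 'Boston'],
];

foreach ($extracted as $row) {
    foreach (splitCities($row) as $item) {
        $load($item);
    }
}
```

Here two extracted rows end up as three loaded items, because the first row yields twice.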



Next tick
---------

26 changes: 18 additions & 8 deletions doc/getting-started.md
@@ -48,7 +48,8 @@ Then, let's have a look at `/tmp/cities.json`:
]
```

Notice that we didn't _transform_ anything here, we just denormalized the CSV file to an array, then serialized that array to a JSON file.
> [!NOTE]
> We didn't _transform_ anything here: we just denormalized the CSV file to an array, then serialized that array to a JSON file.
The `CSVExtractor` has some options to _read_ the data, such as considering that the 1st row is the column keys.

@@ -92,14 +93,16 @@ Skipping items

You can skip items at any time.

Use the `$state->skip()` method from the `EtlState` object as soon as your business logic requires it.
> [!TIP]
> Use the `skip()` method from the `EtlState` object as soon as your business logic requires it.
Stopping the workflow
---------------------

You can stop the workflow at any time.

Use the `$state->stop()` method from the `EtlState` object as soon as your business logic requires it.
> [!TIP]
> Use the `stop()` method from the `EtlState` object as soon as your business logic requires it.
Using Events
------------
@@ -119,9 +122,15 @@ The `EtlExecutor` emits a variety of events during the ETL workflow, providing i
- `FlushExceptionEvent` when something went wrong during flush (the exception can be dismissed)
- `EndEvent` whenever the workflow is complete.

All events give you access to the `EtlState` object, the state of the running ETL process, which allows you to read what's going on
(total number of items, number of loaded items, current extracted item index), write any arbitrary data into the `$state->context` array,
[skip items](#skipping-items), [stop the workflow](#stopping-the-workflow), and [trigger an early flush](#flush-frequency-and-early-flushes).
> [!IMPORTANT]
> All events give you access to the `EtlState` object, the state of the running ETL process.
Accessing `$event->state` allows you to:
- Read what's going on (total number of items, number of loaded items, current extracted item index)
- Write any arbitrary data into the `$state->context` array
- [Skip items](#skipping-items)
- [Stop the workflow](#stopping-the-workflow)
- [Trigger an early flush](#flush-frequency-and-early-flushes).

You can hook to those events during `EtlExecutor` instantiation, i.e.:

@@ -138,8 +147,9 @@ Flush frequency and early flushes
By default, the `flush()` method of your loader will be invoked at the end of the ETL,
meaning it will likely keep all loaded items in memory before dumping them to their final destination.
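
To make the memory trade-off concrete, here is a minimal, library-independent sketch of a loader that buffers items and flushes them in batches. The `BufferedLoader` class and its `$flushEvery` parameter are hypothetical and only illustrate the idea; they are not part of the library's API.

```php
<?php

// Hypothetical loader that keeps items in memory until a batch is full.
final class BufferedLoader
{
    /** @var list<mixed> Items waiting to be flushed. */
    private array $pending = [];

    /** @var list<list<mixed>> Batches already written out (stands in for a real destination). */
    public array $flushed = [];

    public function __construct(private int $flushEvery)
    {
    }

    public function load(mixed $item): void
    {
        $this->pending[] = $item;

        // Flush as soon as the buffer reaches the configured size.
        if (count($this->pending) >= $this->flushEvery) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->pending === []) {
            return;
        }

        $this->flushed[] = $this->pending;
        $this->pending = [];
    }
}

$loader = new BufferedLoader(flushEvery: 2);
foreach ([1, 2, 3, 4, 5] as $item) {
    $loader->load($item);
}
$loader->flush(); // Final flush for the remaining item.
```

With a batch size of 2, at most two items are held in memory at a time, at the cost of more frequent writes.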

Feel free to adjust a `flushFrequency` that fits your needs to manage memory usage and data processing efficiency
and optionally trigger an early flush at any time during the ETL process:
> [!TIP]
> - Feel free to adjust a `flushFrequency` that fits your needs to manage memory usage and data processing efficiency
> - Optionally, trigger an early flush at any time during the ETL process.
```php
$etl = (new EtlExecutor(options: new EtlConfiguration(flushFrequency: 10)))
