From 01d83fa76a6686f17223e05ba02e1bd232799903 Mon Sep 17 00:00:00 2001
From: Daniel Kavan
Date: Mon, 2 Aug 2021 11:10:49 +0200
Subject: [PATCH] #99 atum sdk s3 extension examples README.md, typos for core examples README.md

---
 examples/atum-examples/README.md             |  6 +-
 examples/s3-sdk-extension-examples/README.md | 60 ++++++++++++++++++++
 2 files changed, 63 insertions(+), 3 deletions(-)
 create mode 100644 examples/s3-sdk-extension-examples/README.md

diff --git a/examples/atum-examples/README.md b/examples/atum-examples/README.md
index 7c794338..75f65a62 100644
--- a/examples/atum-examples/README.md
+++ b/examples/atum-examples/README.md
@@ -1,7 +1,7 @@
 # Atum Spark Job Application Example
 
 This is a set of Atum Apache Spark Applications that can be used as inspiration for creating other
-Spark projects. It includes all dependencies in a 'fat' jar to run the job locally and on cluster.
+Spark projects. It includes all dependencies in a 'fat' jar to run the job locally and on a cluster.
 
 Here is the list of examples (all from `za.co.absa.atum.examples` space):
 
@@ -27,7 +27,7 @@ mvn package -DskipTests=true
 ```
 
 ## Scala and Spark version switching
-Same as Atum itself, the example project also support switching to build with different Scala and Spark version:
+Same as Atum itself, the example project also supports switching to build with different Scala and Spark versions:
 
 Switching Scala version (2.11 or 2.12) can be done via
 ```shell script
@@ -45,7 +45,7 @@ mvn clean install -Pspark-3.1
 ## Running via spark-submit
 
 After the project is packaged you can copy `target/2.11/atum-examples_2.11-0.0.1-SNAPSHOT.jar`
-to an edge node of a cluster and use `spark-submit` to run the job. Here us an example when running on Yarn:
+to an edge node of a cluster and use `spark-submit` to run the job. Here is an example when running on Yarn:
 
 ```shell script
 spark-submit --master yarn --deploy-mode client --class za.co.absa.atum.examples.SampleMeasurements1 atum-examples_2.11-0.0.1-SNAPSHOT.jar
diff --git a/examples/s3-sdk-extension-examples/README.md b/examples/s3-sdk-extension-examples/README.md
new file mode 100644
index 00000000..dc50cd43
--- /dev/null
+++ b/examples/s3-sdk-extension-examples/README.md
@@ -0,0 +1,60 @@
+# SDK-S3 Atum Spark Job Application Example
+
+This is a set of Atum Apache Spark Applications (using the SDK S3 Atum Extension) that can be used as inspiration for creating other
+Spark projects. It includes all dependencies in a 'fat' jar to run the job locally and on a cluster.
+
+- `SampleSdkS3Measurements{1|2}` - Example apps using the Atum SDK S3 Extension to show the Atum initialization,
+checkpoint setup and the resulting control measure handling (in the form of an `_INFO` file produced by the job and landing in AWS S3); see the sketch below
+
+## Usage
+
+The example applications are in the `za.co.absa.atum.examples` package. The project contains build files for `Maven`.
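+
+A rough sketch of the shape of such a job is shown below. It follows the control-measurement calls from the core Atum README
+(`enableControlMeasuresTracking`, `setCheckpoint`); the SDK S3 extension swaps the initialization for an S3-aware variant,
+exact signatures may differ between Atum versions, and all paths and names here are illustrative rather than taken from the example sources:
+
+```scala
+// Illustrative sketch only -- not the actual SampleSdkS3Measurements source.
+import org.apache.hadoop.fs.FileSystem
+import org.apache.spark.sql.{SaveMode, SparkSession}
+import za.co.absa.atum.AtumImplicits._
+
+object SampleMeasurementsSketch {
+  def main(args: Array[String]): Unit = {
+    val spark = SparkSession.builder().appName("Sample Measurements sketch").getOrCreate()
+
+    // Some Atum versions resolve the Hadoop filesystem for reading/writing _INFO files via an implicit.
+    implicit val fs: FileSystem = FileSystem.get(spark.sparkContext.hadoopConfiguration)
+
+    // Hook Atum into the Spark session and point it at the input _INFO file (illustrative path).
+    spark.enableControlMeasuresTracking(sourceInfoFile = "data/input/wikidata.csv.info")
+      .setControlMeasuresWorkflow("Job 1")
+
+    spark.read
+      .option("header", "true")
+      .csv("data/input/wikidata.csv")        // illustrative input
+      .setCheckpoint("Source")               // control measurements are recorded at this point
+      .write.mode(SaveMode.Overwrite)
+      .parquet("data/output/stage1_results") // an _INFO file is written alongside the output
+
+    spark.disableControlMeasuresTracking()
+    spark.stop()
+  }
+}
+```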
+
+## Maven
+**To build an uber jar to run on a cluster**
+```shell script
+mvn package -DskipTests=true
+```
+
+## Scala and Spark version switching
+Same as Atum itself, the example project also supports switching to build with different Scala and Spark versions:
+
+Switching the Scala version (2.11 or 2.12) can be done via
+```shell script
+mvn scala-cross-build:change-version -Pscala-2.11 # this is default
+# or
+mvn scala-cross-build:change-version -Pscala-2.12
+```
+
+To choose the Spark version to build with, there are `spark-2.4` and `spark-3.1` profiles:
+```shell script
+mvn clean install -Pspark-2.4 # this is default
+mvn clean install -Pspark-3.1
+```
+
+## Running Requirements
+Since these example apps work with S3 resources, a number of environment prerequisites must be met
+for the code to be runnable (a combined setup sketch is shown at the end of this README). Namely:
+ - having an AWS profile named `saml` in `~/.aws/credentials`
+ - having your bucket defined in `TOOLING_BUCKET_NAME` and your KMS Key ID in `TOOLING_KMS_KEY_ID`
+ (the examples are written to enforce AWS-KMS server-side encryption)
+
+## Running via spark-submit
+
+After the project is packaged you can copy `target/2.11/atum-examples-s3-sdk-extension_2.11-0.0.1-SNAPSHOT.jar`
+to an edge node of a cluster and use `spark-submit` to run the job. Here is an example when running on Yarn:
+
+```shell script
+spark-submit --master yarn --deploy-mode client --class za.co.absa.atum.examples.SampleSdkS3Measurements1 atum-examples-s3-sdk-extension_2.11-0.0.1-SNAPSHOT.jar
+```
+
+### Running Spark Applications in local mode from an IDE
+If you try to run the example from an IDE, you'll likely get the following exception:
+```Exception in thread "main" java.lang.NoClassDefFoundError: scala/Option```
+
+This is because the jar is created with all Scala and Spark dependencies removed (using the shade plugin). This is done so that the uber jar for `spark-submit` is not too big.
+
+There are multiple options to deal with this, namely:
+ - use the test runner class; for the `SampleSdkS3Measurements` apps it is `SampleMeasurementsS3RunnerExampleSpec` (provided dependencies will be loaded for tests)
+ - use the _Include dependencies with "Provided" scope_ option in the Run Configuration in IntelliJ IDEA, or the equivalent in your IDE.
+ - change the scope of `provided` dependencies to `compile` in the POM file and run the Spark applications as normal JVM apps.
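+
+### Putting it together
+To tie the Running Requirements and `spark-submit` sections together, here is a rough end-to-end sketch of the shell setup;
+the profile name and the two variable names come from this README, while every value shown is a placeholder to replace with your own:
+
+```shell script
+# ~/.aws/credentials needs a profile named "saml" (placeholder values):
+# [saml]
+# aws_access_key_id     = <access key>
+# aws_secret_access_key = <secret key>
+# aws_session_token     = <session token, if one is issued by your login tooling>
+
+# Bucket and KMS key used by the examples (placeholders):
+export TOOLING_BUCKET_NAME="<your bucket name>"
+export TOOLING_KMS_KEY_ID="<your KMS key ID>"
+
+spark-submit --master yarn --deploy-mode client \
+  --class za.co.absa.atum.examples.SampleSdkS3Measurements1 \
+  atum-examples-s3-sdk-extension_2.11-0.0.1-SNAPSHOT.jar
+```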