ML XGBoost Classification #484

Closed
wants to merge 15 commits into from
6 changes: 3 additions & 3 deletions .github/workflows/docs.yml
@@ -19,7 +19,7 @@ jobs:
 - run: |
 npm install
 npm run generate
-working-directory: tests
+working-directory: dev
 - name: clone gh-pages and clean-up
 if: ${{ env.GITHUB_REF_SLUG == 'master' }}
 run: |
@@ -31,8 +31,8 @@ jobs:
 if: ${{ env.GITHUB_REF_SLUG != 'master' }}
 run: mkdir gh-pages
 - run: |
-cp tests/docs.html index.html
-cp tests/processes.json processes.json
+cp dev/docs.html index.html
+cp dev/processes.json processes.json
 rsync -vrm --include='*.json' --include='*.html' --include='meta/***' --include='proposals/***' --exclude='*' . gh-pages
 - name: deploy to root (master)
 uses: peaceiris/actions-gh-pages@v3
6 changes: 3 additions & 3 deletions .github/workflows/tests.yml
@@ -8,8 +8,8 @@ jobs:
 with:
 node-version: 'lts/*'
 - uses: actions/checkout@v3
-- name: Run tests
+- name: Run linter
 run: |
 npm install
-npm run test
-working-directory: tests
+npm test
+working-directory: dev
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -16,7 +16,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `filter_vector`
 - `flatten_dimensions`
 - `load_geojson`
+- `load_ml_model`
 - `load_url`
+- `ml_fit_class_random_forest`
+- `ml_fit_regr_random_forest`
+- `ml_fit_class_xgboost`
+- `ml_predict`
+- `save_ml_model`
 - `unflatten_dimension`
 - `vector_buffer`
 - `vector_reproject`
3 changes: 3 additions & 0 deletions dev/.gitignore
@@ -0,0 +1,3 @@
/node_modules/
/package-lock.json
/processes.json
57 changes: 57 additions & 0 deletions dev/.words
@@ -0,0 +1,57 @@
0-to-9
1-to-0
anno
behavior
boolean
center
centers
dekad
DEM-based
Domini
gamma0
GeoJSON
FeatureCollections
labeled
MathWorld
n-ary
neighbor
neighborhood
neighborhoods
openEO
orthorectification
orthorectified
radiometrically
reflectances
reproject
reprojected
Reprojects
resample
resampled
resamples
Resamples
resampling
Sentinel-2
Sentinel-2A
Sentinel-2B
signum
STAC
catalog
Catalog
summand
UDFs
gdalwarp
Lanczos
sinc
interpolants
Breiman
Hyndman
date1
date2
favor
XGBoost
Chen
Guestrin
early_stopping_rounds
Subsample
hessian
overfitting
30 changes: 30 additions & 0 deletions dev/README.md
@@ -0,0 +1,30 @@
# Tests for openEO Processes

To run the tests, follow these steps:

1. Install [node and npm](https://nodejs.org) - should run with any recent version
2. Run `npm install` in this folder to install the dependencies
3. Run the tests with `npm test`. This also lints the files and verifies that they follow best practices.
4. To show the files nicely formatted in a web browser, run `npm start`. It starts a server and opens the corresponding page in a web browser.

## Development processes

All new processes must be added to the `proposals` folder. Each process must be declared to be `experimental`.
Processes must comply with best practices, which ensure a certain degree of consistency.
`npm test` will validate and lint the processes and also ensure the best practices are applied.

The linting checks that the files are named correctly, that the content is correctly formatted and indented (JSON and embedded CommonMark).
The best practices ensure, for example, that the fields are neither too short nor too long.

A spell checker also checks the texts. It may report names and rarely used technical words as errors.
If you are sure that these are correct, you can add them to the `.words` file so that they are no longer reported as errors.
The file must contain one word per line.
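
As a rough sketch of the one-word-per-line rule, the following JavaScript helper returns the updated content of a `.words` file after adding a word. The function name `addIgnoredWord` is hypothetical and not part of this repository or of the linter. It assumes that entries are compared case-sensitively, which matches the file listing both `catalog` and `Catalog`.

```javascript
// Hypothetical helper (not part of the repository): returns the updated
// content of a `.words` file after adding `word`, one word per line.
function addIgnoredWord(content, word) {
  // Split into lines and drop empty ones so stray blank lines do not accumulate.
  const words = content.split(/\r?\n/).filter((line) => line.trim() !== '');
  // Only append the word if it is not listed yet (exact, case-sensitive match).
  if (!words.includes(word)) {
    words.push(word);
  }
  // Terminate the content with a single trailing newline.
  return words.join('\n') + '\n';
}

const updated = addIgnoredWord('XGBoost\nChen\n', 'Guestrin');
console.log(updated);
```

In practice, appending the word to `.words` by hand in a text editor is just as good; the helper only illustrates the expected file layout.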

New processes should be added via GitHub Pull Requests.

## Subtype schemas

Sometimes it is useful to define a new "data type" on top of the JSON types (number, string, array, object, ...).
For example, a client could make a select box with all collections available by adding a subtype `collection-id` to the JSON type `string`.
If you think a new subtype should be added, you need to add it to the `meta/subtype-schemas.json` file.
It must be a valid JSON Schema. The tests mentioned above will also verify to a certain degree that the subtypes are defined correctly.
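
To illustrate the select-box idea, here is a minimal JavaScript sketch of how a client might react to a subtype. The function `chooseWidget` and the widget names are assumptions made up for this example; they do not come from any openEO client library.

```javascript
// Schema of a string parameter that carries the `collection-id` subtype.
const parameterSchema = { type: 'string', subtype: 'collection-id' };

// Hypothetical client-side helper: pick an input widget based on the schema.
// The widget names are invented for this example.
function chooseWidget(schema) {
  if (schema.type === 'string' && schema.subtype === 'collection-id') {
    return 'collection-select-box';
  }
  // Without a known subtype, fall back to a generic input for the JSON type.
  return schema.type === 'string' ? 'text-input' : 'generic-input';
}

console.log(chooseWidget(parameterSchema));
```

A client that does not know a subtype can still treat the value as a plain JSON type, which is why the sketch falls back to a generic input.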
125 changes: 125 additions & 0 deletions dev/docs.html
@@ -0,0 +1,125 @@
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<title>openEO API Processes</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:300,400,400i,700|Roboto+Mono">
<script src="https://cdn.jsdelivr.net/npm/[email protected]"></script>
<script src="https://cdn.jsdelivr.net/npm/@openeo/processes-docgen@1/dist/DocGen.umd.min.js"></script>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/@openeo/processes-docgen@1/dist/DocGen.css">
<style>
html {
box-sizing: border-box;
text-size-adjust: none;
height: 100%;
font-size: 62.5%;
overflow-x: hidden;
}

@media only screen and (min-width: 100em) {
html {
font-size: 68.75%;
}
}

@media only screen and (min-width: 125em) {
html {
font-size: 75%;
}
}

*,
*::before,
*::after {
box-sizing: inherit;
}

body {
margin: 0;
position: relative;
height: 100%;
}

hr {
overflow: visible;
box-sizing: content-box;
display: block;
height: 0.1rem;
padding: 0;
border: 0;
}

a {
text-decoration: none;
}

small {
font-size: 80%;
}

sub, sup {
position: relative;
font-size: 80%;
line-height: 0;
vertical-align: baseline;
}
sub {
bottom: -0.25em;
}
sup {
top: -0.5em;
}

table {
border-collapse: separate;
border-spacing: 0;
}

body, input {
color: rgba(0, 0, 0, 0.87);
-webkit-font-feature-settings: "kern", "liga";
font-feature-settings: "kern", "liga";
font-family: "Roboto", "Helvetica Neue", Helvetica, Arial, sans-serif;
}

pre, code, kbd {
color: rgba(0, 0, 0, 0.87);
-webkit-font-feature-settings: "kern";
font-feature-settings: "kern";
font-family: "Roboto Mono", "Courier New", Courier, monospace;
}

.anchor {
position: relative;
display: block;
visibility: hidden;
}

#container {
font-size: 1.45rem;
height: 100%;
}
</style>
</head>
<body dir="ltr">
<div id="container">
<div id="app"></div>
</div>
<script>
new Vue({
el: '#app',
render: h => h(DocGen, {
props: {
document: 'processes.json',
categorize: true,
apiVersion: '1.2.0',
title: 'openEO processes (2.0.0-rc.1)',
notice: '**Note:** This is the list of all processes specified by the openEO project. Back-ends implement a varying set of processes. Thus, the processes you can use at a specific back-end may deviate from the specification, may include non-standardized processes and may not implement all processes listed here. Please check each back-end individually for the processes they support. The client libraries usually have a function called `listProcesses` or `list_processes` for that.'
}
})
});
</script>
<noscript>Sorry, the documentation generator requires JavaScript to be enabled!</noscript>
</body>
</html>
30 changes: 30 additions & 0 deletions dev/package.json
@@ -0,0 +1,30 @@
{
"name": "@openeo/processes",
"version": "2.0.0-rc.1",
"author": "openEO Consortium",
"contributors": [
{
"name": "Matthias Mohr"
}
],
"license": "Apache-2.0",
"description": "Validates the processes specified in this repository.",
"homepage": "http://openeo.org",
"bugs": {
"url": "https://github.com/Open-EO/openeo-processes/issues"
},
"repository": {
"type": "git",
"url": "git+https://github.com/Open-EO/openeo-processes.git"
},
"devDependencies": {
"@openeo/processes-lint": "^0.1.5",
"concat-json-files": "^1.1.0",
"http-server": "^14.1.1"
},
"scripts": {
"test": "openeo-processes-lint testConfig.json",
"generate": "concat-json-files \"../{*,proposals/*}.json\" -t \"processes.json\"",
"start": "npm run generate && http-server -p 9876 -o docs.html -c-1"
}
}
14 changes: 14 additions & 0 deletions dev/testConfig.json
@@ -0,0 +1,14 @@
{
"folder": "../",
"proposalsFolder": "../proposals/",
"ignoredWords": ".words",
"anyOfRequired": [
"array_element",
"quantiles"
],
"subtypeSchemas": "../meta/subtype-schemas.json",
"checkSubtypeSchemas": true,
"forbidDeprecatedTypes": false,
"checkProcessLinks": true,
"verbose": false
}
6 changes: 6 additions & 0 deletions meta/subtype-schemas.json
@@ -232,6 +232,12 @@
 }
 }
 },
+"ml-model": {
+"type": "object",
+"subtype": "ml-model",
+"title": "Machine Learning Model",
+"description": "A machine learning model, accompanied with STAC metadata that implements the STAC ml-model extension."

Member

> accompanied with STAC metadata that implements the STAC ml-model extension

What does this practically mean here in this context of defining a JSON schema? Isn't that more a concern of a process like save_ml_model that actually "exports" the model to a more concrete form?

Member Author

> accompanied with STAC metadata that implements the STAC ml-model extension
>
> What does this practically mean here in this context of defining a JSON schema? Isn't that more a concern of a process like save_ml_model that actually "exports" the model to a more concrete form?

This was to remove the error due to the ml model being returned. I created a branch from the draft and not the ml branch.

+},
 "output-format": {
 "type": "string",
 "subtype": "output-format",