Merge pull request #692 from automl/development
Merge into master
mfeurer authored Sep 25, 2020
2 parents 9890e0c + 6e68d5f commit 9d7d09d
Showing 72 changed files with 6,334 additions and 1,521 deletions.
2 changes: 0 additions & 2 deletions .travis.yml
@@ -12,8 +12,6 @@ matrix:

include:
# Unit tests
-    - os: linux
-      env: TESTSUITE=run_unittests.sh PYTHON_VERSION="3.5" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
- os: linux
env: TESTSUITE=run_unittests.sh PYTHON_VERSION="3.6" MINICONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh"
- os: linux
12 changes: 6 additions & 6 deletions README.md
@@ -2,7 +2,7 @@

Copyright (C) 2016-2018 [AutoML Group](http://www.automl.org/)

-__Attention__: This package is a re-implementation of the original SMAC tool
+__Attention__: This package is a reimplementation of the original SMAC tool
(see reference below).
However, the reimplementation slightly differs from the original SMAC.
For comparisons against the original SMAC, we refer to a stable release of SMAC (v2) in Java
@@ -16,7 +16,7 @@ Status for master branch:
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/58f47a4bd25e45c9a4901ebca68118ff?branch=master)](https://www.codacy.com/app/automl/SMAC3?utm_source=github.com&utm_medium=referral&utm_content=automl/SMAC3&utm_campaign=Badge_Grade)
[![codecov Status](https://codecov.io/gh/automl/SMAC3/branch/master/graph/badge.svg)](https://codecov.io/gh/automl/SMAC3)

-Status for development branch
+Status for the development branch

[![Build Status](https://travis-ci.org/automl/SMAC3.svg?branch=development)](https://travis-ci.org/automl/SMAC3)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/58f47a4bd25e45c9a4901ebca68118ff?branch=development)](https://www.codacy.com/app/automl/SMAC3?utm_source=github.com&utm_medium=referral&utm_content=automl/SMAC3&utm_campaign=Badge_Grade)
@@ -27,8 +27,8 @@ Status for development branch
SMAC is a tool for algorithm configuration to optimize the parameters of
arbitrary algorithms across a set of instances. This also includes
hyperparameter optimization of ML algorithms. The main core consists of
-Bayesian Optimization in combination with a aggressive racing mechanism to
-efficiently decide which of two configuration performs better.
+Bayesian Optimization in combination with an aggressive racing mechanism to
+efficiently decide which of two configurations performs better.

For a detailed description of its main idea,
we refer to
@@ -38,7 +38,7 @@ we refer to
In: Proceedings of the conference on Learning and Intelligent OptimizatioN (LION 5)


-SMAC v3 is written in Python3 and continuously tested with python3.5 and
+SMAC v3 is written in Python3 and continuously tested with Python 3.6 and
python3.6. Its [Random Forest](https://github.com/automl/random_forest_run)
is written in C++.

@@ -97,7 +97,7 @@ pip install smac[gp]
pip install .[gp,lhd]
```

-For convenience there is also an `all` meta-dependency that installs all optional dependencies:
+For convenience, there is also an `all` meta-dependency that installs all optional dependencies:
```
pip install smac[all]
```
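
For orientation, here is a minimal quick-start sketch of the `fmin_smac` facade used by the new examples further down in this diff. It is illustrative only (not part of the commit), and the quadratic objective is a made-up placeholder.

```
from smac.facade.func_facade import fmin_smac


def quadratic(x):
    # SMAC passes the candidate parameter vector as a sequence of floats
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2


# Minimize within the given bounds using at most 20 function evaluations.
# fmin_smac returns the best parameter vector, its cost, and the SMAC object.
x, cost, smac = fmin_smac(func=quadratic,
                          x0=[0.0, 0.0],
                          bounds=[(-5, 5), (-5, 5)],
                          maxfun=20,
                          rng=1)
print("Best x: %s; with cost: %f" % (str(x), cost))
```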
18 changes: 18 additions & 0 deletions changelog.md
@@ -1,3 +1,21 @@
# 0.13.0

## Major Changes
* Split choosing next challenger from evaluating challenger (#663)
* Implemented parallel SMAC using dask (#675, #677, #681, #685, #686)
* Drop support for Python 3.5

## Minor Changes
* Update Readme
* Remove runhistory from TAE (#663)
* Store SMAC's internal config id in the configuration object (#679)
* Introduce Status Type STOP (#690)

## Bug Fixes
* Only validate restriction of Sobol Sequence when choosing Sobol Sequence (#664)
* Fix wrong initialization of list in local search (#680)
* Fix setting random seed with a too small range in Latin Hypercube design (#688)

# 0.12.3

## Minor Changes
1 change: 1 addition & 0 deletions doc/conf.py
@@ -323,4 +323,5 @@
# compile execute examples in the examples dir
'filename_pattern': '.*example.py$|.*tutorial.py$',
# TODO: fix back/forward references for the examples.
'ignore_pattern': '.*_func.py'
}
2 changes: 1 addition & 1 deletion doc/index.rst
@@ -55,7 +55,7 @@ Contents:
| howpublished={\\url{https://github.com/automl/SMAC3}}
| }
-SMAC3 is mainly written in Python 3 and continuously tested with Python 3.5-3.6.
+SMAC3 is mainly written in Python 3 and continuously tested with Python 3.6-3.8.
Its `Random Forest <https://github.com/automl/random_forest_run>`_ is written in
C++11.

45 changes: 45 additions & 0 deletions examples/fmin_rosenbrock_parallel.py
@@ -0,0 +1,45 @@
"""
============================================
Parallel Intensifier with No Intensification
============================================
This example showcases how to use dask to
launch parallel configurations via the n_jobs argument
"""

import logging

from smac.intensification.simple_intensifier import SimpleIntensifier
from smac.facade.func_facade import fmin_smac

# --------------------------------------------------------------
# We need to provide a picklable function and use __main__
# to be compliant with the multiprocessing API.
# Below is a workaround to have a packaged function called
# rosenbrock_2d
# --------------------------------------------------------------
import os
import sys
sys.path.append(os.path.join(os.path.dirname(__file__)))
from rosenbrock_2d_delayed_func import rosenbrock_2d # noqa: E402
# --------------------------------------------------------------

if __name__ == '__main__':

# debug output
logging.basicConfig(level=20)
logger = logging.getLogger("Optimizer") # Enable to show Debug outputs

    # fmin_smac assumes that the function is deterministic
    # and uses the SMAC4HPO facade under the hood.
    # n_jobs tells the SMBO loop how many configurations to evaluate in parallel
x, cost, smac = fmin_smac(
func=rosenbrock_2d,
intensifier=SimpleIntensifier,
x0=[-3, -4],
bounds=[(-5, 10), (-5, 10)],
maxfun=25,
rng=3,
n_jobs=4,
    )  # Passing a seed makes fmin_smac deterministic
print("Best x: %s; with cost: %f" % (str(x), cost))
59 changes: 8 additions & 51 deletions examples/hyperband_mlp.py
@@ -11,64 +11,21 @@
"""

import logging
import warnings

import numpy as np
from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
UniformFloatHyperparameter, UniformIntegerHyperparameter
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.neural_network import MLPClassifier

import numpy as np

from smac.configspace import ConfigurationSpace
from smac.facade.hyperband_facade import HB4AC
from smac.scenario.scenario import Scenario

digits = load_digits()


# Target Algorithm
# The signature of the function determines what arguments are passed to it
# i.e., budget is passed to the target algorithm if it is present in the signature
def mlp_from_cfg(cfg, seed, instance, budget, **kwargs):
"""
Creates a MLP classifier from sklearn and fits the given data on it.
This is the function-call we try to optimize. Chosen values are stored in
the configuration (cfg).
Parameters
----------
cfg: Configuration
configuration chosen by smac
seed: int or RandomState
used to initialize the rf's random generator
instance: str
used to represent the instance to use (just a placeholder for this example)
budget: float
used to set max iterations for the MLP
Returns
-------
float
"""

with warnings.catch_warnings():
warnings.filterwarnings('ignore', category=ConvergenceWarning)

mlp = MLPClassifier(
hidden_layer_sizes=[cfg["n_neurons"]] * cfg["n_layer"],
batch_size=cfg['batch_size'],
activation=cfg['activation'],
learning_rate_init=cfg['learning_rate_init'],
max_iter=int(np.ceil(budget)),
random_state=seed)

# returns the cross validation accuracy
cv = StratifiedKFold(n_splits=5, random_state=seed, shuffle=True) # to make CV splits consistent
score = cross_val_score(mlp, digits.data, digits.target, cv=cv, error_score='raise')

return 1 - np.mean(score) # Because minimize!
# --------------------------------------------------------------
import os
import sys
sys.path.append(os.path.join(os.path.dirname(__file__)))
from mlp_from_cfg_func import mlp_from_cfg # noqa: E402
# --------------------------------------------------------------


logger = logging.getLogger("MLP-example")
56 changes: 56 additions & 0 deletions examples/mlp_from_cfg_func.py
@@ -0,0 +1,56 @@
import warnings

import numpy as np

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.neural_network import MLPClassifier


# A common target function to be optimized by a real-valued intensifier
digits = load_digits()


# Target Algorithm
# The signature of the function determines what arguments are passed to it
# i.e., budget is passed to the target algorithm if it is present in the signature
def mlp_from_cfg(cfg, seed, instance, budget, **kwargs):
"""
    Creates an MLP classifier from sklearn and fits the given data on it.
    This is the function call we try to optimize. Chosen values are stored in
    the configuration (cfg).

    Parameters
    ----------
    cfg: Configuration
        configuration chosen by smac
    seed: int or RandomState
        used to initialize the MLP's random number generator
    instance: str
        used to represent the instance to use (just a placeholder for this example)
    budget: float
        used to set the maximum number of iterations for the MLP

    Returns
    -------
    float
        1 - mean cross-validation accuracy (SMAC minimizes this value)
"""

with warnings.catch_warnings():
warnings.filterwarnings('ignore', category=ConvergenceWarning)

mlp = MLPClassifier(
hidden_layer_sizes=[cfg["n_neurons"]] * cfg["n_layer"],
batch_size=cfg['batch_size'],
activation=cfg['activation'],
learning_rate_init=cfg['learning_rate_init'],
max_iter=int(np.ceil(budget)),
random_state=seed)

# returns the cross validation accuracy
# to make CV splits consistent
cv = StratifiedKFold(n_splits=5, random_state=seed, shuffle=True)
score = cross_val_score(mlp, digits.data, digits.target, cv=cv, error_score='raise')

return 1 - np.mean(score) # Because minimize!
99 changes: 99 additions & 0 deletions examples/parallel_sh_mlp.py
@@ -0,0 +1,99 @@
"""
=================================================
Optimizing an MLP with Parallel SuccessiveHalving
=================================================
An example of using the model-free SuccessiveHalving intensifier in SMAC
for parallel execution. The configurations are randomly sampled.
This example uses a real-valued SuccessiveHalving budget measured in epochs.
Four workers are allocated for this run. As soon as any worker is idle,
SMAC internally creates more SuccessiveHalving instances to take
advantage of the idle resources.
"""

import logging

import numpy as np
from ConfigSpace.hyperparameters import CategoricalHyperparameter, \
UniformFloatHyperparameter, UniformIntegerHyperparameter

from smac.configspace import ConfigurationSpace
from smac.facade.roar_facade import ROAR
from smac.scenario.scenario import Scenario
from smac.intensification.successive_halving import SuccessiveHalving
from smac.initial_design.random_configuration_design import RandomConfigurations

# --------------------------------------------------------------
# We need to provide a picklable function and use __main__
# to be compliant with the multiprocessing API.
# Below is a workaround to have a packaged function called
# mlp_from_cfg_func
# --------------------------------------------------------------
import os
import sys
sys.path.append(os.path.join(os.path.dirname(__file__)))
from mlp_from_cfg_func import mlp_from_cfg # noqa: E402
# --------------------------------------------------------------

if __name__ == '__main__':

logger = logging.getLogger("MLP-example")
logging.basicConfig(level=logging.INFO)

# Build Configuration Space which defines all parameters and their ranges.
# To illustrate different parameter types,
# we use continuous, integer and categorical parameters.
cs = ConfigurationSpace()

# We can add multiple hyperparameters at once:
n_layer = UniformIntegerHyperparameter("n_layer", 1, 4, default_value=1)
n_neurons = UniformIntegerHyperparameter("n_neurons", 8, 512, log=True, default_value=10)
activation = CategoricalHyperparameter("activation", ['logistic', 'tanh', 'relu'],
default_value='tanh')
batch_size = UniformIntegerHyperparameter('batch_size', 30, 300, default_value=200)
learning_rate_init = UniformFloatHyperparameter('learning_rate_init', 0.0001, 1.0, default_value=0.001, log=True)
cs.add_hyperparameters([n_layer, n_neurons, activation, batch_size, learning_rate_init])

# SMAC scenario object
scenario = Scenario({"run_obj": "quality", # we optimize quality (alternative to runtime)
"wallclock-limit": 100, # max duration to run the optimization (in seconds)
"cs": cs, # configuration space
"deterministic": "true",
"limit_resources": True, # Uses pynisher to limit memory and runtime
# Alternatively, you can also disable this.
# Then you should handle runtime and memory yourself in the TA
"cutoff": 20, # runtime limit for target algorithm
"memory_limit": 3072, # adapt this to reasonable value for your hardware
})

# Intensification parameters
# Intensifier will allocate from 5 to a maximum of 25 epochs to each configuration
# Successive Halving child-instances are created to prevent idle
# workers.
intensifier_kwargs = {'initial_budget': 5, 'max_budget': 25, 'eta': 3,
'min_chall': 1, 'instance_order': 'shuffle_once'}

# To optimize, we pass the function to the SMAC-object
smac = ROAR(scenario=scenario, rng=np.random.RandomState(42),
tae_runner=mlp_from_cfg,
intensifier=SuccessiveHalving,
intensifier_kwargs=intensifier_kwargs,
initial_design=RandomConfigurations,
n_jobs=4)

# Example call of the function with default values
# It returns: Status, Cost, Runtime, Additional Infos
def_value = smac.get_tae_runner().run(config=cs.get_default_configuration(),
instance='1', budget=25, seed=0)[1]
print("Value for default configuration: %.4f" % def_value)

# Start optimization
try:
incumbent = smac.optimize()
finally:
incumbent = smac.solver.incumbent

inc_value = smac.get_tae_runner().run(config=incumbent, instance='1',
budget=25, seed=0)[1]
print("Optimized Value: %.4f" % inc_value)
