#Recent Changes

##H2O

###Turing (3.10.0.7) - 9/19/2016

Bug

[PUBDEV-3300] - NPE during categorical encoding with cross-validation (Windows 8 runit only??)
[PUBDEV-3306] - H2OFrame arithmetic/statistical functions return inconsistent types
[PUBDEV-3315] - Multi file parse fails with NPE
[PUBDEV-3374] - h2o.hist() does not respect breaks
[PUBDEV-3401] - importFiles, with s3n, gives NullPointerException
[PUBDEV-3409] - Python Structure() Breaks When Applied to Entire Dataframe

New Feature

[PUBDEV-2707] - Diff operation on column in H2O Frame
[HEXDEV-619] - Calculate residuals in H2O-3 and in Flow, and create a new frame with a new column that contains the residuals
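
The two features above can be illustrated with a minimal plain-Python sketch. It does not use the H2O API. The helper names `diff` and `residuals`, the `lag` parameter, and the sign convention (actual minus predicted) are assumptions made for illustration only.

```python
def diff(values, lag=1):
    """Lagged difference of a column: out[i] = values[i] - values[i - lag].

    The first `lag` entries have no predecessor and are reported as None.
    """
    return [None] * lag + [values[i] - values[i - lag] for i in range(lag, len(values))]


def residuals(actual, predicted):
    """Per-row residuals, assuming the convention actual - predicted."""
    return [a - p for a, p in zip(actual, predicted)]


column = [3.0, 5.0, 4.0, 8.0]
print(diff(column))                        # [None, 2.0, -1.0, 4.0]
print(residuals([3.0, 5.0], [2.5, 5.5]))   # [0.5, -0.5]
```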

Task

[PUBDEV-2785] - Clean up Python booklet code in repo

Improvement

[PUBDEV-3296] - In R, allow x to be missing (meaning take all columns except y) for all supervised algos
[PUBDEV-3329] - median() should return a list of medians from an entire frame
[PUBDEV-3334] - Conduct rbind and cbind on multiple frames
[PUBDEV-3387] - Add argument to H2OFrame.print in R to specify number of rows
[PUBDEV-3418] - Suppress chunk summary in describe()

###Turing (3.10.0.6) - 8/25/2016

Bug

[HEXDEV-608] - Hashmap in H2OIllegalArgumentException fails to deserialize & throws FATAL
[PUBDEV-2879] - NPE in MetadataHandler
[PUBDEV-3086] - hist() fails for constant numeric columns
[PUBDEV-3173] - Client mode: flatfile requires list of all nodes, but a single entry node should be sufficient
[PUBDEV-3207] - Make CreateFrame reproducible for categorical columns.
[PUBDEV-3208] - Fix intermittency of categorical encoding via eigenvector.
[PUBDEV-3211] - isBitIdentical is returning true for two Frames with different content
[PUBDEV-3222] - AssertionError for DL train/valid with categorical encoding
[PUBDEV-3237] - Wrong MAE for observation weights other than 1.
[PUBDEV-3244] - H2ODriver for CDH5.7.0 does not accept memory settings
[PUBDEV-3276] - H2OFrame.drop() leaves the frame in inconsistent state

New Feature

[PUBDEV-3007] - Implement skewness calculation for H2O Frames
[PUBDEV-3008] - Implement kurtosis calculation for H2O Frames
[PUBDEV-3128] - Add ability to do a deep copy in Python API
[PUBDEV-3163] - Add docs for h2o.make_metrics() for R and Python
[PUBDEV-3218] - Add RMSLE to model metrics
[PUBDEV-3264] - Return unique values of a categorical column as a Pythonic list
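
For readers unfamiliar with the statistics named above, here is a hedged plain-Python sketch of skewness, kurtosis, and RMSLE. It assumes population (biased) moment estimators and non-excess kurtosis; H2O's own normalization may differ, and none of these helper names come from the H2O API.

```python
import math


def central_moment(xs, k):
    """k-th central moment, dividing by n (population estimator)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Third standardized moment; 0 for a symmetric sample."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Fourth standardized moment (not excess kurtosis)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def rmsle(actual, predicted):
    """Root mean squared logarithmic error, using log(1 + x)."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

RMSLE is only defined when both actual and predicted values are greater than -1, since it takes `log(1 + x)`.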

Task

[PUBDEV-3235] - Refactor and simplify implementation of Pearson Correlation
[PUBDEV-3238] - Add MAE to CV Summary

Improvement

[PUBDEV-2702] - Create h2o.* functions for H2O primitives
[PUBDEV-3098] - Add methods to get actual and default parameters of a model
[PUBDEV-3132] - Add ability to drop a list of columns or a subset of rows from an H2OFrame
[PUBDEV-3138] - Ensure all is*() functions return a list

###Turing (3.10.0.3) - 7/29/2016

Bug

[PUBDEV-2805] - Error when setting a string column to a single value in R/Py
[PUBDEV-2965] - R h2o.merge() ignores by.x and by.y
[PUBDEV-3135] - Download Logs broken URL from Flow

New Feature

[PUBDEV-2958] - H2O Version Check
[PUBDEV-3022] - Add an h2o.concat function equivalent to pandas.concat
[PUBDEV-3050] - Add Huber loss function for GBM and DL (for regression)
[PUBDEV-3071] - Add RMSE to model metrics
[PUBDEV-3104] - Add Mean Absolute Error to Model Metrics
[PUBDEV-3108] - Add mean absolute error to scoring history and model plotting
[PUBDEV-3116] - Add categorical encoding schemes for DL and Aggregator
[PUBDEV-3155] - Compute supervised ModelMetrics from predicted and actual values in Java/R
[PUBDEV-3162] - Compute supervised ModelMetrics from predicted and actual values in Python
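
The metrics and loss added in this release can be sketched from predicted and actual values as follows. This is a plain-Python illustration under stated assumptions, not the H2O implementation: the function names are hypothetical, `delta` is a fixed Huber threshold chosen by the caller, and the weighted MAE assumes normalization by the sum of the observation weights.

```python
import math


def mae(actual, predicted, weights=None):
    """Mean absolute error, optionally weighted per observation."""
    weights = weights or [1.0] * len(actual)
    total = sum(w * abs(a - p) for a, p, w in zip(actual, predicted, weights))
    return total / sum(weights)


def rmse(actual, predicted):
    """Root mean squared error (square root of MSE)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def huber(residual, delta):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    r = abs(residual)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
```

Because Huber loss grows linearly for large residuals, it is less sensitive to outliers than squared error, which is the usual motivation for offering it in regression.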

Improvement

[PUBDEV-1888] - Implement gradient checking for DL
[PUBDEV-2627] - Add better warning message to functions of H2OModelMetrics objects
[PUBDEV-3021] - Add demo datasets to Python package
[PUBDEV-3113] - Replace "MSE" with "RMSE" in scoring history table
[PUBDEV-3122] - Make all TwoDimTable Headers Pythonic in R and Python API
[PUBDEV-3129] - Achieve consistency between DL and GBM/RF scoring history in regression case
[PUBDEV-3131] - Disable R^2 stopping criterion in tree model builders
[PUBDEV-3149] - Remove R^2 from all model output except GLM

###Turin (3.8.3.4) - 7/15/2016

Bug

[PUBDEV-3040] - File parse from S3 extremely slow
[PUBDEV-3145] - Fix Deep Learning POJO for hidden dropout other than 0.5

###Turin (3.8.3.2) - 7/1/2016

Bug

[PUBDEV-898] - DRF: sample_rate=1 not permitted unless validation is performed
[PUBDEV-2087] - Create a set of tests that create large POJOs for each algo and compile them
[PUBDEV-2322] - Merge (method="radix") bug1
[PUBDEV-2325] - Merge (method="radix") bug2
[PUBDEV-2565] - Fold Column not available in h2o.grid
[PUBDEV-2964] - h2o.merge(,method="radix") failing 15/40 runs
[PUBDEV-3030] - Parse: java.lang.IllegalArgumentException: 0 > -2147483648
[PUBDEV-3032] - Cached errors are not printed if H2O exits
[PUBDEV-3072] - java.lang.ClassCastException for Quantile GBM
[PUBDEV-3077] - model_summary number of trees is too high for multinomial DRF/GBM models
[PUBDEV-3079] - NPE when accessing invalid null Frame cache in a Frame's vecs()
[PUBDEV-3081] - TwoDimTable version of a Frame prints missing value (NA) as 0
[PUBDEV-3089] - Fix tree split finding logic for some cases where min_rows wasn't satisfied and the entire column was no longer considered even if there were allowed split points
[PUBDEV-3093] - saveModel and loadModel don't work with windows c:/ paths
[PUBDEV-3095] - getStackTrace fails on NumberFormatException
[PUBDEV-3096] - TwoDimTable for Frame Summaries doesn't always show the full precision
[PUBDEV-3097] - DRF OOB scoring isn't using observation weights
[PUBDEV-3099] - AIOOBE when calling 'getModel' in Flow while a GLM model is training

Task

[PUBDEV-2681] - Properly document the addition of missing_values_handling arg to GLM

Improvement

[PUBDEV-1617] - Matt's new merge (aka join) integrated into H2O
[PUBDEV-2822] - Improved handling of missing values in tree models (training and testing)
[PUBDEV-3060] - IPv6 documentation
[PUBDEV-3066] - Stop GBM models once the effective learning rate drops below 1e-6.
[PUBDEV-3094] - Log input parameters during boot of H2O
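
To give a feel for the 1e-6 stopping rule in PUBDEV-3066, the sketch below counts how many trees are built before the effective learning rate falls below the floor. It assumes the effective rate after n trees is `learn_rate * learn_rate_annealing ** n` (the annealing parameter comes from PUBDEV-2836); the function name and the `max_trees` cap are hypothetical.

```python
def trees_until_rate_floor(learn_rate, learn_rate_annealing, floor=1e-6, max_trees=100000):
    """Number of trees built before the effective learning rate drops below `floor`.

    Assumes the rate is multiplied by `learn_rate_annealing` after each tree.
    """
    rate = learn_rate
    for n_trees in range(max_trees):
        if rate < floor:
            return n_trees
        rate *= learn_rate_annealing
    return max_trees


print(trees_until_rate_floor(0.1, 0.99))  # 1146
```

With no annealing (a factor of 1.0) the rate never decays, so the sketch simply returns `max_trees`.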

###Turchin (3.8.2.9) - 6/10/2016

Bug

[PUBDEV-2920] - Python apply() doesn't recognize % (modulo) within lambda function
[PUBDEV-2940] - Documentation: Add RoundRobin histogram_type to GBM/DRF
[PUBDEV-2957] - Add "seed" option to GLM in documentation
[PUBDEV-2973] - Documentation: Update supported Hadoop versions
[PUBDEV-2981] - Models hang when max_runtime_secs is too small
[PUBDEV-2982] - Default min/max_mem_size to gigabytes in h2o.init
[PUBDEV-2997] - Add "ignore_const_cols" argument to glm and gbm for Python API
[PUBDEV-2999] - AIOOBE in GBM if no nodes are split during tree building
[PUBDEV-3004] - Negative R^2 (now NaN) can prevent early stopping
[PUBDEV-3011] - Two grid sorting methods in Py API - only one works sometimes

New Feature

[PUBDEV-2743] - Add seed argument to GLM
[PUBDEV-2917] - Add cor() function to Rapids

Task

[PUBDEV-3005] - Verify checkpoint argument in h2o.gbm (for R)

Improvement

[PUBDEV-2040] - Sync up argument names in `h2o.init` between R and Python
[PUBDEV-2996] - Change `getjar` to `get_jar` in h2o.download_pojo in R
[PUBDEV-2998] - Change min_split_improvement default value from 0 to 1e-5 for GBM/DRF
[PUBDEV-3013] - Allow specification of "AUC" or "auc" or "Auc" for stopping_metrics, sorting of grids, etc.

###Turchin (3.8.2.8) - 6/2/2016

Bug

[PUBDEV-2985] - Make Random grid search consistent between clients for same parameters
[PUBDEV-2987] - Allow learn_rate_annealing to be passed to H2OGBMEstimator constructor in Python API
[PUBDEV-2989] - Fix typo in GBM/DRF Python API for col_sample_rate_change_per_level - was misnamed and couldn't be set

New Feature

[PUBDEV-2979] - Add a new metric: mean misclassification error for classification models
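
One common reading of this metric is the average of the per-class error rates, which is what the sketch below computes. That interpretation, and the function name, are assumptions; the entry itself does not spell out the formula.

```python
from collections import defaultdict


def mean_per_class_error(actual, predicted):
    """Average of per-class misclassification rates (each class counts equally)."""
    total = defaultdict(int)
    wrong = defaultdict(int)
    for a, p in zip(actual, predicted):
        total[a] += 1
        if a != p:
            wrong[a] += 1
    return sum(wrong[c] / total[c] for c in total) / len(total)


actual = ["a", "a", "a", "a", "b"]
predicted = ["a", "a", "a", "a", "a"]
print(mean_per_class_error(actual, predicted))  # 0.5
```

In this example the overall error rate is only 0.2, but the rare class "b" is always misclassified, so the per-class average is 0.5. This is why such a metric is useful on imbalanced data.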

Improvement

[PUBDEV-2972] - No longer print negative R^2 values - show NaN instead
[PUBDEV-2984] - Add xval=True/False as an option to model_performance() in Python API

###Turchin (3.8.2.6) - 5/24/2016

Bug

[PUBDEV-1899] - Number of active predictors is off by 1 when Intercept is included
[PUBDEV-2942] - GLM with cross-validation AIOOBE (+ Grid-Search + Multinomial, may be related)
[PUBDEV-2943] - Improved accuracy for histogram_type="QuantilesGlobal" for DRF/GBM

New Feature

[PUBDEV-1705] - GLM needs 'seed' argument for new (random) implementation of n-folds
[PUBDEV-2743] - Add seed argument to GLM

Improvement

[PUBDEV-2928] - Remove _Dev from file name _DataScienceH2O-Dev
[PUBDEV-2945] - Clean up overly long and duplicate error message in KeyV3
[PUBDEV-2953] - Allow the user to pass column types of an existing H2OFrame during Parse/Upload in R and Python
[PUBDEV-2954] - Tweak Parser Heuristic
[PUBDEV-2955] - GLM improvements and fixes

###Turchin (3.8.2.5) - 5/19/2016

Technical task

[PUBDEV-2909] - Documentation update for relevel

Bug

[PUBDEV-2282] - DRF: cannot compile pojo
[PUBDEV-2304] - GBM pojo compile failures
[PUBDEV-2878] - Bug in h2o-py H2OScaler.inverse_transform()
[PUBDEV-2880] - Add NAOmit() to Rapids
[PUBDEV-2897] - AIOOBE in Vec.factor (due to Parse bug?)
[PUBDEV-2903] - In grid search, max_runtime_secs without max_models hangs
[PUBDEV-2933] - GBM's fold_assignment = "Stratified" breaks with missing values in response column

New Feature

[PUBDEV-2729] - Implement h2o.relevel, equivalent of base R's relevel function
[PUBDEV-2857] - Add Kerberos authentication to Flow
[PUBDEV-2893] - Summaries Fail in rdemo.citi.bike.small.R
[PUBDEV-2895] - DimReduction for EasyModelAPI
[PUBDEV-2915] - Make histograms truly adaptive (quantiles-based) for DRF/GBM

Task

[PUBDEV-2902] - Add a list of gridable parameters to the docs
[PUBDEV-2904] - Add relevel() to Python API

Improvement

[PUBDEV-2905] - Improve the progress bar based on max_runtime_secs & max_models & actual work
[PUBDEV-2908] - Improve GBM/DRF reproducibility for fixed parameters and hardware
[PUBDEV-2911] - Check sanity of random grid search parameters (max_models and max_runtime_secs)
[PUBDEV-2912] - Add Job's remaining time to Flow
[PUBDEV-2919] - Add enum option 'histogram_type' to DRF/GBM (and remove random_split_points)
[PUBDEV-2923] - JUnit: Separate POJO namespace during junit testing

###Turchin (3.8.2.3) - 4/25/2016

Bug

[PUBDEV-2852] - Incorrect sparse chunk getDoubles() extraction

New Feature

[PUBDEV-2825] - Create h2o.get_grid
[PUBDEV-2834] - Implement distributed Aggregator for visualization
[PUBDEV-2835] - Add col_sample_rate_change_per_level for GBM/DRF
[PUBDEV-2836] - Add learn_rate_annealing for GBM
[PUBDEV-2837] - Add random cut points for histograms in DRF/GBM (ExtraTreesClassifier)
[PUBDEV-2851] - Add limit on max. leaf node contribution for GBM

Task

[PUBDEV-2848] - Add tests for early stopping logic (stopping_rounds > 0)

Improvement

[PUBDEV-2877] - Make NA split decisions internally more consistent

###Turchin (3.8.2.2) - 4/8/2016

Bug

[PUBDEV-2820] - Implement max_runtime_secs to limit total runtime of building GLM models with and without cross-validation enabled

New Feature

[PUBDEV-2815] - Add stratified sampling per-tree for DRF/GBM

###Turchin (3.8.2.1) - 4/7/2016

Bug

[PUBDEV-2766] - AIOOBE for quantile regression with stochastic GBM
[PUBDEV-2770] - Naive Bayes AIOOBE
[PUBDEV-2772] - AIOOBE for GBM if test set has different number of classes than training set
[PUBDEV-2775] - Number of CPUs incorrect in Flow when using a hypervisor
[PUBDEV-2796] - Grid search runtime isn't enforced for CV models
[PUBDEV-2819] - AIOOBE in GLM for dense rows in sparse data

New Feature

[PUBDEV-2540] - Compute and display statistics of cross-validation model metrics
[PUBDEV-2774] - Add keep_cross_validation_fold_assignment and more CV accessors
[PUBDEV-2776] - Set initial weights and biases for DL models
[PUBDEV-2791] - Control min. relative squared error reduction for a node to split (DRF/GBM)
[PUBDEV-2806] - On-the-fly interactions for GLM
[PUBDEV-2815] - Add stratified sampling per-tree for DRF/GBM

Task

[PUBDEV-2055] - Create test cases to show that POJO prediction behavior can be different than in-h2o-model prediction behavior

Improvement

[PUBDEV-2620] - Populate start/end/duration time in milliseconds for all models
[PUBDEV-2695] - Consistent handling of missing categories in GBM/DRF (and between H2O and POJO)
[PUBDEV-2736] - Alert the user if columns can't be histogrammed due to numerical extremities
[PUBDEV-2756] - GLM should generate an error if the user enters an alpha value greater than 1.
[PUBDEV-2763] - Create full holdout prediction frame for cross-validation predictions
[PUBDEV-2769] - Support Validation Frame and Cross-Validation for Naive Bayes
[PUBDEV-2810] - Add class_sampling_factors argument to DRF/GBM for R and Python APIs

###Turan (3.8.1.4) - 3/16/16

Bug

[PUBDEV-542] - KMeans: Size of clusters in Model Output is different from the labels generated on the training set
[PUBDEV-1976] - GLM fails on negative alpha
[PUBDEV-2718] - countmatches bug
[PUBDEV-2727] - bug in processTables in communication.R
[PUBDEV-2742] - Allow strings to be set to NA

New Feature

[PUBDEV-2719] - Implement Shannon entropy for a string
[PUBDEV-2720] - Implement proportion of substrings that are valid English words
[PUBDEV-2733] - Add utility function, h2o.ensemble_performance for ensemble and base learner metrics
[PUBDEV-2741] - Add date/time and string columns to createFrame.
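
As an illustration of the string entropy feature, the following sketch computes character-level Shannon entropy in bits. The character-level granularity, the base-2 logarithm, and the function name are assumptions; the entry does not say which variant H2O uses.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Shannon entropy (bits per character) of the character distribution in `s`."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((count / n) * math.log2(count / n) for count in Counter(s).values())


print(shannon_entropy("abab"))  # 1.0
print(shannon_entropy("abcd"))  # 2.0
```

A string made of a single repeated character has zero entropy, while a string whose characters are all distinct and equally frequent reaches the maximum for its alphabet size.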

Task

[PUBDEV-58] - Certify sparkling water on CDH5.2

Improvement

[PUBDEV-277] - Make python equivalent of as.h2o() work for numpy array and pandas arrays

###Turan (3.8.1.3) - 3/6/16

Bug

[PUBDEV-2644] - Collinear columns cause NPE for P-values computation
[PUBDEV-2721] - Update default values in h2o.glm.wrapper from -1 and NaN to NULL
[PUBDEV-2722] - AIOOBE in NewChunk

New Feature

[PUBDEV-2111] - Hive UDF form for Scoring Engine POJO for H2O Models

###Turan (3.8.1.2) - 3/4/16

Bug

[PUBDEV-2713] - /3/scalaint fails with a 404

New Feature

[PUBDEV-2711] - Allow DL models to be pretrained on unlabeled data with an autoencoder

Improvement

[PUBDEV-2708] - H2O Flow does not contain CodeMirror library
[PUBDEV-2710] - Model export fails: parent directory does not exist
[PUBDEV-2712] - Flow doesn't show DL AE error (MSE) plot
[PUBDEV-2717] - Do not compute expensive quantiles during h2o.summary call

###Turan (3.8.1.1) - 3/3/16

Technical task

[PUBDEV-2705] - Implement random (stochastic) hyperparameter search

Bug

[PUBDEV-2639] - Parse: Incorrect assertion error caused by very large data with few columns
[PUBDEV-2649] - h2o::|,& operator handles NA's differently than base::|,&
[PUBDEV-2655] - h2o::as.logical behavior is different than base::as.logical
[PUBDEV-2682] - Importing CSV file is not working with "java -jar h2o.jar -nthreads -1"
[PUBDEV-2685] - Allow DL reproducible mode to work with user-given train_samples_per_iteration >= 0
[PUBDEV-2690] - Grid Search NPE during Flow display after grid was cancelled
[PUBDEV-2693] - NPE in initialMSE computation for GBM
[PUBDEV-2696] - DL checkpoint restart doesn't honor a change in stopping_rounds

New Feature

[PUBDEV-1883] - Add option to train with mini-batch updates for DL
[PUBDEV-2698] - Return leaf node assignments for DRF + GBM

Improvement

[PUBDEV-2674] - Change default functionality of as_data_frame method in Py H2O
[PUBDEV-2697] - Add method setNames for setting column names on H2O Frame
[PUBDEV-2703] - NPE in Log.write during cluster shutdown

###Tukey (3.8.0.6) - 2/23/16

####Enhancements

The following changes are improvements to existing features (which includes changed default values):

#####System

PUBDEV-2362: Handling Sparsity with Missing Values
PUBDEV-2683: Fix for erroneous conversion of NaNs to zeros during rebalancing
PUBDEV-2684: Remove bigdata test file (not available)

####Bug Fixes

The following changes resolve incorrect software behavior:

#####Algorithms

PUBDEV-2678: CV models during grid search get overwritten

#####R

PUBDEV-2648: Di/trigamma handle NA
PUBDEV-2679: Progress bar for grid search with N-fold CV is wrong when max_models is given

###Tukey (3.8.0.1) - 2/10/16

####New Features

These changes represent features that have been added since the previous release:

#####API

PUBDEV-1798: Ability to conduct a randomized grid search with optional limit of max. number of models or max. runtime
PUBDEV-1822: Add score_tree_interval to GBM to score every n'th tree
PUBDEV-2311: Make it easy for clients to sort by model metric of choice
PUBDEV-2548: Add ability to set a maximum runtime limit on all models
PUBDEV-2632: Return a grid search summary as a table with desired sort order and metric
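
The randomized grid search with a model or runtime budget, and the sorted summary, can be sketched in plain Python as below. This is not the H2O grid API: `random_grid_search`, `train_and_score`, and the convention that lower scores are better are all hypothetical, and the sketch samples with replacement, so duplicate combinations are possible.

```python
import random
import time


def random_grid_search(hyper_params, train_and_score, max_models=None,
                       max_runtime_secs=None, seed=42):
    """Sample random hyperparameter combinations until a budget is exhausted.

    Returns (score, params) pairs sorted by ascending score.
    """
    if max_models is None and max_runtime_secs is None:
        raise ValueError("set max_models and/or max_runtime_secs to bound the search")
    rng = random.Random(seed)  # fixed seed gives the same sequence of candidates
    start = time.monotonic()
    results = []
    while True:
        if max_models is not None and len(results) >= max_models:
            break
        if max_runtime_secs is not None and time.monotonic() - start >= max_runtime_secs:
            break
        params = {name: rng.choice(values) for name, values in hyper_params.items()}
        results.append((train_and_score(params), params))
    return sorted(results, key=lambda pair: pair[0])


grid = {"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.1]}
# Toy scoring function standing in for model training and validation.
summary = random_grid_search(grid, lambda p: p["max_depth"] * p["learn_rate"], max_models=5)
for score, params in summary:
    print(score, params)
```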

#####Algorithms

HEXDEV-495: Added ability to calculate GLM p-values for non-regularized models
PUBDEV-853: Implemented gain/lift computation to allow using predicted data to evaluate the model performance
PUBDEV-2118: Compute the lift metric for binomial classification models
PUBDEV-2212: Add absolute loss (Laplace distribution) to GBM and Deep Learning
PUBDEV-2402: Add observations weights to quantile computation
PUBDEV-2469: For GBM/DRF, add ability to pick columns to sample from once per tree, instead of at every level
PUBDEV-2594: Quantile regression for GBM and Deep Learning
PUBDEV-2625: Add recall and specificity to default ROC metrics
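
As a rough illustration of the lift metric for binomial models, the sketch below computes lift for a single top-scoring group. The grouping by a caller-supplied `fraction` and the function name are assumptions; a full gains/lift table would repeat this over several quantile groups.

```python
def lift_at(actual, scores, fraction):
    """Lift of the top `fraction` of rows ranked by score.

    `actual` holds 0/1 labels; lift is the positive rate in the top group
    divided by the overall positive rate.
    """
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(lift_at(labels, scores, 0.2))  # 5.0
```

A lift of 5.0 here means the top 20% of rows by score contain positives at five times the overall rate.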

#####Python

HEXDEV-399: Added support for Python 3.5 and better (in addition to existing support for 2.7 and better)

####Enhancements

The following changes are improvements to existing features (which includes changed default values):

#####Algorithms

PUBDEV-2233: Adjust string substitution and global string substitution to do in place updates on a string column.

#####Python

PUBDEV-1981: Fix layout issues of Python docs.
PUBDEV-2335: as.numeric for a string column only converts strings to ints rather than reals
PUBDEV-2257: Table printout in Python doesn't warn the user about truncation
PUBDEV-2460: Version mismatch message directs user to get a matching download
HEXDEV-527: Implement secure Python h2o.init
PUBDEV-2504: Check and print a warning if a proxy environment variable is found

#####R

PUBDEV-2335: as.numeric for a string column only converts strings to ints rather than reals
PUBDEV-2257: Table printout in R doesn't warn the user about truncation
PUBDEV-2430: Improve R's reporting on quantiles
PUBDEV-2460: Version mismatch message directs user to get a matching download

#####Flow

PUBDEV-2407: Improve model convergence plots in Flow
PUBDEV-2596: Flow shows empty logloss box for regression models
PUBDEV-2617: Flow's histogram doesn't cover the full support

#####System

HEXDEV-436: exportFile should be a real job and have a progress bar
PUBDEV-2459: Improve parse chunk size heuristic for better use of cores on small data sets
PUBDEV-2606: Print all columns to stdout for Hadoop jobs for easier debugging

####Bug Fixes

The following changes resolve incorrect software behavior:

#####API

PUBDEV-2633: Ability to extend grid searches with more models

#####Algorithms

PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
PUBDEV-2114: Set GLM to give error when lower bound > upper bound in beta constraints
PUBDEV-2190: Set GLM to default to a value of rho = 0, if rho is not provided when beta constraints are used
PUBDEV-2210: Add check for epochs value when using checkpointing in deep learning
PUBDEV-2241: Warnings about slowness from wide column counts now come before building a model, not after
PUBDEV-2278: Fix docstring reporting in iPython
PUBDEV-2366: Fix display of scoring speed for autoencoder
PUBDEV-2426: GLM gives different std. dev. and means than expected
PUBDEV-2595: Bad (perceived) quality of DL models during cross-validation due to internal weights handling
PUBDEV-2626: GLM with weights gives different answer h2o vs R

#####Python

PUBDEV-2319: sd not working inside group_by
PUBDEV-2403: Parser reads file of empty strings as 0 rows
PUBDEV-2404: Empty strings in Python objects parsed as missing

#####R

PUBDEV-2319: sd not working inside group_by
PUBDEV-2231: Fix bug in summary when zero-count categoricals were present.
PUBDEV-1749: Fix h2o.apply to correctly handle functions (so long as functions contain only H2O supported primitives)

#####System

PUBDEV-1872: Ability to ignore 0-byte files during parse
PUBDEV-2401: /Jobs fails if you build a Model and then overwrite it in the DKV with any other type
PUBDEV-2603: Improve progress bar for grid/hyper-param searches

###Tibshirani (3.6.0.9) - 12/7/15

####New Features

These changes represent features that have been added since the previous release:

#####API

PUBDEV-2189: H2O now allows selection of the non_negative flag in GLM for R and Python

#####Algorithms

PUBDEV-1540: Added Generalized Low-Rank Model (GLRM) algorithm
PUBDEV-2119: Added gains/lift computation
GitHub commit: Added remove_colinear_columns parameter to GLM

#####R

PUBDEV-2079: R now retrieves column types for a H2O Frame more efficiently

#####Python

PUBDEV-2294: Added Python equivalent for h2o.num_iterations
PUBDEV-2233: Added sub and gsub to Python client
GitHub commit: Added weighted quantiles to Python API
PUBDEV-1304: Added sapply operator to Python
PUBDEV-1969: H2O now plots decision boundaries for classifiers in Python

####Enhancements

The following changes are improvements to existing features (which includes changed default values):

#####Algorithms

GitHub commit: Change in behavior in GLM beta constraints - when ignoring constant/bad columns, remove them from beta_constraints as well
GitHub commit: Added ignore_const_cols to all algos
PUBDEV-2311: Improved ability to sort by model metric of choice in client

#####Python

PUBDEV-2409: H2O now checks for H2O_DISABLE_STRICT_VERSION_CHECK env variable in Python
GitHub commit: H2O now allows l/r values to be null or an empty string
GitHub commit: H2O now accommodates LOAD_FAST and LOAD_GLOBAL in bytecode_to_ast

#####R

PUBDEV-1378: In R, h2o.getTimezone() previously returned a list of one, now it just returns the string

#####System

GitHub commit: Added more tweaks to help various low-memory configurations

####Bug Fixes

The following changes resolve incorrect software behavior:

#####API

PUBDEV-2042: h2o.grid failed when REST API version was not default
PUBDEV-2401: /Jobs failed if you built a Model and then overwrote it in the DKV with any other type
PUBDEV-2392: /3/Jobs failed with exception after running /3/SplitFrame
GitHub commit: PUBDEV-2426 - Fixed error where sd and mean were adjusted to weights even if no observation weights were passed

#####Algorithms

PUBDEV-2396: GLRM validation frames must have the same number of rows as the training frame
PUBDEV-2053: Fixed assertion failure in Deep Learning
PUBDEV-2315: Could not compile POJO using K-means
PUBDEV-2317: Could not compile POJO using PCA
PUBDEV-2320: Could not compile POJO using Naive Bayes
GitHub commit: Fixed weighted mean and standard deviation computation in GLM
GitHub commit: Fixed stopping criteria for lambda search and multinomial in GLM

#####Python

PUBDEV-2262: H2OFrame indexing was no longer Pythonic on Bleeding Edge 10/23
PUBDEV-2278: Trying to get help in python client displayed the frame
PUBDEV-2371: Fixed ASTEQ str_op bug

#####R

PUBDEV-1749: h2o.apply did not correctly handle functions
PUBDEV-2335: R: as.numeric for a string column only converted strings to ints rather than reals
PUBDEV-2319: R: sd was not working inside group_by
PUBDEV-2397: R: Ignore Constant Columns was not an argument in Algos in R like it is in Flow
PUBDEV-2134: When a dataset was sliced, the int mapping of enums was returned
PUBDEV-2408: Improved handling when H2O has already been shut down in R
PUBDEV-2231: Fixed categorical levels mapping bug

#####System

PUBDEV-2403: Parser read file of empty strings as 0 rows
PUBDEV-2404: Empty strings in Python objects were parsed as missing
PUBDEV-2375: Save Model (Deeplearning): the filename for the model metrics file is too long for windows to handle
GitHub commit: Fixed streaming load bug for large files
PUBDEV-2241: Column width slowness warning now prints before model build, not after

###Tibshirani (3.6.0.7) - 11/23/15

####Enhancements

The following changes are improvements to existing features (which includes changed default values):

#####Algorithms

GitHub commit: Added Iterations and Epochs to DL job status updates, added Iterations to scoring history
GitHub commit: Cleaned up iteration counter to work for checkpointing
GitHub commit: Cleaned up counter iteration logic

####Bug Fixes

The following changes resolve incorrect software behavior:

#####Algorithms

GitHub commit: Fixed scoring speed display for autoencoder, was showing 0 because wrong runtime was used (ms since 1970 instead of actual runtime)

###Tibshirani (3.6.0.2) - 11/5/15

####New Features

#####Algorithms

GitHub commit: Added support for grid search
PUBDEV-2272: Implemented GLRM grid search in R and Python
GitHub commit: PUBDEV-2289: Enabled early convergence-based stopping by default for Deep Learning
GitHub commit: Added L1+LBFGS solver for multinomial GLM

#####Python

GitHub commit: PUBDEV-2289: Added Python API for convergence-based stopping

#####R

GitHub commit: Added .Last to Delete InitID
GitHub commit: PUBDEV-2289: Enabled convergence-based early stopping for R API of Deep Learning

####Enhancements

#####Algorithms

GitHub commit: Enable grid search for Deep Learning parameters overwrite_with_best_model, momentum_ramp, elastic_averaging, elastic_averaging_moving_rate, & elastic_averaging_regularization
GitHub commit: PUBDEV-2289: Stopping tolerance and stopping metric are no longer hidden if stopping_rounds is 0
GitHub commit: Added checks to verify the mean, median, nrow, var, and sd are calculated correctly in groupby
GitHub commit: mean and sd now return lists

#####Python

GitHub commit: [PUBDEV-2257] H2O now gives users [row x col] of Frame in __str__
GitHub commit: sd/var is now sampled for group_by
GitHub commit: Parameter checking is now split between float and strings/unicode
GitHub commit: H2O now only wipes src._ex if src_in_self
GitHub commit: Refactored default arg handling in astfun
GitHub commit: Added new parameters to estimators
GitHub commit: Added session start/end; Python now ends the session on exit
GitHub commit: src and self types are now checked for None
GitHub commit: H2O now passes caches through all prefix ops
GitHub commit: H2O now pushes cached types, names, and ncols forward if possible

#####R

PUBDEV-1951: Removed the R backward compatibility shim
GitHub commit: Added [rows x cols] to print.Frame in R
GitHub commit: sd can now alias sdev in group_by
GitHub commit: Changed .eval.driver to .fetch.data in h2o.getFrame
GitHub commit: Removed debug printing of ==Finalizer on in R
GitHub commit: Added metalearning function

#####System

HEXDEV-475: Added EasyPOJO comments and improvements
GitHub commit: [PUBDEV-2204] Enabled Vec#toCategoricalVec to convert string columns to categorical columns
GitHub commit: apply now works in

####Bug Fixes

#####Algorithms

PUBDEV-2317: PCA: Could not compile POJO
GitHub commit: [PUBDEV-2317] Incorrect PCA code was generated

#####Python

GitHub commit: PUBDEV-2297: Python was not updating exception on job update
GitHub commit: Added missing arguments to DRF/GBM/DL in scikit-learn-like API
GitHub commit: Fixed impute in Python
GitHub commit: Restored ASTRename
GitHub commit: Fixed reference to _quoted in H2O module

#####R

GitHub commit: [PUBDEV-2301, PUBDEV-2314] Hidden grid parameter was passed incorrectly from R
GitHub commit: H2O now uses deep copy when using assign from one global to another
GitHub commit: Fixed getFrame and directory unlink

#####System

PUBDEV-1824: h2o.init() failed to launch on the Docker image
PUBDEV-2043: Deep Learning generated an assertion error
GitHub commit: Fixed rm handling of non-frames
GitHub commit: Fixed log_level
GitHub commit: Fixed eq2 slot assign
GitHub commit: Fixed a bug found during benchmarking for small data
GitHub commit: PUBDEV-2295: User-given weights were accidentally passed to N-fold CV models
GitHub commit: Fixed NPE in Grid Schema
GitHub commit: PUBDEV-2289: Convergence checks are now numerically stable

###Slotnick (3.4.0.1)

####New Features

#####API

GitHub commit: Added NumList and StrList
PUBDEV-674: Added REST API and R / Python for grid search

#####Algorithms

GitHub commit: Added option in PCA to use randomized subspace iteration method for calculation
GitHub commit: Deep Learning: Added target_ratio_comm_to_comp to R and Python client APIs
GitHub commit: PUBDEV-1247: Added stochastic GBM parameters (sample_rate and col_sample_rate) to R/Py APIs
PUBDEV-1450: GLRM has been tested and removed from "experimental" status

#####Hadoop

GitHub commit: Added support for H2O with HDP2.3

#####Python

GitHub commit: Added _to_string method
PUBDEV-2166: Added Python grid client
PUBDEV-2098: Scoring history in Python is now visualized
GitHub commit: PUBDEV-2020: Python implementation and test for split_frame()

#####R

This software release introduces changes to the R API that may cause previously written R scripts to be inoperable. For more information, refer to the following link.

GitHub commit: Added h2o.getTypes() to the R wrapper
GitHub commit: Added ability to set col.types with a named list
GitHub commit: Added h2o.getId() to get the back-end distributed key/value store ID from a Frame
GitHub commit: Added column types to H2O frame in R, which allows R to set the correct column types when as.data.frame() is used on an H2O frame
GitHub commit: Added @export for exported R functions

#####System

GitHub commit: Added string length util for Enum columns
GitHub commit: Added pass-through version of toCategoricalVec(), toNumericVec(), and toStringVec() to Vec.java for code simplicity and backwards compatibility
GitHub commit: Added string column handling to StrSplit()

#####Web UI

PUBDEV-1977: Added grid search to Flow web UI

####Enhancements

#####Algorithms

PUBDEV-467: Show Frames for DL weights/biases in Flow
PUBDEV-1847: DRF/GBM: nbins_top_level is now configurable
GitHub commit: Deep Learning: Scoring time is now shown in the logs
GitHub commit: Sped up GBM split finding by dynamically switching between single and multi-threaded based on workload
PUBDEV-1247: Implemented Stochastic GBM
GitHub commit: Parallelized split finding for GBM/DRF (useful for large numbers of columns and nbins).
GitHub commit: Added improvements to speed up DRF (up to 35% faster) and stochastic GBM (up to 5x faster)
GitHub commit: Added some straight-forward optimizations for GBM histogram building
GitHub commit: GLRM is now deterministic between one vs. many chunks
GitHub commit: Input parameters are now immutable
GitHub commit: PUBDEV-2135: Cleaned up N-fold CV model parameter sanity checking and error message propagation; now checks all N-fold model parameters upfront and lets the main model carry the message to the user
GitHub commit: PUBDEV-2130: N-fold CV models are no longer deleted when the main model is deleted
GitHub commit: PUBDEV-2107: The title in plot.H2OBinomialMetrics is now editable
GitHub commit: Parse Python lambda (bytecode -> ast -> rapids)
GitHub commit: PUBDEV-1847: Cleaned up/refactored GBM/DRF
GitHub commit: Updated MeanSquare to Quadratic for DL
GitHub commit: PUBDEV-2133: Speed up Enum mapping between train/test from O(N^2) to O(N*log(N))
GitHub commit: Added GLRM scoring history with step size and average change in objective function value
GitHub commit: SVD now outputs the V matrix as a frame with a frame key, rather than a double array in the API
GitHub commit: Modified k-means++ initialization in GLRM to set X to inverse of cluster distance with sum normalized to one, for each observation in training data
GitHub commit: Increased GBM worker thread priority to avoid deadlock with high parallel GBM job counts
GitHub commit: Added input parameter svd_method to GLRM
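
The PUBDEV-2133 entry above describes a complexity change from O(N^2) to O(N*log(N)) for mapping categorical levels between train and test data. The following is a minimal sketch of one way to get that bound, not H2O's actual implementation: sort the training domain once, then binary-search each test level. The function name `map_levels` and the `-1` sentinel for unseen levels are assumptions made for illustration.

```python
from bisect import bisect_left


def map_levels(train_domain, test_domain):
    """Map each test level to its index in train_domain, or -1 if unseen.

    Hypothetical helper: sorting costs O(N log N) and each lookup is
    O(log N), versus O(N) per lookup for a nested linear scan.
    """
    # Sort the training levels once, remembering their original indices.
    order = sorted(range(len(train_domain)), key=lambda i: train_domain[i])
    sorted_levels = [train_domain[i] for i in order]

    mapping = []
    for level in test_domain:
        pos = bisect_left(sorted_levels, level)
        if pos < len(sorted_levels) and sorted_levels[pos] == level:
            mapping.append(order[pos])  # index in the original train domain
        else:
            mapping.append(-1)  # level not present in the training data
    return mapping
```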

#####Python

GitHub commit: centers_std is now returned as a list of columns
GitHub commit: str(Frame) no longer returns an ID; updated ExprNode _to_string to accommodate
GitHub commit: Changed default setting for _isAllAscii to false
GitHub commit: Fixed var to return scalar/frame based on nrow
GitHub commit: Python now checks ncol, not nrow
PUBDEV-1060: Python's h2o.import_frame() now matches R's importFile() parameters where applicable
PUBDEV-1960: Python now uses the streaming endpoint /3/DownloadDataset.bin
PUBDEV-2223: Added normalization and standardization coefficients to the model output in Python
GitHub commit: Renamed logging to h2o_logging to avoid conflict with original logging package
GitHub commit: H2O now recognizes additional parameters (such as column names) for Python objects
GitHub commit: head and tail no longer download the entire dataset
GitHub commit: Truncated DF in head and tail before calling /DownloadDataset
GitHub commit: head() and tail() now default to pretty printing in Python
GitHub commit: Moved setup functionality from parse to parse setup; col_types and na_strings can now be dictionaries
GitHub commit: Updated H2OColSelect to supply extra argument
GitHub commit: PUBDEV-2174: Relative tolerance is now used for floating point comparison
GitHub commit: Added more cloud health output to run.py
GitHub commit: When Pandas frames are returned, they are now wrapped to display nicely in iPython
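
The PUBDEV-2174 entry above switches floating point comparison to a relative tolerance. As a rough illustration of the idea (not the code from that commit), a relative check scales the allowed difference with the magnitude of the values, so large numbers that differ only by rounding noise compare equal while small numbers that differ by a factor of two do not. The helper name `approx_equal` and the default tolerance are assumptions.

```python
import math


def approx_equal(expected, actual, rel_tol=1e-6, abs_tol=0.0):
    """Hypothetical helper: compare two floats using a relative tolerance.

    math.isclose treats the values as equal when
    |expected - actual| <= max(rel_tol * max(|expected|, |actual|), abs_tol).
    """
    return math.isclose(expected, actual, rel_tol=rel_tol, abs_tol=abs_tol)
```

With these defaults, `approx_equal(1e9, 1e9 + 1.0)` holds, whereas a fixed absolute threshold such as `1e-6` would reject the same pair.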

#####R

GitHub commit: Added null check
PUBDEV-2185: When appending a vec to an existing data frame, H2O now creates a new data frame while still keeping the original frame in memory
PUBDEV-1959: R now uses the streaming endpoint /3/DownloadDataset.bin
PUBDEV-2020: h2o.splitFrame() in R/Python now uses the runif technique instead of the horizontal slice technique
GitHub commit: Changed T/F to TRUE/FALSE
GitHub commit: xml2 package is now required for rversions package
GitHub commit: Package dependencies are taken into account when installing R packages
GitHub commit: Metrics are now always computed if a dataset is provided (R h2o.performance call)
GitHub commit: Column names are now fetched from H2O
GitHub commit: PUBDEV-2150: Time columns in H2O are now imported as Date columns in R
GitHub commit: h2o.ls() now returns data.frame
GitHub commit: h2o.ls() now returns the whole frame
GitHub commit: Removed unnamed additional parameters (ellipses) in R algos
GitHub commit: Added as.character to Rapids implementation
GitHub commit: Updated plot.H2OModel in R
GitHub commit: Updated scoring history plot in R for training_frame only
GitHub commit: Instead of : and assign, attr is now used
GitHub commit: Raw strings are now used as accessors
GitHub commit: name.Frame and dimnames.Frame are now visible
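
The PUBDEV-2020 entry above mentions that h2o.splitFrame() now uses a runif technique rather than horizontal slicing. A minimal sketch of that general idea, assuming nothing about H2O's internals: draw one uniform random number per row and assign the row to a split according to cumulative ratio thresholds. The function name `split_rows` and its parameters are hypothetical.

```python
import random


def split_rows(n_rows, ratios, seed=None):
    """Hypothetical helper: assign row indices to len(ratios) + 1 splits.

    Each row gets a uniform draw in [0, 1); the draw is compared against
    the cumulative ratios, and whatever is left over goes to the last split.
    """
    rng = random.Random(seed)

    # Turn ratios such as [0.6, 0.2] into thresholds [0.6, 0.8].
    thresholds = []
    total = 0.0
    for ratio in ratios:
        total += ratio
        thresholds.append(total)

    splits = [[] for _ in range(len(ratios) + 1)]
    for row in range(n_rows):
        u = rng.random()
        idx = 0
        while idx < len(thresholds) and u >= thresholds[idx]:
            idx += 1
        splits[idx].append(row)
    return splits
```

One consequence of this approach is that split sizes are only approximately equal to the requested ratios, since each row is assigned independently.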

#####System

GitHub commit: Added vertical prefetch of all chunks' worth of data for dense rows
PUBDEV-1426: Scoring is now a non-blocking job with a progress bar
GitHub commit: EasyPojo API is now serializable
GitHub commit: Changed parse setup guess when encountering large NA counts to not favor numeric over dates or UUIDs
GitHub commit: Refactored vector type conversion methods into a class called VecUtils
GitHub commit: Cleaned up ASTStrList to handle frames with more than one vector during column conversion; checks types before converting; added several new column type conversions
GitHub commit: If the job is canceled, scoring is now canceled as well
GitHub commit: Refactored doAll_numericResult() -> doAll(nout, type, frame) where all output vecs are of the given type
GitHub commit: Improved hash function
GitHub commit: The output of _train.get() is now passed to a Frame
GitHub commit: Refactored binary/col ops for aesthetics and maintainability
GitHub commit: Added correct types for new Vecs; CategoricalWrappedVec now exports a utility for enum conversions instead of a constructor
GitHub commit: Mean/sigma values are now printed to the logs after parsing
GitHub commit: PUBDEV-2174: Added some optimizations for some chunks (mostly integers) in RollupStats
GitHub commit: PUBDEV-2174: Added instantiations of Rollups for dense numeric chunks
GitHub commit: PUBDEV-2174: Implemented single-pass variance/stddev calculation for rollups
GitHub commit: PUBDEV-2174: Added hasNA() for chunks
GitHub commit: Reordered args in sub/gsub (astid > astparameter, add string -> numeric)
GitHub commit: Ensured all chunks get closed
GitHub commit: NewChunk.addString() now accepts a Java string or BufferedString, eliminating needless conversion to a BufferedString before inserting into the NewChunk buffer. Improves efficiency of several ASTStrOps as well as converting Categorical columns to String columns.
GitHub commit: Renamed enums to categoricals system-wide
GitHub commit: Renamed ValueString -> BufferedString
GitHub commit: Removed redundant frame creation; added Java comments to each string utility; changed RAPIDS name of gsub -> replaceall and sub -> replacefirst; added nchar utility to the R client; updated comments in Python and R client
GitHub commit: All NA chunks are now handled in string ops
GitHub commit: Added ability for string utils to handle NA chunks
GitHub commit: Added the ability to handle duplicate rows to merge
GitHub commit: countMatches utilities now only work on string columns
GitHub commit: Changed names of SubStr and GSubStr to ReplaceFirst and ReplaceAll; both methods now only accept string columns as input
GitHub commit: Changed toUpper and toLower to only work on string columns; includes an optimized version of each method as well as a UTF-safe version
GitHub commit: CStrChunks now track whether they are pure ASCII to allow StringUtilities to use optimized versions of the utilities that operate directly on the string buffer
GitHub commit: Moved frame function to ArrayUtils
GitHub commit: Removed categorical versions of trim() and length()
GitHub commit: Changed the merge defaults to match the implementation
GitHub commit: Merge no longer uses a by argument
GitHub commit: Added trim and length functionality for string columns
GitHub commit: HEXDEV-442: Improved POJO handling
GitHub commit: Config files are now transferred using a hexstring to avoid issues with Hadoop XML parsing
GitHub commit: HEXDEV-445: Added isNA check
GitHub commit: Means, mults, modes, and size now do bulk rollups
GitHub commit: Increased priority of model builder Driver classes to prevent deadlock when bulk-launching parallel unrelated model builds
GitHub commit: Renamed Currents to Rapids
GitHub commit: CRAN-based R clients are now set to opt-out by default
GitHub commit: Assembly states are now saved in the DKV
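
One of the PUBDEV-2174 entries above refers to a single-pass variance/stddev calculation for rollups. The changelog does not say which formula is used, so the sketch below substitutes Welford's online algorithm, a common single-pass method, purely as an illustration. The function name `rollup` and the choice to skip NaN values are assumptions.

```python
import math


def rollup(values):
    """Hypothetical helper: count, mean, and sample variance in one pass.

    Uses Welford's update, which avoids a second pass over the data and
    is more numerically stable than accumulating sum(x) and sum(x * x).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        if math.isnan(x):
            continue  # treat NaN as a missing value and skip it
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance
```

The standard deviation (sigma) is then `math.sqrt(variance)` whenever at least two non-missing values were seen.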

#####Web UI

PUBDEV-1961: Flow now uses the streaming endpoint /3/DownloadDataset.bin

####Bug Fixes

#####Algorithms

GitHub commit: Fixed bug with CategoricalWrappedVec
PUBDEV-1664: Corrected math for GBM Tweedie with offsets/weights
PUBDEV-1665: Corrected math for GBM Poisson with offsets/weights
PUBDEV-2130: Deleting Deep Learning n-fold models resulted in a java.lang.AssertionError
GitHub commit: Fixed GLM with nfolds
GitHub commit: Updated GLM InitTsk to run at +1 priority level to avoid deadlock when launching hundreds of GLMs in parallel
GitHub commit: Column names (feature names) are now named correctly for the exported weight matrix connecting the input to the first hidden layer
GitHub commit: Changed isEnum to isCategorical
GitHub commit: Cleaned up DRF and GBM; fixed checkpoint restart logic for trees and changed which parameters are configurable
GitHub commit: Fixed incorrect logistic and hinge loss functions; they now apply only to binary numeric columns in {0,1}
GitHub commit: Fixed a bug where Poisson loss function was calculated incorrectly for values of 0
GitHub commit: Fixed DL POJO for large input columns

#####Python

GitHub commit: nrow was not filling cache correctly
GitHub commit: Fixed typo in Python object upload (header -> col_header)
GitHub commit: Append now does so in place
GitHub commit: Seed was not being set
GitHub commit: Fixed group_by
GitHub commit: Corrected .fromPython
GitHub commit: Corrected Python dict col names
GitHub commit: Fixed null/npe in H2O's fit for sklearn (Windows only)
GitHub commit: get_params now keeps "algo" out of params
GitHub commit: Improved compatibility with sklearn by using "train" as a model build verb and reserving "fit" for sklearn; if "fit" method is attempted, a warning displays
GitHub commit: Fixed accessor in Python model predict

#####R

GitHub commit: Fixed is.numeric
GitHub commit: Fixed h2o.anyFactor and h2o.impute
GitHub commit: Fixed levels
PUBDEV-1808: h2o.splitFrame was not splitting randomly in R
GitHub commit: Fixed range in R
GitHub commit: PUBDEV-2020: Fixed variable name for case where destination_frame is provided.
PUBDEV-2198: h2o.table ran orders of magnitude slower than h2o.groupby
GitHub commit: Fixed location of datafile for R example code
GitHub commit: Fixed length(column.names)==number_columns check
GitHub commit: Parse types can be specified by column index or column name, but not both
GitHub commit: Added connection (close HTTP header) to improve jetty connection pool behavior
GitHub commit: Added a sensible min on N
GitHub commit: Added Windows binaries to R package repo
GitHub commit: Fixed h2o.weights to show frame as output
GitHub commit: Fixed type conversion for time columns when ingested by as.data.frame()
GitHub commit: Fixed h2o.merge R interface
GitHub commit: head and tail now always return data.frame
GitHub commit: Fixed a bug in GLRM init in R
GitHub commit: Fixed bug in h2o.summary (constant categorical columns)
GitHub commit: Fixed bug in plot.H2OModel
PUBDEV-1974: When imputing columns from R, many temp files were created, which did not occur in Flow

#####System

PUBDEV-2250: During parsing, SVMLight-formatted files failed with an NPE (GitHub commit)
PUBDEV-2213: During parsing, alphanumeric data in a column was converted to missing values and the column was assigned a type of int
PUBDEV-1990: Spaces are now permitted in the Flow directory name
PUBDEV-1037: Space in the user name was preventing H2O from starting
GitHub commit: Fixed VecUtils.copyOver() to accept a column type for the resulting copy
GitHub commit: Fixed Vec.preWriting so that it does not use an anonymous inner task which causes the entire Vec header to be passed
GitHub commit: Fixed parse to mark categorical references in ParseWriter as transient (enums must be node-shared during the entire multiple parse task)
GitHub commit: PUBDEV-2182: Fixed DL checkpoint restart with given validation set after R (currents) behavior changed; now the validation set key no longer necessarily matches the file name
GitHub commit: Fixed makeCon memory leak when redistribute=T
GitHub commit: PUBDEV-2174: Fixed sigma calculation for sparse chunks
GitHub commit: Restored pre-existing string manipulation utilities for categorical columns
GitHub commit: Fixed syncRPackages task so it doesn't run during the normal build process
GitHub commit: Fixed intermittent failures caused by different default timezone settings on different machines; sets needed timezone before starting test
GitHub commit: Fixed error message for countmatches
GitHub commit: PUBDEV-1443: Fixed size computation in merge
GitHub commit: Fixed h2o.tabulate() to work in multi-node mode
GitHub commit: Fixed integer overflow in printout of CM to TwoDimTable

###Slater (3.2.0.7) - 10/09/15

####Bug Fixes

GitHub commit: Fix Java 6 compatibility

The Java 7 API call _rawChannel.setOption(StandardSocketOptions.TCP_NODELAY, true); has been replaced by the Java 6 API call _rawChannel.socket().setTcpNoDelay(true);

The Java 7 API call sock.getRemoteAddress() has been replaced by the Java 6 API call sock.socket().getRemoteSocketAddress()

###Slater (3.2.0.5) - 09/24/15

####Enhancements

#####Algorithms

PUBDEV-2133: Enum test/train mapping is faster (GitHub commit)

PUBDEV-2030: Improved POJO support to DRF

###Slater (3.2.0.3) - 09/21/15

####New Features

#####R

PUBDEV-2078: H2O now returns per-feature reconstruction error for h2o.anomaly() (GitHub commit)

####Enhancements

#####Algorithms

GitHub commit: Added back support for sparse activations in DL; this currently changes results because numerical values are only de-scaled, not standardized

#####Python

GitHub commit: Adjusted import_file in Python to accept the same parameters as import_file in R

#####R

GitHub commit: H2O now sets CRAN-based R clients to permanent opt-out.
GitHub commit: Modified output of h2o.tabulate in R
GitHub commit: Added default plotting for models in R
GitHub commit: Pre-pended graphics pkg to plot.H2OModel methods

####Bug Fixes

#####Algorithms

PUBDEV-2091: All algos: when offset is the same as the response, all train errors should be zero (GitHub commit)
GitHub commit: Fixed DL POJO for large input columns

#####R

GitHub commit: Fixed bugs in model plotting in R
GitHub commit: Fixed bugs in R plot.H2OModel for DL
GitHub commit: Fixed bug in plot.H2OModel

#####System

PUBDEV-1850: Parse not setting NA strings properly (GitHub commit)
GitHub commit: H2O now escapes XML entities
GitHub commit: Fixed Java 6 build by replacing AutoCloseable with Closeable
GitHub commit: Restored code that was needed for detecting NA strings

###Slater (3.2.0.1) - 09/12/15

####New Features

#####Algorithms

GitHub: PUBDEV-1888: Added loss function calculation for DL.
GitHub: Set more parameters for GLM to be gridable.
GitHub: [KMeans] Enable grid search with max_iterations parameter.
GitHub: Add kfold column builders
GitHub: Add stratified kfold method
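
The kfold column builder and stratified kfold entries above concern assigning each row to a cross-validation fold. The sketch below shows the general idea of stratified assignment under stated assumptions and is not taken from H2O's source: rows are grouped by class label, shuffled within each class, and dealt round-robin into folds so every fold sees roughly the same class mix. The name `stratified_fold_column` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_fold_column(labels, nfolds, seed=None):
    """Hypothetical helper: return a fold index (0..nfolds-1) per row.

    Rows of each class are spread evenly across folds, so class
    proportions within a fold approximate those of the full dataset.
    """
    rng = random.Random(seed)

    # Group row indices by their class label.
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    folds = [0] * len(labels)
    for rows in rows_by_class.values():
        rng.shuffle(rows)  # randomize order within the class
        for i, row in enumerate(rows):
            folds[row] = i % nfolds  # deal rows round-robin into folds
    return folds
```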

#####Python

PUBDEV-684: Add nfolds to R/Python
GitHub: Improved group-by functionality
GitHub: Added python example for downloading glm pojo.
GitHub: Added countmatches to Python along with a test.
GitHub: Added support for getting false positive rates and true positive rates for all thresholds from binomial models; makes it easier to calculate custom metrics from ROC data (like weighted ROC)

#####R

PUBDEV-1788: Added a factor function that will allow the user to set the levels for an enum column GitHub
PUBDEV-1881: Fixed bug in h2o.group_by for enumerator columns
GitHub: Refactor SVD method name and add svd_method option to R package to set preferred calculation method
PUBDEV-2071: Accept columns of type integer64 from R through as.h2o()

#####Sparkling Water

PUBDEV-282: Support Windows OS in Sparkling Water

#####System

HEXDEV-120: Switch from NanoHTTPD to Jetty
GitHub: Allow for "most" and "mode" in groupby
GitHub: Added NA check to checking for matches in categorical columns
PUBDEV-1470: Dropped UDP mode in favor of TCP
PUBDEV-1431: /3/DownloadDataset.bin is now a registered handler in JettyHTTPD.java. Allows streaming of large downloads from H2O. GitHub
PUBDEV-1865: Implemented per-row 1D, 2D and 3D DCT transformations for signal/image/volume processing
PUBDEV-1686: LDAP Integration
HEXDEV-381: LDAP Integration
HEXDEV-224: Added https support
GitHub: Added mapr5.0 version to builds
GitHub: Add Vec.Reader which replaces lost caching

#####Web UI

GitHub: Disallow N-fold CV for GLM when lambda-search is on.
GitHub: Added typeahead for http and https.
PUBDEV-1821: Added Save Model and Load Model

####Enhancements

#####Algorithms

GitHub: Don't allocate input dropout helper if input_dropout_ratio = 0.
PUBDEV-1920: Datasets : Unbalanced sparse for binomial and multinomial
GitHub: Major code cleanup for DL: Remove dead code, deprecate sparse/col_major.
PUBDEV-1942: Use prior class probabilities to break ties when making labels GitHub
GitHub: Update DL perf Rmd file to get the overall CM error.
GitHub: Enable training data shuffling if train_samples_per_iteration==0 and reproducible==true
GitHub: Checkpointing for DL now follows the same convention as for DRF/GBM.
GitHub: No longer do sampling with replacement during training with shuffle_training_data
GitHub: Add printout of sparsity ratio for double chunks.
GitHub: Check memory footprint for Gram matrix in PCA and SVD initialization
GitHub: Print more fill ratio debugging.
GitHub: Fix the RNG for createFrame to be more random (since we are setting the seed for each row).
PUBDEV-2010: Improve reporting of unstable DL models GitHub
PUBDEV-2018: Improve auto-tuning for DL on large clusters / large datasets GitHub
GitHub: Add input parameter to h2o.glrm indicating whether to ignore constant columns
GitHub: Missing enums are imputed using the majority class of the column. For other types of missing categorical, just round the mean to the nearest integer.
GitHub: Skip rows in training frame with missing value(s) if requested
GitHub: Speed up direct SVD by working with transpose directly
GitHub: Fix a bug in initialization of SVD and change l2 norm to sum of squared error in convergence test.
GitHub: Use absolute value for mean weight and bias checks.
GitHub: No longer leak constant chunks during AE scoring/reconstruction.
GitHub: No longer differentiate between DL model instabilities (weights vs biases).
GitHub: Make method static, where possible.
GitHub: Make GLRM seeding independent of number of chunks.

#####API

GitHub: Added REST end-points for glrm,svd,pca,naive bayes algorithms.
GitHub: Added unicode to frame getter possibilities
GitHub: Added proper lookup of offset/weights/fold_column
GitHub: Data should be eagered before download_csv.
GitHub: Simplified model builder
GitHub: Added None as default for "on" field
GitHub: Removed all of the unnecessary calls to h2o.init and removed the unnecessary environment variable for version checking during testing
PUBDEV-2064: rename the coordinate descent solvers in the REST API / Flow to (experimental)

#####Grid Search

GitHub: Added check that x is not null before verifying data in unsupervised grid search algorithm
GitHub: Made naivebayes parameters gridable.
PUBDEV-1933: Called drf as randomForest in algorithm option GitHub
GitHub: Validation of grid parameters against algo /parameters rest endpoint.
PUBDEV-1979: Train N-fold CV models in parallel GitHub
PUBDEV-1978: grid: would be good to add to h2o.grid R help example, how to access the individual grid models

#####Python

GitHub: Refactored into h2o.system_file so it's parallel to R client.
GitHub: Added h2o_deprecated decorator
GitHub: Use import_file in import_frame
GitHub: Handle a list of columns in python group-by api
GitHub: Use pandas if available for twodimtables and h2oframes
GitHub: Transform the parameters list into a dict with keys being the parameter label
GitHub: Added pop option which does inplace update on a frame (Frame.remove)
GitHub: ncol,dim,shape, and friends are now all properties
PUBDEV-193: Write python version of h2o.init() which knows how to start h2o
PUBDEV-1903: Method to get parameters of model in Python API
GitHub: Allow a single alpha to be specified without wrapping it in a list
GitHub: Updated endpoint for python client download_csv
GitHub: Allow for enum in scale/mean/sd (ignore or give NA)
GitHub: Allow for n_jobs=-1 and n_jobs > 1 for Parallel jobs
GitHub: Added frame_id property to frame
GitHub: Removed remaining splats on dicts
GitHub: Removed need to splat pass thru args
GitHub: Added get_jar flag to download_pojo

#####R

PUBDEV-1866: Rewrote h2o.ensemble to utilize nfolds/fold_column in h2o base learners
GitHub: Added max_active_predictors.
GitHub: Updated REST call from R for model export
PUBDEV-1853: Removed addToNavbar from RequestServer GitHub
GitHub: Add "Open H2O Flow" message.
GitHub: Replaced additive float op by multiplication
GitHub: Reimplement checksum for Model.Parameters
GitHub: Remove debug prints.
PUBDEV-1857: Removed the need for String[] path_params in RequestServer.register() GitHub
PUBDEV-1856: Removed the writeHTML_impl methods from all the schemas
PUBDEV-1854: Made _doc_method optional in the Route constructors GitHub
PUBDEV-1858: Changed RequestServer so that only one handler instance is created for each Route
GitHub: Swapped out rjson for jsonlite for better handling of odd characters from dataset.
GitHub: Prettify R's grid output.
PUBDEV-1841: R now respects the TwoDimTable's column types
GitHub: Fixes show method for grid object when hyper_params is empty.
GitHub: h2o.levels returns R vector for single column
GitHub: Uses PredictCsv from genmodel now.
GitHub: Exposed stacktraces in R's summary() call.
GitHub: print type of failed value in $<-
GitHub: allow value to be integer in $<-
GitHub: Check for is_client being NULL since older H2O clusters may not have is_client.

#####Sparkling Water

GitHub: Copy content of h2o-dist into target directory.

#####System

GitHub: Rename label fields in prediction object.
GitHub: Uses the original Vec's domain in alignment
GitHub: Added columnName and unknownLevel to PredictUnknownCategoricalLevelException.
PUBDEV-1559: Added compression of 64-bit Reals GitHub
GitHub: Added time information to buildinfo.json.
GitHub: Put build metadata into a json file.
GitHub: Delete any prior main CV models of the same key if CV model building is cancelled before the main model started to build.
GitHub: Change loading name parameter to a String to address a Flow issue.
GitHub: Remove extra assertion to avoid NPEs after client call of bulk remove after done() is called but before the finally is done with updateModelOutput.
GitHub: Ensures that date time methods return year/month/day values in the currently set timezone.
GitHub: Frees memory from streamed zip reads after the chunk has been parsed.
GitHub: Unifies categorical strings to UTF-8 and warns the user about all conversion.
GitHub: add isNA checks to scale
GitHub: Do not start UDPReceiver thread (unless running with useUDP option)

#####Web UI

PUBDEV-1961: Flow: use streaming endpoint /3/DownloadDataset.bin

####Bug Fixes

#####Algorithms

PUBDEV-1785: Deadlock while running GBM
GitHub: Fix name for standardized_coefficient_magnitudes.
PUBDEV-1774: Setting gbm's balance_classes to True produces suspect models
PUBDEV-1849: K-Means: negative sum-of-squares after mean imputation
GitHub: Set the iters counter during kmeans center initialization correctly
GitHub: fixed parenthesis in GLM POJO generation
GitHub: Should be updating model each iteration with the newly fitted kmeans clusters, not the old ones!
PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
PUBDEV-1666: GBM:Math correctness for Gamma with offsets/weights
PUBDEV-451: Trees in GBM change for identical models GitHub
PUBDEV-1924: R^2 stopping criterion isn't working GitHub
PUBDEV-1776: GLM: cross-validation bug GitHub
PUBDEV-1682: GLM : Lending club dataset => build GLM model => 100% complete => click on model => null pointer exception GitHub
PUBDEV-1987: error returned on prediction for xval model
PUBDEV-1928: Properly implement Maxout/MaxoutWithDropout GitHub
GitHub: print actual number of columns (was just #cols) in DRF init
PUBDEV-2026: Fix setting the proper job state in DL models GitHub
PUBDEV-1950: Splitframe with rapids is not blocking
PUBDEV-1995: nfold: when user cancels an nfold job, fold data still remains in the cluster memory
PUBDEV-1994: nfold: cancel results in a java.lang.AssertionError
PUBDEV-1910: Canceled GBM with CV keeps lock
GitHub: Fix DL checkpoint restart with new data.

#####API

PUBDEV-1955: Change Schema behavior to accept a single number in place of array GitHub
PUBDEV-1914: Iced deserialization fails for Enum Arrays

#####Grid

PUBDEV-1876: Grid: progress bar not working for grid jobs
PUBDEV-1875: Grid: the meta info should not be dumped on the R screen, once the grid job is over
GitHub: [PUBDEV-1876] Fix grid update.
PUBDEV-1874: Grid search: observe issues with model naming/overwriting and error msg propagation GitHub
HEXDEV-402: R: kmeans grid search doesn't work
PUBDEV-1901: Grid appends new models even though models already exist.
PUBDEV-1940: Grid: glm grid on alpha fails with error "Expected '[' while reading a double[], but found 1.0"
PUBDEV-1877: Grid: if user specify the parameter value he is running the grid on, would be good to warn him/her
PUBDEV-1938: Grid: randomForest: unsupported grid params and wrong error msg

#####Hadoop

PUBDEV-2036: importModel from hdfs doesn't work
PUBDEV-2027: Clicking shutdown in the Flow UI dropdown does not exit the Hadoop cluster

#####Python

PUBDEV-1789: Python client h2o.remove_vecs (ExprNode) makes bad ast
PUBDEV-1795: Unable to read H2OFrame from Python
PUBDEV-1764: Python importFile does not import all files in directory, only one file GitHub
GitHub: parameter name is "dir" not "path"
PUBDEV-1693: Python: Options for handling NAs in group_by is broken
PUBDEV-1415: Intermittent Unimplemented rapids exception: pyunit_var.py . Also prior test got unimplemented too, but test didn't fail (client wasn't notified)
PUBDEV-1119: Python: Need to be able to access resource genmodel.jar
GitHub: Fix download of pojo in Python.

#####R

GitHub: Fixed bug in h2o.ensemble .make_Z function
PUBDEV-1796: R: h2o.importFile doesn't allow user to choose column type during parse
PUBDEV-1768: R: Fails to return summary on subsetted frame GitHub
PUBDEV-1909: R: Adding column to frame changes string enums in column to numerics
PUBDEV-1936: R: h2o.levels return only the first factor of factor levels
PUBDEV-1869: R: sd function should convert enum column into numeric and calculate standard deviation GitHub
PUBDEV-1246: R: h2o.hist needs to run pretty function for pretty breakpoints to get same results as R's hist GitHub
PUBDEV-1868: R: h2o.performance returns error (not warning) when model is reloaded into H2O
PUBDEV-1723: h2o R : subsetting data :h2o removing wrong columns, when asked to delete more than 1 columns
GitHub: fix h2o.levels issue
PUBDEV-1972: R: setting weights_column = NULL causes unwanted variables to be used as predictors

#####Sparkling Water

PUBDEV-1173: create conversion tasks from primitive RDD
GitHub: Fix return value issue in distribution script.

#####System

HEXDEV-360: getFrame fails on Parsed Data
PUBDEV-366: Fix parsing for high-cardinality categorical features GitHub
PUBDEV-1143: Parse: Cancel parse unreliable; does not work at all times
PUBDEV-1872: Ability to ignore files during parse GitHub
PUBDEV-777: Parse : Parsing compressed files takes too long
PUBDEV-1916: Parse: 2 node cluster takes 49min vs 40sec on a 1 node cluster GitHub
PUBDEV-1431: Convert /3/DownloadDataset to streaming
PUBDEV-1995: nfold: when user cancels an nfold job, fold data still remains in the cluster memory
PUBDEV-1994: nfold: cancel results in a java.lang.AssertionError
PUBDEV-1910: Canceled GBM with CV keeps lock GitHub
PUBDEV-1992: CreateFrame isn't totally random
GitHub: Fixes a bug that allowed big buffers to be constantly reallocated when it wasn't needed. This saves memory and time.
GitHub: Fix print statement.
GitHub: Fixed orderly shutdown to work with flatfile.
PUBDEV-1998: Parse : Lending club dataset parse => cancelled by user
PUBDEV-2028: Shutdown => unimplemented error on curl -X POST 172.16.2.186:54321/3/Shutdown.html
PUBDEV-2070: Download frame brings down cluster
PUBDEV-2067: Cannot mix negative and positive array selection
PUBDEV-2024: Save model to HDFS fails

#####Web UI

PUBDEV-2012: Histograms in Flow are slightly off
PUBDEV-2029: exportModel from Flow to HDFS doesn't work

###Simons (3.0.1.7) - 8/11/15

####New Features

The following changes represent features that have been added since the previous release:

#####Python

PUBDEV-684: Add nfolds to R/Python

#####Web UI

HEXDEV-390: Print Flow to PDF / Printer

####Enhancements

The following changes are improvements to existing features (which includes changed default values):

#####Algorithms

GitHub: add seed to the model building that uses balance_classes, for determinism/repeatability
GitHub: Reduce the frequency at which tiny tree models are printed to stdout: Only print during the first 4 seconds if score_each_iteration is enabled.
GitHub: Only call the limited printout for TwoDimTables during Model.toString(), which prints all TwoDimTables of the model._output.
GitHub: Only print up to 10 rows of TwoDimTables in ASCII logs (first/last 5).
GitHub: Remove some overflow/underflow checks: Let exp(x) be small and log(x) be large.
GitHub: Add nbins_top_level parameter to DRF/GBM. Not yet in R.
GitHub: Disallow N-fold CV for GLM when lambda-search is on.

#####API

GitHub: Cleanup of public API of Schema.java. Improve its JavaDoc a lot.

#####Python

PUBDEV-1765: Improve python online documentation
PUBDEV-1497: Python : Weights R tests to be ported from R for GLM/GBM/RF/DL
GitHub: adjust to split frame jobs result
GitHub: allow for update thingy to be a tuple (so rows and columns)
GitHub: when starting h2o jvm with h2o.init(), give h2o child process different id than parent, so it doesn't get killed on Ctrl-C
GitHub: add option to turn off progress bar print out
GitHub: add unicode to frame getter possibilities
GitHub: remove remaining splats on dicts
GitHub: no need to splat pass thru args
GitHub: proper lookup of offset/weights/fold_column
GitHub: data should be eagered before download_csv.
GitHub: simplify model builder
GitHub: use None as default for "on" field
GitHub: add get_jar flag to download_pojo
GitHub: remove all of the unnecessary calls to h2o.init and remove the unnecessary environment variable for version checking during testing

#####R

PUBDEV-1744: Improve help message of h2o.init function
GitHub: add valid expression to list of accepted R CMD check outputs.
GitHub: added h2o.anomaly demo to r package

#####System

GitHub: Add -JJ command line argument to allow extra JVM arguments to be passed.
GitHub: Refactored CSVStream to be more understandable. Fix empty chunk bug.
GitHub: Add hintFlushRemoteChunk to CSVStream.
GitHub: Add parameterized route for frame export
GitHub: allow string vecs to be toEnum'd (with a sensible cap)
GitHub: allow lists of numbers in reducer ops
GitHub: Add warning message during POJO export if offset_column is specified (is not supported)
PUBDEV-1853: cleanup: remove addToNavbar from RequestServer GitHub
GitHub: Add "Open H2O Flow" message.
GitHub: Code refactoring to allow GBM JUnits to work with H2OApp in multi-node mode.
GitHub: Replace additive float op by multiplication
GitHub: Reimplement checksum for Model.Parameters
GitHub: Remove debug prints.
PUBDEV-1857: cleanup: remove the need for String[] path_params in RequestServer.register() GitHub
PUBDEV-1856: cleanup: remove the writeHTML_impl methods from all the schemas
PUBDEV-1854: cleanup: make _doc_method optional in the Route constructors GitHub
PUBDEV-1858: cleanup: change RequestServer so that only one handler instance is created for each Route

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

PUBDEV-1674: gbm w gamma: does not seems to split at all; all trees node pred=0 for attached data GitHub
PUBDEV-1760: GBM : Deviance testing for exp family
PUBDEV-1714: gbm gamma: R vs h2o same split variable, slightly different leaf predictions
PUBDEV-1755: DL : Math correctness for Tweedie with Offsets/Weights
PUBDEV-1758: DL : Deviance testing for exp family
PUBDEV-1756: DL : Math correctness for Poisson with Offsets/Weights
PUBDEV-1651: null/residual deviances don't match for various weights cases
PUBDEV-1757: DL : Math correctness for Gamma with Offsets/Weights
PUBDEV-1680: gbm gamma: seeing train set mse incs after sometime
PUBDEV-1724: gbm w tweedie: weird validation error behavior
PUBDEV-1774: setting gbm's balance_classes to True produces suspect models
PUBDEV-1849: K-Means: negative sum-of-squares after mean imputation
GitHub: Set the iters counter during kmeans center initialization correctly
GitHub: fixed parenthesis in GLM POJO generation
GitHub: Should be updating model each iteration with the newly fitted kmeans clusters, not the old ones!
PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
PUBDEV-1666: GBM:Math correctness for Gamma with offsets/weights

#####Python

PUBDEV-1779: Fixes intermittent failure seen when Model Metrics were looked at too quickly after a cross validation run.
PUBDEV-1409: h2o python h2o.locate() should stop and return "Not found" rather than passing path=None to h2o? causes confusion h2o message GitHub
PUBDEV-1630: GBM getting intermittent assertion error on iris scoring in pyunit_weights_api.py
PUBDEV-1770: sigterm caught by python is killing h2o GitHub
HEXDEV-397: Python fold_column option requires fold column to be in the training data
HEXDEV-394: Python client occasionally throws attached error
GitHub: add missing args to kmeans
GitHub: add missing kmeans params in
GitHub: add missing checkpoint param
PUBDEV-1785: Deadlock while running GBM

#####R

PUBDEV-1830: h2o.glm throws an error when fold_column and validation_frame are both specified
PUBDEV-1660: h2oR: when try to get a slice from pca eigenvectors get some formatting error GitHub
GitHub: fix broken %in% in R
PUBDEV-1831: Cross-validation metrics are not displayed in R (and Python?)
PUBDEV-1840: Autoencoder model doesn't display properly in R (training metrics) GitHub

#####System

PUBDEV-1790: can't convert iris species column to a character column.
PUBDEV-1520: Kmeans pojo naming inconsistency
GitHub: fix parse of range ast
GitHub: Sets POJO file name to match the class name. Prior behavior would allow them to be different and give a compile error.

#####Web UI

PUBDEV-1754: Export frame not working in flow : H2OKeyNotFoundArgumentException

###Simons (3.0.1.4) - 7/29/15

####New Features

#####Algorithms

HEXDEV-220: Tweedie distribution for DL
HEXDEV-219: Poisson distribution for DL
HEXDEV-221: Gamma distribution for DL
PUBDEV-683: Enable nfolds for all algos (where reasonable) GitHub
PUBDEV-1791: Add toString() for all models (especially model metrics) GitHub
GitHub: Enabling model checkpointing for DRF
GitHub: Enable checkpointing for GBM.
PUBDEV-1698: fold assignment in N-fold cross-validation

#####Python

PUBDEV-386: Expose ParseSetup to user in Python
PUBDEV-1239: Python: getFrame and getModel missing
HEXDEV-334: support rbind in python
PUBDEV-1215: python to have exportFile call
GitHub: add cross-validation parameter to metric accessors and respective pyunit
PUBDEV-1729: Cross-validation metrics should be shown in R and Python for all models

#####R

PUBDEV-385: Expose ParseSetup to user in R
GitHub: add mean residual deviance accessor to R interface
GitHub: incorporate cross-validation metric access into the R client metric accessors
GitHub: R interface for checkpointing in RF enabled

#####System

PUBDEV-1735: Add 24-MAR-14 06.10.48.000000000 PM style date to autodetected date formats

####Enhancements

#####API

PUBDEV-1451: design for cross-validation APIs GitHub

#####Algorithms

GitHub: Add proper deviance computation for DL regression.
GitHub: Print GLM model details to the logs.
GitHub: Disallow categorical response for GLM with non-binomial family.
GitHub: Disallow models with more than 1000 classes; they can lead to overly large values in DKV due to memory usage of 8*N^2 bytes (the Metrics objects in the model output)
GitHub: DL: Don't train too long in single node mode with auto-tuning.
GitHub: Use mean residual deviance to do early stopping in DL.
GitHub: Add a "AUTO" setting for fold_assignment (which is Random). This allows the code to reject non-default user-given values if n-fold CV is not enabled.
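
To make the 8*N^2 figure in the class-limit entry above concrete, here is a back-of-the-envelope sketch. It assumes the dominant cost is a single N x N matrix of 8-byte values per metrics object; the changelog gives only the formula, and the function name is hypothetical, not part of H2O.

```python
def metrics_matrix_bytes(n_classes, bytes_per_cell=8):
    """Approximate size of an N x N matrix of 8-byte cells (assumption)."""
    return bytes_per_cell * n_classes ** 2


# Compare a small model, the 1000-class limit, and a model well beyond it.
for n in (10, 1000, 10000):
    size = metrics_matrix_bytes(n)
    print(f"{n:>6} classes -> {size / 1e6:,.1f} MB")
```

At the 1000-class limit the estimate is 8,000,000 bytes (8 MB) per matrix, and it grows quadratically from there.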

#####Python

HEXDEV-317: Python has to play nicely in a polyglot, long-running environment
GitHub: simplify ast in python frame slicer
GitHub: add cross validation metrics and mean residual deviance to model show()
GitHub: any to take a frame, simplify python's __contains__

#####R

GitHub: On detaching h2o R package, only shut down H2O instance if it was started by the R client
GitHub: update h2o load

#####System

GitHub: Print a handy message (Open H2O Flow in your web browser) when the cluster comes up like Sparkling Water does.
GitHub: Replace memory leaky RCurl getURL with curlPerform.
GitHub: Add -disable_web parameter.
GitHub: allow numerics in match
GitHub: More refactoring of h2o start. Includes:
- H2OStarter - a generic class to start H2O. It does all dynamic registration
- H2OTestStarter - a generic class to start h2o-core tests
GitHub: Use typed key when it is necessary. `Key.make()` now returns a typed Key. The trick is that type T can be derived from the left side of the assignment. If the type of the Key cannot be derived, the developer has to use the typed syntax: `Key.<Frame>make("myframe.hex")`. The change simplifies Scala code, which will be able to derive the key type.
PUBDEV-1793: Add Job state and start/end time to the model's output GitHub
GitHub: add more places to look when trying to start jar from python's h2o.init
GitHub: Cosmetic name changes
GitHub: Fetch local node differently from remote node.
GitHub: Don't clamp node_idx at 0 anymore.
GitHub: Added -log_dir option.

####Bug Fixes

#####API

PUBDEV-776: Schema.parse() needs to be better behaved (like, not crash)

#####Algorithms

PUBDEV-1725: pca:glrm - gives bad results for attached data (because of plus-plus initialization)
GitHub: Fix deviance calculation, use the sanitized parameters from the model info, where Auto parameter values have been replaced with actual values
GitHub: Fix offset in DL for exponential family (that doesn't do standardization)
GitHub: Fix a bug where initial Y was set to all zeroes by kmeans++ when scaling was disabled
PUBDEV-1668: GBM: Math correctness for weights
PUBDEV-1783: dl: deviance off for large dataset GitHub
PUBDEV-1667: GBM: Math correctness for Offsets
PUBDEV-1778: drf: reporting incorrect mse on validation set GitHub
GitHub: Fix DRF scoring with 0 trees.

#####Python

PUBDEV-1260: Python: Requires asnumeric() function
GitHub: python interface: add folds_column to x, if it doesn't already exist in x
PUBDEV-1763: Python : Math correctness tests for Tweedie/Gamma/Poisson with offsets/weights
PUBDEV-1762: Python : Deviance tests for all algos in python GitHub
PUBDEV-1671: intermittent: pyunit_weights_api.py, hex.tree.SharedTree$ScoreBuildOneTree@645acd60java.lang.AssertionError at hex.tree.DRealHistogram.scoreMSE(DRealHistogram.java:118), iris dataset GitHub

#####R

PUBDEV-1257: R: no is.numeric method for H2O objects
PUBDEV-1622: NPE in water.api.RequestServer, water.util.RString.replace(RString.java:132)...got flagged as WARN in log...I would think we should have all NPE's be ERROR / fatal? or ?? GitHub
PUBDEV-1655: h2o.strsplit needs isNA check
PUBDEV-1084: h2o.setTimezone NPE
PUBDEV-1738: R: cloud name creation can't handle user names with spaces

#####System

PUBDEV-1410: apply causes assert errors mentioning deadlock in runit_small_client_mode ...build never completes after hours ..deadlock?
PUBDEV-1195: docker build fails
HEXDEV-362: Bug in /parsesetup data preview GitHub
PUBDEV-1766: H2O xval: when delete all models: get Error evaluating future[6] :Error calling DELETE /3/Models/gbm_cv_13
PUBDEV-1767: H2O: when list frames after removing most frames, get: roll ups not possible vec deleted error GitHub

#####Web UI

PUBDEV-1782: Flow: View Data fails when there is a UUID column (and maybe also a String column)
PUBDEV-1769: xval: cancel job does not work GitHub

###Simons (3.0.1.3) - 7/24/15

####New Features

#####Python

PUBDEV-1734: Add save and load model to python api
PUBDEV-1314: Python needs "str" operator, like R's
GitHub: turn on H2OFrame __repr__

####Enhancements

#####API

GitHub: Increase sleep from 2 to 3 because h2o itself does a sleep 2 on the REST API before triggering the shutdown.

#####System

PUBDEV-1730: Make export file a job GitHub

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

PUBDEV-1743: gbm poisson w weights: deviance off
PUBDEV-1736: gbm poisson with offset: seems to be giving wrong leaf predictions

#####Python

PUBDEV-1731: Python get_frame() results in deleting a frame created by Flow
HEXDEV-389: Split frame from python
HEXDEV-388: python client H2OFrame constructor puts the header into the data (as the first row)

#####R

PUBDEV-1504: Runit intermittent fails : runit_pub_180_ddply.R
PUBDEV-1678: Client mode jobs fail on runit_hex_1750_strongRules_mem.R

#####System

GitHub: Model parameters should be always public.

###Simons (3.0.1.1) - 7/20/15

####New Features

#####Algorithms

HEXDEV-213: Tweedie distributions for GBM GitHub
HEXDEV-212: Poisson distributions for GBM GitHub
PUBDEV-1115: properly test PCA and mark it non-experimental

#####Python

PUBDEV-1437: Python needs "nlevels" operator like R
PUBDEV-1434: Python needs "levels" operator, like R
PUBDEV-1355: Python needs h2o.trim, like in R
PUBDEV-1354: Python needs h2o.toupper, like in R
PUBDEV-1352: Python needs h2o.tolower, like in R
PUBDEV-1350: Python needs h2o.strsplit, like in R
PUBDEV-1347: Python needs h2o.shutdown, like in R
PUBDEV-1343: Python needs h2o.rep_len, like in R
PUBDEV-1340: Python needs h2o.nlevels, like in R
PUBDEV-1338: Python needs h2o.ls, like in R
PUBDEV-1344: Python needs h2o.saveModel, like in R
PUBDEV-1337: Python needs h2o.loadModel, like in R
PUBDEV-1335: Python needs h2o.interaction, like in R
PUBDEV-1334: Python needs h2o.hist, like in R
PUBDEV-1351: Python needs h2o.sub, like in R
PUBDEV-1333: Python needs h2o.gsub, like in R
PUBDEV-1336: Python needs h2o.listTimezones, like in R
PUBDEV-1346: Python needs h2o.setTimezone, like in R
PUBDEV-1332: Python needs h2o.getTimezone, like in R
PUBDEV-1329: Python needs h2o.downloadCSV, like in R
PUBDEV-1328: Python needs h2o.downloadAllLogs, like in R
PUBDEV-1327: Python needs h2o.createFrame, like in R
PUBDEV-1326: Python needs h2o.clusterStatus, like in R
PUBDEV-1323: Python needs svd algo
PUBDEV-1322: Python needs prcomp algo
PUBDEV-1321: Python needs naiveBayes algo
PUBDEV-1320: Python needs model num_iterations accessor for clustering models, like R's
PUBDEV-1318: Python needs screeplot and plot methods, like R's. (should probably check for matplotlib)
PUBDEV-1317: Python needs multinomial model hit_ratio_table accessor, like R's
PUBDEV-1316: Python needs model scoreHistory accessor, like R's
PUBDEV-1315: R needs weights and biases accessors for deeplearning models
PUBDEV-1313: Python needs "as.Date" operator, like R's
PUBDEV-1312: Python needs "rbind" operator, like R's
PUBDEV-1345: Python needs h2o.setLevel and h2o.setLevels, like in R
PUBDEV-1311: Python needs "setLevel" operator, like R's
PUBDEV-1306: Python needs "anyFactor" operator, like R's
PUBDEV-1305: Python needs "table" operator, like R's
PUBDEV-1301: Python needs "as.numeric" operator, like R's
PUBDEV-1300: Python needs "as.character" operator, like R's
PUBDEV-1293: Python needs "signif" operator, like R's
PUBDEV-1292: Python needs "round" operator, like R's
PUBDEV-1291: Python needs transpose operator, like R's t operator
PUBDEV-1289: Python needs element-wise division and multiplication operators, like %/% and %-% in R
PUBDEV-1330: Python needs h2o.exportHDFS, like in R
PUBDEV-1357: Python and R need which operator GitHub
PUBDEV-1356: Python and R need isnumeric and ischaracter operators
PUBDEV-1342: Python needs h2o.removeVecs, like in R
PUBDEV-1324: Python needs h2o.assign, like in R GitHub
PUBDEV-1296: Python and R h2o clients need "any" operator, like R's
PUBDEV-1295: Python and R h2o clients need "prod" operator, like R's
PUBDEV-1294: Python and R h2o clients need "range" operator, like R's
PUBDEV-1290: Python and R h2o clients need "cummax", "cummin", "cumprod", and "cumsum" operators, like R's
PUBDEV-1325: Python needs h2o.clearLog, like in R
PUBDEV-1349: Python needs h2o.startLogging and h2o.stopLogging, like in R
PUBDEV-1341: Python needs h2o.openLog, like in R
PUBDEV-1348: Python needs h2o.startGLMJob, like in R
PUBDEV-1331: Python needs h2o.getFutureModel, like in R
PUBDEV-1302: Python needs "match" operator, like R's
PUBDEV-1298: Python needs "%in%" operator, like R's
PUBDEV-1310: Python needs "scale" operator, like R's
PUBDEV-1297: Python needs "all" operator, like R's
GitHub: add start_glm_job() and get_future_model() to python client. add H2OModelFuture class. add respective pyunit

#####R

PUBDEV-1273: Add h2oEnsemble R package to h2o-3
PUBDEV-1319: R needs centroid_stats accessor like Python, for clustering models

#####Rapids

PUBDEV-1635: the equivalent of R's "any" should probably be implemented in rapids
PUBDEV-1634: the equivalent of R's cummin, cummax, cumprod, cumsum should probably be implemented in rapids
PUBDEV-1633: the equivalent of R's "range" should probably be implemented in rapids
PUBDEV-1632: the equivalent of R's "prod" should probably be implemented in rapids
PUBDEV-1699: the equivalent of R's "unique" should probably be implemented in rapids GitHub

#####System

GitHub: changed to new AMI
PUBDEV-679: Create cross-validation holdout sets using the per-row weights
GitHub: Add user_name. Add ExtensionHandler1.
GitHub: Added auth options to h2o.init().
GitHub: Added H2O.calcNextUniqueModelId().
GitHub: Add ldap arg.

#####Web UI

HEXDEV-231: Flow: Ability to change column type post-Parse

####Enhancements

#####Algorithms

GitHub: use fixed seed to avoid bad splits with some seeds
GitHub: Change seed to avoid type flip from integer to double after row slicing, which leads to different split decisions
GitHub: Add option during kmeans scoring to return matrix of indicator columns for cluster assignment, which is necessary for initializing GLRM
GitHub: Output number of processed observations in PCA
GitHub: Add validation into PCA with GramSVD
GitHub: Code cleanup of distributions. Also rename _n_folds -> _nfolds for consistency
GitHub: Remove restriction to data frames with more than 1 column
GitHub: Add debugging output for DL auto-tuning.
PUBDEV-556: implement algo-agnostic cross-validation mechanism via a column of weights
GitHub: When initializing with kmeans++ set X to matrix of indicator columns corresponding to cluster assignments, unless closed form solution exists
GitHub: Always print DL auto-tuning info for now.
PUBDEV-1657: pca: would be good to remove the redundant std dev from flow pca model object
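
The weights-based cross-validation entry above (PUBDEV-556) can be pictured as follows. This is only an illustrative sketch, assuming that each fold's training run gives held-out rows a weight of 0 and all other rows a weight of 1. The modulo fold assignment and the function names are hypothetical and are not H2O's API.

```python
def fold_assignment(n_rows, nfolds):
    """Assign each row to a fold by row index modulo nfolds (assumption)."""
    return [i % nfolds for i in range(n_rows)]


def training_weights(folds, holdout_fold):
    """Weight 0 hides a row from training; weight 1 keeps it."""
    return [0.0 if f == holdout_fold else 1.0 for f in folds]


folds = fold_assignment(n_rows=6, nfolds=3)
for k in range(3):
    # One weight column per cross-validation model.
    print(k, training_weights(folds, k))
```

Because the model builder only sees a weight column, the same mechanism can in principle be reused by any algorithm that honors observation weights.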

#####API

GitHub: Set Content-Type: application/x-www-form-urlencoded for regular POST requests.
HEXDEV-272: Move response_column parameter above ignored_columns parameter GitHub
- All of the fields of a schema are now stored in the leaf child of the class hierarchy. Changed the implementation of fields() to simply return the fields variable of a schema. The function calls H2O.fail() if it attempts to access a field from a non-leaf child. response_column is now moved above ignored_columns for every applicable schema. 'own_fields' is also now renamed to 'fields'
GitHub: Don't use features from servlet api 3.0 or later anymore. Instead save the response status in a thread local variable and fish it out when needed.

#####Python

GitHub: don't use the header of the timezone table for a choice
GitHub: never delete models. ever.
GitHub: add na_rm argument
GitHub: add prod to python interface

#####System

GitHub: use Key instead of Vec in refcnter
GitHub: protect vecs in apply
GitHub: Allows for more than one column to remain unnamed. The new naming will fill in the blanks.
GitHub: Refactoring of hadoop mapper and driver.
GitHub: Remove -hdfs option.
GitHub: Adds more checks for a parse cancel at more stages during the post ingestion file parse.
GitHub: Refactor method name for clarification.
GitHub: Cleans up and comments the freeing of chunks from a parsed file.
GitHub: Since more startup logic is getting added, simplify H2OClientApp as much as possible. Remove H2OClient entirely.
GitHub: Add dedicated AddCommonResponseHeadersHandler handler to set common response headers up-front.
GitHub: More refactoring of startup. Pushed a bunch of code from H2OApp into H2O. Added H2O.configureLogging().
GitHub: Make Progress extend Keyed.
GitHub: Make createServer() protected.
GitHub: model_id should probably be a Key, not Key.
GitHub: Change Jetty version from 9 to 8 to get Java 6 compatibility back.

#####Web UI

PUBDEV-1521: show REST API and overall UI response times for each cell in Flow
HEXDEV-304: Flow: Emphasize run time in job-progress output
PUBDEV-1522: show wall-clock start and run times in the Flow outline
PUBDEV-1707: Hook up "Export" button for datasets (frames) in Flow.

####Bug Fixes

#####Algorithms

PUBDEV-1641: gbm w poisson: get java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver.buildNextKTrees on attached data
PUBDEV-1672: kmeans: get AIOOB with user specified centroids GitHub
- Throw an error if the number of rows in the user-specified initial centers is not equal to k.
PUBDEV-1654: pca: gram-svd std dev differs for v2 vs v3 for attached data
GitHub: Fix DL
GitHub: Fix a bug in PCA utilities for k = 1
PUBDEV-1700: nfolds: flow-when set nfold =1 job hangs for ever; in terminal get java.lang.AssertionError
PUBDEV-1706: GBM/DRF: is balance_classes=TRUE and nfolds>1 valid? GitHub
PUBDEV-806: GLM => runit_demo_glm_uuid.R : water.exceptions.H2OIllegalArgumentException
PUBDEV-1696: Client (model-build) is blocked when passing illegal nfolds value. GitHub
PUBDEV-1690: Cross Validation: if nfolds > number of observations, should it default to leave-one-out cross-validation?
PUBDEV-1537: pca: on airlines get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:219) GitHub
PUBDEV-1603: pca: glrm giving very different std dev than R and h2o's other methods for attached data
GitHub: Fix a potential race condition in tree validation scoring.
GitHub: Fix GLM parameter schema. Clean up hasOffset() and hasWeights()

#####Python

PUBDEV-1627: column name missing (python client)
PUBDEV-1629: python client's tail() header incorrect GitHub
PUBDEV-1413: intermittent assertion errors in pyunit_citi_bike_small.py/pyunit_citi_bike_large.py. Client apparently not notified
PUBDEV-1590: "Trying to unlock null" assertion during pyunit_citi_bike_large.py
PUBDEV-1400: match operator should take numerics

#####R

PUBDEV-1663: R CMD Check failures GitHub
PUBDEV-1695: R CMD Check failing on running examples GitHub
PUBDEV-1721: R: group_by causes h2o to hang on multinode cluster
PUBDEV-1501: Python and R h2o clients need "unique" operator, like R's GitHub - R GitHub - Python
PUBDEV-1711: is.numeric in R interface faulty GitHub
PUBDEV-1719: Intermittent: runit_deeplearning_autoencoder_large.R : gets wrong answer?
PUBDEV-1688: 2 nfolds tests fail intermittently: runit_RF_iris_nfolds.R and runit_GBM_cv_nfolds.R GitHub
PUBDEV-1718: Intermittent: runit_deeplearning_anomaly_large.R : training slows down to 0 samples/ sec GitHub

#####Rapids

PUBDEV-1713: Rapids ASTAll faulty GitHub

#####Sparkling Water

PUBDEV-1562: Migration to Spark 1.4

#####System

PUBDEV-1551: Parser: Multifile Parse fails with 0-byte files in directory GitHub
HEXDEV-325: Empty reply when parsing dataset with mismatching header and data column length
PUBDEV-1509: Split frame : Big datasets : On 186K rows 3200 Cols split frame took 40 mins => which is too long
PUBDEV-1438: Column naming can create duplicate column names
PUBDEV-1105: NPE in Rollupstats after failed parse
PUBDEV-1142: H2O parse: When cancel a parse job, key remains locked and hence unable to delete the file GitHub
GitHub: client mode deadlock issue resolution
PUBDEV-1670: Client mode fails consistently sometimes : GBM_offset_tweedie.R.out.txt :
GitHub: nbhm bug: K == TOMBSTONE not key == TOMBSTONE
GitHub: Pulls out a GAID from resource in jar if the GAID doesn't equal the default. Presumably the GAID has been changed by the jar baking program.

#####Web UI

PUBDEV-872: Flows : Not able to load saved flows from hdfs/local GitHub
PUBDEV-554: Flow:Parse two different files simultaneously, flow should either complain or fill the additional (incompatible) rows with nas
PUBDEV-1527: missing .java extension when downloading pojo GitHub
PUBDEV-1642: Changing columns type takes column list back to first page of columns
PUBDEV-1508: Flow : Import file => Parse => Error compiling coffee-script Maximum call stack size exceeded
PUBDEV-1606: Flow :=> Cannot save flow on hdfs
PUBDEV-1653: Flow: the column names do not modify when user changes the dataset in model builder

###Shannon (3.0.0.26) - 7/4/15

####New Features

#####Algorithms

PUBDEV-1592: Expose standardization shift/mult values in the Model output in R/Python. GitHub

#####Python

GitHub: add h2o.shutdown to python client
GitHub: add h2o.hist and respective pyunit
GitHub: gbm weight pyunit (variable importances)

#####R

HEXDEV-375: GitHub home for R demos

#####Web UI

PUBDEV-203: Change data type in flow
PUBDEV-1277: Flow needs as.factor and as.numeric after parse

####Enhancements

#####Algorithms

PUBDEV-1494: GBM : Weights math correctness tests in R
PUBDEV-1523: GLM w tweedie: for attached data, R giving much better res dev than h2o
PUBDEV-1396: Offsets/Weights: Math correctness for GLM
PUBDEV-1496: RF : Weights Math correctness tests in R
HEXDEV-366: remove weights option from DRF and GBM in REST API, Python, R
PUBDEV-1553: Threshold in GLM is hardcoded to 0
GitHub: Make min_rows a double instead of int: Is now weighted number of observations (min_obs in R).
GitHub: Don't use sample weighted variance, but full weighted variance.
GitHub: Fix R^2 computation.
GitHub: Skip rows with missing response in weighted mean computation.
_binomial_double_trees disabled by default for DRF (was enabled).
GitHub: Relax tolerance.
HEXDEV-329: Offset for GBM
HEXDEV-211: Tweedie distributions for GLM

#####API

PUBDEV-1491: generated REST API POJOS should be compiled and jar'd up as part of the build
GitHub: Change schema for PCA, SVD, and GLRM to version 99

#####Python

GitHub: is factor returns TRUE/FALSE cast to scalar 1/0
GitHub: take a slightly different syntactic approach to dropping column
GitHub: better list comp in interaction call
GitHub: if weights_column argument is specified, attach the column to the training and/or validation frame (if not already specified as part of x/validation_x). if weights_column is not already part of x/validation_x, then a training_frame/validation_frame needs to be provided and the weights column is taken from here. respective pyunit added

#####R

GitHub: better ref handling in the [<- for python and R
GitHub: Pass binomial_double_trees in the R wrapper for DRF.
GitHub: carefully format NAs and non NAs
GitHub: for loop over the x[[j]] to format NAs properly
GitHub: Added example to h2o-r/ensemble/create_h2o_wrappers.R

#####System

GitHub: allow for no y in model_builder
GitHub: Enable auto-flag for Java6 generation.
GitHub: better compression in split frame
PUBDEV-1594: All basic file accessors in PersistHDFS should check file permissions
PUBDEV-1518: getFrames should show a Parse button for raw frames

#####Web UI

PUBDEV-1545: Flow => Build model => ignored columns table => should have column width resizing based on column names width => looks odd if column names are short
PUBDEV-1546: Flow: Build model => Search for 1 column => select it => build model shows list of columns instead of 1 column
PUBDEV-1254: Flow: Add Impute

####Bug Fixes

#####Algorithms

PUBDEV-1554: dl with offset: when offset same as response, do not get 0 mse
PUBDEV-1555: h2oR: dl with offset giving : Error in args$x_ignore : object of type 'closure' is not subsettable
PUBDEV-1487: gbm weights: give different terminal node predictions than R for attached data
PUBDEV-1569: Investigate effectiveness of _binomial_double_trees (DRF) GitHub
PUBDEV-1574: Actually pass 'binomial_double_trees' argument given to R wrapper to DRF.
PUBDEV-1444: DL: h2o.saveModel cannot save metrics when a deeplearning model has a validation_frame
PUBDEV-1579: GBM test time predictions without weights seem off when training with weights GitHub
PUBDEV-1533: GLM: doubled weights should produce the same result as doubling the observations GitHub
PUBDEV-1531: GLM: it appears that observations with 0 weights are not ignored, as they should be.
GitHub: Fix a bug in PCA scoring that was handling categorical NAs inconsistently
PUBDEV-1581: Regression 3060 fails on GLRM in R tests
PUBDEV-1586: change Grid endpoints and schemas to v99 since they are still in flux
PUBDEV-1589: GLM : build model => airlinesbillion dataset => IRLSM/LBFGS => fails with array index out of bound exception
PUBDEV-1607: gbm w offset: predict seems to be wrong
PUBDEV-1600: Frame name creation fails when file name contains csv or zip (not as extension)
PUBDEV-1577: DL predictions on test set require weights if trained with weights
PUBDEV-1598: Flow: After running pca when call get Model/ jobs get: Failed to find schema for version: 3 and type: PCA
PUBDEV-1576: Test variable importances for weights for GBM/DRF/DL
PUBDEV-1517: With R, deep learning autoencoder using all columns in frame, not just those specified in x parameter
PUBDEV-1593: dl var importance: there is a .missing(NA) variable in DL variable importance even when data has no NAs
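
Two of the GLM fixes above (PUBDEV-1533 and PUBDEV-1531) rest on the usual semantics of observation weights: an integer weight w behaves like repeating the row w times, and a weight of 0 behaves like dropping the row. The sketch below checks this for a weighted mean and a weighted least-squares slope in plain Python. It illustrates the expected behavior only; it is not H2O's GLM code, and the helper names and data are made up for the example.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts weights[i] times."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_slope(x, y, w):
    """Slope b of the weighted least-squares line y ~ a + b*x."""
    mx, my = weighted_mean(x, w), weighted_mean(y, w)
    num = sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))
    den = sum(wi * (xi - mx) ** 2 for xi, wi in zip(x, w))
    return num / den


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 3.0, 5.0, 4.0]
w = [2, 1, 0, 1]  # row 0 counts twice, row 2 is ignored

# Expand rows by integer weight: weight 2 repeats a row, weight 0 drops it.
x_rep = [xi for xi, wi in zip(x, w) for _ in range(wi)]
y_rep = [yi for yi, wi in zip(y, w) for _ in range(wi)]
ones = [1] * len(x_rep)

same = abs(weighted_slope(x, y, w) - weighted_slope(x_rep, y_rep, ones)) < 1e-12
print("weighted fit matches expanded data:", same)
```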

#####Python

PUBDEV-1538: h2o.save_model fails on Windows due to path nonsense
GitHub: python leaked key check for Vecs, Chunks, and Frames
PUBDEV-1609: frame dimension mismatch between upload/import method

#####R

PUBDEV-1601: h2o.loadModel() from hdfs
PUBDEV-1611: R CMD Check failing on : The Date field is over a month old.

#####System

PUBDEV-1514: Large number of columns (~30000) on importFile (flow) is slow / unresponsive for long time
PUBDEV-841: Split frame : Flow should not show raw frames for SplitFrame dialog (water.exceptions.H2OIllegalArgumentException)
PUBDEV-1459: bug in GLM POJO: seems threshold for binary predictions is always 0
PUBDEV-1566: Cannot save model on windows since Key contains '@' (illegal character to path)
GitHub: Fixes the timezone lists.
GitHub: R CMD check fix for date
GitHub: add ec2 back into project

#####Web UI

HEXDEV-54: Flow : Import file 100k.svm => Something went wrong while displaying page

###Shannon (3.0.0.25) - 6/25/15

####Enhancements

#####API

PUBDEV-1452: branch 3.0.0.2 to REGRESSION_REST_API_3 and cherry-pick the /99/Rapids changes to it

#####Web UI

PUBDEV-1545: Flow => Build model => ignored columns table => should have column width resizing based on column names width => looks odd if column names are short
PUBDEV-1546: Flow : Build model => Search for 1 column => select it => build model shows list of columns instead of 1 column

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

PUBDEV-1487: gbm weights: give different terminal node predictions than R for attached data
GitHub: Fix offset for DL.
GitHub: Gracefully handle 0 weight for GBM.

#####Python

PUBDEV-1547: Weights API: weights column not found in python client

#####R

GitHub: Fix R wrapper for DL for weights/offset.

#####Web UI

PUBDEV-1528: Flow model builder: the na filter does not select all ignored columns; just the first 100.

###Shannon (3.0.0.24) - 6/25/15

####New Features

#####Algorithms

GitHub: Allow validation for unsupervised models.

#####R

GitHub: Added runit GBM weights
GitHub: Updated runit_GBM_weights.R

#####Python

GitHub: add h2o.set_timezone h2o.get_timezone and h2o.list_timezones to python client and respective pyunit.
GitHub: add h2o.save_model and h2o.load_model to python client and respective pyunit

####Enhancements

#####Algorithms

GitHub: Skip rows with weight 0.
GitHub: x_ignore must be set when autoencoder is TRUE

#####System

GitHub: Fix Java bindings generator to generate code under project's location.
GitHub: Adds input parameter check to ParseSetup.

####Bug Fixes

#####Algorithms

PUBDEV-1529: dl with ae: get java.lang.UnsupportedOperationException: Trying to predict with an unstable model.
GitHub: Bring back accidentally removed hiding of classification-related fields for unsupervised models.

#####API

PUBDEV-1456: fix REST API POJO generation for enums, + java.util.map import

###Shannon (3.0.0.23) - 6/19/15

####New Features

#####Algorithms

HEXDEV-21: Offset for GLM
HEXDEV-208: Add observation weights to GLM (was HEXDEV-4)
PUBDEV-677: Add observation weights to all metrics
PUBDEV-675: Pass a weight Vec as input to all algos
HEXDEV-6: Add observation weights to GBM
HEXDEV-7: Add observation weights to DL
HEXDEV-10: Add observation weights to DRF
PUBDEV-291: Add observation weights to GLM, GBM, DRF, DL (classification)
HEXDEV-332: Support Offsets for DL GitHub
GitHub: Use weights/offsets in GBM.

#####API

PUBDEV-61: do back-end work to allow document navigation from one Schema to another
PUBDEV-133: doing summary means calling it with each columns name, index not supported?

#####Python

GitHub: add num_iterations accessor to python client and respective pyunit
GitHub: add score_history accessor to python client and respective pyunit
GitHub: add hit ratio table accessor to python interface and respective pyunit
GitHub: add h2o.naivebayes and respective pyunits
GitHub: add h2o.prcomp and respective pyunits.
PUBDEV-681: Add user-given input weight parameters to Python
GitHub: add h2o.create_frame to python client and respective pyunit
GitHub: add h2o.interaction and respective pyunit
GitHub: add h2o.strsplit to python client and respective pyunit
GitHub: add h2o.toupper and h2o.tolower to python client and respective pyunit
GitHub: add h2o.sub and h2o.gsub to python interface and respective pyunit
GitHub: add h2o.trim() to python client and respective pyunit
GitHub: add h2o.rep_len to python client and respective pyunit
GitHub: add h2o.svd to python client and respective golden pyunit
GitHub: add scree plot functionality to python client and respective pyunit
GitHub: add plotting functionality to python client and respective pyunit

#####R

GitHub: added h2o.weights and h2o.biases accessors to R client and update respective runit
GitHub: add h2o.centroid_stats to R client and respective runit
PUBDEV-680: Add user-given input weight parameters to R
GitHub: Add offset/weights to DRF/GBM R wrappers.

#####Web UI

PUBDEV-1513: Add cancelJob() routine to Flow

####Enhancements

#####Algorithms

PUBDEV-676: Use the user-given weight Vec as observation weights for all algos
GitHub: Refactor the code to let the caller compute the weighted sigma.
GitHub: Modify prior class distribution to be computed from weighted response.
GitHub: Put back the defaultThreshold that's based on training/validation metrics. Was accidentally removed together with SupervisedModel.
GitHub: Always sample to at least #class labels when doing stratified sampling.
GitHub: Cutout for NAs in GLM score0(data[],...), same as for score0(Chunk[],...)

#####R

PUBDEV-856: All h2o things in R should have an h2o.something version so it's unambiguous GitHub
GitHub: export clusterIsUp and clusterInfo commands
GitHub: update accessors in the shim
GitHub: gbm with async exec

#####System

HEXDEV-361: Wide frame handling for model builders
GitHub: Remove application plugin from assembly to speedup build process.
GitHub: add byteSize to ls
GitHub: option to launch randomForest async
GitHub: Return HDFS persist manager for URIs starting with s3n and s3a
GitHub: quote strings when writing to disk

####Bug Fixes

#####Algorithms

PUBDEV-1217: pca: when cancel the job the key remains locked
PUBDEV-1468: Error in GBM if response column is constant GitHub
PUBDEV-1476: dl with obs weights: nas in weights cause java.lang.AssertionError GitHub
PUBDEV-1458: pca: data with nas, v2 vs v3 slightly different results GitHub
PUBDEV-1477: dl w/obs wts: when all wts are zero, get java.lang.AssertionError GitHub
GitHub: Fix check for offset (allow offset for logistic regression).
GitHub: Gracefully handle exception when launching single-node DRF/GBM in client mode.
GitHub: Hack around the fact that hasWeights()/hasOffset() isn't available on remote nodes and that SharedTree is sent to remote nodes and its private internal classes need access to the above methods...
GitHub: Fix scoring when NAs are predicted.

#####Python

PUBDEV-1469: pyunit_citi_bike_large.py : test failing consistently on regression jobs
PUBDEV-1472: Regression job : Pyunit small tests groupie and pub_444_spaces failing consistently
PUBDEV-1372: Regression of pyunit_small, Groupby.py
PUBDEV-1386: intermittent fail in pyunit_citi_bike_small.py: -Unimplemented- failed lookup on token
PUBDEV-1471: pyunit_citi_bike_small.py : failing consistently on regression jobs
PUBDEV-1466: matplotlib.pyplot import failure on MASTER jenkins pyunit small jobs GitHub
GitHub: minor fix to python's h2o.create_frame
GitHub: update the path to jar in connection.py

#####R

PUBDEV-1475: Client mode failed tests : runit_GBM_one_node.R, runit_RF_one_node.R, runit_v_3_apply.R, runit_v_4_createfunctions.R GitHub
PUBDEV-1235: Split Frame causes AIOOBE on Chicago crimes data GitHub
PUBDEV-746: runit_demo_NOPASS_h2o_impute_R : h2o.impute() is missing. seems like we want that?
PUBDEV-582: H2O-R- does not give the full column summary
PUBDEV-1473: Regression : Runit small jobs failing on tests :
PUBDEV-741: runit_NOPASS_pub-668 R tests uses all() ...h2o says all is unimplemented
PUBDEV-1506: R: h2o.ls() needs to return data sizes
PUBDEV-1436: Intermittent runit fail : runit_GBM_ecology.R GitHub
PUBDEV-1464: R: toupper/tolower don't work GitHub
PUBDEV-1194: R: dataset is imported but can't return head of frame

#####Sparkling Water

PUBDEV-975: Download page for Sparkling Water should point to the right R-client and Python client
PUBDEV-1428: Sparkling water => Flow => Million song/KDD Cup path issues GitHub

#####Web UI

PUBDEV-1433: Flow UI: Change Help > FAQ link to h2o-docs/index.html#FAQ

###Shannon (3.0.0.22) - 6/13/15

####New Features

#####API

PUBDEV-633: Generate Java bindings for REST API: POJOs for the entities (schemas)

#####Python

GitHub: added h2o.anyfactor() and respective pyunit
GitHub: add h2o.scale and respective pyunit
GitHub: added levels, nlevels, setLevel and setLevels and respective pyunit...PUBDEV-1434 PUBDEV-1437 PUBDEV-1434 PUBDEV-1345 PUBDEV-1311
GitHub: add H2OFrame.as_date and pyunit addition. H2OFrame.setLevel should return a H2OFrame not a H2OVec.

####Enhancements

#####Algorithms

GitHub: Add _build_tree_one_node option to GBM

#####API

HEXDEV-352: Additional attributes on /Frames and /Frames/foo/summary

#####R

PUBDEV-706: Release h2o-dev to CRAN
Adding parameter parse_type to upload/import file (GitHub)

#####Python

GitHub: print out where h2o jar is looked for
GitHub: add h2o.ls and respective pyunit

#####System

PUBDEV-717: refactor the duplicated code in FramesV2
PUBDEV-1281: Add horizontal pagination of frames to Flow GitHub
PUBDEV-607: Add Xmx reporting to GA
GitHub: Added support for Freezable[][][] in serialization (added addAAA to auto buffer and DocGen, DocGen will just throw H2O.fail())
GitHub: No longer set yyyy-MM-dd and dd-MMM-yy dates that precede the epoch to be NA. Negative time values are fine. This unifies these two time formats with the behavior of as.Date.
GitHub: Reduces the verbosity of parse tracing messages.
GitHub: Rename AUTO->GUESS for figuring out file type.
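
To illustrate the date change listed above (yyyy-MM-dd dates that precede the epoch are kept as negative time values instead of being set to NA), here is a minimal Python sketch. It is not H2O's parser: the function name `to_epoch_millis` is hypothetical, and treating every parsed date as midnight UTC is an assumption made for the example.

```python
from datetime import datetime, timedelta, timezone

# Reference point: the Unix epoch, midnight UTC on 1970-01-01.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(text, fmt="%Y-%m-%d"):
    """Return milliseconds since the epoch for a date string.

    Hypothetical helper for illustration only. Dates before 1970-01-01
    yield negative values rather than a missing value.
    """
    # Assumption: the date is interpreted as midnight UTC.
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    # Dividing two timedeltas with // gives an exact integer count.
    return (parsed - EPOCH) // timedelta(milliseconds=1)

print(to_epoch_millis("1969-12-31"))  # -86400000
print(to_epoch_millis("1970-01-02"))  # 86400000
```

The subtraction is done on `datetime` objects rather than with `datetime.timestamp()`, since the latter can fail for pre-epoch dates on some platforms.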

#####Web UI

HEXDEV-276: Add frame pagination
PUBDEV-1405: Flow : Decision to be made on display of number of columns for wider datasets for Parse and Frame summary
PUBDEV-1404: Usability improvements
PUBDEV-244: "View Data" display may need to be modified/shortened.

####Bug Fixes

#####Algorithms

PUBDEV-1365: GLM: Buggy when likelihood equals infinity
PUBDEV-1394: GLM: Some offsets hang
PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
PUBDEV-1403: pca: h2o-3 reporting incorrect proportion of variance and cum prop GitHub
HEXDEV-281: GLM - beta constraints with categorical variables fails with AIOOB
HEXDEV-280: GLM - gradient not within tolerance when specifying beta_constraints w/ and w/o prior values

#####Python

PUBDEV-1425: Class Cast Exception ValStr to ValNum GitHub
PUBDEV-1421: python client parse fail on hdfs /datasets/airlines/airlines.test.csv
PUBDEV-1153: Demo: Airlines Demo in Python GitHub
PUBDEV-1286: Python ifelse on H2OFrame never finishes
PUBDEV-1435: Run.py modify to accept phantomjs timeout command line option GitHub

#####R

PUBDEV-1154: Demo: Chicago Crime Demo in R
PUBDEV-1240: Merge causes IllegalArgumentException
PUBDEV-1447: R: no argument parser_type in h2o.uploadFile/h2o.importFile (GitHub)

#####System

PUBDEV-1423: Phantomjs : Add timeout command line option
PUBDEV-1401: Flow : Import file 15 M Rows 2.2K cols=> Parse these files => Change first column type => Unknown => Try to change other columns => Kind of hangs
PUBDEV-1406: make the ParseSetup / Parse API more efficient for high column counts GitHub

###Shannon (3.0.0.21) - 6/12/15

####New Features

#####Python

HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API

####Enhancements

#####Algorithms

GitHub Made intercept option public and added it to field list in parameter schema
GitHub GLM: Updated null model intercept fit.
GitHub GLM: Updated null-model constant term fitting when running with offset
GitHub glm update
GitHub DL code refactoring to reduce file sizes

#####Python

GitHub add h2o.round() and h2o.signif() and additional pyunit checks
GitHub add h2o.all() and respective pyunit checks

#####R

GitHub added intercept option to R

#####System

PUBDEV-607: Add Xmx reporting to GA GitHub

#####Web UI

GitHub Add horizontal pagination of /Frames to handle UI navigation of wide datasets more efficiently.
GitHub Only show the top 7 metrics for the max metrics table
GitHub Make the max metrics table entries be called max f1 etc.

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

PUBDEV-1365: GLM: Buggy when likelihood equals infinity GitHub
PUBDEV-1394: GLM: Some offsets hang
PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
PUBDEV-1382: pca: giving wrong std dev for mentioned data
PUBDEV-1383: pca: std dev numbers differ for v2 and v3 for attached data GitHub
PUBDEV-1381: GBM, RF: get an NPE when run with a validation set with no response GitHub
GitHub GLM fix - fixed fitting of null model constant term
GitHub Fix remote bug
GitHub Remove elastic averaging parameters from Flow.
PUBDEV-1398: pca: predictions on the attached data from v2 and v3 differ

#####Python

PUBDEV-1286: Python ifelse on H2OFrame never finishes GitHub

#####R

PUBDEV-761: Save model and restore model (from R)
PUBDEV-1236: h2o-r/tests/testdir_misc/runit_mergecat.R failure (client mode only)

#####System

PUBDEV-1402: move Rapids to /99 since it's going to be in flux for a while GitHub
GitHub Fixes an operator precedence issue, and replaces debug GA target with actual one.
GitHub Fix log download bug where all nodes were getting the same zip file.

###Shannon (3.0.0.18) - 6/9/15

####New Features

#####System

PUBDEV-1163: implement h2o1-style model save/restore in h2o-3 GitHub

#####Python

GitHub: Added --h2ojar option

####Enhancements

#####Python

PUBDEV-277: Make python equivalent of as.h2o() work for numpy array and pandas arrays

####Bug Fixes

#####Algorithms

PUBDEV-1371: pca: get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:198)
PUBDEV-1376: pca: predictions from h2o-3 and h2o-2 differs for attached data
PUBDEV-1380: DL: when try to access the training frame from the link in the dl model get: Object not found

#####R

PUBDEV-761: Save model and restore model (from R) GitHub

###Shannon (3.0.0.17) - 6/8/15

####New Features

#####Algorithms

HEXDEV-209: Poisson distributions for GLM

#####Python

PUBDEV-1270: Python Interface needs H2O Cut Function GitHub
PUBDEV-1242: Need equivalent of as.Date feature in Python GitHub
PUBDEV-1165: H2O Python needs Modulus Operations
HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API
PUBDEV-1237: environment variable to disable the strict version check in the R and Python bindings

#####Web UI

PUBDEV-1175: Flow: Good interactive confusion matrix for binomial
PUBDEV-1176: Flow: Good confusion matrix for multinomial

####Enhancements

#####Algorithms

GitHub: GLM weights fix: regularize by sum of weights rather than number of observations
GitHub: GLM fix: added line search (and limited number of iterations) to constant term model fitting with offset (could enter infinite loop)
GitHub: No longer warn if binomial_double_trees option is enabled for _nclass!=2
GitHub: Fix CM table to have integer entries unless there are real-valued entries
GitHub: Add extra assertion for train_samples_per_iteration
GitHub: Update model during runtime of algorithm.
GitHub: Changes to glm forloop to add offsets and add NOPASS/NOFEATURE functionality back to run.py
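
The GLM weights entry above (regularize by sum of weights rather than number of observations) can be illustrated with a small Python sketch. This is not H2O's GLM objective: the one-coefficient squared-error loss, the L1 penalty, and the name `penalized_objective` are all assumptions chosen for the example. It shows only why dividing the loss by the sum of weights makes an integer weight of k behave like k repeated rows.

```python
def penalized_objective(xs, ys, weights, beta, lam):
    """Weighted squared-error loss plus an L1 penalty on one coefficient.

    Hypothetical illustration: the loss is averaged over the SUM of the
    observation weights, not over the number of rows.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    loss = sum(w * (y - beta * x) ** 2 for x, y, w in zip(xs, ys, weights))
    return loss / (2.0 * total_weight) + lam * abs(beta)

# A row with weight 2 gives the same objective as that row listed twice.
weighted = penalized_objective([1, 2], [1, 3], [2, 1], beta=0.5, lam=0.1)
repeated = penalized_objective([1, 1, 2], [1, 1, 3], [1, 1, 1], beta=0.5, lam=0.1)
print(abs(weighted - repeated) < 1e-12)  # True
```

If the loss were divided by the row count instead, the two calls would differ, so the penalty strength `lam` would mean something different for weighted and expanded data.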

#####R

GitHub: month was off by one, runit test edited
GitHub: Comments to clarify the policy on dates in H2O.

#####System

HEXDEV-344: Logs should include JVM launch parameters

#####Web UI

PUBDEV-467: Show Frames for DL weights/biases in Flow
PUBDEV-1221: add a "I like this" style button with LinkedIn or Github (beside the Flow Assist Me button)
PUBDEV-1245: Flow: use new _exclude_fields query parameter to speed up REST API usage

####Bug Fixes

#####Algorithms

PUBDEV-1353: GLM: model with weights different in R than in H2o for attached data
PUBDEV-1358: GLM: when run with -ive weights, would be good to tell the user that -ive weights not allowed instead of throwing exception
PUBDEV-1264: GLM: reporting incorrect null deviance GitHub
PUBDEV-1362: GLM: when run with weights and offset get wrong ans
PUBDEV-1263: GLM: name ordering for the coefficients is incorrect GitHub
PUBDEV-1261: pca: wrong std dev for data with nas rest numeric cols GitHub
PUBDEV-1218: pca: progress bar not showing progress just the initial and final progress status GitHub
PUBDEV-1204: pca: from flow when try to invoke build model, displays-ERROR FETCHING INITIAL MODEL BUILDER STATE
PUBDEV-1212: pca: with enum column reporting (some junk) wrong stdev/ rotation GitHub
PUBDEV-1228: pca: no std dev getting reported for attached data
PUBDEV-1233: pca: std dev for attached data differ when run on h2o-3 and h2o-2
PUBDEV-1258: h2o.glm with offset column: get Error in .h2o.startModelJob(conn, algo, params) : Offset column 'logInsured' not found in the training frame.

#####R

PUBDEV-1234: h2o.setTimezone throwing an error GitHub
PUBDEV-1229: R: Most GLM accessors fail GitHub
PUBDEV-1227: R: Cannot extract an enum value using data[row,col] GitHub
HEXDEV-339: Feature engineering: log (1+x) fails GitHub
PUBDEV-1249: h2o.glm: no way to specify offset or weights from h2o R GitHub
PUBDEV-1255: create_frame: hangs with following msg in the terminal, java.lang.IllegalArgumentException: n must be positive
PUBDEV-1361: runit_hex_1841_asdate_datemanipulation.R fails intermittently GitHub

#####Sparkling Water

PUBDEV-692: Upgrade SparklingWater to Spark 1.3

#####System

PUBDEV-1288: Confusion Matrix: class 'java.lang.ArrayIndexOutOfBoundsException', with msg '2' java.lang.ArrayIndexOutOfBoundsException: 2 at hex.ConfusionMatrix.createConfusionMatrixHeader GitHub
HEXDEV-323: SVMLight Parse Bug GitHub
PUBDEV-1207: implement JSON field-filtering features: _exclude_fields
GitHub: Fix a missing field update in Job.
PUBDEV-65: Handling of strings columns in summary is broken
PUBDEV-1230: Parse: get AIOOB when parses the attached file with first two cols as enum while h2o-2 does fine
PUBDEV-1377: Get AIOOBE when parsing a file with fewer column names than columns GitHub
PUBDEV-1364: Variable importance Object

#####Web UI

PUBDEV-1198: Flow: Selecting "Cancel" for "Load Notebook" prompt clears current notebook anyway
PUBDEV-1172: Model builder takes forever to load the column names in Flow, hence cannot build any models
PUBDEV-1248: Flow GLM: from Flow the drop down with column names does not show up and hence not able to select the offset column
PUBDEV-1380: DL: when try to access the training frame from the link in the dl model get: Object not found GitHub

###Shannon (3.0.0.13) - 5/30/15

####New Features

#####Algorithms

HEXDEV-260: Add Random Forests for regression GitHub

#####Python

PUBDEV-1166: Converting H2OFrame into Python object
PUBDEV-1165: H2O Python needs Modulus Operations

#####R

PUBDEV-1188: Merge should handle non-numeric columns (github)
PUBDEV-1096: R: add weekdays() function in addition to month() and year()

####Enhancements

#####Algorithms

github: Updated weights handling, test.
HEXDEV-324: poor GBM performance on KDD Cup 2009 competition dataset (github)
HEXDEV-326: varImp() function for DRF and GBM (github)
github: Change some of the defaults

#####API

PUBDEV-669: have the /Frames/{key}/summary API call Vec.startRollupStats

#####R/Python

PUBDEV-479: Port MissingInserter to R/Python
PUBDEV-632: Display TwoDimTable of HitRatios in R/Python
github: minor change to h2o.demo()
github: add h2o.demo() facility to python package, along with some built-in (small) data
github: remove cols param

####Bug Fixes

#####Algorithms

PUBDEV-1211: pca: descaled pca, std dev seems to be wrong for attached data github
PUBDEV-1213: pca: would be good to have the std dev numbered bec difficult to relate to the principal components (github)
PUBDEV-1201: pca: get ArrayIndexOutOfBoundsException (github)
PUBDEV-1203: pca: giving wrong std dev/rotation-labels for iris with species as enum (github)
PUBDEV-1199: DL with <1 epochs has wrong initial estimated time (github)
github: Fix missing AUC for training data in DL.
github: Add the seed back to GBM imbalanced test (was set to 0 by default before, now explicit)

#####R

PUBDEV-1189: R: h2o.hist broken for breaks that is a list of the break intervals (github)
PUBDEV-1206: Frame summary from R and Python need to use the Frame summary endpoint (github)
PUBDEV-1177: R summary() is slow when large number of columns
PUBDEV-1097: R: R should be able to take a list of paths similar to how python does

###Shannon (3.0.0.11) - 5/22/15

####Enhancements

#####Algorithms

PUBDEV-1179: DRF: investigate if larger seeds giving better models
PUBDEV-1178: Add logloss/AUC/Error to GBM/DRF Logs & ScoringHistory
PUBDEV-1169: Use only 1 tree for DRF binomial (github)
PUBDEV-1170: Wrong ROC is shown for DRF (Training ROC, even though Validation is given)
PUBDEV-1162: Speed up sorting of histograms with O(N log N) instead of O(N^2)

#####System

PUBDEV-1152: Accept s3a URLs
HEXDEV-316: ImportFiles should not download files from HTTP

####Bug Fixes

#####Algorithms

HEXDEV-253: model output consistency
HEXDEV-319: DRF in h2o 3.0 is worse than in h2o 2.0 for Airline
PUBDEV-1180: DRF has wrong training metrics when validation is given

#####API

PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset

#####Python

PUBDEV-1183: Python version check should fail hard by default
PUBDEV-1185: Python binding version mismatch check should fail hard and be on by default
HEXDEV-138: Port Python tests for Deep Learning

#####R

PUBDEV-1160: R: h2o.hist doesn't support breaks argument
PUBDEV-1159: R: h2o.hist takes too long to run
PUBDEV-1150: R CMD Check: URLs not working
PUBDEV-1149: R CMD check not happy with our use of .OnAttach
PUBDEV-1174: R: h2o.hist FD implementation broken
PUBDEV-1167: R: h2o.group_by broken
HEXDEV-318: the fix to H2O startup for the host unreachable from R causes a security hole
PUBDEV-1187: FramesHandler.summary() needs to run summary on all Vecs concurrently.

#####System

PUBDEV-862: Building a model without training file -> NPE
HEXDEV-315: importFile fails: Error in fromJSON(txt, ...) : unexpected character: A
PUBDEV-1137: Parse: upload and import gives different chunk compression on the same file
PUBDEV-1054: Parse: h2o parses arff file incorrectly
PUBDEV-1181: Rapids should queue and block on the back-end to prevent overlapping calls
PUBDEV-1184: importFile fails for paths containing spaces

#####Web UI

PUBDEV-1182: Flow: when upload file fails, the control does not come back to the flow screen, and have to refresh the whole page to get it back
PUBDEV-1131: GBM crashes after calling getJobs in Flow

###Shannon (3.0.0.7) - 5/18/15

####Enhancements

#####API

PUBDEV-711: take a final look at all REST API parameter names and help strings
PUBDEV-757: Rename DocsV1 + DocsHandler to MetadataV1 + MetadataHandler
PUBDEV-1138: Performance improvements for big data sets => getModels
PUBDEV-1126: Performance improvements for big data sets => Get frame summary

#####System

HEXDEV-316: ImportFiles should not download files from HTTP

#####Web UI

PUBDEV-1144: Update/Fix Flow API for CreateFrame

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####API

PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset
PUBDEV-1047: API : Get frames and Build model => takes long time to get frames
HEXDEV-149: Allow JobsV3 to return properly typed jobs, not always instances of JobV3
PUBDEV-1036: rename straggler V2 schemas to V3

#####R

PUBDEV-1159: R: h2o.hist takes too long to run

#####System

PUBDEV-1034: Windows 7/8/2012 Multicast Error UDP
PUBDEV-862: Building a model without training file -> NPE
HEXDEV-253: model output consistency
PUBDEV-1135: While predicting get:class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.ArrayIndexOutOfBoundsException: 5
PUBDEV-1090: POJO: Models with "." in key name (ex. pros.glm) can't access pojo endpoint
PUBDEV-1077: Getting an IcedHashMap warning from H2O startup

#####Web UI

PUBDEV-1133: getModels in Flow returns error
PUBDEV-926: Flow: When user hits build model without specifying the training frame, it would be good if Flow guides the user. It presently shows an NPE msg
PUBDEV-1131: GBM crashes after calling getJobs in Flow

###Shannon (3.0.0.2) - 5/15/15

####New Features

#####ModelMetrics

PUBDEV-411: ModelMetrics by model category

#####Web UI

PUBDEV-942: ModelMetrics by model category - Autoencoder

####Enhancements

#####Algorithms

github: GLM update: skip lambda max during lambda search
github: removed higher accuracy option
github: Rename constant col parameter
github: GLM update: added stopping criteria to lbfgs, tweaked some internal constants in ADMM
github: Add support for ignore_const_col in DL

#####Python

PUBDEV-852: Binomial: show per-metric-optimal CM and per-threshold CM in Python
github: add filterNACols to python
github: h2o.delete replaced with h2o.removeFrameShallow
github: Add distribution summary to Python

#####R

github: add filterNACols to R
github: explicitly set cols=TRUE for R style str on frames
github: enable faster str, bulk nlevels, bulk levels, bulk is.factor
github: Add optional blocking parameter to h2o.uploadFile

#####System

PUBDEV-672: HTML version of the REST API docs should be available on the website
PUBDEV-827: class GenModel duplicates part of code of Model

#####Web UI

HEXDEV-181: Flow: Handle deep features prediction input and output
github: removed use_all_factor_levels from glm flows

####Bug Fixes

#####Algorithms

HEXDEV-302: AIOOBE during Prediction with DL github
github: glm fix: don't force in null model for lambda search with user given list of lambdas
github: Fix domain in glm scoring output for binomial
github: GLM Fix - fix degrees of freedom when running without intercept (+/-1)
github: GLM fix: make valid data info be clone of train data info (needs exactly the same categorical offsets, ignore unseen levels)
github: Fix glm scoring, fill in default domain {0,1} for binary columns when scoring

#####R

PUBDEV-1116: R: Parse that works from flow doesn't work from R using as.h2o
PUBDEV-798: R: String Munging Functions Missing
PUBDEV-584: R: hist() doesn't currently work for H2O objects
PUBDEV-820: H2oR: model objects should return the CM when run classification like h2o1
PUBDEV-1113: Remove Keys : Parse => Remove => doesn't complete
PUBDEV-1102: R: h2o.rbind fails to join two dataset together
PUBDEV-899: R: all doesn't work
PUBDEV-555: H2O-R: str does not work
PUBDEV-1110: H2OR: while printing a gbm model object, get invalid format '%d'; use format %f, %e, %g or %a for numeric objects
PUBDEV-903: R: Errors from some rapids calls seem to fail to return an error
HEXDEV-311: Performance bug from R with Expect: 100-continue
PUBDEV-1030: h2o.performance: ignores the user specified threshold
PUBDEV-1071: R: regression models don't show in print statement r2 but it exists in the model object
PUBDEV-1072: R: missing accessors for glm specific fields
PUBDEV-1032: After running some R and py demos when invoke a build model from flow get- rollup stats problem vec deleted error
PUBDEV-1069: R: missing implementation for h2o.r2
PUBDEV-1064: Passing sep="," to h2o.importFile() fails with '400 Bad Request'
PUBDEV-1092: Get NPE while predicting

#####System

PUBDEV-1091: S3 gzip parse failure
PUBDEV-1081: Probably want to cleanly disable multicast (not retry) and print suggestion message, if multicast not supported on picked multicast network interface
PUBDEV-1112: User has no way to specify whether to drop constant columns
PUBDEV-1109: Change all extdata imports to uploadFile
PUBDEV-1104: .gz file parse exception from local filesystem

#####Web UI

PUBDEV-1134: getPredictions in Flow returns error
PUBDEV-1020: Flow : Drop NA Cols enable => Should automatically populate the ignored columns
PUBDEV-1041: Flow GLM: formatting needed for the model parameter listing in the model object github
PUBDEV-1108: Flow: When predict on data with no response get :Error processing POST /3/Predictions/models/gbm-a179db76-ba96-420f-a643-0e166aea3af3/frames/subset_1 'undefined' is not an object (evaluating 'prediction.model')

##H2O-Dev

###Shackleford (0.2.3.6) - 5/8/15

####New Features

#####Python

Set up POJO download for Python client (PUBDEV-908) (github)

#####Sparkling Water

Publish h2o-scala and h2o-app latest version to maven central (PUBDEV-443)

####Enhancements

#####Algorithms

Use AUC's default threshold for label-making for binomial classifiers predict() (PUBDEV-1063) (github)
GLM update (github)
Cleanup AUC2, make incremental version (github)
Name change: override_with_best_model -> overwrite_with_best_model (github)
Couple of GLM updates (github)
Disable _replicate_training_data for data that's larger than 10GB (github)
Added replicate_training_data param for DL (github)
Change a few kmeans output parameters so no longer dividing by nrows or num_clusters (github)
GLMValidation Updated auc computation (github)
Do not delete model metrics at end of GBM/DRF (github)

#####API

Clean REST api for Parse (PUBDEV-993)
Removes is_valid, invalid_lines, and domains from REST api (github)
Annotate domains output field as expert level (github)

#####Python

Implement h2o.interaction() (PUBDEV-854) (github)
nice tables in ipython! (github)
added deeplearning weights and biases accessors and respective pyunit. (github)

#####R

Cleaner client POJO download for R (PUBDEV-907)
Implement h2o.interaction() (PUBDEV-854) (github)
R: h2o.impute missing (PUBDEV-796)
validation_frame is passed through to h2o (github)
Adding GBM accessor function runits (github)
Adding changes to h2o.hit_ratio_table to be like other accessors (i.e., no train) (github)
add h2o.getPOJO to R, fix impute ast build in python (github)

#####System

Change NA strings to an array in ParseSetup (PUBDEV-995)
Document way of passing S3 credentials for S3N (PUBDEV-947)
Add H2O-dev doc on docs.h2o.ai via a new structure (proposed below) (PUBDEV-355)
Rapids Ref Doc (PUBDEV-667)
Show Timestamp and Duration for all model scoring histories (PUBDEV-1018) (github)
Logs slow reads, mainly meant for noting slow S3 reads (github)
Make prediction frame column names non-integer (github)
Add String[] factor_columns instead of int[] factors (github)
change the runtime exception to a Log.info() if interface doesn't support multicast (github)
More robust way to copy Flow files to web root per Prithvi (github)
Switches na_string from a single value per column to an array per column (github)

#####Web UI

Model output improvements (HEXDEV-150)

####Bug Fixes

#####Algorithms

H2O cloud shuts down with some H2O.fail error, while building some kmeans clusters (PUBDEV-1051) (github)
GLM:beta constraint does not seem to be working (PUBDEV-1083)
GBM - random attack bug (probably because max_after_balance_size is really small) (PUBDEV-1061) (github)
GLM: LBFGS objval java lang assertion error (PUBDEV-1042) (github)
PCA Cholesky NPE (PUBDEV-921)
GBM: H2o returns just 5525 trees, when ask for a much larger number of trees (PUBDEV-860)
CM returned by AUC2 doesn't agree with manual-made labels from F1-optimal threshold (HEXDEV-263)
AUC: h2o reporting wrong auc on a modified covtype data (PUBDEV-891)
GLM: Build model => Predict => Residual deviance/Null deviance different from training/validation metrics (PUBDEV-991)
KMeans metrics incomplete (PUBDEV-1029)
GLM: Java Assertion Error (PUBDEV-1025)
Random forest bug (PUBDEV-1015)
A particular random forest model has an empty (training) metric json max_criteria_and_metric_scores (PUBDEV-1001)
PCA results exhibit numerical inaccuracies compared to R (PUBDEV-550)
DRF: reporting wrong depth for attached dataset (PUBDEV-1006)
added missing "names" column name to beta constraints processing (github)
Fix balance_classes probability correction consistency between H2O and POJO (github)
Fix in GLM scoring - check actual for NaNs as well (github)

#####Python

Cannot import_file path=url python interface (PUBDEV-1059)
head()/tail() should show labels, rather than number encoding, for enum columns (PUBDEV-1017)
h2o.py: for binary response printing transpose and hence wrong cm (PUBDEV-1013)

#####R

Broken Summary in R (PUBDEV-1073)
h2oR summary: displaying no labels in summary (PUBDEV-1008)
R/Python impute bugs (PUBDEV-1055)
R: h2o.varimp doubles the print statement (PUBDEV-1068)
R: h2o.varimp returns NULL when model has no variable importance (PUBDEV-1078)
h2oR: h2o.confusionMatrix(my_gbm, validation=F) should not show a null (PUBDEV-849)
h2o.impute doesn't impute (PUBDEV-1024)
R: as.h2o cutting entries when trying to import data.frame into H2O (HEXDEV-293)
The default names are too long, for an R-datafile parsed to H2O, and needs to be changed (PUBDEV-976)
H2o.confusionMatrix: when invoked with threshold gives error (PUBDEV-1010)
removing train and adding error messages for valid = TRUE when there's not validation metrics (github)

#####System

Download logs is returning the same log file bundle for every node (PUBDEV-1056)
ParseSetup is useless and misleading for SVMLight (PUBDEV-994)
Fixes bug that was short circuiting the setting of column names (github)

#####Web UI

Flow: Predict should not show mse confusion matrix etc (PUBDEV-987) (github)
Flow: Raw frames left out after importing files from directory (PUBDEV-1046)

###Shackleford (0.2.3.5) - 5/1/15

####New Features

#####API

Need a /Log REST API to log client-side errors to H2O's log (HEXDEV-291)

#####Python

add impute to python interface (github)

#####System

Job admission control (PUBDEV-536) (github)
Get Flow Exceptions/Stack Traces in H2O Logs (PUBDEV-920)

####Enhancements

#####Algorithms

GLM: Name to be changed from normalized to standardized in output to be consistent between input/output (PUBDEV-954)
GLM: It would be really useful if the coefficient magnitudes are reported in descending order (PUBDEV-923)
PUBDEV-536: Limit DL models to 100M parameters (github)
PUBDEV-536: Add accurate memory-based admission control for GBM/DRF (github)
relax the tolerance a little more...(github)
Tree depth correction (github)
Comment out duration_in_ms for now, as it's always left at 0 (github)
Updated min mem computation for glm (github)
GLM update: added lambda search info to scoring history (github)

#####Python

python .show() on model and metric objects should match R/Flow as much as possible (HEXDEV-289)
GLM model output, details from Python (HEXDEV-95)
GBM model output, details from Python (HEXDEV-102)
Run GBM from Python (HEXDEV-99)
map domain to result from /Frames if needed (github)
added confusion matrix to metric output (github)
update metrics_base_confusion_matrices() (github)
fetch out string_data if type is string (github)

#####R

GBM model output, details from R (HEXDEV-101)
Run GBM from R (HEXDEV-98)
check if it's a frame then check NA (github)

#####System

Report MTU to logs (PUBDEV-614) (github)
Make parameter changes Log.info() instead of Log.warn() (github)

#####Web UI

Flow: Confusion matrix: good to have consistency in the column and row name (letter) case (PUBDEV-971)
Run GBM Multinomial from Flow (HEXDEV-111)
Run GBM Regression from Flow (HEXDEV-112)
Sort model types in alphabetical order in Flow (PUBDEV-1011)

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

GLM: Model output display issues (PUBDEV-956)
h2o.glm: ignores validation set (PUBDEV-958)
DRF: reports wrong number of leaves in a summary (PUBDEV-930)
h2o.glm: summary of a prediction frame gives na's as labels (PUBDEV-959)
GBM: reports wrong max depth for a binary model on german data (PUBDEV-839)
GLM: Confusion matrix missing in R for binomial models (PUBDEV-950) (github)
GLM: On airlines(40g) get ArrayIndexOutOfBoundsException (PUBDEV-967)
GLM: Build model => Predict => Residual deviance/Null deviance different from training/validation metrics (PUBDEV-991)
Domains returned by GLM for binomial classification problem are integers, but should be mapped to their label (PUBDEV-999)
GLM: Validation on non training data gives NaN Res Deviance and AIC (PUBDEV-1005)
Confusion matrix has nan's in it (PUBDEV-1000)
glm fix: pass model_id from R (was being dropped) (github)

#####Python

H2OPy: warns about version mismatch even when installed the latest from master (PUBDEV-980)
Columns of type enum lose string label in Python H2OFrame.show() (PUBDEV-965)
Bug in H2OFrame.show() (HEXDEV-295) (github)

#####R

h2o.confusionMatrix for binary response gives not-found thresholds (PUBDEV-957)
GLM: model_id param is ignored in R (PUBDEV-1007)
h2o.confusionmatrix: mixing cases(letter) for categorical labels while printing multinomial cm (PUBDEV-996)
fix the dupe thresholds error (github)
extra arg in impute example (github)
fix missing param data (github)

#####System

Builds : Failing intermittently due to java.lang.StackOverflowError (PUBDEV-972)
Get H2O cloud hang with NPE and roll up stats problem, when click on build model glm from flow, on laptop after running a few python demos and R scripts (PUBDEV-963)

#####Web UI

Flow :=> Airlines dataset => Build models glm/gbm/dl => water.DException$DistributedException: from /172.16.2.183:54321; by class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.NullPointerException: null (PUBDEV-603)
Flow => Preview Pojo => collapse not working (PUBDEV-977)
Flow => Any algorithm => Select response => Select Add all for ignored columns => Try to unselect some from ignored columns => Build => Response column IsDepDelayed not found in frame: allyears_1987_2013.hex. (PUBDEV-978)
Flow => ROC curve select something on graph => Table is displayed for selection => Collapse ROC curve => Doesn't collapse table, collapses only graph (PUBDEV-1003)

###Severi (0.2.2.16) - 4/29/15

####New Features

#####Python

Release h2o-dev to PyPi (PUBDEV-762)
Python Documentation (PUBDEV-901)
Python docs Wrap Up (PUBDEV-966)
add getters for res/null dev, fix kmeans,dl getters (github)

####Enhancements

#####Algorithms

Use partial-sum version of mat-vec for DL POJO (PUBDEV-936)
Always store weights and biases for DLTest Junit (github)
Show the DL model size in the model summary (github)
Remove assertion in hot loop (github)
Rename ADMM to IRLSM (github)
Added no intercept option to glm (github)
Code cleanup. Moved ModelMetricsPCAV3 out of H2O-algos (github)
Improve DL model checkpoint logic (github)
Updated glm output (github)
Renamed normalized coefficients to standardized coefficients in glm output (github)
Use proper tie breaking for NB (github)
Add check that DL parameters aren't modified by model training (github)
Reduce tolerances (github)
If there are no observations of a response level and the prediction is numeric, assume it is drawn from a standard normal distribution (mean 0, standard deviation 1). Add validation test with split frame for naive Bayes (github)

#####Python

replaced H2OFrame.send_frame() calls with cbind Exprs so that lazy evaluation is enforced (github)
change default xmx/s behavior of h2o.init() (github)
better handling of single row return and print (github)

#####R

Added interpolation to quantile to match R type 7 (github)
Removed and tidied if's in quantile.H2OFrame since it now uses match.arg (github)
Connected validation dataset to glm in R (github)
Removing h2o.aic from seealso link (doesn't exist) and updating documentation (github)
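
For readers unfamiliar with the "R type 7" quantile mentioned in the list above, the following is a minimal sketch of that definition (linear interpolation between order statistics, the default in R's `quantile`). The function name `quantile_type7` is hypothetical and this is not the H2O implementation, only an illustration of the rule.

```python
import math


def quantile_type7(values, prob):
    """Type 7 sample quantile: linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be in [0, 1]")
    ordered = sorted(values)
    # Fractional 0-based position of the requested quantile.
    position = (len(ordered) - 1) * prob
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    # Interpolate between the two neighbouring order statistics.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


print(quantile_type7([1, 2, 3, 4], 0.25))  # 1.75
```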

#####System

Add number of rows (per node) to ChunkSummary (PUBDEV-938) (github)
allow nrow as alias for count in groupby (github)
Only launches task to fill in SVM zeros if the file is SVM (github)
Adds more log traces to track progress of post-ingest actions (github)
Adds svm as a file extension to the hex name cleanup (github)

#####Web UI

Flow: Inspect data => Round decimal points to 1 to be consistent with h2o1 (PUBDEV-453)
Setup POJO download method for Flow (PUBDEV-909)
Pretty-print POJO preview in flow (PUBDEV-940)
Flow: It would be good if 'get predictions' also shows the data (PUBDEV-883)
GBM model output, details in Flow (HEXDEV-103)
Display a linked data table for each visualization in Flow (PUBDEV-318)
Run GBM binomial from Flow (needs proper CM) (PUBDEV-943)

####Bug Fixes

#####Algorithms

GLM: results from model and prediction on the same dataset do not match (PUBDEV-922)
GLM: when select AUTO as solver, for prostate, glm gives all zero coefficients (PUBDEV-916)
Large (DL) models cause oversize issues during serialization (PUBDEV-941)
Fixed name change for ADMM (github)

#####API

Fix schema warning on startup (PUBDEV-946) (github)

#####Python

H2OVec.row_select(H2OVec) fails on case where only 1 row is selected (PUBDEV-948)
fix pyunit (github)

#####R

R: Parse of zip file fails, Summary fails on citibike data (PUBDEV-835)
h2o.performance reports a different Null Deviance than the model object for the same dataset (PUBDEV-816)
h2o.glm: no example on h2o.glm help page (PUBDEV-962)
H2O R: Confusion matrices from R still confused (PUBDEV-904) (github)
R: h2o.confusionMatrix("H2OModel", ...) extra parameters not working (PUBDEV-953) (github)
h2o.confusionMatrix for binomial gives not-found thresholds on S3 -airlines 43g (PUBDEV-957)
H2O summary quartiles outside tolerance of (max-min)/1000 (PUBDEV-671)
fix space headers issue from R (was not url-encoding the column strings) (github)
R CMD fixes (github)
Fixed broken R interface - make validation_frame non-mandatory (github)

#####Sparkling Water

Sparkling water : #UDP-Recv ERRR: UDP Receiver error on port 54322 java.lang.ArrayIndexOutOfBoundsException: (PUBDEV-311)

#####System

Mapr 3.1.1 : Memory is not being allocated for what is asked for instead the default is what cluster gets (PUBDEV-937)
GLM: AIOOB with msg '-14' at water.RPC$2.compute2(RPC.java:593) (PUBDEV-917)
h2o.glm: model summary listing same info twice (PUBDEV-915)
Parse: Detect and reject UTF-16 encoded files (HEXDEV-285)
DataInfo Row categorical encoding AIOOBE (HEXDEV-283)
Fix POJO Preview exception (github)
Fix NPE in ChunkSummary (github)
fix global name collision (github)

###Severi (0.2.2.15) - 4/25/15

####New Features

#####Python

added min, max, sum, median for H2OVecs and respective pyunit (github)
added min(), max(), and sum() functionality on H2OFrames and respective pyunits (github)

#####Web UI

View POJO in Flow (PUBDEV-781)
help > about page or add version on main page for easy bug reporting. (PUBDEV-804)
POJO generation: GLM (PUBDEV-712) (github)
GLM model output, details in Flow (HEXDEV-96)

####Enhancements

#####Algorithms

K means output clean up (HEXDEV-187)
Add FNR/TNR/FPR/TPR to threshold tables, remove recall, specificity (github)
Add accessor for variable importances for DL (github)
Relax CM error tolerance for F1-optimal threshold now that AUC2 doesn't necessarily create consistent thresholds with its own CMs. (github)
Added scoring history to glm (github)
Added model summary to glm (github)
Add flag to support reading data from S3N (github)
Added degrees of freedom to GLM metrics schemas (github)
Allow DL scoring_history to be unlimited in length (github)
add plotting for binomial models (github)
Ignore certain parameters that are not applicable (class balancing, max CM size, etc.) (github)
Updated glm scoring, fill training/validation metrics in model output (github)
Rename gbm loss parameter to distribution (github)
Fix GBM naming: loss -> distribution (github)
GLM LBFGS update (github)
na.rm for quantile is default behavior (github)
GLM update: enabled max_predictors in REST, updated lbfgs (github)
Remove keep_cross_validation_splits for now from DL (github)
Get rid of sigma in the model metrics, instead show r2 (github)
Don't show score_every_iteration for DL (github)
Don't print too large confusion matrices in Tree models (github)
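
As an illustration of the FNR/TNR/FPR/TPR columns mentioned in the list above, here is a minimal sketch of how these four rates follow from the confusion counts at a single threshold. The function name `threshold_rates` and the convention that a score greater than or equal to the threshold counts as a positive prediction are assumptions made for this example, not details taken from H2O.

```python
def threshold_rates(actuals, scores, threshold):
    """Return TPR, FNR, FPR and TNR for binary labels at one threshold."""
    tp = fp = tn = fn = 0
    for actual, score in zip(actuals, scores):
        predicted_positive = score >= threshold  # assumed convention
        if predicted_positive and actual:
            tp += 1
        elif predicted_positive and not actual:
            fp += 1
        elif not predicted_positive and actual:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined when a class has no rows; report NaN instead of failing.
        return numerator / denominator if denominator else float("nan")

    return {
        "tpr": ratio(tp, tp + fn),
        "fnr": ratio(fn, tp + fn),
        "fpr": ratio(fp, fp + tn),
        "tnr": ratio(tn, fp + tn),
    }


rates = threshold_rates([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1], 0.5)
print(rates)
```

TPR and FNR sum to 1, as do FPR and TNR, which is one reason the separate recall and specificity columns become redundant once all four rates are shown.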

#####API

publish h2o-model.jar via REST API (PUBDEV-779)
move all schemas and endpoints to v3 (PUBDEV-471)
clean up routes (remove AddToNavbar, fix /Quantiles, etc) (PUBDEV-618) (github)
More data in chunk_homes call. Add num_chunks_per_vec. Add num_vec. (github)
Added chunk_homes route for frames (github)
Update to use /3 routes (github)

#####Python

Python client should check that version number == server version number (PUBDEV-799)
Add asfactor for month (github)
in Expr.show() only show 10 or fewer rows. remove locate from runit test because full path used (github)
change nulls to () (github)
sigma is no longer part of ModelMetricsRegressionV3 (github)

#####R

Fix integer -> int in R (github)
add autoencoder show method (github)
accessor is $ not @ (github)
add hit_ratio_table and varimp calls to R (github)
add h2o.predict as alternative (github)
update model output in R (github)

#####System

Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
Rapids: require a (put "key" %frame) (PUBDEV-868)
Need pojo base model jar file embedded in h2o-dev via build process (PUBDEV-780) (github)
Make .json the default (PUBDEV-619) (github)
Rename class for clarification (github)
Classifies all NA columns as numeric. Also improves preview sampling accuracy by trimming partial lines at end of chunk. (github)
Implements sampling of files within the ParseSetup preview. This prevents poor column type guesses from only sampling the beginning of a file. (github)
Rename fields drop_na20_col (github)
allow for many deletes as final statements in a block (github)
rename initF -> init_f, dropNA20Cols -> drop_na20_cols (github)
Removed tweedie param (github)
thresholds -> threshold (github)
JSON of TwoDimTable with all null values in the first column (no row headers) now doesn't have an empty column of "" or nulls. (github)
move H2O_Load, fix all the timezone functions (github)
Add extra verbose printout in case Frames don't match identically (github)
allow delayed column lookup (github)
add mixed type list (github)
Added WaterMeterIo to count persist info (github)
Remove special setChunkSize code in HDFS and NFS file vec (github)
add check for Frame on string parse (github)
Disable Memory Cleaner (github)
Handle '<' chars in Keys when swapping (github)
allow for colnames in slicing (github)
Adjusts parse type detection. If column is all one string value, declare it an enum (github)

#####Web UI

nice algo names in the Flow dropdown (full word names) (PUBDEV-707)
Compute and Display Hit Ratios (PUBDEV-630)
Limit POJO preview to 1000 lines (github)
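
The hit ratios mentioned in the list above are usually defined as top-k accuracy for multinomial models: the fraction of rows whose actual class is among the k classes with the highest predicted probability. The sketch below illustrates that definition; the function name `hit_ratios` and the input layout (one list of per-class probabilities per row, labels given as class indices) are assumptions for the example rather than H2O's API.

```python
def hit_ratios(class_probabilities, actual_labels, max_k):
    """Return [top-1, top-2, ..., top-max_k] hit ratios."""
    n_rows = len(actual_labels)
    if n_rows == 0:
        raise ValueError("need at least one row")
    hits = [0] * max_k
    for probs, actual in zip(class_probabilities, actual_labels):
        # Class indices ordered from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        rank = ranked.index(actual)
        # A row whose actual class has 0-based rank r is a hit for every k > r.
        for k in range(rank, max_k):
            hits[k] += 1
    return [h / n_rows for h in hits]


probabilities = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]
print(hit_ratios(probabilities, [0, 1, 2], 3))
```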

####Bug Fixes

#####Algorithms

GLM: lasso i.e alpha =1 seems to be giving wrong answers (PUBDEV-769)
AUC: h2o reports .5 auc when actual auc is 1 (PUBDEV-879)
h2o.glm: No output displayed for the model (PUBDEV-858)
h2o.glm model object output needs a fix (PUBDEV-815)
h2o.glm model object says : fill me in GLMModelOutputV2; I think I'm redundant [1] FALSE (PUBDEV-765)
GLM : Build GLM Model => Java Assertion error (PUBDEV-686)
GLM :=> Progress shows -100% (PUBDEV-861)
GBM: Negative sign missing in initF value for ad dataset (PUBDEV-880)
K-Means takes a validation set but doesn't use it (PUBDEV-826)
Absolute_MCC is NaN (sometimes) (PUBDEV-848) (github)
GBM: A proper error msg should be thrown when the user sets the max depth =0 (PUBDEV-838) (github)
DRF Regression Assertion Error (PUBDEV-824)
h2o.randomForest: if h2o is not returning the mse for the 0th tree then it should not be reported in the model object (PUBDEV-811)
GBM: Got exception class java.lang.AssertionError with msg null java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver$GammaPass.map (PUBDEV-693)
GBM: Got exception class java.lang.AssertionError with msg null java.lang.AssertionError at hex.ModelMetricsMultinomial$MetricBuildMultinomial.perRow (HEXDEV-248)
GBM get java.lang.AssertionError: Coldata 2199.0 out of range C17:5086.0-19733.0 step=57.214844 nbins=256 isInt=1 (HEXDEV-241)
GLM: glmnet objective function better than h2o.glm (PUBDEV-749)
GLM: get AIOOB:-36 at hex.glm.GLMTask$GLMIterationTask.postGlobal(GLMTask.java:733) (PUBDEV-894) (github)
Fixed glm behavior in case no rows are left after filtering out NAs (github)
Fix memory leak in validation scoring in K-Means (github)

#####API

API unification: DataFrame should be able to accept URI referencing file on local filesystem (PUBDEV-709) (github)

#####Python

Python: describe returning all zeros (PUBDEV-875)
python/R & merge() (PUBDEV-834)
python Expr min, max, median, sum bug (PUBDEV-845) (github)

#####R

(R and Python) clients must not pass response to DL AutoEncoder model builder (PUBDEV-897) (github)
h2o.varimp, h2o.hit_ratio_table missing in R (PUBDEV-842)
GLM: No help for h2o.glm from R (PUBDEV-732)
h2o.confusionMatrix not working for binary response (PUBDEV-782) (github)
h2o.splitframe complains about destination keys (PUBDEV-783)
h2o.assign does not work (PUBDEV-784) (github)
H2oR: should display only first few entries of the variable importance in model object (PUBDEV-850)
R: h2o.confusion matrix needs formatting (PUBDEV-764)
R: h2o.confusionMatrix => No Confusion Matrices for H2ORegressionMetrics (PUBDEV-710)
h2o.deeplearning: model object output needs a fix (PUBDEV-821)
force gc more frequently (github)

#####System

MapR FS loads are too slow (PUBDEV-927)
ensure that HDFS works from Windows (PUBDEV-812)
Summary: on a time column throws 'null' is not an object (evaluating 'column.domain[level.index]') in Flow (PUBDEV-867)
Parse: An enum column gets parsed as int for the attached file (PUBDEV-606)
Parse => 40Mx1_uniques => class java.lang.RuntimeException (PUBDEV-729)
if there are fewer than 5 unique values in a dataset column, mins/maxs reports e+308 values (PUBDEV-150) (github)
Sparkling water - DataFrame[T_UUID] to SchemaRDD[StringType] (PUBDEV-771)
Sparkling water - DataFrame[T_NUM(Long)] to SchemaRDD[LongType] (PUBDEV-767)
Sparkling water - DataFrame[T_ENUM] to SchemaRDD[StringType] (PUBDEV-766)
Inconsistency in row and col slicing (HEXDEV-265) (github)
rep_len expects literal length only (HEXDEV-268) (github)
cbind and = don't work within a single rapids block (HEXDEV-237)
Rapids response for c(value) does not have frame key (HEXDEV-252)
S3 parse takes forever (PUBDEV-876)
Parse => Enum unification fails in multi-node parse (PUBDEV-718) (github)
All nodes are not getting updated with latest status of each other nodes info (PUBDEV-768)
Cluster creation is sometimes rejecting new nodes (post jenkins-master-1128+) (PUBDEV-807)
Parse => Multiple files 1 zip/ 1 csv gives Array index out of bounds (PUBDEV-840)
Parse => failed for X5MRows6KCols ==> OOM => Cluster dies (PUBDEV-836)
/frame/foo pagination weirded out (HEXDEV-277) (github)
Removed code that flipped enums to strings (github)

#####Web UI

Flow: It would be really useful to have the mse plots back in GBM (PUBDEV-889)
State change in Flow is not fully validated (PUBDEV-919)
Flows : Not able to load saved flows from hdfs (PUBDEV-872)
Save Function in Flow crashes (PUBDEV-791) (github)
Flow: should throw a proper error msg when user supplied response have more categories than algo can handle (PUBDEV-866)
Flow display of a summary of a column with all missing values fails. (HEXDEV-230)
Split frame UI improvements (HEXDEV-275)
Flow : Decimal point precisions to be consistent to 4 as in h2o1 (PUBDEV-844)
Flow: Prediction frame is outputing junk info (PUBDEV-825)
EC2 => Cluster of 16 nodes => Water Meter => shows blank page (PUBDEV-831)
Flow: Predict - "undefined is not an object (evaluating prediction.thresholds_and_metric_scores.name)" (PUBDEV-559)
Flow: inspect getModel for PCA returns error (PUBDEV-610)
Flow, RF: Can't get Predict results; "undefined is not an object (evaluating prediction.confusion_matrices.length)" (PUBDEV-695)
Flow, GBM: getModel is broken -Error processing GET /3/Models.json/gbm-b1641e2dc3-4bad-9f69-a5f4b67051ba null is not an object (evaluating source.length) (PUBDEV-800)

###Severi (0.2.2.1) - 4/10/15

####New Features

#####R

Implement /3/Frames/<my_frame>/summary (PUBDEV-6) (github)
add allparameters slot to allow default values to be shown (github)
add log loss accessor (github)

####Enhancements

#####Algorithms

POJO generation: GBM (PUBDEV-713)
POJO generation: DRF (PUBDEV-714)
Compute and Display Hit Ratios (PUBDEV-630) (github)
Add DL POJO scoring (PUBDEV-585)
Allow validation dataset for AutoEncoder (PUBDEV-581)
PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
increase tolerance to 2e-3 (was 1e-3; failed with 0.001647 relative difference) (github)
change tolerance to 1e-3 (github)
Add option to export weights and biases to REST API / Flow. (github)
Add scree plot for H2O PCA models and fix Runit test. (github)
Remove quantiles from the model builders list. (github)
GLM update: added row filtering argument to line search task, fixed issues with dfork/asyncExec (github)
Updated rho-setting in GLM. (github)
No threshold 0.5; use the default (max F1) instead (github)
GLM update: updated initialization, NA row filtering, default lambda is now empty, will be picked based on the fraction of lambda_max. (github)
Updated ADMM solver. (github)
Added makeGLMModel call. (github)
Start with classification error NaN at t=0 for DL, not with 1. (github)
Relax DL POJO relative tolerance to 1e-2. (github)
Override nfeatures() method in DLModelOutput. (github)
Renaming of fields in GLM (github)
GLM: Take out Balance Classes (PUBDEV-795)
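
For reference, the log loss metric added to the binomial and multinomial model metrics in the list above is conventionally the mean negative log of the probability assigned to the actual class. The sketch below shows that textbook definition; the function name `log_loss` and the clipping constant `eps` are assumptions for the example and are not taken from the H2O source.

```python
import math


def log_loss(class_probabilities, actual_labels, eps=1e-15):
    """Mean negative log-probability of the actual class (index labels)."""
    if not actual_labels:
        raise ValueError("need at least one row")
    total = 0.0
    for probs, actual in zip(class_probabilities, actual_labels):
        # Clip to avoid log(0); eps is an assumed value, not H2O's.
        p = min(max(probs[actual], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(actual_labels)


# A model that always predicts 50/50 on a binary problem scores ln(2).
print(log_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1]))
```

The binomial case is the same computation with two classes, so a single function covers both model categories.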

#####API

schema metadata for Map fields should include the key and value types (PUBDEV-753) (github)
schema metadata should include the superclass (PUBDEV-754)
rest api naming convention: n_folds vs ntrees (PUBDEV-737)
Create REST Endpoint for exposing .java pojo models (PUBDEV-778)

#####Python

Run GLM from Python (including LBFGS) (HEXDEV-92)
added H2OFrame show(), as_list(), and slicing pyunits (github)
changed solver parameter to "L_BFGS" (github)
added multidimensional slicing of H2OFrames and Exprs. (github)
add h2o.groupby to python interface (github)
added H2OModel.confusionMatrix() to return confusion matrix of a prediction (github)

#####R

PUBDEV-578, PUBDEV-541, PUBDEV-566. -R client now sends the data frame column names and data types to ParseSetup. -R client can get column names from a parsed frame or a list. -Respects client request for column data types (github)
R: Cannot create new columns through R (PUBDEV-571)
H2O-R: it would be more useful if h2o.confusion matrix reports the actual class labels instead of [,1] and [,2] (PUBDEV-553)
Support both multinomial and binomial CM (github)

#####System

Flow: Standardize max_iters/max_iterations parameters (PUBDEV-447) (github)
Add ERROR logging level for too-many-retries case (PUBDEV-146) (github)
Simplify checking of cluster health. Just report the status immediately. (github)
reduce timeout (github)
strings can have ' or " beginning (github)
Throw a validation error in flow if any training data cols are non-numeric (github)
Add getHdfsHomeDirectory(). (github)
Added --verbose. (github)

#####Web UI

PUBDEV-707: nice algo names in the Flow dropdown (full word names) (github)
Unbreak Flow's ConfusionMatrix display. (github)
POJO generation: DL (PUBDEV-715)

####Bug Fixes

#####Algorithms

GLM : Build GLM model with nfolds brings down the cloud => FATAL: unimplemented (PUBDEV-731) (github)
DL : Build DL Model => FATAL: unimplemented: n_folds >= 2 is not (yet) implemented => SHUTSDOWN CLOUD (PUBDEV-727) (github)
GBM => Build GBM model => No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-723)
GBM: When run with loss = auto with a numeric column get- error :No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-708) (github)
gbm: does not complain when min_row >dataset size (PUBDEV-694) (github)
GLM: reports wrong residual degrees of freedom (PUBDEV-668)
H2O dev reports less accurate aucs than H2O (PUBDEV-602)
GLM : Build GLM model fails => ArrayIndexOutOfBoundsException (PUBDEV-601)
divide by zero in modelmetrics for deep learning (PUBDEV-568)
GBM: reports 0th tree mse value for the validation set, different than the train set, when only train set is provided (PUBDEV-561)
GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)
GLM : Build Model fails with Array Index Out of Bound exception (PUBDEV-454) (github)
Custom Functions don't work in apply() in R (PUBDEV-436)
GLM failure: got NaNs and/or Infs in beta on airlines (PUBDEV-362)
MetricBuilderMultinomial.perRow AssertionError while running GBM (HEXDEV-240)
Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226) (github)
AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
glm pyunit intermittent failure (HEXDEV-199)
Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
get rid of nfolds= param since it's not supported in GLM yet (github)
Fixed degrees of freedom (off by 1) in glm, added test. (github)
GLM fix: fix filtering of rows with NAs and fix in sparse handling. (github)
Fix GLM job fail path to call Job.fail(). (github)
Full AUC computation, bug fixes (github)
Fix ADMM for upper/lower bounds. (updated rho settings + update u-vector in ADMM for intercept) (github)
Few glm fixes (github)
DL : KDD Algebra data set => Build DL model => ArrayIndexOutOfBoundsException (PUBDEV-696)
GBM: Dev vs H2O for depth 5, minrow=10, on prostate, give different trees (PUBDEV-759)
GBM param min_rows doesn't throw exception for negative values (PUBDEV-697)
GBM : Build GBM Model => Too many levels in response column! (java.lang.IllegalArgumentException) => Should display proper error message (PUBDEV-698)
GBM: Got exception 'class java.lang.AssertionError', with msg 'Something is wrong with GBM trees since returned prediction is Infinity' (PUBDEV-722)

#####API

Cannot adapt numeric response to factors made from numbers (PUBDEV-620)
not specifying response_column gets NPE (deep learning build_model()) I think other algos might have same thing (PUBDEV-131)
NPE response has null msg, exception_msg and dev_msg (HEXDEV-225)
Flow :=> Save Flow => On Mac and Windows 8.1 => NodePersistentStorage failure while attempting to overwrite (?) a flow (HEXDEV-202) (github)
the can_build field in ModelBuilderSchema needs values[] to be set (PUBDEV-755)
value field in the field metadata isn't getting serialized as its native type (PUBDEV-756)

#####Python

python api asfactor() on -1/1 column issue (HEXDEV-203)

#####R

Rapids: Operations %/% and %% returns Illegal Argument Exception in R (PUBDEV-736)
quantile: H2oR displays wrong quantile values when call the default quantile without specifying the probs (PUBDEV-689) (github)
as.factor: If a user reruns as.factor on an already factor column, h2o should not show an exception (PUBDEV-622)
as.factor works only on positive integers (PUBDEV-617) (github)
H2O-R: model detail lists three mses, the first MSE slot does not contain any info about the model and hence, should be removed from the model details (PUBDEV-605) (github)
H2O-R: Strings: While slicing get Error From H2O: water.DException$DistributedException (PUBDEV-592)
R: h2o.confusionMatrix should handle both models and model metric objects (PUBDEV-590)
R: as.Date not functional with H2O objects (PUBDEV-583) (github)
R: some apply functions don't work on H2OFrame objects (PUBDEV-579) (github)
h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
R: slicing issues (PUBDEV-573)
R: length and is.factor don't work in h2o.ddply (PUBDEV-572) (github)
R: apply(hex, c(1,2), ...) doesn't properly raise an error (PUBDEV-570) (github)
R: Slicing negative indices to negative indices fails (PUBDEV-569) (github)
h2o.ddply: doesn't accept anonymous functions (PUBDEV-567) (github)
ifelse() cannot return H2OFrames in R (PUBDEV-543)
as.h2o loses track of headers (PUBDEV-541)
H2O-R not showing meaningful error msg (PUBDEV-502)
H2O.fail() had better fail (PUBDEV-470) (github)
fix issue in toEnum (github)
fix colnames and new col creation (github)
R: h2o.init() is posting warning messages of an unhealthy cluster when the cluster is fine. (PUBDEV-734)
h2o.split frame is failing (PUBDEV-560)

#####System

key type failure should fail the request, not the cloud (PUBDEV-739) (github)
Parse => Import Medicare supplier file => Parse => Illegal argument for field: column_names of schema: ParseV2: string and key arrays' values must be quoted, but the client sent: " (PUBDEV-719)
Overwriting a constant vector with strings fails (PUBDEV-702)
H2O - gets stuck while calculating quantile,no error msg, just keeps running a job that normally takes less than a sec (PUBDEV-685)
Summary and quantile on a column with all missing values should not throw an exception (PUBDEV-673) (github)
View Logs => class java.lang.RuntimeException: java.lang.IllegalArgumentException: File /home2/hdp/yarn/usercache/neeraja/appcache/application_1427144101512_0039/h2ologs/h2o_172.16.2.185_54321-3-info.log does not exist (PUBDEV-600)
Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)
Parse: Numbers completely parsed wrong (PUBDEV-574)
Flow: converting a column to enum while parsing does not work (PUBDEV-566)
Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540) (github)
toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)
Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
The quote stripper for column names should report when the stripped chars are not the expected quotes (PUBDEV-424)
import directory with large files,then Frames..really slow and disk grinds. Files are unparsed. Shouldn't be grinding (PUBDEV-98)
NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
h2o.exec won't be supported (github)
fixed import issue (github)
fixed init param (github)
fix repeat as.factor NPE (github)
startH2O set to False in init (github)
hang on glm job removal (PUBDEV-726)
Flow - changed column types need to be reflected in parsed data (HEXDEV-189)
water.DException$DistributedException while running kmeans in multinode cluster (PUBDEV-691)
Frame inspection prior to file parsing, corrupts parsing (PUBDEV-425)

#####Web UI

Flow, DL: Need better fail message if "Autoencoder" and "use_all_factor_levels" are both selected (PUBDEV-724)
When select AUTO while building a gbm model get ERROR FETCHING INITIAL MODEL BUILDER STATE (PUBDEV-595)
Flow : Build h2o-dev-0.1.17.1009 : Building GLM model gives java.lang.ArrayIndexOutOfBoundsException: (PUBDEV-205) (github)
Flow:Summary on flow broken for a long time (PUBDEV-785)

###Serre (0.2.1.1) - 3/18/15

####New Features

#####Algorithms

Naive Bayes in H2O-dev (PUBDEV-158)
GLM model output, details from R (HEXDEV-94)
Run GLM Regression from Flow (including LBFGS) (HEXDEV-110)
PCA (PUBDEV-157)
Port Random Forest to h2o-dev (PUBDEV-455)
Enable DRF model output (github)
Add DRF to Flow (Model Output) (PUBDEV-533)
Grid for GBM (github)
Run Deep Learning Regression from Flow (HEXDEV-109)

#####Python

Add Python wrapper for DRF (PUBDEV-534)

#####R

Add R wrapper for DRF (PUBDEV-530)

#####System

Include uploadFile (PUBDEV-299) (github)
Added -flow_dir to hadoop driver (github)

#####Web UI

Add Flow packs (HEXDEV-190) (PUBDEV-247)
Integrate H2O Help inside Help panel (PUBDEV-108) (github)
Add quick toggle button to show/hide the sidebar (github)
Add New, Open toolbar buttons (github)
Auto-refresh data preview when parse setup input parameters are changed (PUBDEV-532)
Flow: Add playbar with Run, Continue, Pause, Progress controls (HEXDEV-192)
You can now stop/cancel a running flow

####Enhancements

#####Algorithms

Display GLM coefficients only if available (PUBDEV-466)
Add random chance line to RoC chart (HEXDEV-168)
Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
Use getRNG for Dropout (github)
PUBDEV-598: Add tests for determinism of RNGs (github)
PUBDEV-598: Implement Chi-Square test for RNGs (github)
Add DL model output toString() (github)
Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
Print number of categorical levels once we hit >1000 input neurons. (github)
Updated the loss behavior for GBM. When loss is set to AUTO, if the response is an integer with 2 levels, then bernoulli (rather than gaussian) behavior is chosen. As a result, the do_classification flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use as.factor() in their response to get the desired bernoulli behavior. The score_each_iteration flag has been removed as well. (github)
Fully remove _convert_to_enum in all algos (github)
Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
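
The AUTO loss rule for GBM described in the list above can be sketched as follows. This covers only the one case the entry states (an integer response with exactly 2 levels resolves to bernoulli rather than gaussian); the function name `resolve_auto_loss` is hypothetical, and how H2O treats categorical responses or other cases is not covered here.

```python
def resolve_auto_loss(response_values):
    """Pick a loss for a numeric response when the user asked for AUTO."""
    levels = set(response_values)
    # "Integer" here means every distinct value has no fractional part.
    all_integers = all(float(value).is_integer() for value in levels)
    if all_integers and len(levels) == 2:
        return "bernoulli"
    # Fallback per the entry's wording: gaussian for other numeric responses.
    return "gaussian"


print(resolve_auto_loss([0, 1, 1, 0]))     # bernoulli
print(resolve_auto_loss([0.5, 1.2, 3.0]))  # gaussian
```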

#####API

Display point layer for tree vs mse plots in GBM output (PUBDEV-504)
Rename API inputs/outputs (github)
Rename Inf to Infinity (github)

#####Python

added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
Make H2OVec.levels() return the levels (github)
H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), H2OVec.isna() additions. demo pyunit addition (github)

#####System

Customize H2O web UI port (PUBDEV-483)
Make parse setup interactive (PUBDEV-532)
Added --verbose (github)
Adds some H2OParseExceptions. Removes all H2O.fail in parse (no parse issues should cause a fail) (github)
Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)

#####Web UI

Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
'Run' button selects next cell after running
ModelMetrics by model category: Clustering (PUBDEV-416)
ModelMetrics by model category: Regression (PUBDEV-415)
ModelMetrics by model category: Multinomial (PUBDEV-414)
ModelMetrics by model category: Binomial (PUBDEV-413)
Add ability to select and delete multiple models (github)
Add ability to select and delete multiple frames (github)
Flows now stop running when an error occurs
Print full number of mismatches during POJO comparison check. (github)
Make Grid multi-node safe (github)
Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)

####Bug Fixes

#####Algorithms

GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
GBM predict fails without response column (PUBDEV-478)
GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
PUBDEV-580: Fix some numerical edge cases (github)
Fix two missing float -> double conversion changes in tree scoring. (github)
Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
Old GLM Parameters Missing (PUBDEV-431)

#####API

SplitFrame on String column produce C0LChunk instead of CStrChunk (PUBDEV-468)
Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
Response from /ModelBuilders don't conform to standard error json shape when there are errors (HEXDEV-121) (github)

#####Python

fix python syntax error (github)
Fixes handling of None in python for a returned na_string. (github)

#####R

R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
h2o.confusionmatrices does not work (PUBDEV-547)
How do i convert an enum column back to integer/double from R? (PUBDEV-546)
Summary in R is faulty (PUBDEV-539)
R: as.h2o should preserve R data types (PUBDEV-578)
NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
Custom Functions don't work in apply() in R (PUBDEV-436)
got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)
R-H2O Managing Memory in a loop (PUB-1125)
h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
H2O-R not showing meaningful error msg

#####System

Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
3 jvms, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
Not able to start h2o on hadoop (PUBDEV-487)
one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
The NY0 parse rule, in summary. Doesn't look like it's counting the 0's as NAs like h2o (PUBDEV-154)
0 / Y / N parsing (PUBDEV-229)
NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
Flow : Import file 15M rows 2.2 Cols => Parse => Error fetching job on UI =>Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
Flow : Build GLM Model => Family tweedie => 'class hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization' (PUBDEV-211)
Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
Check reproducibility on multi-node vs single-node (PUBDEV-557)
Parse : After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)

#####Web UI

Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
GBM Model : Params in flow show two times (PUBDEV-440)
Flow multinomial confusion matrix visualization (HEXDEV-204)
Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
[MapR] unable to give hdfs file name from Flow (PUBDEV-409)

###Selberg (0.2.0.1) - 3/6/15

####New Features

#####Algorithms

Naive Bayes in H2O-dev (PUBDEV-158)
GLM model output, details from R (HEXDEV-94)
Run GLM Regression from Flow (including LBFGS) (HEXDEV-110)
PCA (PUBDEV-157)
Port Random Forest to h2o-dev (PUBDEV-455)
Enable DRF model output (github)
Add DRF to Flow (Model Output) (PUBDEV-533)
Grid for GBM (github)
Run Deep Learning Regression from Flow (HEXDEV-109)

#####Python

Add Python wrapper for DRF (PUBDEV-534)

#####R

Add R wrapper for DRF (PUBDEV-530)

#####System

Include uploadFile (PUBDEV-299) (github)
Added -flow_dir to hadoop driver (github)

#####Web UI

Add Flow packs (HEXDEV-190) (PUBDEV-247)
Integrate H2O Help inside Help panel (PUBDEV-108) (github)
Add quick toggle button to show/hide the sidebar (github)
Add New, Open toolbar buttons (github)
Auto-refresh data preview when parse setup input parameters are changed (PUBDEV-532)
Flow: Add playbar with Run, Continue, Pause, Progress controls (HEXDEV-192)
You can now stop/cancel a running flow

####Enhancements

The following changes are improvements to existing features (which includes changed default values):

#####Algorithms

Display GLM coefficients only if available (PUBDEV-466)
Add random chance line to RoC chart (HEXDEV-168)
Allow validation dataset for AutoEncoder (PUBDEV-581)
Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
Use getRNG for Dropout (github)
PUBDEV-598: Add tests for determinism of RNGs (github)
PUBDEV-598: Implement Chi-Square test for RNGs (github)
PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
Add DL model output toString() (github)
Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
Print number of categorical levels once we hit >1000 input neurons. (github)
Updated the loss behavior for GBM. When loss is set to AUTO, if the response is an integer with 2 levels, then bernoulli (rather than gaussian) behavior is chosen. As a result, the do_classification flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use as.factor() in their response to get the desired bernoulli behavior. The score_each_iteration flag has been removed as well. (github)
Fully remove _convert_to_enum in all algos (github)
Add DL POJO scoring (PUBDEV-585)
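
The GBM loss note above (AUTO picks bernoulli for an integer response with 2 levels) can be illustrated with a small sketch. This is not H2O's implementation: the function name `resolve_auto_loss` is hypothetical, and falling back to gaussian for every other response is an assumption made only for illustration.

```python
def resolve_auto_loss(response_values):
    """Hypothetical helper: choose a loss name when loss=AUTO.

    Mirrors the rule stated in the changelog entry: an integer-valued
    response with exactly 2 distinct levels selects bernoulli.
    Anything else falls back to gaussian (an assumption of this sketch).
    """
    levels = set(response_values)
    # A level counts as an integer if it has no fractional part.
    all_integers = all(float(v).is_integer() for v in levels)
    if all_integers and len(levels) == 2:
        return "bernoulli"
    return "gaussian"


# Usage: a 0/1 response resolves to bernoulli without as.factor()-style conversion.
binary_choice = resolve_auto_loss([0, 1, 1, 0])
numeric_choice = resolve_auto_loss([0.5, 1.2, 3.0])
```

Keeping the rule in one place is what makes a separate do_classification flag redundant: the chosen loss alone determines the model category.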

#####API

Display point layer for tree vs mse plots in GBM output (PUBDEV-504)
Rename API inputs/outputs (github)
Rename Inf to Infinity (github)

#####Python

added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
Make H2OVec.levels() return the levels (github)
H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), H2OVec.isna() additions. demo pyunit addition (github)

#####R

PUBDEV-578, PUBDEV-541, PUBDEV-566: R client now sends the data frame column names and data types to ParseSetup; R client can get column names from a parsed frame or a list; respects client request for column data types (github)

#####System

Customize H2O web UI port (PUBDEV-483)
Make parse setup interactive (PUBDEV-532)
Added --verbose (github)
Adds some H2OParseExceptions. Removes all H2O.fail in parse (no parse issues should cause a fail) (github)
Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)

#####Web UI

Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
'Run' button selects next cell after running
ModelMetrics by model category: Clustering (PUBDEV-416)
ModelMetrics by model category: Regression (PUBDEV-415)
ModelMetrics by model category: Multinomial (PUBDEV-414)
ModelMetrics by model category: Binomial (PUBDEV-413)
Add ability to select and delete multiple models (github)
Add ability to select and delete multiple frames (github)
Flows now stop running when an error occurs
Print full number of mismatches during POJO comparison check. (github)
Make Grid multi-node safe (github)
Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
Inconsistency in GBM results: Gives different results even when run with the same set of params (HEXDEV-194)
GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
GBM predict fails without response column (PUBDEV-478)
GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
divide by zero in modelmetrics for deep learning (PUBDEV-568)
AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
GBM: reports 0th tree mse value for the validation set, different from the train set, when only a train set is provided (PUBDEV-561)
PUBDEV-580: Fix some numerical edge cases (github)
Fix two missing float -> double conversion changes in tree scoring. (github)
Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226)
Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
Old GLM Parameters Missing (PUBDEV-431)
GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)

#####API

SplitFrame on String column produce C0LChunk instead of CStrChunk (PUBDEV-468)
Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
Response from /ModelBuilders don't conform to standard error json shape when there are errors (HEXDEV-121)

#####Python

fix python syntax error (github)
Fixes handling of None in python for a returned na_string. (github)

#####R

R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
h2o.confusionmatrices does not work (PUBDEV-547)
How do I convert an enum column back to integer/double from R? (PUBDEV-546)
Summary in R is faulty (PUBDEV-539)
Custom Functions don't work in apply() in R (PUBDEV-436)
R: as.h2o should preserve R data types (PUBDEV-578)
as.h2o loses track of headers (PUBDEV-541)
NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
R: h2o.confusionMatrix should handle both models and model metric objects (PUBDEV-590)
H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)

#####System

Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
3 jvms, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
Not able to start h2o on hadoop (PUBDEV-487)
one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
The NY0 parse rule, in summary. Doesn't look like it's counting the 0's as NAs like h2o (PUBDEV-154)
0 / Y / N parsing (PUBDEV-229)
NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
Flow: converting a column to enum while parsing does not work (PUBDEV-566)
Parse: Numbers completely parsed wrong (PUBDEV-574)
NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540) (github)
Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
Flow : Import file 15M rows 2.2 Cols => Parse => Error fetching job on UI =>Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
Flow : Build GLM Model => Family tweedy => class hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization (PUBDEV-211)
Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
Check reproducibility on multi-node vs single-node (PUBDEV-557)
Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)

#####Web UI

Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
GBM Model : Params in flow show two times (PUBDEV-440)
Flow multinomial confusion matrix visualization (HEXDEV-204)
Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
[MapR] unable to give hdfs file name from Flow (PUBDEV-409)

###Selberg (0.2.0.1) - 3/6/15

####New Features

#####Web UI

Flow: Delete functionality to be available for import files, jobs, models, frames (PUBDEV-241)
Implement "Download Flow" (PUBDEV-407)
Flow: Implement "Run All Cells" (PUBDEV-110)

#####API

Create python package (PUBDEV-181)
as.h2o in Python (HEXDEV-72)

#####System

Add a README.txt to the hadoop zip files (github)
Build a cdh5.2 version of h2o (github)

####Enhancements

#####Web UI

Flow: Job view should have info on start and end time (PUBDEV-267)
Flow: Implement 'File > Open' (PUBDEV-408)
Display IP address in ADMIN -> Cluster Status (HEXDEV-159)
Flow: Display alternate UI for splitFrames() (PUBDEV-399)

#####Algorithms

Added K-Means scoring (github)
Flow: Implement model output for Deep Learning (PUBDEV-118)
Flow: Implement model output for GLM (PUBDEV-120)
Deep Learning model output (HEXDEV-89, Flow), (HEXDEV-88, Python), (HEXDEV-87, R)
Run GLM Binomial from Flow (including LBFGS) (HEXDEV-90)
Flow: Display confusion matrices for multinomial models (PUBDEV-397)
During PCA, missing values in training data will be replaced with column mean (github)
Update parameters for best model scan (github)
Change Quantiles to match h2o-1; both Quantiles and Rollups now have the same default percentiles (github)
Massive cleanup and removal of old PCA, replacing with quadratically regularized PCA based on alternating minimization algorithm in GLRM (github)
Add model run time to DL Model Output (github)
Don't gather Neurons/Weights/Biases statistics (github)
Only store best model if override_with_best_model is enabled (github)
beta_eps added, passing tests changed (github)
For GLM, default values for max_iters parameter were changed from 1000 to 50.
For quantiles, probabilities are displayed.
Run Deep Learning Multinomial from Flow (HEXDEV-108)
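
The PCA entry in the list above states that missing values in the training data are replaced with the column mean. A minimal sketch of that kind of mean imputation, in plain Python, might look like the following. The function name `impute_column_means` and the row-of-lists data layout are hypothetical and chosen only for illustration; this is not H2O's code.

```python
import math


def _is_missing(value):
    # Treat both None and NaN as missing (an assumption of this sketch).
    return value is None or math.isnan(value)


def impute_column_means(rows):
    """Return a copy of `rows` with missing entries replaced by the column mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if not _is_missing(r[j])]
        # A column with no observed values has no mean; keep NaN in that case.
        means.append(sum(present) / len(present) if present else float("nan"))
    return [
        [means[j] if _is_missing(r[j]) else r[j] for j in range(n_cols)]
        for r in rows
    ]


# Usage: the mean is computed from the observed values of each column only.
training = [[1.0, None], [3.0, 4.0], [float("nan"), 8.0]]
filled = impute_column_means(training)
```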

#####API

Expose DL weights/biases to clients via REST call (PUBDEV-344)
Flow: Implement notification bar/API (PUBDEV-359)
Variable importance data in REST output for GLM (PUBDEV-359)
Add extra DL parameters to R API (average_activation, sparsity_beta, max_categorical_features, reproducible) (github)
Update GLRM API model output (github)
h2o.anomaly missing in R (PUBDEV-434)
No method to get enum levels (PUBDEV-432)

#####System

Improve memory footprint with latest version of h2o-dev (github)
For now, let model.delete() of DL delete its best models too. This allows R code to not leak when only calling h2o.rm() on the main model. (github)
Bind both TCP and UDP ports before clustering (github)
Round summary row#. Helps with pctiles for very small row counts. Add a test to check for getting close to the 50% percentile on small rows. (github)
Increase Max Value size in DKV to 256MB (github)
Flow: make parseRaw() do both import and parse in sequence (HEXDEV-184)
Remove notion of individual job/job tracking from Flow (PUBDEV-449)
Capability to name prediction results Frame in flow (PUBDEV-233)

####Bug Fixes

#####Algorithms

GLM binomial prediction failing (PUBDEV-403)
DL: Predict with auto encoder enabled gives Error processing error (PUBDEV-433)
balance_classes in Deep Learning intermittent poor result (PUBDEV-437)
Flow: Building GLM model fails (PUBDEV-186)
summary returning incorrect 0.5 quantile for 5 row dataset (PUBDEV-95)
GBM missing variable importance and balance-classes (PUBDEV-309)
H2O Dev GBM first tree differs from H2O 1 (PUBDEV-421)
get glm model from flow fails to find coefficient name field (PUBDEV-394)
GBM/GLM build model fails on Hadoop after building 100% => Failed to find schema for version: 3 and type: GBMModel (PUBDEV-378)
Parsing KDD wrong (PUBDEV-393)
GLM AIOOBE (PUBDEV-199)
Flow : Build GLM Model with family poisson => java.lang.ArrayIndexOutOfBoundsException: 1 at hex.glm.GLM$GLMLambdaTask.needLineSearch(GLM.java:359) (PUBDEV-210)
Flow : GLM Model Error => Enum conversion only works on small integers (PUBDEV-365)
GLM binary response, do_classification=FALSE, family=binomial, prediction error (PUBDEV-339)
Epsilon missing from GLM parameters (PUBDEV-354)
GLM NPE (PUBDEV-395)
Flow: GLM bug (or incorrect output) (PUBDEV-252)
GLM binomial on benign.csv gets assertion error in predict (PUBDEV-132)
current summary default_pctiles doesn't have 0.001 and 0.999 like h2o1 (PUBDEV-94)
Flow: Build GBM/DL Model: java.lang.IllegalArgumentException: Enum conversion only works on integer columns (PUBDEV-213) (github)
ModelMetrics on cup98VAL_z dataset has response with many nulls (PUBDEV-214)
GBM : Predict model category output/inspect parameters shows as Regression when model is built with do classification enabled (PUBDEV-441)
Fix double-precision DRF bugs (github)

#####System

Null columnTypes for /smalldata/arcene/arcene_train.data (PUBDEV-406) (github)
Flow: Waiting for -1 responses after starting h2o on hadoop cluster of 5 nodes (PUBDEV-419)
Parse: airlines_all.csv => Airtime type shows as ENUM instead of Integer (PUBDEV-426) (github)
Flow: Typo - "Time" option displays twice in column header type menu in Parse (PUBDEV-446)
Duplicate validation messages in k-means output (PUBDEV-305) (github)
Fixes Parse so that it returns to supplying generic column names when no column names exist (github)
Flow: Import File: File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
Flow: Parse => 1m.svm hangs at 42% (HEXDEV-174)
Prediction NFE (PUBDEV-308)
NPE doing Frame to key before it's fully parsed (PUBDEV-79)
h2o_master_DEV_gradle_build_J8 #351 hangs for past 17 hrs (PUBDEV-239)
Sparkling water - container exited due to unavailable port (PUBDEV-357)

#####API

Flow: Splitframe => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-410) (github)
Incorrect dest.type, description in /CreateFrame jobs (PUBDEV-404)
space in windows filename on python (PUBDEV-444) (github)
Python end-to-end data science example 1 runs correctly (PUBDEV-182)
3/NodePersistentStorage.json/foo/id should throw 404 instead of 500 for 'not-found' (HEXDEV-163)
POST /3/NodePersistentStorage.json should handle Content-Type:multipart/form-data (HEXDEV-165)
by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-92)
Sparkling water : val train:DataFrame = prostateRDD => Fails with ArrayIndexOutOfBoundsException (PUBDEV-392)
Flow : getModels produces error: Error calling GET /3/Models.json (PUBDEV-254)
ddply 'Could not find the operator' (HEXDEV-162) (github)
h2o.table AIOOBE during NewChunk creation (HEXDEV-161) (github)
Fix warning in h2o.ddply when supplying multiple grouping columns (github)

###0.1.26.1051 - 2/13/15

####New Features

Flow: Display alternate UI for splitFrames() (PUBDEV-399)

####Enhancements

#####System

Embedded H2O config can now provide flat file (needed for Hadoop) (github)
Don't log GET of individual jobs, to avoid filling up the logs (github)

#####Algorithms

Increase GBM/DRF factor binning back to historical levels. Had been capped accidentally at nbins (typically 20), was intended to support a much higher cap. (github)
Tweaked rho heuristic in glm (github)
Enable variable importances for autoencoders (github)
Removed group_split option from GBM
Flow: display varimp for GBM output (PUBDEV-398)
variable importance for GBM (github)
GLM in H2O-Dev may provide slightly different coefficient values when applying an L1 penalty in comparison with H2O1.

####Bug Fixes

#####Algorithms

Fixed bug in GLM exception handling causing GLM jobs to hang (github)
Fixed a bug in kmeans input parameter schema where init was always being set to Furthest (github)
Fixed mean computation in GLM (github)
Fixed kmeans.R (github)
Flow: Building GBM model fails with Error executing javascript (PUBDEV-396)

#####System

DataFrame propagates absolute path to parser (github)
Fix flow shutdown bug (github)

###0.1.26.1032 - 2/6/15

####New Features

#####General Improvements

better model output
support for Python client
support for Maven
support for Sparkling Water
support for REST API schema
support for Hadoop CDH5 (github)

#####UI

Display summary visualizations by default in column summary output cells (PUBDEV-337)
Display AUC curve by default in binomial prediction output cells (PUBDEV-338)
Flow: Implement About H2O/Flow with version information (PUBDEV-111)
Add UI for CreateFrame (PUBDEV-218)
Flow: Add ability to cancel running jobs (PUBDEV-373)
Flow: warn when user navigates away while having unsaved content (PUBDEV-322)

#####Algorithms

Implement splitFrame() in Flow (PUBDEV-356)
Variable importance graph in Flow for GLM (PUBDEV-360)
Flow: Implement model building form init and validation (PUBDEV-102)
Added a shuffle-and-split-frame function; Use it to build a saner model on time-series data (github)
Added binomial model metrics (github)
Run KMeans from R (HEXDEV-105)
Be able to create a new GLM model from an existing one with updated coefficients (HEXDEV-48)
Run KMeans from Python (HEXDEV-106)
Run Deep Learning Binomial from Flow (HEXDEV-83)
Run KMeans from Flow (HEXDEV-104)
Run Deep Learning from Python (HEXDEV-85)
Run Deep Learning from R (HEXDEV-84)
Run Deep Learning Multinomial from Flow (HEXDEV-108)
Run Deep Learning Regression from Flow (HEXDEV-109)

#####API

Flow: added REST API documentation to the web ui (PUBDEV-60)
Flow: Implement visualization API (PUBDEV-114)

#####System

Dataset inspection from Flow (HEXDEV-66)
Basic data munging (Rapids) from R (HEXDEV-70)
Implement stack operator/stacking in Lightning (HEXDEV-128)

####Enhancements

#####UI

Added better message when h2o.init() not yet called (No active connection to an H2O cluster. Try calling "h2o.init()") (github)

#####Algorithms

Updated column-based gradient task to use sparse interface (github)
Updated LBFGS (added progress monitor interface, updated some default params), added progress and job support to GLM lbfgs (github)
Added pretty print (github)
Added AutoEncoder to R model categories (github)
Added Coefficients table to GLM model (github)
Updated glm lbfgs to allow for efficient lambda-search (l2 penalty only) (github)
Removed splitframe shuffle parameter (github)
Simplified model builders and added deeplearning model builder (github)
Add DL model outputs to Flow (PUBDEV-372)
Flow: Deep Learning: Expert Mode (PUBDEV-284)
Flow: Display multinomial and regression DL model outputs (PUBDEV-383)
Display varimp details for DL models (PUBDEV-381)
Make binomial response "0" and "1" by default (github)
Update R GBM demos to reflect new input parameter names (github)
Rename GLM variable importance to normalized coefficient magnitudes (github)

#####API

Changed key to destination_key (github)
Cleaned up REST API schema interface (github)
Changed method name, cleaned setup, added a pyunit runner (github)

#####System

Allow changing column types during parse-setup (PUBDEV-376)
Display %NAs in model builder column lists (PUBDEV-375)
Figure out how to add H2O to PyPI (PUBDEV-178)

####Bug Fixes

#####UI

Flow: Parse => 1m.svm hangs at 42% (PUBDEV-345)
cup98 Dataset has columns that prevent validation/prediction (PUBDEV-349)
Flow: predict step failed to function (PUBDEV-217)
Flow: Arrays of numbers (ex. hidden in deeplearning) require brackets (PUBDEV-303)
Flow v.0.1.26.1030: StackTrace was broken (PUBDEV-371)
Flow: Import files -> Search -> Parse these files -> null pointer exception (PUBDEV-170)
Flow: "getJobs" not working (PUBDEV-320)
Thresholds x Metrics and Max Criteria x Metrics tables were flipped in flow (HEXDEV-155)
Flow v.0.1.26.1030: StackTrace is broken (PUBDEV-348)
flow: getJobs always shows "Your H2O cloud has no jobs" (PUBDEV-243)
Flow: First and last characters deleted from ignored columns (PUBDEV-300)
Sparkling water => Flow => Menu buttons for cell do not show up (PUBDEV-294)

#####Algorithms

Flow: Build K Means model with default K value gives error "Required field k not specified" (PUBDEV-167)
Slicing out a specific data point is broken (PUBDEV-280)
Flow: SplitFrame and grep in algorithms for flow and loops back onto itself (PUBDEV-272)
Fixed the predict method (github)
Refactor ModelMetrics into a different class for Binomial (github)
/Predictions.json did not cache predictions (HEXDEV-119)
Flow, DL: Error after changing hidden layer size (PUBDEV-323)
Error in node$h2o#node: $ operator is invalid for atomic vectors (PUBDEV-348)
Fixed K-means predict (PUBDEV-321)
Flow: DL build mode fails => as it's missing adding quotes to parameter (PUBDEV-301)
Flow: Build K means model with training/validation frames => unknown error (PUBDEV-185)
Flow: Build quantile mode=> Click goes in loop (PUBDEV-188)

#####API

Sparkling Water/Flow: Failed to find version for schema (PUBDEV-367)
Cloud.json returns odd node name (PUBDEV-259)

#####System

guesser needs to send types to parse (PUBDEV-279)
Got h2o.clusterStatus function working in R. (github)
Parse: Using R => java.lang.NullPointerException (PUBDEV-380)
Flow: Jobs => click on destination key => unimplemented: Unexpected val class for Inspect: class water.fvec.DataFrame (PUBDEV-363)
Column assignment in R exposes NullPointerException in Rollup (PUBDEV-155)
import from hdfs doesn't add files (PUBDEV-260)
AssertionError: ERROR: got tcp resend with existing in-progress task (PUBDEV-219)
HDFS parse fails when H2O launched on Spark CDH5 (PUBDEV-138)
Flow: Parse failure => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-296)
"predict" step is not working in flow (PUBDEV-202)
Flow: Frame finishes parsing but comes up as null in flow (PUBDEV-270)
scala >flightsToORD.first() fails with "not serializable result" (PUBDEV-304)
DL throws NPE for bad column names (PUBDEV-15)
Flow: Build model: Not able to build KMeans/Deep Learning model (PUBDEV-297)
Flow: Col summary for NA/Y cols breaks (PUBDEV-325)
Sparkling Water : util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread NanoHTTPD Session,9,main (PUBDEV-346)
toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)

###0.1.20.1019 - 1/19/15

####New Features

#####UI

Added various documentation links to the build page (github)

#####Algorithms

Ported matrix multiply over and connected it to rapids (github)

####Enhancements

#####UI

Allow user to specify (the log of) the number of rows per chunk for a new constant chunk; use this new function in CreateFrame (github)
Make CreateFrame non-blocking, now displays progress bar in Flow (github)
Add row and column count to H2OFrame show method (github)
Admin watermeter page (PUBDEV-234)
Admin stack trace (PUBDEV-228)
Admin profile (PUBDEV-227)
Flow: Add download logs in UI (PUBDEV-204)
Need shutdown, minimally like h2o (PUBDEV-74)

#####API

Changed 2 to 3 for JSON requests (github)
Rename some more fields per consistency (max_iters changed to max_iterations, _iters to _iterations, _ncats to _categorical_column_count, _centersraw to centers_raw, _avgwithinss to tot_withinss, _withinmse to withinss) (github)
Changed K-Means output parameters (withinmse to within_mse, avgss to avg_ss, avgbetweenss to avg_between_ss) (github)
Remove default field values from DeepLearning parameters schema, since they come from the backing class (github)
Add @API help annotation strings to JSON model output (PUBDEV-216)
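
Client code written against the old field names needs to account for the renames listed above. A minimal sketch of one way to translate old keys to new ones is shown below. The mapping is copied from the two rename entries in this list; the helper name `rename_fields` and the assumption that a payload is a flat dictionary are hypothetical.

```python
# Old name -> new name, taken from the rename entries in this release.
RENAMED_FIELDS = {
    "max_iters": "max_iterations",
    "_iters": "_iterations",
    "_ncats": "_categorical_column_count",
    "_centersraw": "centers_raw",
    "_avgwithinss": "tot_withinss",
    "_withinmse": "withinss",
    # K-Means output parameters
    "withinmse": "within_mse",
    "avgss": "avg_ss",
    "avgbetweenss": "avg_between_ss",
}


def rename_fields(payload, renames=RENAMED_FIELDS):
    """Return a copy of a flat dict with old field names replaced by new ones."""
    return {renames.get(key, key): value for key, value in payload.items()}


# Usage: keys without a rename entry pass through unchanged.
old_params = {"max_iters": 50, "k": 3}
new_params = rename_fields(old_params)
```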

#####Algorithms

Minor fix in rapids matrix multiplication (github)
Updated sparse chunk to cut off binary search for prefix/suffix zeros (github)
Updated L_BFGS for GLM - warm-start solutions during lambda search, correctly pass current lambda value, added column-based gradient task (github)
Fix model parameters' default values in the metadata (github)
Set default value of k = number of clusters to 1 for K-Means (PUBDEV-251)

#####System

Reject any training data with non-numeric values from KMeans model building (github)

####Bug Fixes

#####API

Fixed isSparse call for constant chunks (github)
Fixed sparse interface of constant chunks (no nonzero if const 1= 0) (github)

#####System

Typeahead for folder contents apparently requires trailing "/" (github)
Fix build and instructions for R install.packages() style of installation; Note we only support source installs now (github)
Fixed R test runner h2o package install issue that caused it to fail to install on dev builds (github)

###0.1.18.1013 - 1/14/15

####New Features

#####UI

Admin timeline (PUBDEV-226)
Admin cluster status (PUBDEV-225)
Markdown cells should auto run when loading a saved Flow notebook (PUBDEV-87)
Complete About page to include info about the H2O version (PUBDEV-223)

####Enhancements

#####Algorithms

Flow: Implement model output for GBM (PUBDEV-119)

###0.1.20.1016 - 12/28/14

Added ip_port field in node json output for Cloud query (github)
