# Recent Changes

## H2O

### Turing (3.10.0.7) - 9/19/2016

Bug

  • [PUBDEV-3300] - NPE during categorical encoding with cross-validation (Windows 8 runit only??)
  • [PUBDEV-3306] - H2OFrame arithmetic/statistical functions return inconsistent types
  • [PUBDEV-3315] - Multi file parse fails with NPE
  • [PUBDEV-3374] - h2o.hist() does not respect breaks
  • [PUBDEV-3401] - importFiles, with s3n, gives NullPointerException
  • [PUBDEV-3409] - Python Structure() Breaks When Applied to Entire Dataframe

New Feature

  • [PUBDEV-2707] - Diff operation on column in H2O Frame
  • [HEXDEV-619] - Calculate residuals in H2O-3 and in Flow, and create a new frame with a column containing the residuals

Task

Improvement

  • [PUBDEV-3296] - In R, allow x to be missing (meaning take all columns except y) for all supervised algorithms
  • [PUBDEV-3329] - median() should return a list of medians from an entire frame
  • [PUBDEV-3334] - Conduct rbind and cbind on multiple frames
  • [PUBDEV-3387] - Add argument to H2OFrame.print in R to specify number of rows
  • [PUBDEV-3418] - Suppress chunk summary in describe()

### Turing (3.10.0.6) - 8/25/2016

Bug

  • [HEXDEV-608] - Hashmap in H2OIllegalArgumentException fails to deserialize & throws FATAL
  • [PUBDEV-2879] - NPE in MetadataHandler
  • [PUBDEV-3086] - hist() fails for constant numeric columns
  • [PUBDEV-3173] - Client mode: flatfile requires list of all nodes, but a single entry node should be sufficient
  • [PUBDEV-3207] - Make CreateFrame reproducible for categorical columns.
  • [PUBDEV-3208] - Fix intermittency of categorical encoding via eigenvector.
  • [PUBDEV-3211] - isBitIdentical is returning true for two Frames with different content
  • [PUBDEV-3222] - AssertionError for DL train/valid with categorical encoding
  • [PUBDEV-3237] - Wrong MAE for observation weights other than 1.
  • [PUBDEV-3244] - H2ODriver for CDH5.7.0 does not accept memory settings
  • [PUBDEV-3276] - H2OFrame.drop() leaves the frame in inconsistent state

New Feature

  • [PUBDEV-3007] - Implement skewness calculation for H2O Frames
  • [PUBDEV-3008] - Implement kurtosis calculation for H2O Frames
  • [PUBDEV-3128] - Add ability to do a deep copy in Python API
  • [PUBDEV-3163] - Add docs for h2o.make_metrics() for R and Python
  • [PUBDEV-3218] - Add RMSLE to model metrics
  • [PUBDEV-3264] - Return unique values of a categorical column as a Pythonic list
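The skewness, kurtosis, and RMSLE entries above refer to standard statistics. The following plain-Python sketch shows common textbook definitions. It is not H2O's implementation, and H2O's estimators may differ, for example in bias correction or in whether kurtosis is reported as excess kurtosis.

```python
import math
from typing import Sequence


def _central_moment(xs: Sequence[float], k: int) -> float:
    """k-th central moment using the population (divide-by-n) convention."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def _variance_or_fail(xs: Sequence[float]) -> float:
    m2 = _central_moment(xs, 2)
    if m2 == 0:
        # A constant column has no defined skewness or kurtosis.
        raise ValueError("zero variance")
    return m2


def skewness(xs: Sequence[float]) -> float:
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    m2 = _variance_or_fail(xs)
    return _central_moment(xs, 3) / m2 ** 1.5


def kurtosis(xs: Sequence[float]) -> float:
    """Moment-based kurtosis m4 / m2**2; subtract 3 to get excess kurtosis."""
    m2 = _variance_or_fail(xs)
    return _central_moment(xs, 4) / m2 ** 2


def rmsle(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared logarithmic error; all values must be > -1."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    sq = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))
```

RMSLE compares `log(1 + value)` rather than raw values. As a result, it penalizes relative error rather than absolute error, and it penalizes under-prediction more than over-prediction of the same size.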

Task

  • [PUBDEV-3235] - Refactor and simplify implementation of Pearson Correlation
  • [PUBDEV-3238] - Add MAE to CV Summary

Improvement

  • [PUBDEV-2702] - Create h2o.* functions for H2O primitives
  • [PUBDEV-3098] - Add methods to get actual and default parameters of a model
  • [PUBDEV-3132] - Add ability to drop a list of columns or a subset of rows from an H2OFrame
  • [PUBDEV-3138] - Ensure all is*() functions return a list

### Turing (3.10.0.3) - 7/29/2016

Bug

  • [PUBDEV-2805] - Error when setting a string column to a single value in R/Py
  • [PUBDEV-2965] - R h2o.merge() ignores by.x and by.y
  • [PUBDEV-3135] - Download Logs broken URL from Flow

New Feature

  • [PUBDEV-2958] - H2O Version Check
  • [PUBDEV-3022] - Add an h2o.concat function equivalent to pandas.concat
  • [PUBDEV-3050] - Add Huber loss function for GBM and DL (for regression)
  • [PUBDEV-3071] - Add RMSE to model metrics
  • [PUBDEV-3104] - Add Mean Absolute Error to Model Metrics
  • [PUBDEV-3108] - Add mean absolute error to scoring history and model plotting
  • [PUBDEV-3116] - Add categorical encoding schemes for DL and Aggregator
  • [PUBDEV-3155] - Compute supervised ModelMetrics from predicted and actual values in Java/R
  • [PUBDEV-3162] - Compute supervised ModelMetrics from predicted and actual values in Python
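The RMSE, MAE, and Huber loss entries in this release have standard definitions. The following minimal sketch assumes the textbook Huber form, with a threshold `delta` between the quadratic and linear regions. How H2O chooses that threshold is not described in this changelog, and the function names here are illustrative.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    _check(actual, predicted)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(actual, predicted)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def huber(residual: float, delta: float) -> float:
    """Quadratic for |r| <= delta, linear beyond; continuous at |r| == delta."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)
```

Beyond `delta`, the Huber loss grows linearly rather than quadratically. This keeps a few large residuals from dominating training, which is why it is used for robust regression.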

Improvement

  • [PUBDEV-1888] - Implement gradient checking for DL
  • [PUBDEV-2627] - Add better warning message to functions of H2OModelMetrics objects
  • [PUBDEV-3021] - Add demo datasets to Python package
  • [PUBDEV-3113] - Replace "MSE" with "RMSE" in scoring history table
  • [PUBDEV-3122] - Make all TwoDimTable Headers Pythonic in R and Python API
  • [PUBDEV-3129] - Achieve consistency between DL and GBM/RF scoring history in regression case
  • [PUBDEV-3131] - Disable R^2 stopping criterion in tree model builders
  • [PUBDEV-3149] - Remove R^2 from all model output except GLM

### Turin (3.8.3.4) - 7/15/2016

Bug

  • [PUBDEV-3040] - File parse from S3 extremely slow
  • [PUBDEV-3145] - Fix Deep Learning POJO for hidden dropout other than 0.5

### Turin (3.8.3.2) - 7/1/2016

Bug

  • [PUBDEV-898] - DRF: sample_rate=1 not permitted unless validation is performed
  • [PUBDEV-2087] - create a set of tests which create large POJOs for each algo and compiles them
  • [PUBDEV-2322] - Merge (method="radix") bug1
  • [PUBDEV-2325] - Merge (method="radix") bug2
  • [PUBDEV-2565] - Fold Column not available in h2o.grid
  • [PUBDEV-2964] - h2o.merge(,method="radix") failing 15/40 runs
  • [PUBDEV-3030] - Parse: java.lang.IllegalArgumentException: 0 > -2147483648
  • [PUBDEV-3032] - Cached errors are not printed if H2O exits
  • [PUBDEV-3072] - java.lang.ClassCastException for Quantile GBM
  • [PUBDEV-3077] - model_summary number of trees is too high for multinomial DRF/GBM models
  • [PUBDEV-3079] - NPE when accessing invalid null Frame cache in a Frame's vecs()
  • [PUBDEV-3081] - TwoDimTable version of a Frame prints missing value (NA) as 0
  • [PUBDEV-3089] - Fix tree split finding logic for some cases where min_rows wasn't satisfied and the entire column was no longer considered even if there were allowed split points
  • [PUBDEV-3093] - saveModel and loadModel don't work with Windows C:/ paths
  • [PUBDEV-3095] - getStackTrace fails on NumberFormatException
  • [PUBDEV-3096] - TwoDimTable for Frame Summaries doesn't always show the full precision
  • [PUBDEV-3097] - DRF OOB scoring isn't using observation weights
  • [PUBDEV-3099] - AIOOBE when calling 'getModel' in Flow while a GLM model is training

Task

  • [PUBDEV-2681] - Properly document the addition of missing_values_handling arg to GLM

Improvement

  • [PUBDEV-1617] - Matt's new merge (aka join) integrated into H2O
  • [PUBDEV-2822] - Improved handling of missing values in tree models (training and testing)
  • [PUBDEV-3060] - IPv6 documentation
  • [PUBDEV-3066] - Stop GBM models once the effective learning rate drops below 1e-6.
  • [PUBDEV-3094] - Log input parameters during boot of H2O
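The 1e-6 cutoff in PUBDEV-3066 interacts with `learn_rate_annealing`, which was added in 3.8.2.3. This sketch assumes that the effective learning rate after k trees is `learn_rate * learn_rate_annealing**k`. Under that assumption, it counts how many trees can be built before the rate falls below the cutoff. The helper `trees_until_cutoff` is hypothetical and is not part of any H2O API.

```python
def trees_until_cutoff(learn_rate: float, annealing: float, cutoff: float = 1e-6) -> int:
    """Smallest k with learn_rate * annealing**k < cutoff.

    Assumes the rate is multiplied by `annealing` after every tree,
    and that 0 < annealing < 1 so the rate actually decays.
    """
    if not 0.0 < annealing < 1.0:
        raise ValueError("annealing must be in (0, 1) for the rate to decay")
    k = 0
    rate = learn_rate
    while rate >= cutoff:
        rate *= annealing  # effective rate for the next tree
        k += 1
    return k
```

For example, with `learn_rate=0.1` and an annealing factor of 0.99, the rate crosses 1e-6 after a little over 1,100 trees. Under this assumption, an `ntrees` setting far beyond that point would not be reached.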

### Turchin (3.8.2.9) - 6/10/2016

Bug

  • [PUBDEV-2920] - Python apply() doesn't recognize % (modulo) within lambda function
  • [PUBDEV-2940] - Documentation: Add RoundRobin histogram_type to GBM/DRF
  • [PUBDEV-2957] - Add "seed" option to GLM in documentation
  • [PUBDEV-2973] - Documentation: Update supported Hadoop versions
  • [PUBDEV-2981] - Models hang when max_runtime_secs is too small
  • [PUBDEV-2982] - Default min/max_mem_size to gigabytes in h2o.init
  • [PUBDEV-2997] - Add "ignore_const_cols" argument to glm and gbm for Python API
  • [PUBDEV-2999] - AIOOBE in GBM if no nodes are split during tree building
  • [PUBDEV-3004] - Negative R^2 (now NaN) can prevent early stopping
  • [PUBDEV-3011] - Two grid sorting methods in Py API - only one works sometimes

New Feature

Task

  • [PUBDEV-3005] - Verify checkpoint argument in h2o.gbm (for R)

Improvement

  • [PUBDEV-2040] - Sync up argument names in `h2o.init` between R and Python
  • [PUBDEV-2996] - Change `getjar` to `get_jar` in h2o.download_pojo in R
  • [PUBDEV-2998] - Change min_split_improvement default value from 0 to 1e-5 for GBM/DRF
  • [PUBDEV-3013] - Allow specification of "AUC" or "auc" or "Auc" for stopping_metrics, sorting of grids, etc.

### Turchin (3.8.2.8) - 6/2/2016

Bug

  • [PUBDEV-2985] - Make Random grid search consistent between clients for same parameters
  • [PUBDEV-2987] - Allow learn_rate_annealing to be passed to H2OGBMEstimator constructor in Python API
  • [PUBDEV-2989] - Fix typo in GBM/DRF Python API for col_sample_rate_change_per_level - was misnamed and couldn't be set

New Feature

  • [PUBDEV-2979] - Add a new metric: mean misclassification error for classification models
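This sketch assumes that the new metric averages the error rate within each true class, rather than pooling all rows as plain misclassification error does. The function names are illustrative.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def mean_per_class_error(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Average over true classes of (misclassified rows / rows in that class)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    totals = defaultdict(int)  # rows per true class
    errors = defaultdict(int)  # misclassified rows per true class
    for a, p in zip(actual, predicted):
        totals[a] += 1
        if a != p:
            errors[a] += 1
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


def misclassification_error(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Pooled error rate over all rows, for comparison."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)
```

On imbalanced data the two metrics diverge. Suppose the true labels are `["a", "a", "a", "b"]` and the predictions are `["a", "a", "b", "a"]`. The rare class "b" is always wrong, so the per-class average is 2/3, while the pooled rate is 0.5.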

Improvement

  • [PUBDEV-2972] - No longer print negative R^2 values - show NaN instead
  • [PUBDEV-2984] - Add xval=True/False as an option to model_performance() in Python API

### Turchin (3.8.2.6) - 5/24/2016

Bug

  • [PUBDEV-1899] - Number of active predictors is off by 1 when Intercept is included
  • [PUBDEV-2942] - GLM with cross-validation AIOOBE (+ Grid-Search + Multinomial, may be related)
  • [PUBDEV-2943] - Improved accuracy for histogram_type="QuantilesGlobal" for DRF/GBM

New Feature

  • [PUBDEV-1705] - GLM needs 'seed' argument for new (random) implementation of n-folds
  • [PUBDEV-2743] - Add seed argument to GLM

Improvement

  • [PUBDEV-2928] - Remove _Dev from file name _DataScienceH2O-Dev
  • [PUBDEV-2945] - Clean up overly long and duplicate error message in KeyV3
  • [PUBDEV-2953] - Allow the user to pass column types of an existing H2OFrame during Parse/Upload in R and Python
  • [PUBDEV-2954] - Tweak Parser Heuristic
  • [PUBDEV-2955] - GLM improvements and fixes

### Turchin (3.8.2.5) - 5/19/2016

Technical task

Bug

  • [PUBDEV-2282] - DRF: cannot compile pojo
  • [PUBDEV-2304] - GBM pojo compile failures
  • [PUBDEV-2878] - Bug in h2o-py H2OScaler.inverse_transform()
  • [PUBDEV-2880] - Add NAOmit() to Rapids
  • [PUBDEV-2897] - AIOOBE in Vec.factor (due to Parse bug?)
  • [PUBDEV-2903] - In grid search, max_runtime_secs without max_models hangs
  • [PUBDEV-2933] - GBM's fold_assignment = "Stratified" breaks with missing values in response column

New Feature

  • [PUBDEV-2729] - Implement h2o.relevel, equivalent of base R's relevel function
  • [PUBDEV-2857] - Add Kerberos authentication to Flow
  • [PUBDEV-2893] - Summaries Fail in rdemo.citi.bike.small.R
  • [PUBDEV-2895] - DimReduction for EasyModelAPI
  • [PUBDEV-2915] - Make histograms truly adaptive (quantiles-based) for DRF/GBM

Task

Improvement

  • [PUBDEV-2905] - Improve the progress bar based on max_runtime_secs & max_models & actual work
  • [PUBDEV-2908] - Improve GBM/DRF reproducibility for fixed parameters and hardware
  • [PUBDEV-2911] - Check sanity of random grid search parameters (max_models and max_runtime_secs)
  • [PUBDEV-2912] - Add Job's remaining time to Flow
  • [PUBDEV-2919] - Add enum option 'histogram_type' to DRF/GBM (and remove random_split_points)
  • [PUBDEV-2923] - JUnit: Separate POJO namespace during junit testing

### Turchin (3.8.2.3) - 4/25/2016

Bug

  • [PUBDEV-2852] - Incorrect sparse chunk getDoubles() extraction

New Feature

  • [PUBDEV-2825] - Create h2o.get_grid
  • [PUBDEV-2834] - Implement distributed Aggregator for visualization
  • [PUBDEV-2835] - Add col_sample_rate_change_per_level for GBM/DRF
  • [PUBDEV-2836] - Add learn_rate_annealing for GBM
  • [PUBDEV-2837] - Add random cut points for histograms in DRF/GBM (ExtraTreesClassifier)
  • [PUBDEV-2851] - Add limit on max. leaf node contribution for GBM

Task

  • [PUBDEV-2848] - Add tests for early stopping logic (stopping_rounds > 0)

Improvement

  • [PUBDEV-2877] - Make NA split decisions internally more consistent

### Turchin (3.8.2.2) - 4/8/2016

Bug

  • [PUBDEV-2820] - Implement max_runtime_secs to limit total runtime of building GLM models with and without cross-validation enabled

New Feature

  • [PUBDEV-2815] - Add stratified sampling per-tree for DRF/GBM

### Turchin (3.8.2.1) - 4/7/2016

Bug

  • [PUBDEV-2766] - AIOOBE for quantile regression with stochastic GBM
  • [PUBDEV-2770] - Naive Bayes AIOOBE
  • [PUBDEV-2772] - AIOOBE for GBM if test set has different number of classes than training set
  • [PUBDEV-2775] - Number of CPUs incorrect in Flow when using a hypervisor
  • [PUBDEV-2796] - Grid search runtime isn't enforced for CV models
  • [PUBDEV-2819] - AIOOBE in GLM for dense rows in sparse data

New Feature

  • [PUBDEV-2540] - Compute and display statistics of cross-validation model metrics
  • [PUBDEV-2774] - Add keep_cross_validation_fold_assignment and more CV accessors
  • [PUBDEV-2776] - Set initial weights and biases for DL models
  • [PUBDEV-2791] - Control min. relative squared error reduction for a node to split (DRF/GBM)
  • [PUBDEV-2806] - On-the-fly interactions for GLM
  • [PUBDEV-2815] - Add stratified sampling per-tree for DRF/GBM

Task

  • [PUBDEV-2055] - Create test cases to show that POJO prediction behavior can be different than in-h2o-model prediction behavior

Improvement

  • [PUBDEV-2620] - Populate start/end/duration time in milliseconds for all models
  • [PUBDEV-2695] - Consistent handling of missing categories in GBM/DRF (and between H2O and POJO)
  • [PUBDEV-2736] - Alert the user if columns can't be histogrammed due to numerical extremities
  • [PUBDEV-2756] - GLM should generate an error if the user enters an alpha value greater than 1.
  • [PUBDEV-2763] - Create full holdout prediction frame for cross-validation predictions
  • [PUBDEV-2769] - Support Validation Frame and Cross-Validation for Naive Bayes
  • [PUBDEV-2810] - Add class_sampling_factors argument to DRF/GBM for R and Python APIs

### Turan (3.8.1.4) - 3/16/16

Bug

  • [PUBDEV-542] - KMeans: Size of clusters in Model Output is different from the labels generated on the training set
  • [PUBDEV-1976] - GLM fails on negative alpha
  • [PUBDEV-2718] - countmatches bug
  • [PUBDEV-2727] - bug in processTables in communication.R
  • [PUBDEV-2742] - Allow strings to be set to NA

New Feature

  • [PUBDEV-2719] - Implement Shannon entropy for a string
  • [PUBDEV-2720] - Implement proportion of substrings that are valid English words
  • [PUBDEV-2733] - Add utility function, h2o.ensemble_performance for ensemble and base learner metrics
  • [PUBDEV-2741] - Add date/time and string columns to createFrame.
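The Shannon entropy of a string is usually computed from its character frequencies as H = -sum(p_i * log2(p_i)). This minimal sketch assumes base-2 logarithms and per-character counting. H2O's exact convention is not specified in this changelog.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Entropy in bits of the character distribution of s (0.0 for empty input)."""
    if not s:
        return 0.0
    n = len(s)
    # p = count / n for each distinct character
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())
```

A string made of one repeated character scores 0 bits. A string of k equally frequent distinct characters scores log2(k) bits, so `"abcd"` gives 2.0.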

Task

  • [PUBDEV-58] - Certify sparkling water on CDH5.2

Improvement

  • [PUBDEV-277] - Make python equivalent of as.h2o() work for numpy array and pandas arrays

### Turan (3.8.1.3) - 3/6/16

Bug

  • [PUBDEV-2644] - Collinear columns cause NPE for P-values computation
  • [PUBDEV-2721] - Update default values in h2o.glm.wrapper from -1 and NaN to NULL
  • [PUBDEV-2722] - AIOOBE in NewChunk

New Feature

  • [PUBDEV-2111] - Hive UDF form for Scoring Engine POJO for H2O Models

### Turan (3.8.1.2) - 3/4/16

Bug

New Feature

  • [PUBDEV-2711] - Allow DL models to be pretrained on unlabeled data with an autoencoder

Improvement

  • [PUBDEV-2708] - H2O Flow does not contain CodeMirror library
  • [PUBDEV-2710] - Model export fails: parent directory does not exist
  • [PUBDEV-2712] - Flow doesn't show DL AE error (MSE) plot
  • [PUBDEV-2717] - Do not compute expensive quantiles during h2o.summary call

### Turan (3.8.1.1) - 3/3/16

Technical task

  • [PUBDEV-2705] - implement random (stochastic) hyperparameter search

Bug

  • [PUBDEV-2639] - Parse: Incorrect assertion error caused by very large few column data
  • [PUBDEV-2649] - h2o::|,& operator handles NA's differently than base::|,&
  • [PUBDEV-2655] - h2o::as.logical behavior is different than base::as.logical
  • [PUBDEV-2682] - Importing CSV file is not working with "java -jar h2o.jar -nthreads -1"
  • [PUBDEV-2685] - Allow DL reproducible mode to work with user-given train_samples_per_iteration >= 0
  • [PUBDEV-2690] - Grid Search NPE during Flow display after grid was cancelled
  • [PUBDEV-2693] - NPE in initialMSE computation for GBM
  • [PUBDEV-2696] - DL checkpoint restart doesn't honor a change in stopping_rounds

New Feature

  • [PUBDEV-1883] - Add option to train with mini-batch updates for DL
  • [PUBDEV-2698] - Return leaf node assignments for DRF + GBM

Improvement

  • [PUBDEV-2674] - Change default functionality of as_data_frame method in Py H2O
  • [PUBDEV-2697] - Add method setNames for setting column names on H2O Frame
  • [PUBDEV-2703] - NPE in Log.write during cluster shutdown

### Tukey (3.8.0.6) - 2/23/16

#### Enhancements

The following changes are improvements to existing features (including changed default values):

##### System

  • PUBDEV-2362: Handling Sparsity with Missing Values
  • PUBDEV-2683: Fix for erroneous conversion of NaNs to zeros during rebalancing
  • PUBDEV-2684: Remove bigdata test file (not available)

#### Bug Fixes

The following changes resolve incorrect software behavior:

##### Algorithms

  • PUBDEV-2678: CV models during grid search get overwritten

##### R

  • PUBDEV-2648: Di/trigamma handle NA
  • PUBDEV-2679: Progress bar for grid search with N-fold CV is wrong when max_models is given

### Tukey (3.8.0.1) - 2/10/16

#### New Features

These changes represent features that have been added since the previous release:

##### API

  • PUBDEV-1798: Ability to conduct a randomized grid search with optional limit of max. number of models or max. runtime
  • PUBDEV-1822: Add score_tree_interval to GBM to score every n'th tree
  • PUBDEV-2311: Make it easy for clients to sort by model metric of choice
  • PUBDEV-2548: Add ability to set a maximum runtime limit on all models
  • PUBDEV-2632: Return a grid search summary as a table with desired sort order and metric
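A randomized grid search with a model limit, as in PUBDEV-1798, can be sketched as sampling distinct hyperparameter combinations without replacement. This is an illustrative sketch, not H2O's implementation. It enumerates the full Cartesian product up front, which only works for small grids, and it does not model the max-runtime limit. The function name and the example parameter names are hypothetical.

```python
import itertools
import random
from typing import Any, Dict, List


def random_grid(hyper_params: Dict[str, List[Any]], max_models: int, seed: int = 42) -> List[Dict[str, Any]]:
    """Pick up to max_models distinct combinations from the Cartesian product."""
    names = sorted(hyper_params)  # fixed key order keeps results reproducible per seed
    combos = list(itertools.product(*(hyper_params[n] for n in names)))
    rng = random.Random(seed)  # local RNG so the global random state is untouched
    picked = rng.sample(combos, k=min(max_models, len(combos)))
    return [dict(zip(names, combo)) for combo in picked]


if __name__ == "__main__":
    grid = {"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.1]}
    for params in random_grid(grid, max_models=4):
        print(params)
```

Sampling without replacement guarantees that no combination is trained twice. When `max_models` exceeds the grid size, the search degrades to an exhaustive search in random order.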

##### Algorithms

  • HEXDEV-495: Added ability to calculate GLM p-values for non-regularized models
  • PUBDEV-853: Implemented gain/lift computation to allow using predicted data to evaluate the model performance
  • PUBDEV-2118: Compute the lift metric for binomial classification models
  • PUBDEV-2212: Add absolute loss (Laplace distribution) to GBM and Deep Learning
  • PUBDEV-2402: Add observations weights to quantile computation
  • PUBDEV-2469: For GBM/DRF, add ability to pick columns to sample from once per tree, instead of at every level
  • PUBDEV-2594: Quantile regression for GBM and Deep Learning
  • PUBDEV-2625: Add recall and specificity to default ROC metrics
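Quantile regression (PUBDEV-2594) and absolute loss (PUBDEV-2212) are both commonly expressed through the pinball loss. For a residual r = actual - predicted and a quantile alpha, the loss is alpha * r when r >= 0 and (alpha - 1) * r otherwise. With alpha = 0.5 it equals half the absolute error, so minimizing it fits the median. The code below sketches this textbook definition; it is not taken from H2O's source.

```python
from typing import Sequence


def pinball_loss(actual: float, predicted: float, alpha: float) -> float:
    """Quantile (pinball) loss for one observation; alpha must be in (0, 1)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    r = actual - predicted
    # Under-prediction (r >= 0) is weighted by alpha, over-prediction by 1 - alpha.
    return alpha * r if r >= 0 else (alpha - 1.0) * r


def mean_pinball_loss(actual: Sequence[float], predicted: Sequence[float], alpha: float) -> float:
    """Average pinball loss over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(pinball_loss(a, p, alpha) for a, p in zip(actual, predicted)) / len(actual)
```

With alpha = 0.9, under-predicting by 2 costs 1.8 while over-predicting by 2 costs only 0.2. This asymmetry pushes the fitted values toward the 90th percentile.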

##### Python

  • HEXDEV-399: Added support for Python 3.5 and better (in addition to existing support for 2.7 and better)

#### Enhancements

The following changes are improvements to existing features (including changed default values):

##### Algorithms

  • PUBDEV-2233: Adjust string substitution and global string substitution to perform in-place updates on a string column.

##### Python

  • PUBDEV-1981: Fix layout issues of Python docs.
  • PUBDEV-2335: as.numeric for a string column only converts strings to ints rather than reals
  • PUBDEV-2257: Table printout in Python doesn't warn the user about truncation
  • PUBDEV-2460: Version mismatch message directs user to get a matching download
  • HEXDEV-527: Implement secure Python h2o.init
  • PUBDEV-2504: Check and print a warning if a proxy environment variable is found

##### R

  • PUBDEV-2335: as.numeric for a string column only converts strings to ints rather than reals
  • PUBDEV-2257: Table printout in R doesn't warn the user about truncation
  • PUBDEV-2430: Improve R's reporting on quantiles
  • PUBDEV-2460: Version mismatch message directs user to get a matching download

##### Flow

  • PUBDEV-2407: Improve model convergence plots in Flow
  • PUBDEV-2596: Flow shows empty logloss box for regression models
  • PUBDEV-2617: Flow's histogram doesn't cover the full support

##### System

  • HEXDEV-436: exportFile should be a real job and have a progress bar
  • PUBDEV-2459: Improve parse chunk size heuristic for better use of cores on small data sets
  • PUBDEV-2606: Print all columns to stdout for Hadoop jobs for easier debugging

#### Bug Fixes

The following changes resolve incorrect software behavior:

##### API

  • PUBDEV-2633: Ability to extend grid searches with more models

##### Algorithms

  • PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
  • PUBDEV-2114: Set GLM to give an error when lower bound > upper bound in beta constraints
  • PUBDEV-2190: Set GLM to default to a value of rho = 0, if rho is not provided when beta constraints are used
  • PUBDEV-2210: Add check for epochs value when using checkpointing in deep learning
  • PUBDEV-2241: Show warnings about slowness from wide column counts before building a model, not after
  • PUBDEV-2278: Fix docstring reporting in iPython
  • PUBDEV-2366: Fix display of scoring speed for autoencoder
  • PUBDEV-2426: GLM gives different std. dev. and means than expected
  • PUBDEV-2595: Bad (perceived) quality of DL models during cross-validation due to internal weights handling
  • PUBDEV-2626: GLM with weights gives different answer h2o vs R

##### Python

  • PUBDEV-2319: sd not working inside group_by
  • PUBDEV-2403: Parser reads file of empty strings as 0 rows
  • PUBDEV-2404: Empty strings in Python objects parsed as missing

##### R

  • PUBDEV-2319: sd not working inside group_by
  • PUBDEV-2231: Fix bug in summary when zero-count categoricals were present.
  • PUBDEV-1749: Fix h2o.apply to correctly handle functions (so long as functions contain only H2O supported primitives)

##### System

  • PUBDEV-1872: Ability to ignore 0-byte files during parse
  • PUBDEV-2401: /Jobs fails if you build a Model and then overwrite it in the DKV with any other type
  • PUBDEV-2603: Improve progress bar for grid/hyper-param searches

### Tibshirani (3.6.0.9) - 12/7/15

#### New Features

These changes represent features that have been added since the previous release:

##### API

  • PUBDEV-2189: H2O now allows selection of the non_negative flag in GLM for R and Python

##### Algorithms

##### R

  • PUBDEV-2079: R now retrieves column types for an H2O Frame more efficiently

##### Python

#### Enhancements

The following changes are improvements to existing features (including changed default values):

##### Algorithms

  • GitHub commit: Change in behavior in GLM beta constraints - when ignoring constant/bad columns, remove them from beta_constraints as well
  • GitHub commit: Added ignore_const_cols to all algos
  • PUBDEV-2311: Improved ability to sort by model metric of choice in client

##### Python

  • PUBDEV-2409: H2O now checks for the H2O_DISABLE_STRICT_VERSION_CHECK environment variable in Python GitHub commit
  • GitHub commit: H2O now allows l/r values to be null or an empty string
  • GitHub commit: H2O now accommodates LOAD_FAST and LOAD_GLOBAL in bytecode_to_ast

##### R

  • PUBDEV-1378: In R, h2o.getTimezone() previously returned a one-element list; it now returns just the string

##### System

  • GitHub commit: Added more tweaks to help various low-memory configurations

#### Bug Fixes

The following changes resolve incorrect software behavior:

##### API

  • PUBDEV-2042: h2o.grid failed when REST API version was not default
  • PUBDEV-2401: /Jobs failed if you built a Model and then overwrote it in the DKV with any other type GitHub commit
  • PUBDEV-2392: /3/Jobs failed with exception after running /3/SplitFrame
  • GitHub commit: PUBDEV-2426 - Fixed error where sd and mean were adjusted to weights even if no observation weights were passed

##### Algorithms

  • PUBDEV-2396: GLRM validation frames must have the same number of rows as the training frame
  • PUBDEV-2053: Fixed assertion failure in Deep Learning
  • PUBDEV-2315: Could not compile POJO using K-means
  • PUBDEV-2317: Could not compile POJO using PCA
  • PUBDEV-2320: Could not compile POJO using Naive Bayes
  • GitHub commit: Fixed weighted mean and standard deviation computation in GLM
  • GitHub commit: Fixed stopping criteria for lambda search and multinomial in GLM

##### Python

##### R

  • PUBDEV-1749: h2o.apply did not correctly handle functions
  • PUBDEV-2335: R: as.numeric for a string column only converted strings to ints rather than reals
  • PUBDEV-2319: R: sd was not working inside group_by
  • PUBDEV-2397: R: Ignore Constant Columns was not an argument in Algos in R like it is in Flow
  • PUBDEV-2134: When a dataset was sliced, the int mapping of enums was returned
  • PUBDEV-2408: Improved handling when H2O has already been shut down in R GitHub commit
  • PUBDEV-2231: Fixed categorical levels mapping bug

##### System


### Tibshirani (3.6.0.7) - 11/23/15

#### Enhancements

The following changes are improvements to existing features (including changed default values):

##### Algorithms

  • GitHub commit: Added Iterations and Epochs to DL job status updates, added Iterations to scoring history
  • GitHub commit: Cleaned up iteration counter to work for checkpointing
  • GitHub commit: Cleaned up counter iteration logic

#### Bug Fixes

The following changes resolve incorrect software behavior:

##### Algorithms

  • GitHub commit: Fixed scoring speed display for autoencoder, was showing 0 because wrong runtime was used (ms since 1970 instead of actual runtime)

### Tibshirani (3.6.0.2) - 11/5/15

#### New Features

##### Algorithms

  • GitHub commit: Added support for grid search
  • PUBDEV-2272: Implemented GLRM grid search in R and Python
  • GitHub commit: PUBDEV-2289: Enabled early convergence-based stopping by default for Deep Learning
  • GitHub commit: Added L1+LBFGS solver for multinomial GLM

##### Python

  • GitHub commit: PUBDEV-2289: Added Python API for convergence-based stopping

##### R

  • GitHub commit: Added .Last to Delete InitID
  • GitHub commit: PUBDEV-2289: Enabled convergence-based early stopping for R API of Deep Learning

#### Enhancements

##### Algorithms

  • GitHub commit: Enable grid search for Deep Learning parameters overwrite_with_best_model, momentum_ramp, elastic_averaging, elastic_averaging_moving_rate, & elastic_averaging_regularization
  • GitHub commit: PUBDEV-2289: Stopping tolerance and stopping metric are no longer hidden if stopping_rounds is 0
  • GitHub commit: Added checks to verify the mean, median, nrow, var, and sd are calculated correctly in groupby
  • GitHub commit: mean and sd now return lists

##### Python

  • GitHub commit: [PUBDEV-2257] H2O now gives users [row x col] of Frame in __str__
  • GitHub commit: sd/var is now sampled for group_by
  • GitHub commit: Parameter checking is now split between float and strings/unicode
  • GitHub commit: H2O now only wipes src._ex if src_in_self
  • GitHub commit: Refactored default arg handling in astfun
  • GitHub commit: Added new parameters to estimators
  • GitHub commit: Added session start/end; Python now ends the session on exit
  • GitHub commit: src and self types are now checked for None
  • GitHub commit: H2O now passes caches through all prefix ops
  • GitHub commit: H2O now pushes cached types, names, and ncols forward if possible

##### R

##### System

  • HEXDEV-475: Added EasyPOJO comments and improvements
  • GitHub commit: [PUBDEV-2204] Enabled Vec#toCategoricalVec to convert string columns to categorical columns
  • GitHub commit: apply now works in

#### Bug Fixes

##### Algorithms

##### Python

##### R

  • GitHub commit: [PUBDEV-2301, PUBDEV-2314] Hidden grid parameter was passed incorrectly from R
  • GitHub commit: H2O now uses deep copy when using assign from one global to another
  • GitHub commit: Fixed getFrame and directory unlink

##### System


### Slotnick (3.4.0.1)

#### New Features

##### API

##### Algorithms

  • GitHub commit: Added option in PCA to use randomized subspace iteration method for calculation
  • GitHub commit: Deep Learning: Added target_ratio_comm_to_comp to R and Python client APIs
  • GitHub commit: PUBDEV-1247: Added stochastic GBM parameters (sample_rate and col_sample_rate) to R/Py APIs
  • PUBDEV-1450: GLRM has been tested and removed from "experimental" status

##### Hadoop

##### Python

##### R

This software release introduces changes to the R API that may cause previously written R scripts to be inoperable. For more information, refer to the following link.

  • GitHub commit: Added h2o.getTypes() to the R wrapper
  • GitHub commit: Added ability to set col.types with a named list
  • GitHub commit: Added h2o.getId() to get the back-end distributed key/value store ID from a Frame
  • GitHub commit: Added column types to H2O frame in R, which allows R to set the correct column types when as.data.frame() is used on an H2O frame
  • GitHub commit: Added @export for exported R functions

##### System

  • GitHub commit: Added string length util for Enum columns
  • GitHub commit: Added pass-through versions of toCategoricalVec(), toNumericVec(), and toStringVec() to Vec.java for code simplicity and backwards compatibility
  • GitHub commit: Added string column handling to StrSplit()

##### Web UI

#### Enhancements

##### Algorithms

  • PUBDEV-467: Show Frames for DL weights/biases in Flow
  • PUBDEV-1847: DRF/GBM: nbins_top_level is now configurable
  • GitHub commit: Deep Learning: Scoring time is now shown in the logs
  • GitHub commit: Sped up GBM split finding by dynamically switching between single- and multi-threaded execution based on workload
  • PUBDEV-1247: Implemented Stochastic GBM
  • GitHub commit: Parallelized split finding for GBM/DRF (useful for large numbers of columns and nbins).
  • GitHub commit: Added improvements to speed up DRF (up to 35% faster) and stochastic GBM (up to 5x faster)
  • GitHub commit: Added some straightforward optimizations for GBM histogram building
  • GitHub commit: GLRM is now deterministic between one vs. many chunks
  • GitHub commit: Input parameters are now immutable
  • GitHub commit: PUBDEV-2135: Cleaned up N-fold CV model parameter sanity checking and error message propagation; now checks all N-fold model parameters upfront and lets the main model carry the message to the user
  • GitHub commit: PUBDEV-2130: N-fold CV models are no longer deleted when the main model is deleted
  • GitHub commit: PUBDEV-2107: The title in plot.H2OBinomialMetrics is now editable
  • GitHub commit: Parse Python lambda (bytecode -> ast -> rapids)
  • GitHub commit: PUBDEV-1847: Cleaned up/refactored GBM/DRF
  • GitHub commit: Updated MeanSquare to Quadratic for DL
  • GitHub commit: PUBDEV-2133: Speed up Enum mapping between train/test from O(N^2) to O(N*log(N))
  • GitHub commit: Added GLRM scoring history with step size and average change in objective function value
  • GitHub commit: SVD now outputs the V matrix as a frame with a frame key, rather than a double array in the API
  • GitHub commit: Modified k-means++ initialization in GLRM to set X to inverse of cluster distance with sum normalized to one, for each observation in training data
  • GitHub commit: Increased GBM worker thread priority to avoid deadlock with high parallel GBM job counts
  • GitHub commit: Added input parameter svd_method to GLRM

##### Python

  • GitHub commit: centers_std is now returned as a list of columns
  • GitHub commit: str(Frame) no longer returns an ID; updated ExprNode _to_string to accommodate
  • GitHub commit: Changed default setting for _isAllAscii to false
  • GitHub commit: Fixed var to return scalar/frame based on nrow
  • GitHub commit: Python now checks ncol, not nrow
  • PUBDEV-1060: Python's h2o.import_frame() now matches R's importFile() parameters where applicable
  • PUBDEV-1960: Python now uses the streaming endpoint /3/DownloadDataset.bin
  • PUBDEV-2223: Added normalization and standardization coefficients to the model output in Python
  • GitHub commit: Renamed logging to h2o_logging to avoid conflict with original logging package
  • GitHub commit: H2O now recognizes additional parameters (such as column names) for Python objects
  • GitHub commit: head and tail no longer download the entire dataset
  • GitHub commit: Truncated DF in head and tail before calling /DownloadDataset
  • GitHub commit: head() and tail() now default to pretty printing in Python
  • GitHub commit: Moved setup functionality from parse to parse setup; col_types and na_strings can now be dictionaries
  • GitHub commit: Updated H2OColSelect to supply extra argument
  • GitHub commit: PUBDEV-2174: Relative tolerance is now used for floating point comparison
  • GitHub commit: Added more cloud health output to run.py
  • GitHub commit: When Pandas frames are returned, they are now wrapped to display nicely in IPython
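The PUBDEV-2174 entry above says relative tolerance is now used for floating point comparison, but it does not give the formula. A common formulation compares the absolute difference against a fraction of the larger magnitude. The sketch below is in Java to match the other examples here. FloatCompare and approxEqual are hypothetical names, not part of the H2O Python client:

```java
/** Hypothetical helper showing relative-tolerance comparison of doubles. */
final class FloatCompare {
    /**
     * Returns true if a and b differ by at most relTol times the larger magnitude.
     * Unlike a fixed absolute epsilon, the allowed error scales with the values.
     */
    static boolean approxEqual(double a, double b, double relTol) {
        if (a == b) return true;                               // exact match, incl. equal infinities
        if (Double.isNaN(a) || Double.isNaN(b)) return false;  // NaN never matches
        if (Double.isInfinite(a) || Double.isInfinite(b)) return false;
        double diff = Math.abs(a - b);
        return diff <= relTol * Math.max(Math.abs(a), Math.abs(b));
    }

    public static void main(String[] args) {
        System.out.println(approxEqual(1e9, 1e9 + 1, 1e-8)); // large values, tiny relative error
        System.out.println(approxEqual(1.0, 1.1, 1e-8));     // clearly different values
    }
}
```

Near zero, a purely relative test rejects even tiny absolute differences (for example 0.0 vs 1e-20). Implementations often add an absolute floor for that case.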

#####R

  • GitHub commit: Added null check
  • PUBDEV-2185: When appending a vec to an existing data frame, H2O now creates a new data frame while still keeping the original frame in memory
  • PUBDEV-1959: R now uses the streaming endpoint /3/DownloadDataset.bin
  • PUBDEV-2020: h2o.splitFrame() in R/Python now uses the runif technique instead of the horizontal slice technique
  • GitHub commit: Changed T/F to TRUE/FALSE
  • GitHub commit: xml2 package is now required for rversions package
  • GitHub commit: Package dependencies are taken into account when installing R packages
  • GitHub commit: Metrics are now always computed if a dataset is provided (R h2o.performance call)
  • GitHub commit: Column names are now fetched from H2O
  • GitHub commit: PUBDEV-2150: Time columns in H2O are now imported as Date columns in R
  • GitHub commit: h2o.ls() now returns data.frame
  • GitHub commit: h2o.ls() now returns the whole frame
  • GitHub commit: Removed unnamed additional parameters (ellipses) in R algos
  • GitHub commit: Added as.character to Rapids implementation
  • GitHub commit: Updated plot.H2OModel in R
  • GitHub commit: Updated scoring history plot in R for training_frame only
  • GitHub commit: attr is now used instead of : and assign
  • GitHub commit: Raw strings are now used as accessors
  • GitHub commit: name.Frame and dimnames.Frame are now visible

#####System

  • GitHub commit: Added vertical prefetch of all chunks' worth of data for dense rows
  • PUBDEV-1426: Scoring is now a non-blocking job with a progress bar
  • GitHub commit: EasyPojo API is now serializable
  • GitHub commit: Changed parse setup guess when encountering large NA counts to not favor numeric over dates or UUIDs
  • GitHub commit: Refactored vector type conversion methods into a class called VecUtils
  • GitHub commit: Cleaned up ASTStrList to handle frames with more than one vector during column conversion; checks types before converting; added several new column type conversions
  • GitHub commit: If the job is canceled, scoring is now canceled as well
  • GitHub commit: Refactored doAll_numericResult() -> doAll(nout, type, frame) where all output vecs are of the given type
  • GitHub commit: Improved hash function
  • GitHub commit: The output of _train.get() is now passed to a Frame
  • GitHub commit: Refactored binary/col ops for aesthetics and maintainability
  • GitHub commit: Added correct types for new Vecs; CategoricalWrappedVec now exports a utility for enum conversions instead of a constructor
  • GitHub commit: Mean/sigma values are now printed to the logs after parsing
  • GitHub commit: PUBDEV-2174: Added some optimizations for some chunks (mostly integers) in RollupStats
  • GitHub commit: PUBDEV-2174: Added instantiations of Rollups for dense numeric chunks
  • GitHub commit: PUBDEV-2174: Implemented single-pass variance/stddev calculation for rollups
  • GitHub commit: PUBDEV-2174: Added hasNA() for chunks
  • GitHub commit: Reordered args in sub/gsub (astid > astparameter, add string -> numeric)
  • GitHub commit: Ensured all chunks get closed
  • GitHub commit: NewChunk.addString() now accepts a Java string or BufferedString, eliminating needless conversion to a BufferedString before inserting into the NewChunk buffer. Improves efficiency of several ASTStrOps as well as converting Categorical columns to String columns.
  • GitHub commit: Renamed enums to categoricals system-wide
  • GitHub commit: Renamed ValueString -> BufferedString
  • GitHub commit: Removed redundant frame creation; added Java comments to each string utility; changed RAPIDS name of gsub -> replaceall and sub -> replacefirst; added nchar utility to the R client; updated comments in Python and R client
  • GitHub commit: All NA chunks are now handled in string ops
  • GitHub commit: Added ability for string utils to handle NA chunks
  • GitHub commit: Added the ability for merge to handle duplicate rows
  • GitHub commit: countMatches utilities now only work on string columns
  • GitHub commit: Changed names of SubStr and GSubStr to ReplaceFirst and ReplaceAll; both methods now only accept string columns as input
  • GitHub commit: Changed toUpper and toLower to only work on string columns; includes an optimized version of each method as well as a UTF-safe version
  • GitHub commit: CStrChunks now track whether they are pure ASCII to allow StringUtilities to use optimized versions of the utilities that operate directly on the string buffer
  • GitHub commit: Moved frame function to ArrayUtils
  • GitHub commit: Removed categorical versions of trim() and length()
  • GitHub commit: Changed the merge defaults to match the implementation
  • GitHub commit: Merge no longer uses a by argument
  • GitHub commit: Added trim and length functionality for string columns
  • GitHub commit: HEXDEV-442: Improved POJO handling
  • GitHub commit: Config files are now transferred using a hexstring to avoid issues with Hadoop XML parsing
  • GitHub commit: HEXDEV-445: Added isNA check
  • GitHub commit: Means, mults, modes, and size now do bulk rollups
  • GitHub commit: Increased priority of model builder Driver classes to prevent deadlock when bulk-launching parallel unrelated model builds
  • GitHub commit: Renamed Currents to Rapids
  • GitHub commit: CRAN-based R clients are now set to opt-out by default
  • GitHub commit: Assembly states are now saved in the DKV
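The PUBDEV-2174 entries above mention a single-pass variance/stddev calculation for rollups. Welford's algorithm is one standard single-pass method. It updates the mean and the running sum of squared deviations in one sweep and avoids the cancellation problems of the naive sum-of-squares formula. The changelog does not say which update RollupStats actually uses. RunningStats is a hypothetical class, and skipping NaN as "missing" is this sketch's assumption:

```java
/** Hypothetical sketch of single-pass mean/variance accumulation (Welford's algorithm). */
final class RunningStats {
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the current mean

    void add(double x) {
        if (Double.isNaN(x)) return; // this sketch treats NaN as missing and skips it
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);    // second factor uses the updated mean
    }

    long count() { return n; }
    double mean() { return mean; }

    /** Sample variance (n - 1 denominator); NaN when fewer than two values were seen. */
    double variance() { return n > 1 ? m2 / (n - 1) : Double.NaN; }

    double sigma() { return Math.sqrt(variance()); }

    public static void main(String[] args) {
        RunningStats s = new RunningStats();
        for (double x : new double[]{2, 4, 4, 4, 5, 5, 7, 9}) s.add(x);
        System.out.println(s.mean() + " " + s.variance() + " " + s.sigma());
    }
}
```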

#####Web UI

  • PUBDEV-1961: Flow now uses the streaming endpoint /3/DownloadDataset.bin

####Bug Fixes

#####Algorithms

  • GitHub commit: Fixed bug with CategoricalWrappedVec
  • PUBDEV-1664: Corrected math for GBM Tweedie with offsets/weights
  • PUBDEV-1665: Corrected math for GBM Poisson with offsets/weights
  • PUBDEV-2130: Deleting Deep Learning n-fold models resulted in a java.lang.AssertionError
  • GitHub commit: Fixed GLM with nfolds
  • GitHub commit: Updated GLM InitTsk to run at +1 priority level to avoid deadlock when launching hundreds of GLMs in parallel
  • GitHub commit: Column names (feature names) are now named correctly for the exported weight matrix connecting the input to the first hidden layer
  • GitHub commit: Changed isEnum to isCategorical
  • GitHub commit: Cleaned up DRF and GBM; fixed checkpoint restart logic for trees and changed which parameters are configurable
  • GitHub commit: Fixed incorrect logistic and hinge loss functions; they now apply only to binary numeric columns in {0,1}
  • GitHub commit: Fixed a bug where Poisson loss function was calculated incorrectly for values of 0
  • GitHub commit: Fixed DL POJO for large input columns

#####Python

#####R

#####System

  • PUBDEV-2250: During parsing, SVMLight-formatted files failed with an NPE GitHub commit
  • PUBDEV-2213: During parsing, alphanumeric data in a column was converted to missing values and the column was assigned a type of int
  • PUBDEV-1990: Spaces are now permitted in the Flow directory name
  • PUBDEV-1037: Space in the user name was preventing H2O from starting
  • GitHub commit: Fixed VecUtils.copyOver() to accept a column type for the resulting copy
  • GitHub commit: Fixed Vec.preWriting so that it does not use an anonymous inner task which causes the entire Vec header to be passed
  • GitHub commit: Fixed parse to mark categorical references in ParseWriter as transient (enums must be node-shared during the entire multiple parse task)
  • GitHub commit: PUBDEV-2182: Fixed DL checkpoint restart with given validation set after R (currents) behavior changed; now the validation set key no longer necessarily matches the file name
  • GitHub commit: Fixed makeCon memory leak when redistribute=T
  • GitHub commit: PUBDEV-2174: Fixed sigma calculation for sparse chunks
  • GitHub commit: Restored pre-existing string manipulation utilities for categorical columns
  • GitHub commit: Fixed syncRPackages task so it doesn't run during the normal build process
  • GitHub commit: Fixed intermittent failures caused by different default timezone settings on different machines; sets needed timezone before starting test
  • GitHub commit: Fixed error message for countmatches
  • GitHub commit: PUBDEV-1443: Fixed size computation in merge
  • GitHub commit: Fixed h2o.tabulate() to work in multi-node mode
  • GitHub commit: Fixed integer overflow in printout of CM to TwoDimTable

###Slater (3.2.0.7) - 10/09/15

####Bug Fixes

  • GitHub commit: Fix Java 6 compatibility

    The Java 7 API call _rawChannel.setOption(StandardSocketOptions.TCP_NODELAY, true); has been replaced by the Java 6 API call _rawChannel.socket().setTcpNoDelay(true);

    The Java 7 API call sock.getRemoteAddress() has been replaced by sock.socket().getRemoteSocketAddress()
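The replacements above can be shown side by side in a small standalone example. It uses only the JDK calls named in the entry, on an unconnected SocketChannel, so no network traffic is generated. NoDelayDemo and its methods are hypothetical names, not H2O code. The example avoids try-with-resources because that construct also requires Java 7:

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.nio.channels.SocketChannel;

/** Sketch of the Java 6-compatible calls named above (hypothetical class). */
final class NoDelayDemo {
    /** Java 6 form of channel.setOption(StandardSocketOptions.TCP_NODELAY, true). */
    static boolean enableNoDelay(SocketChannel channel) throws IOException {
        Socket socket = channel.socket(); // adaptor Socket, available since Java 1.4
        socket.setTcpNoDelay(true);
        return socket.getTcpNoDelay();
    }

    /** Java 6 form of channel.getRemoteAddress(); returns null while unconnected. */
    static SocketAddress remoteAddress(SocketChannel channel) {
        return channel.socket().getRemoteSocketAddress();
    }

    public static void main(String[] args) throws IOException {
        SocketChannel ch = SocketChannel.open(); // unconnected channel
        try {
            System.out.println("TCP_NODELAY: " + enableNoDelay(ch));
            System.out.println("remote: " + remoteAddress(ch));
        } finally {
            ch.close(); // explicit close instead of Java 7 try-with-resources
        }
    }
}
```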


###Slater (3.2.0.5) - 09/24/15

####Enhancements

#####Algorithms


###Slater (3.2.0.3) - 09/21/15

####New Features

#####R

####Enhancements

#####Algorithms

  • GitHub commit: Added back support for sparse activations in DL; currently changes results as numerical values are de-scaled only, not standardized

#####Python

  • GitHub commit: Adjusted import_file in Python to accept the same parameters as import_file in R

#####R

####Bug Fixes

#####Algorithms

#####R

#####System


###Slater (3.2.0.1) - 09/12/15

####New Features

#####Algorithms

  • GitHub: PUBDEV-1888: Added loss function calculation for DL.
  • GitHub: Set more parameters for GLM to be gridable.
  • GitHub: [KMeans] Enable grid search with max_iterations parameter.
  • GitHub: Add kfold column builders
  • GitHub: Add stratified kfold method

#####Python

  • PUBDEV-684: Add nfolds to R/Python
  • GitHub: Improved group-by functionality
  • GitHub: Added python example for downloading glm pojo.
  • GitHub: Added countmatches to Python along with a test.
  • GitHub: Added support for getting false positive rates and true positive rates for all thresholds from binomial models; makes it easier to calculate custom metrics from ROC data (like weighted ROC)

#####R

  • PUBDEV-1788: Added a factor function that will allow the user to set the levels for an enum column GitHub
  • PUBDEV-1881: Fixed bug in h2o.group_by for enumerator columns
  • GitHub: Refactor SVD method name and add svd_method option to R package to set preferred calculation method
  • PUBDEV-2071: Accept columns of type integer64 from R through as.h2o()

#####Sparkling Water

  • PUBDEV-282: Support Windows OS in Sparkling Water

#####System

  • HEXDEV-120: Switch from NanoHTTPD to Jetty
  • GitHub: Allow for "most" and "mode" in groupby
  • GitHub: Added NA check to checking for matches in categorical columns
  • PUBDEV-1470: Dropped UDP mode in favor of TCP
  • PUBDEV-1431: /3/DownloadDataset.bin is now a registered handler in JettyHTTPD.java. Allows streaming of large downloads from H2O. GitHub
  • PUBDEV-1865: Implemented per-row 1D, 2D and 3D DCT transformations for signal/image/volume processing
  • PUBDEV-1686: LDAP Integration
  • HEXDEV-381: LDAP Integration
  • HEXDEV-224: Added https support
  • GitHub: Added mapr5.0 version to builds
  • GitHub: Add Vec.Reader which replaces lost caching
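PUBDEV-1865 above adds per-row DCT transformations, but it does not state the DCT variant or normalization. As a minimal sketch, the code below assumes the common unnormalized DCT-II, X_k = sum_n x_n cos(pi/N * (n + 1/2) * k), computed directly in O(N^2). A 2D transform follows from separability: transform each row, then each column. Dct, dct2, and dct2d are hypothetical names, not H2O's API:

```java
/** Hypothetical sketch: unnormalized DCT-II in 1D, and 2D via separability. */
final class Dct {
    /** Direct O(N^2) DCT-II: X[k] = sum_i x[i] * cos(pi / N * (i + 0.5) * k). */
    static double[] dct2(double[] x) {
        int n = x.length;
        double[] out = new double[n];
        for (int k = 0; k < n; k++) {
            double sum = 0;
            for (int i = 0; i < n; i++) {
                sum += x[i] * Math.cos(Math.PI / n * (i + 0.5) * k);
            }
            out[k] = sum;
        }
        return out;
    }

    /** 2D DCT-II: apply the 1D transform to every row, then to every column. */
    static double[][] dct2d(double[][] a) {
        int rows = a.length, cols = a[0].length;
        double[][] tmp = new double[rows][];
        for (int r = 0; r < rows; r++) tmp[r] = dct2(a[r]);
        double[][] out = new double[rows][cols];
        double[] col = new double[rows];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) col[r] = tmp[r][c];
            double[] t = dct2(col);
            for (int r = 0; r < rows; r++) out[r][c] = t[r];
        }
        return out;
    }

    public static void main(String[] args) {
        // A constant signal has all its energy in the k = 0 coefficient.
        System.out.println(java.util.Arrays.toString(dct2(new double[]{1, 1, 1, 1})));
    }
}
```

A 3D transform extends the same idea by applying the 1D transform along the third axis as well.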

#####Web UI

  • GitHub: Disallow N-fold CV for GLM when lambda-search is on.
  • GitHub: Added typeahead for http and https.
  • PUBDEV-1821: Added Save Model and Load Model

####Enhancements

#####Algorithms

  • GitHub: Don't allocate input dropout helper if input_dropout_ratio = 0.
  • PUBDEV-1920: Datasets : Unbalanced sparse for binomial and multinomial
  • GitHub: Major code cleanup for DL: Remove dead code, deprecate sparse/col_major.
  • PUBDEV-1942: Use prior class probabilities to break ties when making labels GitHub
  • GitHub: Update DL perf Rmd file to get the overall CM error.
  • GitHub: Enable training data shuffling if train_samples_per_iteration==0 and reproducible==true
  • GitHub: Checkpointing for DL now follows the same convention as for DRF/GBM.
  • GitHub: No longer do sampling with replacement during training with shuffle_training_data
  • GitHub: Add printout of sparsity ratio for double chunks.
  • GitHub: Check memory footprint for Gram matrix in PCA and SVD initialization
  • GitHub: Print more fill ratio debugging.
  • GitHub: Fix the RNG for createFrame to be more random (since we are setting the seed for each row).
  • PUBDEV-2010: Improve reporting of unstable DL models GitHub
  • PUBDEV-2018: Improve auto-tuning for DL on large clusters / large datasets GitHub
  • GitHub: Add input parameter to h2o.glrm indicating whether to ignore constant columns
  • GitHub: Missing enums are imputed using the majority class of the column. For other types of missing categorical values, the mean is rounded to the nearest integer.
  • GitHub: Skip rows in training frame with missing value(s) if requested
  • GitHub: Speed up direct SVD by working with transpose directly
  • GitHub: Fix a bug in initialization of SVD and change l2 norm to sum of squared error in convergence test.
  • GitHub: Use absolute value for mean weight and bias checks.
  • GitHub: No longer leak constant chunks during AE scoring/reconstruction.
  • GitHub: No longer differentiate between DL model instabilities (weights vs biases).
  • GitHub: Make method static, where possible.
  • GitHub: Make GLRM seeding independent of number of chunks.
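The imputation rule listed above (majority class for missing enums, rounded mean for other missing categorical values) can be sketched as follows. This version assumes level indices with -1 marking a missing enum, and NaN marking a missing value in an integer-coded column. The tie-breaking toward the lower level index is this sketch's own choice. Impute and its methods are hypothetical names, not H2O code:

```java
/** Hypothetical sketch of the imputation rule described above. */
final class Impute {
    /** Enum column as level indices, -1 = missing: fill with the most frequent level. */
    static int[] imputeMajority(int[] levels, int nLevels) {
        int[] counts = new int[nLevels];
        for (int v : levels) if (v >= 0) counts[v]++;
        int mode = 0;
        for (int i = 1; i < nLevels; i++) if (counts[i] > counts[mode]) mode = i; // ties keep lower index
        int[] out = levels.clone();
        for (int i = 0; i < out.length; i++) if (out[i] < 0) out[i] = mode;
        return out;
    }

    /** Integer-coded column, NaN = missing: fill with the mean rounded to the nearest integer. */
    static double[] imputeRoundedMean(double[] values) {
        double sum = 0;
        long n = 0;
        for (double v : values) if (!Double.isNaN(v)) { sum += v; n++; }
        double fill = n > 0 ? Math.round(sum / n) : Double.NaN;
        double[] out = values.clone();
        for (int i = 0; i < out.length; i++) if (Double.isNaN(out[i])) out[i] = fill;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(imputeMajority(new int[]{0, 2, 2, -1, 1}, 3)));
        System.out.println(java.util.Arrays.toString(imputeRoundedMean(new double[]{1, 2, Double.NaN, 4})));
    }
}
```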

#####API

  • GitHub: Added REST endpoints for the GLRM, SVD, PCA, and Naive Bayes algorithms.
  • GitHub: Added unicode to frame getter possibilities
  • GitHub: Added proper lookup of offset/weights/fold_column
  • GitHub: Data is now eagerly evaluated before download_csv.
  • GitHub: Simplified model builder
  • GitHub: Added None as default for "on" field
  • GitHub: Removed all of the unnecessary calls to h2o.init and removed the unnecessary environment variable for version checking during testing
  • PUBDEV-2064: rename the coordinate descent solvers in the REST API / Flow to (experimental)

#####Grid Search

  • GitHub: Added check that x is not null before verifying data in unsupervised grid search algorithm
  • GitHub: Made naivebayes parameters gridable.
  • PUBDEV-1933: DRF is now called randomForest in the algorithm option GitHub
  • GitHub: Validation of grid parameters against the algo's /parameters REST endpoint.
  • PUBDEV-1979: Train N-fold CV models in parallel GitHub
  • PUBDEV-1978: grid: would be good to add to h2o.grid R help example, how to access the individual grid models

#####Python

  • GitHub: Refactored into h2o.system_file so it's parallel to R client.
  • GitHub: Added h2o_deprecated decorator
  • GitHub: Use import_file in import_frame
  • GitHub: Handle a list of columns in python group-by api
  • GitHub: Use pandas if available for twodimtables and h2oframes
  • GitHub: Transform the parameters list into a dict with keys being the parameter label
  • GitHub: Added pop option which does inplace update on a frame (Frame.remove)
  • GitHub: ncol, dim, shape, and friends are now all properties
  • PUBDEV-193: Write python version of h2o.init() which knows how to start h2o
  • PUBDEV-1903: Method to get parameters of model in Python API
  • GitHub: Allow a single alpha value to be specified without wrapping it in a list
  • GitHub: Updated endpoint for python client download_csv
  • GitHub: Allow for enum in scale/mean/sd (ignore or give NA)
  • GitHub: Allow for n_jobs=-1 and n_jobs > 1 for Parallel jobs
  • GitHub: Added frame_id property to frame
  • GitHub: Removed remaining splats on dicts
  • GitHub: Removed need to splat pass thru args
  • GitHub: Added get_jar flag to download_pojo

#####R

  • PUBDEV-1866: Rewrote h2o.ensemble to utilize nfolds/fold_column in h2o base learners
  • GitHub: Added max_active_predictors.
  • GitHub: Updated REST call from R for model export
  • PUBDEV-1853: Removed addToNavbar from RequestServer GitHub
  • GitHub: Add "Open H2O Flow" message.
  • GitHub: Replaced additive float op by multiplication
  • GitHub: Reimplement checksum for Model.Parameters
  • GitHub: Remove debug prints.
  • PUBDEV-1857: Removed the need for String[] path_params in RequestServer.register() GitHub
  • PUBDEV-1856: Removed the writeHTML_impl methods from all the schemas
  • PUBDEV-1854: Made _doc_method optional in the Route constructors GitHub
  • PUBDEV-1858: Changed RequestServer so that only one handler instance is created for each Route
  • GitHub: Swapped out rjson for jsonlite for better handling of odd characters from dataset.
  • GitHub: Prettify R's grid output.
  • PUBDEV-1841: R now respects the TwoDimTable's column types
  • GitHub: Fixes show method for grid object when hyper_params is empty.
  • GitHub: h2o.levels returns R vector for single column
  • GitHub: Uses PredictCsv from genmodel now.
  • GitHub: Exposed stacktraces in R's summary() call.
  • GitHub: print type of failed value in $<-
  • GitHub: allow value to be integer in $<-
  • GitHub: Check for is_client being NULL since older H2O clusters may not have is_client.

#####Sparkling Water

  • GitHub: Copy content of h2o-dist into target directory.

#####System

  • GitHub: Rename label fields in prediction object.
  • GitHub: Uses the original Vec's domain in alignment
  • GitHub: Added columnName and unknownLevel to PredictUnknownCategoricalLevelException.
  • PUBDEV-1559: Added compression of 64-bit Reals GitHub
  • GitHub: Added time information to buildinfo.json.
  • GitHub: Put build metadata into a json file.
  • GitHub: Delete any prior main CV models of the same key if CV model building is cancelled before the main model started to build.
  • GitHub: Change loading name parameter to a String to address a Flow issue.
  • GitHub: Remove extra assertion to avoid NPEs after client call of bulk remove after done() is called but before the finally is done with updateModelOutput.
  • GitHub: Ensures that date time methods return year/month/day values in the currently set timezone.
  • GitHub: Frees memory from streamed zip reads after the chunk has been parsed.
  • GitHub: Unifies categorical strings to UTF-8 and warns the user about all conversion.
  • GitHub: add isNA checks to scale
  • GitHub: Do not start UDPReceiver thread (unless running with useUDP option)

#####Web UI

  • PUBDEV-1961: Flow: use streaming endpoint /3/DownloadDataset.bin

####Bug Fixes

#####Algorithms

  • PUBDEV-1785: Deadlock while running GBM
  • GitHub: Fix name for standardized_coefficient_magnitudes.
  • PUBDEV-1774: Setting gbm's balance_classes to True produces suspect models
  • PUBDEV-1849: K-Means: negative sum-of-squares after mean imputation
  • GitHub: Set the iters counter during kmeans center initialization correctly
  • GitHub: fixed parenthesis in GLM POJO generation
  • GitHub: Should be updating model each iteration with the newly fitted kmeans clusters, not the old ones!
  • PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
  • PUBDEV-1666: GBM:Math correctness for Gamma with offsets/weights
  • PUBDEV-451: Trees in GBM change for identical models GitHub
  • PUBDEV-1924: R^2 stopping criterion isn't working GitHub
  • PUBDEV-1776: GLM: cross-validation bug GitHub
  • PUBDEV-1682: GLM : Lending club dataset => build GLM model => 100% complete => click on model => null pointer exception GitHub
  • PUBDEV-1987: error returned on prediction for xval model
  • PUBDEV-1928: Properly implement Maxout/MaxoutWithDropout GitHub
  • GitHub: print actual number of columns (was just #cols) in DRF init
  • PUBDEV-2026: Fix setting the proper job state in DL models GitHub
  • PUBDEV-1950: Splitframe with rapids is not blocking
  • PUBDEV-1995: nfold: when user cancels an nfold job, fold data still remains in the cluster memory
  • PUBDEV-1994: nfold: cancel results in a java.lang.AssertionError
  • PUBDEV-1910: Canceled GBM with CV keeps lock
  • GitHub: Fix DL checkpoint restart with new data.

#####API

  • PUBDEV-1955: Change Schema behavior to accept a single number in place of array GitHub
  • PUBDEV-1914: Iced deserialization fails for Enum Arrays

#####Grid

  • PUBDEV-1876: Grid: progress bar not working for grid jobs
  • PUBDEV-1875: Grid: the meta info should not be dumped on the R screen, once the grid job is over
  • GitHub: [PUBDEV-1876] Fix grid update.
  • PUBDEV-1874: Grid search: observe issues with model naming/overwriting and error msg propagation GitHub
  • HEXDEV-402: R: kmeans grid search doesn't work
  • PUBDEV-1901: Grid appends new models even though models already exist.
  • PUBDEV-1940: Grid: glm grid on alpha fails with error "Expected '[' while reading a double[], but found 1.0"
  • PUBDEV-1877: Grid: if user specify the parameter value he is running the grid on, would be good to warn him/her
  • PUBDEV-1938: Grid: randomForest: unsupported grid params and wrong error msg

#####Hadoop

  • PUBDEV-2036: importModel from hdfs doesn't work
  • PUBDEV-2027: Clicking shutdown in the Flow UI dropdown does not exit the Hadoop cluster

#####Python

  • PUBDEV-1789: Python client h2o.remove_vecs (ExprNode) makes bad ast
  • PUBDEV-1795: Unable to read H2OFrame from Python
  • PUBDEV-1764: Python importFile does not import all files in directory, only one file GitHub
  • GitHub: parameter name is "dir" not "path"
  • PUBDEV-1693: Python: Options for handling NAs in group_by is broken
  • PUBDEV-1415: Intermittent Unimplemented rapids exception: pyunit_var.py . Also prior test got unimplemented too, but test didn't fail (client wasn't notified)
  • PUBDEV-1119: Python: Need to be able to access resource genmodel.jar
  • GitHub: Fix download of pojo in Python.

#####R

  • GitHub: Fixed bug in h2o.ensemble .make_Z function
  • PUBDEV-1796: R: h2o.importFile doesn't allow user to choose column type during parse
  • PUBDEV-1768: R: Fails to return summary on subsetted frame GitHub
  • PUBDEV-1909: R: Adding column to frame changes string enums in column to numerics
  • PUBDEV-1936: R: h2o.levels return only the first factor of factor levels
  • PUBDEV-1869: R: sd function should convert enum column into numeric and calculate standard deviation GitHub
  • PUBDEV-1246: R: h2o.hist needs to run pretty function for pretty breakpoints to get same results as R's hist GitHub
  • PUBDEV-1868: R: h2o.performance returns error (not warning) when model is reloaded into H2O
  • PUBDEV-1723: h2o R : subsetting data :h2o removing wrong columns, when asked to delete more than 1 columns
  • GitHub: fix h2o.levels issue
  • PUBDEV-1972: R: setting weights_column = NULL causes unwanted variables to be used as predictors

#####Sparkling Water

  • PUBDEV-1173: create conversion tasks from primitive RDD
  • GitHub: Fix return value issue in distribution script.

#####System

  • HEXDEV-360: getFrame fails on Parsed Data
  • PUBDEV-366: Fix parsing for high-cardinality categorical features GitHub
  • PUBDEV-1143: Parse: Cancel parse unreliable; does not work at all times
  • PUBDEV-1872: Ability to ignore files during parse GitHub
  • PUBDEV-777: Parse : Parsing compressed files takes too long
  • PUBDEV-1916: Parse: 2 node cluster takes 49min vs 40sec on a 1 node cluster GitHub
  • PUBDEV-1431: Convert /3/DownloadDataset to streaming
  • PUBDEV-1995: nfold: when user cancels an nfold job, fold data still remains in the cluster memory
  • PUBDEV-1994: nfold: cancel results in a java.lang.AssertionError
  • PUBDEV-1910: Canceled GBM with CV keeps lock GitHub
  • PUBDEV-1992: CreateFrame isn't totally random
  • GitHub: Fixes a bug that allowed big buffers to be constantly reallocated when it wasn't needed. This saves memory and time.
  • GitHub: Fix print statement.
  • GitHub: Fixed orderly shutdown to work with flatfile.
  • PUBDEV-1998: Parse : Lending club dataset parse => cancelled by user
  • PUBDEV-2028: Shutdown => unimplemented error on curl -X POST 172.16.2.186:54321/3/Shutdown.html
  • PUBDEV-2070: Download frame brings down cluster
  • PUBDEV-2067: Cannot mix negative and positive array selection
  • PUBDEV-2024: Save model to HDFS fails

#####Web UI


###Simons (3.0.1.7) - 8/11/15

####New Features

The following changes represent features that have been added since the previous release:

#####Python

#####Web UI

####Enhancements

The following changes are improvements to existing features (including changed default values):

#####Algorithms

  • GitHub: add seed to the model building that uses balance_classes, for determinism/repeatability
  • GitHub: Reduce the frequency at which tiny tree models are printed to stdout: Only print during the first 4 seconds if score_each_iteration is enabled.
  • GitHub: Only call the limited printout for TwoDimTables during Model.toString () that prints all TwoDimTables of the model._output.
  • GitHub: Only print up to 10 rows of TwoDimTables in ASCII logs (first/last 5).
  • GitHub: Remove some overflow/underflow checks: Let exp(x) be small and log(x) be large.
  • GitHub: Add nbins_top_level parameter to DRF/GBM. Not yet in R.
  • GitHub: Disallow N-fold CV for GLM when lambda-search is on.
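The logging change above limits ASCII printouts of TwoDimTables to 10 rows, taken as the first and last 5. The row selection can be sketched as below. TablePreview and rowsToPrint are hypothetical names, and the sketch does not show how H2O renders the omitted middle:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of choosing which table rows to print in a log. */
final class TablePreview {
    /** All rows if there are at most 2*edge; otherwise the first and last `edge` rows. */
    static List<Integer> rowsToPrint(int nRows, int edge) {
        List<Integer> rows = new ArrayList<>();
        if (nRows <= 2 * edge) {
            for (int i = 0; i < nRows; i++) rows.add(i);
        } else {
            for (int i = 0; i < edge; i++) rows.add(i);              // head
            for (int i = nRows - edge; i < nRows; i++) rows.add(i);  // tail
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(rowsToPrint(100, 5));
    }
}
```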

#####API

  • GitHub: Cleanup of public API of Schema.java. Improve its JavaDoc a lot.

#####Python

  • PUBDEV-1765: Improve python online documentation
  • PUBDEV-1497: Python : Weights R tests to be ported from R for GLM/GBM/RF/DL
  • GitHub: adjust to split frame jobs result
  • GitHub: allow the update target to be a tuple (rows and columns)
  • GitHub: when starting h2o jvm with h2o.init(), give h2o child process different id than parent, so it doesn't get killed on Ctrl-C
  • GitHub: add option to turn off progress bar print out
  • GitHub: add unicode to frame getter possibilities
  • GitHub: remove remaining splats on dicts
  • GitHub: no need to splat pass thru args
  • GitHub: proper lookup of offset/weights/fold_column
  • GitHub: data should be eagered before download_csv.
  • GitHub: simplify model builder
  • GitHub: use None as default for "on" field
  • GitHub: add get_jar flag to download_pojo
  • GitHub: remove all of the unnecessary calls to h2o.init and remove the unnecessary environment variable for version checking during testing

#####R

  • PUBDEV-1744: Improve help message of h2o.init function
  • GitHub: add valid expression to list of accepted R CMD check outputs.
  • GitHub: added h2o.anomaly demo to r package

#####System

  • GitHub: Add -JJ command line argument to allow extra JVM arguments to be passed.
  • GitHub: Refactored CSVStream to be more understandable. Fix empty chunk bug.
  • GitHub: Add hintFlushRemoteChunk to CSVStream.
  • GitHub: Add parameterized route for frame export
  • GitHub: allow string vecs to be toEnum'd (with a sensible cap)
  • GitHub: allow lists of numbers in reducer ops
  • GitHub: Add warning message during POJO export if offset_column is specified (is not supported)
  • PUBDEV-1853: cleanup: remove addToNavbar from RequestServer GitHub
  • GitHub: Add "Open H2O Flow" message.
  • GitHub: Code refactoring to allow GBM JUnits to work with H2OApp in multi-node mode.
  • GitHub: Replace additive float op by multiplication
  • GitHub: Reimplement checksum for Model.Parameters
  • GitHub: Remove debug prints.
  • PUBDEV-1857: cleanup: remove the need for String[] path_params in RequestServer.register() GitHub
  • PUBDEV-1856: cleanup: remove the writeHTML_impl methods from all the schemas
  • PUBDEV-1854: cleanup: make _doc_method optional in the Route constructors GitHub
  • PUBDEV-1858: cleanup: change RequestServer so that only one handler instance is created for each Route

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

  • PUBDEV-1674: gbm w gamma: does not seems to split at all; all trees node pred=0 for attached data GitHub
  • PUBDEV-1760: GBM : Deviance testing for exp family
  • PUBDEV-1714: gbm gamma: R vs h2o same split variable, slightly different leaf predictions
  • PUBDEV-1755: DL : Math correctness for Tweedie with Offsets/Weights
  • PUBDEV-1758: DL : Deviance testing for exp family
  • PUBDEV-1756: DL : Math correctness for Poisson with Offsets/Weights
  • PUBDEV-1651: null/residual deviances don't match for various weights cases
  • PUBDEV-1757: DL : Math correctness for Gamma with Offsets/Weights
  • PUBDEV-1680: gbm gamma: seeing train set mse incs after sometime
  • PUBDEV-1724: gbm w tweedie: weird validation error behavior
  • PUBDEV-1774: setting gbm's balance_classes to True produces suspect models
  • PUBDEV-1849: K-Means: negative sum-of-squares after mean imputation
  • GitHub: Set the iters counter during kmeans center initialization correctly
  • GitHub: fixed parenthesis in GLM POJO generation
  • GitHub: Should be updating model each iteration with the newly fitted kmeans clusters, not the old ones!
  • PUBDEV-1867: GLRM with Simplex Fails with Infinite Objective
  • PUBDEV-1666: GBM:Math correctness for Gamma with offsets/weights

#####Python

  • PUBDEV-1779: Fixes intermittent failure seen when Model Metrics were looked at too quickly after a cross validation run.
  • PUBDEV-1409: h2o python h2o.locate() should stop and return "Not found" rather than passing path=None to h2o? causes confusion h2o message GitHub
  • PUBDEV-1630: GBM getting intermittent assertion error on iris scoring in pyunit_weights_api.py
  • PUBDEV-1770: sigterm caught by python is killing h2o GitHub
  • HEXDEV-397: Python fold_column option requires fold column to be in the training data
  • HEXDEV-394: Python client occasionally throws attached error
  • GitHub: add missing args to kmeans
  • GitHub: add missing kmeans params in
  • GitHub: add missing checkpoint param
  • PUBDEV-1785: Deadlock while running GBM

#####R

  • PUBDEV-1830: h2o.glm throws an error when fold_column and validation_frame are both specified
  • PUBDEV-1660: h2oR: when try to get a slice from pca eigenvectors get some formatting error GitHub
  • GitHub: fix broken %in% in R
  • PUBDEV-1831: Cross-validation metrics are not displayed in R (and Python?)
  • PUBDEV-1840: Autoencoder model doesn't display properly in R (training metrics) GitHub

#####System

  • PUBDEV-1790: can't convert iris species column to a character column.
  • PUBDEV-1520: Kmeans pojo naming inconsistency
  • GitHub: fix parse of range ast
  • GitHub: Sets POJO file name to match the class name. Prior behavior would allow them to be different and give a compile error.

#####Web UI

  • PUBDEV-1754: Export frame not working in flow : H2OKeyNotFoundArgumentException

###Simons (3.0.1.4) - 7/29/15

####New Features

#####Algorithms

#####Python

  • PUBDEV-386: Expose ParseSetup to user in Python
  • PUBDEV-1239: Python: getFrame and getModel missing
  • HEXDEV-334: support rbind in python
  • PUBDEV-1215: python to have exportFile call
  • GitHub: add cross-validation parameter to metric accessors and respective pyunit
  • PUBDEV-1729: Cross-validation metrics should be shown in R and Python for all models

#####R

  • PUBDEV-385: Expose ParseSetup to user in R
  • GitHub: add mean residual deviance accessor to R interface
  • GitHub: incorporate cross-validation metric access into the R client metric accessors
  • GitHub: R interface for checkpointing in RF enabled

#####System

  • PUBDEV-1735: Add 24-MAR-14 06.10.48.000000000 PM style date to autodetected
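The new autodetected layout (PUBDEV-1735) combines a two-digit year, an upper-case month abbreviation, dot-separated time fields, a nine-digit fraction, and an AM/PM marker. The sketch below parses it with java.time. It only illustrates the format and is not H2O's parser. TimestampParser is a hypothetical name, and it assumes two-digit years map to 2000-2099 (java.time's default for "yy"):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

/** Hypothetical sketch: parse timestamps like "24-MAR-14 06.10.48.000000000 PM". */
final class TimestampParser {
    // dd-MMM-yy hh.mm.ss.<9-digit fraction> AM/PM; case-insensitive so "MAR" matches "Mar".
    static final DateTimeFormatter FORMAT = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("dd-MMM-yy hh.mm.ss.SSSSSSSSS a")
            .toFormatter(Locale.ENGLISH);

    static LocalDateTime parse(String s) {
        return LocalDateTime.parse(s, FORMAT); // 12-hour clock + PM resolves to hour 18
    }

    public static void main(String[] args) {
        System.out.println(parse("24-MAR-14 06.10.48.000000000 PM"));
    }
}
```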

####Enhancements

#####API

#####Algorithms

  • GitHub: Add proper deviance computation for DL regression.
  • GitHub: Print GLM model details to the logs.
  • GitHub: Disallow categorical response for GLM with non-binomial family.
  • GitHub: Disallow models with more than 1000 classes; these can produce excessively large values in the DKV because the Metrics objects in the model output use 8*N^2 bytes of memory
  • GitHub: DL: Don't train too long in single node mode with auto-tuning.
  • GitHub: Use mean residual deviance to do early stopping in DL.
  • GitHub: Add an "AUTO" setting for fold_assignment (which is Random). This allows the code to reject non-default user-given values if n-fold CV is not enabled.
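The 1000-class limit is easier to judge with numbers attached. The entry states that the Metrics objects need 8*N^2 bytes for N classes. That figure is consistent with one 8-byte double per cell of an N x N matrix, though this is our reading and the entry does not confirm it. A small Python sketch of the arithmetic:

```python
def metrics_bytes(n_classes):
    """Approximate size of an N x N table of 8-byte doubles."""
    return 8 * n_classes ** 2


for n in (10, 1_000, 10_000):
    print(f"{n:>6} classes -> {metrics_bytes(n):>13,} bytes")
```

At the 1000-class cap this comes to 8,000,000 bytes (about 8 MB) per object. Because the cost grows quadratically, 10,000 classes would need roughly 800 MB.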

#####Python

  • HEXDEV-317: Python has to play nicely in a polyglot, long-running environment
  • GitHub: simplify ast in python frame slicer
  • GitHub: add cross validation metrics and mean residual deviance to model show()
  • GitHub: any to take a frame, simplify python's __contains__

#####R

  • GitHub: On detaching h2o R package, only shut down H2O instance if it was started by the R client
  • GitHub: update h2o load

#####System

  • GitHub: Print a handy message (Open H2O Flow in your web browser) when the cluster comes up like Sparkling Water does.
  • GitHub: Replace memory leaky RCurl getURL with curlPerform.
  • GitHub: Add -disable_web parameter.
  • GitHub: allow numerics in match
  • GitHub: More refactoring of h2o start. Includes:
    • H2OStarter - a generic class to start H2O. It does all dynamic registration
    • H2OTestStarter - a generic class to start h2o-core tests
  • GitHub: Use typed keys where necessary. Key.make() now returns a typed Key. The type T can be inferred from the left-hand side of the assignment; if the type of the Key cannot be inferred, the developer has to use the explicit typed syntax: Key.<Frame>make("myframe.hex"). The change simplifies Scala code, which will be able to infer the key type.
  • PUBDEV-1793: Add Job state and start/end time to the model's output GitHub
  • GitHub: add more places to look when trying to start jar from python's h2o.init
  • GitHub: Cosmetic name changes
  • GitHub: Fetch local node differently from remote node.
  • GitHub: Don't clamp node_idx at 0 anymore.
  • GitHub: Added -log_dir option.

####Bug Fixes

#####API

  • PUBDEV-776: Schema.parse() needs to be better behaved (like, not crash)

#####Algorithms

  • PUBDEV-1725: pca: glrm gives bad results for attached data (because of kmeans++ initialization)
  • GitHub: Fix deviance calculation, use the sanitized parameters from the model info, where Auto parameter values have been replaced with actual values
  • GitHub: Fix offset in DL for exponential family (that doesn't do standardization)
  • GitHub: Fix a bug where initial Y was set to all zeroes by kmeans++ when scaling was disabled
  • PUBDEV-1668: GBM: Math correctness for weights
  • PUBDEV-1783: dl: deviance off for large dataset GitHub
  • PUBDEV-1667: GBM: Math correctness for Offsets
  • PUBDEV-1778: drf: reporting incorrect mse on validation set GitHub
  • GitHub: Fix DRF scoring with 0 trees.

#####Python

  • PUBDEV-1260: Python: Requires asnumeric() function
  • GitHub: python interface: add folds_column to x, if it doesn't already exist in x
  • PUBDEV-1763: Python : Math correctness tests for Tweedie/Gamma/Poisson with offsets/weights
  • PUBDEV-1762: Python : Deviance tests for all algos in python GitHub
  • PUBDEV-1671: intermittent: pyunit_weights_api.py, hex.tree.SharedTree$… at hex.tree.DRealHistogram.scoreMSE(DRealHistogram.java:118), iris dataset GitHub

#####R

  • PUBDEV-1257: R: no is.numeric method for H2O objects
  • PUBDEV-1622: NPE in water.api.RequestServer, water.util.RString.replace(RString.java:132)...got flagged as WARN in log...I would think we should have all NPE's be ERROR / fatal? or ?? GitHub
  • PUBDEV-1655: h2o.strsplit needs isNA check
  • PUBDEV-1084: h2o.setTimezone NPE
  • PUBDEV-1738: R: cloud name creation can't handle user names with spaces
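Several of the fixes above concern deviance. As a reference point, the Python sketch below computes the textbook per-row deviance for the Gaussian and Poisson families, plus a weighted mean residual deviance. It illustrates the standard formulas and is not H2O's implementation; the function names are hypothetical. For the Gaussian family, mean residual deviance reduces to the (weighted) MSE.

```python
import math


def unit_deviance(y, mu, family):
    """Per-row deviance for a prediction mu of response y."""
    if family == "gaussian":
        return (y - mu) ** 2
    if family == "poisson":
        # y * log(y / mu) is taken as 0 when y == 0.
        term = y * math.log(y / mu) if y > 0 else 0.0
        return 2.0 * (term - (y - mu))
    raise ValueError(f"unsupported family: {family}")


def mean_residual_deviance(ys, mus, weights, family):
    """Weighted average of unit deviances over all rows."""
    total_w = sum(weights)
    if total_w == 0:
        raise ValueError("sum of weights must be positive")
    return sum(w * unit_deviance(y, mu, family)
               for y, mu, w in zip(ys, mus, weights)) / total_w
```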

#####System

  • PUBDEV-1410: apply causes assert errors mentioning deadlock in runit_small_client_mode ...build never completes after hours ..deadlock?
  • PUBDEV-1195: docker build fails
  • HEXDEV-362: Bug in /parsesetup data preview GitHub
  • PUBDEV-1766: H2O xval: when delete all models: get Error evaluating future[6] :Error calling DELETE /3/Models/gbm_cv_13
  • PUBDEV-1767: H2O: when list frames after removing most frames, get: roll ups not possible vec deleted error GitHub

#####Web UI

  • PUBDEV-1782: Flow: View Data fails when there is a UUID column (and maybe also a String column)
  • PUBDEV-1769: xval: cancel job does not work GitHub

###Simons (3.0.1.3) - 7/24/15

####New Features

#####Python

####Enhancements

#####API

  • GitHub: Increase sleep from 2 to 3 because h2o itself does a sleep 2 on the REST API before triggering the shutdown.

#####System

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

  • PUBDEV-1743: gbm poisson w weights: deviance off
  • PUBDEV-1736: gbm poisson with offset: seems to be giving wrong leaf predictions
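Both entries involve offsets. In a GLM-style model, an offset is added to the linear predictor before the inverse link is applied. Under the log link used for Poisson, this means the offset acts multiplicatively on the prediction. The Python sketch below shows that standard convention; `apply_offset` is a hypothetical helper, not H2O code.

```python
import math


def apply_offset(raw_score, offset, link):
    """Add a per-row offset on the link scale, then invert the link."""
    eta = raw_score + offset
    if link == "identity":
        return eta
    if link == "log":
        return math.exp(eta)
    raise ValueError(f"unsupported link: {link}")


# With the log link, an offset of log(exposure) scales the prediction.
print(round(apply_offset(math.log(0.2), math.log(10.0), "log"), 6))  # 2.0
```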

#####Python

  • PUBDEV-1731: Python get_frame() results in deleting a frame created by Flow
  • HEXDEV-389: Split frame from python
  • HEXDEV-388: python client H2OFrame constructor puts the header into the data (as the first row)

#####R

  • PUBDEV-1504: Runit intermittent fails : runit_pub_180_ddply.R
  • PUBDEV-1678: Client mode jobs fail on runit_hex_1750_strongRules_mem.R

#####System

  • GitHub: Model parameters should be always public.

###Simons (3.0.1.1) - 7/20/15

####New Features

#####Algorithms

#####Python

  • PUBDEV-1437: Python needs "nlevels" operator like R
  • PUBDEV-1434: Python needs "levels" operator, like R
  • PUBDEV-1355: Python needs h2o.trim, like in R
  • PUBDEV-1354: Python needs h2o.toupper, like in R
  • PUBDEV-1352: Python needs h2o.tolower, like in R
  • PUBDEV-1350: Python needs h2o.strsplit, like in R
  • PUBDEV-1347: Python needs h2o.shutdown, like in R
  • PUBDEV-1343: Python needs h2o.rep_len, like in R
  • PUBDEV-1340: Python needs h2o.nlevels, like in R
  • PUBDEV-1338: Python needs h2o.ls, like in R
  • PUBDEV-1344: Python needs h2o.saveModel, like in R
  • PUBDEV-1337: Python needs h2o.loadModel, like in R
  • PUBDEV-1335: Python needs h2o.interaction, like in R
  • PUBDEV-1334: Python needs h2o.hist, like in R
  • PUBDEV-1351: Python needs h2o.sub, like in R
  • PUBDEV-1333: Python needs h2o.gsub, like in R
  • PUBDEV-1336: Python needs h2o.listTimezones, like in R
  • PUBDEV-1346: Python needs h2o.setTimezone, like in R
  • PUBDEV-1332: Python needs h2o.getTimezone, like in R
  • PUBDEV-1329: Python needs h2o.downloadCSV, like in R
  • PUBDEV-1328: Python needs h2o.downloadAllLogs, like in R
  • PUBDEV-1327: Python needs h2o.createFrame, like in R
  • PUBDEV-1326: Python needs h2o.clusterStatus, like in R
  • PUBDEV-1323: Python needs svd algo
  • PUBDEV-1322: Python needs prcomp algo
  • PUBDEV-1321: Python needs naiveBayes algo
  • PUBDEV-1320: Python needs model num_iterations accessor for clustering models, like R's
  • PUBDEV-1318: Python needs screeplot and plot methods, like R's. (should probably check for matplotlib)
  • PUBDEV-1317: Python needs multinomial model hit_ratio_table accessor, like R's
  • PUBDEV-1316: Python needs model scoreHistory accessor, like R's
  • PUBDEV-1315: R needs weights and biases accessors for deeplearning models
  • PUBDEV-1313: Python needs "as.Date" operator, like R's
  • PUBDEV-1312: Python needs "rbind" operator, like R's
  • PUBDEV-1345: Python needs h2o.setLevel and h2o.setLevels, like in R
  • PUBDEV-1311: Python needs "setLevel" operator, like R's
  • PUBDEV-1306: Python needs "anyFactor" operator, like R's
  • PUBDEV-1305: Python needs "table" operator, like R's
  • PUBDEV-1301: Python needs "as.numeric" operator, like R's
  • PUBDEV-1300: Python needs "as.character" operator, like R's
  • PUBDEV-1293: Python needs "signif" operator, like R's
  • PUBDEV-1292: Python needs "round" operator, like R's
  • PUBDEV-1291: Python need transpose operator, like R's t operator
  • PUBDEV-1289: Python needs element-wise division and multiplication operators, like %/% and %-% in R
  • PUBDEV-1330: Python needs h2o.exportHDFS, like in R
  • PUBDEV-1357: Python and R need which operator GitHub
  • PUBDEV-1356: Python and R need isnumeric and ischaracter operators
  • PUBDEV-1342: Python needs h2o.removeVecs, like in R
  • PUBDEV-1324: Python needs h2o.assign, like in R GitHub
  • PUBDEV-1296: Python and R h2o clients need "any" operator, like R's
  • PUBDEV-1295: Python and R h2o clients need "prod" operator, like R's
  • PUBDEV-1294: Python and R h2o clients need "range" operator, like R's
  • PUBDEV-1290: Python and R h2o clients need "cummax", "cummin", "cumprod", and "cumsum" operators, like R's
  • PUBDEV-1325: Python needs h2o.clearLog, like in R
  • PUBDEV-1349: Python needs h2o.startLogging and h2o.stopLogging, like in R
  • PUBDEV-1341: Python needs h2o.openLog, like in R
  • PUBDEV-1348: Python needs h2o.startGLMJob, like in R
  • PUBDEV-1331: Python needs h2o.getFutureModel, like in R
  • PUBDEV-1302: Python needs "match" operator, like R's
  • PUBDEV-1298: Python needs "%in%" operator, like R's
  • PUBDEV-1310: Python needs "scale" operator, like R's
  • PUBDEV-1297: Python needs "all" operator, like R's
  • GitHub: add start_glm_job() and get_future_model() to python client. add H2OModelFuture class. add respective pyunit

#####R

  • PUBDEV-1273: Add h2oEnsemble R package to h2o-3
  • PUBDEV-1319: R needs centroid_stats accessor like Python, for clustering models

#####Rapids

  • PUBDEV-1635: the equivalent of R's "any" should probably be implemented in rapids
  • PUBDEV-1634: the equivalent of R's cummin, cummax, cumprod, cumsum should probably be implemented in rapids
  • PUBDEV-1633: the equivalent of R's "range" should probably be implemented in rapids
  • PUBDEV-1632: the equivalent of R's "prod" should probably be implemented in rapids
  • PUBDEV-1699: the equivalent of R's "unique" should probably be implemented in rapids GitHub

#####System

  • GitHub: changed to new AMI
  • PUBDEV-679: Create cross-validation holdout sets using the per-row weights
  • GitHub: Add user_name. Add ExtensionHandler1.
  • GitHub: Added auth options to h2o.init().
  • GitHub: Added H2O.calcNextUniqueModelId().
  • GitHub: Add ldap arg.

#####Web UI

  • HEXDEV-231: Flow: Ability to change column type post-Parse

####Enhancements

#####Algorithms

  • GitHub: use fixed seed to avoid bad splits with some seeds
  • GitHub: Change seed to avoid type flip from integer to double after row slicing, which leads to different split decisions
  • GitHub: Add option during kmeans scoring to return matrix of indicator columns for cluster assignment, which is necessary for initializing GLRM
  • GitHub: Output number of processed observations in PCA
  • GitHub: Add validation into PCA with GramSVD
  • GitHub: Code cleanup of distributions. Also rename _n_folds -> _nfolds for consistency
  • GitHub: Remove restriction to data frames with more than 1 column
  • GitHub: Add debugging output for DL auto-tuning.
  • PUBDEV-556: implement algo-agnostic cross-validation mechanism via a column of weights
  • GitHub: When initializing with kmeans++ set X to matrix of indicator columns corresponding to cluster assignments, unless closed form solution exists
  • GitHub: Always print DL auto-tuning info for now.
  • PUBDEV-1657: pca: would be good to remove the redundant std dev from flow pca model object
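PUBDEV-556 (and PUBDEV-679 above) describe cross-validation expressed through per-row weights. The Python sketch below shows one way to picture the idea, under our own assumptions: folds are assigned at random, weight 0 marks a holdout row, and the names are hypothetical. Each fold's training weights are the user's weights with that fold's rows zeroed out. Any weight-aware algorithm can then train on the remaining rows without copying the data.

```python
import random


def cv_training_weights(user_weights, nfolds, seed=1234):
    """Return (fold_of_row, per-fold training weight columns)."""
    rng = random.Random(seed)
    fold_of_row = [rng.randrange(nfolds) for _ in user_weights]
    per_fold = []
    for k in range(nfolds):
        # Rows assigned to fold k are held out by giving them weight 0.
        per_fold.append([0.0 if f == k else w
                         for f, w in zip(fold_of_row, user_weights)])
    return fold_of_row, per_fold
```

Training fold k then means fitting with `per_fold[k]` as the weight column and scoring the rows where `fold_of_row` equals k.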

#####API

  • GitHub: Set Content-Type: application/x-www-form-urlencoded for regular POST requests.
  • HEXDEV-272: Move response_column parameter above ignored_columns parameter GitHub
    • All of the fields of a schema are now stored in the leaf child of the class hierarchy. Changed the implementation of fields() to simply return the fields variable of a schema. The function calls H2O.fail() if it attempts to access a field from a non-leaf child. response_column is now moved above ignored_columns for every applicable schema. 'own_fields' is also now renamed to 'fields'
  • GitHub: Don't use features from servlet api 3.0 or later anymore. Instead save the response status in a thread local variable and fish it out when needed.
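For client authors, a regular form POST to the REST API now carries `Content-Type: application/x-www-form-urlencoded`. The minimal Python sketch below builds (but does not send) such a request with the standard library. The URL and parameter names are placeholders, not documented endpoints.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_post(url, params):
    """Build a POST request whose body is URL-encoded form data."""
    body = urlencode(params).encode("ascii")
    req = Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


req = build_form_post("http://localhost:54321/3/Example",  # placeholder URL
                      {"name": "my frame", "rows": 10})
print(req.get_method(), req.data)
```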

#####Python

  • GitHub: don't use the header of the timezone table for a choice
  • GitHub: never delete models. ever.
  • GitHub: add na_rm argument
  • GitHub: add prod to python interface

#####System

  • GitHub: use Key instead of Vec in refcnter
  • GitHub: protect vecs in apply
  • GitHub: Allows for more than one column to remain unnamed. The new naming will fill in the blanks.
  • GitHub: Refactoring of hadoop mapper and driver.
  • GitHub: Remove -hdfs option.
  • GitHub: Adds more checks for a parse cancel at more stages during the post ingestion file parse.
  • GitHub: Refactor method name for clarification.
  • GitHub: Cleans up and comments the freeing of chunks from a parsed file.
  • GitHub: Since more startup logic is getting added, simplify H2OClientApp as much as possible. Remove H2OClient entirely.
  • GitHub: Add dedicated AddCommonResponseHeadersHandler handler to set common response headers up-front.
  • GitHub: More refactoring of startup. Pushed a bunch of code from H2OApp into H2O. Added H2O.configureLogging().
  • GitHub: Make Progress extend Keyed.
  • GitHub: Make createServer() protected.
  • GitHub: model_id should probably be a Key, not Key.
  • GitHub: Change Jetty version from 9 to 8 to get Java 6 compatibility back.

#####Web UI

  • PUBDEV-1521: show REST API and overall UI response times for each cell in Flow
  • HEXDEV-304: Flow: Emphasize run time in job-progress output
  • PUBDEV-1522: show wall-clock start and run times in the Flow outline
  • PUBDEV-1707: Hook up "Export" button for datasets (frames) in Flow.

####Bug Fixes

#####Algorithms

  • PUBDEV-1641: gbm w poisson: get java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver.buildNextKTrees on attached data
  • PUBDEV-1672: kmeans: get AIOOB with user specified centroids GitHub
    • Throw an error if the number of rows in the user-specified initial centers is not equal to k.
  • PUBDEV-1654: pca: gram-svd std dev differs for v2 vs v3 for attached data
  • GitHub: Fix DL
  • GitHub: Fix a bug in PCA utilities for k = 1
  • PUBDEV-1700: nfolds: in Flow, setting nfolds = 1 makes the job hang forever; the terminal shows java.lang.AssertionError
  • PUBDEV-1706: GBM/DRF: is balance_classes=TRUE and nfolds>1 valid? GitHub
  • PUBDEV-806: GLM => runit_demo_glm_uuid.R : water.exceptions.H2OIllegalArgumentException
  • PUBDEV-1696: Client (model-build) is blocked when passing illegal nfolds value. GitHub
  • PUBDEV-1690: Cross Validation: if nfolds > number of observations, should it default to leave-one-out cross-validation?
  • PUBDEV-1537: pca: on airlines get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:219) GitHub
  • PUBDEV-1603: pca: glrm giving very different std dev than R and h2o's other methods for attached data
  • GitHub: Fix a potential race condition in tree validation scoring.
  • GitHub: Fix GLM parameter schema. Clean up hasOffset() and hasWeights()

#####Python

  • PUBDEV-1627: column name missing (python client)
  • PUBDEV-1629: python client's tail() header incorrect GitHub
  • PUBDEV-1413: intermittent assertion errors in pyunit_citi_bike_small.py/pyunit_citi_bike_large.py. Client apparently not notified
  • PUBDEV-1590: "Trying to unlock null" assertion during pyunit_citi_bike_large.py
  • PUBDEV-1400: match operator should take numerics

#####R

#####Rapids

#####Sparkling Water

#####System

  • PUBDEV-1551: Parser: Multifile Parse fails with 0-byte files in directory GitHub
  • HEXDEV-325: Empty reply when parsing dataset with mismatching header and data column length
  • PUBDEV-1509: Split frame : Big datasets : On 186K rows 3200 Cols split frame took 40 mins => which is too long
  • PUBDEV-1438: Column naming can create duplicate column names
  • PUBDEV-1105: NPE in Rollupstats after failed parse
  • PUBDEV-1142: H2O parse: When cancel a parse job, key remains locked and hence unable to delete the file GitHub
  • GitHub: client mode deadlock issue resolution
  • PUBDEV-1670: Client mode fails consistently sometimes : GBM_offset_tweedie.R.out.txt :
  • GitHub: nbhm bug: K == TOMBSTONE not key == TOMBSTONE
  • GitHub: Pulls out a GAID from resource in jar if the GAID doesn't equal the default. Presumably the GAID has been changed by the jar baking program.

#####Web UI

  • PUBDEV-872: Flows : Not able to load saved flows from hdfs/local GitHub
  • PUBDEV-554: Flow:Parse two different files simultaneously, flow should either complain or fill the additional (incompatible) rows with nas
  • PUBDEV-1527: missing .java extension when downloading pojo GitHub
  • PUBDEV-1642: Changing columns type takes column list back to first page of columns
  • PUBDEV-1508: Flow : Import file => Parse => Error compiling coffee-script Maximum call stack size exceeded
  • PUBDEV-1606: Flow :=> Cannot save flow on hdfs
  • PUBDEV-1653: Flow: the column names do not modify when user changes the dataset in model builder

###Shannon (3.0.0.26) - 7/4/15

####New Features

#####Algorithms

  • PUBDEV-1592: Expose standardization shift/mult values in the Model output in R/Python. GitHub

#####Python

  • GitHub: add h2o.shutdown to python client
  • GitHub: add h2o.hist and respective pyunit
  • GitHub: gbm weight pyunit (variable importances)

#####R

#####Web UI

####Enhancements

#####Algorithms

  • PUBDEV-1494: GBM : Weights math correctness tests in R
  • PUBDEV-1523: GLM w tweedie: for attached data, R giving much better res dev than h2o
  • PUBDEV-1396: Offsets/Weights: Math correctness for GLM
  • PUBDEV-1496: RF : Weights Math correctness tests in R
  • HEXDEV-366: remove weights option from DRF and GBM in REST API, Python, R
  • PUBDEV-1553: Threshold in GLM is hardcoded to 0
  • GitHub: Make min_rows a double instead of int: Is now weighted number of observations (min_obs in R).
  • GitHub: Don't use sample weighted variance, but full weighted variance.
  • GitHub: Fix R^2 computation.
  • GitHub: Skip rows with missing response in weighted mean computation.
  • _binomial_double_trees disabled by default for DRF (was enabled).
  • GitHub: Relax tolerance.
  • HEXDEV-329: Offset for GBM
  • HEXDEV-211: Tweedie distributions for GLM
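Two of these entries rest on the same property: weights should behave like row replication. One uses full rather than sample weighted variance; the other skips rows with a missing response. The minimal Python sketch below computes the full weighted mean and variance, sum(w*(x - mean)^2) / sum(w), skipping NaN values and zero weights. It illustrates the formula, not H2O's code, and the helper name is hypothetical. Doubling every weight leaves the result unchanged, and integer weights give the same answer as duplicating rows.

```python
import math


def weighted_mean_var(xs, ws):
    """Full (population) weighted mean and variance.

    Rows with a NaN value or a zero weight are skipped.
    """
    pairs = [(x, w) for x, w in zip(xs, ws) if not math.isnan(x) and w != 0]
    total_w = sum(w for _, w in pairs)
    if total_w == 0:
        raise ValueError("no rows with positive weight")
    mean = sum(w * x for x, w in pairs) / total_w
    var = sum(w * (x - mean) ** 2 for x, w in pairs) / total_w
    return mean, var
```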

#####API

  • PUBDEV-1491: generated REST API POJOS should be compiled and jar'd up as part of the build
  • GitHub: Change schema for PCA, SVD, and GLRM to version 99

#####Python

  • GitHub: is factor returns TRUE/FALSE cast to scalar 1/0
  • GitHub: take a slightly different syntactic approach to dropping column
  • GitHub: better list comp in interaction call
  • GitHub: if the weights_column argument is specified, attach the column to the training and/or validation frame (if it is not already part of x/validation_x). If weights_column is not part of x/validation_x, a training_frame/validation_frame must be provided, and the weights column is taken from it. Respective pyunit added.

#####R

  • GitHub: better ref handling in the [<- for python and R
  • GitHub: Pass binomial_double_trees in the R wrapper for DRF.
  • GitHub: carefully format NAs and non NAs
  • GitHub: for loop over the x[[j]] to format NAs properly
  • GitHub: Added example to h2o-r/ensemble/create_h2o_wrappers.R

#####System

  • GitHub: allow for no y in model_builder
  • GitHub: Enable auto-flag for Java6 generation.
  • GitHub: better compression in split frame
  • PUBDEV-1594: All basic file accessors in PersistHDFS should check file permissions
  • PUBDEV-1518: getFrames should show a Parse button for raw frames

#####Web UI

  • PUBDEV-1545: Flow => Build model => ignored columns table => should have column width resizing based on column names width => looks odd if column names are short
  • PUBDEV-1546: Flow: Build model => Search for 1 column => select it => build model shows list of columns instead of 1 column
  • PUBDEV-1254: Flow: Add Impute

####Bug Fixes

#####Algorithms

  • PUBDEV-1554: dl with offset: when offset same as response, do not get 0 mse
  • PUBDEV-1555: h2oR: dl with offset giving : Error in args$x_ignore : object of type 'closure' is not subsettable
  • PUBDEV-1487: gbm weights: give different terminal node predictions than R for attached data
  • PUBDEV-1569: Investigate effectiveness of _binomial_double_trees (DRF) GitHub
  • PUBDEV-1574: Actually pass 'binomial_double_trees' argument given to R wrapper to DRF.
  • PUBDEV-1444: DL: h2o.saveModel cannot save metrics when a deeplearning model has a validation_frame
  • PUBDEV-1579: GBM test time predictions without weights seem off when training with weights GitHub
  • PUBDEV-1533: GLM: doubled weights should produce the same result as doubling the observations GitHub
  • PUBDEV-1531: GLM: it appears that observations with 0 weights are not ignored, as they should be.
  • GitHub: Fix a bug in PCA scoring that was handling categorical NAs inconsistently
  • PUBDEV-1581: Regression 3060 fails on GLRM in R tests
  • PUBDEV-1586: change Grid endpoints and schemas to v99 since they are still in flux
  • PUBDEV-1589: GLM : build model => airlinesbillion dataset => IRLSM/LBFGS => fails with array index out of bound exception
  • PUBDEV-1607: gbm w offset: predict seems to be wrong
  • PUBDEV-1600: Frame name creation fails when file name contains csv or zip (not as extension)
  • PUBDEV-1577: DL predictions on test set require weights if trained with weights
  • PUBDEV-1598: Flow: After running pca when call get Model/ jobs get: Failed to find schema for version: 3 and type: PCA
  • PUBDEV-1576: Test variable importances for weights for GBM/DRF/DL
  • PUBDEV-1517: With R, deep learning autoencoder using all columns in frame, not just those specified in x parameter
  • PUBDEV-1593: dl var importance: there is a .missing(NA) variable in DL variable importance even when data has no NAs

#####Python

  • PUBDEV-1538: h2o.save_model fails on Windows due to path handling
  • GitHub: python leaked key check for Vecs, Chunks, and Frames
  • PUBDEV-1609: frame dimension mismatch between upload/import method

#####R

  • PUBDEV-1601: h2o.loadModel() from hdfs
  • PUBDEV-1611: R CMD Check failing on : The Date field is over a month old.

#####System

  • PUBDEV-1514: Large number of columns (~30000) on importFile (flow) is slow / unresponsive for long time
  • PUBDEV-841: Split frame : Flow should not show raw frames for SplitFrame dialog (water.exceptions.H2OIllegalArgumentException)
  • PUBDEV-1459: bug in GLM POJO: seems threshold for binary predictions is always 0
  • PUBDEV-1566: Cannot save model on windows since Key contains '@' (illegal character to path)
  • GitHub: Fixes the timezone lists.
  • GitHub: R CMD check fix for date
  • GitHub: add ec2 back into project

#####Web UI

  • HEXDEV-54: Flow : Import file 100k.svm => Something went wrong while displaying page

###Shannon (3.0.0.25) - 6/25/15

####Enhancements

#####API

  • PUBDEV-1452: branch 3.0.0.2 to REGRESSION_REST_API_3 and cherry-pick the /99/Rapids changes to it

#####Web UI

  • PUBDEV-1545: Flow => Build model => ignored columns table => should have column width resizing based on column names width => looks odd if column names are short
  • PUBDEV-1546: Flow : Build model => Search for 1 column => select it => build model shows list of columns instead of 1 column

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

  • PUBDEV-1487: gbm weights: give different terminal node predictions than R for attached data
  • GitHub: Fix offset for DL.
  • GitHub: Gracefully handle 0 weight for GBM.

#####Python

  • PUBDEV-1547: Weights API: weights column not found in python client

#####R

  • GitHub: Fix R wrapper for DL for weights/offset.

#####Web UI

  • PUBDEV-1528: Flow model builder: the na filter does not select all ignored columns; just the first 100.

###Shannon (3.0.0.24) - 6/25/15

####New Features

#####Algorithms

  • GitHub: Allow validation for unsupervised models.

#####R

  • GitHub: Added runit GBM weights
  • GitHub: Updated runit_GBM_weights.R

#####Python

  • GitHub: add h2o.set_timezone h2o.get_timezone and h2o.list_timezones to python client and respective pyunit.
  • GitHub: add h2o.save_model and h2o.load_model to python client and respective pyunit

####Enhancements

#####Algorithms

  • GitHub: Skip rows with weight 0.
  • GitHub: x_ignore must be set when autoencoder is TRUE

#####System

  • GitHub: Fix Java bindings generator to generate code under project's location.
  • GitHub: Adds input parameter check to ParseSetup.

####Bug Fixes

#####Algorithms

  • PUBDEV-1529: dl with ae: get java.lang.UnsupportedOperationException: Trying to predict with an unstable model.
  • GitHub: Bring back accidentally removed hiding of classification-related fields for unsupervised models.

#####API

  • PUBDEV-1456: fix REST API POJO generation for enums, + java.util.map import

###Shannon (3.0.0.23) - 6/19/15

####New Features

#####Algorithms

#####API

  • PUBDEV-61: do back-end work to allow document navigation from one Schema to another
  • PUBDEV-133: doing summary means calling it with each columns name, index not supported?

#####Python

  • GitHub: add num_iterations accessor to python client and respective pyunit
  • GitHub: add score_history accessor to python client and respective pyunit
  • GitHub: add hit ratio table accessor to python interface and respective pyunit
  • GitHub: add h2o.naivebayes and respective pyunits
  • GitHub: add h2o.prcomp and respective pyunits.
  • PUBDEV-681: Add user-given input weight parameters to Python
  • GitHub: add h2o.create_frame to python client and respective pyunit
  • GitHub: add h2o.interaction and respective pyunit
  • GitHub: add h2o.strsplit to python client and respective pyunit
  • GitHub: add h2o.toupper and h2o.tolower to python client and respective pyunit
  • GitHub: add h2o.sub and h2o.gsub to python interface and respective pyunit
  • GitHub: add h2o.trim() to python client and respective pyunit
  • GitHub: add h2o.rep_len to python client and respective pyunit
  • GitHub: add h2o.svd to python client and respective golden pyunit
  • GitHub: add scree plot functionality to python client and respective pyunit
  • GitHub: add plotting functionality to python client and respective pyunit

#####R

  • GitHub: added h2o.weights and h2o.biases accessors to R client and update respective runit
  • GitHub: add h2o.centroid_stats to R client and respective runit
  • PUBDEV-680: Add user-given input weight parameters to R
  • GitHub: Add offset/weights to DRF/GBM R wrappers.

#####Web UI

####Enhancements

#####Algorithms

  • PUBDEV-676: Use the user-given weight Vec as observation weights for all algos
  • GitHub: Refactor the code to let the caller compute the weighted sigma.
  • GitHub: Modify prior class distribution to be computed from weighted response.
  • GitHub: Put back the defaultThreshold that's based on training/validation metrics. Was accidentally removed together with SupervisedModel.
  • GitHub: Always sample to at least #class labels when doing stratified sampling.
  • GitHub: Cutout for NAs in GLM score0(data[],...), same as for score0(Chunk[],…)
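PUBDEV-676 and the change to compute the prior class distribution from the weighted response both imply the same thing: each class's prior is its share of the total observation weight, not of the row count. The minimal Python sketch below shows that computation; `weighted_class_priors` is a hypothetical helper.

```python
from collections import defaultdict


def weighted_class_priors(labels, weights):
    """Share of total observation weight carried by each class."""
    totals = defaultdict(float)
    for label, w in zip(labels, weights):
        totals[label] += w
    grand_total = sum(totals.values())
    return {label: t / grand_total for label, t in totals.items()}


print(weighted_class_priors(["a", "b", "a", "b"], [1, 1, 1, 5]))
# {'a': 0.25, 'b': 0.75}
```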

#####R

  • PUBDEV-856: All h2o things in R should have an h2o.something version so it's unambiguous GitHub
  • GitHub: export clusterIsUp and clusterInfo commands
  • GitHub: update accessors in the shim
  • GitHub: gbm with async exec

#####System

  • HEXDEV-361: Wide frame handling for model builders
  • GitHub: Remove application plugin from assembly to speedup build process.
  • GitHub: add byteSize to ls
  • GitHub: option to launch randomForest async
  • GitHub: Return HDFS persist manager for URIs starting with s3n and s3a
  • GitHub: quote strings when writing to disk

####Bug Fixes

#####Algorithms

  • PUBDEV-1217: pca: when cancel the job the key remains locked
  • PUBDEV-1468: Error in GBM if response column is constant GitHub
  • PUBDEV-1476: dl with obs weights: nas in weights cause 'java.lang.AssertionError GitHub
  • PUBDEV-1458: pca: data with nas, v2 vs v3 slightly different results GitHub
  • PUBDEV-1477: dl w/obs wts: when all wts are zero, get java.lang.AssertionError GitHub
  • GitHub: Fix check for offset (allow offset for logistic regression).
  • GitHub: Gracefully handle exception when launching single-node DRF/GBM in client mode.
  • GitHub: Hack around the fact that hasWeights()/hasOffset() isn't available on remote nodes and that SharedTree is sent to remote nodes and its private internal classes need access to the above methods...
  • GitHub: Fix scoring when NAs are predicted.

#####Python

  • PUBDEV-1469: pyunit_citi_bike_large.py : test failing consistently on regression jobs
  • PUBDEV-1472: Regression job : Pyunit small tests groupie and pub_444_spaces failing consistently
  • PUBDEV-1372: Regression of pyunit_small, Groupby.py
  • PUBDEV-1386: intermittent fail in pyunit_citi_bike_small.py: -Unimplemented- failed lookup on token
  • PUBDEV-1471: pyunit_citi_bike_small.py : failing consistently on regression jobs
  • PUBDEV-1466: matplotlib.pyplot import failure on MASTER jenkins pyunit small jobs GitHub
  • GitHub: minor fix to python's h2o.create_frame
  • GitHub: update the path to jar in connection.py

#####R

  • PUBDEV-1475: Client mode failed tests : runit_GBM_one_node.R, runit_RF_one_node.R, runit_v_3_apply.R, runit_v_4_createfunctions.R GitHub
  • PUBDEV-1235: Split Frame causes AIOOBE on Chicago crimes data GitHub
  • PUBDEV-746: runit_demo_NOPASS_h2o_impute_R : h2o.impute() is missing. seems like we want that?
  • PUBDEV-582: H2O-R- does not give the full column summary
  • PUBDEV-1473: Regression : Runit small jobs failing on tests :
  • PUBDEV-741: runit_NOPASS_pub-668 R tests uses all() ...h2o says all is unimplemented
  • PUBDEV-1506: R: h2o.ls() needs to return data sizes
  • PUBDEV-1436: Intermittent runit fail : runit_GBM_ecology.R GitHub
  • PUBDEV-1464: R: toupper/tolower don't work GitHub GitHub
  • PUBDEV-1194: R: dataset is imported but can't return head of frame

#####Sparkling Water

  • PUBDEV-975: Download page for Sparkling Water should point to the right R-client and Python client
  • PUBDEV-1428: Sparkling water => Flow => Million song/KDD Cup path issues GitHub

#####Web UI

  • PUBDEV-1433: Flow UI: Change Help > FAQ link to h2o-docs/index.html#FAQ

###Shannon (3.0.0.22) - 6/13/15

####New Features

#####API

  • PUBDEV-633: Generate Java bindings for REST API: POJOs for the entities (schemas)

#####Python

  • GitHub: added h2o.anyfactor() and respective pyunit
  • GitHub: add h2o.scale and respective pyunit
  • GitHub: added levels, nlevels, setLevel and setLevels and respective pyunit...PUBDEV-1434 PUBDEV-1437 PUBDEV-1434 PUBDEV-1345 PUBDEV-1311
  • GitHub: add H2OFrame.as_date and pyunit addition. H2OFrame.setLevel should return a H2OFrame not a H2OVec.

####Enhancements

#####Algorithms

  • GitHub: Add _build_tree_one_node option to GBM

#####API

  • HEXDEV-352: Additional attributes on /Frames and /Frames/foo/summary

#####R

  • PUBDEV-706: Release h2o-dev to CRAN
  • Adding parameter parse_type to upload/import file (GitHub)

#####Python

  • GitHub: print out where h2o jar is looked for
  • GitHub: add h2o.ls and respective pyunit

#####System

  • PUBDEV-717: refactor the duplicated code in FramesV2
  • PUBDEV-1281: Add horizontal pagination of frames to Flow GitHub
  • PUBDEV-607: Add Xmx reporting to GA
  • GitHub: Added support for Freezable[][][] in serialization (added addAAA to auto buffer and DocGen, DocGen will just throw H2O.fail())
  • GitHub: No longer set yyyy-MM-dd and dd-MMM-yy dates that precede the epoch to be NA. Negative time values are fine. This unifies these two time formats with the behavior of as.Date.
  • GitHub: Reduces the verbosity of parse tracing messages.
  • GitHub: Rename AUTO->GUESS for figuring out file type.
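After the pre-epoch change, dates before 1970-01-01 map to negative millisecond timestamps instead of NA. The Python sketch below shows that mapping for the two formats named (`yyyy-MM-dd` and `dd-MMM-yy`), assuming UTC. The helper name is hypothetical, and Python's `%y` pivot (69-99 become 1969-1999) may differ from H2O's.

```python
from datetime import datetime, timezone

# strptime equivalents of yyyy-MM-dd and dd-MMM-yy
FORMATS = ("%Y-%m-%d", "%d-%b-%y")


def to_epoch_millis(text):
    """Milliseconds since 1970-01-01 UTC, or None if no format matches."""
    for fmt in FORMATS:
        try:
            dt = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Dates before the epoch simply yield negative values.
        return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)
    return None
```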

#####Web UI

  • HEXDEV-276: Add frame pagination
  • PUBDEV-1405: Flow : Decision to be made on display of number of columns for wider datasets for Parse and Frame summary
  • PUBDEV-1404: Usability improvements
  • PUBDEV-244: "View Data" display may need to be modified/shortened.

####Bug Fixes

#####Algorithms

  • PUBDEV-1365: GLM: Buggy when likelihood equals infinity
  • PUBDEV-1394: GLM: Some offsets hang
  • PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
  • PUBDEV-1403: pca: h2o-3 reporting incorrect proportion of variance and cum prop GitHub
  • HEXDEV-281: GLM - beta constraints with categorical variables fails with AIOOB
  • HEXDEV-280: GLM - gradient not within tolerance when specifying beta_constraints w/ and w/o prior values

#####Python

#####R

#####System

  • PUBDEV-1423: Phantomjs : Add timeout command line option
  • PUBDEV-1401: Flow : Import file 15 M Rows 2.2K cols=> Parse these files => Change first column type => Unknown => Try to change other columns => Kind of hangs
  • PUBDEV-1406: make the ParseSetup / Parse API more efficient for high column counts GitHub

###Shannon (3.0.0.21) - 6/12/15

####New Features

#####Python

  • HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API

####Enhancements

#####Algorithms

  • GitHub: Made intercept option public and added it to field list in parameter schema
  • GitHub: GLM: Updated null model intercept fit.
  • GitHub: GLM: Updated null-model constant term fitting when running with offset
  • GitHub: glm update
  • GitHub: DL code refactoring to reduce file sizes

#####Python

  • GitHub: add h2o.round() and h2o.signif() and additional pyunit checks
  • GitHub: add h2o.all() and respective pyunit checks

#####R

  • GitHub: added intercept option to R

#####System

#####Web UI

  • GitHub: Add horizontal pagination of /Frames to handle UI navigation of wide datasets more efficiently.
  • GitHub: Only show the top 7 metrics for the max metrics table
  • GitHub: Make the max metrics table entries be called max f1 etc.

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

  • PUBDEV-1365: GLM: Buggy when likelihood equals infinity GitHub
  • PUBDEV-1394: GLM: Some offsets hang
  • PUBDEV-1268: GLM: get java.lang.AssertionError at hex.glm.GLM$GLMSingleLambdaTsk.compute2 for attached data
  • PUBDEV-1382: pca: giving wrong std dev for mentioned data
  • PUBDEV-1383: pca: std dev numbers differ for v2 and v3 for attached data GitHub
  • PUBDEV-1381: GBM, RF: get an NPE when run with a validation set with no response GitHub
  • GitHub: GLM fix - fixed fitting of null model constant term
  • GitHub: Fix remote bug
  • GitHub: Remove elastic averaging parameters from Flow.
  • PUBDEV-1398: pca: predictions on the attached data from v2 and v3 differ

#####Python

#####R

  • PUBDEV-761: Save model and restore model (from R)
  • PUBDEV-1236: h2o-r/tests/testdir_misc/runit_mergecat.R failure (client mode only)

#####System

  • PUBDEV-1402: move Rapids to /99 since it's going to be in flux for a while GitHub
  • GitHub: Fixes an operator precedence issue, and replaces debug GA target with actual one.
  • GitHub: Fix log download bug where all nodes were getting the same zip file.

###Shannon (3.0.0.18) - 6/9/15

####New Features

#####System

#####Python

  • GitHub: Added --h2ojar option

####Enhancements

#####Python

  • PUBDEV-277: Make python equivalent of as.h2o() work for numpy array and pandas arrays

####Bug Fixes

#####Algorithms

  • PUBDEV-1371: pca: get java.lang.AssertionError at hex.svd.SVD$SVDDriver.compute2(SVD.java:198)
  • PUBDEV-1376: pca: predictions from h2o-3 and h2o-2 differs for attached data
  • PUBDEV-1380: DL: when try to access the training frame from the link in the dl model get: Object not found

#####R

###Shannon (3.0.0.17) - 6/8/15

####New Features

#####Algorithms

#####Python

  • PUBDEV-1270: Python Interface needs H2O Cut Function GitHub
  • PUBDEV-1242: Need equivalent of as.Date feature in Python GitHub
  • PUBDEV-1165: H2O Python needs Modulus Operations
  • HEXDEV-29: The ability to define features as categorical or continuous in the web UI and in the python API
  • PUBDEV-1237: environment variable to disable the strict version check in the R and Python bindings

#####Web UI

  • PUBDEV-1175: Flow: Good interactive confusion matrix for binomial
  • PUBDEV-1176: Flow: Good confusion matrix for multinomial

####Enhancements

#####Algorithms

  • GitHub: GLM weights fix: regularize by sum of weights rather than number of observations
  • GitHub: GLM fix: added line search (and limited number of iterations) to constant term model fitting with offset (could enter infinite loop)
  • GitHub: No longer warn if binomial_double_trees option is enabled for _nclass!=2
  • GitHub: Fix CM table to have integer entries unless there are real-valued entries
  • GitHub: Add extra assertion for train_samples_per_iteration
  • GitHub: Update model during runtime of algorithm.
  • GitHub: Changes to glm forloop to add offsets and add NOPASS/NOFEATURE functionality back to run.py

#####R

  • GitHub: month was off by one, runit test edited
  • GitHub: Comments to clarify the policy on dates in H2O.

#####System

  • HEXDEV-344: Logs should include JVM launch parameters

#####Web UI

  • PUBDEV-467: Show Frames for DL weights/biases in Flow
  • PUBDEV-1221: add a "I like this" style button with LinkedIn or Github (beside the Flow Assist Me button)
  • PUBDEV-1245: Flow: use new _exclude_fields query parameter to speed up REST API usage

####Bug Fixes

#####Algorithms

  • PUBDEV-1353: GLM: model with weights different in R than in H2o for attached data
  • PUBDEV-1358: GLM: when run with -ive weights, would be good to tell the user that -ive weights not allowed instead of throwing exception
  • PUBDEV-1264: GLM: reporting incorrect null deviance GitHub
  • PUBDEV-1362: GLM: when run with weights and offset get wrong ans
  • PUBDEV-1263: GLM: name ordering for the coefficients is incorrect GitHub
  • PUBDEV-1261: pca: wrong std dev for data with nas rest numeric cols GitHub
  • PUBDEV-1218: pca: progress bar not showing progress just the initial and final progress status GitHub
  • PUBDEV-1204: pca: from flow when try to invoke build model, displays-ERROR FETCHING INITIAL MODEL BUILDER STATE
  • PUBDEV-1212: pca: with enum column reporting (some junk) wrong stdev/ rotation GitHub
  • PUBDEV-1228: pca: no std dev getting reported for attached data
  • PUBDEV-1233: pca: std dev for attached data differ when run on h2o-3 and h2o-2
  • PUBDEV-1258: h2o.glm with offset column: get Error in .h2o.startModelJob(conn, algo, params) : Offset column 'logInsured' not found in the training frame.

#####R

#####Sparkling Water

#####System

  • PUBDEV-1288: Confusion Matrix: class java.lang.ArrayIndexOutOfBoundsException', with msg '2' java.lang.ArrayIndexOutOfBoundsException: 2 at hex.ConfusionMatrix.createConfusionMatrixHeader Github
  • HEXDEV-323: SVMLight Parse Bug GitHub
  • PUBDEV-1207: implement JSON field-filtering features: _exclude_fields
  • GitHub: Fix a missing field update in Job.
  • PUBDEV-65: Handling of strings columns in summary is broken
  • PUBDEV-1230: Parse: get AIOOB when parses the attached file with first two cols as enum while h2o-2 does fine
  • PUBDEV-1377: Get AIOOBE when parsing a file with fewer column names than columns GitHub
  • PUBDEV-1364: Variable importance Object

#####Web UI

  • PUBDEV-1198: Flow: Selecting "Cancel" for "Load Notebook" prompt clears current notebook anyway
  • PUBDEV-1172: Model builder takes forever to load the column names in Flow, hence cannot build any models
  • PUBDEV-1248: Flow GLM: from Flow the drop down with column names does not show up and hence not able to select the offset column
  • PUBDEV-1380: DL: when try to access the training frame from the link in the dl model get: Object not found GitHub

###Shannon (3.0.0.13) - 5/30/15

####New Features

#####Algorithms

#####Python

#####R

####Enhancements

#####Algorithms

#####API

  • PUBDEV-669: have the /Frames/{key}/summary API call Vec.startRollupStats

#####R/Python

  • PUBDEV-479: Port MissingInserter to R/Python
  • PUBDEV-632: Display TwoDimTable of HitRatios in R/Python
  • github: minor change to h2o.demo()
  • github: add h2o.demo() facility to python package, along with some built-in (small) data
  • github: remove cols param

####Bug Fixes

#####Algorithms

  • PUBDEV-1211: pca: descaled pca, std dev seems to be wrong for attached data github
  • PUBDEV-1213: pca: would be good to have the std dev numbered because it is difficult to relate to the principal components (github)
  • PUBDEV-1201: pca: get ArrayIndexOutOfBoundsException (github)
  • PUBDEV-1203: pca: giving wrong std dev/rotation-labels for iris with species as enum (github)
  • PUBDEV-1199: DL with <1 epochs has wrong initial estimated time (github)
  • github: Fix missing AUC for training data in DL.
  • github: Add the seed back to GBM imbalanced test (was set to 0 by default before, now explicit)

#####R

  • PUBDEV-1189: R: h2o.hist broken for breaks that is a list of the break intervals (github)
  • PUBDEV-1206: Frame summary from R and Python need to use the Frame summary endpoint (github)
  • PUBDEV-1177: R summary() is slow when large number of columns
  • PUBDEV-1097: R: R should be able to take a list of paths similar to how python does

###Shannon (3.0.0.11) - 5/22/15

####Enhancements

#####Algorithms

  • PUBDEV-1179: DRF: investigate if larger seeds giving better models
  • PUBDEV-1178: Add logloss/AUC/Error to GBM/DRF Logs & ScoringHistory
  • PUBDEV-1169: Use only 1 tree for DRF binomial (github)
  • PUBDEV-1170: Wrong ROC is shown for DRF (Training ROC, even though Validation is given)
  • PUBDEV-1162: Speed up sorting of histograms with O(N log N) instead of O(N^2)

#####System

####Bug Fixes

#####Algorithms

  • HEXDEV-253: model output consistency
  • HEXDEV-319: DRF in h2o 3.0 is worse than in h2o 2.0 for Airline
  • PUBDEV-1180: DRF has wrong training metrics when validation is given

#####API

  • PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset

#####Python

  • PUBDEV-1183: Python version check should fail hard by default
  • PUBDEV-1185: Python binding version mismatch check should fail hard and be on by default
  • HEXDEV-138: Port Python tests for Deep Learning

#####R

  • PUBDEV-1160: R: h2o.hist doesn't support breaks argument
  • PUBDEV-1159: R: h2o.hist takes too long to run
  • PUBDEV-1150: R CMD Check: URLs not working
  • PUBDEV-1149: R CMD check not happy with our use of .OnAttach
  • PUBDEV-1174: R: h2o.hist FD implementation broken
  • PUBDEV-1167: R: h2o.group_by broken
  • HEXDEV-318: the fix to H2O startup for the host unreachable from R causes a security hole
  • PUBDEV-1187: FramesHandler.summary() needs to run summary on all Vecs concurrently.

#####System

  • PUBDEV-862: Building a model without training file -> NPE
  • HEXDEV-315: importFile fails: Error in fromJSON(txt, ...) : unexpected character: A
  • PUBDEV-1137: Parse: upload and import gives different chunk compression on the same file
  • PUBDEV-1054: Parse: h2o parses arff file incorrectly
  • PUBDEV-1181: Rapids should queue and block on the back-end to prevent overlapping calls
  • PUBDEV-1184: importFile fails for paths containing spaces

#####Web UI

  • PUBDEV-1182: Flow: when upload file fails, the control does not come back to the flow screen, and have to refresh the whole page to get it back
  • PUBDEV-1131: GBM crashes after calling getJobs in Flow

###Shannon (3.0.0.7) - 5/18/15

####Enhancements

#####API

  • PUBDEV-711: take a final look at all REST API parameter names and help strings
  • PUBDEV-757: Rename DocsV1 + DocsHandler to MetadataV1 + MetadataHandler
  • PUBDEV-1138: Performance improvements for big data sets => getModels
  • PUBDEV-1126: Performance improvements for big data sets => Get frame summary

#####System

  • HEXDEV-316: ImportFiles should not download files from HTTP

#####Web UI

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####API

  • PUBDEV-501: H2OPredict: does not complain when you build a model with one dataset and predict on completely different dataset
  • PUBDEV-1047: API : Get frames and Build model => takes long time to get frames
  • HEXDEV-149: Allow JobsV3 to return properly typed jobs, not always instances of JobV3
  • PUBDEV-1036: rename straggler V2 schemas to V3

#####R

#####System

  • PUBDEV-1034: Windows 7/8/2012 Multicast Error UDP
  • PUBDEV-862: Building a model without training file -> NPE
  • HEXDEV-253: model output consistency
  • PUBDEV-1135: While predicting get:class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.ArrayIndexOutOfBoundsException: 5
  • PUBDEV-1090: POJO: Models with "." in key name (ex. pros.glm) can't access pojo endpoint
  • PUBDEV-1077: Getting an IcedHashMap warning from H2O startup

#####Web UI

  • PUBDEV-1133: getModels in Flow returns error
  • PUBDEV-926: Flow: When user hits build model without specifying the training frame, it would be good if Flow guides the user. It presently shows an NPE msg
  • PUBDEV-1131: GBM crashes after calling getJobs in Flow

###Shannon (3.0.0.2) - 5/15/15

####New Features

#####ModelMetrics

#####Web UI

  • PUBDEV-942: ModelMetrics by model category - Autoencoder

####Enhancements

#####Algorithms

  • github: GLM update: skip lambda max during lambda search
  • github: removed higher accuracy option
  • github: Rename constant col parameter
  • github: GLM update: added stopping criteria to lbfgs, tweaked some internal constants in ADMM
  • github: Add support for ignore_const_col in DL

#####Python

  • PUBDEV-852: Binomial: show per-metric-optimal CM and per-threshold CM in Python
  • github: add filterNACols to python
  • github: h2o.delete replaced with h2o.removeFrameShallow
  • github: Add distribution summary to Python

#####R

  • github: add filterNACols to R
  • github: explicitly set cols=TRUE for R style str on frames
  • github: enable faster str, bulk nlevels, bulk levels, bulk is.factor
  • github: Add optional blocking parameter to h2o.uploadFile

#####System

  • PUBDEV-672: HTML version of the REST API docs should be available on the website
  • PUBDEV-827: class GenModel duplicates part of code of Model

#####Web UI

  • HEXDEV-181: Flow: Handle deep features prediction input and output
  • github: removed use_all_factor_levels from glm flows

####Bug Fixes

#####Algorithms

  • HEXDEV-302: AIOOBE during Prediction with DL github
  • github: glm fix: don't force in null model for lambda search with user given list of lambdas
  • github: Fix domain in glm scoring output for binomial
  • github: GLM Fix - fix degrees of freedom when running without intercept (+/-1)
  • github: GLM fix: make valid data info be clone of train data info (needs exactly the same categorical offsets, ignore unseen levels)
  • github: Fix glm scoring, fill in default domain {0,1} for binary columns when scoring

#####R

  • PUBDEV-1116: R: Parse that works from flow doesn't work from R using as.h2o
  • PUBDEV-798: R: String Munging Functions Missing
  • PUBDEV-584: R: hist() doesn't currently work for H2O objects
  • PUBDEV-820: H2oR: model objects should return the CM when run classification like h2o1
  • PUBDEV-1113: Remove Keys : Parse => Remove => doesn't complete
  • PUBDEV-1102: R: h2o.rbind fails to join two dataset together
  • PUBDEV-899: R: all doesn't work
  • PUBDEV-555: H2O-R: str does not work
  • PUBDEV-1110: H2OR: while printing a gbm model object, get invalid format '%d'; use format %f, %e, %g or %a for numeric objects
  • PUBDEV-903: R: Errors from some rapids calls seem to fail to return an error
  • HEXDEV-311: Performance bug from R with Expect: 100-continue
  • PUBDEV-1030: h2o.performance: ignores the user specified threshold
  • PUBDEV-1071: R: regression models don't show in print statement r2 but it exists in the model object
  • PUBDEV-1072: R: missing accessors for glm specific fields
  • PUBDEV-1032: After running some R and py demos when invoke a build model from flow get- rollup stats problem vec deleted error
  • PUBDEV-1069: R: missing implementation for h2o.r2
  • PUBDEV-1064: Passing sep="," to h2o.importFile() fails with '400 Bad Request'
  • PUBDEV-1092: Get NPE while predicting

#####System

  • PUBDEV-1091: S3 gzip parse failure
  • PUBDEV-1081: Probably want to cleanly disable multicast (not retry) and print suggestion message, if multicast not supported on picked multicast network interface
  • PUBDEV-1112: User has no way to specify whether to drop constant columns
  • PUBDEV-1109: Change all extdata imports to uploadFile
  • PUBDEV-1104: .gz file parse exception from local filesystem

#####Web UI

  • PUBDEV-1134: getPredictions in Flow returns error
  • PUBDEV-1020: Flow : Drop NA Cols enable => Should automatically populate the ignored columns
  • PUBDEV-1041: Flow GLM: formatting needed for the model parameter listing in the model object github
  • PUBDEV-1108: Flow: When predict on data with no response get :Error processing POST /3/Predictions/models/gbm-a179db76-ba96-420f-a643-0e166aea3af3/frames/subset_1 'undefined' is not an object (evaluating 'prediction.model')

##H2O-Dev

###Shackleford (0.2.3.6) - 5/8/15

####New Features

#####Python

#####Sparkling Water

  • Publish h2o-scala and h2o-app latest version to maven central (PUBDEV-443)

####Enhancements

#####Algorithms

  • Use AUC's default threshold for label-making for binomial classifiers predict() (PUBDEV-1063) (github)
  • GLM update (github)
  • Cleanup AUC2, make incremental version (github)
  • Name change: override_with_best_model -> overwrite_with_best_model (github)
  • Couple of GLM updates (github)
  • Disable _replicate_training_data for data that's larger than 10GB (github)
  • Added replicate_training_data param for DL (github)
  • Change a few kmeans output parameters so no longer dividing by nrows or num_clusters (github)
  • GLMValidation Updated auc computation (github)
  • Do not delete model metrics at end of GBM/DRF (github)

#####API

  • Clean REST api for Parse (PUBDEV-993)
  • Removes is_valid, invalid_lines, and domains from REST api (github)
  • Annotate domains output field as expert level (github)

#####Python

#####R

  • Cleaner client POJO download for R (PUBDEV-907)
  • Implement h2o.interaction() (PUBDEV-854) (github)
  • R: h2o.impute missing (PUBDEV-796)
  • validation_frame is passed through to h2o (github)
  • Adding GBM accessor function runits (github)
  • Adding changes to h2o.hit_ratio_table to be like other accessors (i.e., no train) (github)
  • add h2o.getPOJO to R, fix impute ast build in python (github)

#####System

  • Change NA strings to an array in ParseSetup (PUBDEV-995)
  • Document way of passing S3 credentials for S3N (PUBDEV-947)
  • Add H2O-dev doc on docs.h2o.ai via a new structure (proposed below) (PUBDEV-355)
  • Rapids Ref Doc (PUBDEV-667)
  • Show Timestamp and Duration for all model scoring histories (PUBDEV-1018) (github)
  • Logs slow reads, mainly meant for noting slow S3 reads (github)
  • Make prediction frame column names non-integer (github)
  • Add String[] factor_columns instead of int[] factors (github)
  • change the runtime exception to a Log.info() if interface doesn't support multicast (github)
  • More robust way to copy Flow files to web root per Prithvi (github)
  • Switches na_string from a single value per column to an array per column (github)

#####Web UI

####Bug Fixes

#####Algorithms

  • H2O cloud shuts down with some H2O.fail error, while building some kmeans clusters (PUBDEV-1051) (github)
  • GLM:beta constraint does not seem to be working (PUBDEV-1083)
  • GBM - random attack bug (probably because max_after_balance_size is really small) (PUBDEV-1061) (github)
  • GLM: LBFGS objval java lang assertion error (PUBDEV-1042) (github)
  • PCA Cholesky NPE (PUBDEV-921)
  • GBM: H2o returns just 5525 trees, when ask for a much larger number of trees (PUBDEV-860)
  • CM returned by AUC2 doesn't agree with manual-made labels from F1-optimal threshold (HEXDEV-263)
  • AUC: h2o reporting wrong auc on a modified covtype data (PUBDEV-891)
  • GLM: Build model => Predict => Residual deviance/Null deviance different from training/validation metrics (PUBDEV-991)
  • KMeans metrics incomplete (PUBDEV-1029)
  • GLM: Java Assertion Error (PUBDEV-1025)
  • Random forest bug (PUBDEV-1015)
  • A particular random forest model has an empty (training) metric json max_criteria_and_metric_scores (PUBDEV-1001)
  • PCA results exhibit numerical inaccuracies compared to R (PUBDEV-550)
  • DRF: reporting wrong depth for attached dataset (PUBDEV-1006)
  • added missing "names" column name to beta constraints processing (github)
  • Fix balance_classes probability correction consistency between H2O and POJO (github)
  • Fix in GLM scoring - check actual for NaNs as well (github)

#####Python

  • Cannot import_file path=url python interface (PUBDEV-1059)
  • head()/tail() should show labels, rather than number encoding, for enum columns (PUBDEV-1017)
  • h2o.py: for binary response printing transpose and hence wrong cm (PUBDEV-1013)

#####R

  • Broken Summary in R (PUBDEV-1073)
  • h2oR summary: displaying no labels in summary (PUBDEV-1008)
  • R/Python impute bugs (PUBDEV-1055)
  • R: h2o.varimp doubles the print statement (PUBDEV-1068)
  • R: h2o.varimp returns NULL when model has no variable importance (PUBDEV-1078)
  • h2oR: h2o.confusionMatrix(my_gbm, validation=F) should not show a null (PUBDEV-849)
  • h2o.impute doesn't impute (PUBDEV-1024)
  • R: as.h2o cutting entries when trying to import data.frame into H2O (HEXDEV-293)
  • The default names are too long, for an R-datafile parsed to H2O, and needs to be changed (PUBDEV-976)
  • H2o.confusionMatrix: when invoked with threshold gives error (PUBDEV-1010)
  • removing train and adding error messages for valid = TRUE when there are no validation metrics (github)

#####System

  • Download logs is returning the same log file bundle for every node (PUBDEV-1056)
  • ParseSetup is useless and misleading for SVMLight (PUBDEV-994)
  • Fixes bug that was short circuiting the setting of column names (github)

#####Web UI


###Shackleford (0.2.3.5) - 5/1/15

####New Features

#####API

  • Need a /Log REST API to log client-side errors to H2O's log (HEXDEV-291)

#####Python

  • add impute to python interface (github)

#####System

####Enhancements

#####Algorithms

  • GLM: Name to be changed from normalized to standardized in output to be consistent between input/output (PUBDEV-954)
  • GLM: It would be really useful if the coefficient magnitudes are reported in descending order (PUBDEV-923)
  • PUBDEV-536: Limit DL models to 100M parameters (github)
  • PUBDEV-536: Add accurate memory-based admission control for GBM/DRF (github)
  • relax the tolerance a little more...(github)
  • Tree depth correction (github)
  • Comment out duration_in_ms for now, as it's always left at 0 (github)
  • Updated min mem computation for glm (github)
  • GLM update: added lambda search info to scoring history (github)

#####Python

  • python .show() on model and metric objects should match R/Flow as much as possible (HEXDEV-289)
  • GLM model output, details from Python (HEXDEV-95)
  • GBM model output, details from Python (HEXDEV-102)
  • Run GBM from Python (HEXDEV-99)
  • map domain to result from /Frames if needed (github)
  • added confusion matrix to metric output (github)
  • update metrics_base_confusion_matrices() (github)
  • fetch out string_data if type is string (github)

#####R

#####System

#####Web UI

  • Flow: Confusion matrix: good to have consistency in the column and row name (letter) case (PUBDEV-971)
  • Run GBM Multinomial from Flow (HEXDEV-111)
  • Run GBM Regression from Flow (HEXDEV-112)
  • Sort model types in alphabetical order in Flow (PUBDEV-1011)

####Bug Fixes

The following changes are to resolve incorrect software behavior:

#####Algorithms

  • GLM: Model output display issues (PUBDEV-956)
  • h2o.glm: ignores validation set (PUBDEV-958)
  • DRF: reports wrong number of leaves in a summary (PUBDEV-930)
  • h2o.glm: summary of a prediction frame gives na's as labels (PUBDEV-959)
  • GBM: reports wrong max depth for a binary model on german data (PUBDEV-839)
  • GLM: Confusion matrix missing in R for binomial models (PUBDEV-950) (github)
  • GLM: On airlines(40g) get ArrayIndexOutOfBoundsException (PUBDEV-967)
  • GLM: Build model => Predict => Residual deviance/Null deviance different from training/validation metrics (PUBDEV-991)
  • Domains returned by GLM for binomial classification problem are integers, but should be mapped to their label (PUBDEV-999)
  • GLM: Validation on non training data gives NaN Res Deviance and AIC (PUBDEV-1005)
  • Confusion matrix has nan's in it (PUBDEV-1000)
  • glm fix: pass model_id from R (was being dropped) (github)

#####Python

#####R

  • h2o.confusionMatrix for binary response gives not-found thresholds (PUBDEV-957)
  • GLM: model_id param is ignored in R (PUBDEV-1007)
  • h2o.confusionmatrix: mixing cases(letter) for categorical labels while printing multinomial cm (PUBDEV-996)
  • fix the dupe thresholds error (github)
  • extra arg in impute example (github)
  • fix missing param data (github)

#####System

  • Builds : Failing intermittently due to java.lang.StackOverflowError (PUBDEV-972)
  • Get H2O cloud hang with NPE and roll up stats problem, when click on build model glm from flow, on laptop after running a few python demos and R scripts (PUBDEV-963)

#####Web UI

  • Flow :=> Airlines dataset => Build models glm/gbm/dl => water.DException$DistributedException: from /172.16.2.183:54321; by class water.fvec.RollupStats$ComputeRollupsTask; class java.lang.NullPointerException: null (PUBDEV-603)
  • Flow => Preview Pojo => collapse not working (PUBDEV-977)
  • Flow => Any algorithm => Select response => Select Add all for ignored columns => Try to unselect some from ignored columns => Build => Response column IsDepDelayed not found in frame: allyears_1987_2013.hex. (PUBDEV-978)
  • Flow => ROC curve select something on graph => Table is displayed for selection => Collapse ROC curve => Doesn't collapse table, collapses only graph (PUBDEV-1003)

###Severi (0.2.2.16) - 4/29/15

####New Features

#####Python

####Enhancements

#####Algorithms

  • Use partial-sum version of mat-vec for DL POJO (PUBDEV-936)
  • Always store weights and biases for DLTest Junit (github)
  • Show the DL model size in the model summary (github)
  • Remove assertion in hot loop (github)
  • Rename ADMM to IRLSM (github)
  • Added no intercept option to glm (github)
  • Code cleanup. Moved ModelMetricsPCAV3 out of H2O-algos (github)
  • Improve DL model checkpoint logic (github)
  • Updated glm output (github)
  • Renamed normalized coefficients to standardized coefficients in glm output (github)
  • Use proper tie breaking for NB (github)
  • Add check that DL parameters aren't modified by model training (github)
  • Reduce tolerances (github)
  • If no observations of a response level and prediction is numeric, assume it is drawn from a standard normal distribution (mean 0, standard deviation 1). Add validation test with split frame for naive Bayes (github)

#####Python

  • replaced H2OFrame.send_frame() calls with cbind Exprs so that lazy evaluation is enforced (github)
  • change default xmx/s behavior of h2o.init() (github)
  • better handling of single row return and print (github)

#####R

  • Added interpolation to quantile to match R type 7 (github)
  • Removed and tidied if's in quantile.H2OFrame since it now uses match.arg (github)
  • Connected validation dataset to glm in R (github)
  • Removing h2o.aic from seealso link (doesn't exist) and updating documentation (github)

#####System

  • Add number of rows (per node) to ChunkSummary (PUBDEV-938) (github)
  • allow nrow as alias for count in groupby (github)
  • Only launches task to fill in SVM zeros if the file is SVM (github)
  • Adds more log traces to track progress of post-ingest actions (github)
  • Adds svm as a file extension to the hex name cleanup (github)

#####Web UI

  • Flow: Inspect data => Round decimal points to 1 to be consistent with h2o1 (PUBDEV-453)
  • Setup POJO download method for Flow (PUBDEV-909)
  • Pretty-print POJO preview in flow (PUBDEV-940)
  • Flow: It would be good if 'get predictions' also shows the data (PUBDEV-883)
  • GBM model output, details in Flow (HEXDEV-103)
  • Display a linked data table for each visualization in Flow (PUBDEV-318)
  • Run GBM binomial from Flow (needs proper CM) (PUBDEV-943)

####Bug Fixes

#####Algorithms

  • GLM: results from model and prediction on the same dataset do not match (PUBDEV-922)
  • GLM: when select AUTO as solver, for prostate, glm gives all zero coefficients (PUBDEV-916)
  • Large (DL) models cause oversize issues during serialization (PUBDEV-941)
  • Fixed name change for ADMM (github)

#####API

#####Python

  • H2OVec.row_select(H2OVec) fails on case where only 1 row is selected (PUBDEV-948)
  • fix pyunit (github)

#####R

  • R: Parse of zip file fails, Summary fails on citibike data (PUBDEV-835)
  • h2o.performance reports a different Null Deviance than the model object for the same dataset (PUBDEV-816)
  • h2o.glm: no example on h2o.glm help page (PUBDEV-962)
  • H2O R: Confusion matrices from R still confused (PUBDEV-904) (github)
  • R: h2o.confusionMatrix("H2OModel", ...) extra parameters not working (PUBDEV-953) (github)
  • h2o.confusionMatrix for binomial gives not-found thresholds on S3 -airlines 43g (PUBDEV-957)
  • H2O summary quartiles outside tolerance of (max-min)/1000 (PUBDEV-671)
  • fix space headers issue from R (was not url-encoding the column strings) (github)
  • R CMD fixes (github)
  • Fixed broken R interface - make validation_frame non-mandatory (github)

#####Sparkling Water

  • Sparkling water : #UDP-Recv ERRR: UDP Receiver error on port 54322: java.lang.ArrayIndexOutOfBoundsException (PUBDEV-311)

#####System

  • Mapr 3.1.1 : Memory is not being allocated for what is asked for instead the default is what cluster gets (PUBDEV-937)
  • GLM: AIOOB with msg '-14' at water.RPC$2.compute2(RPC.java:593) (PUBDEV-917)
  • h2o.glm: model summary listing same info twice (PUBDEV-915)
  • Parse: Detect and reject UTF-16 encoded files (HEXDEV-285)
  • DataInfo Row categorical encoding AIOOBE (HEXDEV-283)
  • Fix POJO Preview exception (github)
  • Fix NPE in ChunkSummary (github)
  • fix global name collision (github)

###Severi (0.2.2.15) - 4/25/15

####New Features

#####Python

  • added min, max, sum, median for H2OVecs and respective pyunit (github)
  • added min(), max(), and sum() functionality on H2OFrames and respective pyunits (github)

#####Web UI

####Enhancements

#####Algorithms

  • K means output clean up (HEXDEV-187)
  • Add FNR/TNR/FPR/TPR to threshold tables, remove recall, specificity (github)
  • Add accessor for variable importances for DL (github)
  • Relax CM error tolerance for F1-optimal threshold now that AUC2 doesn't necessarily create consistent thresholds with its own CMs. (github)
  • Added scoring history to glm (github)
  • Added model summary to glm (github)
  • Add flag to support reading data from S3N (github)
  • Added degrees of freedom to GLM metrics schemas (github)
  • Allow DL scoring_history to be unlimited in length (github)
  • add plotting for binomial models (github)
  • Ignore certain parameters that are not applicable (class balancing, max CM size, etc.) (github)
  • Updated glm scoring, fill training/validation metrics in model output (github)
  • Rename gbm loss parameter to distribution (github)
  • Fix GBM naming: loss -> distribution (github)
  • GLM LBFGS update (github)
  • na.rm for quantile is default behavior (github)
  • GLM update: enabled max_predictors in REST, updated lbfgs (github)
  • Remove keep_cross_validation_splits for now from DL (github)
  • Get rid of sigma in the model metrics, instead show r2 (github)
  • Don't show score_every_iteration for DL (github)
  • Don't print too large confusion matrices in Tree models (github)

#####API

#####Python

  • Python client should check that version number == server version number (PUBDEV-799)
  • Add asfactor for month (github)
  • In Expr.show(), only show 10 or fewer rows; remove locate from runit test because the full path is used (github)
  • change nulls to () (github)
  • sigma is no longer part of ModelMetricsRegressionV3 (github)

#####R

#####System

  • Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
  • Rapids: require a (put "key" %frame) (PUBDEV-868)
  • Need pojo base model jar file embedded in h2o-dev via build process (PUBDEV-780) (github)
  • Make .json the default (PUBDEV-619) (github)
  • Rename class for clarification (github)
  • Classifies all NA columns as numeric. Also improves preview sampling accuracy by trimming partial lines at end of chunk. (github)
  • Implements sampling of files within the ParseSetup preview. This prevents poor column type guesses caused by sampling only the beginning of a file. (github)
  • Rename fields drop_na20_col (github)
  • allow for many deletes as final statements in a block (github)
  • rename initF -> init_f, dropNA20Cols -> drop_na20_cols (github)
  • Removed tweedie param (github)
  • thresholds -> threshold (github)
  • JSON of TwoDimTable with all null values in the first column (no row headers) no longer has an empty column of "" or nulls. (github)
  • move H2O_Load, fix all the timezone functions (github)
  • Add extra verbose printout in case Frames don't match identically (github)
  • allow delayed column lookup (github)
  • add mixed type list (github)
  • Added WaterMeterIo to count persist info (github)
  • Remove special setChunkSize code in HDFS and NFS file vec (github)
  • add check for Frame on string parse (github)
  • Disable Memory Cleaner (github)
  • Handle '<' chars in Keys when swapping (github)
  • allow for colnames in slicing (github)
  • Adjusts parse type detection. If column is all one string value, declare it an enum (github)
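
Two of the parse changes above describe column-type rules. A column that is entirely NA is classified as numeric, and a column holding one repeated string value is declared an enum. The sketch below encodes only those two rules plus a plain numeric check. `guess_column_type` is hypothetical, the set of NA tokens is assumed, and the `"undecided"` fallback stands in for the further heuristics a real parser applies.

```python
NA_TOKENS = (None, "", "NA")  # assumed set of missing-value tokens


def _is_number(token):
    """True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def guess_column_type(tokens):
    """Hypothetical column-type guess mirroring two rules from the notes above."""
    present = [t for t in tokens if t not in NA_TOKENS]
    if not present:
        # Rule: a column that is entirely missing is classified as numeric.
        return "numeric"
    if all(_is_number(t) for t in present):
        return "numeric"
    if len(set(present)) == 1:
        # Rule: a column holding a single repeated string value becomes an enum.
        return "enum"
    # Placeholder: a real detector applies further heuristics here.
    return "undecided"


print(guess_column_type(["NA", "", None]))      # numeric
print(guess_column_type(["yes", "yes", "NA"]))  # enum
print(guess_column_type(["1.5", "2", ""]))      # numeric
```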

#####Web UI

####Bug Fixes

#####Algorithms

  • GLM: lasso (i.e., alpha = 1) seems to be giving wrong answers (PUBDEV-769)
  • AUC: h2o reports .5 auc when actual auc is 1 (PUBDEV-879)
  • h2o.glm: No output displayed for the model (PUBDEV-858)
  • h2o.glm model object output needs a fix (PUBDEV-815)
  • h2o.glm model object says : fill me in GLMModelOutputV2; I think I'm redundant [1] FALSE (PUBDEV-765)
  • GLM : Build GLM Model => Java Assertion error (PUBDEV-686)
  • GLM :=> Progress shows -100% (PUBDEV-861)
  • GBM: Negative sign missing in initF value for ad dataset (PUBDEV-880)
  • K-Means takes a validation set but doesn't use it (PUBDEV-826)
  • Absolute_MCC is NaN (sometimes) (PUBDEV-848) (github)
  • GBM: A proper error msg should be thrown when the user sets the max depth =0 (PUBDEV-838) (github)
  • DRF Regression Assertion Error (PUBDEV-824)
  • h2o.randomForest: if h2o is not returning the mse for the 0th tree then it should not be reported in the model object (PUBDEV-811)
  • GBM: Got exception class java.lang.AssertionError with msg null java.lang.AssertionError at hex.tree.gbm.GBM$GBMDriver$GammaPass.map (PUBDEV-693)
  • GBM: Got exception class java.lang.AssertionError with msg null java.lang.AssertionError at hex.ModelMetricsMultinomial$MetricBuildMultinomial.perRow (HEXDEV-248)
  • GBM get java.lang.AssertionError: Coldata 2199.0 out of range C17:5086.0-19733.0 step=57.214844 nbins=256 isInt=1 (HEXDEV-241)
  • GLM: glmnet objective function better than h2o.glm (PUBDEV-749)
  • GLM: get AIOOB:-36 at hex.glm.GLMTask$GLMIterationTask.postGlobal(GLMTask.java:733) (PUBDEV-894) (github)
  • Fixed glm behavior in case no rows are left after filtering out NAs (github)
  • Fix memory leak in validation scoring in K-Means (github)
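
For reference when reading the AUC fix (PUBDEV-879): AUC equals the probability that a randomly chosen positive row scores higher than a randomly chosen negative one. Perfectly separated predictions must therefore give 1.0, not 0.5. Below is a minimal pairwise sketch. `exact_auc` is a hypothetical name, and its quadratic loop is for illustration only; a production implementation would typically sort or bin the scores.

```python
def exact_auc(actual, prob):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(actual, prob) if y == 1]
    neg = [p for y, p in zip(actual, prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(exact_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0, perfectly separated
print(exact_auc([0, 1], [0.5, 0.5]))                  # 0.5, a single tie
```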

#####API

  • API unification: DataFrame should be able to accept URI referencing file on local filesystem (PUBDEV-709) (github)

#####Python

#####R

#####System

  • MapR FS loads are too slow (PUBDEV-927)
  • ensure that HDFS works from Windows (PUBDEV-812)
  • Summary on a time column throws "'null' is not an object (evaluating 'column.domain[level.index]')" in Flow (PUBDEV-867)
  • Parse: An enum column gets parsed as int for the attached file (PUBDEV-606)
  • Parse => 40Mx1_uniques => class java.lang.RuntimeException (PUBDEV-729)
  • if there are fewer than 5 unique values in a dataset column, mins/maxs reports e+308 values (PUBDEV-150) (github)
  • Sparkling water - DataFrame[T_UUID] to SchemaRDD[StringType] (PUBDEV-771)
  • Sparkling water - DataFrame[T_NUM(Long)] to SchemaRDD[LongType] (PUBDEV-767)
  • Sparkling water - DataFrame[T_ENUM] to SchemaRDD[StringType] (PUBDEV-766)
  • Inconsistency in row and col slicing (HEXDEV-265) (github)
  • rep_len expects literal length only (HEXDEV-268) (github)
  • cbind and = don't work within a single rapids block (HEXDEV-237)
  • Rapids response for c(value) does not have frame key (HEXDEV-252)
  • S3 parse takes forever (PUBDEV-876)
  • Parse => Enum unification fails in multi-node parse (PUBDEV-718) (github)
  • All nodes are not getting updated with latest status of each other nodes info (PUBDEV-768)
  • Cluster creation is sometimes rejecting new nodes (post jenkins-master-1128+) (PUBDEV-807)
  • Parse => Multiple files 1 zip/ 1 csv gives Array index out of bounds (PUBDEV-840)
  • Parse => failed for X5MRows6KCols ==> OOM => Cluster dies (PUBDEV-836)
  • /frame/foo pagination weirded out (HEXDEV-277) (github)
  • Removed code that flipped enums to strings (github)
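
PUBDEV-150 concerns the mins/maxs summary when a column has fewer than 5 unique values. Values around e+308 match the magnitude of the largest double (about 1.8e+308). That suggests, though the note does not confirm it, that unfilled slots were padded with sentinels. Below is a minimal sketch that returns shorter lists instead of padding. It assumes the summary tracks distinct values; `extreme_values` is hypothetical.

```python
import heapq
import math


def extreme_values(column, k=5):
    """Return up to k smallest and k largest distinct non-missing values.

    Columns with fewer than k distinct values yield shorter lists; nothing is
    padded with sentinels such as sys.float_info.max (about 1.8e+308).
    """
    distinct = {v for v in column if v is not None and not math.isnan(v)}
    mins = heapq.nsmallest(k, distinct)
    maxs = heapq.nlargest(k, distinct)
    return mins, maxs


mins, maxs = extreme_values([3.0, 1.0, 3.0, float("nan"), None, 2.0])
print(mins)  # [1.0, 2.0, 3.0]
print(maxs)  # [3.0, 2.0, 1.0]
```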

#####Web UI

  • Flow: It would be really useful to have the mse plots back in GBM (PUBDEV-889)
  • State change in Flow is not fully validated (PUBDEV-919)
  • Flows : Not able to load saved flows from hdfs (PUBDEV-872)
  • Save Function in Flow crashes (PUBDEV-791) (github)
  • Flow: should throw a proper error msg when the user-supplied response has more categories than the algo can handle (PUBDEV-866)
  • Flow display of a summary of a column with all missing values fails. (HEXDEV-230)
  • Split frame UI improvements (HEXDEV-275)
  • Flow : Decimal point precisions to be consistent to 4 as in h2o1 (PUBDEV-844)
  • Flow: Prediction frame is outputting junk info (PUBDEV-825)
  • EC2 => Cluster of 16 nodes => Water Meter => shows blank page (PUBDEV-831)
  • Flow: Predict - "undefined is not an object (evaluating prediction.thresholds_and_metric_scores.name)" (PUBDEV-559)
  • Flow: inspect getModel for PCA returns error (PUBDEV-610)
  • Flow, RF: Can't get Predict results; "undefined is not an object (evaluating prediction.confusion_matrices.length)" (PUBDEV-695)
  • Flow, GBM: getModel is broken -Error processing GET /3/Models.json/gbm-b1641e2dc3-4bad-9f69-a5f4b67051ba null is not an object (evaluating source.length) (PUBDEV-800)

###Severi (0.2.2.1) - 4/10/15

####New Features

#####R

####Enhancements

#####Algorithms

  • POJO generation: GBM (PUBDEV-713)
  • POJO generation: DRF (PUBDEV-714)
  • Compute and Display Hit Ratios (PUBDEV-630) (github)
  • Add DL POJO scoring (PUBDEV-585)
  • Allow validation dataset for AutoEncoder (PUBDEV-581)
  • PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
  • increase tolerance to 2e-3 (was 1e-3; failed with 0.001647 relative difference) (github)
  • change tolerance to 1e-3 (github)
  • Add option to export weights and biases to REST API / Flow. (github)
  • Add scree plot for H2O PCA models and fix Runit test. (github)
  • Remove quantiles from the model builders list. (github)
  • GLM update: added row filtering argument to line search task, fixed issues with dfork/asyncExec (github)
  • Updated rho-setting in GLM. (github)
  • No threshold 0.5; use the default (max F1) instead (github)
  • GLM update: updated initialization, NA row filtering; default lambda is now empty and will be picked based on the fraction of lambda_max. (github)
  • Updated ADMM solver. (github)
  • Added makeGLMModel call. (github)
  • Start with classification error NaN at t=0 for DL, not with 1. (github)
  • Relax DL POJO relative tolerance to 1e-2. (github)
  • Override nfeatures() method in DLModelOutput. (github)
  • Renaming of fields in GLM (github)
  • GLM: Take out Balance Classes (PUBDEV-795)

#####API

  • schema metadata for Map fields should include the key and value types (PUBDEV-753) (github)
  • schema metadata should include the superclass (PUBDEV-754)
  • rest api naming convention: n_folds vs ntrees (PUBDEV-737)
  • Create REST Endpoint for exposing .java pojo models (PUBDEV-778)

#####Python

  • Run GLM from Python (including LBFGS) (HEXDEV-92)
  • added H2OFrame show(), as_list(), and slicing pyunits (github)
  • changed solver parameter to "L_BFGS" (github)
  • added multidimensional slicing of H2OFrames and Exprs. (github)
  • add h2o.groupby to python interface (github)
  • added H2OModel.confusionMatrix() to return confusion matrix of a prediction (github)

#####R

  • PUBDEV-578, PUBDEV-541, PUBDEV-566: The R client now sends the data frame column names and data types to ParseSetup; the R client can get column names from a parsed frame or a list; and client requests for column data types are respected (github)
  • R: Cannot create new columns through R (PUBDEV-571)
  • H2O-R: it would be more useful if h2o.confusion matrix reports the actual class labels instead of [,1] and [,2] (PUBDEV-553)
  • Support both multinomial and binomial CM (github)

#####System

  • Flow: Standardize max_iters/max_iterations parameters (PUBDEV-447) (github)
  • Add ERROR logging level for too-many-retries case (PUBDEV-146) (github)
  • Simplify checking of cluster health. Just report the status immediately. (github)
  • reduce timeout (github)
  • strings can have ' or " beginning (github)
  • Throw a validation error in flow if any training data cols are non-numeric (github)
  • Add getHdfsHomeDirectory(). (github)
  • Added --verbose. (github)

#####Web UI

  • PUBDEV-707: nice algo names in the Flow dropdown (full word names) (github)
  • Unbreak Flow's ConfusionMatrix display. (github)
  • POJO generation: DL (PUBDEV-715)

####Bug Fixes

#####Algorithms

  • GLM : Build GLM model with nfolds brings down the cloud => FATAL: unimplemented (PUBDEV-731) (github)
  • DL : Build DL Model => FATAL: unimplemented: n_folds >= 2 is not (yet) implemented => SHUTSDOWN CLOUD (PUBDEV-727) (github)
  • GBM => Build GBM model => No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-723)
  • GBM: When run with loss = auto with a numeric column get- error :No enum constant hex.tree.gbm.GBMModel.GBMParameters.Family.AUTO (PUBDEV-708) (github)
  • gbm: does not complain when min_rows > dataset size (PUBDEV-694) (github)
  • GLM: reports wrong residual degrees of freedom (PUBDEV-668)
  • H2O dev reports less accurate aucs than H2O (PUBDEV-602)
  • GLM : Build GLM model fails => ArrayIndexOutOfBoundsException (PUBDEV-601)
  • divide by zero in modelmetrics for deep learning (PUBDEV-568)
  • GBM: when only a train set is provided, reports a 0th-tree mse value for the validation set that differs from the train set (PUBDEV-561)
  • GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)
  • GLM : Build Model fails with Array Index Out of Bound exception (PUBDEV-454) (github)
  • Custom Functions don't work in apply() in R (PUBDEV-436)
  • GLM failure: got NaNs and/or Infs in beta on airlines (PUBDEV-362)
  • MetricBuilderMultinomial.perRow AssertionError while running GBM (HEXDEV-240)
  • Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
  • DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226) (github)
  • AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
  • glm pyunit intermittent failure (HEXDEV-199)
  • Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
  • get rid of nfolds= param since it's not supported in GLM yet (github)
  • Fixed degrees of freedom (off by 1) in glm, added test. (github)
  • GLM fix: fix filtering of rows with NAs and fix in sparse handling. (github)
  • Fix GLM job fail path to call Job.fail(). (github)
  • Full AUC computation, bug fixes (github)
  • Fix ADMM for upper/lower bounds. (updated rho settings + update u-vector in ADMM for intercept) (github)
  • Few glm fixes (github)
  • DL : KDD Algebra data set => Build DL model => ArrayIndexOutOfBoundsException (PUBDEV-696)
  • GBM: Dev vs H2O for depth 5, minrow=10, on prostate, give different trees (PUBDEV-759)
  • GBM param min_rows doesn't throw exception for negative values (PUBDEV-697)
  • GBM : Build GBM Model => Too many levels in response column! (java.lang.IllegalArgumentException) => Should display proper error message (PUBDEV-698)
  • GBM: Got exception 'class java.lang.AssertionError', with msg 'Something is wrong with GBM trees since returned prediction is Infinity' (PUBDEV-722)
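
Two GLM entries above (PUBDEV-668 and the off-by-one fix) concern degrees of freedom. In the textbook unpenalized case with an intercept, the null model estimates one parameter, and the fitted model estimates the intercept plus each coefficient. That gives null df = n - 1 and residual df = n - p, where p counts the intercept. Below is a minimal sketch; `glm_degrees_of_freedom` is hypothetical, and counting only nonzero coefficients is an assumption aimed at lasso-style fits.

```python
def glm_degrees_of_freedom(n_obs, coefficients, intercept=True):
    """Null and residual degrees of freedom for a fitted GLM.

    `coefficients` excludes the intercept. Counting only nonzero coefficients is
    an assumption aimed at penalized (e.g. lasso) fits.
    """
    n_params = sum(1 for b in coefficients if b != 0.0) + (1 if intercept else 0)
    null_df = n_obs - 1 if intercept else n_obs
    residual_df = n_obs - n_params
    return null_df, residual_df


print(glm_degrees_of_freedom(100, [0.4, 0.0, -1.2]))  # (99, 97)
```

Forgetting to count the intercept on either side is the classic way to end up off by one.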

#####API

  • Cannot adapt numeric response to factors made from numbers (PUBDEV-620)
  • not specifying response_column gets NPE (deep learning build_model()) I think other algos might have same thing (PUBDEV-131)
  • NPE response has null msg, exception_msg and dev_msg (HEXDEV-225)
  • Flow :=> Save Flow => On Mac and Windows 8.1 => NodePersistentStorage failure while attempting to overwrite (?) a flow (HEXDEV-202) (github)
  • the can_build field in ModelBuilderSchema needs values[] to be set (PUBDEV-755)
  • value field in the field metadata isn't getting serialized as its native type (PUBDEV-756)

#####Python

#####R

#####System

  • key type failure should fail the request, not the cloud (PUBDEV-739) (github)
  • Parse => Import Medicare supplier file => Parse = > Illegal argument for field: column_names of schema: ParseV2: string and key arrays' values must be quoted, but the client sent: " (PUBDEV-719)
  • Overwriting a constant vector with strings fails (PUBDEV-702)
  • H2O - gets stuck while calculating quantile,no error msg, just keeps running a job that normally takes less than a sec (PUBDEV-685)
  • Summary and quantile on a column with all missing values should not throw an exception (PUBDEV-673) (github)
  • View Logs => class java.lang.RuntimeException: java.lang.IllegalArgumentException: File /home2/hdp/yarn/usercache/neeraja/appcache/application_1427144101512_0039/h2ologs/h2o_172.16.2.185_54321-3-info.log does not exist (PUBDEV-600)
  • Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)
  • Parse: Numbers completely parsed wrong (PUBDEV-574)
  • Flow: converting a column to enum while parsing does not work (PUBDEV-566)
  • Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540) (github)
  • toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)
  • Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
  • The quote stripper for column names should report when the stripped chars are not the expected quotes (PUBDEV-424)
  • import directory with large files,then Frames..really slow and disk grinds. Files are unparsed. Shouldn't be grinding (PUBDEV-98)
  • NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
  • h2o.exec won't be supported (github)
  • fixed import issue (github)
  • fixed init param (github)
  • fix repeat as.factor NPE (github)
  • startH2O set to False in init (github)
  • hang on glm job removal (PUBDEV-726)
  • Flow - changed column types need to be reflected in parsed data (HEXDEV-189)
  • water.DException$DistributedException while running kmeans in multinode cluster (PUBDEV-691)
  • Frame inspection prior to file parsing, corrupts parsing (PUBDEV-425)

#####Web UI

  • Flow, DL: Need better fail message if "Autoencoder" and "use_all_factor_levels" are both selected (PUBDEV-724)
  • When select AUTO while building a gbm model get ERROR FETCHING INITIAL MODEL BUILDER STATE (PUBDEV-595)
  • Flow : Build h2o-dev-0.1.17.1009 : Building GLM model gives java.lang.ArrayIndexOutOfBoundsException: (PUBDEV-205) (github)
  • Flow:Summary on flow broken for a long time (PUBDEV-785)

###Serre (0.2.1.1) - 3/18/15

####New Features

#####Algorithms

#####Python

#####R

#####System

#####Web UI

####Enhancements

#####Algorithms

  • Display GLM coefficients only if available (PUBDEV-466)
  • Add random chance line to RoC chart (HEXDEV-168)
  • Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
  • Use getRNG for Dropout (github)
  • PUBDEV-598: Add tests for determinism of RNGs (github)
  • PUBDEV-598: Implement Chi-Square test for RNGs (github)
  • Add DL model output toString() (github)
  • Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
  • Print number of categorical levels once we hit >1000 input neurons. (github)
  • Updated the loss behavior for GBM. When loss is set to AUTO and the response is an integer with 2 levels, bernoulli (rather than gaussian) behavior is chosen. As a result, the do_classification flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use as.factor() on their response to get bernoulli behavior. The score_each_iteration flag has been removed as well. (github)
  • Fully remove _convert_to_enum in all algos (github)
  • Port MissingValueInserter EndPoint to h2o-dev. (PUBDEV-465)
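
The RNG work above (PUBDEV-598) mentions a chi-square test. The notes do not say how H2O structured it. The sketch below is the textbook Pearson goodness-of-fit test for uniform draws, binning values in [0, 1) into 10 equal buckets; `chi_square_uniformity` is a hypothetical name.

```python
import random


def chi_square_uniformity(samples, n_bins=10):
    """Pearson chi-square statistic for samples expected to be uniform on [0, 1)."""
    counts = [0] * n_bins
    for x in samples:
        # Clamp so a value of exactly 1.0 would still land in the last bin.
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    expected = len(samples) / n_bins
    return sum((observed - expected) ** 2 / expected for observed in counts)


# 16.919 is the 5% critical value of the chi-square distribution with 9 degrees
# of freedom (10 bins - 1); a larger statistic suggests the draws are not uniform.
CRITICAL_9_DF_5PCT = 16.919

rng = random.Random(42)
stat = chi_square_uniformity([rng.random() for _ in range(10_000)])
print(f"chi-square = {stat:.2f}, looks uniform: {stat < CRITICAL_9_DF_5PCT}")
```

Seeding the generator keeps the check reproducible, the same concern as the determinism tests listed above. Even a perfect generator fails a 5% test about one time in twenty, so a single run is evidence rather than proof.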

#####API

#####Python

  • added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
  • Make H2OVec.levels() return the levels (github)
  • H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), H2OVec.isna() additions. demo pyunit addition (github)

#####System

  • Customize H2O web UI port (PUBDEV-483)
  • Make parse setup interactive (PUBDEV-532)
  • Added --verbose (github)
  • Adds some H2OParseExceptions. Removes all H2O.fail in parse (no parse issues should cause a fail) (github)
  • Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)

#####Web UI

  • Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
  • Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
  • 'Run' button selects next cell after running
  • ModelMetrics by model category: Clustering (PUBDEV-416)
  • ModelMetrics by model category: Regression (PUBDEV-415)
  • ModelMetrics by model category: Multinomial (PUBDEV-414)
  • ModelMetrics by model category: Binomial (PUBDEV-413)
  • Add ability to select and delete multiple models (github)
  • Add ability to select and delete multiple frames (github)
  • Flows now stop running when an error occurs
  • Print full number of mismatches during POJO comparison check. (github)
  • Make Grid multi-node safe (github)
  • Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)

####Bug Fixes

#####Algorithms

  • GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
  • GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
  • GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
  • GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
  • GBM predict fails without response column (PUBDEV-478)
  • GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
  • PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
  • KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
  • Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
  • PUBDEV-580: Fix some numerical edge cases (github)
  • Fix two missing float -> double conversion changes in tree scoring. (github)
  • Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
  • Old GLM Parameters Missing (PUBDEV-431)

#####API

  • SplitFrame on String column produces C0LChunk instead of CStrChunk (PUBDEV-468)
  • Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
  • Response from /ModelBuilders doesn't conform to standard error json shape when there are errors (HEXDEV-121) (github)

#####Python

  • fix python syntax error (github)
  • Fixes handling of None in python for a returned na_string. (github)

#####R

  • R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
  • h2o.confusionmatrices does not work (PUBDEV-547)
  • How do I convert an enum column back to integer/double from R? (PUBDEV-546)
  • Summary in R is faulty (PUBDEV-539)
  • R: as.h2o should preserve R data types (PUBDEV-578)
  • NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • Custom Functions don't work in apply() in R (PUBDEV-436)
  • got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
  • H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)
  • R-H2O Managing Memory in a loop (PUB-1125)
  • h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
  • H2O-R not showing meaningful error msg

#####System

  • Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
  • 3 jvms, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
  • Not able to start h2o on hadoop (PUBDEV-487)
  • one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
  • Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
  • The NY0 parse rule, in summary. Doesn't look like it's counting the 0's as NAs like h2o (PUBDEV-154)
  • 0 / Y / N parsing (PUBDEV-229)
  • NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
  • Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
  • Flow : Import file 15M rows 2.2 Cols => Parse => Error fetching job on UI =>Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
  • Flow : Build GLM Model => Family tweedie => class 'hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization' (PUBDEV-211)
  • Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
  • Check reproducibility on multi-node vs single-node (PUBDEV-557)
  • Parse : After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)

#####Web UI

  • Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
  • Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
  • Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
  • GBM Model : Params in flow show two times (PUBDEV-440)
  • Flow multinomial confusion matrix visualization (HEXDEV-204)
  • Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
  • Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
  • [MapR] unable to give hdfs file name from Flow (PUBDEV-409)

###Selberg (0.2.0.1) - 3/6/15

####New Features

#####Algorithms

#####Python

#####R

#####System

#####Web UI

####Enhancements

The following changes are improvements to existing features (including changed default values):

#####Algorithms

  • Display GLM coefficients only if available (PUBDEV-466)
  • Add random chance line to RoC chart (HEXDEV-168)
  • Allow validation dataset for AutoEncoder (PUBDEV-581)
  • Speed up DLSpiral test. Ignore Neurons test (MatVec) (github)
  • Use getRNG for Dropout (github)
  • PUBDEV-598: Add tests for determinism of RNGs (github)
  • PUBDEV-598: Implement Chi-Square test for RNGs (github)
  • PUBDEV-580: Add log loss to binomial and multinomial model metric (github)
  • Add DL model output toString() (github)
  • Add LogLoss to MultiNomial ModelMetrics (PUBDEV-580)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)
  • Print number of categorical levels once we hit >1000 input neurons. (github)
  • Updated the loss behavior for GBM. When loss is set to AUTO and the response is an integer with 2 levels, bernoulli (rather than gaussian) behavior is chosen. As a result, the do_classification flag is no longer necessary in Flow, since the loss completely specifies the desired behavior, and R users no longer need to use as.factor() on their response to get bernoulli behavior. The score_each_iteration flag has been removed as well. (github)
  • Fully remove _convert_to_enum in all algos (github)
  • Add DL POJO scoring (PUBDEV-585)
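
Several entries above add log loss to the binomial and multinomial metrics (PUBDEV-580). Log loss is the mean negative log of the probability assigned to each row's true class, and the binomial case is simply two classes. Below is a minimal sketch. The clipping constant `eps` is an assumption, not H2O's documented value.

```python
import math


def log_loss(actual, probs, eps=1e-15):
    """Mean negative log-likelihood of the true class.

    actual: list of class indices; probs: per-row lists of class probabilities.
    Probabilities are clipped to [eps, 1 - eps] so a confident miss stays finite.
    """
    total = 0.0
    for y, row in zip(actual, probs):
        p = min(max(row[y], eps), 1 - eps)
        total -= math.log(p)
    return total / len(actual)


# Binomial is the two-class case: each row lists [P(class 0), P(class 1)].
print(log_loss([1, 0], [[0.2, 0.8], [0.6, 0.4]]))  # about 0.367
```

Without clipping, a row whose true class received probability 0 would make the metric infinite. Numerical edge cases like this are what the PUBDEV-580 fix entries in this release deal with.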

#####API

#####Python

  • added H2OFrame.setNames(), H2OFrame.cbind(), H2OVec.cbind(), h2o.cbind(), and pyunit_cbind.py (github)
  • Make H2OVec.levels() return the levels (github)
  • H2OFrame.dim(), H2OFrame.append(), H2OVec.setName(), H2OVec.isna() additions. demo pyunit addition (github)

#####R

  • PUBDEV-578, PUBDEV-541, PUBDEV-566: The R client now sends the data frame column names and data types to ParseSetup; the R client can get column names from a parsed frame or a list; and client requests for column data types are respected (github)

#####System

  • Customize H2O web UI port (PUBDEV-483)
  • Make parse setup interactive (PUBDEV-532)
  • Added --verbose (github)
  • Adds some H2OParseExceptions. Removes all H2O.fail in parse (no parse issues should cause a fail) (github)
  • Allows parse to specify check_headers=HAS_HEADERS, but not provide column names (github)
  • Port MissingValueInserter EndPoint to h2o-dev (PUBDEV-465)

#####Web UI

  • Add 'Clear cell' and 'Run all cells' toolbar buttons (github)
  • Add 'Clear cell' and 'Clear all cells' commands (PUBDEV-493) (github)
  • 'Run' button selects next cell after running
  • ModelMetrics by model category: Clustering (PUBDEV-416)
  • ModelMetrics by model category: Regression (PUBDEV-415)
  • ModelMetrics by model category: Multinomial (PUBDEV-414)
  • ModelMetrics by model category: Binomial (PUBDEV-413)
  • Add ability to select and delete multiple models (github)
  • Add ability to select and delete multiple frames (github)
  • Flows now stop running when an error occurs
  • Print full number of mismatches during POJO comparison check. (github)
  • Make Grid multi-node safe (github)
  • Beautify the vertical axis labels for Flow charts/visualization (more) (PUBDEV-329)

####Bug Fixes

The following changes resolve incorrect software behavior:

#####Algorithms

  • GBM only populates either MSE_train or MSE_valid but displays both (PUBDEV-350)
  • GBM: train error increases after hitting zero on prostate dataset (PUBDEV-513)
  • GBM : Variable importance displays 0's for response param => should not display response in table at all (PUBDEV-430)
  • Inconsistency in GBM results:Gives different results even when run with the same set of params (HEXDEV-194)
  • GLM : R/Flow ==> Build GLM Model hangs at 4% (PUBDEV-456)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • Flow: GLM - 'model.output.coefficients_magnitude.name' not found, so can't view model (PUBDEV-466)
  • GBM predict fails without response column (PUBDEV-478)
  • GBM: When validation set is provided, gbm should report both mse_valid and mse_train (PUBDEV-499)
  • PCA Assertion Error during Model Metrics (PUBDEV-548) (github)
  • KMeans: Size of clusters in Model Output is different from the labels generated on the training set (PUBDEV-542) (github)
  • divide by zero in modelmetrics for deep learning (PUBDEV-568)
  • AUC reported on training data is 0, but should be 1 (HEXDEV-223) (github)
  • GBM: when only a train set is provided, reports a 0th-tree mse value for the validation set that differs from the train set (PUBDEV-561)
  • PUBDEV-580: Fix some numerical edge cases (github)
  • Fix two missing float -> double conversion changes in tree scoring. (github)
  • Problems during Train/Test adaptation between Enum/Numeric (HEXDEV-229)
  • DRF/GBM balance_classes=True throws unimplemented exception (HEXDEV-226)
  • Flow: HIDDEN_DROPOUT_RATIOS for DL does not show default value (PUBDEV-285)
  • Old GLM Parameters Missing (PUBDEV-431)
  • GBM: Initial mse in bernoulli seems to be off (PUBDEV-515)

#####API

  • SplitFrame on String column produces C0LChunk instead of CStrChunk (PUBDEV-468)
  • Error in node$h2o$node : $ operator is invalid for atomic vectors (PUBDEV-348)
  • Response from /ModelBuilders doesn't conform to standard error json shape when there are errors (HEXDEV-121)

#####Python

  • fix python syntax error (github)
  • Fixes handling of None in python for a returned na_string. (github)

#####R

  • R : Inconsistency - Train set name with and without quotes work but Validation set name with quotes does not work (PUBDEV-491)
  • h2o.confusionmatrices does not work (PUBDEV-547)
  • How do I convert an enum column back to integer/double from R? (PUBDEV-546)
  • Summary in R is faulty (PUBDEV-539)
  • Custom Functions don't work in apply() in R (PUBDEV-436)
  • R: as.h2o should preserve R data types (PUBDEV-578)
  • as.h2o loses track of headers (PUBDEV-541)
  • NPE in GBM Prediction with Sliced Test Data (HEXDEV-207) (github)
  • Import file from R hangs at 75% for 15M Rows/2.2 K Columns (HEXDEV-179)
  • got water.DException$DistributedException and then got java.lang.RuntimeException: Categorical renumber task (HEXDEV-195)
  • h2o.confusionMatrices for multinomial does not work (PUBDEV-577)
  • R: h2o.confusionMatrix should handle both models and model metric objects (PUBDEV-590)
  • H2O-R: as.h2o parses column name as one of the row entries (PUBDEV-591)

#####System

  • Flow: When balance class = F then flow should not show max_after_balance_size = 5 in the parameter listing (PUBDEV-503)
  • 3 jvms, doing ModelMetrics on prostate, class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-495)
  • Not able to start h2o on hadoop (PUBDEV-487)
  • one row (one col) dataset seems to get assertion error in parse setup request (PUBDEV-96)
  • Parse : Import file (move.com) => Parse => First row contains column names => column names not selected (HEXDEV-171) (github)
  • The NY0 parse rule, in summary. Doesn't look like it's counting the 0's as NAs like h2o (PUBDEV-154)
  • 0 / Y / N parsing (PUBDEV-229)
  • NodePersistentStorage gets wiped out when laptop is restarted. (HEXDEV-167)
  • Parse : Parsing random crap gives java.lang.ArrayIndexOutOfBoundsException: 13 (PUBDEV-428)
  • Flow: converting a column to enum while parsing does not work (PUBDEV-566)
  • Parse: Numbers completely parsed wrong (PUBDEV-574)
  • NodePersistentStorage gets wiped out when hadoop cluster is restarted (HEXDEV-185)
  • Parse: Fail gracefully when asked to parse a zip file with different files in it (PUBDEV-540) (github)
  • Building a model and making a prediction accepts invalid frame types (PUBDEV-83)
  • Flow : Import file 15M rows 2.2 Cols => Parse => Error fetching job on UI =>Console : ERROR: Job was not successful Exiting with nonzero exit status (HEXDEV-55)
  • Flow : Build GLM Model => Family tweedie => class 'hex.glm.LSMSolver$ADMMSolver$NonSPDMatrixException', with msg 'Matrix is not SPD, can't solve without regularization' (PUBDEV-211)
  • Flow : Import File : File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
  • Check reproducibility on multi-node vs single-node (PUBDEV-557)
  • Parse: After parsing Chicago crime dataset => Not able to build models or Get frames (PUBDEV-576)

#####Web UI

  • Flow : Build Model => Parameters => shows meta text for some params (PUBDEV-505)
  • Flow: K-Means - "None" option should not appear in "Init" parameters (PUBDEV-459)
  • Flow: PCA - "None" option appears twice in "Transform" list (HEXDEV-186)
  • GBM Model : Params in flow show two times (PUBDEV-440)
  • Flow multinomial confusion matrix visualization (HEXDEV-204)
  • Flow: It would be good if flow can report the actual distribution, instead of just reporting "Auto" in the model parameter listing (PUBDEV-509)
  • Unimplemented algos should be taken out from drop down of build model (PUBDEV-511)
  • [MapR] unable to give hdfs file name from Flow (PUBDEV-409)

###Selberg (0.2.0.1) - 3/6/15

####New Features

#####Web UI

  • Flow: Delete functionality to be available for import files, jobs, models, frames (PUBDEV-241)
  • Implement "Download Flow" (PUBDEV-407)
  • Flow: Implement "Run All Cells" (PUBDEV-110)

#####API

#####System

  • Add a README.txt to the hadoop zip files (github)
  • Build a cdh5.2 version of h2o (github)

####Enhancements

#####Web UI

#####Algorithms

  • Added K-Means scoring (github)
  • Flow: Implement model output for Deep Learning (PUBDEV-118)
  • Flow: Implement model output for GLM (PUBDEV-120)
  • Deep Learning model output (HEXDEV-89, Flow), (HEXDEV-88, Python), (HEXDEV-87, R)
  • Run GLM Binomial from Flow (including LBFGS) (HEXDEV-90)
  • Flow: Display confusion matrices for multinomial models (PUBDEV-397)
  • During PCA, missing values in training data will be replaced with column mean (github)
  • Update parameters for best model scan (github)
  • Change Quantiles to match h2o-1; both Quantiles and Rollups now have the same default percentiles (github)
  • Massive cleanup and removal of old PCA, replacing with quadratically regularized PCA based on alternating minimization algorithm in GLRM (github)
  • Add model run time to DL Model Output (github)
  • Don't gather Neurons/Weights/Biases statistics (github)
  • Only store best model if override_with_best_model is enabled (github)
  • beta_eps added, passing tests changed (github)
  • For GLM, the default value of the max_iters parameter was changed from 1000 to 50.
  • For quantiles, probabilities are displayed.
  • Run Deep Learning Multinomial from Flow (HEXDEV-108)

#####API

  • Expose DL weights/biases to clients via REST call (PUBDEV-344)
  • Flow: Implement notification bar/API (PUBDEV-359)
  • Variable importance data in REST output for GLM (PUBDEV-359)
  • Add extra DL parameters to R API (average_activation, sparsity_beta, max_categorical_features, reproducible) (github)
  • Update GLRM API model output (github)
  • h2o.anomaly missing in R (PUBDEV-434)
  • No method to get enum levels (PUBDEV-432)

#####System

  • Improve memory footprint with latest version of h2o-dev (github)
  • For now, let model.delete() of DL delete its best models too. This allows R code to not leak when only calling h2o.rm() on the main model. (github)
  • Bind both TCP and UDP ports before clustering (github)
  • Round summary row#. Helps with pctiles for very small row counts. Add a test to check for getting close to the 50% percentile on small rows. (github)
  • Increase Max Value size in DKV to 256MB (github)
  • Flow: make parseRaw() do both import and parse in sequence (HEXDEV-184)
  • Remove notion of individual job/job tracking from Flow (PUBDEV-449)
  • Capability to name prediction results Frame in flow (PUBDEV-233)

####Bug Fixes

#####Algorithms

  • GLM binomial prediction failing (PUBDEV-403)
  • DL: Predict with auto encoder enabled gives Error processing error (PUBDEV-433)
  • balance_classes in Deep Learning intermittent poor result (PUBDEV-437)
  • Flow: Building GLM model fails (PUBDEV-186)
  • summary returning incorrect 0.5 quantile for 5 row dataset (PUBDEV-95)
  • GBM missing variable importance and balance-classes (PUBDEV-309)
  • H2O Dev GBM first tree differs from H2O 1 (PUBDEV-421)
  • get glm model from flow fails to find coefficient name field (PUBDEV-394)
  • GBM/GLM build model fails on Hadoop after building 100% => Failed to find schema for version: 3 and type: GBMModel (PUBDEV-378)
  • Parsing KDD wrong (PUBDEV-393)
  • GLM AIOOBE (PUBDEV-199)
  • Flow : Build GLM Model with family poisson => java.lang.ArrayIndexOutOfBoundsException: 1 at hex.glm.GLM$GLMLambdaTask.needLineSearch(GLM.java:359) (PUBDEV-210)
  • Flow : GLM Model Error => Enum conversion only works on small integers (PUBDEV-365)
  • GLM binary response, do_classification=FALSE, family=binomial, prediction error (PUBDEV-339)
  • Epsilon missing from GLM parameters (PUBDEV-354)
  • GLM NPE (PUBDEV-395)
  • Flow: GLM bug (or incorrect output) (PUBDEV-252)
  • GLM binomial on benign.csv gets assertion error in predict (PUBDEV-132)
  • current summary default_pctiles doesn't have 0.001 and 0.999 like h2o1 (PUBDEV-94)
  • Flow: Build GBM/DL Model: java.lang.IllegalArgumentException: Enum conversion only works on integer columns (PUBDEV-213) (github)
  • ModelMetrics on cup98VAL_z dataset has response with many nulls (PUBDEV-214)
  • GBM : Predict model category output/inspect parameters shows as Regression when model is built with do classification enabled (PUBDEV-441)
  • Fix double-precision DRF bugs (github)

#####System

  • Null columnTypes for /smalldata/arcene/arcene_train.data (PUBDEV-406) (github)
  • Flow: Waiting for -1 responses after starting h2o on hadoop cluster of 5 nodes (PUBDEV-419)
  • Parse: airlines_all.csv => Airtime type shows as ENUM instead of Integer (PUBDEV-426) (github)
  • Flow: Typo - "Time" option displays twice in column header type menu in Parse (PUBDEV-446)
  • Duplicate validation messages in k-means output (PUBDEV-305) (github)
  • Fixes Parse so that it returns to supplying generic column names when no column names exist (github)
  • Flow: Import File: File doesn't exist on all the hdfs nodes => Fails without valid message (PUBDEV-313)
  • Flow: Parse => 1m.svm hangs at 42% (HEXDEV-174)
  • Prediction NFE (PUBDEV-308)
  • NPE doing Frame to key before it's fully parsed (PUBDEV-79)
  • h2o_master_DEV_gradle_build_J8 #351 hangs for past 17 hrs (PUBDEV-239)
  • Sparkling water - container exited due to unavailable port (PUBDEV-357)

#####API

  • Flow: Splitframe => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-410) (github)
  • Incorrect dest.type, description in /CreateFrame jobs (PUBDEV-404)
  • space in windows filename on python (PUBDEV-444) (github)
  • Python end-to-end data science example 1 runs correctly (PUBDEV-182)
  • 3/NodePersistentStorage.json/foo/id should throw 404 instead of 500 for 'not-found' (HEXDEV-163)
  • POST /3/NodePersistentStorage.json should handle Content-Type:multipart/form-data (HEXDEV-165)
  • by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: --- Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122 (PUBDEV-92)
  • Sparkling water : val train:DataFrame = prostateRDD => Fails with ArrayIndexOutOfBoundsException (PUBDEV-392)
  • Flow : getModels produces error: Error calling GET /3/Models.json (PUBDEV-254)
  • ddply 'Could not find the operator' (HEXDEV-162) (github)
  • h2o.table AIOOBE during NewChunk creation (HEXDEV-161) (github)
  • Fix warning in h2o.ddply when supplying multiple grouping columns (github)

###0.1.26.1051 - 2/13/15

####New Features

####Enhancements

#####System

  • Embedded H2O config can now provide flat file (needed for Hadoop) (github)
  • Don't log GET requests for individual jobs, to avoid filling up the logs (github)

#####Algorithms

  • Increase GBM/DRF factor binning back to historical levels. Had been capped accidentally at nbins (typically 20), was intended to support a much higher cap. (github)
  • Tweaked rho heuristic in glm (github)
  • Enable variable importances for autoencoders (github)
  • Removed group_split option from GBM
  • Flow: display varimp for GBM output (PUBDEV-398)
  • variable importance for GBM (github)
  • GLM in H2O-Dev may provide slightly different coefficient values when applying an L1 penalty in comparison with H2O1.

####Bug Fixes

#####Algorithms

  • Fixed bug in GLM exception handling causing GLM jobs to hang (github)
  • Fixed a bug in kmeans input parameter schema where init was always being set to Furthest (github)
  • Fixed mean computation in GLM (github)
  • Fixed kmeans.R (github)
  • Flow: Building GBM model fails with Error executing javascript (PUBDEV-396)

#####System

  • DataFrame propagates absolute path to parser (github)
  • Fix flow shutdown bug (github)

###0.1.26.1032 - 2/6/15

####New Features

#####General Improvements

  • better model output
  • support for Python client
  • support for Maven
  • support for Sparkling Water
  • support for REST API schema
  • support for Hadoop CDH5 (github)

#####UI

  • Display summary visualizations by default in column summary output cells (PUBDEV-337)
  • Display AUC curve by default in binomial prediction output cells (PUBDEV-338)
  • Flow: Implement About H2O/Flow with version information (PUBDEV-111)
  • Add UI for CreateFrame (PUBDEV-218)
  • Flow: Add ability to cancel running jobs (PUBDEV-373)
  • Flow: warn when user navigates away while having unsaved content (PUBDEV-322)

#####Algorithms

#####API

#####System

####Enhancements

#####UI

  • Added better message when h2o.init() not yet called (No active connection to an H2O cluster. Try calling "h2o.init()") (github)

#####Algorithms

  • Updated column-based gradient task to use sparse interface (github)
  • Updated LBFGS (added progress monitor interface, updated some default params), added progress and job support to GLM lbfgs (github)
  • Added pretty print (github)
  • Added AutoEncoder to R model categories (github)
  • Added Coefficients table to GLM model (github)
  • Updated glm lbfgs to allow for efficient lambda-search (l2 penalty only) (github)
  • Removed splitframe shuffle parameter (github)
  • Simplified model builders and added deeplearning model builder (github)
  • Add DL model outputs to Flow (PUBDEV-372)
  • Flow: Deep Learning: Expert Mode (PUBDEV-284)
  • Flow: Display multinomial and regression DL model outputs (PUBDEV-383)
  • Display varimp details for DL models (PUBDEV-381)
  • Make binomial response "0" and "1" by default (github)
  • Update R GBM demos to reflect new input parameter names (github)
  • Rename GLM variable importance to normalized coefficient magnitudes (github)

#####API

  • Changed key to destination_key (github)
  • Cleaned up REST API schema interface (github)
  • Changed method name, cleaned setup, added a pyunit runner (github)

#####System

####Bug Fixes

#####UI

  • Flow: Parse => 1m.svm hangs at 42% (PUBDEV-345)
  • cup98 Dataset has columns that prevent validation/prediction (PUBDEV-349)
  • Flow: predict step failed to function (PUBDEV-217)
  • Flow: Arrays of numbers (ex. hidden in deeplearning) require brackets (PUBDEV-303)
  • Flow v.0.1.26.1030: StackTrace was broken (PUBDEV-371)
  • Flow: Import files -> Search -> Parse these files -> null pointer exception (PUBDEV-170)
  • Flow: "getJobs" not working (PUBDEV-320)
  • Thresholds x Metrics and Max Criteria x Metrics tables were flipped in flow (HEXDEV-155)
  • Flow v.0.1.26.1030: StackTrace is broken (PUBDEV-348)
  • flow: getJobs always shows "Your H2O cloud has no jobs" (PUBDEV-243)
  • Flow: First and last characters deleted from ignored columns (PUBDEV-300)
  • Sparkling water => Flow => Menu buttons for cell do not show up (PUBDEV-294)

#####Algorithms

  • Flow: Build K Means model with default K value gives error "Required field k not specified" (PUBDEV-167)
  • Slicing out a specific data point is broken (PUBDEV-280)
  • Flow: SplitFrame and grep in algorithms for flow and loops back onto itself (PUBDEV-272)
  • Fixed the predict method (github)
  • Refactor ModelMetrics into a different class for Binomial (github)
  • /Predictions.json did not cache predictions (HEXDEV-119)
  • Flow, DL: Error after changing hidden layer size (PUBDEV-323)
  • Error in node$h2o#node: $ operator is invalid for atomic vectors (PUBDEV-348)
  • Fixed K-means predict (PUBDEV-321)
  • Flow: DL build model fails => missing quotes around parameter (PUBDEV-301)
  • Flow: Build K means model with training/validation frames => unknown error (PUBDEV-185)
  • Flow: Build quantile model => Click goes into a loop (PUBDEV-188)

#####API

#####System

  • guesser needs to send types to parse (PUBDEV-279)
  • Got h2o.clusterStatus function working in R. (github)
  • Parse: Using R => java.lang.NullPointerException (PUBDEV-380)
  • Flow: Jobs => click on destination key => unimplemented: Unexpected val class for Inspect: class water.fvec.DataFrame (PUBDEV-363)
  • Column assignment in R exposes NullPointerException in Rollup (PUBDEV-155)
  • import from hdfs doesn't add files (PUBDEV-260)
  • AssertionError: ERROR: got tcp resend with existing in-progress task (PUBDEV-219)
  • HDFS parse fails when H2O launched on Spark CDH5 (PUBDEV-138)
  • Flow: Parse failure => java.lang.ArrayIndexOutOfBoundsException (PUBDEV-296)
  • "predict" step is not working in flow (PUBDEV-202)
  • Flow: Frame finishes parsing but comes up as null in flow (PUBDEV-270)
  • scala> flightsToORD.first() fails with "not serializable result" (PUBDEV-304)
  • DL throws NPE for bad column names (PUBDEV-15)
  • Flow: Build model: Not able to build KMeans/Deep Learning model (PUBDEV-297)
  • Flow: Col summary for NA/Y cols breaks (PUBDEV-325)
  • Sparkling Water : util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread NanoHTTPD Session,9,main (PUBDEV-346)
  • toDataFrame doesn't support sequence format schema (array, vectorUDT) (PUBDEV-457)

###0.1.20.1019 - 1/19/15

####New Features

#####UI

  • Added various documentation links to the build page (github)

#####Algorithms

  • Ported matrix multiply over and connected it to rapids (github)

####Enhancements

#####UI

  • Allow user to specify (the log of) the number of rows per chunk for a new constant chunk; use this new function in CreateFrame (github)
  • Make CreateFrame non-blocking, now displays progress bar in Flow (github)
  • Add row and column count to H2OFrame show method (github)
  • Admin watermeter page (PUBDEV-234)
  • Admin stack trace (PUBDEV-228)
  • Admin profile (PUBDEV-227)
  • Flow: Add download logs in UI (PUBDEV-204)
  • Need shutdown, minimally like h2o (PUBDEV-74)

#####API

  • Changed the REST API version from 2 to 3 for JSON requests (github)
  • Rename some more fields per consistency (max_iters changed to max_iterations, _iters to _iterations, _ncats to _categorical_column_count, _centersraw to centers_raw, _avgwithinss to tot_withinss, _withinmse to withinss) (github)
  • Changed K-Means output parameters (withinmse to within_mse, avgss to avg_ss, avgbetweenss to avg_between_ss) (github)
  • Remove default field values from DeepLearning parameters schema, since they come from the backing class (github)
  • Add @API help annotation strings to JSON model output (PUBDEV-216)

#####Algorithms

  • Minor fix in rapids matrix multiplication (github)
  • Updated sparse chunk to cut off binary search for prefix/suffix zeros (github)
  • Updated L_BFGS for GLM - warm-start solutions during lambda search, correctly pass current lambda value, added column-based gradient task (github)
  • Fix model parameters' default values in the metadata (github)
  • Set the default value of k (the number of clusters) to 1 for K-Means (PUBDEV-251)

#####System

  • Reject any training data with non-numeric values from KMeans model building (github)

####Bug Fixes

#####API

  • Fixed isSparse call for constant chunks (github)
  • Fixed sparse interface of constant chunks (no nonzero if const != 0) (github)

#####System

  • Typeahead for folder contents apparently requires trailing "/" (github)
  • Fix build and instructions for R install.packages() style of installation; Note we only support source installs now (github)
  • Fixed R test runner h2o package install issue that caused it to fail to install on dev builds (github)

###0.1.18.1013 - 1/14/15

####New Features

#####UI

####Enhancements

#####Algorithms


###0.1.20.1016 - 12/28/14

  • Added ip_port field in node json output for Cloud query (github)