function CROSS-VALIDATION-WRAPPER(Learner, k, examples) returns a hypothesis
  local variables: errT, an array, indexed by size, storing training-set error rates
                   errV, an array, indexed by size, storing validation-set error rates
  for size = 1 to ∞ do
    errT[size], errV[size] ← CROSS-VALIDATION(Learner, size, k, examples)
    if errT has converged then do
      best_size ← the value of size with minimum errV[size]
      return Learner(best_size, examples)
function CROSS-VALIDATION(Learner, size, k, examples) returns two values: average training-set error rate, average validation-set error rate
  fold_errT ← 0; fold_errV ← 0
  for fold = 1 to k do
    training_set, validation_set ← PARTITION(examples, fold, k)
    h ← Learner(size, training_set)
    fold_errT ← fold_errT + ERROR-RATE(h, training_set)
    fold_errV ← fold_errV + ERROR-RATE(h, validation_set)
  return fold_errT ⁄ k, fold_errV ⁄ k
Figure ?? An algorithm that selects the model with the lowest validation error: it builds models of increasing complexity and chooses the size with the best empirical error rate on validation data. Here errT is the error rate on the training data and errV is the error rate on the validation data. Learner(size, examples) returns a hypothesis whose complexity is set by the parameter size and which is trained on examples. PARTITION(examples, fold, k) splits examples into two subsets: a validation set of size N ⁄ k, where N is the number of examples, and a training set containing all the other examples. The split is different for each value of fold.
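
The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration under several assumptions that the figure does not state. Examples are taken to be (x, y) pairs, and a hypothesis is a callable from x to a predicted label. Folds are contiguous slices, numbered from 0 rather than 1. "errT has converged" is approximated as a change smaller than a hypothetical tolerance tol between consecutive sizes. A hypothetical cap max_size stands in for the unbounded loop. histogram_learner is a toy learner invented only to make the sketch runnable.

```python
from collections import Counter


def partition(examples, fold, k):
    """Return (training_set, validation_set) for a 0-based fold index.

    Assumption: folds are contiguous slices of roughly N / k examples.
    """
    n = len(examples)
    start, end = fold * n // k, (fold + 1) * n // k
    return examples[:start] + examples[end:], examples[start:end]


def error_rate(h, examples):
    """Fraction of (x, y) pairs for which hypothesis h predicts the wrong label."""
    return sum(h(x) != y for x, y in examples) / len(examples)


def cross_validation(learner, size, k, examples):
    """Average training-set and validation-set error rates over k folds."""
    fold_err_t = fold_err_v = 0.0
    for fold in range(k):
        training_set, validation_set = partition(examples, fold, k)
        h = learner(size, training_set)
        fold_err_t += error_rate(h, training_set)
        fold_err_v += error_rate(h, validation_set)
    return fold_err_t / k, fold_err_v / k


def cross_validation_wrapper(learner, k, examples, max_size=20, tol=1e-3):
    """Pick the size with the lowest validation error, then retrain on all examples.

    Assumptions: max_size bounds the search, and convergence of the training
    error means it changed by less than tol since the previous size.
    """
    err_t, err_v = {}, {}
    for size in range(1, max_size + 1):
        err_t[size], err_v[size] = cross_validation(learner, size, k, examples)
        if size > 1 and abs(err_t[size] - err_t[size - 1]) < tol:
            break
    best_size = min(err_v, key=err_v.get)  # ties go to the smallest size
    return learner(best_size, examples)


def histogram_learner(size, examples):
    """Hypothetical toy learner: majority label in each of `size` equal bins on [0, 1)."""
    def bin_of(x):
        return min(int(x * size), size - 1)

    # Fallback label for bins that received no training examples.
    default = Counter(y for _, y in examples).most_common(1)[0][0]
    votes = {}
    for x, y in examples:
        votes.setdefault(bin_of(x), Counter())[y] += 1
    majority = {b: c.most_common(1)[0][0] for b, c in votes.items()}
    return lambda x: majority.get(bin_of(x), default)
```

Contiguous folds are only sensible if the examples arrive in no particular order, so shuffling the data once before calling cross_validation_wrapper is a reasonable precaution. Stopping at max_size without convergence still returns the best size seen so far. That is a design choice of this sketch, not something the figure specifies.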