GSoC: Test Function Suite #223

Closed · pkofod opened this issue Jun 14, 2016 · 28 comments

pkofod commented Jun 14, 2016

I figured I should post updates in separate issues instead of in the one big issue #200.

I am planning to start by testing on the CUTEst unconstrained functions through the CUTEst.jl package. I installed it by following http://abelsiqueira.github.io/blog/installing-cutest-and-cutest.jl/. The develop branch is necessary, as it already has a grad! function defined; the master branch also segfaulted on me quite often.
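
For reference, the checkout amounts to something like the following sketch, assuming the Julia 0.4-era Pkg API (and that the CUTEst library itself is already installed):

Pkg.clone("https://github.com/JuliaOptimizers/CUTEst.jl")
Pkg.checkout("CUTEst", "develop")  # the develop branch has grad! defined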

I had originally chosen the 216 unconstrained problems in CUTEst, but one turned out to have constraints, and CUTEst.jl had problems loading quite a few (I need to file an issue upstream), so the set is now closer to 160.

An interesting feature of the test suite is how broad it is in terms of the dimensionality of x (call it N). A lot of functions are 2- or 3-dimensional, but values around 3000 and 5000 also seem popular. 11 functions have N = 10000, and 2 have N = 20000. Some of the original formulations allow for a user-defined N (as in the Rosenbrock case), but I'm not sure if CUTEst allows the user to specify this.
[histogram of problem dimensions (nvar)]
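
As an aside, Rosenbrock is a typical example of a formulation that generalizes to arbitrary N; a common N-dimensional variant (not necessarily the exact SIF formulation) looks like this:

rosenbrock(x) = sum([100 * (x[i+1] - x[i]^2)^2 + (1 - x[i])^2 for i in 1:length(x)-1])

rosenbrock(ones(5000))  # 0.0 at the minimizer, for any N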

Anyway, this is a much broader test bed than the one we already have, and I'll try to post some initial benchmarks as soon as possible.


mlubin commented Jun 14, 2016

CC @abelsiqueira @dpo


dpo commented Jun 14, 2016

Note that CUTEst.jl is now available from https://github.com/JuliaOptimizers/CUTEst.jl in a more usable form. A lot of the CUTEst problems are also available in AMPL format from https://github.com/mpf/Optimization-Test-Problems, and you can read them (once decoded) using https://github.com/JuliaOptimizers/AmplNLReader.jl. If you want to change the size of an AMPL model, you must edit the mod file and decode it again. To see what sizes a CUTEst model comes in, you can run, e.g.,

sifdecoder -show LUBRIFC

and subsequently decode with

sifdecoder -param NN=50 LUBRIFC

CUTEst.jl doesn't currently support passing extra options to sifdecoder (I believe), but it would be easy to add that feature. You can select all unconstrained problems using the select tool (it's quite old school).
Don't hesitate to ask if you run into any trouble.
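
Until such a feature lands, one could presumably shell out to sifdecoder from Julia before loading the problem; a hypothetical workaround:

run(`sifdecoder -param NN=50 LUBRIFC`)  # decode with an explicit size first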


pkofod commented Jun 15, 2016

Thank you for your reply. It was indeed the JuliaOptimizers version of CUTEst that I installed; I should have made that clear. I only used the blog post to install the Homebrew parts. Thanks for the other pointers!

@abelsiqueira

Hi, thanks for using my post, but it isn't updated for the latest version of the packages.
One thing that is almost released is the BinDeps version, which automatically installs CUTEst on Linux and OSX.
We're working on some changes using NLPModels.jl, and I've been working on that depth-first, so I haven't touched CUTEst.jl in two months. Contact me if there is something you want to expedite.

Also, I'm interested in the loading problems you had.


pkofod commented Jun 15, 2016

A BinDeps version would be wonderful, as the whole Homebrew thing was a bit, well, annoying :)

I would love to see what the following does on your installation: https://gist.github.com/pkofod/5dda61027b6829e91c40dccbe6f38f9b . On my computer, all of those problems throw an error, so works ends up being a vector of false.
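
(The gist is essentially the following loop; this is a hypothetical reconstruction, where problems is the vector of SIF problem names:)

using CUTEst

works = map(problems) do name
    try
        nlp = CUTEstModel(name)
        cutest_finalize(nlp)  # finalizer name may differ by CUTEst.jl version
        true
    catch
        false
    end
end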


dpo commented Jun 15, 2016

> A bindeps version would be wonderful, as the whole homebrew thing was a bit, well, annoying :)

Did something go wrong?

> I would love to see what the following does on your installation: https://gist.github.com/pkofod/5dda61027b6829e91c40dccbe6f38f9b . On my computer, all of those problems throw an error, so works ends up being a vector of false.

What is the error? Did you try from the command line?

@abelsiqueira

> A bindeps version would be wonderful,

It's started here. Haven't tested it in a while, but it was working.

> as the whole homebrew thing was a bit, well, annoying :)

If you're on Linux, I agree with you. Notice how the BinDeps setup uses linux-cutest, which is a simpler alternative, although very underdeveloped (no tests, no reinstall, etc.).

@abelsiqueira

> I would love to see what the following does on your installation: https://gist.github.com/pkofod/5dda61027b6829e91c40dccbe6f38f9b . On my computer, all of those problems throw an error, so works ends up being a vector of false.

Tried it, and they all throw an error, but that's because those problems are not in $MASTSIF. Where did you get them?
To get all unconstrained problems, you can use

grep " .U" $MASTSIF/CLASSF.DB | cut -f1 -d" "
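
(A rough Julia translation of that grep, assuming MASTSIF is set in the environment:)

db = joinpath(ENV["MASTSIF"], "CLASSF.DB")
unconstrained = [split(line)[1] for line in eachline(open(db)) if ismatch(r" .U", line)]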


pkofod commented Jun 16, 2016

> Did something go wrong?

No, nothing went wrong, but as a Linux user, Homebrew is not really something I've ever used before. It all worked out (after I found the blog post), but it would of course be preferable if Pkg.add("CUTEst") were all you needed :)

> Tried it, and they all throw an error, but that's because those problems are not on $MASTSIF. Where did you get them?
> To get all unconstrained problems, you can use

Okay, my bad then! The error was probably just that I didn't have them; I thought MASTSIF had all the CUTEst problems. Thanks for the grep.


pkofod commented Jun 18, 2016

So, the revised test set is 173 unconstrained problems with the following dimensions:

nvar   count
2      28
3      14
4      4
5      1
6      3
7      1
8      9
10     3
11     1
12     1
15     1
50     8
63     1
99     1
100    2
200    7
500    1
1000   4
1024   2
2000   1
2550   2
2600   2
2652   1
3000   16
3549   1
4000   2
4900   1
4999   1
5000   36
5010   1
5625   2
10000  11
20000  2
100000 2

Quite a few big problems, and maybe a bit too few in the 10-50 range (I suspect a lot of users are interested in those). I'm running things now, but it takes a while; that's why there has been no update :)


pkofod commented Jul 1, 2016

Just starting to get this under control. I ran BFGS with the 4 different line search algorithms we have. Below are performance profiles for test problems up to (and including) dimension 5000. Seems like HZ (the default) is a pretty solid choice!
[performance profiles: BFGS with the four line searches]
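
The runs boil down to something like the following sketch, assuming the mid-2016 Optim API (method types taking a linesearch! function; f, g!, and x0 stand for a problem's objective, gradient, and starting point):

using Optim

for ls in [Optim.hz_linesearch!, Optim.mt_linesearch!,
           Optim.backtracking_linesearch!, Optim.interpolating_linesearch!]
    res = optimize(f, g!, x0, BFGS(linesearch! = ls),
                   Optim.OptimizationOptions(iterations = 20000))
    println(ls, " => ", res)  # inspect the minimum and f_calls per line search
end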


dpo commented Jul 2, 2016

What are you measuring in this profile? Number of objective evals?

PS: https://github.com/JuliaSmoothOptimizers/Profiles.jl


pkofod commented Jul 2, 2016

No, this is simply for the attained minimum. Cool that you have a package for that in JSO. I wrote it myself, as there was no license on the Matlab file on Moré and Wild's website. Here they are for f_calls:
[performance profiles for f_calls]
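
(For the curious, the profile computation itself is small; a minimal sketch of the Dolan-Moré recipe, not necessarily exactly what I wrote, where T is a problems-by-solvers matrix of measures with failures set to Inf:)

function perf_profile(T, taus)
    ratios = T ./ minimum(T, 2)  # per-problem ratio to the best solver
    np, ns = size(T)
    # fraction of problems each solver handles within a factor tau of the best
    [[count(r -> r <= tau, ratios[:, s]) / np for tau in taus] for s in 1:ns]
end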


dpo commented Jul 2, 2016

That's instructive. The two plots together tell a more complete story. Armijo remains amazingly robust given how simple it is. I'll have to look at the details of your interpolating linesearch.


pkofod commented Jul 2, 2016

I would still say HZ is dominating if we take into consideration the actual minimum they end up reporting. Edit: this was supposed to go just after the image, not as a response to you :)


pkofod commented Jul 2, 2016

I should compare these with L-BFGS. Thanks again for the CUTEst package :) The newest version is even easier to install!


pkofod commented Jul 2, 2016

[performance profiles: CG with different line searches]

This is a surprise... the HZ line search isn't the best for HZ's CG :)
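
(That is, assuming the same mid-2016 API as above, the Moré-Thuente pairing comes out ahead of the HZ line search here, something like:)

res = optimize(f, g!, x0, ConjugateGradient(linesearch! = Optim.mt_linesearch!),
               Optim.OptimizationOptions(iterations = 20000))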


pkofod commented Jul 2, 2016

I simply can't run the largest problems on my laptop, or even the medium-to-large ones with Nelder-Mead, so I am waiting for the sysadmin to fix a symlink issue on the server at the econ department. However, I'm starting to gather some evidence at least.
[performance profile plot]


pkofod commented Jul 2, 2016

I'm just spamming at this point. Any participants are welcome to unsubscribe if they're annoyed :)

[performance profiles comparing solvers]

This is actually quite surprising to me! (I capped tau at 20 for clarity; the picture remains the same with other tau ranges.) Conjugate gradient is doing remarkably badly compared to gradient descent here! I've cherry-picked the best line search algorithm for each "solver", that is, each overall algorithm.

Edit: Hm, I think the between-solver results are wrong, but within each solver they should be correct. Will be back with an update :)

This should be the correct one:
[corrected performance profiles comparing solvers]


Evizero commented Jul 3, 2016

I didn't really follow along here, but just scrolling over the images made me curious. Are these for the Julia implementation in this package? If so, what library did you use for plotting?


pkofod commented Jul 3, 2016

Yes, they are. Plots.jl with the Plotly backend.
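
(Backend selection in Plots.jl is just:)

using Plots
plotly()  # subsequent plot() calls render via Plotly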


timholy commented Jul 3, 2016

Almost surely a bug in CG, perhaps in the convergence criteria? You already noticed one typo, there may be others!

This is what's so wonderful about having a good test suite! 💯


pkofod commented Jul 3, 2016

The dominating gradient descent was definitely an error on my part, but now I can't see the CGs in the plot. I'm in a car; will check later. I agree that the test suite is going to be very helpful. The current test set is 155 functions, and it excludes the largest ones. Not a bad start, I think.


pkofod commented Jul 3, 2016

[performance profiles comparing solvers]
Now CG is actually in the plot, and it is doing much better than gradient descent, which has 4 entries in the bottom 7 - ouch :) BFGS still seems to dominate, and the BFGS+HZ solver seems to be a safe default (as it is).

(still waiting for the sysadmin to make a symlink to gfortran so I can install CUTEst at work...).

Edit: however, MT is still better than HZ for CG...

Edit 2: one line was missing: the L-BFGS + HZ solver!
[updated performance profiles including the L-BFGS + HZ solver]

I'll publish the code for this shortly, so others can find even more bugs... :)


mlubin commented Jul 4, 2016

@pkofod, where is the code you're using to run these experiments?
EDIT: not sure if you mean that you'll publish the performance profile code or all of it together shortly


pkofod commented Jul 4, 2016

> @pkofod, where is the code you're using to run these experiments?
> EDIT: not sure if you mean that you'll publish the performance profile code or all of it together shortly

I just mean that I'll post the code I used to produce these figures. Right now it's only available if you're sitting next to me :) I'll post a gist or something; I just need to organize it a bit better.


pkofod commented Jul 4, 2016

As an example, this code produces the figures posted in #220, which I repeat here:
[Nelder-Mead benchmark performance profiles]

To produce that figure, I first load the module together with Plots and Optim. Then I construct a vector of names and a vector of Optimizer specifications to be benchmarked:

using Plots, OptimBenchmarks, Optim

names = ["NelderMead with fixed parameters and unit step",
         "NelderMead with fixed parameters and relative step (0.05)",
         "NelderMead with fixed parameters and relative step (0.1)",
         "NelderMead with fixed parameters and relative step (0.5)",
         "NelderMead with adaptive parameters and relative step (0.05)",
         "NelderMead with adaptive parameters and relative step (0.1)",
         "NelderMead with adaptive parameters and relative step (0.5)"]

opt_cases =[Optim.NelderMead(parameters = Optim.fixed_parameters,    initial_step = Optim.unit_step),
            Optim.NelderMead(parameters = Optim.fixed_parameters,    initial_step = x->Optim.relative_step(x; s = 0.05)),
            Optim.NelderMead(parameters = Optim.fixed_parameters,    initial_step = x->Optim.relative_step(x; s = 0.1)),
            Optim.NelderMead(parameters = Optim.fixed_parameters,    initial_step = x->Optim.relative_step(x; s = 0.5)),
            Optim.NelderMead(parameters = Optim.adaptive_parameters, initial_step = x->Optim.relative_step(x; s = 0.05)),
            Optim.NelderMead(parameters = Optim.adaptive_parameters, initial_step = x->Optim.relative_step(x; s = 0.1)),
            Optim.NelderMead(parameters = Optim.adaptive_parameters, initial_step = x->Optim.relative_step(x; s = 0.5))]

Then I run the benchmarks. The benchmark function wraps each run in try/catch, as an out-of-memory error or a line search error would otherwise ruin everything. (We should make line searches fail softly, imo.)

out = benchmark(names, opt_cases, Optim.OptimizationOptions(iterations = 20000); max_dim = 1000)

save("nelder_mead_benchmark.csv", out...)
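
(The guard inside benchmark is essentially this hypothetical pattern:)

function guarded_run(f, x0, method, options)
    try
        optimize(f, x0, method, options)
    catch err
        warn("problem failed: $err")  # record the failure, keep going
        nothing
    end
end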

Then I load the data as a dataframe (I just saved it to keep a copy) and plot it:

dff = read_profile_data("nelder_mead_benchmark.csv")

tau = 1:0.5:20
profiles = performance_profile(names, dff, tau)

plot(tau, hcat(profiles...),
          label = hcat(names...),
          ls=[:solid  :dash :dash :dash :dot :dot :dot],
          size =(800,500),
          ylims = (0,1))

The module is at https://gist.github.com/pkofod/82ba6021e73cf9ee2289f35b08a09f2e (some things could be done more cleanly...).

(Note to self: I should postpone the call to aggregate until the performance_profile call.)

@pkofod pkofod mentioned this issue Aug 12, 2016

pkofod commented Jan 3, 2017

This work is now in OptimTests, thanks for the feedback.

@pkofod pkofod closed this as completed Jan 3, 2017