Create a subclass of field_set without continuous representation as variable-length sequences of bits. #10

ArleyRistar opened this issue Jul 6, 2015 · 56 comments


@ArleyRistar

By recommendation of Nil, I'm opening an issue initially discussed on the Slack moses channel.
Summary of the situation:
I'm adding particle swarm to the optimization part of MOSES, and representing continuous knobs as variable-length sequences of bits is causing loss of information; PS doesn't need this discrete representation.
As discussed in the channel, I was thinking of changing this representation only inside particle swarm, but there's the problem of returning the deme to the metapopulation, which would lose information in the same way.
So Ben said that this design "doesn't really need to persist".

I was thinking that I could "create a modified version of field_set overriding the contin parts to use 1 packed_t as float and use this modified version only with particle swarm".
I still don't know how it would affect other parts like the metapopulation, or whether it can be done easily.
Because of that I'm opening this issue looking for opinions.

Keep in mind that this will use a little more RAM for each instance with contin parts, but with better precision and less processing.

@linas

linas commented Jul 7, 2015

Hi Arley,

... and Nil,

I think we need to fix contin_t for "ordinary" moses, not just particle-swarm moses.

How about this idea: we store a full-length float or double in the deme, AS WELL AS the contin_t. The value of the double is used in the actual combo tree, whereas the contin_t is used only during knob-turning, to store the current variance.

Thus, during knob turning, the contin_t is used to generate a random double-precision number, with a PDF centered at zero, and a variance equal to the contin_t value. This random offset is then added to the double-precision number in the deme. (The random offset is stored in the instance, or maybe the sum of the value and the offset is stored .. ). Thus, knob-turning generates a random walk. (During knob turning, the variance is also either made smaller, is kept as-is, or is increased).

The above is how we should do floats in "ordinary", non-particle-swarm MOSES.

For the particle-swarm moses, you may want to do knob turning differently (I don't really understand how), but the above mechanism will at least allow you to store the full-length double-precision number in both the instance and in the deme.
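A minimal sketch of that knob-turning step, just to make it concrete (hypothetical names, not actual MOSES API; the variance field here plays the role the contin_t would have):

#include <cmath>
#include <random>

// Hypothetical knob: the full-precision value lives in the instance/deme,
// and a separately tracked variance drives the random walk.
struct float_knob {
    double value;
    double variance;
};

// One knob turn: add a zero-mean Gaussian offset whose spread comes from
// the stored variance; the returned value is what the combo tree would use.
double turn_knob(float_knob& k, std::mt19937& rng)
{
    std::normal_distribution<double> offset(0.0, std::sqrt(k.variance));
    k.value += offset(rng);
    return k.value;
}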

@bgoertzel

yah -- This seems like a reasonable approach for now.. any thoughts from
Dr. Geisweiller?

ben


@ngeiswei

ngeiswei commented Jul 7, 2015

@linas the current state of the code already nearly does that: a contin_spec contains a mean, and any knob turning is centered around that mean. However, the variance is replaced by the expansion combined with the step, so if you tweak the contin trits (Left, Right, Stop) you get some distribution; let's call it a CTD (Contin Trits Distribution). It's not a Gaussian, but maybe it approaches one.

Anyway, I agree that the distribution should not be limited to a CTD. My advice would be to add or upgrade the neighborhood sampling methods to take a distribution as an argument; it could be a Gaussian or otherwise.

I don't think you need to change anything else. And here's why: candidates in the deme are represented as instances

typedef std::vector<packed_t> instance;

see instance.h

If you were to represent the contin_spec directly as a float you'd still need to hack the code to en/decode a contin into an instance (see field_set::pack in field_set.h). It's not impossible, but it adds a coding burden to our student. And if you run out of resolution you can always increase the contin_spec depth, so why bother.

So to sum up, yes ps doesn't need this contin trits representation, but MOSES does. Ultimately I don't think this en/decoding of candidates into instances is good. It has the advantage of saving a bit of RAM at the expense of making the code more rigid and complex. But getting rid of it is probably too much work for this project.

I'll get back with more details about what should be hacked to enable neighborhood sampling with any distribution, not just CTDs.
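For example, the upgraded sampling could simply take any <random>-style distribution as an argument; a hypothetical sketch, not the actual neighborhood_sampling.h API:

#include <random>

// Perturb a contin value with a caller-supplied distribution (Gaussian or
// otherwise); the distribution object replaces the hard-wired CTD.
template <typename Distribution>
double sample_contin_neighbor(double current_value,
                              Distribution& dist,
                              std::mt19937& rng)
{
    return current_value + dist(rng);
}

// Usage:
//   std::mt19937 rng(42);
//   std::normal_distribution<double> gauss(0.0, 1.0);
//   double neighbor = sample_contin_neighbor(3.14, gauss, rng);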

@ngeiswei

ngeiswei commented Jul 7, 2015

Hmm, on the other hand converting a float into bytes is trivial, and it would save some CPU time.

Anyway, you'd still need to upgrade the neighborhood sampling to support any distribution; those 2 issues are really independent. So I suggest:

  1. Upgrade the neighborhood sampling distribution to support any distribution.
  2. If it turns out manipulating large contin with the current trit representation is a bottleneck, then hack the contin_spec and en/decoding methods to support float representation.

@ngeiswei

ngeiswei commented Jul 7, 2015

@ArleyRistar The code to hack to support 1. is definitely in

moses/moses/moses/neighborhood_sampling.{h,cc}

you may start to study it, till I come back with more information.

@ngeiswei

ngeiswei commented Jul 7, 2015

At this point I really need to understand exactly what you want to do and how. There are several ways to go about that problem: the distribution could be passed to the sampling function (sample_from_neighborhood, then generate_contin_neighbor or another new one), or it could live inside the contin_spec.

Also, there is a problem I had not thought about: the mean in the contin_spec remains the same during deme optimization. It cannot be changed, because otherwise the contin values corresponding to the instances will not be correctly reconstructed. So if you really want the mean to change during deme optimization, we need to address that.

@ngeiswei

ngeiswei commented Jul 7, 2015

OK, I presume the code so far is here

https://github.com/arleyristar/moses/commits/master

Please point to me as well any relevant doc.

@ArleyRistar

Nil, I'm looking at the neighborhood sampling now; I'll try to solve this first problem.

In your penultimate email, I'm assuming that it was for me. I'll see how I can do that, but leaving the distribution inside contin_spec is what makes more sense to me for now.
As for changing the mean, don't worry: what I said in the channel about changing the mean was just an option I thought of before Ben said that this representation doesn't need to persist.
Solving problem 2, PS can use the float representation.

Yes, the code for PS is in there.
I haven't made any outside documentation yet; most of it I've been putting as comments inside the code or posting in the channel.


@ArleyRistar

From what I've thought about, this seems an acceptable idea for 1; forgive me if I misunderstood what I was supposed to do or if this is a bad idea.

Nil, we could keep neighborhood sampling as it is now and change contin_stepper and contin_spec.
Not even get_contin and set_contin from field_set would need to change, or only by a little (for 1).

Let me explain in detail:
For now we have a behavior similar to this:
https://code.google.com/p/moses/wiki/ModelingAtomSpaces
For example, with mean = 0, expansion = 2 and S=Stop, L=Left, R=Right:
S -> 0
LS -> -2 step = 4
RS -> 2 step = 4
LLS -> -6 step = 8
LRS -> -1 step = 0.5
RLS -> 1 step = 0.5
RRS -> 6 step = 8
...
It has better precision close to its mean, and it kind of resembles a Gaussian to me.
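A rough sketch of that decoding, just reproducing the table above (it assumes the initial step equals the expansion, as the table implies; this is an illustration, not the actual contin_stepper code):

#include <iostream>
#include <string>

// Decode a trit string (e.g. "LRS") around a mean: walk outward while the
// direction stays the same (step multiplied by the expansion), then refine
// inward once the direction reverses (step halved each move).
double decode_contin(const std::string& trits,
                     double mean = 0.0, double expansion = 2.0)
{
    double value = mean;
    double step = expansion;       // assumed initial step size
    bool expanding = true;         // still walking away from the mean?
    char first = 0;

    for (char t : trits) {
        if (t == 'S') break;                  // Stop: value is final
        if (first == 0) first = t;
        if (expanding && t != first) {        // first reversal:
            step /= (expansion * 2.0);        // switch to binary refinement
            expanding = false;
        }
        value += (t == 'R' ? step : -step);
        if (expanding) step *= expansion;
        else           step /= 2.0;
    }
    return value;
}

int main() {
    for (const char* s : {"S", "LS", "RS", "LLS", "LRS", "RLS", "RRS"})
        std::cout << s << " -> " << decode_contin(s) << "\n";  // 0 -2 2 -6 -1 1 6
}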

My idea is to modify contin_stepper left() and right() to have a behavior similar to the link, but between [0,1]. In this case it would start with 0.5 as the mean and decrease step_size by half, starting with 0.25.

Applying this contin_stepper to the actual LRS representation would give us a probability that could be fed to a (cumulative) distribution function. The distribution function would be stored inside contin_spec in place of mean and step_size (depth has to stay, to be used in the stepper).

I could write some distributions to be chosen via a parameter (leaving normal as the default); their parameters would be given in the construction of the contin_spec as mean and expansion (the expansion acting as the variance, though it would be better if bigger).

Does that make sense, or should I think of another solution?
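Here's a rough sketch of what I mean, assuming Boost.Math for the Gaussian quantile (hypothetical code, not the real contin_spec):

#include <boost/math/distributions/normal.hpp>
#include <string>

// Decode the trit string into a probability in [0, 1]: start at 0.5 and
// halve the step each move, starting at 0.25.
double trits_to_probability(const std::string& trits)
{
    double p = 0.5, step = 0.25;
    for (char t : trits) {
        if (t == 'S') break;
        p += (t == 'R' ? step : -step);
        step /= 2.0;
    }
    return p;
}

// Map that probability through the quantile (inverse CDF) of the
// distribution that would be stored in the contin_spec.
double decode_to_value(const std::string& trits, double mean, double sd)
{
    boost::math::normal_distribution<double> dist(mean, sd);
    return boost::math::quantile(dist, trits_to_probability(trits));
}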


@linas

linas commented Jul 8, 2015


Arghh... yes but no. I understand perfectly well how contin works in moses,
and basically, it sucks. It needs a major revamp. I used it a lot, and
eventually gave up, because it behaved so badly. I think I know why it
mis-behaves, and the sketch I made was for an approach that I think could
significantly improve it.

The contin_t, as currently designed, attempts to use the center-points of
the Cantor set, which, in a certain naive sense, does have a uniform
distribution over the reals, but it fails for practical machine learning
problems. The problem is that the vertexes in the Cantor tree will almost
never be close to the actual value you are trying to learn, and it takes
too many iterations to get to the vertex that does fit your data. e.g
trying to converge to 1.333 by using 1.0 1.5 1.25 1.375 ... is ineffective
and slow. It is far too many steps for MOSES to get the right value. Its
even worse if you want learn more than one float.

I am kind-of guessing that storing full 64-bit floats, and using a random
walk will converge faster than dyadic Cantor tree walks. I can't prove that
it will converge faster, but contin_t works so poorly that almost anything
will be better.

@linas

linas commented Jul 8, 2015

p.s. what you call a "contin trit" is called a "Cantor tree" in the general world. http://blogs.scientificamerican.com/critical-opalescence/files/2012/08/f5.jpg The cantor tree unfolds into the "modular group" and there is all sorts of neat stuff there.

@ArleyRistar

So, from Linas's idea and issue 2 from Nil:
contin_spec and contin_stepper won't be used and are almost useless now; it doesn't make sense to keep them in field_set.
get_contin and set_contin have to change to insert the float (contin_t is already a double) into the instance (see the sketch below).
Change generate_contin_neighbor to do the random walk.
Plus a lot of minor updates.

While this isn't decided, I'll be fixing other things in PS.
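Roughly something like this for the get/set part (a hypothetical sketch, not the actual field_set API):

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

typedef uint64_t packed_t;                    // assumed 64-bit word type
typedef std::vector<packed_t> instance;

// Store a double directly in one word of the instance (no trit encoding).
void set_double_raw(instance& inst, std::size_t word_idx, double v)
{
    packed_t bits;
    std::memcpy(&bits, &v, sizeof bits);      // type-pun without UB
    inst[word_idx] = bits;
}

double get_double_raw(const instance& inst, std::size_t word_idx)
{
    double v;
    std::memcpy(&v, &inst[word_idx], sizeof v);
    return v;
}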


@ngeiswei

ngeiswei commented Jul 9, 2015

@linas, if you update the contin neighborhood sampling you don't have to iterate 1.0 1.5 1.25 1.375 ... to get to 1.333. You just set the contin_spec with enough depth and you can set the value 1.333 right away.

@ArleyRistar contin_spec is certainly useful, but it could indeed be greatly simplified, both in code and in performance, if the cantor tree representation were replaced by a float representation. Although there is still one area where the cantor tree representation wins: if the resolution of the contin values turns out to be low, it takes much less memory, a few bits instead of a systematic 32 bits. It's probably not worth it, but it's useful to keep in mind. And again, you don't need to change the contin_spec code to allow sampling with other distributions (unless you put the distribution in the contin_spec, of course), as I explain above.

@ngeiswei

ngeiswei commented Jul 9, 2015

@ArleyRistar you cannot change the mean, the expansion or the step of a contin_spec once it is constructed, or at least it wouldn't make much sense: in order to reconstruct the contin value from the packed bit string you need those, and if they change while optimizing the deme, the contin values of the candidates will also change, so the older candidates won't have their contin values correctly reconstituted.

That's why I want to understand exactly how your algorithm is going to work; I'm currently reading

https://en.wikipedia.org/wiki/Particle_swarm_optimization

let me know if you follow that, or some alteration of it (this will help me to understand your code when I start reviewing it).
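For reference, the canonical per-dimension update from that page looks like this (your implementation may of course differ):

#include <random>

struct particle { double x, v, best_x; };

// Standard PSO step: inertia plus random pulls toward the personal and
// global bests; r1 and r2 are the sampled coefficients.
void pso_step(particle& p, double global_best_x, std::mt19937& rng,
              double w = 0.7, double c1 = 1.5, double c2 = 1.5)
{
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double r1 = u(rng), r2 = u(rng);
    p.v = w * p.v
        + c1 * r1 * (p.best_x - p.x)
        + c2 * r2 * (global_best_x - p.x);
    p.x += p.v;
}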

@ngeiswei

ngeiswei commented Jul 9, 2015

@ArleyRistar I'm only 1/3 of the way through reviewing your code, but I think you don't need to change the neighborhood sampling functions; all the sampling code takes place in the particle_swarm struct, and that makes sense given its peculiarities (you don't sample the neighborhood directly, you sample the coefficients to construct the velocities).

So I don't think you need to do 1. or 2. All you need is to increase the depth of the contin_spec. If that becomes a performance bottleneck then you can work toward improving contin_spec, etc. But if the performance overhead is negligible, you don't need to bother (unless you really want to).

Anyway, I'll keep reviewing; I may change my mind once I see the whole thing.

@ngeiswei

ngeiswei commented Jul 9, 2015

@ArleyRistar I'm re-reading your first comment from when you created this issue. You got it all. Forget my comment about modifying the neighborhood distribution; I got there because of Linas's initial comment and my lack of understanding of PS. So I stand by my comment right above, which can be synthesized into these questions:

  1. Have you tried to set the depth high enough so that the loss of information isn't an issue?
  2. Does it create a performance bottleneck?

If the answers are yes, then I think what you suggest is the correct hack: replace the cantor tree by a float binary rep. Of course, before we go there we want to do things well: record the performance of a bunch of runs beforehand, to compare with after the code optimization, etc. And I'll probably want to review the contin_spec once more, just in case I see something simple and bright to do. But first things first, answer those 2 questions.

@ngeiswei

ngeiswei commented Jul 9, 2015

For instance if you could plot

  1. Performance
  2. Run-time
  3. RAM

w.r.t. depth (varying from, say, 5 to 32), for one or more optimization problems that you know PS is supposed to perform OK on, that would definitely help a lot to assess what should be done next.

Ideally you would even repeat that over several random seeds, say 10, to cancel out noise, get smoother charts.

Once you know the kind of depth you need, you can run that through a profiler like valgrind, so you can get an idea of how much performance is wasted by the cantor tree representation.

@ArleyRistar

I'll change the code to increase the depth and test with it as Nil said.
I'll post the results here, but it can take a while (give me 1 day).


@ngeiswei

ngeiswei commented Jul 9, 2015

It may take you a few days even, but it's worth it, not just for ps. Thanks!

@linas

linas commented Jul 9, 2015

Hi Nil, Arley,

I have to retract/modify my earlier comments as well; they were based on a misunderstanding of sorts. So, what I meant to say was this:

-- I am very excited by the PSO work; I think it will be an excellent way of handling floating-point numbers in moses. My grumbling was less about the contin representation itself than it was with how the hill-climbing algo treated it: hill-climbing by twiddling one binary bit at a time of a contin was just terribly slow, inefficient, awful.

-- Yes, contins offer a small, compact way to store floats or doubles. I guess it's harmless to use them in the combo trees: you just need to write two functions (maybe they exist??) to convert a double to the contin rep, and back. As long as you save enough digits, it should not be a problem.

-- If contins are replaced in the combo trees, then I strongly recommend using 64-bit floats (i.e. doubles); do NOT mess with 32-bit floats. Messing about with 32-bit floats is not worth the effort. It's a bad idea.

-- Nil, the way I currently understand it is as follows: every combo tree would have one to hundreds of doubles in it. If the metapop has 10K trees in it, then there would be 10K trees/metapop * 200 doubles/tree * 8 bytes/double = 16MBytes of doubles in the metapop: hardly a big amount.

-- During a single PSO step, we pick just one tree, and so PSO only needs to optimize 1 to a few hundred doubles. If you have 1K particles per swarm, that is about 200K doubles per PSO step, which again is not very big, and might even fit in modern-day CPU caches.

So I see no problem at all with storing doubles everywhere. Done right PSO will be awesome for moses. I'm actually excited, now that I think I understand it.

@ngeiswei

ngeiswei commented Jul 9, 2015

Linas, you wrote,

"-- Yes, contins offer a small, compact way to store floats or doubles. I guess its harmless to use them in the combo trees: you just need to write two functions (maybe they exist??) to convert double to the contin rep, and back. As long as you save enough digits, it should not be a problem."

Yes they exist, field_set::get_contin and field_set::set_contin, but in fact you can directly en/decode a contin by dereferencing the contin_iterator.

Yeah, you're right, the extra RAM will probably be negligible. I'd still be interested in seeing how increasing the depth would help the quality of the results, and whether this en/decoding is really hurting run-time performance, though. I mean it's nice to know the actual gain of changing a piece of code (although the code simplification is a gain by itself).

@linas

linas commented Jul 9, 2015

OK, sure. My knee-jerk reaction is "simplify the code" and remove contin entirely, but I'm just tired right now and don't really care. :-) Do the right thing.

@ArleyRistar

TL;DR update:
Problem: worse performance; don't expect good results from what Nil asked.
This weekend I'll try to use Valgrind.

Detailed update:
I don't have the results that Nil asked for, because I'm running into some problems that I'll share with you:

The implementation in the last commit has a "mistake": it almost always initializes the particles with really big values in [double min, double max], and when I get the result (after the conversion) it always has the value of the mean (0 or 1 in this case); and it happens after the update too.

It's clearly wrong, but this way it runs very quickly, like 1000 evals in 3 sec (i7 4500U, predicates m1000).
When I changed to a start with smaller values between [-30, 32] (the range when depth is 5) I got the result right (in fact the conversion works right with anything a little less than 3E+17; it gives 32, which is the limit, and above that it returns the mean value).

The thing is that it's taking a much longer time now (45 s), and when I apply a restriction to the values so that a wrong conversion doesn't happen (-2.8e+17 < x < 2.8e+17), or simply [-30, 32], it becomes even worse in time (150+ s).

The worst thing? It doesn't happen because of the particle-swarm update of the particles; the bottleneck is when it gets the score (which is coded exactly like hill climbing), AND it almost didn't improve the results, because it depends on the discrete part too and it always falls into a local maximum.

Changing the problem didn't help either (iris m2000 has a very small contin %):
hc: score - 48.79, time: 78s
ps: score - 100, time: 90s (42s when it was wrong)

Obs: when I change set_contin to do nothing, it takes 3 s on predicates -m1000 (iris m2000: 44 s) even with the limit, so it's almost certain that calling set_contin all the time is spending too much processing.

Nil, don't expect very good results from what you asked for. This weekend I'll take a look at Valgrind (I don't know how to use it yet; I'm used to just using time) and come back with a more detailed experiment.


@linas

linas commented Jul 11, 2015

What is the test problem that you are running? --linas


@ngeiswei

@ArleyRistar I'm wondering if the super fast run-times aren't due to the cache. If the candidates are idiotically identical that could turn most evaluations into cache hits. Can you run your experiment with the debug logging level and send it to us?

@ArleyRistar

I was running with predicates.csv -W1 -u pred -m1000.
I'll send it, but I didn't know there was a cache in the evaluations, so it appears that's exactly what is happening.


@ArleyRistar

I forgot to post here too:
https://opencog.slack.com/files/arleyristar/F086CPMUK/Initial_experiments_for_PS_and_HC_in_continuous_problems_
or read below.
Initial experiments for PS and HC in continuous problems.

For a summary, jump to 3.4. General.

  1. Methodology:

First, the experiments were divided into two groups: without reduction (-B 0 -E 1 parameters) and with reduction; they are separated this way because the reduction part affects particle swarm (PS) too much (clean_combo_tree uses 99.73% of the time with reduction in PS). Besides that, the experiments were split into three parts: analysis of memory, processing and time. The memory analysis was done using the valgrind tool massif, running for 4 values of depth {5 (default), 16, 32, 128} and for both PS and hill climbing (HC). The processing analysis was done in the same way as the memory one, but using the tool callgrind. The time analysis was done differently, using a simple python script to measure the mean time over seeds 1 to 10, running for depths between 2 and 32 for PS and HC. The dataset used was predicates.csv, because it is a simple one that is already inside the examples folder of MOSES. It was run for a maximum of 1000 evaluations, a small value because the valgrind tools increase the program run time by about 20x for massif and between 20x and 100x for callgrind [1], but it's a sufficient number of evaluations to test (HC with some seeds can find the best solution in less than 1000 evals).

  2. Results description:

2.1. Memory:

  • PS with reduction spends more memory than without.
  • Increasing the depth of PS increases the memory used, but not by much; sometimes it can even decrease, because it can find better results that are less complex.
  • HC without reduction reaches the RAM limit independently of depth.
  • HC with reduction uses far less memory, even less than PS without reduction.

2.2. Processing:

  • The variable-length representation affects PS without reduction a lot, depending on the depth (15% at depth 32 vs. 2.7% at depth 5).
  • It affects PS with reduction in a similar way, but with a much lower percentage, because of the reduction.
  • The depth didn't affect hill climbing at all, mostly because it found a solution that needs less than depth 5.

2.3. Time:

  • PS without reduction is much quicker than with reduction.
  • HC without reduction is slower than with reduction.
  • PS without reduction is quicker than HC without reduction.
  • PS with reduction is very much slower than HC with reduction.

2.4. Others:

  • Increasing the depth in PS returns better results, but it gets stuck in the discrete local maximum anyway.
  • The results are not affected by increasing the HC depth, maybe because this problem didn't need precision or a bigger value to reach the global maximum.

  3. Conclusion:

3.1. Memory:

I don't think that memory is a problem; even very complex problems with thousands of instances will not reach the amount of RAM that computers today have, and even if they do, there's always feature selection. But let's take a look:
The variable-length representation uses 28B (4B depth, 8B mean, 8B step_size, 8B expansion) + ((1/4 B * depth) * num_instances) for each dimension in a deme, while a double representation uses 8B * num_instances. So using the variable-length representation with a small depth for a larger number of instances still makes sense for RAM economy. But I'm still feeling that this scheme could be better in 2 ways: 1- it uses 2 bits to represent Left, Right and Stop (0, 1, 2) but doesn't use 11 (3); 2- depth 5 can represent ±30 (with LLLLS and RRRRS), so why not use up to LLLLL or RRRRR, since the get/set contin functions already limit the loop by the depth anyway?
3.2. Processing:

With the variable-length representation, processing is only a problem for PS, and only at larger depths. In all cases clean_combo_tree and merge_demes are the ones that use the most processing, and the parts specific to HC or PS are small (PS is bigger just because of set_contin and get_contin). In PS I'm having to call get_contin a lot of times (to get the best values), and it isn't helping. To solve it I'll have to store the values in doubles, which in the end will lose the benefits of using the variable-length representation.
3.3. Time:

I don't know how the cache of combo tree reduction works, but it seems to work best with HC, where the candidates in each eval are very close to each other. In the initial phase of PS the instances are very different from one another. I would say that PS only runs quicker without reduction because it returns a smaller deme than HC, since it doesn't include the worst solutions.
3.4. General:

The variable-length representation doesn't have that much impact on memory or performance; it only does when using PS and for complex problems. Even so, I still find it better to change it to a double representation, to get a little better performance, with the possibility to search in a bigger space and have less complex code, at the cost of a little more RAM and my work.

Detailed results data:

  1. PS without reduction:

1.1. Massif:
  • depth 5: peak 1.2842 MiB = 0.168322662 MB
  • depth 16: peak 1.3022 MiB = 0.170681958 MB
  • depth 32: peak 1.1866 MiB = 0.155530035 MB (leads to better solutions, and thus less complexity)
  • depth 128: peak 1.5212 MiB = 0.199386726 MB

1.2. Callgrind:

  • depth 5:
    • get_contin: 1.95%
    • set_contin: 0.7%
    • update_particles: 3.17%
    • complexity_based_scorer: 80.65%
  • depth 16:
    • get_contin: 5.39%
    • set_contin: 1.97%
    • update_particles: 6.98%
    • complexity_based_scorer: 77.19%
  • depth 32:
    • get_contin: 11.02%
    • set_contin: 4.15%
    • update_particles: 14.16%
    • complexity_based_scorer: 77.30%
  • depth 128:
    • get_contin: 16.18%
    • set_contin: 10.34%
    • update_particles: 24.40%
    • complexity_based_scorer: 67.82%

1.3. Time (s):

  • Depth: 2 | Mean: 2.557880
  • Depth: 3 | Mean: 2.593005
  • Depth: 4 | Mean: 2.663667
  • Depth: 5 | Mean: 2.668498
  • Depth: 6 | Mean: 2.701040
  • Depth: 7 | Mean: 2.644231
  • Depth: 8 | Mean: 2.714792
  • Depth: 9 | Mean: 2.692005
  • Depth: 10 | Mean: 2.680957
  • Depth: 11 | Mean: 2.735071
  • Depth: 12 | Mean: 2.783216
  • Depth: 13 | Mean: 2.743773
  • Depth: 14 | Mean: 2.725607
  • Depth: 15 | Mean: 2.722347
  • Depth: 16 | Mean: 2.755987
  • Depth: 17 | Mean: 2.901231
  • Depth: 18 | Mean: 2.809483
  • Depth: 19 | Mean: 2.762999
  • Depth: 20 | Mean: 2.758817
  • Depth: 21 | Mean: 2.772569
  • Depth: 22 | Mean: 2.816113
  • Depth: 23 | Mean: 2.928664
  • Depth: 24 | Mean: 2.904253
  • Depth: 25 | Mean: 2.938958
  • Depth: 26 | Mean: 2.845138
  • Depth: 27 | Mean: 2.877145
  • Depth: 28 | Mean: 2.857758
  • Depth: 29 | Mean: 2.840159
  • Depth: 30 | Mean: 2.822597
  • Depth: 31 | Mean: 2.813350
  • Depth: 32 | Mean: 2.820256

  2. HC without reduction:

2.1. Massif:
    • depth 5: peak 53.1730 MiB = 6.96949146 MB
    • depth 16: peak 53.1730 MiB = 6.96949146 MB
    • depth 32: peak 53.1730 MiB = 6.96949146 MB
    • depth 128: peak 53.1730 MiB = 6.96949146 MB

2.2. Callgrind:

  • depth 5:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.23%
  • depth 16:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.21%
  • depth 32:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.23%
  • depth 128:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.06%
    • complexity_based_scorer: 65.20%

2.3. Time (s):

  • Depth: 2 | Mean: 13.933148
  • Depth: 3 | Mean: 13.710301
  • Depth: 4 | Mean: 13.771378
  • Depth: 5 | Mean: 13.699602
  • Depth: 6 | Mean: 13.643018
  • Depth: 7 | Mean: 13.953615
  • Depth: 8 | Mean: 14.218726
  • Depth: 9 | Mean: 14.607479
  • Depth: 10 | Mean: 14.577442
  • Depth: 11 | Mean: 14.953429
  • Depth: 12 | Mean: 14.705942
  • Depth: 13 | Mean: 14.993052
  • Depth: 14 | Mean: 14.541223
  • Depth: 15 | Mean: 14.249910
  • Depth: 16 | Mean: 14.563872
  • Depth: 17 | Mean: 15.361906
  • Depth: 18 | Mean: 14.469012
  • Depth: 19 | Mean: 14.755074
  • Depth: 20 | Mean: 14.411482
  • Depth: 21 | Mean: 14.821014
  • Depth: 22 | Mean: 14.412988
  • Depth: 23 | Mean: 14.660408
  • Depth: 24 | Mean: 14.989600
  • Depth: 25 | Mean: 14.081191
  • Depth: 26 | Mean: 14.491643
  • Depth: 27 | Mean: 14.121354
  • Depth: 28 | Mean: 14.673866
  • Depth: 29 | Mean: 14.079422
  • Depth: 30 | Mean: 13.745444
  • Depth: 31 | Mean: 13.860764

  3. PS with reduction:

3.1. Massif:
    • depth 5: peak 2.7437 MiB = 0.359622246 MB
    • depth 16: peak 2.8201 MiB = 0.369636147 MB
    • depth 32: peak 4.2963 MiB = 0.563124634 MB
    • depth 128: peak 3.6244 MiB = 0.475057357 MB

3.2. Callgrind:

  • depth 5:
    • get_contin: 0.11%
    • set_contin: 0.01%
    • update_particles: 0.02%
    • complexity_based_scorer: 91.94%
  • depth 16:
    • get_contin: 0.11%
    • set_contin: 0.02%
    • update_particles: 0.06%
    • complexity_based_scorer: 93.25%
  • depth 32:
    • get_contin: 0.12%
    • set_contin: 0.04%
    • update_particles: 0.15%
    • complexity_based_scorer: 91.06%
  • depth 128:
    • get_contin: 0.25%
    • set_contin: 0.16%
    • update_particles: 0.37%
    • complexity_based_scorer: 92.87%

3.3. Time (s):

  • Depth: 32 | Mean: 180

  4. HC with reduction:

4.1. Massif:
    • depth 32: peak 940.1328 MiB = 0.1203369984 MB

4.2. Callgrind:

  • depth 32:
    • get_contin: 0.42%
    • set_contin: 0%
    • hill_climbing: 0.08%
    • complexity_based_scorer: 32.52% (Low because merge_demes uses a lot)

4.3. Time (s):

  • Depth: 32 | Mean: 1.8413

For even more detailed results (the callgrind and massif outputs, or times with results) you can just ask me; I'll not put them on GitHub, they're too big and don't need to be there.

[1] See valgrind site: http://valgrind.org/info/tools.html


@ArleyRistar

Update from what we talked about in the Moses channel.

  • I sent the above post to check how depth affects the code.
  • In the conclusion I found an issue different from what we were testing (PS is slow with reduction).
  • Ben asked for a hypothesis.
  • I responded that it may be what Nil said 3 emails ago.
  • I showed a callgrind run with a big difference in the number of instructions needed to reduce HC vs. PS candidates.
  • I started to talk to Nil.
  • Empirically, I tested that HC will have bad performance too if it changes the continuous knobs every eval (because PS does this).
  • Nil and I didn't decide anything about reduction.
  • But we decided to change the variable-length representation to double (which was what Linas wanted 7 emails ago).
  • We decided to change instance<packed_t> to std::bitset (I have an update about it below).
  • Ben gave some good ideas that we could try, but he'll be offline until Sunday, so it would be better to wait for him.
  • Linas confirmed that reduction with contin works well (in the post we can see that it really makes a big difference for HC, independently of depth).

From now on I'll make my updates here to avoid any confusion.


@ArleyRistar

About std::bitset: it isn't something for PS alone, and it isn't something that will increase performance (not much anyway), but the code does some custom operations that exist there (and some that don't).
But I've found that:

  • We can't use std::bitset; we have to use boost::dynamic_bitset (std::bitset has to have its size at compilation time, something that we don't have; see the quick illustration below).
  • Some functions already exist in dynamic_bitset (like field_set::count(inst)).
  • Some functions don't exist (but should), like get/set_raw, which are made for vector<packed_t>.
  • You don't have to have 2 offsets anymore.
  • You can't have an iterator with bitset (well, it has a custom iterator anyway, but it would have to be rewritten).

Basically you have to change too many things to gain almost nothing, so I'm skipping it for now and jumping to changing the variable representation.
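A quick illustration of the compile-time size constraint mentioned in the first bullet (hedged example, not MOSES code):

#include <bitset>
#include <cstddef>
#include <boost/dynamic_bitset.hpp>

void example(std::size_t n_bits_known_only_at_runtime)
{
    std::bitset<128> fixed;             // size must be a compile-time constant
    boost::dynamic_bitset<> dynamic(n_bits_known_only_at_runtime);

    dynamic.set();                      // set all bits
    std::size_t ones = dynamic.count(); // roughly what field_set::count(inst) does
    (void)fixed; (void)ones;
}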

On Thu, Jul 30, 2015 at 5:02 PM, Arley Ristar [email protected] wrote:

Update from what we talked about in the Moses channel.

  • I sent the above post check how depth affects the code.
  • In the conclusion i found a issue different from what we were testing
    (PS is slow with reduction).
  • Ben ask for a hypotheses.
  • I respond that it may be of Nil said 3 emails ago.
  • I show a callgrind that has a big difference in the number of
    instructions to reduct HC and PS.
  • I start to talk to Nil.
  • Empirically, i tested that HC will have bad performance too if it change
    the continuous knobs every eval too (because PS do this).
  • Nil and I didn't decided anything about reduction.
  • But we decided to change the variable length representation to double
    (that was what Linas wanted 7 emails ago).
  • We decided to change instance<packet_t> to std::bitset (i have a update
    about it below).
  • Ben gave some good ideas that we could try, but he'll be offline until
    Sunday, so it would be better wait for him.
  • Linas confirms that reduction with contin works well (in the post we
    could see that it really makes a big difference for HC independently of
    depth).

For now on i'll make my updates here to avoid any confusion.

On Mon, Jul 27, 2015 at 9:38 AM, Arley Ristar [email protected] wrote:

I forgot to post here too:

https://opencog.slack.com/files/arleyristar/F086CPMUK/Initial_experiments_for_PS_and_HC_in_continuous_problems_
or read below.
Initial experiments for PS and HC in continuous problems.

For resume version jump to 3.4. General.

  1. Methodology:

First, the experiments were divided in two groups: without reduction (-B
0 -E 1 parameters) and with reduction; it's separated this way because the
reduction part affects the particle swarm (PS) too much (clean_combo_tree
uses 99.73% with reduction in PS). Besides that, the experiments were split
in three parts: analysis of the memory, processing and time. The memory
analysis was done using the valgrind tool massif running for 4 values of
depth {5(default), 16, 32, 128} and for PS and hill climbing (HC). The
processing analysis was done in the same way of the memory, but using the
tool callgrind. The time analysis was done in a different way, using a
simple python script to measure the mean time of seeds between 1 and 10
running between depth 2 and 32 for PS and HC. The dataset used was
predicates.csv because it is a simple one that is already inside the
examples folder of Moses; It was running for a maximum of 1000 evaluations,
a small value because of the valgrind tools that increase the program time
in 20x for massif and between 20x and 100x for callgring [1] but it's a
sufficient amount of evaluations to test (HC in some seeds can find the
best solution in less than 1000 evals).
2. Results description:2.1. Memory:

  • PS with reduction spend more memory than without.
  • Increasing in the depth of PS, increases the memory used, but not
    that much, sometimes it can decrease, because it can find better results
    that are less complex.
  • HC without reduction reaches the ram limit independently of depth.
  • HC with reduction uses far less memory, even less than PS without
    reduction.

2.2. Processing:

  • Variable length representation affects PS without reduction too
    much depending of the depth (15% at depth 32 vs. 2.7% in depth 5).
  • Affects the PS with reduction in a similar way, but with a
    percentage much lower, because of the reduction.
  • The depth didn't affect hill climbing at all, mostly because it
    found a solution that needs less than depth 5.

2.3. Time:

  • PS without reduction is much quicker than with reduction.
  • HC without reduction is slower than with reduction.
  • PS without reduction is quicker than HC without reduction.
  • PS with reduction is very much slower than HC with reduction.

2.4. Others:

  • Increasing the depth in PS, it returns better results, but it's
    stuck in the discrete maximum local anyway.
  • The results are not affected increasing the HC depth, maybe because
    this problem didn't need precision or a bigger value to reach the max.
    global.
  1. Conclusion:3.1. Memory:

I don't think that memory is a problem, even very complex problems with
thousands of instances will not reach the amount of ram that the computers
today have, and even if do, there's always feature selection; but let's
take a look:
Variable-length representation uses a amount of 28B (4B depth, 8B mean,
8B step_size, 8B expansion) + ((1/4B * depth) * num_instances) for each
dimension in a deme. While a double representation uses 8B * num_instances.
So using variable-length representation with a small depth for a larger
number of instances still makes sense for RAM economy. But i'm still
felling that this way could be better in 2 forms. 1- It uses 2 bits to
represent left, right and stop (0,1,2), and it don't use 11 (3). 2- 5 depth
can represents +-30 (with LLLLS and RRRRS), why not use until LLLLL or
RRRRR if the get/set contin function limits the for with the depth too?
3.2. Processing:

With variable-length representation, processing is only a problem for PS,
and only at larger depths. In all cases clean_combo_tree and merge_demes
are the functions that use the most processing, and the parts specific to
HC or PS are small (PS is larger only because of set_contin and
get_contin). In PS I have to call get_contin many times (to get the best
values), and that isn't helping. To solve it I'll have to store the values
in doubles, which in the end loses the benefits of using the
variable-length representation.
3.3. Time:

I don't know how the cache of combo tree reduction works, but it seems to
work best with HC, where the candidates in each eval are very close to one
another. In the initial phase of PS the instances are very different from
one another. I would say that PS only runs quicker without reduction
because it returns a smaller deme than HC, since it doesn't include the
worst solutions.
3.4. General:

The variable-length representation doesn't have that much impact on memory
or performance; it only matters when using PS and for complex problems.
Even so, I still think it is better to change it to a double
representation, to get slightly better performance, the possibility of
searching a bigger space, and less complex code, at the cost of a little
more RAM and my work.

Detailed results data:

  1. PS without reduction:

1.1. Massif:
  • depth 5: peak 1.2842 MiB = 0.168322662 MB
  • depth 16: peak 1.3022 MiB = 0.170681958 MB
  • depth 32: peak 1.1866 MiB = 0.155530035 MB (leads to better
    solutions, and thus less complexity)
  • depth 128: peak 1.5212 MiB = 0.199386726 MB

1.2. Callgrind:

  • depth 5:
    • get_contin: 1.95%
    • set_contin: 0.7%
    • update_particles: 3.17%
    • complexity_based_scorer: 80.65%
  • depth 16:
    • get_contin: 5.39%
    • set_contin: 1.97%
    • update_particles: 6.98%
    • complexity_based_scorer: 77.19%
  • depth 32:
    • get_contin: 11.02%
    • set_contin: 4.15%
    • update_particles: 14.16%
    • complexity_based_scorer: 77.30%
  • depth 128:
    • get_contin: 16.18%
    • set_contin: 10.34%
    • update_particles: 24.40%
    • complexity_based_scorer: 67.82%

1.3. Time (s):

  • Depth: 2 | Mean: 2.557880
  • Depth: 3 | Mean: 2.593005
  • Depth: 4 | Mean: 2.663667
  • Depth: 5 | Mean: 2.668498
  • Depth: 6 | Mean: 2.701040
  • Depth: 7 | Mean: 2.644231
  • Depth: 8 | Mean: 2.714792
  • Depth: 9 | Mean: 2.692005
  • Depth: 10 | Mean: 2.680957
  • Depth: 11 | Mean: 2.735071
  • Depth: 12 | Mean: 2.783216
  • Depth: 13 | Mean: 2.743773
  • Depth: 14 | Mean: 2.725607
  • Depth: 15 | Mean: 2.722347
  • Depth: 16 | Mean: 2.755987
  • Depth: 17 | Mean: 2.901231
  • Depth: 18 | Mean: 2.809483
  • Depth: 19 | Mean: 2.762999
  • Depth: 20 | Mean: 2.758817
  • Depth: 21 | Mean: 2.772569
  • Depth: 22 | Mean: 2.816113
  • Depth: 23 | Mean: 2.928664
  • Depth: 24 | Mean: 2.904253
  • Depth: 25 | Mean: 2.938958
  • Depth: 26 | Mean: 2.845138
  • Depth: 27 | Mean: 2.877145
  • Depth: 28 | Mean: 2.857758
  • Depth: 29 | Mean: 2.840159
  • Depth: 30 | Mean: 2.822597
  • Depth: 31 | Mean: 2.813350
  • Depth: 32 | Mean: 2.820256
  2. HC without reduction:

2.1. Massif:
    • depth 5: peak 53.1730 MiB = 6.96949146 MB
    • depth 16: peak 53.1730 MiB = 6.96949146 MB
    • depth 32: peak 53.1730 MiB = 6.96949146 MB
    • depth 128: peak 53.1730 MiB = 6.96949146 MB

2.2. Callgrind:

  • depth 5:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.23%
  • depth 16:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.21%
  • depth 32:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.05%
    • complexity_based_scorer: 65.23%
  • depth 128:
    • get_contin: 0.78%
    • set_contin: 0%
    • hill_climbing: 0.06%
    • complexity_based_scorer: 65.20%

2.3. Time (s):

  • Depth: 2 | Mean: 13.933148
  • Depth: 3 | Mean: 13.710301
  • Depth: 4 | Mean: 13.771378
  • Depth: 5 | Mean: 13.699602
  • Depth: 6 | Mean: 13.643018
  • Depth: 7 | Mean: 13.953615
  • Depth: 8 | Mean: 14.218726
  • Depth: 9 | Mean: 14.607479
  • Depth: 10 | Mean: 14.577442
  • Depth: 11 | Mean: 14.953429
  • Depth: 12 | Mean: 14.705942
  • Depth: 13 | Mean: 14.993052
  • Depth: 14 | Mean: 14.541223
  • Depth: 15 | Mean: 14.249910
  • Depth: 16 | Mean: 14.563872
  • Depth: 17 | Mean: 15.361906
  • Depth: 18 | Mean: 14.469012
  • Depth: 19 | Mean: 14.755074
  • Depth: 20 | Mean: 14.411482
  • Depth: 21 | Mean: 14.821014
  • Depth: 22 | Mean: 14.412988
  • Depth: 23 | Mean: 14.660408
  • Depth: 24 | Mean: 14.989600
  • Depth: 25 | Mean: 14.081191
  • Depth: 26 | Mean: 14.491643
  • Depth: 27 | Mean: 14.121354
  • Depth: 28 | Mean: 14.673866
  • Depth: 29 | Mean: 14.079422
  • Depth: 30 | Mean: 13.745444
  • Depth: 31 | Mean: 13.860764
  3. PS with reduction:

3.1. Massif:
    • depth 5: peak 2.7437 MiB = 0.359622246 MB
    • depth 16: peak 2.8201 MiB = 0.369636147 MB
    • depth 32: peak 4.2963 MiB = 0.563124634 MB
    • depth 128: peak 3.6244 MiB = 0.475057357 MB

3.2. Callgrind:

  • depth 5:
    • get_contin: 0.11%
    • set_contin: 0.01%
    • update_particles: 0.02%
    • complexity_based_scorer: 91.94%
  • depth 16:
    • get_contin: 0.11%
    • set_contin: 0.02%
    • update_particles: 0.06%
    • complexity_based_scorer: 93.25%
  • depth 32:
    • get_contin: 0.12%
    • set_contin: 0.04%
    • update_particles: 0.15%
    • complexity_based_scorer: 91.06%
  • depth 128:
    • get_contin: 0.25%
    • set_contin: 0.16%
    • update_particles: 0.37%
    • complexity_based_scorer: 92.87%

3.3. Time (s):

  • Depth: 32 | Mean: 180
  4. HC with reduction:

4.1. Massif:
    • depth 32: peak 940.1328 KiB = 0.1203369984 MB

4.2. Callgrind:

  • depth 32:
    • get_contin: 0.42%
    • set_contin: 0%
    • hill_climbing: 0.08%
    • complexity_based_scorer: 32.52% (Low because merge_demes uses a
      lot)

4.3. Time (s):

  • Depth: 32 | Mean: 1.8413

For even more detailed results (the callgrind and massif outputs, or the
times with results) you can just ask me; I won't put them on GitHub, as
they are too big and don't need to be there.

[1] See valgrind site: http://valgrind.org/info/tools.html

On Mon, Jul 13, 2015 at 10:59 AM, Arley Ristar [email protected] wrote:

I was running with predicates.csv -W1 -u pred -m1000.
I'll send it, but I didn't know that there was a cache in the evaluations,
so that appears to be exactly what is happening.

On Mon, Jul 13, 2015 at 10:43 AM, ngeiswei [email protected]
wrote:

@ArleyRistar https://github.com/arleyristar I'm wondering if the
super fast run-times aren't due to the cache. If the candidates are
idiotically identical that could turn most evaluations into cache hits. Can
you run your experiment with the debug logging level and send it to us?


Reply to this email directly or view it on GitHub
#10 (comment).

@linas
Copy link
Member

linas commented Jul 30, 2015

Hi Arley,

Could you cut-n-paste this into the directory
moses/moses/diary/particle-swarm/ ? The other directories under "diary"
provide similar information on experiments. They are rather informal; just
notes, really, nothing particularly fancy.

On Mon, Jul 27, 2015 at 7:38 AM, arleyristar [email protected]
wrote:

I forgot to post here too:

https://opencog.slack.com/files/arleyristar/F086CPMUK/Initial_experiments_for_PS_and_HC_in_continuous_problems_
or read below.
Initial experiments for PS and HC in continuous problems.

For resume version jump to 3.4. General.

  1. Methodology:

First, the experiments were divided in two groups: without reduction (-B 0
-E 1 parameters) and with reduction; it's separated this way because the
reduction part affects the particle swarm (PS) too much (clean_combo_tree
uses 99.73% with reduction in PS).

What do you mean "affects too much"? Do you mean "uses too much cpu
time"? or does it actually change results?

You may want to call (maybe) reduct once, before the PSO step; clearly,
you should not call reduct while doing PSO, i.e. while trying different
float values.

Besides that, the experiments were split
in three parts: analysis of the memory, processing and time. The memory
analysis was done using the valgrind tool massif running for 4 values of
depth {5(default), 16, 32, 128} and for PS and hill climbing (HC). The
processing analysis was done in the same way of the memory, but using the
tool callgrind. The time analysis was done in a different way, using a
simple python script to measure the mean time of seeds between 1 and 10
running between depth 2 and 32 for PS and HC. The dataset used was
predicates.csv

predicates.csv is extremely simple. predicates-1.3.csv is a lot harder to
learn, and illustrates one of the big problems with the current contin
learner, that I am hoping that PSO will do much better on.

something that had to learn e.g. x>1.618 would be hard for the current
contin code, and should be "very easy" for PSO.

Much better test cases would be any dataset that people typically use
gradient-descent methods on, or maybe datasets that are easily solved by
any kind of linear regression, for example, SVM. Currently, moses mostly
cannot do linear-regression type problems; I am hoping PSO will fix that.

A demo of either a "linear programming" (linear optimization) or a "linear
integer programming" problem, in moses, with PSO, would be very cool.

because it is a simple one that is already inside the
examples folder of Moses; It was running for a maximum of 1000 evaluations,
a small value because of the valgrind tools that increase the program time
in 20x for massif and between 20x and 100x for callgring [1] but it's a
sufficient amount of evaluations to test (HC in some seeds can find the
best solution in less than 1000 evals).
2. Results description:2.1. Memory:

  • PS with reduction spend more memory than without.
  • Increasing in the depth of PS, increases the memory used, but not that
    much, sometimes it can decrease, because it can find better results that
    are less complex.
  • HC without reduction reaches the ram limit independently of depth.
  • HC with reduction uses far less memory, even less than PS without
    reduction.

2.2. Processing:

  • Variable length representation affects PS without reduction too much
    depending of the depth (15% at depth 32 vs. 2.7% in depth 5).
  • Affects the PS with reduction in a similar way, but with a percentage
    much lower, because of the reduction.
  • The depth didn't affect hill climbing at all, mostly because it found
    a solution that needs less than depth 5.

2.3. Time:

  • PS without reduction is much quicker than with reduction.
  • HC without reduction is slower than with reduction.
  • PS without reduction is quicker than HC without reduction.
  • PS with reduction is very much slower than HC with reduction.

I don't quite understand these results. MOSES does reduction in several
different places, for "completely unrelated" reasons. It seems to me that
you should do reduction at least once, before starting PSO. However, during
the PSO inner loop, where you are merely trying different floating point
values, you should NOT do any reduction at all; it would be completely
pointless. Right!?

Can you sketch the pseudocode for me?


@ArleyRistar
Copy link
Author

"Could you cut-n-paste this into the directory
moses/moses/diary/particle-swarm/ ? The other directories under "diary"
provide similar information on experiments. They are rather informal; just
notes, really, nothing particularly fancy."

I'll make a diary from that post and commit.

"What do you mean "affects too much"? Do you mean "uses too much cpu
time"? or does it actually change results?
I mean like jumping from 1s without reduction to 180s with it almost
without changing scores.

You nmay want to call (maybe) reduct once, before the PSO step; clearly,
you should not call reduct while doing PSO i.e. while trying different
float values."
I don't call reduction directly; I call the scorer, which calls the
complexity_based_scorer, which calls get_candidate, which calls
clean_combo_tree, which calls reduct::rule::operator().
In fact, the part that calls the scorer is exactly like in HC; it even has
the same comments. It was literally a copy-and-paste of 5 lines (just
changing the start and end of the deme).


predicates.csv is extremely simple. predicates-1.3.csv is a lot harder to
learn, and illustrates one of the big problems with the current contin
learner, that I am hoping that PSO will do much better on.

something that had to learn e.g. x>1.618 would be hard for the current
contin code, and should be "very easy" for PSO.

Much better test cases wou;ld be any dataset that people typically use
gradient-descent methods on, or maybe datasets that are easily solved by
any kind of linear regression, for example, SVM. Currently, moses mostly
cannot do linear-regression type problems; I am hoping PSO will fix that.

A demo of either a "linear programming" (linear optimization) or a "linear
integer programming" problem, in moses, with PSO, would be very cool.

Yes, predicates.csv is the simplest one to test; I was aware of that.
But even this simplest one took 3 hours of running to get a callgrind of PS
with reduction, and for a first test I got some good results.
Harder problems will show these results more precisely, and I'll use such
problems in the next experiment.
Let me remind everybody of one thing: PS does NOT get better results than
HC; I haven't seen a single problem where every knob is continuous.
Even in this simple problem the discrete part has more than 1/3 of the
knobs.
On the discrete part it falls into local maxima; it has the simplest and
oldest implementation of PSO for discrete problems, which is basically a
round-off and has no escape mechanism.
I know that Linas suggested "not wasting any time there", but I want to fix
that in this second part (either fix it, or do as Ben suggests and make a
hybrid), but the issues are stacking up.


I don't quite understand these results. MOSES does reduction in several
different places, for "completely unrelated" reasons. it seems to me that
you should do reduction at least once, before starting PSO. Howver, during
the PSO inner loop, where you are merely trying different floating point
values, you should NOT do any reduction at all; it would be completely
pointless. Right!?
I dunno. I would not say pointless, because of the complexity scorer that
PS uses; besides that, doing this reduction in the loop seems to help HC a
lot. And Ben said:
"Another point to keep in mind is: There are some fitness functions for
which fitness function evaluation is much more expensive than reduction"
So in some cases it would be better to do the reduction, I think...

Can you sketch the pseudocode for me?
Yes, there's even an image.
The image has a 5% precision cutoff (functions that use less than that
won't show), but if you need it I can generate a more precise one.
If you are talking about the PS part alone, I can give a simple
pseudocode:
operator():
    Create new particles in a deme.
    Create new velocities.
    Inner loop:
        Apply transform to score the deme.
        Select the best score for each particle and the best particle overall.
        Check finishing conditions.
        Update particles based on the bests:
            Update bit knobs.
            Update disc knobs.
            Update contin knobs.
    End loop.
    Return deme with the best particles.
End operator.
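
For the contin part, the "update contin" step above presumably follows the
usual PSO velocity/position rule; here is a minimal sketch of that rule (the
Particle struct and the w, c1, c2 parameters are illustrative, not the actual
MOSES types or settings):

    #include <random>
    #include <vector>

    // Textbook PSO update for the contin dimensions of one particle:
    // v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x += v.
    struct Particle {
        std::vector<double> x;      // current contin values
        std::vector<double> v;      // velocities
        std::vector<double> pbest;  // best values this particle has seen
    };

    void update_contin(Particle& p, const std::vector<double>& gbest,
                       double w, double c1, double c2, std::mt19937& rng) {
        std::uniform_real_distribution<double> u(0.0, 1.0);
        for (std::size_t i = 0; i < p.x.size(); ++i) {
            p.v[i] = w * p.v[i]
                   + c1 * u(rng) * (p.pbest[i] - p.x[i])
                   + c2 * u(rng) * (gbest[i] - p.x[i]);
            p.x[i] += p.v[i];
        }
    }
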


@linas
Copy link
Member

linas commented Jul 31, 2015

OK, So now I am completely confused. I was assuming the algo went like this:

outer loop:
   select exemplar
   decorate it with knobs; most of these will work just like before, but ...
   anywhere where a contin used to be, there will now be a double-precision float 
   (i.e. old contins are replaced by particles)
   inner loop:
      twiddle the boolean and discrete knobs, but NOT the floats!
      call reduct to get a simpler expression.
      initialize particle velocities.
      PSO loop:
           score the current tree with the current particle velocities
           (do NOT reduce the tree before scoring!!)
           update particle positions and velocities
           repeat PSO loop until converged.
     Now that PSO is done, 
     record best score,
     repeat the knob-twiddling inner-loop
     use ordinary hill-climbing to see if the best current PSO result
     is better than the previous best.
     if not, exit the inner loop

In other words, the new PSO-enabled version is just like the old hill-climbing moses, and it even uses the same-old hill-climbing just like before, with NO modifications AT ALL to hill-climbing... the only modification for PSO would be to replace all contins by particles, and to replace the computation of a single score by a PSO-loop.

That is, in the old code, scoring would be done exactly once for each knob setting; in the new code, this single score would be replaced by a PSO loop. But otherwise, everything else would be unchanged. I guess it's not being designed like this?
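
To illustrate the "single score replaced by a PSO loop" idea, here is a
self-contained toy sketch: instead of scoring one float setting once, a short
PSO run over the float knobs is performed and its best score is returned. The
objective, constants and names are illustrative stand-ins, not the MOSES
scorer or API:

    #include <algorithm>
    #include <random>
    #include <vector>

    // Toy stand-in for scoring a fixed tree whose float knobs are x
    // (the real scorer would evaluate the combo tree on the training table).
    static double toy_score(const std::vector<double>& x) {
        double s = 0.0;
        for (double v : x) s -= (v - 1.618) * (v - 1.618);   // best at x_i = 1.618
        return s;
    }

    // The score of a knob setting is the best score found by a bounded PSO run.
    static double pso_score(std::vector<double> x0, int n_particles = 10, int iters = 50) {
        std::mt19937 rng(42);
        std::uniform_real_distribution<double> u(-1.0, 1.0), r(0.0, 1.0);
        const std::size_t d = x0.size();
        std::vector<std::vector<double>> x(n_particles, x0), v(n_particles, std::vector<double>(d));
        std::vector<std::vector<double>> pbest = x;
        std::vector<double> pscore(n_particles);
        for (int i = 0; i < n_particles; ++i) {
            for (double& vi : v[i]) vi = u(rng);              // random initial velocities
            pscore[i] = toy_score(x[i]);
        }
        std::size_t g = std::max_element(pscore.begin(), pscore.end()) - pscore.begin();
        for (int it = 0; it < iters; ++it) {
            for (int i = 0; i < n_particles; ++i) {
                for (std::size_t k = 0; k < d; ++k) {
                    v[i][k] = 0.7 * v[i][k] + 1.5 * r(rng) * (pbest[i][k] - x[i][k])
                                            + 1.5 * r(rng) * (pbest[g][k] - x[i][k]);
                    x[i][k] += v[i][k];
                }
                double s = toy_score(x[i]);
                if (s > pscore[i]) { pscore[i] = s; pbest[i] = x[i]; }
            }
            g = std::max_element(pscore.begin(), pscore.end()) - pscore.begin();
        }
        return pscore[g];
    }
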

@ngeiswei
Copy link
Member

Agreed about std::bitset.

@ArleyRistar
Copy link
Author

That is, in the old code, scoring would be done exactly once for each knob
setting; in the new code, this single score would be replaced by a PSO
loop. But otherwise, everything else would be unchanged. I guess its not
being designed like this?
Linas, no it's not. For now it's a pure PSO and it isn't a hybrid version
like this.
I wanted to make a complete version of PS that solves the problems
(bit+disc+contin) like HC does. I still believe that with the right
discrete algorithm PS can perform as well as or better than HC, but that
isn't certain.
On the other hand, there's a huge chance that a hybrid version would
improve the performance on continuous knobs.
I've thought about doing a hybrid, but not like the one described above,
which makes more sense (in some places in the code there are comments
calling HC).
I still want to test another discrete algorithm for discrete PS, but for
now I think this hybrid should be my priority (after finishing the change
from variable length to double).

Some comments:

outer loop:
    select exemplar  <- Like center_inst from HC?
    decorate it with knobs; most of these will work just like before, but ...
    anywhere where a contin used to be, there will now be a double-precision float
    (i.e. old contins are replaced by particles)  <- that's not necessary now that
        the representation will change (but HC and the others will access it like
        before)
    inner loop:
        twiddle the boolean and discrete knobs, but NOT the floats!  <- like
            neighborhood_sampling?
        call reduct to get a simpler expression.  <- like calling the scorer with
            reduction set to true?
        initialize particle velocities.
        PSO loop:  <- I would do it for the best one, right? I can't do this for
            all candidates generated by neighborhood_sampling because their reduced
            trees would have contins in different places, wouldn't they? And even
            if they had, the best position for one wouldn't necessarily be the best
            for the others.
            score the current tree with the current particle velocities  <- so in
                this case, can I work directly with the tree and not with the
                instance?
            (do NOT reduce the tree before scoring!!)
            update particle positions and velocities
            repeat PSO loop until converged.  <- convergence in PSO is a little
                tricky, but for this case I would recommend finishing early.
        Now that PSO is done,
        record best score,
        repeat the knob-twiddling inner-loop  <- Won't it be better to cut this
            part and do it after returning to the inner loop?
        use ordinary hill-climbing to see if the best current PSO result
        is better than the previous best.
        if not, exit the inner loop  <- Here I would use the same parameter as HC
            to stop when reaching max_dist.


@linas
Copy link
Member

linas commented Aug 1, 2015

On Fri, Jul 31, 2015 at 8:06 PM, arleyristar [email protected] wrote:

That is, in the old code, scoring would be done exactly once for each knob
setting; in the new code, this single score would be replaced by a PSO
loop. But otherwise, everything else would be unchanged. I guess its not
being designed like this?

Linas, no it's not. For now it's a pure PSO and it isn't a hybrid version
like this.

OK, well, I don't understand what "pure PSO" is. How can PSO possibly
change discrete values? I thought it only worked with things that have
floating-point values, and velocities; how can you possibly convert
non-contin knobs into floats?

I wanted to make a complete version of PS to solve the problems
(bit+disc+contin) like HC do. I still believe that with the right discrete
algorithm for PS it can perform as good or better than HC, but it isn't
that certain.

Again, I don't understand what a "discrete algo for PS" could be, or
how it might work. Wikipedia did not seem to mention it; is there some
reference I should be looking at?

Look, there is nothing "wrong" with hill-climbing; it works well for
discrete and boolean knobs. It fails disastrously with contin knobs,
but the reason for this is "well understood". Replacing
contin-hillclimbing by almost any float-pt optimization algo will work
better.

On the other hand, there's a huge chance that a hybrid version would
increase the performance of continuous knobs.

Yes, and that is what I find very exciting.

I've thought about doing a hybrid, but not as described above that makes
more sense (in some place in the code, there's some comments calling HC).
I still want to test another discrete algorithm for discrete PS,

OK. I'd like some insight into what this might be like.

but for
now i think this hybrid should be priority for me (after finishing change
to variable length to double).

Yeah, this would be great!

Some comments:

The formatting is mangled...

outer loop:
select exemplar <- Like center_inst from HC?

The outer loop is independent of HC. There is a population of trees;
one is selected and decorated with knobs. HC is the "inner loop"; it
offers one way of turning knobs. There are several different inner
loops, but HC is the only one that works well.

decorate it with knobs; most of these will work just like before, but ...
anywhere where a contin used to be, there will now be a
double-precision float
(i.e. old contins are replaced by particles) <- that's not
necessary now that the representation will change (But HC and the
others will access it like before)

OK. As far as I am concerned, contins could be completely discarded
(and replaced by doubles). Perhaps Nil has other ideas?

inner loop:
twiddle the boolean and discrete knobs, but NOT the floats! <-
like neighborhood_sampling?

yes.

call reduct to get a simpler expression. <- like calling scorer
with reduction to true?

well, in this "hybrid" approach, this would be a step you'd have to
explicitly add. The only reason that reduct is done before scoring is
that it speeds up scoring. In the "hybrid" approach, you would not
need this step in the scorer itself... but you would need it earlier.

initialize particle velocities.
PSO loop:<- I would do it for the best one, right?

Yeah, I guess so ... or maybe you would want to do it for the best 5 or 10 ...

i can't do
this for all generated by neighborhood_sampling because their
reductions trees would have continuous in different places, wouldn't
they?

Yes; each different tree would have the contin variables in different
places, and you'd have to run an independent run of PSO for each one.
(or just run it for the best one, or run it for the best 5 or 10 ...)

score the current tree with the current particle velocities
<- so in this case, can i work directly with the tree and not with the
instance?

Well, normally, a single instance is converted to a tree, then the
tree is reduced, then the tree is scored. (An instance being just a
binary string which indicates the knob settings on a decorated tree.
Instances are used only because they are compact, much smaller than
trees; otherwise the distinction doesn't much matter.)

There seem to be two different ways of doing hybrid-PSO.

One is "per-instance PSO":
Given "some set of instances", for each instance:
-- convert a single instance to a tree
-- reduce the tree
-- apply PSO to all floats in the tree.
-- the final instance score is the best PSO result.

What is the "some set of instances"? Well, it could be a set of one:
the highest scoring instance before PSO. Or it could be the top 5
best. Or it could be all of the ones returned by
neighborhood_sampling ...

Now that PSO is done,
record best score,
repeat the knob-twiddling inner-loop <- Won't it be better to cut
this part and do this after it return to the inner loop?

I suppose that is another way of doing it. In this case, you let HC
find the very best 100, and then run PSO on all 100 of these, and see
if PSO can get even better scores. Let's call this "per-hill PSO".

Clearly, per-hill PSO will use a LOT less cpu than per-instance PSO.
Will it actually be better? Hard to say. It seems easier to code up
per-hill PSO.

--linas

Reply to this email directly or view it on GitHub.

@ngeiswei
Copy link
Member

ngeiswei commented Aug 4, 2015

What about moving PSO into the contin neighborhood sampling? Then there is no PSO per se; it's still hillclimbing optimization, but the neighborhood sampling for contin (now double) values relies on PS. The code could be done so that PSO-based sampling is just one possible sampling amongst others (it just keeps track of positions and velocities to determine the sampling distribution, as opposed to, say, a Gaussian).

If there are only contin knobs I guess it might look and behave just like the current PSO code, but if there are discrete values, some discrete knob turning will be mashed up with contin knob turning done according to PSO.

I don't have a good intuition as to whether it's a good idea or not to mash up discrete HC and contin PS; ultimately it depends on how dependent the discrete knobs are on the contin knobs. Assuming they are independent, it should work great, faster than doing HC then PS, because we'll be able to optimize both discrete and contin knobs at the same time. If they are dependent (they surely are to some degree), then I don't know.
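
A minimal sketch of what such a PSO-flavored contin sampler might look like
(illustrative only; PsoContinSampler and its constants are made-up names, not
anything in the MOSES sampling code):

    #include <random>

    // Proposes the next value for one contin knob using the PSO rule instead
    // of, say, a Gaussian step: the sampler keeps a velocity and is pulled
    // toward the best value it has seen and the best value seen in the deme.
    struct PsoContinSampler {
        double velocity = 0.0;
        double personal_best = 0.0;  // best value this knob has taken so far
        double global_best = 0.0;    // best value for this knob across the deme

        double sample(double current, std::mt19937& rng) {
            std::uniform_real_distribution<double> r(0.0, 1.0);
            velocity = 0.7 * velocity
                     + 1.5 * r(rng) * (personal_best - current)
                     + 1.5 * r(rng) * (global_best - current);
            return current + velocity;
        }
    };
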

@bgoertzel
Copy link

That seems a good, simple kind of hybrid to explore...


Ben Goertzel, PhD
http://goertzel.org

"The reasonable man adapts himself to the world: the unreasonable one
persists in trying to adapt the world to himself. Therefore all progress
depends on the unreasonable man." -- George Bernard Shaw

@linas
Copy link
Member

linas commented Aug 4, 2015

Well, during hill-climbing, hundreds or thousands of different trees are explored. Depending on the knob setting, any given tree may or may not contain some contin variables -- for example:

or ($z and (boolean-knob  greater-than ( plus( $x $y))))

so if boolean-knob is set to True then the tree is $z or 0<($x + $y) but if the boolean-knob is False, then the tree does not even have the contins in it; the tree is just $z.

@ArleyRistar
Copy link
Author

Nil, I don't know if leaving PSO inside neighborhood sampling is a good
idea.
On the bright side, we would (probably) have quicker convergence.
On the not-so-bright side, we'll fall into the reduction problem again
(hill_climbing will call the scorer and so on).
On the bright side, I don't have to worry about keeping contin_neighbor
working the same way it does now, but with double instead of variable
length (something that I'm doing now).
On the not-so-bright side, if we do the reduction first, we fall into what
Linas said: we wouldn't have equal numbers of contins to apply PSO to.
On the bright side, we could map and update the ones in use.

Anyway, pick one of these two (separate or inside neighborhood sampling)
for me to do first.


@linas
Copy link
Member

linas commented Aug 4, 2015

FYI, I don't think I have any objections to the variable-length encoding that contin uses. I guess it's OK, for a suitable definition of OK. What I did object to was treating contins as just another knob, of which a single bit could be twiddled at a time. Experience showed that this was an ineffective way of searching the space of possible floating point values.

@bgoertzel
Copy link

I have thought about this a little more and poked around on the Internet a
little...

There is something called PSO-LS, given in pseudocode in this paper

https://books.google.com.et/books?id=vSXvHTQ07UsC&pg=PA246&lpg=PA246&dq=pso+hillclimbing+hybrid&source=bl&ots=DbAqVFwpNG&sig=9ZFZjsL5vPfPXPUZSDZh77gpIn0&hl=en&sa=X&ved=0CCkQ6AEwA2oVChMIkqmg7pKRxwIVqSPbCh0tsQtO#v=onepage&q=pso%20hillclimbing%20hybrid&f=false

which basically does PSO, but does hillclimbing on each particle to
local-optimize it before reporting its info back to the swarm... [Step 3
in their pseudocode would be replaced in MOSES by our existing, fancy
hillclimbing from the specified initial point.]

From what I understand, this would seem a natural extension of the current
"hill-climbing with repeated re-starts" approach. Basically, we would be
using PSO to handle the re-starts...
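
A rough, self-contained sketch of the PSO-LS shape described above: a
standard PSO step in which each particle is locally improved before it
reports back to the swarm. The objective, the constants, and the
local_search stand-in (which in MOSES would be the existing hill-climber)
are all illustrative, not the paper's pseudocode:

    #include <random>
    #include <vector>

    static double f(const std::vector<double>& x) {           // toy objective (maximize)
        double s = 0.0;
        for (double v : x) s -= (v - 1.618) * (v - 1.618);
        return s;
    }

    // Stand-in for "do hill-climbing from this starting point".
    static std::vector<double> local_search(std::vector<double> x, std::mt19937& rng) {
        std::normal_distribution<double> step(0.0, 0.1);
        for (int i = 0; i < 50; ++i) {
            std::vector<double> y = x;
            for (double& v : y) v += step(rng);
            if (f(y) > f(x)) x = y;                            // keep improvements only
        }
        return x;
    }

    // One PSO-LS iteration over all particles.
    static void pso_ls_step(std::vector<std::vector<double>>& x,
                            std::vector<std::vector<double>>& v,
                            std::vector<std::vector<double>>& pbest,
                            std::vector<double>& gbest, std::mt19937& rng) {
        std::uniform_real_distribution<double> r(0.0, 1.0);
        for (std::size_t i = 0; i < x.size(); ++i) {
            for (std::size_t k = 0; k < x[i].size(); ++k) {
                v[i][k] = 0.7 * v[i][k] + 1.5 * r(rng) * (pbest[i][k] - x[i][k])
                                        + 1.5 * r(rng) * (gbest[k] - x[i][k]);
                x[i][k] += v[i][k];
            }
            x[i] = local_search(x[i], rng);                    // the PSO-LS twist
            if (f(x[i]) > f(pbest[i])) pbest[i] = x[i];
            if (f(x[i]) > f(gbest))    gbest = x[i];
        }
    }
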

In this approach, MOSES search would break down into three levels:

Highest level, which is the deme level: choosing which exemplars to put in
the demes, i.e. search of deme-exemplar space

Mid-level, which is choosing the starting-points for hill-climbing. In a
PSO-LS approach, this would be done by PSO.

Low-level... This would use the current hill-climbing, which is
reduct-friendly (i.e. makes small changes so leads to cheap reductions
usually), but ignores dependencies among variables. Adding crossover (or
BOA) then incorporates a way of paying attention to dependencies at this
level...

...

In this case, the motivation for incorporating LS (hillclimbing) into PSO
is a little different from in the usual case. The usual motivation is to
help PSO do the final stages of optimization better, and help it faster
escape local optima. Here we have an additional motivation, which is to
minimize the amount of effort spent on reduction.

Still, this approach would be expected to add value only in cases where
fitness evaluation is very costly so that the cost of reduction does not
dominate. But this is generally the most important kind of case for AGI.

On reflection, I find this more promising than doing PSO within the local
neighborhood as Nil suggested. Doing PSO within the local neighborhood
appears not to really exploit the power of PSO for doing global search.

With this approach, I think we could get away with demes containing
programs with more variables than we are using currently in MOSES. Right
now we tend to very strictly restrict the number of features used in a
deme, but I think this is partly a result of working around the limitations
of HC.

-- Ben


@ngeiswei
Copy link
Member

Hmm, what about keeping PSO as the neighborhood sampling technique, but just limiting the number of contin/float knobs to turn at once? That way there would be more ones or zeros. Or would what you suggest bring a greater benefit? (I may not have understood very well, honestly.)

@ArleyRistar
Copy link
Author

I'm finally finishing the variable-length change; it gave me more work
than I expected.
It's compiling now; I have to adjust some things, but probably today or at
most tomorrow I'll finish.
Then I'll start this hybrid.
From my point of view:

  • Linas' option:
    It's the easiest option of the 3, but my fear is that it won't be able to
    do a better global search. Doing HC before will leave PSO too dependent on
    the exemplars found by HC.
    For example, if it starts with an exemplar that has 0 or 1 in all its
    contin knobs, HC will optimize the discrete knobs and pass the reduction to
    PSO, which has to optimize the contins; but it will optimize them
    dependently on that exemplar (which maybe doesn't have many contins because
    of the reduction), and those values may be good for that exemplar but not
    necessarily globally. Basically what I'm trying to say is that some
    exemplars generated by HC will not have a chance to get optimized, so we
    won't really know if they are good or not.
  • Nil's option:
    It's the most difficult one to do, but it can get better results if done
    right. I gave my view on it 4 emails ago.
  • Ben's option:
    I got the idea, but as Ben said, this specific PSO-LS doesn't work for us
    as it is. Its hill-climbing is a random change within [-1,1] to do local
    search. In our case we don't want to use hill-climbing for the contin
    knobs, but for the disc ones. I think it would work similarly to Linas'
    idea, but in the inverse order: PSO first, then HC.
  • My (new) opinion:
    Keep the pseudocode as Linas said, but instead of doing a complete HC
    before PSO, do just one neighborhood sampling for disc, reduce the best
    (the new center), apply one iteration of PS for contin (as Nil said, doing
    sampling + PS), update the center, and so on in a loop (see the sketch
    below). As Ben said, PS would choose the next center for HC and HC for PS.
    If nothing better is found, increase the distance for HC and/or the
    inertia for PS until convergence.

Why it'll (probably) work:
PSO can adapt to a changing best maximum IF the change is small enough (HC
will change 1-4 disc knobs at a time) to be caught by a random change, or if the
disc values converge before the contins (which will happen most of the time).
HC can then find other examples that a contin would affect in the next
iteration.
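
To make this loop concrete, here is a minimal, self-contained sketch of the
alternation I have in mind. It is only an illustration, not the moses code: the
Candidate layout, the toy score() function, the swarm size, the acceleration
coefficients and the widening schedule are all assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <limits>
#include <random>
#include <vector>

struct Candidate {
    std::vector<int>    disc;    // discrete knob settings (0/1 here)
    std::vector<double> contin;  // continuous knob settings, stored as plain doubles
};

// Toy scorer (assumption): likes disc knobs set to 1 and contins near 1.3.
static double score(const Candidate& c) {
    double s = 0.0;
    for (int d : c.disc) s += (d == 1) ? 1.0 : 0.0;
    for (double x : c.contin) s -= (x - 1.3) * (x - 1.3);
    return s;
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> uni(0.0, 1.0);

    Candidate center{{0, 0, 0, 0}, {0.0, 0.0}};

    // Small swarm over the contin dimensions only.
    const std::size_t nparticles = 8, ndims = center.contin.size();
    std::vector<std::vector<double>> pos(nparticles), vel(nparticles), pbest(nparticles);
    std::vector<double> pbest_score(nparticles, std::numeric_limits<double>::lowest());
    for (std::size_t p = 0; p < nparticles; ++p) {
        pos[p].resize(ndims);
        vel[p].assign(ndims, 0.0);
        for (double& x : pos[p]) x = 4.0 * uni(rng) - 2.0;   // random start positions
        pbest[p] = pos[p];
    }

    double inertia = 0.7;
    int hc_distance = 1;

    for (int iter = 0; iter < 50; ++iter) {
        double old_center_score = score(center);

        // 1) one round of discrete neighborhood sampling (the HC step)
        Candidate best = center;
        double best_sc = old_center_score;
        for (int n = 0; n < 20; ++n) {
            Candidate nb = center;
            for (int k = 0; k < hc_distance; ++k)
                nb.disc[rng() % nb.disc.size()] ^= 1;         // flip a few disc knobs
            double sc = score(nb);
            if (sc > best_sc) { best_sc = sc; best = nb; }
        }

        // 2) one PSO iteration over the contin knobs, using the new disc center
        for (std::size_t p = 0; p < nparticles; ++p) {
            Candidate trial = best;
            trial.contin = pos[p];
            double sc = score(trial);
            if (sc > pbest_score[p]) { pbest_score[p] = sc; pbest[p] = pos[p]; }
            if (sc > best_sc)        { best_sc = sc; best.contin = pos[p]; }
        }
        for (std::size_t p = 0; p < nparticles; ++p)
            for (std::size_t i = 0; i < ndims; ++i) {
                vel[p][i] = inertia * vel[p][i]
                          + 2.0 * uni(rng) * (pbest[p][i]    - pos[p][i])   // cognitive pull
                          + 2.0 * uni(rng) * (best.contin[i] - pos[p][i]);  // social pull
                pos[p][i] += vel[p][i];
            }

        // 3) if nothing improved, widen the search: larger HC distance, more inertia
        if (best_sc <= old_center_score) {
            hc_distance += 1;
            inertia = std::min(1.0, inertia + 0.05);
        } else {
            center = best;
        }
    }
    std::cout << "final score: " << score(center) << "\n";
}
```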

*Nil, you can do that, but you lose the PS purpose of adjusting lots of
variables at the same time while converging to a point. If you want to do
that, I recommend a random walk or something like that. After the change to
double it's somewhat easy to implement a test for this. It's definitely
something to put on a to-do list.


@linas
Member

linas commented Aug 11, 2015

Yes, I like your "new idea". It makes sense. Maybe there could be a flag or
option that says "do n steps of hill-climbing in between PSO", and maybe
another flag that says "do one round of PSO before starting HC."

It's very hard to intuit what will actually happen, since typically there
are local minima everywhere, and you don't know how big they are, what
shape they have, or what's next to what. For any given problem (or rather,
class of problems) there is a meta-meta optimization step: fiddling with
flags and options, by hand, to see what settings seem to provide the best
learning. For moses, this meta-meta step seems to make a huge difference,
particularly the "temperature". Setting the number of "dynamical features"
to 10 or 20 seems like a good bet. I forget; there are one or two more that
seem to make a big difference.

--linas


@ArleyRistar
Author

I'm back.
So, about the pull request to merge the project: all the commits prior
to August can easily be merged and won't affect other parts. It's basically
the addition of PS as an optimization option, some changes in the
parameters to receive arguments for PS, and an option to change the depth
from the command line. Besides that, there's the diary commit in August that
can go in the pull request.
But at least for now I recommend leaving out the variable-length representation
change and the hybrid commits.

The variable-length change touched too many things outside the
optimization, and for some changes I wasn't so sure about the correctness,
but in the end "just" the contin part of the neighborhood sampling was
broken, which was expected. Nil suggested mimicking the prior behavior, but
that would cause a performance loss, and I had little time to do the hybrid,
so I left it aside for a while and tested a simpler solution, but it turned
out to be not so good (I won't go into detail, it didn't work anyway; it was
basically an expansion factor (+ and -) that was halved each round and
initialized with 2^(depth-1); a rough sketch follows below).
So I won't recommend it, because for contin it's worse than what hill
climbing had before. For PS it gave better results, but it still has the
discrete problem, so it still gave bad results.
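
Just to show the mechanics, here is a tiny hypothetical sketch of that
expansion-factor sampling; the toy 1-D scorer and the stopping threshold are
made up for illustration, not what the branch actually does.

```cpp
#include <cmath>
#include <iostream>

int main() {
    // Toy 1-D scorer (assumption): the best value is 3.7.
    auto score = [](double x) { return -std::fabs(x - 3.7); };

    int depth = 5;                              // depth parameter of the contin spec
    double value = 0.0;                         // current contin knob value
    double step  = std::pow(2.0, depth - 1);    // expansion factor starts at 2^(depth-1)

    // Each round proposes value + step and value - step, keeps the better one,
    // and halves the step, so the search narrows geometrically.
    for (; step > 1e-3; step *= 0.5) {
        if      (score(value + step) > score(value)) value += step;
        else if (score(value - step) > score(value)) value -= step;
    }
    std::cout << "converged to " << value << "\n";  // ends up near 3.7
}
```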

About the hybrid: it depends on the variable-length change and it's not giving
good results (though at least it was better than without it). The version in
the last commit (21 Aug, before 19 UTC), which is the one I have to send to
GSoC, isn't complete; in fact I thought it was complete, but there's a mistake
that I saw afterwards.
One of the tricks I had to do was to pass the double value by reference and
not by copy as it was before (to form the combo_tree). That won't cause
problems, because the instance is constant when it is called, but for my
case, which needed to change the combo_tree, it was important. That way I
can change the contin part of the temporary instance and apply the scorer.
The problem lies in inserting that instance into the metapopulation, because
of the reduction. After the reduction I don't know which contin was used
and which wasn't, and in the rush I didn't realize it and put in the whole
particle, leaving contins that weren't supposed to be there.
These days I found a not-so-pretty way to map which contins are used in the
reduced tree without changing the rules (sketched below); I'll test it after
the pull requests, probably next week, but don't expect much. About the flags
that Linas mentioned: they are pretty easy to do, but first let's see if it works.
Also it's pretty much without documentation, so making a pull request without
that will leave the code a bit confusing.
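
To illustrate that not-so-pretty mapping, here is a rough standalone sketch of
the value-matching idea; the plain vectors stand in for the real
combo_tree/instance types, and the constants are made-up inputs.

```cpp
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    // Constants that would be collected by walking the reduced tree (assumed input).
    std::vector<double> constants_in_reduced_tree = {-1.384138116, 1.064478657};

    // The particle's contin dimensions before reduction (assumed: 4 contin knobs).
    std::vector<double> particle = {-1.384138116, 0.0, 1.064478657, 42.0};

    // A knob is considered "used" if its value still appears in the reduced tree.
    std::vector<bool> used(particle.size(), false);
    for (std::size_t i = 0; i < particle.size(); ++i)
        for (double c : constants_in_reduced_tree)
            if (std::fabs(particle[i] - c) < 1e-12) { used[i] = true; break; }

    for (std::size_t i = 0; i < particle.size(); ++i)
        std::cout << "contin knob " << i
                  << (used[i] ? ": kept\n" : ": reduced away, don't copy it back\n");
}
```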


@ngeiswei
Member

Thanks for coming back, Arley. I expected the difficulty of your last task to be too high for the GSoC timing. See what you can do. We've learned things, and that already matters.

@linas
Member

linas commented Aug 31, 2015

There are no open pull requests ...

How hard would it be to split up the code into different mergeable pull requests?

@ArleyRistar
Author

There's no conflict; it's able to merge.
But I want to make just one pull request with the diary commit, which is from
after I started the variable-length replacement.
I've started on it now; I'll send it ASAP.


@ArleyRistar
Author

Oops, I did something wrong:
ci/circleci: "No test commands were found"
Does anybody know what that is?


@linas
Member

linas commented Sep 1, 2015

You can do multiple pull requests; that might make things easier to review and to merge.

Ignore circle-ci for now. It's temperamental.

@ArleyRistar
Author

How often do things like this happen?
Using a very simple chaotic search on predicates-1.3 I found (with different
seeds, and these are the simplest ones):
0 and(0<(+(-1.30989714835535209 $x)) 0<($y) $g)
0 and(0<(+(-1.30827373948606507 $x)) 0<($y) $g)
0 and(0<(+(-1.3062848433168337 $x)) 0<($y) $g)

But:
-1 and(0<(+(-1.31160418347260088 $x)) 0<($y) $g)
-1 and(0<(+(-1.29389487563024619 $x)) 0<($y) $g)
-1 and(0<(+(-1.29023859074577651 $x)) 0<($y) $g)

Seriously, a 0 score sits inside a range that is less than 0.018 wide (and it
could be even narrower than that)? And for one of the simplest problems?
How could the variable-length representation find these values?
Is there a seed value for which I can find a 0 score using HC with the
variable-length representation, with the same complexity as above?
The smallest that I found after some random tests was (with an increased
number of evals):
0 and(not(0<(*(+(55 *(-42 $x)) $y))) 0<($y) $g)

And in:
0 and(0<(+(-1.38413811600397763 *(1.06447865729134317 $x))) 0<($y) $g)
Correct me if I'm wrong, but I think I can rewrite this first term as
something like this:
0<(+(-1.38413811600397763 *(1.06447865729134317 $x))) ==>
((1.06447865729134317 * x) - 1.38413811600397763) > 0
Why isn't it reduced to something like:
((1.06447865729134317 * x) - 1.38413811600397763) > 0 ==> x >
(1.38413811600397763 / 1.06447865729134317) ==>
x > 1.300296729 ==> 1.300296729<($x)


@linas
Member

linas commented Sep 5, 2015

Are these rhetorical questions, or do you want someone to answer them?


@ArleyRistar
Author

Yes, these are rhetorical questions, because I had nothing better to do on
my weekend.
But now seriously, I need to know whether, in other problems that you guys
tested, there's a need for high precision. For example, predicates.csv
with various seeds generates score-0 responses over a wider range; I just
want to know whether predicates-1.3.csv has other responses and/or whether
this happens only in some cases. I need this to tune the parameters and
choose between precision and fewer evals.

"How could the variable-length representation find these values?"
I need this to know how, or whether, this representation can find cases like this.

"Is there a seed value..." I want to know so I can compare with what I found.

And the last one: I don't exactly know how the reduction works (yet);
knowing what it can or cannot reduce helps me to set boundaries.
E.g., given a problem that I know is solved with a contin value of 100,
is it better to give contin small boundaries like [-16,16] and expect it
to find something like *(10 10) or *(10 *(10 $x)), letting it have a bigger
(but reducible) complexity, or should I search between [-128, 128] and
have to do more evals to converge?
This example doesn't fit the question very well, but it shows how little
I still know about the reduction and MOSES in general.

So yes, you can consider my obvious questions as real.


@ngeiswei
Member

ngeiswei commented Sep 7, 2015

Hi @ArleyRistar, I've only experimented with learning contin-boolean expressions a few times, long ago. It obviously isn't extremely mature. Regarding

0<(+(-1.38413811600397763 *(1.06447865729134317 $x))) ==>
((1.06447865729134317 * x) - 1.38413811600397763) > 0
Why isn't it reduced to something like:
((1.06447865729134317 * x) - 1.38413811600397763) > 0 ==>
x > (1.38413811600397763 / 1.06447865729134317) ==>
x > 1.300296729 ==> 1.300296729<($x)

There is no x > y operator in combo, only 0<. If there were such an operator plus the corresponding algebraic rules, I suppose the reduct engine would be able to reduce the expression as you show.

@linas
Member

linas commented Sep 9, 2015

Oh,

Note also: there is no rule to reduce a contin expression by dividing out
constants. I believe that this is because division would do great violence
to the old power-of-2-bits representation, e.g. 1/3 becomes an infinite
series of base-2 bits.

If we switch to floating-point representations, then having a reduction rule
that divides out constants would be a good idea ... necessary, even.
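
As a strawman, such a rule could normalise a linear predicate 0 < (c1 + c2*x)
to a threshold on x by dividing out c2, flipping the inequality when c2 is
negative. The sketch below is only illustrative; the names are not the reduct
engine's API.

```cpp
#include <iostream>

// "x > value" if greater is true, otherwise "x < value".
struct Threshold { double value; bool greater; };

// Reduce 0 < (c1 + c2 * x) to a threshold on x by dividing out c2 (assumes c2 != 0).
// Dividing by a negative c2 flips the direction of the inequality.
Threshold reduce_linear_predicate(double c1, double c2) {
    return { -c1 / c2, /*greater=*/ c2 > 0.0 };
}

int main() {
    // The example from earlier in the thread:
    // 0 < (-1.38413811600397763 + 1.06447865729134317 * x)  ==>  x > 1.3002967...
    Threshold t = reduce_linear_predicate(-1.38413811600397763, 1.06447865729134317);
    std::cout << "x " << (t.greater ? ">" : "<") << " " << t.value << "\n";
}
```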

#19

--linas


@ArleyRistar
Author

Sorry, but these days I couldn't do anything for the project; tonight (for
me) I'll do the merge and add the test for PS that Nil asked for.
About this reduction, I think I could look into it, but I have to "finish" the
representation change first.
When I say finish, I mean finding a good replacement for what neighborhood
sampling did for contin. And I'm pretty sure the discrete part is a
little slower now; I probably left some part unoptimized when changing the
representation, and I have to fix that before making a pull request. I was
kind of in a hurry when doing this part because the end of GSoC was near,
so it still has few comments; I have to do that too.


ngeiswei added a commit to ngeiswei/moses that referenced this issue May 27, 2020
Merge opencog/moses to singnet/moses