Do not always hide atoms when using frames. #15
Comments
I'm interested in doing this.
What I found is that currently, the actual hiding takes place with
This is my suggestion how this could be accomplished:
Anyway, I hope I didn't get this entirely wrong! Maybe you could point out what I'm missing. |
Wow! Cool! Umm. Technical questions require detailed answers;
Yes I think so. I wish there was some documentation in the code that explained what that function did! What moron doesn't document their code??? Oh, wait ... Here's a trick:
Use empty string to dump everything. If you're stepping through examples/demos, you can use this to look at the actual contents of the DB, to see if it matches up with the docs, with what you think it should be.
Well, that's the main issue. If I recall correctly, what needs to happen is that the backend needs to be warned that the deletion is going to happen; then the atomspace can delete, and then the backend can be told to finish up with whatever it needs to do. Without this two-step, there's just not enough info available to "do it right": either the atom is already gone and we've lost access to it, or it's not yet gone, and we don't know what to do with it next. A kind-of chicken-and-egg problem.
I don't know. Maybe the most important thing is to go through the examples & demos and really be sure that you've got a clear idea of how it's supposed to work. I think the code is bug-free (famous last words). The design goal was to make it all "simple and obvious" for the user, which makes the implementation rather complicated. (I recall that it was hard to get it right.) If you're going through the examples, and something seems wrong ... well, that could be a bug. If you're not sure, the |
Oh, several side remarks:
|
Tried to test this in some - as it seemed to me - suitable example with two AS + RocksDB to see if the atom gets deleted. In the situation of this example, we have a base + overlay AS. Now, one can attach a Rocks DB to the overlay.
Now, add something to the base and check if it is also in the overlay, then add it to rocks:
Then go back to base and delete some atom:
Now, if you look in both overlay and rocks, the deletion did occur as expected. |
Later, I want to try other examples with multiple atomspaces (e.g. frame) |
It is not obvious if this is the desired outcome, or if it should even be allowed. If you think of overlays as going from past to future, i.e. the base space containing the oldest data, and the overlays as being newer, then deleting something in the base space is like changing something in the past. Should that be allowed, or not? What might a reasonable person expect to happen? (I don't yet have good answers for any of this. I kind of follow my nose, to see what "feels right".) Some examples:
Currently, I have only two use cases:
Anything outside these two use cases is unexplored. My gut instinct says that it would be best if lower layers cannot be modified. This includes deletion. There is, however, a flag that sort of allows this; it is used to modify the base while using a temporary (because some results obtained in the temporary need to be copied back into the base.) |
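A loose analogy for the base + overlay behaviour discussed in the last few comments, using Python's standard `collections.ChainMap`. This is not the AtomSpace API and the keys are made up for illustration. It only shows the general layering idea: lookups fall through to the lower layer, writes land in the top layer, and a change made directly in the base shows through in the overlay. Unlike the AtomSpace, which hides an atom in the upper frame, `ChainMap` simply refuses to delete a key that lives only in a lower layer.

```python
from collections import ChainMap

# Two plain dicts stand in for a base space and an overlay on top of it.
base = {"foo": "atom stored in the base"}
overlay = ChainMap({}, base)          # first map is the writable top layer

# Reads fall through to the base; writes go to the top layer only.
seen_through_overlay = overlay["foo"]
overlay["bar"] = "atom stored in the overlay"
bar_leaked_into_base = "bar" in base

# Deleting through the overlay cannot touch the base layer.
try:
    del overlay["foo"]
    base_was_protected = False
except KeyError:
    base_was_protected = True

# Deleting directly in the base "changes the past": the overlay sees it too.
del base["foo"]
still_visible_in_overlay = "foo" in overlay
```

Whether the last step should be allowed at all is exactly the open question raised above; the sketch only makes the two options concrete.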
As a starting point for an example with subframes, I looked at the multi space example and I tried to see what happens in the rocks database, if I delete an atom:
Attach Rocks storage to C:
Now we delete the atom from the subframe A:
The atom disappears from C (as we can check with Note: Now, we have the problem that atom is still in A, which might not be intended ? |
Now, let's try a different (maybe more plausible) scenario (again from the multi-space example):
and rocksdb links to atomspace c:
Now,
and again it is purged from C, but not from rocksdb. |
Another modification from the previous example I just tried: have exactly the same atom |
This was mouldering in my email box. Sorry for the late reply. Regarding this comment: #15 (comment) Working as designed. The Atom is not deleted, it is hidden. Let me explain. You wrote:
Try
This states that:
What happened when you did the delete, you only deleted in frame c, but not in frame a. When you say If you closed the storage node, exited, re-opened, restored, you would find that The |
Regarding this comment: #15 (comment) I see an actual bug. But before I explain the bug, let me point out a variation on what you wrote, that does work correctly. If I take what you wrote and also save c, then everything is OK. The optimization, proposed at the very top, is to just go ahead and remove the Along the way, I seem to have spotted an actual, real-life bug. Next comment. |
I opened issue #21 to describe the bug. |
After some lazy weeks, here are some suggestions:
Sometimes, atoms from the incoming set also get deleted. To handle deletion in RocksDB, one could do this:
To elaborate further: |
Some comments, then: Rocks is a single-user system. There shouldn't be a need for a "session id" -- there can only ever be one session, ever. So let me expand on suggestion 2) up at the top. The design would be:
|
It looks like we have to pass the In more detail - it could be done in the following way:
|
Hi @sikefield3 sorry I lost track of my email. Let me review the overall structure, starting from the user API. By this I mean "legal calls the user can make into scheme or python or C++". They can do one of two things:
The first is done by calling The second case is more complicated, and it's the one that needs the update. Here's the call chain:
It's this last part,
That's all that is needed on the core AtomSpace side. This will allow the comment about WriteThruProxy to be removed, because the two-step pre-post remove will be able to fix things just so. For rocks, one just has to implement a Does that make sense? See what's happening? |
I've implemented the code for the comment immediately above in opencog/atomspace#3044 I think this will simplify your thinking about all the other parts of this issue. |
Cool! |
Somehow, I wasn't aware of the whole class hierarchy: For |
Yes! Here's the design:
Recall, it's all in the wiki: https://wiki.opencog.org/w/StorageNode The
So, different I implemented the simplest ProxyNodes as "proof of concept"; I have not done the fancy ones yet. Mirroring is easy. The Eviction one has not been implemented, it's probably the most important/interesting one to do next. Some of my data runs are just a bit too big to fit into RAM, so I have to kill them and restart. It would be nice to have some |
Here is how
PS: About the |
While I was trying to figure out, why the tests with my implementation of the
Is there a particular reason for this ? |
Hi @sikefield3 My apologies for a very late response. I am seeing your work just now. For the question about atomspace-rocks/opencog/persist/rocks/RocksIO.cc Lines 79 to 107 in 08b21ed
and so there would seem to be three possibilities: a number, a +number and a -number. Looking at the code, it seems like only +1 and -1 appear, for example, near lines 329, 383 and 412, which seem to say that +1 is a temporary marker. The important one seems to be near line 638 which says
So it appears that About your idea of using |
BTW, I plan to make unrelated changes in the next few hours. You should take your existing work, and make a pull request, and then say something like "pull request is a draft for review" then I won't merge it, but at least I can explore what's going on in there. |
For your earlier comment:
My gut instinct is that you don't need to write anything out during this process. For any given thread, you are guaranteed that if there's a call to There is some slim chance that some other thread will try to remove the same atom, so that there is a |
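A minimal sketch of how state could be handed from the first step to the second without writing anything out, assuming (my reading of the comment above, since part of it is missing) that the two calls for a given removal always happen on the same thread. The class and method names here are hypothetical and are not part of atomspace-rocks. Python's `threading.local` keeps a separate stash per thread, so two threads removing the same atom do not see each other's pending info.

```python
import threading


class PendingRemovals:
    """Hypothetical helper: per-thread notes kept between pre- and post-remove."""

    def __init__(self):
        self._local = threading.local()

    def _stash(self):
        # Each thread lazily gets its own private dict.
        if not hasattr(self._local, "stash"):
            self._local.stash = {}
        return self._local.stash

    def pre_remove(self, atom, info):
        # Remember, in memory only, what the backend will need later.
        self._stash()[atom] = info

    def post_remove(self, atom):
        # Retrieve and forget the note left by this thread's pre_remove.
        return self._stash().pop(atom, None)


def demo():
    """Two threads remove the same atom; each gets back its own note."""
    pending = PendingRemovals()
    results = {}
    barrier = threading.Barrier(2)

    def worker(name):
        pending.pre_remove("foo", name)
        barrier.wait()                  # force the two removals to overlap
        results[name] = pending.post_remove("foo")

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("t1", "t2")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

This only covers the hand-off between the two calls; what the backend should do when two threads really do race on the same atom is a separate question.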
Sure, I can do that. |
Oh hey, you're there! My apologies for not answering in so long! (I screwed up.) I'm not sure if I should interact with you a lot, or if I should take a "hands-off" approach. If you're unsure of an implementation idea, and want to talk about it, make a pull request, but set it into "draft" form. That makes it a lot easier to review & discuss. Of course, it takes brain-cells & real thinking on my part to review what you're doing, so I'm happy to go hands-off. But if you're spinning and floating and a little lost, it's much better if you continue to ask. And if I don't respond in a few days, then ping me one, two, three, four times. I'm sometimes distracted and need the pings to re-prioritize my to-do list. |
No problem! I was preoccupied with other things as well, so nothing happened in the last few weeks.
I think the approach with draft pull requests is a very good idea. This way, you can see the misconceptions I still have. In fact I was going to make a PR shortly. About my approach: I am still figuring things out, while coming up with the code. I also use the examples (by modifying them etc.) to see what's going on. In addition, I tried to use gdb to step through some parts of the code, but that proved just a little too cumbersome... And don't worry if it takes you a bit longer to respond. |
Just made this Draft pull request |
Sorry for the late response: Edit: just had to clean something up |
Let's go back to square one. Restate what the problem actually is, and what the solution needs to do. I should preface: the code works perfectly today and this project is about creating a performance optimization for a special case that involves frames. Please note: almost no one uses frames! There's no screaming, burning need to actually make atom deletion work better: it already works just fine.
The only "problem" is that it could use slightly less disk space, for the special case of atoms being deleted in frames. By "slightly less", I mean maybe 100 bytes per atom, and so a user would have to be using frames (and most users don't) and would also have to have 10 million deleted atoms, before they'd notice a (mere) 1GB extra disk-space being used. And they probably won't notice because if their data set is 10GB, and for some reason it's 11GB because the delete function wasn't optimized, who cares?
So this project is a pretty damn obscure performance optimization inside the rocks code base. It's a reasonable way of learning how the atomspace works, and dinking with it, but it is not a high-priority project, and I'm thinking that perhaps there are much more important and better projects for you to work on, @sikefield3 -- stuff that I actually kind of care about and that will have impact. Heh.
OK, with that out of the way, let me go back and describe the actual problem. If the user is NOT using frames, then delete works fine, and nothing needs to be done. If the user is using frames, then there is the following scenario:
So far, there are no problems.
Right now, the code is written so everything works; there aren't any bugs (that I'm aware of) in any of this. The optimization is that maybe rocks can delete the atom, instead of marking it as hidden, if the atomspace deleted it. I'm actually confused by the details; a unit test needs to be written to explore this more carefully, to see what actually happens in this mixed-frame deletion process. Something that demonstrates a situation where we can go "ah hah! See! The atom is marked hidden everywhere and since it's hidden everywhere, we can actually nuke it forever." Maybe there's even a bug in here, somewhere, I dunno. That would be a good place to start. |
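The disk-space estimate in the comment above, worked through as a quick check. The 100 bytes per atom is the rough guess given there, not a measured value, and GB is taken as 10^9 bytes.

```python
# Figures quoted in the comment above; none of them are measurements.
BYTES_PER_HIDDEN_ATOM = 100          # rough per-atom cost of a hidden marker
DELETED_ATOMS = 10_000_000           # atoms deleted while using frames
DATASET_BYTES = 10 * 10**9           # a 10GB data set

wasted_bytes = BYTES_PER_HIDDEN_ATOM * DELETED_ATOMS
overhead = wasted_bytes / DATASET_BYTES

print(f"wasted: {wasted_bytes / 10**9:.1f} GB, overhead: {overhead:.0%}")
# prints "wasted: 1.0 GB, overhead: 10%"
```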
Part of the problem here is that so much time elapses, that I kind of forget what the issue is, and I think you do too, and so we kind of reset onto some wrong track and focus on the wrong thing. Having a unit test that actually shows what the actual problem is would make it a lot easier to think about all of this clearly. |
We could just ask if frames are used - say here - and if not, it works as before without pre-remove and post-remove (I'm not sure. Maybe this is already happening).
Maybe, one could check this in the same place or somewhere else.
That's what I wanted to do, see my earlier comment. At the moment, I can't see that I am missing something apart from the things mentioned above ... |
I agree. I think it's best to create a unit test before continuing to work on the bug itself. |
Yes, please. This is just subtle enough that, unless I think hard about it for a good long while, I forget what the issue is, which means that finding the right fix becomes hard. |
When multiple frames are being used, the current rocks code will just hide atoms instead of removing them. This results in functionally correct behavior, but wastes storage if the atom could actually be deleted.
The AtomSpace extract code currently provides the correct (reference) implementation: it will either hide an atom, when needed, or it will delete the atom, if possible. It can be used as the final arbiter of whether to delete, or not. The backend should follow this advice.
The right way to solve this is to implement pre-delete and post-delete calls in the backend. The atomspace calls pre-delete before doing the deletion, so that rocks can gather any info needed for the deletion to happen. Next, the atomspace extracts the atom. Then it calls the post-delete hook. The hook code should look to see what the atomspace did: either the absent flag is set on the atom, or the atom is actually gone. If it's only marked absent, then rocks should also hide the atom. Otherwise, rocks should delete the atom.
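The pre-delete / extract / post-delete sequence described above can be sketched as a toy model. Everything in it is hypothetical: `ToyAtomSpace`, `ToyBackend` and their methods are invented for illustration and do not correspond to the real AtomSpace or atomspace-rocks classes. The hide-or-delete rule used by `extract` is also a simplifying assumption (hide if a lower frame still exposes the atom, otherwise delete).

```python
# Toy model only: none of these names are the real AtomSpace or RocksDB API.

class ToyAtomSpace:
    """A frame holding atoms, with an optional parent (lower) frame."""

    def __init__(self, parent=None):
        self.parent = parent
        self.atoms = set()     # atoms stored directly in this frame
        self.absent = set()    # atoms marked hidden in this frame

    def add(self, atom):
        self.absent.discard(atom)
        self.atoms.add(atom)

    def visible_below(self, atom):
        """True if some lower frame still exposes the atom."""
        frame = self.parent
        while frame is not None:
            if atom in frame.absent:
                return False
            if atom in frame.atoms:
                return True
            frame = frame.parent
        return False

    def extract(self, atom):
        """Assumed reference rule: hide if a lower frame has it, else delete."""
        self.atoms.discard(atom)
        if self.visible_below(atom):
            self.absent.add(atom)


class ToyBackend:
    """Stands in for the storage backend; records keyed by (frame, atom)."""

    def __init__(self):
        self.records = {}      # (id(frame), atom) -> "present" or "hidden"
        self.pending = {}      # notes gathered by pre_remove

    def store(self, frame, atom):
        self.records[(id(frame), atom)] = "present"

    def pre_remove(self, frame, atom):
        # Called while the atom still exists: gather whatever is needed.
        self.pending[atom] = (id(frame), atom)

    def post_remove(self, frame, atom):
        # Called after extraction: copy whatever the atomspace decided.
        key = self.pending.pop(atom)
        if atom in frame.absent:
            self.records[key] = "hidden"
        else:
            self.records.pop(key, None)


def remove(frame, backend, atom):
    """The two-step removal: warn the backend, extract, let it finish."""
    backend.pre_remove(frame, atom)
    frame.extract(atom)
    backend.post_remove(frame, atom)
```

With a base frame holding `"foo"` and an overlay holding only `"bar"`, calling `remove` on the overlay for both atoms leaves `"foo"` recorded as hidden in the overlay (it is still present in the base) and drops the record for `"bar"` entirely, which is the saving this issue is about.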