cephfs: add support for cache management callbacks #605
Comments
This version is a bit old at this point (8 months ago), but let's assume that it's not an issue for the moment. Can you also please let us know what versions of the ceph libraries are being linked with go-ceph and the version of ceph running on the server side? Thanks.
OK, I think I understand the general use-case.
OK thanks. That warning is issued when the mds wants clients to revoke inodes from the client cache. Refs:
When you restart the ceph-service pod the warnings stop?
Possibly, but to determine if that's the case we need to understand more about how this condition is expected to be resolved by clients using the high-level API calls. go-ceph relies upon the C-based APIs provided by ceph. Currently, the only API calls we use are what I call the high-level calls, and we don't have much control over the behavior of things like the inode cache or caps. My gut feeling is that this is more likely an issue with the version of the ceph libs in use or with the high-level libs in general. I suggest first checking to see if this is a known issue with libcephfs. If there are API calls pertaining to cache management, I am not aware of them. So we might want to strike up a conversation with the maintainers of libcephfs too.
@phlogistonjohn you can ignore the "uat and product environment". When I restart the ceph-service pod, the warnings stop. ceph version 14.2.21 (5ef401921d7a88aea18ec7558f7f9374ebd8f5a6) nautilus (stable) ceph library : libcephfs/librados2/librbd1 version
By pleasant coincidence, a ceph mailing list post indicates that nfs-ganesha once encountered similar problems, and a bit more searching shows that there are now API hooks intended for managing client cache pressure:
I also see that there is a newer version: ceph_ll_register_callbacks2

What I'm not clear about is how the low-level API and the high-level API are meant to interact with regard to the cache and these callbacks. @jtlayton if I may bother you for a moment, since you added the original set of callbacks: is it correct to say that using (only) the high-level API doesn't automatically handle the cache pressure requests? Assuming we were to add support for the callbacks, how would existing code that uses the high-level API make use of them? What, if any, additional low-level API functions would be needed to make it useful to someone who has a use-case like @qiankunli's? Thanks for your time!
Yeah -- we probably need to add some documentation comments to For this problem, you're probably mostly interested in Ganesha keeps a cache of Inode references, and it registers an ->dentry_cb works in a similar fashion for Dentry objects. Ganesha's FSAL_CEPH doesn't use that one, but ceph-fuse does. Probably, you want to expose most or all of these via go-ceph. It's been a while since I did any Go work though, so I won't presume how you should go about it.
So, to be clear:
Correct. This is really an application-level problem with the ll_* API. The application is generally holding references to Dentry and Inode objects, and usually needs to do something application-specific to release them in response to a request from the MDS to trim them.
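To make the point above concrete, here is a minimal, self-contained Go sketch of an application-side reference cache that drops what it holds when asked to trim. It does not call libcephfs or go-ceph at all: `inodeID`, `refCache`, `onTrimRequest`, and the `release` function are hypothetical names invented for illustration, and a real binding would register a callback with the library and put real references instead.

```go
package main

import (
	"fmt"
	"sync"
)

// inodeID is a stand-in for an inode number (hypothetical; a real
// binding would keep an opaque handle returned by the library).
type inodeID uint64

// refCache is a toy application-level cache of inode references.
type refCache struct {
	mu      sync.Mutex
	held    map[inodeID]int // inode -> number of references the app holds
	release func(inodeID)   // hypothetical: drops one underlying reference
}

func newRefCache(release func(inodeID)) *refCache {
	return &refCache{held: make(map[inodeID]int), release: release}
}

// hold records that the application took one more reference to ino.
func (c *refCache) hold(ino inodeID) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.held[ino]++
}

// onTrimRequest models the application-specific reaction to a trim
// request: forget ino and release every reference held on it.
// It returns the number of references released.
func (c *refCache) onTrimRequest(ino inodeID) int {
	c.mu.Lock()
	n := c.held[ino]
	delete(c.held, ino)
	c.mu.Unlock()

	// Release outside the lock so the release hook may safely
	// call back into the cache.
	for i := 0; i < n; i++ {
		c.release(ino)
	}
	return n
}

// size reports how many distinct inodes are still referenced.
func (c *refCache) size() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.held)
}

func main() {
	released := 0
	cache := newRefCache(func(inodeID) { released++ })

	cache.hold(1)
	cache.hold(1)
	cache.hold(2)

	n := cache.onTrimRequest(1)
	fmt.Println("released by callback:", n, "total released:", released, "still cached:", cache.size())
}
```

The design choice worth noting is that the cache itself decides what "release" means; the library only signals that it wants the references back.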
Thanks for the suggestion. Done at https://tracker.ceph.com/issues/53004
OK, thanks for the heads up. I've already looked over the test C file and that helped somewhat. I'll take a look at some other examples too.
OK, thanks!
Sure. I can look into how both systems use the callback APIs down the road, to help serve as examples.
Yeah, we currently have bindings only for a (good, but incomplete) set of the high-level API functions. I have suspected for a while that we'd eventually want to cover more of the low-level APIs too. On the plus side, we've done some callback support for the rbd package, so we have some experience with callbacks between the C and Go layers.
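For readers unfamiliar with callbacks across the C and Go boundary: one common pattern is to hand C a small integer handle rather than a Go pointer, and to look the Go callback up by that handle when C calls back. The sketch below shows only that pattern in pure Go, with no cgo involved. It is not go-ceph's actual implementation; `registry`, `dispatch`, and the `func(uint64)` callback shape are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// registry maps integer handles to Go values so that only the
// handle, never a Go pointer, needs to cross into C.
type registry struct {
	mu     sync.Mutex
	next   uintptr
	values map[uintptr]interface{}
}

func newRegistry() *registry {
	return &registry{values: make(map[uintptr]interface{})}
}

// add stores v and returns a fresh, non-zero handle for it.
func (r *registry) add(v interface{}) uintptr {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.next++
	r.values[r.next] = v
	return r.next
}

// lookup returns the value stored under h, if any.
func (r *registry) lookup(h uintptr) (interface{}, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	v, ok := r.values[h]
	return v, ok
}

// remove forgets the handle; later dispatches on it will fail.
func (r *registry) remove(h uintptr) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.values, h)
}

// dispatch plays the role of the exported Go function that C would
// invoke with the handle as its user-data argument (hypothetical).
func dispatch(r *registry, h uintptr, ino uint64) bool {
	v, ok := r.lookup(h)
	if !ok {
		return false
	}
	cb, ok := v.(func(uint64))
	if !ok {
		return false
	}
	cb(ino)
	return true
}

func main() {
	reg := newRegistry()
	h := reg.add(func(ino uint64) { fmt.Println("asked to drop inode", ino) })

	fmt.Println("dispatched:", dispatch(reg, h, 42))

	reg.remove(h)
	fmt.Println("dispatched after remove:", dispatch(reg, h, 42))
}
```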
Thanks, that's probably the cause of @qiankunli's issues then. One thing I'm still not clear on (but perhaps I just need to review the code more closely): how do the high-level and low-level APIs interact? Is there something like a (private) cache the high-level calls use that we need to be aware of?
@qiankunli based on this conversation it appears that the issue is not a bug per se, but something more architectural. As such, we're interested in improving go-ceph to handle this case, but this could be a long process. I'm changing this from a question to an enhancement. However, I can't make any promises as to when I or other contributors will start working on this directly, and when we do it will likely land in parts over time. In the meantime, perhaps it could help if your application periodically "reset" during downtime, or you may just continue what you've been doing...
The high level API was made to mirror the POSIX filesystem API. It has its own file descriptor table, etc. to closely mirror how the kernel syscall API works. e.g.:
The fd is synthetic and generated by libcephfs as the result of an earlier ceph_open call. The low-level API works with Inode and Fh objects (which are really like file descriptions). So you (usually) do a lookup and then use the resulting Inode or Fh object to do other calls. Once you're done you have to put references to the objects, to ensure there aren't leaks. Aside from that, they mostly end up calling the same code under the hood.
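The "synthetic fd" idea described above can be illustrated with a toy Go model: a table that hands out integers and maps each one to a file description, the way a high-level layer can sit on top of handle-based calls. This is only a sketch of the concept, not libcephfs code; `fdTable`, `description`, `openPath`, `closeFD`, and `errBadFD` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadFD is returned for descriptors the table does not know.
var errBadFD = errors.New("bad file descriptor")

// description is a toy stand-in for an open file description
// (roughly what the low-level layer would hand out as a handle).
type description struct {
	path   string
	offset int64
}

// fdTable hands out synthetic integer descriptors and maps them
// to descriptions, mimicking a POSIX-like front end.
type fdTable struct {
	next int
	open map[int]*description
}

func newFDTable() *fdTable {
	return &fdTable{open: make(map[int]*description)}
}

// openPath creates a description and returns a fresh synthetic fd.
func (t *fdTable) openPath(path string) int {
	fd := t.next
	t.next++
	t.open[fd] = &description{path: path}
	return fd
}

// get resolves an fd to its description, as each high-level call
// would do before delegating to the handle-based code.
func (t *fdTable) get(fd int) (*description, error) {
	d, ok := t.open[fd]
	if !ok {
		return nil, errBadFD
	}
	return d, nil
}

// closeFD drops the table entry; in a real client this is where
// the reference on the underlying object would be put.
func (t *fdTable) closeFD(fd int) error {
	if _, ok := t.open[fd]; !ok {
		return errBadFD
	}
	delete(t.open, fd)
	return nil
}

func main() {
	tbl := newFDTable()
	fd := tbl.openPath("/some/dir/file")

	d, err := tbl.get(fd)
	fmt.Println(fd, d.path, err)

	fmt.Println(tbl.closeFD(fd))
	_, err = tbl.get(fd)
	fmt.Println(err)
}
```

With a table like this the caller never sees the underlying object, which is why a caller that uses only integer descriptors has nothing of its own to release beyond closing them.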
@phlogistonjohn let me just describe our usage of go-ceph
go-ceph version: v0.8.0
There is a ceph-service pod running in k8s, which uses go-ceph to operate on cephfs (e.g. create/delete dir), but the mds reports
When I restart ceph-service, the mds is OK. Is this a bug in go-ceph?