Add file system architecture doc #309

Merged
merged 6 commits into from
Aug 8, 2023


sollhui
Contributor

@sollhui sollhui commented Jul 29, 2023

#302 covers the doc on the architectural pattern of BanyanDB, which uses a remote storage system. This PR adds the architecture design of the file system under that storage mode.

@wu-sheng
Member

All doc graphs are versioned here: https://github.com/apache/skywalking-website/tree/master/static/doc-graph/banyandb

Once the graph is hosted on the website, you should reference the website (skywalking.a.o) path.

@wu-sheng wu-sheng requested a review from hanahmily July 29, 2023 23:44
@wu-sheng wu-sheng added the documentation Improvements or additions to documentation label Jul 29, 2023
@wu-sheng wu-sheng added this to the 0.5.0 milestone Jul 29, 2023
@sollhui
Contributor Author

sollhui commented Jul 30, 2023

@hanahmily Please review whether the picture in the doc is good; then I will open a pull request to skywalking-website.
Also, where should I put this picture and the io_uring picture: in 0.5.0 or 1.0.0?

@hanahmily
Contributor

> @hanahmily Please review whether the picture in the doc is good; then I will open a pull request to skywalking-website. Also, where should I put this picture and the io_uring picture: in 0.5.0 or 1.0.0?

In the FS system, the Remote Storage option is not mandatory. To represent it, you can either draw a dotted line or use two separate diagrams.

It should be in 0.5.0, where the diagram is first introduced.

@sollhui
Contributor Author

sollhui commented Jul 31, 2023

> > @hanahmily Please review whether the picture in the doc is good; then I will open a pull request to skywalking-website. Also, where should I put this picture and the io_uring picture: in 0.5.0 or 1.0.0?

> In the FS system, the Remote Storage option is not mandatory. To represent it, you can either draw a dotted line or use two separate diagrams.

> It should be in 0.5.0, where the diagram is first introduced.

Got it. I will open a pull request to skywalking-website, and you can review it there.

@wu-sheng
Member

The local file system calling the remote file system? What is that about? I don't think we should list that. If there is a local fs API mock for the remote system, that is a local fs API side thing, not ours.

@sollhui
Contributor Author

sollhui commented Jul 31, 2023

> The local file system calling the remote file system? What is that about? I don't think we should list that. If there is a local fs API mock for the remote system, that is a local fs API side thing, not ours.

It is not the local file system calling the remote file system. Under the remote storage strategy, data is not stored directly on remote storage; it is first stored in the local file system and then transferred to remote storage after the file reaches a certain size.
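
A minimal sketch of the flow described above, purely for illustration and not BanyanDB code. The names `spillWriter` and `remoteStore` are hypothetical, a `bytes.Buffer` stands in for the local file, and a map stands in for the remote storage client.

```go
package main

import (
	"bytes"
	"fmt"
)

// remoteStore is a hypothetical stand-in for a remote storage client (for example S3).
type remoteStore map[string][]byte

// spillWriter buffers writes locally (the buffer stands in for a local file)
// and ships the content to the remote store once it reaches threshold bytes.
type spillWriter struct {
	name      string
	threshold int
	local     bytes.Buffer
	remote    remoteStore
	parts     int
}

// Write appends p locally and uploads the accumulated bytes once the threshold is reached.
func (w *spillWriter) Write(p []byte) (int, error) {
	n, _ := w.local.Write(p) // bytes.Buffer.Write always returns a nil error
	if w.local.Len() >= w.threshold {
		key := fmt.Sprintf("%s.part%d", w.name, w.parts)
		w.remote[key] = append([]byte(nil), w.local.Bytes()...)
		w.parts++
		w.local.Reset()
	}
	return n, nil
}

func main() {
	w := &spillWriter{name: "data", threshold: 8, remote: remoteStore{}}
	w.Write([]byte("abcd"))   // 4 bytes buffered: stays local
	w.Write([]byte("efghij")) // 10 bytes buffered: uploaded as data.part0
	fmt.Println(len(w.remote), w.local.Len())
}
```

Whether this logic belongs inside the FS layer or above it is exactly what the rest of this thread debates; the sketch only shows the mechanics.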

@wu-sheng
Member

> > The local file system calling the remote file system? What is that about? I don't think we should list that. If there is a local fs API mock for the remote system, that is a local fs API side thing, not ours.

> It is not the local file system calling the remote file system. Under the remote storage strategy, data is not stored directly on remote storage; it is first stored in the local file system and then transferred to remote storage after the file reaches a certain size.

Yes, that is how the remote file system works. But is that BanyanDB's concern? @hanahmily I am not aware that we are getting that far.

@hanahmily
Contributor

> > > The local file system calling the remote file system? What is that about? I don't think we should list that. If there is a local fs API mock for the remote system, that is a local fs API side thing, not ours.

> > It is not the local file system calling the remote file system. Under the remote storage strategy, data is not stored directly on remote storage; it is first stored in the local file system and then transferred to remote storage after the file reaches a certain size.

> Yes, that is how the remote file system works. But is that BanyanDB's concern? @hanahmily I am not aware that we are getting that far.

Certainly. The FS system, a virtual file system, hides the details of all the OS-related or third-party APIs. As I previously noted in clustering, S3, a classic remote shared storage system, will be integrated into the FS system discussed here.

But the S3 integration is at a very early phase; we only need to document where the integration point is in BanyanDB.

@wu-sheng
Member

> Certainly. The FS system, a virtual file system, hides the details of all the OS-related or third-party APIs.

This is my point. How the virtual fs works should not be explained by us.
From our arch graph perspective, if we have two storage adaptors, one for local and the other for remote, let's list them in parallel as the callees of the fs adaptor layer. If the remote fs is always considered hidden behind the local fs, I think the local fs is our only adaptor target.

@hanahmily
Contributor

> This is my point. How the virtual fs works should not be explained by us. From our arch graph perspective, if we have two storage adaptors, one for local and the other for remote, let's list them in parallel as the callees of the fs adaptor layer. If the remote fs is always considered hidden behind the local fs, I think the local fs is our only adaptor target.

There are two options here:

  1. The locally mounted storage
  2. The locally mounted storage + shared remote storage

Based on the above, two separate diagrams make more sense to me.

@wu-sheng
Member

wu-sheng commented Aug 1, 2023

If there are two implementations, there are two boxes. That is basically what I am saying. Two implementations mean remote fs is not hidden behind the local fs.
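
The "two implementations, two boxes" idea can be sketched as one interface with two parallel adaptors. This is an assumption-laden illustration: the `FS` interface, `localFS`, `remoteFS`, `roundTrip`, and `demo` are hypothetical names, and an in-memory map stands in for a remote store such as S3.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// FS is a hypothetical minimal file API; BanyanDB's real interface may differ.
type FS interface {
	Write(name string, data []byte) error
	Read(name string) ([]byte, error)
}

// localFS is the first box: an adaptor over the OS file system.
type localFS struct{ root string }

func (l localFS) Write(name string, data []byte) error {
	return os.WriteFile(filepath.Join(l.root, name), data, 0o600)
}

func (l localFS) Read(name string) ([]byte, error) {
	return os.ReadFile(filepath.Join(l.root, name))
}

// remoteFS is the second box: a map stands in for a remote object store.
type remoteFS struct{ objects map[string][]byte }

func (r remoteFS) Write(name string, data []byte) error {
	r.objects[name] = append([]byte(nil), data...)
	return nil
}

func (r remoteFS) Read(name string) ([]byte, error) {
	data, ok := r.objects[name]
	if !ok {
		return nil, fmt.Errorf("object %q not found", name)
	}
	return data, nil
}

// roundTrip uses only the FS API, so it cannot tell the adaptors apart.
func roundTrip(fs FS, name string, data []byte) (string, error) {
	if err := fs.Write(name, data); err != nil {
		return "", err
	}
	got, err := fs.Read(name)
	return string(got), err
}

// demo runs the same round trip through both adaptors.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "fs-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	adaptors := []FS{localFS{root: dir}, remoteFS{objects: map[string][]byte{}}}
	var out []string
	for _, a := range adaptors {
		got, err := roundTrip(a, "seg-1", []byte("hello"))
		if err != nil {
			return nil, err
		}
		out = append(out, got)
	}
	return out, nil
}

func main() {
	out, err := demo()
	fmt.Println(out, err)
}
```

Because both adaptors satisfy the same interface, neither is hidden behind the other, which matches the parallel layout proposed in this comment.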

@sollhui
Contributor Author

sollhui commented Aug 1, 2023

> If there are two implementations, there are two boxes. That is basically what I am saying. Two implementations mean remote fs is not hidden behind the local fs.

In this case, how should we represent the flow from the local file system to the remote storage system? Even if remote storage systems are used, data will inevitably be stored on the local file system. How should this process be expressed?

@wu-sheng
Member

wu-sheng commented Aug 1, 2023

> In this case, how should we represent the flow from the local file system to the remote storage system? Even if remote storage systems are used, data will inevitably be stored on the local file system. How should this process be expressed?

Who did this fetching and caching?

@sollhui
Contributor Author

sollhui commented Aug 2, 2023

> > In this case, how should we represent the flow from the local file system to the remote storage system? Even if remote storage systems are used, data will inevitably be stored on the local file system. How should this process be expressed?

> Who did this fetching and caching?

The storage adapter, a component used for remote transmission, does this. My image needs to change; it should be fs API -> local file system -> (storage adapter -> remote storage system) (optional). Is that suitable?

@wu-sheng
Member

wu-sheng commented Aug 2, 2023

I think you should have

fs API --> local fs adaptor
       --> remote fs adaptor --> (local fs <-> remote fs)
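
One possible reading of the `remote fs adaptor --> (local fs <-> remote fs)` branch is an adaptor that owns both the local cache and the remote store. The sketch below assumes that reading; `remoteAdaptor` and `store` are hypothetical names, and maps replace real file systems to keep it self-contained.

```go
package main

import "fmt"

// store is a hypothetical key-value view of a file system.
type store map[string][]byte

// remoteAdaptor owns both sides of "(local fs <-> remote fs)":
// it writes through to the remote store and keeps a local copy as a cache.
type remoteAdaptor struct {
	local   store
	remote  store
	fetches int // number of reads that had to go to the remote store
}

// Write stores the data on both sides.
func (a *remoteAdaptor) Write(name string, data []byte) {
	cp := append([]byte(nil), data...)
	a.local[name] = cp
	a.remote[name] = cp
}

// Read serves from the local cache and falls back to the remote store on a miss.
func (a *remoteAdaptor) Read(name string) ([]byte, error) {
	if data, ok := a.local[name]; ok {
		return data, nil
	}
	data, ok := a.remote[name]
	if !ok {
		return nil, fmt.Errorf("file %q not found", name)
	}
	a.fetches++
	a.local[name] = data // cache for later reads
	return data, nil
}

func main() {
	a := &remoteAdaptor{local: store{}, remote: store{"idx-1": []byte("index")}}
	a.Read("idx-1") // miss: fetched from remote and cached
	a.Read("idx-1") // hit: served locally
	fmt.Println(a.fetches)
}
```

In this layout the fetching and caching are the remote adaptor's own business, so callers of the fs API never see the local copy.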

@sollhui
Contributor Author

sollhui commented Aug 2, 2023

> I think you should have
>
> fs API --> local fs adaptor
>        --> remote fs adaptor --> (local fs <-> remote fs)

Can we merge the local fs adaptor and the remote fs adaptor into a single fs adaptor? Perhaps it would be more concise.

@wu-sheng
Member

wu-sheng commented Aug 2, 2023

Isn't that what the FS API is about? It is the abstraction layer, and all implementations of the FS API are your adaptors.

@sollhui
Contributor Author

sollhui commented Aug 2, 2023

> Isn't that what the FS API is about? It is the abstraction layer, and all implementations of the FS API are your adaptors.

Sorry, I don't quite understand the meaning of this sentence. The purpose of the fs API is to provide a unified file interface, so developers don't have to worry about the differences between storage systems and operating systems. That's what I want to express.
[image]

@hanahmily
Contributor

There are some key issues I have to elaborate on here:

The interaction between remote and local systems is at a higher level in the invoking chain, and it is independent of the file system. It is up to the user to specify which files are local or remote. The FS system has to be capable of handling both types of files within one process.

There are two use cases:

  1. All of BanyanDB's files are local files.
  2. The index and data files are remote files, but the control, meta, and WAL log files are local.

Based on the above, I prefer to draw two boxes to describe the two cases.
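
The two use cases can be sketched as a user-supplied placement policy. Everything here is hypothetical (`Kind`, `Placement`, `Policy`, `PlacementOf`); it only illustrates the idea that the user, not the FS system, decides which file kinds are remote.

```go
package main

import "fmt"

// Kind names the file categories mentioned in the second use case.
type Kind string

const (
	Control Kind = "control"
	Meta    Kind = "meta"
	WAL     Kind = "wal"
	Index   Kind = "index"
	Data    Kind = "data"
)

// Placement says which side a file kind lives on.
type Placement string

const (
	Local  Placement = "local"
	Remote Placement = "remote"
)

// Policy is the user's choice; kinds that are not listed default to local.
type Policy map[Kind]Placement

// PlacementOf returns where files of kind k should live under this policy.
func (p Policy) PlacementOf(k Kind) Placement {
	if placement, ok := p[k]; ok {
		return placement
	}
	return Local
}

func main() {
	allLocal := Policy{}                         // use case 1
	mixed := Policy{Index: Remote, Data: Remote} // use case 2
	for _, k := range []Kind{Control, Meta, WAL, Index, Data} {
		fmt.Printf("%-7s all-local=%s mixed=%s\n", k, allLocal.PlacementOf(k), mixed.PlacementOf(k))
	}
}
```

Defaulting unlisted kinds to local keeps the all-local case a zero-configuration policy, which is one reasonable design choice among several.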

> How should we represent the flow from the local file system to the remote storage system? Even if remote storage systems are used, data will inevitably be stored on the local file system. How should this process be expressed?

You don't have to. The user will read the WAL log from local storage, then write to the remote data file. The FS system doesn't handle that process. You don't have to draw the relationship either.
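
A minimal sketch of that division of labor, under stated assumptions: `flushWAL` and `files` are hypothetical names, maps stand in for the local and remote adaptors, and the newline-separated WAL format is invented for the example.

```go
package main

import (
	"bytes"
	"fmt"
)

// files is a hypothetical stand-in for one adaptor's storage.
type files map[string][]byte

// flushWAL is caller-level logic: it reads WAL entries from the local side and
// writes them as one data file on the remote side. Neither side knows about the other.
func flushWAL(local, remote files, walName, dataName string) error {
	wal, ok := local[walName]
	if !ok {
		return fmt.Errorf("wal %q not found", walName)
	}
	// Entries are newline-separated in this sketch; a real WAL format would differ.
	entries := bytes.Split(bytes.TrimSpace(wal), []byte("\n"))
	remote[dataName] = bytes.Join(entries, []byte(","))
	return nil
}

func main() {
	local := files{"wal-0": []byte("a=1\nb=2\n")}
	remote := files{}
	if err := flushWAL(local, remote, "wal-0", "data-0"); err != nil {
		panic(err)
	}
	fmt.Println(string(remote["data-0"]))
}
```

The transfer lives entirely in the caller, so an architecture diagram of the FS system has no local-to-remote arrow to draw.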

> Isn't that what the FS API is about? It is the abstraction layer, and all implementations of the FS API are your adaptors.

No, it doesn't belong to the adapter as I mentioned above.

@codecov-commenter

codecov-commenter commented Aug 3, 2023

Codecov Report

Merging #309 (2731720) into main (20ce7d6) will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #309      +/-   ##
==========================================
+ Coverage   39.62%   39.64%   +0.01%     
==========================================
  Files         104      104              
  Lines       11101    11101              
==========================================
+ Hits         4399     4401       +2     
+ Misses       6268     6266       -2     
  Partials      434      434              

see 1 file with indirect coverage changes


@hanahmily hanahmily merged commit 4d105d0 into apache:main Aug 8, 2023
12 checks passed
@sollhui sollhui deleted the fs-architecture branch August 20, 2023 11:21