Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zenfs gc sekhar feature #4

Open
wants to merge 40 commits into
base: master
Choose a base branch
from
Open

Zenfs gc sekhar feature #4

wants to merge 40 commits into from

Conversation

soumendus
Copy link

No description provided.

soumendus added 24 commits June 27, 2021 11:47
ext_to_zone_map
UpdateMetadataAfterMerge() function
friend class ZenFSGCWorker;
Fixed Misplaced unlock
No need for ext_to_zone_map, removing it
Changed return type from void to IOStatus
Added MoveValidDataToNewDestZone() function
…rn type and removed ext_to_zone_map related code.

Modified UpdateMetadataAfterMerge() function to support IOStatus return type and removed ext_to_zone_map related code.
Changed void from IOStatus
Missing variable IOStatus
comment
Added ReadExtent() function declaration
Added definition of ReadExtent()
Modified MoveValidDataToNewDestZone()
Type cast to char*
TODO comment
Modified MoveValidDataToNewDestZone() and added helper ReadExtent()
fixed Slice param
@soumendus soumendus requested review from levichen94, royguo, shampoo365, Yuanliang-Wang and ZYYByteDance and removed request for royguo July 6, 2021 22:01
@skyzh
Copy link
Collaborator

skyzh commented Jul 7, 2021

@soumendus I have updated your code and formatted it. Please run git pull to update your local branch.

int dont_read = 0;

// Sort the Extent list in decreasing order.
std::sort(extent_list.begin(), extent_list.end(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is okay to change the extent order of a file? File content is store sequentially as its order of extent. Say that we have three extents, 1: 1KB, 2: 1KB, 3: 2KB. When we read from the 2048 byte, we should begin at extent 3. After we sort the extents, the file content is changed. I have mistaken this extent list for file extent list. So I have another question: How to construct the contents of ZenFSGCWorker? Do we have any plan on this?

fs/io_zenfs.cc Show resolved Hide resolved
fs/io_zenfs.cc Outdated Show resolved Hide resolved
fs/io_zenfs.cc Outdated Show resolved Hide resolved
fs/io_zenfs.cc Outdated Show resolved Hide resolved

// Store the new starting position for the extent
// which will be later made persistent.
new_start = zone_dst->wp_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there could be multiple threads appending data to a zone, how could we ensure that new_start is really at zone_dst->wp_?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there could be multiple threads appending data to a zone, how could we ensure that new_start is really at zone_dst->wp_?

Yes that's a valid concern. We need to figure out a way to serialize that. I did not find a way to lock the zone and operate on it or maybe there is a way. Perhaps we need a bit more thought on this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. As far as I can see, in ZenFS, there is only one writable file in one zone. AllocateZone will mark a zone as being written after allocating, making this zone not being able to used by other files for write. Maybe allocating free zone in dst_zone_list could solve this issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. As far as I can see, in ZenFS, there is only one writable file in one zone. AllocateZone will mark a zone as being written after allocating, making this zone not being able to used by other files for write. Maybe allocating free zone in dst_zone_list could solve this issue.

Hmm. OK, we have to add that logic in the function GetDestZoneToMoveValidData() and the function should push zones to the dst_zone_list member variable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. As far as I can see, in ZenFS, there is only one writable file in one zone. AllocateZone will mark a zone as being written after allocating, making this zone not being able to used by other files for write. Maybe allocating free zone in dst_zone_list could solve this issue.

Hmm. OK, we have to add that logic in the function GetDestZoneToMoveValidData() and the function should push zones to the dst_zone_list member variable.

  • For getting the list of destination zones, in the function GetDestZoneToMoveValidData(), we may need to call AllocateZone() with open_for_write_ set to false or write another version of AllocateZone() ex. AllocateZoneForGC() which sets that flag to false. Then in the function MoveValidDataToNewDestinationZone(), after we fetch the current parameters of the destination zones like wp_ (write pointer), we can set the flag open_for_write_ back to true and then start appending to that zone.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in https://github.com/bzbd/zenfs/blob/master/fs/zbd_zenfs.cc#L518, allocate a zone without setting open_for_write to true may lead to some issues. The zone might be seen in "open but not allocated" state, and will be allocated to other files. Therefore, I don't think this would work.

fs/io_zenfs.cc Show resolved Hide resolved
fs/io_zenfs.cc Outdated Show resolved Hide resolved
ZoneFile* file_moved;
file_moved = *zone_file_it;

// What if the file is deleted before coming here?
Copy link
Collaborator

@skyzh skyzh Jul 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should lock the metadata and scan if there is any new records before updating metadata. This may require more careful design work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IOStatus s; is never assigned or modified throughout this function...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should lock the metadata and scan if there is any new records before updating metadata. This may require more careful design work.

SyncFileMetadata() function calls the PersistRecord() function which holds lock before updating the metadata so I did not use any lock. Maybe you are right. I was thinking maybe we can remove the UpdateMetadataAfterMerge() completely and somehow figure out to update the metadata changes in the MoveValidDataToNewDestZone() function itself. It would have been less code and easier because why to have one more function. But to do that we need to get the reference to the files whose extents have been moved. From the files, its easier to get the reference to the extents but not sure how to get the reference of the files from the extents.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. If possible, please update the design doc for new information.

std::vector<ZoneFile*> files_moved_to_dst_zone;

std::atomic<uint64_t>
total_residue_; // Is atomic necessary since only one thread at one time?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that this variable is not used in this patch?

// will have a new starting position. No need to
// change the length of the extent as it will be the
// same.
ext->start_ = new_start;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same lock issue here. When we are changing extent information, it is possible that some other operating is in the process of reading where the extent is. We need to apply some kind of lock throughout the GC process.


// The current zone cannot fit this extent because of lack
// of space, so get the next zone from the dst_zone_list.
zone_it++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider the case that there is not enough destination zone?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider the case that there is not enough destination zone?

Yes, that possibility could arise. The function GetDestZoneToMoveValidData() should do that math and figure out how many zones are needed to fit all the extents(cold valid data) and called AllocateZone() for moving the valid cold data. If the desired amount of zones to fit all the cold valid data( or extents ) is not possible to allocate then it will send IOStatus error that "Not enough zones". In that case, the GC cannot progress. Maybe a slight design change can be done that we try to move only that much valid data, for which we have space(zones). Since this is a periodic background reclaim, in the next cycles, the remaining zones could be reclaimed. Just a thought, but this could make the implementation intricate.

ZoneFile* file_moved;
file_moved = *zone_file_it;

// What if the file is deleted before coming here?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. If possible, please update the design doc for new information.

Copy link
Collaborator

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good.

@skyzh
Copy link
Collaborator

skyzh commented Jul 13, 2021

I've seen an issue in RocksDB facebook/rocksdb#8504 about NoSpace condition. It seems that it's better not to use == to check the IOStatus. Instead, we should check the status code inside. (Not sure if this is necessary in ZenFS)

@soumendus
Copy link
Author

I've seen an issue in RocksDB facebook/rocksdb#8504 about NoSpace condition. It seems that it's better not to use == to check the IOStatus. Instead, we should check the status code inside. (Not sure if this is necessary in ZenFS)

I saw, 's == IOStatus::NoSpace()' has been used at several other places in the ZenFS. So it might be OK.

For instance the following code in the file fs_zenfs.cc, 's == IOStatus::NoSpace()' has been used.
IOStatus ZenFS::PersistSnapshot(ZenMetaLog* meta_writer) {
.
.
s = WriteSnapshotLocked(meta_writer);
if (s == IOStatus::NoSpace()) {
Info(logger_, "Current meta zone full, rolling to next meta zone");
s = RollMetaZoneLocked();
}
.
.
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants