Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return remove dir error of table dirStats to trigger retry. #4134

Merged
merged 1 commit into from
Nov 2, 2023

Conversation

JoJossd
Copy link
Contributor

@JoJossd JoJossd commented Nov 1, 2023

The current issue is that when there's an error in deleting dirstats, the rmdir operation still proceeds (the transaction is successfully committed). However, the next time a mkdir operation with the same name is attempted, it will check dirstats. If it already exists, an error is reported, making it impossible to make a directory with the same name.

There are two options to fix this issue:

  1. Return the rmdir error. This will trigger a retry to execute rmdir later.
  2. Ignore existing dirstats during mkdir.

For option2, I think it requires the logic to check if the dirStats table is the only one not removed by the previous transaction, if so, overwrite the old one during mkdir, if not, report EEXIST error. This requires far more changes than option1, so option1 is preferred here.

@Hexilee
Copy link
Contributor

Hexilee commented Nov 1, 2023

Hi @JoJossd, thanks for your contribution! Currently, juicefs tries to clear the value of dir usage when mkdir. However, there is no common operation like insertOrUpdate in the SQL engine.

To keep the SQL engine in the same behaviors as Redis and Transaction-KV, I prefer to update that dir usage record when the insert fails by "duplicated unique key" error. Please refer to https://github.com/juicedata/juicefs/blob/main/pkg/meta/sql.go#L2518

@JoJossd
Copy link
Contributor Author

JoJossd commented Nov 1, 2023

Thanks for the reply @Hexilee . I understand that we should keep the SQL engine in the same behavior as Redis and Transaction-KV. Just one question before further changes, in the doRmdir function, both Redis and Transaction-KV seem to delete dirStats dirQuota in one transaction, while SQL engine could delete dirQuota without successfully deleting dirStats. This is also the reason why I propose option1 to fix the issue in the first place. Please correct me if I misunderstand the logic here.

doRmdir in tkv:

juicefs/pkg/meta/tkv.go

Lines 1441 to 1442 in e39c4a6

tx.delete(m.dirStatKey(inode))
tx.delete(m.dirQuotaKey(inode))

doRmdir in redis:

juicefs/pkg/meta/redis.go

Lines 1577 to 1582 in e39c4a6

pipe.HDel(ctx, m.dirDataLengthKey(), field)
pipe.HDel(ctx, m.dirUsedSpaceKey(), field)
pipe.HDel(ctx, m.dirUsedInodesKey(), field)
pipe.HDel(ctx, m.dirQuotaKey(), field)
pipe.HDel(ctx, m.dirQuotaUsedSpaceKey(), field)
pipe.HDel(ctx, m.dirQuotaUsedInodesKey(), field)

@Hexilee
Copy link
Contributor

Hexilee commented Nov 1, 2023

I agree with you, that deletion error should cause a transaction failure in the SQL engine. And we should also try to "insert or update" dirStats which may be left by older clients.

@JoJossd
Copy link
Contributor Author

JoJossd commented Nov 2, 2023

If dirStats deletion is part of a transaction and the transaction aborts, the entire directory should stay intact. This means that new clients should see the entire old directory when they call mkdir. As per the documentation in the Linux man page, mkdir should return an EEXIST error instead of overwriting the old directory. Therefore, perhaps we shouldn't attempt to "insert or update" any existing file while making a new directory.

@Hexilee
Copy link
Contributor

Hexilee commented Nov 2, 2023

If the entire directory still exists, we should fails mkdir with EEXIST. But old client (. e.g. v1.1.0) may remove the directory but leave dirStat, which causes mkdir failed. New client should work for this case.

@JoJossd
Copy link
Contributor Author

JoJossd commented Nov 2, 2023

I see your point. However, I'm not very familiar with this part of the codebase. Also, I'm currently occupied with other tasks. So I think it would be better if your team handles these changes when they have the time. This will guarantee that the change is implemented correctly and efficiently.

Copy link
Contributor

@Hexilee Hexilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Hexilee Hexilee merged commit 95ec414 into juicedata:main Nov 2, 2023
31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants