Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix preserving intermediate symlinks #514

Merged
merged 1 commit into from
Oct 30, 2023
Merged

Conversation

mrahs
Copy link
Collaborator

@mrahs mrahs commented Oct 24, 2023

This patch fixes an oversight in the previous PR #511 where an intermediate symlink is processed just like an input which means all of its tree is also included even though it is not part of the specified input.

That not only inflated the Merkle tree, but also included symlinks to targets outside the execution root which caused failures in builds that do not specify the materialization option.

@mrahs mrahs requested review from ramymedhat and gkousik October 24, 2023 22:50
go/pkg/client/tree.go Outdated Show resolved Hide resolved
go/pkg/client/tree.go Outdated Show resolved Hide resolved
@@ -1038,7 +1039,7 @@ func TestComputeMerkleTree(t *testing.T) {
wantCacheCalls: map[string]int{
"foobarDir/foo": 2,
"foobarDir/bar": 2,
"foobarDir": 5, // 2 as an ancestor to input files, 1 as input, 2 as an ancestor to nested files.
"foobarDir": 5, // 2 as input ancestor, 1 as input, 2 as nested input ancestor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand why this is not 3?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foobarDir is itself an input, which means it's going to be accessed itself, that's 1. It's also going to be traversed to include its children, but for each child it's going to be visited once as an ancestor while evaluating symlinks; that's 2. foo and bar are also listed as inputs, so it's going to be visited again while evaluating ancestors; that's another 2.

"foobarDir": 3, // 1 as input, 2 as a real ancestor
"foobarSymDir": 3, // 1 as input, 2 as a symlink ancestor
"foobarDir": 3, // 1 as target of the symlink, 2 as input ancestor
"foobarSymDir": 4, // 1 as input, 1 as added symlink ancestor, 2 as input ancestor
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: some tests use foobarSymDir->foobarDir and some use foobarDir->foobarDirOrig. Can we make it consistent?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a compromise to avoid creating more global digests in this test. Basically this allows foobarDir the symlink to materialize as a directory whose digest is already known.

Copy link
Collaborator Author

@mrahs mrahs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a couple of attempts to refactor loadFiles to reduce its complexity, I didn't arrive at a satisfactory result. I may need to widen the scope of the refactor which I'd like to avoid.

Instead, I made a small update to make it clearer when the symlink target should not be followed regardless of the option. I think it's a good compromise for now.

@@ -1038,7 +1039,7 @@ func TestComputeMerkleTree(t *testing.T) {
wantCacheCalls: map[string]int{
"foobarDir/foo": 2,
"foobarDir/bar": 2,
"foobarDir": 5, // 2 as an ancestor to input files, 1 as input, 2 as an ancestor to nested files.
"foobarDir": 5, // 2 as input ancestor, 1 as input, 2 as nested input ancestor
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foobarDir is itself an input, which means it's going to be accessed itself, that's 1. It's also going to be traversed to include its children, but for each child it's going to be visited once as an ancestor while evaluating symlinks; that's 2. foo and bar are also listed as inputs, so it's going to be visited again while evaluating ancestors; that's another 2.

"foobarDir": 3, // 1 as input, 2 as a real ancestor
"foobarSymDir": 3, // 1 as input, 2 as a symlink ancestor
"foobarDir": 3, // 1 as target of the symlink, 2 as input ancestor
"foobarSymDir": 4, // 1 as input, 1 as added symlink ancestor, 2 as input ancestor
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a compromise to avoid creating more global digests in this test. Basically this allows foobarDir the symlink to materialize as a directory whose digest is already known.

This patch fixes an oversight in the previous PR bazelbuild#511 where an
intermediate symlink is processed just like an input which means all of
its tree is also included even though it is not part of the specified
input.

That not only inflated the Merkle tree, but also included symlinks to
targets outside the execution root which caused failures in builds that
do not specify the materialization option.
@mrahs mrahs merged commit e00bd32 into bazelbuild:master Oct 30, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants