Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression with package restore: RID Graph not respected when selecting native library from nuget package #4195

Open
erikest opened this issue Jan 20, 2020 · 17 comments
Assignees
Milestone

Comments

@erikest
Copy link

erikest commented Jan 20, 2020

According to RID Graph, when restoring nuget packages, it should be the case that the when a library for a specific rid is available it should prefer that before walking the fallback graph. Instead, the behavior I'm seeing is that it conflicts with a library in a fallback rid.

I've added support for gRPC with .net core on raspberry pi with a package for that specific rid: linux-arm. My nuget package uses the convention for native libraries runtimes\<rid>\native\<lib>. The grpc core package also uses this convention, but doesn't have the linux-arm rid, just linux. Both packages are required when publishing for arm, but with the native library from mine and the .net assembly from grpc core.

When publishing with -r linux-arm, I see the following:

Encountered conflict between 'CopyLocal:C:\Users\Geocoder\.nuget\packages\grpc.core\2.26.0\runtimes\linux\native\libgrpc_csharp_ext.x86.so' and 'CopyLocal:C:\Users\Geocoder\.nuget\packages\libgrpc_csharp_ext.arm7\1.0.9\runtimes\linux-arm\native\libgrpc_csharp_ext.x86.so'. Choosing 'CopyLocal:C:\Users\Geocoder\.nuget\packages\grpc.core\2.26.0\runtimes\linux\native\libgrpc_csharp_ext.x86.so' arbitrarily as both items are copy-local and have equal file and assembly versions.

Here are steps to reproduce:

create a new project and add the packages

dotnet new console -o grpcTest
dotnet add package Grpc.Core
dotnet add package Grpc.Net.Client
dotnet add package Grpc.Tools
dotnet add package libgrpc_csharp_ext.arm7 --version 1.1.0

publish to arm

dotnet publish -r linux-arm --self-contained
@erikest erikest changed the title Publish command doesn't respect runtime identifier when selecting native library from nuget packages Regression with Publish command: RID Graph not respected when selecting native library from nuget package Jan 29, 2020
@erikest erikest changed the title Regression with Publish command: RID Graph not respected when selecting native library from nuget package Regression with package restore: RID Graph not respected when selecting native library from nuget package Jan 29, 2020
@bitzeal-johan
Copy link

Is there a work-around?

@Liriel
Copy link

Liriel commented Dec 3, 2020

@bitzeal-johan yes there is a workaround. You can compile the lib yourself as @erikest outlined in https://github.com/erikest/libgrpc_csharp_ext and then replace the binary in the output folder

dotnet publish -r linux-arm --self-contained
cp libgrpc_csharp_ext.x86.so ./bin/Debug/netcoreapp3.1/linux-arm/publish/

I used this method a while ago (on net3.1 though)

@richlander
Copy link
Member

richlander commented Jul 9, 2021

I just tried this with the .NET 6 SDK. I don't see this error.

This is what I see in terms of the .so file that was mentioned.

root@a1e6b08d8845:/grpcTest# find . | grep libgrpc_csharp_ext.x86.so
./bin/Debug/net6.0/linux-arm/publish/libgrpc_csharp_ext.x86.so
./bin/Debug/net6.0/linux-arm/libgrpc_csharp_ext.x86.so

Perhaps the packages have been updated.

@Liriel
Copy link

Liriel commented Jul 12, 2021

@richlander also true for .NET 5

$ find . | grep libgrpc_csharp_ext.x86.so
./bin/Debug/net5.0/linux-arm/publish/libgrpc_csharp_ext.x86.so
./bin/Debug/net5.0/linux-arm/libgrpc_csharp_ext.x86.so

the package libgrpc_csharp_ext.arm7 remains unchanged however

@kleisauke
Copy link

The issue persists with .NET 8.0 and .NET 9.0 preview, see:

$ dotnet new console -o netvips
$ cd netvips/
$ dotnet add package NetVips.Native.linux-arm64 --version 8.15.2
$ dotnet add package NetVips.Native.linux-musl-arm64 --version 8.15.2
$ dotnet publish -r linux-musl-arm64 -o .
$ echo "3aa9e0d2865e8a65a0d4d7fa4011e1ab2b6e5c313b53d7584363da817db2b45f libvips.so.42" | sha256sum --check --quiet && echo "arm64-musl libvips.so.42 detected"
libvips.so.42: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
$ echo "d9a06f7adce8a07213c68ba0be7046a255d2952893174fed5b1187589e770aa2 libvips.so.42" | sha256sum --check --quiet && echo "arm64-glibc libvips.so.42 detected"
arm64-glibc libvips.so.42 detected
$ readelf -d libvips.so.42 | grep NEEDED
 0x0000000000000001 (NEEDED)             Shared library: [libresolv.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Introducing the linux-glibc-arm64 RID could possibly fix this, as it ensures that this fallback never happens, see: dotnet/runtime#7318

This was originally reported at kleisauke/net-vips#233.

@kleisauke
Copy link

I further debugged this issue using the following command:

$ dotnet publish -r linux-musl-arm64 -o . -v diag | grep "arbitrarily"

Which outputs:

Encountered conflict between 'CopyLocal:/home/kleisauke/.nuget/packages/netvips.native.linux-arm64/8.15.2/runtimes/linux-arm64/native/libvips.so.42' and 'CopyLocal:/home/kleisauke/.nuget/packages/netvips.native.linux-musl-arm64/8.15.2/runtimes/linux-musl-arm64/native/libvips.so.42'. Choosing 'CopyLocal:/home/kleisauke/.nuget/packages/netvips.native.linux-arm64/8.15.2/runtimes/linux-arm64/native/libvips.so.42' arbitrarily as both items are copy-local and have equal file and assembly versions.

This message is printed as part of the conflict resolution logic in the .NET SDK, specifically:

if (item1.ItemType == ConflictItemType.CopyLocal && item2.ItemType == ConflictItemType.CopyLocal)
{
// If two items are copy local, we must pick one even if versions are identical, as only
// one of them can be copied locally. The policy here must be deterministic, but it can
// be chosen arbitrarily. The assumption is that the assemblies are fully semantically
// equivalent.
//
// We choose ordinal string comparison of package id as a final tie-breaker for this case.
// We will get here in the real case of frameworks with overlapping assemblies (including
// version) and self-contained apps. The assembly we choose here is not guaranteed to match
// the assembly that would be chosen by the host for a framework-dependent app. The host
// is free to make its own deterministic but arbitrary choice.
int cmp = string.CompareOrdinal(item1.PackageId, item2.PackageId);
if (cmp != 0)
{
var arbitraryWinner = cmp < 0 ? item1 : item2;
LogMessage(conflictMessage, Strings.ChoosingCopyLocalArbitrarily_Info, arbitraryWinner.DisplayName);
return arbitraryWinner;
}
}

The resolver always prefers the NetVips.Native.linux-arm64 package over NetVips.Native.linux-musl-arm64 because, in an ordinal string comparison, linux-arm64 precedes linux-musl-arm64.

The assumption that the assemblies are fully semantically equivalent is questionable, as it does not hold true in this case.

@am11
Copy link
Member

am11 commented Oct 27, 2024

As it stands, we can condition the glibc package like this: Condition="$(RuntimeIdentifier.StartsWith('linux-')) and !$(RuntimeIdentifier.StartsWith('linux-musl-'))" and musl one: Condition="$(RuntimeIdentifier.StartsWith('linux-musl-'))"which gives toolchain proper gestures to pick the right dependency. We have similar conditions in NativeAOT integration.

@richlander
Copy link
Member

richlander commented Oct 27, 2024

@baronfel -- is this a change we can make?

@baronfel baronfel self-assigned this Oct 27, 2024
@baronfel baronfel added needs team triage Requires a full team discussion Area-NetSDK labels Oct 27, 2024
@baronfel
Copy link
Member

Looks like this one wasn't tagged so it was never triaged by the team. We'll discuss this week.

@kleisauke
Copy link

For triage purposes: the original issue was a conflict with the fallback linux RID, which is inherited by the linux-arm RID. Both provided a shared library named libgrpc_csharp_ext.x86.so, but packaged in different directories (i.e. runtimes/linux/native/ and runtimes/linux-arm/native/) across separate NuGet packages.

That issue was only reproducible with the Grpc.Core package versions prior version 2.34.0 and the libgrpc_csharp_ext.arm7 package. These 32-bit libraries were removed in commit grpc/grpc@335a9ea, and the RIDs were revised to arch-specific ones in commit grpc/grpc@6dbdedd.

@baronfel
Copy link
Member

sdk-issue-4195.zip

Here's a binlog of @kleisauke's comment above. It seems pretty clear to me that we need a proper disjoint RID graph to disambiguate the two assets - they are being written to the project.assets.json in a correct way:

"NetVips.Native.linux-arm64/8.15.2": {
        "type": "package",
        "compile": {
          "lib/netstandard1.0/_._": {}
        },
        "runtime": {
          "lib/netstandard1.0/_._": {}
        },
        "runtimeTargets": {
          "runtimes/linux-arm64/native/libvips.so.42": {
            "assetType": "native",
            "rid": "linux-arm64"
          }
        }
      },
      "NetVips.Native.linux-musl-arm64/8.15.2": {
        "type": "package",
        "compile": {
          "lib/netstandard2.1/_._": {}
        },
        "runtime": {
          "lib/netstandard2.1/_._": {}
        },
        "runtimeTargets": {
          "runtimes/linux-musl-arm64/native/libvips.so.42": {
            "assetType": "native",
            "rid": "linux-musl-arm64"
          }
        }

but because our entire toolchain doesn't recognize that linux-musl isn't necessarily compatible with linux that original sin taints our entire resolution/asset copying process.

@am11
Copy link
Member

am11 commented Oct 29, 2024

It goes back to a very old discussion dotnet/runtime#3075 (comment). Basically RID is not guaranteeing "binary compatibility". I think in this case, when both linux-musl-{arch} and linux-{arch} assets are present, all we need is to give precedence to linux-musl one. Same goes for linux-bionic (aka Android) etc.

@marcpopMSFT
Copy link
Member

Triage: @elinor-fung this looks like a potential ordering issue in the RID graph. Would it make sense to fix it there rather than us trying to special casing this in conflict resolution?

@elinor-fung
Copy link
Member

I think I'm missing what the potential ordering issue is in the RID graph? In the graph itself, linux-musl-<arch> does get priority over linux-<arch>.

My understanding of this issue is that the appropriate assets for each package are selected as expected (per the RID graph, with linux-musl taking priority of linux). But in this repro with multiple packages provide the same asset, the SDK chooses one alphabetically. Could it use the RID graph as part of the conflict resolution?

@kleisauke
Copy link

According to kleisauke/net-vips#233 (comment), merging NetVips.Native.linux-arm64 and NetVips.Native.linux-musl-arm64 into a single NuGet package may resolve the issue (unverified).

@zotanmew
Copy link

We've been using this method for a while and have had no more issues with the wrong native dependency being loaded. However, it's unfortunate given that repackaging anything other people choose to have split (which is a perfectly valid strategy) is extra work for us, so I'd love to see this fixed in the sdk.

@baronfel
Copy link
Member

My understanding of this issue is that the appropriate assets for each package are selected as expected (per the RID graph, with linux-musl taking priority of linux). But in this repro with multiple packages provide the same asset, the SDK chooses one alphabetically. Could it use the RID graph as part of the conflict resolution?

Thanks for clarifying @elinor-fung - this does make sense to me when it's put like that.

sdk-issues-4195.zip

I've attached a binlog for future analysis - the main blocker that I see to easily doing this work is that the ReferenceCopyLocalPaths Item[] parameter (which comes from the ReferenceCopyLocalPaths MSBuild Item type) has the following metadata:

Image

Ideally we would annotate files from the packages that are RID-specific with that data, and then we could potentially use that data to de-conflict. So I see two units of work here:

  • flow RID/TFM data for assets of various types into the relevant MSBuild Items. This data does exist in the project.assets.json already, so we hopefully just need to find the correct location in the build to attach this data.
  • use the new RID data to de-conflict, applying a 'best match' algorithm based on the RID being targeted (which would be a new input to conflict resolution) and the RIDs of the conflicting assets.

@marcpopMSFT marcpopMSFT removed the needs team triage Requires a full team discussion label Nov 5, 2024
@marcpopMSFT marcpopMSFT added this to the 10.0.1xx milestone Nov 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests