-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security] CredentialProviderAzure should not implicitly retrieve and use shared storage account keys #20634
Comments
@sugibuchi: Thanks for the detailed breakdown. |
I don't know about removing the functionality as can improve the user experience by making the authentication seamless, but I can make it opt-in to reduce security concerns. |
I am a bit surprised of a security concern if you are able to access those keys. Shouldn't that key not be available then in the first place? Is it possible for us to swap the order @nameexhaustion , first Entra-id then (maybe opted in) shared key. |
I have made this change in the PR - https://github.com/pola-rs/polars/pull/20652/files#r1910147426 |
Removing the implicit use of shared keys has literally no impact on the UX. The current
Even though the Entra ID-based authentication is available, why do we need to use the shared key-based authentication? Why don't we directly use Entra ID-baed auth to access storage accounts?
In other words, once you run Shared key-based authentication should only be used as a last resort when Entra ID-based authentication is unavailable. We don't need to maintain the current logic that retrieves shared keys by using Entra ID-based auth, even as an opt-in option or a fallback logic. Instead, we need an option to specify shared keys explicitly.
|
We have this functionality available as an option as well (passing
It's definitely not a necessity, but for users that work in environments where Entra ID has not been configured, having this available can save the user from having to manually figure out where/how to get the keys from, so I definitely think it's worth it. Additionally, the most important issue regarding security concerns will be resolved after we update it to be opt-in. |
The version in #20652 has a "permission check" that tries to access the metadata of a file system root with Entra ID-based authorization. The version in #20652 suggests activating a fallback mechanism to automatically use the shared key-based auth if it can't access the metadata. However, this complex logic looks over-engineering and potentially leads to unexpected results. For example, we can configure ABAC to allow access to a specific file and refuse to access the file system root. The version in #20652 cannot access blob containers having such ABAC since the "permission check" fails. More importantly, the suggested fallback to shared keys is inappropriate in this case (as the ABAC is configured intentionally not to allow access to the file system root). In general, users should consider RBAC/ABAC/ACL settings first if data access with Entra ID is refused. I don't believe automatic fallback to shared key-based authentication is a good suggestion. There are various implementations of ABFS file system clients and applications accessing Azure storage accounts. However, use of shared key-based auth and Entra ID-based auth are usually exclusive and explicit. Such simplicity makes the behaviour of blob client authentication more predictable. |
Description
CredentialProviderAzure
, introduced by #20384, implicitly tries to retrieve and use a shared storage account key and then falls back to the Entra ID-based authentication usingDefaultAzureCredential
.However, I have strong concerns about this behaviour as it is not compliant with Microsoft's security recommendation and has other drawbacks related to security, performance and the practices followed by other libraries.
Microsoft recommends Entra ID-based authentication over shared key-based auth
See also https://learn.microsoft.com/en-us/azure/storage/blobs/security-recommendations#identity-and-access-management
However, the current behaviour of
CredentialProviderAzure
is the opposite (shared key first, then the Entra ID-based auth).Accessing Azure Blob with a shared key bypasses all fine-grained data access control
A shared key for an Azure storage account is a master key granting permission to read/write all files. Accessing with a shared key does not respect data access control configured by using Azure RBAC, ABAC and ACL.
Implicit use of shared keys may lead to unexpected exposure or modification of data.
Calling Azure CLI to retrieve shared keys imposes unnecessary latency
The current
CredentialProviderAzure
always executes Azure CLI to retrieve shared keys regardless of whether shared keys are available or not. However, the execution of Azure CLI is slow (particularly in Windows environments) compared to fetching access tokens from Entra ID. The current behaviour imposes unnecessary latency, particularly when we use an authentication flow that directly fetches access tokens from Entra ID (client credential flow, etc.).None of the other existing libraries implements such behaviour
The other existing libraries accessing Azure Blob, including
hadoop-azure
for Hadoop HDFS, Apache Arrow, Python fsspec, DuckDB, etc., do not implicitly retrieve and use shared keys. When using the shared key-based authentication with those libraries, we must explicitly specify a shared key.The current behaviour, inconsistent with most existing libraries, potentially leads to confusion and unexpected incidents.
Proposal
Drop the implicit use of shared storage account keys from
CredentialProviderAzure
. Instead, shared keys should be explicitly specified instorage_options
.polars/py-polars/polars/io/cloud/credential_provider.py
Lines 207 to 227 in 3f74df4
self._get_azure_storage_account_key_az_cli()
as (1) the Entra ID-based authentication does not require account names (2) a shared key should be specified in conjunction with an account name, like most other libraries.self._get_azure_storage_account_key_az_cli()
,self._azcli_version()
,self._azcli()
etc. as those util functions for Azure CLI are used only to retrieve shared keys.The text was updated successfully, but these errors were encountered: