Add ranger permission check #5285

Merged
merged 5 commits into from
Nov 29, 2024
49 changes: 49 additions & 0 deletions docs/en/deployment/hadoop_java_sdk.md
@@ -741,6 +741,55 @@ JuiceFS can use local disk as a cache to accelerate data access, the following d

![parquet](../images/spark_sql_parquet.png)

## Permission control by Apache Ranger

JuiceFS currently supports path permission control by integrating with Apache Ranger's HDFS module.

### 1. Configurations

| Configuration | Default Value | Description |
|-----------------------------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `juicefs.ranger-rest-url` | | URL of the Ranger Admin REST (HTTP) endpoint. If unset, Ranger permission control is disabled. |
| `juicefs.ranger-service-name` | | Name of the service defined in Ranger's `HDFS` module. Required. |
| `juicefs.ranger-cache-dir` | | Directory for caching Ranger policies. By default, a `UUID` subdirectory is created under the directory given by the Java system property `java.io.tmpdir`, so that concurrent tasks do not interfere with each other. If a fixed directory is configured, multiple tasks share the cache and only one JuiceFS instance refreshes it, which reduces the load on `Ranger Admin`. |
| `juicefs.ranger-poll-interval-ms` | `30000` | Interval, in milliseconds, at which the policy cache is refreshed. The default is 30 s. |
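
As a minimal sketch of how the four settings fit together, the following Java program builds them as plain key/value pairs and prints each one as a Hadoop-style `<property>` element of the kind placed in `core-site.xml`. The URL, service name, and cache path are placeholder values, and the names `RangerSettingsSketch`, `rangerSettings`, and `toPropertyXml` are hypothetical helpers for illustration, not part of the JuiceFS SDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RangerSettingsSketch {

    // Collects the four Ranger-related settings from the table above.
    // All values are placeholders (assumptions), not real endpoints.
    static Map<String, String> rangerSettings() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("juicefs.ranger-rest-url", "http://ranger-admin.example.com:6080");
        conf.put("juicefs.ranger-service-name", "hdfs_service");
        // Optional: a fixed directory lets multiple tasks share one policy cache.
        conf.put("juicefs.ranger-cache-dir", "/tmp/juicefs-ranger-cache");
        // Optional: 30000 ms is already the documented default.
        conf.put("juicefs.ranger-poll-interval-ms", "30000");
        return conf;
    }

    // Renders one setting as a Hadoop-style <property> element.
    static String toPropertyXml(String name, String value) {
        return "<property>\n"
                + "  <name>" + name + "</name>\n"
                + "  <value>" + value + "</value>\n"
                + "</property>";
    }

    public static void main(String[] args) {
        rangerSettings().forEach((k, v) -> System.out.println(toPropertyXml(k, v)));
    }
}
```

Only `juicefs.ranger-rest-url` and `juicefs.ranger-service-name` are needed to turn the feature on; the other two entries are shown only to make the optional settings visible.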

### 2. Dependencies

Considering the complexity of the authentication environment and the possibility of dependency conflicts, the JAR packages related to Ranger authentication (such as `ranger-plugins-common-2.3.0.jar`, `ranger-plugins-audit-2.3.0.jar`, etc.) and their dependencies have not been included in the JuiceFS SDK.

If a `ClassNotFound` error occurs at runtime, add the missing JARs to the relevant directory (for example, `$SPARK_HOME/jars`).

Dependencies that may need to be added separately:

```shell
ranger-plugins-common-2.3.0.jar
ranger-plugins-audit-2.3.0.jar
gethostname4j-1.0.0.jar
jackson-jaxrs-1.9.13.jar
jersey-client-1.19.jar
jersey-core-1.19.jar
jna-5.7.0.jar
```
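
Before submitting a job, it can help to check whether the classes from the JARs listed above are visible on the classpath. The following is a small sketch under stated assumptions: the probe class names are one well-known class each from `ranger-plugins-common`, `jersey-client`, and `jna`, chosen here for illustration, and `RangerClasspathCheck` is a hypothetical helper, not part of the JuiceFS SDK. Replace the probes with the class named in your own `ClassNotFound` error if it differs.

```java
import java.util.ArrayList;
import java.util.List;

class RangerClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    // The class is looked up without running its static initializers.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, RangerClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Returns the subset of the given class names that cannot be loaded.
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isPresent(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed probe classes, one per JAR; adjust as needed.
        List<String> notFound = missing(
                "org.apache.ranger.plugin.service.RangerBasePlugin", // ranger-plugins-common
                "com.sun.jersey.api.client.Client",                  // jersey-client
                "com.sun.jna.Native");                               // jna
        if (notFound.isEmpty()) {
            System.out.println("All probed classes are on the classpath.");
        } else {
            notFound.forEach(c -> System.out.println("Missing: " + c));
        }
    }
}
```

Run it with the same classpath as the job (for example, with the contents of `$SPARK_HOME/jars` on the classpath) so that the result reflects what the job will actually see.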

### 3. Tips

#### 3.1 Ranger version

The code has been tested with `Ranger2.3` and `Ranger2.4`. Because only the permission check of the `HDFS` module is used, other versions should work in theory.

#### 3.2 Ranger Audit

Currently, only permission checking is supported; `Ranger Audit` is disabled.

#### 3.3 Ranger's other parameters

To keep usage simple, only the **core** parameters for connecting to Ranger are currently exposed.

#### 3.4 Security tips

Because the project is fully open source, users cannot be prevented from bypassing permission control by overriding parameters such as `juicefs.ranger-rest-url`. If stricter control is required, consider building the SDK yourself and, for example, encrypting the security-related parameters.

## FAQ

### 1. `Class io.juicefs.JuiceFileSystem not found` exception
49 changes: 49 additions & 0 deletions docs/zh_cn/deployment/hadoop_java_sdk.md
@@ -866,6 +866,55 @@ JuiceFS can use a local disk as a cache to accelerate data access; the following data is

![parquet](../images/spark_sql_parquet.png)

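## Permission control with Apache Ranger

JuiceFS currently supports path permission control by integrating with the `HDFS` module of Apache Ranger.

### 1. Configurations

| Configuration                     | Default Value | Description |
|-----------------------------------|---------------|-------------|
| `juicefs.ranger-rest-url`         |               | Connection URL of Ranger. If this parameter is not set, the feature is not used. |
| `juicefs.ranger-service-name`     |               | The `service name` configured in Ranger. Required. |
| `juicefs.ranger-cache-dir`        |               | Cache path for Ranger policies. By default, a `UUID` path level is added under the directory given by the Java system property `java.io.tmpdir` so that multiple tasks do not interfere with each other. If a fixed directory is configured, multiple tasks share the cache and exactly one JuiceFS object refreshes it, which reduces the load on `Ranger Admin`. |
| `juicefs.ranger-poll-interval-ms` | `30000`       | Refresh interval of the Ranger cache. The default is 30 s. |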

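### 2. Environment and dependencies

Considering the complexity of the permission-checking environment and the possibility of dependency conflicts, the JAR packages related to Ranger permission checking (such as `ranger-plugins-common-2.3.0.jar` and `ranger-plugins-audit-2.3.0.jar`) and their dependencies are not bundled into the JuiceFS SDK.

If a `ClassNotFound` error occurs during use, add the missing JARs to the relevant directory (for example, `$SPARK_HOME/jars`).

Dependencies that may need to be added separately: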

```shell
ranger-plugins-common-2.3.0.jar
ranger-plugins-audit-2.3.0.jar
gethostname4j-1.0.0.jar
jackson-jaxrs-1.9.13.jar
jersey-client-1.19.jar
jersey-core-1.19.jar
jna-5.7.0.jar
```

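### 3. Usage tips

#### 3.1 Ranger version

The code has been tested with `Ranger2.3` and `Ranger2.4`. Because no feature other than the permission check of the `HDFS` module is used, other versions should work in theory.

#### 3.2 Ranger Audit

Currently, only permission checking is supported; `Ranger Audit` is disabled.

#### 3.3 Other Ranger parameters

To keep usage simple, only the most essential parameters for connecting to Ranger are currently exposed.

#### 3.4 Security considerations

Because the project code is fully open source, users cannot be prevented from disrupting security control by replacing parameters such as `juicefs.ranger-rest-url`. If stricter control is required, consider compiling the code yourself and, for example, encrypting the security-related parameters.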

## FAQ

### 1. `Class io.juicefs.JuiceFileSystem not found` exception
27 changes: 27 additions & 0 deletions sdk/java/pom.xml
@@ -350,6 +350,33 @@
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.ranger</groupId>
      <artifactId>ranger-plugins-common</artifactId>
      <version>2.3.0</version>
      <exclusions>
        <exclusion>
          <groupId>*</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.ranger</groupId>
      <artifactId>ranger-plugins-audit</artifactId>
      <version>2.3.0</version>
      <exclusions>
        <exclusion>
          <groupId>*</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.httpcomponents</groupId>
      <artifactId>httpclient</artifactId>
      <version>4.5.13</version>
    </dependency>
  </dependencies>

  <distributionManagement>