[#2695] feat(doc): Add docs for fileset catalog (#2781)
### What changes were proposed in this pull request?

This PR proposes to add docs for fileset catalog.

### Why are the changes needed?

Fix: #2695 

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

No.
jerryshao authored Apr 3, 2024
1 parent 79a6311 commit f119d90
Showing 3 changed files with 484 additions and 0 deletions.
63 changes: 63 additions & 0 deletions docs/hadoop-catalog.md
@@ -0,0 +1,63 @@
---
title: "Hadoop catalog"
slug: /hadoop-catalog
date: 2024-4-2
keyword: hadoop catalog
license: "Copyright 2024 Datastrato Pvt Ltd.
This software is licensed under the Apache License version 2."
---

## Introduction

The Hadoop catalog is a fileset catalog that uses the Hadoop Compatible File System (HCFS) to manage
the storage location of filesets. Currently, it supports the local filesystem and HDFS. For
object storage such as S3, GCS, and Azure Blob Storage, you can put the Hadoop object store JAR, such as
`hadoop-aws`, into the `$GRAVITINO_HOME/catalogs/hadoop/libs` directory to enable support.
Gravitino has not yet tested object storage support, so if you run into any problems,
please create an [issue](https://github.com/datastrato/gravitino/issues).
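
As a minimal sketch of the JAR placement step, the following Python helper copies an object store JAR into the Hadoop catalog's `libs` directory. The function name `install_object_store_jar` is hypothetical and not part of Gravitino. Only the `catalogs/hadoop/libs` location comes from this document, and object storage support itself is untested.

```python
import shutil
from pathlib import Path


def install_object_store_jar(jar_path: str, gravitino_home: str) -> Path:
    """Copy a Hadoop object store JAR into the Hadoop catalog's libs directory.

    Hypothetical helper for illustration only; it is not part of Gravitino.
    """
    libs_dir = Path(gravitino_home) / "catalogs" / "hadoop" / "libs"
    if not libs_dir.is_dir():
        # Fail early instead of creating a directory in the wrong place.
        raise FileNotFoundError(f"Hadoop catalog libs directory not found: {libs_dir}")
    # shutil.copy2 copies into the directory and returns the destination path.
    return Path(shutil.copy2(jar_path, libs_dir))
```

The helper refuses to create the `libs` directory itself, so a mistyped `gravitino_home` surfaces as an error rather than a silent copy to an unused path.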

Note that Gravitino uses Hadoop 3 dependencies to build the Hadoop catalog. Theoretically, it should be
compatible with both Hadoop 2.x and 3.x, since Gravitino doesn't use any features that are new in
Hadoop 3. If you find any compatibility issue, please create an [issue](https://github.com/datastrato/gravitino/issues).

## Catalog

### Catalog properties

| Property Name | Description | Default Value | Required | Since Version |
|---------------|-------------------------------------------------|---------------|----------|---------------|
| `location` | The storage location managed by Hadoop catalog. | (none) | No | 0.5.0 |

### Catalog operations

Refer to [Catalog operations](./manage-fileset-metadata-using-gravitino.md#catalog-operations) for more details.
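
To illustrate how the optional `location` property might be supplied, the sketch below builds a catalog-creation request body in Python without sending it. The function name, the field names `name`, `type`, `provider`, `comment`, and `properties`, the values `FILESET` and `hadoop`, and the example HDFS path are all assumptions that this document does not state; the linked catalog operations page is the authoritative source for the request format.

```python
import json
from typing import Optional


def build_hadoop_catalog_request(name: str, location: Optional[str] = None) -> dict:
    """Build a request body for creating a Hadoop catalog.

    Hypothetical helper: the field names and values are assumptions,
    not confirmed by this document.
    """
    properties = {}
    # `location` is optional and has no default, so omit it when unset.
    if location is not None:
        properties["location"] = location
    return {
        "name": name,
        "type": "FILESET",      # assumed catalog type identifier
        "provider": "hadoop",   # assumed provider name
        "comment": "",
        "properties": properties,
    }


if __name__ == "__main__":
    # Example path is illustrative only.
    body = build_hadoop_catalog_request("my_filesets", "hdfs://namenode:9000/user/gravitino")
    print(json.dumps(body, indent=2))
```

Leaving `location` out of the property map, rather than sending an empty string, mirrors the table above, where the property is not required and has no default value.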

## Schema

### Schema capabilities

The Hadoop catalog supports creating, updating, deleting, and listing schemas.

### Schema properties

| Property Name | Description | Default Value | Required | Since Version |
|---------------|------------------------------------------------|---------------|----------|---------------|
| `location` | The storage location managed by Hadoop schema. | (none) | No | 0.5.0 |

### Schema operations

Refer to [Schema operations](./manage-fileset-metadata-using-gravitino.md#schema-operations) for more details.

## Fileset

### Fileset capabilities

The Hadoop catalog supports creating, updating, deleting, and listing filesets.

### Fileset properties

None.

### Fileset operations

Refer to [Fileset operations](./manage-fileset-metadata-using-gravitino.md#fileset-operations) for more details.
8 changes: 8 additions & 0 deletions docs/index.md
@@ -53,6 +53,8 @@ REST API and the Java SDK. You can use either to manage metadata. See
metalakes.
* [Manage relational metadata using Gravitino](./manage-relational-metadata-using-gravitino.md)
to learn how to manage relational metadata.
* [Manage fileset metadata using Gravitino](./manage-fileset-metadata-using-gravitino.md) to learn
how to manage fileset metadata.

Also, you can find the complete REST API definition in
[Gravitino Open API](./api/rest/gravitino-rest-api), and the
@@ -69,6 +71,10 @@ Gravitino currently supports the following catalogs:
* [**MySQL catalog**](./jdbc-mysql-catalog.md)
* [**PostgreSQL catalog**](./jdbc-postgresql-catalog.md)

**Fileset catalogs:**

* [**Hadoop catalog**](./hadoop-catalog.md)

Gravitino also provides an Iceberg REST catalog service for the Iceberg table format. See the
[Iceberg REST catalog service](./iceberg-rest-service.md) for details.

@@ -99,6 +105,8 @@ Gravitino supports different catalogs to manage the metadata in different source
* [Hive catalog](./apache-hive-catalog.md): a complete guide to using Gravitino to manage Apache Hive data.
* [MySQL catalog](./jdbc-mysql-catalog.md): a complete guide to using Gravitino to manage MySQL data.
* [PostgreSQL catalog](./jdbc-postgresql-catalog.md): a complete guide to using Gravitino to manage PostgreSQL data.
* [Hadoop catalog](./hadoop-catalog.md): a complete guide to using Gravitino to manage filesets
  using the Hadoop Compatible File System (HCFS).

### Trino connector
