Skip to content

Commit

Permalink
feat(sdk): structured properties - add support for listing (datahub-p…
Browse files Browse the repository at this point in the history
  • Loading branch information
shirshanka authored Jan 9, 2025
1 parent 42b2cd3 commit 45450f1
Show file tree
Hide file tree
Showing 6 changed files with 663 additions and 34 deletions.
329 changes: 296 additions & 33 deletions docs/api/tutorials/structured-properties.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ import TabItem from '@theme/TabItem';
## Why Would You Use Structured Properties?

Structured properties are a structured, named set of properties that can be attached to logical entities like Datasets, DataJobs, etc.
Structured properties have values that are types. Conceptually, they are like “field definitions”.
Structured properties have values that are typed and support constraints.

Learn more about structured properties in the [Structured Properties Feature Guide](../../../docs/features/feature-guides/properties/overview.md).

Expand All @@ -15,6 +15,7 @@ Learn more about structured properties in the [Structured Properties Feature Gui

This guide will show you how to execute the following actions with structured properties.
- Create structured properties
- List structured properties
- Read structured properties
- Delete structured properties
- Add structured properties to a dataset
Expand All @@ -32,7 +33,8 @@ Additionally, you need to have the following tools installed according to the me
<Tabs>
<TabItem value="CLI" label="CLI" default>

Install the relevant CLI version. Forms are available as of CLI version `0.13.1`. The corresponding DataHub Cloud release version is `v0.2.16.5`
Install the relevant CLI version.
Structured Properties were introduced in version `0.13.1`, but we continuously improve and add new functionality, so you should always [upgrade](https://datahubproject.io/docs/cli/#installation) to the latest cli for best results.
Connect to your instance via [init](https://datahubproject.io/docs/cli/#init):

- Run `datahub init` to update the instance you want to load into.
Expand All @@ -56,33 +58,8 @@ Requirements for OpenAPI are:
The following code will create a structured property `io.acryl.privacy.retentionTime`.

<Tabs>
<TabItem value="graphql" label="graphQL" default>

```graphql
mutation createStructuredProperty {
createStructuredProperty(
input: {
id: "retentionTime",
qualifiedName:"retentionTime",
displayName: "Retention Time",
description: "Retention Time is used to figure out how long to retain records in a dataset",
valueType: "urn:li:dataType:datahub.number",
allowedValues: [
{numberValue: 30, description: "30 days, usually reserved for datasets that are ephemeral and contain pii"},
{numberValue: 90, description:"description: Use this for datasets that drive monthly reporting but contain pii"},
{numberValue: 365, description:"Use this for non-sensitive data that can be retained for longer"}
],
cardinality: SINGLE,
entityTypes: ["urn:li:entityType:datahub.dataset", "urn:li:entityType:datahub.dataFlow"],
}
) {
urn
}
}
```

</TabItem>
<TabItem value="CLI" label="CLI">
<TabItem value="CLI" label="CLI" default>

Create a yaml file representing the properties you’d like to load.
For example, below file represents a property `io.acryl.privacy.retentionTime`. You can see the full example [here](https://github.com/datahub-project/datahub/blob/example-yaml-sp/metadata-ingestion/examples/structured_properties/struct_props.yaml).
Expand All @@ -108,13 +85,41 @@ For example, below file represents a property `io.acryl.privacy.retentionTime`.
```
Use the CLI to create your properties:
```commandline
```shell
datahub properties upsert -f {properties_yaml}
```

If successful, you should see `Created structured property urn:li:structuredProperty:...`

</TabItem>

<TabItem value="Graphql" label="GraphQL" default>

```graphql
mutation createStructuredProperty {
createStructuredProperty(
input: {
id: "retentionTime",
qualifiedName:"retentionTime",
displayName: "Retention Time",
description: "Retention Time is used to figure out how long to retain records in a dataset",
valueType: "urn:li:dataType:datahub.number",
allowedValues: [
{numberValue: 30, description: "30 days, usually reserved for datasets that are ephemeral and contain pii"},
{numberValue: 90, description:"description: Use this for datasets that drive monthly reporting but contain pii"},
{numberValue: 365, description:"Use this for non-sensitive data that can be retained for longer"}
],
cardinality: SINGLE,
entityTypes: ["urn:li:entityType:datahub.dataset", "urn:li:entityType:datahub.dataFlow"],
}
) {
urn
}
}
```

</TabItem>

<TabItem value="OpenAPI v2" label="OpenAPI v2">

```shell
Expand Down Expand Up @@ -236,9 +241,182 @@ Example Response:
</TabItem>
</Tabs>

## Read Structured Properties
## List Structured Properties

You can list all structured properties in your DataHub instance using the following methods:

<Tabs>
<TabItem value="CLI" label="CLI" default>

```shell
datahub properties list
```

This will show all properties with their full details.

Example Response:
```json
{
"urn": "urn:li:structuredProperty:clusterName",
"qualified_name": "clusterName",
"type": "urn:li:dataType:datahub.string",
"description": "Test Cluster Name Property",
"display_name": "Cluster's name",
"entity_types": [
"urn:li:entityType:datahub.dataset"
],
"cardinality": "SINGLE"
}
{
"urn": "urn:li:structuredProperty:projectNames",
"qualified_name": "projectNames",
"type": "urn:li:dataType:datahub.string",
"description": "Test property for project name",
"display_name": "Project Name",
"entity_types": [
"urn:li:entityType:datahub.dataset",
"urn:li:entityType:datahub.dataFlow"
],
"cardinality": "MULTIPLE",
"allowed_values": [
{
"value": "Tracking",
"description": "test value 1 for project"
},
{
"value": "DataHub",
"description": "test value 2 for project"
}
]
}
```


If you only want to see the URNs, you can use:

```shell
datahub properties list --no-details
```

Example Response:
```
[2025-01-08 22:23:00,625] INFO {datahub.cli.specific.structuredproperties_cli:134} - Listing structured property urns only, use --details for more information
urn:li:structuredProperty:clusterName
urn:li:structuredProperty:clusterType
urn:li:structuredProperty:io.acryl.dataManagement.deprecationDate
urn:li:structuredProperty:projectNames
```

To download all the structured property definitions into a single file that you can use with the `upsert` command as described in the [create section](#create-structured-properties), you can run the list command with the `--to-file` option.

```shell
datahub properties list --to-file structured_properties.yaml
```

Example Response:
```yaml
- urn: urn:li:structuredProperty:clusterName
qualified_name: clusterName
type: urn:li:dataType:datahub.string
description: Test Cluster Name Property
display_name: Cluster's name
entity_types:
- urn:li:entityType:datahub.dataset
cardinality: SINGLE
- urn: urn:li:structuredProperty:clusterType
qualified_name: clusterType
type: urn:li:dataType:datahub.string
description: Test Cluster Type Property
display_name: Cluster's type
entity_types:
- urn:li:entityType:datahub.dataset
cardinality: SINGLE
- urn: urn:li:structuredProperty:io.acryl.dataManagement.deprecationDate
qualified_name: io.acryl.dataManagement.deprecationDate
type: urn:li:dataType:datahub.date
display_name: Deprecation Date
entity_types:
- urn:li:entityType:datahub.dataset
- urn:li:entityType:datahub.dataFlow
- urn:li:entityType:datahub.dataJob
- urn:li:entityType:datahub.schemaField
cardinality: SINGLE
- urn: urn:li:structuredProperty:io.acryl.privacy.enumProperty5712
qualified_name: io.acryl.privacy.enumProperty5712
type: urn:li:dataType:datahub.string
description: The retention policy for the dataset
entity_types:
- urn:li:entityType:datahub.dataset
cardinality: MULTIPLE
allowed_values:
- value: foo
- value: bar
... etc.
```

</TabItem>

<TabItem value="OpenAPI v3" label="OpenAPI v3">

Example Request:
```bash
curl -X 'GET' \
'http://localhost:9002/openapi/v3/entity/structuredproperty?systemMetadata=false&includeSoftDelete=false&skipCache=false&aspects=structuredPropertySettings&aspects=propertyDefinition&aspects=institutionalMemory&aspects=structuredPropertyKey&aspects=status&count=10&sortCriteria=urn&sortOrder=ASCENDING&query=*' \
-H 'accept: application/json'
```

Example Response:
```json
{
"scrollId": "...",
"entities": [
{
"urn": "urn:li:structuredProperty:clusterName",
"propertyDefinition": {
"value": {
"immutable": false,
"qualifiedName": "clusterName",
"displayName": "Cluster's name",
"valueType": "urn:li:dataType:datahub.string",
"description": "Test Cluster Name Property",
"entityTypes": [
"urn:li:entityType:datahub.dataset"
],
"cardinality": "SINGLE"
}
},
"structuredPropertyKey": {
"value": {
"id": "clusterName"
}
}
}
]
}
```

Key Query Parameters:
- `count`: Number of results to return per page (default: 10)
- `sortCriteria`: Field to sort by (default: urn)
- `sortOrder`: Sort order (ASCENDING or DESCENDING)
- `query`: Search query to filter properties (* for all)

</TabItem>
</Tabs>

The list endpoint returns all structured properties in your DataHub instance. Each property includes:
- URN: Unique identifier for the property
- Qualified Name: The property's qualified name
- Type: The data type of the property (string, number, date, etc.)
- Description: A description of the property's purpose
- Display Name: Human-readable name for the property
- Entity Types: The types of entities this property can be applied to
- Cardinality: Whether the property accepts single (SINGLE) or multiple (MULTIPLE) values
- Allowed Values: If specified, the list of allowed values for this property

You can see the properties you created by running the following command:
## Read a single Structured Property

You can read an individual property you created by running the following command:

<Tabs>
<TabItem value="CLI" label="CLI" default>
Expand Down Expand Up @@ -279,6 +457,91 @@ If successful, you should see metadata about your properties returned.
}
```

</TabItem>
<TabItem value="GraphQL" label="GraphQL">

Example Request:
```graphql
query {
structuredProperty(urn: "urn:li:structuredProperty:projectNames") {
urn
type
definition {
qualifiedName
displayName
description
cardinality
allowedValues {
value {
... on StringValue {
stringValue
}
... on NumberValue {
numberValue
}
}
description
}
entityTypes {
urn
info {
type
qualifiedName
}
}
}
}
}
```

Example Response:
```json
{
"data": {
"structuredProperty": {
"urn": "urn:li:structuredProperty:projectNames",
"type": "STRUCTURED_PROPERTY",
"definition": {
"qualifiedName": "projectNames",
"displayName": "Project Name",
"description": "Test property for project name",
"cardinality": "MULTIPLE",
"allowedValues": [
{
"value": {
"stringValue": "Tracking"
},
"description": "test value 1 for project"
},
{
"value": {
"stringValue": "DataHub"
},
"description": "test value 2 for project"
}
],
"entityTypes": [
{
"urn": "urn:li:entityType:datahub.dataset",
"info": {
"type": "DATASET",
"qualifiedName": "datahub.dataset"
}
},
{
"urn": "urn:li:entityType:datahub.dataFlow",
"info": {
"type": "DATA_FLOW",
"qualifiedName": "datahub.dataFlow"
}
}
]
}
}
},
"extensions": {}
}
```
</TabItem>

<TabItem value="OpenAPI v2" label="OpenAPI v2">
Expand Down Expand Up @@ -389,7 +652,7 @@ Example Response:
This action will set/replace all structured properties on the entity. See PATCH operations to add/remove a single property.

<Tabs>
<TabItem value="graphQL" label="GraphQL" default>
<TabItem value="GraphQL" label="GraphQL" default>

```graphql
mutation upsertStructuredProperties {
Expand Down Expand Up @@ -537,7 +800,7 @@ datahub dataset get --urn {urn}
For reading all structured properties from a dataset:

<Tabs>
<TabItem value="graphql" label="GraphQL" default>
<TabItem value="Graphql" label="GraphQL" default>

```graphql
query getDataset {
Expand Down
Loading

0 comments on commit 45450f1

Please sign in to comment.