The component's logic should be implemented in the fullrun.sql
and dryrun.sql
files.
Workflows needs to perform a dry-run query before the actual execution of the workflow, in order to determine the resulting schema of each node.
For this, we need to create a dryrun.sql
file that generates an empty table with the same schema as the actual component's result (defined in fullrun.sql
) and returning 0 rows.
💡 Tip
The dry run code doesn't really need to be exactly the same as the full run. Functions that take longer to run can be avoided as long as the resulting schema is the same. For example, using
"uuid_string" AS uuid"
generates the same schema as"GENERATE_UUID() AS uuid"
would generate.
Below you can see an example of a stored procedure built following the approach defined above. This procedure takes a table and generates a new one that includes and additional column with a unique identifier.
EXECUTE IMMEDIATE '''
CREATE TABLE IF NOT EXISTS ''' || output || '''
OPTIONS (expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 30 DAY))
AS SELECT *, GENERATE_UUID() AS uuid
FROM ''' || input || ';';
EXECUTE IMMEDIATE '''
CREATE TABLE IF NOT EXISTS ''' || output || '''
OPTIONS (
expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 30 DAY))
AS SELECT *, "uuid_string" AS uuid
FROM ''' || input || '''
WHERE 1 = 0;
''';
For implementing the logic of the component, you will have available a set of variable matching the names of the parameters declared in the metadata.json
file, with both input and output parameters. These variables will contain the values selected by the user when configuring the component as part of a workflow.
If you have declared environment variables in your component metadata (see here), they will also be created and they will be available to be used from your code.
The input
and output
variables will contain the names of the input table that was configured for the component, and the name of the output one generated by the component. Parameters with those same names should have been declared in the corresponding metadata.json
file.
Do not generate tables with names others than the ones provided in the variables corresponding to output parameters. Otherwise, those tables will not be used when the component output is connected to another component in the workflow.
When a workflow is run from the Workflows UI, table names of the tables created by its components are fully qualified. That means that, if your custom component is connected to an upstream components and uses a table from it, the name that it will received in the corresponding parameter will be a FQN (that is, in the form project.dataset.table
). Output names received in the output parameters will also be FQNs.
However, when the workflow is run as a stored procedure (when exported or when executed via API), all tables created in components are session tables that are single names (that is, something like tablename
instead of project.dataset.table
). That means that inputs that come from other components, and also output table names, will be single-name tables.
You should prepare your component to deal with this situation. Check the input/output table names to see whether they are fully-qualified or not, and implement the corresponding logic to run in each case.