`datalab` to `google.datalab` Migration Guide
[Under construction]
As Datalab is moving out of BETA, the API surface has been reworked and polished into a new namespace package, `google.datalab`. This page documents the notable differences between this new namespace and the old `datalab` namespace, which is planned to be phased out.
While this plan is in motion, the old `datalab` namespace is still included in the Datalab environment for backward compatibility, so that existing notebooks run without modification. However, migrating to the new namespace is highly encouraged in order to make use of ongoing support and new features.
Datalab is moving to the new BigQuery Standard SQL, which is compliant with the SQL 2011 standard. BigQuery Legacy SQL is no longer supported. Please refer to the BigQuery migration guide to help you change your queries to the new standard.
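One mechanical part of that query migration is the table reference syntax: Legacy SQL writes `[project:dataset.table]`, while Standard SQL uses backtick-quoted, dot-separated names. The sketch below is a hypothetical helper (the name `to_standard_table_refs` is not part of Datalab or BigQuery) that rewrites only those references. Other dialect differences, such as changed functions or comma joins, still need manual review.

```python
import re

# Legacy SQL table references look like [project:dataset.table] or [dataset.table].
# Assumption: decorators and wildcard tables are out of scope for this sketch.
_LEGACY_TABLE_REF = re.compile(r"\[(?:([\w.\-]+):)?(\w+)\.(\w+)\]")


def to_standard_table_refs(sql):
    """Rewrite bracketed Legacy SQL table references as backtick-quoted names."""

    def _replace(match):
        project, dataset, table = match.groups()
        # The project part is optional in Legacy SQL, so skip it when absent.
        parts = [part for part in (project, dataset, table) if part]
        return "`" + ".".join(parts) + "`"

    return _LEGACY_TABLE_REF.sub(_replace, sql)


legacy = "SELECT word FROM [bigquery-public-data:samples.shakespeare] LIMIT 5"
print(to_standard_table_refs(legacy))
# prints: SELECT word FROM `bigquery-public-data.samples.shakespeare` LIMIT 5
```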
- The magics `%sql` and `%bigquery` have been removed and their functionality merged into the new `%bq` magic, which allows you to execute queries as well as declare Query objects, UDFs, and External Data Sources. See `%bq -h` for more details.
- The magic command structure has been changed for all magics to "%magic resources action". As an example, you can list tables by running `%bq tables list`.
- `Query.extract()`, `Query.extract_async()`, and `Query.results()` are now all part of `Query.execute()`. A `QueryOutput` class has been added that specifies the type of output when executing a query. For example, to execute a query and extract the results into a file, you can do:

      query.execute(QueryOutput.file())

- `Query.sample()` and `sampling_query()` have been replaced by a `Sampling` class that specifies the sampling method when executing a query. Here's an example:

      # use random sampling to get a 2% sample of the query
      query.execute(sampling=Sampling.random(percent=2))

- `Query.to_dataframe()` and `Query.to_file()` can both be done using the `QueryOutput` object described above. They also still exist on the `Table` object, which results from query execution.
- `Schema.from_dataframe()` is now part of `Schema.from_data()`, which can recognize the type of the data passed in.
- `Table.to_query()` and `View.to_query()` have both been replaced with static methods on the `Query` class, namely `Query.from_table()` and `Query.from_view()`.
- `View.execute()`, `View.execute_async()`, `View.results()`, and `View.sample()` have been removed in favor of building a `Query` object out of the `View` object using `Query.from_view()`, then calling the equivalent methods on the `Query` object.
- All SQL parsing has been removed from Datalab. Instead, a SQL query's dependencies (subqueries, UDFs, etc.) are concatenated and sent to the BigQuery service API. This means that arbitrary variable substitution is no longer supported, and the only parameterization functionality allowed is what is offered by BigQuery itself. You can read more about it in the BigQuery documentation. Datalab offers an easy way to do query parameterization by adding a `query_params` dictionary parameter.
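When auditing an existing notebook, the renames listed above can be collected into a lookup table and matched against each cell. The following is a minimal sketch, not part of Datalab: `BIGQUERY_CHANGES` and `find_legacy_usages` are hypothetical names, and the regular expressions are rough heuristics that may flag unrelated calls (for example a pandas `.sample()`).

```python
import re

# (pattern for an old name, advice taken from the list above).
# Assumption: simple regexes are good enough for a first pass over notebook cells.
BIGQUERY_CHANGES = [
    (r"^\s*%%?(sql|bigquery)\b", "use the %bq magic"),
    (r"\.(extract|extract_async|results)\(", "use Query.execute() with a QueryOutput"),
    (r"\.sample\(|\bsampling_query\(", "pass a Sampling object to Query.execute()"),
    (r"\bSchema\.from_dataframe\(", "use Schema.from_data()"),
    (r"\.to_query\(", "use Query.from_table() or Query.from_view()"),
]


def find_legacy_usages(source):
    """Return (line_number, stripped_line, advice) for each old-API match."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for pattern, advice in BIGQUERY_CHANGES:
            if re.search(pattern, line):
                findings.append((number, line.strip(), advice))
    return findings


cell = """%%sql --module q
SELECT 1
df = query.results().to_dataframe()
t = table.to_query()
"""
for number, line, advice in find_legacy_usages(cell):
    print(f"line {number}: {line!r} -> {advice}")
```

Each finding still needs a manual rewrite; the scan only points at lines worth checking.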
- The magic `%storage` has been renamed to `%gcs`.
- `Item` is now called `Object` to properly reflect Google Cloud Storage naming.
- `Object` now has `download()`, `upload()`, `read_stream()`, and `write_stream()` methods.
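Since the magic rename is purely textual, it can be applied to cell sources with a regular expression. This is a sketch under that assumption: `rename_storage_magic` is a hypothetical helper, it leaves the magic's arguments untouched, and those may still need adjusting to the new command structure.

```python
import re

# Matches a line magic (%storage) or cell magic (%%storage) at the start of a line.
_STORAGE_MAGIC = re.compile(r"^([ \t]*%%?)storage\b", flags=re.MULTILINE)


def rename_storage_magic(cell_source):
    """Replace the old %storage magic name with %gcs, keeping everything else."""
    return _STORAGE_MAGIC.sub(r"\g<1>gcs", cell_source)


# "<args>" is a placeholder, not real magic syntax.
cell = "%storage <args>\nprint('unchanged')\n"
print(rename_storage_magic(cell))
```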
- `Context` moved to top-level `google.datalab`, and all `project_id` and `credentials` arguments now default to the global settings held by `Context.default()`, unless a `Context` object is passed in. For example, to set the BigQuery billing tier config globally, you can do: `Context.default().set_config({'bigquery_billing_tier': 2})`.
- The `Project` and `Projects` modules have been removed, their functionality now being part of `utils`.