- MSSQL and Azure MSSQL connectors now use the ODBC 18 driver
- The Oracle connector install script is now compatible with Ubuntu 24.04
- Prevent injection in Jinja templates
- The Azure MSSQL connector now uses `sqlalchemy` to connect to MSSQL.
- The lib is now compatible with pydantic 2.10
- Mongo: when reading data in chunks, each individual chunk's index is now ignored when concatenating them.
- Mongo: correctly type the aggregation pipeline, which is expected when `query` is a list
- Mongo: Added an optional `chunk_size` param to `get_df`, to create the DataFrame chunk by chunk (saves memory)
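  A minimal usage sketch of the new parameter. The connector and data-source fields shown here are assumptions about the Mongo connector's API; only `chunk_size` comes from this entry:

  ```python
  # Hedged sketch: field names other than `chunk_size` are assumptions.
  from toucan_connectors.mongo.mongo_connector import MongoConnector, MongoDataSource

  connector = MongoConnector(name="my-mongo", host="localhost", port=27017)
  data_source = MongoDataSource(
      name="my-mongo", domain="deals", database="crm", collection="deals", query={}
  )

  # Build the resulting DataFrame 10 000 documents at a time instead of
  # materialising the whole collection at once.
  df = connector.get_df(data_source, chunk_size=10_000)
  ```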
- HTTP API: API results are now correctly merged even if they need to be filtered or flattened.
- HTTP API: Add a `data_filter` field to the offset pagination config to determine which part of the data must be used to compute the data length.
- HTTP API: Missing dependencies for the HTTP API connector do not prevent the import of the lib anymore
- HTTP API: Add a `PaginationConfig` to `HttpAPIDataSource` in order to handle API pagination and fetch all data. It supports the following kinds of pagination: page-based, cursor-based, offset-limit and hypermedia.
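  A hedged sketch of fetching a paginated API. The import paths, the offset-limit config class name and its fields are illustrative assumptions; only `HttpAPIDataSource` and the supported pagination kinds come from this entry:

  ```python
  # Hedged sketch: class and field names below are illustrative assumptions.
  from toucan_connectors.http_api.http_api_connector import HttpAPIConnector, HttpAPIDataSource
  from toucan_connectors.http_api.pagination_configs import OffsetLimitPaginationConfig  # assumed

  connector = HttpAPIConnector(name="my-api", baseroute="https://api.example.com")
  data_source = HttpAPIDataSource(
      name="my-api",
      domain="users",
      url="/users",
      # Keep requesting /users?offset=N&limit=100 until the API stops returning rows.
      http_pagination_config=OffsetLimitPaginationConfig(offset_name="offset", limit_name="limit", limit=100),
  )

  df = connector.get_df(data_source)  # all pages concatenated into a single DataFrame
  ```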
- Google BigQuery: If the dtype of a column in the `DataFrame` returned by `_retrieve_data` is `object`, it gets converted to `Int64` or `float64` when BigQuery defines it as a numeric dtype.
- When testing the connection, set a 10s timeout when checking whether the port is open.
- Jinja templates: expressions containing parentheses or curly braces are no longer limited to producing string output.
- Google BigQuery: increase limits when fetching the DB tree structure
- MySQL / Redshift / Snowflake oAuth2: `get_model` now supports extra kwargs so it doesn't crash
- It is now possible to import all connector models without having any extra installed
- The deprecated `GoogleSpreadsheet` connector has been removed.
- The deprecated `GoogleSheets2` connector has been removed.
- Postgres: Rather than being silently caught, exceptions happening in `get_form` and `get_model` are now logged
- HTTP: The `custom_token_server` authentication type now accepts a `token_header_name` kwarg. It allows overriding the name of the authorization header, which defaults to `Authorization`.
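  A hedged sketch of the new kwarg. The `Auth` usage and the other fields are assumptions about the auth API; only `custom_token_server` and `token_header_name` come from this entry:

  ```python
  # Hedged sketch: everything except `custom_token_server` and `token_header_name`
  # is an assumption about the auth API.
  from toucan_connectors.auth import Auth
  from toucan_connectors.http_api.http_api_connector import HttpAPIConnector

  auth = Auth(
      kind="custom_token_server",
      args=["https://auth.example.com/token"],
      kwargs={
          # Send the fetched token in "X-Api-Key" instead of the default "Authorization" header.
          "token_header_name": "X-Api-Key",
      },
  )
  connector = HttpAPIConnector(name="my-api", baseroute="https://api.example.com", auth=auth)
  ```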
- `get_model` now alphabetically sorts the columns before returning them, in order to ensure result consistency.
- `get_model` now supports an `exclude_columns` argument, defaulting to `False`. It allows skipping column retrieval in the model. This is only implemented in the Postgres connector for now.
- `get_model` now supports `schema_name` and `table_name` arguments, allowing to filter on a specific table and/or schema. This is only implemented in the Postgres connector for now (see the sketch below).
- DiscoverableConnector: `format_db_model` is now roughly 3x faster, resulting in performance gains in `get_model`.
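  A hedged sketch combining the new `get_model` arguments on the Postgres connector. The import path and connector fields are assumptions; the keyword arguments are the ones described in the entries above:

  ```python
  # Hedged sketch: connector fields are assumptions; the get_model kwargs are
  # the ones introduced above.
  from toucan_connectors.postgres.postgresql_connector import PostgresConnector

  connector = PostgresConnector(
      name="my-pg", host="localhost", port=5432, user="reader", password="secret"
  )

  # Restrict the model to one schema/table and skip column retrieval entirely.
  tables = connector.get_model(
      schema_name="public",
      table_name="orders",
      exclude_columns=True,
  )
  ```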
- Hubspot: added support for listing and selection of custom objects
- Hubspot: added support for custom attributes
- Athena: params are now correctly interpolated
- Dependencies: removed the upper bound on peakina
- BigQuery: the JWT token auth method is now supported in the status check.
- HTTP: allow connector to be instantiated without passing positional arguments to auth.
- OracleSQL: Fix jinja templates and test string fixtures
- Datetime series returned by our connectors don't have timezones anymore
- OracleSQL: Add variables templating support
- MySQL: An unknown exception during the status check now makes the check fail
- MySQL: Add an optional `charset_collation` to the connector, as PyMySQL >= 1.1.0 always runs a `SET NAMES` on connection, which breaks on servers using a non-default collation
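  A hedged configuration sketch; the import path and connector fields other than `charset_collation` are assumptions:

  ```python
  # Hedged sketch: fields other than `charset_collation` are assumptions.
  from toucan_connectors.mysql.mysql_connector import MySQLConnector

  connector = MySQLConnector(
      name="my-mysql",
      host="db.example.com",
      user="reader",
      password="secret",
      # Collation sent along with the SET NAMES that PyMySQL >= 1.1.0 always issues,
      # so servers configured with a non-default collation keep working.
      charset_collation="utf8mb4_unicode_ci",
  )
  ```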
- MySQL: Allow dict parameters to be used with PyMySQL 1.1.1
- MySQL: Use a regular PyMySQL Cursor rather than a DictCursor when pandas 2.x is used
- Google Big Query: the query generated by `get_model` now correctly quotes the dataset name, which allows building a DB model for datasets whose name starts with a number
- Elasticsearch: force `widget="json"` on `body` so the form is properly filled when updating a data source
- Added support for Python 3.12
- Restored the `HubspotPrivateApp` connector, which was deleted by mistake in v6.0.0
- Breaking: Support for Python 3.10 has been dropped.
- Breaking: The following connectors have been removed:
- Wootric
- Trello
- Toucan Toco
- Net Explorer
- Linkedin Ads
- Microstrategy
- Hubspot
- Google My Business
- Google Adwords
- Facebook Insights
- Facebook Ads
- Anaplan
- Adobe Analytics
- Google Big Query: do not exclude partitioning columns when listing table structure
- Mongo: the maximum connection pool size is now configurable via the `max_pool_size` parameter. It defaults to 1
- Google Big Query: an actual connection check is now done in `get_status`, rather than just a private key validation.
- SQL connectors: duplicate columns are now renamed with a suffix indicating their position. A duplicate `my_column` column now becomes `my_column_0`, `my_column_1`...
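  A minimal sketch of the renaming idea described above (not the library's actual implementation):

  ```python
  import pandas as pd

  def rename_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
      """Rename duplicated columns with a suffix indicating their position."""
      duplicated = {name for name, count in df.columns.value_counts().items() if count > 1}
      seen: dict[str, int] = {}
      new_columns = []
      for name in df.columns:
          if name in duplicated:
              new_columns.append(f"{name}_{seen.get(name, 0)}")
              seen[name] = seen.get(name, 0) + 1
          else:
              new_columns.append(name)
      return df.set_axis(new_columns, axis=1)

  df = pd.DataFrame([[1, 2, 3]], columns=["my_column", "my_column", "other"])
  print(rename_duplicate_columns(df).columns.tolist())  # ['my_column_0', 'my_column_1', 'other']
  ```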
- Google Big Query: A simple status check that validates the private key's format has been implemented
- Elasticsearch: Host verification has been disabled to tolerate strict network configurations
- Install scripts: fix oracle install script by replacing gdown.pl with wget
- Postgres: Materialized views are now returned as well via `get_model`. Their type is `'view'`.
- Breaking: The version requirement for pydantic has been increased to `>=2.4.2,<3`
- Removed the upper constraint on pyarrow (`<14`)
- Revert a change (from 4.9.3) that prevented the publication of the package on PyPI
- Update DataBricks connector
- Google Big Query: get `project_id` from the connector config whatever the auth mode (JWT/GoogleCreds).
- Google Big Query:
  - Better UX (switch between GoogleCreds auth or GoogleJWT auth).
  - Explicit error information when no data is returned.
  - Fallback on GoogleCredentials auth when JWTCredentials fails (or when the JWT token is no longer valid).
- Google Big Query: Now supports signed JWT connections on the GBQ connector.
- Postgres: In case two tables in different schemas have the same name, `get_model` and `get_model_with_info` now return the correct information.
- S3: Add a new AWS S3 connector using the Security Token Service (STS) API Assume Role.
- Install scripts: fix mssql install scripts by forcing debian/11 deb repo
- GoogleSheets: Replace empty values with numpy `NaN`.
- Redshift: Ignore `ProgrammingError` when `table_infos` is empty for a database.
- PyYaml: Fix broken dependency and bump it from 5.4.1 to >=6,<7
- Feat[Google Big Query]: We can now get the database model (list of tables) based on a given schema name to speed up building the project tree structure.
- Fix: on MySQL, avoid duplicated columns when retrieving table information
- The exception raised by `nosql_apply_parameters_to_query` when `handle_errors` is true and an undefined variable is encountered has changed from `NonValidVariable` to `UndefinedVariableError` (see the sketch below).
- `__VOID__` values are no longer removed from queries.
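  A hedged sketch of the changed exception; the import paths and exact call signature are assumptions based on the names above:

  ```python
  # Hedged sketch: exact import paths are assumptions based on the names above.
  from toucan_connectors.common import nosql_apply_parameters_to_query, UndefinedVariableError

  query = {"domain": "sales", "country": "{{ country }}"}

  try:
      # `country` is not provided, so with handle_errors=True the helper now raises
      # UndefinedVariableError instead of NonValidVariable.
      nosql_apply_parameters_to_query(query, parameters={}, handle_errors=True)
  except UndefinedVariableError:
      ...
  ```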
- Added a missing dependency on `aiohttp`
- This release officially adds support for Python 3.11
- The `awswrangler` dependency has been bumped to `^3.0.0`
- For SQL connectors, `get_model()`'s output is now filtered on the passed db name, if it is specified
- The `Hive` connector has been deleted
- The `Indexima` connector has been deleted
- The `Rok` connector has been deleted
- The `Lightspeed` connector has been deleted
- The `Revinate` connector has been deleted
- Bump Peakina from 0.9.x to 0.10.x
- The upper constraint on python < 3.11 has been lifted. This does not mean that Python 3.11 is officially supported yet.
- MySQL: It is now possible to use the MySQL connector with a CA bundle in VERIFY_IDENTITY mode
- HubSpot: root-level properties are now also returned along with properties in the "properties" object
- HubSpot: it is now possible to retrieve a data slice for owners
- HubSpot: Added a new connector based on HubSpot private apps
- MySQL: Allow Optional parameters on ssl_mode
- MongoConnector: Now handles "VOID" in `$and` match conditions.
- Export of the Peakina connector through `CONNECTOR_REGISTRY`.
- Added a new connector: Peakina, for files.
- Google Big Query no longer crashes when trying to retrieve the table list for datasets in different locations.
- `Dates as float` is now selected by default in Google Sheets data sources.
- Feat: The `GoogleSheets` connector datasource now has an option called `Dates as Floats`, to read datetime columns as strings or floats when reading the sheet.
Pagination information has been refactored. The `DataSlice` and `DataStats` interfaces have been changed: `DataStats` no longer has `total_rows` and `total_returned_rows` fields. `DataSlice` now has a `pagination_info` field in its root. This field is required and contains a `PaginationInfo` model.
For information about the `PaginationInfo` model and how to interpret its contents, see the documentation.
- Deps: Upper constraint on cryptography has been loosened from `<37` to `<39`
- Snowflake: The snowflake connector has been refactored so that it no longer spawns threads or uses connection pooling.
- Fix: drop the `date_as_object` argument since we moved to google bigquery 3.
- Fix: Ensure Postgres always uses the default database for connection, rather than 'postgres'.
- Fix regression introduced in the mongo connector in 3.23.2 where `$match` statements containing only matches on nulls were considered empty.
- Fix: Add support for `__VOID__` syntax to `nosql_apply_parameters_to_query`
- Fix: Fixed the `%` character replacement on edge cases for `pandas_read_sql`.
- MySQL: Added support for REQUIRED ssl_mode
- Fix: Replace the `%` character by `%%` in `pandas_read_sql` to prevent pandas from interpreting `%` as the interpolation of an SQL parameter
- Fix: Ensure timezone-aware timestamp columns are converted to UTC
- The constraint of the `lxml` dependency has been loosened from `4.9.1` to `^4.6.5`.
- The package now exposes a `__version__` attribute.
- The constraint of the `pyarrow` dependency has been loosened from `<7` to `<9`.
- Automate PyPI artifact publication
- MySQL: Add support for SSL-based authentication
- Google Big Query: fix variables interpolation.
- Athena: fix order of OFFSET and LIMIT query parameters
- Athena: fix the parameter injection
- Base connector: Fixed pagination values (`total_rows` and `total_returned_rows`)
- Athena: Hacked pagination values in case not all results were fetched
- Mongo: removed the `_id` column from the response DataFrame.
- All connectors: removed werkzeug dependency.
- All connectors: Add support for an optional `db_name` parameter in the `get_model` method.
- MySQL: Use the provided `db_name` for discoverability when possible in `get_model`.
- MySQL: Simplify the query for schema construction in order to be compatible with older versions
- Redshift: Add an option to disable TCP keep-alive (enabled by default).
- MySQL: Do not specify a database on discoverability-related functions (listing databases and describing table schemas).
- Conditions: The unquoting logic is now only applied when the passed parameter is a string
- Athena: Parameters are now passed as SQL parameters rather than interpolated by us in order to prevent SQL injection.
- Conditions: Strings are now unquoted for conditions applying only to numbers (`lt`, `lte`, `gt`, `gte`).
- MySQL: Return a more explicit error message in case no query is specified
- MySQL: Revert the `following_relations` attribute as deprecated
- Athena: Add an option allowing CTAS to be toggled (disabled by default)
- Fix: MySQL, Athena: add a hidden table attribute to avoid breaking old datasource configs
- Fix: MySQL: replace the quoting character
- Feat: MySQL & Athena graphical selection interface
- Feat: the Mongo connector's `get_slice_with_regex` method now supports a dict of lists of regex patterns to match in the different columns
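  A hedged sketch of the dict-of-lists search format; the connector/data-source fields, import path and the exact keyword name are assumptions, only the format of the patterns dict comes from this entry:

  ```python
  # Hedged sketch: connector/data-source fields and the exact keyword name `search`
  # are assumptions; the dict-of-lists format is the one described above.
  from toucan_connectors.mongo.mongo_connector import MongoConnector, MongoDataSource

  connector = MongoConnector(name="my-mongo", host="localhost", port=27017)
  data_source = MongoDataSource(
      name="my-mongo", domain="deals", database="crm", collection="deals", query={}
  )

  search = {
      "name": ["^Acme", "Corp$"],   # `name` must match any of these patterns
      "country": ["^fr", "^de"],    # and `country` any of these
  }
  data_slice = connector.get_slice_with_regex(data_source, search=search)
  ```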
- Fix redshift connector: Remove pooling due to table locks
- Feature: `nosql_apply_parameters_to_query`: add tuple rendering capabilities
- Ignore extra attributes in BigQueryDataSource for graphical selection
- Add attributes & methods to big query connector for graphical selection
- Implement exploration in google big query connector
3.14.1 2022-06-28
- Make exploration faster and add form for redshift connector
3.14.0 2022-06-25
- Improve order and default values of fields of the redshift connector
- Get table information from redshift connector
3.13.0 2022-06-24
- Added default database field for redshift and postgres connectors
- Added a new status check for request on default databases
3.12.0 2022-06-23
Remove the table attribute from RedshiftDataBaseConnector
3.11.0 2022-06-17
Add support for elasticsearch >= 8 on the ElasticsearchConnector.
3.0.0 2022-02-03
The `GoogleSheets` connector, based on bearer.sh (a discontinued service), has been replaced by a new one that is agnostic of the OAuth manager used. This new connector needs a `retrieve_token` function to get valid authentication tokens.
It also features automatic date parsing and uses the official Google API Python client.
2.0.0 2022-01-19
Some `DataStats` properties have been renamed and some have been added; see HERE for more information.
- Fixes on sql/snowflake (don't run count for DESCRIBE or SHOW queries + don't use -1 as default rows count)
- Fixes on sharepoint and onedrive connectors.
1.3.43 2022-01-17
- Added a `filenames_to_match` param to extract multiple files on the sharepoint and onedrive connectors.
- Added a dev container for developing safely on connectors.