
Error when attempting to create partitions on large Teradata tables (2B+ rows) #1280

Open
karcot1 opened this issue Sep 27, 2024 · 4 comments · Fixed by #1281
Labels: priority: p2 (Medium priority. Fix may not be included in next release, e.g. minor documentation, cleanup), type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns)

Comments

karcot1 commented Sep 27, 2024

When creating partitions for a large table in a Teradata to BigQuery migration, the following error was encountered:

teradatasql.OperationalError: [Version 20.0.0.15] [Session 64925450] [Teradata Database] [Error 2616] Numeric overflow occurred during computation


DVT fails while generating the partitions because it runs a count(*) on the table before creating them.

According to the documentation, a Teradata INTEGER can only hold values between -2,147,483,648 and 2,147,483,647. The table in question has 2,234,011,695 rows, which is outside the INTEGER range (https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Data-Types-and-Literals/Numeric-Data-Types/INTEGER-Data-Type).

The suggestion is to change the count check to cast(count(1) as BIGINT), which should resolve the error. A minimal sketch of that fix is below.
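For illustration only (the exact query DVT issues may differ; the table name is a placeholder):

-- Sketch of the suggested fix: cast the count so the result is not
-- constrained to Teradata's 32-bit INTEGER, whose maximum is
-- 2,147,483,647.
SELECT CAST(COUNT(1) AS BIGINT) AS row_count
FROM <DATASET_NAME>.<TABLE_NAME>;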

@helensilva14 helensilva14 added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. labels Sep 27, 2024
@helensilva14 helensilva14 changed the title DVT Errors out when attempting to create partitions on large tables (2B+ rows) Error when attempting to create partitions on large Teradata tables (2B+ rows) Sep 27, 2024
@github-project-automation github-project-automation bot moved this from Pending ⏰ to Done 🚀 in Data Validation Tool Release Tracker Sep 27, 2024
@helensilva14 helensilva14 reopened this Sep 30, 2024
@helensilva14 helensilva14 moved this from Done 🚀 to In Progress 🚧 in Data Validation Tool Release Tracker Sep 30, 2024
helensilva14 (Member) commented Sep 30, 2024

Looks like the numeric overflow occurs when DVT attempts to execute this query in TD, so it's something related to the ROW_NUMBER calculation:

WITH t0 AS (
  SELECT 
    t2.EDW_ACCT_NB,
    t2.EFF_DT,
    (row_number() OVER (ORDER BY t2.EDW_ACCT_NB ASC, t2.EFF_DT ASC) - 1) + 1 AS dvt_pos_num
  FROM `<DATASET_NAME>.<TABLE_NAME>` t2
)

karcot1 (Author) commented Oct 1, 2024

Adding to this - based on the TD docs, row_number() is similar to count(*) and returns an INT value. In a table with over 2 billion rows, once you reach row number 2,147,483,648 you have exceeded the upper limit of the INT data type and will hit the numeric overflow error.

It would be good to review whether there are any other places where DVT produces an INT that depends on the size of the table. Any other instances of count(*) or similar functions will fail with the same error. One possible rewrite is sketched below.

https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Functions-Expressions-and-Predicates/Ordered-Analytical/Window-Aggregate-Functions/ROW_NUMBER/ROW_NUMBER-Usage-Notes/Window-Aggregate-Equivalent
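For illustration, one possible workaround (untested here, and not necessarily what DVT should generate) is to compute the position with a cumulative SUM over a BIGINT literal instead of ROW_NUMBER():

WITH t0 AS (
  SELECT
    t2.EDW_ACCT_NB,
    t2.EFF_DT,
    -- SUM over a BIGINT constant accumulates in BIGINT, unlike
    -- ROW_NUMBER(), which returns INTEGER and overflows past
    -- 2,147,483,647 rows.
    SUM(CAST(1 AS BIGINT)) OVER (
      ORDER BY t2.EDW_ACCT_NB ASC, t2.EFF_DT ASC
      ROWS UNBOUNDED PRECEDING
    ) AS dvt_pos_num
  FROM `<DATASET_NAME>.<TABLE_NAME>` t2
)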

nehanene15 (Collaborator) commented

I wonder if doing something similar to the following would work:

WITH t0 AS (
  SELECT 
    t2.EDW_ACCT_NB,
    t2.EFF_DT,
    (CAST(COUNT(*) OVER (ORDER BY t2.EDW_ACCT_NB ASC, t2.EFF_DT ASC) AS BIGINT) - 1) + 1 AS dvt_pos_num
  FROM `<DATASET_NAME>.<TABLE_NAME>` t2
)

We would have to see if count(*) can replace row_number() in all databases.
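One caveat, assuming standard window-function semantics rather than any engine-specific behavior: with duplicate ordering keys, a windowed COUNT(*) under the default RANGE frame gives tied rows the same value, while ROW_NUMBER() does not, so an explicit ROWS frame would be needed:

-- Matches ROW_NUMBER() only with an explicit ROWS frame; the default
-- RANGE frame assigns tied rows identical counts.
COUNT(*) OVER (
  ORDER BY t2.EDW_ACCT_NB ASC, t2.EFF_DT ASC
  ROWS UNBOUNDED PRECEDING
) AS dvt_pos_num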

sundar-mudupalli-work (Collaborator) commented

Hi,
Looking at StackOverflow on this topic, and specifically the third comment, there might be a way to get totals when the number of rows is over 2 billion. The challenge is for DVT to be able to create such a table and test the fix, rather than tossing a suggestion over the wall. We have a sandbox Teradata implementation in a container; I have not been able to test whether that implementation can hold a table with over 2 billion rows. One way to synthesize such a table is sketched below.
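For illustration, a hypothetical approach (the seed table and column names are made up, and storage in the sandbox is the practical constraint):

-- Hypothetical: cross join a ~50,000-row seed table (single INTEGER
-- column n) with itself to produce roughly 2.5 billion rows without
-- loading external data.
CREATE TABLE dvt_overflow_test AS (
  SELECT CAST(a.n AS BIGINT) * 50000 + b.n AS row_id
  FROM seed a
  CROSS JOIN seed b
) WITH DATA;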

Thanks.

Sundar Mudupalli

@helensilva14 helensilva14 moved this from In Progress 🚧 to Pending ⏰ in Data Validation Tool Release Tracker Oct 11, 2024
@helensilva14 helensilva14 added priority: p2 Medium priority. Fix may not be included in next release (e.g. minor documentation, cleanup) and removed priority: p0 Highest priority. Critical issue. Will be fixed prior to next release. labels Oct 11, 2024