This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

Cannot complete the test run #32

Open
HarryLiUS opened this issue Jul 13, 2018 · 11 comments

Comments

@HarryLiUS

Hello,

I followed the instructions to do a local test run. The first three steps completed successfully. At step 4, the table creation took a little over 10 minutes, which was longer than I expected, but it completed. Here is the output:

==============================================
TPC-DS On Spark Menu

SETUP
(1) Create spark tables
RUN
(2) Run a subset of TPC-DS queries
(3) Run All (99) TPC-DS Queries
CLEANUP
(4) Cleanup
(Q) Quit

Please enter your choice followed by [ENTER]: 1

INFO: Creating tables. Will take a few minutes ...
INFO: Progress : [########################################] 100%
INFO: Spark tables created successfully..
Press any key to continue

After the tables were created successfully, I tried to run query 1, and here is what I got:

==============================================
TPC-DS On Spark Menu

SETUP
(1) Create spark tables
RUN
(2) Run a subset of TPC-DS queries
(3) Run All (99) TPC-DS Queries
CLEANUP
(4) Cleanup
(Q) Quit

Please enter your choice followed by [ENTER]: 2

Enter a comma separated list of queries to run (ex: 1, 2), followed by [ENTER]:
1
INFO: Checking pre-reqs for running TPC-DS queries. May take a few seconds..
ERROR: The rowcounts for TPC-DS tables are not correct. Please make sure option 1
is run before continuing with currently selected option
Press any key to continue

I repeated this and it did not help.

Checking rowcounts.rrn, it is all 0s.

And here is the output from spark-shell in step 3:

scala> spark.conf
res0: org.apache.spark.sql.RuntimeConfig = org.apache.spark.sql.RuntimeConfig@505bc480
scala> spark.conf.get("spark.sql.catalogImplementation")
res1: String = hive

Thank you for the help,
Harry
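
Regarding rowcounts.rrn being all 0s: one quick way to see the mismatch directly is to diff the generated file against the expected one. A minimal sketch, assuming rowcounts.rrn sits under the work directory and rowcounts.expected ships with the repo (both paths are assumptions, not confirmed in this thread):

# hedged sketch: compare generated vs. expected rowcounts
# adjust both paths to your TPCDS_WORK_DIR and repo layout
diff "$TPCDS_WORK_DIR/rowcounts.rrn" "$TPCDS_ROOT_DIR/src/rowcounts.expected"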

@stevemar

@dilipbiswal can you give this a look when you have a minute ^

@HarryLiUS
Author

Hello @stevemar @dilipbiswal,
Do you have any update?

Also, I have questions about the tpcdsenv.sh variables. For the error above, I used the defaults, except for pointing the root directory to my TPC-DS installation directory. Here is the tpcdsenv.sh:

harry.li@perf84:/usr/local/harry/tpcds/spark-tpc-ds-performance-test$ cat bin/tpcdsenv.sh
#!/bin/bash
#
# tpcdsenv.sh - UNIX Environment Setup
#

#######################################################################
# This is a mandatory parameter. Please provide the location of
# spark installation.
#######################################################################
export SPARK_HOME=/usr/local/harry/spark

#######################################################################
# Script environment parameters. When they are not set the script
# defaults to paths relative from the script directory.
#######################################################################

export TPCDS_ROOT_DIR=/usr/local/harry/tpcds/spark-tpc-ds-performance-test
export TPCDS_LOG_DIR=
export TPCDS_DBNAME=
export TPCDS_WORK_DIR=
harry.li@perf84:/usr/local/harry/tpcds/spark-tpc-ds-performance-test$

My questions are:

  1. Are these settings good for testing with ./bin/tpcdsspark.sh?
  2. If I need to move my database from local disk to HDFS, what changes are needed? I tried changing the settings as follows, and it does not work.

export TPCDS_ROOT_DIR=/usr/local/harry/tpcds/spark-tpc-ds-performance-test
export TPCDS_LOG_DIR=hdfs:///TPC-DS/logDir
export TPCDS_DBNAME=hdfs:///TPC-DS/dbDir
export TPCDS_WORK_DIR=hdfs:///TPC-DS/workDir

Please advise and thanks in advance.
Harry
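
On the HDFS question, one possible direction (an assumption on my part, not something confirmed in this thread) is that the table location is governed by Spark's spark.sql.warehouse.dir rather than by the script's log/work directories, which presumably need to stay on the local filesystem for ordinary shell commands to work. A minimal sketch, assuming the tables are (re)created from a spark-shell session:

# hedged sketch: point Spark's managed-table warehouse at HDFS
# (the hdfs:///TPC-DS/warehouse path is only an example)
$SPARK_HOME/bin/spark-shell --conf spark.sql.warehouse.dir=hdfs:///TPC-DS/warehouse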

@cruizen

cruizen commented Dec 20, 2018

@HarryLiUS Can you run step 4 (cleanup) to clean all data and start from scratch? I think you may have run dsdgen to generate data at a different scale factor.

@mosayyebzadeh

@HarryLiUS Were you able to solve the problem? I am facing the same issue.

@ViRaL95

ViRaL95 commented Jun 6, 2020

Has anyone resolved this?

@mosayyebzadeh

I could not make it work with Spark 3.0.0. But after switching to Spark 2.4.5 the problem went away.

@fbaig

fbaig commented May 13, 2021

It works without any modifications with Spark 2.4.5 and Spark 2.4.7, but it requires some changes to run with Spark 3.0.1. Actually, the fix does not even relate to Spark itself. There is a check that compares the row counts from the generated data against the expected results, and it fails because it compares file contents verbatim: newer versions of Spark emit new warnings that get added to the beginning of the generated file and thus break the comparison with the expected result.
Here are the steps to make it work with Spark 3.0.1 (see the sketch after this list):

  • In bin/tpcdsspark.sh, in the function check_createtables()
  • Before the file comparison check, i.e. if cmp -s "$file1" "$file2"
  • If you are on Mac: sed -i '' '/^W/d' $file1
  • If you are on Linux: sed -i '/^W/d' $file1
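
A minimal sketch of that edit, assuming check_createtables() compares the two files with cmp (anything beyond $file1/$file2 is illustrative, not the repo's exact code):

# hedged sketch: strip lines starting with "W" (the new warnings)
# from the generated rowcounts file before comparing it
sed -i '/^W/d' "$file1"        # on macOS: sed -i '' '/^W/d' "$file1"
if cmp -s "$file1" "$file2"; then
  echo "INFO: rowcounts match"
else
  echo "ERROR: rowcounts differ"
fi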

@ChenZuzhi

This error occurs when the file rowcounts.rrn and the file rowcounts.expected are not exactly the same.
For me, it turned out that rowcounts.rrn is derived from the log rowcounts.out, and thus contains some unexpected warning lines.
The rowcounts.rrn then looks like:

WARNING: An illegal reflective access operation has occurred...
Setting default log level to "WARN".
6
11718
144067

And the rowcounts.expected looks like:

6
11718
144067

This causes the error in check_createtables().

So here's my solution:
Open the log rowcounts.rrn under the work directory and note the words that occur in lines that should not be in rowcounts.expected; in my case, the words were 'WARNING' and 'Setting'.
Then edit tpcdsspark.sh at line 99, adding | grep -v "WARNING" and | grep -v "Setting"; this filters the unexpected log lines out of rowcounts.out and derives a clean rowcounts.rrn (a sketch follows below).
I then ran "Cleanup" and then "Create spark tables" in tpcdsspark.sh, and everything worked fine for me.
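
A sketch of what that filtered line might look like (only the grep filters come from the comment above; the surrounding command is an assumption about what line 99 does):

# hedged sketch: derive rowcounts.rrn from rowcounts.out while
# dropping the warning lines that newer Spark versions prepend
grep -v "WARNING" rowcounts.out | grep -v "Setting" > rowcounts.rrn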

@ltgoter

ltgoter commented Aug 11, 2022

(quoting @ChenZuzhi's solution above)

For Spark 3.3.0, it needs more filters. I made it work by adding the following filters:

| grep -v "WARNING" | grep -v "Setting" | grep -v "Spark"
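
A variant that avoids chasing new warning prefixes release by release (an assumption on my part, not something proposed in this thread) would be to keep only the purely numeric lines:

# hedged sketch: keep only lines that consist entirely of digits
grep -E '^[0-9]+$' rowcounts.out > rowcounts.rrn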

@shuaiwuyue

Hi @HarryLiUS, have you solved this problem? I checked my rowcounts.rrn and it is also all 0s.

@theosib-amazon

(quoting @ltgoter's filter above)

This is what fixed it for me.
