Skip to content

RandomQueryGeneratorIndependentGrammars

David Bennett edited this page Jul 17, 2014 · 2 revisions

Table of Contents

Writing DB-Independent Grammars (The MySQL-centric approach)

When writing grammars that is intended to be run on several databases (Typically JavaDB/Apache Derby or PostgreSQL), several things must be taken into consideration.

  • MySQL is not very standards compliant
  • MySQL is very "forgiving", that is a lot of expressions that goes through MySQL will be rejected by most other databases.
The aim of this document is to enable writing of RQG-grammars for MYSQL which may be run on PostgreSQL and JavaDB. Thus, limitations of those two databases will also be consisdered a portability issue.

Variants in the grammar

It is possible to have variations in the grammar for different databases. MySQL will accept input in bracketed comments if the first character is ! (exclamation mark). This the following wil work in all databases:

INSERT /*! IGNORE */ INTO ......

In addition, it is possible to insert comments which will be stripped away if the database appears in at :-separated list at the beginning of the comment, like this:

SELECT .... ORDER BY f /*+JavaDB:Postgres: NULLS FIRST */

The recoginzed databases are MySQL, JavaDB and Postgres. The match is case-insenisitive.

Incompatible types

MySQL will accept a lot of expressions which are rejected by other databases. Common cases are:

  • Comparision of incomatible types (e.g. int_field > date_field)
  • Aggregate functions of illegal types (e.g. SUM(time_field))
  • Integers as boolean expressions (e.g. SELECT .... FROM ... JOIN ... ON integerexpression)

GROUP BY and aggregate functions

The SQL standard requires that if one column in a select is an aggregate function, then either, all columns must be aggregates or mentioned in a GROUP by clause.

GROUP BY alias names

You cannot do GROUP BY on an alias name like this:

SELECT a as b ..... GROUP BY b;

This is not standard, and not allowed in JavaDB/Derby

HAVING with alias name in expression

Likewise you cannot use an alias name in the HAVING expression. Not allowed in PostgreSQL nor JavaDB/Derby.

DISTINCT

SELECT DISTINCT requires that ORDER by refers to a selected column. The following is illegal in standard SQL:

SELECT DISTINCT i FROM... ORDER BY j;

LIMIT

Use LIMIT with care. The translator will translate it, but since execution engines differ, make sure that LIMIT is combined with an ORDER BY which ensures deterministic results. That is, either ORDER BY a column which has the UNIQUE contraint (or is a PRIMARY KEY which implies the UNIQUE constraint), or order on all columns (total oredring).

If the columns in the ORDER BY clause contains NULL VALUES (or in some otehr cases) the result may still be different. See below.

ORDER BY

ORDER BY may yield different results depending on

  • How NULL is treated. MySQL treats null values as less than non-null values, while in JavaDB/DERBY and PostgreSQL they are treated as greater. The standard specifies that this is implementation defined. There is, however a portable workaround. Use the COALESCE function the order by clause to give NULL values a defined order. E.g. like this
SELECT i FROM t ORDER BY COALESCE(i,9999999);
  • How other columns are ordered, depending on datatype. For most datatypes, this will probably be equal, but in the general case there may be differences. Some datatype may be ordered in 1) lexical order, 2) numeric order or 3) binary order. This might be different for some databases on some datatypes.

Joins

For the moment, Derby/JavaDB (10.5.3) does not support NATURAL JOIN and USING clause, so portable tests should not use these.

Quoting

Use ' for string literals. If possible, do not use quoted identifiers. They will be translated to unquoted when the queries are run on other databases than MySQL.

DUAL

Avoid the non-standard Oracleistic table DUAL (Typically used to get one value/row: SELECT 1 FROM DUAL;). It is not supported by PostgreSQL nor by Derby/JavaDB. Unfortunately, MySQL and PostgreSQL supports the nonstandard SELECT 1, but Derby/JavaDB does not, while Derby/JavaDB supports the standard VALUES 1, but MySQL and PostgreSQL does not. The solution is to create your own DUMMY table with one row for this purpose. DUMMY is now added to the schema created by gendata-old.pl

Non-standard table reference

The table refererence (table AS alias) is not allowed in the standard, nor in PostgreSQL. Instead of writing

SELECT * FROM (a AS b)

write

SELECT * FROM (SELECT * FROM a) AS b.

Row value expressions

JavaDB/Derby does not support row value expressions, so you can't write e.g. (1,2) IN (.....). Only single values on the left side of the IN-operator.

Note on USING clause

USING is not supported by JavaDB/Derby.

A USING clause is basically a shorthand notation for an ON clause with equality on all listed columns. But, USING will also affect the selecte list of columns if * is used. For INNER joins * will be translated to the named columns in the lis from one of the tables (which is irrelevant since they are equal). For OUTER joins * will be translated to the named columns in the left operand for LEFT JOIN and the named columns in the right operand for RIGHT join. If the selected columns is a specific list, It will have no effect on which columns selected.

Example:

mysql> select * from a;
+------+
| i    |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)

mysql> select * from b;
+------+
| i    |
+------+
|    3 |
|    2 |
+------+
2 rows in set (0.00 sec)

mysql> select * from a left outer join b on a.i = b.i; 
+------+------+
| i    | i    |
+------+------+
|    1 | NULL |
|    2 |    2 |
+------+------+
2 rows in set (0.00 sec)

mysql> select * from a left outer join b using(i);
+------+
| i    |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)


mysql> select a.i,b.i from a left outer join b using(i);
+------+------+
| i    | i    |
+------+------+
|    1 | NULL |
|    2 |    2 |
+------+------+
2 rows in set (0.00 sec)

Category:RandomQueryGenerator

Clone this wiki locally