This section contains the information and reasoning that went into the design of the DataJoint-to-SQL transpiler.
MySQL appears to differ from standard SQL by the sequence of evaluating the clauses of the SELECT statement.
Standard SQL: FROM > WHERE > GROUP BY > HAVING > SELECT MySQL: FROM > WHERE > SELECT > GROUP BY > HAVING
TODO: verify with latest SQL standards and postgres / CockroachDB implementations and whether this order can be configured
SELECT to an earlier phase allows the
GROUP BY and
HAVING clauses to use alias column names created by the
The current implementation targets the MySQL implementation where table column aliases can be used in
If postgres or CockroachDB cannot be coerced to work this way, restrictions of aggregations will have to be updated accordingly.
QueryExpression is the main object representing a distinct
It implements operators
proj — restriction, join, and projection.
heading describes all attributes.
proj creates a new heading.
restriction contains the
AndList of conditions. Operator
& creates a new restriction appending the new condition to the input’s restriction.
support represents the
FROM clause and contains a list of either
QueryExpression objects or table names in the case of base queries.
The joint operator
* adds new elements to the
At least one element must be present in
support. Multiple elements in
support indicate a join.
From the user’s perspective
QueryExpression objects are immutable: once created they cannot be modified. All operators derive new objects.
proj can create an alias attribute by renaming an existing attribute or calculating a new attribute.
Alias attributes are the primary reason why subqueries are sometimes required.
Projections, restrictions, and joins do not necessarily trigger new subqueries: the resulting
QueryExpression object simply merges the properties of its inputs into self:
The input object is treated as a subquery in the following cases:
A restriction is applied that uses alias attributes in the heading
A projection uses an alias attribute to create a new alias attribute.
A join is performed on an alias attribute.
An Aggregation is used a restriction.
An error arises if
If a restriction or a projection attempts to use attributes not in the current heading.
If attempting to join on attributes that are not join-compatible
If attempting to restrict by a non-join-compatible expression
A subquery is created by creating a new
QueryExpression object (or a subclass object) with its
support pointing to the input object.
The join is always natural (i.e. equijoin on the namesake attributes).
Before version 0.13: As of version
0.12.* and earlier, two query expressions were considered join-compatible if their namesake attributes were the primary key of at least one of the input expressions. This rule was easiest to implement but does not provide best semantics.
Version 0.13: In version
0.13.*, two query expressions are considered join-compatible if their namesake attributes are either in the primary key or in a foreign key in both input expressions.
Future (potentially version 0.14+): This compatibility requirement will be further restricted to require that the namesake attributes ultimately derive from the same primary key attribute by being passed down through foreign keys.
The same join compatibility rules apply when restricting one query expression with another.
Any restriction applied to the inputs of a join can be applied to its output. Therefore, those inputs that are not turned into queries donate their supports, restrictions, and projections to the join itself.
Table is a subclass of
QueryExpression implementing table manipulation methods such as
The restriction operator
& applied to a
Table preserves its class identity so that the result remains of type
proj converts the result into a
QueryExpression object. This may produce a base query that is not an instance of Table.
Aggregation is a subclass of
Its main input is the aggregating query expression and it takes an additional second input — the aggregated query expression.
The SQL equivalent of aggregation is
the NATURAL LEFT JOIN of the two inputs.
followed by a GROUP BY on the primary key arguments of the first input
followed by a projection.
The projection works the same as
.proj with respect to the first input.
With respect to the second input, the projection part of aggregation allows only calculated attributes that use aggregating functions (eg
COUNT) applied to the attributes of the aggregated (second) input and non-aggregating functions on the attribute of the aggregating (first) input.
Aggregation supports all the same operators as
restrictionturns into a
HAVINGclause instead of a
WHEREclause. This allows applying any valid restriction without making a subquery (at least for MySQL). Therefore, restricting an
Aggregationobject never results in a subquery.
In joins, aggregation always turns into a subquery.
All other rules for subqueries remain the same as for
Union is a subclass of
Union object results from the
+ operator on two
support property contains the list of expressions (at least two) to unify.
+ operator on unions simply merges their supports, making a bigger union.
Union operator performs an OUTER JOIN of its inputs provided that the inputs have the same primary key and no secondary attributes in common.
Union treats all its inputs as subqueries except for unrestricted Union objects.
dj.U is a special operand in query expressions that allows performing special operations. By itself, it can never form a query and is not a subclass of
QueryExpression. Other query expressions are modified through participation in operations with
dj.U object with a
Once a QueryExpression is used in a
fetch operation or becomes a subquery in another query, it can project out all unnecessary attributes from its own inputs, recursively.
This is implemented by the
This simplification produces much leaner queries resulting in improved query performance in version 0.13, especially on complex queries with blob data, compensating for MySQL’s deficiencies in query optimization.