Transformations and Expressions

Explanation of the diagram

The purpose of this part of the model is to track the derivation of data. It is similar in concept to lineage in data warehousing, i.e. how data are derived. This part of the model allows the calculations performed to be identified and documented (these will normally be automated, program calculations). It also defines structures that support a syntax-neutral expression “grammar” able to specify operations at a granular level, so that a program can “read” the metadata and compose the required expression in whatever computer language is appropriate. Finally, it allows the coherence rules among different data to be specified and documented by expressing them as calculations (for example, the coherence rule “a + b = c” can be written as “a + b - c = 0” and checked through the calculation “if((a + b - c) = 0, then …, else …)”).
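The coherence rule example above can be sketched in Python as follows. This is only an illustration: the function name check_coherence and the tolerance argument are assumptions introduced here (a tolerance is used because floating-point sums are rarely exactly zero) and are not part of the SDMX-IM.

```python
import math


def check_coherence(a: float, b: float, c: float, tolerance: float = 1e-9) -> bool:
    """Return True when the rule "a + b = c" holds, tested as "a + b - c = 0"."""
    residual = a + b - c
    # The tolerance is an assumption of this sketch, not something SDMX prescribes.
    return math.isclose(residual, 0.0, abs_tol=tolerance)


print(check_coherence(2.0, 3.0, 5.0))  # True
print(check_coherence(2.0, 3.0, 6.0))  # False
```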

There are three types of ItemScheme relevant to this model.

A TransformationScheme which comprises one or more Transformations.
An OperatorScheme which comprises one or more Operators.
An ExpressionNodeScheme which contains one or more ExpressionNodes.

The model presented here is a basic framework which can be used for expressions and transformations, but more work is needed to elaborate its integration into the model and its actual use within the model. This elaboration will be in a future release of the standard.

The expression concept in the SDMX-IM takes a functional view of expression trees, so that relatively few expression node types can represent a broad range of expressions. Every function or traditional mathematical operator that appears in an expression hierarchy is represented by the +operator role on the association to Operator, which in turn comprises input and output Parameters. For example, the arithmetic plus operation “a + b” can be thought of as the function “sum(a, b)”. The “sum” is the Operator, and “a” and “b” are its Parameters. A parameter is a generic possible input or output of an operator (e.g. base and exponent are the parameters of the power operator), while an argument is the specific value that a parameter takes in a specific calculation (e.g. in the Einstein equation “E = MC²”, the arguments of the “power” operation are “C” (the base) and “2” (the exponent)).
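The distinction between parameters and arguments can be sketched as below. The class layout, the "direction" attribute, the parameter name "result" and the helper bind_arguments are assumptions made for illustration only; the SDMX-IM names the Operator and Parameter classes but does not prescribe this code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Parameter:
    name: str
    direction: str  # "input" or "output" (hypothetical attribute)


@dataclass(frozen=True)
class Operator:
    id: str
    parameters: Tuple[Parameter, ...]

    def input_parameters(self) -> List[Parameter]:
        return [p for p in self.parameters if p.direction == "input"]


def bind_arguments(operator: Operator, arguments: List[Any]) -> Dict[str, Any]:
    """Pair each input Parameter with the argument it takes in one calculation."""
    inputs = operator.input_parameters()
    if len(arguments) != len(inputs):
        raise ValueError(f"{operator.id} expects {len(inputs)} arguments")
    return {p.name: arg for p, arg in zip(inputs, arguments)}


POWER = Operator(
    "power",
    (
        Parameter("base", "input"),
        Parameter("exponent", "input"),
        Parameter("result", "output"),
    ),
)

# In "E = MC²" the arguments of "power" are "C" (base) and 2 (exponent).
print(bind_arguments(POWER, ["C", 2]))  # {'base': 'C', 'exponent': 2}
```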

The actual semantics of a particular function or operation are left to specific tool implementations and are not captured by the SDMX-IM. The hierarchical nature of the SDMX-IM representation of expressions is achieved by the recursive nature of the OperatorNode association. This association allows the sub-hierarchies within an expression to be treated as actual arguments of their parent nodes. The model can be used equally to define data derivations and to define integrity checks (e.g. the sum of A and B must equal C).

Although the model defines the data structures that are used to contain a syntax-neutral expression, the model itself does not specify a syntax-neutral expression grammar. Alternatively, the function can be described in a text form, either as an unstructured explanation of the function or in a more formal language like BNF (Backus Naur Form).

The data structures work as follows. The basic mathematical functions that need to be performed (e.g. sum, multiply, divide, assign (=), <, >, etc.) are defined as Operators in an OperatorScheme. For each Operator, the input and output Parameters are defined in the Parameter class. The calculations are defined as Transformations in a TransformationScheme. A Transformation is a specific calculation and is specified by means of an expression, which is obtained by applying one or more Operators in the desired order (for example, in the textual form, using parentheses) and specifying the actual arguments for the Operators’ Parameters; the result of the whole expression is assigned (=) to the model item that is the result of the Transformation (that is, “E” in the Einstein equation).
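As an illustration of how a program might read a syntax-neutral expression and compose it in a target language, the sketch below renders the Einstein equation in two textual forms. The nested-tuple encoding, the operator identifiers ("assign", "multiply", "power") and both template tables are assumptions of this example; the SDMX-IM does not define them.

```python
# Each operator is a tuple: (operator_id, argument, argument, ...).
EINSTEIN = ("assign", "E", ("multiply", "M", ("power", "C", 2)))

# Hypothetical per-language templates; {0}, {1} are the rendered arguments in order.
PYTHON_TEMPLATES = {
    "assign": "{0} = {1}",
    "multiply": "({0} * {1})",
    "power": "({0} ** {1})",
}
SQL_LIKE_TEMPLATES = {
    "assign": "{1} AS {0}",
    "multiply": "({0} * {1})",
    "power": "POWER({0}, {1})",
}


def render(expr, templates):
    """Recursively compose the textual expression for one target syntax."""
    if isinstance(expr, tuple):
        operator_id, *arguments = expr
        rendered = [render(argument, templates) for argument in arguments]
        return templates[operator_id].format(*rendered)
    return str(expr)  # a reference name or a constant


print(render(EINSTEIN, PYTHON_TEMPLATES))    # E = (M * (C ** 2))
print(render(EINSTEIN, SQL_LIKE_TEMPLATES))  # (M * POWER(C, 2)) AS E
```

The parentheses emitted by the templates play the role described above: they make the order of operator application explicit in the textual form.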

A Transformation operates on existing IdentifiableArtefacts and its result is another IdentifiableArtefact. A calculated IdentifiableArtefact may in its turn be an operand of other Transformations. The expression of a Transformation (for example, for the Einstein equation, “E = M*(C**2)”) may be decomposed into a hierarchy of ExpressionNodes (in the example, “M”, “C”, “2”, *, **). The ExpressionNode can be a ReferenceNode, a ConstantNode or an OperatorNode. The ReferenceNode references an identifiable model artefact (in the example, “M” and “C”). The ConstantNode is by definition a constant value (in the example, “2”). The OperatorNode references an Operator in the OperatorScheme (in the example, * and **). The Transformation has an association to its component ExpressionNodes.
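The decomposition of “M*(C**2)” into ExpressionNodes can be sketched as follows. The class names follow the model, but the attribute names (artefact_id, value, operator_id, children) and the helpers label and flatten are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ReferenceNode:
    artefact_id: str  # stands in for a reference to an IdentifiableArtefact


@dataclass
class ConstantNode:
    value: float


@dataclass
class OperatorNode:
    operator_id: str  # stands in for a reference to an Operator
    children: List["ExpressionNode"] = field(default_factory=list)


ExpressionNode = Union[ReferenceNode, ConstantNode, OperatorNode]


def label(node: ExpressionNode) -> str:
    if isinstance(node, OperatorNode):
        return node.operator_id
    if isinstance(node, ReferenceNode):
        return node.artefact_id
    return str(node.value)


def flatten(node: ExpressionNode) -> List[ExpressionNode]:
    """List a node and all its descendants in pre-order."""
    nodes = [node]
    if isinstance(node, OperatorNode):
        for child in node.children:
            nodes.extend(flatten(child))
    return nodes


# Right-hand side of "E = M*(C**2)".
expression = OperatorNode(
    "*",
    [ReferenceNode("M"), OperatorNode("**", [ReferenceNode("C"), ConstantNode(2)])],
)

print([label(n) for n in flatten(expression)])  # ['*', 'M', '**', 'C', '2']
```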

The hierarchy of the ExpressionNodes conveys the order in which the operators are applied in the expression and is obtained by means of the /hierarchy association of the OperatorNode class, in which the child ExpressionNodes are the arguments of the parent OperatorNode. The child ExpressionNodes must correspond to the formal parameters of the Operator referenced by the parent OperatorNode in the correct sequence. The (child) ExpressionNode can be the result of another operation (that is, another OperatorNode), a constant (ConstantNode) or a reference to an IdentifiableArtefact (ReferenceNode).

All IdentifiableArtefacts in the SDMX-IM have a unique URN comprising the values of the individual objects that identify it. The structure of this URN is defined in the Registry Specification. An example would be the URN of a code, which comprises the agency:code-list-id.code-id; an actual example is "urn:sdmx:org.sdmx.infomodel.codelist.Code=TFFS:CL_AREA(1.0).1A".
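The rule that child nodes are the arguments of their parent, in the sequence of the Operator's formal parameters, can be sketched with a small recursive evaluator. Since the SDMX-IM leaves operator semantics to tool implementations, the OPERATORS table, its lambda implementations and the tuple encoding of nodes are all assumptions of this example.

```python
# operator id -> (formal parameter names in order, hypothetical implementation)
OPERATORS = {
    "multiply": (("left", "right"), lambda left, right: left * right),
    "power": (("base", "exponent"), lambda base, exponent: base ** exponent),
}


def evaluate(node, artefact_values):
    """Evaluate a node; children are matched to formal parameters by position."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "ref":
        return artefact_values[node[1]]
    if kind == "op":
        _, operator_id, children = node
        formal_parameters, implementation = OPERATORS[operator_id]
        if len(children) != len(formal_parameters):
            raise ValueError(
                f"{operator_id} expects {len(formal_parameters)} arguments"
            )
        arguments = [evaluate(child, artefact_values) for child in children]
        return implementation(*arguments)
    raise ValueError(f"unknown node kind: {kind}")


# M * (C ** 2): the inner OperatorNode is an argument of the outer one.
tree = ("op", "multiply", [("ref", "M"), ("op", "power", [("ref", "C"), ("const", 2)])])
print(evaluate(tree, {"M": 2.0, "C": 3.0}))  # 18.0

# Swapping the children changes which formal parameter each one fills.
swapped = ("op", "power", [("const", 2), ("ref", "C")])
print(evaluate(swapped, {"C": 3.0}))  # 8.0
```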
