| Contact person* | Mauro Scanu |
|---|---|
| Job title | senior researcher |
| Email | scanu@istat.it |
| Telephone | +39 06 4673 3357 |
The information model adopted in Istat for the development of the Sistema Unitario dei Metadati for structural metadata (SUM-MS) relies strongly on the GSIM concepts in the Concepts and Structures groups. Some concepts of the Production group have also been used. The model adopted in Istat can be considered a customized, GSIM-compliant model.
The primary objective of the Istat information model under SUM-MS is to define the data inside Data Sets organized in Data Structures. SUM-MS essentially adopts two data structures: hypercubes of macro data and data sets of micro data. The general plan is given in the next Figure. The next sections give a general overview of the model.

SUM-MS has been built in line with sections 47-50 of the GSIM Specification, version 1.1 (December 2013). Each data item is the result of a Process Step, obtained by applying a Process method to the necessary Inputs. Hence, each data item is marked by the following elements:
1) The Statistical Program under which it has been produced;
2) The Process Step (phase) under which the data has been produced;
3) The Process method that specifies the method that produced the data (consisting of a set of Rules);
4) The Inputs that are necessary in order to produce the data through the application of the Process method.
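The four elements above can be pictured as fields of a small record. The following is only a minimal sketch in Python, not the actual SUM-MS schema: the class names, the field names, the `code` identifier, the `upstream_codes` helper and all example values are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProcessMethod:
    """A Process method, described as a set of Rules."""
    name: str
    rules: Tuple[str, ...]


@dataclass(frozen=True)
class DataDescriptor:
    """Metadata marking one data item (all field names are hypothetical)."""
    code: str                      # hypothetical identifier of the data item
    statistical_program: str       # 1) program under which it was produced
    process_step: str              # 2) phase that produced it
    process_method: ProcessMethod  # 3) method applied, with its rules
    inputs: Tuple[str, ...] = ()   # 4) codes of the inputs the method needs


def upstream_codes(catalog: Dict[str, DataDescriptor], code: str) -> List[str]:
    """Return the codes of all direct and indirect inputs, nearest first."""
    seen: List[str] = []
    pending = list(catalog[code].inputs)
    while pending:
        current = pending.pop(0)
        if current in seen:
            continue
        seen.append(current)
        if current in catalog:
            pending.extend(catalog[current].inputs)
    return seen


# Invented example values, for illustration only.
editing = ProcessMethod("editing", ("drop records with missing identifier",))
tabulation = ProcessMethod("tabulation", ("sum employees by region",))
catalog = {
    "EDITED": DataDescriptor("EDITED", "Survey X", "process", editing, ("RAW",)),
    "TABLE": DataDescriptor("TABLE", "Survey X", "disseminate", tabulation, ("EDITED",)),
}

print(upstream_codes(catalog, "TABLE"))  # ['EDITED', 'RAW']
```

Because every descriptor names its Inputs, the chain of data that led to a given output can be followed backwards, which is one practical use of recording the four elements together.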
Micro and macro data along the statistical process are then described by a set of concepts available in the Concepts and Structures GSIM groups. For this reason the model is very similar to the one depicted in GSIM, although some modifications have also been made, as described in the next sections.
GSIM has been adopted as the reference for the nomenclature and definitions of concepts related to data definitions. Of the concepts available in GSIM, some have not been adopted and others will be included in the next SUM-MS releases; in addition, SUM-MS introduces some objects that are new with respect to GSIM.
The concepts that have not been adopted are those that refer to actual instances (e.g. Instance Variable), given that SUM-MS contains only metadata, not data. For the moment the SUM-MS system contains metadata without documenting many details. For example, SUM-MS does not document the Unit Type for a Population (SUM-MS declares only “analysis populations”, i.e. the reference population for the data; other concepts are left to the referential part of the system). Statistical variables are organized under the Represented Variable concept, and their conceptual domain is described by means of the different kinds of variables (numerical or categorical). There is nevertheless room for including other concepts in SUM-MS.
Each (micro or macro) data item produced in any phase of a Statistical Program is described as an output (the product of a Process Step) and is assigned a code and a description. Its main characteristics are:
1) The Statistical Program under which it has been produced
2) The Process Step (phase) under which the data has been produced
3) The Process method that specifies the method that produced the data (consisting of a set of Rules)
4) The Inputs that are necessary in order to produce the data through the application of the Process method
Given this scenario, the characteristics of micro and macro data sets are as follows.
Micro data set
The data set consists of as many Data Structures as there are reference populations in the data set. The data set details items 1) to 4) described above, plus the following characteristics.
Among numerical variables, some may form a tabular sub-data set at a different unit level. Being able to read that sub-data set in all its different meanings is important for its reuse. For example, when each enterprise is asked for its number of employees by gender and age class, these data can be seen both as numerical variables observed on each enterprise (first kind of reference population) and as aggregate tabular data on the employees (second kind of reference population for the same data).
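The two readings of the same sub-data set can be illustrated with a short Python sketch. The record layout, the function names and the figures are invented for this example and do not come from SUM-MS.

```python
from typing import Dict, List

# Invented record for one enterprise: employees by (gender, age class).
enterprise = {
    "enterprise_id": "E1",
    "employees": {("F", "15-34"): 3, ("M", "15-34"): 2, ("F", "35+"): 1},
}


def as_enterprise_variables(record: dict) -> Dict[str, int]:
    """Reading 1: each cell is a numerical variable observed on the enterprise."""
    return {
        f"employees_{gender}_{age}": count
        for (gender, age), count in record["employees"].items()
    }


def as_employee_table(record: dict) -> List[dict]:
    """Reading 2: the same cells form an aggregate table about employees."""
    return [
        {"gender": gender, "age_class": age, "employees": count}
        for (gender, age), count in record["employees"].items()
    ]


print(as_enterprise_variables(enterprise))
print(as_employee_table(enterprise))
```

Both functions read the same stored cells; only the reference population changes (the enterprise in the first reading, the employees in the second).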
As far as categorical variables are concerned, each one is connected to a Classification and a classification variant (named data structure variant, see the section on classifications) containing the detail of the actual set of codes used in practice in the data set.
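The link between a categorical variable, its Classification and the variant actually used in a data set might be sketched as follows. This is a hedged illustration: the class names, the `accepts` method, the example classification, and in particular the assumption that a variant only contains codes taken from its Classification are not stated in the source.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class Classification:
    """A full classification: code -> label (example content is invented)."""
    name: str
    codes: Dict[str, str]


@dataclass
class ClassificationVariant:
    """The set of codes actually used in one data set."""
    classification: Classification
    used_codes: FrozenSet[str]

    def __post_init__(self) -> None:
        # Assumption of this sketch: a variant is a subset of its classification.
        unknown = self.used_codes - set(self.classification.codes)
        if unknown:
            raise ValueError(f"codes not in classification: {sorted(unknown)}")

    def accepts(self, code: str) -> bool:
        """True if the code may appear in the data set described by the variant."""
        return code in self.used_codes


size_class = Classification(
    "Enterprise size class",
    {"1": "0-9 employees", "2": "10-49 employees", "3": "50+ employees"},
)
variant = ClassificationVariant(size_class, frozenset({"1", "2"}))

print(variant.accepts("2"))  # True
print(variant.accepts("3"))  # False
```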
Macro data set
Because a hypercube of macro data contains different kinds of data, the Statistical Program, Process method and Inputs are described at different levels for micro and macro data. For micro data these elements can be referred to at data set level, whereas the presence of data of a different nature in macro data sets led us to detail them at a more disaggregated level. Our model includes an additional concept, the Data Content, that details the Statistical Program, Process method and Inputs for each macro data item (see the Data Content definition in Section 5).
The whole macro data hypercube is described by reporting the phase in which it has been produced (e.g. data dissemination) and the following components:
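The difference in level can be sketched in Python as below. The sketch assumes, for illustration only, that a hypercube holds a list of Data Content records; the class and field names and the example values are hypothetical and are not the Section 5 definition.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class DataContent:
    """Provenance detailed for one kind of macro data inside a hypercube."""
    name: str
    statistical_program: str
    process_method: str
    inputs: Tuple[str, ...]


@dataclass
class MacroHypercube:
    """The phase is reported for the whole cube; the rest per Data Content."""
    code: str
    process_step: str
    data_contents: List[DataContent]

    def statistical_programs(self) -> Set[str]:
        """All Statistical Programs contributing data to this hypercube."""
        return {content.statistical_program for content in self.data_contents}


# Invented example: one cube mixing data from two programs.
cube = MacroHypercube(
    code="CUBE_1",
    process_step="data dissemination",
    data_contents=[
        DataContent("Number of enterprises", "Program A", "count", ("MICRO_A",)),
        DataContent("Number of employees", "Program B", "sum", ("MICRO_B",)),
    ],
)

print(sorted(cube.statistical_programs()))  # ['Program A', 'Program B']
```

A micro data set, by contrast, would carry a single Statistical Program, Process method and set of Inputs at data set level.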