- Created by Klas Blomqvist, last modified by Chris Jones on 22 Jun, 2015
| Contact person* | Klas Blomqvist |
|---|---|
| Job title | Senior Advisor Metadata |
| Email | klas.blomqvist@scb.se |
| Telephone | +46 8 506 94352 |
Statistical Information Model
System overview
The model is structured into the same four areas used in GSIM to structure the information objects. The areas are business, exchange, concepts and structures.
Overview diagram:

One purpose of the model is to support the coordination and standardization of processes. The picture shows how the production of statistics in the business process model interacts with the metadata model in each process, at any level of the process model. Each process is related to its input and output as described in the model of the central Metadata Repository: it uses input and creates output as described there. The design of the actual process step is also controlled and shaped in accordance with the model.
Process flow:

Concepts
The green part of the model is consistent with GSIM Concepts. It describes the basic concepts in statistics linked to the statistical production process and data structure in a systematic way. These objects are the conceptual contents used as input and output of process steps. The objects include descriptions and definitions of what the statistics measure in the practical implementation. The objects used are linked to the data and can be described (documented) in reference metadata.
Unit type is a group of objects of interest that have common characteristics. They can be defined in a Population, which is a Concept that is a set of units of a particular unit type defined by common characteristics.
In the model, there are four types of variables: Conceptual variable, Represented variable, Context variable[1] and Instance variable.
- A Conceptual variable measures a characteristic; it is a Concept.
- A Represented variable is a Conceptual variable that is linked to a Unit type and a Value domain.
- A Context variable is a Represented variable linked to a specific data source and a Population.
- An Instance variable is a Context variable that has actual values for a certain unit.
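The "is a" chain between the four variable types can be sketched as a small class hierarchy. This is only an illustration: the attribute names and the example values (tax register, income, unit identifier) are assumptions and are not taken from the central Metadata Repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualVariable:
    """A Concept that measures a characteristic."""
    name: str
    definition: str


@dataclass(frozen=True)
class RepresentedVariable(ConceptualVariable):
    """A Conceptual variable linked to a Unit type and a Value domain."""
    unit_type: str
    value_domain: str


@dataclass(frozen=True)
class ContextVariable(RepresentedVariable):
    """A Represented variable linked to a data source and a Population."""
    data_source: str
    population: str


@dataclass(frozen=True)
class InstanceVariable(ContextVariable):
    """A Context variable carrying an actual value for one unit."""
    unit_id: str
    value: object


# Hypothetical example values, for illustration only.
income = InstanceVariable(
    name="Income",
    definition="Total earned income during the reference year",
    unit_type="Person",
    value_domain="Amount in SEK",
    data_source="Tax register",
    population="Residents of Sweden, 2014",
    unit_id="P-001",
    value=312000,
)
```

Because each level inherits from the previous one, an Instance variable can be used wherever a Context, Represented or Conceptual variable is expected.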
A Value Domain can be a Described value domain or an Enumerated value domain. A Described value domain can be continuous or a simple describing text.
Categories are Concepts that can be used in three ways in the central Metadata Repository. They are described as types of Nodes: Category unit, Code unit or Classification unit. Categories are grouped in Node sets. There are three such types of groups: Category set, Code list and Classification. A Category set is a set of Category units that contain the meaning of a Category without any associated representation, e.g. woman and man. In a Code list, each Code unit combines a Category meaning with a code, e.g. 1‒woman and 2‒man. A Classification is a Code list that meets the criterion that the Classification units are mutually exclusive and exhaustive for each level.
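The difference between the three kinds of Node sets can be illustrated with a minimal sketch. The woman/man example comes from the text above; the age classification, its membership rules and the function name are hypothetical. The check only demonstrates the "mutually exclusive and exhaustive" criterion for one level and for the values it is given.

```python
from typing import Callable, Dict, Iterable

# Category set: meanings only, without any representation.
category_set = {"woman", "man"}

# Code list: each Code unit pairs a code with a Category meaning.
code_list: Dict[str, str] = {"1": "woman", "2": "man"}

# Hypothetical one-level classification: code -> membership rule.
age_groups: Dict[str, Callable[[int], bool]] = {
    "1": lambda age: 0 <= age <= 17,
    "2": lambda age: 18 <= age <= 64,
    "3": lambda age: age >= 65,
}


def is_exclusive_and_exhaustive(
    units: Dict[str, Callable[[int], bool]], values: Iterable[int]
) -> bool:
    """Return True if every value belongs to exactly one unit."""
    return all(
        sum(1 for rule in units.values() if rule(value)) == 1
        for value in values
    )


ages = range(0, 120)

# Same code list, but unit "2" now overlaps with unit "3" at age 65,
# so it no longer qualifies as a Classification.
overlapping = {**age_groups, "2": lambda age: 18 <= age <= 65}

print(is_exclusive_and_exhaustive(age_groups, ages))
print(is_exclusive_and_exhaustive(overlapping, ages))
```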
Business
In the blue part of the model the design of a Statistical Program is described. Here it is determined which Process steps are to be included in the Statistical Program. For each Process step it is determined which Methods and Rules are going to be used, and also which information the Process step requires as input in order to be executed. The results of the Process step are also identified, as well as the Process metrics that are created during the Process step. It is also important to describe the order in which the different Process steps must be performed. Based on the decisions taken in the design, it is examined which IT services are best suited to perform the specified Process steps.
Once the design is completed and implemented in automated or manual procedures, the implementation will result in refined data and process metrics for each process step.
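As a rough sketch of such a design, each Process step below records its inputs, outputs, Methods and Rules, and an execution order is derived from which step produces the input of which other step. Deriving the order from inputs and outputs is an assumption made for this example (the design may equally state the order explicitly), and the step names and contents are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import List


@dataclass
class ProcessStep:
    name: str
    inputs: List[str]
    outputs: List[str]
    methods: List[str] = field(default_factory=list)
    rules: List[str] = field(default_factory=list)


def execution_order(steps: List[ProcessStep]) -> List[str]:
    """Order steps so that every input is produced before it is used."""
    producer = {out: step.name for step in steps for out in step.outputs}
    # Map each step to the set of steps it depends on.
    graph = {
        step.name: {producer[i] for i in step.inputs if i in producer}
        for step in steps
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical design, deliberately listed out of order.
design = [
    ProcessStep("Estimate", ["edited data"], ["estimates"],
                methods=["weighting"]),
    ProcessStep("Collect", [], ["raw data"]),
    ProcessStep("Edit", ["raw data"], ["edited data"],
                rules=["income must not be negative"]),
]

print(execution_order(design))
```

A cyclic design (two steps that need each other's output) makes `TopologicalSorter` raise `graphlib.CycleError`, which is a useful design-time check.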
Exchange
The red part of the model describes how, when and why information is collected in each statistical program and supplied to external customers and users.
Structure
In the yellow part of the model (structures), the variables described in the concepts part of the model are connected to logical data units that are structured as units or dimensions and stored physically in database tables or files. A variable’s role in a certain data structure is described here: to identify, measure or have other roles.
[1] The Context variable is a Swedish addition to GSIM that expresses a Represented variable with a specific role, reference time and source (belonging to a specific population).
Process description that this GSIM implementation supports
At Statistics Sweden, we have made a principle design for the statistical production process to clarify the input and output that are relevant in the various process steps. The design principle is used when the Swedish version of GSBPM is described as a process flow from Establish needs to Disseminate. The process flow is a part of Statistics Sweden’s process architecture. In order to obtain a uniform description, it was important to describe the input and output in Design and Plan, to show how they are used in Build and Test, and then to make this the basis for running and executing the production itself. Input and output are tied to relevant GSIM objects.
Business case
Relation to other Models
Statistics Sweden uses GSIM together with the Swedish version of GSBPM (see the GSBPM case study).
For business processes that are not part of the statistical business process, Statistics Sweden has developed a process chart in line with GAMSO.
Statistics Sweden’s process chart:

Design
Principles for the central Metadata Repository:
- Metadata are stored when they are created and then re-used wherever they are relevant.
- The metadata shall be used actively, in the sense that they support metadata-driven production: production is supported and controlled through metadata.
- Metadata that are established as common, such as classifications and standardised variables, should have the common metadata layer as their source.
- Common concepts should be used in a uniform manner and based on common definitions, which are documented in a thesaurus.
The central Metadata Repository must meet the following requirements:
- The central Metadata Repository shall support the handling of business processes.
- The metadata shall support version and generation management of data.
- Principles shall be provided for the update, ownership and permissions / security.
- The metadata shall be presented in several languages if required, at least in Swedish and English.
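Two of these requirements, version management and presentation in at least Swedish and English, can be illustrated with a minimal in-memory sketch. The class, its method names and the language codes are assumptions made for this example; it does not describe the actual design of the central Metadata Repository, and it leaves out ownership and permissions.

```python
from typing import Dict, List, Optional

# Assumption: ISO 639-1 codes for the two mandatory languages.
REQUIRED_LANGUAGES = {"sv", "en"}


class MetadataRepository:
    """Stores every version of an item's multilingual labels."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Dict[str, str]]] = {}

    def store(self, item_id: str, labels: Dict[str, str]) -> int:
        """Add a new version and return its 1-based version number."""
        missing = REQUIRED_LANGUAGES - set(labels)
        if missing:
            raise ValueError(f"missing labels for: {sorted(missing)}")
        history = self._versions.setdefault(item_id, [])
        history.append(dict(labels))  # copy, so later edits do not leak in
        return len(history)

    def label(self, item_id: str, lang: str,
              version: Optional[int] = None) -> str:
        """Return the label in one language, latest version by default."""
        history = self._versions[item_id]
        entry = history[-1] if version is None else history[version - 1]
        return entry[lang]


repo = MetadataRepository()
v1 = repo.store("sex", {"sv": "Kön", "en": "Sex"})
v2 = repo.store("sex", {"sv": "Kön", "en": "Sex (legal)"})
```

Older versions stay retrievable, so data produced under version 1 can still be interpreted with the metadata that applied at the time.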
The main way we use GSIM at Statistics Sweden is to adapt our existing information architecture to GSIM. This ensures that the projects which use our information model work in accordance with GSIM, and the business does not need to have specific knowledge of GSIM.
Statistics Sweden’s information architecture consists of two levels: object groups and detailed information models. The contents evolve within development projects, which means that not all object groups have detailed information models.
Object groups:


Licensing
New Information Objects and/or new specialisations of GSIM Information Objects
Groups of objects
Statistics Sweden’s information architecture has several object groups that are used in other processes, as well as groups shared between several processes, such as Personnel and Organisation. The circled object groups are used specifically for statistical production. The object group Structure can also be used by other processes.
Highlighted object groups

Context variable
A Context variable is a Represented variable linked to a specific data source and a Population. This new object was created in order to make the link between concepts and structure.
Lessons learned
GSIM as such cannot be used directly off the shelf. It is a conceptual model, and it requires a lot of work to adapt it to the NSI level. It is difficult to connect the different parts of GSIM, and as soon as you start, the level of detail required is very high. This does not mean that it is not useful. Quite the opposite: it is extremely helpful in a lot of cases, since it provides a solution to existing problems and can therefore be used as a reference. GSIM has proven very valuable as a foundation in discussions regarding a central metadata repository and has provided a common vocabulary in those discussions. It has been a source of inspiration in the efforts of constructing a production system for short term statistics (KLON, mentioned above). During this work a number of lessons have been learned.
Experiences in short:
- The importance of reusing metadata
For example, allowing the user to use the names and codes of variables and value domains from "Concepts" in "Business" and "Exchange" to configure the design and also what will be delivered to internal and external users.
- Harmonized metadata
It is not a prerequisite, but it significantly facilitates the work. Variables and values that by definition are conceptually the same thing are given the same code and name.
- Periodicity
Often there are data with different periodicity available. In order to have a general production system, there needs to be a property that describes the periodicity of transformable input and output. The periodicity may differ between input and output. E.g. monthly data can be transformed into both quarterly and yearly data, so that the output of a service can have all three types of periodicity.
- Classify variables
In order to aid users of the production system, but also in the building of common services that can be used by more than one statistical program, it is useful that variables are classified as different types. A variable can be of a different type depending on the context of its use.
- Traceability and reproducibility
Not only for versioning of data, but also for metadata. This is closely tied to versioning of services.
- Codes (status and "other" codes)
By using process step as an attribute, it is possible to monitor where in the business process a value is used, given that the process flow is defined. This means that many frequently occurring status codes are not required. What is left are status codes for specific values; these are considered generic status codes.
- Exchanging data
To a large extent, data is exchanged between different statistical programs within an NSI. This is often done by hard-coded database-to-database solutions or, even worse, by e-mail. One way to avoid that is to store data in the same structure. Harmonized metadata is necessary for this.
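The periodicity lesson above can be sketched as follows: a series carries its periodicity as an explicit property, and a generic service transforms monthly input into quarterly or yearly output. The names `Series` and `aggregate` are hypothetical, and summing is an assumption that suits flow variables such as turnover; stock variables would need a different aggregation rule.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple

# Number of months in one period of each supported output periodicity.
MONTHS_PER_PERIOD = {"monthly": 1, "quarterly": 3, "yearly": 12}


@dataclass
class Series:
    periodicity: str                       # "monthly", "quarterly", "yearly"
    values: Dict[Tuple[int, int], float]   # (year, period number) -> value


def aggregate(series: Series, target: str) -> Series:
    """Sum a monthly series into the target periodicity."""
    if series.periodicity != "monthly":
        raise ValueError("this sketch only transforms monthly input")
    size = MONTHS_PER_PERIOD[target]
    totals: Dict[Tuple[int, int], float] = defaultdict(float)
    for (year, month), value in series.values.items():
        period = (month - 1) // size + 1   # e.g. months 4-6 -> quarter 2
        totals[(year, period)] += value
    return Series(target, dict(totals))


# Hypothetical input: a constant monthly value of 10.0 for 2015.
monthly = Series("monthly", {(2015, m): 10.0 for m in range(1, 13)})
quarterly = aggregate(monthly, "quarterly")
yearly = aggregate(monthly, "yearly")
```

Because the periodicity travels with the data, a downstream service can check it instead of relying on naming conventions.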
Suggestions for changes to GSIM