- Created by Klas Blomqvist, last modified on 10 Apr 2015
| Contact person* | Klas Blomqvist |
|---|---|
| Job title | Senior Advisor Metadata |
| Email | klas.blomqvist@scb.se |
| Telephone | +46 8 506 94352 |
Statistical Information Model
Statistics Sweden has created a national version of an overview model of GSIM. Some areas are not covered.
Detailed diagram:

Background
Statistics Sweden has a vision in line with the HLG vision and the Eurostat 2020 vision. It focuses on creating a production environment based on a high level of standardization. The Swedish national version of GSBPM is the basis both for producing statistics and for the organisation, providing a foundation for well-defined, standardized processes and activities that use standardised methods and tools.
Following the vision, information is stored in a coordinated and effective manner in a well-structured data warehouse throughout the production. The links between activities and data are supported by a common platform, where the standardized tools and services needed to carry out an activity are available via a central metadata repository, which contains the information needed to describe the data warehouse, including the tools, and to control and run the processes. These processes are continually evaluated by the process metrics created and utilized in order to improve them. Coordination at the statistical object level is supported by a registry system, in which base registers[1] interact and work well together with other statistical activities.
In the vision, all parts of the system are well integrated and cooperate to drive the business processes (production) forward in an effective, well-documented and standardized manner in which responsibilities for different parts are made clear.

Production systems do not maintain their own local metadata repositories; instead, all metadata associated with the statistical production process are stored in a central Metadata Repository. Common tools and methods for the relevant parts of the statistical production process are the means to achieve reusability and coherence.
A central data warehouse requires standardized rules governing the physical storage, such as formats and lengths. There must also be rules governing how to handle updates, ownership and permissions / security.
Other key factors to achieve the objectives include relevant expertise and management support.
Sweden has a distributed statistical system with 27 authorities responsible for producing official statistics where Statistics Sweden is the national coordinator. In the future the vision should cover the entire Swedish statistical system, using a common shared model for a metadata repository.
The vision is in principle shared by several other statistical organizations in the world, and is in line with the HLG vision[1]. It follows the HLG models GSIM (Generic Statistical Information Model), which is a common information model for statistical production, and GSBPM (Generic Statistical Business Process Model). Both are key components in modelling the central Metadata Repository. The model is in principle a Swedish adaptation of GSIM, on which it is closely based.
The model is structured into the same four groups used by GSIM to structure the information objects. The areas are business, exchange, concepts and structures.
The present situation
An analysis of the current metadata situation at Statistics Sweden shows that the distance to achieving the vision is considerable. Currently, it is not possible to monitor, assess or control the production via the business processes. The metadata environment is fragmented: metadata for different purposes are stored separately, often closely coupled to the tools that support a particular process. Many tailor-made production systems include the required metadata, but these are stored separately in the respective production systems with no links to central systems. There are metadata in common production tools such as Triton[1] and the Statistical Databases (PC-Axis based), but they lack links to the metadata used in preceding or subsequent processes. The MetaPlus system was originally developed primarily as Statistics Sweden's tool for documenting final observation registers (mainly microdata). It is nonetheless the metadata system at Statistics Sweden that is closest to fulfilling the principles of a central metadata repository.
MetaPlus contains central common metadata such as classifications, variable definitions and their value domains, linked to objects and populations. The documentation in MetaPlus is fairly complete, and up to date, but in many cases it has extensive quality deficiencies.
Triton's metadata system was developed specifically to support the information handled by Triton[1]. There are no links to metadata that relate to subsequent processes. Other collection tools, such as the centralized scanning system and SIV[2], use their own metadata systems covering their particular purposes. They lack connections to other metadata systems.
The metadata system used in the statistical databases is the oldest metadata system still in use. It uses PC-Axis, is tied to the structure of the statistical databases, and has no links to other metadata.

The scope of a central metadata repository
The vision of a central metadata repository uses the term metadata in a broad sense. The repository includes information about:
- connections to the statistical business process in its various stages
- design of the statistical programs and their detailed process steps
- process rules
- statistical variables and other variables needed to support statistics production
- populations and unit types associated with them
- value domains and classifications
- statistical program cycles
- questions
- services available
- reference metadata (documentation) and quality descriptions
- thesaurus
- connection to physical data storage
- statistical products
The central metadata repository supports process metrics, and stores the related process metadata.
The central Metadata Repository is responsible for the quality, structural rules, and consistency of the metadata. It ensures that redundancy checks and versioning are carried out. Read-only copies of metadata may occur locally in production systems, e.g. for performance reasons.
The central Metadata Repository does not include:
- data and object instances
- process metrics
- services (executable IT services and manual procedures such as checklists, procedure descriptions, tutorials, etc.)
- other master data that is common to Statistics Sweden such as personnel and organization
The model of the central metadata repository and the Data warehouse and register system vision
The image below shows a schematic view of the main parts of the central Metadata Repository and how it is linked to data and to the statistical business process model.

The central Metadata Repository includes reusable components, which can be connected to different parts of the statistical production process. They should be described in such a way that they support and control the production. This means that they need to be systematically structured in order to be machine-readable (automated).
The metadata repository does not need to be physically located in one single place or have one single common interface, but it is essential that the various parts fit together and that users do not have to re-enter information, e.g. value domains, several times. Reuse is a central feature.
Currently, metadata that actively support the processes Collect (in Triton) and Disseminate (in the statistical databases) are available. For Process and Analyse there is currently no cohesive active metadata repository to support the production. The functionality available in MetaPlus makes it possible to use the metadata in an active way, but since MetaPlus is not comprehensive, that part of the metadata layer needs to be supplemented.
Process steps
It is essential that the metadata model includes a link to the statistical business process. This allows the model to provide a complete basis for a process based organisation that defines a set of common, organisation-wide process steps. Any statistical program will be able to select its relevant process steps with support from the metadata repository.
Process steps exist at various levels. The highest level is the statistical business process model as a whole. It comprises Process steps 1-8, and then an arbitrary number of levels, until the activity level is reached.
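The nesting of process steps described above can be pictured as a simple tree. The following is a minimal sketch under stated assumptions, not the Statistics Sweden implementation: the class `ProcessStep`, its fields, the numeric codes and all step names below the top level are hypothetical and chosen only for illustration. Leaves of the tree are treated as the activity level.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class ProcessStep:
    """One node in the process-step hierarchy (hypothetical structure)."""
    code: str
    name: str
    children: List["ProcessStep"] = field(default_factory=list)

    def add(self, child: "ProcessStep") -> "ProcessStep":
        """Attach a sub-step and return it so calls can be chained."""
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "ProcessStep"]]:
        """Yield (depth, step) pairs, parents before children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)

    def activities(self) -> List["ProcessStep"]:
        """Steps without sub-steps are treated as the activity level."""
        return [step for _, step in self.walk() if not step.children]


# Illustrative content only: codes and sub-step names are made up.
model = ProcessStep("0", "Statistical business process")
collect = model.add(ProcessStep("4", "Collect"))
run = collect.add(ProcessStep("4.1", "Run collection"))
run.add(ProcessStep("4.1.1", "Send reminders"))
run.add(ProcessStep("4.1.2", "Register incoming responses"))
model.add(ProcessStep("7", "Disseminate"))

for depth, step in model.walk():
    print("  " * depth + f"{step.code} {step.name}")
```

Because the depth is not fixed, a statistical program can refine a step as far as it needs, while shared status codes can still be attached at any level of the same tree.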
The generic business process steps are the basis for maintaining a common set of status codes throughout the statistical production process. A general list of business process steps (down to activities) does not exist today, and needs to be created to be made available in a central metadata repository.
No process metrics are stored in the central metadata repository, only the process metadata. Process metadata are necessary to evaluate a specific business process activity, for example, in statistical products. An example of this could be to measure the influx for Statistics Sweden's voluntary individual surveys in 2013 which used the SIV collection tool. Another example could be the relative size of the over-coverage (in the sample) for all enterprise surveys which cover section G according to the Swedish activity classification (SNI 2007).
Attributes - Status codes
Various objects in a central metadata repository require status codes to efficiently support the construction and use of systems and system components for all the parts of the statistical business process. What status codes are required, and for which objects, to provide such support needs to be investigated and discussed further.
Business process
Business process is important in the model since the other objects are directly or indirectly connected via business process. It is the glue that ties the different parts of the metadata model to the model of the business process.
At present, the statistical production process comprises many different Business processes. These are not coordinated and do not cover the whole statistical business process production cycle. Statistical program, Statistical program cycle and Collection cycle are implemented in Triton, covering the Collect and parts of the Analyse processes. Disseminate has a publishing cycle and a statistical product, which in turn has a production cycle. The relationship between these cycles is ambiguous. The MetaPlus metadata system documents final observation registers (usually microdata) using a hierarchical structure (Register, Register variant and Register Version). National accounts use the terms Computational cycle and Version to describe their Business Processes. For parts of Process and Analyse there are no Business Processes implemented in any standard tool at Statistics Sweden.
When data have been collected, they pass through a number of process steps, such as editing and manual examination. A correction of a value means that a new data generation is created. This generation of the data has a different source than the original value. Every generation contains a reference to the operator or service that made the correction, a status code that indicates why and how the change was made, and a time stamp that records when the change was made.
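The idea of data generations can be sketched as an append-only history for a single value. This is only an illustration under stated assumptions: the names `Generation` and `DataPointHistory`, the field names, and the status codes `COLLECTED` and `EDIT_UNIT_ERROR` are hypothetical, since the actual set of status codes is still to be defined.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Generation:
    """One generation of a value; immutable once written."""
    value: Optional[float]
    source: str        # where this value came from
    changed_by: str    # operator or service that produced it
    status_code: str   # why and how the change was made (hypothetical codes)
    changed_at: datetime


class DataPointHistory:
    """Keeps every generation of one value; nothing is overwritten."""

    def __init__(self, value, source, changed_by, status_code="COLLECTED"):
        self._generations: List[Generation] = []
        self._append(value, source, changed_by, status_code)

    def _append(self, value, source, changed_by, status_code):
        stamp = datetime.now(timezone.utc)
        self._generations.append(
            Generation(value, source, changed_by, status_code, stamp))

    def correct(self, value, source, changed_by, status_code):
        """A correction adds a new generation instead of editing the old one."""
        self._append(value, source, changed_by, status_code)

    @property
    def current(self) -> Generation:
        return self._generations[-1]

    @property
    def generations(self) -> List[Generation]:
        return list(self._generations)


history = DataPointHistory(1200.0, source="web form", changed_by="respondent")
history.correct(12000.0, source="manual editing", changed_by="operator-17",
                status_code="EDIT_UNIT_ERROR")

for gen in history.generations:
    print(gen.value, gen.source, gen.changed_by, gen.status_code)
```

Keeping the earlier generations makes it possible to trace, after the fact, which process step changed a value and why.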
Parameters and rules
Parameter refers to a characteristic that can be considered constant for a given situation but which can assume different values in other situations. A parameter can be used as an input to a rule, which means that the rule is defined in terms of its input parameters. Parameters can be inputs to a service and control the service logic, which in turn is controlled by predefined (parameterized) rules that are embedded in the service. Other input to a service is not regarded as a parameter here; it is called technical information or support information. Parameters cannot be an output from a service, but the result of a service can be used as a parameter to control another item in a process step. A parameter answers the question "What information does a process / process step / activity need to function?"
Whether a process step is manual or automated, something must describe how to perform it. In the automated case there is code, e.g. SAS, and in the manual case there is text, such as a work routine description. Both code and work routine descriptions may contain rules that affect the outcome of the process step. A rule in a SAS program may look like "if &var1. ^= '.' then &var3. = &var1.; else &var3. = &var2.;". A work routine description might look something like this: "If there is industry data from the LFS it should be used, otherwise the Short-term statistics, wages and salaries, private sector will be used as the source for the calculation".
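Both rule examples above follow the same pattern: use a primary source if it has a value, otherwise fall back to a second source. A minimal sketch of how such a rule could be driven by parameters is given here. The function `make_fallback_rule`, the parameter names and the variable names `industry_lfs` and `industry_sts` are assumptions made for illustration, and a missing value is represented by `None`.

```python
from typing import Any, Callable, Dict, Mapping

Record = Mapping[str, Any]


def make_fallback_rule(primary: str, fallback: str, target: str
                       ) -> Callable[[Record], Dict[str, Any]]:
    """Build a rule from parameters: use `primary` if present, else `fallback`."""
    def rule(record: Record) -> Dict[str, Any]:
        result = dict(record)  # work on a copy; the input is left untouched
        value = record.get(primary)
        result[target] = value if value is not None else record.get(fallback)
        return result
    return rule


# Parameter values for one hypothetical statistical program.
params = {
    "primary": "industry_lfs",
    "fallback": "industry_sts",
    "target": "industry",
}
rule = make_fallback_rule(**params)

print(rule({"industry_lfs": "lfs-value", "industry_sts": "sts-value"}))
print(rule({"industry_lfs": None, "industry_sts": "sts-value"}))
```

The rule logic is written once, while the parameters that select the variables can be stored as metadata and changed per statistical program without touching the code.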
Generic and statistical program specific metadata
The common metadata repository can be divided into two logical parts: the generic part, which is valid for the whole organisation, and the statistical program-specific one. The generic part contains metadata that is common to the whole statistical production process and should be maintained centrally. When a specific statistical program is designed, conducted and documented, its metadata are derived from the generic metadata, creating the specific instance metadata. The metadata that are specific to a statistical program will also be shared and reused by other statistical programs.
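The relationship between generic and program-specific metadata can be illustrated with a small sketch. It assumes, purely for illustration, that a program-specific definition is a copy of a generic one with selected fields overridden and a link back to its origin. The class `VariableDefinition`, the function `specialise`, the example variable and the program name are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class VariableDefinition:
    name: str
    definition: str
    value_domain: Tuple[str, ...]
    derived_from: str = ""   # name of the generic definition, if any
    program: str = ""        # empty for generic metadata


# Generic metadata, maintained centrally (illustrative content).
generic_sex = VariableDefinition(
    name="sex",
    definition="Sex of the person",
    value_domain=("1", "2"),
)


def specialise(generic: VariableDefinition, program: str,
               **overrides) -> VariableDefinition:
    """Create program-specific metadata that keeps a link to its generic origin."""
    return replace(generic, derived_from=generic.name, program=program,
                   **overrides)


survey_sex = specialise(
    generic_sex,
    program="hypothetical-survey",
    definition="Sex of the respondent at the reference date",
)

print(survey_sex)
```

Only the overridden field differs; the value domain is inherited unchanged, which is the kind of reuse the generic part is meant to enable.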

Connecting data and metadata
The (yellow) structures part of the metadata model provides information about how the data are physically organized and where they are stored. The key connection point is the instance. An instance variable provides a link to the Data point where the Datum representing a variable for an individual object is stored. This link appears in both GSIM and the Swedish model, but in addition the Swedish model also adds a Context variable (in the Concepts part), which expresses a represented variable with a specific role, reference time and source. The context variable can be linked to one or more Data columns (an addition to the Structures part). This link enables the model to be the basis of a metadata driven production system.
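To illustrate how a link from context variables to data columns could drive production, the sketch below generates a query from metadata alone. It is an assumption-laden illustration rather than the Swedish model itself: the classes `ContextVariable` and `DataColumn` only mirror the attributes named above (role, reference time, source), and the table name, column names and the function `select_statement` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataColumn:
    table: str
    column: str


@dataclass
class ContextVariable:
    represented_variable: str
    role: str
    reference_time: str
    source: str
    columns: List[DataColumn] = field(default_factory=list)


def select_statement(variables: List[ContextVariable], table: str) -> str:
    """Build a query for one table using only the metadata links."""
    cols = [
        f'{col.column} AS "{var.represented_variable}"'
        for var in variables
        for col in var.columns
        if col.table == table
    ]
    if not cols:
        raise ValueError(f"no context variable is linked to table {table!r}")
    joined = ", ".join(cols)
    return f"SELECT {joined} FROM {table}"


# Hypothetical metadata for two context variables stored in one table.
ident = ContextVariable("person_id", role="identifier", reference_time="2013",
                        source="hypothetical register",
                        columns=[DataColumn("obs_2013", "pid")])
income = ContextVariable("income", role="measure", reference_time="2013",
                         source="hypothetical register",
                         columns=[DataColumn("obs_2013", "inc_amt")])

print(select_statement([ident, income], "obs_2013"))
```

The physical column names appear only in the metadata, so a change in storage layout would require updating the links rather than the production code.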
[1] Platform for standard tools for the processes Collect, Review, Validate, Edit and Impute.
[2] Standard tools for data collection via the Internet (web forms and file uploads).
[1] Statistics Sweden uses the term "base register" to indicate three registers which are continuously updated statistical copies of the national administrative registers on individuals, businesses, and geographical entities.