- Created by Benutzer-51b47, last modified by Essi Kaukonen on 12 Sep 2016
| Contact person* | Essi Kaukonen |
|---|---|
| Job title | Planning Officer / Metadata services |
| Email | essi.kaukonen@stat.fi |
| Telephone | + 358 50 314 2311 |
Metadata strategy
The objectives of Statistics Finland's operational strategy include the usability and reusability of data, the reliability of statistics and the standardization of processes. A common metadata system supports these objectives by providing a uniform way of describing statistical resources, such as classifications, data sets and variables.
Statistics Finland has not laid down a specific metadata strategy, but the main principle is that metadata will be created and maintained in a metadata system and made available wherever they are needed in the business processes, from data collection to data dissemination. The common statistical metadata system will be further developed in order to rationalize and support the harmonization of statistical business processes.
The metadata system is based on Statistics Finland's CoSSI data model. The new Classification System, which will be in use by the end of this year, is based on the GSIM Classification model.
Current situation
At the moment, we are developing our metadata system and standard ways to utilize it in several projects.
Metadata Classification
This section is partially outdated, will be updated later.
1) Statistical metadata.
Statistical metadata consist of:
- descriptions and definitions of statistical data and variables
- classifications
- variable formulas and units of measurement
2) Statistical data quality.
Statistical data quality reports consist of:
- statistical method descriptions
- relevance of data
- validity, reliability and accuracy of data
The above elements of the report are evaluated using quality indicators that are based on international recommendations.
3) Metadata of statistical documents or products.
Document and product metadata consist of information about:
- producers
- publication information
- identification knowledge of the publications or products
- field or subject area glossary
- keywords
4) Process metadata. Process metadata are divided into technical and conceptual metadata:
a) technical metadata
- technical metadata guide the process of data production: data collection, data management and data dissemination. For instance, they make it possible to follow data production phase by phase. They also document the process.
b) conceptual process metadata
- conceptual process metadata consist of the technical information of data and variables which are used in producing data. For example, they can be minimum or maximum values, various calculation rules or use of certain classification values.
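As an illustration of how conceptual process metadata of this kind could be applied, the following is a minimal sketch that checks a value against a minimum, a maximum and a set of permitted classification codes. The `VariableRule` class, the `check_value` function and the example variables are hypothetical and are not part of Statistics Finland's systems.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class VariableRule:
    """Hypothetical conceptual process metadata for one variable."""
    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    allowed_codes: Optional[FrozenSet[str]] = None


def check_value(rule: VariableRule, value) -> List[str]:
    """Return a list of rule violations; an empty list means the value passes."""
    problems = []
    if rule.allowed_codes is not None:
        # Classified variable: only the listed classification codes are valid.
        if value not in rule.allowed_codes:
            problems.append(f"{rule.name}: code {value!r} is not in the classification")
        return problems
    # Numeric variable: compare against the optional bounds.
    if rule.minimum is not None and value < rule.minimum:
        problems.append(f"{rule.name}: {value} is below the minimum {rule.minimum}")
    if rule.maximum is not None and value > rule.maximum:
        problems.append(f"{rule.name}: {value} is above the maximum {rule.maximum}")
    return problems


# Illustrative rules only; the variables and limits are invented.
age_rule = VariableRule("age", minimum=0, maximum=120)
region_rule = VariableRule("region", allowed_codes=frozenset({"01", "02", "04"}))

for rule, value in [(age_rule, 35), (age_rule, 130), (region_rule, "99")]:
    print(rule.name, value, check_value(rule, value))
```

Keeping the rules as data rather than as code is what lets the same description drive both documentation and validation.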
Metadata system(s)
This section is partially outdated, will be updated later.
Statistics Finland's present metadata system comprises an older part, consisting of Microsoft SQL Server databases and their PowerBuilder interfaces, and a new part currently under construction, in which the eXist-XML database acts as the metadata warehouse. New tools for maintaining metadata content are being built. Of these, the variable editor, intended for maintaining data and variable descriptions, has been completed.
In practice, we thus currently operate in two different environments. Ensuring interoperability between them is a challenge for the development projects of the metadata system.
The currently used metadata system built in the 1990s is composed of the following parts:
- The classification database and its user interface
- The concepts database and its user interface
- The archiving database system and its user interface
The databases were originally Sybase databases, which were migrated to the SQL Server environment in the conversion of 2011.
The content of the classification database has long been utilised as SAS formats and to an extent in the statistics production processes. In the variable editor and the archiving database system, variable-specific classifications can be retrieved from the classification database and they can be added to the data description. The classifications published on the classifications pages of the stat.fi service are also produced from the classification database.
The concepts and definitions published in the concepts service on the stat.fi website are produced from the concepts database. In the variable editor and the archiving database system, concepts can be retrieved from the concepts database and linked to variable descriptions.
The new metadata system elements already in use are the eXist-XML database acting as the metadata warehouse, the variable editor and the Arbortext text editor. The classifications and concepts are automatically copied from SQL Server databases to the new metadata warehouse.
The document metadata are maintained in eXist with the Arbortext text editor. Arbortext reads trilingual variable data into the tables inside the publications.
The data and variable descriptions stored in the metadata warehouse are drawn up with the variable editor. As far as possible, data maintained in other systems (classification and concepts database, the operational guidance and planning system STOJ) are used in the descriptions.
The data and variable descriptions in the metadata warehouse are utilised in trilingual tabulation in SAS and PX-Edit.
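A minimal sketch of how trilingual variable descriptions could feed a table header is given below. The language codes `fi`, `sv` and `en`, the dictionary layout and the `header_row` function are assumptions made for illustration; the actual structures used with SAS and PX-Edit are not described here.

```python
# Assumed language codes for the three languages of the descriptions.
LANGUAGES = ("fi", "sv", "en")

# Hypothetical variable descriptions: one label per language.
variable_labels = {
    "age": {"fi": "Ikä", "sv": "Ålder", "en": "Age"},
    "sex": {"fi": "Sukupuoli", "sv": "Kön", "en": "Sex"},
}


def header_row(variables, language, labels):
    """Build the header row of a table in the requested language."""
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    return [labels[name][language] for name in variables]


# The same table definition yields a header in each language.
for lang in LANGUAGES:
    print(lang, header_row(["age", "sex"], lang, variable_labels))
```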
At the moment, only part of the metadata generated at Statistics Finland are maintained in the common metadata warehouses. Many metadata are stored in the data systems of specific sets of statistics, or in SAS, Word and Excel files, which makes them available only to the statistics concerned or even only to a certain expert. Deficient and non-uniform metadata descriptions restrict their retrievability and usability.
Ongoing development projects
The new metadata system aims to enhance the connections of the metadata warehouse to the statistics production process by making the metadata maintenance tools easy to use, by improving the connections of the statistical information systems to the centralised metadata warehouse, and by increasing services related to the use of metadata contents.
Service interface of classifications
The first part of the service-based metadata architecture (see Section 4.1) to be implemented is the set of services connected to the use of classifications. In the course of 2013, the project on the service interface of classifications defines and implements the services connected to the maintenance and use of classifications and classification conversion keys.
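To make the idea of a classification conversion key concrete, here is a minimal sketch that treats a key as a list of source and target code pairs and converts a code only when the mapping is unambiguous. The codes, `build_lookup` and `convert` are hypothetical; the services actually defined in the project are not specified in this document.

```python
from collections import defaultdict

# Hypothetical conversion key between two classification versions,
# given as (source code, target code) pairs.
CONVERSION_KEY = [
    ("A1", "X1"),
    ("A2", "X1"),  # two source categories merge into one target category
    ("A3", "X2"),
    ("A3", "X3"),  # one source category splits into two target categories
]


def build_lookup(pairs):
    """Group target codes by source code."""
    lookup = defaultdict(list)
    for source, target in pairs:
        lookup[source].append(target)
    return dict(lookup)


def convert(code, lookup):
    """Return the single target code, or raise if there is none or several."""
    targets = lookup.get(code, [])
    if len(targets) != 1:
        raise ValueError(f"code {code!r} maps to {len(targets)} target codes")
    return targets[0]


lookup = build_lookup(CONVERSION_KEY)
print(convert("A1", lookup))
```

Splits such as `A3` cannot be converted mechanically and would need extra information, for example weights, which this sketch deliberately leaves out.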
Implementation of the variable editor
In the implementation project of the variable editor, which was concluded at the end of 2012, Statistics Finland's statistical units used the variable editor to describe data sets and the variables they contain in the metadata warehouse (eXist database). The Metadata Services unit trained and supported the producers of descriptions and prepared instructions together with the Information Technology and Information Services Departments. A total of 189 persons were trained during the project. In all, 123 of the sets of statistics, or slightly over 60 per cent, prepared descriptions in the metadata warehouse. At the end of the year, the metadata warehouse contained over 700 data descriptions in all. The data descriptions mostly related to the data acquisition and dissemination phases of the statistical process. The quality control of the content of data descriptions will be developed further during 2013.
Now that the project has ended, the implementation will continue in other projects focusing on the development of the statistics production process, such as developing the reception system of administrative data.
The variable editor developer group works in connection with the projects. It deals with requests for development received from the users and decides whether they will be put into practice in the editor.
Renewal of archiving
At the moment, statistical data are archived through several applications and user interfaces, which makes it difficult to manage the archiving process. The different services do not communicate with each other, and no monitoring or reporting is built into them. Statistical units consider archiving a separate work phase from the production process, and therefore it is often overlooked.
The aim of renewing archiving is to define and describe the new archiving process and the data it needs, and to implement the technical tools. The aim is to clarify and automate the archiving of statistical data sets by making it an integral and timely part of the statistics production process, utilising the data sets already described in the metadata warehouse. The project was started in spring 2013.
Quality reporting
The project examines the relationship of the present quality reporting to the coming requirements, reviews the connection of Statistics Finland's metadata warehouse and metadata model to Eurostat's extended Metadata Standard (SIMS) and makes a plan for introducing the new quality reporting model. The aim is to perform quality reporting so that quality reports are no longer made separately for the EU, other international organisations and domestic users, but one quality report is used as far as possible in reporting. The project will start in May 2013.
Other data systems related to metadata
TILKUT is a description database of statistics that contains basic data on statistics (name, description, topics, keywords, publication frequency and contact persons). The data from the TILKUT database are used in the stat.fi web service, the operational guidance and planning system STOJ, and the variable editor.
The operational guidance and planning system STOJ includes information on the names, publication times and contact persons of publications. The contact details of persons needed for data descriptions are retrieved from STOJ to the variable editor.
Since 2006, the data collection register has contained data related to Statistics Finland's data collections. The system was originally built to serve metadata needs connected to direct data collections. The system was later extended to cover administrative data sets as well. In principle, the register should have all Statistics Finland's data collections described, but especially for administrative data sets, this objective has not been reached. The data are used in stat.fi’s services to data providers, in the register of enterprise respondents and in Statistics Finland's planning and monitoring process. The register contains estimates of the burden caused by an individual data collection.
The register of enterprise respondents is a register intended for managing data collections, in which samples, response data and respondent data are stored. The register is used to monitor whether a response has been received from a data provider, and it gives a rough estimate of the response burden.
For personal data collections, data on samples have already been collected for some time, but there is no actual register of them.
Costs and Benefits
Implementation strategy
The implementation strategy of the metadata system is step-wise.
IT Architecture
Statistics Finland's common metadata system is being implemented step by step according to the principles of service-based architecture.
Services meeting the needs of different user groups and client systems have a key role in the service-based architecture. The picture below shows the service interface to be built on top of the metadata warehouse, whose services produce the required data from the documents in the metadata warehouse, and also take care of storing data in the warehouse.
![]()
The content of the metadata warehouse is maintained, and made available to client systems, by requesting services from the metadata warehouse through the service interface. The service interface is implemented in line with the REST architecture (Representational State Transfer). The basic structures of the application follow a layered style. The business logic layer consists of the REST service interfaces, their processing logic and the data transfer modules that the interfaces offer to client software. The function of the data transfer modules is to offer data from the warehouse to client software in an easy-to-use entity structure.
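The layering described above can be sketched roughly as follows: a storage layer standing in for the metadata warehouse, a business logic layer that handles REST-style requests, and a data transfer object handed to clients. Every name here (`WarehouseStore`, `ClassificationService`, `ClassificationItemDTO`, the `/classifications/{id}` path) is hypothetical, and an in-memory dictionary replaces the real warehouse; this is not the actual interface.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ClassificationItemDTO:
    """Data transfer object: the simple entity structure offered to clients."""
    code: str
    name: str


class WarehouseStore:
    """Storage layer: an in-memory stand-in for the metadata warehouse."""

    def __init__(self):
        self._documents = {}

    def get(self, classification_id):
        return self._documents.get(classification_id)

    def put(self, classification_id, items):
        self._documents[classification_id] = items


class ClassificationService:
    """Business logic layer: maps REST-style requests onto the store."""

    def __init__(self, store):
        self._store = store

    def handle(self, method, path, body=None):
        """Return a (status code, response body) pair for one request."""
        parts = path.strip("/").split("/")
        if len(parts) != 2 or parts[0] != "classifications":
            return 404, json.dumps({"error": "not found"})
        classification_id = parts[1]
        if method == "GET":
            items = self._store.get(classification_id)
            if items is None:
                return 404, json.dumps({"error": "not found"})
            dtos = [ClassificationItemDTO(**item) for item in items]
            return 200, json.dumps([asdict(dto) for dto in dtos])
        if method == "PUT":
            try:
                # Building DTOs validates that each item has exactly code and name.
                dtos = [ClassificationItemDTO(**item) for item in json.loads(body)]
            except (TypeError, ValueError):
                return 400, json.dumps({"error": "invalid body"})
            self._store.put(classification_id, [asdict(dto) for dto in dtos])
            return 204, ""
        return 405, json.dumps({"error": "method not allowed"})


service = ClassificationService(WarehouseStore())
service.handle("PUT", "/classifications/demo", '[{"code": "1", "name": "Agriculture"}]')
status, payload = service.handle("GET", "/classifications/demo")
print(status, payload)
```

Because clients see only the DTOs and the request interface, the storage layer could be exchanged without changing client software, which is the main point of the layered style.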
Metadata Management Tools
See Section 4.1.
Standards and formats
Statistics Finland has developed a Common Structure of Statistical Information (CoSSI) based on XML. It is a modular data model for describing statistical tables, classifications, concepts, variables, general information on statistical documents, quality descriptions, etc. CoSSI was designed in accordance with international standards such as the Dublin Core and CALS. If needed, CoSSI can be expanded; new elements, e.g. for data descriptions, have already been integrated into it. In its ICT strategy, Statistics Finland has provided guidelines for the use of the CoSSI model. The data models of the classifications and concepts in use were developed in the 1990s, and the elements they contain are presently part of CoSSI.
The basic structure and content of statistical information is defined in the CoSSI data model. It describes the information structure of the statistical data to be produced. The way in which data are produced, that is, the production steering system, is not described in the CoSSI data model. The definition of the data and content required by the production steering system was left to the future development phase of the model.
The data model comprises a description of basic information of data sets for the production and editing of statistical data and distribution of statistical information. At the moment, the model's parts to be extended and checked due to changed content requirements are as follows:
- Quality description of statistics
- The classification information model
- Supplementing the metadata part (docmeta) concerning the data record with data required by archiving
- Methodological description of editing
- Attaching source system metadata as part of statistical metadata
- Metadata of questions and questionnaires.
Preliminary examinations indicate that the CoSSI data model offers an adequate basis for producing content description data of statistical information following the GSIM data model (Generic Statistical Information Model, version 0.4 / 5.2012). A preliminary outline of a structure that would cover the needs of Eurostat's different quality reports has been drafted for the CoSSI model.
CoSSI documentation on the web: http://www.stat.fi/org/tut/dthemes/drafts/cossi_en.html
Version control and revisions
Outsourcing versus in-house development
The user interfaces and the applications for the databases have been mainly developed and built in-house. The applications developed at Statistics Finland can in principle be shared free of charge with other statistical organizations. Where necessary, details regarding test use and access to more precise descriptions etc. may be agreed upon separately.
Sharing software components of tools
Overview of roles and responsibilities
A high-level organisation structure map can be found on the Statistics Finland website: http://tilastokeskus.fi/org/tilastokeskus/organisaatio_en.html
In connection with Statistics Finland's organisational change, the Metadata Services unit was transferred on 1 January 2013 from the Information Technology Department to the new Standards and Methods Development Department. The task of the Standards and Methods Development Department is to steer the statistics production process, to support statistical methodology in statistics production, to promote uniform application of metadata and classifications and quality work in statistics production and to intensify project work.
The main guidelines for the development of the metadata systems at Statistics Finland are coordinated and processed in cooperation with the Standards and Methods Development Department and the IT Department. The statistical departments lay out the main demands and needs for metadatabases and their use in various phases of the statistical processes.
The Metadata Services unit maintains classification standards, concepts and the archiving metadata system. The statistical departments maintain their own (statistical) metadata in the centralised metadata systems according to the instructions made by the Metadata Services unit. The Metadata Services unit also trains and consults statistical departments in metadata issues and is in charge of controlling quality in the metadata systems.
The CoSSI model steering group is in charge of managing and developing the model according to user needs in a manner that will not expose its main structure to risk.
Metadata management team
Training and knowledge management
Training and knowledge management of metadata experts:
New experts working at the Metadata Services unit have been trained mainly by mentoring and guidance provided by senior experts. ESTP courses on metadata training have been provided.
Training and knowledge management of metadata of statistical departments:
Statistics Finland provides structured basic courses (for new recruits) and advanced courses dealing with statistical production. These courses also contain basic information about the present metadata systems, mainly classifications, concepts and archive data management.
The Metadata Services unit provides the personnel with informative briefings whenever major modifications are made to the metadata system, either in content or in application development. More systematically organized training is needed when new tools are brought into use. Half-day metadata seminars are organised yearly to present topical metadata issues.
As an example, in connection with the implementation project of the variable editor, an extensive training programme was carried out, in which a one-day training course was planned and organised for statistical experts on making data descriptions and using the common tools for it. During the project, training days were held around twice a month. In the future, new modes of training are to be considered, especially self-training by following Lync recordings and special clinics where participants can work with their own material.
In practice, much of the training today for users of the metadata system is still done side by side. This leads to good results but is inefficient, as each client is trained individually and the resources used for this would also be needed in the development of metadata systems.
Partnerships and cooperation
Statistics Finland cooperates with organisations and participates in working groups that define standard classifications or standards on both international and national levels. Metadata experts regularly attend Eurostat’s Metadata Working Group and Classification Group meetings as well as METIS meetings. Statistics Finland also has representatives in the PC-Axis Reference group and Eurostat’s Quality Working Group. Spatial metadata experts contribute to INSPIRE metadata work and attend national working groups in the implementation of the INSPIRE metadata process.
Other issues
Lessons learned
A metadata system complying with the uniform architecture is not just a technological renewal; its implementation will require changes in work procedures, responsibilities and the organisation of tasks. The change in work procedures above all means timely recording of metadata in connection with data planning and production: a move from irregular retrospective description to regular and up-to-date description. The change is a challenge to the systems and applications, because work procedures change only if the technology allows it. An optimal procedure can be realised only if the users feel the applications are easy to use and serve their work. Commitment by management and its support for the work is crucial for the statistical units to be able to provide the contribution needed for the development work and for ensuring that the work will be sustained. The centralised metadata system should support the harmonisation of the production of statistics to a sufficient degree, thus making it more effective, but it should also be flexible enough to serve the specific needs of individual statistics. The statistics units need to be involved in the planning.