- Created by Benutzer-51b47, last modified by Essi Kaukonen on 13 Apr 2015
| Contact person* | Saija Ylönen |
|---|---|
| Job title | Head of Development, Metadata Services |
| Email | saija.ylonen@stat.fi |
| Telephone | +358 9 1734 2641 |
Metadata strategy
Statistics Finland has not laid down a specific metadata strategy, but its ICT strategy includes a policy on developing a centralised statistical metadata system. Statistics Finland intends to develop and implement a common XML-based statistical metadata system in order to rationalise the statistical business processes and support their harmonisation. The system will be based on Statistics Finland's CoSSI metadata model. The ICT strategy and its future goals emphasise creating common, integrated application tools for the metadata systems. The main principle is that metadata are created and maintained in a metadata system and made available, and transformable, for whatever use the business processes require, from data collection to data dissemination.
In accordance with the ICT strategy, the services of the metadata system are defined, planned and implemented in connection with individual renewals of the statistical system. The planned solutions are designed to be extensible, so that after each renewal they can also be offered to other systems and users at Statistics Finland. The metadata systems are developed so that they can be made available directly to statistical applications, in addition to general user interfaces such as the variable editor. This makes it easier to exploit the content of the metadata warehouse in the statistics production process. The objectives of Statistics Finland's operational strategy include the usability and retrievability of data, the reliability of statistics and the standardisation of processes. A common metadata system supports these objectives by providing a uniform way of describing statistical resources such as classifications, data sets and variables.
Current situation
We are currently in a transitional phase: we maintain metadata in relational databases while developing new systems based on XML technology. The current metadata systems and the ongoing projects are introduced in Section 2.2.
Metadata Classification
This section is partially outdated and will be updated later in 2015.
1) Statistical metadata.
Statistical metadata consist of:
- descriptions and definitions of statistical data and variables
- classifications
- variable formulas and unit of measurement
2) Statistical data quality.
Statistical data quality reports consist of:
- statistical method descriptions
- relevance of data
- validity, reliability and accuracy of data
These elements of the report are evaluated with quality indicators based on international recommendations.
3) Metadata of statistical documents or products.
Document and product metadata consist of information about:
- producers
- publication information
- identification knowledge of the publications or products
- field or subject area glossary
- keywords
4) Process metadata. Process metadata are divided into technical and conceptual metadata:
a) technical metadata
- technical metadata guide the process of data production: data collection, data management and data dissemination. For instance, they make it possible to follow data production phase by phase. They also document the process.
b) conceptual process metadata
- conceptual process metadata consist of technical information on the data and variables used in producing the data, for example minimum or maximum values, various calculation rules or the use of certain classification values.
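As an illustration of the conceptual process metadata described above, the sketch below applies minimum and maximum values, plus a set of permitted classification codes, to a single data record. It is a minimal sketch under stated assumptions. The `VariableRule` type, the `check_record` function and the example variables and codes are hypothetical. They do not reflect how Statistics Finland's systems actually store or apply such rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariableRule:
    """Hypothetical conceptual process metadata for one variable."""
    name: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allowed_codes: Optional[set[str]] = None  # e.g. codes of a classification


def check_record(record: dict, rules: list[VariableRule]) -> list[str]:
    """Return human-readable violations of the rules for one data record."""
    problems = []
    for rule in rules:
        value = record.get(rule.name)
        if value is None:
            problems.append(f"{rule.name}: missing")
            continue
        if rule.allowed_codes is not None:
            # Classification-coded variable: only membership is checked.
            if value not in rule.allowed_codes:
                problems.append(f"{rule.name}: code {value!r} not in classification")
            continue
        if rule.min_value is not None and value < rule.min_value:
            problems.append(f"{rule.name}: {value} below minimum {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            problems.append(f"{rule.name}: {value} above maximum {rule.max_value}")
    return problems


RULES = [
    VariableRule("age", min_value=0, max_value=120),
    VariableRule("sex", allowed_codes={"1", "2"}),  # hypothetical codes
]
print(check_record({"age": 130, "sex": "3"}, RULES))
```

Keeping the rules as data, separate from the checking code, reflects the idea in the text that such values are metadata maintained alongside the variables rather than logic hard-coded into each production system.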
Metadata system(s)
This section is partially outdated and will be updated later in 2015.
Statistics Finland's present metadata system comprises an older part, consisting of Microsoft SQL Server databases with PowerBuilder interfaces, and a new part under construction, in which the eXist-XML database acts as the metadata warehouse. New tools for maintaining metadata content are being built. Of these, the variable editor, intended for maintaining data and variable descriptions, has been completed.
In practice, we therefore currently operate in two different environments, and ensuring interoperability between them is a challenge for the development projects of the metadata system.
The currently used metadata system built in the 1990s is composed of the following parts:
- The classification database and its user interface
- The concepts database and its user interface
- The archiving database system and its user interface
The databases were originally Sybase databases, which were migrated to the SQL Server environment in the 2011 conversion.
The content of the classification database has long been used in the form of SAS formats and, to some extent, in the statistics production processes. In the variable editor and the archiving database system, variable-specific classifications can be retrieved from the classification database and added to the data description. The classifications published on the classification pages of the stat.fi service are also produced from the classification database.
The concepts and definitions published in the concepts service of the stat.fi website are produced from the concepts database. In the variable editor and the archiving database system, concepts can be retrieved from the concepts database and linked to variable descriptions.
The new metadata system elements already in use are the eXist-XML database acting as the metadata warehouse, the variable editor and the Arbortext text editor. The classifications and concepts are automatically copied from SQL Server databases to the new metadata warehouse.
The document metadata are maintained in eXist with the Arbortext text editor. Arbortext reads trilingual variable data into the tables inside the publications.
The data and variable descriptions stored in the metadata warehouse are drawn up with the variable editor. As far as possible, data maintained in other systems (classification and concepts database, the operational guidance and planning system STOJ) are used in the descriptions.
The data and variable descriptions in the metadata warehouse are utilised in trilingual tabulation in SAS and PX-Edit.
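Trilingual tabulation relies on each variable carrying labels in Finnish, Swedish and English. The sketch below picks column headings for one language from such labels. Several details are assumptions made for illustration: the data layout, the `table_headers` function, and the fallback to Finnish (and then to the variable code) when a translation is missing. None of this describes the behaviour of the SAS or PX-Edit tooling.

```python
def table_headers(variables: dict[str, dict[str, str]], lang: str, fallback: str = "fi") -> list[str]:
    """Pick column headings for one language, falling back when a label is missing.

    `variables` maps a variable code to {language code: label}.
    """
    headers = []
    for code, labels in variables.items():
        # Prefer the requested language, then the fallback language, then the raw code.
        headers.append(labels.get(lang) or labels.get(fallback) or code)
    return headers


# Hypothetical variable descriptions; the Swedish label for "sex" is missing on purpose.
VARIABLES = {
    "age": {"fi": "Ikä", "sv": "Ålder", "en": "Age"},
    "sex": {"fi": "Sukupuoli", "en": "Sex"},
}

for lang in ("fi", "sv", "en"):
    print(lang, table_headers(VARIABLES, lang))
```

In the Swedish row, the heading for `sex` falls back to the Finnish label. This makes a missing translation visible in the output rather than leaving an empty column heading.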
At present, only part of the metadata generated at Statistics Finland is updated in the common metadata warehouses. A great deal of metadata is stored in the data systems of specific sets of statistics and in SAS, Word and Excel files, which makes it available only to the statistics concerned or even only to a single expert. Deficient and non-uniform metadata descriptions restrict their retrievability and usability.
Ongoing development projects
The new metadata system aims to enhance the connections of the metadata warehouse to the statistics production process by making the metadata maintenance tools easy to use, by improving the connections of the statistical information systems to the centralised metadata warehouse, and by increasing services related to the use of metadata contents.
Service interface of classifications
The first part of the service-based metadata architecture (see Section 4.1) to be implemented is the set of services connected to the use of classifications. In the course of 2013, the project on the service interface of classifications defines and implements the services connected to the maintenance and use of classifications and classification conversion keys.
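A classification conversion key maps the codes of one classification (or classification version) to the codes of another. The sketch below re-codes aggregated counts with such a key and reports any source codes the key does not cover. It assumes a simple many-to-one key held as a Python dictionary, and the function name and codes are hypothetical. Real conversion keys may also contain one-to-many correspondences, which this sketch does not handle.

```python
from collections import Counter


def convert_counts(counts: dict[str, int], key: dict[str, str]) -> tuple[dict[str, int], list[str]]:
    """Re-code counts from a source classification to a target classification.

    `key` maps source codes to target codes (many-to-one allowed).
    Returns the converted counts and the sorted source codes missing from the key.
    """
    converted = Counter()
    unmapped = []
    for code, n in counts.items():
        target = key.get(code)
        if target is None:
            unmapped.append(code)  # no correspondence in the key
        else:
            converted[target] += n  # several source codes may merge into one
    return dict(converted), sorted(unmapped)


KEY = {"01": "A", "02": "A", "03": "B"}  # hypothetical codes
converted, unmapped = convert_counts({"01": 10, "02": 5, "03": 7, "99": 1}, KEY)
print(converted, unmapped)  # {'A': 15, 'B': 7} ['99']
```

Returning the unmapped codes instead of silently dropping them makes gaps in a conversion key visible. Such gaps are the kind of issue a centrally maintained key is meant to prevent.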
Implementation of the variable editor
In the variable editor implementation project, which was concluded at the end of 2012, Statistics Finland's statistical units used the variable editor to describe their data sets, and the variables they contain, in the metadata warehouse (eXist database). The Metadata Services unit trained and supported the producers of descriptions and prepared instructions together with the Information Technology and Information Services Departments. A total of 189 persons were trained during the project. In all, 123 sets of statistics, or slightly over 60 per cent, prepared descriptions for the metadata warehouse. At the end of the year, the metadata warehouse contained over 700 data descriptions in total. The data descriptions mostly related to the data acquisition and dissemination phases of the statistical process. The quality control of the content of data descriptions will be developed further during 2013.
Now that the project has ended, implementation will continue in other projects focusing on the development of the statistics production process, such as the development of the reception system for administrative data.
The variable editor developer group works in connection with these projects. It handles development requests from users and decides whether they will be implemented in the editor.
Renewal of archiving
At the moment, statistical data are archived through several applications and user interfaces, which makes the archiving process difficult to manage. The different services do not communicate with each other, and no monitoring or reporting is built into them. Statistical units consider archiving a work phase separate from the production process, and therefore it is often overlooked.
The aim of renewing archiving is to define and describe the new archiving process and the data it needs, and to implement the technical tools. The aim is to clarify and automate the archiving of statistical data sets by making archiving an integral, timely part of the statistics production process, utilising the data sets already described in the metadata warehouse. The project was started in spring 2013.
Quality reporting
The project examines how the present quality reporting relates to upcoming requirements, reviews how Statistics Finland's metadata warehouse and metadata model connect to Eurostat's Single Integrated Metadata Structure (SIMS), and makes a plan for introducing the new quality reporting model. The aim is that quality reports are no longer made separately for the EU, other international organisations and domestic users; instead, one quality report is used as far as possible. The project will start in May 2013.
Other data systems related to metadata
TILKUT is a description database of statistics that contains basic data on statistics (name, description, topics, keywords, publication frequency and contact persons). The data from the TILKUT database are used in the stat.fi web service, the operational guidance and planning system STOJ, and the variable editor.
The operational guidance and planning system STOJ includes information on the names, publication times and contact persons of publications. The contact details of persons needed for data descriptions are retrieved from STOJ into the variable editor.
Since 2006, the data collection register has contained data related to Statistics Finland's data collections. The system was originally built to serve metadata needs connected with direct data collections and was later extended to cover administrative data sets as well. In principle, the register should describe all of Statistics Finland's data collections, but this objective has not been reached, especially for administrative data sets. The data are used in stat.fi's services for data providers, in the register of enterprise respondents and in Statistics Finland's planning and monitoring process. The register contains estimates of the burden caused by each individual data collection.
The register of enterprise respondents is intended for managing data collections, and samples, response data and respondent data are stored in it. The register is used to check whether a response has been received from a data provider and to give a rough estimate of the response burden.
For data collections concerning persons, data on samples have been collected for some time, but there is no actual register of them.
Costs and Benefits
Costs
To estimate the need for human resources, the working hours spent on the variable editor project are used as a reference. In the first stage, a variable editor was designed and built for describing data and variables. A total of 381 working days were spent on the project, of which programming accounted for 120 days. This was the first XML database application the main programmer had worked with, so part of the time was spent becoming familiar with the technology. The project reached its goal behind schedule, and the number of working days far exceeded the number planned. The single biggest reason for the overrun was that, at the start of the project, two things had not yet been decided: which elements of the extensive metadata model were to be shared and obligatory for each set of statistics, and the shared process definitions (stages of work) and terminology for the user interface.
After the variable editor was completed, the project on its implementation was started for the years 2011 to 2012. According to the working time recording system, the project used 340 staff-days in total: 253 by the project group (121 of them by the project manager), 28 by the steering group and 59 by statistical experts (training and description preparation). The statistical experts' workload may be larger than this figure, because they have probably booked some of their working time to the codes of their own statistical units in addition to the project code. Implementing the metadata system requires statistical experts' time for preparing descriptions, and the amount of work depends on the quality of the existing descriptions. Ample time should be reserved for description work in projects, as descriptions made according to a uniform data model will make work easier in many ways in the future.
Benefits
From the statistics perspective, metadata can be maintained in one system and are ready for use by any set of statistics. Metadata maintained for other statistics can be reused, so overlapping work is avoided. For example, this makes the harmonisation of concepts easier, as the definitions of any statistics can be consulted and compared in one place. The metadata are described systematically and appropriately, and they are easy to find and use both for client commissions and within the organisation, e.g. for training new personnel.
From the perspective of information technology work, centralised metadata systems serving a number of statistics production processes are likely to reduce the programming resources needed in new system projects. When shared metadata systems are used in different statistical systems, more time can be devoted to design work, as each statistical system does not require a metadata system of its own. Shared systems also make extra hands available when needed: with fewer systems, application designers can devote more time to, e.g., working in pairs. To gain optimal benefit from centralised systems, statistics specialists and programmers should be informed regularly about the scope of these systems and the possibilities for applying them.
Development of the metadata system calls for versatile co-operation between statistics, IT and information experts. New contacts are formed in projects, and much competence is shared between different expert groups. This has a fruitful effect on the activity of the organisation and on competence development.
Implementation strategy
The implementation strategy is step-wise: once the new metadata system is ready for implementation, each statistical data system will switch to it in parallel with its general modification project.
IT Architecture
Statistics Finland's common metadata system is being implemented step by step according to the principles of service-based architecture.
Services that meet the needs of different user groups and client systems have a key role in a service-based architecture. The picture below shows the service interface to be built on top of the metadata warehouse. Its services produce the required data from the documents in the metadata warehouse and also handle storing data in the warehouse.
[Figure: service interface built on top of the metadata warehouse]
The content of the metadata warehouse is maintained, and made available in client systems, by requesting services from the metadata warehouse through the service interface. The service interface is implemented according to the REST (Representational State Transfer) architecture. The basic structures of the application follow a layered architecture style. The business logic layer consists of the REST service interfaces, their processing logic, and the data transfer modules that the interfaces offer to client software. The data transfer modules deliver data from the metadata warehouse to client software in an easy-to-use entity structure.
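To illustrate what a data transfer module might do, the sketch below turns an XML variable description into simple entity objects that client software could use without handling the XML itself. The element and attribute names (`dataset`, `variable`, `label`, `code`, `lang`) are hypothetical and are not taken from the CoSSI schema. The REST layer that would serve the XML is left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Variable:
    """Entity handed to client software instead of the raw XML document."""
    code: str
    labels: dict[str, str]  # language code -> label


# Hypothetical document layout; element and attribute names are illustrative
# only and are NOT taken from the CoSSI schema.
SAMPLE = """
<dataset id="example">
  <variable code="age">
    <label lang="fi">Ikä</label>
    <label lang="sv">Ålder</label>
    <label lang="en">Age</label>
  </variable>
  <variable code="sex">
    <label lang="fi">Sukupuoli</label>
    <label lang="en">Sex</label>
  </variable>
</dataset>
"""


def to_entities(xml_text: str) -> list[Variable]:
    """Data transfer step: flatten an XML description into Variable entities."""
    root = ET.fromstring(xml_text)
    variables = []
    for var in root.iter("variable"):
        labels = {lab.get("lang"): (lab.text or "").strip() for lab in var.findall("label")}
        variables.append(Variable(code=var.get("code"), labels=labels))
    return variables


for v in to_entities(SAMPLE):
    print(v.code, v.labels)
```

Client code then works with `Variable.code` and `Variable.labels` and never touches the document structure. This matches the stated purpose of the data transfer modules: offering warehouse content in an easy-to-use entity structure.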
Metadata Management Tools
See Section 4.1.
Standards and formats
Statistics Finland has developed an XML-based Common Structure of Statistical Information (CoSSI). It is a modular data model for describing statistical tables, classifications, concepts, variables, general information on statistical documents, quality descriptions, etc. CoSSI was designed in accordance with international standards such as Dublin Core and CALS. CoSSI can be expanded if needed; new elements, e.g. for data descriptions, have already been integrated into it. In its ICT strategy, Statistics Finland has provided guidelines for the use of the CoSSI model. The data models of the classifications and concepts currently in use were developed in the 1990s, and the elements they contain are now part of CoSSI.
The basic structure and content of statistical information are defined in the CoSSI data model, which describes the information structure of the statistical data to be produced. How the data are produced, that is, the production steering system, is not described in the CoSSI data model. Defining the data and content required by the production steering system was left to a future development phase of the model.
The data model describes the basic information on data sets needed for producing and editing statistical data and for disseminating statistical information. At the moment, the following parts of the model need to be extended and reviewed because of changed content requirements:
- Quality description of statistics
- The classification information model
- Supplementing the metadata part (docmeta) concerning the data record with data required by archiving
- Methodological description of editing
- Attaching source system metadata as part of statistical metadata
- Metadata of questions and questionnaires.
Preliminary examinations indicate that the CoSSI data model offers an adequate basis for producing content descriptions of statistical information that follow the GSIM data model (Generic Statistical Information Model, version 0.4, May 2012). A preliminary outline has been drafted of a CoSSI structure that would cover the needs of Eurostat's different quality reports.
CoSSI documentation on the web: http://www.stat.fi/org/tut/dthemes/drafts/cossi_en.html
Version control and revisions
Outsourcing versus in-house development
The user interfaces and the database applications have mainly been developed and built in-house. In principle, the applications developed at Statistics Finland can be shared free of charge with other statistical organisations. Where necessary, test use and access to more detailed descriptions etc. can be agreed separately.
Sharing software components of tools
Overview of roles and responsibilities
A high-level organisation chart is available on the Statistics Finland website: http://tilastokeskus.fi/org/tilastokeskus/organisaatio_en.html
In connection with Statistics Finland's organisational change, the Metadata Services unit was transferred on 1 January 2013 from the Information Technology Department to the new Standards and Methods Development Department. The task of the Standards and Methods Development Department is to steer the statistics production process, to support statistical methodology in statistics production, to promote the uniform application of metadata and classifications and quality work in statistics production, and to intensify project work. The main guidelines for developing the metadata systems at Statistics Finland are coordinated and prepared jointly by the Standards and Methods Development Department and the IT Department.
The tasks of the Metadata Services unit remained unchanged, except that the technical maintenance and development of the metadata system stayed in the Information Technology Department. The Metadata Services unit still maintains classification standards, concepts and the archiving metadata system. The statistical departments maintain their own (statistical) metadata in the centralised metadata systems according to instructions prepared by the Metadata Services unit. The unit also trains and advises the statistical departments on metadata issues and is in charge of quality control in the metadata systems. The statistical departments set out the main requirements for the metadata databases and their use in the various phases of the statistical processes.
For several years, Statistics Finland has had an unofficial interest group working on metadata issues, the Metadata Coordination Group, which anyone interested in metadata issues can attend. In 2009, an official group was appointed, consisting of members who work with metadata issues and permanent members from all statistical departments.
The group has a plan of action that is updated yearly, and it provides an important forum for presenting and discussing developments related to metadata systems. Its goal is to widen knowledge of metadata and metadata systems and to inform its members of progress in metadata work in-house, nationally and internationally. As stated above in Section 5.3, Statistics Finland applies the CoSSI information model for storing metadata. The CoSSI model steering group is in charge of managing and developing the model according to user needs without putting its main structure at risk.
Metadata management team
Training and knowledge management
Training and knowledge management of metadata experts:
New experts at Metadata Services have been trained mainly through mentoring and guidance from senior experts. ESTP (European Statistical Training Programme) courses on metadata have also been used for training.
Training and knowledge management of metadata of statistical departments:
Statistics Finland provides structured basic and advanced courses on statistics and statistical processes for new recruits and statisticians. These courses also cover the basics of the present metadata systems, mainly classifications, concepts and archive data management. Training becomes increasingly important as new common metadata systems and tools are introduced.
The Metadata Services unit provides the personnel with informative briefings whenever there are major modifications to metadata systems either in content or in application development. Half-day metadata seminars are organised a couple of times a year to present topical metadata issues.
In connection with the implementation project of the variable editor, an extensive training programme was carried out. As part of it, a one-day course was designed and organised for statistical experts on preparing data descriptions and using the common tools for doing so. During the project, training days were held around twice a month. A similar model will probably be applied when other parts of the metadata system are introduced.
Much of the training for users of the metadata systems is done side by side, which tends to be the most efficient way.
Partnerships and cooperation
Statistics Finland cooperates with organisations and participates in working groups that define standard classifications or standards at both international and national levels. Metadata experts regularly attend Eurostat's Metadata Working Group meetings as well as METIS meetings. Statistics Finland also has representatives in the PC-Axis Reference Group and Eurostat's Quality Working Group. Spatial metadata experts follow INSPIRE metadata work and attend national working groups on implementing the INSPIRE metadata process.
Other issues
Lessons learned
A metadata system complying with the uniform architecture is not just a technological renewal: its implementation requires changes in work procedures, responsibilities and the organisation of tasks. Above all, the change in work procedures means recording metadata in a timely manner in connection with data planning and production, moving from irregular, retrospective description to regular, up-to-date description. This change is a challenge for the systems and applications, because work procedures change only if the technology allows it. An optimal procedure can be realised only if users find the applications easy to use and helpful in their work. The basis for metadata work must be seen to lie in the content, not in the technology.
Management's commitment to and support for the work are crucial, both to enable the statistical units to contribute to the development work and to ensure that the work is sustained. The centralised metadata system should support the harmonisation of statistics production sufficiently to make it more effective, but it should also be flexible enough to serve the specific needs of individual statistics. This calls for involving the statistics in the planning and for implementations that originate from the statistics concerned.