3.1 Metadata Classification
One way to understand the role of metadata is to identify the functions they serve in the different statistical processes and their respective tasks:
Statistical metadata functions:
- Contextualising data and supporting their dissemination and re-use;
- Giving information on the quality of the data provided;
- Harmonising concepts, classifications and questions, promoting comparability of information;
- Documenting production processes.
Statistical metadata include a wide range of attributes. We can therefore consider another level of classification:
- Survey metadata - in this category we consider all metadata for characterising the survey and schedules required for the planning and dissemination of data (attributes of Chapter I of the methodological document - general characterisation of the survey - and the schedules for data collection and dissemination of results).
- Methodological metadata - a description of the methods supporting the processes associated with the survey (attributes of Chapter II of the methodological document - methodological characterisation of the survey).
- Definitional metadata - includes the concepts, classifications, definitions of variables and questionnaires used.
- Quality metadata - includes all the attributes in the quality reports and indicators defining the quality of a survey.
- System metadata - information required by operating systems and programs to function properly. It is intended to describe the physical representation of data and other technological aspects, and to support exchanges of information between systems.
3.2 Metadata used/created at each phase
In the SP's metadata system, most metadata are inserted during the design and dissemination phases. In the operation phase, the data collection process is the one that collects and uses the most metadata. This assessment is based on an analysis of the electronic collection system (WebInq) and of the projects for the "universes and samples", "surveys process management", "statistical burden indicators" and "household survey questionnaire systems".
WebInq is an online service available on the Official Statistics website for electronic data collection. It allows respondents to answer SP surveys in different ways:
- Filling in an electronic form online;
- Filling in XLS (Excel) files and sending them by email;
- Uploading XML files.
For each survey whose data can be collected in this system, we describe some of the characteristics included in its methodological document and show an image of the questionnaire held in the data collection instruments system.
The information from the methodological document visible on WebInq comprises: description, objectives, legal framework, type of survey, geographical scope, data reference period, data collection period, and the concepts and classifications used. The surveys are identified in the system by the survey code used in the metadata system.
Universes and Samples management system
This system is in its initial implementation phase and its purpose is the integrated management of an annual universe frame to support all the surveys based on the "enterprise" statistical unit. Two other sub-universe frames are created on the basis of this universe, one to support short term surveys and the other to support structural surveys. The sample frame and the samples are selected from these sub-universes. The entities making up this system are: universe frame, sub-universe frame, sample frame, sample and stratum. The attributes of the survey entity that are not featured in the methodological document, but are required by the entity, will also be defined in the system.
For a survey to be processed in this system, its methodological document must be registered in the metadata system.
In order to characterise the samples in a survey, certain attributes must be defined and identified: the universe with which it is associated, the names of the stratum variables, the names of the changeable variables, frequency, possibility of replacements and the replacement method and associated questionnaires, among others.
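The attributes listed above can be pictured as a single record per sample. The following is only a minimal sketch, assuming a Python representation: the class name `SampleDefinition`, its field names and the validation rules are hypothetical and are not taken from the system itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SampleDefinition:
    """Hypothetical record of the attributes that characterise a sample."""
    survey_code: str                  # survey code used in the metadata system
    universe: str                     # universe with which the sample is associated
    stratum_variables: List[str]      # names of the stratum variables
    changeable_variables: List[str]   # names of the changeable variables
    frequency: str
    replacements_allowed: bool
    replacement_method: Optional[str] = None
    questionnaires: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record is complete.

        The checks are illustrative assumptions, not documented rules.
        """
        problems = []
        if not self.survey_code:
            problems.append("survey code is missing")
        if not self.universe:
            problems.append("associated universe is missing")
        if not self.stratum_variables:
            problems.append("at least one stratum variable is required")
        if self.replacements_allowed and not self.replacement_method:
            problems.append("replacement method is required when replacements are allowed")
        return problems


# Example with made-up values.
sample = SampleDefinition(
    survey_code="S001",
    universe="U-annual",
    stratum_variables=["region", "activity"],
    changeable_variables=["turnover"],
    frequency="annual",
    replacements_allowed=True,
)
print(sample.validate())
```

In this sketch the record is rejected until a replacement method is supplied, which mirrors the requirement that the replacement method be identified whenever replacements are possible.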
In this system, surveys are identified by the code used in the metadata system.
Collection process management system
This system is in the development phase and will provide cross-cutting support to the different components of the data collection process for self-completion surveys:
- Control of the data collection operation;
- Despatch of survey;
- Receipt of responses;
- Preparation of auxiliary charts for controlling responses.
This system interacts with:
- "Universes and samples management system" - importing samples for the despatch of the survey and updating samples on the basis of information on replacements;
- Planning and control system - importing established schedules;
- Metadata system - obtaining information on the characteristics and code of the survey.
Statistical burden indicators
This system is in the planning stages and, when implemented, will be a tool for analysing statistical burden and the enterprise response rate. It is expected to draw the following from the metadata system: survey code and name, registration number and questionnaire name, association of questionnaires with surveys, frequency and the variables observed in the questionnaires.
This database of indicators supports the dissemination of statistics. The data disseminated are accompanied by their metadata. The statistical metadata come from the metadata system and no indicator can be provided without being recorded in the variables subsystem of the metadata system.
The variables defining an indicator (the measure variable and its dimensions) are recorded in the variables subsystem, along with a cross-reference between the variables that define the indicator.
There are rules on naming variables and indicators. The value domains of the dimensions are also recorded in the classification subsystem and the concepts measured by the variables in the concept subsystem. Each indicator is given a code which links the two systems.
The metadata attributes provided for each indicator are its name, frequency, source, unit of measure, associated concepts, definition, formula and other contextual information.
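The rule that no indicator can be provided unless its variables are recorded in the variables subsystem can be illustrated with a short sketch. It assumes a Python representation in which the variables subsystem is reduced to a set of registered variable names; the class `Indicator`, the function names and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Indicator:
    """Hypothetical indicator record with a subset of the attributes listed above."""
    code: str                     # code linking the indicators database and the metadata system
    name: str
    frequency: str
    source: str
    unit_of_measure: str
    measure_variable: str         # the variable being measured
    dimensions: Tuple[str, ...]   # variables by which the measure is broken down


def missing_variables(indicator: Indicator, registered: Set[str]) -> List[str]:
    """Return the indicator's variables that are not in the variables subsystem."""
    needed = (indicator.measure_variable, *indicator.dimensions)
    return [name for name in needed if name not in registered]


def can_disseminate(indicator: Indicator, registered: Set[str]) -> bool:
    """An indicator may be disseminated only if all its variables are registered."""
    return not missing_variables(indicator, registered)


# Example with made-up values: one dimension has not been registered yet.
registered_variables = {"resident_population", "sex"}
indicator = Indicator(
    code="IND0001",
    name="Resident population by sex and age group",
    frequency="annual",
    source="hypothetical survey",
    unit_of_measure="number",
    measure_variable="resident_population",
    dimensions=("sex", "age_group"),
)
print(missing_variables(indicator, registered_variables))  # ['age_group']
print(can_disseminate(indicator, registered_variables))    # False
```

Returning the list of missing variables, rather than a bare yes or no, makes it clear what has to be recorded before the indicator can be released.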
The table below shows which metadata entities are inserted (I), updated (U) or consulted (C) in the different phases, processes or documents produced in the life cycle of statistical operations.
Fig. 13. Metadata entities and the life cycle of statistical operations
Please note that the systems and documents in this table that have not yet been implemented are shaded to distinguish them from those that have.
3.3 Metadata relevant to other business processes
There are close links between the activity planning, human resources planning and budget subsystems. In the second half of each year, departments prepare the activity plan for the following year. This plan lays out all the surveys to be undertaken in that year in national statistical production. When the unit heads draw up the plan, they define the survey schedules and allocate human resources on a person-hour basis. Personnel costs are calculated automatically on the basis of this allocation. The activity plan enters some characteristics of the surveys into the metadata system, such as their code, name, frequency, responsible entity, type of survey, observation unit and sample size. Each activity is given a code for analytical accounting, and this is the code used to draw up the budget. The plan also refers to this code, and the methodological document stores it, so that the three systems can be linked.
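As an illustration of how personnel costs might follow from the person-hour allocation, and how the activity code ties the figures together, consider the minimal sketch below. The hourly rates, the staff categories and every name in the code are assumptions made for the example; the source does not describe the actual costing formula.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Allocation:
    """Hypothetical allocation of staff time to an activity in the activity plan."""
    activity_code: str     # analytical accounting code shared by plan, budget and methodological document
    staff_category: str    # assumed grouping used to look up an hourly rate
    person_hours: float


def personnel_cost(allocations: List[Allocation],
                   hourly_rates: Dict[str, float]) -> Dict[str, float]:
    """Sum person-hours multiplied by the hourly rate, grouped by activity code."""
    totals: Dict[str, float] = {}
    for item in allocations:
        cost = item.person_hours * hourly_rates[item.staff_category]
        totals[item.activity_code] = totals.get(item.activity_code, 0.0) + cost
    return totals


# Example with made-up codes, hours and rates.
plan = [
    Allocation("ACT-100", "statistician", 10),
    Allocation("ACT-100", "interviewer", 5),
    Allocation("ACT-200", "statistician", 2),
]
rates = {"statistician": 30.0, "interviewer": 20.0}
print(personnel_cost(plan, rates))  # {'ACT-100': 400.0, 'ACT-200': 60.0}
```

Because the totals are keyed by the activity code, the same key could be used to look up the corresponding budget line and methodological document, which is the linkage the paragraph above describes.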