Contact person* | |
---|---|
Job title | Specialist of Databases and Metadata Sector |
Telephone |
Metadata strategy
One of the main priorities of the five-year strategic plan, the Official Statistical Program (PSZ 2011-2016), was to create a data warehouse (DW) that integrates INSTAT's databases, including metadata. The data warehouse, with integrated metadata, is planned as the means of integrating administrative data and survey data.
From an information technology perspective, statistical information systems have always posed challenges, from metadata repositories and data warehousing to data mining. INSTAT receives a large amount of administrative data from other institutions. The lack of electronic exchange of administrative data hampers the production of official statistics under the multi-year program of official statistics. Implementing a system that automates the collection and processing of administrative sources will increase quality and reduce the time needed to process and disseminate statistics.
Within a statistical office, a statistical data warehouse can be defined in general terms as a single store of data and metadata that are acquired from different sources, collected and combined into one structure, documented in a standard format, and stored in a facility that allows users to view, query, combine and download data for analysis at different levels. To achieve these goals, the data warehouse should be established not only as a storage place but also as a means of documenting the overall data warehousing process, in which the institution collects, transforms and loads data into different physical systems optimized for decision making. The solution should be metadata-oriented. The design, development and implementation of the INSTAT data warehouse (DW) consists of:
• Defining the requirements for the DW;
• Identifying the most appropriate software for managing the DW;
• Developing the DW system;
• Populating the DW with statistical information and defining standard procedures to maintain it;
• Making data.
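The collect, transform and load cycle described above, with metadata stored next to the data, can be sketched as follows. This is a minimal illustration only, not INSTAT's design: the table names, the sample records and the placeholder definitions are all assumptions, and an in-memory SQLite database stands in for the real warehouse platform.

```python
import sqlite3

# Hypothetical source records; real inputs would come from provider systems.
ADMIN_ROWS = [{"id": "A1", "employees": "12"}, {"id": "A2", "employees": "7"}]
SURVEY_ROWS = [{"unit": "A1", "turnover": 150.0}, {"unit": "A3", "turnover": 80.5}]


def build_warehouse(admin_rows, survey_rows):
    """Load two sources into one in-memory store, together with metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact (unit_id TEXT, variable TEXT, value REAL, source TEXT)"
    )
    conn.execute(
        "CREATE TABLE variable_metadata "
        "(variable TEXT PRIMARY KEY, definition TEXT, source TEXT)"
    )

    # Transform: harmonize identifiers and value types into one long format.
    facts = [(r["id"], "employees", float(r["employees"]), "admin") for r in admin_rows]
    facts += [(r["unit"], "turnover", float(r["turnover"]), "survey") for r in survey_rows]
    conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?)", facts)

    # Load the metadata alongside the data (placeholder definitions).
    conn.executemany(
        "INSERT INTO variable_metadata VALUES (?, ?, ?)",
        [
            ("employees", "Number of employees (placeholder definition)", "admin"),
            ("turnover", "Annual turnover (placeholder definition)", "survey"),
        ],
    )
    conn.commit()
    return conn


def documented_values(conn, unit_id):
    """Return (variable, value, definition) rows for one statistical unit."""
    query = (
        "SELECT f.variable, f.value, m.definition FROM fact f "
        "JOIN variable_metadata m ON m.variable = f.variable "
        "WHERE f.unit_id = ? ORDER BY f.variable"
    )
    return conn.execute(query, (unit_id,)).fetchall()
```

Calling `documented_values(build_warehouse(ADMIN_ROWS, SURVEY_ROWS), "A1")` returns each value for unit `A1` together with the definition of its variable, which is the sense in which the store is metadata-oriented.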
Standardizing data transfer from the government organizations that provide administrative data to the statistical office will have a positive impact on the overall statistical system. Automating the collection and processing of administrative sources consists of:
• Identifying all data sources required by INSTAT, according to the Program;
• Defining the data exchange protocol;
• Implementing a system for the exchange of data.
One of our main objectives is to have a fully operational and integrated metadata system in the INSTAT database system by 2018, fully compatible with ESMS. To achieve this goal we have implemented the MetaPlus system and created a documentation team that includes everyone responsible for documenting in MetaPlus. Our future work will include:
• Conduct documentation workshops.
• Continue documenting the content of the statistical activities in MetaPlus.
• Monitor the quality of the existing Quality declarations and SCBDOK documentation.
• Carry out SIMS training.
• Set up a simple structure in accordance with SIMS that can provide ESMS and ESQRS metadata in line with the National Reference Metadata Editor (NRME). This solution will be based on mapping the existing Quality declarations and SCBDOK documentation to SIMS.
• Continue to document classifications in MetaPlus.
• Define statistical concepts and classifications used within SIMS, based on GSIM concepts structure.
INSTAT will establish a standard for quality reporting that follows the Eurostat SIMS standard (Single Integrated Metadata Structure, which ties together ESMS and ESQRS). This allows future reporting to Eurostat. Subject-matter, methodology and IT staff will carry out workshops together, led by the metadata team, to produce the documentation. The workshops will focus on both MetaPlus and SIMS.
The starting point for a metadata system is the harmonization of classifications and concepts (variables). Along with the documentation workshops, a list of the statistical concepts (variables) used in SIMS will be created as a first step towards variable harmonization.
Current situation
Metadata is important for users because all the variables in a survey have to be explained to ensure transparency and credibility and to meet established quality criteria. We are currently working on the implementation of a standardized system for documentation. The main purpose of this system is to develop a sustainable statistical system in the country, facilitating decision-making based on relevant and reliable statistical information that meets domestic needs and complies with EU requirements. We currently use the MetaPlus system to document statistical activities and administrative datasets. MetaPlus was implemented with the help of experts from Statistics Sweden, and we are now working on the documentation. According to the experts, we should continue by documenting the administrative databases and the censuses (the Population and Household Census 2011, the Census of non-Agriculture Economic Enterprises 2010 and the Census of Agriculture Holdings 2012). These contain many concepts/variables that are also used in other statistics, and the "original source" should ideally be documented first where possible. The priority order of documentation is as follows:
• Administrative databases
• Censuses
• Registers (business register is done)
• Surveys
• National Accounts
• Secondary registers
Currently we are using MetaPlus to document final datasets (structural metadata). In a second phase we will document the production processes as well (referential metadata).
Metadata Classification
The main functions for statistical metadata are:
- Contextualizing data, supporting their dissemination and re-use;
- Giving information on the quality of the data provided;
- Harmonizing concepts, classifications and questions, promoting comparability of information;
- Documenting production processes.
Metadata can be divided into two main groups:
- Structural metadata: This type of metadata is used to define the data structures. Variable names, classifications, standard code lists, variable types, data set definitions are parts of the structural metadata.
- Reference metadata: This type of metadata describes the content and quality of statistical data. The aim and scope of the study, data collection and processing methods, and quality indicators are parts of the reference metadata.
Statistical metadata include a wide range of attributes. We can therefore consider another level of classification:
- Survey metadata: In this category we consider all metadata for characterizing the survey and schedules required for the planning and dissemination of data.
- Methodological metadata: A description of the methods supporting the processes associated to the survey.
- Definitional metadata: Includes the concepts, classifications, definitions of variables and questionnaires used.
- Quality metadata: Includes all the attributes in the quality reports and indicators defining the quality of a survey.
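The two levels of classification above can be read as two independent tags on each metadata item. The sketch below is one possible way to model this, not the MetaPlus data model; the class names and the tagging of the sample items are assumptions drawn from the examples given in the lists above.

```python
from dataclasses import dataclass
from enum import Enum


class MetadataGroup(Enum):
    STRUCTURAL = "structural"
    REFERENCE = "reference"


class MetadataCategory(Enum):
    SURVEY = "survey"
    METHODOLOGICAL = "methodological"
    DEFINITIONAL = "definitional"
    QUALITY = "quality"


@dataclass(frozen=True)
class MetadataItem:
    name: str
    group: MetadataGroup
    category: MetadataCategory


# Sample items taken from the examples in the text; the tagging is an assumption.
ITEMS = [
    MetadataItem("variable name", MetadataGroup.STRUCTURAL, MetadataCategory.DEFINITIONAL),
    MetadataItem("standard code list", MetadataGroup.STRUCTURAL, MetadataCategory.DEFINITIONAL),
    MetadataItem("data collection method", MetadataGroup.REFERENCE, MetadataCategory.METHODOLOGICAL),
    MetadataItem("quality indicator", MetadataGroup.REFERENCE, MetadataCategory.QUALITY),
]


def names_in_group(items, group):
    """Names of all items that belong to the given main group."""
    return [item.name for item in items if item.group is group]


def names_in_category(items, category):
    """Names of all items that belong to the given second-level category."""
    return [item.name for item in items if item.category is category]
```

For example, `names_in_group(ITEMS, MetadataGroup.STRUCTURAL)` selects the items that define data structures, while `names_in_category(ITEMS, MetadataCategory.QUALITY)` selects those that would feed a quality report.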
Metadata system(s)
- SCBDoc (PszDoc)
In 2007, with the support of Statistics Sweden, INSTAT decided that all observation registers and production systems under its responsibility should be documented in the SCBDOK system (PszDoc). The purpose is to provide a detailed account of the process of creating a statistical register, from data collection to presentation. The SCBDOK documentation is created in free text format following a standard template containing the following seven chapters:
0. General information
1. Contents outline
2. Data collection
3. Final observation registers
4. Statistical processing and presentation
5. Data processing system
6. Log file
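Because the template has a fixed set of chapters, a draft can be checked for completeness mechanically. The following sketch assumes, purely for illustration, that a draft is held as a dictionary from chapter number to free text; SCBDOK itself does not prescribe this representation, and the function name is hypothetical.

```python
# Chapter numbers and titles as listed in the SCBDOK template above.
SCBDOK_CHAPTERS = {
    0: "General information",
    1: "Contents outline",
    2: "Data collection",
    3: "Final observation registers",
    4: "Statistical processing and presentation",
    5: "Data processing system",
    6: "Log file",
}


def missing_chapters(document):
    """List the template chapters that are absent or blank in a draft.

    `document` maps chapter numbers to free text (an assumed representation).
    """
    return [
        f"{number}. {title}"
        for number, title in SCBDOK_CHAPTERS.items()
        if not document.get(number, "").strip()
    ]
```

A draft containing only chapters 0 and 2 would be reported as missing chapters 1, 3, 4, 5 and 6, which gives a reviewer a quick checklist before approval.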
- MetaPlus
MetaPlus was implemented in October 2013. The Classification database is a subsystem of MetaPlus. Previously INSTAT used SCBDoc to document all statistical products and processes. Chapter 3 of the SCBDoc documentation is derived from MetaPlus. MetaPlus was developed for documenting final observation registers, that is, for describing the micro data. The model is general and can be used for all stages in the production process: it can describe raw data in the data collection phase as well as aggregated data used in tables in the statistical computation phase.
- Quality declarations
Some statistical products have quality declarations and reports. These reports are documented according to the ESS Handbook for Quality Reports and Eurostat expert recommendations. They are currently in Word or PDF format but can easily be converted to ESQRS, ESMS or SIMS.
- ESMS
Euro SDMX Metadata Structure (ESMS) is Eurostat’s SDMX based standard for reference metadata. INSTAT will start working on documenting the statistics according to this standard. Existing documentation (such as the work done on SCBDOK and on ESMS) will be reused when possible. After monitoring the quality of the existing ESMS and SCBDOK documentation INSTAT will:
- Establish ESMS as a standard for documenting reference metadata;
- Build up competence in the Single Integrated Metadata Structure, SIMS (which combines ESQRS and ESMS);
- Set up a simple template in accordance with SIMS that can provide ESMS and ESQRS metadata in line with the European Statistical System Metadata Handler (ESS MH).
Costs and Benefits
Implementation strategy
IT Architecture
According to the Official Statistical Program 2016, INSTAT should develop the data warehouse (DW). One of the key prerequisites is to build a system for structural metadata. Referential metadata is very important from the statistical point of view; it describes the quality of the processes conducted to produce the statistical output. At present we do not have a single integrated system for structural and referential metadata, but the MetaPlus system offers the possibility of adding further information to the structural metadata. This additional information can be quality declarations, SIMS or ESMS (SDMX metadata system) templates, PSZDoc reports, etc.
The physical database model for the Central MetaData Repository is built on MetaPlus from Statistics Sweden. In October 2013 the MetaPlus model was adapted to meet the needs of INSTAT. The system is implemented on Microsoft SQL Server 2008 R2. Microsoft VB.NET is used for the development of the maintenance tool. Bilingual documentation (Albanian and English) is now possible.
Current situation in INSTAT
Metadata Management Tools
MetaPlus
INSTAT uses MetaPlus as its metadata management tool. MetaPlus is a metadata system for variables and their components (objects and value domains), linked to populations. Version 1 of MetaPlus is a basic version of the system that focuses on the needs of those who document, i.e. it is primarily a system for documenting final observation registers (micro data). Version 2 is adapted to the use of different languages.
In MetaPlus, content is linked to a hierarchical register structure. In this context a register is defined as data used to produce statistics. This definition of a register is not intended to apply outside MetaPlus.
Since the structure is hierarchical, properties are inherited downward from the previous level. A register can have several variants, and a register variant in turn can have several register versions.
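The register, variant and version hierarchy with downward inheritance can be sketched as a simple tree. This is only an illustration of the idea, not the MetaPlus schema: the class name, the property names and the rule that a lower level may override an inherited value are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegisterNode:
    """One level in the hierarchy: a register, a variant or a version."""

    name: str
    parent: Optional["RegisterNode"] = None
    properties: dict = field(default_factory=dict)

    def child(self, name, **properties):
        """Create a node one level down that inherits from this node."""
        return RegisterNode(name, parent=self, properties=properties)

    def resolved(self):
        """Own properties merged over everything inherited from above.

        Assumption: a value set at a lower level overrides the inherited one.
        """
        inherited = self.parent.resolved() if self.parent else {}
        return {**inherited, **self.properties}


# Hypothetical example: register -> register variant -> register version.
register = RegisterNode("Business register", properties={"authority": "INSTAT"})
variant = register.child("Enterprises", object_type="enterprise")
version = variant.child("2013", reference_year=2013)
```

Here `version.resolved()` combines the authority set on the register, the object type set on the variant and the reference year set on the version, so each property only needs to be documented once, at the level where it applies.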
NACE Rev.2 Application
As part of implementing the new NACE Rev.2, a consultative application was built in our institution. Its aim is to facilitate the codification process for all staff who work on the nomenclature and codification of economic activities. Through this application you are able to:
- Search on a maximum of three words that might identify the economic activity of the enterprise. The application shows all alternatives (description and the respective NACE Rev.2 code) whose activity description contains the key words, which makes it easier to codify a given activity. The search function excludes words related to professions.
- Search on a NACE Rev.2 code (4 digits). The application shows the description of the code and its section, division, group and class, and can also show the corresponding code and description in NACE Rev.1.1.
- Search on a NACE Rev.1.1 code (4 digits). The application shows the description of the code and its section, division, group and class, and can also show the corresponding code and description in NACE Rev.2.
This application is particularly helpful during the transitional period from NACE Rev.1.1 to NACE Rev.2. A new web application with the same functionality has been developed by the IT staff.
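The two kinds of search described above can be sketched in a few lines. This is not the code of the INSTAT application: the function names, the profession word list, the rule that every key word must appear in the description, and the tiny sample of codes and correspondences are assumptions for illustration. The official NACE tables remain the authority for codes and correspondences.

```python
# Illustrative sample only: code -> (section, description).
NACE_REV2 = {
    "01.11": ("A", "Growing of cereals (except rice), leguminous crops and oil seeds"),
    "47.11": ("G", "Retail sale in non-specialised stores with food, beverages or tobacco predominating"),
    "62.01": ("J", "Computer programming activities"),
}
# Illustrative correspondence sample: Rev.2 code -> Rev.1.1 codes.
REV2_TO_REV1_1 = {"47.11": ["52.11"]}
# Assumed list of profession words that the search ignores.
PROFESSION_WORDS = {"farmer", "programmer", "seller"}


def search_keywords(query, max_words=3):
    """Return (code, description) pairs whose description contains every key word."""
    words = query.lower().split()
    if len(words) > max_words:
        raise ValueError(f"search accepts at most {max_words} words")
    keywords = [w for w in words if w not in PROFESSION_WORDS]
    if not keywords:
        return []
    return [
        (code, description)
        for code, (_section, description) in sorted(NACE_REV2.items())
        if all(word in description.lower() for word in keywords)
    ]


def describe_code(code):
    """Break a 4-digit Rev.2 class code into its levels and add the Rev.1.1 link."""
    section, description = NACE_REV2[code]
    return {
        "section": section,
        "division": code[:2],   # e.g. "47"
        "group": code[:4],      # e.g. "47.1"
        "class": code,          # e.g. "47.11"
        "description": description,
        "rev1_1": REV2_TO_REV1_1.get(code, []),
    }
```

For instance, `search_keywords("retail food")` finds the retail class in the sample, and `describe_code("47.11")` returns its section, division, group and class together with the Rev.1.1 code from the sample correspondence. The reverse lookup from Rev.1.1 to Rev.2 would use the same structure with the mapping inverted.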
Standards and formats
Version control and revisions
Outsourcing versus in-house development
Sharing software components of tools
Overview of roles and responsibilities
INSTAT has set up a management structure for metadata. It contains a technical part and a content part.
- The content group has a methodological scope covering the structure of metadata concepts such as objects, variables (statistical concepts) and populations. The content group has deep knowledge of the templates, systems and application structure, but not necessarily much subject-matter knowledge.
- The development team consists of IT-personnel with required competence in application development and database management.
The first main task for the Metadata group is to make a detailed plan for activities.
Metadata management team
Each subject-matter department has an examiner who is part of the department's metadata team. The documentation should be approved by an examiner, who checks that it follows the agreed structure and is of good quality. Roles in the MetaPlus application are adapted to INSTAT's structure.
Training and knowledge management
Subject-matter, methodology and IT staff will carry out workshops together, led by the metadata team, to produce the documentation. The workshops will focus on both MetaPlus and SIMS. Training of trainers in MetaPlus has been conducted with staff from Statistics Sweden.
Various documentation workshops and seminars will be conducted in the future for all INSTAT staff.
Partnerships and cooperation
Other issues
Lessons learned
Links: |
---|
INSTAT website: http://www.instat.gov.al |
NACE application: http://www.instat.gov.al/al/metodologji/aplikacione.aspx |