- Created by Fiona Willis-Núñez, last modified on 02 Dec, 2014
| Contact person* | Alice Born |
|---|---|
| Job title | Director, Standards Division |
| Email | alice.born@statcan.gc.ca |
| Telephone | +1 613.951.8577 |
Metadata strategy
Statistics Canada is undergoing an agency-wide modernization initiative to promote organizational efficiency, increase robustness of systems and processes and shorten implementation time for new projects. One of the key principles guiding this review is: Create metadata at the beginning of every process and use them throughout the project life cycle. A working group with members representing all phases of the statistical business process has been developing a strategy for statistical metadata management and an action plan to support these principles. The goals being considered for the strategy relate to four themes: drive, make available, structure and manage. Actions required to implement the strategy are expected to cover:
- Establishing governance;
- Creating centres of expertise;
- Adopting standard structure and content specifications (including GSIM);
- Developing a metadata portal to identify authoritative sources of metadata and make them available to users; and
- Identifying metadata gaps required to enable metadata-driven processes.
Current situation
The strategy for statistical metadata management and high-level actions are being reviewed by senior management. Implementation of activities identified in the action plan is expected to begin in spring 2013. This includes the adoption of GSIM as a recommended reference model for the Agency. Anticipated challenges to implementing the strategy and GSIM include limited resources, ongoing work, the transition to a service-oriented architecture (SOA) and Government of Canada initiatives (e.g., the Open Data project). Establishing governance, assigning a project manager to lead implementation and validating GSIM through a pilot project are key steps towards realizing a metadata-driven environment.
Metadata Classification
GSIM is being adopted to specify, design, and implement components that will easily integrate into “plug’n’play” solution architectures and seamlessly link to standard exchange formats (e.g. DDI, SDMX). It is important to note that GSIM does not make assumptions about the standards or technologies used to implement the model, which leaves the Agency room to determine its own implementation strategy.
Statistics Canada is beginning to use GSIM’s Concepts and Structures Groups as the main classifiers of metadata. These groups contain the conceptual and structural metadata objects, respectively, that are used as inputs and outputs in a statistical business process. The Structures group defines the terms used in relation to data and their structure. The Concepts group defines the meaning of data, providing an understanding of what the data are measuring.
Work focuses on aligning the new GSIM-based classification with other internal metadata classification models currently in use. For instance, the Integrated Business Surveys Project (IBSP) identifies the following types of metadata:
- Reference metadata: Describes statistical datasets and processes.
- Definitional metadata: Describes statistical data in terms meaningful to the business user community, e.g., concepts, definitions, variables, classifications, value meanings and domains.
- Quality metadata: Quality evaluation of a dataset or individual records; helps users assess the fitness of associated data for their specific purposes, e.g., coefficient of variation (CV), rolling estimates, analysts' comments about the quality of a set of records.
- Operational metadata: Links between the concepts and the physical data.
- Processing specifications: Capture, edit and output specifications and processing flags.
- Processing results: Content, outcomes, outputs of processing.
- Paradata: Data from the collection operation or statistical analysis used to support decision making in the survey process or statistical analysis. These include system logs, history files and comments.[1]
- Systems metadata: Low-level information about files, servers and infrastructure that allows the physical IT environment to be updated without re-specification by the end user.
[1] For example: analyst comments about their analysis, output of statistical processes; respondent comments, interviewer comments or additional information about the respondent obtained during collection.
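The eight IBSP metadata types listed above can be treated as a controlled vocabulary for tagging metadata items. The following is a minimal sketch only: the class names, field names and sample items are hypothetical and are not taken from any Statistics Canada system.

```python
from dataclasses import dataclass
from enum import Enum


class MetadataType(Enum):
    """The eight IBSP metadata types named in the list above."""
    REFERENCE = "Reference metadata"
    DEFINITIONAL = "Definitional metadata"
    QUALITY = "Quality metadata"
    OPERATIONAL = "Operational metadata"
    PROCESSING_SPECIFICATIONS = "Processing specifications"
    PROCESSING_RESULTS = "Processing results"
    PARADATA = "Paradata"
    SYSTEMS = "Systems metadata"


@dataclass(frozen=True)
class MetadataItem:
    # All field names are illustrative assumptions.
    name: str            # short label for the metadata item
    kind: MetadataType   # which IBSP type it belongs to
    describes: str       # identifier of the dataset or process it describes


def group_by_type(items):
    """Return a dict mapping each metadata type to the item names tagged with it."""
    grouped = {}
    for item in items:
        grouped.setdefault(item.kind, []).append(item.name)
    return grouped


# Hypothetical sample items, loosely following the examples in the list above.
items = [
    MetadataItem("coefficient of variation", MetadataType.QUALITY, "estimates_table"),
    MetadataItem("interviewer comments", MetadataType.PARADATA, "collection_operation"),
    MetadataItem("processing flags", MetadataType.PROCESSING_SPECIFICATIONS, "edit_step"),
    MetadataItem("analyst comments on record quality", MetadataType.QUALITY, "microdata_file"),
]
grouped = group_by_type(items)
```

Using a closed enumeration rather than free-text labels makes it harder for two systems to drift apart in how they name the same metadata type, which is one way the alignment work described above could be supported.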
Metadata system(s)
(a) Integrated Metadata Base (IMDB)
The IMDB is based on the ISO/IEC 11179 Metadata Registries standard and the Corporate Metadata Repository (CMR) model. The metadata layer extends across all phases of the statistical business process and can support disseminated data, analysis, archived data files, and the planning and design of surveys. Metadata in the IMDB is beginning to be linked to some data warehouses, which hold both microdata and aggregate data, and could potentially be used for data analysis, including data benchmarking and data confrontation.
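In ISO/IEC 11179, a data element pairs a data element concept (an object class plus a property) with a value domain. The sketch below illustrates only that general idea; the class and field names are assumptions for illustration, the sample values are hypothetical, and nothing here reflects the actual IMDB schema, which this page does not describe.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataElementConcept:
    object_class: str    # the kind of thing being described, e.g. "Person"
    property_name: str   # the characteristic of interest, e.g. "age"


@dataclass(frozen=True)
class ValueDomain:
    name: str
    # Enumerated permitted values; left empty here for a domain that is
    # described rather than enumerated.
    permitted_values: Tuple[str, ...] = ()


@dataclass(frozen=True)
class DataElement:
    concept: DataElementConcept
    value_domain: ValueDomain

    def label(self) -> str:
        """Build a readable label from the concept and its representation."""
        c = self.concept
        return f"{c.object_class}.{c.property_name} [{self.value_domain.name}]"


# Hypothetical example: one concept represented in two different ways.
age = DataElementConcept("Person", "age")
age_in_years = DataElement(age, ValueDomain("completed years"))
age_in_groups = DataElement(age, ValueDomain("age groups", ("0-14", "15-64", "65+")))
```

Separating the concept from its representation lets two surveys that record the same characteristic differently still point to a single shared definition.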
(b) Integrated Business Surveys Project (IBSP)
The business statistics program includes approximately 250 surveys and administrative-based programs. The IBSP was initiated in April 2010 to make use of shared and generic corporate services and systems for collecting, processing, disseminating and storing statistical information. Content for business surveys is to be harmonized wherever possible[1] and the approach to data analysis streamlined across programs.
(c) The Common Tools Project
The goal of the Common Tools Project is to implement a harmonized set of processes and tools to support social surveys. The project is divided into two primary environments, the Social Survey Metadata Environment (SSME) and the Social Survey Processing Environment (SSPE). SSME uses four tools to feed information to and from a metadata repository. This allows metadata to be documented once and reused throughout the process to improve quality and generate efficiencies.
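The "document once, reuse throughout" idea can be sketched as a small repository that refuses duplicate definitions and serves the same entry to different process steps. This is an illustrative assumption only: the class, functions and sample variable are hypothetical and do not represent the SSME tools, which are not detailed here.

```python
class MetadataRepository:
    """A toy in-memory repository; each key may be documented only once."""

    def __init__(self):
        self._definitions = {}

    def register(self, key, definition):
        # Reject a second definition so every consumer shares one source.
        if key in self._definitions:
            raise ValueError(f"{key!r} is already documented; reuse the existing entry")
        self._definitions[key] = definition

    def lookup(self, key):
        return self._definitions[key]


def build_question(repo, key):
    """Questionnaire step: derive the question text from the stored label."""
    return f"{repo.lookup(key)['label']}?"


def validate_response(repo, key, value):
    """Processing step: check a response against the same stored definition."""
    definition = repo.lookup(key)
    return isinstance(value, int) and value >= definition["minimum"]


repo = MetadataRepository()
# Hypothetical variable definition, entered once.
repo.register("household_size", {"label": "Number of persons in household", "minimum": 1})

question = build_question(repo, "household_size")
ok = validate_response(repo, "household_size", 3)
too_small = validate_response(repo, "household_size", 0)
```

Because both steps read from the same entry, a correction to the definition reaches questionnaire design and processing at the same time.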
Costs and Benefits
Detailed costs associated with implementing these projects are not yet available. Benefits from establishing the IMDB include more rigorous information and metadata management, harmonization and standardization of concepts, knowledge sharing and reuse of information assets.
The IBSP has incorporated good metadata management and is starting to integrate systems that are metadata-driven and that optimize the use of corporate services such as collection and methodology. Reuse of content modules, increased use of administrative data and the adoption of electronic questionnaires are expected to reduce micro-editing, standardize methods and processes and lead to operational efficiencies.
Creating a metadata environment (SSME) within the Common Tools project has facilitated the transfer of information between business processes through a suite of tools for survey documentation.
Overall, these projects are reducing system maintenance, development and training costs while improving sharing of standards and best practices.
Implementation strategy
(a) Integrated Metadata Base (IMDB)
The IMDB has been implemented using a "step-wise" approach with three development phases and future opportunities to re-use metadata and expand the IMDB metadata model to link to other information systems in the Agency.
Phase 1 produced a set of static web pages displaying data sources and methods for each statistical program and survey. These were accessible to external users through hyperlinks from data tables and publications on the Statistics Canada website. Internal users could browse the full inventory through the Agency’s Intranet site.
In 2000, Phase 2 began collecting reference metadata including survey methodology and data accuracy. Information was formatted and validated by subject matter areas before being loaded into the IMDB. Like the initial phase, Phase 2 information is available to external users through hyperlinks on the website and an internal version on the Intranet site. Updates are triggered by new data releases so that metadata accompanies every release.
The past 10 years have seen improved quality of Phase 2 content and a push to include more information in the IMDB. Phase 3 has been initiated to add definitions of concepts, variables and classifications for all subject matter areas. This work is expected to be completed by spring 2015.
(b) Integrated Business Surveys Project (IBSP)
The IBSP has produced a semantic model with a proposed IBSP metadata classification, along with an agreed-upon vocabulary of terms and definitions. Details of metadata by system component are being identified. Current plans are to develop a single portal that makes overarching metadata accessible, together with the data it describes and the processes it controls. This tool is expected to enhance reporting capabilities and built-in quality assurance.
(c) The Common Tools Project
Systems have been developed using the Rational Unified Process (RUP), an iterative approach. Using this method, several tools are in development concurrently. The priority is to deliver basic functionality as quickly as possible and to combine all tools into an integrated processing and metadata management environment. The initial phase of the project created a processing environment and associated standards including naming conventions and directory structures. Subsequent phases cover individual processing steps.
IT Architecture
Metadata Management Tools
Standards and formats
Version control and revisions
Outsourcing versus in-house development
Sharing software components of tools
Overview of roles and responsibilities
Metadata management team
Training and knowledge management
Partnerships and cooperation
Other issues
Lessons learned