- Created by Fiona Willis-Núñez, last modified on 02 Dec 2014
| Contact person* | Alice Born |
|---|---|
| Job title | Director, Standards Division |
| Email | alice.born@statcan.gc.ca |
| Telephone | +1 613.951.8577 |
Metadata strategy
Statistics Canada is undergoing an agency-wide modernization initiative to promote organizational efficiency, increase robustness of systems and processes and shorten implementation time for new projects. One of the key principles guiding this review is: Create metadata at the beginning of every process and use them throughout the project life cycle. A working group with members representing all phases of the statistical business process has been developing a strategy for statistical metadata management and an action plan to support these principles. The goals being considered for the strategy relate to four themes: drive, make available, structure and manage. Actions required to implement the strategy are expected to cover:
- Establishing governance;
- Creating centres of expertise;
- Adopting standard structure and content specifications (including GSIM);
- Developing a metadata portal to identify authoritative sources of metadata and make them available to users; and
- Identifying metadata gaps required to enable metadata-driven processes.
Current situation
The strategy for statistical metadata management and high-level actions are being reviewed by senior management. Implementation of activities identified in the action plan is expected to begin in spring 2013. This includes the adoption of GSIM as a recommended reference model for the Agency. Anticipated challenges to implementing the strategy and GSIM include limited resources, ongoing work, the transition to a service-oriented architecture (SOA) and Government of Canada initiatives (e.g., the Open Data project). Establishing governance, assigning a project manager to lead implementation and validating GSIM through a pilot project are key steps towards realizing a metadata-driven environment.
Metadata Classification
GSIM is being adopted to specify, design, and implement components that will easily integrate into “plug’n’play” solution architectures and seamlessly link to standard exchange formats (e.g. DDI, SDMX). It is important to note that GSIM does not make assumptions about the standards or technologies used to implement the model, which leaves the Agency room to determine its own implementation strategy.
Statistics Canada is beginning to use GSIM’s Concepts and Structures Groups as the main classifiers of metadata. These groups contain the conceptual and structural metadata objects, respectively, that are used as inputs and outputs in a statistical business process. The Structures group defines the terms used in relation to data and their structure. The Concepts group defines the meaning of data, providing an understanding of what the data are measuring.
Work focuses on aligning the new GSIM-based classification with other internal metadata classification models currently in use. For instance, IBSP identifies the following types of metadata:
- Reference metadata: Describes statistical datasets and processes.
- Definitional metadata: Describes statistical data in terms meaningful to the business user community, e.g., concepts, definitions, variables, classifications, value meanings and domains.
- Quality metadata: Quality evaluation of a dataset or individual records; helps users assess the fitness of associated data for their specific purposes, e.g., CV, rolling estimates, analysts' comments about the quality of a set of records.
- Operational metadata: Links between the concepts and the physical data.
- Processing specifications: Capture, edit and output specifications and processing flags.
- Processing results: Content, outcomes, outputs of processing.
- Paradata: Data from the collection operation or statistical analysis used to support decision making in the survey process or statistical analysis. These include system logs, history files and comments.[1]
- Systems metadata: Low-level information about files, servers and infrastructure that allows the physical IT environment to be updated without re-specification by the end user.
[1] For example: analyst comments about their analysis, output of statistical processes; respondent comments, interviewer comments or additional information about the respondent obtained during collection.
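The IBSP metadata types listed above can be treated as a controlled vocabulary for tagging items in a metadata inventory. The following is a minimal Python sketch under that assumption; the `MetadataItem` class, the `group_by_type` helper and the item names are hypothetical and do not describe an actual Statistics Canada system.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class MetadataType(Enum):
    """The eight IBSP metadata types, used here as a controlled vocabulary."""
    REFERENCE = "Reference metadata"
    DEFINITIONAL = "Definitional metadata"
    QUALITY = "Quality metadata"
    OPERATIONAL = "Operational metadata"
    PROCESSING_SPECIFICATIONS = "Processing specifications"
    PROCESSING_RESULTS = "Processing results"
    PARADATA = "Paradata"
    SYSTEMS = "Systems metadata"


@dataclass(frozen=True)
class MetadataItem:
    name: str           # hypothetical identifier of a metadata item
    kind: MetadataType  # the type the item is classified under


def group_by_type(items):
    """Return a dict mapping each metadata type to the item names tagged with it."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item.kind].append(item.name)
    return dict(grouped)


# Illustrative items only, loosely based on the examples in the list above.
items = [
    MetadataItem("coefficient_of_variation", MetadataType.QUALITY),
    MetadataItem("interviewer_comment", MetadataType.PARADATA),
    MetadataItem("system_log", MetadataType.PARADATA),
]
inventory = group_by_type(items)
```

Using an enumeration rather than free-text labels means a misspelled type fails immediately instead of silently creating a new category.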
Metadata system(s)
Costs and Benefits
Implementation strategy
IT Architecture
Statistics Canada is moving towards a SOA. A key enabler of SOA is the Enterprise Application Integration Platform (EAIP), which allows the delivery of solutions based on metadata-driven, reusable software components and standards. Most business segments will benefit from the common core business services, standard integration platform, workflow and process orchestration enabled by the EAIP. The platform also simplifies international sharing and co-development of applications and components.
Web services currently in use and under development by EAS are associated with information objects representing core business entities (e.g., questionnaires, classifications, tax data, business registry) that are classified into GSIM’s Concepts and Structures groups. This also fits well with GSBPM: services provide the inputs and outputs to GSBPM statistical processes. They satisfy a basic set of SOA principles, i.e., they are loosely coupled (consumer and service are insulated from each other), interoperable (consumers and services function across Java, .NET and SAS), and reusable (they are used in multiple higher-level orchestrations and compositions). Work continues to establish a complete framework, including discoverability (via a service registry and inventory) and governance.
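To illustrate what discoverability via a service registry could look like, here is a minimal in-memory sketch in Python. It is not the EAIP registry: the `ServiceEntry` fields, the service names, the GSIM group assignments and the `example.invalid` endpoints are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    name: str                # hypothetical service name
    information_object: str  # core business entity the service exposes
    gsim_group: str          # "Concepts" or "Structures" (illustrative assignment)
    endpoint: str            # placeholder URL, not a real endpoint


class ServiceRegistry:
    """In-memory registry: services are published once and looked up by consumers."""

    def __init__(self):
        self._entries = {}

    def publish(self, entry):
        self._entries[entry.name] = entry

    def lookup(self, name):
        return self._entries[name]

    def find_by_object(self, information_object):
        """Return the names of all services exposing the given information object."""
        return sorted(
            entry.name
            for entry in self._entries.values()
            if entry.information_object == information_object
        )


registry = ServiceRegistry()
registry.publish(ServiceEntry(
    "classification-lookup", "classifications", "Concepts",
    "https://example.invalid/services/classification-lookup"))
registry.publish(ServiceEntry(
    "classification-versions", "classifications", "Concepts",
    "https://example.invalid/services/classification-versions"))
registry.publish(ServiceEntry(
    "questionnaire-retrieval", "questionnaires", "Concepts",
    "https://example.invalid/services/questionnaire-retrieval"))
matches = registry.find_by_object("classifications")
```

A consumer that resolves the endpoint through the registry at run time, rather than hard-coding it, stays insulated from changes in where the service is hosted, which is the loose coupling described above.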
At this point, Statistics Canada has a combination of services and silo-based/point-to-point integration that can be described as a combination of maturity levels 3 and 4 in terms of the Open Group Service Integration Maturity Model (OSIMM) maturity matrix (see Figure 1). During the transition years to a corporate-wide SOA, incremental changes are being made by applying SOA adoption and governance segment by segment, so that cross-silo services and consumers coexist with point-to-point integration of systems and data. Early adopters of SOA services include IBSP, SSPE and SNA.
Developing Data Service Centres (DSC) is a key initiative that fits into Statistics Canada’s emerging SOA. The objective of the DSC is to manage statistical information as an asset: to maximize its value by improving accessibility, utility, accuracy, security and transparency through the use of a centralized inventory of statistical data holdings, associated metadata and documentation. Key statistical files and associated standard metadata (e.g., file name, type, description, creators, owners) will be registered and integrated into statistical processes via SOA. This integration will rely on a data access layer with common interfaces to access statistical files without the user needing to know their location, format and/or technology.
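The idea of a data access layer that hides location and format from the user can be sketched as follows. This is a simplified Python illustration, not the DSC design: the `FileRecord` fields follow the standard metadata named above, while the class names, the two supported formats and the reader functions are assumptions.

```python
import csv
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class FileRecord:
    name: str         # registered file name used by consumers
    file_type: str    # "csv" or "json" in this sketch
    description: str
    owner: str
    location: Path    # physical location, hidden from consumers


def _read_csv(path):
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def _read_json(path):
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# One reader per supported format; adding a format does not change the interface.
READERS = {"csv": _read_csv, "json": _read_json}


class DataAccessLayer:
    """Central inventory plus a single read() interface for all registered files."""

    def __init__(self):
        self._inventory = {}

    def register(self, record):
        if record.file_type not in READERS:
            raise ValueError(f"unsupported file type: {record.file_type}")
        self._inventory[record.name] = record

    def describe(self, name):
        """Return the registered metadata for a file."""
        return self._inventory[name]

    def read(self, name):
        """Return the file content; the caller supplies only the registered name."""
        record = self._inventory[name]
        return READERS[record.file_type](record.location)
```

A consumer would call `read("some_registered_name")` and receive rows or objects, regardless of whether the file behind that name is a CSV or a JSON document or where it is stored.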
Metadata Management Tools
IMDB metadata discovery is performed via a Wiki-based solution and MetaWeb. Each Wiki page provides the context of the information and all available links. These pages are programmatically generated based on templates developed for the IMDB. MetaWeb is a JSP and Servlets-based application. Data are collected and populated into the IMDB via a Microsoft Excel IMDB Extraction/Loader, an Oracle PL/SQL IMDB Loader and MetaWeb.
The starting point for the Common Tools project (see Section VII - Figure 2) is the Questionnaire Development Tool (QDT), used to enter specifications for social survey data collection instruments. All question metadata is entered in the QDT, including question and answer category text, interviewer instructions and conditions controlling flows. The Processing and Specifications Tool (PST) then loads variable metadata such as variable name, length and type. These are linked to question metadata already entered via QDT, so question and answer category text does not need to be re-entered. Finally, the Social Survey Processing Environment (SSPE) utilities use collection layouts or schemas to generate variable metadata to be loaded into the metadata repository. Two projected tools will complete the picture: the Data Dictionary Tool (DDT), which will provide an interface to the metadata repository for updating descriptive variable metadata, and the Derived Variable Tool (DVT), which will allow entry of specifications for derived variables and will be used to produce detailed documentation for data users. Within Statistics Canada’s SOA, the SSPE metadata repository will export metadata in a canonical model to IMDB via an EAIP service under development[1].
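The link between variable metadata and previously entered question metadata can be illustrated with a small Python sketch. The record layouts, field names and sample values below are hypothetical; they only show how a reference from variable to question avoids re-entering question text, and do not reflect the actual QDT or PST data models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """Question metadata, entered once (the QDT role in this sketch)."""
    question_id: str
    text: str
    answer_categories: tuple


@dataclass(frozen=True)
class Variable:
    """Variable metadata (the PST role); refers to a question by identifier only."""
    name: str
    length: int
    var_type: str
    question_id: str


def build_data_dictionary(questions, variables):
    """Join variables to their questions so that text is reused, not re-entered."""
    questions_by_id = {q.question_id: q for q in questions}
    dictionary = []
    for variable in variables:
        question = questions_by_id.get(variable.question_id)
        if question is None:
            raise KeyError(
                f"{variable.name} refers to unknown question {variable.question_id}"
            )
        dictionary.append({
            "variable": variable.name,
            "length": variable.length,
            "type": variable.var_type,
            "question_text": question.text,
            "answer_categories": list(question.answer_categories),
        })
    return dictionary


# Illustrative sample data only.
questions = [Question("Q01", "Did you work last week?", ("Yes", "No"))]
variables = [Variable("WORKED", 1, "numeric", "Q01")]
data_dictionary = build_data_dictionary(questions, variables)
```

Because the variable record stores only the question identifier, a correction to the question text made at the source is reflected in every data dictionary entry built afterwards.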
Solutions and tools are needed to support other types of metadata, specifically in the GSIM Structures and Production groups.
Standards and formats
The following is a list of standards and formats and where they are being used:
- BPMN – EAIP orchestrations;
- ISO/IEC 11179 Metadata registries – IMDB;
- CMR – IMDB;
- DDI 2.1 – DLI and Research Data Centres;
- DDI 3.0 – IMDB tool (automate metadata “wrap” for microdata files/PUMFs) and web services (extract metadata from IMDB for DLI and Research Data Centres). See Section IV-E for more details on this project;
- ISO/TS 17369 SDMX ML – Formatted data from dissemination;
- Neuchatel Terminology Model Part 1;
- Classification database object types V2.1 – Standards Division;
- ISO 3166-1:2006 Part 1: Country – Standards Division;
- ISO 19115 Geographic Information – Geography Division;
- ISO 15489-1 Part 1: General – Information management.
Version control and revisions
Outsourcing versus in-house development
Sharing software components of tools
Overview of roles and responsibilities
Metadata management team
Training and knowledge management
Partnerships and cooperation
Other issues
Lessons learned