4.1 IT architecture
The IMDB project was initiated in 1998. The ANSI X3.285 Metamodel for the Management of Shareable Data and the U.S. Bureau of the Census's Corporate Metadata Repository (CMR) model were chosen as the basis for the data model design. ANSI X3.285 was later incorporated into the ISO/IEC 11179 standard. IMDB development was partitioned into the following phases:
Phase 1 Consolidation of existing metadata stores into a central store;
Phase 2 Metainformation describing the statistical business processes; and
Phase 3 Metadata for data elements.
4.2 Metadata management tools
Information discovery (available for internal use only) is through a wiki-based solution. Each wiki page provides the context of the information and links to all related information. The wiki offers a non-linear view of the information, as the user decides which path to take. The wiki engine selected for use at Statistics Canada is MediaWiki 1.8 (the same wiki engine used by Wikipedia). The wiki pages are generated programmatically. Information from the IMDB Oracle Phase 2 and Phase 3 database is extracted using a VB .NET application. Wiki templates developed specifically for the IMDB provide a consistent presentation. Wiki tags and wiki templates are added to the extracted IMDB information, which is then loaded directly into the MySQL database of the MediaWiki engine. The IMDB wiki pages are refreshed daily.
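The generation step described above (extracted record in, templated wiki text out) can be sketched as follows. This is a minimal illustration in Python, not the VB .NET application itself. The template name "IMDB Survey", the field names and the sample record are all hypothetical, and the sketch stops at producing the page text rather than writing to the MediaWiki database.

```python
def render_template_call(template_name, fields):
    """Render a MediaWiki template invocation with named parameters."""
    lines = ["{{" + template_name]
    for key, value in fields.items():
        # A literal pipe would be read as a parameter separator, so it is
        # replaced with the equivalent HTML entity.
        safe_value = str(value).replace("|", "&#124;")
        lines.append("| " + key + " = " + safe_value)
    lines.append("}}")
    return "\n".join(lines)


def build_page(record):
    """Turn one extracted record into a (page title, wiki text) pair.

    The template and field names are assumptions made for illustration.
    """
    title = record["name"]
    text = render_template_call(
        "IMDB Survey",
        {
            "name": record["name"],
            "status": record["status"],
            "description": record["description"],
        },
    )
    return title, text


if __name__ == "__main__":
    sample = {
        "name": "Example Survey",
        "status": "Active",
        "description": "Illustrative record only",
    }
    page_title, page_text = build_page(sample)
    print(page_title)
    print(page_text)
```

Keeping all layout in the wiki template means a change in presentation requires editing only the template, not regenerating the extraction logic.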
4.3 Standards and formats
A development team conducted an initial investigation to determine whether software tools to support collection already existed, either internally or externally. No existing tools were found; therefore, the decision was made to develop in-house.
Oracle 8i was selected as the database and IBM Visual Age for Java as the development tool. The resulting system, referred to as MetaStat, was in development and production from 1999 to 2002. Development ceased in 2002 because the vendor discontinued IBM Visual Age for Java and did not supply a migration path to another product. The data content collected and managed by the MetaStat system consists of Statistical Activity, Survey, Instance, Frame, Universe, Instrument, Data Files, Survey Methodology and Documentation. Supporting content also collected by MetaStat includes Organization, Contact, Keyword and Theme. Although new development was frozen in 2002, MetaStat is still used to collect Phase 2 information. MetaStat support currently consists of ensuring that the Oracle database drivers for Java and the Java classes (currently tested to support Java 1.3) continue to behave as expected as we migrate to newer versions of the Oracle database. The current production version is Oracle 10g. MetaStat is being retired, and its functionality will be incorporated into the architecture of the Phase 3 system.
The development team decided to move towards open source development tools to reduce the risk of vendor lock-in, as was experienced during the development of the MetaStat system. Development of Phase 3 also provided an opportunity to enhance the data model to provide multilingual data support.
The system developed for Phase 3 is referred to as MetaWeb. It is a Java JSP and Servlet based solution. The data content collected by the MetaWeb system consists of Object Class, Property, Data Element Concept and Data Element. Conceptual Domain and Value Domain information is collected and loaded into the IMDB database via a Microsoft Excel IMDB Extraction/Loader and an Oracle PL/SQL IMDB Loader. The decision to use Excel as a collection tool for the Conceptual Domain and Value Domain information was based on four factors: the data manipulation functionality in Excel (such as sorting), the ability to present complex multi-level information on individual worksheets, the organization's familiarity with Excel, and the ease of sharing Excel data with other applications within the organization (such as data warehousing).
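The relationships among the content types named above follow the ISO/IEC 11179 metamodel: a Data Element Concept pairs an Object Class with a Property, and a Data Element pairs a Data Element Concept with a Value Domain that represents the same Conceptual Domain. The sketch below illustrates those relationships only. The class and attribute names are assumptions chosen for readability and do not reflect the actual IMDB schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    name: str


@dataclass(frozen=True)
class Property:
    name: str


@dataclass(frozen=True)
class ConceptualDomain:
    name: str


@dataclass(frozen=True)
class DataElementConcept:
    # A concept is an object class combined with one of its properties.
    object_class: ObjectClass
    prop: Property
    conceptual_domain: ConceptualDomain


@dataclass(frozen=True)
class ValueDomain:
    name: str
    conceptual_domain: ConceptualDomain
    permitted_values: tuple  # (code, value meaning) pairs


@dataclass(frozen=True)
class DataElement:
    # A data element is a concept plus a concrete representation.
    concept: DataElementConcept
    value_domain: ValueDomain

    def __post_init__(self):
        # The representation must belong to the concept's conceptual domain.
        if self.concept.conceptual_domain != self.value_domain.conceptual_domain:
            raise ValueError("value domain does not represent the concept's conceptual domain")


if __name__ == "__main__":
    # Illustrative values only.
    sex_categories = ConceptualDomain("Sex categories")
    concept = DataElementConcept(ObjectClass("Person"), Property("Sex"), sex_categories)
    codes = ValueDomain("Sex codes", sex_categories, (("1", "Male"), ("2", "Female")))
    element = DataElement(concept, codes)
    print(element.concept.object_class.name, element.value_domain.name)
```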
Initial preparations for migrating the Phase 2 content into the Phase 3 architecture have started with the creation of a bridge between the two systems, which consists of Phase 3 system identifiers mapped to the Phase 2 system identifiers. Additional collection interfaces on the development schedule for MetaWeb include Question, Question Response Choices, Question Block and a Value Meaning manager to support the survey planning and design phase of the statistical cycle.
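An identifier bridge of the kind described above can be sketched as a two-way lookup that issues one Phase 3 identifier per Phase 2 identifier. This is a minimal in-memory illustration under stated assumptions: the class name, the identifier formats and the starting number are hypothetical, and the real bridge presumably lives in the Oracle database rather than in application memory.

```python
import itertools


class IdentifierBridge:
    """Maps Phase 2 identifiers to Phase 3 identifiers and back."""

    def __init__(self, first_phase3_id=1):
        self._next_id = itertools.count(first_phase3_id)
        self._phase2_to_phase3 = {}
        self._phase3_to_phase2 = {}

    def register(self, phase2_id):
        """Return the Phase 3 id for phase2_id, creating it on first use."""
        if phase2_id not in self._phase2_to_phase3:
            phase3_id = next(self._next_id)
            self._phase2_to_phase3[phase2_id] = phase3_id
            self._phase3_to_phase2[phase3_id] = phase2_id
        return self._phase2_to_phase3[phase2_id]

    def phase2_for(self, phase3_id):
        """Look up the original Phase 2 id; raises KeyError if unknown."""
        return self._phase3_to_phase2[phase3_id]


if __name__ == "__main__":
    bridge = IdentifierBridge(first_phase3_id=1000)
    new_id = bridge.register("SURVEY-0001")  # hypothetical Phase 2 id
    print(new_id, bridge.phase2_for(new_id))
```

Making registration idempotent means the bridge can be rebuilt or re-run during migration without assigning a second Phase 3 identifier to the same Phase 2 record.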
4.4 Version control and revisions
The open source version control tool Concurrent Versions System (CVS), together with Windows clients for CVS, is used to manage all software source code. Separate environments are set up for development, acceptance testing and production. When software is promoted from the acceptance testing environment to the production environment, the software suite is tagged in CVS with a production release number.
4.5 Outsourcing versus in-house development
Most development of the IMDB has been done in-house by systems developers from Statistics Canada's Systems Development Division, a centralized service responsible for developing the Agency's applications. During periods when systems developers were in short supply, contractors were hired but worked on-site with our systems developers.
4.6 Sharing software components of tools
The current IMDB system is ten years old and is currently being upgraded. Statistics Canada is willing to share its documentation on the IMDB model but there is limited scope to share any IMDB tools at this time.
4.7 Additional materials
Daniel W. Gillman, 1999: Corporate Metadata Repository (CMR) Model; U.S. Bureau of Labor Statistics.