Contact person* | |
---|---|
Job title | Head of department |
Telephone | +385 1 48 06 201 |
Metadata strategy
There were three principal reasons to implement metadata in the Croatian Central Bureau for Statistics (CBS):
- to standardize definitions across all statistical activities
- to move the production of statistics closer to the subject-matter experts in order to speed up the statistical survey life cycle
- to present statistics on internet along with its context in order to make statistics understandable and available to users of all types, i.e. to extend the use of statistics beyond the usual statistical publications
The strategy document was prepared in CBS as early as 2001 [1]. In 2002, a framework agreement was signed between the Division for Western Balkans at the Swedish International Development Cooperation Agency (Sida) and Statistics Sweden's International Consulting Office (ICO). Within this framework, Statistics Sweden (SCB) provided support for the creation of the public macro database and a central metadata repository in CBS. In its final phases (2006 - 2007) the project was extended to support the development of the Integrated Statistical Information System (ISIS).
[1] Zdenko Milonja: Information System Development Strategy, CBS, July 2001.
Current situation
The central metadata repository (CROMETA) is the core of the Integrated Statistical Information System (ISIS), which is in the final stage of development. In other words, ISIS is built on CROMETA.
The original idea - to develop an automated statistical survey processing system on the client/server platform - resulted from the operational circumstances in CBS:
- the IT sector is strictly centralized, i.e. it processes all statistical surveys according to the descriptions laid out by statisticians
- the majority of statistical surveys are still processed on the mainframe
- the majority of surveys have similar processing stages (data entry, validation, correction, tabulation, dissemination); therefore the majority of corresponding data processing jobs have similar structure which could be incorporated in a generalized solution. Such a solution was developed in CBS for data processing on the mainframe in the 80s and is still in use.
The metadata repository must contain all the necessary information to be used as parameters for a general 'program' that produces specific operating procedures for particular surveys. Therefore it could be stated that centrally stored metadata could more or less automatically 'drive' the statistical production system. This is the basic purpose that initiated the metadata system development in the first place. Naturally, the idea was extended to cover all aspects of statistics, as laid out in the Reference Model™ resulting from the MetaNet project within Eurostat (2000-2003).
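The idea of metadata serving as parameters for a general 'program' can be illustrated with a minimal sketch. It is not the CBS implementation: the metadata layout, the variable names and the functions `validate_record` and `run_survey` are all assumptions made for illustration only.

```python
# Hypothetical sketch: survey metadata acts as the parameters of one generic
# validation stage. None of these names come from CROMETA itself.

SURVEY_METADATA = {
    "name": "Pilot survey",
    "variables": {
        "employees": {"type": int, "min": 0, "max": 100000},
        "turnover": {"type": float, "min": 0.0, "max": 1e12},
    },
}


def validate_record(record, metadata):
    """Return a list of validation errors for one record, driven only by metadata."""
    errors = []
    for name, rule in metadata["variables"].items():
        if name not in record:
            errors.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}")
        elif not rule["min"] <= value <= rule["max"]:
            errors.append(f"{name}: out of range")
    return errors


def run_survey(records, metadata):
    """Generic processing stage: split records into accepted and rejected."""
    accepted, rejected = [], []
    for record in records:
        errors = validate_record(record, metadata)
        (rejected if errors else accepted).append((record, errors))
    return accepted, rejected


accepted, rejected = run_survey(
    [{"employees": 12, "turnover": 3500.0}, {"employees": -1, "turnover": 10.0}],
    SURVEY_METADATA,
)
```

Adding a new survey in this sketch means supplying a new metadata description; the processing code itself stays unchanged.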
The CROMETA model contains Reference Model™ concepts extended and customized for CBS needs, as well as specifics of a previous CBS metadata model and specifics needed to run PC-Axis as the main dissemination tool. Although the model is very complex and rather demanding to comprehend, it proved to be well conceived from the beginning, or rather from the moment we fixed the 'big picture'. Now the metadatabase is stable, with high tolerance for occasional changes that occur along with development of specific solutions for particular stages of the statistical life cycle.
The central metadata repository is presently rather empty since it is still in the test phase; it contains data on just a few statistical surveys that were selected as pilots. We are well aware of the problems that may arise among statisticians when they are obliged to enter or transfer all the 'knowledge' of all statistical activities.
Metadata Classification
The Croatian Bureau of Statistics developed its own metadata model, called CROMETA, based on the Reference Model™ from the MetaNet project of Eurostat. It also includes some metadata specific to CBS and has nine groups, or sections, of metadata. Metadata objects within each section are closely related, but there are also relationships between metadata from different sections.
- Organizational structure - all metadata regarding the statistical office, related organizations, persons working within the organizations and the responsibilities of the latter is kept within this metadata section.
  Examples of metadata objects belonging to this section are organization, person, contact person, etc.
- Variables and measurements - this section contains metadata about the variables collected within the frame of the statistical activity, as well as the methods and ways of measuring them. In this section the variables are described from a more general point of view, regardless of use and implementation in different studies/surveys.
  Examples of metadata objects belonging to this section are global variable, object variable, value domain, measure unit, etc.
- Studies and questionnaires - all metadata regarding studies, their versions, general methods for performing them, etc. is kept within this metadata section. (Study is a term used for a broader definition of statistical activities, and includes surveys and other activities.)
  Examples of metadata objects belonging to this section are study, study version, questionnaire, question, interview method, population, coverage type, etc.
- Classifications - this section contains metadata concerning the classifications used in the statistical office.
  Examples of metadata objects belonging to this section are classification family, classification, classification version, classification item, correspondence table, etc.
- Publications - contains all metadata concerning publishing and dissemination of statistics.
  Examples of metadata objects belonging to this section are publication, edition, publication series, etc.
- Processing and validation rules - the Variables and measurements section describes the variables and the way of measuring them independently of implementation in a study/survey, and the Studies and questionnaires section handles all the metadata about studies and their methods. This metadata section could in some sense be described as the instance of variables and measurements put in the context of a study (or study version). For example, the section contains all metadata regarding the methods for processing the studies, including validation, production process, registers/cubes/tables created, etc.
  Examples of metadata objects belonging to this section are context variable, data collection, statistical process, rule, matrix, register, cube, table, etc.
- General - general characteristics of metadata objects (which are often metadata objects themselves) are kept within this metadata section. Furthermore, there are some metadata objects that are used within all or at least several other metadata sections, and therefore cannot be placed directly in any of them. These metadata objects are also kept within the General metadata section.
  Examples of metadata objects belonging to this section are language, keyword, footnote, status, theme, statistical object type, etc.
- Access and authorization rules - this metadata section stores metadata about who can do what with metadata and the data owned by CBS.
  Examples of metadata objects belonging to this section are user group, privilege, access condition, access form, etc.
- History and version handling - metadata has a certain life cycle as determined by the methods defined by the statistical office. This means that all metadata objects may exist in an indefinite number of versions and that history management is extremely important in order to keep consistency in metadata and data over time. The main part of the history and version handling is implemented through methods used by other metadata objects, but one example of a metadata object belonging to this section is update information, used for logging all changes made to a metadata object over time.
This classification of metadata could be mapped to other classifications: e.g. navigational metadata such as keywords can be found in the General section, while quality metadata is included in the Studies and questionnaires section.
Metadata system(s)
Costs and Benefits
Although the development of the new ISIS, and of CROMETA in particular, required a significant amount of resources, mainly from the IT sector, it is expected that the positive effects on statistical production will by far outweigh the resources used.
CROMETA will add functionality as well as quality to the new system through at least the following aspects:
- well-described and uniform metadata, i.e. all information on statistical surveys and statistics in one place
- use of the same classifications, registers and address lists (nowadays, they are different for each survey and not maintained centrally)
- speed up all the processes in the statistical survey life cycle and bring down the time needed to establish a new survey
- better control over statistical surveys in general and data processing in particular
CBS will thereby present a better image within the public sector in Croatia and the international statistical community, as statistics will be more accessible.
It should be noted that a significant extra effort will be required from statisticians to get used to the new methodology of survey maintenance and especially to provide all metadata necessary. It is expected that some statisticians will regard the new system simply as extra work, while some will, hopefully, gladly meet the challenges and benefit from that.
The profitability of the system should be assessed by weighing direct and indirect revenues against all costs (investments in hardware and software, staff, maintenance, etc.). It is rather difficult to separate the costs and benefits of the CROMETA system from those of ISIS; therefore the costs and revenues are calculated for ISIS as a whole.
Direct revenues will consist primarily of the savings from the eventual shutdown of the mainframe, i.e. nearly €300,000 yearly in terms of software licenses and maintenance. If the solution proves to be successful, it might also provide some revenues from international statistical exchange, since it is usual practice among NSOs to exchange solutions for a reasonable amount of money. Also, CBS's IT staff is already engaged in consultancy to other NSOs. This will be enhanced further by the experience gained through system development. Consultancy also brings direct revenues to CBS. Indirect revenues may be gained from the lower maintenance cost of the client/server equipment compared with the maintenance cost of the mainframe, which will eventually be shut down.
A significant benefit will be as many as 20 IT experts made available for new tasks. These people are heavily engaged in regular statistical production now and will be freed for other tasks and further development.
The costs range from equipment to in-house staff and technical co-operation. It is expected that about €250,000 will be spent altogether on hardware and licenses, the majority of which is already installed. The cost of IT education in the regular budget should be increased by 50%, since there are more young developers to replace those who left CBS. There is no extra staff cost for the in-house development since it is done as a regular duty, but more developers should be hired or educated for the same reason as above.
Most of the investment was provided through technical support by Sida and Statistics Sweden from 2002 to 2007. Additional help in terms of hardware purchase will be received from the PHARE 2005 and PHARE 2006 programs.
Implementation strategy
IT Architecture
CROMETA METADATA SERVER
CROMETA Metadata Server is a comprehensive solution that covers all aspects of a modern metadata repository. The server is not designed solely for methodologists, statisticians or system developers, but provides each key unit of a statistical office with appropriate functionality through its various modules. Using the server, methodologists can rest assured that a well-established and constantly developing theoretical model of statistical metadata has been brought into use. At the same time, the owners and producers of statistical surveys get a user-friendly Windows-like interface for maintaining and browsing their metadata. For system developers, an XML-based web service ensures a straightforward, practically code-free integration of existing as well as new metadata-consuming applications with the centrally stored metadata. Even the system administrators do not need to intervene during the implementation. The whole server uses current, well-established tools and techniques.
[Figure: The comprehensive CROMETA Metadata Server]
CROMETA Metadata Notion
Basically, CROMETA Metadata Notion is the model on which the complete metadata server resides. This generic model inherits its theoretical base from the MetaNet Reference Model™ on Metadata, which was originally developed within EUROSTAT in order to reach a common model on metadata in general, and a widespread terminology in particular. All terms used in order to describe various metadata concepts within CROMETA Metadata Notion are directly derived from the EUROSTAT model. Given the fact that the terminology was elaborated by experienced statisticians representing major national offices, it is fairly easy for any statistician to rapidly grasp and understand the terms as well as their conceptual meaning and relations within the model.
CROMETA Metadata Notion is purely conceptual, hence could be implemented on any technical platform.
CROMETA Metadata Storage Central
The conceptual metadata model covered by CROMETA Metadata Notion is implemented in practice through the CROMETA Metadata Storage Central. This is the basic storage point of metadata for any implementation of the metadata server.
A major advantage of the storage engine is that it is not limited to a specific platform. Furthermore, it is completely open to use for any purpose. In practice this means that the system developers may use the underlying data source directly in order to connect and integrate it with other solutions. Avoiding encapsulation into some hidden storage format enables the office to use the DBMS already purchased or to choose one in line with the budget. Moreover, built-in functionality of the chosen DBMS could be used for replicating and synchronizing data all over the system. This approach also provides the office with possibilities to further customize the metadata model, should there be a need for it.
As mentioned, the CROMETA Metadata Storage Central could be customized for any database platform; however, the current implementation is developed for the Microsoft SQL Server platform. At present, the system runs on the Microsoft SQL Server 2005 DBMS.
CROMETA Metadata Business
CROMETA Metadata Business is a business tier that implements the conceptual model from CROMETA Metadata Notion as well as the storage specifics from CROMETA Metadata Storage Central through an object-oriented approach. Included in the business model is a fully documented UML class diagram, clearly displaying the metadata classes, objects, methods, etc. available through the metadata server.
Basically, CROMETA Metadata Business puts all features such as version handling, multi-lingual support, etc. into play by applying the appropriate business rules. For the ordinary user, the use and work of this module is obviously completely invisible. However, when using the maintenance tool - CROMETA Metadata Manager for browsing or editing metadata, or when using metadata for tabulation from the website, CROMETA Metadata Business takes care of all the background processing enabling the various features to be put into play.
For the developer engaged with integrating various metadata consumers with the central metadata server, the CROMETA Metadata Business is essential. By using the CROMETA Metadata Business together with the CROMETA Metadata Consumer Services, the task of creating a real Integrated Statistical Information System (ISIS) has been considerably simplified. Connecting surrounding applications with the metadata server is basically just one click away. For additional information regarding integration and how to put metadata consumers in connection with the central repository, see CROMETA Metadata Consumer Services.
CROMETA Metadata Consumer Services
The CROMETA Metadata Consumer Services offer an uncomplicated and swift way of integrating existing as well as future metadata consumers with the metadata server. By exposing all metadata objects, together with their properties, through an XML-based web service, the whole metadata repository is made available. Virtually any application requiring metadata can reach it in a practically code-free manner. Using the CROMETA Metadata Consumer Services, the system developers can connect to the metadata server and use its metadata just by referencing the web service and using the predefined methods for retrieving and editing metadata.
Consumer services that are considered important to be implemented soon:
- Exposing classifications through the official web site;
- Displaying the organization and areas of responsibility, contacts, etc;
- Publishing statistics using metadata for browsing purposes;
- Delivering publications including data and metadata;
All of these are features that could be implemented in a straightforward manner using the CROMETA Metadata Consumer Services.
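As an illustration of how a metadata consumer might use an XML-based service to expose a classification, the following minimal sketch parses an XML payload into a code list. The schema of the actual CROMETA web service is not described here, so the element names, attributes and the function `parse_classification` are assumptions; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: element and attribute names are invented for this
# sketch and do not reflect the real CROMETA web service schema.
SAMPLE_RESPONSE = """
<classification id="NACE" version="2">
  <item code="A"><title lang="en">Agriculture, forestry and fishing</title></item>
  <item code="B"><title lang="en">Mining and quarrying</title></item>
</classification>
"""


def parse_classification(xml_text):
    """Turn the XML payload into a {code: title} dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        item.get("code"): item.findtext("title")
        for item in root.findall("item")
    }


items = parse_classification(SAMPLE_RESPONSE)
```

A web page listing the classification would then only need to iterate over `items`; fetching the payload from the service is left out of the sketch on purpose.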
CROMETA Metadata Manager
Most people will access the CROMETA Metadata Server through the CROMETA Metadata Manager. This is the tool that provides a user-friendly graphical interface for adding, browsing, editing and generally maintaining metadata. Basically everything that can be done to metadata can be done here, provided you have the appropriate privileges for the metadata concerned.
The CROMETA Metadata Manager tool runs on the following platforms: Microsoft Windows 2000, Microsoft Windows XP, Microsoft Windows 2003 Server, and Microsoft Windows Vista.
Basically, CROMETA Metadata Manager features a Windows-like interface. Any user familiar with Microsoft Windows should be able to get acquainted with the tool in no time. For a more detailed description of this tool, see Metadata Management Tools (4.2).
Technical Platform for CROMETA Metadata Server
As regards the technical platform and development environment, the conceptual data model for CROMETA has been developed using Sybase PowerDesigner 9.5. The same software has been used for creating the physical data model that has been implemented on the Microsoft SQL Server 2000 RDBMS. Consequently, all database development has been carried out on the Microsoft SQL Server platform, while the CROMETA maintenance tool has been developed in Microsoft VB.NET. All object and use case modeling has been performed in Microsoft Visio. For managing the development in a shared environment, Microsoft Visual SourceSafe has been used, while for project management purposes, Microsoft Project and SharePoint Portal Server have been applied. Since the development is still ongoing, the solution was upgraded and tested on Microsoft SQL Server 2005.
Metadata Management Tools
As part of the comprehensive CROMETA solution, a maintenance tool has been developed, providing a user-friendly interface to the central metadata repository. The tool is expected to be developed further and extended with additional features that may presently be missing.
CROMETA Metadata Manager provides numerous functions, out of which the most obvious are to add new, edit and delete metadata from the central repository. It is difficult to describe the tool in words only; most probably the only way to get acquainted with it for real is to try it out in practice. However, some of the outstanding features of the tool are listed in this chapter.
[Figure: Easy editing through graphical interface with multi-language support]
Multi-language support
CROMETA Metadata Manager supports an indefinite number of languages, meaning that the languages desired for entering and maintaining metadata could be defined through the interface. All textual properties will then be open for browsing as well as editing in the languages defined. In practice this means that the office may decide to enter metadata in five languages, for example Croatian, English, Russian, French and Spanish. Obviously, all statistics may then be published dynamically in any of these languages. Furthermore, also the interface itself supports several languages; hence it could be customized for any language.
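One way to picture a textual property that is maintained in several languages is sketched below. The class `MultilingualText`, the language codes and the fallback rule are assumptions for illustration and do not describe how CROMETA stores text.

```python
from dataclasses import dataclass, field


@dataclass
class MultilingualText:
    """A textual property stored once per language code (hypothetical structure)."""
    texts: dict = field(default_factory=dict)

    def set(self, lang, value):
        self.texts[lang] = value

    def get(self, lang, fallback="hr"):
        # Assumption: fall back to a default language when a translation is missing.
        return self.texts.get(lang, self.texts.get(fallback))


title = MultilingualText()
title.set("hr", "Anketa o radnoj snazi")
title.set("en", "Labour Force Survey")
```

With such a structure, publishing in another language only requires asking for a different language code, e.g. `title.get("en")`.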
Versions management
All metadata objects may exist in numerous versions. Each version is in exactly one state, and only one version of each object can be current (authorized) at a time. Versions are described in detail in chapter 4.4.
General functions
All metadata, despite type or usage, could be added, edited or deleted using the exact same methods in the maintenance tool. Reusing methods between different types of metadata secures a tool that is recognizable and easy to use.
Add new version based on an existing version of a metadata object
Sometimes the differences between versions of a metadata object are minimal. For example, a questionnaire used within a survey one year, may only differ in terms of one or two new questions being added compared to the previous year.
To minimize the work in such cases, CROMETA Metadata Manager supports "Add new based on", which means that all properties and connections of an existing version of an object are copied into a new object that can be edited further.
Taking the example with the questionnaire, it means that all questions, the questionnaire layout, validation rules, etc. are being copied from the previous questionnaire version. Obviously this saves a lot of time and effort. Basically the person responsible for the questionnaire only has to add the new questions in order to have it ready for processing.
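The "Add new based on" behaviour can be sketched as a deep copy followed by edits to the copy. The class `QuestionnaireVersion` and the function `add_new_based_on` are hypothetical names; the choice of giving the copy the default status of a newly created object is an assumption of this sketch.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class QuestionnaireVersion:
    version: int
    status: str
    questions: list = field(default_factory=list)


def add_new_based_on(existing):
    """Copy all properties of an existing version into a new, editable one."""
    new = copy.deepcopy(existing)           # the original stays untouched
    new.version = existing.version + 1
    new.status = "Under development"        # assumed default for a new object
    return new


previous = QuestionnaireVersion(1, "Authorized", ["Q1", "Q2"])
current_draft = add_new_based_on(previous)
current_draft.questions.append("Q3")        # only the new question is added
```

The deep copy matters here: a shallow copy would share the list of questions, so adding a question to the new version would silently change the old one as well.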
History management
Besides the version management, the history of metadata could be displayed through the update log. Every time a change is made to metadata, this is logged with modifying user and date. The modifying user may also optionally supply a description of the change made. Through the history management, metadata may be studied and followed over time, including all changes, explanations, involved users and its lifetime cycle.
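A minimal sketch of such an update log follows. The classes `UpdateEntry` and `MetadataObject` and their fields are assumptions chosen to mirror the description above (modifying user, date, optional description), not the actual CROMETA structures.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class UpdateEntry:
    user: str
    timestamp: datetime
    description: str = ""   # optional explanation of the change


@dataclass
class MetadataObject:
    title: str
    update_log: list = field(default_factory=list)

    def modify(self, user, new_title, description=""):
        """Apply a change and record who made it and when."""
        self.title = new_title
        self.update_log.append(
            UpdateEntry(user, datetime.now(timezone.utc), description)
        )


classification = MetadataObject("NACE")
classification.modify("ana", "NACE Rev. 2", "title updated")
classification.modify("ivo", "NACE Rev. 2 (2007)")
```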
Authorization
In order to secure the metadata, a comprehensive model for access and authorization is applied in CROMETA Metadata Manager. First of all, the system supports a basic user group model where users can be divided into system administrators, object type experts, standard users, etc. Secondly, all metadata created may be secured by the creator assigning privileges for how it may be used by other users. By default, all metadata kept within the system is read-only for all internal users.
In practice this means that when creating for example a new study, the creator may assign rights in any way he/she likes, using the tool, in order to ensure that only some persons are able to edit information, while others are only allowed to browse it, etc.
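The default-read-only rule with creator-assigned privileges can be sketched as follows. The class `SecuredObject` and its methods are hypothetical, and user groups are left out to keep the example short.

```python
class SecuredObject:
    """Hypothetical metadata object with creator-assigned edit privileges."""

    def __init__(self, creator):
        self.creator = creator
        self.editors = {creator}

    def grant_edit(self, granted_by, user):
        # Assumption: only the creator may assign privileges.
        if granted_by != self.creator:
            raise PermissionError("only the creator may assign privileges")
        self.editors.add(user)

    def can_read(self, user):
        return True   # read-only access is the default for all internal users

    def can_edit(self, user):
        return user in self.editors


study = SecuredObject(creator="ana")
study.grant_edit("ana", "ivo")
```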
Search
CROMETA Metadata Manager allows searching and filtering objects in various ways. Quick search based on titles is available as well as a full-scale advanced search including descriptions, keyword-based search functionality, etc.
General properties managed in a general way
When working with any metadata object, the general properties will always be available in the same way. In practice this means that footnotes, keywords, documents, etc. could be connected to all metadata objects in the same flexible way.
Subscription to metadata
On some occasions users may find it useful to get notifications when objects such as classifications are updated. This is supported by CROMETA Metadata Manager through subscription functionality. A user can subscribe to any object, specifying how frequently he or she would like to be notified of changes. As soon as the metadata object, for example the NACE classification, is modified, the registered users are notified according to their individual frequencies.
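Subscriptions with individual notification frequencies could look roughly like the sketch below. The class `SubscriptionRegistry`, the use of whole days as the frequency unit and the rule for deciding when a notification is due are all assumptions made for illustration.

```python
from datetime import date, timedelta


class SubscriptionRegistry:
    """Hypothetical registry of per-user subscriptions to metadata objects."""

    def __init__(self):
        self.subscriptions = []   # dicts: user, object_id, interval, last_notified
        self.last_changed = {}    # object_id -> date of most recent change

    def subscribe(self, user, object_id, interval_days, today):
        self.subscriptions.append({
            "user": user,
            "object_id": object_id,
            "interval": timedelta(days=interval_days),
            "last_notified": today,
        })

    def record_change(self, object_id, today):
        self.last_changed[object_id] = today

    def due_notifications(self, today):
        """Return (user, object_id) pairs that should be notified today."""
        due = []
        for sub in self.subscriptions:
            changed = self.last_changed.get(sub["object_id"])
            # Changes dated on or before the last notification count as seen.
            if changed is None or changed <= sub["last_notified"]:
                continue
            if today - sub["last_notified"] >= sub["interval"]:
                due.append((sub["user"], sub["object_id"]))
                sub["last_notified"] = today
        return due


registry = SubscriptionRegistry()
registry.subscribe("ana", "NACE", interval_days=1, today=date(2007, 1, 1))
registry.subscribe("ivo", "NACE", interval_days=7, today=date(2007, 1, 1))
registry.record_change("NACE", today=date(2007, 1, 2))
next_day = registry.due_notifications(date(2007, 1, 2))
week_later = registry.due_notifications(date(2007, 1, 8))
```

In this example the daily subscriber is notified the day after the change, while the weekly subscriber hears about the same change only once a full week has passed.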
Locking and unlocking of metadata
In order to ensure that the same metadata is not edited simultaneously by two different users, CROMETA Metadata Manager has built-in functionality for exclusive locking and unlocking of metadata.
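Exclusive locking can be illustrated with a small in-memory sketch. The class `LockManager` is hypothetical; a real multi-user system would keep the locks in the shared database rather than in a Python dictionary.

```python
class LockManager:
    """Hypothetical exclusive lock table: at most one holder per object."""

    def __init__(self):
        self._locks = {}   # object_id -> user holding the lock

    def lock(self, object_id, user):
        """Try to take the lock; return True if `user` holds it afterwards."""
        holder = self._locks.setdefault(object_id, user)
        return holder == user

    def unlock(self, object_id, user):
        """Release the lock; only the current holder may do so."""
        if self._locks.get(object_id) != user:
            return False
        del self._locks[object_id]
        return True


locks = LockManager()
first = locks.lock("study-1", "ana")     # ana obtains the lock
second = locks.lock("study-1", "ivo")    # ivo is refused while ana holds it
```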
Standards and formats
Version control and revisions
Each metadata object can have several versions that are valid in mutually exclusive periods of time. Only one version is valid at any moment. This is determined by the Status of the metadata object, as follows:
1. Under development
When a metadata object is created it gets this status by default. (This could of course be changed by the person creating it). This is the only status that allows the user to completely delete the metadata object from the system. A metadata object having this status could be viewed, edited and deleted physically.
2. Released
When a metadata object has been completely created, it is normally released. This means that it is finalized but not yet authorized, i.e. it has not been pointed out as the current version. This could also be described as ready for review. A metadata object having this status could be viewed, edited and deleted (not shown).
3. Authorized
After being released a metadata object is usually reviewed by a reference group, or the creator alone. When authorizing it, it is automatically displayed as the current version. The metadata object previously displayed as current version is at this point automatically moved into status 4, Archived. A metadata object having this status could be viewed, edited and deleted (not shown).
4. Archived
When a metadata object is being replaced as current version by a successor, the state of it is automatically changed to Archived. This shows that the metadata object has been used as current version, but has now been succeeded by another version. A metadata object having this status could be viewed, edited and deleted (not shown).
5. Frozen
A metadata object that has been used as the current version is automatically given the status Archived. However, this object can still be edited. On some occasions you may not want to allow editing of old versions, and if so, the archived metadata object is given the status Frozen.
6. Deleted
Only metadata objects of the first status, Under development, can be physically deleted. However, metadata objects of statuses two to five can be put into this status so that they are not shown in the metadata system. Metadata objects of this status thereby appear to be deleted but can in fact be restored by the administrator.
Whenever some significant change on metadata must be applied, it is recommended to create a new version. If changes are small, e.g. spelling corrections, they can be applied without defining a new version. Such changes are kept in Update information, a general property of all metadata concepts in CROMETA.
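The status rules above can be summarized in a small sketch. The names `Status` and `VersionedObject` are hypothetical, and the sketch models only the rules stated in the text (automatic archiving on authorization, physical deletion only for Under development); it does not enforce which other transitions are allowed, since those are not specified here.

```python
from enum import Enum


class Status(Enum):
    UNDER_DEVELOPMENT = 1
    RELEASED = 2
    AUTHORIZED = 3
    ARCHIVED = 4
    FROZEN = 5
    DELETED = 6


class VersionedObject:
    """Hypothetical metadata object tracking the status of each version."""

    def __init__(self):
        self.versions = {}   # version number -> Status

    def create(self, number):
        self.versions[number] = Status.UNDER_DEVELOPMENT   # default status

    def release(self, number):
        self.versions[number] = Status.RELEASED

    def authorize(self, number):
        # The previously current version is archived automatically.
        for other, status in list(self.versions.items()):
            if status is Status.AUTHORIZED:
                self.versions[other] = Status.ARCHIVED
        self.versions[number] = Status.AUTHORIZED

    def freeze(self, number):
        self.versions[number] = Status.FROZEN

    def delete(self, number):
        if self.versions[number] is Status.UNDER_DEVELOPMENT:
            del self.versions[number]                  # physical deletion
        else:
            self.versions[number] = Status.DELETED     # hidden, but restorable

    def current(self):
        """Return the number of the authorized version, or None."""
        for number, status in self.versions.items():
            if status is Status.AUTHORIZED:
                return number
        return None


obj = VersionedObject()
for n in (1, 2):
    obj.create(n)
    obj.release(n)
    obj.authorize(n)
obj.create(3)
obj.delete(3)   # still under development, so it is removed physically
```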
Full access to all versions of metadata
Outsourcing versus in-house development
Sharing software components of tools
Overview of roles and responsibilities
Metadata management team
The CROMETA development project has been divided in two parts, as defined in July 2004:
- the Metadata Methodology sub-project with the task to form and define the conceptual metadata methodology for CBS, based on mapping of all existing metadata models and definition of processes needed to reach a comprehensive metadata solution. The result of this project should be the basis for technical implementation.
- the Technical Implementation sub-project with the goal of developing a solution for the physical implementation of the conceptual model of CBS' metadata on MS SQL Server 2000 (later upgraded to MS SQL Server 2005), together with a metadata management tool. The result of this project should be the implementation of the 'empty' metadata container and the user interface.
Each sub-project had its own development team consisting of 7 - 8 people: one project leader, 1 - 2 full-time developers, and the others as supporting members with 30-75% engagement in the respective project. Unfortunately, the project suffered a significant loss of resources, despite the interesting and challenging work. By the year 2007 the metadata repository was practically completed by three developers.
Training and knowledge management
Partnerships and cooperation
Other issues
Lessons learned
The most important questions are still unanswered since the central metadata management system is not deployed yet. The complete ISIS information system in general and CROMETA system in particular will force big changes upon the overall culture of CBS. The degree of content or discontent by the majority of statisticians when they start using the metadata maintenance tool is yet to be learned. In any case, we expect resistance from a number of subject-matter experts, especially those who cherish very much the legacy from past times when it was usual practice to order a tailor-made data processing system from the IT department.
Therefore the 'Lessons learned' here apply mainly to software development activities, and these are far less important than the overall cultural changes that the deployment of the new ISIS will bring.
- The most important lesson learned is that there is no serious development when there is no development team appointed to this and only this project. This applies to IT developers as well as statisticians. Of course we knew that even before we started the project but we could not afford to have experts unavailable to regular production for a longer period of time. So we entered a vicious circle: we wanted to develop software to make production easier but we could not develop because we had to handle the production. For this reason the development lasted much longer than planned.
- The support from top management is crucial.
- Teamwork is very important and now it is enhanced with practical tools such as SharePoint etc.
- It is obvious from the experiences of other NSOs that the involvement of statisticians is crucial for the project. Therefore we tried from the beginning to include selected statisticians in the development through various forms of cooperation, but somehow it always ended after two or three meetings (see the first item above).
- Strict project management. It is the responsibility of the project management that statisticians fell out of development activities sooner or later.
- The most painful lesson learned was that there is no project interesting and challenging enough to keep young and well educated IT experts from going to better paid jobs. IT experts in government bodies are paid two or three times less than in private sector and this needs no further comment. The CROMETA project started with 16 people (more or less involved) and ended with 3.