4.1 Metadata system(s)
The metadata infrastructure will be implemented within the 10 components covering the whole BmTS environment.
The Metadata Broad Logical Design 2004 defined nine components of key metadata infrastructure which were needed to create the physical metadata environment. These are shown in the diagram below.
Note: In defining the relationships, the terms period dependent and period independent are used. Period dependent refers to metadata which is linked to a specific activity/collection (includes quality metadata). Period independent metadata has meaning while held separate from data and can be applied to several collections (includes operational and conceptual metadata).
Search and discovery, Metadata and data access/registration
These components reflect the ways the user interacts with the metadata. Ideally, searching, registration and access should be possible directly with each component, or through a central portal.
Data Definition - The data definition component is the only infrastructure linked directly to the data. This is the primary store which defines and adds meaning to the data. This is a period dependent store which compiles the relevant metadata for a single collection. All other storage components link to the data via the data definition store.
Passive Metadata Store - The passive metadata store is the next level removed from the data. It contains period dependent and period independent metadata about a collection of data (this includes survey collections and administrative data collections).
Question Library - The question library should be period independent. It contains questions and variables which have been defined independent of the data. The question library and classification management store are linked through the classifications used in questions.
Classification Management - The classification management store is another period independent store which manages the classifications used to define the data. It includes metadata linking classifications to each other (concordances) to allow more options when analysing and transforming the data.
Business Logic - The final period independent metadata store is the business logic component. While business logic is not linked directly to the data, it is applied to change the data through its various states. This component contains details of the rules and processes that may be applied to the data. Business logic may also be referenced in the design and methodology content of the passive metadata store. The business logic component sits partly outside the storage environment due to the need for software (e.g. rules engines) to access the rules and processes.
Frame and Reference Stores - While the Frame is not part of the metadata environment, it may contain information which is used to define the data. Hence there is a link between this component and the data definition component.
Document Management - Document One is a tool for the management of documentation. As several reports and documents will be created during the business process, they are considered part of the wider metadata environment.
Standards Framework - The standards framework represents a tool for the central storage of standards used in the generic business process. This includes a definition of processes and methodologies at high levels. It will also include statistical standards which define how classifications are applied. Similar to Document One, this should be considered part of the wider metadata environment.
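The concordances described under Classification Management can be illustrated with a minimal sketch. It assumes a concordance is simply a lookup from codes in one classification to one or more codes in another. The class name, field names and codes below are hypothetical and are not taken from any existing system or classification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concordance:
    """Hypothetical link between the codes of two classifications."""
    source: str  # name of the classification being mapped from
    target: str  # name of the classification being mapped to
    # One source code may correspond to several target codes.
    mapping: Dict[str, List[str]] = field(default_factory=dict)

    def translate(self, code: str) -> List[str]:
        """Return the target codes for a source code (empty if unmapped)."""
        return self.mapping.get(code, [])


# Illustrative codes only; not drawn from a real classification.
concordance = Concordance(
    source="INDUSTRY_V1",
    target="INDUSTRY_V2",
    mapping={"A01": ["A011", "A012"], "B02": ["B020"]},
)

split_codes = concordance.translate("A01")     # a one-to-many mapping
unmapped_codes = concordance.translate("Z99")  # no concordance recorded
```

Holding the mapping as metadata, rather than inside each transformation, is what would allow the same concordance to be reused across collections.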
Logical View of Metadata Infrastructure and Relationships
In 2007, further analysis was completed using the gBPM and the MetaNet Reference Model to build a more detailed understanding of the logical metadata stores and the key relationships with data (see the model below).
The reference metadata layer contains stores of metadata with similar characteristics which allow it to be managed in a consistent way. For instance, classifications are used at various stages throughout the statistical business process to define various types of data. By storing and managing classifications in a separate 'classification management' store, we are able to analyse usage and identify opportunities for further standardisation.
In order to develop a fully integrated metadata environment, each metadata object will need to be linked with objects within the same store, or in other reference stores. For instance, a description of a business process will be stored in the 'standards and processes' component; this will also need to link to the 'operational metadata' component containing the workflows and transformations which operationalise the process. The workflows and transformations may also reference 'business rules' which utilise concepts from the 'variable library'. Linking metadata objects allows the user to consider the full usage of each object and will enhance reuse and standardisation.
The 'Structural Metadata Layer' is the mechanism for linking the reference metadata with the actual data. Each data item (or fact) within the data environment should contain a profile within the data definition store which identifies all the metadata relevant to that fact. Ideally this will consist of a map identifying the location of the relevant metadata in the reference stores. However, until all the reference stores exist, this component will contain snapshots of the relevant metadata.
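The idea of a profile that maps to reference stores where they exist, and falls back to a snapshot where they do not, can be sketched as follows. This is an illustration under stated assumptions rather than a design: the store contents, identifiers and the `MetadataLink` and `resolve` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical reference stores, keyed by store name and then object id.
# In this sketch the 'variable library' store has not been built yet.
REFERENCE_STORES: Dict[str, Dict[str, str]] = {
    "classification management": {"CL-1": "Region classification"},
}


@dataclass
class MetadataLink:
    """One entry in a fact's profile within the data definition store."""
    store: str                      # reference store expected to hold the object
    object_id: str                  # location of the object within that store
    snapshot: Optional[str] = None  # copy held until the store exists


def resolve(link: MetadataLink) -> Optional[str]:
    """Prefer the reference store; fall back to the snapshot if unavailable."""
    store = REFERENCE_STORES.get(link.store)
    if store is not None and link.object_id in store:
        return store[link.object_id]
    return link.snapshot


# Profile for a single fact: a map from metadata role to its location.
profile = {
    "classification": MetadataLink("classification management", "CL-1"),
    "variable": MetadataLink("variable library", "VAR-7",
                             snapshot="Usual residence"),
}

resolved = {role: resolve(link) for role, link in profile.items()}
```

Under this arrangement a snapshot could be retired once the corresponding reference store is available, without changing how the profile is read.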
When translating from the logical view to the physical design, it is anticipated that the components will take a different shape from that shown in the model above. For instance, 2008 will see the investigation of a single system to manage classifications, questions and variables.
4.2 Costs and Benefits
The high level benefits of undertaking the metadata programme were outlined in the introduction. Additional benefits to consider are as follows:
- maximising the value of metadata through reuse.
- reducing the unnecessary duplication of metadata.
- providing a more comparable, central source of metadata to allow for improvements through standardisation.
It is recognised that the development of a full metadata solution will require a large investment in the infrastructure of the organisation. There are also various levels of investment that could be applied to deliver the most practical solution (e.g. if the needs are a lower priority for one type of metadata, then the solution should be less complex). At this stage, the cost of delivery has not been calculated; however, it is known that the following considerations will need to be addressed:
- Large amounts of metadata that already exist in various systems will need to be transferred into new systems where practical.
- The principle of reuse may require more effort in creating and storing metadata at early stages of survey development in order to reduce effort at later stages (essentially shifting the effort, rather than reducing it).
- There will need to be careful management to ensure the duplication of metadata is reduced (i.e. if it's already entered in the reference store, it should be selected and configured for current requirements rather than duplicated).
- Detailed versioning will be required to ensure that the metadata is relevant to the instance it relates to. This may increase the storage requirements.
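The versioning consideration above can be made concrete with a short sketch. It assumes each version of a metadata object carries an effective-from date, and that the version relevant to an instance is the latest one in effect on the instance's reference date. The `Version` type, the `version_for` helper and the example dates are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    effective_from: date  # first date on which this version applies
    content: str          # the metadata as it stood for this version


def version_for(versions: List[Version], as_at: date) -> Optional[Version]:
    """Return the version in effect on `as_at`, or None if none applies.

    `versions` must be sorted by `effective_from` in ascending order.
    """
    starts = [v.effective_from for v in versions]
    idx = bisect_right(starts, as_at)  # count of versions starting on or before as_at
    return versions[idx - 1] if idx else None


# Illustrative history only.
history = [
    Version(date(2004, 1, 1), "definition v1"),
    Version(date(2007, 7, 1), "definition v2"),
]

earlier = version_for(history, date(2006, 3, 1))
later = version_for(history, date(2007, 7, 1))
before_any = version_for(history, date(2000, 1, 1))
```

Every version is retained so that older instances remain interpretable, which is the source of the additional storage noted above.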
4.3 Implementation strategy
The work of the metadata project during 2007 focused on identifying the high level needs of an integrated environment and on adapting a metadata conceptual model to understand the relationships and interactions with metadata. With the bigger picture in place, the intention is to focus on developing solutions for smaller components of the wider environment. This approach allows us to focus delivery in the areas which will provide the most gain, while still progressing along the path to delivering the fully integrated solution. It also allows us to assess the strategy at each stage to determine the most practical approach and to minimise the risk of delivering over-complicated solutions.