v0.1 GSIM Common Reference Model

This version was available for comment from 20 June to 1 July 2011.

Introduction

This page documents progress on the GSIM Common Reference Model.

This page will always refer to the most recent material that has been made available for broader review and input. It will also link to earlier material.

Broader context in regard to GSIM, and the OCMIMF Collaboration Team which is progressing it as a Statistical Network initiative, is available on this wiki's home page for GSIM, including a linked document providing Notes on the Generic Statistical Information Model.

Current Status

A set of initial ideas from the Collaboration Team was released for wider input on Monday 20 June as GSIM Common Reference Model V0.1. A list of specific unresolved questions is included in the package of information. Comments and suggestions on all other aspects of the material are also sought.

The target date for initial feedback is Friday 1 July. Methods for providing feedback are described on the home page for GSIM.

Notes in regard to GSIM Common Reference Model V0.1

Recognition of V0.1 as an incomplete draft

V0.1 of the GSIM Common Reference Model is recognised by the Collaboration Team as an incomplete draft. For a number of months the team focused on identifying a "top level structure" most likely to be intuitive for the greatest number of users. Having arrived at the current concept only a few weeks ago, and then spent time refining it, the team has so far put very little work into

- aligning the new approach with the work of the CORE ESSNet, with the work on the MCV Ontology and so on
- reviewing and developing "Level 2" and "Level 3" within the GSIM Common Reference Model in the context of the newly established "Level 1"

From one perspective it is tempting, given a newly established Level 1, to now spend more time within the Collaboration Team addressing the above two aspects, which would allow us to present a better rounded V0.1 for review outside the team in a few weeks' time.

This is seen as outweighed by the following factors:

It is quite possible that a round of wider comment on V0.1 will include input that
- allows the Collaboration Team to address the two key outstanding aspects more effectively - saving time and improving results overall, and/or
- leads to further significant refinement of Level 1 and hence further changes the approach to Level 2 and Level 3

Substantial interest has been expressed beyond the Collaboration Team in seeing, and providing input on, V0.1 at the earliest possible stage in its development. Especially given that external input may further reshape Level 1, it is hard to prioritise further finessing within the Collaboration Team ahead of releasing V0.1 for comment. (31 May was the original target date.)

Suggestions to reviewers

The fact that V0.1 is recognised as an incomplete draft leads to some suggestions for reviewers.

It is expected and hoped that later draft versions (prior to V1.0) will warrant detailed review and comprehensive feedback. Rather than investing in such comprehensive review for V0.1, however, attention might focus on

- The proposed overall scope, purpose and structure for the GSIM Common Reference Model
- Levels 0 & 1 as currently specified
- The list of unresolved issues included in the release package
- Broad suggestions for progressing the two aspects (alignment with other initiatives and detailing of Levels 2 & 3) which were identified in the previous section. These aspects will now be progressed by the Collaboration Team subsequent to broader review of V0.1.

Text mining of information objects from documentation of GSBPM

While a number of other sources have also been referenced, the most prominent means of identifying information objects for V0.1 of the GSIM Common Reference Model has been "text mining" apparent references to information objects in the documentation of GSBPM. While the GSBPM documentation did not explicitly set out to catalogue and discuss the most significant information objects used and/or produced in the course of the various sub processes, mining it is seen as a good heuristic approach for arriving at an initial list. More information on the methods used, and the results from applying them, is available via the following page.

The aim is that V1.0 of the Common Reference Model should be as meaningful and useful as possible to general readers (who are not ICT or Statistical Information Management specialists). Many other sources for identifying information objects appeared more theoretical and/or technical in nature than the text mining approach. Nevertheless, the possibility remains that the text mining based approach has left some significant gaps in structure and content.

Information Object	Issue
Release	"Release" is a class about final products that have been approved to leave the agency. In some agencies, a release includes pre-packaged products only. In others, it includes a suite of information identified within separate information object groups on the diagram (e.g. "Dataset" and "Analysis").
Method	Are there particular information objects that people expect to see at Level 2? Currently, the objects that make up the method class are quite generic. There are concepts referred to in GSBPM which might appear at Level 2 although it is commonly a challenge to differentiate the information object (which should be incorporated in GSIM) from the activity (which belongs in the domain of statistical business process represented by GSBPM) when it comes to terms such as "edit" and "validate".
Weight	This can be thought of as a piece of data or a method. What is the preferred placement for this information object?
Event	The V0.1 document currently identifies two quite different definitions and scopes for the "Event" information object. Would you recommend a third alternative? Ideally the definition would be internationally recognised, although it might not be (and ideally would not be?) specific to the domain of production of statistics. (BPMN is not a specifically statistical standard.) If you cannot identify a recommended third alternative, is there one of the two current definitions that you would recommend over the other? The first option makes "Event" an overarching concept (which encapsulates concepts related to points in time, activities, actors/agents etc.), whereas the second positions it as referring to something that happens at a point in time and as an interrelated "sibling" concept to Process.
Business Case	This can be thought of as a collation of information or a piece of information in its own right.
Survey	Where, if anywhere, should "Survey" be placed as an information object? - It could be added in the "Data Source" grouping as a means to source data (e.g. in the sense "sample survey"). - It could be added in the "Collection Instrument" grouping as a means to collect data from respondents (e.g. in the sense "survey form", "survey interview"). - Some agencies appear to use the term "Survey" to designate an instance of the Statistical Business Process (also sometimes termed a Statistical Activity), even if that particular production process did not include designing, and collecting data through, a sample survey. In this case the "survey" information object might be expected to record information for a particular production process through the nine phases of the GSBPM. Given the ambiguous and varying meaning of the term "survey", it has not been included as an information object at this time, but it could be incorporated if there is a broad consensus on how it should be included.
Data	As indicated within the documentation of this information object group, additional thought is required on structuring it on a consistent basis. For example - There is the question of whether separate primary information objects related to aggregate and unit level (macrodata and microdata?) should be identified at Level 2. - The definition of Statistical Data may not yet be clear in conveying the sense that it refers to information that is the objective (and the "raw material") of the statistical data collection or production process, as distinct from Process Metrics, which inform in regard to the data collection and production processes themselves. One element of rationalisation may be to determine which distinctions are best addressed through differentiating attributes being applied to a single information object rather than defining separate information objects. At the level of the Common Reference Model, whatever approach is ultimately selected must make sense from the perspective of staff associated with the statistical business process who are not information modellers, and must also be able to be operationalized consistently through the associated semantic reference model.
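
The choice raised in the "Data" row, between applying a differentiating attribute to a single information object and defining separate information objects, can be made concrete with a small sketch. This is an illustration only, not part of GSIM V0.1: the names `StatisticalData`, `Granularity`, `MicroData`, `MacroData` and `is_unit_level` are all hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    """Hypothetical differentiating attribute (not defined in GSIM V0.1)."""
    UNIT = "unit level (microdata)"
    AGGREGATE = "aggregate (macrodata)"


# Option A: a single information object, distinguished by an attribute.
@dataclass
class StatisticalData:
    name: str
    granularity: Granularity


# Option B: separate information objects identified at Level 2.
@dataclass
class MicroData:
    name: str


@dataclass
class MacroData:
    name: str


def is_unit_level(obj) -> bool:
    """Answer the same business question under either modelling option."""
    if isinstance(obj, StatisticalData):
        return obj.granularity is Granularity.UNIT
    return isinstance(obj, MicroData)


# The same (hypothetical) dataset expressed under each option.
option_a = StatisticalData("Household responses", Granularity.UNIT)
option_b = MicroData("Household responses")
```

Under Option A a new distinction is added by extending the attribute's value list, while under Option B it requires a new information object. Which of the two reads more naturally to staff who are not information modellers is the open question the table row poses.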
