This version was available for comment from 20 June to 1 July 2011.

Introduction

This page is intended to document progress in regard to the GSIM Common Reference Model.

This page will always refer to the most recent material that has been made available for broader review and input. It will also link to earlier material.

Broader context in regard to GSIM, and the OCMIMF Collaboration Team which is progressing it as a Statistical Network initiative, is available on this wiki's home page for GSIM, including a linked document providing Notes on the Generic Statistical Information Model.

Current Status

A set of initial ideas from the Collaboration Team was released for wider input on Monday 20 June as GSIM Common Reference Model V0.1. A list of specific unresolved questions is included in the package of information. Comments and suggestions in regard to all other aspects of the material are also sought.

The target date for initial feedback is Friday 1 July. Methods for providing feedback are described on the home page for GSIM.

Notes in regard to GSIM Common Reference Model V0.1.

Recognition of V0.1 as an incomplete draft

V0.1 of the GSIM Common Reference Model is recognised by the Collaboration Team as an incomplete draft. For a number of months the team focused on seeking to identify a "top level structure" which was most likely to be intuitive for the greatest number of users. Because the team arrived at the current concept only a few weeks ago, and then spent time refining it, very little work has yet been put into

  1. aligning the new approach with the work of the CORE ESSNet, with the work on the MCV Ontology and so on
  2. reviewing and developing "Level 2" and "Level 3" within the GSIM Common Reference Model in the context of the newly established "Level 1"

From one perspective it is tempting, given a newly established Level 1, to now spend more time within the Collaboration Team addressing the above two aspects - which would allow us to present a better rounded V0.1 for review outside the team in a few weeks' time.

This is seen as outweighed by the following factors

  • It is quite possible that a round of wider comment on V0.1 will include input that
    • allows the Collaboration Team to address the two key outstanding aspects more effectively - saving time and improving results overall, and/or
    • leads to further significant refinement of Level 1 and hence further changes the approach to Level 2 and Level 3
  • Substantial interest has been expressed beyond the Collaboration Team in seeing, and providing input on, V0.1 at the earliest possible stage in its development. Especially given external input may further reshape Level 1, it is hard to prioritise further finessing within the Collaboration Team prior to releasing V0.1 for comment. (31 May was the original target date.)

Suggestions to reviewers

The fact that V0.1 is recognised to be an incomplete draft leads to some suggestions to reviewers.

It is expected and hoped that later draft versions (prior to V1.0) will warrant detailed review and comprehensive feedback. Rather than investing in such comprehensive review for V0.1, however, attention might focus on

  • The proposed overall scope, purpose and structure for the GSIM Common Reference Model
  • Levels 0 & 1 as currently specified
  • The list of unresolved issues included in the release package
  • Broad suggestions for progressing the two aspects (alignment with other initiatives and detailing of Levels 2 & 3) which were identified in the previous section. These aspects will now be progressed by the Collaboration Team subsequent to broader review of V0.1.

Text mining of information objects from documentation of GSBPM

While a number of other sources have also been referenced, the most prominent means of identifying information objects for V0.1 of the GSIM Common Reference Model has been "text mining" apparent references to information objects in the documentation of GSBPM. While the documentation of GSBPM did not explicitly set out to catalogue and discuss the most significant information objects used and/or produced in the course of the various sub processes, it is seen as a good heuristic approach for arriving at an initial list. More information on the methods used, and the results from applying them, is available via the following page

The aim is that V1.0 of the Common Reference Model should be as meaningful and useful as possible to general readers (who are not ICT or Statistical Information Management specialists). Many other sources for identifying information objects appeared more theoretical and/or technical in nature than the text mining approach. Nevertheless, a possibility remains that the text mining based approach has left some significant gaps in structure and content.
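
The heuristic described in this section could be sketched roughly as follows. This is a minimal illustration only, assuming a simple count of candidate terms in sub-process descriptions. The candidate term list, the sample text and the function name `count_candidate_terms` are all hypothetical and are not taken from the Collaboration Team's actual method or from the GSBPM documentation.

```python
import re
from collections import Counter

# Hypothetical candidate terms; the list actually used by the team is not given here.
CANDIDATE_TERMS = [
    "collection instrument",
    "classification",
    "variable",
    "data source",
    "statistical unit",
]


def count_candidate_terms(text, terms=CANDIDATE_TERMS):
    """Count case-insensitive whole-phrase matches of each term (optional plural 's')."""
    counts = Counter()
    lowered = text.lower()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"s?\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[term] = hits
    return counts


# Invented sample sentences standing in for a sub-process description.
SAMPLE = (
    "This sub-process designs the collection instrument and identifies "
    "the variables and classifications to be used. Each variable is "
    "linked to a statistical unit."
)

if __name__ == "__main__":
    for term, n in count_candidate_terms(SAMPLE).most_common():
        print(term, n)
```

Terms that are mentioned often become candidates for the initial list, while terms that never appear (here "data source") show where such an approach could leave gaps.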

2 Comments

  1. Firstly, this is a great piece of work, and rather more advanced already than I expected for a version 0.1. Congratulations to all involved. I have the following fairly minor comments:

    1) General - paragraph numbers would facilitate the comment and review process

    2) Components of the GSIM - First paragraph - could be useful to spell out the expected audiences explicitly in points 1. and 2.

    3) GSIM and GSBPM - 7th paragraph, first line - "without" instead of "with"

    4) Level 2 diagram:

    - Not sure about "Statistical Domain", and whether this has a real bearing on requirements, particularly given the moves towards more process-based statistical production. Also, if we use the current definition, and also apply the ideas behind "industrialisation", all statistical production is ultimately covered by one domain, making the object more or less redundant.

    - For "Statistical Unit" there may be another information object called something like "related unit". This may or may not be a statistical unit itself. For example, the statistical unit "enterprise" is typically constructed based on other units (legal unit, local unit, enterprise group), and "household" may be constructed from "person" or "family".

    - For "classification", it is becoming increasingly important to hold some sort of text description rather than just a code, as this facilitates re-coding to other classifications or other versions of the same classification, therefore perhaps "description" should be added.

    - For "data source", an object called "secondary source" may be needed to reflect the growing use of non-statistical sources, sometimes directly rather than via a register.

    - For "collection instrument", an object called "data transfer" could be added - this is the most common collection instrument for international organisations, and is becoming more important at the national level

    - For "data", is there a need for a catch-all object called "attribute" to cover things like quality flags, data status markers, data source codes etc? If so, perhaps "weight" could be included in "attribute"?

    Regarding the specific questions:

    1. Are there information object groups (Level 1), or primary information objects (Level 2), which are not yet represented in the Common Reference Model but should be?
    See above comments

    2. Should any of the information object groups (Level 1), or primary information objects (Level 2) which are currently included be placed at a different level?
    "Variable", "Classification" and "Data source" could be seen as attributes of "Statistical unit", whilst the other information object groups are attributes of the production process - but I am not sure yet quite where (if anywhere) this line of thought leads, and hence whether it implies any changes - perhaps something to discuss if it hasn't been discussed already?

    3. Some information object groups do not have additional primary information objects represented at Level 2. What are your thoughts on this?
    No problem

    4. Section III of the document sets out current ideas and plans related to the relationship of GSIM with other standards and models.
    a. Are there additional standards and models which you believe should be reviewed as a high priority by the team?

    It might be useful to go through Part B systematically to determine any other relationships, but I think you have captured the main ones - perhaps a sentence or two on ISO 11179 and the CORA model?

    b. Are there opportunities which seemed to have been overlooked in terms of GSIM leveraging, and/or co-ordinating with, the models and standards which have already been identified?

    Not that I can think of at the moment

    5. Annex 1 provides additional information on the approach taken to modelling GSIM, including the Common Reference Model, to date, and the planned path for moving forward. Do you have comments and/or advice in regard to the approach taken?

    No - the approach seems sound.

    The acronym for Common Reference Model is CRM, which is more commonly associated with the term Customer Relationship Management.  There is only a tenuous connection between the role of the Common Reference Model and the common translation of "CRM".  Is there an alternative designation you would suggest for what is currently the "Common Reference Model"?
    Depends if CRM will be used as a stand-alone acronym or as "GSIM-CRM". If the former, it is not very informative about what it covers, so perhaps Common Reference Information Model (CRIM) or Common Reference Statistical Object Model (CRSOM) - an acronym that is relatively unique might help those searching for information on the model (e.g. via Google). 

  2. Comments from Statistics Canada (Alice Born)

    General issues

    1.      Are there information object groups (Level 1), or primary information objects (Level 2), which are not yet represented in the Common Reference Model but should be?

    Level 1: In the Canadian model, we refer to "event" as a type of statistical unit (object class) such as birth, death, marriage, etc. I cannot think of another term right now for consideration.

    Level 2:

    Statistical domain: I agree with Steve's comment and that it should be removed from Requirement and added to Concept.

    Statistical unit: add 'event', 'agents', 'items'; I agree with Steve, need related or derived statistical units  

    Variable: I suggest that you add property/characteristic, statistical unit (do objects have to be mutually exclusive?), and representation class (type of, value of). Also are we referring to "statistical variables"?

    Classification: I agree with Steve. I would change the title to "Statistical classification" and add the following information objects: class title, class definition, inclusions, exclusions, (concordances?)

    Data source: I agree with Steve. Add secondary source. Are "registers" administrative data sources? (cancer registries, birth/death registries, tax registries)

    Collection instrument: I am not sure how granular you want to be but I suggest adding: question block, response choices, notes to interviewers, etc.

    2.      Should any of the information object groups (Level 1), or primary information objects (Level 2) which are currently included be placed at a different level?

    For now, I think that the objects are at the appropriate level but more detail is needed at Level 2. Also, can the primary information objects occur in more than one object group (Level 1) if they are attributes of them?

    3.      Some information object groups do not have additional primary information objects represented at Level 2. What are your thoughts on this?

    Not sure at this time. It seems to me that there should be some primary information objects for analysis and datasets. (I will discuss with Tim and Sheri).

    4.      Section III of the document sets out current ideas and plans related to the relationship of GSIM with other standards and models.

    Are there additional standards and models which you believe should be reviewed as a high priority by the team?

    Should include ISO11179 especially since DDI3.0 states that it is compatible with 11179, and that GSIM is to include semantic specifications.

    Are there opportunities which seemed to have been overlooked in terms of GSIM leveraging, and/or co-ordinating with, the models and standards which have already been identified?

    Not at this time.

    5.      Annex 1 provides additional information on

    o  the approach taken to modelling GSIM, including the Common Reference Model, to date, and
    o  the planned path for moving forward

    Do you have comments and/or advice in regard to the approach taken?

    In my opinion, both the common reference model and semantic reference model are important. I would propose additional levels be added (Section VI). It is not clear to me what would be in Level 3 - is this the semantic reference model layer?

    The acronym for Common Reference Model is CRM, which is more commonly associated with the term Customer Relationship Management.  There is only a tenuous connection between the role of the Common Reference Model and the common translation of "CRM".  Is there an alternative designation you would suggest for what is currently the "Common Reference Model"?

    In suggesting any alternative designation(s), please note that the GSIM Common Reference Model is intended to be a resource that stands in its own right, but also has a "Semantic Reference Model" (which could also be given an alternative designation if appropriate) associated with it that allows GSIM to be operationalized consistently.

    I agree with Steve's solution.


    Issues relating to specific information objects

    Information Object

    Issue

    Release

    "Release" is a class about final products that have been approved to leave the agency. In some agencies, a release includes pre-packaged products only. In others, it includes a suite of information identified within separate information object groups on the diagram (e.g. "Dataset" and "Analysis").

    Method

    Are there particular information objects that people expect to see at Level 2?

    Currently, the objects that make up the method class are quite generic.

    There are concepts referred to in GSBPM which might appear at Level 2 although it is commonly a challenge to differentiate the information object (which should be incorporated in GSIM) from the activity (which belongs in the domain of statistical business process represented by GSBPM) when it comes to terms such as "edit" and "validate".

    Weight

    This can be thought of as a piece of data or a method. What is the preferred placement for this information object?

    Event

    The V0.1 document currently identifies two quite different definitions and scopes for the "Event" information object.

    Would you recommend a third alternative?  Ideally the definition would be internationally recognised although it might not be (and ideally would not be?) specific to the domain of production of statistics.  (BPMN is not a specifically statistical standard.)

    If you cannot identify a recommended third alternative, is there one of the two current definitions that you would recommend over the other?

    The first option makes "Event" an overarching concept (which encapsulates concepts related to points in time, activities, actors/agents etc), whereas the second positions it as referring to something that happens at a point in time and as an interrelated "sibling" concept to Process.

    Business Case

    This can be thought of as a collation of information or a piece of information in its own right.

    Survey

    Where, if anywhere, should "Survey" be placed as an information object?

    -          It could be added in the "Data Source" grouping as a means to source data (e.g. in the sense "sample survey").

    -          It could be added in the "Collection Instrument" grouping as a means to collect data from respondents (e.g. in the sense "survey form", "survey interview").

    -          Some agencies appear to use the term "Survey" to designate an instance of the Statistical Business Process (also sometimes termed a Statistical Activity) even if that particular production process did not include designing, and collecting data through, a sample survey. In this case the "survey" information object might be expected to record information for a particular production process through the nine phases of the GSBPM.

    Given the ambiguous and varying meaning of the term "survey", it has not been included as an information object at this time, but it could be incorporated if there is a broad consensus on how it should be included.

    Data

    As indicated within the documentation of this information object group, additional thought is required on structuring it on a consistent basis.

    For example


    -          There is the question of whether separate primary information objects related to aggregate and unit level (macrodata and microdata?) should be identified at Level 2.


    -          The definition of Statistical Data may not yet be clear in conveying the sense that it refers to information that is the objective (and the "raw material") of the statistical data collection or production process as distinct from Process Metrics which inform in regard to the data collection and production processes themselves.

    One element of rationalisation may be to determine which distinctions are best addressed through differentiating attributes being applied to a single information object rather than defining separate information objects.

    At the level of the Common Reference Model, whatever approach is ultimately selected must make sense from the perspective of staff associated with the statistical business process who are not information modellers and also be able to be operationalized consistently through the associated semantic reference model.
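
As an illustration of the rationalisation option mentioned above, the sketch below represents the distinction between unit level and aggregate level data as a differentiating attribute on a single information object, rather than as two separate objects. It is a minimal sketch under stated assumptions: the names `StatisticalData`, `Granularity` and `is_microdata`, and the free-form `attributes` field, are hypothetical and are not part of GSIM V0.1.

```python
from dataclasses import dataclass, field
from enum import Enum


class Granularity(Enum):
    # Hypothetical labels for the unit level / aggregate level distinction.
    UNIT = "unit level (microdata)"
    AGGREGATE = "aggregate level (macrodata)"


@dataclass
class StatisticalData:
    """A single information object; granularity is an attribute, not a subclass."""
    name: str
    granularity: Granularity
    # Hypothetical catch-all for items such as quality flags or status markers.
    attributes: dict = field(default_factory=dict)


def is_microdata(data: StatisticalData) -> bool:
    """Return True when the object holds unit level data."""
    return data.granularity is Granularity.UNIT


# Invented example instances.
responses = StatisticalData("survey responses", Granularity.UNIT, {"status": "edited"})
totals = StatisticalData("published totals", Granularity.AGGREGATE)
```

The alternative would be to define separate objects for microdata and macrodata. Which option reads more naturally to staff who are not information modellers is exactly the open question raised above.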