The Netherlands and Norway operate with at least four different steady states for data, e.g. raw data, clean microdata, macrodata/statistical data, and published data.

How is this modelled in GSIM? There is Unit Data and Dimensional Data, but how do we model the amount of processing? Attributes?

5 Comments

  1. I agree this would be an attribute, although it is not a standardised attribute within GSIM V1.0. ABS hasn't standardised "steady states" of data in the past, but we are looking to do so now. It would be worth looking at whether, for the next version of GSIM, we can come up with a common reference list of steady states to be associated with a standardised attribute.

  2. Discussion in GSIM Implementation Group

    Most NSIs are not ready for standardisation around steady states of data, but this could be important for the modernisation of statistics.

    This could be used to indicate the amount of processing applied to the data and the business rules (e.g. statistical disclosure control) associated with it.

    In the meantime, steady states of the data could be an optional attribute on the DataResource, with an uncontrolled vocabulary.

    This would be in accordance with the GSIM Design Principle of standardising to the level of agreement but no further.

    Any NSI that wishes to extend the DataResource information object with this optional attribute can do so.

     

  3. Following the discussion on 9 April, I see a potentially different way of approaching this question.  Perhaps the next generation of the GSIM User Guide could provide a recommended means of extending a local implementation of GSIM  to support States of Data.  This would save adding an attribute to the formal GSIM specification if there is not yet consensus that such an attribute is relevant to enough NSIs.

    The discussion on 9 April, however, further convinced me that some recommendation on this point would be useful, even if it is not built into the formal specification. I heard some people suggest (from memory) it should be an attribute on DataResource, whereas I was thinking of it as an attribute on DataSet. This is an example of how the half dozen or more agencies that are interested in States of Data could all go off and model it, and build it into systems, in different and incompatible ways. This would impact (in a small way) the ability to more easily share components based on a common information model.

    While I agree the exact "allowable values" for States of Data and the business rules applied to specific states of data are matters for local implementation, I think the concept that there are states of data is somewhat prevalent (although not ubiquitous).  I continue to believe, therefore, some attempt to establish common high level modelling advice (which can be ignored by implementers if they choose) would be useful.

  4. Recommendation:

    Information about how states of data "could be done" should go into the next version of the User Guide, rather than trying to fit it into the model.

    Input is to be given by countries that use states of data.

  5. user-39fa0
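
To make the option discussed in the comments above more concrete, here is a minimal sketch of a local extension that adds an optional, free-text state of data to DataResource. It is only an illustration under stated assumptions: the class `LocalDataResource`, the attribute name `state_of_data` and the helper `describe` are hypothetical and not part of GSIM V1.0, and the `DataResource` class is a bare stand-in rather than the full information object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataResource:
    """Bare stand-in for the GSIM DataResource information object."""
    name: str


@dataclass
class LocalDataResource(DataResource):
    """Hypothetical local extension; not part of the GSIM specification."""
    # Optional and uncontrolled: any free-text value, or None if not recorded.
    state_of_data: Optional[str] = None


def describe(resource: LocalDataResource) -> str:
    """Return a short label showing the resource name and its state, if any."""
    state = resource.state_of_data or "state not recorded"
    return f"{resource.name}: {state}"


# Example values reuse the states named in the original question.
raw = LocalDataResource(name="Survey responses", state_of_data="raw data")
unspecified = LocalDataResource(name="Legacy register")

print(describe(raw))
print(describe(unspecified))
```

The same pattern could be applied to DataSet instead of DataResource, which is the alternative raised in comment 3. Where the attribute should live is exactly the point on which common guidance would help.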