(Feedback from Statistics Spain; 28 September, 2017)

The Structures part of GSIM seems to be described from an IT point of view rather than a statistical one. We don't find certain objects such as quality, estimates, variance and errors. In contrast, we found many different types of processes. For example, some objects from phases 5 and 6 of the GSBPM that we don't find in GSIM are: variance, coefficient errors, aggregates, weights, edited and validated sample, quality indicators and dissemination products.

4 Comments

  1. Meeting 27 June, 2018

    • Agreed that users should be informed about how to include this information (i.e. quality, estimates, variance, aggregates, weights, edited sample…), but probably as text in the Specification document (with additions to the Variable Annex where appropriate) rather than as new GSIM objects.
    • However, it is not clear where the information should be included: some items relate more to datasets, others can be associated with processes, etc.
    • We need to know whether people are using this information in the model and, if so, how. The information could be grouped into clusters, perhaps using GSBPM? We need more input from countries on how they do this. Alice Born, Alistair Hamilton, Eva Holm, Essi Kaukonen and Mikko Saloila: can you share your experience?
  2. We tend to take a multifaceted approach to this. One of the challenges is defining exactly what you're trying to indicate and how it relates to other things (e.g. "quality indicators" vs "process metrics": a measure of how much imputation was undertaken is a process metric for us, but the same metric might later be reported as a quality indicator).

    We use "User Defined Types" for some of these features (below the level of GSIM), so you can flag particular data structure components as weights and indicate which "statistical" measures each weight is used for (typically one to many). We take a similar approach to variances, but there the relationship is more likely to be one to one.

    Very roughly, "User Defined Types (UDTs)" can be seen as a mechanism for ABS-specific "trivial subclassing" (or "tagging") of a selected GSIM class, indicating an ABS-specific "business use" or "business process behaviour". It provides extensibility: it is possible to add a new User Defined Type for, e.g., Data Structure Components (DSCs) without changing our "core model". But you still get issues, e.g.:

    • should a subset of currently defined DSCs associated with UDT "D" be moved across to reference the new UDT "E" instead?
    • which ABS business processes need to understand what to do with a DSC of UDT "E"?  
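    The tagging mechanism described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions and is not the ABS implementation. The class `DataStructureComponent`, the `UDT_REGISTRY` dictionary, the link tables `weight_used_for` and `variance_of`, and the functions `tag` and `components_with` are all hypothetical names. The sketch shows a weight UDT linked to several measures (one to many), a variance UDT linked to a single measure (one to one), and a new UDT "E" added without touching the core class.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the GSIM Data Structure Component class.
# This "core model" class is never modified when a new UDT is introduced.
@dataclass
class DataStructureComponent:
    name: str
    udts: set[str] = field(default_factory=set)  # UDT tags, e.g. {"weight"}

# Hypothetical UDT registry: tag name -> description of the business use.
UDT_REGISTRY: dict[str, str] = {
    "weight": "Holds weights used to produce statistical measures",
    "variance": "Holds variances for a statistical measure",
}

# Hypothetical link tables kept outside the core class.
weight_used_for: dict[str, list[str]] = {}  # weight DSC -> measures (one to many)
variance_of: dict[str, str] = {}            # variance DSC -> measure (one to one)


def tag(dsc: DataStructureComponent, udt: str) -> None:
    """Attach a registered UDT to a component; unknown UDTs are rejected."""
    if udt not in UDT_REGISTRY:
        raise KeyError(f"unknown UDT: {udt}")
    dsc.udts.add(udt)


def components_with(udt: str, components: list[DataStructureComponent]) -> list[str]:
    """Names of the components carrying a given UDT."""
    return [c.name for c in components if udt in c.udts]


# Usage with hypothetical components.
income = DataStructureComponent("income")
employment = DataStructureComponent("employment")
final_weight = DataStructureComponent("final_weight")
income_variance = DataStructureComponent("income_variance")
components = [income, employment, final_weight, income_variance]

tag(final_weight, "weight")
weight_used_for["final_weight"] = ["income", "employment"]
tag(income_variance, "variance")
variance_of["income_variance"] = "income"

# Adding a new UDT "E" only touches the registry, not DataStructureComponent.
UDT_REGISTRY["E"] = "Hypothetical new business use"
```

    Note that the sketch does not answer the two questions above. Re-tagging a subset of "D" components as "E", and deciding which processes must handle "E", remain business decisions outside the model.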

    "Aggregate" tends to be a User Defined Type applied to Data Set for us.

    "Edited and validated sample" would not be a single thing for us. We could have a status on the Data Set containing the sample that indicates it is "good to go" (GTG). The business practice could be to undertake editing and validation before a GTG status is assigned.

    We would be updating record- and cell-level Enterprise Status Codes (ESCs) as editing occurred. The final validation process could upgrade ESCs to indicate that no remaining anomalies could be detected, and the GTG flag could then be assigned overall. This scenario is way "down in the weeds" of how we're orchestrating and quality assuring business processes.
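    A rough sketch of the editing and validation flow just described is below. It is illustrative only and makes several assumptions. The ESC values (`RAW`, `ANOMALY`, `EDITED`, `VALIDATED`), the `DataSet` class, the `gtg` flag and the functions `edit_cell`, `final_validation` and `record_status` are hypothetical; the real ABS codes are not described here. The ordering of the codes is also an assumption. Editing sets a cell's ESC. Final validation upgrades every cell with no detectable anomaly. The Data Set is flagged GTG only when every cell passes. A record-level status is derived from its weakest cell.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class ESC(IntEnum):
    # Hypothetical Enterprise Status Codes, ordered from least to most assured.
    RAW = 0
    ANOMALY = 1
    EDITED = 2
    VALIDATED = 3


@dataclass
class DataSet:
    # record id -> {variable name -> (value, cell-level ESC)}
    cells: dict[str, dict[str, tuple[float, ESC]]]
    gtg: bool = False  # hypothetical "good to go" flag for the whole Data Set


def edit_cell(ds: DataSet, record: str, var: str, value: float) -> None:
    """An editing action updates the value and its cell-level ESC."""
    ds.cells[record][var] = (value, ESC.EDITED)


def record_status(row: dict[str, tuple[float, ESC]]) -> ESC:
    """Record-level ESC taken as the weakest of its cell-level ESCs."""
    return min(code for _, code in row.values())


def final_validation(ds: DataSet, is_anomalous: Callable[[str, float], bool]) -> bool:
    """Upgrade clean cells to VALIDATED; assign GTG only if no anomaly remains."""
    all_clean = True
    for row in ds.cells.values():
        for var, (value, _) in list(row.items()):
            if is_anomalous(var, value):
                row[var] = (value, ESC.ANOMALY)
                all_clean = False
            else:
                row[var] = (value, ESC.VALIDATED)
    ds.gtg = all_clean
    return ds.gtg


# Usage with hypothetical data and a hypothetical edit rule (no negative values).
ds = DataSet(cells={
    "r1": {"turnover": (120.0, ESC.RAW), "employees": (-3.0, ESC.RAW)},
    "r2": {"turnover": (80.0, ESC.RAW), "employees": (5.0, ESC.RAW)},
})
negative = lambda var, value: value < 0
first_pass = final_validation(ds, negative)   # r1 employees is negative, so not GTG
edit_cell(ds, "r1", "employees", 3.0)
second_pass = final_validation(ds, negative)  # every cell now passes, so GTG
```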

    "Estimate": is this just indicating a post-estimation estimate? If so, at that stage we wouldn't care whether it is an aggregate produced by estimation or some sort of aggregate from administrative data (e.g. motor vehicle registrations) with little or no "traditional" estimation. Data set and process lineage would allow you to trace back and find that the first set of aggregates had been created through estimation.
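    The lineage argument can be illustrated with a small upstream traversal. This is a sketch under assumptions, not a GSIM or ABS structure. `DataSetNode`, `ProcessStep`, the `method` labels, and the functions `lineage` and `estimated_upstream` are hypothetical. Each data set records the process step that produced it and its input data sets. Walking upstream from a published output shows whether any earlier aggregates came from an estimation step. The output itself does not need to carry an "estimate" type.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ProcessStep:
    name: str
    method: str  # hypothetical label, e.g. "estimation", "tabulation"


@dataclass
class DataSetNode:
    name: str
    produced_by: Optional[ProcessStep] = None
    inputs: tuple["DataSetNode", ...] = ()


def lineage(ds: DataSetNode) -> Iterator[tuple[DataSetNode, Optional[ProcessStep]]]:
    """Yield (data set, producing step) pairs, walking upstream depth first."""
    stack, seen = [ds], set()
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        yield node, node.produced_by
        stack.extend(node.inputs)


def estimated_upstream(ds: DataSetNode) -> list[str]:
    """Names of data sets in the lineage that were produced by estimation."""
    return [n.name for n, step in lineage(ds)
            if step is not None and step.method == "estimation"]


# Usage: survey-based output vs. administrative-data aggregates (hypothetical).
sample = DataSetNode("edited_sample")
survey_aggs = DataSetNode("survey_aggregates", ProcessStep("weighting", "estimation"), (sample,))
published = DataSetNode("published_series", ProcessStep("seasonal adjustment", "seasonal adjustment"), (survey_aggs,))
registrations = DataSetNode("vehicle_registrations")
admin_aggs = DataSetNode("admin_aggregates", ProcessStep("tabulate", "tabulation"), (registrations,))
```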

    "Quality" on its own is way too broad for us.  Process vs statistical product quality is a key differentiator for us (even though the two are linked).  Even more significantly, quality is fitness for purpose.  Statistical estimates which are fundamentally fit for one purpose may not be fit for another.  What we are trying to do is provide a number of indicators to support informed judgement rather than make a single definitive measurement of "quality".

  3. We agree with Alistair as far as quality is concerned.

    We have recently created an editing service that uses Instance Variables as the input and output of the editing process. From that experience, we see many of these things happening in the process part of GSIM. Like quality, they are perhaps not separately identifiable objects. All in all, there are a lot of individual items here, and perhaps this issue should be divided into several sub-issues handled separately, perhaps in the next review? =)

    BR, Mikko and Essi

  4. user-8e470

    In her clean up of the GSIM wiki pages, InKyung Choi found the GSIM User Guide which includes examples of how these can be handled in GSIM. 

    https://statswiki.unece.org/download/attachments/75563987/GSIM%20User%20Guide.pdf?api=v2 

    I am moving this issue down to a documentation-type solution.