Are we really sure there is a need to have both population and universe objects?

 

Guillaume Duffes

13 Comments

  1. Here is the same issue raised during one of the DDI Moving Forward public reviews: https://ddi-alliance.atlassian.net/projects/D4Q2/issues/D4Q2-85?filter=allissues

  2. user-8e470

    Meeting 4/4: LIM proposed to add universe in addition to population in DDI4. Let's see what people think. Do we need population AND universe? - Alistair Hamilton Rebecca Stoks Francine Kalonji Eva Holm Essi Kaukonen Jenny Linnerud

  3. My own organisation liked the addition of Universe.

    Population needs geography and time. The producers know and can document these, but our external users want to search generally, independent of time and geography, see what is on offer (possibly across different statistical organisations) and then choose from the available data.

    Universe broadens the search possibilities.

  4. ABS experience is similar to Jenny's.  The ABS AIM model has had to build in a similar distinction between specific "time and space" bound populations vs "population selection specs" (which could be a Universe) that can be applied at a particular time and to a particular space.

    As a Population in GSIM can currently have a relationship to another Population, and Time and Space are optional for Populations, you can describe this using a pair of objects both based on the Population class.  I think, however, that the distinction is so fundamental that an extra class would be worth it.

    Conversations in the ABS often tripped over whether people were talking about "Population" in the generalised specification sense or in the materialised in time and space sense.
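
    A minimal sketch of the distinction described in this comment, assuming a dedicated Universe class. The class names, field names and example values are hypothetical illustrations, not taken from GSIM or the ABS AIM model:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Universe:
    """Hypothetical: a selection spec defined only by unit characteristics."""
    name: str
    definition: str


@dataclass(frozen=True)
class Population:
    """Hypothetical: a Universe materialised in a particular space and time."""
    name: str
    universe: Universe
    geography: Optional[str] = None          # "space"
    reference_period: Optional[str] = None   # "time"


# One reusable Universe, two space-and-time bound Populations (made-up values).
adults = Universe("Adults", "Persons aged 18 or over")
pop_a = Population("Adults in Australia, 2016", adults, "Australia", "2016")
pop_b = Population("Adults in Norway, 2017", adults, "Norway", "2017")
```

    With this shape, a search can start from the shared Universe and then drill down to whichever Population has the geography and time of interest.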

  5. Agree that users often want to search by universe first and then drill down into a specific population, so I think both are useful. I also wonder whether it is useful to have different objects for target and actual population or if this is better handled as a quality statement.

  6. In Statistics Norway we are including Target Population and Survey Population as subtypes of Population. These were present in GSIM v1.0, but discarded in GSIM v1.1.

    We would support re-introducing them into the revised version of GSIM this year.

    v1.0 also had Frame Population and Analysis Population. We have not yet included these locally, but we would also support these two subtypes!

  7. I can see a use case for Analysis Population for Statistical Disclosure Control. Would be interested in other agencies' experiences of what metadata they need to support SDC, although that is probably its own thread.

  8. GSIM Revision Group meeting 25 April 2018

    Agreed to keep both Population and Universe

  9. Then, I bow to the majority. But I'm still not convinced of the need for an additional class (smile). If a Universe-like class is added, the same issue can be raised for the other types of Population classes mentioned by Jenny, and the list can be even longer...

  10. Indeed it can, but we need business cases each time and we need to use our Design Principles each time or it will become very difficult to maintain ....

  11. Hi Guillaume. 

    I think I understand your perspective but, as far as I currently understand (and this post can test my understanding!), there is a difference between the two cases.

    I see Pop/Universe as somewhat similar to Variable / Represented Variable (RV) / Instance Variable (IV).  GSIM could have a single class of "Variable" that optionally has an association with a representation and optionally has an association with a Population.  This would support definition of "Variables" of a very general type and "Variables" of a very specific type.  The model could support defining relationships between the very general and the very specific Variables.

    There are real world cases (including "simple data description") where the above simplicity of "just one class" would be beneficial.

    On the other hand, GSIM implementers would likely need rules such as Variables used in practice against data must have a defined Value Domain even though Value Domain is optional overall.
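
    As a rough illustration of that trade-off, the "just one class" option and the extra rule it would push onto implementers might look like the sketch below. The names `Variable`, `value_domain`, `population` and `check_usable_against_data` are hypothetical and chosen only for this example:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    """Hypothetical single-class model: both associations are optional."""
    name: str
    value_domain: Optional[str] = None   # representation, optional overall
    population: Optional[str] = None     # optional overall


def check_usable_against_data(variable: Variable) -> None:
    """Implementer rule: a Variable used against data needs a Value Domain."""
    if variable.value_domain is None:
        raise ValueError(f"Variable {variable.name!r} has no Value Domain")


# A very general Variable and a very specific one, both of the same class.
general = Variable("Age")
specific = Variable("Age", value_domain="Whole years", population="Adults, 2016")

check_usable_against_data(specific)   # passes silently
```

    Calling `check_usable_against_data(general)` raises `ValueError`, which is the kind of rule that a single optional-everything class cannot express by itself.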

    The Variable / RV / IV separation helps ABS a lot in being able to encourage and harness reuse because, eg,

    • we can talk about cases where we want to get reuse of gsim:Variable even where we recognise the RV is non standard because it needs to be linked to administrative data that does not use ABS standard representation
    • we can talk about cases where we are capturing the same information with the same representation but the Population (and potentially, more specifically, the Universe) from which the data is captured varies (same RV, different IV) 
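
    The two reuse cases above can be sketched with three separate classes. This is only an illustration: the field names and example values are assumptions, not the GSIM attribute lists:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class RepresentedVariable:
    variable: Variable
    value_domain: str        # hypothetical: representation given as a label


@dataclass(frozen=True)
class InstanceVariable:
    represented_variable: RepresentedVariable
    population: str          # hypothetical: population given as a label


income = Variable("Income")

# Case 1: same Variable, different representation (non-standard RV).
standard_rv = RepresentedVariable(income, "Standard income ranges")
admin_rv = RepresentedVariable(income, "Administrative source coding")

# Case 2: same RV, different Population (different IV).
iv_first = InstanceVariable(standard_rv, "Households, 2016")
iv_second = InstanceVariable(standard_rv, "Households, 2017")
```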

    Having Pop without Universe can work, but we found ourselves tripping over, in modelling and in conversation, whether we were talking about

    • "space and time specific" Populations or
    • "Universes" defined by characteristics of in scope units 

    Similarly to IV vs RV, we accept Populations will differ based on space and time but we want to see Universes specified on a consistent basis (ie reused) wherever possible (rather than having "gratuitous" differences).

    When we have looked at different "roles" for Populations we have so far not found extra attributes we need to specify for the Population class itself if it is Survey or Target.  Instead, Survey and Target will almost always be two different examples/instances of the Population class, with differences (as well as similarities) in definition. 

    The difference in role is picked up for ABS in how, and where, the two population instances are referenced rather than in making them instances of two different classes.

    If there are attributes intrinsic to the "different types of" Population themselves that should be modelled differently, conceptually, at the GSIM level then re-introducing sub-classing may well be appropriate.  Where differences primarily relate to role / purpose / application rather than intrinsic structural definition, however, implementation at ABS has sought to avoid subclassing.  The ABS approach mirrors - and was inspired by - the discussions for GSIM 1.1 that took out subclassing of Population and Unit Type.      
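
    A small sketch of "role by reference" as opposed to subclassing. `SurveyDesign` and its fields are hypothetical names used only to show where the role could live; they are not GSIM or ABS classes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Population:
    """One class for every Population, whatever role it plays."""
    name: str
    definition: str


@dataclass
class SurveyDesign:
    """Hypothetical referencing object: the role is in the reference."""
    target_population: Population
    survey_population: Population


target = Population("Target", "All businesses operating in the country")
survey = Population("Survey", "Businesses listed on the business register")

design = SurveyDesign(target_population=target, survey_population=survey)
```

    Both instances have the same structure; only the field that points at them says which one is the target and which one is the survey population.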

  12. Hi,

    Thanks very much for the explanations, that has convinced me overall (smile).

    I had indeed not thought of the analogy with Variable/RepresentedVariable/InstanceVariable.

    I can see your point regarding the "roles" for Populations using the context as the instantiation factor.

    Have a nice weekend.

  13. Thanks to Al and my own colleagues, we are no longer proposing sub-types of Population in Statistics Norway or GSIM.

    Our suggestion was originally based on the Neuchâtel for variables work where the Population class was not associated directly to the physical (Instance) Variable as in GSIM, but closer to the conceptual Variable.

    My apologies for this wild goose chase!