ABS Problem Statement

The ABS (Australian Bureau of Statistics) is currently developing a Metadata Registry and Repository (MRR) aligned with GSIM.  While aligned with GSIM, the definition of objects for the ABS MRR will require attributes additional to those defined in GSIM as a generic model.

A question has arisen in this regard when it comes to the modelling of Unit in the GSIM V1.0 specification.

The three examples given all relate to individual people, companies etc.  Being able to identify a specific unit in this way is important when, eg, populating a register/frame or the unit identifier field in a unit record file.

It is also very common, however, to talk about "types" of statistical units (eg persons, families, enterprises, registrations, transactions).

While related, the concept of an individual unit and the concept of a type of unit don't seem equivalent.  It seems unlikely, therefore, that "persons" is simply a Unit in the same way as an individual person.

The alternative of saying "persons" is a Population also seems awkward.  A Population will usually have at least some form of temporal and spatial scope.  In theory "persons" as a unit type refers to all persons ever born, or yet to be born in the future.  This seems different in concept to the population, eg, of persons on Planet Earth (which would typically only include people who were alive at a particular time).

A couple of possibilities come to mind for capturing "unit type" as an attribute.  The attribute might be, eg, populated from an extensible controlled vocabulary of recognised unit types (at an agency specific or international level).

Option 1

The attribute could be applied to Unit.  This might (or might not) imply the list of unit types should be limited to mutually exclusive "base types".  (At the conceptual level, are Dan Gillman as a Person and as an Employee referring to two different Units?)

Option 2

Perhaps it would be more appropriate to attach the attribute to Population to describe the "type" of Units (which would be subject to more detailed scoping in the definition of the Population) from which the Population is composed.  For example US persons and Australian persons are

  • different Populations
  • composed of different individual Units
  • built from the same "type of unit"      

Option 3

Unit Type could be added as an object in its own right.  This would allow Unit Type to have attributes and relationships of its own - including with other Unit Types.  For example, it would be possible to define "Employee" as a specialisation of the "Person" Unit Type.  (If "Legal Entity" were ever a Unit Type it might be a specialisation applicable to either "Person" or "Corporation"?)
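
A minimal sketch of what Option 3 might look like, for discussion only.  The class and attribute names (`UnitType`, `parents`, `is_a`, `has_type`) are hypothetical and are not taken from GSIM V1.0; the sketch only shows a Unit Type carrying its own relationship to other Unit Types.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class UnitType:
    """Hypothetical Unit Type object (Option 3); not a GSIM V1.0 class."""
    name: str
    parents: tuple[UnitType, ...] = ()  # broader Unit Types this one specialises

    def is_a(self, other: UnitType) -> bool:
        """True if this Unit Type is `other` or specialises it, directly or indirectly."""
        return self == other or any(p.is_a(other) for p in self.parents)


@dataclass
class Unit:
    """An individual unit, which may be described by several Unit Types."""
    identifier: str
    unit_types: set[UnitType] = field(default_factory=set)

    def has_type(self, unit_type: UnitType) -> bool:
        return any(t.is_a(unit_type) for t in self.unit_types)


person = UnitType("Person")
employee = UnitType("Employee", parents=(person,))  # specialisation of Person
corporation = UnitType("Corporation")

dan = Unit("Dan Gillman", {employee})
```

Under these assumptions a single Unit typed as "Employee" is also recognised as a "Person", without needing two separate Units.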

I think that adding a new object tends to add more complexity to the model than adding a new attribute to an existing object.  Option 3 would need to offer significant advantages over Option 1 or Option 2 in order to justify the extra complexity.

Would you recommend other options be considered instead, or would you recommend one of the options listed above instead of the others?

   

18 Comments

  1. user-8e470

    Reply by Dan Gillman

    Thank you for clearly describing this problem.  When we were modeling GSIM, I knew we didn't really have Units right, but I figured we would get back to it later.  I guess later is now.

    As you point out, there are 3 ideas in use, and we have modeled 2 of them.  Those 3 are

    1. the units being observed
    2. the population the units belong to (which as you say has time and space attributes)
    3. the type of units being observed

    #1 and #2 are already modeled in GSIM.  As for #3, Sundgren and others refer to this as an Object Type.  Given that we are already using the term Unit to mean something else (#1), then I propose we use Object Type or possibly Unit Type to refer to #3.

    I am afraid there is no good, direct way to account for all 3 classes with the 2 we have.  Your example where Dan Gillman (#1, a Unit) is both an employee (#3, Object Type) and person (#3, another Object Type) is exactly right.  Likewise, some Unit (Dan Gillman) might be a member of many Populations (#2), such as US Federal Employees in 2013 or Persons in the US in 2013.  Finally, the Object Type is related to Population in the obvious way by seeing that Object Types are constituents, just as time (e.g., 2013) and space (e.g., US) are.  Object Types may be specialized, e.g., going from Employee to Federal Employee.

    It looks like Object Type should replace Population as a role for a Concept, and Population becomes an intersection of Object Type, Time, and Space.  How this is done isn't clear right now; we may have to reconvene a group to think it through.  In any case, I think your Option 3 is the way to go.  Option 2 will work, but I like the purer approach in Option 3, and we don't know right now if having Object Types as a class in its own right will simplify other problems that crop up.  Option 2 is much more constraining.
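
    One way to picture "Population as an intersection of Object Type, Time, and Space" is the rough sketch below.  Everything in it is an assumption made for illustration: the class names, the use of plain strings for time and space, and the second unit ("unit-2") are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectType:
    name: str


@dataclass(frozen=True)
class Population:
    """Hypothetical: an Object Type scoped by time and space."""
    object_type: ObjectType
    period: str      # e.g. "2013"; a plain label, for illustration only
    geography: str   # e.g. "US"


@dataclass(frozen=True)
class Observation:
    """Hypothetical record of one Unit seen as an Object Type at a time and place."""
    unit_id: str
    object_type: ObjectType
    period: str
    geography: str


def members(population: Population, observations: list[Observation]) -> set[str]:
    """Identifiers of Units matching the Population's type, time and space scope."""
    scope = (population.object_type, population.period, population.geography)
    return {
        o.unit_id
        for o in observations
        if (o.object_type, o.period, o.geography) == scope
    }


person = ObjectType("Person")
federal_employee = ObjectType("Federal Employee")

observations = [
    Observation("Dan Gillman", person, "2013", "US"),
    Observation("Dan Gillman", federal_employee, "2013", "US"),
    Observation("unit-2", person, "2013", "AU"),  # made-up second unit
]

persons_us_2013 = Population(person, "2013", "US")
federal_employees_us_2013 = Population(federal_employee, "2013", "US")
```

    In this sketch the same Unit falls into both Populations, while the Object Types stay free of any time or space scope.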

    Stats Can put together good lists for Object Types (called Object Classes as they are in 11179) about 10 years ago.  They should be incorporated.  The basic kinds are consistent with work Sundgren did to help make the creation of SDMX DSDs easier.

  2. user-cda26

    Hi Dan,

    I think I agree with what you have replied to Al above, that the use of Unit Type, particularly based on Al's option 3, seems like a good way to go (setting it up as another object, potentially the same as/very similar to the 11179 Object Class). However would this also mean that the role of Population in the (3) GSIM variable structures may also need to be reconsidered?

    As Al mentions in his example, we tend to refer to the 'types' of units (ie Person, Household, etc), rather than the particular unit or population. In terms of a Variable, should we consider that it is in fact made up of a combination of the Unit Type and Concept (bringing it closer into alignment with a 11179 DEC), rather than the current Population/Concept combination? For example, do we tend to refer to a variable as "Person's Age" (ie Unit Type/Concept) or "All Persons in Australia, Age" (Population/Concept)?

    Would this then mean that the Variable/Represented Variables themselves are more re-useable, as they are defined based on a Unit Type, rather than for a particular Population, and it is the use of the Instance Variable in a particular study/dataset that would be further defined by the population.

    However, if the Variable is defined through the use of Unit Type, then potentially there is no direct relationship between a population and a variable/represented variable itself; it would occur through the Instance Variable -> Datum -> Unit -> Population relationship chain?
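
    A rough sketch of the idea, under assumptions: a Variable is defined as Unit Type plus Concept, and only the Instance Variable is tied to a Population.  The class shapes are hypothetical, Represented Variable is left out for brevity, and the Population is attached directly to the Instance Variable rather than via the Datum -> Unit chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """Hypothetical: Unit Type + Concept, with no Population involved."""
    unit_type: str
    concept: str


@dataclass(frozen=True)
class InstanceVariable:
    """Hypothetical: the use of a Variable for one particular Population."""
    variable: Variable
    population: str


persons_age = Variable(unit_type="Person", concept="Age")

# The same Variable is reused; only the Instance Variables differ.
age_in_australia = InstanceVariable(persons_age, "All Persons in Australia")
age_in_us = InstanceVariable(persons_age, "All Persons in the US")
```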

     

  3. user-07a97

    Whilst I follow the arguments and possible solutions I am concerned that maybe the problem is more general than the specific use case you mention. In the solution of Object Type the relationship between Unit and Object Type is “many to many”. However, to extend this would not “person” also be related to objects other than a Unit? Perhaps other classes in the model may also wish to have a relationship with “Object Type”. Now, one can model this explicitly each time we discover this, or we can have a generalised construct that can relate any object with an “Object Type”.

    This was a problem we had in SDMX as it was not possible to predict how organisations would wish to “classify” other objects in the model. This is the purpose of the “Category” in SDMX (not the same "class" as the Category in GSIM but you can see why it was named category as it “categorises” other objects). So, in SDMX we have a “Categorisation” object that links one Category with any other SDMX object. A specific Category such as “person” can have many Categorisations, each of which links to one specific object such as the “Dan Gillman” Unit. Equally, the specific object, such as the Unit “Dan Gillman” can link to many Categories such as “person” and “employee”.

    Now, this example does no more than link a Unit to one or more Object Types or one Object Type to one or more Units. But in this more generic model it would allow the Object Type such as “person” to link also to a specific survey instrument or business case or any other object in the GSIM model. In SDMX we have found that in practice this generic model can be used to solve many use cases which were not all foreseen. One such use case in SDMX is to “classify” a particular process step with both a subject domain “Category” and a GSBPM process type (e.g. 1.2) “Category”.

    In order to assist with “discovery” SDMX groups different “Categories” into “Category Schemes” but this is already supported in GSIM if the Object Type inherits from Concept as Concepts can be grouped into one or more Concept Systems. So, in GSIM you can group “Unit Type” Concepts into a Concept System and then you have the “controlled vocabulary” for Unit Type.

    If you think the use case you describe with Unit and Unit Type (Object Type in Dan’s response) is perhaps just one of many similar types of use case then having an “Object Type Link” object that has an association to an Object Type class, and an association to any “class” in GSIM would solve this.

    In modelling terms we would have two new classes:

    1. An “Object Type” class (sub class of Concept)
    2. An “Object Type Link” class which is associated to exactly one “Object Type” and to exactly one “Identifiable Artefact”. Interestingly this would also allow a Concept (which is an “Identifiable Artefact”) to be linked to another Concept in a specific context as understood by the linking Concept and perhaps the Concept Scheme in which the linking Concept resides. However, this may not be a use case.

    I realise that in modelling terms these are just about the only two classes you need in any model (which, whilst theoretically possible, is not very useful, as no-one has done the hard work of finding out what is really required). It is nevertheless quite a useful construct if used within the context for which it is designed.

    I am not suggesting that the class names above should be used (as “Object Type” in object modelling terms tends to mean “Class” and could be confusing), I am just using the terminology of the thread so far.
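
    To make the two proposed classes concrete, here is a small sketch.  The Python class shapes, the helper functions `types_of` and `targets_of`, and the "questionnaire-1" artefact are assumptions for illustration; they are not taken from SDMX or GSIM.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IdentifiableArtefact:
    identifier: str


@dataclass(frozen=True)
class Concept(IdentifiableArtefact):
    """A Concept is itself an Identifiable Artefact."""


@dataclass(frozen=True)
class ObjectType(Concept):
    """Proposed class 1: Object Type as a sub class of Concept."""


@dataclass(frozen=True)
class ObjectTypeLink:
    """Proposed class 2: exactly one Object Type linked to exactly one artefact."""
    object_type: ObjectType
    target: IdentifiableArtefact


def types_of(target: IdentifiableArtefact, links: list[ObjectTypeLink]) -> set[str]:
    """Identifiers of all Object Types linked to the given artefact."""
    return {l.object_type.identifier for l in links if l.target == target}


def targets_of(object_type: ObjectType, links: list[ObjectTypeLink]) -> set[str]:
    """Identifiers of all artefacts linked to the given Object Type."""
    return {l.target.identifier for l in links if l.object_type == object_type}


person = ObjectType("person")
employee = ObjectType("employee")
dan = IdentifiableArtefact("Dan Gillman")              # stands in for a Unit
questionnaire = IdentifiableArtefact("questionnaire-1")  # made-up survey instrument

links = [
    ObjectTypeLink(person, dan),
    ObjectTypeLink(employee, dan),
    ObjectTypeLink(person, questionnaire),
]
```

    Each link is one-to-one, but the collection of links gives the many-to-many relationship between Units and Object Types, and lets “person” classify artefacts other than Units.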

  4. user-9f32a

    We have also discussed this issue. In the Statistics Sweden information model we have "object type" as an information object and we felt that the "type" was missing in GSIM. When we were discussing this our solution was along the same line as Helen is writing above and what Al has in proposal 3. For us object type and Unit type are the same things.

  5. This issue has also come up at Statistics NZ. In previous modelling we have done we have included the idea of a 'unit type' as a necessary item. I agree with Al's suggested option 3.

  6. user-1e833

    Hi all,

    I know there is already sort of a consensus in adding a new object for Unit Type so before doing that I would like to incorporate my point of view.

    My point is that we don’t need an extra object in GSIM to refer to unit type because Unit Type is an inherent characteristic of the Population, survey, framework, etc... object of our statistical study. Unit Types have similar characteristics as Values of Variables defined by a Classification (see Variable definition from GSIM 1.0).

    For instance:
    Take a certain statistical activity like Labor Force Survey and take “Employed Persons”. The Units are “Persons” and “Employed” is a Value element of the Variable “Employment status” defined by the employment status Classification (following certain methodology).
    So, in this case, Population would be “Employed Persons”, defined by Unit and the Variable as the concept playing the role of a characteristic.

    I guess this aligns more with the Option 2 described by Alistair and ideas by Helen's post.

  7. user-cda26

    Hi All,

    The attached two documents are our 'homework' from the last meeting - a suggested object definition for 'Unit Type' - for discussion at the meeting soon.

    The word document is an outline of what the new object could be, based on the template we used during GSIM development (so hopefully familiar!). At the end of the document (where the relationships are usually detailed) I have two suggestions, but am happy to get further input as part of the discussion.

    The PDF document is a rough diagram showing the potential of where Unit Type could be included in the GSIM model.

    Have a great day!

    Helen

    Draft - Unit Type diagram.pdf

    Draft - Unit Type.docx

     

  8. user-1e833

    Hi all,

    Let me elaborate a little more on my misgivings about this issue through some examples. I’ve been reading all posts again (mine included) and maybe we need to clarify what we understand by Unit and provide more examples.

    1. Let's consider Person as the ultimate Unit, i.e., with no restriction at all. For any given study we need to specify a framework, that is, we need a Population. We do this following certain methodologies and classifications, never at random.

    From this it follows that anything that can be measured about a person can be expressed using Variables whose elements are defined according to a given Classification.

    2. Take persons as human beings. Take those who are alive at this moment. From them, take those who have USA nationality. Now consider those residing in the USA at this moment. From them, take only those who are older than 16 years old. We could go on until we end up identifying one particular individual.

    3. So the question: Is Persons (without restrictions) a different object from Persons with some filter applied? Could we then say:

    Unit Type: raw object of study: Persons (human beings), businesses, etc

    Unit: object of study associated with a Population: employed persons in USA

    4. In a statistical process, is it relevant to explicitly define an object for Unit Type, or could we incorporate it as part of the methodology that describes the variables which determine the Units that are the object of our study? What difference does it make (in modeling terms) whether our statistical study is aimed at Persons or Businesses?

    5. In GSIM1.0 paragraph 66 we find the following example:

    Population: Adults in Netherlands

    Variable: Educational attainment

    We could also say that the characteristic of being "adult" is defined somehow and it is part of a Classification so we could have a Variable for it: Age group. The same applies to “being from the Netherlands". It is easy to think about the Variable: Nationality. What is left here? The Unit or Unit type: Person.

    6. About Helen’s diagram and Unit Type class:

     At the end of the day, everything we say about a basic Unit Type can be defined or classified somehow and we don't really need two different objects for Units, just one of them. In the proposed new diagram by Helen I would take Unit type out and, if necessary, only incorporate the Classification object.

     Considering the proposed definition of Unit Type: "For example, the Unit Type of “Person” could group together a set of Units based on the characteristic that they are ‘Persons’." But is that "group of units based on persons" not what we understand by the elements of Variables which ultimately define a Population, whether it be of Persons or Businesses?

    Maybe we DO need to add an extra object but the examples and the arguments provided are still not clear enough to me.

    Thanks.

  9. Approached from a purely conceptual perspective there are two pieces of information we are trying to capture: information about individual units e.g. 'Alberto' 'Adam' and information that describes a set of units grouped based on a particular set of characteristics e.g. 'Person' or equally 'persons working on gsim'. 

    Considered in this way, the degree of aggregation used to describe the set of units is irrelevant. Grouping a set of units into 'people' or 'persons working on gsim' is the same action with the same relationships to other objects. Grouping these to different levels of aggregation does not mean that these need to be modelled as separate information objects.

    This is how gsim currently models this information: unit and population.

    In an ideal world this would be sufficient to capture this information with the proposed 'unit type' being represented as a population with subtypes to capture the more specific constraints on the units included.

    However, in day to day operations of a statistical agency people do tend to separate out these concepts and as a result the 'unit type' is often discussed separately from the idea of a population.

    The inclusion or not of an information object representing the highest aggregation of a group of units may not be a debate on whether or not such a concept exists as a separate entity but what will be of most use to people in stats agencies trying to understand and make use of gsim.

    To support the aim of gsim as a communication device, and given the seemingly fairly common usage of the concept, perhaps this is an information object worth including in the model, while leaving it open to individual implementations whether it is used at all or instead represented as a 'top level' population.

  10. user-1e833

    Adam, I think your post is quite interesting and it raises the question on the limits of GSIM. It looks like the idea is to include more objects and leave it to the particular implementations of the standard whether to use them or not. This model will be flexible enough to accommodate perfectly all different use cases but will probably be too complicated for the implementer to understand globally.  

    A different approach would be to stick to the "ideal world" and create a more concise and stable model, more theoretically founded. A third way could be to have both worlds in a 2 - tiered GSIM. 

    Anyway, this is not the place for this discussion but we see there are many objects in GSIM that people have issues understanding: unit vs unit type, category item vs category, code item vs code, etc... maybe we could do with fewer objects?

    Thanks.

  11. While I think I understand Alberto's perspective, I think for a variety of conceptual and practical reasons Unit Type is a commonly enough used and distinctive enough concept to be included in the next version of GSIM.  It helps alignment with ISO 11179, and I think that is not for an entirely arbitrary reason.  Take three examples: "Age of Person", "Age of Business" and "Age of Motor Vehicle".  Are these all the same variable?  They're all ages - but I'd see a conceptual difference between the age of a person, a business and a motor vehicle.  11179 would tell us they are all different Data Element Concepts.

    Note that when it comes to Age of Mother vs Age of Student I become more comfortable with Alberto's approach.  I'd be happy to say "I mean fundamentally the same thing by 'Age' but I am using a different qualification/classification applied to the base Unit Type".

    This is why, as I think Statistics Canada have done, in implementation I'd suggest as small a number of distinct Unit Types as possible, with qualifications/classifications being used to define a wide range of populations based on them.

    I agree there are shades of grey in this case, but I think Unit Type is dark enough to warrant inclusion.

  12. I agree completely with Al's analysis of how the characteristic age differs in its various uses.  Further, the use of Data Element Concept from ISO/IEC 11179 to illustrate those differences is exactly right.

    I also agree with Adam, that Unit is for individuals - Dan Gillman, Adam Brown, my dog Luigi, Lassie, etc.  The purpose of Unit Type is to have a non-contextualized aggregate for Units.  We use Population to place context on a Unit Type, for the time and place attributes associated with a Population are what differentiate that from Unit Type.

    Alberto is right that Unit Type could be modeled as an attribute of Population, but GSIM is supposed to be the fully blown out conceptual model.  Also, Population and Unit Type have different relationships attached to them.  Therefore, I want to see Unit Type and Population both as classes in the model.

    The idea that you can specialize a Unit Type down to one individual Unit is interesting.  That is the same idea as concatenating classifications on a large micro-data set to disclose records on individuals.  However, there is a terminological difference between the individual and the specialized Unit Type that only contains one individual.  The specialized Unit Type is still a concept, and the individual Unit is an element of the extension of that concept.  They are not the same things.
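
    The distinction between a concept and its extension can be sketched as follows.  This is only an illustration under assumptions: a Unit Type is modelled as a hypothetical membership test (`criterion`), and the `kind` attribute and the small universe of Units are made up from the examples above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Unit:
    name: str
    kind: str  # hypothetical attribute used only to drive the example


@dataclass(frozen=True)
class UnitType:
    """Hypothetical: a concept, represented here by a membership test."""
    label: str
    criterion: Callable[[Unit], bool]

    def specialise(self, label: str, extra: Callable[[Unit], bool]) -> UnitType:
        """Narrow this Unit Type by adding a further condition."""
        return UnitType(label, lambda u: self.criterion(u) and extra(u))

    def extension(self, universe: set[Unit]) -> set[Unit]:
        """The Units in `universe` that fall under this concept."""
        return {u for u in universe if self.criterion(u)}


dan = Unit("Dan Gillman", "person")
adam = Unit("Adam Brown", "person")
luigi = Unit("Luigi", "dog")
lassie = Unit("Lassie", "dog")
universe = {dan, adam, luigi, lassie}

dog = UnitType("Dog", lambda u: u.kind == "dog")
dog_named_luigi = dog.specialise("Dog named Luigi", lambda u: u.name == "Luigi")
```

    Here the extension of the specialised Unit Type contains exactly one Unit, yet the Unit Type and that Unit remain different objects.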

     

     

  13. user-8e470

    Recommendation:

    Add unit type as a new object

  14. We might want to add the reasoning behind it.

  15. I suggest replacing Population with Unit Type in the Concepts model.  This will solve the problems of using the description of Variables to describe data that are not statistical.  Unit Type will be just a synonym of Object Class as it is used in ISO/IEC 11179.  There will be no need to link Population to Instance Variable.

    We will need to think about where to put Population.  It may not belong in the Concepts section, as it is used in purely statistical situations.

  16. user-8e470

    added by Thérèse Lalor because it might be relevant here:

    GSIM Issue: Observation Unit is a structured collection

    In GSIM, Observation Unit (in Fig 11 called Collection Unit?) is defined as “A Unit for which information can actually be obtained during data collection”.

    Unit is defined as: “The object of interest in Statistical Activities and corresponds to at least one Population”.

    In our modeling of the data collection domain, we found that an Observation Unit most often is a structured set of objects. The examples given with the Unit definition therefore seem to be a simplification. In practice, we see examples such as: An Address “Broadway 12024” with the two Households living there, where Household 1 consists of Persons Mary, John and little Richard and Household 2 is Claus living by himself. Of each of these Persons we want to know the kind of Healthcare Services or facilities they have used in the last year. Instead of Healthcare, we might be interested in the mobility (trips or movement, a structured object in itself), the financial or economical transactions, etc.

    This Observation Unit thus consists of objects of four different types, related to each other in a structured, layered fashion. The Analysis Unit in this example might be Person.

    The layered structure of the Observation Unit is reflected by the Unit Data Structure, through the Record Relationship that ties together the instances of Logical Records (each one representing an object in the real world, like Broadway 12024, Mary, Claus or Healthcare service XYZ).
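
    The layered example can be sketched as a small tree of records.  This is an assumption-laden illustration: the `LogicalRecord` shape, the `children` attribute standing in for the Record Relationship, and the assignment of Healthcare service XYZ to Mary are all made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LogicalRecord:
    """Hypothetical record for one real-world object of a given type."""
    object_type: str
    label: str
    children: list[LogicalRecord] = field(default_factory=list)  # stands in for Record Relationship


def object_types(record: LogicalRecord) -> set[str]:
    """All object types found in the structure rooted at `record`."""
    found = {record.object_type}
    for child in record.children:
        found |= object_types(child)
    return found


def records_of_type(record: LogicalRecord, object_type: str) -> list[str]:
    """Labels of all records of one type, in depth-first order."""
    labels = [record.label] if record.object_type == object_type else []
    for child in record.children:
        labels += records_of_type(child, object_type)
    return labels


service = LogicalRecord("Healthcare Service", "Healthcare service XYZ")
mary = LogicalRecord("Person", "Mary", [service])  # assumed: Mary used service XYZ
john = LogicalRecord("Person", "John")
richard = LogicalRecord("Person", "Richard")
claus = LogicalRecord("Person", "Claus")

household_1 = LogicalRecord("Household", "Household 1", [mary, john, richard])
household_2 = LogicalRecord("Household", "Household 2", [claus])
observation_unit = LogicalRecord("Address", "Broadway 12024", [household_1, household_2])
```

    The one Observation Unit contains four object types, and picking "Person" as the Analysis Unit amounts to selecting the records of that type from the structure.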

    Looking at Units this way, the question arises “What is Population in this context?”. Does each object(type) (Addresses, Households, Persons, etc.) have its own Population? If not, which of these object types then defines the Population?

  17. I think we discussed this in the past - the relationship between a population and an object type or unit type.  A population is an object type along with some geography and time specified.  So, the people residing in the US in November 2013 is a population based on the object type (persons), the geography (US) and time (November 2013).  We further specialize persons to be residents of the US, as people can either reside in the US or not.  That is based on some (simple) classification.  All (as far as I can tell) specializations of populations are based on applying one or more categories from one or more classifications to the basic idea of the applicable object type.

     

    Any relationships among object types in use for a statistical activity would be directly inherited by the relevant populations.  I would think the same structure applies to both the object types and the related populations in use for a specific instance of a statistical activity.