Reply by Dan Gillman
Thank you for clearly describing this problem. When we were modeling GSIM, I knew we didn't really have Units right, but I figured we would get back to it later. I guess later is now.
As you point out, there are 3 ideas in use, and we have modeled 2 of them. Those 3 are:
1. the units being observed
2. the population the units belong to (which as you say has time and space attributes)
3. the type of units being observed
#1 and #2 are already modeled in GSIM. As for #3, Sundgren and others refer to this as an Object Type. Since we are already using the term Unit to mean something else (#1), I propose we use Object Type or possibly Unit Type to refer to #3.
I am afraid there is no good, direct way to account for all 3 classes with the 2 we have. Your example where Dan Gillman (#1, a Unit) is both an employee (#3, Object Type) and a person (#3, another Object Type) is exactly right. Likewise, some Unit (Dan Gillman) might be a member of many Populations (#2), such as US Federal Employees in 2013 or Persons in the US in 2013. Finally, the Object Type is related to Population in the obvious way: an Object Type is one constituent of a Population, just as time (e.g., 2013) and space (e.g., US) are. Object Types may be specialized, e.g., going from Employee to Federal Employee.
It looks like Object Type should replace Population as a role for a Concept, and Population becomes an intersection of Object Type, Time, and Space. How this is done isn't clear right now; we may have to reconvene a group to think it through. In any case, I think your Option 3 is the way to go. Option 2 will work, but I like the purer approach in Option 3, and we don't know right now whether having Object Types as a class in its own right will simplify other problems that crop up. Option 2 is much more constraining.
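To make the idea of Population as an intersection of Object Type, Time, and Space concrete, here is a minimal sketch. It is only an illustration, not a GSIM definition: the class names, the `parent` link used for specialization, and the reduction of time to a single year and space to a region code are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ObjectType:
    """A type of unit, e.g. Person or Employee (hypothetical class)."""
    name: str
    parent: Optional["ObjectType"] = None  # the type this one specializes, if any

    def is_a(self, other: "ObjectType") -> bool:
        """True if this type is `other` or a specialization of it."""
        current: Optional[ObjectType] = self
        while current is not None:
            if current == other:
                return True
            current = current.parent
        return False


@dataclass(frozen=True)
class Population:
    """An Object Type scoped by time and space (both simplified here)."""
    object_type: ObjectType
    year: int      # assumption: time reduced to a reference year
    region: str    # assumption: space reduced to a region code


@dataclass(frozen=True)
class Unit:
    """An individual being observed, which may have several Object Types."""
    name: str
    object_types: FrozenSet[ObjectType]
    region: str
    years: FrozenSet[int]

    def belongs_to(self, population: Population) -> bool:
        """A Unit is in a Population if type, time, and space all match."""
        return (
            any(t.is_a(population.object_type) for t in self.object_types)
            and population.region == self.region
            and population.year in self.years
        )


person = ObjectType("Person")
employee = ObjectType("Employee")
federal_employee = ObjectType("Federal Employee", parent=employee)

dan = Unit("Dan Gillman", frozenset({person, federal_employee}),
           region="US", years=frozenset({2013}))

us_federal_employees_2013 = Population(federal_employee, 2013, "US")
us_persons_2013 = Population(person, 2013, "US")
au_persons_2013 = Population(person, 2013, "AU")
```

In this sketch the one Unit belongs to both US Populations at once, while the Australian Population shares the Person Object Type but not the Unit.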
About 10 years ago, Stats Can put together good lists of Object Types (called Object Classes, as they are in ISO/IEC 11179). They should be incorporated. The basic kinds are consistent with work Sundgren did to help make the creation of SDMX DSDs easier.
ABS Problem Statement
The ABS (Australian Bureau of Statistics) is currently developing a Metadata Registry and Repository (MRR) aligned with GSIM. Although the MRR is aligned with GSIM, the definition of objects for the ABS MRR will require attributes in addition to those defined in GSIM as a generic model.
A question has arisen in this regard when it comes to the modelling of Unit in the GSIM V1.0 specification.
The three examples given all relate to individual people, companies etc. Being able to identify a specific unit in this way is important when, eg, populating a register/frame or the unit identifier field in a unit record file.
It is also very common, however, to talk about "types" of statistical units (eg persons, families, enterprises, registrations, transactions).
While related, the concept of an individual unit and the concept of a type of unit don't seem equivalent. It seems unlikely, therefore, that "persons" is simply a Unit in the same way as an individual person.
The alternative of saying "persons" is a Population also seems awkward. A Population will usually have at least some form of temporal and spatial scope. In theory "persons" as a unit type refers to all persons ever born, or yet to be born in the future. This seems different in concept to the population, eg, of persons on Planet Earth (which would typically only include people who were alive at a particular time).
A couple of possibilities come to mind for capturing "unit type" as an attribute. The attribute might be, eg, populated from an extensible controlled vocabulary of recognised unit types (at an agency specific or international level).
Option 1: The attribute could be applied to Unit. This might (or might not) imply the list of unit types should be limited to mutually exclusive "base types". (At the conceptual level, are Dan Gillman as a Person and as an Employee referring to two different Units?)
Option 2: Perhaps it would be more appropriate to attach the attribute to Population, to describe the "type" of Units from which the Population is composed (the type would be subject to more detailed scoping in the definition of the Population). For example, US persons and Australian persons are:
- different Populations
- composed of different individual Units
- built from the same "type of unit"
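A minimal sketch of attaching the attribute to Population might look like the following. All names here are hypothetical, the scope fields are simplified to a region and a reference year, and a Python `Enum` stands in for the controlled vocabulary; a genuinely extensible vocabulary would more likely live in a registry than in code.

```python
from dataclasses import dataclass
from enum import Enum


class UnitType(Enum):
    """Hypothetical controlled vocabulary of recognised unit types."""
    PERSON = "Person"
    FAMILY = "Family"
    ENTERPRISE = "Enterprise"


@dataclass(frozen=True)
class Population:
    name: str
    unit_type: UnitType   # the proposed attribute
    region: str           # assumption: spatial scope as a simple label
    reference_year: int   # assumption: temporal scope as a single year


us_persons = Population("US persons", UnitType.PERSON, "US", 2013)
au_persons = Population("Australian persons", UnitType.PERSON, "Australia", 2013)
```

The two Populations are distinct objects with different scope, yet both carry the same unit type value.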
Option 3: Unit Type could be added as an object in its own right. This would allow Unit Type to have attributes and relationships of its own, including with other Unit Types. For example, it would be possible to define "Employee" as a specialisation of the "Person" Unit Type. (If "Legal Entity" were ever a Unit Type it might be a specialisation applicable to either "Person" or "Corporation"?)
I think that adding a new object tends to add more complexity to the model than adding a new attribute to an existing object. Option 3 would need to offer significant advantages over Option 1 or Option 2 in order to justify the extra complexity.
Would you recommend other options be considered instead, or would you recommend one of the options listed above instead of the others?