Issue #50: Statistical Classifications and Code Lists - Generic Statistical Information Model

user-8e470

Meeting 24/1:

1. Level of detail. We are adding additional objects from LIM into GSIM (Measurement Type, Measurement Unit, Sentinel Value Domain, Substantive Value Domain, Universe). There are some opportunities to simplify, particularly in the Code, Code Value, Designation and Sign part of the model. ACTION: Ask for examples from countries on how they are handling this (Alistair Hamilton, Rebecca Stoks, user-0ee85).
2. Differentiation between statistical classifications and code lists. Code lists can also have series, etc. ACTION: Ask for examples from countries on how they are handling this (Guillaume Duffes, Essi Kaukonen, Catrin Karling, Mauro Scanu, Alice Born). We could ask Estonia (Essi to ask).


24 Jan, 2018

Alistair Hamilton

In terms of Item 1, I understand conceptually why those four classes you mention are included in GSIM currently. For implementation at the ABS, however, any practical value they might add for understanding, managing and reusing metadata was seen as outweighed by extra complexity for business implementers and either extra work by specifiers and users of metadata to get them right OR definition of "arbitrary" metadata related to them.

For ABS "Code Value" is now an attribute when defining a Code Item rather than a separate class.

In effect we use variant labels to support different "designations" for code items, including sets of designations that might look like codes but do not structurally correspond with how data associated with that code item is actually coded (which is Code Value for us).

Sometimes, for example, we disseminate output with labels that look like codes, or a radically abbreviated name, such as "NSW" for "New South Wales", even though the data in the ABS is coded with (eg) "1" - which would be the Code Value for the Code Item.

I think the opportunity to "trim complexity" in one area to offset adding complexity in others is a good one to explore. All the things you have listed as additional from LIM reflect things the ABS implementation has had to address in practice - although we have slightly different (but not better) solutions in place currently for some of them.


25 Jan, 2018

user-8e470

From Jenny Linnerud: (copied from Issue #22: Missing Composite Aggregation for Code Item, Code List, Classification Item, Statistical Classification)

Another plea from me related to this theme. Most people in statistical organisations think they know what a Code List is.

In the model you can easily go from Code List to Code Item, but then you drop off the edge of the GSIM planet trying to get to Code and Code Value via Node Set, Node, Designation and Sign.

Reading the definitions and examples for Code, Code Value and Code Item does not make it easier to know whether the Code is a short version of the Code Value or the opposite, i.e. is Code Item <Code, Code Value> or is it <Code Value, Code>? For example, Code List: Gender. Code Item: <1, male>, but is the Code Value 1 or is it male?

Code is subtype of Designation, Code Value is subtype of Sign and Sign encodes Designation so I suspect Code Value is 1 and Code is male, but that was a lot of work to get there. Not many people working in statistical organisations would claim that they know what a Sign and a Designation are, but they do think they know what a Code List is.

An easy solution would be to make this explicit in the Explanatory text for Code Item. A more radical solution would be to dispense with Sign & Designation....


25 Jan, 2018

Guillaume Duffes

Regarding item 2, discussed yesterday: as explained during the call, we try to tag as statistical classifications only those that comply with the basic business rules (i.e. exhaustive and mutually exclusive). That's quite a... Our classification management system is only able to manage statistical classifications, series and families, not yet variants. In the long run, we intend to treat the variants that are not compliant with the basic rules as simple codelists.

As for Jenny's comment, we took the liberty of not implementing Sign and Designation, which very few people would have understood and which remain conceptual objects as far as we understand them.


25 Jan, 2018

Alistair Hamilton

Same story for the ABS in regard to Statistical Classifications needing to provide exclusive and exhaustive categorisation at each level (as GSIM specifies). They are subject to more formal definition and governance than some Code Lists. A particular example is "ragged" structures where (eg) some items at a low "complete coverage" level may be subject to further decomposition where other items are not.

For example, in our Industry Classification (ANZSIC 2006) the Group Level item 016 does not break down further at the Class Level, whereas the Group Level code 017 breaks down to 0171 and 0172. Currently, for the Statistical Classification, 0160 is added at the Class Level to ensure completeness at that level (although it is sometimes argued that another Classification Item with the Code 016, since each Classification Item belongs to exactly one (1..1) Level, could be added to the Class Level rather than Code 0160). The Subdivision 76, in fact, does not subdivide further at either the Group or Class level but just arbitrarily becomes 760 and 7600.

A Code List can simply add code items for that further decomposition. In the above example, this might be:

1. By defining only 0171 and 0172 at the Class Level, with nothing added at the Class Level below Group 016; or
2. By defining no levels explicitly at all and just listing all valid codes regardless of the level they might be thought to be at (e.g. including 0171, 0172, 016 and 76, and possibly not including 017 if the aim is to force capture of data with the greatest possible specificity of coding).
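A minimal Python sketch of the two approaches above, using the ANZSIC codes from the example. It assumes the filler convention visible in ANZSIC (append "0" to the parent code, as in 016 to 0160 and 76 to 760); `fill_ragged_level` and `invalid_codes` are hypothetical helpers, not part of GSIM or any ABS system.

```python
def fill_ragged_level(parents: list[str], children_of: dict[str, list[str]]) -> dict[str, list[str]]:
    """Make the level below `parents` complete, as a Statistical Classification requires.

    Any parent without children gets one filler item whose code is the parent
    code with "0" appended (assumed convention, as with 016 -> 0160, 76 -> 760).
    """
    filled = {p: list(children_of.get(p, [])) for p in parents}  # copy; input untouched
    for parent, kids in filled.items():
        if not kids:
            kids.append(parent + "0")
    return filled


def invalid_codes(records: list[str], valid_codes: set[str]) -> list[str]:
    """Code List check: a code is valid if it is listed, whatever level it sits at."""
    return [code for code in records if code not in valid_codes]


# Statistical Classification: every Group must be represented at the Class Level.
class_level = fill_ragged_level(["016", "017"], {"017": ["0171", "0172"]})
print(class_level)  # {'016': ['0160'], '017': ['0171', '0172']}

# Code List: just enumerate valid codes; 017 is left out to force specific coding.
print(invalid_codes(["0171", "017", "76"], {"0171", "0172", "016", "76"}))  # ['017']
```

The classification version changes the structure to keep every level exhaustive, whereas the Code List version leaves the structure alone and only constrains which codes may appear in data.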

This is just one example of how Code Lists tend to be defined more pragmatically (purpose specific). Other examples include use of "Sentinel" values in Code Lists where these are typically not built into formal Statistical Classifications. Various Code Lists based on a formal Statistical Classification might support a range of different "pragmatic" features relevant to their more specific purposes/contexts.


28 Jan, 2018

Essi Kaukonen

Hi all!

I would like to bring the definition of Code Lists and Statistical Classifications, and their relations, into the discussion.

You could say that, in order to enhance the reuse of Statistical Classifications and Code Lists in practice, all variants should be included in the Classification Series they belong to. However, if Code Lists do not obey the business rules of a "true" Statistical Classification, do they then not belong to a Classification Series at all? In other words, does GSIM allow a Classification Series to include both Statistical Classifications and Code Lists?

In practice, for a statistician, it is hard to distinguish between Statistical Classifications and Code Lists. In our current system, renewed in 2016, users do not necessarily need to do this. They can tag whether the Node Set (we call them Classifications) is a Statistical Classification or a Code List, but mostly they do not. In principle, however, one Classification Series can include both Statistical Classifications and Code Lists. More importantly, Statistical Classifications that are international or national standards/recommendations can be marked. In practice the rules related to Code Lists are not too tight; you can, for instance, have the very same Code List Item Name for two Code List Items at the same level. Postal Codes are a good example here…

In our GSIM implementation we did what we did for several reasons: firstly, the long history of our central system for classifications, dating back to the beginning of the 1990s and including both Statistical Classifications and Code Lists; secondly, the decentralized governance model based on decentralized updating practices for classifications; and thirdly, the fact that we had tested the GSIM Statistical Classification Model during our development project, before it was attached to the whole GSIM model, with all the issues that affected our implementation.

Nevertheless, one main question remains: how do we differentiate Code Lists and Statistical Classifications, and what actually are their relations? Could we also see the basic concepts and their relations like this?


19 Feb, 2018

Alistair Hamilton

The ABS situation is similar, with rules for Code Lists not being too tight and Statistical Classifications being focused on "well defined and designed standards" which follow "exclusive and exhaustive by level". Flexibility for Code Lists includes, particularly for dissemination purposes, code lists that "splice selectively" from more than one statistical classification and throw in some extra items that aren't found in any of the classifications. Some of the National Accounts output structures are examples that come to mind, but so are some of our social survey outputs. This leads us to aspire to simple traceability between Code Lists, as (sometimes) pragmatic purpose-specific "facades", and the relevant concepts that, for standardisation purposes, "belong" to Statistical Classifications.

This could still be done with your proposal, as the (eg) two underlying classifications would each be a Code List and the "pragmatic" structure would be a third Code List. GSIM allows a Correspondence across more than two node sets. In practice, however, our business staff tend to focus on Correspondences between standard classifications (where there is a lot of reuse of the correspondence) and don't seem keen on applying them as means to relate "pragmatic" Code Lists back to more fundamental and standardised building blocks - recognising that not every code item will necessarily have a corresponding standardised item, or set of items, in a standard classification.
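As a rough illustration of the traceability described above, a correspondence can be held as a mapping from each item of the pragmatic Code List to the standard items it draws on, with an empty list for items that have no standard counterpart. This is a Python sketch under that assumption; the item codes, the classification names `ClassA` and `ClassB`, and the helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical correspondence: pragmatic output item -> (classification, item code) pairs.
correspondence: dict[str, list[tuple[str, str]]] = {
    "OUT1": [("ClassA", "01"), ("ClassA", "02")],
    "OUT2": [("ClassB", "7")],
    "OUT3": [],  # extra item with no counterpart in any standard classification
}


def untraceable_items(corr: dict[str, list[tuple[str, str]]]) -> list[str]:
    """Items of the pragmatic Code List that cannot be traced to a standard item."""
    return sorted(item for item, targets in corr.items() if not targets)


def items_used_per_classification(corr: dict[str, list[tuple[str, str]]]) -> dict[str, list[str]]:
    """Which items of each underlying classification the pragmatic list draws on."""
    used: dict[str, set[str]] = defaultdict(set)
    for targets in corr.values():
        for classification, code in targets:
            used[classification].add(code)
    return {cls: sorted(codes) for cls, codes in used.items()}


print(untraceable_items(correspondence))  # ['OUT3']
print(items_used_per_classification(correspondence))  # {'ClassA': ['01', '02'], 'ClassB': ['7']}
```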


19 Feb, 2018

user-8e470

New Zealand:

Issue #50: Statistical Classifications and Code Lists

[From our classification experts]: We use the Aria data model, which incorporates Neuchatel as part of the underlying model. I think the relationship between codelists and statistical classifications has to be thought about in the context of their purposes: there is a relationship, but the two aren't interdependent and are used for different purposes. We'd see the codelist as more equating with a category, from which views (or classifications) can be created, but the best practice principles that apply to a statistical classification don't apply in the same way to a codelist. The business rules for a classification are clear enough, but whether the distinction from a codelist also needs to be added, I'm not sure. The code itself is just a placeholder for the data, but for a classification it is a sort of building block, i.e. the hierarchy and sequential rules around structuring. It is important to understand that statistical classifications are no longer the traditional thing they once were, and that for GSIM to work, the relationship to concepts and how views are created is the critical aspect. So there may be benefits in simplifying the code and code value, although I do feel that a lot of this is just getting into extreme detail and delineation for the sake of it.
[From RS]: Not being a classification expert, I do find this part of the model slightly confusing to start with so I think it would benefit from some simplification and more examples in the text. I did wonder whether designation needed to be an object or just a relationship. I agree with Jenny, that I would expect there to be a more direct relationship between code and code item.


20 Feb, 2018

user-8e470

Estonia:

I totally agree with the arguments made by Alistair, Thérèse and Guillaume. I have gone down the same route as they did to understand all the objects and attributes in the GSIM model.

Concerning the use of GSIM, we have not actually implemented it. Our current metadata system is based on the Neuchatel model, and I found some years ago that the developer had implemented it incorrectly (this has not been fixed and will not be). The model itself is in place, but the idea that categories should be reusable is not, and for the management of code lists this is a little restrictive, but we try to survive.

Currently we manage both simple code lists (for example, the answer options provided for questions in a questionnaire) and standard code lists (SDMX) as floating ones. The problem for us is that the codes assigned to the categories must always be the same; otherwise we have to manage separately code lists with the same categories but different codes. This kind of code list management is very exhausting (multiple management of the same categories).

Anyway, we are moving towards implementing the GSIM model, using Colectica and DDI for that. I have already studied GSIM quite deeply (I am not sure whether it is clear enough to me yet), and I can say that the Colectica tool has been a very good “teacher” in helping me better understand GSIM. I will try to share the experience and knowledge I have so far.

The GSIM model is incorporated into the DDI standard, which has the following elements for a category (according to GSIM these are attributes, if I am not wrong): Name and Label. For example, for the gender code list “M” and “F” are names and “Male” and “Female” are labels, and when I use this code list within the questionnaire I can assign the codes “1” for “Male” and “2” for “Female”, but I can also assign codes like “M” and “F”. This construction is good, as the categories remain the same (and are reusable), but I can assign different Codes for particular circumstances (for example, the client of the survey or statistical activity provides in advance the concrete codes they are using, or we simply want to list categories by a range of numbers). For official statistical classification categories both “Name” and “Label” would be mandatory, whereas for simple code lists only “Label” would be mandatory, because reusability requires Name and Label together. For SDMX standard code lists both elements would be mandatory.
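The reuse described above can be sketched in Python: categories carry only a Name and a Label, and each code list binds its own codes to the shared category objects, so translating between code lists goes through the category. The class and function names are illustrative and are not DDI element names or a Colectica API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen -> hashable, so categories can be dict keys
class Category:
    """Reusable category: a Name and a Label, with no code attached."""
    name: str
    label: str


MALE = Category(name="M", label="Male")
FEMALE = Category(name="F", label="Female")

# Two code lists reuse the same categories but assign different codes.
numeric_codes = {"1": MALE, "2": FEMALE}
letter_codes = {"M": MALE, "F": FEMALE}


def recode(code: str, source: dict[str, Category], target: dict[str, Category]) -> str:
    """Translate a code from one code list to another via the shared category."""
    category = source[code]
    code_for = {cat: c for c, cat in target.items()}
    return code_for[category]


print(recode("1", numeric_codes, letter_codes))  # M
print(numeric_codes["2"].label)  # Female
```

Because the categories are defined once, adding a third set of codes for a particular client means adding one more mapping, not duplicating the categories.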

Mapping between DDI, the GSIM model and the statistical classification model (https://statswiki.unece.org/display/gsim/4+_Object+types+and+attributes) is not very easy, especially understanding the GSIM model (as Thérèse pointed out).

I tried to map these models (I have also added the elements from DDI; comparing the descriptions helps me better understand what each attribute should be). The comparison between the statistical classification model and DDI is clear, but the GSIM model will take me more time to digest. Anyhow, I feel that the GSIM model also aims to follow this DDI feature of assigning “Codes” as needed.


20 Feb, 2018

user-8e470

Response from Al:

Awesome explanation!

Some of this may come back to GSIM being a conceptual model where multiple agencies look at it and do more detailed implementation/use in different ways. The multiple ways are each consistent with GSIM but not necessarily consistent with each other.

"Name" is inherited all the way through GSIM and our ABS Information Model so we try to make Name meaningful. Therefore, for example, "New South Wales" might be the Name but we might have a Label "NSW" to be used for form design or dissemination where space is tight. We have also added in AIM for Code Items a specific "Code Value" that would hold "1" in this case - our standard code for NSW.

We haven't implemented it yet (and I wouldn't recommend it within the ABS), but it is conceivable that for Gender there could be a non-default label of "Boy" rather than "Male" for use in specific cases where that label was appropriate based on, eg, population definition or other aspects of context.

In effect, the Name becomes the default Label but can be overridden by a more specific choice of label set.
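A minimal Python sketch of the behaviour Al describes, assuming a Code Item that holds its Code Value as an attribute and keeps optional label sets keyed by name. `CodeItem`, `label_sets` and the label set names "short" and "other" are hypothetical; this is not the AIM schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CodeItem:
    """Illustrative Code Item: Code Value as an attribute, Name as the default label."""
    code_value: str  # how the data are actually coded, e.g. "1"
    name: str        # meaningful, stand-alone designation
    label_sets: dict[str, str] = field(default_factory=dict)  # label set name -> label

    def label(self, label_set: Optional[str] = None) -> str:
        """Return the label from `label_set`, falling back to the Name."""
        if label_set is None:
            return self.name
        return self.label_sets.get(label_set, self.name)


nsw = CodeItem(code_value="1", name="New South Wales", label_sets={"short": "NSW"})
print(nsw.label())         # New South Wales
print(nsw.label("short"))  # NSW
print(nsw.label("other"))  # New South Wales (no label in that set)
```

The fallback also covers label sets that only provide labels for some items, since items without an entry simply show their Name.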

This can be seen as a bit of "fudging" in regard to Designation in GSIM (which we haven't implemented separately) but it made sense to our business areas that had to populate and use metadata.

"Code Value" which AIM adds to Code Item is like "Code" for Classification Item in GSIM.

We have talked about, but not implemented, the possibility of being able to make Code Lists "shells" on top of Statistical Classifications. This gives us a somewhat uneasy relationship between what should be defined as a Code List vs a Statistical Classification (we primarily have many of the former and only true statistical examples of the latter). In particular, non-standard coding and/or naming for semantically standard Classification Items tends to be embodied in Code Items in a Code List for us in practice.

Estonia's (not specifically GSIM based) approach may have this nut cracked so names for Code Items can be, functionally, Codes.


20 Feb, 2018

user-8e470

Dan:

It has been a while since I weighed in on one of these modeling and implementation problems, but here goes. I hope you find this useful. Of course, anything I might say suffers from the adage that only in theory are theory and practice the same.

The interesting part of codes, names, labels, identifiers, and other handles we use to “talk about” information objects (I mean these as objects, such as a specific data set, not classes) is that the same fundamental thing is being done by all. They get associated with an object, and this association lets us store, retrieve, transform, transfer, and otherwise compute with it (the object). Each one of the kinds takes on its own role, such as a code being used in a data set to stand in place of a category from a code list or a statistical classification. We use the code to store the fact that we assign an object (often a survey respondent) to the category the code stands for. Names, labels, etc. have similar roles for each of them. A label, as used in DDI, is a linguistic expression associated with some object. For instance, the US labor force survey is called the Current Population Survey. This is a label for that survey.

This means we can model the association of each kind in the same way. The differences among them are just the roles, i.e., the ways they are used.

From a modeling perspective, the problem of modeling the same general idea for different situations in the same way is addressed through patterns. A pattern is a set of classes showing the attributes and relationships necessary to account for all situations. Then, classes in the subject matter part of the model ‘realize’ the pattern classes, effecting the ideas in the same way wherever used.

DDI has made liberal use of patterns in the DDI-4 development. I propose you do the same. It means you can manage the strings you use for each of these roles (finding homographs, for instance), and the pattern allows you to write the same code for treating each kind. Other use cases are similarly afforded.
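To illustrate Dan's point, here is a Python sketch of a single generic "designation" class realised for several roles, with one function that finds homographs regardless of role. It is an assumption-laden illustration, not DDI-4's actual pattern classes; the object identifiers are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Designation:
    """Pattern class: a string associated with an object, playing some role."""
    sign: str        # the string itself
    role: str        # how it is used: "code", "name", "label", ...
    designates: str  # identifier of the designated object (illustrative IDs)


def homographs(designations: list[Designation]) -> dict[tuple[str, str], list[str]]:
    """Strings that, within one role, designate more than one object.

    The same function works for every role, which is the point of the pattern.
    """
    objects: dict[tuple[str, str], set[str]] = defaultdict(set)
    for d in designations:
        objects[(d.role, d.sign)].add(d.designates)
    return {key: sorted(objs) for key, objs in objects.items() if len(objs) > 1}


designations = [
    Designation("Current Population Survey", "label", "survey:cps"),
    Designation("Georgia", "name", "geo:us-state-ga"),
    Designation("Georgia", "name", "geo:country-ge"),
    Designation("GA", "code", "geo:us-state-ga"),
]
print(homographs(designations))  # {('name', 'Georgia'): ['geo:country-ge', 'geo:us-state-ga']}
```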


20 Feb, 2018

InKyung Choi

Meeting 21 Feb.

1. Level of detail. We are adding additional objects from LIM into GSIM (Measurement Type, Measurement Unit, Sentinel Value Domain, Substantive Value Domain, Universe). There are some opportunities to simplify, particularly in the Code, Code Value, Designation and Sign part of the model.
=> Guillaume to check the implications of removing these objects (how they affect relationships, whether we need to add attributes to other objects).
2. Differentiation between statistical classifications and code lists. Code lists can also have series, etc.
=> To add explanatory text to Code List (e.g. that it can be grouped together).


21 Feb, 2018

Alistair Hamilton

Hi Dan Gillman.

In theory (and/or in Concept) I agree with you.

In the ABS we are generating Java Object libraries from our logical model. This is one reason why behaviourally we need consistent differentiation between ("true") Code Values, Names and Labels (and Registered Identifiers / URNs).

For example, Code Values are what is actually used in the related Enterprise Wide Data Warehouse. (As mentioned before, we do have use cases where a particular label set may be used as a proxy for codes used outside our Enterprise Data Management Environment, supporting "recoding" from our internal representation (Code Value) to an external "code like" representation and back.)

We also need to tightly restrict allowable characters for Code Values because some character set features will trip up COTS products (or even ABS developed products) used to process data encoded to the designated Code Values.

Code Values are also required to be unique across items within a particular Code List. As was noted by the Swedish contribution, there can be cases (usually for multi-level code lists in our case) where Names of Code Items will not be unique. In effect, Code Values "within the context of the Code List" are acting as identifiers for Code Items. (We very briefly considered whether we could use the registered URN for each Code Item, which is its "formal" identifier in our metadata registry, within data but both IT systems and statistical staff were not ready to do away with thinking in terms of Codes rather than "Identity".)
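The two Code Value rules above (restricted characters, uniqueness within a Code List) can be checked mechanically, while Names are allowed to repeat. The Python sketch below assumes an alphanumeric-only character set purely for illustration; the actual ABS restrictions are not stated in this thread, and `code_list_problems` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumption: illustrative character restriction only; the real ABS rules are not given here.
CODE_VALUE_PATTERN = re.compile(r"[A-Za-z0-9]+")


def code_list_problems(items: list[tuple[str, str]]) -> list[str]:
    """Check the (code_value, name) pairs of one Code List.

    Code Values must use allowed characters and be unique within the list;
    Names are deliberately not checked for uniqueness.
    """
    problems = [
        f"invalid characters in code value {value!r}"
        for value, _name in items
        if not CODE_VALUE_PATTERN.fullmatch(value)
    ]
    counts = Counter(value for value, _name in items)
    problems += [f"code value {value!r} used {n} times" for value, n in counts.items() if n > 1]
    return problems


ok = [("1", "New South Wales"), ("11", "Other"), ("21", "Other")]  # repeated Name is fine
bad = [("1", "A"), ("1", "B"), ("X-1", "C")]
print(code_list_problems(ok))   # []
print(code_list_problems(bad))  # ["invalid characters in code value 'X-1'", "code value '1' used 2 times"]
```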

From a metadata governance point of view, our Standards Area has strict guidelines on Naming. Names for code items in standard or preferred standard level code lists are designed to stand alone, without relying on context from the parent item, the Code List or the Represented Variable(s) that use the code list. This maximises reusability.

Label Sets, on the other hand, are purpose specific and may include abbreviations and other context appropriate short cuts that would not be appropriate for formal Names. In addition, every Code Item needs a Name (even if sometimes there may be duplication) but some Label Sets may not need to include Labels for every item (eg abbreviations used for a particular purpose may only be relevant to higher level items in a Code List).

Names are, in effect, the default Label unless an alternative Label Set is specified.

How ABS has decided to define and manage identifiers, code values, names and labels is not generic. I totally get that a generic model like GSIM or DDI would not want to build in this specific prescriptive differentiation.

In our own implementation we have aimed to start from commonality across these designation types and only built in differences where necessary for systematic use and preferred business definition and governance purposes.

If our logical model did not call these things out separately at all, however, then given it flows into implementation we would be in trouble.


21 Feb, 2018

Jenny Linnerud

I have just realised that Classification Item has an attribute 'code' and I suspect that it is closer to Code Value than to Code. A possible violation of model coherence?

"Code: A Classification Item is identified by an alphabetical, numerical or alphanumerical code, which is in line with the code structure of the Level. The code is unique within the Statistical Classification to which the Classification Item belongs." See 4 _Object types and attributes#4_Objecttypesandattributes-_Toc375048650

I think it might be easier for users if we use 'codeValue' for an attribute for humans and machines and 'codeText' rather than 'code' for humans.

On the other hand, the Statistical Classification part of GSIM is the one with the most successful implementations, so it might be easier to change Code Value to Code and Code to Code Text!


08 Mar, 2018

Guillaume Duffes

Regarding previous comments and the expressed need for simplification:

Removing Designation would introduce discrepancies, since it has an aggregation to Node; e.g. this would imply it is not possible to link a Code to a CodeItem.
Removing Sign is technically possible, but would imply that Code Value would disappear as well. While I can understand the example given by Alistair ("Sometimes, for example, we disseminate output with labels that look like codes, or a radically abbreviated name, such as "NSW" for "New South Wales", even though the data in the ABS is coded with (eg) "1" - which would be the Code Value for the Code Item."), we see "NSW" in the DDI implementation as an additional label attached to the CategoryItem, and this label is expressed in GSIM as a Sign that encodes the Designation. The same goes for Jenny's example ("Code List: Gender. Code Item: <1, male>"): 1 is a GSIM Code and Male is a GSIM Sign that encodes the Designation associated with the CategoryItem. I can see two ways of simplifying this part of the model:
1. Either only CodeValue is removed, since it is very confusing and makes people believe it only applies to CodeItem, whereas the model tells it applies to CodeItem as well as CategoryItem and ClassificationItem.
2. Or CodeValue and Sign are removed, and in this case Designation should certainly be renamed (I can't find a proper term; I hope native speakers can help me out) and populated with an additional attribute like "label". In this case, the model would be close to the solution described by Alistair as I understand it ("Code Value" renamed "label" has become an attribute).


13 Mar, 2018

Alistair Hamilton

Hi Guillaume.

That's a great summary!

I'd just add, as mentioned earlier in the discussion chain, ABS has applied Code Value directly to the Code Item (Node) for use of that Code Item within the ABS. That means the Node can only have one Code Value (within the ABS / domain of implementation) where, via Designation, in theory it could have more than one.

As noted earlier, also, GSIM already has features that can let you know that two Nodes in two different Node Sets are the same conceptually even though they differ in representation (eg in terms of Code Value or simply Name or Description).

The ABS constraint is pragmatic, and at the logical level:

1. Our Enterprise Data Management Environment, and applications using it, can be confident the Code Item will be represented in the data one way, and only one way, within the ABS environment.
2. We can constrain the character set specifically for Code Value for internal purposes but still support external "code" requirements - that might include characters we have ruled out. We do this by treating these "external" codes as purpose specific label sets that are a trivial transform from the ABS internal representation to the required external representation.
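The "trivial transform" above can be sketched as a pair of lookup functions built from one external label set. The sketch assumes the external codes must be unique so that incoming data can be mapped back to a single Code Item; `make_translators` and the sample codes are hypothetical.

```python
def make_translators(label_set: dict[str, str]):
    """Build to/from functions for one external 'code' label set.

    label_set maps internal Code Value -> external code. External codes may use
    characters ruled out for internal Code Values, but they must be unique,
    otherwise incoming data could not be mapped back to one Code Item.
    """
    inverse: dict[str, str] = {}
    for internal, external in label_set.items():
        if external in inverse:
            raise ValueError(f"external code {external!r} maps to more than one Code Item")
        inverse[external] = internal

    def to_external(code_value: str) -> str:
        return label_set[code_value]

    def to_internal(external_code: str) -> str:
        return inverse[external_code]

    return to_external, to_internal


# Hypothetical partner codes containing characters not allowed internally.
to_ext, to_int = make_translators({"1": "N.S.W.", "2": "X-2"})
print(to_ext("1"))    # N.S.W.
print(to_int("X-2"))  # 2
```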

Either of your two options makes sense to me. At a pragmatic level (although I understand the concept) the ABS does not separate the sign from a designation in context. (Conceptually we could be seen as using the Designation to Concept link to be able to say "What type of label is this, and to what label set does it belong?")

In a less formal sense, "Designation" might still work for your Option 2. While the ABS uses "Label", a slight drawback of that is that labels are usually thought of as being for human consumption. As mentioned, one possible use for "Label" in our case is machine-to-machine data transfer, where at the ABS end we use Code Value (bound to the Code Item) as the representation for the Code Item, but another label set can contain the "code" that the provider to the ABS, or the consumer from the ABS, associates with this Code Item.

Cheers

Al


13 Mar, 2018

user-8e470

Meeting 14/3: Final decisions (so we don't have to read through again)

ACTION: Propose that CodeValue and Sign are removed, and in this case Designation remains.

ACTION: => To add explanatory text to Code List (e.g. that it can be grouped together)


14 Mar, 2018

Guillaume Duffes

Regarding Thérèse's last e-mail, I would say:

"Both codelists and classifications can be related to Enumerated Value Domains – relationship already added to model": OK
"Code lists may contain both substantive and sentinel “categories” (value domains), and statistical classifications only contain substantive categories. - relationship already added to model": OK
1. "Classifications have a self-referential relationship “is based on”; codelists inherit the self-referential relationship “relates to” from nodeset – could look at this?" Great catch. That means that StatisticalClassification has both relatesTo and isBasedOn self-referential relationships, or that isBasedOn is a subtype of relatesTo. However, subtyping a relationship in UML is not something I have ever seen (I guess it is not allowed) except by introducing complex and subtle constraints. My guess is that "isBasedOn" was intended essentially to cover the relationship between a statistical classification and one of its variants, even though a bunch of attributes already exist for this, which could be redundant. So either we remove "isBasedOn" and assume the inherited "relatesTo" and the attributes on variants are enough, or we rename "isBasedOn" as "isVariantOf" (assuming that the other types of relationship between statistical classifications are handled by the inherited "relatesTo") and remove the attributes "Variant" and "Variants available". I would tend to favour the second option, which models the link between a variant and its statistical classification as a real relationship and also gives a role to "relatesTo" at the StatisticalClassification level. What do the others think? Alistair Hamilton, Alice Born, Flavio Rizzolo, Essi Kaukonen?
"Classification Series groups Statistical Classifications, not Code Lists – why should they be grouped? (I have added an explanatory note to codelists, but no change to the modelling). Is it worth exploring whether the classification-specific objects can be broadened to other nodesets (i.e. codelists)? The main candidate would be classification series…but then there are flow-on effects to Classification family, classification index, classification index entry." ClassificationSeries as a composition of statistical classifications is a business notion inherited from the former duality Classification/ClassificationVersion (I guess that Dan Gillman could talk about this for hours), so it is more than a grouping. Grouping of codelists is something that could also be useful, but I see it more as ad hoc, for dissemination purposes or in a scheme (in DDI terms) for access rights management purposes. Here it is again, the issue of "should it be in the conceptual model or be left to the implementation?" While I don't see any major problematic consequence in adding a self-referential relationship "groups" to NodeSet (anything can be grouped in an ad hoc manner: classification series, index, concepts, levels, etc.; DDI 3.2 does it everywhere), I don't see any significant added value in it for the conceptual model either. So my feeling is mixed on this; if somebody has a strong business case, it would be helpful.


13 Jul, 2018

Flavio Rizzolo

I tend to favor the second option in (2.), i.e. renaming isBasedOn as isVariantOf and removing the attribute Variant and Variants available.

In fact, there are several attributes that could work better as associations, e.g. Derived from, Predecessor, Successor. Can we create associations instead or use the existing ones? It's more difficult to make a link machine actionable when embedded in a textual attribute. I haven't looked at the latest EA file for Classifications, so perhaps this has already been addressed...


13 Jul, 2018

Alistair Hamilton

Option 2 broadly makes sense to me.

Due to change management considerations, when similar issues arise with the ABS logical model (AIM) wherever possible we tend to address it by implementation advice rather than by renaming attributes and relationships - eg use "isBasedOn" for the relationship between a Variant and its "base" Statistical Classifications, use the inherited "relatesTo" for all other relationships.

AIM has the relationships in question inherited from GSIM 1.1 and probably wouldn't rename in line with GSIM 1.2 because of the impact that would have on our existing metadata and tools. As a conceptual model, however, renaming to clarify intent is probably fine for GSIM?

Whether existing variant-related attributes become redundant probably touches on GSIM as a conceptual model and on "Neuchatel" documentation considerations.

Certainly the relationship allows "structural" determination of relationships in many cases. On the other hand the current definition for the string attribute Variant is

For those Statistical Classifications that are variants, notes the Statistical Classification on which it is based and any subsequent versions of that Statistical Classification to which it is also applicable.

As the Variant may be consistent with some versions of the "base" Statistical Classification but not others this might require allowing "isVariantOf" to structurally link to multiple Statistical Classifications which are versions of the same "base"?
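A Python sketch of the structural alternative, assuming `isVariantOf` may point to several versions of the same base classification and that "Variants available" is derived from the links rather than stored as text. Class, function and classification names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality: each object is one classification
class StatisticalClassification:
    name: str
    is_variant_of: list["StatisticalClassification"] = field(default_factory=list)
    relates_to: list["StatisticalClassification"] = field(default_factory=list)


def variants_available(base: StatisticalClassification,
                       catalogue: list[StatisticalClassification]) -> list[str]:
    """Derive the 'Variants available' text attribute from the structural links."""
    return [c.name for c in catalogue if base in c.is_variant_of]


# Illustrative names only: one variant is applicable to two versions of the same base.
base_v1 = StatisticalClassification("Base classification, version 1")
base_v2 = StatisticalClassification("Base classification, version 2")
variant = StatisticalClassification("Regional variant", is_variant_of=[base_v1, base_v2])
catalogue = [base_v1, base_v2, variant]

print(variants_available(base_v2, catalogue))  # ['Regional variant']
```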

As Flavio Rizzolo suggests, other current string attributes could be made structural. One issue that can arise in such cases, however, is that for a particular classification database that implements GSIM there may be no desire to fully model locally, for example, an international Statistical Classification (eg ISIC) from which a national standard is "derived". Being able to add a simple text reference for the international Statistical Classification from which the national Statistical Classification is derived may be sufficient for some implementers.

The ABS will likely be in this situation with Predecessor. As far as I know the plan is not to "structurally" define all predecessors of our Industry classification series but only the 1993 and 2006 editions (eg not the 1983, 1978 and 1969 editions). Nevertheless, a text attribute for ANZSIC93 might acknowledge ASIC83 as its predecessor. (Historical versions of ASIC appear to only be available in hard copy.)
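One way to allow both options is a predecessor that is either a structural link or a plain-text reference. The Python sketch below assumes that design; `Classification` and `lineage` are hypothetical, and only the edition names come from the paragraph above.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(eq=False)
class Classification:
    name: str
    # Either a structural link to a locally modelled classification, or a
    # plain-text reference for one that is not modelled (e.g. only in hard copy).
    predecessor: Optional[Union["Classification", str]] = None


def lineage(c: Classification) -> list[str]:
    """Follow predecessor links, stopping at a text-only reference."""
    chain = [c.name]
    prev = c.predecessor
    while prev is not None:
        if isinstance(prev, str):
            chain.append(prev + " (text reference)")
            break
        chain.append(prev.name)
        prev = prev.predecessor
    return chain


anzsic93 = Classification("ANZSIC93", predecessor="ASIC83")
anzsic06 = Classification("ANZSIC 2006", predecessor=anzsic93)
print(lineage(anzsic06))  # ['ANZSIC 2006', 'ANZSIC93', 'ASIC83 (text reference)']
```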

In general on this topic, ABS has assumed that it is fine for implementers to decide to make some attributes more formal/structural if that suits their implementation. For those who want a "bare minimum" approach to implementing GSIM, however, there is perhaps a case for the conceptual model to support a smaller set of "core" recommended structural relationships?

On that theme, I agree with Guillaume Duffes that groupings of Code Lists tend to be more flexible and ad hoc.

Classification Family and Classification Series have specific definitions, particularly that Families group Series - they don't simply group "non series" Statistical Classifications related to the same concept. As per the examples given in Neuchatel V2, ABS main Statistical Family examples would be Products and Geography (if the latter is treated as a set of classifications).

Statistical Classifications themselves can be grouped on a different flexible and ad hoc basis to Series and Family. For us grouping on a flexible and ad hoc basis is supported by generic implementation mechanisms (not restricted to just Statistical Classification and Code List).


16 Jul, 2018
