What is a real life example of the difference between a Classification and a Code List?
GSIM documents that there is a difference between the two, but when you get right down to it, where do you draw this line?

In your opinion, what is the boundary between a Code List and a Classification?

14 Comments

  1. In GSIM, the difference between a code list and a classification is that the categories in each level of a classification are mutually exclusive and exhaustive. Categories in each level of a code list don't have to satisfy those criteria. For instance, race and ethnicity categories don't constitute a classification, since each person can fall into many of those categories at once. Therefore, the categories are not mutually exclusive. For example, my ethnicity is English, Irish, Lithuanian, Russian, Scottish, and Ukrainian.
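To make the two criteria concrete, here is a minimal Python sketch. The function name `is_partition`, the respondent identifiers, and the sample data are all hypothetical and are not taken from GSIM; the sketch only checks that every item in a population falls into exactly one category of a level.

```python
def is_partition(categories, population, categories_of):
    """Return True if every item falls into exactly one category.

    categories:    the category labels of one level
    population:    the items being classified
    categories_of: hypothetical helper returning the set of categories
                   an item belongs to
    """
    category_set = set(categories)
    for item in population:
        hits = categories_of(item) & category_set
        if len(hits) == 0:
            return False  # not exhaustive: the item fits nowhere
        if len(hits) > 1:
            return False  # not mutually exclusive: the item fits several
    return True


# Hypothetical respondents reporting one or more ethnicities.
respondents = {"r1": {"English", "Irish"}, "r2": {"Ukrainian"}}
ethnicity_categories = {"English", "Irish", "Ukrainian"}
ethnicity_ok = is_partition(
    ethnicity_categories, respondents, lambda r: respondents[r]
)

# Hypothetical age bands: each respondent falls into exactly one band.
ages = {"r1": 34, "r2": 71}


def age_band(respondent):
    age = ages[respondent]
    if age < 18:
        return {"0-17"}
    if age < 65:
        return {"18-64"}
    return {"65+"}


age_ok = is_partition({"0-17", "18-64", "65+"}, ages, age_band)

print(ethnicity_ok)  # False: r1 falls into two categories
print(age_ok)        # True: the bands partition the respondents
```

Under this reading, the ethnicity list can only serve as a code list, while the age bands would qualify as a level of a classification.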

  2. That is perfectly clear to me and it can be useful to distinguish. The issue linked to this is, once we agree that we would like to have both: can code list and classification share related objects?

  3. Yes, code lists and classifications can share categories. First, code lists and classifications themselves have versions, so they share categories from one version to another, but that is only sharing within a class. Here is an example of a code list and a classification that share categories. The example is somewhat artificial, but it illustrates the issue. Suppose a technical school of higher education offers courses in mathematics, physics, engineering, and computer science, and the school only allows students to major in one or two subjects. So, the course classification is the list given above:

    mathematics

    physics

    engineering

    computer science

    yet, the code list for majors is

    01 - mathematics

    02 - physics

    03 - engineering

    04 - computer science

    05 - math/phys

    06 - math/eng

    07 - math/cs

    08 - phys/eng

    09 - phys/cs

    10 - eng/cs

    It is clear that the categories designated by the first 4 codes are the same as the categories in the course classification.
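The two lists above can be reproduced with a short Python sketch. The function name `build_major_code_list` and the abbreviation table are assumptions made for illustration; only the subjects and the resulting codes come from the example.

```python
from itertools import combinations

# Course classification from the example above.
course_classification = [
    "mathematics",
    "physics",
    "engineering",
    "computer science",
]

# Abbreviations used for the double majors (taken from the listed labels).
abbreviations = {
    "mathematics": "math",
    "physics": "phys",
    "engineering": "eng",
    "computer science": "cs",
}


def build_major_code_list(subjects, short):
    """Single majors first, then every unordered pair of subjects."""
    labels = list(subjects)
    labels += [f"{short[a]}/{short[b]}" for a, b in combinations(subjects, 2)]
    # Assign two-digit codes starting at 01.
    return {f"{i:02d}": label for i, label in enumerate(labels, start=1)}


major_code_list = build_major_code_list(course_classification, abbreviations)

# Categories that the code list shares with the course classification.
shared = [
    label
    for label in major_code_list.values()
    if label in course_classification
]

for code, label in major_code_list.items():
    print(code, "-", label)
```

The `shared` list contains exactly the four course categories, which are the ones designated by codes 01 to 04.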

     

  4. I have added an attachment with a real-life example of the difference between a low-investment, low-reusability, low-maintenance code list with non-mutually exclusive categories and a high-investment, high-reusability but low-maintenance classification.

  5. user-8e470

    Discussion:

    A code list is more general (weaker defined concept system) than a classification.

    If the categories are mutually exclusive and exhaustive, it is a classification...if it isn't then it is a code list.

    This is agreed??

     

    What about Hierarchical codelist? Is this where the line blurs?

    • It is possible to have a hierarchical code list that is not a classification. (Dan Gillman to provide example)
    • Hierarchical code lists can have codes with more than one parent. Such a list is not a classification. This is how an SDMX hierarchical code list works. (user-07a97 will provide an example - geography).
    • We need to check if GSIM want to mirror what SDMX does...maybe we don't....

     

     

     

     

  6. user-07a97

    I wrote a comment giving an example use case of a code with multiple parents. Whilst the example was correct, the comment about GSIM support for this was not. Apologies if you have already read the comment (I have now removed it). I will write a (hopefully correct) comment as soon as I can. I'm out of the office for the rest of this week.

    Cheers

    Chris

  7. user-07a97

    Here is the revised example use case of a code having multiple parents.

     

    Hierarchical Code List

    Chris Nelson

    16 July 2013

    Scope

    The scope of this note is to give an example of a hierarchical code list where a code may have more than one parent. This is in response to the request made at the CC for the implementation group on 2 July.

    GSIM Code List and Classification

    Both of these inherit from a Node Set. A Node Set contains Nodes and both Code Item and Classification Item inherit from Node. A Node can be hierarchic in that it can comprise “child” or “part” Nodes: each Node may only have one parent Node (and may have none). However, a Node has a mandatory association to a Category.

    So, whilst in GSIM a Code cannot have more than one parent Code, there is nothing in the model that prohibits the same Category being represented more than once in a Code List – (the definition of a Classification Scheme explicitly prohibits this for Classification Items).

    An example use case for this is given below.

    Example of a Code with Multiple Parents in the same Hierarchical Code List (HCL).

    The example is a Geography Code List. In this list each geographic location such as a country can be in more than one “hierarchy”. GSIM has no such object as an explicit hierarchy: hierarchical structures are defined in parent/child or whole/part relationships between one Node and other Nodes.

    Example parent groupings that could give a country code multiple parents are:

    • Continent
    • Trading Bloc
    • Currency Bloc
    • Military Union

    Any one country can be in one or more of these hierarchies. In the GSIM model the semantic of the “hierarchy” (e.g. Continent) would need to be a Category linked to Code, as this is the only way of grouping contained lower level (+child or + part) Codes.

    This use case is true for data dissemination and could also be true for other processes. Like other Categories, the “parent” of a country in the Code List has no explicit relationship to data, but it can be used by an application to determine the data values to which it relates (e.g. where the country is a dimension in a dimensional data set, an application can allow viewing by Continent).
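The GSIM side of this use case can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names loosely follow the GSIM terms, but the attributes, the codes (`CONT_EU`, `CURR_EUR` and so on), the category names and the helper `groupings_for` are hypothetical and are not part of GSIM.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Category:
    name: str


@dataclass(eq=False)
class CodeItem:
    code: str
    category: Category  # mandatory association to a Category
    parent: Optional["CodeItem"] = field(default=None, repr=False)  # at most one
    children: List["CodeItem"] = field(default_factory=list)

    def add_child(self, code: str, category: Category) -> "CodeItem":
        child = CodeItem(code, category, parent=self)
        self.children.append(child)
        return child


def groupings_for(top_level: List[CodeItem], category: Category) -> List[str]:
    """Names of the parent Categories under which a Category is represented."""
    found = []
    stack = list(top_level)
    while stack:
        item = stack.pop()
        if item.category == category and item.parent is not None:
            found.append(item.parent.category.name)
        stack.extend(item.children)
    return sorted(found)


france = Category("France")

# Two hypothetical "hierarchies", each expressed as a top-level Code Item.
europe = CodeItem("CONT_EU", Category("Europe"))
euro_area = CodeItem("CURR_EUR", Category("Euro area"))

# The same Category is represented twice, by two different Code Items,
# so each Code Item still has exactly one parent.
fr_in_europe = europe.add_child("CONT_EU_FR", france)
fr_in_euro_area = euro_area.add_child("CURR_EUR_FR", france)

print(groupings_for([europe, euro_area], france))
```

Each Code Item keeps a single parent, yet an application can still find every grouping a country belongs to by following the shared Category.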

    Note that in SDMX there is an object called “Hierarchical Code List” (HCL) and this is a different object from a Code List where the Code can have a parent/child hierarchy but is restricted to each Code having a maximum of one parent Code.

    The HCL in SDMX does not specify “Codes” in the GSIM sense (though each “node” or “hierarchical code” has an Id), it merely references Codes from one or more code lists and places them in one or more hierarchies (the “hierarchical code” can have children). This is not dissimilar to the GSIM Code (which can have children) referencing a Category (as the country semantic (e.g. France) will be a Category). However, in the SDMX HCL it is possible to maintain the codes comprising the “hierarchy” in a code list that is different from the code list comprising the countries. The HCL brings these together. Note that the SDMX HCL also has a Hierarchy object that contains the hierarchical codes.
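For comparison, the referencing approach described for the SDMX HCL can be sketched with plain Python data structures. This does not use any SDMX library or the SDMX API; all class names, identifiers (`CL_COUNTRY`, `CL_GROUPING`, `H_CONTINENT`, `H_CURRENCY`) and the helper `parents_of` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A reference to a code in some code list: (code list id, code id).
CodeRef = Tuple[str, str]

# Two separately maintained code lists (hypothetical content).
code_lists = {
    "CL_COUNTRY": {"FR": "France", "DE": "Germany"},
    "CL_GROUPING": {"EUROPE": "Europe", "EUR": "Euro area"},
}


@dataclass
class HierarchicalCode:
    id: str
    code_ref: CodeRef  # references a code, does not define one
    children: List["HierarchicalCode"] = field(default_factory=list)


@dataclass
class Hierarchy:
    id: str
    roots: List[HierarchicalCode] = field(default_factory=list)


def parents_of(hierarchies: List[Hierarchy], target: CodeRef):
    """List (hierarchy id, parent code reference) for every place target occurs."""
    result = []

    def walk(node: HierarchicalCode, hierarchy_id: str) -> None:
        for child in node.children:
            if child.code_ref == target:
                result.append((hierarchy_id, node.code_ref))
            walk(child, hierarchy_id)

    for hierarchy in hierarchies:
        for root in hierarchy.roots:
            walk(root, hierarchy.id)
    return result


france_ref = ("CL_COUNTRY", "FR")

hcl = [
    Hierarchy("H_CONTINENT", [
        HierarchicalCode("1", ("CL_GROUPING", "EUROPE"), [
            HierarchicalCode("1.1", france_ref),
            HierarchicalCode("1.2", ("CL_COUNTRY", "DE")),
        ]),
    ]),
    Hierarchy("H_CURRENCY", [
        HierarchicalCode("1", ("CL_GROUPING", "EUR"), [
            HierarchicalCode("1.1", france_ref),
        ]),
    ]),
]

for hierarchy_id, (list_id, code_id) in parents_of(hcl, france_ref):
    print(hierarchy_id, "->", code_lists[list_id][code_id])
```

Here the country codes and the grouping codes live in different code lists, and the same country code is referenced under a different parent in each hierarchy.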

    **** end ****

  8. user-07a97

    I offered to create a Hierarchical Code List using the GSIM classes that exist already. I have called this list a Complex Code List as the Code List in GSIM can already support hierarchical structures of Code Items. This may not be a good name and this is open to suggestion if this structure is deemed worthy of inclusion in a future version of GSIM.

    Note that whilst this model is based on the SDMX model for Hierarchical Code List (HCL), the SDMX HCL also has optional Levels and some additional attributes to support specific use cases, but these are not included in this model. The green classes are those that exist already and the light tan colour is used for the new classes.

     

     

  9. user-43b9a

    A small reply on this interesting discussion. A country is not a geographical location. It is an administrative or political location. That is why a lot of regional classifications are in reality code lists. One level is a geographical classification (continent) and the sublevel is an administrative classification (country). This is not possible because one country can be part of two or more continents. A level in a hierarchical classification should be a refinement of the parent level, that is, made according to the same point of view (geographical in this example).

  10. user-8e470

    Hi Chris, Sorry, I just realized I had not tagged you here. It was for an example that was not geography. Cheers, Thérèse

  11. user-8e470

    Discussion 2/10: 

    Action: In the documentation we need to have a good example of what is a classification and what is a code list (drawn from the comments at the top of the thread).

  12. user-9b682

    I agree with the notion that a code list does not have to have categories that are mutually exclusive and exhaustive, whereas a statistical classification does. I can't agree with Dan's example, however. It is entirely possible to create a set of race, ancestry or ethnicity categories that are mutually exclusive. Whilst data on these concepts will frequently, but not always, refer to persons, the entities being classified are not persons. They are ethnic or ancestral groups or identities, and a person may be associated with multiple groups or identities, in the same way as a person may speak multiple languages. It doesn't follow that the English and Ukrainian languages, ethnic groups etc. are not discrete and mutually exclusive concepts.

    The point is that it is possible to create a set of categories in such a way that they are mutually exclusive and exhaustive, or in such a way that they are not mutually exclusive and/or exhaustive. For example, if I create a set of language categories that contains English, Scouse, Geordie, Cockney, Bronx, Dutch, Flemish, Afrikaans and Netherlandic, this is not a classification because the categories are neither mutually exclusive nor exhaustive of a particular population of languages - but it is a category set.

  13. user-9b682

    I think this discussion also needs to deal with the relation between category sets and statistical classifications. The code list is the easy part, since the code is simply a representation of a category.

  14. David - The race/ethnicity example was based on the practical difficulty of writing an exhaustive list, even though it may be possible in theory. My naming six ethnicities was intended to illustrate the fact that people consider themselves all kinds of combinations of races and ethnicities. This is especially problematic in the US. The combinations are too numerous to list. In this sense, this can only be a code list and not a classification.