In GSIM 1.5, two new subtypes of Exchange Channel were added – Statistical Register and Data Harvest – and the Web Scraper Channel was removed. It was thought this change reflected the current and new ways of exchanging information with a statistical organisation. [for further information, see Issue #46]. Do you have any comments on the new objects?

Feedback from countries

Country | Response in short | Feedback from country
Croatia | OK with the change | The Croatian Bureau of Statistics finds the change acceptable.
Lithuania | OK with the change | Eliminating Web Scraping in favor of Data Harvest seems reasonable, as different possibilities become available and there are different ways to acquire the same data. The term is broad enough to encompass more options.
Finland | OK with the change + proposal for further changes

In our opinion, the changes made in this part of GSIM are in the right direction. However, we would suggest some further changes. 

Firstly, we would like to combine the Administrative Register and Statistical Register into one object (perhaps a supertype) called Register. Administrative Register and Statistical Register could then either be mentioned in the description of the Register object or included as subtypes.

Secondly, we prefer Data Harvesting over Web Scraping. However, we are not quite sure whether Data Harvesting as a concept is still too web-focused.

We suggest that the descriptions of the exchange channels mention the most important examples, especially new data sources like the following:

-scanner data

-sensor data

-other types of business data (e.g. bonus data from chain stores)

Mexico | OK with the change +

“Data Harvest” is a more general concept than “Web Scraper Channel” for describing a way to obtain data from the Internet, data banks/databases or other instruments, but it relates to an action, whereas the other types of information channels relate to things (“Statistical Register”, “Administrative Register”, “Questionnaire”). We think that a term like “Harvested Data” would be more compatible with this line of thought.

There is a difference between the concept of “data” and the concept of “information”. A process transforms “data” into “information”.  We do not agree with the change made to the “Exchange channel” definition. In our view, both concepts - with independent objects - must be considered to state a clear difference when the “Exchange channel” is used to gather data, and when it is used to deliver information.

Australia | OK with the change +

See ABS comments on issue #46.

Generalizing the former “Web Scraper” channel to Data Harvesting (including from 3rd party APIs) is strongly supported.

As per comments in the wiki, ABS typically harnesses Administrative Register sources outside the ABS to maintain Statistical Registers within the ABS, rather than having exchanges with Statistical Registers beyond the ABS. Nevertheless, if other agencies (e.g. in the European context) interact with external Statistical Registers, the addition makes sense.

As per the wiki post on 7 June 2018, caution is urged in regard to the suggestion that a Statistical Register within an agency might be seen as having “exchange channels” with SPs that use that Register. Routing every flow of data from an internal Statistical Support Program, or from one SP to another SP, via a GSIM Exchange Channel appears to risk unnecessary complexity. A focus on additional formalization where information flows in and out across the boundaries of the agency as a whole, however, appears to add value.

Canada | OK with the change | No comments! It does reflect the new types of Exchange Channels.



16 Comments

  1. InKyung Choi

    (Feedback from Norway)

    We cannot see that Data Harvest is an improvement on Web Scraper Channel. If we do keep it, could we at least change it to Data Harvester or Data Harvest Channel? Harvest sounds like the result of harvesting with a harvester. Nor do I like the definition “A concrete and usable tool to pass information between two sources, usually by a machine to machine mechanism.” Hopefully, not much information goes from us, although it may be in the Provision Agreement. We suggest “A concrete and usable tool to pass information from one source to another, usually by a machine to machine mechanism.” I note that API is mentioned as a type of Data Harvest(er), which is good to see. We are increasingly required to collect information from Administrative Registers via the owners' APIs: a good example of an Environmental Change that is costing us a lot of time and resources. Google gives 12 million hits for ‘data harvesting’ and 2 million for ‘web scraping’. Maybe we should have Web Harvester!

  2. InKyung Choi

    (Feedback from Norway)

    In our office we make Statistical Registers from Administrative Registers so for us it is more internal than an ‘Exchange channel used for incoming information’. It is more like a subtype of Information/Data Resource internally. Looking at your definition and explanatory text it is clear that you also regard this as internal to a statistical organisation. We strongly suspect this information object is not and should not be a subtype of Exchange Channel. We recommend that this is removed as a subtype of Exchange Channel. It is very important, but it should be in the Structures 

  3. InKyung Choi

    (GSIM Revision Meeting 24th October, 2018)

    On Data Harvest

    • To change Data Harvest to Data Harvesting (was preferred to Data Harvester)
    • To change definition from “A concrete and usable tool to pass information between two sources, usually by a machine to machine mechanism” to “… pass information from one source to another, usually by a machine to machine mechanism”

    • To add new data source example in explanatory text 

    On Statistical Register

    • Some organisations use Statistical Register for internal purpose, but Exchange Channel is not necessarily defined for "external" user, so still can be subtype of Exchange Channel

    • It can be both input and output - can there be more explanation about product to include this?

    • Will try the proposal from Finland to create a new object Register

    On Exchange Channel: to add mention of API

  4. Mikko Saloila

    Definition:

    A Register is a regularly updated list of Units and their properties which is obtained from an external organisation (or sometimes from another department of the same organisation)

    Explanatory Text:

    All the Units in a Register typically have an identifier that makes it possible to update the Register with new information on the Units. Examples of Register are Administrative Register and Statistical Register.

    An Administrative Register is a source of administrative information. This administrative information is usually collected for an organisation's operational purposes, rather than for statistical purposes.

    A Statistical Register provides an (ideally) complete inventory of the statistical Units within a specific Population, and describes these Units using different characteristics. One example is a business register held within a statistical organization.
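    As a rough illustration of the proposed text (not part of GSIM itself), the role of the identifier can be sketched in a few lines of Python. The class names, field names and the sample identifier "FI-001" are hypothetical and chosen only for this sketch; the point is that a Register keyed by Unit identifier can absorb new information on a Unit without creating a duplicate entry.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Unit:
    """One entry in a register: an identifier plus its properties."""
    identifier: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Register:
    """A regularly updated list of Units, keyed by identifier (illustrative only)."""
    name: str
    units: Dict[str, Unit] = field(default_factory=dict)

    def update(self, identifier: str, **new_properties: str) -> Unit:
        """Add the Unit if it is unknown, otherwise merge the new information into it."""
        unit = self.units.setdefault(identifier, Unit(identifier))
        unit.properties.update(new_properties)
        return unit


# Hypothetical example: a statistical business register receiving two updates
# about the same Unit.
business_register = Register("Statistical business register")
business_register.update("FI-001", name="Example Oy", activity="retail")
business_register.update("FI-001", activity="wholesale")  # later update, same Unit
```

    After both calls the register still holds a single Unit for "FI-001", with the newer activity value replacing the older one.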

  5. Mikko Saloila

    Above is the suggestion from me and Essi for discussion or next meeting.

    BR,

    Mikko and Essi

  6. InKyung Choi

    (GSIM Revision Meeting 14th November, 2018)

    • Exchange Channel is not necessarily for exchange with external organisations, it could be for internal purpose
    • In some NSOs, Statistical Register is something the organisation produces for internal consumption, not for sharing
    • Register can be used for both internal (in case of Statistical Register) and external (in case of Administrative Register)

    Agreed to

    • Remove dashed line separating Product and others 
    • Update definition and explanatory text of the Register (in particular "which is obtained from an external organisation" part) and Product (in particular "is the only defined type of Exchange Channel for outgoing information" part)
    • For the moment, not to keep Administrative Register and Statistical Register as separate sub-types (this will be explained in the text of Register) to prevent confusion that they are the only sub-types of Register (UML does not say this, but could confuse people) 
  7. Mikko Saloila


    Register Definition:

    A Register is a regularly updated list of Units and their properties which is received from an external organisation (or sometimes from another department of the same organisation)

    Explanatory text:

    Same as before (included in a message above)


    Product Definition:

    A package of content that can be disseminated as a whole.

    Product Explanatory text:

    Product is a type of Exchange Channel for outgoing information. A Product packages Presentations of Information Sets for an Information Consumer. The Product and its Presentations are generated according to Output Specifications, which define how the information from the Information Sets it consumes are presented to the Information Consumer. (The rest of the explanatory text is as it is)


    Remarks:

    We noticed that (at least) the definition of Exchange Channel requires some changes in order for this to work in the model.


  8. Mikko Saloila

    The Exchange Channel and all the related objects need to be checked if they still match this new conception of outgoing and ingoing information.
    This is quite a big change that we now say that basically all the types of Exchange Channel can go in and out.


    BR,

    Mikko and Essi

  9. InKyung Choi

    What should we do with the attribute Information Provide Identifier in Administrative Register, which does not exist in Statistical Register?

    Jenny proposed to remove this attribute in Issue #2-24, as this can be handled by relationships through other information objects. If this is the case, we might not need to worry about it anyway.

  10. Jenny Linnerud

    I still struggle with this. For me a Statistical Register does not go in or out of a statistical organisation. It is fed on a regular basis by information from Administrative Register(s) and supports statistical production. It would be a type of Information Resource or Data Resource, but not an Exchange Channel. Let's discuss this more.

    Maybe we need to focus on the primary purpose/intention of the Exchange Channel. Our data collection people were confused that a Questionnaire was an ingoing channel when they knew they preprinted the questionnaire with data from inside the statistical organisation and pushed this out to reduce the response burden, but also to enable the information provider to update outdated data. Similarly, publishing a Product can also result in questions coming in to the statistical organisation for clarification, but that is not the primary purpose of the Product. Products should be as self-explanatory as possible. Is the Exchange Channel primarily used to bring information in to the statistical organisation or to send it out?

  11. InKyung Choi

    GSIM Virtual Sprint (23 Jan.)

    Discussion points

    • Intention of Exchange Channel: is for both in/out. 
    • Then is it for external or internal too? If statistical organisation wants to exchange information internally, is it going to be through Exchange Channel? 
    • In Stat Sweden, for both internal and external, we use provision agreement, works same for both internal and external
    • Not only NSO, but other statistical organisations also can produce statistical register, which can be communicated to us, hence external exchange
    • Need to update the explanatory text of EC to include that the exchange channel can be used for internal incoming/outgoing (among different departments etc.)
    • But if we add the "internal" part in the EC, would we also need to add all the other things that can be exchanged within the organisation (e.g. software)? This could make it more difficult.
    • The way statistical organisations operate is evolving; there is more push to make information internally available as much as possible, e.g. to reduce response burden. There is a lot of information circulating internally.
    • It would be good to have the "internal" part in line with the above argument, but it would make this part of the model too complex, and we might not have time for that
    • Regarding the possibility of complicating the model by including the "internal" part in EC: we don't need to model every possible internal exchange (e.g. colleagues dropping by) or formalise everything, but we should have a way to formalise a certain type of internal exchange that happens more and more frequently in the statistical organisation.
    • Then what is it that separates EC from others? Formal arrangement? Well defined protocol/control?

    Decision (compared to this version of Exchange Channel)

    • Remove dash line
    • Keep the two registers separate 
    • Purpose of EC should be made more explicit - EC is not only for external but also for internal purposes
    • Make explanatory text generic enough to allow above interpretation. 
      • For Product:

    Object: Product
    Group: Exchange
    Definition: A package of content that can be disseminated as a whole.
    Explanatory Text: A Product is a type of Exchange Channel for outgoing information. A Product packages Presentations of Information Sets for an Information Consumer. The Product and its Presentations are generated according to Output Specifications, which define how the information from the Information Sets it consumes is presented to the Information Consumer. The Protocol for a Product determines the mechanism by which the Product is disseminated (e.g. website, SDMX web service, paper publication).

    A Provision Agreement between the statistical organization and the Information Consumer governs the use of a Product by the Information Consumer. The Provision Agreement, which may be explicitly or implicitly agreed, provides the legal or other basis by which the two parties agree to exchange data. In many cases, dissemination Provision Agreements are implicit in the terms of use published by the statistical organization.

    For static Products (e.g. paper publications), specifications are predetermined. For dynamic Products, aspects of specification could be determined by the Information Consumer at run time. Both cases result in Output Specifications specifying Information Set data or referential metadata that will be included in each Presentation within the Product.

    Synonyms: (none given)

      • For Questionnaire

    Object: Questionnaire
    Group: Exchange
    Definition: A concrete and usable tool to elicit information from observation Units.
    Explanatory Text: This is an example of a way statistical organizations collect information (an Exchange Channel). Each collection mode (e.g. in-person, CAPI, online questionnaire) should be interpreted as a new Questionnaire derived from the Questionnaire Specification. The Questionnaire is a tool with which data is obtained.

    The Questionnaire is a subtype of Exchange Channel, as it is a way in which data is obtained.

    Synonyms: (none given)


      • For Data Harvest *outstanding issue (examples in the explanatory text)

    Object: Data Harvesting
    Group: Exchange
    Definition: A concrete and usable tool to pass information from one source to another, usually by a machine to machine mechanism.
    Explanatory Text: Examples of Data Harvesting channels include

    -web scraping

    -API

    Synonyms: (none given)

    • to make action word: collecting sensor data and other types of business data (e.g. bonus data from chain stores)
    • to find better word for: scanning
    • do we keep "channels"?
      • For Statistical Register *outstanding issue (statistical in front of Units?)

    Object: Statistical Register
    Group: Exchange
    Definition: A Statistical Register is a register that is a regularly updated list of Units and their properties that is designed for statistical purposes.
    Explanatory Text: A Statistical Register provides an (ideally) complete inventory of the statistical Units within a specific Population, and describes these Units using different characteristics. One example is a (statistical) business register held within a statistical organization.

    All the statistical Units in a Statistical Register have an identifier that makes it possible to update the Statistical Register with new information on the statistical Units.

      • For Administrative Register *outstanding issue (do we need to mention "unit"?)

    Object: Administrative Register
    Group: Exchange
    Definition: A source of administrative information which is obtained from an external organisation (or sometimes from another department of the same organisation)
    Explanatory Text: The Administrative Register is a source of administrative information obtained from external organisations. The Administrative Register would be provided under a Provision Agreement with the Information Provider supplying organisation. This administrative information is usually collected for an organisation's operational purposes, rather than for statistical purposes.
    Synonyms: (none given)

      • Exchange Channel *outstanding issue

    Object: Exchange Channel
    Group: Exchange
    Definition: A means of exchanging information.
    Explanatory Text: An abstract object that describes the means to receive (data collection) or send (dissemination) information.

    Different Exchange Channels are used for collection and dissemination. Examples of collection Exchange Channel include Questionnaire, Web Scraper Channel and Administrative Register. The only example of a dissemination Exchange Channel currently contained in GSIM is Product. Additional Exchange Channels can be added to the model as needed by individual organizations.

  12. InKyung Choi

    GSIM Virtual Sprint (24 Jan.)

    Object: Data Harvest
    Group: Exchange
    Definition: A concrete and usable tool to pass information from one source to another, usually by a machine to machine mechanism.
    Explanatory Text: Examples of Data Harvest channels include web scraper, API, scanner, sensor, satellite, etc.

    Data Harvesting vs. Data Harvest or Data Harvester: we should have noun-form information objects consistently throughout the GSIM model; Data Harvester has become a more modern term than web scraping (more frequently used; source: Google :)); For plenary: Data Harvest - okay? → okay

    Object: Statistical Register
    Group: Exchange
    Definition: Statistical Register is a register that is a regularly updated list of Units and their properties that is designed for statistical purposes.
    Explanatory Text: Statistical Register provides an (ideally) complete inventory of the statistical Units within a specific Population, and describes these Units using different characteristics. One example is the statistical business register held within a statistical organization.

    All the statistical Units in a Statistical Register have an identifier that makes it possible to update the Statistical Register with new information on the statistical Units.

    New proposal: All the statistical Units in a Statistical Register have an identifier that makes it possible to update the Statistical Register with new information coming from administrative units and/or for Units. Essi Kaukonen, Mikko Saloila, Guillaume Duffes, Eva Holm, Marina Signore: is this okay?

    For plenary: do we want to keep the last sentence in the explanatory text of Statistical Register (without "statistical")? → see new proposal above

    Object: Exchange Channel
    Group: Exchange
    Definition: A means of exchanging information.
    Explanatory Text: An abstract object that describes the means to receive (data collection) or send (dissemination) information. The Exchange Channel is used for external and internal purposes.

    Different Exchange Channels are used for collection and dissemination. Examples of Exchange Channel for receiving information include Questionnaire, Web Scraper Channel and Administrative Register. An example of an Exchange Channel for sending information currently contained in GSIM is Product. Additional Exchange Channels can be added to the model as needed by individual organizations.
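    To make the shape of this text easier to discuss, here is a minimal Python sketch of Exchange Channel as an abstract object with subtypes. It is an illustration only, not a GSIM artefact: the `Direction` enum, the `primary_direction` attribute (borrowed from the "primary purpose" idea raised earlier in this thread), the `internal` flag and the helper `channels_for` are all assumptions made for the sketch. Statistical Register is left out because its direction is still under discussion.

```python
from abc import ABC
from enum import Enum
from typing import List


class Direction(Enum):
    RECEIVE = "receive"  # data collection
    SEND = "send"        # dissemination


class ExchangeChannel(ABC):
    """Abstract means of exchanging information; only subtypes are instantiated."""
    primary_direction: Direction

    def __init__(self, internal: bool = False) -> None:
        # ABC alone does not block instantiation when there are no abstract
        # methods, so the abstract nature is enforced explicitly here.
        if type(self) is ExchangeChannel:
            raise TypeError("ExchangeChannel is abstract; use a subtype")
        # Assumption: a channel can serve internal as well as external exchange.
        self.internal = internal


class Questionnaire(ExchangeChannel):
    primary_direction = Direction.RECEIVE


class AdministrativeRegister(ExchangeChannel):
    primary_direction = Direction.RECEIVE


class DataHarvest(ExchangeChannel):
    primary_direction = Direction.RECEIVE


class Product(ExchangeChannel):
    primary_direction = Direction.SEND


def channels_for(direction: Direction) -> List[str]:
    """List the names of the direct subtypes whose primary purpose matches."""
    return sorted(
        subtype.__name__
        for subtype in ExchangeChannel.__subclasses__()
        if subtype.primary_direction is direction
    )
```

    An organisation adding its own channel would simply define another subtype, which mirrors the last sentence of the explanatory text.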

    Object: Administrative Register
    Group: Exchange
    Definition: A source of administrative information which is obtained usually from an external organisation (or sometimes from another department of the same organisation)
    Explanatory Text: The Administrative Register is a source of administrative information obtained usually from external organisations. The Administrative Register would be provided under a Provision Agreement with the Information Provider supplying organisation. This administrative information is usually collected for an organisation's operational purposes, rather than for statistical purposes.

    "usually" has been added as some statistical organisations do have administrative registers (e.g. France)

  13. Mikko Saloila

    The explanatory text of Statistical Register is ok for us. 

  14. InKyung Choi

    Reading again, the last sentence of the explanatory text for Statistical Register sounds a bit weird.

    All the statistical Units in a Statistical Register have an identifier that makes it possible to update the Statistical Register with new information coming from administrative units and/or for Units

    Shouldn't it be

    All the statistical Units in a Statistical Register have an identifier that makes it possible to update the Statistical Register with new information coming from administrative units on the Units?

  15. Jenny Linnerud

    I suggest we change 'obtained usually' to 'usually obtained'

  16. Jenny Linnerud

    I preferred 'All the statistical Units in a Statistical Register have an identifier that makes it possible to update the Statistical Register with new information on the statistical Units.'

    Introducing a new term 'administrative units' is not making any of this clearer to me.

    What I think we still lack is the definition of a r(R)egister that both Administrative Register and the Statistical Register are subtypes of.

    Work on this was commenced in the Glossary work that Dan has referred to, but we do need to get GSIM v1.2 out before the entire glossary is completed. The statistical ontology work may also contribute positively, but again we need to get GSIM v1.2 out before that work is completed.