If I want to document that a data set was collected through a survey, how do I do it? Mode? Instrument? Instrument Implementation? Acquisition Activity? Data Channel? Channel Activity Specification?

The question arises because I am trying to distinguish between different types of microdata (event-history, time series, snapshot, survey, register) in order to be able to point the Data Consumer to more information.

GSIM v1.0 Annex C Definitions

Mode

A set of characteristics that describe the technique (the "how") used for the data acquisition through a given Data Channel based on a specific Instrument Implementation.

While the Data Channel describes the means used for data acquisition, the Instrument describes the "what" (i.e. the content, for example, in terms of questions in a questionnaire or a list of agreed time series codes in a data exchange template) and an Instrument Implementation describes the tool used to apply the Instrument; the Mode describes "how" the Data Channel is going to be used. The Mode is relevant for all types of Data Channels, Instrument Implementations and Instruments and can change over time. The list of Modes will potentially grow in the future and vary from organization to organization.

Instrument

A tool conceived to record the information that will be obtained from the Observation Units.

The Instrument describes the tool used to collect data. It could be a traditional survey, a set of requirements for a software collection program, a clinical procedure, etc. Instrument is described from the perspective of the statistical organization collecting the data. It includes the special type of Instrument used for the explicit purpose of gathering data through a questionnaire (Survey Instrument). The behavior and characteristics of a concrete Instrument are determined by an Instrument Implementation. Several implementations can be based on the same Instrument, giving the possibility of using multiple channels and of applying different collection techniques (Modes) to gather data. An example of this is when a printed format to collect information for a survey is substituted by a software program; in both cases the Instrument will collect the data from the Unit but the behavior of the Instrument will differ according to its implementation.

Instrument Implementation

A concrete and usable tool for gathering information based on the rendering of the description made by an Instrument.

This represents an implementation of an Instrument. It describes the way in which an Instrument has been translated from a design to a concrete tool. It could represent a printed form, a software program made following a specific technological paradigm (web service, web scraping robot, etc.), the software used by a specialized device to collect data, etc. When it describes a Survey Instrument, it can contain descriptions of how each construct (e.g. Questions, Value Domains, validation Rules contained in the Instrument) is implemented.

Acquisition Activity

The set of executed processes and the actual resources required as inputs and produced as outputs to acquire data about a given Population for a particular reference period. It includes the process and resource required to acquire data in a Statistical Program consisting of gathering data via one or more Data Channels in order to create or feed one or more Data Resources.

This object holds Statistical Activity information that relates specifically to data collection or acquisition. It inherits the relationships and attributes from the Statistical Activity type.

Data Channel

A means of exchanging data.

A Data Channel is an abstract object that describes the means for communicating with Data Resource(s). The Data Channel identifies the Instrument Implementation, Mode, and Data Resource that are to be used in a process. In some cases the Data Channel that is used by the Data Provider to send its responses could be different from the one used by the statistical office or organization to request information; for example, the statistical office may publish electronic forms that the Data Provider downloads, completes and returns by traditional mail. Two specialized objects are used to implement this abstract object: Channel Design Specification used at design time and Channel Activity Specification used at run time.

Channel Activity Specification

A structured, well-defined specification for a proposed change. The description of the Data Channel made at run time.

This object is a specialization of a Data Channel and is used to describe the behaviour of a Data Channel at execution time.



5 Comments

  1. user-07a97

    Indexing data resources to support data discovery

    This model is a result of use case discussions in the Webex call of the implementation group on 13 August 2013.

    In order to support a user trying to find suitable data resources and being able to give the user sufficient information about the resources, here is a model that allows this. This model introduces the Index and Index Item. It is proposed that this is the same object as the Index in the proposed classification model, as it performs the same function (i.e. it groups objects according to a semantic).

    As this Index/Index Item is likely to be useful in many parts of GSIM (and many of these will become apparent as more and more implementations are done) the model allows an Index Item to reference any Identifiable Artefact. We haven’t actually modelled yet which of the classes are “Identifiable” but certainly Information Resource will need to be Identifiable and all Node Set and Node will be Identifiable so the Index Item can reference a Classification Item (which inherits from Node.)  So this model supports the use of the Index on the proposed classification model.

    The Index Item is hierarchical (inherited from Node) so the user can “drill down” to find data resources.  Note that the Index Item can be coded as the Node can have an association to a Code. The Index Item has a mandatory association to a Category via its inheritance from Node. Therefore, the semantic of the vocabulary that is used for searching can be any Concept that is a Category.

    The model does not prescribe a controlled vocabulary, as any Category can be used as the semantic of the Index Item.

     

    As the same Identifiable Artefact can be indexed by many Index Items, it is possible for an application to support multiple “word” searches (e.g. find data resources that are indexed by “survey source” and “partial”), or to drill down one tree and then see how else the Information Resources at the end are indexed.
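The Index / Index Item idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GSIM model itself: the class and attribute names (`category`, `children`, `artefacts`, `find`, `find_all`) and the example identifiers are hypothetical, and the mandatory Category association is reduced to a plain string.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentifiableArtefact:
    """Anything an Index Item may reference, e.g. an Information Resource."""
    identifier: str


@dataclass
class IndexItem:
    """A hierarchical node of the Index; `category` stands in for its Category."""
    category: str
    children: list[IndexItem] = field(default_factory=list)
    artefacts: set[IdentifiableArtefact] = field(default_factory=set)

    def descendants(self):
        """Yield this item and every item below it (the "drill down")."""
        yield self
        for child in self.children:
            yield from child.descendants()


@dataclass
class Index:
    roots: list[IndexItem] = field(default_factory=list)

    def find(self, category):
        """Return artefacts indexed under `category` or any of its sub-items."""
        found = set()
        for root in self.roots:
            for item in root.descendants():
                if item.category == category:
                    for sub in item.descendants():
                        found |= sub.artefacts
        return found

    def find_all(self, *categories):
        """Return artefacts indexed by every one of the given categories."""
        results = [self.find(c) for c in categories]
        return set.intersection(*results) if results else set()


# Hypothetical example data: the same artefact is indexed by two Index Items.
labour = IdentifiableArtefact("labour-force-2012")
register = IdentifiableArtefact("population-register")
survey = IndexItem("survey source", artefacts={labour})
source = IndexItem(
    "source",
    children=[survey, IndexItem("register source", artefacts={register})],
)
partial = IndexItem("partial", artefacts={labour})
index = Index(roots=[source, partial])

hits = index.find_all("survey source", "partial")
```

Here `find_all` corresponds to the multiple “word” search, and `find("source")` to drilling down one tree and collecting everything beneath it.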

     

  2. user-8e470

    Discussion 13/8: Perhaps Adam Brown or Franck Cotton, who were involved in the drafting of these objects, can help find some clarity?

    This table from the Specification was not quite enough for what was needed. Could we make it better?

    Table 1. Examples of Data Channel, Instrument, Instrument Implementation and Mode


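    Data Channel               | Instrument          | Instrument Implementation    | Mode
    ---------------------------+---------------------+------------------------------+---------------------------------------------------
    Physical presence          | Questionnaire       | Paper Form                   | Traditional interview
    Traditional mail           | Questionnaire       | Paper Form                   | Self-administered
    Direct deposit             | Questionnaire       | Paper Form                   | Self-administered
    Computer                   | Questionnaire       | Software Program             | CAPI interview
    Phone                      | Questionnaire       | Software Program             | CATI interview
    Internet                   | Questionnaire       | Software Program             | Self-administered
    Data scanner device        | Set of Requirements | Data Scanner Program         | Data collector
    Internet                   | Set of Requirements | Web Scraping Robot           | Web queries
    Internet                   | Set of Requirements | Web Scraping Robot           | Agents
    Internet                   | Set of Requirements | Web Service Consumer Program | Applications interconnection
    Secondary transfer of data | Set of Requirements | Data Transfer                | Data Medium, File Transfer, Web Sphere Application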

  3. user-8e470

    Comments from Adam:

    To identify the background to a particular data set, the key object is the AcquisitionActivity. This identifies the DataChannel (through the ChannelActivitySpecification type), through which the mode and instrument implementation can be discovered, and also identifies the DataResource and Dataset in question. What is currently in GSIM lays out a framework for how all of this could be included, but the specific examples Jenny gives will need to be tested to determine whether the current model holds. I'd suggest that the examples Jenny gives are actually a few different and not mutually exclusive properties of a dataset, and so might in fact be described by a number of different attributes belonging to multiple GSIM objects. I would suggest that, using these examples, we work through and determine how these fit in and around the Dataset and AcquisitionActivity objects.

    Adam
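The navigation described in this comment (AcquisitionActivity, then ChannelActivitySpecification, then Mode, Instrument Implementation and Data Resource) might look roughly like the following Python sketch. The attribute names, the `describe_collection` helper, and the population, reference period and data resource values are assumptions made for illustration; only the object names and the first row of Table 1 come from the material above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Mode:
    name: str


@dataclass
class Instrument:
    name: str


@dataclass
class InstrumentImplementation:
    name: str
    instrument: Instrument  # the Instrument this tool implements


@dataclass
class DataResource:
    name: str


@dataclass
class ChannelActivitySpecification:
    """Run-time specialization of Data Channel."""
    name: str
    mode: Mode
    instrument_implementation: InstrumentImplementation
    data_resource: DataResource


@dataclass
class AcquisitionActivity:
    population: str
    reference_period: str
    channels: list[ChannelActivitySpecification]


def describe_collection(activity):
    """Summarize how the data were acquired, one dict per Data Channel."""
    return [
        {
            "data_channel": ch.name,
            "mode": ch.mode.name,
            "instrument": ch.instrument_implementation.instrument.name,
            "instrument_implementation": ch.instrument_implementation.name,
            "data_resource": ch.data_resource.name,
        }
        for ch in activity.channels
    ]


# First row of Table 1; the population, period and resource name are hypothetical.
questionnaire = Instrument("Questionnaire")
paper_form = InstrumentImplementation("Paper Form", questionnaire)
channel = ChannelActivitySpecification(
    name="Physical presence",
    mode=Mode("Traditional interview"),
    instrument_implementation=paper_form,
    data_resource=DataResource("household-survey-responses"),
)
activity = AcquisitionActivity("Households", "2013", [channel])

summary = describe_collection(activity)
```

A discovery application could test `summary` for a Questionnaire instrument to decide whether a data set was collected through a survey, which is the question this thread started with. Whether that is sufficient for the other microdata types (register, event-history and so on) is exactly what would need testing.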

  4. user-07a97

    Whilst some of the information required to support data discovery is in the model and can be used, these constructs were not put there explicitly to support data discovery. Data discovery is an important use case and we need to decide whether it is important to support this in GSIM in a way that is mappable to implementation standards and in a way that is a common practice in implementations in statistical organisations. I only have experience of multi-dimensional data at the dissemination end (i.e. not discoverable by means of collection etc.) and that is why I used the "Subject Field" in the Data Resource model. This was really a placeholder for data discovery but as it stands it is probably not powerful enough. It was originally introduced to support the grouping of Concept Systems but it has no real structure. It needs at least to be hierarchical, which suggests that it is in a scheme of some sort, which in turn suggests that it is really a Node Set with Nodes - hence the Index.

    So, I think the question for the group to discuss is not whether the information required to discover data is somewhere in the model (quite likely a lot of it is), but whether we should model an explicit, consistent, and generic way of discovering data (in fact, not just data but any resource, such as Classification Items).

  5. user-8e470

    Discussion 18/9:

    Are we ready to resolve this? Is this something for the GSIM version after this one?

    Maybe we need a good use case? Norway and Australia will have practical applications but they are not ready now. 

    Categorized as future work. Is this something for Data without Boundaries or the Stats Network to solve?