CSDA 2.0 - VII. Types of Data

25. Not all data that an NSI holds or produces has the same importance and value, and therefore not all data needs to be cared for and managed in the same way. To avoid unnecessary effort and cost, and to comply with legislation such as privacy laws, it is advantageous to distinguish classes of data, each of which can be managed in its own way. NSIs may therefore find it helpful to identify such types and to develop and adopt a separate management policy for each.

26. Management policies are likely to differ considerably from one type to another. In particular, the various types can be expected to have different:

security policies (access authorization, back-up, etc.);
retention periods;
metadata quality requirements (completeness, correctness).

27. As a first approach, the following types or classes are defined:

Explorative: Data that is obtained from outside sources, is usually “sampled” and is used to assess the nature, structure and quality (usability) of that data source. After the exploration, this data in most cases loses its value.
Organizational: The true data assets of the organization, which are to be treated as such: protected, and shared where possible.
Temporary, local: Data produced as an intermediate product of a statistical process that has no real value outside that process. It usually loses its value once the process (cycle) is completed, but may serve as a reference for the next cycle. It may be persisted within the process space.
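The per-type policies of paragraph 26 can be captured as a lookup table keyed by the three types above. The following Python sketch is illustrative only: the names `DataType`, `ManagementPolicy`, `POLICIES` and `policy_for` are hypothetical, and every policy value is a placeholder, not a value prescribed by CSDA.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """The three types (classes) of data defined in paragraph 27."""
    EXPLORATIVE = "Explorative"
    ORGANIZATIONAL = "Organizational"
    TEMPORARY_LOCAL = "Temporary, local"


@dataclass(frozen=True)
class ManagementPolicy:
    """One policy record per type, covering the dimensions listed in paragraph 26."""
    access: str              # security: who may access the data (hypothetical labels)
    backed_up: bool          # security: whether the data is backed up
    retention: str           # retention period (hypothetical values)
    metadata_complete: bool  # metadata quality: is complete, checked metadata required?


# Hypothetical example values; each NSI would set its own.
POLICIES = {
    DataType.EXPLORATIVE: ManagementPolicy(
        access="exploration team only", backed_up=False,
        retention="until exploration ends", metadata_complete=False),
    DataType.ORGANIZATIONAL: ManagementPolicy(
        access="shared where possible", backed_up=True,
        retention="long-term", metadata_complete=True),
    DataType.TEMPORARY_LOCAL: ManagementPolicy(
        access="process only", backed_up=False,
        retention="until next cycle", metadata_complete=False),
}


def policy_for(data_type: DataType) -> ManagementPolicy:
    """Look up the management policy that applies to a given type of data."""
    return POLICIES[data_type]


print(policy_for(DataType.ORGANIZATIONAL).retention)  # long-term
```

Keeping the policies in one table, rather than scattered through processing code, would make them easier to revise once the open questions of paragraph 32 are answered.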

28. An important sub-type of “Organizational” data is Master Data, such as statistical registers, population backbones and collections of statistical units; examples are the Company Register, the People Register and the Buildings Register. At least the stable “snapshots” that come out of the ongoing maintenance process, and that reflect the state of the population at statistically relevant moments in time, belong to this sub-type. The unstable, continuously updated collection from which the snapshots are taken could be classified as “Temporary, local”.
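The distinction between the live register and its stable snapshots can be sketched in a few lines of Python. This is a minimal illustration, assuming a register held as an in-memory dictionary of statistical units; the `Register` and `Snapshot` classes, their methods and the unit identifiers are hypothetical.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from datetime import date
from types import MappingProxyType


@dataclass
class Register:
    """Continuously updated collection of statistical units ("Temporary, local")."""
    data_type: str = "Temporary, local"
    units: dict = field(default_factory=dict)

    def update(self, unit_id: str, attributes: dict) -> None:
        # Ongoing maintenance: the live register changes all the time.
        self.units[unit_id] = dict(attributes)

    def snapshot(self, reference_date: date) -> "Snapshot":
        # Freeze the state of the population at a statistically relevant moment
        # by copying every unit into a read-only mapping.
        frozen = {uid: MappingProxyType(dict(attrs)) for uid, attrs in self.units.items()}
        return Snapshot(reference_date, MappingProxyType(frozen))


@dataclass(frozen=True)
class Snapshot:
    """Stable snapshot of the register: Master Data, a sub-type of "Organizational"."""
    reference_date: date
    units: Mapping
    data_type: str = "Organizational (Master Data)"


register = Register()
register.update("unit-001", {"size_class": "small"})
snap = register.snapshot(date(2024, 1, 1))
register.update("unit-001", {"size_class": "large"})  # the live register moves on
print(snap.units["unit-001"]["size_class"])  # small
```

The snapshot is read-only and unaffected by later maintenance, which is what allows it to be managed as an asset while the live register is not.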

29. “Organizational” data is the main input for statistical production processes, and it is also needed in explorative research. The final output of statistical processes, i.e. the statistical products of the NSI, is always “Organizational”; intermediate results may be typed “Organizational” or “Temporary, local”, depending on their value to the organization. The true purpose of exploration is insight or knowledge, whether about new data sources or about new uses for existing ones. Explorative research is therefore expected to produce knowledge rather than data, for instance in the form of metadata for future “Organizational” data.

30. At first sight, “Explorative” and “Temporary” data may seem very similar, but “Explorative” data always comes from outside sources, whereas “Temporary” data is derived from “Organizational” data.
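Read together, paragraphs 29 and 30 suggest a small decision procedure for typing a dataset. The Python sketch below is a minimal illustration under stated assumptions: the `Origin` values are hypothetical, and the `accepted_as_asset` flag is a hypothetical stand-in for the asset criteria that paragraph 32 leaves open. It also assumes that any outside data not (yet) accepted as an asset is being explored.

```python
from enum import Enum


class DataType(Enum):
    EXPLORATIVE = "Explorative"
    ORGANIZATIONAL = "Organizational"
    TEMPORARY_LOCAL = "Temporary, local"


class Origin(Enum):
    OUTSIDE_SOURCE = "outside source"
    DERIVED_FROM_ORGANIZATIONAL = "derived from Organizational data"


def classify(origin: Origin, *, is_final_output: bool = False,
             accepted_as_asset: bool = False) -> DataType:
    """Assign a type to a dataset (hypothetical rule set based on paragraphs 29-30)."""
    # Final outputs (statistical products) are always "Organizational" (par. 29),
    # as is any data explicitly accepted as a data asset (criteria still open, par. 32).
    if is_final_output or accepted_as_asset:
        return DataType.ORGANIZATIONAL
    # "Explorative" data always comes from outside sources (par. 30) ...
    if origin is Origin.OUTSIDE_SOURCE:
        return DataType.EXPLORATIVE
    # ... whereas "Temporary" data is derived from "Organizational" data.
    return DataType.TEMPORARY_LOCAL


print(classify(Origin.OUTSIDE_SOURCE))               # DataType.EXPLORATIVE
print(classify(Origin.DERIVED_FROM_ORGANIZATIONAL))  # DataType.TEMPORARY_LOCAL
```

Checking the asset decision first reflects paragraph 29: an intermediate result is typed by its value to the organization, not only by where it came from.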

31. Note that the proposed classification is certainly not a recommendation to physically separate data of different types. Because data of all kinds and types must, where relevant, be integrated, trying to keep the types physically separate would be counterproductive or even futile. Organizations will instead need to develop policies, and the means to apply the policies defined for each type, regardless of where the data is located and how many copies of a particular dataset are “floating around”.

32. To define the classification more precisely, the following (not necessarily disjoint) questions remain to be answered:

When does data (a dataset) become a data asset?
What criteria must be fulfilled for data to be(come) a data asset?
Which type(s) of data deserve to be(come) data assets?

33. Even if each class of data is properly defined, together with the set of policies that applies to it, ensuring that those policies are actually applied is not simple. The main reason is that data is literally everywhere. Digital data in particular is easily copied, and in the course of daily operations data of different classes is used together in the same processes, which makes it very hard to ensure that the policies defined for each class are properly applied.
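One conceivable way to keep policies attached to data that is copied and combined is to let every dataset carry its type labels, and to let derived datasets record the types of all their inputs, so that every applicable policy can still be checked downstream. The Python sketch below illustrates that idea only; the `LabelledDataset` class and its methods are hypothetical, the CSDA text does not prescribe this mechanism, and `combine` simply concatenates rows as a placeholder for real record linkage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledDataset:
    """A dataset together with the type labels whose policies apply to it."""
    name: str
    rows: tuple        # immutable rows, so copies can safely share them
    types: frozenset   # e.g. frozenset({"Organizational"})

    def copy_as(self, new_name: str) -> "LabelledDataset":
        # A copy keeps the labels of the original, wherever the copy ends up.
        return LabelledDataset(new_name, self.rows, self.types)

    def combine(self, other: "LabelledDataset", new_name: str) -> "LabelledDataset":
        # Combined data records the types of all inputs, so the policies of
        # every contributing class remain visible (placeholder: rows are concatenated).
        return LabelledDataset(new_name, self.rows + other.rows, self.types | other.types)


register = LabelledDataset("register", (("u1", "small"),), frozenset({"Organizational"}))
sample = LabelledDataset("web sample", (("u1", "shop"),), frozenset({"Explorative"}))
working = register.copy_as("working copy").combine(sample, "linked")
print(sorted(working.types))  # ['Explorative', 'Organizational']
```

Labels of this kind do not solve the problem described above, but they make it visible which policies a given copy or combination is subject to.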
