43. Quality assessment is recognized as an important issue in statistical production. Many organisations, as well as international and national initiatives, have considered various aspects of quality, and some of them explicitly address processes related to data integration. The following dimensions of quality need to be assessed: accuracy, relevance, consistency, accessibility, comparability and timeliness.

44. A useful reference for quality measures in data integration steps is the Quality Indicators for the Generic Statistical Business Process Model (GSBPM), which proposes indicators for evaluating the quality of standard linkage procedures.
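As a simple illustration of how such linkage quality indicators can be computed in practice, the sketch below derives precision, recall and a false match rate from a clerically reviewed sample of candidate record pairs. The sample structure, values and threshold-free decision logic are hypothetical and are not taken from the GSBPM indicator set itself.

```python
# Illustrative sketch (not the GSBPM indicator set): quality indicators for a
# record linkage step, computed from a clerically reviewed sample of candidate
# pairs. Data and field meanings are hypothetical.

# Each reviewed pair: (declared_link, true_match), where declared_link is the
# linkage procedure's decision and true_match is the clerical review outcome.
reviewed_pairs = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (True, True),
]

true_positives  = sum(1 for declared, truth in reviewed_pairs if declared and truth)
false_positives = sum(1 for declared, truth in reviewed_pairs if declared and not truth)
false_negatives = sum(1 for declared, truth in reviewed_pairs if not declared and truth)

precision = true_positives / (true_positives + false_positives)   # share of declared links that are true matches
recall    = true_positives / (true_positives + false_negatives)   # share of true matches that were linked
false_match_rate = false_positives / (true_positives + false_positives)

print(f"precision={precision:.2f} recall={recall:.2f} false match rate={false_match_rate:.2f}")
```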

45. For the integration of survey data and administrative data, the ESSnet Komuso (Quality of Multisource Statistics) provides useful documents to:

  • Take stock of the existing knowledge on quality assessment and reporting and review it critically in order to produce recommendations on the most suitable approaches;
  • Develop new indicators for the quality of the output based on multiple sources;
  • Produce a methodological framework for reporting on the quality of output;
  • Produce indicators relating to the quality of frames themselves and the data whose production is supported by frames;
  • Produce a methodological framework for assessing the quality of the frames used in social statistics; draft a proposal for minimum quality requirements for sampling frames for EU social statistics;
  • Produce recommendations on updating the ESS Standard and the ESS Handbook for Quality Reports.

46. The outputs of the "Methodologies for an integrated use of administrative data in the statistical process" (MIAD) project provide a generic framework to assess the quality of the administrative data at the input stage, quality indicators for the discovery phase and acquisition phase, and a guide to reporting the usability of an administrative data source.

47. At the national level, statistical organisations recognize the necessity of developing a framework for assessing quality in the use and integration of different data sources. The following resources are recommended:

48. The quality assessment framework, including the quality indicators, described in the Guide to Reporting on Administrative Data Quality is helpful in carrying out validation studies. The quality framework is based on Li-Chun Zhang's two-phase life-cycle model for integrated statistical microdata (Figure 1), which expands the total survey error paradigm to include administrative data.


Figure 1. Zhang's two-phase life-cycle model for integrated statistical microdata

49. The framework supports understanding of the error sources in the individual data sources as well as those arising from the integrated datasets. Zhang's two-phase life-cycle model assists in identifying the methodological and operational issues that may affect the quality of statistical information produced from linked administrative data sources.

50. Phase 1 assesses the quality of an input data source that is intended to be used in the production of a statistical product. A statistical organisation needs to understand the design decisions made by the producers of the source in order to determine methods for turning the data into the statistical information it requires. The quality of the input data source is assessed against the purpose for which it was collected. For a survey dataset, this purpose is defined by a statistical target concept and target population. For an external data source, the entries or 'objects' in the dataset might be people or businesses, but they could also be transaction records or other events of relevance to the collecting agency. At this stage, evaluation is entirely with reference to the dataset itself and does not depend on what a statistical organisation intends to do with the data. Quality issues in the input data source will flow through into any use of the data in the production of a statistical product.
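A minimal sketch of the kind of dataset-internal checks that can feed such a Phase 1 assessment is given below: it flags duplicate unit identifiers and item missingness within a single source, without any reference to a statistical target. The records, column names and checks are hypothetical examples, not a prescribed set.

```python
# Illustrative Phase 1 checks on a single input source, assessed against the
# source itself (hypothetical records and field names).
source_records = [
    {"unit_id": "A1", "turnover": 120.0, "activity_code": "4711"},
    {"unit_id": "A2", "turnover": None,  "activity_code": "4711"},
    {"unit_id": "A2", "turnover": 80.0,  "activity_code": None},   # duplicate identifier
]

ids = [record["unit_id"] for record in source_records]
duplicate_ids = {unit for unit in ids if ids.count(unit) > 1}

item_missing_rates = {
    field: sum(1 for record in source_records if record[field] is None) / len(source_records)
    for field in ("turnover", "activity_code")
}

print("duplicate identifiers:", duplicate_ids)
print("item missing rates:", item_missing_rates)
```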

51. Phase 2 categorises the difficulties arising from taking variables and objects from source datasets and using them to measure the statistical target concept and population that a statistical organisation is interested in. In this phase, the organisation considers what it wants to do with the data and determines how well the source datasets match what it would ideally be measuring.
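By contrast, a Phase 2 evaluation looks at the source relative to the statistical target. The sketch below, using hypothetical unit identifiers, computes the undercoverage of a target population by a source and the share of source units falling outside the target population.

```python
# Illustrative Phase 2 check: compare the units available in a source dataset
# with the statistical target population (hypothetical identifiers).
target_population = {"U1", "U2", "U3", "U4", "U5"}
source_units      = {"U2", "U3", "U4", "U9"}      # "U9" is outside the target population

undercoverage = len(target_population - source_units) / len(target_population)
overcoverage  = len(source_units - target_population) / len(source_units)

print(f"target units not found in source (undercoverage): {undercoverage:.0%}")
print(f"source units outside the target population (overcoverage): {overcoverage:.0%}")
```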

52. The quality assessment involves three steps.

Step 1: Initial metadata collation: Basic information is collected about each of the source datasets used in the validation project. The information relates to the source agency, purpose of the data collection, populations, variables and timeliness of the data.
Step 2: Phase 1 evaluation: Errors occurring in phase 1 of the quality framework are determined and categorised for each source dataset. This involves detailed consideration of how the methods, purpose, known issues and other aspects of the original data collection contribute to each of the specific error categories in the phase 1 flow chart in Figure 1.
Step 3: Phase 2 evaluation: As in the previous step, errors arising in phase 2 of the quality framework are listed and examined in a similar way, taking into account the dataset(s) being integrated to produce the final output. These errors are considered with respect to the intended statistical target concepts and population. The effects of phase 1 errors on the creation of statistical units, or the particular details of the misalignment between concepts on different datasets, must be understood.
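One possible way of recording the findings of Steps 2 and 3 in a validation project is sketched below: phase 1 errors are listed per source dataset and phase 2 errors per intended statistical output. The source names and error descriptions are hypothetical placeholders, not the framework's official error categories.

```python
# Illustrative record of Step 2 (phase 1) and Step 3 (phase 2) findings.
# Source names and error descriptions are hypothetical placeholders.
phase1_errors = {
    "tax_register":  ["late registrations", "item missingness in turnover"],
    "labour_survey": ["unit non-response in small strata"],
}
phase2_errors = {
    "integrated_business_statistics": [
        "undercoverage of newly created units",
        "income concept on the register differs from the target concept",
    ],
}

for source, errors in phase1_errors.items():
    print(f"Phase 1 - {source}: {errors}")
for output, errors in phase2_errors.items():
    print(f"Phase 2 - {output}: {errors}")
```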

53. The Guide to Reporting the Quality of Administrative Data provides a metadata information template that encourages thinking about the key aspects of quality in an organised way. It is also a convenient way to record a standard set of information for comparing different datasets. The basic information required is: the name of the data source agency, the purpose of data collection, the time period covered by the data, the target and actual population of the dataset, the reporting units, a short description of key variables, the timing/delay information and the method of collection.
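To make the template concrete, a minimal sketch of how the items listed above could be captured as a standard record is given below. The class layout and example values are hypothetical illustrations, not the Guide's own template format.

```python
from dataclasses import dataclass

# Minimal sketch of a metadata record holding the basic information listed
# above; field names and example values are hypothetical.
@dataclass
class SourceMetadata:
    source_agency: str
    collection_purpose: str
    time_period: str
    target_population: str
    actual_population: str
    reporting_units: str
    key_variables: str
    timing_delay: str
    collection_method: str

example = SourceMetadata(
    source_agency="Tax authority",
    collection_purpose="Administration of value-added tax",
    time_period="Calendar year 2022",
    target_population="All registered enterprises",
    actual_population="Enterprises that filed a VAT return",
    reporting_units="Legal units",
    key_variables="Turnover, economic activity code",
    timing_delay="Available 6 months after the reference year",
    collection_method="Mandatory electronic filing",
)
print(example)
```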

54. For the integration of Big Data sources into statistical production, several international initiatives have highlighted the potential of these new data sources, as well as the quality issues they raise. Firstly, the change of paradigm imposed by the new sources, compared with traditional sample surveys, shifts attention from the well-studied sampling errors to non-sampling errors, so population coverage and the self-selectivity of the observations become the most recognized and investigated issues. A general framework for assessing the quality of Big Data has been prepared by the HLG-MOS Big Data project.
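As a simple illustration of why coverage and self-selectivity dominate the assessment of such sources, the sketch below compares a Big Data source against a population frame and contrasts group shares in the source with those in the frame; differing shares hint at self-selection. The frame, observed units and grouping variable are hypothetical and not drawn from the HLG-MOS framework.

```python
# Illustrative coverage and self-selection checks for a Big Data source
# (hypothetical frame, observed units and grouping).
frame = {                      # population frame: unit -> group of interest
    "P1": "young", "P2": "young", "P3": "old", "P4": "old", "P5": "old",
}
big_data_units = {"P1", "P2", "P4"}          # units observed in the Big Data source

coverage_rate = len(big_data_units & frame.keys()) / len(frame)

def group_shares(units):
    groups = [frame[unit] for unit in units if unit in frame]
    return {group: groups.count(group) / len(groups) for group in set(groups)}

print(f"coverage rate: {coverage_rate:.0%}")
print("group shares in frame :", group_shares(frame.keys()))
print("group shares in source:", group_shares(big_data_units))  # differences hint at self-selection
```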

 
