- Several issues need to be considered when assessing the datasets to be integrated, including the conceptual alignment of the datasets, identifiers, and privacy. The following sections outline these issues.
- Most of the data sources to be integrated are external to the statistical organisation, which therefore does not always control the definitions of the concepts and populations used when the data are collected.
- The data to be integrated need to correspond to statistical concepts. Administrative data and data from other non-traditional sources are primarily collected for non-statistical purposes. As a result, they often differ from what statistics require, for example in concepts, coverage, units and variable definitions. Detailed descriptive metadata are therefore essential for assessing the quality of the data sources along the following dimensions: accuracy, relevance, consistency, accessibility, comparability and timeliness.
- Using someone else's data means that a statistical organisation cannot control any of the decisions on measurements and populations taken by the external data provider. The statistical organisation needs to understand these design decisions so that it can determine how to turn the external data into the statistical information it requires.
- These differences affect the usability of an external data source for producing a statistical product, specifically with regard to: population coverage, the validity of the target concepts, the availability and accuracy of descriptive metadata, sampling error, bias, the legal basis for the data, data collection methodology and questionnaire design, response burden, by-product data versus survey questions, the confidentiality of the resulting output, and the different consequences for different types of data provided. These differences need to be clearly explained, documented and stored so that assessments can be reused and improved. Ideally, linking should rely on good-quality variables that correspond closely across the different datasets.
- Collaboration with the data provider is one way to lower these risks. This applies especially to administrative records, which are collected to implement various non-statistical programmes with legal requirements, such as taxation, housing, pensions, social benefits and trade in goods. Both the provider and the statistical organisation have an interest in quality, but the relevant quality aspects and priorities may differ from those needed to produce statistics. Statisticians may therefore have to accept compromises on coverage, data quality, classifications, etc., in administrative sources.
- A good way to overcome this problem is for the statistical organisation to collaborate with the administrative authorities when the legal documents that establish and maintain an administrative source are being prepared. A requirement for the statistical organisation's approval when legislation on administrative records is passed may be laid down in a Statistical Act.
- The administrative agency controls the methods by which the administrative data are collected and processed, and it specialises in formulating transparent rules and procedures. The statistical organisation, for its part, has experience in data collection, classifications and data validation. In some cases the same data are used by several institutions, so continuous collaboration in inter-institutional methodological groups is recommended in order to develop a system that serves both administrative and statistical purposes. When data are acquired, cooperation agreements are signed that divide the tasks between the parties and define the rules and conditions for transferring the data, such as timeliness, technical implementation and metadata.
- Another requirement for data integration is connectivity, that is, the ability to link records across sources. Linking is easiest when a unified identification system is used across the different sources. In many countries, unified identification systems exist for persons, businesses, farmers and addresses (or geocodes). The identity numbers are often anonymized and translated into statistical identity numbers to protect privacy during statistical production.
- If there is no unified identification system in the country, linking different sources is much more difficult. If the sources contain unique identifiers, integration is achieved directly via these identifiers; otherwise, a procedure has to be defined and prepared for matching records on a combination of other variables, for example name, date of birth and address (indirect integration).
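To make the difference between direct and indirect integration more concrete, the following is a minimal sketch under stated assumptions, not a prescribed method. It assumes that national identity numbers are translated into statistical identity numbers with a keyed hash (HMAC-SHA256), which is one common pseudonymisation technique but is not specified in the text above. It links two sources directly on the statistical identity number where both records carry one, and otherwise falls back to deterministic (exact-match) linkage on a normalised name, date of birth and postcode. The key `SECRET_KEY` and the names `to_statistical_id`, `Record`, `match_key` and `link` are hypothetical. Many offices use probabilistic linkage (for example the Fellegi-Sunter model) for the indirect case instead.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional

# Hypothetical key held only by the statistical organisation; in practice it
# would be kept in a secure key store, never alongside the data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def to_statistical_id(national_id: str, key: bytes = SECRET_KEY) -> str:
    """Translate a national identity number into a statistical identity number.

    HMAC-SHA256 maps the same input to the same output, so records stay
    linkable, but the original number cannot be recovered without the key.
    """
    normalised = "".join(ch for ch in national_id if ch.isalnum()).upper()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class Record:
    stat_id: Optional[str]  # statistical identity number, if the source has one
    name: str
    birth_date: str         # ISO format, e.g. "1985-01-01"
    postcode: str


def match_key(r: Record) -> tuple:
    """Key for indirect integration: normalised name, birth date and postcode."""
    return (" ".join(r.name.lower().split()), r.birth_date, r.postcode.replace(" ", ""))


def link(left, right):
    """Return (left_record, right_record, method) triples.

    Direct links use stat_id when both records carry one; otherwise records
    are matched indirectly on an exact match of match_key.
    """
    by_id = {r.stat_id: r for r in right if r.stat_id}
    by_key = {}
    for r in right:
        by_key.setdefault(match_key(r), []).append(r)

    links = []
    for rec in left:
        if rec.stat_id and rec.stat_id in by_id:
            links.append((rec, by_id[rec.stat_id], "direct"))
            continue
        candidates = by_key.get(match_key(rec), [])
        if len(candidates) == 1:  # accept only unambiguous indirect matches
            links.append((rec, candidates[0], "indirect"))
    return links


if __name__ == "__main__":
    tax = [
        Record(to_statistical_id("850101-1234"), "Ana Novak", "1985-01-01", "1000"),
        Record(None, "Ivan  Horvat", "1970-05-20", "10 000"),
    ]
    benefits = [
        Record(to_statistical_id("850101 1234"), "Ana Novak", "1985-01-01", "1000"),
        Record(None, "ivan horvat", "1970-05-20", "10000"),
    ]
    for a, b, method in link(tax, benefits):
        print(a.name, "->", b.name, method)
```

In this example the first pair is linked directly, because both formatting variants of the identity number yield the same statistical identity number. The second pair has no identifier and is linked indirectly after normalising the name and postcode. Indirect matches that are ambiguous (more than one candidate) are deliberately left unlinked, because a wrong link is usually more harmful than a missed one.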
- Integrating, holding and storing more data sources increases disclosure risks, which therefore need to be managed carefully. To ensure public acceptance, privacy and confidentiality rules must also be clear. Privacy refers to freedom from intrusion into one's personal information. Confidentiality concerns personal information that has been shared with others: it means that the information can be accessed only by authorized individuals.
- A Personal Data Protection Act lays down the rules for processing personal data so that individuals' legal rights to privacy and to the integrity of their data are not violated. The ability to integrate data sources in national statistics also depends on the trust of the observation units: persons, households, enterprises, agricultural holdings and other organisations. Respondents and administrative sources will share their data only if they are convinced that the confidentiality of the data and of their identity is ensured and that the shared data will be used solely for statistical purposes.
- The statistical organisation needs to provide information on, and explanations of, the procedures it applies. Protecting (safeguarding) confidentiality also aims to ensure that disseminated data allow neither direct identification (via direct identifiers) nor indirect identification (by any other means). This confidentiality must be protected by legislation. The mission of national statistics is to transmit and release statistical results as widely as possible while minimizing the risk of disclosing information on individual units. To this end, appropriate statistical disclosure control methods are needed to ensure compliance with the legislation.
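As an illustration of one of the simplest statistical disclosure control methods, the following is a minimal sketch of primary cell suppression under a threshold (minimum frequency) rule for a frequency table. It is one possible technique, not the method prescribed by any particular legislation. The threshold of 3 (`MIN_CELL_SIZE`) and the names `frequency_table` and `suppress_small_cells` are hypothetical; actual thresholds and rules are set nationally, and real tables usually also need secondary suppression and, for magnitude tables, dominance rules.

```python
from collections import Counter

# Hypothetical minimum cell size; actual thresholds are set by national rules.
MIN_CELL_SIZE = 3


def frequency_table(records, by):
    """Count units for each combination of the classification variables in `by`."""
    return Counter(tuple(rec[var] for var in by) for rec in records)


def suppress_small_cells(table, threshold=MIN_CELL_SIZE):
    """Primary suppression under a threshold rule: counts below `threshold`
    are replaced by None (typically shown as "x" in a published table)."""
    return {cell: (count if count >= threshold else None) for cell, count in table.items()}


if __name__ == "__main__":
    units = (
        [{"region": "North", "sector": "Agriculture"}] * 5
        + [{"region": "North", "sector": "Mining"}] * 2
        + [{"region": "South", "sector": "Agriculture"}] * 4
    )
    released = suppress_small_cells(frequency_table(units, ["region", "sector"]))
    for cell, value in sorted(released.items()):
        print(cell, "x" if value is None else value)
```

Here the North/Mining cell (2 units) is suppressed. Primary suppression alone is not sufficient: if the North total of 7 were also published, the suppressed value could be recovered as 7 minus 5, which is why complementary (secondary) suppressions or other methods such as rounding or perturbation are also applied in practice.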