Some countries already have extensive experience in integrating survey and administrative data sources, and there have been collaborative projects in this area, for example Eurostat projects.
Administrative data may have existed for some time without being used. They may be integrated with survey data using record linkage, statistical matching or modelling approaches. Integration may also involve pooling or combining information from multiple surveys, including surveys not conducted by the NSOs themselves.
There are common challenges in this type of integration. The quality of an administrative dataset may be good enough for administrative purposes but not sufficient for statistical purposes. Transforming administrative datasets into statistical datasets may require improving their quality and resolving conceptual differences, especially when administrative data are to be used directly. For surveys that also draw on administrative sources, it is crucial to gather all data (from both survey and administrative sources) in one database.
Examples of sources that can potentially be integrated are Labour Force Surveys with social insurance registers and/or educational registers, and data from ministries of culture and cultural associations to produce statistics on museum attendance. There are also several examples of administrative data being combined with survey data to produce indicators traditionally collected through censuses.
Experiments
Broad Approach
Job Vacancies and Overtime (JVOT) - Hungary, Slovenia and Serbia
System of consultation and Geographic location of schools
Share and document experience and lessons learnt on the System of consultation and Geographic location of schools
The following section summarises the issues and learnings identified so far for this part of the project.
The degree to which administrative and survey sources are integrated, and the systems used to do so, vary greatly across countries: some have fully developed register-based statistical systems, while others are just starting to integrate the data. In the official statistics production process, administrative and survey data can be integrated in different ways. Usually, administrative data are the source of the population frame for sample surveys. They can also supplement survey questionnaires, either for part of the population or for a set of variables, and can be used in estimation and in data validation and editing. In some cases, administrative data can replace the sample survey entirely, so that the statistics are based wholly on administrative sources. Administrative data can also be a source for establishing and maintaining statistical registers, which are in turn used in implementing surveys.
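As a minimal sketch of two of these uses, the Python below treats a register as the sampling frame for a stratified simple random sample and then fills one variable (age) from the register instead of asking it in the questionnaire. The record layout (`person_id`, `region`, `age`), the equal allocation per region and the function names are hypothetical, chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterRecord:
    # Hypothetical register layout: identifier, stratification variable, auxiliary variable.
    person_id: str
    region: str
    age: int


def draw_stratified_sample(frame, n_per_region, seed=0):
    """Draw a simple random sample without replacement within each region of the frame."""
    rng = random.Random(seed)
    strata = {}
    for rec in frame:
        strata.setdefault(rec.region, []).append(rec)
    sample = []
    for region in sorted(strata):
        units = strata[region]
        # Take the whole stratum if it is smaller than the requested allocation.
        sample.extend(rng.sample(units, min(n_per_region, len(units))))
    return sample


def supplement_from_register(responses, frame):
    """Copy age from the register into survey responses so it need not be asked."""
    age_by_id = {rec.person_id: rec.age for rec in frame}
    return [{**resp, "age": age_by_id.get(resp["person_id"])} for resp in responses]


if __name__ == "__main__":
    frame = [RegisterRecord(f"P{i}", "North" if i < 6 else "South", 20 + i) for i in range(10)]
    sample = draw_stratified_sample(frame, n_per_region=3)
    responses = [{"person_id": rec.person_id, "employed": True} for rec in sample]
    print(supplement_from_register(responses, frame))
```

A fixed seed is used so the same sample can be reproduced; equal allocation is only the simplest choice, and proportional or optimal allocation would follow the same pattern.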
Sample surveys are generally more flexible than administrative sources, as they are designed to meet a precise purpose. Administrative sources, on the other hand, are the product of a legislative system. They usually offer better coverage of target populations and generally have high response rates. As these data are already collected for administrative purposes, acquiring them is cheaper than conducting a sample survey, and there is no additional respondent burden. Because administrative data cover whole populations, they enable the production of local area data at a level of detail not possible with sample surveys, which is also an advantage in implementing local policies.
There are a number of challenges in integrating administrative and survey data. Since administrative data are collected for non-statistical purposes, differences in concepts may lead to coverage problems as well as bias. In some cases, such as business statistics, administrative units do not correspond directly to the definition of the required statistical units, so some modelling is needed to convert administrative units into statistical units. There are also likely to be differences in the definitions of variables, and it is important to understand the impact of these differences thoroughly. Sometimes it is possible to influence the administrative definition by co-operating with the responsible authority.
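A minimal sketch of the simplest case of converting administrative units into statistical units: records held per legal unit are aggregated to enterprises through a link table. The link table, field names and figures are hypothetical, and real conversions (for example enterprise groups, or legal units split across several enterprises) need more modelling than this many-to-one mapping.

```python
from collections import defaultdict


def to_statistical_units(admin_records, legal_to_enterprise):
    """Aggregate a variable from administrative (legal) units to statistical units (enterprises).

    Assumes each legal unit belongs to at most one enterprise (many-to-one link table).
    Returns the aggregated totals and the legal units that could not be mapped,
    which would need follow-up or modelling.
    """
    totals = defaultdict(float)
    unmapped = []
    for rec in admin_records:
        enterprise = legal_to_enterprise.get(rec["legal_unit"])
        if enterprise is None:
            unmapped.append(rec["legal_unit"])
        else:
            totals[enterprise] += rec["turnover"]
    return dict(totals), unmapped


if __name__ == "__main__":
    records = [
        {"legal_unit": "LU1", "turnover": 100.0},
        {"legal_unit": "LU2", "turnover": 50.0},
        {"legal_unit": "LU3", "turnover": 30.0},
    ]
    links = {"LU1": "E1", "LU2": "E1"}  # LU3 is missing from the link table
    print(to_statistical_units(records, links))  # ({'E1': 150.0}, ['LU3'])
```

Returning the unmapped units explicitly, rather than silently dropping them, keeps the resulting coverage gap visible for later quality assessment.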
Another issue is classifications. Where classifications differ, the usual step is to use correspondence tables, together with conversion tools that use any additional available variables to assign the correct classification code. However, even the same classification may produce different data, especially when it is complex or its rules are difficult to apply. In administrative sources, coding is often done by the respondents themselves, while a sample survey may use open questions that are coded by experts. Co-operation between the NSI and the administrative authority is a good way to solve part of the classification problem: the NSI can contribute its experience and may be responsible for maintaining the classification. A further decision is whether to use directly translated international classifications or national classifications. This depends on which national data are needed; however, the first option is usually harder to implement when classifications change or are revised than having national classifications. Changing a classification in an administrative source is a demanding task, since there may be many data providers who need to become familiar with the changes.
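The conversion step can be sketched as follows, assuming a correspondence table in which some source codes map to several target codes and an additional variable decides between them. All codes (`N01`, `T20`, ...), the auxiliary values and the split rule are hypothetical; a code that cannot be resolved is returned as `None` so it can be routed to manual (expert) coding.

```python
# Hypothetical correspondence table: source code -> possible target codes.
CORRESPONDENCE = {
    "N01": ["T10"],         # one-to-one
    "N02": ["T20", "T21"],  # one-to-many: needs an additional variable
}

# Hypothetical rules resolving one-to-many cases with an additional variable.
SPLIT_RULES = {
    ("N02", "wholesale"): "T20",
    ("N02", "retail"): "T21",
}


def convert_code(source_code, auxiliary=None):
    """Return the target classification code, or None if it cannot be determined."""
    candidates = CORRESPONDENCE.get(source_code, [])
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) > 1:
        # None when the additional variable is missing or does not resolve the case.
        return SPLIT_RULES.get((source_code, auxiliary))
    return None  # source code not in the correspondence table


if __name__ == "__main__":
    for code, aux in [("N01", None), ("N02", "retail"), ("N02", None), ("N99", None)]:
        print(code, aux, "->", convert_code(code, aux))
```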
Missing data and errors are further problems to overcome. In surveys, missing data arise from unit or item non-response, but in administrative sources the causes can be different. It is important to identify whether errors and missing data are systematic and to apply appropriate validation and editing rules.
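A minimal sketch of both ideas: a set of validation rules that flag records failing an edit check, and a missing-data rate by group, where large differences between groups suggest that missingness is systematic rather than random. The variables, bounds and grouping are hypothetical examples, not rules taken from any particular survey or register.

```python
from collections import Counter

# Hypothetical edit rules; a missing value (None) is not treated as a rule failure here.
RULES = {
    "age_in_range": lambda r: r["age"] is None or 0 <= r["age"] <= 120,
    "income_non_negative": lambda r: r["income"] is None or r["income"] >= 0,
}


def validate(records):
    """Return, for each rule, the ids of records that fail it."""
    return {name: [r["id"] for r in records if not check(r)] for name, check in RULES.items()}


def missing_rate_by_group(records, variable, group):
    """Share of records with a missing value of `variable` within each group."""
    totals, missing = Counter(), Counter()
    for r in records:
        totals[r[group]] += 1
        if r[variable] is None:
            missing[r[group]] += 1
    return {g: missing[g] / totals[g] for g in totals}


if __name__ == "__main__":
    data = [
        {"id": 1, "age": 34, "income": 1200, "region": "A"},
        {"id": 2, "age": 150, "income": None, "region": "B"},
        {"id": 3, "age": None, "income": -5, "region": "B"},
        {"id": 4, "age": 51, "income": 900, "region": "A"},
    ]
    print(validate(data))  # {'age_in_range': [2], 'income_non_negative': [3]}
    print(missing_rate_by_group(data, "income", "region"))  # {'A': 0.0, 'B': 0.5}
```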
Timeliness is another consideration in integrating administrative and survey data. Administrative data may not be available in time, or may not coincide with the statistical reference period. This can be addressed by analysing the impact and, if necessary, adjusting the data with models.
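As one simple illustration of adjusting for a reference-period mismatch, suppose an administrative source reports on fiscal years running from April to March while the statistics refer to calendar years. Under the assumption that the value is spread evenly across months (a strong, hypothetical assumption that a real model would replace with, for example, monthly patterns from another source), calendar year t takes 3 months from the fiscal year that began in April t-1 and 9 months from the one beginning in April t.

```python
def calendar_year_estimate(fy_starting_prev_april, fy_starting_this_april, months_from_prev=3):
    """Pro-rate two April-March fiscal-year totals to a calendar-year total.

    Assumes values accrue evenly over the months of each fiscal year.
    January-March of year t belong to the fiscal year that began in April t-1,
    April-December of year t to the fiscal year that begins in April t.
    """
    share_prev = months_from_prev / 12
    return share_prev * fy_starting_prev_april + (1 - share_prev) * fy_starting_this_april


if __name__ == "__main__":
    # 3/12 * 1200 + 9/12 * 2400 = 300 + 1800
    print(calendar_year_estimate(1200, 2400))  # 2100.0
```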
Legal basis
The first and most important requirement for using administrative sources for statistical purposes is a legal basis. It is sound practice for national legislation to take account of existing administrative sources rather than having the same data collected again. The use of administrative sources is often stated in a Statistical Act. To ensure public acceptance, a Personal Data Protection Act is also important: it sets the rules for processing personal data so that individuals' legal rights concerning privacy and the integrity of their data are not violated.
Collaboration with administrative data providers
Administrative records are data collected to implement non-statistical programs arising from legal requirements such as taxation, housing, pensions, social benefits and trade in goods. Statisticians may have to make compromises on coverage, data quality, classifications, etc. when using administrative sources. A good way to overcome this problem is for NSIs to collaborate with administrative authorities in preparing the legal documents that establish and maintain an administrative source. The requirement for NSI approval when legislation on administrative records is passed may be stated in a Statistical Act.
Institutional methodological groups
Control of the methods by which administrative data are collected and processed rests with the administrative agency, which specialises in formulating transparent rules and procedures. The NSIs have experience in data collection, classifications and data validation. In some cases the same data are used by several institutions, so continuous collaboration in institutional methodological groups is recommended to develop a system that satisfies both administrative and statistical purposes.
Cooperation agreements
Cooperation agreements are signed to divide the tasks between the parties and to define the rules and conditions for transferring data, such as timeliness, technical implementation and metadata.
Unified identification system
The existence of a unified identification system across different sources is one of the most important aspects of integrating administrative data. Without such a system, linking different sources is much more difficult, and record linkage and matching methods must be applied.
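The difference can be sketched as a deterministic linkage that first joins on a unified identifier and, where it is missing, falls back to an exact match on normalised name and date of birth, accepting only unique candidates. Field names (`pin`, `sid`, `aid`, `name`, `dob`) are hypothetical, and this simple rule-based fallback stands in for the probabilistic record linkage (for example the Fellegi-Sunter approach) that would normally be used when identifying variables contain errors.

```python
def normalise(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def link_records(survey, admin):
    """Link survey records to administrative records.

    Returns (survey_id, admin_id or None, method) tuples, where method is
    "id" (unified identifier), "key" (unique name + date of birth match) or "unlinked".
    """
    by_id = {a["pin"]: a for a in admin if a.get("pin")}
    by_key = {}
    for a in admin:
        by_key.setdefault((normalise(a["name"]), a["dob"]), []).append(a)

    links = []
    for s in survey:
        pin = s.get("pin")
        if pin and pin in by_id:
            links.append((s["sid"], by_id[pin]["aid"], "id"))
            continue
        candidates = by_key.get((normalise(s["name"]), s["dob"]), [])
        if len(candidates) == 1:
            links.append((s["sid"], candidates[0]["aid"], "key"))
        else:
            # No candidate, or several indistinguishable ones: leave for clerical review.
            links.append((s["sid"], None, "unlinked"))
    return links
```

Recording the method used for each link makes it possible to assess later how much of the integrated dataset depends on the weaker, non-identifier matches.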
Collaboration in legal acts and policies
Collaboration in institutional methodological groups
Every administrative source is different, which may require different methods. Some of the methods relevant to the use of administrative data are:
A significant number of tools exist for record linkage and matching (e.g. StatMatch in R...). These will be documented.
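To illustrate what such tools do, the sketch below implements an unconstrained nearest-neighbour distance hot deck, one of the statistical matching methods that StatMatch offers (as `NND.hotdeck`). It is a plain-Python illustration of the technique, not StatMatch's interface: each record in a recipient file takes the donated variable from the donor record closest on the common matching variables. Variable names and values are hypothetical, and in practice the matching variables would be standardised and donors usually restricted to the same donation class.

```python
import math


def nnd_hotdeck(recipients, donors, match_vars, donated_var):
    """Unconstrained nearest-neighbour distance hot deck.

    For each recipient, find the donor with the smallest Euclidean distance on
    `match_vars` and copy its `donated_var`. Donors may be used more than once;
    ties go to the first donor in the list.
    """
    fused = []
    for rec in recipients:
        point = [rec[v] for v in match_vars]
        nearest = min(donors, key=lambda d: math.dist(point, [d[v] for v in match_vars]))
        fused.append({**rec, donated_var: nearest[donated_var]})
    return fused


if __name__ == "__main__":
    survey_a = [{"id": 1, "age": 30, "hours": 40}, {"id": 2, "age": 48, "hours": 22}]
    survey_b = [
        {"age": 29, "hours": 38, "income": 100},
        {"age": 50, "hours": 20, "income": 300},
    ]
    print(nnd_hotdeck(survey_a, survey_b, ["age", "hours"], "income"))
```

`math.dist` requires Python 3.8 or later. A constrained variant, in which each donor can be used only once, would replace the per-recipient `min` with an assignment over all recipient-donor pairs.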
These are some links to standards which are relevant to integrating administrative and survey sources:
UNECE Assist. Knowledge base on the use of administrative and secondary sources in statistics http://www1.unece.org/stat/platform/display/adso/ASSIST
ESS.VIP ADMIN Project. The project purposes are to support the EU Member States to reap the benefits (decrease costs and burden, increase data availability) of using administrative data sources for the production of official statistics, and to guarantee the quality of the output produced using administrative sources, in particular the comparability of the statistics required for European purposes. https://ec.europa.eu/eurostat/cros/content/essvip-admin-administrative-data-sources_en
ESSnet project on Data Integration. The project focused on methodologies for data integration (Record Linkage, Statistical Matching, Micro integration Processing) and on statistical aspects to be considered to make those methods concretely applicable by NSIs. http://ec.europa.eu/eurostat/cros/content/data-integration-finished_en
ESSnet project Integration of Survey and Administrative Data. The project purpose was to promote knowledge and application in practice of sound methodologies for the joint use of existing data sources in the production of official statistics. http://ec.europa.eu/eurostat/cros/content/isad-finished_en
Eurostat (2013): The use of registers in the context of EU-SILC: challenges and opportunities. http://ec.europa.eu/eurostat/documents/3888793/5856365/KS-TC-13-004-EN.PDF
UNECE (2007): Register-based statistics in the Nordic countries. Review of the best practices with focus on population and social statistics. http://unstats.un.org/unsd/dnss/docViewer.aspx?docID=2764
UNECE project on Quality Indicators for the Generic Statistical Business Process Model (GSBPM). This is an on-going project aimed at developing quality indicators to monitor the quality of the statistical production process for each of the phases of the GSBPM, including the sub-phase, ‘integrate data’. The project is currently reviewing and updating the quality indicators to include the use of administrative data in the production of official statistics. On-going work is available from http://www1.unece.org/stat/platform/display/QI/Quality+Indicators+Home.
These are some of the essential skills needed for integrating administrative and survey sources:
The resources needed for integrating data include budget, IT infrastructure and human resources. Administrative data are usually cheaper than sample surveys because they are already collected for administrative purposes, but they still require some budget. Rapid developments in IT, in both hardware and the wide range of available software tools, work in favour of data integration. The IT infrastructure needed for integrating data covers servers, database tools for storing microdata and metadata (e.g. Oracle, SQL), software for data processing (e.g. SAS, R) and other tools for data processing and dissemination. Human resources include subject-matter statisticians, methodologists and IT experts.