| Panel | ||||||
|---|---|---|---|---|---|---|
| ||||||
This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. If you re-use all or part of this work, please attribute it to the United Nations Economic Commission for Europe (UNECE), on behalf of the international statistical community. |
...
| Panel | ||||||
|---|---|---|---|---|---|---|
| ||||||
After noting the lack of a generalized approach to describe how satellite data can be used by NSOs, as well as, acknowledging that the issue is even more complicated because use of satellite data often requires ML techniques which themselves are being experimented and not yet integrated in the production process in many NSOs, the development of the generic process pipeline is one of the first the deliverables in the Imagery Theme team. A generic process model describes high-level activities that need to be followed to achieve a certain objective or to deliver a specific output. This pipeline focuses on the specific use of satellite data to produce official information. This pipeline aims to address following issues:
The pipeline developed as in diagram and outlines the six main stages (business understanding, data collection and preparation, ML modelling, prediction, dissemination, evaluation) and the main specialized roles (thematic expert, E0 scientist, data scientists, statisticians and computer scientists) involved in each of the steps. The diagram of the pipeline is provided below. More detailed description for this activity can be found in the specific report as well as additional examples related to the pilot projects of the Imagery Theme team. Diagram of the pipeline |
| Panel | ||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||||||||
In order to explore the potential of alternative data sources to those already known in the Official Statistics (Censuses, Surveys and Administrative Records) or to enrich existing projects, several projects were carried out aiming to take advantage of satellite images with Machine Learning (ML) techniques. This document is intended to summarize the pilot projects carried out by Australia, the Netherlands, Switzerland and Mexico. Machine Learning involves the automatic discovery of patterns in the data using computational algorithms and, from those regularities, proceeding to carry out tasks such as the detection of various categories (Bishop, 2006) in a training set. This is called Supervised Learning. The pilot projects reported in this document belong to this category and show the application of various classification algorithms that seek to relate the implicit or explicit patterns found in data carefully labeled by experts, with equivalent patterns in unlabeled data, intending for the algorithms to identify generalization rules that allow assigning categories to objects that have not been manually analyzed. Once the algorithms assign the “predicted” category, it is important to perform the evaluation of the ability of the algorithm to generalize with previously labeled testing sets, but never used in training procedures, and reporting the corresponding performance metrics for each project. Each country wrote a detailed report of their work and corresponding experiments, we invite the reader to review the specific details of each country, in this document we will present the essential aspects. Problem to solve Each Statistical Office established the characteristics of the pilot test to be carried out, in which satellite images were used in the context of Machine Learning applications in order to solve specific office problems. As stated by the NSOs themselves, they try to solve problems related to the reduction of human intervention in the process of updating the Address Register (AR) or the measurement of statistical variables such as poverty or expansion of urban areas, as well as the detection of change in land use and land cover (LULC). Regularly finding a link to satellite images implies having some type of geographically referenced statistical information, as well as field work for validation, which is the basis for training automatic classification algorithms. The participating countries have a georeferenced source for such training. The countries established the main motivation of their pilot test, identifying a relevant motivation that allows them to explore the validity of the approach, through the execution of the pilot project and a subsequent evaluation with respect to the original motivation once the project is completed. Some countries are still in the preliminary stages so definitive results are not yet available in some cases. The expectations of the participants involve the need to create a new process that complements the activities of the NSOs or simply to improve existing processes. Either way, progress will be based on the application of Machine Learning techniques to satellite images.
|
...
