Background

1. The appropriate use of geospatial information is crucial to realising the full potential of the data produced in the data ecosystem. Geospatial information, as the digital currency of geographic location, is playing an increasingly important role in the work of statistical organisations. Authoritative geospatial information is primarily produced by the National Geospatial Information Agencies (NGIAs) or mapping agencies. However, all data with a geographic location are a constituent component of the data ecosystem, of which the national statistical organisation often finds itself the custodian.

2. The data ecosystem in which statistical organisations operate is more diverse than ever. Various actors, from government agencies and private companies to citizens, produce data with different tools and in different formats. With digitalisation and the advance of technologies, data are also being generated by non-human agents at an explosive rate (e.g. sensor data, data from web crawlers, mobility data from cell phones). Although these datasets vary in “what” they are about and “how” they are generated, they can be linked through information on “where” each dataset refers to. While unique unit identifiers such as personal IDs and many social or demographic variables are considered confidential and are difficult to derive from original datasets, geospatial information is often available at a certain level of disaggregation and can be used to integrate data from different sources. Geospatial information is an unambiguous and universal key that cuts across all data. Statistical information combined with location information (henceforth referred to as “geospatially enabled statistics”) can be integrated with other data produced by various actors in the data ecosystem, providing critical knowledge for understanding the multi-faceted issues that society currently faces, such as sustainable development, rapid urbanisation and climate change.
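
As a rough illustration of how “where” can act as a linking key, the sketch below joins two unrelated datasets through a shared area code. All names and values here (`area_code`, `residents`, `pm25`, the sample records) are hypothetical and chosen only for illustration; they are not taken from any actual statistical or sensor source.

```python
from collections import defaultdict

# Hypothetical records from two unrelated sources that share only a location code.
population = [
    {"area_code": "A01", "residents": 1200},
    {"area_code": "A02", "residents": 450},
]
sensor_readings = [
    {"area_code": "A01", "pm25": 18.0},
    {"area_code": "A01", "pm25": 22.0},
    {"area_code": "A02", "pm25": 9.0},
]


def mean_by_area(readings, value_key):
    """Average a numeric value per area code."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in readings:
        acc = totals[row["area_code"]]
        acc[0] += row[value_key]
        acc[1] += 1
    return {code: total / count for code, (total, count) in totals.items()}


def link_by_area(stat_rows, area_values, value_key):
    """Attach the per-area value to each statistical record sharing the code."""
    return [
        {**row, value_key: area_values.get(row["area_code"])}
        for row in stat_rows
    ]


linked = link_by_area(population, mean_by_area(sensor_readings, "pm25"), "pm25")
for row in linked:
    print(row)
```

Neither dataset needs a personal identifier for the link to work; the area code alone carries the join. In practice the common key might be a grid cell, an administrative code or a geocoded address rather than the simple string used here.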

3. Geospatially enabled statistics, in particular at the sub-national level and at high spatial resolution, greatly increase the relevance of statistical information by providing the geographic context of the phenomenon that the dataset is capturing. This enables policy makers and researchers to more easily understand and analyse this geographic relationship, leading towards the development of more targeted, locally relevant, and actionable plans on issues such as access to public infrastructure (e.g. schools, transportation, green areas), rural/urban inequality and emergency planning. The value of geospatially enabled statistics is not limited to the public sector. The wide use of map services through the web has lowered the access barrier to location information and changed the way it is used for decision making in all spheres of society. Geospatially enabled statistics allow the end users of these data (including companies, enterprises, and citizens) to benefit from localised information more pertinent to their business and other needs. By allowing geospatial analysis and linkage with various datasets, geospatially enabled statistics can also open up new research potential for the scientific and academic community.

4. Geography has long been understood as a fundamental component of the work of statistical organisations (e.g. in the geographical classification for designing sampling and processing raw data, as a base dimension with which statistical data are released, as a tool to support and plan field operations), yet the scope and extent of its use have been limited. For example, statistical data are often released at a large regional level, which makes it difficult to draw meaningful geographic context and leaves the data too inflexible to be integrated with other data sources. To address the information needs of various users in an increasingly complex and intertwined society, there is also a great need for statistical data to be geospatially enabled using consistent and common geographies, in an accessible and usable format.

5. It is important to note that geospatially enabled statistics are not just for particular use cases or one-off exercises. An understanding of the role and characteristics of geography is important for all stages of the statistical production process, and the production of geospatially enabled statistics should be a routine operation for statistical organisations. Further, as the novel coronavirus (COVID-19) global pandemic has highlighted, statistical organisations should be prepared to produce geospatially enabled statistical data in an efficient and timely manner. To ensure this occurs, geospatially relevant activities and considerations should be integrated into the regular production processes of statistical organisations, so that the design and production of geospatially enabled statistics can be conducted in a systematic and consistent way.

Situating the environment for the GSBPM and GSGF

6. To identify what activities and considerations are needed for the production of geospatially enabled statistics and document them in the context of the statistical production process, two global frameworks are used in this paper:

  • The Generic Statistical Business Process Model (GSBPM) 1 describes the set of activities needed to produce official statistics. It provides a standard framework and harmonised terminology to help statistical organisations modernise their statistical production processes. The GSBPM is one of the cornerstones of the standards-based modernisation strategy of the United Nations Economic Commission for Europe (UNECE) High-Level Group for the Modernisation of Official Statistics (HLG-MOS) and has been widely adopted as a de facto standard process model by the global official statistics community since its development in 2008. The model was endorsed by the Conference of European Statisticians (CES) in 2017;

  • The Global Statistical Geospatial Framework (GSGF) 2  describes five Principles and supporting key elements for the production of harmonised and standardised geospatially enabled data. These five Principles are:

    • Principle 1: Use of fundamental geospatial infrastructure and geocoding;
    • Principle 2: Geocoded unit record data in a data management environment;
    • Principle 3: Common geographies for dissemination;
    • Principle 4: Statistical and geospatial interoperability;
    • Principle 5: Accessible and usable geospatially enabled statistics.

7. Developed by the United Nations Expert Group on the Integration of Statistical and Geospatial Information (UN EG-ISGI), the GSGF was adopted by the United Nations Committee of Experts on Global Geospatial Information Management (UN-GGIM) in 2019 and endorsed by the Statistical Commission in 2020. The GSGF is a key framework for facilitating the integration of statistical and geospatial information 3. The resulting data, produced following the Principles, can be readily integrated with statistical, geospatial and other information to inform and facilitate data-driven and evidence-based decision making to support local, sub-national, national, regional, and global development priorities and agendas, such as the 2020 Round of Population and Housing Censuses and the 2030 Agenda for Sustainable Development. Several initiatives have been undertaken to support countries in adopting the Principles of the GSGF, such as the Implementation Guide for the GSGF in Europe, which was developed by the ESSnet project GEOSTAT 3 4 and is currently the subject of follow-up work under GEOSTAT 4 5. Further, the UN EG-ISGI continues its effort to provide practical and relevant guidance that enables countries to implement the GSGF in their national context.

8. As a standard process model in the statistical community, the GSBPM has an immediate connection to GSGF Principle 4 (Statistical and geospatial interoperability). Its common language and terminology can facilitate communication between the statistical and geospatial communities and provide a basis for understanding and aligning their business processes.

9. Further, the GSBPM can also be an enabling framework to help integrate the GSGF Principles into the production process of statistical organisations. The GSBPM lays out the typical activities and steps that statistical organisations take when producing statistics, and this provides a structure to document geospatial-related activities so that relevant actions are taken at the right stage of the production process. For example, common geographies for dissemination (GSGF Principles 3 and 5) should be taken into account from the early stage of the process. Ideally, the discussion about the type and resolution of geographies and their implications should take place with users during the needs specification stage, then be reflected in the design stage and subsequently implemented in the process, analysis and dissemination stages according to these design decisions. This sequence of work can be modelled using GSBPM Phases and Sub-processes as building blocks.

Contextualising the geospatial view of the GSBPM

10. The Geospatial View of the GSBPM (henceforth GeoGSBPM) describes geospatial-related activities, in particular those that are needed to produce geospatially enabled statistics, using the framework of the GSBPM. Section 2 follows the structure of the eight GSBPM Phases and describes what activities and considerations should be included in each Phase. Section 3 discusses activities and considerations that should be carried out as overarching processes or at the corporate level to support the eight Phases of the production process 6. These geospatial-related actions and considerations are identified while taking into account the GSGF Principles so that the resulting statistics have a higher level of standardisation and geospatial flexibility, as well as a greater capacity for data integration. Although the degree varies, each GSGF Principle is relevant to most GSBPM Phases and affects the production process through the overarching processes and corporate-level activities, as depicted in Figure 1 below. Further, Table 1 provides a matrix of some of the key activities that take place in each GSBPM Phase in relation to each GSGF Principle.

Figure 1. GSBPM and GSGF Principles

11. In addition to assisting the production of geospatially enabled statistics in a consistent and systematic way, the GeoGSBPM can support statistical organisations in the following ways:

  • By identifying common activities required for the production of geospatially enabled statistics, it can facilitate sharing of geospatial services, methods and tools that can be applied regardless of data types, domains and output formats;
  • By highlighting which geospatial-related activities and considerations are needed in the context of a typical statistical production process, it can assist efforts to make standards and technologies of the statistical and geospatial communities more interoperable;
  • By clarifying the process in which statistical data and geospatial information flow and interact with each other, it can provide a common framework to manage quality and metadata of statistical and geospatial information and services.


Table 1. Geospatial-related activities in the GeoGSBPM and GSGF Principles


Key activities are listed by GSGF Principle and GSBPM Phase (Specify Needs, Design, Build, Collect, Process, Analyse, Disseminate, Overarching processes / Corporate activities).

GSGF 1. Use of fundamental geospatial infrastructure and geocoding

  • Specify Needs: When assessing data availability, the existence and availability of suitable geospatial information should first be identified from authoritative sources within the National Spatial Data Infrastructure (NSDI).
  • Design: Geospatial variables (geographies) should be designed at the statistical unit level. Using point-based location as the base geospatial variable provides considerable adaptability to changes over time and flexibility to aggregate up to various dissemination-level geographies.
  • Collect: Geocoding should be conducted for each statistical unit that is collected, at the most detailed level (e.g. point-based geocoding as opposed to area-based geocoding).
  • Process: Standardisation should take place before the integration of datasets. It can be done, for example, by matching location information in the datasets with centralised standard systems (e.g. address matching, geocoding), which should be based on the national geospatial information context.
  • Overarching processes / Corporate activities: Quality management includes identifying the authoritative (external or internal) sources of reference data and establishing a quality profile of the reference data.

GSGF 2. Geocoded unit record data in a data management environment

  • Design: The design of components includes point-of-entry validation for geographical information, a matching strategy, and spatial analysis.
  • Process: The mechanism for matching or geocoding the statistical unit records established in the Design Phase should be consistently applied.
  • Overarching processes / Corporate activities: Quality management includes developing quality dimensions and metrics to be used at different stages, and a consistent matching strategy.

GSGF 3. Common geographies for production and dissemination of statistics

  • Specify Needs: The needs of users in terms of geographies (e.g. size of unit, type) are discussed. Implications (e.g. cost, reliability, quality) should be communicated to and discussed with users.
  • Design: When grid geographies are used, the choice of grid system should take existing regional and global systems into consideration.
  • Collect: Inaccuracies in geospatial information detected during field collection should be documented and transferred to the central geospatial information system for maintenance and update if necessary (if permitted under statistical confidentiality rules).

GSGF 4. Statistical and geospatial interoperability - data, standards and processes

  • Design: The design of all production components should take into account standards used in the geospatial community.
  • Build: Geospatial services have a broad stakeholder group, so statistical organisations should check and consult the service inventories of stakeholders before building components on their own.
  • Analyse: When preparing the analysis output, it is important to pay attention to semantic interoperability so that the output can be understood and used without ambiguity by users from different domains.
  • Disseminate: International standards should be used as a norm to ensure that the products can be found and consumed easily across a range of user groups from the public and private sectors.
  • Overarching processes / Corporate activities: Alignment and harmonisation of geospatial metadata concepts with those of statistical metadata is critical.

GSGF 5. Accessible and usable geospatially enabled statistics

  • Specify Needs: Discussion of the output format is useful, as users of high spatial resolution data (e.g. city or municipal authorities) might require data to be provided in certain formats that are digestible within their GIS systems. The implications of the size of geographic units in terms of confidentiality risk should be discussed with users.
  • Design: The design of these outputs should also take potential downstream uses into consideration. The accessibility and usability of geospatially enabled statistics and services can be greatly increased by the use of standards and open data formats.
  • Build: Metadata elements are put together during the development of dissemination components so that they can be disseminated along with the data products and services. To make them more findable and accessible for both internal and external users, metadata should be documented using standard taxonomy and vocabulary.
  • Analyse: Cataloguing and tagging the content using relevant metadata standards can greatly increase the usability of the analysis outputs. Geospatial product components should be cross-checked with other components (e.g. tabular aggregates) before release so that they do not breach privacy on their own or in combination with other outputs.
  • Overarching processes / Corporate activities: Statistical organisations are encouraged to explore semantic web standards as a long-term strategic objective, with successive milestones, to achieve dissemination of data and metadata within the framework of Linked Open Data (LOD).
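
The flexibility that point-based geocoding offers (GSGF Principle 1) and the confidentiality check before release (GSGF Principle 5) can be illustrated with a minimal sketch. It assumes unit records already geocoded to x/y coordinates in metres in a projected coordinate system, square grid cells identified by their lower-left corner, and a simple minimum-count suppression rule. The coordinates, cell sizes, threshold and function names are all hypothetical, and actual disclosure control methods are typically more elaborate.

```python
import math
from collections import Counter

# Hypothetical point-geocoded unit records: (x, y) in metres in a projected
# coordinate system. Values are illustrative only.
units = [
    (1250.0, 830.0),
    (1900.0, 120.0),
    (2100.0, 2950.0),
    (2400.0, 2600.0),
    (2450.0, 2700.0),
]


def grid_cell(x, y, cell_size):
    """Return the lower-left corner of the square grid cell containing (x, y)."""
    return (
        math.floor(x / cell_size) * cell_size,
        math.floor(y / cell_size) * cell_size,
    )


def count_by_cell(points, cell_size):
    """Count units per grid cell of the given size."""
    return Counter(grid_cell(x, y, cell_size) for x, y in points)


def suppress_small(counts, threshold):
    """Replace counts below the threshold with None (a simple suppression rule)."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# The same point records can be aggregated to different geographies on demand.
counts_1km = count_by_cell(units, 1000)
counts_2km = count_by_cell(units, 2000)

# Cells with fewer than 3 units are withheld before release (assumed threshold).
released_1km = suppress_small(counts_1km, 3)
print(released_1km)
```

Because the base variable is a point, changing the dissemination geography only means changing the aggregation step; records that were geocoded directly to areas could not be re-aggregated this way.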


  1. UNECE HLG-MOS “Generic Statistical Business Process Model” (https://statswiki.unece.org/display/GSBPM)
  2. UN-GGIM “Global Statistical Geospatial Framework” (http://ggim.un.org/meetings/GGIM-committee/9th-Session/documents/The_GSGF.pdf)
  3. Where the GSGF is the bridge between the statistical and geospatial communities, many NGIAs are guided by another key framework - the Integrated Geospatial Information Framework (IGIF). The IGIF provides a basis and guide for developing, integrating, strengthening and maximizing geospatial information management and related resources in all countries and offers countries an overarching framework that complements and strengthens their existing National Spatial Data Infrastructure (NSDI). Importantly, the IGIF is not just a resource for NGIAs, but can help to strengthen the use of geospatial information within a country. For more about IGIF, see UN-GGIM IGIF (https://ggim.un.org/IGIF/).
  4. GEOSTAT 3 Project (https://www.efgs.info/geostat/geostat-3/)
  5. GEOSTAT 4 Project (https://www.efgs.info/geostat/geostat-4/)
  6. Note that corporate-level supporting activities are not in the scope of the GSBPM, but covered by the Generic Activity Model for Statistical Organisation (GAMSO), another HLG-MOS model complementing GSBPM. For more, see Section 3.