Guest Authors: Hossein Hassani (Webster University, Vienna, Austria), Leila Marvian (Big Data Lab, Imam Reza International University, Mashhad 178-436, Iran) and Steve MacFeely (World Health Organisation)



We are back with another instalment of INGEST, and we are very excited to introduce our first guest blog post. Its authors are Hossein Hassani, Adjunct Professor at Webster University in Vienna, Austria, and a World Top 1% Scientist specialising in data science, statistics, artificial intelligence, machine learning, and big data analytics; Leila Marvian, Lecturer at Imam Reza International University in Mashhad, Iran, who has expertise in programming for statistical computing and data analysis, particularly in R and Python; and Steve MacFeely, Director of Data and Analytics at the World Health Organisation (WHO) and Adjunct Professor at University College Cork in Ireland, who has a wealth of experience in the global statistical arena. In this post, Hossein, Leila and Steve present an exciting new R package that allows users to integrate two GIS data sources by pre-processing, cleansing, and unifying the data based on common attributes. As we briefly mentioned at the end of our last post, this tool is a game-changer that could solve many of your technical data integration problems, so let's jump right in!


In the sphere of data science and geographic information systems (GIS), the fusion of diverse GIS databases stands as a cornerstone for advancing research and applications across a multitude of domains. From enhancing official statistics to revolutionising healthcare, finance, and social sciences, the integration of GIS data opens new avenues for insightful analysis and decision-making.

The challenge of GIS data integration

For decades, the geospatial community has grappled with the formidable challenge of managing and harmonising disparate GIS databases. The intricate nature of pre-processing, cleaning, unifying, and integrating such data is rooted in the inherently complex characteristics of geospatial information. This complexity arises from a multitude of sources: the varied formats in which spatial data is collected; the differing scales, projections, and coordinate systems used; and the diverse origins of the data, including satellite imagery, ground surveys, and, increasingly, user-generated content from mobile devices and social media platforms.

Each dataset comes with its unique set of metadata, accuracy levels, and update cycles, necessitating a nuanced approach to integration. The process of ensuring compatibility and coherence across datasets involves not just simple data cleaning procedures but also sophisticated techniques like coordinate transformation, conflation, and the reconciliation of semantic differences. Moreover, the rapid evolution of technology and data acquisition methods means that geospatial datasets are expanding not only in size but also in complexity. Historically, the lack of standardised tools tailored to tackle these multifaceted tasks has been a bottleneck, limiting the ability of professionals and researchers to carry out comprehensive geospatial analyses. Analysts often had to rely on a patchwork of software solutions or custom-built scripts, which could be time-consuming, prone to errors, and not easily reproducible. This piecemeal approach also hindered collaboration and sharing of geospatial analyses and insights.

The importance of overcoming these hurdles cannot be overstated, as geospatial data plays a crucial role in a wide array of critical applications. From urban planning and environmental monitoring to disaster response and global health initiatives, the insights derived from integrated geospatial data are indispensable. They guide decision-makers in policy formulation, business strategy, and scientific research, impacting lives and shaping the future of our societies. As such, the development of a comprehensive tool that can streamline the process of GIS data integration represents a significant leap forward for the field. By providing a standardised, efficient, and robust means to pre-process, clean, unify, and integrate diverse geospatial datasets, such a tool unlocks the full potential of geospatial analysis, facilitating more accurate, insightful, and actionable intelligence from the wealth of data available.


A solution at last: The GIS INTEGRATION R package

Recognising the critical need for efficient and seamless GIS data integration, a team of experts recently introduced a ground-breaking R package. The team comprises Hossein Hassani (Adjunct Professor at Webster University), Leila Marvian (Lecturer at Imam Reza International University), Sara Stewart (UNECE Consultant), and Steve MacFeely (Director of Data and Analytics at WHO). The package was rigorously tested using a range of data sources, including Super Data Zones, a statistical output geography introduced by the Northern Ireland Statistics and Research Agency (NISRA) after the 2021 Census, alongside other population data from the census. The authors are grateful to NISRA for the availability and quality of their data, which proved invaluable to the testing process.

The newly released R package, called GIS INTEGRATION, has been designed specifically to address the multifaceted challenges of GIS data integration. The package is freely available through CRAN, the Comprehensive R Archive Network, which is R's central software repository. CRAN holds an archive of the latest (and previous) versions of the R distribution, supporting documents and associated packages for access and download.

Here's a glimpse into the capabilities and benefits of this revolutionary tool:

  • Intelligent Pre-Processing: GIS INTEGRATION is equipped with advanced algorithms to perform intelligent pre-processing of two GIS maps, laying the groundwork for accurate integration.
  • Advanced Textual Operations: Incorporating techniques such as Lemmatization and Stemming, the package enhances the textual analysis of geospatial data, including the nuanced task of retaining negative prepositions for sentiment analysis.
  • Data Cleaning and Standardisation: With functionalities to lowercase variable names, remove punctuation, and trim extra spaces, GIS INTEGRATION ensures your data is clean and uniform, facilitating smoother integration.
  • Synonym Finding and Standardisation: The package excels in identifying synonyms and standardising common names across datasets, a crucial step for linking related but differently labelled data points.
  • Seamless Linking of Maps: At its core, GIS INTEGRATION achieves the ultimate goal of seamlessly linking two GIS maps, enabling a unified analysis of combined geospatial datasets.
  • Geospatial Analytics and Visualisation: Analysing and visualising geographical data, such as location-based information and maps, to derive insights and make informed decisions across domains such as urban planning, environmental science, transportation, and public health.
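
To make the cleaning, synonym standardisation and linking steps listed above a little more concrete, here is a minimal sketch in Python using only the standard library. It is not the GIS INTEGRATION package's code or API: the function names, the synonym table and the sample records are all hypothetical and chosen purely for illustration, and stemming and lemmatisation are left out. Each map's attribute table is treated as a simple list of records.

```python
import re
import string

# Hypothetical synonym table (an assumption for this sketch):
# maps variant labels to one common attribute name.
SYNONYMS = {
    "pop": "population",
    "total population": "population",
    "area code": "zone code",
}

# Translation table that turns every ASCII punctuation mark into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def standardise_name(name):
    """Lowercase, remove punctuation, trim extra spaces, then apply synonyms."""
    cleaned = name.lower().translate(_PUNCT_TO_SPACE)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return SYNONYMS.get(cleaned, cleaned)


def standardise_columns(rows):
    """Return a copy of the records with standardised attribute names."""
    return [{standardise_name(key): value for key, value in row.items()} for row in rows]


def common_attributes(rows_a, rows_b):
    """List the attribute names that appear in both tables."""
    cols_a = set().union(*(row.keys() for row in rows_a))
    cols_b = set().union(*(row.keys() for row in rows_b))
    return sorted(cols_a & cols_b)


def link_tables(rows_a, rows_b, key):
    """Inner join: keep records of rows_a that have a match in rows_b on `key`."""
    index_b = {row[key]: row for row in rows_b}
    linked = []
    for row in rows_a:
        match = index_b.get(row[key])
        if match is not None:
            linked.append({**match, **row})
    return linked


# Illustrative (made-up) attribute tables from two maps with inconsistent labels.
map_a = standardise_columns([
    {"Zone_Code": "Z001", "Zone Name": "North"},
    {"Zone_Code": "Z002", "Zone Name": "South"},
])
map_b = standardise_columns([
    {" Area-Code ": "Z001", "Pop.": 1200},
    {" Area-Code ": "Z003", "Pop.": 800},
])

shared = common_attributes(map_a, map_b)
linked = link_tables(map_a, map_b, key=shared[0])
print(shared)
print(linked)
```

In this toy example the two tables only become linkable once their differently labelled key columns have been standardised to the same name. A real workflow would also need to handle geometries, duplicate keys and unmatched records, which this sketch deliberately ignores.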

A milestone for geospatial analysis

The unveiling of the GIS INTEGRATION R package stands as a transformative event in the world of geospatial analysis. This pivotal development represents far more than a mere incremental advancement; it is a paradigm shift that promises to redefine the landscape of spatial data exploration and utilisation.

Geospatial analysis has long been a cornerstone in disciplines ranging from environmental science to urban development, disaster management to public health. However, the potential for innovation and discovery within these fields has often been constrained by the cumbersome nature of integrating complex GIS datasets. The GIS INTEGRATION R package directly addresses this bottleneck, offering a suite of tools that effortlessly melds disparate data sources into a cohesive whole. This tool transcends traditional boundaries by drastically reducing the time and technical expertise required to prepare and harmonise spatial data. Its impact is twofold: it not only amplifies the existing capabilities of seasoned researchers and GIS professionals but also democratises access to sophisticated geospatial analysis for a broader audience. As a result, it facilitates a more inclusive environment where experts and novices alike can contribute to the collective understanding of spatial phenomena.

The implications of this leap forward are vast. In academia, it allows for more robust research methodologies, enabling scholars to synthesise larger datasets and derive more nuanced insights. In the public sector, it equips policymakers with the precise, data-driven foundation necessary for crafting effective legislation and public services. In the private sphere, businesses can leverage integrated GIS data to enhance logistical operations, optimise site selection, and understand market trends with unprecedented clarity. Moreover, the GIS INTEGRATION R package is poised to serve as a catalyst for innovation, sparking novel approaches to old problems and inviting exploration into previously uncharted territories. It opens up possibilities for cross-disciplinary research, where, for example, climatologists, economists, and sociologists can collectively investigate the multifaceted impacts of climate change on economies and communities.

As we stand on the cusp of a new era in geospatial analysis, the GIS INTEGRATION R package is not just a tool but a beacon that guides the way towards a future rich with the promise of ground-breaking discoveries and transformative innovations that have yet to be imagined. With its release, researchers can step into a world where the full potential of geospatial data can be unlocked, harnessed, and utilised for the greater good of humanity and the planet.

We invite you to take advantage of this leap in technical support for data integration activities by downloading the package and realising the many benefits that the tool can bring to your own areas of expertise.



Next time . . .

We move our focus towards the use of standards to support the integration of statistical and geospatial information, first exploring UNECE's Geospatial View of the Generic Statistical Business Process Model (or GeoGSBPM for short) which describes a range of geospatial-related activities that are needed to produce geospatially-enabled statistics and, crucially, to integrate geospatial information within the statistical process. We hope to see you next time!


This document was produced with the financial assistance of the European Union. The views expressed herein can in no way be taken to reflect the official opinion of the European Union.
