Experiment Proposal: HCSO proposal 3

Work Package(s): WP2

Name:

Integrating web-scraped data into the compilation of price statistics

Description:

Producing price statistics is a common task for NSIs. At the HCSO, price indices and measures of changes in household purchasing power are calculated from a conventional price data collection. In addition, the use of web-scraped information is currently being investigated through methodological pilots in specific sub-domains, such as flight ticket prices and retail shop prices; a pilot introducing web scraping techniques for real-estate prices is also planned. The aim of this experiment is to find out how web scraping can be integrated into the production of price statistics, by extending its use to a wider range of products and services and/or by increasing the frequency of price collection, and thereby to improve price statistics.

Expected Benefits:

Since all NSIs are interested in producing price statistics, the results of an experiment on alternative data sources such as web scraping should be easy to share. The web scraping tools developed and the statistical models built on web-scraped data could also be shared. A further benefit is that participants could jointly choose one country, preferably English-speaking, and analyse its online prices following a predetermined series of research steps. Each participating NSI could then carry out its own research with its own tools on the same data, and afterwards discuss and share its results together with the solutions found to the challenges that emerge.

Data characteristics:

Data characteristics will be defined together with the participants, taking into account their common goals.

Activities:

-        Determine the common dataset

-        Decide what research steps need to be carried out

-        Carry out the web scraping

-        Compare results and challenges

-        Analyse the web-scraped dataset

-        Develop common recommendations and guidelines

Issues and Risks:

Outputs:

Learnings, recommendations, code, IT tools, case studies