(from Frances Krsinich, Statistics NZ)

Experiment Proposal                                    Work Package(s): WP0

Name:

Online and scanner data for integrated price measurement:

Lodgement of scanner data and web-scraped price data to enable integrated and internationally comparable price measurement.

Description:

Obtaining, documenting and lodging monthly scanner data from GfK and daily web-scraped data from PriceStats, on a common set of consumer electronics products, for a given time period, across a number of countries.

Expected Benefits:

Web-scraping of online price information is a relatively cheap and easy way of obtaining high-frequency price data for a wide range of product types. New methodology (Krsinich, 2016) enables us to exploit the longitudinal nature of the data to derive non-revisable quality-adjusted price indexes despite limited information on the individual products’ characteristics.

However, lack of expenditure data means that price indexes from online data can’t reflect the relative expenditure on different products so, although these high-frequency indexes are good indicative measures and can identify turning-points in real time, there is a risk of bias over the longer term.
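
As a rough illustration of this risk, the sketch below compares an unweighted index (a Jevons index, the geometric mean of price relatives) with an expenditure-share-weighted geometric mean for two products. The prices and expenditure shares are made up for illustration, and the weighted formula is only one simple choice, not the method proposed here.

```python
import math

def jevons(p0, p1):
    """Unweighted geometric mean of price relatives (no expenditure data needed)."""
    relatives = [b / a for a, b in zip(p0, p1)]
    return math.exp(sum(math.log(r) for r in relatives) / len(relatives))

def weighted_geomean(p0, p1, shares):
    """Geometric mean of price relatives weighted by expenditure shares."""
    total = sum(shares)
    return math.exp(
        sum((w / total) * math.log(b / a) for a, b, w in zip(p0, p1, shares))
    )

# Hypothetical data: product A falls in price and dominates expenditure,
# product B is unchanged and sells very little.
base_prices = [100.0, 100.0]
current_prices = [80.0, 100.0]
expenditure_shares = [0.9, 0.1]  # assumed shares, e.g. taken from scanner data

unweighted = jevons(base_prices, current_prices)
weighted = weighted_geomean(base_prices, current_prices, expenditure_shares)
print(f"unweighted: {unweighted:.4f}, weighted: {weighted:.4f}")
```

In this made-up case the unweighted index understates the price fall, because it treats the low-selling product as equally important.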

Calibration to scanner data, which is less frequent (eg monthly for consumer electronics products from GfK), gives the potential to combine the high frequency and timeliness of online data with the information on relative expenditures available from the scanner data.

We propose obtaining, lodging and documenting scanner data for a selection of consumer electronics products (for which both web-scraped online data from PriceStats and scanner data from GfK are available internationally), for a number of countries. We suggest New Zealand, Australia and the Netherlands in the first instance, as they are all currently actively researching these methods. Having the data available in common formats has the potential to save time and resources, and would allow us to undertake studies which compare results across countries while ensuring that we are all using the same methods and data.

This would be a first step towards a common methodology for integrating this type of data, and might encourage coordinated purchase of, use of, and access agreements for this type of data across countries, rather than each country approaching PriceStats and GfK separately.

Krsinich, F. 2016. “The FEWS index: Fixed effects with a window splice”. Journal of Official Statistics (forthcoming)

Data characteristics:

One of the major tasks of the project will be convincing GfK and PriceStats of the benefits to them of sharing their data with us as research data. The data is therefore not yet available, but we hope to make it available and accessible to all the sandbox participants.

GfK scanner data for consumer electronics products has differing levels of coverage (across retailers) in different countries, but coverage is generally quite high. In New Zealand it is close to full coverage.

The data has monthly average prices and total expenditure at the detailed product (ie barcode) level, along with very comprehensive sets of product characteristics. Retailer information is not available (for confidentiality of retailers) and there is some masking of model names to protect confidentiality of retailers for products that are sold predominantly by one retailer. However, the associated sets of product characteristics can be used to create anonymised product identifiers which enable product linking over time (while not undoing the confidentialising).
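
One possible way of building such identifiers is sketched below: the product's characteristics are put into a canonical order and hashed, so that the same set of characteristics always gives the same identifier from month to month. This is a minimal sketch under our own assumptions; the function name, the characteristic names and the choice of SHA-256 are illustrative, not part of the GfK data or an agreed method.

```python
import hashlib

def anonymised_product_id(characteristics):
    """Derive a stable identifier from a product's characteristics.

    Keys are sorted and values normalised so that the same product
    gets the same identifier regardless of column order or letter case.
    """
    canonical = "|".join(
        f"{str(key).strip().lower()}={str(value).strip().lower()}"
        for key, value in sorted(characteristics.items())
    )
    # Truncated hash used as the identifier (hypothetical length of 16 hex digits)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical characteristics for the same product in two different months
january = {"brand": "BrandX", "screen_size": "55", "resolution": "4K"}
february = {"resolution": "4k", "brand": "brandx", "screen_size": "55"}
other = {"brand": "BrandX", "screen_size": "65", "resolution": "4K"}

id_jan = anonymised_product_id(january)
id_feb = anonymised_product_id(february)
id_other = anonymised_product_id(other)
print(id_jan, id_feb, id_other)
```

Products that share every recorded characteristic would receive the same identifier under this approach, so it links products only as finely as the characteristics distinguish them.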

PriceStats web-scraped online price data is collected for a wide range of retailers internationally. The data is cleaned and classified to 3-digit COICOP. Daily price, product identifier (which will differ across retailers) and 3-digit COICOP will be available in the data.

Activities:

We plan to make a case to each of PriceStats and GfK to convince them of the mutual benefits (to us and to them) of lodging research data in the sandpit. We will then concord and standardise formats, if necessary, across countries and (if applicable) retailers, and produce documentation on the data to be lodged alongside it.

This lodgement will involve agreements between the working group and each of GfK and PriceStats. Statistics New Zealand already has a relationship with both of these suppliers.

A research partnership between SNZ, Statistics Netherlands and the Australian Bureau of Statistics already exists in a fairly informal way, and it may be useful for representatives from these three agencies to sign up to the request to add weight to it.

Issues and Risks:

We risk not being able to convince one or both of PriceStats or GfK to share their data with us.

There might be issues of non-comparability of the GfK data across countries (although the company and its data system are international, it appears that the different statistics agencies that have used the data to date have received it in different formats).

Outputs:

For this work package (WP0) the output is the lodgement of the data for a selection of products, over a common time period (as long as possible, but ideally a minimum of two years) and across a number of countries (goal: at least NZ, Australia and the Netherlands), along with documentation of the data.