Building on the work of the UNECE HLG-MOS Machine Learning Project (2019-2020), the UK Office for National Statistics Data Science Campus, in partnership with the UNECE, launched a new international initiative, the Machine Learning Group 2021, in January this year. The objectives of the Group include:

Facilitate the creation, development and implementation of research projects and skill-building activities that meet the global statistical community’s needs.
Build and engage a strong machine learning community by sharing resources and good practice, exchanging ideas and experiences, and keeping abreast of developments in the field.
Offer open, shareable, and easily accessible resources to the community.
Facilitate machine learning capacity building for official statistics.

The research work of the ML Group is divided into five Work Streams (WS) that aim to address different issues that arise when using machine learning for official statistics (see “ML Group 2021 Work Streams Outputs” below for more information on each work stream and its outputs). The monthly ML Group meetings throughout the year have built a community where members can share experiences, build connections and keep up to date with new developments (see “ML Group 2021 Monthly Meeting Presentations” below for more information). The Coffee and Coding sessions, the collected training materials, and the reports from the Work Streams will help facilitate the learning of ML.

The ML Group that started with 120 members has now grown to about 250 members from 33 countries and 5 international organizations who either lead, assist or follow the numerous activities under the ML Group. You can find a summary of the group's work in 2021 in its final report here.

The international efforts to advance the use of ML for official statistics continue in 2022; read more about Machine Learning Group 2022 here.

ML Group 2021 Journey

ML Group 2021 Work Streams Outputs

Work Stream 1 (WS1) - From Idea to Valid Solution

The pilot studies were conducted to assess the added value of ML in various thematic areas: coding and classification, edit and imputation, the use of imagery data, modeling, and route optimization. A study of the replication experience highlighted the benefits of sharing these ML projects.

Theme	Paper
Coding and Classification	Brazil - Apply ML techniques to classification and aggregation of web scraped price data
	Turkey - Using Big Data Tools and Machine Learning Techniques to Assign Classification of Individual Consumption by Purpose (COICOP) Categories
	Chile - Coding and Classification: Automated coding of classifiers as a shared service
	Poland - Using ML to classify unstructured information hidden in the text description of real estate advertisements
	UK - Automated coding of Standard Industrial and Occupational Classifications (SIC/SOC) with github repo
Edit and Imputation	Poland - Multiple imputation through machine learning in a survey of sport clubs
Imagery Analysis	Malaysia - Estimating Malaysia Rubber Plantation Area Productivity Using Satellite Imagery and Machine
Imagery Analysis	Indonesia - Feasibility study of Satellite Imagery Analysis for Wealth Index Development in Indonesia
Modeling	US (BLS) - State level expenditure estimates based on ML techniques
Route Optimisation	Chile - Route Optimisation through genetic algorithm
Replication	Belgium (Flanders) - Replicating successful data science projects across NSOs

The reports and code (where available) from the WS1 pilot studies are also available at Studies and Codes.

Work Stream 2 (WS2) - From Valid Solution to Production

WS2 explores the issues around the operationalization of machine learning solutions. It consists of three activities from the IMF and INEGI (Mexico).

Paper

IMF - Automated production tool to code IMF member state time series data using ML algorithms

INEGI (Mexico) - Deployment of a Data Lake architecture to put into production data science projects

INEGI (Mexico) - Design and assess a whole workflow to enable Natural Language Processing and Machine Learning methodologies to be integrated into a continuous production process

WS2 team - Journey from Experiment to Production

Work Stream 3 (WS3) - Ethical Consideration in the Use of ML for Research and Statistics

Led by UK Statistics Authority

This high-level guidance explores ethical considerations associated with the use of machine learning techniques for research and statistical purposes. This guidance is not exhaustive, but aims to assist and support analysts, researchers, data scientists, and statisticians navigating the ethical issues surrounding machine learning based projects.

Click here for full report

Work Stream 4 (WS4) - Model Retraining

Led by Statistics Finland

WS4 identifies the circumstances under which an ML model should be retrained to maintain its predictive power and quality.

Click here for full report

Work Stream 5 (WS5) - Quality Framework for Statistical Algorithm

Led by INEGI (Mexico)

WS5 explores the dimensions of the Quality Framework for Statistical Algorithm (QF4SA) in a consolidated project that analyzes an output using a set of standard metrics and procedures.

Click here for full report

ML Group 2021 Monthly Meeting Presentations

Date	Speaker	Presentation
27 Oct	Saeid Molladavoudi (Statistics Canada)	Supervised Text Classification with Leveled Homomorphic Encryption (presentation slides)
	James Beck (Australian Taxation Office)	MLOps in the Australian Taxation Office (presentation slides)
	InKyung Choi (UNECE)	Survey on Machine Learning Training Needs (presentation slides)
29 Sept	Alex Measure (Bureau of Labor Statistics, USA)	Linking fatal work-related injuries with machine learning even when the names are missing (presentation slides)
29 Sept	Marc Ponsen (Statistics Netherlands)	WordGraph2Vec: using language constructs to create sentence embeddings (presentation slides)
28 June	Arie Wahyu Wijayanto (BPS Indonesia)	Feasibility Study of Satellite Imagery Analysis for Wealth Index Development in Indonesia (presentation slides)
28 June	Shirin Roshanafshar and Joanne Yoon (Statistics Canada)	2021 Census Comment Classification Machine Learning PoC (presentation slides not publicly available)
24 May	Valery Dongmo-Jiongo (Statistics Canada)	Web scraped data and ML for CPI (presentation slides)
	Markie Muryawan (UN Statistics Division)	AIS Data Task Team and Global Platform (presentation slides)
	Thanasis Anthopoulos (Office for National Statistics, UK)	SIC/SOC ML classification project (presentation slides not publicly available)
26 April	Kate Burnett-Isaacs (Statistics Canada)	HLG-MOS Synthetic Data Project (presentation slides)
22 March	Sigrid van Hoek (Statistics Netherlands)	Fair algorithms project (presentation slides)
22 March	Lily O'Flynn and Simon Whitworth (Statistics Authority, UK)	UK SA Data Ethics (presentation slides)
23 Feb	Riitta Piela and Rok Platinovsek (Statistics Finland)	Best practices in maintaining the quality of data in ML developments (presentation slides)
	Casper Eriksen (Danish Business Authority)	Multilingual Classification of Economic Activities (presentation slides)
	Michael Reusens (Statistics Flanders)	WS1 Theme 5: Transferring Knowledge and Experience (presentation slides)