Building on the work of the UNECE HLG-MOS Machine Learning Project (2019-2020), the UK Office for National Statistics Data Science Campus, in partnership with the UNECE, launched a new international initiative, the Machine Learning Group 2021, in January this year. The objectives of the Group include:

  • Facilitate the creation, development and implementation of research projects and skill-building activities that meet the global statistical community’s needs.
  • Build and engage a strong machine learning community by sharing resources and good practice, exchanging ideas and experiences, and keeping abreast of developments in the field.
  • Offer open, shareable, and easily accessible resources to the community.
  • Facilitate machine learning capacity building for official statistics.

The research work of the ML Group is divided into five Work Streams (WS) that aim to address different issues that arise when using machine learning for official statistics (see “ML Group 2021 Work Streams Outputs” below for more information on each work stream and its outputs). The monthly ML Group meetings held throughout the year have built a community where members can share experiences, build connections and keep up to date with new developments (see “ML Group 2021 Monthly Meeting Presentations” below for more information). The Coffee and Coding sessions, the training materials collected, and the reports from the various Work Streams will help facilitate the learning of ML.

The ML Group that started with 120 members has now grown to about 250 members from 33 countries and 5 international organizations who either lead, assist or follow the numerous activities under the ML Group. You can find a summary of the group's work in 2021 in its final report here.

The international efforts to advance the use of ML for official statistics continue in 2022. Read more about the Machine Learning Group 2022 here.




ML Group 2021 Journey


ML Group 2021 Work Streams Outputs



Work Stream 1 (WS1) - From Idea to Valid Solution 

The pilot studies were conducted to assess the added value of ML in various thematic areas: coding and classification, edit and imputation, the use of imagery data, modeling, and route optimization. A study of the replication experience highlighted the benefits of sharing these ML projects.

Theme | Paper
Coding and Classification | Brazil - Apply ML techniques to classification and aggregation web scraped price data
Coding and Classification | Turkey - Using Big Data Tools and Machine Learning Techniques to Assign Classification of Individual Consumption by Purpose (COICOP) Categories
Coding and Classification | Chile - Coding and Classification: Automated coding of classifiers as a shared service
Coding and Classification | Poland - Using ML classify unstructured information hidden in the text description of real estate advertisements
Coding and Classification | UK - Automated coding of Standard Industrial and Occupational Classifications (SIC/SOC) with GitHub repo
Edit and Imputation | Poland - Multiple imputation through machine learning in a survey of sport clubs
Imagery Analysis | Malaysia - Estimating Malaysia Rubber Plantation Area Productivity Using Satellite Imagery and Machine
Imagery Analysis | Indonesia - Feasibility study of Satellite Imagery Analysis for Wealth Index Development in Indonesia
Modeling | US (BLS) - State level expenditure estimates based on ML techniques
Route Optimisation | Chile - Route Optimisation through genetic algorithm
Replication | Belgium (Flanders) - Replicating successful data science projects across NSOs

The reports and code (where available) from the WS1 pilot studies can also be found at Studies and Codes.


Work Stream 2 (WS2) - From Valid Solution to Production 

WS2 explores the issues around the operationalization of machine learning solutions. It consists of three activities from the IMF and INEGI (Mexico).

Paper
IMF - Automated production tool to code IMF member state time series data using ML algorithms
INEGI (Mexico) - Deployment of a Data Lake architecture to put into production data science projects
INEGI (Mexico) - Design and assess a whole workflow to enable Natural Language Processing and Machine Learning methodologies to be integrated into a continuous production process
WS2 team - Journey from Experiment to Production

Work Stream 3 (WS3) - Ethical Consideration in the Use of ML for Research and Statistics

Led by UK Statistics Authority

This high-level guidance explores ethical considerations associated with the use of machine learning techniques for research and statistical purposes. This guidance is not exhaustive, but aims to assist and support analysts, researchers, data scientists, and statisticians navigating the ethical issues surrounding machine learning based projects.

Click here for full report

Work Stream 4 (WS4) - Model Retraining

Led by Statistics Finland

WS4 identifies the circumstances under which an ML model should be retrained to maintain its predictive power and quality.

Click here for full report

Work Stream 5 (WS5) - Quality Framework for Statistical Algorithms

Led by INEGI (Mexico)

WS5 explores the dimensions of the Quality Framework for Statistical Algorithms (QF4SA) in a consolidated project that analyzes an output using a set of standard metrics and procedures.

Click here for full report

ML Group 2021 Monthly Meeting Presentations 

Date | Speaker | Presentation
27 Oct | Saeid Molladavoudi (Statistics Canada) | Supervised Text Classification with Leveled Homomorphic Encryption (presentation slides)
27 Oct | James Beck (Australian Taxation Office) | MLOps in the Australian Taxation Office (presentation slides)
27 Oct | InKyung Choi (UNECE) | Survey on Machine Learning Training Needs (presentation slides)
29 Sept | Alex Measure (Bureau of Labor Statistics, USA) | Linking fatal work-related injuries with machine learning even when the names are missing (presentation slides)
29 Sept | Marc Ponsen (Statistics Netherlands) | WordGraph2Vec: using language constructs to create sentence embeddings (presentation slides)
28 June | Arie Wahyu Wijayanto (BPS Indonesia) | Feasibility Study of Satellite Imagery Analysis for Wealth Index Development in Indonesia (presentation slides)
28 June | Shirin Roshanafshar and Joanne Yoon (Statistics Canada) | 2021 Census Comment Classification Machine Learning PoC (presentation slides not publicly available)
24 May | Valery Dongmo-Jiongo (Statistics Canada) | Webscrapped data and ML for CPI (presentation slides)
24 May | Markie Muryawan (UN Statistics Division) | AIS Data Task Team and Global Platform (presentation slides)
24 May | Thanasis Anthopoulos (Office for National Statistics, UK) | Sic/Soc ML classification project (presentation slides not publicly available)
26 April | Kate Burnett-Isaacs (Statistics Canada) | HLG-MOS Synthetic Data Project (presentation slides)
22 March | Sigrid van Hoek (Statistics Netherlands) | Fair algorithms project (presentation slides)
22 March | Lily O'Flynn and Simon Whitworth (Statistics Authority, UK) | UK SA Data Ethics (presentation slides)
23 Feb | Riitta Piela and Rok Platinovsek (Statistics Finland) | Best practices in maintaining the quality of data in ML developments (presentation slides)
23 Feb | Casper Eriksen (Danish Business Authority) | Multilingual Classification of Economic Activities (presentation slides)
23 Feb | Michael Reusens (Statistics Flanders) | WS1 Theme 5: Transferring Knowledge and Experience (presentation slides)