
Building on the work of the UNECE HLG-MOS Machine Learning Project (2019-2020), the UK Office for National Statistics Data Science Campus, in partnership with the UNECE, launched a new international initiative, the Machine Learning Group 2021, in January this year. The objectives of the Group include:

  • Facilitate the creation, development and implementation of research projects and skill-building activities that meet the global statistical community’s needs.
  • Build and engage a strong machine learning community by sharing resources and good practice, exchanging ideas and experiences, and keeping abreast of developments in the field.
  • Offer open, shareable, and easily accessible resources to the community.
  • Facilitate machine learning capacity building for official statistics.

The research work of the ML Group is divided into 5 Work Streams (WS) that aim to address different issues that arise when using machine learning for official statistics (see “ML Group 2021 Work Streams Outputs” below for more information on each work stream and its outputs). The monthly ML Group meetings throughout the year have built a community where members can share experiences, build connections and keep up to date with new developments (see “ML Group 2021 Monthly Meeting Presentations” below for more information). The Coffee and Coding sessions, the collected training materials and the reports from the various Work Streams will help facilitate the learning of ML.

The ML Group that started with 120 members has now grown to about 250 members from 33 countries and 5 international organizations who either lead, assist or follow the numerous activities under the ML Group. You can find a summary of the group's work in 2021 in its final report here.

The international efforts for advancing the use of ML for official statistics continue in 2022. Read more about the Machine Learning Group 2022 here.




ML Group 2021 Work Streams Outputs



Work Stream 1 (WS1) - From Idea to Valid Solution 

The pilot studies were conducted to assess the added value of ML in various thematic areas: coding and classification, edit and imputation, the use of imagery data, modeling, and route optimization. A study of the replication experience highlighted the benefits of sharing these ML projects.

Theme | Paper
Coding and Classification | Brazil - Apply ML techniques to classification and aggregation web scraped price data
Coding and Classification | Turkey - Using Big Data Tools and Machine Learning Techniques to Assign Classification of Individual Consumption by Purpose (COICOP) Categories - Full report (coming soon)
Coding and Classification | Chile - Coding and Classification: Automated coding of classifiers as a shared service - Full report (coming soon)
Coding and Classification | Poland - Using ML classify unstructured information hidden in the text description of real estate advertisements - Full report (coming soon)
Coding and Classification | UK - Automated coding of Standard Industrial and Occupational Classifications (SIC/SOC) with github repo
Edit and Imputation | Poland - Multiple imputation through machine learning in a survey of sport clubs
Imagery Analysis | Malaysia - Estimating Malaysia Rubber Plantation Area Productivity Using Satellite Imagery and Machine - Full report (coming soon)
Imagery Analysis | Indonesia - Feasibility study of Satellite Imagery Analysis for Wealth Index Development in Indonesia
Modeling | US (BLS) - State level expenditure estimates based on ML techniques
Route Optimisation | Chile - Route Optimisation through genetic algorithm
Replication | Belgium (Flanders) - Replicating successful data science projects across NSOs

The reports and code (where available) from the WS1 pilot studies can also be found at Studies and Codes.


Work Stream 2 (WS2) - From Valid Solution to Production 

WS2 explores issues around the operationalization of machine learning solutions. It consists of three activities from the IMF and INEGI (Mexico).

Paper
IMF - Automated production tool to code IMF member state time series data using ML algorithms - Full report (coming soon)
INEGI (Mexico) - Deployment of a Data Lake architecture to put into production data science projects - Full report (coming soon)
INEGI (Mexico) - Design and assess a whole workflow to enable Natural Language Processing and Machine Learning methodologies to be integrated into a continuous production process - Full report (coming soon)
WS2 team - Journey from Experiment to Production

Work Stream 3 (WS3) - Ethical Consideration in the Use of ML for Research and Statistics

Led by UK Statistics Authority

This high-level guidance explores ethical considerations associated with the use of machine learning techniques for research and statistical purposes. This guidance is not exhaustive, but aims to assist and support analysts, researchers, data scientists, and statisticians navigating the ethical issues surrounding machine learning based projects.

Click here for full report

Work Stream 4 (WS4) - Model Retraining

Led by Statistics Finland

WS4 identifies the circumstances under which an ML model should be retrained in order to maintain the predictive power and quality of the model.

Full report (coming soon)

Work Stream 5 (WS5) - Quality Framework for Statistical Algorithm

Led by INEGI (Mexico)

WS5 explores the dimensions of the Quality Framework for Statistical Algorithm (QF4SA) in a consolidated project to analyze an output based on a set of standard metrics and procedures.

Click here for full report

ML Group 2021 Monthly Meeting Presentations 

Date | Speaker | Presentation
27 Oct | Saeid Molladavoudi (Statistics Canada) | Supervised Text Classification with Leveled Homomorphic Encryption (presentation slides)
27 Oct | James Beck (Australian Taxation Office) | MLOps in the Australian Taxation Office (presentation slides)
27 Oct | InKyung Choi (UNECE) | ML Training Needs survey results (presentation slides)
29 Sept | Alex Measure (Bureau of Labor Statistics, USA) | Linking fatal work-related injuries with machine learning even when the names are missing (presentation slides)
29 Sept | Marc Ponsen (Statistics Netherlands) | WordGraph2Vec: using language constructs to create sentence embeddings (presentation slides)
28 June | Arie Wahyu Wijayanto (BPS Indonesia) | Feasibility Study of Satellite Imagery Analysis for Wealth Index Development in Indonesia (presentation slides)
28 June | Shirin Roshanafshar and Joanne Yoon (Statistics Canada) | 2021 Census Comment Classification Machine Learning PoC (presentation slides - to be uploaded)
24 May | Valery Dongmo-Jiongo (Statistics Canada) | Webscrapped data and ML for CPI (presentation slides)
24 May | Markie Muryawan (UN Statistics Division) | AIS Data Task Team and Global Platform (presentation slides)
24 May | Thanasis Anthopoulos (Office for National Statistics, UK) | Sic/Soc ML classification project (presentation slides not publicly available)
26 April | Kate Burnett-Isaacs (Statistics Canada) | HLG-MOS Synthetic Data Project (presentation slides)
22 March | Sigrid van Hoek (Statistics Netherlands) | Fair algorithms project (presentation slides)
22 March | Lily O'Flynn and Simon Whitworth (Statistics Authority, UK) | UK SA Data Ethics (presentation slides)
23 Feb | Riitta Piela and Rok Platinovsek (Statistics Finland) | Best practices in maintaining the quality of data in ML developments (presentation slides)
23 Feb | Casper Eriksen (Danish Business Authority) | Multilingual Classification of Economic Activities (presentation slides)
23 Feb | Michael Reusens (Statistics Flanders) | WS1 Theme 5: Transferring Knowledge and Experience (presentation slides)
