Building on the work of the UNECE HLG-MOS Machine Learning Project (2019-2020), the UK Office for National Statistics Data Science Campus, in partnership with the UNECE, launched a new international initiative, the Machine Learning Group 2021, in January 2021. The objectives of the Group are to:
- Facilitate the creation, development and implementation of research projects and skill-building activities that meet the global statistical community’s needs.
- Build and engage a strong machine learning community by sharing resources and good practice, exchanging ideas and experiences, and keeping abreast of developments in the field.
- Offer open, shareable, and easily accessible resources to the community.
- Facilitate machine learning capacity building for official statistics.
The research work of the ML Group is divided into 5 Work Streams (WS) that aim to address different issues that arise when using machine learning for official statistics (see “ML Group 2021 Work Streams Outputs” below for more information on each work stream and its outputs). The monthly ML Group meetings held throughout the year have built a community where members can share experiences, build connections and keep up to date with new developments (see “ML Group 2021 Monthly Meeting Presentations” below for more information). The Coffee and Coding sessions, the training materials collected, and the reports from the various Work Streams will help facilitate learning ML.
The ML Group that started with 120 members has now grown to about 250 members from 33 countries and 5 international organizations who either lead, assist or follow the numerous activities under the ML Group. You can find a summary of the group's work in 2021 in its final report here.
The international efforts to advance the use of ML for official statistics continue in 2022. Read more about Machine Learning Group 2022 here.
ML Group 2021 Journey
- November (2020) - Planning for ML 2021 started
- January - ML 2021 collecting ideas for 2021
- January 29 - ML 2021 First Monthly Meeting
- April 22 - Joint meeting with UN-CEBD AIS Task Team
- June 6 - Coffee and Coding Session #1: Life Cycle of ML project
- June 30 - Road to Bern Webinar on Data Science and Official Statistics
- September 16 - Coffee and Coding Session #2: Design and Execution of ML project
- November 19 - ML 2021 Public Webinar
- December 20 - ML 2021 Conclusion
ML Group 2021 Work Streams Outputs
Work Stream 1 (WS1) - From Idea to Valid Solution
The pilot studies were conducted to assess the added value of ML in various thematic areas: coding and classification, edit and imputation, the use of imagery data, modeling, and route optimization. A study of the replication experience highlighted the benefits of sharing these ML projects.
The reports and code (where available) from the WS1 pilot studies can be found at Studies and Codes.
Work Stream 2 (WS2) - From Valid Solution to Production
WS2 explores the issues around the operationalization of machine learning solutions. It consists of three activities from the IMF and INEGI (Mexico).
Paper |
IMF - Automated production tool to code IMF member state time series data using ML algorithms |
INEGI (Mexico) - Deployment of a Data Lake architecture to put into production data science projects |
INEGI (Mexico) - Design and assess a whole workflow to enable Natural Language Processing and Machine Learning methodologies to be integrated into a continuous production process |
WS2 team - Journey from Experiment to Production |
Work Stream 3 (WS3) - Ethical Consideration in the Use of ML for Research and Statistics
Led by the UK Statistics Authority. This high-level guidance explores ethical considerations associated with the use of machine learning techniques for research and statistical purposes. The guidance is not exhaustive, but aims to assist and support analysts, researchers, data scientists, and statisticians in navigating the ethical issues surrounding machine learning-based projects.
Work Stream 4 (WS4) - Model Retraining
Led by Statistics Finland. WS4 identifies the circumstances under which an ML model should be retrained in order to maintain its predictive power and quality.
Work Stream 5 (WS5) - Quality Framework for Statistical Algorithm
Led by INEGI (Mexico). WS5 explores the dimensions of the Quality Framework for Statistical Algorithm (QF4SA) in a consolidated project to analyze an output based on a set of standard metrics and procedures.
ML Group 2021 Monthly Meeting Presentations
| Date | Speaker | Presentation |
|---|---|---|
| 27 Oct | Saeid Molladavoudi (Statistics Canada) | Supervised Text Classification with Leveled Homomorphic Encryption (presentation slides) |
| | James Beck (Australian Taxation Office) | MLOps in the Australian Taxation Office (presentation slides) |
| | InKyung Choi (UNECE) | Survey on Machine Learning Training Needs (presentation slides) |
| 29 Sept | Alex Measure (Bureau of Labor Statistics, USA) | Linking fatal work-related injuries with machine learning even when the names are missing (presentation slides) |
| | Marc Ponsen (Statistics Netherlands) | WordGraph2Vec: using language constructs to create sentence embeddings (presentation slides) |
| 28 June | Arie Wahyu Wijayanto (BPS Indonesia) | Feasibility Study of Satellite Imagery Analysis for Wealth Index Development in Indonesia (presentation slides) |
| | Shirin Roshanafshar and Joanne Yoon (Statistics Canada) | 2021 Census Comment Classification Machine Learning PoC (presentation slides not publicly available) |
| 24 May | Valery Dongmo-Jiongo (Statistics Canada) | Web-scraped data and ML for CPI (presentation slides) |
| | Markie Muryawan (UN Statistics Division) | AIS Data Task Team and Global Platform (presentation slides) |
| | Thanasis Anthopoulos (Office for National Statistics, UK) | SIC/SOC ML classification project (presentation slides not publicly available) |
| 26 April | Kate Burnett-Isaacs (Statistics Canada) | HLG-MOS Synthetic Data Project (presentation slides) |
| 22 March | Sigrid van Hoek (Statistics Netherlands) | Fair algorithms project (presentation slides) |
| | Lily O'Flynn and Simon Whitworth (Statistics Authority, UK) | UK SA Data Ethics (presentation slides) |
| 23 Feb | Riitta Piela and Rok Platinovsek (Statistics Finland) | Best practices in maintaining the quality of data in ML developments (presentation slides) |
| | Casper Eriksen (Danish Business Authority) | Multilingual Classification of Economic Activities (presentation slides) |
| | Michael Reusens (Statistics Flanders) | WS1 Theme 5: Transferring Knowledge and Experience (presentation slides) |