The Machine Learning Group 2022 is an international platform that aims to advance the use of machine learning (ML) in official statistics. The ML Group 2022 brings together more than 400 people from over 35 countries and over 20 international organisations, and is coordinated by the UK Office for National Statistics (ONS) Data Science Campus and the UNECE High-Level Group for the Modernisation of Official Statistics (HLG-MOS). The objectives of the Group include:
- Knowledge exchange: joining a group to share knowledge and experience on a topic of common interest on a regular basis (e.g., a study group or discussion group).
- Research support: providing feedback and advice to other ML Group members working on their own research projects, or receiving input on one's own project.
- Research collaboration: working on a common topic or data set with other group members, with the primary aim of delivering an output together.
The ML Group 2022 is focusing its activities on a number of key themes that are important for understanding the added value of ML for official statistics and how ML can best be integrated into statistical systems. Members have formed small groups to work together on knowledge exchange and research activities aimed at delivering common outputs. The groups are organised and run by the members themselves, who contribute in different ways depending on their interests and the objectives of the group. A summary of each group's activities can be found in the "ML Group 2022 Theme Group Outputs" panel below.
The monthly ML Group meetings held throughout the year provided a platform where members could share experiences, build connections and keep up to date with new developments (see the “ML Group 2022 Monthly Meeting Presentations” panel below for more information). The Group also ran three Coffee and Coding events, two of which are available (see the "Coffee and Coding Session" panel below).
THE FINAL REPORT IS AVAILABLE HERE.
ML Group 2022 Journey
- Dec 2021 - Planning for ML 2022 started
- Dec 2021 to Jan 2022 - Collection of ideas
- Jan 26 - ML 2022 launch at Dubai Expo
- Feb 9 - ML 2022 first meeting
- Apr 27 - Coffee and Coding session on ML foundations
- Jul 12-14 - Sprint in Newport, UK
- Nov 2 - Coffee and Coding session on an introduction to Git and GitHub
- Nov 30 - Webinar
- Dec 20 - Conclusion
ML Group 2022 Theme Group Outputs
Theme Group - Web-scraped Data
Led by Statistics Flanders. The web holds great potential to complement traditional statistical production, as it contains an immense amount of information relevant to almost any policy domain. However, transforming web data into trustworthy statistics is not straightforward and involves numerous technical and methodological challenges. In the Theme Group, three organisations (Statistics Flanders, Statistics Poland and the Turkish Statistical Institute) produced experimental statistics from web-scraped data in parallel, identifying companies engaged in AI activity, R&D activity and corporate social responsibility activity, respectively. Click here for the full report on the implementation studies from the three organisations.
Theme Group - Text Classification
Classifying textual responses into predefined categories (e.g., job descriptions into the Standard Occupational Classification (SOC), or Twitter comments into positive or negative sentiment) is one of the common tasks that statistical organisations carry out when producing statistics. Traditionally, this was done manually or through complex rule-based systems, both of which are costly, time-consuming and hard to maintain. With advances in natural language processing (NLP) and machine learning techniques, ML can help statistical organisations carry out this task more efficiently. This Theme Group provided a knowledge exchange platform for those working on text classification in statistical organisations to share their work, receive feedback from peers and discuss common challenges. Click here for the full report on the group's activities in 2022, the key observations made throughout the year and the recommended NLP resources.
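To make the task concrete, the following is a minimal sketch of one classic ML approach to text classification: a multinomial naive Bayes classifier with add-one smoothing, written in plain Python. It is an illustration only and is not the method used by any member organisation; the class name, the labels ("health", "it") and the toy training examples are all hypothetical. A real coding system (for example, for SOC) would need more careful text preprocessing, far more labelled data and a proper evaluation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; real pipelines would clean text further.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior of the class plus log likelihood of each known token.
            score = math.log(count / n_docs)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                numerator = self.word_counts[label][token] + 1
                score += math.log(numerator / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: free-text job descriptions with coarse labels.
texts = [
    "registered nurse caring for hospital patients",
    "nurse in elderly care home",
    "software developer writing python code",
    "web developer building websites",
]
labels = ["health", "health", "it", "it"]

clf = NaiveBayesTextClassifier().fit(texts, labels)
print(clf.predict("hospital nurse"))    # health
print(clf.predict("python developer"))  # it
```

Naive Bayes is used here only because it fits in a few lines and needs no external libraries; it is a common baseline against which more complex NLP models are compared.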
Theme Group - Imagery Analysis
Led by Statistics Netherlands and ONS. This group focuses on the use of machine learning for Earth observation data, with objectives centred on capability building and on research projects proposed by members. Click here for the full report
Theme Group - AIS Data
Led by CSO Ireland and the Norwegian School of Economics. The Theme Group explored methods to identify berth areas using Automatic Identification System (AIS) data. Because of the vast size of the data, the raw data was filtered using the H3 geospatial index. During the regular meetings, various methods for handling geospatial objects were introduced. Click here for the cookbook on creating berth polygons from AIS data
Theme Group - Quality of Training Data
Led by Statistics Netherlands. This group explored issues related to the human annotation process and sampling methods for obtaining representative training sets. Click here for the final report
Theme Group - Model Retraining
Led by UNECE. The group examined key concepts around drift (e.g., data drift and model drift) and methods for monitoring and detecting it (e.g., performance-based and distribution-based approaches). It also discussed the implications for statistical organisations (pros and cons) and the factors that enable monitoring and retraining. Click here for the full report
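The two families of drift-detection methods mentioned above can be illustrated with a small sketch in plain Python. For the distribution-based approach, it assumes the two-sample Kolmogorov-Smirnov (KS) statistic, which compares a reference sample of a feature (e.g., from the training period) with a current sample; for the performance-based approach, it assumes a simple check of whether accuracy on newly labelled data falls below a baseline by more than a tolerance. The function names, the significance level and the tolerance are hypothetical choices, not recommendations from the Theme Group's report.

```python
import bisect
import math


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    d = 0.0
    # The maximum gap between step-function CDFs occurs at one of the sample points.
    for x in ref + cur:
        f_ref = bisect.bisect_right(ref, x) / len(ref)  # share of ref values <= x
        f_cur = bisect.bisect_right(cur, x) / len(cur)  # share of cur values <= x
        d = max(d, abs(f_ref - f_cur))
    return d


def distribution_drift(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds its large-sample critical value."""
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)  # about 1.358 for alpha = 0.05
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > threshold


def performance_drift(baseline_accuracy, y_true, y_pred, tolerance=0.05):
    """Flag drift when accuracy on new labelled data drops below baseline - tolerance."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy < baseline_accuracy - tolerance


# Hypothetical feature values: a reference sample, a shifted one and a similar one.
reference = [i / 100 for i in range(100)]
shifted = [0.5 + i / 100 for i in range(100)]
similar = [i / 100 + 0.005 for i in range(100)]

print(distribution_drift(reference, shifted))   # True
print(distribution_drift(reference, similar))   # False

# Hypothetical labelled batch: 7 of 10 predictions are correct (accuracy 0.7).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
print(performance_drift(0.90, y_true, y_pred))  # True
```

The distribution-based check needs no labels, so it can run as soon as new data arrives, whereas the performance-based check requires ground truth and therefore usually lags behind. The KS critical value used here is an asymptotic approximation that is only reliable for reasonably large samples.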
Theme Group - Infrastructure
Led by Statistics Sweden. This group aims to share the experiences of statistical organisations in developing open base platforms for ML data processing and analysis that enable collaboration with external research partners. Click here for the full report, "Building an ML Ecosystem in Statistical Organisations"
ML Group 2022 Monthly Meeting Presentations
| Date | Speaker | Presentation |
|---|---|---|
| October 26 | Florian Dumpert (Federal Statistical Office of Germany) | Workshop on Quality Aspects of Machine Learning (presentation slides) |
| | Javier Oyarzun, Laura Wile (Statistics Canada) | Quality Control of Machine Learning Coding: A Statistics Canada experience (presentation slides) |
| September 21 | Yuhua Li (Cardiff University, UK) | Covariate shift detection based on exponentially weighted moving average (presentation slides) |
| | Riitta Piela (Statistics Finland) | Reaching for MLOps Level 1 at Statistics Finland (presentation slides) |
| August 31 | Summer Wang (Australian Bureau of Statistics) | Raising Survey Response Rates by Using Machine Learning to Predict Gold Providers (presentation slides) |
| | Saeid Molladavoudi (Statistics Canada) | Statistics Canada’s Framework for Responsible ML (presentation slides) |
| June 15 | Piet Daas (CBS Netherlands) | Using web site texts to identify different types of companies (presentation slides) |
| | David Corney (Full Fact, UK) | How to stop people misusing statistics: Automatic verification of statistical claims (presentation slides) |
| May 4 | Florian Dumpert (Federal Statistical Office of Germany) | Quality Framework for Statistical Algorithms (presentation slides) |
| April 6 | Abel Dasylva (Statistics Canada) | Estimating linkage errors without training data and without assumptions about the interactions among the linkage variables (presentation slides) |
| | Joep Burger (CBS Netherlands) | Convolutional neural networks for learning target variables and extracting image features from Earth Observation (presentation slides) |
| March 2 | Ingmar Weber (Qatar Computing Research Institute) | Using Advertising Data to Model Digital Gender Gaps and Poverty (presentation slides) |
| | Ralf Becker (UN Statistics Division) | Introduction to the new UN Big Data Training Catalogue (presentation slides) |
Coffee and Coding Session
Date | Speaker | Presentation |
---|---|---|
January 26 | Alex Noyvirt and Claus Sthamer (UK ONS) | |
April 27 | Tom Wise (UK ONS) | This session covered machine learning foundations, focusing on the theory behind these techniques. |
November 2 | Tabitha Williams and Brittny Vongdara (Statistics Canada) | In this session, Tabitha Williams and Brittny Vongdara from Statistics Canada provided an interactive lesson on using GitHub and an introduction to Git. Topics included forking a repository, making a commit, collaborating, and avoiding uploading data to GitHub. The session also covered the underlying theory, the difference between Git and GitHub, what a typical Git project looks like, and best practices. |
ML Group Sprint - Newport, UK
Date | ML Group Sprint |
---|---|
12-14 July | Members of the Machine Learning Group 2022 met at the ONS Data Science Campus in Newport, UK, on 12-14 July for an in-person sprint. The aim was to accelerate the work of three of this year's theme groups: web-scraped data, model retraining and quality of training data. The meeting was a great opportunity for in-person discussion and knowledge exchange between the three groups. You can read more about the sprint in this report. |