As it progressed, the Machine Learning project learned of other efforts to use ML in producing official statistics. In particular, during a series of virtual sessions held in October 2020, several speakers were invited to introduce ML developments conducted in their statistical organisations. These developments were not carried out within the ML project. The presentations are shared here to further highlight the interest in advancing the use of ML.
| Main statistical process | Development | Data source |
|---|---|---|
| Question design | USA - Automated double-barreled question classification using machine learning (BigSurv20 - Big Data meets Survey Science) | Survey |
| | USA - Using generative adversarial active learning to identify poor closed-ended survey responses (BigSurv20 - Big Data meets Survey Science) | Survey |
| | Improving SHARE translation verification (BigSurv20 - Big Data meets Survey Science) | Survey |
| Text classification | Belgium Flanders - A better statistic on innovative companies in Flanders using web scraping and machine learning | Web scraped |
| | | Administrative and metadata |
| | UK - Automated classification of web scraped clothing data in consumer price statistics | Web scraped |
| | | Write-in responses |
| | USA USCB - Shared AI Services Hosting Application | Write-in responses |
| | Switzerland - Automation of General Classification of Economic Activities coding - NOGAuto | All sources |
| | | Paper documents |
| | Germany - Using supervised classification for categorizing answers to an open-ended question on panel participation motivation (BigSurv20 - Big Data meets Survey Science) | Write-in responses |
| | USA - A framework for using machine learning to support qualitative data coding (BigSurv20 - Big Data meets Survey Science) | Write-in responses |
| | USA - Training deep learning models with active learning framework to classify “other (please specify)” comments (BigSurv20 - Big Data meets Survey Science) | Write-in responses |
| | USA - Measuring the validity of open-ended questions: Application of unsupervised learning methods (BigSurv20 - Big Data meets Survey Science) | Write-in responses |
| | USA - A text mining and machine learning platform to classify businesses into NAICS codes (BigSurv20 - Video) | Combination of sources |
| | Netherlands - Prediction of author’s educational background using text mining (BigSurv20 - Video) | |
| | Netherlands - Detecting innovative companies via the text on their website (BigSurv20 - Video) | |
| | Netherlands - Evaluating and improving a text classifier for subpopulations: the case of cyber crime (BigSurv20 - Presentation) | |
| Record linkage or matching | Canada - Machine Learning for Record Linkage at Statistics Canada | All sources |
| | USA BLS - Matching fatal injury records with supervised machine learning | Survey and administrative |
| Edit and Imputation | | All sources |
| | Australia - Census Occupancy Imputation for Census 2021 | Census |
| | Australia - Repairing Big Data sets using KNN | Combination of sources |
| | Switzerland - Data validation with machine learning - Plausi++ (document); Switzerland - Improving Data Validation using Machine Learning (presentation) | Administrative |
| Estimation and Analysis | | Combination of sources |
| | OECD - Nowcasting Services Trade | Aggregates |
| | Switzerland - ML_SoSi: Individual trajectories in the social security system | Combination of sources |
| | Austria - LEARN4SDGis: A machine learning based poverty mapping exercise in Austria (BigSurv20 - Big Data meets Survey Science) | |
| | Germany - Using administrative data and machine learning to address nonresponse bias in establishment surveys (BigSurv20 - Big Data meets Survey Science) | Administrative |