Theme | Title | Country/Organisation | Data Source | ML methods | Programme code availability | Programming Language | Note |
---|---|---|---|---|---|---|---|
Coding & Classification | Occupation and Economic activity coding using natural language processing | Mexico | Survey data | Extra tree, Naive bayes, XGBoost, Support vector machine, Multilayer perceptron, Decision tree, Random forest, K-nearest neighbors, Logistic regression, Ensemble | | Python | |
Coding & Classification | Industry and Occupation Coding | Canada | Survey data | | Yes (Click GitHub link) | Python | |
Coding & Classification | Sentiment Analysis of Twitter data | Belgium (Statistics Flanders) | Social media data | Word embedding, Logistic regression, XGBoost, Random forest | Yes (Click GitHub link) | Python | |
Coding & Classification | Coding textually described data on economic activity collected from Labour Force Survey | Serbia | Survey data | Random forest, Support vector machine, Logistic regression | | | |
Coding & Classification | Coding Workplace Injury and Illness | USA BLS | Survey data | | Yes (Click GitHub link) | Python | |
Coding & Classification | Production description to ECOICOP | Poland | Web scraping data | Naive bayes, Logistic regression, Random forest, Support vector machine, Neural network | Yes (Click GitHub link) | Python | |
Coding & Classification | Pilot Phase - Automated Coding using the IMF’s Catalog of Time Series; Phase 2 - Automated production tool to code IMF member state time series data using ML algorithms | IMF | Descriptions of indicators in data files | Logistic regression, K-nearest neighbors | | Python | |
Coding & Classification | Automatic coding of occupation and industry in social statistical surveys | Iceland | Survey data | Deep learning | Yes (See section 5 of the report) | R | |
Coding & Classification | Standard Industrial Code Classification by Using Machine Learning | Norway | Administrative data | Logistic regression, Random forest, Naive bayes, Support vector machine, FastText, Neural network | | Python | |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | Italy | Administrative data, Survey data, Census data | Multilayer perceptron, Log linear | Yes (Click GitHub link) | Python | |
Edit & Imputation | Imputation in the sample survey on participation of Polish residents in trips | Poland | Survey data | CART, Random forest, Optimal weighted nearest neighbor, Support vector machine | | R | |
Edit & Imputation | Machine learning for imputation | Germany | Survey data | K-nearest neighbors, Bayesian network, Random forest, Support vector machine | | R | |
Edit & Imputation | Early estimates of energy balance statistics using machine learning | Belgium (VITO) | | Lasso regression, Linear regression, Neural network, Random forest, Ridge regression | Yes (Click GitHub link) | Python | |
Edit & Imputation | Editing of Living Cost and Food Survey Income data | UK | Survey data | Decision tree, Random forest, Neural network | | | |
Edit & Imputation | Editing in the Italian Register of the Public Administration | Italy | Administrative data | Decision tree, Random forest | | R | |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI: Some ideas and hints | Italy | | | | | |
Imagery Analysis | | Australia | Aerial imagery | Convolutional neural network | | R | |
Imagery Analysis | Learning statistical information from images: a proof of concept | Netherlands | Aerial imagery, Satellite imagery | Convolutional neural network | | Python | |
Imagery Analysis | Arealstatistik Deep Learning (ADELE) | Switzerland | Satellite imagery, Administrative data | Convolutional neural network, Random forest | | Python | Land cover statistics, Land use statistics |
Imagery Analysis | Use of Landsat satellite data for the mapping of urban areas in non-census years | Mexico | Satellite imagery | Convolutional neural network, Extra tree | | Python | |
Imagery Analysis | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | UNECE | | | | | |
Coding & Classification | Automated coding of Standard Industrial and Occupational Classifications (SIC/SOC) | UK | Survey data, Census data | Logistic regression | Yes (Click GitHub link) | Python | |
Coding & Classification | Apply ML techniques to classification and aggregation web scraped price data | Brazil | Web scraping data | Logistic regression, Support vector machine, Naive bayes, Random forest, XGBoost | | Python | |
Edit & Imputation | Multiple imputation through machine learning in a survey of sport clubs | Poland | Survey data | Random forest, CART | | R | |
Modeling | State level expenditure estimates based on ML techniques | US | Survey data, Census planning data | Gradient-boosting machine, Lasso regression, K-nearest neighbors | | | |
Route Optimisation | Route Optimisation through genetic algorithm | Chile | | Genetic algorithm | | R | |
Coding & Classification | Using Big Data Tools and Machine Learning Techniques to Assign Classification of Individual Consumption by Purpose (COICOP) Categories | Türkiye | Survey data, Imagery data, Scanner data | Logistic regression, Support vector machine, Naive bayes, BERT, Convolutional neural network | | Python | |
Imagery Analysis | Feasibility study of Satellite Imagery Analysis for Wealth Index Development in Indonesia | Indonesia | Satellite imagery | Convolutional neural network, Ridge regression, Support vector machine | | | |
| | Three projects (Scrape an ICT variable, Gain insights from an open-ended question, Create a framework for government R&D survey) | Türkiye | Web scraping data | Top2Vec | Yes (GitHub link for Project 1, Project 2, Project 3) | Python | |
| | Statistics on companies undertaking activities in the field of corporate social responsibility (CSR) using web scraping and machine learning (ML2022 web scraping theme group report) | Belgium, Türkiye, Poland | Web scraping data | | Yes (from Türkiye - link) | | |
| | Unsupervised ranking and categorisation of companies using web scraping and machine learning | Belgium (Statistics Flanders) | Web scraping data | | | Python | |