| Theme | Title | Country/Organisation | Data Source | ML methods | Programme code availability | Programming Language | Note |
|---|---|---|---|---|---|---|---|
| Coding & Classification | Occupation and Economic activity coding using natural language processing | Mexico | Survey data | Extra tree, Naive Bayes, XGBoost, Support vector machine, Multilayer perceptron, Decision tree, Random forest, K-nearest neighbors, Logistic regression, Ensemble | | Python | |
| Coding & Classification | Industry and Occupation Coding | Canada | Survey data | | Yes (Click GitHub link) | Python | |
| Coding & Classification | Sentiment Analysis of Twitter data | Belgium Flanders | Social media data | Word embedding, Logistic regression, XGBoost, Random forest | Yes (Click GitHub link) | Python | |
| Coding & Classification | Coding textually described data on economic activity collected from Labour Force Survey | Serbia | Survey data | Random forest, Support vector machine, Logistic regression | | | |
| Coding & Classification | Coding Workplace Injury and Illness | USA | Survey data | | Yes (Click GitHub link) | Python | |
| Coding & Classification | Production description to ECOICOP | Poland | Web scraping data | Naive Bayes, Logistic regression, Random forest, Support vector machine, Neural network | Yes (Click GitHub link) | Python | |
| Coding & Classification | Automated Coding using the IMF’s Catalog of Time Series | IMF | | | | | |
| Coding & Classification | Automatic coding of occupation and industry in social statistical surveys | Iceland | Survey data | Deep learning | Yes (See section 5 of the report) | R | |
| Coding & Classification | Standard Industrial Code Classification by Using Machine Learning | Norway | Administrative data | Logistic regression, Random forest, Naive Bayes, Support vector machine, FastText, Neural network | | Python | |
| Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | Italy | Administrative data, Survey data, Census data | Multilayer perceptron, Log linear | Yes (Click GitHub link) | Python | |
| Edit & Imputation | Imputation in the sample survey on participation of Polish residents in trips | Poland | Survey data | CART, Random forest, Optimal weighted nearest neighbor, Support vector machine | | R | |
| Edit & Imputation | Machine learning for imputation | Germany | Survey data | K-nearest neighbors, Bayesian network, Random forest, Support vector machine | | R | |
| Edit & Imputation | Early estimates of energy balance statistics using machine learning | Belgium VITO | | Lasso regression, Linear regression, Neural network, Random forest, Ridge regression | Yes (Click GitHub link) | Python | |
| Edit & Imputation | Editing of Living Cost and Food Survey Income data | UK | Survey data | Decision tree, Random forest, Neural network | | | |
| Edit & Imputation | Editing in the Italian Register of the Public Administration | Italy | Administrative data | Decision tree, Random forest | | R | |
| Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints | Italy | | | | | |
| Imagery Analysis | | Australia | Aerial imagery | Convolutional neural network | | R | |
| Imagery Analysis | Learning statistical information from images: a proof of concept | Netherlands | Aerial imagery, Satellite imagery | Convolutional neural network | | Python | |
| Imagery Analysis | Arealstatistik Deep Learning (ADELE) | Switzerland | Satellite imagery, Administrative data | Convolutional neural network, Random forest | To be made available | Python | Land cover statistics, Land use statistics |
| Imagery Analysis | Use of Landsat satellite data for the mapping of urban areas in non-census years | Mexico | Satellite imagery | Convolutional neural network, Extra tree | | Python | |
| Imagery Analysis | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | UNECE | | | | | |