- This page lists pilot studies conducted under the HLG-MOS Machine Learning Project / ONS-UNECE Machine Learning Group, together with their program code where available. To have your study or code added, please contact UNECE.
- You can search by Theme, ML method, Programme code availability and Programming Language using the filter below.
Theme | Title | Country/Organisation | Data Source | ML methods | Programme code availability | Programming Language | Note |
---|---|---|---|---|---|---|---|
Coding & Classification | Occupation and Economic activity coding using natural language processing | Mexico | Survey data | Extra tree, Naive bayes, XGBoost, Support vector machine, Multilayer perceptron, Decision tree, Random forest, K-nearest neighbors, Logistic regression, Ensemble | Yes (Click File attachment) | Python | |
Coding & Classification | | Canada | Survey data | FastText | Yes (Click GitHub link) | Python | |
Coding & Classification | | Belgium (Statistics Flanders) | Social media data | Word embedding, Logistic regression, XGBoost, Random forest | Yes (Click GitHub link) | Python | |
Coding & Classification | Coding textually described data on economic activity collected from Labour Force Survey | Serbia | Survey data | Random forest, Support vector machine, Logistic regression | | | |
Coding & Classification | | USA BLS | Survey data | Neural network | Yes (Click GitHub link) | Python | |
Coding & Classification | | Poland | Web scraping data | Naive bayes, Logistic regression, Random forest, Support vector machine, Neural network | Yes (Click GitHub link) | Python | |
Coding & Classification | Pilot Phase - Automated Coding using the IMF’s Catalog of Time Series; Phase 2 - Automated production tool to code IMF member state time series data using ML algorithms | IMF | Descriptions of indicators in data files | Logistic regression, K-nearest neighbors | | Python | |
Coding & Classification | Automatic coding of occupation and industry in social statistical surveys | Iceland | Survey data | Deep learning | Yes (See section 5 of the report) | R | |
Coding & Classification | Standard Industrial Code Classification by Using Machine Learning | Norway | Administrative data | Logistic regression, Random forest, Naive bayes, Support vector machine, FastText, Neural network | | Python | |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | Italy | Administrative data, Survey data, Census data | Multilayer perceptron, Log linear | Yes (Click GitHub link) | Python | |
Edit & Imputation | Imputation in the sample survey on participation of Polish residents in trips | Poland | Survey data | CART, Random forest, Optimal weighted nearest neighbor, Support vector machine | | R | |
Edit & Imputation | | Germany | Survey data | K-nearest neighbors, Bayesian network, Random forest, Support vector machine | | R | |
Edit & Imputation | Early estimates of energy balance statistics using machine learning | Belgium (VITO) | | Lasso regression, Linear regression, Neural network, Random forest, Ridge regression | Yes (Click GitHub link) | Python | |
Edit & Imputation | | UK | Survey data | Decision tree, Random forest, Neural network | | | |
Edit & Imputation | Editing in the Italian Register of the Public Administration | Italy | Administrative data | Decision tree, Random forest | | R | |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI: Some ideas and hints | Italy | | | | | |
Imagery Analysis | | Australia | Aerial imagery | Convolutional neural network | | R | |
Imagery Analysis | Learning statistical information from images: a proof of concept | Netherlands | Aerial imagery, Satellite imagery | Convolutional neural network | | Python | |
Imagery Analysis | | Switzerland | Satellite imagery, Administrative data | Convolutional neural network, Random forest | | Python | Land cover statistics, Land use statistics |
Imagery Analysis | Use of Landsat satellite data for the mapping of urban areas in non-census years | Mexico | Satellite imagery | Convolutional neural network, Extra tree | | Python | |
Imagery Analysis | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | UNECE | | | | | |
Coding & Classification | Automated coding of Standard Industrial and Occupational Classifications (SIC/SOC) | UK | Survey data, Census data | Logistic regression | Yes (Click GitHub link) | Python | |
Coding & Classification | Apply ML techniques to classification and aggregation web scraped price data | Brazil | Web scraping data | Logistic regression, Support vector machine, Naive bayes, Random forest, XGBoost | | Python | |
Edit & Imputation | Multiple imputation through machine learning in a survey of sport clubs | Poland | Survey data | Random forest, CART | | R | |
Modeling | State level expenditure estimates based on ML techniques | US | Survey data, Census planning data | Gradient-boosting machine, Lasso regression, K-nearest neighbors | | | |
Route Optimisation | Route Optimisation through genetic algorithm | Chile | | Genetic algorithm | | R | |
Coding & Classification | Using Big Data Tools and Machine Learning Techniques to Assign Classification of Individual Consumption by Purpose (COICOP) Categories | Turkey | Survey data, Imagery data, Scanner data | Logistic regression, Support vector machine, Naive bayes, BERT, Convolutional neural network | | Python | |
Imagery Analysis | Feasibility study of Satellite Imagery Analysis for Wealth Index Development in Indonesia | Indonesia | Satellite imagery | Convolutional neural network, Ridge regression, Support vector machine | | | |
Coding & Classification | Three projects (Scrape an ICT variable, Gain insights from an open-ended question, Create a framework for government R&D survey) | Türkiye | Web scraping data | Top2Vec | | Python | |
Coding & Classification | Statistics on companies undertaking activities in the field of corporate social responsibility (CSR) using web scraping and machine learning | Poland | Web scraping data | | | | |
Coding & Classification | Unsupervised ranking and categorisation of companies using web scraping and machine learning | Belgium (Statistics Flanders) | Web scraping data | | | | |
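
If you work from a saved copy of this table rather than the on-page filter, the same kind of search by Theme, ML method, code availability and Programming Language can be approximated with a few lines of Python. The sketch below is illustrative only: the helper names `parse_row` and `search` are hypothetical, it assumes every row carries all eight pipe-terminated cells, and it embeds just three rows copied from the table as sample input.

```python
# Column names as they appear in the table header.
HEADER = ["Theme", "Title", "Country/Organisation", "Data Source",
          "ML methods", "Programme code availability",
          "Programming Language", "Note"]

# Three rows copied from the table, used here only as sample input.
ROWS = """\
Coding & Classification | Automatic coding of occupation and industry in social statistical surveys | Iceland | Survey data | Deep learning | Yes (See section 5 of the report) | R | |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | Italy | Administrative data, Survey data, Census data | Multilayer perceptron, Log linear | Yes (Click GitHub link) | Python | |
Coding & Classification | Automated coding of Standard Industrial and Occupational Classifications (SIC/SOC) | UK | Survey data, Census data | Logistic regression | Yes (Click GitHub link) | Python | |
"""


def parse_row(line):
    """Split one pipe-terminated row into a dict keyed by column name."""
    text = line.strip()
    if text.endswith("|"):
        text = text[:-1]  # drop the single pipe that closes the last cell
    cells = [cell.strip() for cell in text.split("|")]
    if len(cells) != len(HEADER):
        raise ValueError(f"expected {len(HEADER)} cells, got {len(cells)}")
    return dict(zip(HEADER, cells))


def search(studies, theme=None, method=None, language=None, has_code=None):
    """Return the studies that match every criterion that was supplied."""
    matches = []
    for study in studies:
        methods = [m.strip() for m in study["ML methods"].split(",")]
        code_available = study["Programme code availability"].startswith("Yes")
        if theme is not None and study["Theme"] != theme:
            continue
        if method is not None and method not in methods:
            continue
        if language is not None and study["Programming Language"] != language:
            continue
        if has_code is not None and code_available != has_code:
            continue
        matches.append(study)
    return matches


STUDIES = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

if __name__ == "__main__":
    for study in search(STUDIES, theme="Coding & Classification", language="Python"):
        print(study["Country/Organisation"], "-", study["Title"])
```

Matching on exact strings keeps the sketch short; for real use you would probably want case-insensitive comparison and tolerance for rows whose cells are missing.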