- This page lists pilot studies conducted under the HLG-MOS Machine Learning Project / ONS-UNECE Machine Learning Group, together with their program code where available. To have your study or code added, please contact UNECE.
- You can search by Theme, ML method, Programme code availability and Programming Language using the filter below.
Theme | Title | Country/Organisation | Data Source | ML methods | Programme code availability | Programming Language | Note |
---|---|---|---|---|---|---|---|
Coding & Classification | Occupation and Economic activity coding using natural language processing | Mexico | Survey data | Extra tree, Naive bayes, XGBoost, Support vector machine, Multilayer perceptron, Decision tree, Random forest, K-nearest neighbors, Logistic regression, Ensemble | Yes (Click File attachment) | Python | |
Coding & Classification | | Canada | Survey data | FastText | Yes (Click GitHub link) | Python | |
Coding & Classification | | Belgium (Statistics Flanders) | Social media data | Word embedding, Logistic regression, XGBoost, Random forest | Yes (Click GitHub link) | Python | |
Coding & Classification | Coding textually described data on economic activity collected from Labour Force Survey | Serbia | Survey data | Random forest, Support vector machine, Logistic regression | | | |
Coding & Classification | | USA BLS | Survey data | Neural network | Yes (Click GitHub link) | Python | |
Coding & Classification | | Poland | Web scraping data | Naive bayes, Logistic regression, Random forest, Support vector machine, Neural network | Yes (Click GitHub link) | Python | |
Coding & Classification | Pilot Phase - Automated Coding using the IMF’s Catalog of Time Series; Phase 2 - Automated production tool to code IMF member state time series data using ML algorithms | IMF | Descriptions of indicators in data files | Logistic regression, K-nearest neighbors | | Python | |
Coding & Classification | Automatic coding of occupation and industry in social statistical surveys | Iceland | Survey data | Deep learning | Yes (See section 5 of the report) | R | |
Coding & Classification | Standard Industrial Code Classification by Using Machine Learning | Norway | Administrative data | Logistic regression, Random forest, Naive bayes, Support vector machine, FastText, Neural network | | Python | |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | Italy | Administrative data, Survey data, Census data | Multilayer perceptron, Log linear | Yes (Click GitHub link) | Python | |
Edit & Imputation | Imputation in the sample survey on participation of Polish residents in trips | Poland | Survey data | CART, Random forest, Optimal weighted nearest neighbor, Support vector machine | | R | |
Edit & Imputation | | Germany | Survey data | K-nearest neighbors, Bayesian network, Random forest, Support vector machine | | R | |
Edit & Imputation | Early estimates of energy balance statistics using machine learning | Belgium (VITO) | | Lasso regression, Linear regression, Neural network, Random forest, Ridge regression | Yes (Click GitHub link) | Python | |
Edit & Imputation | | UK | Survey data | Decision tree, Random forest, Neural network | | | |
Edit & Imputation | Editing in the Italian Register of the Public Administration | Italy | Administrative data | Decision tree, Random forest | | R | |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI: Some ideas and hints | Italy | | | | | |
Imagery Analysis | | Australia | Aerial imagery | Convolutional neural network | | R | |
Imagery Analysis | Learning statistical information from images: a proof of concept | Netherlands | Aerial imagery, Satellite imagery | Convolutional neural network | | Python | |
Imagery Analysis | | Switzerland | Satellite imagery, Administrative data | Convolutional neural network, Random forest | | Python | Land cover statistics, Land use statistics |
Imagery Analysis | Use of Landsat satellite data for the mapping of urban areas in non-census years | Mexico | Satellite imagery | Convolutional neural network, Extra tree | | Python | |
Imagery Analysis | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | UNECE | | | | | |
Coding & Classification | Automated coding of Standard Industrial and Occupational Classifications (SIC/SOC) | UK | Survey data, Census data | Logistic regression | Yes (Click GitHub link) | Python | |
Coding & Classification | Apply ML techniques to classification and aggregation web scraped price data | Brazil | Web scraping data | Logistic regression, Support vector machine, Naive bayes, Random forest, XGBoost | | Python | |
Edit & Imputation | Multiple imputation through machine learning in a survey of sport clubs | Poland | Survey data | Random forest, CART | | R | |
Modeling | State level expenditure estimates based on ML techniques | US | Survey data, Census planning data | Gradient-boosting machine, Lasso regression, K-nearest neighbors | | | |
Route Optimisation | Route Optimisation through genetic algorithm | Chile | | Genetic algorithm | | R | |
Coding & Classification | Using Big Data Tools and Machine Learning Techniques to Assign Classification of Individual Consumption by Purpose (COICOP) Categories | Turkey | Survey data, Imagery data, Scanner data | Logistic regression, Support vector machine, Naive bayes, BERT, Convolutional neural network | | Python | |
Imagery Analysis | Feasibility study of Satellite Imagery Analysis for Wealth Index Development in Indonesia | Indonesia | Satellite imagery | Convolutional neural network, Ridge regression, Support vector machine | | | |
Coding & Classification | Three projects (Scrape an ICT variable, Gain insights from an open-ended question, Create a framework for government R&D survey) | Türkiye | Web scraping data | Top2Vec | | Python | |
Coding & Classification | Statistics on companies undertaking activities in the field of corporate social responsibility (CSR) using web scraping and machine learning | Poland | Web scraping data | | | | |
Coding & Classification | Unsupervised ranking and categorisation of companies using web scraping and machine learning | Belgium (Statistics Flanders) | Web scraping data | | | | |
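
For readers working from an offline copy of this table, the filter by Theme, ML method, Programme code availability and Programming Language can be approximated with a few lines of Python. The sketch below is illustrative only: the `STUDIES` list holds four rows copied from the table, and the dictionary keys and the `filter_studies` helper are hypothetical names, not part of the wiki. It also assumes that a blank "Programme code availability" cell means no code is published, which the page does not state.

```python
# Four rows copied from the table above. The key names are illustrative,
# and a blank availability cell is ASSUMED to mean "no code published".
STUDIES = [
    {"theme": "Coding & Classification", "country": "Canada",
     "ml_methods": ["FastText"],
     "code_available": True, "language": "Python"},
    {"theme": "Coding & Classification", "country": "Iceland",
     "ml_methods": ["Deep learning"],
     "code_available": True, "language": "R"},
    {"theme": "Edit & Imputation", "country": "Germany",
     "ml_methods": ["K-nearest neighbors", "Bayesian network",
                    "Random forest", "Support vector machine"],
     "code_available": False, "language": "R"},
    {"theme": "Imagery Analysis", "country": "Netherlands",
     "ml_methods": ["Convolutional neural network"],
     "code_available": False, "language": "Python"},
]


def filter_studies(studies, theme=None, ml_method=None,
                   code_available=None, language=None):
    """Return the studies matching every criterion that is not None."""
    result = []
    for study in studies:
        if theme is not None and study["theme"] != theme:
            continue
        if ml_method is not None and ml_method not in study["ml_methods"]:
            continue
        if code_available is not None and study["code_available"] != code_available:
            continue
        if language is not None and study["language"] != language:
            continue
        result.append(study)
    return result


if __name__ == "__main__":
    # List the sample studies implemented in R.
    for study in filter_studies(STUDIES, language="R"):
        print(study["country"], "|", study["theme"])
```

Criteria left as `None` are ignored, so calling `filter_studies(STUDIES)` with no arguments returns every row, and combining several arguments narrows the result the same way stacking filters would.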