- This page lists the pilot studies conducted under the HLG-MOS Machine Learning Project, together with their program code where available. To have your study or code added, please contact UNECE.
- You can search by theme, ML method, program code availability, and programming language using the filter below.
Theme Title Country/Organisation Topic Reference
Other Not available Other ML techniques Schnaubelt, Matthias (2019): A comparison of machine learning model validation schemes for non-stationary time series data, FAU Discussion Papers in Economics, No. 11/2019, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institute for Economics, Nürnberg. http://hdl.handle.net/10419/209136 Coding & Classification Industry and Occupation Coding Canada ML application Justin J. Evans, Isaac Ross, Julie Portelance. StatisticsCanada_CCHS_ML_Production_Report. [Online] 2020. https://statswiki.unece.org/display/MLP/Working+documents?preview=/244092601/256970399/Statistics_Canada_FastText_Techniques_Report.pdf Coding & Classification Industry and Occupation Coding Canada ML code and data https://github.com/UNECE/CodingandClassification_Statcan Coding & Classification Industry and Occupation Coding Canada ML techniques YanPeng Gao, Isaac Ross, Justin J. Evans. Statistics_Canada_FastText_Techniques_Report. [Online] 2019.
https://statswiki.unece.org/download/attachments/244092601/Statistics_Canada_FastText_Techniques_Report.pdf?version=2&modificationDate=1567626783886&api=v2 Coding & Classification Sentiment Analysis of twitter data Belgium Flanders ML code https://github.com/jmaslankowski/WP7-Population-Life-Satisfaction Coding & Classification Sentiment Analysis of twitter data Belgium Flanders ML code https://github.com/mireusen/hlmos-statistiek-vlaanderen-twitter Coding & Classification Sentiment Analysis of twitter data Belgium Flanders ML code https://github.com/wimulkeman/dutch-sentiment-analysis Coding & Classification Sentiment Analysis of twitter data Belgium Flanders ML model https://github.com/wietsedv/bertje/blob/master/README.md Coding & Classification Sentiment Analysis of twitter data Belgium Flanders ML model https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3 Coding & Classification Production description to ECOICOP Poland ML code https://colab.research.google.com/drive/1Epn2NeFRuFC_XyXtQ4qezGVBA5aAzqIh Coding & Classification Production description to ECOICOP Poland ML code and data https://github.com/statisticspoland/ecoicop_classification Coding & Classification Production description to ECOICOP Poland ML library https://scikit-learn.org/stable/index.html Coding & Classification Not available Other ML application https://www.cbs.nl/nl-nl/over-ons/innovatie/project/innovatieve-hotspots Coding & Classification WP1 - Theme 1 Coding and Classification Report Theme report ML library https://en.wikipedia.org/wiki/FastText Coding & Classification WP1 - Theme 1 Coding and Classification Report Theme report ML tutorial https://machinelearningmastery.com/types-of-classification-in-machine-learning/ Coding & Classification WP1 - Theme 1 Coding and Classification Report Theme report ML tutorial https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/ Coding & Classification WP1 - Theme 1 Coding and Classification Report Theme 
report Naive Bayes https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/ Coding & Classification WP1 - Theme 1 Coding and Classification Report Theme report Random Forest https://builtin.com/data-science/random-forest-algorithm Coding & Classification WP1 - Theme 1 Coding and Classification Report Theme report Random Forest https://towardsdatascience.com/understanding-random-forest-58381e0602d2 Coding & Classification WP1 - Theme 1 Coding and Classification Report Theme report Subject matter https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc/soc2010/soc2010volume2thestructureandcodingindex#electronic-version-of-the-index Coding & Classification WP1 - Theme 1 Coding and Classification Report Theme report XGBoost https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/ Coding & Classification Automatic coding of occupation and industry in social statistical surveys US BLS ML application https://www.bls.gov/iif/deep-neural-networks.pdf Coding & Classification Automatic coding of occupation and industry in social statistical surveys US BLS ML application https://www.bls.gov/iif/deep-neural-networks.pdf Coding & Classification Automatic coding of occupation and industry in social statistical surveys US BLS ML application https://www.bls.gov/osmr/research-papers/2014/pdf/st140040.pdf Coding & Classification Automatic coding of occupation and industry in social statistical surveys US BLS ML application https://www.bls.gov/osmr/research-papers/2014/pdf/st140040.pdf Coding & Classification Automatic coding of occupation and industry in social statistical surveys US BLS ML code https://github.com/USDepartmentofLabor/soii_neural_autocoder Coding & Classification Automatic coding of occupation and industry in social statistical surveys US BLS ML tutorial https://github.com/ameasure/autocoding-class/blob/master/machine_learning.ipynb Edit & Imputation Not available Other Terminology 
https://www.analyticsvidhya.com/glossary-of-common-statistics-and-machine-learning-terms/ Edit & Imputation Machine learning for imputation Germany Bayesian Networks Cheng J., Greiner R., Kelly J., Bell D. A., & Liu W. (2002). Learning Bayesian Networks from Data: An Information-Theory Based Approach. Artificial Intelligence, 137, 43–90. Edit & Imputation Machine learning for imputation Germany Bayesian Networks Di Zio M., Sacco G., Scanu M., & Vicard P. (2004). Multivariate Techniques for Imputation Based on Bayesian Networks. Compstat 2004 Symposium. Edit & Imputation Machine learning for imputation Germany Bayesian Networks Di Zio M., Scanu M., Coppola L., Luzi O., & Ponti A. (2004). Bayesian Networks for Imputation. Journal of the Royal Statistical Society Series A, 167(2), 309–322. Edit & Imputation Machine learning for imputation Germany Bayesian Networks Jensen F. V. & Nielsen T. D. (2007). Bayesian Networks and Decision Graphs. Second edition. Springer. Edit & Imputation Machine learning for imputation Germany Bayesian Networks Kalisch M., Bühlmann P. (2007). Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8, 613–636. Edit & Imputation Machine learning for imputation Germany Bayesian Networks Lauritzen S. L. (1995). The EM Algorithm for Graphical Association Models With Missing Data. Computational Statistics and Data Analysis, 19, 191–201. Edit & Imputation Machine learning for imputation Germany Bayesian Networks Moore A. & Wong W. (2003). Optimal Reinsertion: A New Search Operator for Accelerated and More Accurate Bayesian Network Structure Learning. In Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), 552–559. Edit & Imputation Machine learning for imputation Germany Bayesian Networks Rey del Castillo P. (2012). Use of Machine Learning Methods to Impute Categorical Data. Conference of European Statisticians WP. 37. 
Edit & Imputation Machine learning for imputation Germany Bayesian Networks Riggelsen C. (2006). Learning parameters of Bayesian networks from incomplete data via importance sampling. International Journal of Approximate Reasoning, 42(1-2), 69–83. Edit & Imputation Machine learning for imputation Germany Bayesian Networks Spirtes P., Glymour C., & Scheines R. (2000). Causation, prediction, and search. Second edition. MIT Press. Edit & Imputation Machine learning for imputation Germany Bayesian Networks Tsamardinos I., Brown L. E., & Aliferis C. F. (2006). The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65, 31–78. Edit & Imputation Machine learning for imputation Germany K-nearest neighbour Beretta L. & Santaniello A. (2016). Nearest Neighbor Imputation Algorithms: A Critical Evaluation. Medical Informatics and Decision Making, 16, 197–208. Edit & Imputation Machine learning for imputation Germany K-nearest neighbour Cucala L., Marin J. M., Robert C. P., & Titterington D. M. (2009). A Bayesian Reassessment of Nearest-Neighbor Classification. Journal of the American Statistical Association, 104, 263–273. Edit & Imputation Machine learning for imputation Germany K-nearest neighbour Devroye L., Györfi L., & Lugosi G. (1996). A Probabilistic Theory of Pattern Recognition. Springer. Edit & Imputation Machine learning for imputation Germany K-nearest neighbour Liao S. G., Lin Y., Kang D. D., Chandra D., Bon J., Kaminski N., Sciurba F. C., & Tseng G. C. (2014). Missing Value Imputation in High-Dimensional Phenomic Data: Imputable or not, and how? Bioinformatics, 15, 346. Edit & Imputation Machine learning for imputation Germany K-nearest neighbour Troyanskaya O., Cantor M., Sherlock G., Brown P. O., Hastie T., Tibshirani R., Botstein D., & Altman R. B. (2001). Missing Value Estimation Methods for DNA Microarrays. Bioinformatics, 17, 520–525.
Edit & Imputation Machine learning for imputation Germany ML application Beck M., Dumpert F., & Feuerhake J. (2018). Proof of Concept Machine Learning – Abschlussbericht. Online available on: https://www.destatis.de/GPStatistik/receive/DEMonografie_monografie_00004835 (in German) Edit & Imputation Machine learning for imputation Germany ML application Bertsimas D., Pawlowski C., & Zhuo Y. D. (2017). From predictive methods to missing data imputation: an optimization approach. The Journal of Machine Learning Research, 18(1), 7133–7171. Edit & Imputation Machine learning for imputation Germany ML application Park S., Pannekoek J., & van der Loo M. P. J. (2018). Imputation of Economic Data based on Random Forest. Technical Report. Online available on statswiki. Edit & Imputation Machine learning for imputation Germany ML application Richman M. B., Trafalis T. B., & Adrianto I. (2009). Missing data imputation through machine learning algorithms. In Artificial Intelligence Methods in the Environmental Sciences (pp. 153–169). Edit & Imputation Machine learning for imputation Germany ML application Yang B., Janssens D., Ruan D., Bellemans T. & Wets G. (2013). A data imputation method with support vector machines for activity-based transportation models. In Computational Intelligence for Traffic and Mobility (pp. 159–171). Edit & Imputation Machine learning for imputation Germany ML code Crookston N. L. & Finley A. O. (2007). yaImpute: An R Package for kNN Imputation. Journal of Statistical Software, 23(10), 1–16. Edit & Imputation Machine learning for imputation Germany ML code Mayer M. (2019). missRanger: Fast Imputation of Missing Values. Online: https://cran.r-project.org/web/packages/missRanger/index.html Edit & Imputation Machine learning for imputation Germany ML code Scutari M. (2010). Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35(3), 1–22. Edit & Imputation Machine learning for imputation Germany ML code Steinwart I. 
& Thomann P. (2017). liquidSVM: A Fast and Versatile SVM package. Online: https://arxiv.org/abs/1702.06899. Edit & Imputation Machine learning for imputation Germany ML code van Buuren S. & Groothuis-Oudshoorn K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. Edit & Imputation Machine learning for imputation Germany ML Code Wright M. N. & Ziegler A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1–17. Edit & Imputation Machine learning for imputation Germany ML techniques Hamner B., Frasco M., & LeDell E. (2018). Metrics: Evaluation Metrics for Machine Learning. Online: https://CRAN.R-project.org/package=Metrics. Edit & Imputation Machine learning for imputation Germany ML techniques Honghai F., Guoshun C., Cheng Y., Bingru Y., & Yumei C. (2005). A SVM regression based approach to filling in missing values. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 581–587). Edit & Imputation Machine learning for imputation Germany ML techniques Mikhchi A., Honarvar M., Kashan N. E. J., & Aminafshar, M. (2016). Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation. Journal of theoretical biology, 399, 148–158. Edit & Imputation Machine learning for imputation Germany ML techniques Stekhoven D. J. & Buehlmann P. (2012). MissForest – non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112–118. Edit & Imputation Machine learning for imputation Germany ML techniques van Buuren S. (2018). Flexible Imputation of Missing Data. 2nd edition. CRC. Edit & Imputation Machine learning for imputation Germany ML tutorial Torgo L. (2010). Data Mining with R, learning with case studies Chapman and Hall/CRC. Online: http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR. 
Edit & Imputation Machine learning for imputation Germany Not published Dumpert F., Hansen M., Peters F., & Spies L. (2018). Bericht zur Maßnahme Machine Learning Methodik. Internal Paper, yet unpublished, in German. Edit & Imputation Machine learning for imputation Germany R library //cran.r-project.org/ Edit & Imputation Machine learning for imputation Germany Random Forest Athey S., Tibshirani J., & Wager S. (2019). Generalized Random Forests. The Annals of Statistics, 47(2), 1148–1178. Edit & Imputation Machine learning for imputation Germany Random Forest Biau G. & Scornet E. (2016). A random forest guided tour. Test, 25(2), 197–227. Edit & Imputation Machine learning for imputation Germany Random Forest Breiman L. (2001). Random forests. Machine learning, 45(1), 5–32. Edit & Imputation Machine learning for imputation Germany Random Forest Burgette L. F. & Reiter J. P. (2010). Multiple imputation for missing data via sequential regression trees. American journal of epidemiology, 172(9), 1070–1076. Edit & Imputation Machine learning for imputation Germany Random Forest Caiola G. & Reiter J. P. (2010). Random Forests for Generating Partially Synthetic, Categorical Data. Trans. Data Privacy, 3(1), 27-42. Edit & Imputation Machine learning for imputation Germany Random Forest Ding Y. & Simonoff J. S. (2010). An investigation of missing data methods for classification trees applied to binary response data. Journal of Machine Learning Research, 11, 131–170. Edit & Imputation Machine learning for imputation Germany Random Forest Doove L. L., Van Buuren S., & Dusseldorp E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92–104. Edit & Imputation Machine learning for imputation Germany Random Forest Feelders, A. (1999). Handling missing data in trees: surrogate splits or statistical imputation? In European Conference on Principles of Data Mining and Knowledge Discovery (pp. 
329–334). Edit & Imputation Machine learning for imputation Germany Random Forest Mentch L. & Hooker G. (2016). Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. Journal of Machine Learning Research, 17(1), 841–881. Edit & Imputation Machine learning for imputation Germany Random Forest Reiter J. P. (2005). Using CART to generate partially synthetic public use microdata. Journal of Official Statistics, 21(3), 441–462. Edit & Imputation Machine learning for imputation Germany Random Forest Saar-Tsechansky M. & Provost F. (2007). Handling missing values when applying classification models. Journal of Machine Learning Research, 8, 1623–1657. Edit & Imputation Machine learning for imputation Germany Random Forest Wager S., Hastie T., & Efron B. (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. Journal of Machine Learning Research, 15(1), 1625–1651. Edit & Imputation Machine learning for imputation Germany Statistics Bankier M., Lachance M., & Poirier P. (2000). 2001 Canadian census minimum change donor imputation methodology. UNECE Work Session on Statistical Data Editing 2000, Working Paper No. 17. Online: http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/2000/10/sde/17.e.pdf Edit & Imputation Machine learning for imputation Germany Statistics Breiman L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical science, 16(3), 199–231. Edit & Imputation Machine learning for imputation Germany Statistics Chambers R. (2001). Evaluation Criteria for Statistical Editing and Imputation. Online available: https://www.cs.york.ac.uk/euredit/ Edit & Imputation Machine learning for imputation Germany Statistics Little R. J. & Rubin D. B. (1987; 2002). Statistical analysis with missing data. Wiley. Edit & Imputation Machine learning for imputation Germany Statistics Little R. J. (2011). Imputation. 
In: Lovric M., International Encyclopedia of Statistical Science. Springer. Edit & Imputation Machine learning for imputation Germany Statistics Rubin D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley. Edit & Imputation Machine learning for imputation Germany Support Vector Machine Boser B. E., Guyon I. M., & Vapnik V. N. (1992). A training algorithm for optimal margin classifiers. Fifth Annual ACM Workshop on Computational Learning Theory, 144–152. Edit & Imputation Machine learning for imputation Germany Support Vector Machine Chechik G., Heitz G., Elidan G., Abbeel P., & Koller D. (2007). Max-margin classification of incomplete data. In Advances in Neural Information Processing Systems (pp. 233–240). Edit & Imputation Machine learning for imputation Germany Support Vector Machine Cortes C. & Vapnik V. N. (1995). Support-vector networks. Machine Learning, 20, 273–297. Edit & Imputation Machine learning for imputation Germany Support Vector Machine Drechsler J. & Reiter J. P. (2011). An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Computational Statistics & Data Analysis, 55(12), 3232–3243. Edit & Imputation Machine learning for imputation Germany Support Vector Machine Drechsler J. (2010). Using support vector machines for generating synthetic datasets. In International Conference on Privacy in Statistical Databases (pp. 148–161). Edit & Imputation Machine learning for imputation Germany Support Vector Machine Hable R. (2012). Asymptotic normality of support vector machine variants and other regularized kernel methods. Journal of Multivariate Analysis, 106, 92–117. Edit & Imputation Machine learning for imputation Germany Support Vector Machine Honghai F., Guoshun C., Cheng Y., Bingru Y., & Yumei C. (2005). A SVM regression based approach to filling in missing values. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 581–587). 
Edit & Imputation Machine learning for imputation Germany Support Vector Machine Pelckmans K., De Brabanter J., Suykens J. A., & De Moor B. (2005). Handling missing values in support vector machine classifiers. Neural Networks, 18(5-6), 684–692. Edit & Imputation Machine learning for imputation Germany Support Vector Machine Rogers S. D. (2012). Support Vector Machines for Classification and Imputation. Master thesis. Brigham Young University. Edit & Imputation Machine learning for imputation Germany Support Vector Machine Smola A. J., Vishwanathan S. V. N., & Hofmann T. (2005). Kernel Methods for Missing Variables. In AISTATS 2005 – Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (pp. 325–332). Edit & Imputation Machine learning for imputation Germany Support Vector Machine Steinwart I. & Christmann A. (2008). Support Vector Machines. Springer. Edit & Imputation Machine learning for imputation Germany Support Vector Machine Stewart T. G., Zeng D., & Wu M. C. (2018). Constructing support vector machines with missing data. Wiley Interdisciplinary Reviews: Computational Statistics, 10, 1–16. Edit & Imputation Machine learning for imputation Germany Support Vector Machine Wen Z., Shi J., Li Q., He B., & Chen J. (2018). ThunderSVM: A fast SVM library on GPUs and CPUs. Journal of Machine Learning Research, 19(21), 1–5. Edit & Imputation Machine learning for imputation Germany Support Vector Machine Yang B., Janssens D., Ruan D., Bellemans T., & Wets G. (2013). A data imputation method with support vector machines for activity-based transportation models. In Computational Intelligence for Traffic and Mobility (pp. 159-171). Edit & Imputation Machine learning for imputation Germany Support Vector Machine Zhang Y. & Liu Y. (2009). Data imputation using least squares support vector machines in urban arterial streets. IEEE Signal Processing Letters, 16(5), 414–417. 
Edit & Imputation Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints Italy ML application Martin Beck, Florian Dumpert, Joerg Feuerhake (2018). Machine Learning in Official Statistics (Shorter English version available on arXiv: https://arxiv.org/abs/1812.10422) Edit & Imputation Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints Italy Standards GSBPM (2019). Generic Statistical Business Process Model. Version 5.1, January 2019, UNECE. Available at: https://statswiki.unece.org/display/GSBPM/Generic+Statistical+Business+Process+Model. Edit & Imputation Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints Italy Standards GSDEM (2019). Generic Statistical Data Editing Models - GSDEMs, Version 2.0, April 2019, UNECE. Available at: https://statswiki.unece.org/display/sde/GSDEM Edit & Imputation Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints Italy Standards GSIM (2019). Generic Statistical Information Model, Version 1.2, May 2019, UNECE. Available at: http://www1.unece.org/stat/platform/display/gsim. Edit & Imputation Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints Italy Statistics EDIMBUS (2007). Recommended Practices for Editing and Imputation in Cross-sectional Business Surveys, EDIMBUS project report, https://ec.europa.eu/eurostat/documents/64157/4374310/30-Recommended+Practices-for-editing-and-imputation-in-cross-sectional-business-surveys-2008.pdf. Edit & Imputation Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints Italy Statistics MEMOBUST (2014). Handbook on Methodology of Modern Business Statistics, CROS-portal, Eurostat, https://ec.europa.eu/eurostat/cros/content/handbook-methodology-modern-business-statistics_en. Edit & Imputation Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints Italy Statistics Van der Loo M. 
(2015) A Formal Typology of Data Validation Functions, UNECE, Conference of European Statisticians, Budapest. Available at: http://www.markvanderloo.eu/files/statistics/WP_5_Netherlands_A_formal_typology_of_data_validation_functions.pdf Edit & Imputation Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints Italy Statistics Waal, T.de, Pannekoek, J. and Scholtus, S. (2011). Handbook of Statistical Data Editing and Imputation. Wiley, Hoboken. Edit & Imputation Imputation of the variable “Attained Level of Education” in Base Register of Individuals Italy ML application [1] Di Zio M., Di Cecco D., Di Laurea D., Filippini R., Massoli P., Rocchetti G. “Mass imputation of the attained level of education in the Italian System of Registers”, Workshop on Statistical Data Editing, Neuchâtel, Switzerland, 18-20 September 2018 Edit & Imputation Imputation of the variable “Attained Level of Education” in Base Register of Individuals Italy ML application [2] Di Zio M., Filippini R., Rocchetti G. “An imputation procedure for the Italian attained level of education in the register of individuals based on administrative and survey data”, Workshop on Statistical Data Editing, Neuchâtel, Switzerland, 31 August - 2 September 2020 Edit & Imputation Imputation of the variable “Attained Level of Education” in Base Register of Individuals Italy ML application [3] Bernasconi, Eleonora, et al. "Satellite-Net: Automatic Extraction of Land Cover Indicators from Satellite Imagery by Deep Learning." arXiv preprint arXiv:1907.09423 (2019). Edit & Imputation Imputation of the variable “Attained Level of Education” in Base Register of Individuals Italy ML application [4] De Fausti Fabrizio, Pugliese Francesco and Diego Zardetto. "Toward Automated Website Classification by Deep Learning." arXiv preprint arXiv:1910.09991 (2019). 
Edit & Imputation Imputation of the variable “Attained Level of Education” in Base Register of Individuals Italy ML code https://github.com/defausti/MLP_Imputation.git Edit & Imputation Imputation of the variable “Attained Level of Education” in Base Register of Individuals Italy ML techniques [6] Yoon, Jinsung, James Jordon, and Mihaela Van Der Schaar. "Gain: Missing data imputation using generative adversarial nets." arXiv preprint arXiv:1806.02920 (2018). Edit & Imputation Imputation of the variable “Attained Level of Education” in Base Register of Individuals Italy Statistics [5] Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of control, signals and systems 2.4 (1989): 303-314. Edit & Imputation Not available Other ML code Stekhoven, D. J. (2015). missForest: Nonparametric missing value imputation using random forest. Astrophysics Source Code Library Edit & Imputation Not available Other Statistics Gray, D. (2019). A Generalized Framework to Evaluate Imputation Strategies: Recent Developments. In JSM Proceedings, Government Statistics Section. Alexandria, VA: American Statistical Association. 1861-1870 Edit & Imputation Not available Other Statistics Gray, D. (2020). Evaluating Imputation Methods using ImpACT: First Case Study, United Nations Statistical Commission and Economic Commission for Europe – Workshop on Statistical Data Editing Edit & Imputation Not available Other Statistics Stelmack, A. (2018). On the Development of a Generalized Framework to Evaluate and Improve Imputation Strategies at Statistics Canada, United Nations Statistical Commission and Economic Commission for Europe – Workshop on Statistical Data Editing. Edit & Imputation WP1 - Theme 2 Edit and Imputation Report Theme report Data Science Cao L. (2017). Data science: a comprehensive overview. ACM Computing Surveys, 50(3), 1–42. Edit & Imputation WP1 - Theme 2 Edit and Imputation Report Theme report Statistics Chambers R. (2001). 
Evaluation Criteria for Statistical Editing and Imputation. Edit & Imputation Early estimates of energy balance statistics using machine learning Belgium VITO Big Data Daas, P.J.H., Puts, M.J., Buelens, B. and van den Hurk, P. (2015). Big data as a source for official statistics. Journal of Official Statistics, 31, 249–262. Edit & Imputation Early estimates of energy balance statistics using machine learning Belgium VITO Big Data Hassani, H., Saporta, G. and Silva, E.S. (2014). Data mining and official statistics: the past, the present and the future. Big Data, 1, 34–43. Edit & Imputation Early estimates of energy balance statistics using machine learning Belgium VITO ML code https://github.com/VITObelgium/energy-balance-ml Edit & Imputation Early estimates of energy balance statistics using machine learning Belgium VITO ML tutorial Hastie, T., Tibshirani, R., Friedman, J. & Franklin, J. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. New York: Springer. Edit & Imputation Early estimates of energy balance statistics using machine learning Belgium VITO Random Forest Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. Edit & Imputation Early estimates of energy balance statistics using machine learning Belgium VITO Statistics Claeskens, G. & Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge: Cambridge University Press. Edit & Imputation Early estimates of energy balance statistics using machine learning Belgium VITO Statistics Gelman, A. & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models, Vol. 1 New York: Cambridge University Press. 
Imagery Use of Landsat satellite data for the mapping of urban areas in non-census years Mexico Data https://ieeexplore.ieee.org/document/8518312
Imagery Use of Landsat satellite data for the mapping of urban areas in non-census years Mexico Data https://www.opendatacube.org/
Imagery Learning statistical information from images: a proof of concept Netherlands Data https://www.cbs.nl/nl-nl/dossier/nederland-regionaal/geografische-data/kaart-van-100-meter-bij-100-meter-met-statistieken
Imagery Learning statistical information from images: a proof of concept Netherlands Data Persian cat, Model T, Granny Smith; http://image-net.org/challenges/LSVRC/2015/browse-synsets
Imagery Arealstatistik Deep Learning (ADELE) Switzerland ML application https://www.bfs.admin.ch/bfs/de/home/statistiken/raum-umwelt/erhebungen/area.assetdetail.5687737.html
Imagery WP1 - Theme 3 Imagery Analysis Report Theme report Big Data Curzi, G., Modenini, D., & Tortora, P. (2020). Large Constellations of Small Satellites: A Survey of Near Future Challenges and Missions. Aerospace, 7, 133. doi:10.3390/aerospace7090133
Imagery WP1 - Theme 3 Imagery Analysis Report Theme report Big Data Safyan, M. (2020). Handbook of Small Satellites, Technology, Design, Manufacture, Applications, Economics and Regulation. 1057-1073. doi:10.1007/978-3-030-36308-6_64
Imagery WP1 - Theme 3 Imagery Analysis Report Theme report Data http://aws.amazon.com/es/public-data-sets/landsat/
Imagery WP1 - Theme 3 Imagery Analysis Report Theme report Data http://landsat.gsfc.nasa.gov/?p=10221
Imagery WP1 - Theme 3 Imagery Analysis Report Theme report Data https://eur-lex.europa.eu/eli/reg_del/2013/1159/oj
Imagery WP1 - Theme 3 Imagery Analysis Report Theme report Data Toth, C., & Jóźków, G. (2016). Remote sensing platforms and sensors: A survey. ISPRS Journal of Photogrammetry and Remote Sensing, 22-36.
Imagery WP1 - Theme 3 Imagery Analysis Report Theme report ML application Ferreira, B., Iten, M., & Silva, R. G. (2020). Monitoring sustainable development by means of earth observation data and machine learning: a review. Environmental Sciences Europe, 32, 120. doi:10.1186/s12302-020-00397-4
Imagery WP1 - Theme 3 Imagery Analysis Report Theme report ML application Holloway, J., & Mengersen, K. (2018). Statistical Machine Learning Methods and Remote Sensing for Sustainable Development Goals: A Review. Remote Sensing, 10, 1365. doi:10.3390/rs10091365
Imagery WP1 - Theme 3 Imagery Analysis Report Theme report ML application Youssef, R., Aniss, M., & Jamal, C. (2020). Machine Learning and Deep Learning in Remote Sensing and Urban Application: A Systematic Review and Meta-Analysis. Proceedings of the 4th Edition of International Conference on Geo-IT and Water Resources 2020, Geo-IT and Water Resources 2020. New York, NY, USA: Association for Computing Machinery. doi:10.1145/3399205.3399224
Imagery WP1 - Theme 3 Imagery Analysis Report Theme report ML techniques Bishop, C. M. (2006). Pattern Recognition and Machine Learning. USA: Springer.
Imagery Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning UNECE Big Data Conference of European Statisticians (2019) In-depth Review on Satellite Imagery and Earth Observation Technology in Official Statistics
Imagery Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning UNECE Big Data United Nations Global Working Group on Big Data (2017) Satellite Imagery and Geospatial Data Task Team Report
Imagery Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning UNECE Big Data Committee on Earth Observation Satellites (2015) Satellite Earth Observations in Support of Climate Information Challenges
Imagery Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning UNECE Data Lewis, A. et al. (2017) Remote Sensing of Environment
Imagery Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning UNECE Data UCS Satellite Database (accessed Feb. 2020)
Imagery Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning UNECE Data Roberts, D., Dunn, B. and Mueller, N. (2018) Open Data Cube Products Using High-Dimensional Statistics of Time Series
Imagery Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning UNECE Standards United Nations Economic Commission for Europe (2019) Generic Statistical Business Process Model (version 5.1)
Imagery Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning UNECE Statistics United Nations Statistics Division (2019) Guidelines on the use of electronic data collection technologies in population and housing censuses
Quality WP2 Quality Framework Australian Bureau of Statistics (2005). Data Quality Framework, Australian Bureau of Statistics, https://www.abs.gov.au/websitedbs/D3310114.nsf//home/Quality:+The+ABS+Data+Quality+Framework
Quality WP2 Quality Framework Eurostat (2017). European Statistics Code of Practice, Eurostat, https://ec.europa.eu/eurostat/web/quality/european-statistics-code-of-practice
Quality WP2 Quality Framework Statistics Canada (2017). Quality Assurance Framework, Statistics Canada, https://www150.statcan.gc.ca/n1/pub/12-539-x/12-539-x2019001-eng.htm
Quality WP2 Quality Framework United Nations (2019). National Quality Assurance Frameworks Manual for Official Statistics, United Nations, https://unstats.un.org/unsd/methodology/dataquality/
Quality WP2 Quality Framework United Nations (2012). Guidelines for the template for a generic national quality assurance, United Nations, https://unstats.un.org/unsd/statcom/doc12/BG-NQAF.pdf
Quality WP2 Quality ML application Luque, A., Carrasco, A., Martín, A. and de las Heras, A. (2019). The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognition, 91, 216–231.
Quality WP2 Quality ML application Pepe, M.S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press.
Quality WP2 Quality ML application Vanwinckelen, G. and Blockeel, H. (2014). Look before you leap: Some insights into learner evaluation with cross-validation. JMLR Workshop and Conference Proceedings, 1, 3–19.
Quality WP2 Quality ML techniques Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2014). Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. arXiv
Quality WP2 Quality ML techniques Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning. 2nd edition. Springer.
Quality WP2 Quality ML techniques Japkowicz, N. and Shah, M. (2011). Evaluating Learning Algorithms. Cambridge University Press.
Quality WP2 Quality ML techniques Stothard, C. (2020). Evaluating Machine Learning Classifiers: A review. Australian Bureau of Statistics, available upon request.
Quality WP2 Quality Practices Arrieta, B.A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R. and Herrera, F. (2020). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115.
Quality WP2 Quality Practices Begley, C. and Ioannidis, J. (2015). Reproducibility in science: Improving the standard for basic and preclinical research. Circ. Res. p 116-126.
Quality WP2 Quality Practices Bhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., Jia, Y., Ghosh, J., Puri, R., Moura, J.M.F. and Eckersley, P. (2020). Explainable machine learning in deployment. arXiv
Quality WP2 Quality Practices Goodman, S., Fanelli, D. and Ioannidis, J. (2016). What does research reproducibility mean? Science Translational Medicine, p 341-353
Quality WP2 Quality Practices Hanson, B., Sugden, A. and Alberts, B. (2011) Making data maximally available. Science, p 331-649.
Quality WP2 Quality Practices Molnar (2019) Interpretable Machine Learning - A Guide for Making Black Box Models Explainable
Quality WP2 Quality Practices Petkovic (2020) AI and trust: explainability, transparency. Ethical implications of AI and AI Tools Lab, Frankfurt Big Data Lab, Goethe University
Quality WP2 Quality Practices Ribeiro, M.T., Singh, S. and Guestrin, C. (2016) “Why Should I Trust You?” Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144
Quality WP2 Quality Practices Stodden, V., Seiler, J. and Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proc Natl Acad Sci USA p 2584–2589.
Quality WP2 Quality Practices Szabo, L. (2019) Artificial intelligence is rushing into patient care—and could raise risks. Scientific American, December 2019
Quality WP2 Quality Practices Vilone, G. and Longo, L. (2020) Explainable artificial intelligence: a systematic review. arXiv
Quality WP2 Quality Statistics Bengio, Y. and Grandvalet, Y. (2004). No Unbiased Estimator of the Variance of K-Fold Cross-Validation. Journal of Machine Learning Research, 5, 1089–1105.
Quality WP2 Quality Statistics Bickel, P. J. and Freedman, D. A. (1981). Some Asymptotic Theory for the Bootstrap. The Annals of Statistics, 9(6), 1196–1217.
Quality WP2 Quality Statistics Biemer, P.P. (2010). Total Survey Error – Design, Implementation, and Evaluation. Public Opinion Quarterly, 74(5), 817–848.
Quality WP2 Quality Statistics Borra, S. and Di Ciaccio, A. (2010). Measuring the prediction error. A comparison of cross-validation, bootstrap and covariance penalty methods. Computational Statistics and Data Analysis, 54, 2976–2989.
Quality WP2 Quality Statistics DiCiccio, T. and Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, p 189-212
Quality WP2 Quality Statistics Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), 1–26.
Quality WP2 Quality Statistics Eurostat (2014). Handbook on Methodology of Modern Business Statistics, CROS-portal, MEMOBUST, https://ec.europa.eu/eurostat/cros/content/handbook-methodology-modern-business-statistics_en
Quality WP2 Quality Statistics Groves, R.M. and Lyberg, L. (2010). Total Survey Error – Past, Present, and Future. Public Opinion Quarterly, 74(5), 849–879.
Quality WP2 Quality Statistics Hand, D.J. (2012). Assessing the performance of classification methods. International Statistical Review, 80(3), 400–414.
Quality WP2 Quality Statistics Kim, J.-H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis, 53, 3735–3745.
Quality WP2 Quality Statistics Platek, R. and Särndal, C.-E. (2001). Can a Statistician Deliver? Journal of Official Statistics, 17(1), 1–20.
Quality WP2 Quality Statistics Quenouille, M.H. (1956). Notes on Bias in Estimation. Biometrika, 43, 353–60.
Quality WP2 Quality Statistics Stone, M. (1974). Cross-validatory Choice and Assessment of Statistical Predictions. Journal of the Royal Statistical Society B, 36, 111–147.
Quality WP2 Quality Statistics Wolter, K. M. (2007). Introduction to Variance Estimation. 2nd edition. Springer.
Other Not available Other ML application Christen, P. (2007). “A two-step Classification to Unsupervised Record Linkage”, in Proceedings of the 6th Australian Conference on Data Mining and Analytics, 70, 111-119.
Other Not available Other ML library De Bruin, J. (2019). “Python Record Linkage Toolkit: A toolkit for record linkage and duplicate detection in Python”. Zenodo. https://doi.org/10.5281/zenodo.3559043
Other Not available Other Statistics Fellegi, I.P., and Sunter, A.B. (1969), “A theory of record linkage”, Journal of the American Statistical Association, 64, 1183–1210