References

This page lists references cited in the ML project reports or taken from other sources. The references are categorized by topic. To report an inaccuracy in an assigned category, please contact UNECE.
You can search by Topic, Theme, or Title (of the report).


Theme	Title	Topic	Reference
Other	Not available	ML techniques	Schnaubelt, Matthias (2019): A comparison of machine learning model validation schemes for non-stationary time series data, FAU Discussion Papers in Economics, No. 11/2019, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institute for Economics, Nürnberg. http://hdl.handle.net/10419/209136
Coding & Classification	Industry and Occupation Coding	ML application	Justin J. Evans, Isaac Ross, Julie Portelance. StatisticsCanada_CCHS_ML_Production_Report. [Online] 2020. https://statswiki.unece.org/display/MLP/Working+documents?preview=/244092601/256970399/Statistics_Canada_FastText_Techniques_Report.pdf
Coding & Classification	Industry and Occupation Coding	ML code and data	https://github.com/UNECE/CodingandClassification_Statcan
Coding & Classification	Industry and Occupation Coding	ML techniques	YanPeng Gao, Isaac Ross, Justin J. Evans. Statistics_Canada_FastText_Techniques_Report. [Online] 2019. https://statswiki.unece.org/download/attachments/244092601/Statistics_Canada_FastText_Techniques_Report.pdf?version=2&modificationDate=1567626783886&api=v2
Coding & Classification	Sentiment Analysis of twitter data	ML code	https://github.com/jmaslankowski/WP7-Population-Life-Satisfaction
Coding & Classification	Sentiment Analysis of twitter data	ML code	https://github.com/mireusen/hlmos-statistiek-vlaanderen-twitter
Coding & Classification	Sentiment Analysis of twitter data	ML code	https://github.com/wimulkeman/dutch-sentiment-analysis
Coding & Classification	Sentiment Analysis of twitter data	ML model	https://github.com/wietsedv/bertje/blob/master/README.md
Coding & Classification	Sentiment Analysis of twitter data	ML model	https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3
Coding & Classification	Production description to ECOICOP	ML code	https://colab.research.google.com/drive/1Epn2NeFRuFC_XyXtQ4qezGVBA5aAzqIh
Coding & Classification	Production description to ECOICOP	ML code and data	https://github.com/statisticspoland/ecoicop_classification
Coding & Classification	Production description to ECOICOP	ML library	https://scikit-learn.org/stable/index.html
Coding & Classification	Not available	ML application	https://www.cbs.nl/nl-nl/over-ons/innovatie/project/innovatieve-hotspots
Coding & Classification	WP1 - Theme 1 Coding and Classification Report	ML library	https://en.wikipedia.org/wiki/FastText
Coding & Classification	WP1 - Theme 1 Coding and Classification Report	ML tutorial	https://machinelearningmastery.com/types-of-classification-in-machine-learning/
Coding & Classification	WP1 - Theme 1 Coding and Classification Report	ML tutorial	https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/
Coding & Classification	WP1 - Theme 1 Coding and Classification Report	Naive Bayes	https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
Coding & Classification	WP1 - Theme 1 Coding and Classification Report	Random Forest	https://builtin.com/data-science/random-forest-algorithm
Coding & Classification	WP1 - Theme 1 Coding and Classification Report	Random Forest	https://towardsdatascience.com/understanding-random-forest-58381e0602d2
Coding & Classification	WP1 - Theme 1 Coding and Classification Report	Subject matter	https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc/soc2010/soc2010volume2thestructureandcodingindex#electronic-version-of-the-index
Coding & Classification	WP1 - Theme 1 Coding and Classification Report	XGBoost	https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
Coding & Classification	Automatic coding of occupation and industry in social statistical surveys	ML application	https://www.bls.gov/iif/deep-neural-networks.pdf
Coding & Classification	Automatic coding of occupation and industry in social statistical surveys	ML application	https://www.bls.gov/osmr/research-papers/2014/pdf/st140040.pdf
Coding & Classification	Automatic coding of occupation and industry in social statistical surveys	ML code	https://github.com/USDepartmentofLabor/soii_neural_autocoder
Coding & Classification	Automatic coding of occupation and industry in social statistical surveys	ML tutorial	https://github.com/ameasure/autocoding-class/blob/master/machine_learning.ipynb
Edit & Imputation	Not available	Terminology	https://www.analyticsvidhya.com/glossary-of-common-statistics-and-machine-learning-terms/
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Cheng J., Greiner R., Kelly J., Bell D. A., & Liu W. (2002). Learning Bayesian Networks from Data: An Information-Theory Based Approach. Artificial Intelligence, 137, 43–90.
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Di Zio M., Sacco G., Scanu M., & Vicard P. (2004). Multivariate Techniques for Imputation Based on Bayesian Networks. Compstat 2004 Symposium.
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Di Zio M., Scanu M., Coppola L., Luzi O., & Ponti A. (2004). Bayesian Networks for Imputation. Journal of the Royal Statistical Society Series A, 167(2), 309–322.
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Jensen F. V. & Nielsen T. D. (2007). Bayesian Networks and Decision Graphs. Second edition. Springer.
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Kalisch M., Bühlmann P. (2007). Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8, 613–636.
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Lauritzen S. L. (1995). The EM Algorithm for Graphical Association Models With Missing Data. Computational Statistics and Data Analysis, 19, 191–201.
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Moore A. & Wong W. (2003). Optimal Reinsertion: A New Search Operator for Accelerated and More Accurate Bayesian Network Structure Learning. In Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), 552–559.
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Rey del Castillo P. (2012). Use of Machine Learning Methods to Impute Categorical Data. Conference of European Statisticians WP. 37.
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Riggelsen C. (2006). Learning parameters of Bayesian networks from incomplete data via importance sampling. International Journal of Approximate Reasoning, 42(1-2), 69–83.
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Spirtes P., Glymour C., & Scheines R. (2000). Causation, prediction, and search. Second edition. MIT Press.
Edit & Imputation	Machine learning for imputation	Bayesian Networks	Tsamardinos I., Brown L. E., & Aliferis C. F. (2006). The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65, 31–78.
Edit & Imputation	Machine learning for imputation	K-nearest neighbour	Beretta L. & Santaniello A. (2016). Nearest Neighbor Imputation Algorithms: A Critical Evaluation. Medical Informatics and Decision Making, 16, 197–208.
Edit & Imputation	Machine learning for imputation	K-nearest neighbour	Cucala L., Marin J. M., Robert C. P., & Titterington D. M. (2009). A Bayesian Reassessment of Nearest-Neighbor Classification. Journal of the American Statistical Association, 104, 263–273.
Edit & Imputation	Machine learning for imputation	K-nearest neighbour	Devroye L., Györfi L., & Lugosi G. (1996). A Probabilistic Theory of Pattern Recognition. Springer.
Edit & Imputation	Machine learning for imputation	K-nearest neighbour	Liao S. G., Lin Y., Kang D. D., Chandra D., Bon J., Kaminski N., Sciurba F. C., & Tseng G. C. (2014). Missing Value Imputation in High-Dimensional Phenomic Data: Imputable or not, and how? Bioinformatics, 15, 346.
Edit & Imputation	Machine learning for imputation	K-nearest neighbour	Troyanskaya O., Cantor M., Sherlock G., Brown P. O., Hastie T., Tibshirani R., Botstein D., & Altman R. B. (2001). Missing Value Estimation Methods for DNA Microarrays. Bioinformatics, 17, 520–525.
Edit & Imputation	Machine learning for imputation	ML application	Beck M., Dumpert F., & Feuerhake J. (2018). Proof of Concept Machine Learning – Abschlussbericht. Online available on: https://www.destatis.de/GPStatistik/receive/DEMonografie_monografie_00004835 (in German)
Edit & Imputation	Machine learning for imputation	ML application	Bertsimas D., Pawlowski C., & Zhuo Y. D. (2017). From predictive methods to missing data imputation: an optimization approach. The Journal of Machine Learning Research, 18(1), 7133–7171.
Edit & Imputation	Machine learning for imputation	ML application	Park S., Pannekoek J., & van der Loo M. P. J. (2018). Imputation of Economic Data based on Random Forest. Technical Report. Online available on statswiki.
Edit & Imputation	Machine learning for imputation	ML application	Richman M. B., Trafalis T. B., & Adrianto I. (2009). Missing data imputation through machine learning algorithms. In Artificial Intelligence Methods in the Environmental Sciences (pp. 153–169).
Edit & Imputation	Machine learning for imputation	ML application	Yang B., Janssens D., Ruan D., Bellemans T. & Wets G. (2013). A data imputation method with support vector machines for activity-based transportation models. In Computational Intelligence for Traffic and Mobility (pp. 159–171).
Edit & Imputation	Machine learning for imputation	ML code	Crookston N. L. & Finley A. O. (2007). yaImpute: An R Package for kNN Imputation. Journal of Statistical Software, 23(10), 1–16.
Edit & Imputation	Machine learning for imputation	ML code	Mayer M. (2019). missRanger: Fast Imputation of Missing Values. Online: https://cran.r-project.org/web/packages/missRanger/index.html
Edit & Imputation	Machine learning for imputation	ML code	Scutari M. (2010). Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35(3), 1–22.
Edit & Imputation	Machine learning for imputation	ML code	Steinwart I. & Thomann P. (2017). liquidSVM: A Fast and Versatile SVM package. Online: https://arxiv.org/abs/1702.06899.
Edit & Imputation	Machine learning for imputation	ML code	van Buuren S. & Groothuis-Oudshoorn K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67.
Edit & Imputation	Machine learning for imputation	ML code	Wright M. N. & Ziegler A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1–17.
Edit & Imputation	Machine learning for imputation	ML techniques	Hamner B., Frasco M., & LeDell E. (2018). Metrics: Evaluation Metrics for Machine Learning. Online: https://CRAN.R-project.org/package=Metrics.
Edit & Imputation	Machine learning for imputation	ML techniques	Honghai F., Guoshun C., Cheng Y., Bingru Y., & Yumei C. (2005). A SVM regression based approach to filling in missing values. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 581–587).
Edit & Imputation	Machine learning for imputation	ML techniques	Mikhchi A., Honarvar M., Kashan N. E. J., & Aminafshar, M. (2016). Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation. Journal of theoretical biology, 399, 148–158.
Edit & Imputation	Machine learning for imputation	ML techniques	Stekhoven D. J. & Buehlmann P. (2012). MissForest – non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112–118.
Edit & Imputation	Machine learning for imputation	ML techniques	van Buuren S. (2018). Flexible Imputation of Missing Data. 2nd edition. CRC.
Edit & Imputation	Machine learning for imputation	ML tutorial	Torgo L. (2010). Data Mining with R, learning with case studies Chapman and Hall/CRC. Online: http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR.
Edit & Imputation	Machine learning for imputation	Not published	Dumpert F., Hansen M., Peters F., & Spies L. (2018). Bericht zur Maßnahme Machine Learning Methodik. Internal Paper, yet unpublished, in German.
Edit & Imputation	Machine learning for imputation	R library	https://cran.r-project.org/
Edit & Imputation	Machine learning for imputation	Random Forest	Athey S., Tibshirani J., & Wager S. (2019). Generalized Random Forests. The Annals of Statistics, 47(2), 1148–1178.
Edit & Imputation	Machine learning for imputation	Random Forest	Biau G. & Scornet E. (2016). A random forest guided tour. Test, 25(2), 197–227.
Edit & Imputation	Machine learning for imputation	Random Forest	Breiman L. (2001). Random forests. Machine learning, 45(1), 5–32.
Edit & Imputation	Machine learning for imputation	Random Forest	Burgette L. F. & Reiter J. P. (2010). Multiple imputation for missing data via sequential regression trees. American journal of epidemiology, 172(9), 1070–1076.
Edit & Imputation	Machine learning for imputation	Random Forest	Caiola G. & Reiter J. P. (2010). Random Forests for Generating Partially Synthetic, Categorical Data. Trans. Data Privacy, 3(1), 27-42.
Edit & Imputation	Machine learning for imputation	Random Forest	Ding Y. & Simonoff J. S. (2010). An investigation of missing data methods for classification trees applied to binary response data. Journal of Machine Learning Research, 11, 131–170.
Edit & Imputation	Machine learning for imputation	Random Forest	Doove L. L., Van Buuren S., & Dusseldorp E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92–104.
Edit & Imputation	Machine learning for imputation	Random Forest	Feelders, A. (1999). Handling missing data in trees: surrogate splits or statistical imputation? In European Conference on Principles of Data Mining and Knowledge Discovery (pp. 329–334).
Edit & Imputation	Machine learning for imputation	Random Forest	Mentch L. & Hooker G. (2016). Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. Journal of Machine Learning Research, 17(1), 841–881.
Edit & Imputation	Machine learning for imputation	Random Forest	Reiter J. P. (2005). Using CART to generate partially synthetic public use microdata. Journal of Official Statistics, 21(3), 441–462.
Edit & Imputation	Machine learning for imputation	Random Forest	Saar-Tsechansky M. & Provost F. (2007). Handling missing values when applying classification models. Journal of Machine Learning Research, 8, 1623–1657.
Edit & Imputation	Machine learning for imputation	Random Forest	Wager S., Hastie T., & Efron B. (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. Journal of Machine Learning Research, 15(1), 1625–1651.
Edit & Imputation	Machine learning for imputation	Statistics	Bankier M., Lachance M., & Poirier P. (2000). 2001 Canadian census minimum change donor imputation methodology. UNECE Work Session on Statistical Data Editing 2000, Working Paper No. 17. Online: http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/2000/10/sde/17.e.pdf
Edit & Imputation	Machine learning for imputation	Statistics	Breiman L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical science, 16(3), 199–231.
Edit & Imputation	Machine learning for imputation	Statistics	Chambers R. (2001). Evaluation Criteria for Statistical Editing and Imputation. Online available: https://www.cs.york.ac.uk/euredit/
Edit & Imputation	Machine learning for imputation	Statistics	Little R. J. & Rubin D. B. (1987; 2002). Statistical analysis with missing data. Wiley.
Edit & Imputation	Machine learning for imputation	Statistics	Little R. J. (2011). Imputation. In: Lovric M., International Encyclopedia of Statistical Science. Springer.
Edit & Imputation	Machine learning for imputation	Statistics	Rubin D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley.
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Boser B. E., Guyon I. M., & Vapnik V. N. (1992). A training algorithm for optimal margin classifiers. Fifth Annual ACM Workshop on Computational Learning Theory, 144–152.
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Chechik G., Heitz G., Elidan G., Abbeel P., & Koller D. (2007). Max-margin classification of incomplete data. In Advances in Neural Information Processing Systems (pp. 233–240).
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Cortes C. & Vapnik V. N. (1995). Support-vector networks. Machine Learning, 20, 273–297.
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Drechsler J. & Reiter J. P. (2011). An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Computational Statistics & Data Analysis, 55(12), 3232–3243.
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Drechsler J. (2010). Using support vector machines for generating synthetic datasets. In International Conference on Privacy in Statistical Databases (pp. 148–161).
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Hable R. (2012). Asymptotic normality of support vector machine variants and other regularized kernel methods. Journal of Multivariate Analysis, 106, 92–117.
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Honghai F., Guoshun C., Cheng Y., Bingru Y., & Yumei C. (2005). A SVM regression based approach to filling in missing values. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 581–587).
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Pelckmans K., De Brabanter J., Suykens J. A., & De Moor B. (2005). Handling missing values in support vector machine classifiers. Neural Networks, 18(5-6), 684–692.
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Rogers S. D. (2012). Support Vector Machines for Classification and Imputation. Master thesis. Brigham Young University.
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Smola A. J., Vishwanathan S. V. N., & Hofmann T. (2005). Kernel Methods for Missing Variables. In AISTATS 2005 – Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (pp. 325–332).
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Steinwart I. & Christmann A. (2008). Support Vector Machines. Springer.
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Stewart T. G., Zeng D., & Wu M. C. (2018). Constructing support vector machines with missing data. Wiley Interdisciplinary Reviews: Computational Statistics, 10, 1–16.
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Wen Z., Shi J., Li Q., He B., & Chen J. (2018). ThunderSVM: A fast SVM library on GPUs and CPUs. Journal of Machine Learning Research, 19(21), 1–5.
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Yang B., Janssens D., Ruan D., Bellemans T., & Wets G. (2013). A data imputation method with support vector machines for activity-based transportation models. In Computational Intelligence for Traffic and Mobility (pp. 159-171).
Edit & Imputation	Machine learning for imputation	Support Vector Machine	Zhang Y. & Liu Y. (2009). Data imputation using least squares support vector machines in urban arterial streets. IEEE Signal Processing Letters, 16(5), 414–417.
Edit & Imputation	Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints	ML application	Martin Beck, Florian Dumpert, Joerg Feuerhake (2018). Machine Learning in Official Statistics (Shorter English version available on arXiv: https://arxiv.org/abs/1812.10422)
Edit & Imputation	Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints	Standards	GSBPM (2019). Generic Statistical Business Process Model. Version 5.1, January 2019, UNECE. Available at: https://statswiki.unece.org/display/GSBPM/Generic+Statistical+Business+Process+Model.
Edit & Imputation	Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints	Standards	GSDEM (2019). Generic Statistical Data Editing Models - GSDEMs, Version 2.0, April 2019, UNECE. Available at: https://statswiki.unece.org/display/sde/GSDEM
Edit & Imputation	Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints	Standards	GSIM (2019). Generic Statistical Information Model, Version 1.2, May 2019, UNECE. Available at: http://www1.unece.org/stat/platform/display/gsim.
Edit & Imputation	Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints	Statistics	EDIMBUS (2007). Recommended Practices for Editing and Imputation in Cross-sectional Business Surveys, EDIMBUS project report, https://ec.europa.eu/eurostat/documents/64157/4374310/30-Recommended+Practices-for-editing-and-imputation-in-cross-sectional-business-surveys-2008.pdf.
Edit & Imputation	Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints	Statistics	MEMOBUST (2014). Handbook on Methodology of Modern Business Statistics, CROS-portal, Eurostat, https://ec.europa.eu/eurostat/cros/content/handbook-methodology-modern-business-statistics_en.
Edit & Imputation	Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints	Statistics	Van der Loo M. (2015) A Formal Typology of Data Validation Functions, UNECE, Conference of European Statisticians, Budapest. Available at: http://www.markvanderloo.eu/files/statistics/WP_5_Netherlands_A_formal_typology_of_data_validation_functions.pdf
Edit & Imputation	Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints	Statistics	de Waal, T., Pannekoek, J. and Scholtus, S. (2011). Handbook of Statistical Data Editing and Imputation. Wiley, Hoboken.
Edit & Imputation	Imputation of the variable “Attained Level of Education” in Base Register of Individuals	ML application	[1] Di Zio M., Di Cecco D., Di Laurea D., Filippini R., Massoli P., Rocchetti G. “Mass imputation of the attained level of education in the Italian System of Registers”, Workshop on Statistical Data Editing, Neuchâtel, Switzerland, 18-20 September 2018
Edit & Imputation	Imputation of the variable “Attained Level of Education” in Base Register of Individuals	ML application	[2] Di Zio M., Filippini R., Rocchetti G. “An imputation procedure for the Italian attained level of education in the register of individuals based on administrative and survey data”, Workshop on Statistical Data Editing, Neuchâtel, Switzerland, 31 August - 2 September 2020
Edit & Imputation	Imputation of the variable “Attained Level of Education” in Base Register of Individuals	ML application	[3] Bernasconi, Eleonora, et al. "Satellite-Net: Automatic Extraction of Land Cover Indicators from Satellite Imagery by Deep Learning." arXiv preprint arXiv:1907.09423 (2019).
Edit & Imputation	Imputation of the variable “Attained Level of Education” in Base Register of Individuals	ML application	[4] De Fausti Fabrizio, Pugliese Francesco and Diego Zardetto. "Toward Automated Website Classification by Deep Learning." arXiv preprint arXiv:1910.09991 (2019).
Edit & Imputation	Imputation of the variable “Attained Level of Education” in Base Register of Individuals	ML code	https://github.com/defausti/MLP_Imputation.git
Edit & Imputation	Imputation of the variable “Attained Level of Education” in Base Register of Individuals	ML techniques	[6] Yoon, Jinsung, James Jordon, and Mihaela Van Der Schaar. "Gain: Missing data imputation using generative adversarial nets." arXiv preprint arXiv:1806.02920 (2018).
Edit & Imputation	Imputation of the variable “Attained Level of Education” in Base Register of Individuals	Statistics	[5] Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of control, signals and systems 2.4 (1989): 303-314.
Edit & Imputation	Not available	ML code	Stekhoven, D. J. (2015). missForest: Nonparametric missing value imputation using random forest. Astrophysics Source Code Library
Edit & Imputation	Not available	Statistics	Gray, D. (2019). A Generalized Framework to Evaluate Imputation Strategies: Recent Developments. In JSM Proceedings, Government Statistics Section. Alexandria, VA: American Statistical Association. 1861-1870
Edit & Imputation	Not available	Statistics	Gray, D. (2020). Evaluating Imputation Methods using ImpACT: First Case Study, United Nations Statistical Commission and Economic Commission for Europe – Workshop on Statistical Data Editing
Edit & Imputation	Not available	Statistics	Stelmack, A. (2018). On the Development of a Generalized Framework to Evaluate and Improve Imputation Strategies at Statistics Canada, United Nations Statistical Commission and Economic Commission for Europe – Workshop on Statistical Data Editing.
Edit & Imputation	WP1 - Theme 2 Edit and Imputation Report	Data Science	Cao L. (2017). Data science: a comprehensive overview. ACM Computing Surveys, 50(3), 1–42.
Edit & Imputation	WP1 - Theme 2 Edit and Imputation Report	Statistics	Chambers R. (2001). Evaluation Criteria for Statistical Editing and Imputation.
Edit & Imputation	Early estimates of energy balance statistics using machine learning	Big Data	Daas, P.J.H., Puts, M.J., Buelens, B. and van den Hurk, P. (2015). Big data as a source for official statistics. Journal of Official Statistics, 31, 249–262.
Edit & Imputation	Early estimates of energy balance statistics using machine learning	Big Data	Hassani, H., Saporta, G. and Silva, E.S. (2014). Data mining and official statistics: the past, the present and the future. Big Data, 1, 34–43.
Edit & Imputation	Early estimates of energy balance statistics using machine learning	ML code	https://github.com/VITObelgium/energy-balance-ml
Edit & Imputation	Early estimates of energy balance statistics using machine learning	ML tutorial	Hastie, T., Tibshirani, R. & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. New York: Springer.
Edit & Imputation	Early estimates of energy balance statistics using machine learning	Random Forest	Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Edit & Imputation	Early estimates of energy balance statistics using machine learning	Statistics	Claeskens, G. & Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge: Cambridge University Press.
Edit & Imputation	Early estimates of energy balance statistics using machine learning	Statistics	Gelman, A. & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models, Vol. 1 New York: Cambridge University Press.
Imagery	Use of Landsat satellite data for the mapping of urban areas in non-census years	Data	https://ieeexplore.ieee.org/document/8518312
Imagery	Use of Landsat satellite data for the mapping of urban areas in non-census years	Data	https://www.opendatacube.org/
Imagery	Learning statistical information from images: a proof of concept	Data	https://www.cbs.nl/nl-nl/dossier/nederland-regionaal/geografische-data/kaart-van-100-meter-bij-100-meter-met-statistieken
Imagery	Learning statistical information from images: a proof of concept	Data	Persian cat, Model T, Granny Smith; http://image-net.org/challenges/LSVRC/2015/browse-synsets
Imagery	Arealstatistik Deep Learning (ADELE)	ML application	https://www.bfs.admin.ch/bfs/de/home/statistiken/raum-umwelt/erhebungen/area.assetdetail.5687737.html
Imagery	WP1 - Theme 3 Imagery Analysis Report	Big Data	Curzi, G., Modenini, D., & Tortora, P. (2020). Large Constellations of Small Satellites: A Survey of Near Future Challenges and Missions. Aerospace, 7, 133. doi:10.3390/aerospace7090133
Imagery	WP1 - Theme 3 Imagery Analysis Report	Big Data	Safyan, M. (2020). Handbook of Small Satellites, Technology, Design, Manufacture, Applications, Economics and Regulation. 1057-1073. doi:10.1007/978-3-030-36308-6_64
Imagery	WP1 - Theme 3 Imagery Analysis Report	Data	http://aws.amazon.com/es/public-data-sets/landsat/
Imagery	WP1 - Theme 3 Imagery Analysis Report	Data	http://landsat.gsfc.nasa.gov/?p=10221
Imagery	WP1 - Theme 3 Imagery Analysis Report	Data	https://eur-lex.europa.eu/eli/reg_del/2013/1159/oj
Imagery	WP1 - Theme 3 Imagery Analysis Report	Data	Toth, C., & Jóźków, G. (2016). Remote sensing platforms and sensors: A survey. ISPRS Journal of Photogrammetry and Remote Sensing, 22-36.
Imagery	WP1 - Theme 3 Imagery Analysis Report	ML application	Ferreira, B., Iten, M., & Silva, R. G. (2020). Monitoring sustainable development by means of earth observation data and machine learning: a review. Environmental Sciences Europe, 32, 120. doi:10.1186/s12302-020-00397-4
Imagery	WP1 - Theme 3 Imagery Analysis Report	ML application	Holloway, J., & Mengersen, K. (2018). Statistical Machine Learning Methods and Remote Sensing for Sustainable Development Goals: A Review. Remote Sensing, 10, 1365. doi:10.3390/rs10091365
Imagery	WP1 - Theme 3 Imagery Analysis Report	ML application	Youssef, R., Aniss, M., & Jamal, C. (2020). Machine Learning and Deep Learning in Remote Sensing and Urban Application: A Systematic Review and Meta-Analysis. Proceedings of the 4th Edition of International Conference on Geo-IT and Water Resources 2020, Geo-IT and Water Resources 2020. New York, NY, USA: Association for Computing Machinery. doi:10.1145/3399205.3399224
Imagery	WP1 - Theme 3 Imagery Analysis Report	ML techniques	Bishop, C. M. (2006). Pattern Recognition and Machine Learning. USA: Springer.
Imagery	Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning	Big Data	Conference of European Statisticians (2019) In-depth Review on Satellite Imagery and Earth Observation Technology in Official Statistics
Imagery	Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning	Big Data	United Nations Global Working Group on Big Data (2017) Satellite Imagery and Geospatial Data Task Team Report
Imagery	Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning	Big Data	Committee on Earth Observation Satellites (2015) Satellite Earth Observations in Support of Climate Information Challenges
Imagery	Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning	Data	Lewis, A. et al. (2017) Remote Sensing of Environment
Imagery	Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning	Data	UCS Satellite Database (accessed Feb. 2020)
Imagery	Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning	Data	Roberts, D., Dunn, B. and Mueller, N. (2018) Open Data Cube Products Using High-Dimensional Statistics of Time Series
Imagery	Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning	Standards	United Nations Economic Commission for Europe (2019) Generic Statistical Business Process Model (version 5.1)
Imagery	Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning	Statistics	United Nations Statistics Division (2019) Guidelines on the use of electronic data collection technologies in population and housing censuses
Quality		Framework	Australian Bureau of Statistics (2005). Data Quality Framework, Australian Bureau of Statistics, (https://www.abs.gov.au/websitedbs/D3310114.nsf//home/Quality:+The+ABS+Data+Quality+Framework)
Quality		Framework	Eurostat (2017). European Statistics Code of Practice, Eurostat, https://ec.europa.eu/eurostat/web/quality/european-statistics-code-of-practice.
Quality		Framework	Statistics Canada (2017). Quality Assurance Framework, Statistics Canada, https://www150.statcan.gc.ca/n1/pub/12-539-x/12-539-x2019001-eng.htm
Quality		Framework	United Nations (2019). National Quality Assurance Frameworks Manual for Official Statistics, United Nations, https://unstats.un.org/unsd/methodology/dataquality/
Quality		Framework	United Nations (2012). Guidelines for the template for a generic national quality assurance framework (NQAF), United Nations, https://unstats.un.org/unsd/statcom/doc12/BG-NQAF.pdf.
Quality		ML application	Luque, A., Carrasco, A., Martín, A. and de las Heras, A. (2019). The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognition, 91, 216–231.
Quality		ML application	Pepe, M.S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press.
Quality		ML application	Vanwinckelen, G. and Blockeel, H. (2014). Look before you leap: Some insights into learner evaluation with cross-validation. JMLR Workshop and Conference Proceedings, 1, 3–19.
Quality		ML techniques	Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2014). Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. arXiv
Quality		ML techniques	Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning. 2nd edition. Springer.
Quality		ML techniques	Japkowicz, N. and Shah, M. (2011). Evaluating Learning Algorithms. Cambridge University Press.
Quality		ML techniques	Stothard, C. (2020). Evaluating Machine Learning Classifiers: A review. Australian Bureau of Statistics, available upon request.
Quality		Practices	Arrieta, B.A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R. and Herrera, F. (2020). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115
Quality		Practices	Begley, C. and Ioannidis, J. (2015). Reproducibility in science: Improving the standard for basic and preclinical research. Circ. Res. p 116-126.
Quality		Practices	Bhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., Jia, Y., Ghosh, J., Puri, R., Moura, J.M.F. and Eckersley, P. (2020). Explainable machine learning in deployment. arXiv
Quality		Practices	Goodman, S., Fanelli, D. and Ioannidis, J. (2016). What does research reproducibility mean? Science Translational Medicine, p 341-353
Quality		Practices	Hanson, B., Sugden, A. and Alberts, B. (2011) Making data maximally available. Science, p 331-649.
Quality		Practices	Molnar (2019) Interpretable Machine Learning - A Guide for Making Black Box Models Explainable
Quality		Practices	Petkovic (2020) AI and trust: explainability, transparency. Ethical implications of AI and AI Tools Lab, Frankfurt Big Data Lab, Goethe University
Quality		Practices	Ribeiro, M.T., Singh, S. and Guestrin, C. (2016) “Why Should I Trust You?” Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144
Quality		Practices	Stodden, V., Seiler, J. and Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proc Natl Acad Sci USA p 2584–2589.
Quality		Practices	Szabo, L. (2019) Artificial intelligence is rushing into patient care—and could raise risks. Scientific American, December 2019
Quality		Practices	Vilone, G. and Longo, L. (2020) Explainable artificial intelligence: a systematic review. arXiv
Quality		Statistics	Bengio, Y. and Grandvalet, Y. (2004). No Unbiased Estimator of the Variance of K-Fold Cross-Validation. Journal of Machine Learning Research, 5, 1089–1105.
Quality		Statistics	Bickel, P. J. and Freedman, D. A. (1981). Some Asymptotic Theory for the Bootstrap. The Annals of Statistics, 9(6), 1196–1217.
Quality		Statistics	Biemer, P.P. (2010). Total Survey Error – Design, Implementation, and Evaluation. Public Opinion Quarterly, 74(5), 817–848.
Quality		Statistics	Borra, S. and Di Ciaccio, A. (2010). Measuring the prediction error. A comparison of cross-validation, bootstrap and covariance penalty methods. Computational Statistics and Data Analysis, 54, 2976–2989.
Quality		Statistics	DiCiccio, T. and Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, p 189-212
Quality		Statistics	Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics. 7(1), 1–26.
Quality		Statistics	Eurostat (2014). Handbook on Methodology of Modern Business Statistics, CROS-portal, MEMOBUST, https://ec.europa.eu/eurostat/cros/content/handbook-methodology-modern-business-statistics_en.
Quality		Statistics	Groves, R.M. and Lyberg, L. (2010). Total Survey Error – Past, Present, and Future. Public Opinion Quarterly, 74(5), 849–879.
Quality		Statistics	Hand D.J. (2012) Assessing the performance of classification methods. International Statistical Review. 80(3), 400–414.
Quality		Statistics	Kim, J.-H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis, 53, 3735–3745.
Quality		Statistics	Platek, R. and Särndal, C.-E. (2001). Can a Statistician Deliver? Journal of Official Statistics, 17(1), 1–20.
Quality		Statistics	Quenouille, M.H. (1956). Notes on Bias in Estimation. Biometrika, 43, 353–60.
Quality		Statistics	Stone, M. (1974). Cross-validatory Choice and Assessment of Statistical Predictions. Journal of the Royal Statistical Society, Series B, 36, 111–147.
Quality		Statistics	Wolter, K. M. (2007). Introduction to Variance Estimation. 2nd edition. Springer.
Other	Not available	ML application	Christen, P. (2007). “A two-step Classification to Unsupervised Record Linkage”, in Proceedings of the 6th Australian Conference on Data Mining and Analytics, 70, 111-119.
Other	Not available	ML library	De Bruin, J. (2019). “Python Record Linkage Toolkit: A toolkit for record linkage and duplicate detection in Python”. Zenodo. https://doi.org/10.5281/zenodo.3559043
Other	Not available	Statistics	Fellegi, I.P., and Sunter, A.B. (1969), “A theory of record linkage”, Journal of the American Statistical Association, 64, 1183–1210
