- This page contains a list of references provided in the ML project reports or taken from other sources. The references are categorized by topic. To report any inaccuracy in the assigned category, please contact UNECE.
- You can search by Topic, Theme, or Title (of the report).
Theme | Title | Topic | Reference |
---|---|---|---|
Other | Not available | ML techniques | Schnaubelt, Matthias (2019): A comparison of machine learning model validation schemes for non-stationary time series data, FAU Discussion Papers in Economics, No. 11/2019, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institute for Economics, Nürnberg. http://hdl.handle.net/10419/209136 |
Coding & Classification | Industry and Occupation Coding | ML application | Justin J. Evans, Isaac Ross, Julie Portelance. StatisticsCanada_CCHS_ML_Production_Report. [Online] 2020. https://statswiki.unece.org/display/MLP/Working+documents?preview=/244092601/256970399/Statistics_Canada_FastText_Techniques_Report.pdf |
Coding & Classification | Industry and Occupation Coding | ML code and data | https://github.com/UNECE/CodingandClassification_Statcan |
Coding & Classification | Industry and Occupation Coding | ML techniques | YanPeng Gao, Isaac Ross, Justin J. Evans. Statistics_Canada_FastText_Techniques_Report. [Online] 2019. https://statswiki.unece.org/download/attachments/244092601/Statistics_Canada_FastText_Techniques_Report.pdf?version=2&modificationDate=1567626783886&api=v2 |
Coding & Classification | Sentiment Analysis of twitter data | ML code | https://github.com/jmaslankowski/WP7-Population-Life-Satisfaction |
Coding & Classification | Sentiment Analysis of twitter data | ML code | https://github.com/mireusen/hlmos-statistiek-vlaanderen-twitter |
Coding & Classification | Sentiment Analysis of twitter data | ML code | https://github.com/wimulkeman/dutch-sentiment-analysis |
Coding & Classification | Sentiment Analysis of twitter data | ML model | https://github.com/wietsedv/bertje/blob/master/README.md |
Coding & Classification | Sentiment Analysis of twitter data | ML model | https://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3 |
Coding & Classification | Production description to ECOICOP | ML code | https://colab.research.google.com/drive/1Epn2NeFRuFC_XyXtQ4qezGVBA5aAzqIh |
Coding & Classification | Production description to ECOICOP | ML code and data | https://github.com/statisticspoland/ecoicop_classification |
Coding & Classification | Production description to ECOICOP | ML library | https://scikit-learn.org/stable/index.html |
Coding & Classification | Not available | ML application | https://www.cbs.nl/nl-nl/over-ons/innovatie/project/innovatieve-hotspots |
Coding & Classification | WP1 - Theme 1 Coding and Classification Report | ML library | https://en.wikipedia.org/wiki/FastText |
Coding & Classification | WP1 - Theme 1 Coding and Classification Report | ML tutorial | https://machinelearningmastery.com/types-of-classification-in-machine-learning/ |
Coding & Classification | WP1 - Theme 1 Coding and Classification Report | ML tutorial | https://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/ |
Coding & Classification | WP1 - Theme 1 Coding and Classification Report | Naive Bayes | https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/ |
Coding & Classification | WP1 - Theme 1 Coding and Classification Report | Random Forest | https://builtin.com/data-science/random-forest-algorithm |
Coding & Classification | WP1 - Theme 1 Coding and Classification Report | Random Forest | https://towardsdatascience.com/understanding-random-forest-58381e0602d2 |
Coding & Classification | WP1 - Theme 1 Coding and Classification Report | Subject matter | https://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc/soc2010/soc2010volume2thestructureandcodingindex#electronic-version-of-the-index |
Coding & Classification | WP1 - Theme 1 Coding and Classification Report | XGBoost | https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/ |
Coding & Classification | Automatic coding of occupation and industry in social statistical surveys | ML application | https://www.bls.gov/iif/deep-neural-networks.pdf |
Coding & Classification | Automatic coding of occupation and industry in social statistical surveys | ML application | https://www.bls.gov/osmr/research-papers/2014/pdf/st140040.pdf |
Coding & Classification | Automatic coding of occupation and industry in social statistical surveys | ML code | https://github.com/USDepartmentofLabor/soii_neural_autocoder |
Coding & Classification | Automatic coding of occupation and industry in social statistical surveys | ML tutorial | https://github.com/ameasure/autocoding-class/blob/master/machine_learning.ipynb |
Edit & Imputation | Not available | Terminology | https://www.analyticsvidhya.com/glossary-of-common-statistics-and-machine-learning-terms/ |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Cheng J., Greiner R., Kelly J., Bell D. A., & Liu W. (2002). Learning Bayesian Networks from Data: An Information-Theory Based Approach. Artificial Intelligence, 137, 43–90. |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Di Zio M., Sacco G., Scanu M., & Vicard P. (2004). Multivariate Techniques for Imputation Based on Bayesian Networks. Compstat 2004 Symposium. |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Di Zio M., Scanu M., Coppola L., Luzi O., & Ponti A. (2004). Bayesian Networks for Imputation. Journal of the Royal Statistical Society Series A, 167(2), 309–322. |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Jensen F. V. & Nielsen T. D. (2007). Bayesian Networks and Decision Graphs. Second edition. Springer. |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Kalisch M., Bühlmann P. (2007). Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8, 613–636. |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Lauritzen S. L. (1995). The EM Algorithm for Graphical Association Models With Missing Data. Computational Statistics and Data Analysis, 19, 191–201. |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Moore A. & Wong W. (2003). Optimal Reinsertion: A New Search Operator for Accelerated and More Accurate Bayesian Network Structure Learning. In Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), 552–559. |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Rey del Castillo P. (2012). Use of Machine Learning Methods to Impute Categorical Data. Conference of European Statisticians WP. 37. |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Riggelsen C. (2006). Learning parameters of Bayesian networks from incomplete data via importance sampling. International Journal of Approximate Reasoning, 42(1-2), 69–83. |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Spirtes P., Glymour C., & Scheines R. (2000). Causation, prediction, and search. Second edition. MIT Press. |
Edit & Imputation | Machine learning for imputation | Bayesian Networks | Tsamardinos I., Brown L. E., & Aliferis C. F. (2006). The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65, 31–78. |
Edit & Imputation | Machine learning for imputation | K-nearest neighbour | Beretta L. & Santaniello A. (2016). Nearest Neighbor Imputation Algorithms: A Critical Evaluation. Medical Informatics and Decision Making, 16, 197–208. |
Edit & Imputation | Machine learning for imputation | K-nearest neighbour | Cucala L., Marin J. M., Robert C. P., & Titterington D. M. (2009). A Bayesian Reassessment of Nearest-Neighbor Classification. Journal of the American Statistical Association, 104, 263–273. |
Edit & Imputation | Machine learning for imputation | K-nearest neighbour | Devroye L., Györfi L., & Lugosi G. (1996). A Probabilistic Theory of Pattern Recognition. Springer. |
Edit & Imputation | Machine learning for imputation | K-nearest neighbour | Liao S. G., Lin Y., Kang D. D., Chandra D., Bon J., Kaminski N., Sciurba F. C., & Tseng G. C. (2014). Missing Value Imputation in High-Dimensional Phenomic Data: Imputable or not, and how? Bioinformatics, 15, 346. |
Edit & Imputation | Machine learning for imputation | K-nearest neighbour | Troyanskaya O., Cantor M., Sherlock G., Brown P. O., Hastie T., Tibshirani R., Botstein D., & Altman R. B. (2001). Missing Value Estimation Methods for DNA Microarrays. Bioinformatics, 17, 520–525. |
Edit & Imputation | Machine learning for imputation | ML application | Beck M., Dumpert F., & Feuerhake J. (2018). Proof of Concept Machine Learning – Abschlussbericht. Online available on: https://www.destatis.de/GPStatistik/receive/DEMonografie_monografie_00004835 (in German) |
Edit & Imputation | Machine learning for imputation | ML application | Bertsimas D., Pawlowski C., & Zhuo Y. D. (2017). From predictive methods to missing data imputation: an optimization approach. The Journal of Machine Learning Research, 18(1), 7133–7171. |
Edit & Imputation | Machine learning for imputation | ML application | Park S., Pannekoek J., & van der Loo M. P. J. (2018). Imputation of Economic Data based on Random Forest. Technical Report. Online available on statswiki. |
Edit & Imputation | Machine learning for imputation | ML application | Richman M. B., Trafalis T. B., & Adrianto I. (2009). Missing data imputation through machine learning algorithms. In Artificial Intelligence Methods in the Environmental Sciences (pp. 153–169). |
Edit & Imputation | Machine learning for imputation | ML application | Yang B., Janssens D., Ruan D., Bellemans T. & Wets G. (2013). A data imputation method with support vector machines for activity-based transportation models. In Computational Intelligence for Traffic and Mobility (pp. 159–171). |
Edit & Imputation | Machine learning for imputation | ML code | Crookston N. L. & Finley A. O. (2007). yaImpute: An R Package for kNN Imputation. Journal of Statistical Software, 23(10), 1–16. |
Edit & Imputation | Machine learning for imputation | ML code | Mayer M. (2019). missRanger: Fast Imputation of Missing Values. Online: https://cran.r-project.org/web/packages/missRanger/index.html |
Edit & Imputation | Machine learning for imputation | ML code | Scutari M. (2010). Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35(3), 1–22. |
Edit & Imputation | Machine learning for imputation | ML code | Steinwart I. & Thomann P. (2017). liquidSVM: A Fast and Versatile SVM package. Online: https://arxiv.org/abs/1702.06899. |
Edit & Imputation | Machine learning for imputation | ML code | van Buuren S. & Groothuis-Oudshoorn K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. |
Edit & Imputation | Machine learning for imputation | ML code | Wright M. N. & Ziegler A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1–17. |
Edit & Imputation | Machine learning for imputation | ML techniques | Hamner B., Frasco M., & LeDell E. (2018). Metrics: Evaluation Metrics for Machine Learning. Online: https://CRAN.R-project.org/package=Metrics. |
Edit & Imputation | Machine learning for imputation | ML techniques | Honghai F., Guoshun C., Cheng Y., Bingru Y., & Yumei C. (2005). A SVM regression based approach to filling in missing values. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 581–587). |
Edit & Imputation | Machine learning for imputation | ML techniques | Mikhchi A., Honarvar M., Kashan N. E. J., & Aminafshar M. (2016). Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation. Journal of Theoretical Biology, 399, 148–158. |
Edit & Imputation | Machine learning for imputation | ML techniques | Stekhoven D. J. & Buehlmann P. (2012). MissForest – non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112–118. |
Edit & Imputation | Machine learning for imputation | ML techniques | van Buuren S. (2018). Flexible Imputation of Missing Data. 2nd edition. CRC. |
Edit & Imputation | Machine learning for imputation | ML tutorial | Torgo L. (2010). Data Mining with R: Learning with Case Studies. Chapman and Hall/CRC. Online: http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR. |
Edit & Imputation | Machine learning for imputation | Not published | Dumpert F., Hansen M., Peters F., & Spies L. (2018). Bericht zur Maßnahme Machine Learning Methodik. Internal Paper, yet unpublished, in German. |
Edit & Imputation | Machine learning for imputation | R library | https://cran.r-project.org/ |
Edit & Imputation | Machine learning for imputation | Random Forest | Athey S., Tibshirani J., & Wager S. (2019). Generalized Random Forests. The Annals of Statistics, 47(2), 1148–1178. |
Edit & Imputation | Machine learning for imputation | Random Forest | Biau G. & Scornet E. (2016). A random forest guided tour. Test, 25(2), 197–227. |
Edit & Imputation | Machine learning for imputation | Random Forest | Breiman L. (2001). Random forests. Machine learning, 45(1), 5–32. |
Edit & Imputation | Machine learning for imputation | Random Forest | Burgette L. F. & Reiter J. P. (2010). Multiple imputation for missing data via sequential regression trees. American journal of epidemiology, 172(9), 1070–1076. |
Edit & Imputation | Machine learning for imputation | Random Forest | Caiola G. & Reiter J. P. (2010). Random Forests for Generating Partially Synthetic, Categorical Data. Trans. Data Privacy, 3(1), 27-42. |
Edit & Imputation | Machine learning for imputation | Random Forest | Ding Y. & Simonoff J. S. (2010). An investigation of missing data methods for classification trees applied to binary response data. Journal of Machine Learning Research, 11, 131–170. |
Edit & Imputation | Machine learning for imputation | Random Forest | Doove L. L., Van Buuren S., & Dusseldorp E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92–104. |
Edit & Imputation | Machine learning for imputation | Random Forest | Feelders, A. (1999). Handling missing data in trees: surrogate splits or statistical imputation? In European Conference on Principles of Data Mining and Knowledge Discovery (pp. 329–334). |
Edit & Imputation | Machine learning for imputation | Random Forest | Mentch L. & Hooker G. (2016). Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. Journal of Machine Learning Research, 17(1), 841–881. |
Edit & Imputation | Machine learning for imputation | Random Forest | Reiter J. P. (2005). Using CART to generate partially synthetic public use microdata. Journal of Official Statistics, 21(3), 441–462. |
Edit & Imputation | Machine learning for imputation | Random Forest | Saar-Tsechansky M. & Provost F. (2007). Handling missing values when applying classification models. Journal of Machine Learning Research, 8, 1623–1657. |
Edit & Imputation | Machine learning for imputation | Random Forest | Wager S., Hastie T., & Efron B. (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. Journal of Machine Learning Research, 15(1), 1625–1651. |
Edit & Imputation | Machine learning for imputation | Statistics | Bankier M., Lachance M., & Poirier P. (2000). 2001 Canadian census minimum change donor imputation methodology. UNECE Work Session on Statistical Data Editing 2000, Working Paper No. 17. Online: http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/2000/10/sde/17.e.pdf |
Edit & Imputation | Machine learning for imputation | Statistics | Breiman L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical science, 16(3), 199–231. |
Edit & Imputation | Machine learning for imputation | Statistics | Chambers R. (2001). Evaluation Criteria for Statistical Editing and Imputation. Online available: https://www.cs.york.ac.uk/euredit/ |
Edit & Imputation | Machine learning for imputation | Statistics | Little R. J. & Rubin D. B. (1987; 2002). Statistical analysis with missing data. Wiley. |
Edit & Imputation | Machine learning for imputation | Statistics | Little R. J. (2011). Imputation. In: Lovric M., International Encyclopedia of Statistical Science. Springer. |
Edit & Imputation | Machine learning for imputation | Statistics | Rubin D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley. |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Boser B. E., Guyon I. M., & Vapnik V. N. (1992). A training algorithm for optimal margin classifiers. Fifth Annual ACM Workshop on Computational Learning Theory, 144–152. |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Chechik G., Heitz G., Elidan G., Abbeel P., & Koller D. (2007). Max-margin classification of incomplete data. In Advances in Neural Information Processing Systems (pp. 233–240). |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Cortes C. & Vapnik V. N. (1995). Support-vector networks. Machine Learning, 20, 273–297. |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Drechsler J. & Reiter J. P. (2011). An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Computational Statistics & Data Analysis, 55(12), 3232–3243. |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Drechsler J. (2010). Using support vector machines for generating synthetic datasets. In International Conference on Privacy in Statistical Databases (pp. 148–161). |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Hable R. (2012). Asymptotic normality of support vector machine variants and other regularized kernel methods. Journal of Multivariate Analysis, 106, 92–117. |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Honghai F., Guoshun C., Cheng Y., Bingru Y., & Yumei C. (2005). A SVM regression based approach to filling in missing values. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 581–587). |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Pelckmans K., De Brabanter J., Suykens J. A., & De Moor B. (2005). Handling missing values in support vector machine classifiers. Neural Networks, 18(5-6), 684–692. |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Rogers S. D. (2012). Support Vector Machines for Classification and Imputation. Master thesis. Brigham Young University. |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Smola A. J., Vishwanathan S. V. N., & Hofmann T. (2005). Kernel Methods for Missing Variables. In AISTATS 2005 – Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (pp. 325–332). |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Steinwart I. & Christmann A. (2008). Support Vector Machines. Springer. |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Stewart T. G., Zeng D., & Wu M. C. (2018). Constructing support vector machines with missing data. Wiley Interdisciplinary Reviews: Computational Statistics, 10, 1–16. |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Wen Z., Shi J., Li Q., He B., & Chen J. (2018). ThunderSVM: A fast SVM library on GPUs and CPUs. Journal of Machine Learning Research, 19(21), 1–5. |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Yang B., Janssens D., Ruan D., Bellemans T., & Wets G. (2013). A data imputation method with support vector machines for activity-based transportation models. In Computational Intelligence for Traffic and Mobility (pp. 159-171). |
Edit & Imputation | Machine learning for imputation | Support Vector Machine | Zhang Y. & Liu Y. (2009). Data imputation using least squares support vector machines in urban arterial streets. IEEE Signal Processing Letters, 16(5), 414–417. |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints | ML application | Martin Beck, Florian Dumpert, Joerg Feuerhake (2018). Machine Learning in Official Statistics (Shorter English version available on arXiv: https://arxiv.org/abs/1812.10422) |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints | Standards | GSBPM (2019). Generic Statistical Business Process Model. Version 5.1, January 2019, UNECE. Available at: https://statswiki.unece.org/display/GSBPM/Generic+Statistical+Business+Process+Model. |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints | Standards | GSDEM (2019). Generic Statistical Data Editing Models - GSDEMs, Version 2.0, April 2019, UNECE. Available at: https://statswiki.unece.org/display/sde/GSDEM |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints | Standards | GSIM (2019). Generic Statistical Information Model, Version 1.2, May 2019, UNECE. Available at: http://www1.unece.org/stat/platform/display/gsim. |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints | Statistics | EDIMBUS (2007). Recommended Practices for Editing and Imputation in Cross-sectional Business Surveys, EDIMBUS project report, https://ec.europa.eu/eurostat/documents/64157/4374310/30-Recommended+Practices-for-editing-and-imputation-in-cross-sectional-business-surveys-2008.pdf. |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints | Statistics | MEMOBUST (2014). Handbook on Methodology of Modern Business Statistics, CROS-portal, Eurostat, https://ec.europa.eu/eurostat/cros/content/handbook-methodology-modern-business-statistics_en. |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints | Statistics | Van der Loo M. (2015) A Formal Typology of Data Validation Functions, UNECE, Conference of European Statisticians, Budapest. Available at: http://www.markvanderloo.eu/files/statistics/WP_5_Netherlands_A_formal_typology_of_data_validation_functions.pdf |
Edit & Imputation | Machine Learning for Data Editing Cleaning in NSI : Some ideas and hints | Statistics | de Waal, T., Pannekoek, J. and Scholtus, S. (2011). Handbook of Statistical Data Editing and Imputation. Wiley, Hoboken. |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | ML application | [1] Di Zio M., Di Cecco D., Di Laurea D., Filippini R., Massoli P., Rocchetti G. “Mass imputation of the attained level of education in the Italian System of Registers”, Workshop on Statistical Data Editing, Neuchâtel, Switzerland, 18-20 September 2018 |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | ML application | [2] Di Zio M., Filippini R., Rocchetti G. “An imputation procedure for the Italian attained level of education in the register of individuals based on administrative and survey data”, Workshop on Statistical Data Editing, Neuchâtel, Switzerland, 31 August - 2 September 2020 |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | ML application | [3] Bernasconi, Eleonora, et al. "Satellite-Net: Automatic Extraction of Land Cover Indicators from Satellite Imagery by Deep Learning." arXiv preprint arXiv:1907.09423 (2019). |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | ML application | [4] De Fausti Fabrizio, Pugliese Francesco and Diego Zardetto. "Toward Automated Website Classification by Deep Learning." arXiv preprint arXiv:1910.09991 (2019). |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | ML code | https://github.com/defausti/MLP_Imputation.git |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | ML techniques | [6] Yoon, Jinsung, James Jordon, and Mihaela Van Der Schaar. "Gain: Missing data imputation using generative adversarial nets." arXiv preprint arXiv:1806.02920 (2018). |
Edit & Imputation | Imputation of the variable “Attained Level of Education” in Base Register of Individuals | Statistics | [5] Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of control, signals and systems 2.4 (1989): 303-314. |
Edit & Imputation | Not available | ML code | Stekhoven, D. J. (2015). missForest: Nonparametric missing value imputation using random forest. Astrophysics Source Code Library |
Edit & Imputation | Not available | Statistics | Gray, D. (2019). A Generalized Framework to Evaluate Imputation Strategies: Recent Developments. In JSM Proceedings, Government Statistics Section. Alexandria, VA: American Statistical Association. 1861-1870 |
Edit & Imputation | Not available | Statistics | Gray, D. (2020). Evaluating Imputation Methods using ImpACT: First Case Study, United Nations Statistical Commission and Economic Commission for Europe – Workshop on Statistical Data Editing |
Edit & Imputation | Not available | Statistics | Stelmack, A. (2018). On the Development of a Generalized Framework to Evaluate and Improve Imputation Strategies at Statistics Canada, United Nations Statistical Commission and Economic Commission for Europe – Workshop on Statistical Data Editing. |
Edit & Imputation | WP1 - Theme 2 Edit and Imputation Report | Data Science | Cao L. (2017). Data science: a comprehensive overview. ACM Computing Surveys, 50(3), 1–42. |
Edit & Imputation | WP1 - Theme 2 Edit and Imputation Report | Statistics | Chambers R. (2001). Evaluation Criteria for Statistical Editing and Imputation. |
Edit & Imputation | Early estimates of energy balance statistics using machine learning | Big Data | Daas, P.J.H., Puts, M.J., Buelens, B. and van den Hurk, P. (2015). Big data as a source for official statistics. Journal of Official Statistics, 31, 249–262. |
Edit & Imputation | Early estimates of energy balance statistics using machine learning | Big Data | Hassani, H., Saporta, G. and Silva, E.S. (2014). Data mining and official statistics: the past, the present and the future. Big Data, 1, 34–43. |
Edit & Imputation | Early estimates of energy balance statistics using machine learning | ML code | https://github.com/VITObelgium/energy-balance-ml |
Edit & Imputation | Early estimates of energy balance statistics using machine learning | ML tutorial | Hastie, T., Tibshirani, R., Friedman, J. & Franklin, J. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. New York: Springer. |
Edit & Imputation | Early estimates of energy balance statistics using machine learning | Random Forest | Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. |
Edit & Imputation | Early estimates of energy balance statistics using machine learning | Statistics | Claeskens, G. & Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge: Cambridge University Press. |
Edit & Imputation | Early estimates of energy balance statistics using machine learning | Statistics | Gelman, A. & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models, Vol. 1 New York: Cambridge University Press. |
Imagery | Use of Landsat satellite data for the mapping of urban areas in non-census years | Data | https://ieeexplore.ieee.org/document/8518312 |
Imagery | Use of Landsat satellite data for the mapping of urban areas in non-census years | Data | https://www.opendatacube.org/ |
Imagery | Learning statistical information from images: a proof of concept | Data | https://www.cbs.nl/nl-nl/dossier/nederland-regionaal/geografische-data/kaart-van-100-meter-bij-100-meter-met-statistieken |
Imagery | Learning statistical information from images: a proof of concept | Data | Persian cat, Model T, Granny Smith; http://image-net.org/challenges/LSVRC/2015/browse-synsets |
Imagery | Arealstatistik Deep Learning (ADELE) | ML application | https://www.bfs.admin.ch/bfs/de/home/statistiken/raum-umwelt/erhebungen/area.assetdetail.5687737.html |
Imagery | WP1 - Theme 3 Imagery Analysis Report | Big Data | Curzi, G., Modenini, D., & Tortora, P. (2020). Large Constellations of Small Satellites: A Survey of Near Future Challenges and Missions. Aerospace, 7, 133. doi:10.3390/aerospace7090133 |
Imagery | WP1 - Theme 3 Imagery Analysis Report | Big Data | Safyan, M. (2020). Handbook of Small Satellites, Technology, Design, Manufacture, Applications, Economics and Regulation. 1057-1073. doi:10.1007/978-3-030-36308-6_64 |
Imagery | WP1 - Theme 3 Imagery Analysis Report | Data | http://aws.amazon.com/es/public-data-sets/landsat/ |
Imagery | WP1 - Theme 3 Imagery Analysis Report | Data | http://landsat.gsfc.nasa.gov/?p=10221 |
Imagery | WP1 - Theme 3 Imagery Analysis Report | Data | https://eur-lex.europa.eu/eli/reg_del/2013/1159/oj |
Imagery | WP1 - Theme 3 Imagery Analysis Report | Data | Toth, C., & Jóźków, G. (2016). Remote sensing platforms and sensors: A survey. ISPRS Journal of Photogrammetry and Remote Sensing, 22-36. |
Imagery | WP1 - Theme 3 Imagery Analysis Report | ML application | Ferreira, B., Iten, M., & Silva, R. G. (2020). Monitoring sustainable development by means of earth observation data and machine learning: a review. Environmental Sciences Europe, 32, 120. doi:10.1186/s12302-020-00397-4 |
Imagery | WP1 - Theme 3 Imagery Analysis Report | ML application | Holloway, J., & Mengersen, K. (2018). Statistical Machine Learning Methods and Remote Sensing for Sustainable Development Goals: A Review. Remote Sensing, 10, 1365. doi:10.3390/rs10091365 |
Imagery | WP1 - Theme 3 Imagery Analysis Report | ML application | Youssef, R., Aniss, M., & Jamal, C. (2020). Machine Learning and Deep Learning in Remote Sensing and Urban Application: A Systematic Review and Meta-Analysis. Proceedings of the 4th Edition of International Conference on Geo-IT and Water Resources 2020, Geo-IT and Water Resources 2020. New York, NY, USA: Association for Computing Machinery. doi:10.1145/3399205.3399224 |
Imagery | WP1 - Theme 3 Imagery Analysis Report | ML techniques | Bishop, C. M. (2006). Pattern Recognition and Machine Learning. USA: Springer. |
Imagery | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | Big Data | [1] Conference of European Statisticians (2019) In-depth Review on Satellite Imagery and Earth Observation Technology in Official Statistics |
Imagery | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | Big Data | [1] United Nations Global Working Group on Big Data (2017) Satellite Imagery and Geospatial Data Task Team Report |
Imagery | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | Big Data | Committee on Earth Observation Satellites (2015) Satellite Earth Observations in Support of Climate Information Challenges |
Imagery | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | Data | Lewis, A. et al. (2017) Remote Sensing of Environment |
Imagery | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | Data | UCS Satellite Database (accessed Feb. 2020) |
Imagery | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | Data | Roberts, D., Dunn, B. and Mueller, N. (2018) Open Data Cube Products Using High-Dimensional Statistics of Time Series |
Imagery | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | Standards | United Nations Economic Commission for Europe (2019) Generic Statistical Business Process Model (version 5.1) |
Imagery | Generic Pipeline for Production of Official Statistics Using Satellite Data and Machine Learning | Statistics | United Nations Statistics Division (2019) Guidelines on the use of electronic data collection technologies in population and housing censuses |
Quality | | Framework | Australian Bureau of Statistics (2005). Data Quality Framework, Australian Bureau of Statistics, https://www.abs.gov.au/websitedbs/D3310114.nsf//home/Quality:+The+ABS+Data+Quality+Framework |
Quality | | Framework | Eurostat (2017). European Statistics Code of Practice, Eurostat, https://ec.europa.eu/eurostat/web/quality/european-statistics-code-of-practice |
Quality | | Framework | Statistics Canada (2017). Quality Assurance Framework, Statistics Canada, https://www150.statcan.gc.ca/n1/pub/12-539-x/12-539-x2019001-eng.htm |
Quality | | Framework | United Nations (2019). National Quality Assurance Frameworks Manual for Official Statistics, United Nations, https://unstats.un.org/unsd/methodology/dataquality/ |
Quality | | Framework | United Nations (2012). Guidelines for the template for a generic national quality assurance, United Nations, https://unstats.un.org/unsd/statcom/doc12/BG-NQAF.pdf |
Quality | | ML application | Luque, A., Carrasco, A., Martín, A. and de las Heras, A. (2019). The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognition, 91, 216–231. |
Quality | | ML application | Pepe, M.S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press. |
Quality | | ML application | Vanwinckelen, G. and Blockeel, H. (2014). Look before you leap: Some insights into learner evaluation with cross-validation. JMLR Workshop and Conference Proceedings, 1, 3–19. |
Quality | | ML techniques | Goldstein, A., Kapelner, A., Bleich, J. and Pitkin, E. (2014). Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. arXiv |
Quality | | ML techniques | Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning. 2nd edition. Springer. |
Quality | | ML techniques | Japkowicz, N. and Shah, M. (2011). Evaluating Learning Algorithms. Cambridge University Press. |
Quality | | ML techniques | Stothard, C. (2020). Evaluating Machine Learning Classifiers: A review. Australian Bureau of Statistics, available upon request. |
Quality | | Practices | Arrieta, A.B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R. and Herrera, F. (2020). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. |
Quality | | Practices | Begley, C. and Ioannidis, J. (2015). Reproducibility in science: Improving the standard for basic and preclinical research. Circ. Res., p 116-126. |
Quality | | Practices | Bhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., Jia, Y., Ghosh, J., Puri, R., Moura, J.M.F. and Eckersley, P. (2020). Explainable machine learning in deployment. arXiv |
Quality | | Practices | Goodman, S., Fanelli, D. and Ioannidis, J. (2016). What does research reproducibility mean? Science Translational Medicine, p 341-353 |
Quality | | Practices | Hanson, B., Sugden, A. and Alberts, B. (2011). Making data maximally available. Science, p 331-649. |
Quality | | Practices | Molnar (2019) Interpretable Machine Learning - A Guide for Making Black Box Models Explainable |
Quality | | Practices | Petkovic (2020) AI and trust: explainability, transparency. Ethical implications of AI and AI Tools Lab, Frankfurt Big Data Lab, Goethe University |
Quality | | Practices | Ribeiro, M.T., Singh, S. and Guestrin, C. (2016) “Why Should I Trust You?” Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144 |
Quality | | Practices | Stodden, V., Seiler, J. and Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proc Natl Acad Sci USA, p 2584–2589. |
Quality | | Practices | Szabo, L. (2019) Artificial intelligence is rushing into patient care—and could raise risks. Scientific American, December 2019 |
Quality | | Practices | Vilone, G. and Longo, L. (2020) Explainable artificial intelligence: a systematic review. arXiv |
Quality | | Statistics | Bengio, Y. and Grandvalet, Y. (2004). No Unbiased Estimator of the Variance of K-Fold Cross-Validation. Journal of Machine Learning Research, 5, 1089–1105. |
Quality | | Statistics | Bickel, P. J. and Freedman, D. A. (1981). Some Asymptotic Theory for the Bootstrap. The Annals of Statistics, 9(6), 1196–1217. |
Quality | | Statistics | Biemer, P.P. (2010). Total Survey Error – Design, Implementation, and Evaluation. Public Opinion Quarterly, 74(5), 817–848. |
Quality | | Statistics | Borra, S. and Di Ciaccio, A. (2010). Measuring the prediction error. A comparison of cross-validation, bootstrap and covariance penalty methods. Computational Statistics and Data Analysis, 54, 2976–2989. |
Quality | | Statistics | DiCiccio, T. and Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, p 189-212 |
Quality | | Statistics | Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), 1–26. |
Quality | | Statistics | Eurostat (2014). Handbook on Methodology of Modern Business Statistics, CROS-portal, MEMOBUST, https://ec.europa.eu/eurostat/cros/content/handbook-methodology-modern-business-statistics_en |
Quality | | Statistics | Groves, R.M. and Lyberg, L. (2010). Total Survey Error – Past, Present, and Future. Public Opinion Quarterly, 74(5), 849–879. |
Quality | | Statistics | Hand, D.J. (2012). Assessing the performance of classification methods. International Statistical Review, 80(3), 400–414. |
Quality | | Statistics | Kim, J.-H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis, 53, 3735–3745. |
Quality | | Statistics | Platek, R. and Särndal, C.-E. (2001). Can a Statistician Deliver? Journal of Official Statistics, 17(1), 1–20. |
Quality | | Statistics | Quenouille, M.H. (1956). Notes on Bias in Estimation. Biometrika, 43, 353–60. |
Quality | | Statistics | Stone, M. (1974). Cross-validatory Choice and Assessment of Statistical Predictions. Journal of the Royal Statistical Society, Series B, 36, 111–147. |
Quality | | Statistics | Wolter, K. M. (2007). Introduction to Variance Estimation. 2nd edition. Springer. |
Other | Not available | ML application | Christen, P. (2007). “A Two-Step Classification Approach to Unsupervised Record Linkage”, in Proceedings of the 6th Australian Conference on Data Mining and Analytics, 70, 111-119. |
Other | Not available | ML library | De Bruin, J. (2019). “Python Record Linkage Toolkit: A toolkit for record linkage and duplicate detection in Python”. Zenodo. https://doi.org/10.5281/zenodo.3559043 |
Other | Not available | Statistics | Fellegi, I.P. and Sunter, A.B. (1969). “A theory of record linkage”, Journal of the American Statistical Association, 64, 1183–1210. |