• This page contains a list of references provided in the ML project reports or taken from other sources. The references are categorized by topic. To report any inaccuracy in the assigned category, please contact UNECE.
  • You can search by Topic, Theme or Title (of the report).


ThemeTitleTopicReference
OtherNot availableML techniquesSchnaubelt, Matthias (2019): A comparison of machine learning model validation schemes for non-stationary time series data, FAU Discussion Papers in Economics, No. 11/2019, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institute for Economics, Nürnberg. http://hdl.handle.net/10419/209136
Coding & ClassificationIndustry and Occupation CodingML applicationJustin J. Evans, Isaac Ross, Julie Portelance. StatisticsCanada_CCHS_ML_Production_Report. [Online] 2020. https://statswiki.unece.org/display/MLP/Working+documents?preview=/244092601/256970399/Statistics_Canada_FastText_Techniques_Report.pdf
Coding & ClassificationIndustry and Occupation CodingML code and datahttps://github.com/UNECE/CodingandClassification_Statcan
Coding & ClassificationIndustry and Occupation CodingML techniquesYanPeng Gao, Isaac Ross, Justin J. Evans. Statistics_Canada_FastText_Techniques_Report. [Online] 2019. https://statswiki.unece.org/download/attachments/244092601/Statistics_Canada_FastText_Techniques_Report.pdf?version=2&modificationDate=1567626783886&api=v2
Coding & ClassificationSentiment Analysis of twitter dataML codehttps://github.com/jmaslankowski/WP7-Population-Life-Satisfaction
Coding & ClassificationSentiment Analysis of twitter dataML codehttps://github.com/mireusen/hlmos-statistiek-vlaanderen-twitter
Coding & ClassificationSentiment Analysis of twitter dataML codehttps://github.com/wimulkeman/dutch-sentiment-analysis
Coding & ClassificationSentiment Analysis of twitter dataML modelhttps://github.com/wietsedv/bertje/blob/master/README.md
Coding & ClassificationSentiment Analysis of twitter dataML modelhttps://tfhub.dev/google/universal-sentence-encoder-multilingual-large/3
Coding & ClassificationProduction description to ECOICOPML codehttps://colab.research.google.com/drive/1Epn2NeFRuFC_XyXtQ4qezGVBA5aAzqIh
Coding & ClassificationProduction description to ECOICOPML code and datahttps://github.com/statisticspoland/ecoicop_classification
Coding & ClassificationProduction description to ECOICOPML libraryhttps://scikit-learn.org/stable/index.html
Coding & ClassificationNot availableML applicationhttps://www.cbs.nl/nl-nl/over-ons/innovatie/project/innovatieve-hotspots
Coding & ClassificationWP1 - Theme 1 Coding and Classification ReportML libraryhttps://en.wikipedia.org/wiki/FastText
Coding & ClassificationWP1 - Theme 1 Coding and Classification ReportML tutorialhttps://machinelearningmastery.com/types-of-classification-in-machine-learning/
Coding & ClassificationWP1 - Theme 1 Coding and Classification ReportML tutorialhttps://www.analyticsvidhya.com/blog/2017/09/common-machine-learning-algorithms/
Coding & ClassificationWP1 - Theme 1 Coding and Classification ReportNaive Bayeshttps://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
Coding & ClassificationWP1 - Theme 1 Coding and Classification ReportRandom Foresthttps://builtin.com/data-science/random-forest-algorithm
Coding & ClassificationWP1 - Theme 1 Coding and Classification ReportRandom Foresthttps://towardsdatascience.com/understanding-random-forest-58381e0602d2
Coding & ClassificationWP1 - Theme 1 Coding and Classification ReportSubject matterhttps://www.ons.gov.uk/methodology/classificationsandstandards/standardoccupationalclassificationsoc/soc2010/soc2010volume2thestructureandcodingindex#electronic-version-of-the-index
Coding & ClassificationWP1 - Theme 1 Coding and Classification ReportXGBoosthttps://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/
Coding & ClassificationAutomatic coding of occupation and industry in social statistical surveysML applicationhttps://www.bls.gov/iif/deep-neural-networks.pdf
Coding & ClassificationAutomatic coding of occupation and industry in social statistical surveysML applicationhttps://www.bls.gov/osmr/research-papers/2014/pdf/st140040.pdf
Coding & ClassificationAutomatic coding of occupation and industry in social statistical surveysML codehttps://github.com/USDepartmentofLabor/soii_neural_autocoder
Coding & ClassificationAutomatic coding of occupation and industry in social statistical surveysML tutorialhttps://github.com/ameasure/autocoding-class/blob/master/machine_learning.ipynb
Edit & ImputationNot availableTerminologyhttps://www.analyticsvidhya.com/glossary-of-common-statistics-and-machine-learning-terms/
Edit & ImputationMachine learning for imputationBayesian NetworksCheng J., Greiner R., Kelly J., Bell D. A., & Liu W. (2002). Learning Bayesian Networks from Data: An Information-Theory Based Approach. Artificial Intelligence, 137, 43–90.
Edit & ImputationMachine learning for imputationBayesian NetworksDi Zio M., Sacco G., Scanu M., & Vicard P. (2004). Multivariate Techniques for Imputation Based on Bayesian Networks. Compstat 2004 Symposium.
Edit & ImputationMachine learning for imputationBayesian NetworksDi Zio M., Scanu M., Coppola L., Luzi O., & Ponti A. (2004). Bayesian Networks for Imputation. Journal of the Royal Statistical Society Series A, 167(2), 309–322.
Edit & ImputationMachine learning for imputationBayesian NetworksJensen F. V. & Nielsen T. D. (2007). Bayesian Networks and Decision Graphs. Second edition. Springer.
Edit & ImputationMachine learning for imputationBayesian NetworksKalisch M., Bühlmann P. (2007). Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. Journal of Machine Learning Research, 8, 613–636.
Edit & ImputationMachine learning for imputationBayesian NetworksLauritzen S. L. (1995). The EM Algorithm for Graphical Association Models With Missing Data. Computational Statistics and Data Analysis, 19, 191–201.
Edit & ImputationMachine learning for imputationBayesian NetworksMoore A. & Wong W. (2003). Optimal Reinsertion: A New Search Operator for Accelerated and More Accurate Bayesian Network Structure Learning. In Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003), 552–559.
Edit & ImputationMachine learning for imputationBayesian NetworksRey del Castillo P. (2012). Use of Machine Learning Methods to Impute Categorical Data. Conference of European Statisticians WP. 37.
Edit & ImputationMachine learning for imputationBayesian NetworksRiggelsen C. (2006). Learning parameters of Bayesian networks from incomplete data via importance sampling. International Journal of Approximate Reasoning, 42(1-2), 69–83.
Edit & ImputationMachine learning for imputationBayesian NetworksSpirtes P., Glymour C., & Scheines R. (2000). Causation, prediction, and search. Second edition. MIT Press.
Edit & ImputationMachine learning for imputationBayesian NetworksTsamardinos I., Brown L. E., & Aliferis C. F. (2006). The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Machine Learning, 65, 31–78.
Edit & ImputationMachine learning for imputationK-nearest neighbourBeretta L. & Santaniello A. (2016). Nearest Neighbor Imputation Algorithms: A Critical Evaluation. Medical Informatics and Decision Making, 16, 197–208.
Edit & ImputationMachine learning for imputationK-nearest neighbourCucala L., Marin J. M., Robert C. P., & Titterington D. M. (2009). A Bayesian Reassessment of Nearest-Neighbor Classification. Journal of the American Statistical Association, 104, 263–273.
Edit & ImputationMachine learning for imputationK-nearest neighbourDevroye L., Györfi L., & Lugosi G. (1996). A Probabilistic Theory of Pattern Recognition. Springer.
Edit & ImputationMachine learning for imputationK-nearest neighbourLiao S. G., Lin Y., Kang D. D., Chandra D., Bon J., Kaminski N., Sciurba F. C., & Tseng G. C. (2014). Missing Value Imputation in High-Dimensional Phenomic Data: Imputable or not, and how? Bioinformatics, 15, 346.
Edit & ImputationMachine learning for imputationK-nearest neighbourTroyanskaya O., Cantor M., Sherlock G., Brown P. O., Hastie T., Tibshirani R., Botstein D., & Altman R. B. (2001). Missing Value Estimation Methods for DNA Microarrays. Bioinformatics, 17, 520–525.
Edit & ImputationMachine learning for imputationML applicationBeck M., Dumpert F., & Feuerhake J. (2018). Proof of Concept Machine Learning – Abschlussbericht. Online available on: https://www.destatis.de/GPStatistik/receive/DEMonografie_monografie_00004835 (in German)
Edit & ImputationMachine learning for imputationML applicationBertsimas D., Pawlowski C., & Zhuo Y. D. (2017). From predictive methods to missing data imputation: an optimization approach. The Journal of Machine Learning Research, 18(1), 7133–7171.
Edit & ImputationMachine learning for imputationML applicationPark S., Pannekoek J., & van der Loo M. P. J. (2018). Imputation of Economic Data based on Random Forest. Technical Report. Online available on statswiki.
Edit & ImputationMachine learning for imputationML applicationRichman M. B., Trafalis T. B., & Adrianto I. (2009). Missing data imputation through machine learning algorithms. In Artificial Intelligence Methods in the Environmental Sciences (pp. 153–169).
Edit & ImputationMachine learning for imputationML applicationYang B., Janssens D., Ruan D., Bellemans T. & Wets G. (2013). A data imputation method with support vector machines for activity-based transportation models. In Computational Intelligence for Traffic and Mobility (pp. 159–171).
Edit & ImputationMachine learning for imputationML codeCrookston N. L. & Finley A. O. (2007). yaImpute: An R Package for kNN Imputation. Journal of Statistical Software, 23(10), 1–16.
Edit & ImputationMachine learning for imputationML codeMayer M. (2019). missRanger: Fast Imputation of Missing Values. Online: https://cran.r-project.org/web/packages/missRanger/index.html
Edit & ImputationMachine learning for imputationML codeScutari M. (2010). Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, 35(3), 1–22.
Edit & ImputationMachine learning for imputationML codeSteinwart I. & Thomann P. (2017). liquidSVM: A Fast and Versatile SVM package. Online: https://arxiv.org/abs/1702.06899.
Edit & ImputationMachine learning for imputationML codevan Buuren S. & Groothuis-Oudshoorn K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67.
Edit & ImputationMachine learning for imputationML codeWright M. N. & Ziegler A. (2017). ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R. Journal of Statistical Software, 77(1), 1–17.
Edit & ImputationMachine learning for imputationML techniquesHamner B., Frasco M., & LeDell E. (2018). Metrics: Evaluation Metrics for Machine Learning. Online: https://CRAN.R-project.org/package=Metrics
Edit & ImputationMachine learning for imputationML techniquesHonghai F., Guoshun C., Cheng Y., Bingru Y., & Yumei C. (2005). A SVM regression based approach to filling in missing values. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 581–587).
Edit & ImputationMachine learning for imputationML techniquesMikhchi A., Honarvar M., Kashan N. E. J., & Aminafshar, M. (2016). Assessing and comparison of different machine learning methods in parent-offspring trios for genotype imputation. Journal of theoretical biology, 399, 148–158.
Edit & ImputationMachine learning for imputationML techniquesStekhoven D. J. & Buehlmann P. (2012). MissForest – non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1), 112–118.
Edit & ImputationMachine learning for imputationML techniquesvan Buuren S. (2018). Flexible Imputation of Missing Data. 2nd edition. CRC.
Edit & ImputationMachine learning for imputationML tutorialTorgo L. (2010). Data Mining with R, learning with case studies Chapman and Hall/CRC. Online: http://www.dcc.fc.up.pt/~ltorgo/DataMiningWithR.
Edit & ImputationMachine learning for imputationNot publishedDumpert F., Hansen M., Peters F., & Spies L. (2018). Bericht zur Maßnahme Machine Learning Methodik. Internal Paper, yet unpublished, in German.
Edit & ImputationMachine learning for imputationR libraryhttps://cran.r-project.org/
Edit & ImputationMachine learning for imputationRandom ForestAthey S., Tibshirani J., & Wager S. (2019). Generalized Random Forests. The Annals of Statistics, 47(2), 1148–1178.
Edit & ImputationMachine learning for imputationRandom ForestBiau G. & Scornet E. (2016). A random forest guided tour. Test, 25(2), 197–227.
Edit & ImputationMachine learning for imputationRandom ForestBreiman L. (2001). Random forests. Machine learning, 45(1), 5–32.
Edit & ImputationMachine learning for imputationRandom ForestBurgette L. F. & Reiter J. P. (2010). Multiple imputation for missing data via sequential regression trees. American journal of epidemiology, 172(9), 1070–1076.
Edit & ImputationMachine learning for imputationRandom ForestCaiola G. & Reiter J. P. (2010). Random Forests for Generating Partially Synthetic, Categorical Data. Trans. Data Privacy, 3(1), 27-42.
Edit & ImputationMachine learning for imputationRandom ForestDing Y. & Simonoff J. S. (2010). An investigation of missing data methods for classification trees applied to binary response data. Journal of Machine Learning Research, 11, 131–170.
Edit & ImputationMachine learning for imputationRandom ForestDoove L. L., Van Buuren S., & Dusseldorp E. (2014). Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics & Data Analysis, 72, 92–104.
Edit & ImputationMachine learning for imputationRandom ForestFeelders, A. (1999). Handling missing data in trees: surrogate splits or statistical imputation? In European Conference on Principles of Data Mining and Knowledge Discovery (pp. 329–334).
Edit & ImputationMachine learning for imputationRandom ForestMentch L. & Hooker G. (2016). Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. Journal of Machine Learning Research, 17(1), 841–881.
Edit & ImputationMachine learning for imputationRandom ForestReiter J. P. (2005). Using CART to generate partially synthetic public use microdata. Journal of Official Statistics, 21(3), 441–462.
Edit & ImputationMachine learning for imputationRandom ForestSaar-Tsechansky M. & Provost F. (2007). Handling missing values when applying classification models. Journal of Machine Learning Research, 8, 1623–1657.
Edit & ImputationMachine learning for imputationRandom ForestWager S., Hastie T., & Efron B. (2014). Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. Journal of Machine Learning Research, 15(1), 1625–1651.
Edit & ImputationMachine learning for imputationStatisticsBankier M., Lachance M., & Poirier P. (2000). 2001 Canadian census minimum change donor imputation methodology. UNECE Work Session on Statistical Data Editing 2000, Working Paper No. 17. Online: http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/2000/10/sde/17.e.pdf
Edit & ImputationMachine learning for imputationStatisticsBreiman L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical science, 16(3), 199–231.
Edit & ImputationMachine learning for imputationStatisticsChambers R. (2001). Evaluation Criteria for Statistical Editing and Imputation. Online available: https://www.cs.york.ac.uk/euredit/
Edit & ImputationMachine learning for imputationStatisticsLittle R. J. & Rubin D. B. (1987; 2002). Statistical analysis with missing data. Wiley.
Edit & ImputationMachine learning for imputationStatisticsLittle R. J. (2011). Imputation. In: Lovric M., International Encyclopedia of Statistical Science. Springer.
Edit & ImputationMachine learning for imputationStatisticsRubin D. B. (1987). Multiple imputation for nonresponse in surveys. Wiley.
Edit & ImputationMachine learning for imputationSupport Vector MachineBoser B. E., Guyon I. M., & Vapnik V. N. (1992). A training algorithm for optimal margin classifiers. Fifth Annual ACM Workshop on Computational Learning Theory, 144–152.
Edit & ImputationMachine learning for imputationSupport Vector MachineChechik G., Heitz G., Elidan G., Abbeel P., & Koller D. (2007). Max-margin classification of incomplete data. In Advances in Neural Information Processing Systems (pp. 233–240).
Edit & ImputationMachine learning for imputationSupport Vector MachineCortes C. & Vapnik V. N. (1995). Support-vector networks. Machine Learning, 20, 273–297.
Edit & ImputationMachine learning for imputationSupport Vector MachineDrechsler J. & Reiter J. P. (2011). An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Computational Statistics & Data Analysis, 55(12), 3232–3243.
Edit & ImputationMachine learning for imputationSupport Vector MachineDrechsler J. (2010). Using support vector machines for generating synthetic datasets. In International Conference on Privacy in Statistical Databases (pp. 148–161). 
Edit & ImputationMachine learning for imputationSupport Vector MachineHable R. (2012). Asymptotic normality of support vector machine variants and other regularized kernel methods. Journal of Multivariate Analysis, 106, 92–117.
Edit & ImputationMachine learning for imputationSupport Vector MachineHonghai F., Guoshun C., Cheng Y., Bingru Y., & Yumei C. (2005). A SVM regression based approach to filling in missing values. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 581–587).
Edit & ImputationMachine learning for imputationSupport Vector MachinePelckmans K., De Brabanter J., Suykens J. A., & De Moor B. (2005). Handling missing values in support vector machine classifiers. Neural Networks, 18(5-6), 684–692.
Edit & ImputationMachine learning for imputationSupport Vector MachineRogers S. D. (2012). Support Vector Machines for Classification and Imputation. Master thesis. Brigham Young University.
Edit & ImputationMachine learning for imputationSupport Vector MachineSmola A. J., Vishwanathan S. V. N., & Hofmann T. (2005). Kernel Methods for Missing Variables. In AISTATS 2005 – Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (pp. 325–332).
Edit & ImputationMachine learning for imputationSupport Vector MachineSteinwart I. & Christmann A. (2008). Support Vector Machines. Springer.
Edit & ImputationMachine learning for imputationSupport Vector MachineStewart T. G., Zeng D., & Wu M. C. (2018). Constructing support vector machines with missing data. Wiley Interdisciplinary Reviews: Computational Statistics, 10, 1–16.
Edit & ImputationMachine learning for imputationSupport Vector MachineWen Z., Shi J., Li Q., He B., & Chen J. (2018). ThunderSVM: A fast SVM library on GPUs and CPUs. Journal of Machine Learning Research, 19(21), 1–5.
Edit & ImputationMachine learning for imputationSupport Vector MachineYang B., Janssens D., Ruan D., Bellemans T., & Wets G. (2013). A data imputation method with support vector machines for activity-based transportation models. In Computational Intelligence for Traffic and Mobility (pp. 159-171). 
Edit & ImputationMachine learning for imputationSupport Vector MachineZhang Y. & Liu Y. (2009). Data imputation using least squares support vector machines in urban arterial streets. IEEE Signal Processing Letters, 16(5), 414–417.
Edit & ImputationMachine Learning for Data Editing Cleaning in NSI : Some ideas and hintsML applicationMartin Beck, Florian Dumpert, Joerg Feuerhake (2018). Machine Learning in Official Statistics (Shorter English version available on arXiv: https://arxiv.org/abs/1812.10422)
Edit & ImputationMachine Learning for Data Editing Cleaning in NSI : Some ideas and hintsStandardsGSBPM (2019). Generic Statistical Business Process Model. Version 5.1, January 2019, UNECE. Available at: https://statswiki.unece.org/display/GSBPM/Generic+Statistical+Business+Process+Model.   
Edit & ImputationMachine Learning for Data Editing Cleaning in NSI : Some ideas and hintsStandardsGSDEM (2019). Generic Statistical Data Editing Models - GSDEMs, Version 2.0, April 2019, UNECE. Available at: https://statswiki.unece.org/display/sde/GSDEM  
Edit & ImputationMachine Learning for Data Editing Cleaning in NSI : Some ideas and hintsStandardsGSIM (2019). Generic Statistical Information Model, Version 1.2, May 2019, UNECE. Available at: http://www1.unece.org/stat/platform/display/gsim.  
Edit & ImputationMachine Learning for Data Editing Cleaning in NSI : Some ideas and hintsStatisticsEDIMBUS (2007). Recommended Practices for Editing and Imputation in Cross-sectional Business Surveys, EDIMBUS project report, https://ec.europa.eu/eurostat/documents/64157/4374310/30-Recommended+Practices-for-editing-and-imputation-in-cross-sectional-business-surveys-2008.pdf.  
Edit & ImputationMachine Learning for Data Editing Cleaning in NSI : Some ideas and hintsStatisticsMEMOBUST (2014). Handbook on Methodology of Modern Business Statistics, CROS-portal, Eurostat, https://ec.europa.eu/eurostat/cros/content/handbook-methodology-modern-business-statistics_en.  
Edit & ImputationMachine Learning for Data Editing Cleaning in NSI : Some ideas and hintsStatisticsVan der Loo M. (2015) A Formal Typology of Data Validation Functions, UNECE, Conference of European Statisticians, Budapest. Available at:    http://www.markvanderloo.eu/files/statistics/WP_5_Netherlands_A_formal_typology_of_data_validation_functions.pdf  
Edit & ImputationMachine Learning for Data Editing Cleaning in NSI : Some ideas and hintsStatisticsde Waal, T., Pannekoek, J. and Scholtus, S. (2011). Handbook of Statistical Data Editing and Imputation. Wiley, Hoboken.
Edit & ImputationImputation of the variable “Attained Level of Education” in Base Register of IndividualsML application[1] Di Zio M., Di Cecco D., Di Laurea D., Filippini R., Massoli P., Rocchetti G. “Mass imputation of the attained level of education in the Italian System of Registers”, Workshop on Statistical Data Editing, Neuchâtel, Switzerland, 18-20 September 2018
Edit & ImputationImputation of the variable “Attained Level of Education” in Base Register of IndividualsML application[2] Di Zio M., Filippini R., Rocchetti G. “An imputation procedure for the Italian attained level of education in the register of individuals based on administrative and survey data”, Workshop on Statistical Data Editing, Neuchâtel, Switzerland, 31 August - 2 September 2020
Edit & ImputationImputation of the variable “Attained Level of Education” in Base Register of IndividualsML application[3] Bernasconi, Eleonora, et al. "Satellite-Net: Automatic Extraction of Land Cover Indicators from Satellite Imagery by Deep Learning." arXiv preprint arXiv:1907.09423 (2019).
Edit & ImputationImputation of the variable “Attained Level of Education” in Base Register of IndividualsML application[4] De Fausti Fabrizio, Pugliese Francesco and Diego Zardetto. "Toward Automated Website Classification by Deep Learning." arXiv preprint arXiv:1910.09991 (2019).
Edit & ImputationImputation of the variable “Attained Level of Education” in Base Register of IndividualsML codehttps://github.com/defausti/MLP_Imputation.git
Edit & ImputationImputation of the variable “Attained Level of Education” in Base Register of IndividualsML techniques[6] Yoon, Jinsung, James Jordon, and Mihaela Van Der Schaar. "Gain: Missing data imputation using generative adversarial nets." arXiv preprint arXiv:1806.02920 (2018).
Edit & ImputationImputation of the variable “Attained Level of Education” in Base Register of IndividualsStatistics[5] Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of control, signals and systems 2.4 (1989): 303-314.
Edit & ImputationNot availableML codeStekhoven, D. J. (2015). missForest: Nonparametric missing value imputation using random forest. Astrophysics Source Code Library
Edit & ImputationNot availableStatisticsGray, D. (2019). A Generalized Framework to Evaluate Imputation Strategies: Recent Developments. In JSM Proceedings, Government Statistics Section. Alexandria, VA: American Statistical Association. 1861-1870
Edit & ImputationNot availableStatisticsGray, D. (2020). Evaluating Imputation Methods using ImpACT: First Case Study, United Nations Statistical Commission and Economic Commission for Europe – Workshop on Statistical Data Editing
Edit & ImputationNot availableStatisticsStelmack, A. (2018). On the Development of a Generalized Framework to Evaluate and Improve Imputation Strategies at Statistics Canada, United Nations Statistical Commission and Economic Commission for Europe – Workshop on Statistical Data Editing.
Edit & ImputationWP1 - Theme 2 Edit and Imputation ReportData ScienceCao L. (2017). Data science: a comprehensive overview. ACM Computing Surveys, 50(3), 1–42.
Edit & ImputationWP1 - Theme 2 Edit and Imputation ReportStatisticsChambers R. (2001). Evaluation Criteria for Statistical Editing and Imputation.
Edit & ImputationEarly estimates of energy balance statistics using machine learningBig DataDaas, P.J.H., Puts, M.J., Buelens, B. and van den Hurk, P. (2015). Big data as a source for official statistics. Journal of Official Statistics, 31, 249–262.
Edit & ImputationEarly estimates of energy balance statistics using machine learningBig DataHassani, H., Saporta, G. and Silva, E.S. (2014). Data mining and official statistics: the past, the present and the future. Big Data, 1, 34–43.
Edit & ImputationEarly estimates of energy balance statistics using machine learningML codehttps://github.com/VITObelgium/energy-balance-ml
Edit & ImputationEarly estimates of energy balance statistics using machine learningML tutorialHastie, T., Tibshirani, R. & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd ed. New York: Springer.
Edit & ImputationEarly estimates of energy balance statistics using machine learningRandom ForestBreiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Edit & ImputationEarly estimates of energy balance statistics using machine learningStatisticsClaeskens, G. & Hjort, N. L. (2008). Model Selection and Model Averaging. Cambridge: Cambridge University Press.
Edit & ImputationEarly estimates of energy balance statistics using machine learningStatisticsGelman, A. & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models, Vol. 1. New York: Cambridge University Press.
ImageryUse of Landsat satellite data for the mapping of urban areas in non-census yearsDatahttps://ieeexplore.ieee.org/document/8518312
ImageryUse of Landsat satellite data for the mapping of urban areas in non-census yearsDatahttps://www.opendatacube.org/
ImageryLearning statistical information from images: a proof of conceptDatahttps://www.cbs.nl/nl-nl/dossier/nederland-regionaal/geografische-data/kaart-van-100-meter-bij-100-meter-met-statistieken
ImageryLearning statistical information from images: a proof of conceptDataPersian cat, Model T, Granny Smith; http://image-net.org/challenges/LSVRC/2015/browse-synsets
ImageryArealstatistik Deep Learning (ADELE)ML applicationhttps://www.bfs.admin.ch/bfs/de/home/statistiken/raum-umwelt/erhebungen/area.assetdetail.5687737.html
ImageryWP1 - Theme 3 Imagery Analysis ReportBig DataCurzi, G., Modenini, D., & Tortora, P. (2020). Large Constellations of Small Satellites: A Survey of Near Future Challenges and Missions. Aerospace, 7, 133. doi:10.3390/aerospace7090133
ImageryWP1 - Theme 3 Imagery Analysis ReportBig DataSafyan, M. (2020). Handbook of Small Satellites, Technology, Design, Manufacture, Applications, Economics and Regulation. 1057-1073. doi:10.1007/978-3-030-36308-6_64
ImageryWP1 - Theme 3 Imagery Analysis ReportDatahttp://aws.amazon.com/es/public-data-sets/landsat/
ImageryWP1 - Theme 3 Imagery Analysis ReportDatahttp://landsat.gsfc.nasa.gov/?p=10221
ImageryWP1 - Theme 3 Imagery Analysis ReportDatahttps://eur-lex.europa.eu/eli/reg_del/2013/1159/oj
ImageryWP1 - Theme 3 Imagery Analysis ReportDataToth, C., & Jóźków, G. (2016). Remote sensing platforms and sensors: A survey. ISPRS Journal of Photogrammetry and Remote Sensing, 22-36.
ImageryWP1 - Theme 3 Imagery Analysis ReportML applicationFerreira, B., Iten, M., & Silva, R. G. (2020). Monitoring sustainable development by means of earth observation data and machine learning: a review. Environmental Sciences Europe, 32, 120. doi:10.1186/s12302-020-00397-4
ImageryWP1 - Theme 3 Imagery Analysis ReportML applicationHolloway, J., & Mengersen, K. (2018). Statistical Machine Learning Methods and Remote Sensing for Sustainable Development Goals: A Review. Remote Sensing, 10, 1365. doi:10.3390/rs10091365
ImageryWP1 - Theme 3 Imagery Analysis ReportML applicationYoussef, R., Aniss, M., & Jamal, C. (2020). Machine Learning and Deep Learning in Remote Sensing and Urban Application: A Systematic Review and Meta-Analysis. Proceedings of the 4th Edition of International Conference on Geo-IT and Water Resources 2020, Geo-IT and Water Resources 2020. New York, NY, USA: Association for Computing Machinery. doi:10.1145/3399205.3399224
ImageryWP1 - Theme 3 Imagery Analysis ReportML techniquesBishop, C. M. (2006). Pattern Recognition and Machine Learning. USA: Springer.
ImageryGeneric Pipeline for Production of Official Statistics Using Satellite Data and Machine LearningBig Data[1] Conference of European Statisticians (2019) In-depth Review on Satellite Imagery and Earth Observation Technology in Official Statistics
ImageryGeneric Pipeline for Production of Official Statistics Using Satellite Data and Machine LearningBig Data[1] United Nations Global Working Group on Big Data (2017) Satellite Imagery and Geospatial Data Task Team Report
ImageryGeneric Pipeline for Production of Official Statistics Using Satellite Data and Machine LearningBig DataCommittee on Earth Observation Satellites (2015) Satellite Earth Observations in Support of Climate Information Challenges
ImageryGeneric Pipeline for Production of Official Statistics Using Satellite Data and Machine LearningData[1] Lewis, A. et al. (2017) Remote Sensing of Environment
ImageryGeneric Pipeline for Production of Official Statistics Using Satellite Data and Machine LearningData[1] UCS Satellite Database (accessed Feb. 2020)
ImageryGeneric Pipeline for Production of Official Statistics Using Satellite Data and Machine LearningDataRoberts, D., Dunn, B. and Mueller, N. (2018) Open Data Cube Products Using High-Dimensional Statistics of Time Series
ImageryGeneric Pipeline for Production of Official Statistics Using Satellite Data and Machine LearningStandardsUnited Nations Economic Commission for Europe (2019) Generic Statistical Business Process Model (version 5.1)
ImageryGeneric Pipeline for Production of Official Statistics Using Satellite Data and Machine LearningStatistics[1] United Nations Statistics Division (2019) Guidelines on the use of electronic data collection technologies in population and housing censuses
Quality
FrameworkAustralian Bureau of Statistics (2005). Data Quality Framework, Australian Bureau of Statistics, (https://www.abs.gov.au/websitedbs/D3310114.nsf//home/Quality:+The+ABS+Data+Quality+Framework)
Quality
FrameworkEurostat (2017). European Statistics Code of Practice, Eurostat, https://ec.europa.eu/eurostat/web/quality/european-statistics-code-of-practice.
Quality
FrameworkStatistics Canada (2017). Quality Assurance Framework, Statistics Canada, https://www150.statcan.gc.ca/n1/pub/12-539-x/12-539-x2019001-eng.htm
Quality
FrameworkUnited Nations (2019). National Quality Assurance Frameworks Manual for Official Statistics, United Nations, https://unstats.un.org/unsd/methodology/dataquality/
Quality
FrameworkUnited Nations (2012). Guidelines for the template for a generic national quality assurance framework, United Nations, https://unstats.un.org/unsd/statcom/doc12/BG-NQAF.pdf.
Quality
ML applicationLuque, A., Carrasco, A., Martín, A. and de las Heras, A. (2019). The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognition, 91, 216–231.
Quality
ML applicationPepe, M.S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press.
Quality
ML applicationVanwinckelen, G. and Blockeel, H. (2014). Look before you leap: Some insights into learner evaluation with cross-validation. JMLR Workshop and Conference Proceedings, 1, 3–19.
Quality
ML techniquesGoldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2014). Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation. arXiv
Quality
ML techniquesHastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning. 2nd edition. Springer.
Quality
ML techniquesJapkowicz, N. and Shah, M. (2011). Evaluating Learning Algorithms. Cambridge University Press.
Quality
ML techniquesStothard, C. (2020). Evaluating Machine Learning Classifiers: A review. Australian Bureau of Statistics, available upon request.
Quality
PracticesArrieta, A.B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R. and Herrera, F. (2020). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115.
Quality
PracticesBegley, C. and Ioannidis, J. (2015). Reproducibility in science: Improving the standard for basic and preclinical research. Circ. Res., p 116-126.
Quality
PracticesBhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., Jia, Y., Ghosh, J., Puri, R., Moura, J.M.F. and Eckersley, P. (2020). Explainable machine learning in deployment. arXiv
Quality
PracticesGoodman, S., Fanelli, D. and Ioannidis, J. (2016). What does research reproducibility mean? Science Translational Medicine, p 341-353
Quality
PracticesHanson, B., Sugden, A. and Alberts, B. (2011). Making data maximally available. Science, 331, 649.
Quality
PracticesMolnar (2019) Interpretable Machine Learning - A Guide for Making Black Box Models Explainable
Quality
PracticesPetkovic (2020) AI and trust: explainability, transparency. Ethical implications of AI and AI Tools Lab, Frankfurt Big Data Lab, Goethe University
Quality
PracticesRibeiro, M.T., Singh, S. and Guestrin, C. (2016) “Why Should I Trust You?” Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144
Quality
PracticesStodden, V., Seiler, J. and Ma, Z. (2018). An empirical analysis of journal policy effectiveness for computational reproducibility. Proc Natl Acad Sci USA, p 2584–2589.
Quality
PracticesSzabo, L. (2019) Artificial intelligence is rushing into patient care—and could raise risks. Scientific American, December 2019
Quality
PracticesVilone, G. and Longo, L. (2020) Explainable artificial intelligence: a systematic review. arXiv
Quality
StatisticsBengio, Y. and Grandvalet, Y. (2004). No Unbiased Estimator of the Variance of K-Fold Cross-Validation. Journal of Machine Learning Research, 5, 1089–1105.
Quality
StatisticsBickel, P. J. and Freedman, D. A. (1981). Some Asymptotic Theory for the Bootstrap. The Annals of Statistics, 9(6), 1196–1217.
Quality
StatisticsBiemer, P.P. (2010). Total Survey Error – Design, Implementation, and Evaluation. Public Opinion Quarterly, 74(5), 817–848.
Quality
StatisticsBorra, S. and Di Ciaccio, A. (2010). Measuring the prediction error. A comparison of cross-validation, bootstrap and covariance penalty methods. Computational Statistics and Data Analysis, 54, 2976–2989.
Quality
StatisticsDiCiccio, T. and Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, p 189-212
Quality
StatisticsEfron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), 1–26.
Quality
StatisticsEurostat (2014). Handbook on Methodology of Modern Business Statistics, CROS-portal, MEMOBUST, https://ec.europa.eu/eurostat/cros/content/handbook-methodology-modern-business-statistics_en.
Quality
StatisticsGroves, R.M. and Lyberg, L. (2010). Total Survey Error – Past, Present, and Future. Public Opinion Quarterly, 74(5), 849–879.
Quality
StatisticsHand, D.J. (2012). Assessing the performance of classification methods. International Statistical Review, 80(3), 400–414.
Quality
StatisticsKim, J.-H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis, 53, 3735–3745.
Quality
StatisticsPlatek, R. and Särndal, C.-E. (2001). Can a Statistician Deliver? Journal of Official Statistics, 17(1), 1–20.
Quality
StatisticsQuenouille, M.H. (1956). Notes on Bias in Estimation. Biometrika, 43, 353–60.
Quality
StatisticsStone, M. (1974). Cross-validatory Choice and Assessment of Statistical Predictions. Journal of the Royal Statistical Society, Series B, 36, 111–147.
Quality
StatisticsWolter, K. M. (2007). Introduction to Variance Estimation. 2nd edition. Springer.
OtherNot availableML applicationChristen, P. (2007). “A two-step Classification to Unsupervised Record Linkage”, in Proceedings of the 6-th Australian Conference on Data Mining and Analytics, 70, 111-119.
OtherNot availableML libraryDe Bruin, J. (2019). “Python Record Linkage Toolkit: A toolkit for record linkage and duplicate detection in Python”. Zenodo. https://doi.org/10.5281/zenodo.3559043
OtherNot availableStatisticsFellegi, I.P. and Sunter, A.B. (1969). “A theory of record linkage”, Journal of the American Statistical Association, 64, 1183–1210



