EESW19 is offering three one-day short courses intended for business survey methodologists, researchers and other professionals in the field:
Machine Learning by José L. Cervera-Ferri
Statistical Data Cleaning for Business Statistics with R by Mark van der Loo
Fees apply for attending these courses. To register for one of these courses, please follow this link. The number of places is limited, and registration closes after July 31, 2019.
Machine Learning
José L. Cervera-Ferri
Machine Learning (ML) methods are becoming increasingly relevant for Statistical Offices as they consider using new data sources known collectively as Big Data (satellite images, mobile call records, sensors and IoT, etc.). The short course will present, at an introductory level, the most well-known ML methods, including linear and non-linear regression models, trees and random forests. Examples of actual applications of ML methods in official statistics will be presented, as well as their potential in specific data treatment processes such as missing value imputation, classification and small area estimation.
Introducing ML methods implies moving from design-based or model-based statistical inference to algorithm-based inference, with important consequences for the interpretability and transparency of methods for data users. A major barrier to the application of ML in Statistical Offices is the difficulty of combining statistical skills, IT expertise, subject-matter knowledge and the communication skills needed to explain the methods and results to users. During the course, participants will have the opportunity to share their offices' experiences in establishing Data Science teams and in identifying ways to attract and retain talent for these activities.
About the instructor
José L. Cervera-Ferri, CEO of DevStat, is an international consultant in official statistics with more than 25 years of experience in governmental and private institutions. José was recently involved as a consultant in the preparation of the conference on Modernization of European Statistics (Valencia 2014) and the ESS Big Data Event (Rome 2014), for which he and DevStat supported Eurostat in preparing the scientific programme, contributed as facilitator and rapporteur, and edited the technical reports, which have been published on the ESS research portal (CROS). He also participates in projects related to the modernization of statistics from the legal viewpoint (impact assessment of the European framework legislation on social statistics). He currently coordinates the courses on Machine Learning for Eurostat and the EU National Statistical Offices under the European Statistical Training Programme (ESTP).
Statistical Data Cleaning for Business Statistics with R
Mark van der Loo
In this workshop I demonstrate how data quality can be systematically defined and improved using R. The workshop focuses on data validation (data checking), locating errors, and imputing missing or erroneous values under restrictions. I will draw from examples that are typical for the Structural Business Statistics survey, where common restrictions include nonnegativity rules and record-wise balance checks. I present a short introduction to the main principles, provide quizzes and discussions for the audience, and give short R-based exercises.
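As a rough illustration of the two kinds of restrictions mentioned above, the following minimal sketch checks a nonnegativity rule and a record-wise balance rule against a handful of records. It is written in plain Python rather than R, and it does not use or reproduce any of the workshop's packages. The variable names (`turnover`, `other_revenue`, `total_revenue`), the helper functions and the example records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A record maps variable names to values; None marks a missing value.
Record = Dict[str, Optional[float]]


@dataclass(frozen=True)
class Rule:
    name: str
    # Returns True (satisfied), False (violated) or None (cannot be evaluated).
    check: Callable[[Record], Optional[bool]]


def nonnegative(field: str) -> Rule:
    """Rule requiring that `field` is not negative."""
    def check(rec: Record) -> Optional[bool]:
        value = rec.get(field)
        if value is None:
            return None
        return value >= 0
    return Rule(f"{field} >= 0", check)


def balance(total: str, parts: List[str], tol: float = 1e-8) -> Rule:
    """Rule requiring that `total` equals the sum of `parts` within one record."""
    def check(rec: Record) -> Optional[bool]:
        values = [rec.get(f) for f in (total, *parts)]
        if any(v is None for v in values):
            return None
        return abs(values[0] - sum(values[1:])) <= tol
    return Rule(f"{total} == {' + '.join(parts)}", check)


def check_records(records: List[Record], rules: List[Rule]) -> List[Dict[str, Optional[bool]]]:
    """Evaluate every rule on every record."""
    return [{rule.name: rule.check(rec) for rule in rules} for rec in records]


# Hypothetical example data, not taken from any survey.
records: List[Record] = [
    {"turnover": 100.0, "other_revenue": 20.0, "total_revenue": 120.0},
    {"turnover": 80.0, "other_revenue": -5.0, "total_revenue": 75.0},
    {"turnover": 60.0, "other_revenue": None, "total_revenue": 70.0},
    {"turnover": 50.0, "other_revenue": 10.0, "total_revenue": 65.0},
]
rules = [
    nonnegative("turnover"),
    nonnegative("other_revenue"),
    balance("total_revenue", ["turnover", "other_revenue"]),
]

results = check_records(records, rules)

# List the (record index, rule name) pairs that are violated.
violations = [
    (i, name)
    for i, outcome in enumerate(results)
    for name, passed in outcome.items()
    if passed is False
]
print(violations)
```

In this sketch a rule that involves a missing value is reported as "cannot be evaluated" rather than as a violation, so that missing and erroneous values can be handled by separate steps.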
I will demonstrate a number of our R packages, including 'validate' (for data quality checks), 'errorlocate' (for error localization), 'simputation' (for imputation methods), 'rspa' (for value adjustment), and 'lumberjack' (for keeping track of changes in data). Special attention will be paid to how to combine the various data processing steps, and how to analyze and visualize the results. At the end of the workshop participants will have insight into some of the methods common in data editing for business surveys, as well as an overview of how to implement them with free and open source R and the packages mentioned.
As there will be some practical exercises, participants are required to bring a laptop with a recent version of R (e.g. 3.5 or later) and RStudio.
About the instructor
Mark van der Loo works at the department of methodology at Statistics Netherlands. His main area of expertise is statistical data cleaning (data editing) and statistical computing in general. He has (co)authored several publicly available and widely used R packages. In 2018 he and Edwin de Jonge published the book "Statistical Data Cleaning with Applications in R" (John Wiley & Sons, Inc).
|Early bird fees (by May 15, 2019)|€|
|Reduced (ISI or IASS member, student)|80|
|Full fees (by July 31, 2019)||
|Reduced (ISI or IASS member, student)|100|
The organiser reserves the right to cancel any of these courses due to low interest. We will decide whether there is enough interest to hold a course by May 20, 2019, at the latest. If the number of participants reaches the set lower limit before then, we will inform the registered participants earlier that the course will be held. However, a course may still not be given due to unforeseen circumstances (so-called force majeure). In that case, the responsibility of the organiser is limited to refunding the received registration fees in full.