...
| Panel | ||||||
|---|---|---|---|---|---|---|
| ||||||
The purpose of the HLG-MOS Machine Learning Project is to advance the use of machine learning in official statistics. To this end, much of the initial work focused on demonstrating value through pilot projects (see Work Package 1). Although the results of many of these projects show great promise, the path from pilot project to production system is far from trivial: despite dozens of participants, only a few members of the ML project currently report using machine learning in production. One challenge is demonstrating the methodological suitability of these techniques, which is the focus of Work Package 2 (Quality Framework). The purpose of Work Package 3 (WP3) is to identify and address the remaining challenges to integration and production deployment. The WP3 team pursued two activities to further this goal: a short online questionnaire designed to give a high-level overview of the key challenges and successes, and a deeper investigation into six key questions:
1. Where should machine learning fit in a statistical organization?
2. What should the machine learning pipeline look like with regard to organizational structure?
3. What machine learning skills are needed and where are they needed in the organization?
4. How can organizations efficiently acquire the ML skills they need?
5. How should organizations demonstrate and communicate the value added by ML techniques?
6. How should statistical organizations identify the right problems for machine learning?
The results are presented in this report. Several initiatives taken by statistical offices to facilitate and accelerate the use of machine learning, and of data science more generally, are described in Initiatives to accelerate the integration of machine learning solutions. |
| Panel | ||||||
|---|---|---|---|---|---|---|
| ||||||
Our online questionnaire was designed and administered using SurveyMonkey. All members of the HLG-MOS Machine Learning Project were encouraged to participate and to forward the survey to colleagues with relevant expertise. Between September 15 and October 15, 2020, 28 responses were collected; they form the basis of this report. The questionnaire remains available online at https://www.surveymonkey.com/r/6G5VVFH, and additional responses are welcome and may be incorporated into future products.
Our 28 respondents include representatives of national statistical organizations covering 14 countries and regions, all in either North America or Europe. Most report having a role of “Statistician / Data Scientist”, followed by “Analyst / Subject Matter Expert” and “Manager / Policy Maker”. Only one respondent reported a role of “Software Engineer / Information Technology Specialist”. Most respondents (54%) also report belonging to large national statistical organizations, defined as those with more than 2000 employees; a further 32% belong to the next largest grouping, organizations with between 500 and 2000 employees.
What are the biggest challenges facing statistical agencies in ML? Our questionnaire divides this into two sub-questions, one about “organizational issues” and the other about “technical issues”. Among organizational issues, “coordination between internal stakeholders” ranked among the largest challenges, with 16/27 respondents (59%) reporting that it moderately limits, severely limits, or prevents use. Among technical issues, “availability of staff with appropriate machine learning algorithm skills” was the most limiting factor, with 10/28 respondents (36%) reporting that it severely limits use. Its average score of 1.8 makes it the most problematic issue identified in our survey.
Our survey ends with a question about which activities have been most useful. Among the options presented, “collaboration with other statistical organizations” ranked as the most useful, with 14/28 respondents indicating that it is “very useful”, followed closely by external training programs, with 10/28 indicating “very useful”. See the appendix for additional details on the survey results. |
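The shares quoted in the questionnaire summary are simple proportions of the respondents who answered each item. As a minimal sketch in Python, the following recomputes them from the counts given above. The shortened labels and the `share` helper are illustrative names chosen here, not part of the survey tooling, and the sketch assumes percentages were rounded to the nearest whole number.

```python
# Counts (respondents choosing the answer, respondents answering the item)
# as quoted in the questionnaire summary. Labels are shortened for illustration.
reported = {
    "coordination between internal stakeholders (limits or prevents use)": (16, 27),
    "staff with ML algorithm skills (severely limits use)": (10, 28),
    "collaboration with other statistical organizations (very useful)": (14, 28),
    "external training programs (very useful)": (10, 28),
}


def share(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest whole number."""
    return round(100 * count / total)


# Percentage share for each item, keyed by the same labels.
shares = {label: share(count, total) for label, (count, total) in reported.items()}

for label, pct in shares.items():
    print(f"{pct:>3}%  {label}")
```

Note that the denominator differs between items (27 for the coordination item, 28 elsewhere), so the shares have to be computed per item rather than against a fixed total of 28.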
| Panel | ||||||
|---|---|---|---|---|---|---|
| ||||||
While the short questionnaire gives a high-level overview of challenges and potential solutions, it lacks detail. To complement this information, we asked project participants to describe how they addressed six key questions. We received detailed responses from four organizations: the UK Office for National Statistics (ONS), the Australian Bureau of Statistics (ABS), Statistics Flanders, and the U.S. Bureau of Labor Statistics (BLS), along with related comments from many others. The questions and a high-level overview of the responses are below.
Where should machine learning fit in a statistical organization? Participants indicated four broad approaches:
What should the machine learning pipeline look like with regard to organizational structure? Where should projects start, and who should control which aspects, and when? Interestingly, the responses to this question produced two seemingly opposite ideas. One emphasized the importance of starting with a business need, moving to R&D, producing a prototype, and then bringing in other areas such as IT. The other emphasized the importance of building ML experience first, which in turn allows an organization to identify business problems suited to machine learning. In retrospect, it is clear that both are needed: an organization cannot determine whether machine learning is suitable if it knows nothing about machine learning, yet the ultimate goal is to serve business needs.
What machine learning skills are needed and where are they needed in the organization? On this question, there was general agreement among the responses. In organizations that distribute machine learning responsibilities across many divisions, machine learning requires new skills in many areas. Specifically:
Because of the difficulty of coordinating broadly distributed activities, another increasingly popular approach is to rely on positions and operational units that blur the distinctions between research, methodology, information technology, and subject matter. See, for example, “Google’s Hybrid Approach to Research” and “Data Scientist: The Sexiest Job of the 21st Century”. In some organizations, a data scientist spends some of their time researching and evaluating different machine learning solutions to a problem (R&D, methodology), some of it building and running the model in production (IT), and some of it assisting with use and maintenance (subject matter). This blurring of boundaries reduces the extent to which machine learning skills need to be distributed across the organization, but it requires individuals and teams with a broad range of skills, as well as the organizational and IT infrastructure necessary to make it work.
How can organizations efficiently acquire the ML skills they need? Responses identified four strategies:
How should organizations demonstrate and communicate the value added by ML techniques? One of the recurring challenges of working on projects involving many parties is the need to convince others to adopt or support new techniques. This is supported both by numerous anecdotes from participants in the ML group and by questionnaire responses indicating coordination problems with, and resistance from, internal stakeholders. Responses identified three potential strategies.
How should statistical organizations identify the right problems for machine learning? Our investigation uncovered three strategies.
|
...







