Progress
Engagement in the group continues to be very high. Over 20 projects were proposed, either to test ML techniques or to share knowledge and experience.
A sprint was held on 13-15 May (see objectives and agenda). It was attended by 12 participants from 9 countries and hosted by the ONS. Good progress was made on Work Packages 1 and 2:
- WP1 - Pilot Study on Coding and Classification: Building on the US Bureau of Labor Statistics' successful implementation of machine learning to autocode injuries and illnesses, the code and practices have been shared and will be tested on different types of data in Serbia, Poland and Belgium. The types of data include data collected on the Web, notably to measure sentiment. Other applications may join this pilot study.
- WP1 - Pilot Study on Edit and Imputation: A sub-group will investigate the potential of ML in automating the editing process and determine how the statistical foundations of ML and traditional techniques differ, where ML techniques can add value, and in what context this added value would be most beneficial. Examples from the UK and Italy will be used to conduct these investigations. Knowledge and experiences in imputation will be shared. Communication with the Statistical Data Editing (SDE) expert group will be ensured by members sitting on both groups, as well as by a member of the SDE group on the ML project's distribution list.
- WP1 - Pilot Study on Imagery: The scope of this pilot study has not been finalized because some key project members were not in attendance. At the sprint, the UK presented a successful application of ML that uses street images to produce relevant statistics at a relatively local scale (two cities). It is relevant to the needs and interests of Belgium and the Netherlands. We discussed an idea to produce a document to assist new users of imagery data by describing the processing pipeline (which calls on ML) for such data, the accompanying high-level ML questions and aspects to consider, and proposed ML solutions/applications or places to find them. The pilot study may look at using satellite data to measure population (density, change). It will likely not look at satellite data for land use, as this topic, including its ML aspects, was extensively covered by a UN Task Team a couple of years ago.
- WP2 - Quality: This WP will identify quality indicators and performance indicators in two contexts: ML applied to carry out traditional processes on traditional data, and ML applied to non-traditional data sources. To do this, it will consider quality features (definitions, dimensions, indicators) from the official statistics and ML communities. It will seek to apply some of these indicators in one or two of the WP1 pilot studies. One of the challenges will be to remain focused on ML issues and not the broader issues that come with the various data sources. These are covered more extensively by the ESSnet Big Data II WPK on Methodology and Quality. Communication with this group will be ensured by two members sitting on both groups and through our respective wiki spaces.
- WP3 - Lessons learned: This WP was not discussed directly. One of the intentions is to combine the experiences of organisations that have implemented, or are close to implementing, ML techniques with those of organisations that will advance their implementation through the WP1 pilot studies, and to draw lessons learned on topics such as facilitators, obstacles, the importance and costs of creating and maintaining learning datasets, and the role of manual operations.
The project manager and participants at the sprint wish to thank the ONS and the UN Data Science Campus for their hospitality, facilities and overall accompaniment throughout the three days.
The documents produced at the sprint will be posted on the wiki and shared with all project members.
Next Steps
- Share documents presented at the sprint with all project members
- Finalize and share the plans for each WP and pilot study: objectives, projects, deliverables, success indicators, timelines (most of this was set at the sprint; they now need to be written down, shared and agreed by all)
- Get the pilot studies in motion
Risks and Issues
Ensure that the ML projects remain relevant to the participating organisations. The project manager will offer his support to project participants in getting access to the data that they need and will raise access issues with the EB, as needed.
Issue Mitigation
| News from the Groups | | | |
|---|---|---|---|
| Blue-skies Thinking | Identifying Topics/Opportunities | IN PROGRESS | |
| | Follow-up selected topics | IN PROGRESS | |
| Developing Organisational Capability | Skills and Capability Framework | IN PROGRESS | DOC group members are working on a paper showing the connection between technical and complementary skills, to increase awareness of the issue. A very initial document was prepared for further discussion among group members, including, to some extent, alignment with GAMSO, which is not an easy task. The deadline is 11 June, before the next Webex call. |
| | Promotion Forum | IN PROGRESS | We prepared a one-page draft flyer for the CES session in June. It is aimed at senior and mid-level management, to draw attention to the outputs of our group. |
| | Setting vision in NSOs | IN PROGRESS | After the Communications Sprint that took place in Geneva at the end of April, it was decided that the Strategic Communications Team will take over the work on the paper on setting vision in NSOs. Our group will have the possibility to comment on the draft paper. |
| | Other | | The Organising Committee for the workshop on Culture Evolution will have its first formal call at the beginning of June, just after the deadline to submit abstracts (end of May). |
| Supporting Standards | Linking GSBPM and GSIM | IN PROGRESS | The task team is meeting regularly every three weeks. A template for the mapping has been agreed. The mapping is being done at two different levels of GSIM: a) a more conceptual level, corresponding to the specification level in GSIM; b) a less conceptual level, corresponding to the execution level in GSIM. The task team is concentrating on phase 5 of the GSBPM at both the design and implementation levels. The mapping exercise, including examples from different countries, is quite demanding. For the time being, the task team will be able to do the mapping for phases 5 and 4 and maybe one additional GSBPM phase. However, the task team will probably not be able to complete the mapping by November. |
| | Core Ontology | IN PROGRESS | The development of the core ontology goes on at a steady pace, via virtual meetings (6 of them in the first semester) and offline exchanges. The construction of the model first focused on the integrated view between GSBPM and GAMSO, with discussions on the connection between the notions of activity and process. More recently, modelling of the statistical organizations and products was undertaken. A first version of the ontology will be available for presentation at the HLG meeting in November and afterwards submitted for public review. |
| | Alignment GSBPM and GAMSO | IN PROGRESS | The task team has produced a document specifying the activity to be done. Agreement has been reached. At the next meeting in May, the task team will prepare and discuss first draft descriptions of overarching processes in GSBPM and their relationships to GAMSO corporate support activities. |
| | Metadata Glossary | IN PROGRESS | The work of the task team is proceeding regularly. |
| | Other | | The Supporting Standards Group is highly involved in the preparation of the June ModernStats World Workshop. |
| Sharing Tools | Digitizing/editing CSPA document | IN PROGRESS | |
| | Adding Services to Catalogue | IN PROGRESS | |
| | Communication restated CSPA | NOT STARTED | |
| | Other | | |
- HLG-MOS (3 March, NYC)
- Strategic Communication sprint (30 April-2 May, Geneva)
- Machine Learning Sprint (13-15 May, UK)
- Data Integration Workshop (21-23 May, Belgrade)
- Strategic Communication Sprint 2 (10-11 June, Gdansk)
- DissComm Workshop (12-14 June, Gdansk)
- ModernStats World Workshop (26-28 June, Geneva)
- HRMT Culture Evolution Workshop (11-13 September, Geneva)
- Data Collection Workshop (9-11 October, Geneva)
- Statistical Data Confidentiality (29-31 October, The Hague)
- Modernisation Workshop (19-21 November, Geneva)