Sub-Teams Questionnaire: 

https://forms.gle/GK6L8agRVayX4Kqe9


 

Background

The idea behind this project is to develop a better common understanding of the pros and cons, do’s and don’ts, of moving to further and more comprehensive use of open source software as a cornerstone for official statistics production, based on concrete experiences through use cases of broad interest.

The project is currently in its early stages, and the focus is on scoping (i.e. determining which topics to focus on and which deliverables to pursue).

Documents

Work Packages
The following is a preliminary plan, based on the initial project pitch: 
  • WP0. Scoping and ambition level
    • Agreed scope and ambition.
  • WP1. Generic aspects
    • Guidelines, principles, frameworks.
  • WP2. Use cases
    • Focus on co-development and community building.
  • WP3. Management, synergies and communication
    • Communication plan.


 



Resources

Initiatives in the official statistics community

Name | Description | URL
Awesome List | A list of open-source software, organised into categories and regularly maintained. |
HLG-MOS Carpentries Project | Ongoing work on an introduction to Python, R, and Git for NSOs |
OECD .stat suite | |
ESS's OS4OS | |



Generic references to open-source



Meeting Notes


10 May 2024

WIP

Participants: Andrew Tait; Carlo Vaccari; Kate Burnett-Isaacs; Karl McKenzie; Nevena Mitrovic; Inkyung Choi; Li Wang; Pierpaolo Massoli; Samanta Pietropaoli; Matyas Tamas Meszaros; Ken Rennoldson; Pubudu Senanayake; Lorenzo Asti; Francesco Isidori


  1. Sprint
    • A Sprint meeting will be organized in the last week of June in Belgrade at SORS, the Statistical Office of the Republic of Serbia.
  2. Scheduling of Sub-Team Meetings
    • Sub-team meetings will be held fortnightly, and a coordinator will be identified for each sub-team. Kate is available to lead the Governance sub-team.
  3. Group Discussion
    • Add failures (such as CSPA) and successes (SIS-CC, PxWeb) to the document.
    • Put in the Google Drive document all the ideas and references that we gradually deem useful for the final draft.
    • We could analyze why some organisations handle open-source projects successfully and others do not. For this we could run a survey among NSIs to map the current open-source situation and issues in different countries, focusing also on cultural aspects.
    • Correlation between open source and security: proprietary software doesn't imply security (see Linus's Law).
    • There will be shared activities with the HLG-MOS AI project, as open training sets are crucial to building confidence in the different existing language models. Carlo will participate in the sessions of the "Governance" sub-group of the AI project to work on this; if anyone is interested, please contact him.
    • Try to define a quality framework for OS software, with quality dimensions such as accuracy, complexity, operational efficiency, support level, breadth of use, ...
  4. Decisions
    • In addition to the statistical software developed by the NSIs, those open-source software tools that have become an integral part of the statistical production process, such as ETL tools, will also be analysed. However, basic tools, such as operating systems, will not be taken into consideration
    • Some topics that fall between the two sub-teams, such as sustainability or user feedback, will (also) be analyzed in the plenary meetings.
    • Meetings confirmed at 1:00 PM CET.
    • The next plenary meeting will be in one month; in the meantime, the sub-teams will hold their first meetings.
  5. AoB



17 April 2024

Participants: Andrew Tait; Aleksandra Skoko; Nevena Pavlovic; Pierpaolo Massoli; Carlo Vaccari; Inkyung Choi; Pubudu Senanayake; Kevin Townend; Samanta Pietropaoli; Martin Ralphs; Ken Rennoldson; Christie Glover; Jonathan Challener; Francesco Isidori; Karl McKenzie; Olav ten Bosch; Lorenzo Asti

  1. Project Manager Hired
    • Carlo Vaccari has been hired as project manager.
    • Carlo previously worked for Istat, has been a manager on other UNECE projects such as Big Data, Data Governance and Common Statistical Data Architecture (CSDA), and has worked on open source topics for many years.
    • Please feel free to contact Carlo as you see fit: vaccaricarlo@gmail.com
  2. Scheduling of Meetings
    • The main group will meet at least once a month.
    • Sub teams will meet more frequently.
    • It is noted that we wish to produce deliverables by November, which leaves 7 months to do so.
  3. Andrew Tait's Draft Group Structure
    • Andrew created a draft group structure based on the discussions from the first meeting (see: Draft Group Structure (dated 16.04.2024)).
      • Andrew's draft structure proposed having two sub-teams:
        • Sub-Team 1: Governance
          • This group would explore questions of maintenance, licensing, and open source culture.
        • Sub-Team 2: Repository
          • This group would stock-take existing repositories like the awesome list, and provide recommendations.
  4. Group Discussion
    • The group was in favor of the idea of having two sub-teams.
    • Team members are welcome to join either or both sub-teams, especially given the overlap in topics.
    • Previous Work:
      • Lessons should be drawn from past OS project failures (why did they fail when others succeeded? e.g. CSPA)
    • Current Work:
      • The group agreed that it is important to not "re-invent the wheel".
      • Jonathan Challener referred to work already done on licensing options.
      • The Awesome List for Official Statistics Software was provided as an example of an OS repository.
      • It was suggested that the work that has been done on OS could be collected and referred to in one place such as a paper.
    • Culture:
      • How do OS projects reach a critical mass of support where they become self-sustaining?
        • I.e. How do feedback loops arise?
      • How to best facilitate technology adoption?
      • How do NSOs learn about how OS is used in other organisations? How can that knowledge be shared?
    • Trust:
      • Trust in OS software sometimes comes once interest in a piece of OS software has reached a critical mass.
      • There can be concerns about the trustworthiness of OS software, but also positive aspects:
        • The code behind OS software can be viewed and evaluated, unlike with proprietary software.
          • E.g., R source code is often published alongside the work of methodologists and statisticians.
      • How can trust be measured?
      • Trust is something that is earned, and there are ways of doing so, e.g.:
        • Committing to transparency (including airing "dirty laundry")
    • Documentation and Discoverability:
      • The Awesome List is a public list which people can search and to which they can suggest changes.
      • The list captures knowledge of what OS software is available for official statistics, along with metadata on the software.
      • Likes on GitHub repos increase visibility on Google, which in turn increases the likes the repo receives, and so on.
      • The list takes a bottom-up approach, i.e. it's not intended as an authoritative source on all things open-source.
      • In the past Olav experimented with badges for OS software on the list that showed how many times the software was downloaded.
        • Badges still exist for licensing information, version number, and last update, but the downloads badge was removed.
        • Downloads don't correspond to popularity one-to-one because of web scrapers.
        • Popularity relates to trust, but indicating popularity is still an open question for the list.
    • Sharing:
      • How many statistical institutions are sharing data, and how are they sharing it?
        • E.g. Sharing OS software that is uploaded to organisation websites vs CRAN.
      • How is the adoption and use of OS software monitored?
  5. Decisions:
    1. Two sub-teams will be created:
      1. Governance and Maintenance:
        • This team will explore questions related to governance, maintenance, and culture, drawing on successes and failures.
      2. Discoverability and Sharing:
        • This team will take stock of existing repositories, review them against best practices, make recommendations based on that review, and improve the repositories if needed. The team will also look at how information on OS is shared.
    2. Deliverables:
      • Create a paper on OS.
  6. Actions:
    • Andrew to send out a questionnaire asking members which sub-teams they want to be in.
    • Andrew to create a Google Doc for work on the OS paper.
  7. Suggested Reading


February 4th 2024

Participants: Andrew Tait; Barteld Braaksma; Christie Glover; Craig Lindenmayer; Francesco Isidori; Inkyung Choi; Jonathan Challener; Jonathan Wylie; Karl McKenzie; Kate Burnett-Isaacs; Kevin Townend; Li Wang; Lorenzo Asti; Marcello D'Orazio; Pierpaolo Massoli; Pubudu Senanayake; Samanta Pietropaoli; Vytas Vaiciulis

Open-Source Software Project – Kick Off Meeting Minutes

Quick Summary:

The discussion primarily revolved around questions of governance, culture, maintenance, and discoverability in relation to open-source (OS) software.

Participants expressed a desire not to retread covered ground, but to explore something new.

Common concerns were about OS maintenance (one’s own code and other dependencies) and understanding the culture of OS.

Potential deliverables could include:

  • Developing general guidance (i.e., a list of best principles) for managing your OS software.
  • Creating standards for code commenting and general OS knowledge management among NSOs.
  • Exploring OS culture: Learn from current examples (Linux, R etc.) and analyze how they arise, and how they are maintained.
  • Determining best practices for managing dependencies, including understanding links between various packages and their dependencies, and handling the risk of dependencies losing support.
  • Developing a discoverable list of OS software where the relations between the software are indicated and/or methods for searching for OS software (e.g., standards around OS metadata, query development or guidance on querying).

Full Notes:

1       Initial Placeholder Project Outline (subject to potential complete revision)

  1. Explore Generic Aspects
    • Guidelines, Principles, Frameworks.
  2. Investigate Use Cases
    • Co-development, community building
  3. Management, synergies, and communication


2       Previous & Current OS Work

3       Housekeeping

  • The project will continue until the next HLG-MOS workshop, which most likely will be in November this year.
    • The project can be extended if needed.
  • A wiki space has been set up for the project: https://statswiki.unece.org/display/OSSP
    • Those without access to the wiki should let Andrew know (tait1@un.org)
    • Andrew will create wiki user accounts for Li Wang, Marcello D’Orazio, Pierpaolo Massoli, Kevin Townend, Francesco Isidori, and Lorenzo Asti.
    • Andrew will update Kate Burnett-Isaacs’s user details.

4       Common Understanding

  • The project should ideally cover new ground. For example:
    • How can we go beyond merely sharing OS code and repos, and instead share and collaborate on best practices relating to OS use, development, and maintenance?
    • Part of OS culture is not just usage of OS, but also giving back to the OS community. How is this done and how is this done best?
    • How can support for OS be done at the enterprise level?
      • Note: This can be a particularly difficult issue for smaller organisations.
    • When does it make sense to invest in OS compared to buying software where the maintenance and security burdens are managed externally?
    • Should practices be standardized across NSOs? If so, then how?

5       Topics Raised In The Meeting

5.1      Governance

  • Licensing:
    • What licenses are best to use?
    • How to ensure that code developed by offices is not taken and monetized by the private sector?
  • Models of Governance:
    • What examples already exist of governance structures for OS software?
    • How can others best learn from these examples?
  • Support:
    • The move to OS comes with a change of responsibility in terms of software management, from IT to statisticians, analysts etc. How can this challenge be managed?
    • How do we avoid being tied to anchors of support?
      • E.g. how do we avoid one person being key to the continued functioning of an OS package?
    • What is a reasonable amount of support to provide for your OS packages?
      • Should that support be ringfenced? I.e. limited somewhat within the NSO community?
    • How do you set reasonable expectations for your OS packages?

5.2      Community

  • How does OS culture emerge?
    • E.g. Linux, CRAN
    • The R community is another example. However, it is run by volunteers, whereas we wish to create such a community with staff, which may inform the lessons drawn from such an example.
  • How can we foster the emergence of OS culture?
  • How does Peer Review of OS work?
    • It is noted that one benefit that NSOs have is that they are not in competition with each other, so there is no conflict of interest in co-development.
  • How is OS culture maintained?
  • How to approach OS culture development with Executives?
    • No “free lunch” (OS requires investment).
    • How best to combat OS myths? (OS culture doesn’t necessarily happen organically).

5.3      Discoverability

  • Is it possible to create a discoverable list of OS software for NSOs?
    • This already exists in at least one form: CBS’s Awesome List
      https://github.com/SNStatComp/awesome-official-statistics-software
    • Ideally, what is desired is a searchable list where relations between OS solutions are indicated (e.g., which packages depend on which other packages)
    • The list should have greater functionality: e.g., there should be a way to indicate when a tool is in trouble (lack of updates, full of bugs etc. [i.e., signs of no support])
    • How would the list be maintained?
  • How do you ensure that your published OS is seen by others?
  • How do you check for OS solutions so that you don’t accidentally re-invent the wheel?
  • Is there metadata that can be applied to OS software that could facilitate discoverability?
    • Is/can this metadata be standardized?
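
The questions above about indicating relations between packages and signs of missing support can be made concrete with a small sketch. This is only an illustration under stated assumptions: the record fields (`name`, `last_update`, `depends_on`), the two-year staleness threshold, and the sample entries are all hypothetical, and do not describe how the Awesome List or any other existing catalogue is actually structured.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Entry:
    """One catalogue record; every field name here is hypothetical."""
    name: str
    last_update: date
    depends_on: list[str] = field(default_factory=list)


def dependents_of(catalogue: list[Entry], name: str) -> list[str]:
    """Names of entries that list `name` as a direct dependency."""
    return sorted(e.name for e in catalogue if name in e.depends_on)


def stale_entries(catalogue: list[Entry], today: date, max_age_days: int = 730) -> list[str]:
    """Names of entries whose last update is more than `max_age_days` before `today`."""
    return sorted(e.name for e in catalogue if (today - e.last_update).days > max_age_days)


# Made-up records purely for illustration; not real packages or dates.
catalogue = [
    Entry("core-utils", date(2020, 1, 15)),
    Entry("survey-editing", date(2024, 3, 1), depends_on=["core-utils"]),
    Entry("dissemination-api", date(2023, 11, 20), depends_on=["core-utils", "survey-editing"]),
]

reference_day = date(2024, 5, 10)
print(dependents_of(catalogue, "core-utils"))   # which tools rely on this one
print(stale_entries(catalogue, reference_day))  # which tools show no recent updates
```

A lack of recent updates is only one possible warning sign; a real catalogue would probably want to combine it with other signals (open bug counts, maintainer activity), which this sketch does not attempt.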

5.4      Maintenance

  • Knowledge Management:
    • How can code commenting & documentation be standardized across NSOs?
    • What practices can facilitate OS maintenance? E.g., coding in the open.
  • Risk Management:
    • How to deal with dependencies?
      • What happens when they’re unsupported?
      • Note: Old code isn’t a problem unique to OS. Old code written in SAS, VBA etc. can also develop issues from lack of maintenance.
    • How to avoid over-reliance on any one individual to maintain OS code/packages?

6       Potential Deliverables:

  • Developing general guidance (i.e., a list of best principles) for managing your OS software.
  • Creating standards for code commenting and general OS knowledge management among NSOs.
  • Exploring OS culture: Learn from current examples (Linux, R etc.) and analyze how they arise, and how they are maintained.
  • Determining best practices for managing dependencies, including understanding links between various packages and their dependencies, and handling the risk of dependencies losing support.
  • Developing a discoverable list of OS software where the relations between the software are indicated and/or methods for searching for OS software (e.g., standards around OS metadata, query development or guidance on querying).

Team Members


Name | Organisation
Kevin Townend | Stats NZ
Samanta Pietropaoli | ISTAT
Kate Burnett-Isaacs | Infrastructure Canada
Karl Mckenzie | ONS
Ken Rennoldson | ONS
Craig Lindenmayer | ABS
Christie Glover | Statistics Canada
Jonathan Challener | OECD
Li Wang | Statistics Canada
Jonathan Wylie | Statistics Canada
Lorenzo Asti | ISTAT
Martin Ralphs | ONS
Marcello D'Orazio | ISTAT
Francesco Isidori | ISTAT
Pierpaolo Massoli | ISTAT
Pubudu Senanayake | Stats NZ
Jean-Marc MUSEUX | Eurostat
Matyas Tamas MESZAROS | Eurostat
Barteld Braaksma | CBS
Vytas Vaiciulis | CSO
Marc-Philippe St-Amour | Statistics Canada
(Olav) K.O. ten Bosch | CBS
(Mark) M.P.J. van der Loo | CBS
Nevena Mitrovic | SORS
Aleksandra Skoko Despenic | SORS
Pascal Heus | Postman
Iraj Namdarian | CREA
Carlo Vaccari - Project Manager | Independent Expert
Deraj Wilson-Aggarwal | ONS (Economic Statistics Change)
Giuseppe Sindoni | ISTAT
InKyung Choi | UNECE
Andrew Tait | UNECE





Sub-Teams
Sub-Team | First Name | Surname | Organisation
Governance and Maintenance | Lorenzo | Asti | Italian National Institute of Statistics (ISTAT)
Governance and Maintenance | Kate | Burnett-Isaacs | Infrastructure Canada
Governance and Maintenance | Jonathan | Challener | OECD
Governance and Maintenance | Marcello | D'Orazio | Italian National Institute of Statistics - Istat
Governance and Maintenance | Christie | Glover | Statistics Canada
Governance and Maintenance | Karl | Mckenzie | Office for National Statistics
Governance and Maintenance | Matyas | Meszaros | Eurostat
Governance and Maintenance | Nevena | Mitrovic | Statistical office of the Republic of Serbia
Governance and Maintenance | Martin | Ralphs | Office for National Statistics UK
Governance and Maintenance | Aleksandra | Skoko Despenic | Statistic Office of Republic Serbia
Governance and Maintenance | Andrew | Tait | UNECE
Governance and Maintenance | Kevin | Townend | Stats NZ
Governance and Maintenance | Carlo | Vaccari | UNECE project manager
Governance and Maintenance | Li | Wang | Statistics Canada
Governance and Maintenance | Deraj | Wilson-Aggarwal | ONS
Governance and Maintenance | Mark | van der Loo | Statistics Netherlands and Leiden University
Discoverability and Sharing | Lorenzo | Asti | Italian National Institute of Statistics (ISTAT)
Discoverability and Sharing | Jonathan | Challener | OECD
Discoverability and Sharing | Matyas | Meszaros | Eurostat
Discoverability and Sharing | Nevena | Mitrovic | Statistical office of the Republic of Serbia
Discoverability and Sharing | Iraj | Namdarian | Council for Agricultural Research and Economics - CREA
Discoverability and Sharing | Samanta | Pietropaoli | ISTAT - Italian National Institute of Statistics
Discoverability and Sharing | Ken | Rennoldson | Office for National Statistics (UK)
Discoverability and Sharing | Andrew | Tait | UNECE
Discoverability and Sharing | Carlo | Vaccari | UNECE project manager
Discoverability and Sharing | Deraj | Wilson-Aggarwal | ONS
Discoverability and Sharing | Olav | ten Bosch | Statistics Netherlands