Assemble task team to lead sandbox work  

January 2014
All those interested in participating will be encouraged to do so, but a task team will be required to steer the work, ensuring objectives are pursued and processes are documented. 

Obtain and install necessary hardware, software etc.

January-March 2014

  • Set up Pentaho suite
    • configure Pentaho for the Hadoop distribution and version
    • test the configuration
  • Set up R with RHadoop
    • test the configuration

Train the task team to ensure familiarity with the technical tools and to start collaboration between team members

April-June 2014
  • Use online documents, tutorials, demonstration videos etc.
  • Possibly run a training session (conditional upon hosting and/or financial support from a participating organization), which could be held alongside another Big Data event to save costs for participants.

Obtain requisite datasets and undertake analyses in sandbox

July-October 2014
  • Obtain and install datasets (at least one from each category outlined in the preceding section). Note: the process of obtaining datasets that are not freely available (whether paid or not) should begin at the outset of the project, so that they are available by this stage of the work.
  • For each dataset:
    • study availability of variables 
    • analyse the representativeness of the statistical figures 
    • study other statistical figures available 
    • produce some statistics
    • document all processes and results on an ongoing basis.
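
As an illustration only, the first two per-dataset checks above could be sketched as follows. This is a minimal sketch under stated assumptions, not the task team's method: the function names, the sample records and the reference shares are all hypothetical, and plain Python is used here rather than the Pentaho or RHadoop tooling named in the plan.

```python
from collections import Counter


def variable_availability(records, variables):
    """Share of records in which each variable is present and non-empty."""
    total = len(records)
    return {
        v: sum(1 for r in records if r.get(v) not in (None, "")) / total
        for v in variables
    }


def representativeness_gaps(records, variable, reference_shares):
    """Dataset share minus reference share, per category of one variable."""
    counts = Counter(
        r[variable] for r in records if r.get(variable) not in (None, "")
    )
    observed = sum(counts.values())
    return {
        category: counts.get(category, 0) / observed - share
        for category, share in reference_shares.items()
    }


# Hypothetical records standing in for one sandbox dataset.
records = [
    {"region": "north", "price": 10.0},
    {"region": "north", "price": None},
    {"region": "south", "price": 12.5},
    {"region": "north", "price": 11.0},
]

# Hypothetical reference shares, e.g. taken from an official benchmark.
reference = {"north": 0.5, "south": 0.5}

availability = variable_availability(records, ["region", "price"])
gaps = representativeness_gaps(records, "region", reference)
print(availability)
print(gaps)
```

A positive gap means the category is over-represented in the dataset relative to the reference; a negative gap means it is under-represented.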

Produce a general model for producing statistics from Big Data, in order to communicate effectively with statistical organizations

November-December 2014
  • Document findings
  • Incorporate documented results into dissemination materials and activities