Statistical business process model
Overview of the Process Model
Also shown in figure 6 are the main phases of Statistics South Africa's statistical cycle, internally referred to as the statistical value chain (SVC). The development of the SVC forms part of the organisation's standardisation of processes and systems. The SVC will provide broad guidelines for survey areas.
It was decided that the organisation should not re-invent the wheel in developing the SVC. The starting point was therefore to study how national statistics organizations (NSOs) similar to ours had logically structured their survey cycles. Another guiding principle was that Stats SA was not re-engineering the way it conducts surveys, but wanted to map and structure it as much as possible. We therefore had to adapt survey cycles from organisations whose survey operations resembled ours. This work was done in 2005. Having considered the alternatives, we found Statistics New Zealand's business process model for survey cycles to be the most closely aligned with our survey operations.
How Stats SA's Statistical Process Maps into Common Metadata Framework (CMF) Lifecycle Model
Resulting from the process stated above, Stats SA's survey cycle consists of the following phases:
1. Need
Although Stats SA already produces statistics that satisfy the needs of a large and diverse group within the country's socio-political landscape, requirements for new or supplementary statistical products often arise. Requests for such new products may emanate from a number of sources, including government departments, private business and other stakeholders. When this happens, the first step in statistical production is to understand the need for the required statistics, i.e., what the users are going to use them for in concrete terms. Often for new projects, much of the detail about what is needed is unclear at first, and in some cases absent, even from the perspective of the initiators of the project. It is therefore important to go through a process of refining the understanding of the information (statistics) needs that the results of the project must address.
2. Design
The design phase consists of preparing ground for the execution of a statistical production project. This stage is reached when either a new project has been given the go ahead or a frequent on-going project is about to begin. In the case of on-going surveys, the design is usually in place already, but in a few cases it might need to be altered to cater for additional requirements or special circumstances.
3. Build
The build phase puts together all the pieces of the infrastructure for a statistical production project. These include the computer system, scanners, printing out of questionnaires, etc. The build phase also includes the procurement of the pieces of the infrastructure as necessary. The amount of work done in this phase also varies for new and on-going surveys.
4. Collect
Although the term collection may have different meanings in a statistical organization, it is used in the SVC to refer to both direct and administrative methods of data collection. The direct collection method refers to data collection in which Stats SA sources data directly from the respondents. In administrative collection, data are drawn from databases of other organizations which in turn source them from their respondents. It is important to note that this phase has a major influence on the activities of the design phase.
5. Process
The Process phase includes capturing collected data into databases so that data processing may be done. Data processing is necessitated by a number of issues. Chief among these is the fact that the data collection process is fraught with errors. The Process phase removes these data validity errors in order to improve data quality, and packages the data for use by analysis tools.
6. Analyse
After data have been cleaned during the Process phase, they are now ready for manipulation using analytical tools. This is the analysis done by domain experts to get insight into the meaning of the data. Further data quality enhancements may be done at this phase.
7. Disseminate
Stats SA collects data in order to produce statistics to be used by different stakeholders in the country including the general South African public. This means that the organization has to have ways of giving these communities access to the data and the resultant statistics. The Disseminate phase formalizes the steps Stats SA needs to go through in order to distribute information to the different communities as well as give them access to data repositories.
Stats SA uses a number of dissemination methods to ensure that the data produced by the organization is accessible to the widest user community. These include: electronic (e.g. via the internet), printed output and compact disks.
Table 1 below shows the mapping between Stats SA's survey cycle and the METIS cycle:
METIS | Stats SA |
---|---|
Survey planning and design | Need and Design Phases |
Survey preparation | Part of Design Phase |
Data collection | Collection Phase |
Input processing | Processing Phase |
Derivation, Estimation, Aggregation | Processing Phase |
Analysis | Analysis Phase |
Dissemination | Dissemination Phase |
Post Survey Evaluation | |
Table 1: METIS Cycle vs. Stats SA's Cycle
Post Survey Evaluation is currently done outside the statistical cycle. It is performed only for the large surveys such as the population census and the community survey.
Metadata used/created at each phase
Metadata are used and/or produced in each phase of the statistical value chain. This strong link between the SVC and metadata informs all development of the metadata subsystem.
Stats SA's Statistical Value Chain
Statistics South Africa's core areas, i.e., those divisions in the organization responsible for the production of statistics, have up to now operated using different approaches. Although it is generally understood in the organization that there are many commonalities in the way different divisions conduct their work, no attempt has been made to formalize a standard statistical production process for the entire organization. The development of the SVC for the organization is a move to correct this situation. The SVC is a generalisation of the activities that need to take place from the beginning to the end of a statistical production process.
Stats SA envisions its statistical cycle along the lines of Michael Porter's Value Chain Model. Michael Porter explained this model in his 1985 book, "Competitive Advantage: Creating and Sustaining Superior Performance". Hence we refer to our statistical cycle as the SVC. The value chain categorizes the value-adding activities of an organization. Figure 7 below is a schematic diagram of the main phases of Stats SA's SVC.
Figure 7: High level phases of Stats SA's Statistical Value Chain
The SVC was designed to be general, catering for most scenarios of statistical production. For example, it is clear that not all the phases of the value chain will be used by all surveys. Figure 8 below shows a flowchart of statistical production within the context of the SVC. It can be seen that old frequent surveys might not follow the same path as new frequent or once off surveys.
Figure 8: Flowchart of a statistical production using phases of the SVC
A high level description of the main phases of the SVC was given in section 2.4 above. In this section we give a detailed view of the activities involved in each phase.
Need Phase
The Need phase consists of the following activities:
Determine the need
The objectives and purpose for doing the particular survey or research must be defined. This starts with conducting interviews with the organisation or individual(s) requesting the new survey. This is an iterative process that concludes with a definition of a statement of need.
Determine Information Requirements
A need for a survey or study is triggered by requirements for information that solves a given problem. A clear determination of the nature and extent of this information or data is needed. This is done through consultations with domain experts from the community in need of the information.
Develop Budget and Plan
Similar to any project that requires resources, a statistical production project has to have a cost-benefit analysis as a foundation of its business case. During this phase, only a high level plan is produced.
Obtain Financial Support
Generally, Stats SA's projects are big and critical; thus they need huge financial investments. Because the government pays for them, an intensive process of budget approval has to be undertaken in order to ensure accountability.
Ministerial Approval
Stats SA projects are funded by the National Treasury under the Ministry of Finance. For large projects to go ahead, ministerial approval is required.
Design Phase
The following activities are contained in the Design phase:
Develop Detailed Project Plan
The output of the Need phase consists of high level aspects of the proposed survey. All Stats SA's surveys must go through detailed planning. For new priority projects, the responsibility for such planning lies with the organisation's Programme Office. The Programme Office has the overall responsibility for running the project to completion, after which, the future running of the project (in the case of frequent surveys) is handed over to the survey area.
Develop Survey Methodology
The goal of the survey methodology is to ensure that the statistics collected during the survey are reliable and representative of the survey's target population. For existing surveys, the survey methodology is often already in place. For new and re-engineered surveys, new survey methodologies are developed.
Design and Test Questionnaires
Questionnaire design is aimed at ensuring that the survey yields the required information. It consists of getting both the content and the layout of the questionnaire correct. The process iterates between constructing survey questions and testing whether the responses to the questions asked address the problem the survey is intended to solve. Questionnaire testing is initially done "behind-the-glass", with employees of the organisation randomly selected for participation. Thereafter, pilot tests are conducted in the field on small population groups, in the same way the actual survey will be conducted.
Design Operational Requirements
Survey operations are concerned with the tasks of getting data from respondents or other data sources. Operational requirements must detail all the technical and logistical issues that need to be sorted out in order to have a successful survey. These vary from resource issues to technologies needed to conduct the survey.
Design Computer System
The system to be used during the statistical production process consists of many related sub-systems that may be implemented through computer technology. Data collected during a statistical survey are captured in a computer system for processing. A number of technologies are required to ensure that data are moved from their sources of collection to the computer.
Build Phase
Activities contained in the Build phase are as follows:
Build a Collection Vehicle
Stats SA collects statistical data through one of the following survey methods:
- Sample survey using questionnaires
- Administrative surveys, using IT communications methods to access data stored in other organisations' databases.
Building a collection vehicle consists of ensuring, either by custom-building or by procuring, that all the infrastructure and items necessary for conducting a survey are in place.
Build a Technology Solution
A technology solution should include all the technological components required to support the entire SVC. These may include hardware such as scanners and Optical Character Recognition (OCR) tools for capturing questionnaire-based data, database management systems, data analysis tools and information dissemination tools.
Test Technology Solution
Before a technology solution is put into production, it must be tested by the prospective users. This is to ensure that the functionality required by the users is included in the system. Issues concerning ease of use and integration of systems are also addressed. At a technical level, the testing of the system may lead to the identification of system bugs that were missed during the technical tests done by the developers.
Implement Solution
The implementation of a solution means that it is deemed ready to be used to perform productive work. Users are therefore trained on how to use the system, after which designated people are granted access rights to it.
Collect Phase
Contained in the Collect phase are the following activities:
Manage Respondents
Enumerators must be highly trained so that they are able to explain the following to respondents:
- the reasons for collecting the data, how the respondents were chosen to be part of the survey, and how the information is planned to be used to improve the functions of the agency and standards of living;
- whether responses to the collection of information are voluntary or mandatory (citing authority: Statistics Act);
- the nature and extent of confidentiality to be provided (citing authority: Statistics Act);
- an estimate of the average respondent burden, together with a request that the public direct to the agency any comments concerning the accuracy of this burden estimate and any suggestions for reducing this burden.
Respondent management must be done in ways that reduce the burden of the survey on the respondent. Burden reduction includes ensuring that re-visits to respondents are kept to a minimum and that the questionnaire is of reasonable length.
Post Out
When a survey is conducted by enumerators visiting respondents, the respondents must be notified by Stats SA about the pending survey. This notification must include information such as the objective of the survey, the date(s) when the enumerators will be visiting, etc. Post Out refers to the process of notifying respondents by sending letters via the post detailing this information. Administrative data collection does not have this requirement, though legal arrangements (e.g. Memoranda of Agreement, Service Level Agreements) are put in place in advance so that the other party is able to provide the data.
Acquire Data
Data acquisition at Stats SA includes both the direct (e.g. Sample Surveys and Census) and administrative methods. In most direct acquisitions, data are captured on paper based questionnaires. In a few other cases, electronic media may be used. Figure 9 below shows a flowchart of how Stats SA acquires its data.
Close off Collection
The collection period is usually specified at the design stage of the survey. Field collection of data closes automatically at the end of the last day of the defined collection period.
Process Phase
The Process phase consists of performing the following activities:
Capturing Data into Electronic Form
This applies only to questionnaire based collection methods. Questionnaires are either scanned or manually entered by data capturers into computer databases. Data collected from other electronic systems might only need to be transformed into Stats SA's data formats.
Perform Macro Edits
Macro edits detect individual errors by (1) checks on aggregated data and (2) checks applied to the whole body of records. The checks are typically based on models, expressed either graphically or as numerical formulae, that determine the impact of specific fields in individual records on the aggregate estimates.
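As a rough illustration of a check on aggregated data, the sketch below compares a current total with the previous period's total and, when the change exceeds a tolerance, lists the records that contribute most to the aggregate. The field names (`unit_id`, `turnover`), the 20% tolerance and the sample figures are assumptions made for this example; they are not Stats SA's actual edit rules.

```python
def macro_edit(records, previous_total, tolerance=0.20, top_n=3):
    """Flag an aggregate that moved too much and list the most influential records.

    records: list of dicts with the hypothetical keys "unit_id" and "turnover".
    """
    total = sum(r["turnover"] for r in records)
    change = (total - previous_total) / previous_total
    suspects = []
    if abs(change) > tolerance:
        # Rank records by their share of the aggregate; the largest
        # contributors are the first candidates for manual review.
        ranked = sorted(records, key=lambda r: r["turnover"], reverse=True)
        suspects = [(r["unit_id"], r["turnover"] / total) for r in ranked[:top_n]]
    return {"total": total, "change": change, "suspects": suspects}


records = [
    {"unit_id": "A", "turnover": 120},
    {"unit_id": "B", "turnover": 95},
    {"unit_id": "C", "turnover": 9800},  # implausibly large, possibly a capture error
    {"unit_id": "D", "turnover": 110},
]
result = macro_edit(records, previous_total=430, top_n=1)
print(result["suspects"])  # unit C dominates the total
```

The aggregate is checked first and individual records are examined only when it looks implausible, which is what distinguishes a macro edit from record-by-record checks.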
Run Imputation/Estimation
Item non-response may result in missing values in a survey dataset. Statistical organizations use imputation methods to calculate estimated values to fill in the missing values. Imputation is implemented as mathematical algorithms in computer programs.
Estimation of missing values should not be confused with the overall statistical estimates which form the main goal of a survey. Statistical estimates are calculated by aggregating all of the collected data. These are often called macro data, and are contrasted with micro data, which are detailed data collected from the respondents.
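The distinction between imputing missing micro data and producing a macro-level estimate can be sketched as follows. Mean imputation is used here only because it is the simplest method to show; the text does not say which imputation methods Stats SA applies, and the sample values are invented.

```python
from statistics import mean


def impute_mean(values):
    """Fill missing items (None) with the mean of the observed items."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Micro data: one value per respondent, with item non-response as None.
micro = [10, None, 14, 12, None]

# Step 1: imputation fills the gaps in the micro data.
completed = impute_mean(micro)

# Step 2: the statistical estimate (macro data) aggregates the completed records.
macro_total = sum(completed)
print(completed, macro_total)
```

Imputation works on individual records, while the estimate is computed afterwards over the whole dataset.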
Produce Datasets
The primary output of processing is a set of "clean" datasets that are ready to be analysed. Analysis tools can only process data whose formats and structure they understand. Part of producing datasets is to package them into structures and formats that conform to Stats SA's analysis packages.
Analyse Phase
Statistical data analysis consists of the following activities:
Produce Statistical results
This is the process where results are produced based on the processing that was done on the data. The ultimate goal of any survey is to produce statistical estimates of the characteristics of the statistical unit of interest.
Validate Statistical Results
This is where estimates are assessed against expectations, compared with those from the previous period, and checked against quality measures to ensure good quality data.
Interpret Statistical Results
Numbers are meaningless if they are presented without any accompanying explanation. This is one quality dimension that we cater for at Stats SA: all data that are released should be accompanied by the corresponding metadata.
Prepare Content for Dissemination
This is the process where specific measures are taken to ensure that content from the survey does not disclose information concerning any identifiable respondent. These measures include: a) for micro data: respondent removal, content reduction and content modification; b) for tabular data: sensitive-cell correction methods, such as cell collapsing or suppression by data providers.
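The two tabular methods named above, cell suppression and cell collapsing, can be sketched as follows. The threshold of 3 and the age-group table are assumptions for illustration only; actual disclosure rules are set by the organisation, and in practice further (secondary) suppression is usually needed so that a suppressed cell cannot be recovered from published totals.

```python
def suppress_small_cells(table, threshold=3):
    """Replace counts below the threshold with None (shown as suppressed)."""
    return {cell: (count if count >= threshold else None)
            for cell, count in table.items()}


def collapse_cells(table, merge_into):
    """Merge categories according to merge_into (old label -> new label)."""
    collapsed = {}
    for cell, count in table.items():
        label = merge_into.get(cell, cell)
        collapsed[label] = collapsed.get(label, 0) + count
    return collapsed


# Hypothetical frequency table: respondents by age group.
table = {"0-14": 120, "15-64": 340, "65-84": 25, "85+": 2}

suppressed = suppress_small_cells(table)
collapsed = collapse_cells(table, {"65-84": "65+", "85+": "65+"})
print(suppressed)
print(collapsed)
```

Suppression hides the sensitive cell, whereas collapsing keeps all the information at a coarser level of detail.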
Perform Quality Control
This process entails making sure that all quality measures in SASQAF have been implemented correctly and the results thereof are known.
Disseminate Phase
- Receive and Validate Content
During this process, the dissemination team goes through a checklist of what was supposed to be accomplished and whether it was done accordingly and correctly. The content received by the team consists of macro and micro data, and other products such as published reports.
- Manage Dissemination Repositories
Data to be disseminated are kept in databases (dissemination repositories), from which they are extracted when disseminated. These repositories store datasets (including both micro and macro data), reports and other documents.
- Pre-release for Publishing
This process entails the preparations made before a release, covering tables, corporate formatting standards, electronic distribution and hard copy outputs.
- Manage First Release
This is where distribution media are managed and controlled in order to ensure that different categories of users of statistical information get access to relevant information. Release timelines are handled within this process.
- Handle Customers
Handling customers is part of customer relationship and stakeholder management. A system to handle customer enquiries exists. Stats SA's Support and Informatics Services unit handles customer enquiries, categorises main users and other users, consults users to determine their needs, and makes sure data are distributed to users in a timely manner.
Metadata Description Matrix
The implemented Survey Metadata Capture Tool of the ESDMF captures the following metadata:
Descriptions are provided for section headings.
1. Active Metadata Set
The file identifier and status of the current/active metadata set, i.e. the metadata set that the user is currently capturing, editing or viewing, are displayed immediately under this section.
2. Overview
The elements accessible from this section collectively provide a brief description of the survey.
The Overview section comprises the following items:
Survey/Series Status
Objective
Abstract
History
Target Population
Main Topic
Main Users
3. Generic Information
The elements accessible from this section collectively provide generic information about the survey time frames.
The Generic Information section comprises the following items:
Survey Frequency
Series Time Frames
4. Primary Data Source
The elements accessible from this section describe external inputs to the survey.
The Primary Data Source section comprises the following item:
External and Internal Data Sources
5. Methodology
The elements accessible from this section collectively describe the activities conducted and the methods and processes used which are specific to the survey.
The Methodology section comprises the following items:
Survey Population
Instrument Design
Sample Design
Collection
Error Detection/Editing
Imputation
Estimation
Quality Evaluation
Disclosure/Confidentiality Control
Seasonal and Working Day Adjustment
Revisions
Data Item/Variables
Dissemination
6. Data Quality Report
The element accessible from this section provides a hyperlink to the data quality report for the data release.
The Data Quality Report section comprises the following items:
Relevance
Accuracy
Accessibility
Interpretability
Coherence
Methodological Soundness
Timeliness
Integrity
7. Documentation
The elements accessible from this section provide hyperlinks to additional documentation related to the survey.
The Documentation section comprises the following item: Documentation
8. Contact
The elements accessible from this section provide information concerning the contact person who will manage enquiries related to the data or information produced by the survey.
The Contact section comprises the following item: Contact Person
9. Loaded Metadata Sets
This section lists the file identifiers and statuses of metadata sets created by the current user. It enables the current user to switch between his/her metadata sets.
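The sections and items listed above can be pictured as a simple nested structure. The sketch below is only an illustration of that layout and of a completeness check over it; how the Survey Metadata Capture Tool actually stores metadata sets is not described here, and the names `SECTIONS` and `missing_items` are hypothetical.

```python
# Section -> items, as listed in the text (sections 2 to 8).
SECTIONS = {
    "Overview": ["Survey/Series Status", "Objective", "Abstract", "History",
                 "Target Population", "Main Topic", "Main Users"],
    "Generic Information": ["Survey Frequency", "Series Time Frames"],
    "Primary Data Source": ["External and Internal Data Sources"],
    "Methodology": ["Survey Population", "Instrument Design", "Sample Design",
                    "Collection", "Error Detection/Editing", "Imputation",
                    "Estimation", "Quality Evaluation",
                    "Disclosure/Confidentiality Control",
                    "Seasonal and Working Day Adjustment", "Revisions",
                    "Data Item/Variables", "Dissemination"],
    "Data Quality Report": ["Relevance", "Accuracy", "Accessibility",
                            "Interpretability", "Coherence",
                            "Methodological Soundness", "Timeliness", "Integrity"],
    "Documentation": ["Documentation"],
    "Contact": ["Contact Person"],
}


def missing_items(metadata_set):
    """Return the (section, item) pairs that have no captured value yet."""
    return [(section, item)
            for section, items in SECTIONS.items()
            for item in items
            if not metadata_set.get(section, {}).get(item)]


# A hypothetical, partially captured metadata set.
captured = {"Overview": {"Objective": "Measure household income"}}
print(len(missing_items(captured)), "items still to capture")
```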
Table 2 below shows the metadata captured with the Metadata Capture Tool against the Statistical Value Chain, with examples for each stage of the SVC.
Group | Description | Statistical Value Chain | Examples | Quality Dimensions |
---|---|---|---|---|
Survey Overview | Brief overview about the survey that highlights the background, purpose, history and usage | Need | Title of survey, Series status, Objective of survey, Keywords, Main users and usage | Accessibility |
| | | Build | Metadata file identifier, Metadata version | |
| | | Design | Target population, Main topics | |
Survey Time Frames | Information about time frames that the life cycle of the survey will be managed | Need | Frequency of series, start date of survey, end date of survey | Timeliness |
| | | Design | Reference period, collection period, product release date | |
Type of Survey | Classification of a survey according to its statistical activity that involves collection, compilation and publication of statistical | Design | Derived, Direct (e.g. Sample or Census) and Administrative | Methodological soundness |
Primary Data Source | Information that gives a description about or identifies the administrative data source | Design | Administrative data information (i.e. title of survey from primary data source, primary data source description, contact person from primary data source) | Pre-requisite |
Methodology | Information about processes that are put in place and methods used to collect, process, analyse and publish statistical release | Design | Survey population, instrument design, Collection, Editing/Error detection, Imputation, Estimation, Disclosure/Confidentiality control, seasonal adjustments, revisions, Data variables, Dissemination | Methodological soundness, Integrity and Accessibility |
Data Quality Report | Information about quality measures used and the errors obtained as a result of executing the statistical processes | Design | Sampling errors and Non-sampling errors | Accuracy |
Documentation | Attach any documents with extra information related to specific section of the template | | | Interpretability |
Contact | Any additional documents that describe the concepts and | | | Accessibility |
Table 2: Relationships between various categories of metadata inputs and different phases of the SVC
The following table shows the stage of the SVC at which metadata is used:
Group | Statistical Value Chain | Examples | Quality Dimensions |
---|---|---|---|
Survey Overview | Build | Metadata File Identifier, Metadata version | Accessibility |
| | Collection | Objective, Main topics | |
| | Dissemination | Title of survey, Series number, Series status, Abstract, History of survey, Keywords, Users and usage | |
Survey Time Frame | Collection | Collection period, reference period | Timeliness |
| | Dissemination | Frequency of series, Start date of survey, Product release date, End date of survey | |
Type Of Survey | Collection | Derived, Direct (e.g. Sample or Census) and Administrative | Methodological soundness, Integrity, Accessibility |
Primary Data Source | Collection | Administrative data information (e.g. title of survey from primary data source, primary data source description, contact person from primary data source) | |
Methodology | Collection | Survey population, Instrument design, Sample design, collection, Quality evaluation, Data variables | Methodological soundness, Integrity and Accessibility |
| | Process | Quality evaluation, Data Editing, Imputation, Seasonal adjustment, Revisions, Data variables | |
| | Analysis | Quality evaluation, Estimation, Data variables | |
| | Dissemination | Quality evaluation, Disclosure/Confidentiality Control, Dissemination methods | |
Data Quality Report | Process | Sampling errors and Non-sampling errors | Accuracy |
Documentation | Dissemination | Documentation | Interpretability |
Contact | Dissemination | Contacts | Accessibility |
Table 3: Metadata produced with groups of metadata with examples for each group
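Read the other way round, Table 3 also answers the question "which metadata groups are needed at a given phase?". The sketch below encodes the group-to-phase column of Table 3 and inverts it; the names `USED_AT` and `groups_used_at` are hypothetical and exist only for this illustration.

```python
# Metadata group -> SVC phases at which it is used (taken from Table 3).
USED_AT = {
    "Survey Overview": ["Build", "Collection", "Dissemination"],
    "Survey Time Frame": ["Collection", "Dissemination"],
    "Type Of Survey": ["Collection"],
    "Primary Data Source": ["Collection"],
    "Methodology": ["Collection", "Process", "Analysis", "Dissemination"],
    "Data Quality Report": ["Process"],
    "Documentation": ["Dissemination"],
    "Contact": ["Dissemination"],
}


def groups_used_at(phase):
    """List the metadata groups that Table 3 associates with an SVC phase."""
    return [group for group, phases in USED_AT.items() if phase in phases]


print(groups_used_at("Process"))
print(groups_used_at("Dissemination"))
```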
Metadata relevant to other business processes
Lessons learned