Evaluated by C. Poirier, Statistics Canada, 1999
Last updated in January 2010
GEIS - Generalized Edit and Imputation System
GEIS was developed at Statistics Canada to meet the requirements of the Canadian economic surveys. It was supported until 2007. It has been replaced by the generalized edit and imputation system Banff (see the evaluation of Banff).
GEIS v6.5.0 is not an editing system as such; it is aimed more at the imputation process. It is usually used once the preliminary editing associated with the collection and capture phases, and respondent follow-up, have been completed. Linear programming techniques are used to localize the fields to be imputed, and search algorithms are used to perform automatic imputation. Processing is driven entirely by linear edit/imputation rules defined on numeric variables. More details are given in Statistics Canada (1998). The GEIS steps are:
Edit specification and analysis: This step serves to identify the relationships which characterize acceptable records. The relationships are expressed as a set of n linear edit rules in the form:
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1m}x_m &\le b_1\\
&\;\;\vdots\\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nm}x_m &\le b_n
\end{aligned}
$$
where the $a_{ij}$ and $b_i$ are user-defined constants and the $x_j$ represent the $m$ survey variables. The rules are connected with logical 'and's, so a record must satisfy every rule to pass the edits. The system checks the edits for consistency, redundancy and hidden equalities. This step permits an iterative approach to designing the best possible set of edits.
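To make the edit form concrete, the minimal sketch below stores each rule as a (coefficients, bound) pair and reports which rules a record violates. The representation and the name `failed_edits` are illustrative assumptions, not the GEIS input format, and the consistency and redundancy checks are not shown. Because every rule has the form "... <= b", an equality edit such as x1 + x2 = x3 has to be entered as two opposite inequalities.

```python
from typing import List, Sequence, Tuple

# One linear edit (hypothetical representation): (a_i1..a_im, b_i),
# meaning a_i1*x_1 + ... + a_im*x_m <= b_i.
Edit = Tuple[List[float], float]


def failed_edits(edits: Sequence[Edit], record: Sequence[float]) -> List[int]:
    """Return the indices of the edits the record violates (empty list = record passes)."""
    failed = []
    for i, (coeffs, b) in enumerate(edits):
        lhs = sum(a * x for a, x in zip(coeffs, record))
        if lhs > b:
            failed.append(i)
    return failed


# The equality x1 + x2 = x3 written as two opposite inequalities.
edits = [
    ([1, 1, -1], 0),   # x1 + x2 - x3 <= 0
    ([-1, -1, 1], 0),  # -x1 - x2 + x3 <= 0
]
print(failed_edits(edits, [50, 30, 90]))  # [1]
print(failed_edits(edits, [50, 30, 80]))  # []
```

Since the rules are joined by 'and', a record passes only when the returned list is empty.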
Outlier detection: This step aims at detecting univariate outliers. It compares selected variables across records and identifies outlying observations based on the median $M$ and the first and third quartiles $Q_1$ and $Q_3$ of the population. An observed value $x$ is identified as an outlier if it lies outside the acceptance interval $\big(M - k(M - Q_1),\; M + k(Q_3 - M)\big)$, where $k$ is set by the user. This method can be used to identify values to be imputed or to be excluded from subsequent calculations.
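A minimal sketch of this quartile-based test follows. It assumes the acceptance interval is built from the distances between the median and each quartile, i.e. lower bound M - k(M - Q1) and upper bound M + k(Q3 - M), and it uses Python's `statistics.quantiles` (default "exclusive" method) for the quartiles; GEIS may compute quartiles differently. The function name `quartile_outliers` is hypothetical.

```python
import statistics
from typing import List, Sequence


def quartile_outliers(values: Sequence[float], k: float) -> List[int]:
    """Return indices of values outside [M - k(M - Q1), M + k(Q3 - M)].

    Assumed interval form; quartiles come from statistics.quantiles
    (Python 3.8+, needs at least two values). Values exactly on a bound
    are accepted in this sketch.
    """
    q1, m, q3 = statistics.quantiles(values, n=4)
    low = m - k * (m - q1)
    high = m + k * (q3 - m)
    return [i for i, x in enumerate(values) if x < low or x > high]


sales = [10, 12, 11, 13, 12, 11, 100]
print(quartile_outliers(sales, k=3))  # [6]
```

Using separate distances on each side of the median lets the interval follow a skewed distribution, which is common for economic variables.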
Error localization: The error localization uses a linear programming approach to minimize the number of fields requiring imputation. This is an application of the rule of minimum change. The step identifies the fields that need to be imputed in order for the record to pass all the edit rules. The problem is expressed as a constrained linear program and solved using Chernikova's algorithm. The system also allows the use of weights for each variable when the user wishes to exert some influence on the identification of the fields to be imputed. Although the algorithm is costly to run, it constitutes one of the main features of GEIS.
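The rule of minimum change can be illustrated with a small brute-force sketch. This is not the Chernikova-based GEIS implementation: it enumerates candidate sets of fields, keeps the one with the smallest total weight, and tests each candidate with Fourier-Motzkin elimination, which decides whether the candidate fields can take values that satisfy every edit while the other fields keep their reported values. Non-negativity must be written as explicit edits -x_j <= 0. The names `feasible` and `localize` are hypothetical, ties are broken by field order, and the search is exponential in the number of fields, so it only suits toy records.

```python
from fractions import Fraction
from itertools import combinations


def feasible(edits, record, free):
    """True if the fields in `free` can be changed so that every edit holds.

    Edits are (coeffs, b) pairs meaning sum(coeffs[j] * x[j]) <= b.
    Uses Fourier-Motzkin elimination over exact fractions.
    """
    rows = []
    for coeffs, b in edits:
        # Move the fixed (non-free) fields to the right-hand side.
        rhs = Fraction(b) - sum(Fraction(a) * Fraction(record[j])
                                for j, a in enumerate(coeffs) if j not in free)
        rows.append(([Fraction(a) if j in free else Fraction(0)
                      for j, a in enumerate(coeffs)], rhs))
    for v in free:
        pos = [r for r in rows if r[0][v] > 0]
        neg = [r for r in rows if r[0][v] < 0]
        rows = [r for r in rows if r[0][v] == 0]
        for cp, bp in pos:
            for cn, bn in neg:
                # Positive multipliers cancel x_v while preserving "<=".
                mp, mn = -cn[v], cp[v]
                rows.append(([mp * a + mn * c for a, c in zip(cp, cn)],
                             mp * bp + mn * bn))
    # Every free variable is eliminated: each row now reads 0 <= rhs.
    return all(rhs >= 0 for _, rhs in rows)


def localize(edits, record, weights=None):
    """Return a minimum-weight set of field indices whose change lets the record pass."""
    m = len(record)
    weights = weights or [1] * m
    best = None
    for r in range(m + 1):
        for subset in combinations(range(m), r):
            cost = sum(weights[j] for j in subset)
            if best is not None and cost >= best[0]:
                continue  # cannot beat the current best
            if feasible(edits, record, set(subset)):
                best = (cost, subset)
    return set(best[1]) if best else None


# x1 + x2 = x3 as two inequalities, plus non-negativity of every field.
edits = [([1, 1, -1], 0), ([-1, -1, 1], 0),
         ([-1, 0, 0], 0), ([0, -1, 0], 0), ([0, 0, -1], 0)]
print(localize(edits, [50, 30, 90]))                     # {0}
print(localize(edits, [50, 30, 90], weights=[2, 2, 1]))  # {2}
```

With equal weights any single field can fix the balance, so the first one found is returned; raising the weights of the first two fields steers the choice to the total, which is the kind of influence the user-supplied weights provide.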
Automatic imputation: The imputation function offers three imputation methods: deterministic, donor and estimator imputation. Based on the edit rules, deterministic imputation identifies cases in which only one value would allow the record to satisfy the rules. Donor imputation replaces the values to be imputed with data from the closest valid record, also referred to as the nearest neighbour. For a given record, a subset of the fields that do not need imputation is automatically used as matching fields, and the maximum standardized difference over these fields is used as the distance function. The user can specify post-imputation edits to make sure the nearest neighbour is close enough to be used as a donor. Imputation by estimators provides a wide set of techniques using historical or current information. The built-in estimators are previous values, previous/current means, trends and multiple regression. If a non-standard estimator is required, a user-defined estimator can also be specified.
GEIS allows different imputation techniques to be used across questionnaire sections and sub-populations. A sequence of techniques can also be applied, where at each step the user can include or exclude previously imputed data.
The system works on mainframe and UNIX environments. It was developed in C and currently interacts with Oracle databases. It includes an interface that helps the user specify the parameters and edit rules, but the interface is not the easiest one to work with. The functionality described above is quite adequate for economic surveys, but the complex foundation software makes the system somewhat difficult to set up and maintain. Newly initiated developments target a more user-friendly system, which should make applications easier to set up and maintain from a user perspective.
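The donor imputation described above can be sketched as a nearest-neighbour search with the maximum standardized difference as the distance. Several details are assumptions, not taken from GEIS: all fields that do not need imputation serve as matching fields (rather than an automatically chosen subset), each matching field is standardized by its range among the donors, and the post-imputation check simply requires the imputed record to pass the linear edits, falling back to the next-nearest donor otherwise. The names `passes` and `donor_impute` are hypothetical.

```python
def passes(edits, record):
    """True if the record satisfies every linear edit sum(a_j * x_j) <= b."""
    return all(sum(a * x for a, x in zip(coeffs, record)) <= b
               for coeffs, b in edits)


def donor_impute(recipient, to_impute, donors, edits):
    """Copy the fields in `to_impute` from the nearest donor that yields a passing record."""
    match = [j for j in range(len(recipient)) if j not in to_impute]
    # Assumed standardization: divide each matching field by its range among donors.
    scale = {}
    for j in match:
        vals = [d[j] for d in donors]
        spread = max(vals) - min(vals)
        scale[j] = spread if spread > 0 else 1

    def distance(donor):
        # Maximum standardized difference over the matching fields.
        return max((abs(donor[j] - recipient[j]) / scale[j] for j in match),
                   default=0)

    for donor in sorted(donors, key=distance):
        candidate = [donor[j] if j in to_impute else recipient[j]
                     for j in range(len(recipient))]
        if passes(edits, candidate):  # post-imputation edit check
            return candidate
    return None  # no acceptable donor found


# Fields: [employees, wages]; wages must lie between 5 and 20 per employee.
edits = [([-20, 1], 0),  # wages - 20*employees <= 0
         ([5, -1], 0)]   # 5*employees - wages <= 0
donors = [[11, 300], [9, 150], [40, 600]]
print(donor_impute([10, 0], {1}, donors, edits))  # [10, 150]
```

In the example the closest donor ([11, 300]) is rejected because copying its wages would break the upper edit for a 10-employee record, so the next donor at the same distance is used.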
The strengths of GEIS are its capacity to find minimum changes for any set of rules expressed as linear edits, and its automated donor imputation function driven by the edit rules. This imputation function runs with almost no intervention from the user, since it derives the matching fields by itself: it simply uses the response pattern, whatever it is, to look for a donor. The minimum change rule increases the chance of preserving relatively good data integrity given the data in error. The flexible estimator module of GEIS, the various diagnostic reports and the on-line tutorial, coupled with continuous user support, are among the desirable aspects of the system.
The foundation software of GEIS sometimes makes the system "too heavy" to run. Also, a user who has built his or her own edit system will in most cases want direct access to the imputation function; unfortunately, the current imputation function cannot be run independently of its edit function. GEIS only deals with numeric variables. In the editing process, it assumes that each variable takes non-negative values, which is not always true in practice, especially for financial surveys. Pre-processors have to be developed to overcome this problem.
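One possible pre-processor, sketched here as an assumption rather than a documented GEIS feature, is the standard linear-programming device of writing a signed variable as the difference of two non-negative parts, x_j = x_j+ - x_j-. The hypothetical helper `split_signed` appends, to every edit, a column with the negated coefficient for the new negative part and splits the reported value accordingly, so each edit evaluates to the same left-hand side as before.

```python
def split_signed(edits, record, j):
    """Replace signed field j by two non-negative fields with x_j = pos - neg.

    Hypothetical pre-processor: the positive part stays in column j and the
    negative part is appended as a new last column with coefficient -a_ij.
    """
    new_edits = [(list(coeffs) + [-coeffs[j]], b) for coeffs, b in edits]
    x = record[j]
    new_record = list(record)
    new_record[j] = max(x, 0)    # positive part
    new_record.append(max(-x, 0))  # negative part
    return new_edits, new_record


edits = [([1, 1], 100)]  # profit + other income <= 100
new_edits, new_record = split_signed(edits, [-20, 50], 0)
print(new_edits, new_record)  # [([1, 1, -1], 100)] [0, 50, 20]
```

After imputation, the signed value would be recovered as the difference of the two parts.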
The implementation offers the sub-functions or options required by a wide range of survey applications.
The implementation has a less complete set of options.
The implementation offers a partial functionality. Options are too restrictive or not generalized enough.
No stars are assigned when the functionality is not offered at all.
TYPE OF DATA
Imputation by estimators
Graphical user interface
Statistics Canada (1998). "Functional Description of the Generalized Edit and Imputation System". Statistics Canada Technical Report.