Evaluated by J.M. Fillion, Statistics Canada, 2002
SYSTEM INFORMATION
Full name: LEO - Generalized error localization
Version: 1
Year: 2002
Developer: Statistics Netherlands
DESCRIPTION
LEO is a prototype system developed by Statistics Netherlands to solve the error localization problem for mixtures of quantitative and qualitative variables. It was the basis for the development of CherryPie within the SLICE system (see the evaluation of CherryPie). It was developed in Delphi with a Microsoft Windows interface.
Editing rules identify the conditions that must be satisfied in order for a record to be a good record. If one or more rules are violated, the fields to be changed must be identified so that the rules can be satisfied. The edit rules are defined as follows:

IF (v_1, ..., v_m) ∈ V THEN a_1*x_1 + ... + a_n*x_n + b ≥ 0 (or = 0)

where
v_i : the ith categorical variable (i = 1, ..., m)
x_i : the ith numerical variable (i = 1, ..., n)
a_i, b : numerical constants
V : the set of combinations of categorical values defining the IF-condition
It is possible to define rules that are only qualitative, only quantitative, or a mixture of both. LEO tries to minimize the number of variables to be imputed. It also allows a weight to be assigned to each variable when the user wishes to influence the identification of the fields to be imputed.
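The mixed edit-rule form defined above can be illustrated with a small sketch. This is not LEO's actual interface; the record representation and the example rule are illustrative assumptions.

```python
# Minimal sketch of mixed edit rules of the form
#   IF (v_1, ..., v_m) in V THEN a_1*x_1 + ... + a_n*x_n + b >= 0.
# All names are illustrative; this is not LEO's actual code.

def satisfies(edit, cats, nums):
    """Check one edit rule against a record.

    edit: (V, coeffs, b) where V is the set of categorical value
          combinations that trigger the THEN-part, coeffs is the
          vector (a_1, ..., a_n), and b is the constant term.
    cats: tuple of categorical values (v_1, ..., v_m)
    nums: tuple of numerical values (x_1, ..., x_n)
    """
    V, coeffs, b = edit
    if cats not in V:  # IF-condition not met: rule is vacuously satisfied
        return True
    return sum(a * x for a, x in zip(coeffs, nums)) + b >= 0

# Example rule: IF activity = "retail" THEN turnover - costs >= 0
edit = ({("retail",)}, (1.0, -1.0), 0.0)
print(satisfies(edit, ("retail",), (100.0, 80.0)))      # True: 100 - 80 >= 0
print(satisfies(edit, ("retail",), (100.0, 120.0)))     # False: rule violated
print(satisfies(edit, ("transport",), (100.0, 120.0)))  # True: IF-part not triggered
```

A purely quantitative rule is obtained by letting V contain every categorical combination, and a purely qualitative rule by making the THEN-part trivially false (so the IF-part itself defines the inconsistent combinations).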
The main idea behind the algorithm is to build a binary tree (a branch-and-bound method) where at each step (or node) a variable is selected for analysis and the tree branches into two cases: (a) impute the variable or (b) do not impute it. When a variable is not imputed, it is fixed to its initial value in the set of edit rules, creating a new set of rules to be analyzed. When a variable must be imputed, it is eliminated from the rules using the Fourier-Motzkin technique.
If the rules become inconsistent at a given step, the algorithm backtracks to a previous node and resumes the analysis there.
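The elimination step used when a variable is imputed can be sketched for the numerical case. The following is a minimal Fourier-Motzkin elimination step, not LEO's implementation, which also handles equalities, categorical conditions, and redundancy removal.

```python
# Sketch of one Fourier-Motzkin elimination step on inequalities of the
# form a_1*x_1 + ... + a_n*x_n + b >= 0, each stored as (coeffs, b).
# Illustrative only; not LEO's actual code.

def eliminate(ineqs, j):
    """Eliminate variable x_j, returning an equivalent system without it."""
    pos = [q for q in ineqs if q[0][j] > 0]    # give lower bounds on x_j
    neg = [q for q in ineqs if q[0][j] < 0]    # give upper bounds on x_j
    rest = [q for q in ineqs if q[0][j] == 0]  # x_j does not occur
    new = []
    for (p, bp) in pos:
        for (n, bn) in neg:
            # The positive combination (-n_j)*P + p_j*N cancels x_j
            # while preserving the direction of both inequalities.
            lam, mu = -n[j], p[j]
            coeffs = tuple(lam * pi + mu * ni for pi, ni in zip(p, n))
            new.append((coeffs, lam * bp + mu * bn))
    return rest + new

# Example: x0 >= 2  and  5 - x0 - x1 >= 0; eliminating x0 leaves 3 - x1 >= 0
system = [((1.0, 0.0), -2.0), ((-1.0, -1.0), 5.0)]
print(eliminate(system, 0))  # [((0.0, -1.0), 3.0)]  i.e. x1 <= 3
```

An empty or contradictory resulting system (e.g. a rule with all-zero coefficients and a negative constant) is what signals the inconsistency that triggers backtracking.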
STRENGTHS
- The algorithm is not complex and is therefore easy to program. Furthermore, the approach allows the problem to be split into several components (or modules).
- Categorical and numerical variables can be processed simultaneously; the edit rules can include both types together.
- A large number of variables (more than 100) can be processed in one run.
- The system allows the processing of negative values.
- Several parameters can be defined by the user: the weights associated with the variables, a flag or value indicating missing data, etc.
- All solutions with minimum change are identified, and one of them is randomly selected for the imputation.
- The algorithm can easily be modified to add new components, such as the processing of integer values (already included in a newer version of LEO).
- The algorithm and the performance of LEO are well documented.
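The weighted minimum-change criterion and the random tie-break among optimal solutions mentioned above can be illustrated with a toy sketch. This uses brute-force enumeration over subsets rather than LEO's branch-and-bound, and the `feasible` check is a hard-coded stand-in for a real consistency test.

```python
import itertools
import random

# Toy illustration of the weighted minimum-change criterion: find all
# minimum-weight sets of fields whose modification can satisfy the edits,
# then pick one at random.  Names and weights are illustrative.
weights = {"turnover": 2.0, "costs": 1.0, "profit": 1.0}

def feasible(fields):
    # Stand-in for a real consistency check.  Here the record violates
    # turnover - costs = profit, and changing any one variable suffices.
    return len(fields) >= 1

def localize(variables, weights):
    """Return all minimum-weight feasible field sets and their weight."""
    best, best_w = [], float("inf")
    for r in range(1, len(variables) + 1):
        for combo in itertools.combinations(variables, r):
            if not feasible(combo):
                continue
            w = sum(weights[v] for v in combo)
            if w < best_w:
                best, best_w = [combo], w
            elif w == best_w:
                best.append(combo)
    return best, best_w

solutions, w = localize(list(weights), weights)
print(solutions, w)              # [('costs',), ('profit',)] 1.0 -- ties kept
print(random.choice(solutions))  # one optimal solution picked at random
```

Raising the weight of a variable (here "turnover") makes it less likely to be selected for imputation, which is how the user influences the identification of fields.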
WEAKNESSES
(Note that some of these weaknesses were resolved in CherryPie.)
- The maximum number of fields to be imputed must be kept low in order to keep the binary tree at a reasonable size; it is recommended not to impute more than five fields. This low limit is especially restrictive because it also counts missing values.
- No approximate solution is provided when no optimal solution is found.
- The binary tree must be visited entirely to guarantee that the solutions are optimal, even when the optimal solutions are found early in the tree.
- The standard verification rules are not very flexible: the IF-conditions can only include categorical variables, while the THEN-conditions can only include numerical variables.
- The number of implicit rules kept at each step may become very large, which can put LEO in an unstable state.
- It is not possible to specify a time limit for the processing of each record.
- In some cases precision problems may occur, especially when rules with high coefficients are combined.
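The last weakness above can be illustrated numerically: each Fourier-Motzkin combination multiplies coefficients together, so chains of eliminations on rules with large coefficients quickly reach magnitudes where double-precision arithmetic silently loses small terms.

```python
# Illustration of the precision weakness noted above: combining rules
# multiplies their coefficients, and at large magnitudes doubles can no
# longer represent every integer, so small terms are silently lost.
big = 1e6 * 1e6 * 1e6    # three large coefficients combined -> 1e18
print(big + 1.0 == big)  # True: adding 1 at this magnitude has no effect
```

A rule whose constant term is swallowed this way can make an inconsistent system look consistent (or vice versa), which is why high-coefficient rule combinations are flagged as a precision risk.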
FUNCTIONAL EVALUATION
LEGEND

***    The implementation offers subfunctions or options required by a wide range of survey applications.
**     The implementation has a less complete set of options.
*      The implementation offers partial functionality; options are too restrictive or not generalized enough.
(none) No stars are assigned when the functionality is not offered at all.

TYPE OF DATA
Quantitative data
Qualitative data           **

EDITING FUNCTIONS
Data verification          *
Online correction
Error localization
Minimum changes
User-defined changes
Outlier detection

IMPUTATION FUNCTIONS
Deterministic imputation
Donor imputation
Imputation by estimators
Multiple imputation

GENERAL FEATURES
Graphical user interface   ***
User-friendliness          **
Online help                *
Online tutorial
Documentation
Diagnostic reports
Integration
Reusable code
Portability
Flexibility
User support
Acquisition cost

REFERENCES
Quere, R. and De Waal, T. (2000). "Error Localization in Mixed Data Sets". Statistics Netherlands Technical Report.
De Waal, T. (2000). "An Optimality Proof of Statistics Netherlands' New Algorithm for Automatic Editing of Mixed Data". Statistics Netherlands Technical Report.