Journal on Policy & Complex Systems Volume 3, Issue 2 | Page 136

A Novel Evolutionary Algorithm
datasets, one would not expect that one model and possibly one feature would cover all the infested houses; thus, the majority of houses was used as a cutoff to allow for heterogeneity of statistical models. However, the user may select any percentage of infested houses for this threshold. In this case, 0.5 means that the feature was present in at least half of the archived conjunctive clauses that matched a given infested house. Taken together, the cutoffs employed in this manuscript indicate that the important features were in the majority of archived conjunctive clauses that matched the majority of infested houses. Univariate statistical analysis was performed on these important features using JMP v11.0.
Results
The median number of CCEA archived conjunctive clauses per repetition is 52,615, 67,036, and 53,996 for El Chaperno, El Carrizal, and the combined datasets, respectively. Figure 3 contains the accuracy and infested house coverage of all the conjunctive clauses identified using the CCEA for the El Chaperno (Figure 3A), El Carrizal (Figure 3B), and the combined datasets (Figure 3C). Each dot represents a conjunctive clause and is color-coded based on the order of the conjunctive clause. As expected, lower-order conjunctive clauses tend to have a higher coverage of infested houses, and higher-order conjunctive clauses tend to have higher accuracy. The CCEA identifies second-order conjunctive clauses (represented as red dots) with 100% accuracy when it is run on the El Chaperno and El Carrizal datasets individually (Figure 3A and B), but this is not the case for the combined dataset (Figure 3C).
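The trade-off between clause order, accuracy, and coverage can be illustrated with a minimal Python sketch. The definitions used here are assumptions, not taken from the text: accuracy is treated as the fraction of houses matched by a clause that are infested, and coverage as the fraction of infested houses that the clause matches. Clause terms are simplified to exact feature-value equality, and the houses, feature names, and function names are hypothetical toy examples.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data (not from the study): one dict of feature -> value per house.
houses: List[Dict[str, str]] = [
    {"wall": "adobe", "floor": "dirt"},
    {"wall": "adobe", "floor": "cement"},
    {"wall": "adobe", "floor": "cement"},
    {"wall": "block", "floor": "cement"},
]
infested: List[bool] = [True, True, False, False]


def matches(clause: Dict[str, str], house: Dict[str, str]) -> bool:
    """A conjunctive clause matches a house when every term holds (logical AND)."""
    return all(house.get(feature) == value for feature, value in clause.items())


def accuracy_and_coverage(
    clause: Dict[str, str],
    houses: List[Dict[str, str]],
    infested: List[bool],
) -> Tuple[float, float]:
    """Assumed definitions: accuracy = matched infested / all matched,
    coverage = matched infested / all infested."""
    matched = [i for i, house in enumerate(houses) if matches(clause, house)]
    matched_infested = sum(1 for i in matched if infested[i])
    total_infested = sum(infested)
    accuracy = matched_infested / len(matched) if matched else 0.0
    coverage = matched_infested / total_infested if total_infested else 0.0
    return accuracy, coverage


# A first-order clause (one feature) and a second-order clause (two features).
first_order = accuracy_and_coverage({"wall": "adobe"}, houses, infested)
second_order = accuracy_and_coverage({"wall": "adobe", "floor": "dirt"}, houses, infested)
```

In this toy data the first-order clause matches every infested house but also one uninfested house, whereas the second-order clause matches only an infested house but misses the other one, mirroring the qualitative pattern described for Figure 3.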
The heat maps for each feature plotted against infested houses show how often a feature was present in an archived conjunctive clause for a given infested house (Figure 4). Red denotes features that are important, while blue corresponds to those of less importance; white simply indicates missing data. Using the majority of the infested houses having a value greater than 0.5 as the threshold, the heat map indicates that 14 features are important for El Chaperno, 23 for El Carrizal, and 15 for the combined datasets. Table 3 shows the names of the individual features for each dataset. The features embedded in conjunctive clauses that were identified as important across all three datasets are associated with: (1) the primary source of income is not business, (2) the primary source of income is not salary, (3) the household owns their home, (4) older homes, (5) longer periods of residency in the house, (6) accumulation of objects, (7) unhygienic beds, (8) adobe walls, (9) deteriorated bedroom walls, (10) deteriorated walls in the rest of the house, and (11) dirt floors. Features not previously identified as significant (P > 0.05) using more traditional univariate methods at the combined town-level scale included those associated with primary source of income, home ownership, older homes, longer residency in the house, adobe as the predominant wall material, and dirt floors.
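The two-level cutoff behind the heat maps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy archive, and the treatment of houses with no matching clause (stored as `None` and counted as not meeting the cutoff) are all hypothetical. The per-house cutoff is applied as "at least 0.5", and the house-level cutoff as a strict majority.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy archive: the features in each archived clause and the
# indices of the infested houses that each clause matches.
features = ["adobe_walls", "dirt_floor", "owns_home"]
clause_features: List[Set[str]] = [{"adobe_walls"}, {"adobe_walls", "dirt_floor"}, {"owns_home"}]
clause_matches: List[Set[int]] = [{0, 1, 2}, {0}, {2}]
infested_houses = [0, 1, 2]


def feature_presence(
    clause_features: List[Set[str]],
    clause_matches: List[Set[int]],
    infested_houses: List[int],
    features: List[str],
) -> Dict[int, Dict[str, Optional[float]]]:
    """Per infested house: fraction of its matching archived clauses that contain each feature."""
    table: Dict[int, Dict[str, Optional[float]]] = {}
    for house in infested_houses:
        matching = [f for f, hit in zip(clause_features, clause_matches) if house in hit]
        if not matching:
            # Assumption: no matching clause means no value for this house.
            table[house] = {name: None for name in features}
            continue
        table[house] = {
            name: sum(name in clause for clause in matching) / len(matching)
            for name in features
        }
    return table


def important_features(
    table: Dict[int, Dict[str, Optional[float]]],
    features: List[str],
    clause_cutoff: float = 0.5,
    house_cutoff: float = 0.5,
) -> List[str]:
    """Keep features meeting clause_cutoff in more than house_cutoff of the infested houses."""
    n_houses = len(table)
    selected = []
    for name in features:
        hits = sum(
            1 for row in table.values()
            if row[name] is not None and row[name] >= clause_cutoff
        )
        if n_houses and hits / n_houses > house_cutoff:
            selected.append(name)
    return selected


table = feature_presence(clause_features, clause_matches, infested_houses, features)
selected = important_features(table, features)
```

Both cutoffs are exposed as parameters because the text notes that the user may select any percentage of infested houses for the threshold.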
Discussion

The CCEA was able to identify important sets of features across a range of conjunctive clause orders. While many of the archived conjunctive clauses may be considered noisy or overfitting, some interesting patterns emerge when the dataset is examined as a whole. Most importantly, the CCEA identifies socioeconomic features (i.e., source of income and home ownership) as important risk factors across all three datasets, while these same features are not found to be significant when using traditional statistics. While low socioeco-
