Journal on Policy & Complex Systems Volume 3, Issue 2 | Page 132

A Novel Evolutionary Algorithm
Table 1. Summary of the dataset characteristics for El Chaperno, El Carrizal, and the two towns combined
sets have imbalanced outputs, in the sense that infested houses are in the minority (Table 1, column 3).
Combinations in Datasets
While the number of households in a dataset may be small by some standards (e.g., 129–311), the number of features makes these datasets too large to examine every combination of multivariate interactions by exhaustive search. For instance, assuming every feature in a dataset has the same number of categorical responses, the number of potential feature-value combinations for a given order of interaction can be calculated as

$$\binom{L}{O} \, v^{O},$$

where $v$ is the number of values per feature, $O$ is the order of interaction, and $L$ is the number of features in the dataset. The first factor, $\binom{L}{O}$, is the total number of feature combinations for the given order of interaction $O$, and $v^{O}$ is the total number of combinations of feature values for the selected features. If we take a hypothetical dataset with 50 nominal features, each with five categorical values, and limit each model to one category per feature, then the number of models with two features is $3.06 \times 10^{4}$, with three features $2.45 \times 10^{6}$, and with four features $1.44 \times 10^{8}$. It should be noted that models that do not allow a range of values for ordinal features could bias the results against ordinal features. Therefore, when we test models that have
Table 2. The number of possible models for second- to fifth-order feature interactions for the El Chaperno, El Carrizal, and the combined datasets
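
The counts quoted for the hypothetical dataset (50 nominal features, five values each, one value per feature in each model) can be checked with a few lines of Python. This is only an illustrative sketch of the counting argument, not the authors' code; the function name `n_models` and its parameter names are assumptions introduced here.

```python
from math import comb


def n_models(num_features: int, values_per_feature: int, order: int) -> int:
    """Count candidate models for one order of interaction.

    Assumes every feature has the same number of values and that a model
    uses exactly one value for each of its `order` selected features.
    """
    feature_subsets = comb(num_features, order)      # ways to pick the features
    value_choices = values_per_feature ** order      # one value per picked feature
    return feature_subsets * value_choices


# Hypothetical dataset from the text: 50 features, 5 values per feature.
for order in (2, 3, 4):
    print(f"order {order}: {n_models(50, 5, order):.2e} models")
```

For orders two, three, and four this gives 30,625, 2,450,000, and 143,937,500 models, matching the rounded figures in the text. The count grows quickly with the order of interaction, which is why exhaustive search becomes impractical.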