Rule Extraction from Support Vector Machines: A Geometric Approach

Doctoral thesis · English · Open access
Ren, L. (2008)
  • Subject: T

Despite the success of connectionist systems in prediction and classification problems, critics argue that the lack of symbol processing and explanation capability makes them less competitive than symbolic systems. Rule extraction from neural networks makes it possible to interpret the behaviour of connectionist networks by relating sub-symbolic and symbolic processing. However, most rule extraction methods focus only on specific neural network architectures and show limited generalization performance. The Support Vector Machine is a supervised learning method that has recently been applied successfully in many areas and offers excellent generalization ability in comparison with other neural network, statistical, or symbolic machine learning models. In this thesis, an algorithm called Geometric and Oracle-Based Support Vector Machines Rule Extraction (GOSE) is proposed to overcome the limitations of other rule-extraction methods by extracting comprehensible models from Support Vector Machines (SVM). The algorithm views extraction as a geometric task: given a trained SVM network, GOSE queries synthetic instances and derives conjunctive rules by approximating an optimization problem. The extracted rule set thus approximates the SVM classification boundary. Unlike previous work on SVM rule extraction, GOSE is broadly applicable to different networks and problems because it relies on neither the training examples nor the network architecture. A theoretical proof guarantees that GOSE is capable of approximating the behaviour of SVM networks. Empirical experiments are conducted on different SVM networks, from binary to multi-class, in various classification domains. The experimental results demonstrate that GOSE can extract comprehensible rules with high levels of accuracy and fidelity with respect to their corresponding networks, and that GOSE exhibits superior consistency. After several optimization measures were analyzed and applied, the complexity of GOSE was also improved. In brief, GOSE provides a novel way to explain how an SVM network functions.
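By way of illustration only, the sketch below shows the general oracle-based idea the abstract describes: a trained SVM is treated as a black-box oracle, queried on synthetic instances, and each predicted class region is summarised as a single conjunctive interval rule. It is not the GOSE algorithm itself; scikit-learn's SVC, the Iris data set (one of the benchmark domains listed in the appendix), uniform sampling, and the one-box-per-class simplification are assumptions made purely for this example.

    # Minimal sketch of the oracle-based idea (not the thesis's GOSE algorithm):
    # treat a trained SVM as a black-box oracle, query it on synthetic points,
    # and summarise each class as a conjunctive interval (axis-aligned box) rule.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    # Train an SVM to act as the oracle (Iris is one of the thesis benchmarks).
    X, y = load_iris(return_X_y=True)
    oracle = SVC(kernel="rbf", gamma="scale").fit(X, y)

    # Query synthetic instances drawn uniformly over the input domain, so no
    # original training examples are needed at extraction time.
    lo, hi = X.min(axis=0), X.max(axis=0)
    synthetic = rng.uniform(lo, hi, size=(5000, X.shape[1]))
    labels = oracle.predict(synthetic)

    # Approximate each predicted class region with one interval rule:
    # IF x1 in [a1,b1] AND ... AND xd in [ad,bd] THEN class c.
    for c in np.unique(labels):
        region = synthetic[labels == c]
        bounds = zip(region.min(axis=0).round(2), region.max(axis=0).round(2))
        conds = " AND ".join(f"x{i+1} in [{a}, {b}]" for i, (a, b) in enumerate(bounds))
        print(f"IF {conds} THEN class {c}")

The single bounding box per class is a deliberately crude stand-in: GOSE instead derives its rules by approximating an optimization problem over the SVM decision boundary and then prunes them (cf. the RULE PRUNING function in the list of figures), so the sketch should be read only as a picture of the query-the-oracle workflow.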
  • References (154)

    Table of contents (excerpt from the full-text preview):
      7.1 Contribution
      7.2 Limitations and Future Work
      A Rule Sets
        A.0.1 Monk's Problem
        A.0.2 Iris Plant Problem
        A.0.3 Breast Cancer Problem

    List of figures (excerpt from the full-text preview):
      2.1 Boundary of two dichotomies
      2.2 The example of VC dimension
      2.3 The bound of risk
      2.4 Two-class linearly separable problem
      2.5 Two-class linearly non-separable problem
      2.6 Mapping from original to feature space
      2.7 The hyperplane of the XOR problem in feature space
      2.8 Relations of the two Lagrange multipliers α1 and α2 [40]
      2.9 Using a rooted binary DAG to decide the best class within four classes [39]
      2.10 Hierarchical clustering
      2.11 The mean of different N samples converges to the integral
      3.1 Two-class classification
      3.2 Rules extracted from data groups
      3.3 Rule extraction system
      3.4 A unit of an ANN
      3.5 Rules extracted from a unit in Figure 3.4
      3.6 Simple example of the VIA algorithm in the forward phase
      3.7 Rules extracted from a unit (Fig. 3.6)
      3.8 a) Equation rule b) Interval rule [59]
      3.10 A two-dimensional example of obtaining the cross points for the initial phase of RulExSVM [58]
      4.17 RULE PRUNING function
      5.1 The accuracy of Monk-1 under different data sizes compared with that of SVM
      5.2 The accuracy of Monk-2 under different data sizes compared with that of SVM
