<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>12 Decision Trees</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisch13.xml" >next</a>] [<a 
href="#tailthesisch12.xml">tail</a>] [<a 
href="thesispa3.xml#thesisch12.xml" >up</a>] </p></div>
   <h2 class="chapterHead"><span class="titlemark">Chapter&#x00A0;12</span><br /><a 
  name="x73-9900012"></a>Decision Trees</h2>
<!--l. 4402--><p class="noindent">Are sample complexity bounds quantitatively tight? The real challenge lies with
continuous-valued classifiers, which are addressed in the next chapter. Before turning
to them, it is worthwhile to consider performance on a discrete-valued classifier. To
do this, we apply a (discrete-valued) decision tree to learning problems from the
UCI machine learning repository.
</p><!--l. 4408--><p class="indent">   The results of this analysis are interesting: competitive bounds are achieved in some
cases, and the best sample complexity bounds are never more than an order of
magnitude worse than a reasonable holdout-based approach using the same
resources.
</p><!--l. 4412--><p class="indent">   Most practitioners of applied machine learning currently use a different technique,
holdout sets, for free parameter optimization and error estimation. The simplest form of
this technique is to separate the examples into a &#x201C;training set&#x201D; and a &#x201C;test set&#x201D;. The
training set is used by the learning algorithm to output a hypothesis. The hypothesis is
then evaluated on the test set to estimate its future error rate. The principal advantage
of the holdout technique is the (simple theoretical) guarantee that, with high
probability, the estimate will not be much higher or lower than the true error
rate. Here, how large &#x201C;much&#x201D; is depends on the size of the holdout set, and
the &#x201C;high probability&#x201D; can be chosen as desired by the person applying the
bound.
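</p>
<p class="indent">   The following Python sketch shows how the size of the holdout set controls &#x201C;much&#x201D;. It computes an upper bound on the true error rate from <code>k</code> errors on <code>m</code> held-out examples by inverting the binomial tail. It returns, up to a small tolerance, the largest error rate <code>p</code> under which observing <code>k</code> or fewer errors still has probability at least &#x03B4;. This assumes the holdout bound is stated as a binomial tail inversion. The function names are illustrative and not taken from the thesis. The direct summation is only intended for holdout sets of modest size.</p>

```python
import math


def binomial_cdf(k: int, m: int, p: float) -> float:
    """Probability of k or fewer errors among m independent examples,
    each misclassified with probability p (direct summation, so only
    suitable for modest m)."""
    return sum(math.comb(m, i) * p ** i * (1 - p) ** (m - i) for i in range(k + 1))


def holdout_upper_bound(k: int, m: int, delta: float, tol: float = 1e-9) -> float:
    """Binomial tail inversion (an assumed form of the holdout bound):
    the largest true error rate p for which observing k or fewer errors
    on m held-out examples still has probability at least delta.

    Bisection works because binomial_cdf is decreasing in p; starting the
    search at the observed rate k/m assumes delta is at most 1/2.
    """
    lo, hi = k / m, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binomial_cdf(k, m, mid) >= delta:
            lo = mid  # error rate mid is still consistent with the observation
        else:
            hi = mid  # error rate mid would make the observation too unlikely
    return hi  # upper end of the final interval, i.e. the conservative side
```

<p class="indent">   Consider no errors on 100 held-out examples with &#x03B4; = 0.05. Then <code>holdout_upper_bound(0, 100, 0.05)</code> is about 0.0295, which equals 1 - 0.05<sup>1/100</sup>. Larger holdout sets shrink the gap between the observed and the bounded error rate, at the cost of fewer training examples.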
</p><!--l. 4423--><p class="indent">   The principal disadvantage of the holdout approach is that not all of the examples
are used for training. This is often not a significant problem, but it <span 
class="ecti-1000">can </span>be important for
certain learning algorithms and for problems that exhibit phase transitions (see figure
<a 
href="thesisse47.xml#x65-910011">10.4.1<!--tex4ht:ref: fig-pv-results --></a> for an example). Near a phase transition, extra examples can
exponentially decrease the expected error rate of the output hypothesis. More
sophisticated techniques such as leave-one-out cross validation, K-fold cross
validation, and progressive validation <span class="cite">[<a 
href="thesisli2.xml#XProgressive"><span 
class="ecbx-1000">3</span></a>]</span> attempt to remove this disadvantage. None of
these alternative holdout techniques fully removes the disadvantage, and
some of them weaken the advantage (a tight bound on the true error). In
addition, they introduce a new disadvantage: significantly more computation is
required.
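</p>
<p class="indent">   A minimal sketch of K-fold cross validation makes the extra computation visible. The examples are split into K folds, and each fold is held out once while the learner is trained on the rest. The learning algorithm therefore runs K times rather than once; setting K to the number of examples gives leave-one-out cross validation. The <code>train</code> and <code>mistakes</code> callables are hypothetical placeholders for an arbitrary learning algorithm and error count. The round-robin split is just one simple choice.</p>

```python
from typing import Callable, List, Sequence, TypeVar

E = TypeVar("E")  # an example, e.g. a (features, label) pair
H = TypeVar("H")  # a hypothesis returned by the learner


def k_fold_error(examples: Sequence[E], k: int,
                 train: Callable[[List[E]], H],
                 mistakes: Callable[[H, List[E]], int]) -> float:
    """K-fold cross validation estimate of the error rate.

    Every example is held out exactly once, so the estimate is the total
    number of held-out mistakes divided by the number of examples.
    `train` is called k times (k = len(examples) is leave-one-out).
    """
    n = len(examples)
    if 2 > k or k > n:
        raise ValueError("k must be between 2 and the number of examples")
    folds = [list(examples[i::k]) for i in range(k)]  # round-robin split
    total_mistakes = 0
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        hypothesis = train(training)  # one full training run per fold
        total_mistakes += mistakes(hypothesis, held_out)
    return total_mistakes / n
```

<p class="indent">   As an example, take a learner that predicts the majority training label, applied to ten examples of which two carry the minority label. Then <code>k_fold_error(data, 5, train, mistakes)</code> returns 0.2 after five calls to <code>train</code>.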
</p><!--l. 4436--><p class="indent">   The holdout technique is fundamentally unsatisfactory for one important reason: if a
holdout set is used multiple times, the theoretical guarantees become progressively
weaker. In particular, a learning algorithm that makes multiple internal decisions
based on a future error bound evaluated on a holdout set may require many
examples. In fact, a &#x201C;multiply used holdout set&#x201D; is described by the same
mathematics as a sample complexity bound computed on the training set.
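</p>
<p class="indent">   One simple way to see the weakening is through a union bound. Suppose the same m holdout examples are used to evaluate H hypotheses, and the bound must hold for all of them at once. Then each individual evaluation must be made at confidence &#x03B4;/H. The sketch below uses Hoeffding&#x2019;s inequality for brevity, rather than the tighter binomial tail, and its function name is hypothetical. With H equal to the size of a finite hypothesis class, it has the same form as a Hoeffding-based bound for that class computed on a training set.</p>

```python
import math


def reused_holdout_bound(empirical_error: float, m: int, delta: float,
                         num_uses: int) -> float:
    """Hoeffding-based upper bound on the true error, valid simultaneously
    for num_uses hypotheses evaluated on the same m holdout examples.

    The union bound spends delta / num_uses on each evaluation, so the
    slack term grows like sqrt(ln(num_uses / delta) / (2 m)).
    (Hoeffding is an assumption made for brevity; it is looser than the
    binomial tail.)
    """
    per_use_delta = delta / num_uses
    slack = math.sqrt(math.log(1 / per_use_delta) / (2 * m))
    return empirical_error + slack


# Same holdout set (m = 1000, observed error 0.1), reused more and more often.
for uses in (1, 10, 1000, 10 ** 6):
    print(uses, round(reused_holdout_bound(0.1, 1000, 0.05, uses), 4))
```

<p class="indent">   Take an observed error of 0.1 on 1000 held-out examples with &#x03B4; = 0.05. The slack grows from about 0.039 for a single use to about 0.092 after a million uses. The guarantee therefore degrades logarithmically in the number of uses.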
</p><!--l. 4444--><p class="indent">   We will compare the following bounds on a decision tree:
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x73-99002x1"></a>The discrete hypothesis bound  <a 
href="thesisse16.xml#x23-32001r1">4.2.1<!--tex4ht:ref: th-DHSCP --></a>.
           </li>
        <li class="enumerate"><a 
  name="x73-99004x2"></a>The Occam&#x2019;s Razor bound  <a 
href="thesisse20.xml#x27-36001r1">4.6.1<!--tex4ht:ref: th-ORB --></a> with the Hoeffding bound on the binomial tail.
        This bound can be thought of as the older &#x201C;state of the art&#x201D; bound.
           </li>
        <li class="enumerate"><a 
  name="x73-99006x3"></a>The Microchoice bound  <a 
href="thesisse22.xml#x32-40012r2">5.2.2<!--tex4ht:ref: th-smb --></a>. In addition, we will prune the decision tree
        according to the microchoice bound.
           </li>
        <li class="enumerate"><a 
  name="x73-99008x4"></a>The Shell bound  <a 
href="thesisse34.xml#x50-75001r2">8.1.2<!--tex4ht:ref: Observable --></a> and the Sampled Shell bound  <a 
href="thesisse35.xml#x51-76001r1">8.2.1<!--tex4ht:ref: th-sample_shell --></a>.
           </li>
        <li class="enumerate"><a 
  name="x73-99010x5"></a>A simple holdout bound <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a> using a holdout set containing <!--l. 4453--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><mn>20</mn><mo>%</mo></mrow></math> of the examples.
           </li>
        <li class="enumerate"><a 
  name="x73-99012x6"></a>A combined training and testing bound using the holdout bound  <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>
        and the microchoice bound  <a 
href="thesisse22.xml#x32-40012r2">5.2.2<!--tex4ht:ref: th-smb --></a>.
           </li>
        <li class="enumerate"><a 
  name="x73-99014x7"></a>A combined training and testing bound using the holdout bound  <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>
        and the shell bound  <a 
href="thesisse34.xml#x50-75001r2">8.1.2<!--tex4ht:ref: Observable --></a>, in its sampled (stochastic) form  <a 
href="thesisse35.xml#x51-76001r1">8.2.1<!--tex4ht:ref: th-sample_shell --></a>.</li></ol>
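<p class="indent">   Here is item 2 made concrete. Under the standard Hoeffding relaxation, the Occam&#x2019;s Razor bound states the following. With probability at least 1 - &#x03B4; over m training examples, every hypothesis h with prior weight P(h) has true error at most its training error plus sqrt((ln(1/P(h)) + ln(1/&#x03B4;))/(2m)). The Python sketch below assumes that form. As a hypothetical choice of prior, it weights a tree by P(h) = 2<sup>-b</sup>, where b is the length in bits of the tree&#x2019;s description under some prefix-free code. That encoding is an assumption for illustration, not necessarily the one used in the experiments.</p>

```python
import math


def occam_hoeffding_bound(training_error: float, m: int,
                          description_bits: float, delta: float) -> float:
    """Occam's Razor bound with the Hoeffding relaxation of the binomial tail.

    Assumes a hypothetical prior P(h) = 2 ** (-description_bits), i.e. the
    tree is encoded with a prefix-free code of the given length.  With
    probability at least 1 - delta over the m training examples, every
    hypothesis h has true error at most
        training_error(h) + sqrt((ln(1/P(h)) + ln(1/delta)) / (2 m)).
    """
    log_inv_prior = description_bits * math.log(2)  # ln(1/P(h))
    slack = math.sqrt((log_inv_prior + math.log(1 / delta)) / (2 * m))
    return min(1.0, training_error + slack)  # an error rate never exceeds 1
```

<p class="indent">   Consider a tree described in 200 bits with a training error of 0.05 on 1000 examples. At &#x03B4; = 0.05 it gets a bound of about 0.32. The square-root term, which grows with the description length, dominates the training error.</p>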
<!--l. 4458--><p class="nopar"> We first discuss the implementation of the decision tree learner and of the bound
calculations, and then present the results.
</p>
   <div class="sectionTOCS"><span class="sectionToc">&#x00A0;12.1.&#x00A0;&#x00A0;<a 
href="thesisse53.xml#x74-10000012.1" name="QQ2-74-110">The Decision Tree Learning Algorithm</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.1.1.&#x00A0;&#x00A0;<a 
href="thesisse53.xml#x74-10100012.1.1" name="QQ2-74-112">Pruning</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.1.2.&#x00A0;&#x00A0;<a 
href="thesisse53.xml#x74-10200012.1.2" name="QQ2-74-113">Uniform Sampling
from Decision Trees</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.1.3.&#x00A0;&#x00A0;<a 
href="thesisse53.xml#x74-10300012.1.3" name="QQ2-74-114">Fast Sampling</a></span><br /><span class="sectionToc">&#x00A0;12.2.&#x00A0;&#x00A0;<a 
href="thesisse54.xml#x75-10400012.2" name="QQ2-75-115">Bound Application Details</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.2.1.&#x00A0;&#x00A0;<a 
href="thesisse54.xml#x75-10500012.2.1" name="QQ2-75-116">Structural
Risk Minimization</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.2.2.&#x00A0;&#x00A0;<a 
href="thesisse54.xml#x75-10600012.2.2" name="QQ2-75-117">Computation</a></span><br /><span class="sectionToc">&#x00A0;12.3.&#x00A0;&#x00A0;<a 
href="thesisse55.xml#x76-10700012.3" name="QQ2-76-118">Results &#x0026; Discussion</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.3.1.&#x00A0;&#x00A0;<a 
href="thesisse55.xml#x76-10800012.3.1" name="QQ2-76-119">Holdout
bound</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.3.2.&#x00A0;&#x00A0;<a 
href="thesisse55.xml#x76-10900012.3.2" name="QQ2-76-121">Comparison with a standard confidence interval approach</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.3.3.&#x00A0;&#x00A0;<a 
href="thesisse55.xml#x76-11000012.3.3" name="QQ2-76-123">Comparison
with point estimators</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.3.4.&#x00A0;&#x00A0;<a 
href="thesisse55.xml#x76-11100012.3.4" name="QQ2-76-124">Simplistic bounds vs. the Holdout bound</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.3.5.&#x00A0;&#x00A0;<a 
href="thesisse55.xml#x76-11200012.3.5" name="QQ2-76-126">Occam vs.
the Holdout bound</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.3.6.&#x00A0;&#x00A0;<a 
href="thesisse55.xml#x76-11300012.3.6" name="QQ2-76-128">Microchoice</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.3.7.&#x00A0;&#x00A0;<a 
href="thesisse55.xml#x76-11400012.3.7" name="QQ2-76-130">Shell Bound</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.3.8.&#x00A0;&#x00A0;<a 
href="thesisse55.xml#x76-11500012.3.8" name="QQ2-76-132">Combined Microchoice
and holdout bound </a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;12.3.9.&#x00A0;&#x00A0;<a 
href="thesisse55.xml#x76-11600012.3.9" name="QQ2-76-134">Combined Shell and Holdout Bound </a></span><br /><span class="sectionToc">&#x00A0;12.4.&#x00A0;&#x00A0;<a 
href="thesisse56.xml#x77-11700012.4" name="QQ2-77-136">Discussion</a></span><br />
   </div>
   <a name="tailthesisch12.xml"></a>
</body> 
</html> 
