<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>12.3 Results &#x0026; Discussion</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <h3 class="sectionHead"><span class="titlemark">12.3. </span> <a 
  name="x76-10700012.3"></a>Results &#x0026; Discussion</h3>
   <h4 class="subsectionHead"><span class="titlemark">12.3.1. </span> <a 
  name="x76-10800012.3.1"></a>Holdout bound</h4>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x76-1080011"></a>
<!--l. 4643--><p class="indent">
                                                                     

                                                                     
</p><!--l. 4643--><p class="noindent"><img 
src="thesis22x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis22x.gif" src="test.ps"  
-->
<br />  </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;12.3.1:  </td><td  
class="content">This is a graph of the upper bound on the true error given by
the holdout bound (<a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>). In this figure (and all others) we use a <!--l. 4650--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>&#x03B4;</mi> <mo 
class="MathClass-rel">=</mo> <mn>0</mn><mo 
class="MathClass-punc">.</mo><mn>1</mn></mrow></math>
probability of failure for the tail. Here &#x201C;error&#x201D; is the test error. The red lines can be
interpreted as the region in which the true error rate lies, given that we are in
the high-probability case. </td></tr></table></div><!--tex4ht:label?: x76-1080011 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" /> Our goal is to bound the true error of the hypothesis output by our learning algorithm.
To do this, we apply sample complexity bounds to the results of the decision tree on UCI
database problems. The problems chosen from the UCI database are those for which a
discrete decision tree is applicable. All bounds are calculated with a probability of failure of <!--l. 4656--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>&#x03B4;</mi> <mo 
class="MathClass-rel">=</mo> <mn>0</mn><mo 
class="MathClass-punc">.</mo><mn>1</mn></mrow></math>.
As mentioned in the introduction, there are two approaches to bounding the true error. The
commonly used approach is to first divide the example set into two sets, <!--l. 4658--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></math> and <!--l. 4658--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">train</mtext><!--/mstyle--></mrow></msub 
></mrow></math>. Then, train using the
examples in <!--l. 4659--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">train</mtext><!--/mstyle--></mrow></msub 
></mrow></math> and test
on the <!--l. 4660--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></math> examples.
We chose an <!--l. 4660--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mn>8</mn><mn>0</mn><mo 
class="MathClass-bin">/</mo><mn>2</mn><mn>0</mn></mrow></math> split
of the data into <!--l. 4661--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">train</mtext><!--/mstyle--></mrow></msub 
><mo 
class="MathClass-bin">/</mo><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></math>.
We will compare each bound with the simple holdout approach because this is the
commonly used baseline.
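<p class="indent">   As a rough illustration of how a holdout bound of this kind might be computed, the following Python sketch inverts the Binomial tail by bisection. It assumes that the holdout bound takes the usual form: the largest true error rate under which observing at most k errors on the m test examples still has probability at least &#x03B4;. The function names and the tolerance are hypothetical and are not taken from the thesis.</p>

```python
import math


def binomial_cdf(m, k, p):
    """Probability of at most k errors in m independent trials with error rate p."""
    # Direct summation: adequate for small m; very large m would need log-space.
    return sum(math.comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(k + 1))


def holdout_upper_bound(m, k, delta, tol=1e-9):
    """Largest p for which seeing k or fewer errors still has probability >= delta.

    Assumed form of the holdout bound; m is the number of holdout examples and
    k the number of holdout errors.
    """
    if k >= m:
        return 1.0
    lo, hi = k / m, 1.0
    # binomial_cdf is decreasing in p, so bisection finds the crossing point.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binomial_cdf(m, k, mid) >= delta:
            lo = mid  # mid is still consistent with the observation
        else:
            hi = mid
    return hi  # return the conservative end of the bracket


if __name__ == "__main__":
    print(holdout_upper_bound(20, 0, 0.1))
```

<p class="indent">   With no errors on 20 holdout examples and &#x03B4; = 0.1, this sketch returns approximately 0.109, which agrees with the closed form 1 &#x2212; &#x03B4;^(1/m) that holds in the zero-error case.</p>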
   <h4 class="subsectionHead"><span class="titlemark">12.3.2. </span> <a 
  name="x76-10900012.3.2"></a>Comparison with a standard confidence interval approach</h4>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x76-1090012"></a>
<!--l. 4668--><p class="indent">
                                                                     

                                                                     
</p><!--l. 4668--><p class="noindent"><img 
src="thesis23x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis23x.gif" src="test_vs_two_sigma.ps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;12.3.2: </td><td  
class="content"><a 
  name="x76-1090012"></a> This is a graph of the confidence intervals implied by the holdout
bound ( <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>) on the left, and the approximate confidence intervals implied using
the common two sigma rule motivated by asymptotic normality on the right. For
this graph only, the upper and lower bounds of the holdout bound have a maximum <!--l. 4679--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>2</mn><mo 
class="MathClass-punc">.</mo><mn>5</mn><mi 
>%</mi></mrow></math>
failure rate (each), rather than a <!--l. 4679--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mn>1</mn><mn>0</mn><mi 
>%</mi></mrow></math>
failure rate. This is done in order to make the results more comparable with the <!--l. 4679--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>2</mn></mrow></math>-sigma
approach. The holdout bound is better behaved in the sense
that the confidence interval is confined to the interval <!--l. 4679--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mrow><mo 
class="MathClass-open">[</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow></mrow></math>
and it is never over-optimistic. </td></tr></table></div><!--tex4ht:label?: x76-1090012 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" /> When attempting to calculate a confidence interval on the true error rate given the
holdout set, many people follow a standard statistical prescription:
           <ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x76-109003x1"></a>Calculate the empirical mean <!--l. 4685--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mover 
accent="true"><mrow 
><mi 
>&#x03BC;</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover> <mo 
class="MathClass-rel">=</mo><msub><mrow 
> <mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><msub><mrow 
><mi 
>S</mi></mrow><mrow 
>
<!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo>  <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>m</mi></mrow></mfrac><msubsup><mrow 
> <mo 
class="MathClass-op">&#x2211;</mo>
   </mrow><mrow 
><mi 
>i</mi><mo 
class="MathClass-rel">=</mo><mn>1</mn></mrow><mrow 
><mi 
>m</mi></mrow></msubsup 
><mi 
>I</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>x</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2260;</mo><msub><mrow 
><mi 
>y</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>.
           </li>
        <li class="enumerate"><a 
  name="x76-109005x2"></a>Calculate the empirical variance <!--l. 4686--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><msup><mrow 
><mover 
accent="true"><mrow 
><mi 
>&#x03C3;</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mn>2</mn></mrow></msup 
> <mo 
class="MathClass-rel">=</mo>   <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>m</mi><mo 
class="MathClass-bin">&#x2212;</mo><mn>1</mn></mrow></mfrac><msubsup><mrow 
> <mo 
class="MathClass-op">&#x2211;</mo>
     </mrow><mrow 
><mi 
>i</mi><mo 
class="MathClass-rel">=</mo><mn>1</mn></mrow><mrow 
><mi 
>m</mi></mrow></msubsup 
><msup><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>I</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>x</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2260;</mo><msub><mrow 
><mi 
>y</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-bin">&#x2212;</mo><mover 
accent="true"><mrow 
><mi 
>&#x03BC;</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mrow 
><mn>2</mn></mrow></msup 
></mrow></math>.
           </li>
        <li class="enumerate"><a 
  name="x76-109007x3"></a>Pretend that the distribution is normal with the above parameters
        and construct a confidence interval by cutting off the tails of the Gaussian
        distribution.</li></ol>
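<p class="indent">   The three steps above can be sketched in Python as follows. This is only an illustration: it assumes that the interval is the empirical mean plus or minus two standard errors (the common two sigma rule), and the function name is hypothetical.</p>

```python
import math


def two_sigma_interval(errors):
    """Approximate interval from the two sigma rule.

    errors is a list of 0/1 indicators I(h(x_i) != y_i) on the holdout set.
    """
    m = len(errors)
    # Step 1: empirical mean of the error indicators.
    mu = sum(errors) / m
    # Step 2: empirical variance with the 1/(m - 1) normalisation.
    var = sum((e - mu) ** 2 for e in errors) / (m - 1)
    # Step 3: treat the mean as Gaussian and take two standard errors each side.
    half_width = 2.0 * math.sqrt(var / m)
    return mu - half_width, mu + half_width


if __name__ == "__main__":
    print(two_sigma_interval([0] * 20))        # no holdout errors
    print(two_sigma_interval([1] + [0] * 19))  # one holdout error
```

<p class="indent">   With no errors on 20 holdout examples the sketch returns an interval of width zero, and with a single error the lower end is roughly &#x2212;0.05, which lies outside the range of possible error rates.</p>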
<!--l. 4689--><p class="nopar"> This approach is motivated by the fact that for any <span 
class="ecti-1000">fixed </span>true error rate, the
distribution of empirical errors will behave like a Gaussian <span 
class="ecti-1000">asymptotically. </span>Here,
asymptotically means &#x201C;in the limit as the number of test examples goes to infinity&#x201D;.
</p><!--l. 4695--><p class="indent">   The problem with this approach is that it leads to fundamentally misleading results.
In particular, figure <a 
href="#x76-1090012">12.3.2<!--tex4ht:ref: fig-two-sigma --></a> shows that the confidence interval is not confined to the interval <!--l. 4697--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mrow><mo 
class="MathClass-open">[</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow></mrow></math>.
It is difficult to give an interpretation to intervals with boundaries less than <!--l. 4698--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>0</mn></mrow></math> or greater
than <!--l. 4698--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mn>1</mn></mrow></math>.
In addition, this approach is sometimes highly overoptimistic. When the test error is <!--l. 4700--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>0</mn></mrow></math>, our confidence interval
should <span 
class="ecti-1000">not </span>have size <!--l. 4700--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mn>0</mn></mrow></math>
for any finite <!--l. 4701--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>m</mi></mrow></math>.
</p><!--l. 4703--><p class="indent">   In contrast, the holdout bound approach uses the underlying Binomial distribution
directly. This implies:
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x76-109009x1"></a>The holdout bound approach is <span 
class="ecti-1000">never </span>over-optimistic.
           </li>
        <li class="enumerate"><a 
  name="x76-109011x2"></a>The holdout bound based confidence interval always returns an upper and
        lower bound in <!--l. 4709--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mrow><mo 
class="MathClass-open">[</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow></mrow></math>.
           </li>
        <li class="enumerate"><a 
  name="x76-109013x3"></a>The holdout bound approach is more accurate.</li></ol>
<!--l. 4711--><p class="nopar"> The bootstrap <span class="cite">[<a 
href="thesisli2.xml#XET"><span 
class="ecbx-1000">15</span></a>]</span> is sometimes used to construct confidence intervals. The assumption under
which this works is essentially equivalent to an assumption of &#x201C;enough&#x201D; data. For finite
amounts of data, the bootstrap &#x201C;confidence intervals&#x201D; will necessarily be violated on
datasets with phase transitions such as the one in figure <a 
href="thesisse47.xml#x65-910011">10.4.1<!--tex4ht:ref: fig-pv-results --></a>. This is discussed further in the next
section.
</p>
   <h4 class="subsectionHead"><span class="titlemark">12.3.3. </span> <a 
  name="x76-11000012.3.3"></a>Comparison with point estimators</h4>
<!--l. 4721--><p class="noindent">Point estimators are techniques for directly estimating the value of the true error. In
                                                                     

                                                                     
theory, there should be no need to compare point estimators with confidence interval
bounds such as those discussed here because the goals are simply different: point
estimators attempt to estimate the value of the true error while confidence intervals
confine the value of the true error to an interval with high probability. However, point
estimators are often <span 
class="ecti-1000">used </span>for more than estimating true error. It is a common practice to
use point estimators in deciding which of two learning algorithms (or learning algorithm
parameters) is better.
</p><!--l. 4730--><p class="indent">   There are several point estimators in use, including:
</p><!--l. 4732--><p class="indent">
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x76-110002x1"></a>Holdout test set error rate.
           </li>
        <li class="enumerate"><a 
  name="x76-110004x2"></a>The bootstrap.</li></ol>
<!--l. 4735--><p class="nopar"> One commonly used point estimator is the bootstrap. In typical use, the bootstrap
functions like this:
</p><!--l. 4739--><p class="indent">   Repeat many times:
</p><!--l. 4741--><p class="indent">
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x76-110006x1"></a>Pick <!--l. 4742--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>m</mi></mrow></math>
        examples uniformly at random, with replacement, from the set of <!--l. 4742--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>m</mi></mrow></math>
        examples.
           </li>
        <li class="enumerate"><a 
  name="x76-110008x2"></a>Train on the <!--l. 4743--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>m</mi></mrow></math>
        resampled examples.
           </li>
        <li class="enumerate"><a 
  name="x76-110010x3"></a>Test on examples not included in the training set.</li></ol>
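<p class="indent">   The resampling loop above can be sketched in Python as follows. The learner and the error function are hypothetical placeholders (a majority-label predictor stands in for a real learning algorithm), and the sketch stops before the final combination step, since the combining formula varies.</p>

```python
import random


def train_majority(sample):
    """Placeholder learner: always predicts the most frequent label in the sample."""
    labels = [y for _, y in sample]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def error_rate(h, data):
    """Fraction of (x, y) pairs in data that h misclassifies."""
    return sum(1 for x, y in data if h(x) != y) / len(data)


def bootstrap_rounds(examples, train, error, rounds=200, seed=0):
    """Return (training error, held-out error, distinct count) for each round."""
    rng = random.Random(seed)
    m = len(examples)
    results = []
    for _ in range(rounds):
        # Step 1: pick m indices uniformly at random, with replacement.
        picked = [rng.randrange(m) for _ in range(m)]
        chosen = set(picked)
        held_out = [examples[i] for i in range(m) if i not in chosen]
        if not held_out:
            continue  # every example was drawn, so nothing is left to test on
        # Step 2: train on the resampled examples.
        sample = [examples[i] for i in picked]
        h = train(sample)
        # Step 3: test on the examples that were not drawn.
        results.append((error(h, sample), error(h, held_out), len(chosen)))
    return results


if __name__ == "__main__":
    data = [(i, i % 2) for i in range(200)]
    rounds = bootstrap_rounds(data, train_majority, error_rate)
    print(sum(u for _, _, u in rounds) / (len(rounds) * len(data)))
```

<p class="indent">   The third entry of each result records how many distinct examples were drawn in that round; the script prints the average fraction of distinct examples over all rounds.</p>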
<!--l. 4745--><p class="nopar"> After the above computation, the training and test errors are combined according to some formula
(which often varies) to get an estimate of the true error rate of a hypothesis learned on all of
the <!--l. 4748--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>m</mi></mrow></math>
original examples.
</p><!--l. 4750--><p class="indent">   There is one immediate observation: the resampling process typically results in about
<!--l. 4751--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">     <mrow 
><mi 
>m</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn> <mo 
class="MathClass-bin">&#x2212;</mo><mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>e</mi></mrow></mfrac></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
unique examples being included in the resampled subset. This has very strong implications
because there exist learning problems with &#x201C;phase transitions&#x201D; where the error rate of the
learned hypothesis (even for the best possible learning algorithm) as a function of <!--l. 4754--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>m</mi></mrow></math> decreases suddenly
when <!--l. 4755--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>m</mi></mrow></math>
reaches some critical threshold. This implies that point estimators <span 
class="ecti-1000">cannot </span>always be
accurate on datasets with a phase transition like the one in figure <a 
href="thesisse47.xml#x65-910011">10.4.1<!--tex4ht:ref: fig-pv-results --></a>. When learning a hypothesis on <!--l. 4757--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>m</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn> <mo 
class="MathClass-bin">&#x2212;</mo><mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>e</mi></mrow></mfrac></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> examples results in a true error
rate of <!--l. 4758--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mn>0</mn><mo 
class="MathClass-punc">.</mo><mn>5</mn></mrow></math>, it could be the case
                                                                     

                                                                     
that learning on <!--l. 4759--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mi 
>m</mi></mrow></math> examples
results in a true error rate of <!--l. 4759--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mn>0</mn></mrow></math>
or it could be the case that the true error rate will be <!--l. 4760--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>0</mn><mo 
class="MathClass-punc">.</mo><mn>5</mn></mrow></math>.
</p><!--l. 4762--><p class="indent">   Given that the bootstrap can sometimes fail to predict the true error rate, reasoning
about which algorithm is preferable based upon the bootstrap output is questionable.
One alternative is to reason with the following criterion:
</p><!--l. 4766--><p class="noindent"><span 
class="ecti-1000">Pick the learning algorithm with the lower upper bound.</span>
</p><!--l. 4768--><p class="indent">   Assuming that examples are independent, this approach can <span 
class="ecti-1000">never </span>fail arbitrarily
badly (with high probability).
</p><!--l. 4771--><p class="indent">   There are still open issues with this approach, such as: &#x201C;What if the upper
bound is not tight?&#x201D; It <span 
class="ecti-1000">could </span>be the case that a better learning algorithm has a
worse upper bound, implying that the worse algorithm will be picked according
to this criterion. One solution to this dilemma is to always involve some small
number of holdout examples in your bound calculation. Used judiciously, these
holdout examples can guarantee that the bound-based criterion never becomes too
loose.
</p>
   <h4 class="subsectionHead"><span class="titlemark">12.3.4. </span> <a 
  name="x76-11100012.3.4"></a>Simplistic bounds vs. the Holdout bound</h4>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x76-1110013"></a>
<!--l. 4783--><p class="indent">
                                                                     

                                                                     
</p><!--l. 4783--><p class="noindent"><img 
src="thesis24x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis24x.gif" src="test_vs_simple.ps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;12.3.3: </td><td  
class="content"><a 
  name="x76-1110013"></a>This is a plot comparing confidence intervals built based upon the
holdout bound ( <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>) on the left and the simple training bound using the Hoeffding
inequality ( <a 
href="thesisse16.xml#x23-32007r3">4.2.3<!--tex4ht:ref: th-adhscb --></a>) on the right. The results are clearly unsatisfactory for the simple
training bound. </td></tr></table></div><!--tex4ht:label?: x76-1110013 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" /> Figure  <a 
href="#x76-1110013">12.3.3<!--tex4ht:ref: fig-simple-holdout --></a> compares a bound based upon theorem  <a 
href="thesisse16.xml#x23-32007r3">4.2.3<!--tex4ht:ref: th-adhscb --></a> and theorem  <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>. The comparison is
remarkably pessimistic about the prospect of training set based bounds because the
confidence intervals are essentially vacuous. The training set bound can always be improved by
using exact (rather than approximate) calculations of the Binomial tail. It can
also be improved in practice by using a nonuniform &#x201C;prior&#x201D; over the hypothesis
space.
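<p class="indent">   To give a feel for the size of these two improvements, the following Python sketch compares a Hoeffding-style training set bound with an exact Binomial tail inversion in the special case of zero training errors, where the inversion has a closed form. It assumes the standard form of the Hoeffding-based bound, with a description-length term ln(1/P(h)) added to ln(1/&#x03B4;); the function names and the log_inv_prior parameter are hypothetical and are not taken from the thesis.</p>

```python
import math


def hoeffding_upper_bound(m, train_errors, delta, log_inv_prior=0.0):
    """Training error plus sqrt((ln(1/P(h)) + ln(1/delta)) / (2m)), capped at 1.

    log_inv_prior stands for ln(1/P(h)); for a uniform "prior" over a finite
    hypothesis space it equals the log of the number of hypotheses.
    """
    slack = math.sqrt((log_inv_prior + math.log(1.0 / delta)) / (2.0 * m))
    return min(1.0, train_errors / m + slack)


def exact_zero_error_bound(m, delta, log_inv_prior=0.0):
    """Exact Binomial tail inversion when no training errors are observed.

    Solves (1 - p)^m = delta * P(h) for p.
    """
    return 1.0 - math.exp(-(log_inv_prior + math.log(1.0 / delta)) / m)


if __name__ == "__main__":
    m, delta = 100, 0.1
    # A single fixed hypothesis (no description-length penalty).
    print(hoeffding_upper_bound(m, 0, delta), exact_zero_error_bound(m, delta))
    # A hypothesis that costs 20 bits to describe under the "prior".
    bits = 20 * math.log(2.0)
    print(hoeffding_upper_bound(m, 0, delta, bits),
          exact_zero_error_bound(m, delta, bits))
```

<p class="indent">   For m = 100 and &#x03B4; = 0.1 the sketch gives roughly 0.107 for the Hoeffding form against roughly 0.023 for the exact calculation. With a 20-bit description the values are roughly 0.28 and 0.15. A hypothesis with a shorter description, that is, one given more weight by a nonuniform &#x201C;prior&#x201D;, has a smaller penalty term in both forms.</p>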
   <h4 class="subsectionHead"><span class="titlemark">12.3.5. </span> <a 
  name="x76-11200012.3.5"></a>Occam vs. the Holdout bound</h4>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x76-1120014"></a>
<!--l. 4802--><p class="indent">
                                                                     

                                                                     
</p><!--l. 4802--><p class="noindent"><img 
src="thesis25x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis25x.gif" src="test_vs_occam.ps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;12.3.4: </td><td  
class="content"><a 
  name="x76-1120014"></a>This is a plot comparing confidence intervals built based upon the
holdout bound ( <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>) on the left and an &#x201C;Occam&#x2019;s Razor&#x201D; style bound ( <a 
href="thesisse20.xml#x27-36001r1">4.6.1<!--tex4ht:ref: th-ORB --></a>) using
the Hoeffding inequality ( <a 
href="thesisse16.xml#x23-32007r3">4.2.3<!--tex4ht:ref: th-adhscb --></a>) on the right. This training set bound is sometimes
useful on very small datasets. The particular description length was built using
a microchoice-like description language. </td></tr></table></div><!--tex4ht:label?: x76-1120014 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" /> A &#x201C;prior&#x201D; (in the sense of theorem  <a 
href="thesisse20.xml#x27-36001r1">4.6.1<!--tex4ht:ref: th-ORB --></a>) does not help much with the visible
confidence intervals, although an examination of the calculations suggests that
improvements do exist; they just aren&#x2019;t enough to make the confidence intervals
nonvacuous in figure  <a 
href="#x76-1120014">12.3.4<!--tex4ht:ref: fig-occam-vs-holdout --></a>. Note that the &#x201C;prior&#x201D; used here is the Microchoice prior.
Next, we will remove the Hoeffding approximation.
   <h4 class="subsectionHead"><span class="titlemark">12.3.6. </span> <a 
  name="x76-11300012.3.6"></a>Microchoice</h4>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x76-1130015"></a>
<!--l. 4822--><p class="indent">
                                                                     

                                                                     
</p><!--l. 4822--><p class="noindent"><img 
src="thesis26x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis26x.gif" src="test_vs_micro.ps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;12.3.5: </td><td  
class="content"><a 
  name="x76-1130015"></a>This is a plot comparing confidence intervals built based upon the
holdout bound ( <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>) on the left and the microchoice bound ( <a 
href="thesisse22.xml#x32-40012r2">5.2.2<!--tex4ht:ref: th-smb --></a>) on the right.
The most significant improvement over the last bound is using a Binomial tail bound
calculation rather than the Hoeffding approximation. The effect of this improved
bound is quite significant when the training error is low. We achieve a tighter upper
bound on 4 learning problems.</td></tr></table></div><!--tex4ht:label?: x76-1130015 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" /> For the first time, we observe confidence intervals which are nonvacuous on a
training set in figure  <a 
href="#x76-1130015">12.3.5<!--tex4ht:ref: fig-microchoice-holdout --></a>. This is encouraging, and a comparison with the holdout
approach indicates that the training set based confidence intervals are actually superior
on datasets with a small number of examples (and thus with a <span 
class="ecti-1000">very </span>small holdout
set).
   <h4 class="subsectionHead"><span class="titlemark">12.3.7. </span> <a 
  name="x76-11400012.3.7"></a>Shell Bound</h4>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x76-1140016"></a>
<!--l. 4842--><p class="indent">
                                                                     

                                                                     
</p><!--l. 4842--><p class="noindent"><img 
src="thesis27x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis27x.gif" src="test_vs_shell.ps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;12.3.6:  </td><td  
class="content"><a 
  name="x76-1140016"></a>This is a plot comparing confidence intervals built based upon
the holdout bound (<a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>) on the left and the shell upper bound theorem (
<a 
href="thesisse34.xml#x50-75001r2">8.1.2<!--tex4ht:ref: Observable --></a> or  <a 
href="thesisse35.xml#x51-76001r1">8.2.1<!--tex4ht:ref: th-sample_shell --></a>) on the right. The light-dashed line indicates which of the results
were obtained using the fast sampling technique. The shell-based upper bound is
lower than the holdout upper bound on several of the 13 problems and is never
significantly looser. Note that the shell bound is computationally intensive with <!--l. 4852--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>O</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><msup><mrow 
><mi 
>m</mi></mrow><mrow 
><mn>1</mn><mo 
class="MathClass-punc">.</mo><mn>5</mn></mrow></msup 
></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
running time plus the time required to gather the necessary information.</td></tr></table></div><!--tex4ht:label?: x76-1140016 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" /> The Shell bound performs better than the Microchoice bound in figure
<a 
href="#x76-1140016">12.3.6<!--tex4ht:ref: fig-shell-vs-holdout --></a>. The information and computation requirements needed to calculate
the shell bound are quite large, but the resulting bound is noticeably tighter,
especially on problems with more examples. This bound is strong evidence that
training set based bounds can be made competitive with test set based bounds.
However, it is unnecessary to choose between these approaches since we can
construct a bound which uses information from both training and test set based
bounds.
   <h4 class="subsectionHead"><span class="titlemark">12.3.8. </span> <a 
  name="x76-11500012.3.8"></a>Combined Microchoice and holdout bound </h4>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x76-1150017"></a>
<!--l. 4866--><p class="indent">
                                                                     

                                                                     
</p><!--l. 4866--><p class="noindent"><img 
src="thesis28x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis28x.gif" src="test_vs_test_n_micro.ps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;12.3.7: </td><td  
class="content"><a 
  name="x76-1150017"></a>This plot compares confidence intervals based upon the
holdout bound (theorem <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>) on the left with a combination (theorem <a 
href="thesisse50.xml#x69-95003r1">11.2.1<!--tex4ht:ref: th-tnt --></a>) of the holdout
and microchoice (theorem <a 
href="thesisse22.xml#x32-40012r2">5.2.2<!--tex4ht:ref: th-smb --></a>) bounds. The middle column is the microchoice
bound and the right column is the combined bound. </td></tr></table></div><!--tex4ht:label?: x76-1150017 -->
   The resulting combined bound consistently performs well on all data sets and is
sometimes superior to either bound individually. The running time of this bound is <!--l. 4877--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>O</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><msup><mrow 
><mi 
>m</mi></mrow><mrow 
><mn>1</mn><mo 
class="MathClass-punc">.</mo><mn>5</mn></mrow></msup 
></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> in
general.
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" /> The combined Microchoice and Holdout bound performs only slightly worse than
the better of the two bounds, and is sometimes better than either bound individually, for
the problems reported in figure  <a 
href="#x76-1150017">12.3.7<!--tex4ht:ref: fig-combined-mc --></a>. This particular combined bound is
(perhaps) the most practical result of this thesis, since it is easy to gather
the necessary information and reasonably easy to calculate the value of the
bound.
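<p class="indent">As a rough illustration of why the holdout half of this combination is cheap to evaluate, the sketch below computes a holdout-style upper bound on the true error rate by inverting a binomial tail. It assumes that the holdout bound takes the standard binomial-tail-inversion form. The function names binomial_cdf and holdout_upper_bound, the bisection depth, and the example numbers are illustrative choices made here and are not taken from the thesis.</p>

```python
import math


def binomial_cdf(m: int, k: int, p: float) -> float:
    """Probability of at most k errors in m independent trials with error rate p.

    Terms are accumulated in log space so that large m does not overflow.
    Requires p strictly between 0 and 1.
    """
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for j in range(k + 1):
        log_choose = math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
        total += math.exp(log_choose + j * log_p + (m - j) * log_q)
    return min(total, 1.0)


def holdout_upper_bound(m: int, k: int, delta: float) -> float:
    """Largest error rate p (up to bisection accuracy) with binomial_cdf(m, k, p) >= delta.

    Assumed form of the holdout bound: with probability at least 1 - delta over
    the draw of m holdout examples, the true error rate is at most this value
    when k holdout errors were observed.
    """
    if k == m:
        return 1.0  # every holdout example was misclassified: no nontrivial bound
    lo, hi = 0.0, 1.0  # the tail is 1 at p = 0 and 0 at p = 1 (k is below m here)
    for _ in range(60):  # interval width shrinks to 2**-60
        mid = (lo + hi) / 2.0
        if binomial_cdf(m, k, mid) >= delta:
            lo = mid  # mid is still consistent with the data at level delta
        else:
            hi = mid
    return hi  # conservative end of the final interval


if __name__ == "__main__":
    # Hypothetical numbers: 10 errors on 100 holdout examples, delta = 0.05.
    print(holdout_upper_bound(100, 10, 0.05))
```

<p class="indent">Under these assumptions the only information needed is the holdout set size, the number of holdout errors, and the confidence parameter, which is consistent with the remark that the necessary information is easy to gather.</p>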
   <h4 class="subsectionHead"><span class="titlemark">12.3.9. </span> <a 
  name="x76-11600012.3.9"></a>Combined Shell and Holdout Bound </h4>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x76-1160018"></a>
<!--l. 4890--><p class="indent">
                                                                     

                                                                     
</p><!--l. 4890--><p class="noindent"><img 
src="thesis29x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis29x.gif" src="test_vs_test_n_shell.ps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;12.3.8: </td><td  
class="content"><a 
  name="x76-1160018"></a> This plot compares confidence intervals based upon the
holdout bound (theorem <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>) on the left with a combination (theorem <a 
href="thesisse50.xml#x69-95003r1">11.2.1<!--tex4ht:ref: th-tnt --></a>) of the
holdout (theorem <a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>) and shell (theorems <a 
href="thesisse34.xml#x50-75001r2">8.1.2<!--tex4ht:ref: Observable --></a> and <a 
href="thesisse35.xml#x51-76001r1">8.2.1<!--tex4ht:ref: th-sample_shell --></a>) bounds. The middle
column is the shell bound or the sampling shell bound (if a dashed line is present). The
right column is the combined bound.</td></tr></table></div><!--tex4ht:label?: x76-1160018 -->
   The computational cost of this bound is substantial, taking <!--l. 4900--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>O</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><msup><mrow 
><mi 
>m</mi></mrow><mrow 
><mn>2</mn></mrow></msup 
></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> time
in general.
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" /> The combined shell and holdout bound gives the best results of all in figure  <a 
href="#x76-1160018">12.3.8<!--tex4ht:ref: fig-combined-shell --></a>. The
downside of using the shell bound is that significantly more computation and information
are required to calculate the bound. The computational cost of the bound is <!--l. 4906--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>O</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><msup><mrow 
><mi 
>m</mi></mrow><mrow 
><mn>2</mn></mrow></msup 
></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
which makes it impractical to apply this bound beyond about <!--l. 4907--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>m</mi> <mo 
class="MathClass-rel">=</mo> <mn>10000</mn></mrow></math> with
current computers.
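<p class="indent">To make the scaling argument concrete, the following sketch compares the growth of the two running times quoted in this section, m<sup>1.5</sup> for the combined Microchoice and holdout bound and m<sup>2</sup> for the combined shell and holdout bound. It uses only these asymptotic growth rates. Constant factors are unknown and are ignored, and the function name cost_ratio is a hypothetical helper introduced here.</p>

```python
def cost_ratio(m: int, slow_exponent: float = 2.0, fast_exponent: float = 1.5) -> float:
    """Ratio m**slow_exponent / m**fast_exponent, ignoring constant factors."""
    return float(m) ** (slow_exponent - fast_exponent)


if __name__ == "__main__":
    for m in (100, 10000, 1000000):
        # With the default exponents the ratio is the square root of m.
        print(m, cost_ratio(m))
```

<p class="indent">Under these assumptions the gap between the two combined bounds grows as the square root of m, so at m = 10000 the shell-based bound requires on the order of 100 times more computation than the Microchoice-based bound.</p>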
   <a 
  name="tailthesisse55.xml"></a>   
</body> 
</html> 
