<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>12.4 Discussion</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse55.xml" >prev</a>] [<a 
href="thesisse55.xml#tailthesisse55.xml" >prev-tail</a>] [<a 
href="#tailthesisse56.xml">tail</a>] [<a 
href="thesisch12.xml#thesisse56.xml" >up</a>] </p></div>
   <h3 class="sectionHead"><span class="titlemark">12.4. </span> <a 
  name="x77-11700012.4"></a>Discussion</h3>
<!--l. 4912--><p class="noindent">It is difficult to answer the question &#x201C;which bound is tighter?&#x201D; theoretically,
because every bound has a worst case. For example, the Occam&#x2019;s Razor bound is worse
than the Simple bound when the hypothesis chosen happens to be one of the last <!--l. 4915--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>H</mi></mrow></math>. Our
results show that there is no total ordering among the bounds, although there is a noticeable
rough ordering:
</p><!--l. 4918--><p class="indent">   <!--l. 4918--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">        <mrow 
>
                    <!--mstyle 
class="text"--><mtext class="textrm">Simple</mtext><!--/mstyle--> <mo 
class="MathClass-rel">&#x003E;</mo> <!--mstyle 
class="text"--><mtext class="textrm">Occam</mtext><!--/mstyle--> <mo 
class="MathClass-rel">&#x003E;</mo> <!--mstyle 
class="text"--><mtext class="textrm">Microchoice&#x000A0;</mtext><!--/mstyle--> <mo 
class="MathClass-rel">&#x003E;</mo> <!--mstyle 
class="text"--><mtext class="textrm">Shell</mtext><!--/mstyle--> <mo 
class="MathClass-rel">&#x2243;</mo><!--mstyle 
class="text"--><mtext class="textrm">Holdout</mtext><!--/mstyle-->
</mrow></math>
This ordering, in which &#x201C;&#x003E;&#x201D; reads &#x201C;is looser than&#x201D;, is approximately as expected based on theoretical considerations. The
Simple bound can never be much better than the Occam bound, whereas the Occam bound can
be arbitrarily tighter than the Simple bound. A similar statement holds for the
Microchoice bound and the Shell bound. The Occam bound is significantly looser
than the Microchoice bound only because we used the Hoeffding approximation to the
Binomial tail. The Shell bound is not always the best, but it does behave well in
comparison to the more standard holdout approach.
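</p>
<p class="indent">   The relationship between the Simple and Occam bounds described above can be sketched numerically. The following is a minimal illustration, not the code used for the reported results. It assumes that the Simple bound is a uniform union bound over a finite hypothesis set, that both bounds are taken in their Hoeffding-relaxed form, and that the prior is the hypothetical <code>illustrative_prior</code> defined below; all function names are chosen only for this example.
</p>
<pre class="verbatim"><![CDATA[
```python
import math


def simple_gap(num_hypotheses, m, delta):
    """Deviation term of a uniform union bound over num_hypotheses hypotheses
    (assumed form of the Simple bound, Hoeffding relaxation, m examples)."""
    return math.sqrt((math.log(num_hypotheses) + math.log(1.0 / delta)) / (2.0 * m))


def occam_gap(prior_mass, m, delta):
    """Deviation term of the Occam's Razor bound (Hoeffding relaxation) for a
    hypothesis whose prior probability is prior_mass."""
    return math.sqrt((math.log(1.0 / prior_mass) + math.log(1.0 / delta)) / (2.0 * m))


def illustrative_prior(i):
    """Hypothetical prior over hypotheses indexed i = 1, 2, ...
    The masses 1 / (i * (i + 1)) sum to less than 1 over any finite range."""
    return 1.0 / (i * (i + 1))


if __name__ == "__main__":
    n_hyp, m, delta = 1000, 500, 0.05
    print("Simple, any hypothesis:  ", simple_gap(n_hyp, m, delta))
    print("Occam, first hypothesis: ", occam_gap(illustrative_prior(1), m, delta))
    print("Occam, last hypothesis:  ", occam_gap(illustrative_prior(n_hyp), m, delta))
```
]]></pre>
<p class="indent">   In this example, the Occam deviation for the last hypothesis exceeds the Simple deviation by less than a factor of the square root of two, while for the first hypothesis it does not depend on the size of the hypothesis set at all.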
</p><!--l. 4928--><p class="indent">   It is interesting to note that the Sampling Shell bound is <span 
class="ecti-1000">not </span>better than the
Microchoice bound on these learning problems, even with fast sampling techniques.
Apparently, the improvement in tightness does not offset the looseness introduced by
bounding the sampling error.
</p><!--l. 4933--><p class="indent">   Empirically, a clear pattern emerges. For problems with fewer than <!--l. 4934--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>100</mn></mrow></math>
examples, the sample complexity bounds are superior to the holdout bound. Between <!--l. 4935--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>100</mn></mrow></math> and <!--l. 4935--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>1000</mn></mrow></math> examples, the
behavior changes: the holdout bound generally wins, although not necessarily by much.
Above <!--l. 4937--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mn>1000</mn></mrow></math>
examples, the holdout bound is significantly and consistently tighter than the
sample complexity bounds. This behavior strongly suggests that the sample
complexity bounds are loose. Each of these bounds is &#x201C;tight&#x201D; in one sense or another,
but there may exist some as-yet-undiscovered observable property, prevalent in
practical machine learning algorithms, that would allow us to create a tighter bound. In
particular, the problem of correlated hypotheses has yet to be solved in a convincing
manner.
</p><!--l. 4945--><p class="indent">   Also note that the holdout bound is <span 
class="ecti-1000">not </span>the tightest bound we report. In general, we have the following
ordering: <!--l. 4947--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">        <mrow 
>
                        <!--mstyle 
class="text"--><mtext class="textrm">Holdout</mtext><!--/mstyle--> <mo 
class="MathClass-rel">&#x003E;</mo> <!--mstyle 
class="text"--><mtext class="textrm">Holdout</mtext><!--/mstyle--> <mo 
class="MathClass-bin">+</mo> <!--mstyle 
class="text"--><mtext class="textrm">Micro</mtext><!--/mstyle--> <mo 
class="MathClass-rel">&#x003E;</mo> <!--mstyle 
class="text"--><mtext class="textrm">Holdout</mtext><!--/mstyle--> <mo 
class="MathClass-bin">+</mo> <!--mstyle 
class="text"--><mtext class="textrm">Shell</mtext><!--/mstyle-->
</mrow></math>
</p><!--l. 4951--><p class="indent">   The combined bounds seem to have the best behavior in practice.
</p><!--l. 4953--><p class="indent">   Several directions for future investigation could further strengthen
these approaches. For the sample complexity approach, it would be useful to
address the non-independence of samples in the fast sampling method used for the
Sampling Shell bound. We tested only the simplest holdout technique, so another natural
extension is to test other holdout techniques. This was not done here because the theory
of these other techniques is lacking.
</p>
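<p class="indent">   The simplest holdout technique referred to above can be sketched as follows, assuming it is the standard test-set bound obtained by inverting the Binomial tail; the Hoeffding relaxation mentioned earlier in this section is included for comparison. The function names and the bisection tolerance <code>tol</code> are illustrative assumptions, not the code used for the reported results.
</p>
<pre class="verbatim"><![CDATA[
```python
import math


def binomial_cdf(k, n, p):
    """P[Binomial(n, p) <= k], with each term evaluated in log space."""
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for j in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * log_p + (n - j) * log_q)
        total += math.exp(log_term)
    return min(1.0, total)


def holdout_bound_exact(k, n, delta, tol=1e-9):
    """Upper bound on the true error rate after k errors on n held-out examples:
    the largest p for which observing at most k errors still has probability
    at least delta, located by bisection (the tail is decreasing in p)."""
    lo, hi = k / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binomial_cdf(k, n, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return hi


def holdout_bound_hoeffding(k, n, delta):
    """The same bound with the Binomial tail relaxed by Hoeffding's inequality."""
    return min(1.0, k / n + math.sqrt(math.log(1.0 / delta) / (2.0 * n)))


if __name__ == "__main__":
    for k, n in [(5, 100), (50, 1000)]:
        print(n, holdout_bound_exact(k, n, 0.05), holdout_bound_hoeffding(k, n, 0.05))
```
]]></pre>
<p class="indent">   For example, <code>holdout_bound_exact(5, 100, 0.05)</code> bounds the true error rate given 5 errors on 100 held-out examples, with confidence 0.95. The exact inversion is never looser than the Hoeffding version, and the difference is largest when the observed error rate is small.
</p>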
   <div class="newtheorem">
<!--l. 4960--><p class="noindent"><span class="head">
<a 
  name="x77-117001r1"></a>
  <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">B</small><small 
class="small-caps">L</small><small 
class="small-caps">E</small><small 
class="small-caps">M</small> </span>12.4.1<span 
class="eccc-1000">.</span></span>
</p><!--l. 4961--><p class="indent">   (Open) Address the looseness introduced by hypotheses with a strong
correlation. For example, two decision trees which differ in only one leaf
probably don&#x2019;t have significantly different error rates. Using the union bound over
these decision trees introduces unnecessary slack. Note that VC dimension and
covering number analysis address this, but (unfortunately) the formulas either
cannot be evaluated or introduce so much slack that the quantitative results are worse
rather than better.
</p>
   </div>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse55.xml" >prev</a>] [<a 
href="thesisse55.xml#tailthesisse55.xml" >prev-tail</a>] [<a 
href="thesisse56.xml" >front</a>] [<a 
href="thesisch12.xml#thesisse56.xml" >up</a>] </p></div><a 
  name="tailthesisse56.xml"></a>  
</body> 
</html> 
