<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>8 Computable Shell bounds </title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisch9.xml" >next</a>] [<a 
href="thesisch7.xml" >prev</a>] [<a 
href="thesisch7.xml#tailthesisch7.xml" >prev-tail</a>] [<a 
href="#tailthesisch8.xml">tail</a>] [<a 
href="thesispa2.xml#thesisch8.xml" >up</a>] </p></div>
   <h2 class="chapterHead"><span class="titlemark">Chapter&#x00A0;8</span><br /><a 
  name="x49-720008"></a>Computable Shell bounds </h2>
<!--l. 2974--><p class="noindent">The first shell bound paper was joint work with David McAllester and was presented at
COLT <span class="cite">[<a 
href="thesisli2.xml#XShell"><span 
class="ecbx-1000">33</span></a>]</span>. The work presented here substantially refines, generalizes,
and simplifies the results of that paper.
</p><!--l. 2978--><p class="indent">   Roughly speaking, the shell bound usually provides much tighter upper bounds on the
true error rate of a learned hypothesis than the conventional Occam&#x2019;s Razor bound (theorem
<a 
href="thesisse20.xml#x27-36001r1">4.6.1<!--tex4ht:ref: th-ORB --></a>) or the PAC-Bayes bound (theorem <a 
href="thesisse26.xml#x39-59001r1">6.2.1<!--tex4ht:ref: th-repbb --></a>). It does this without violating the lower
upper bounds of section <a 
href="thesisse18.xml#x25-340004.4">4.4<!--tex4ht:ref: sec-lower_upper --></a> because it incorporates <span 
class="ecti-1000">much </span>more information into the bound
calculation.
</p><!--l. 2984--><p class="indent">   The work on shell bounds was inspired by two earlier papers. In <span class="cite">[<a 
href="thesisli2.xml#Xold_shell"><span 
class="ecbx-1000">22</span></a>]</span>,
Haussler, Kearns, Seung, and Tishby investigate learning curves from an
omniscient point of view in which the true error rates of the hypotheses are known. The
principal improvement here is that our bounds depend only on <span 
class="ecti-1000">observable </span>quantities.
Put another way, we do not need to know the underlying learning distribution, <!--l. 2989--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>D</mi></mrow></math>. In
<span class="cite">[<a 
href="thesisli2.xml#XTobias"><span 
class="ecbx-1000">47</span></a>]</span>, the analysis assumes some distribution over true error rates. Our analysis
makes no assumption about the distribution of true error rates; only the
independence assumption is made. Despite using only observable information and
making no extra assumptions, the resulting bounds are quite tight and practically
useful.
</p><!--l. 2996--><p class="indent">   We start with the distribution of empirical error rates over hypotheses and subtract a
small amount from each empirical error rate to create a pessimistic distribution. With
high probability, the cumulative distribution of the pessimistic error rates will lower bound the
cumulative distribution of the hypotheses&#x2019; true error rates. Given this, we can directly
calculate a bound on the probability that a hypothesis with a &#x201C;large&#x201D; true error rate will produce
a misleadingly small empirical error. This bound can be <span 
class="ecti-1000">much </span>tighter than standard
union bound techniques, although the amount of improvement is highly problem
dependent.
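</p>
<p class="indent">   The effect of weighting each hypothesis by its own pessimistic error rate can be illustrated with a small numeric sketch. It is <span 
class="ecti-1000">not </span>the bound proved in section <a 
href="thesisse34.xml#x50-730008.1">8.1</a>; it only contrasts two ways of charging hypotheses for the chance of a misleadingly small empirical error. It assumes a finite hypothesis space summarized by a histogram of empirical error counts, binomial tail probabilities for the chance that a hypothesis with a given true error rate makes at most k mistakes on m examples, and a fixed downward shift, slack, picked by hand as a hypothetical stand-in for the amount subtracted in the actual bound. It also ignores the confidence that must be spent to make the shift valid and the fact that k is observed only after learning. The union-bound style charges every hypothesis the worst case at a candidate error rate; the shell style charges each group of hypotheses at its own pessimistic error rate whenever that is larger. All function and parameter names are illustrative.</p>
<pre class="verbatim">
```python
from collections import Counter
from math import exp, lgamma, log, log1p


def binom_cdf(k, m, p):
    """Probability that a Binomial(m, p) variable is at most k."""
    if p == 0.0:
        return 1.0
    if p == 1.0:
        return 1.0 if k >= m else 0.0
    total = 0.0
    for j in range(min(k, m) + 1):
        # Each term C(m, j) * p**j * (1 - p)**(m - j) is computed in log space
        # to avoid overflow in the binomial coefficient.
        log_term = (lgamma(m + 1) - lgamma(j + 1) - lgamma(m - j + 1)
                    + j * log(p) + (m - j) * log1p(-p))
        total += exp(log_term)
    return min(total, 1.0)


def smallest_error_bound(tail, delta, iters=50):
    """Bisection for the smallest eps in [0, 1] with tail(eps) at most delta.

    Assumes tail is non-increasing in eps and tail(1.0) is at most delta.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if delta >= tail(mid):
            hi = mid  # mid already satisfies the condition
        else:
            lo = mid
    return hi


def union_tail(eps, k, m, n_hyp):
    # Union-bound style: every hypothesis with true error rate at least eps is
    # charged the worst case, a true error rate of exactly eps.
    return n_hyp * binom_cdf(k, m, eps)


def shell_tail(eps, k, m, histogram, slack):
    # Shell style (illustrative): histogram maps an empirical error count to the
    # number of hypotheses with that count. Each group is charged at its
    # pessimistic rate (empirical rate minus the hypothetical slack) when that
    # rate exceeds eps, and at eps otherwise.
    total = 0.0
    for count, n in histogram.items():
        pessimistic = max(0.0, count / m - slack)
        total += n * binom_cdf(k, m, max(eps, pessimistic))
    return total


# Hypothetical toy setting: m = 100 examples, the learned hypothesis makes
# k = 5 mistakes, and 10,000 other hypotheses make 40 mistakes each.
m, k, delta, slack = 100, 5, 0.05, 0.1
counts = [5] + [40] * 10_000
histogram = Counter(counts)

union_b = smallest_error_bound(lambda e: union_tail(e, k, m, len(counts)), delta)
shell_b = smallest_error_bound(lambda e: shell_tail(e, k, m, histogram, slack), delta)
print(f"union-bound style: {union_b:.3f}  shell style: {shell_b:.3f}")
```
</pre>
<p class="indent">   In this hypothetical toy setting, the union-bound style pays for all 10,001 hypotheses. The shell style notices that the 10,000 poor hypotheses, even with their error rates shifted down to 0.3, are very unlikely to make 5 or fewer mistakes on 100 examples, so its value comes out at roughly half the union-bound value.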
</p><!--l. 3005--><p class="indent">   After presenting the first bound, we will transform it into a bound for continuous
hypothesis spaces using a PAC-Bayes-like approach <span class="cite">[<a 
href="thesisli2.xml#XPB"><span 
class="ecbx-1000">39</span></a>]</span>.
</p><!--l. 3008--><p class="indent">   Viewed as an interactive proof of learning (figure  <a 
href="#x49-720011">8.0.1<!--tex4ht:ref: fig-shell-protocol --></a>), the stochastic shell bound is
much like the PAC-Bayes bound.
</p>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x49-720011"></a>
<!--l. 3012--><p class="noindent"><img 
src="thesis13x.gif" alt="PIC" class="graphics" width="705.63625pt" height="404.51125pt"  /><!--tex4ht:graphics  
name="thesis13x.gif" src="thesis-presentation/shell.eps"  
-->
<br /></p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;8.0.1: </td><td  
class="content"><a 
  name="x49-720011"></a> The stochastic shell bound, as an interactive proof of learning, has the
same general outline as the PAC-Bayes bound, except that <span 
class="ecti-1000">much </span>more information
is required to calculate the bound. The shell bound (proved first below) is
a simplification that is somewhat tighter when the &#x201C;Posterior&#x201D; places all its mass on
a single hypothesis.</td></tr></table></div><!--tex4ht:label?: x49-720011 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" />
<!--l. 3023--><p class="indent">   The strongest criticism of shell bounds is that they require too much information.
While this information is always observable in principle, it may not be tractable to
collect. We give two answers to this criticism. The first is an empirical application to
decision tree learning algorithms, which shows that in practice there are often
shortcuts that make gathering the information feasible. The second is a sampled
version of the shell bound, which requires only approximate versions of the
information needed by the shell bound. We
will:
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x49-72003x1"></a>Present the discrete shell bound.
           </li>
        <li class="enumerate"><a 
  name="x49-72005x2"></a>Present the sampled version of the shell bound.
           </li>
        <li class="enumerate"><a 
  name="x49-72007x3"></a>Extend the discrete shell bound to continuous spaces.</li></ol>
<!--l. 3036--><p class="nopar">
</p>
   <div class="sectionTOCS"><span class="sectionToc">&#x00A0;8.1.&#x00A0;&#x00A0;<a 
href="thesisse34.xml#x50-730008.1" name="QQ2-50-80">The Discrete Shell Bound</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;8.1.1.&#x00A0;&#x00A0;<a 
href="thesisse34.xml#x50-740008.1.1" name="QQ2-50-81">Knowledge of learning distribution</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;8.1.2.&#x00A0;&#x00A0;<a 
href="thesisse34.xml#x50-750008.1.2" name="QQ2-50-82">No
knowledge of learning distribution</a></span><br /><span class="sectionToc">&#x00A0;8.2.&#x00A0;&#x00A0;<a 
href="thesisse35.xml#x51-760008.2" name="QQ2-51-83">Sampling Shell Bound</a></span><br /><span class="sectionToc">&#x00A0;8.3.&#x00A0;&#x00A0;<a 
href="thesisse36.xml#x52-770008.3" name="QQ2-52-84">Lower Bounds</a></span><br /><span class="sectionToc">&#x00A0;8.4.&#x00A0;&#x00A0;<a 
href="thesisse37.xml#x53-780008.4" name="QQ2-53-85">Shell
Bounds for Continuous Spaces</a></span><br /><span class="sectionToc">&#x00A0;8.5.&#x00A0;&#x00A0;<a 
href="thesisse38.xml#x54-790008.5" name="QQ2-54-86">Conclusion</a></span><br />
   </div>




                                                                     

                                                                     
   <a 
  name="tailthesisch8.xml"></a>  
</body> 
</html> 
