<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>6 PAC-Bayes bounds</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisch7.xml" >next</a>] [<a 
href="thesisch5.xml" >prev</a>] [<a 
href="thesisch5.xml#tailthesisch5.xml" >prev-tail</a>] [<a 
href="#tailthesisch6.xml">tail</a>] [<a 
href="thesispa2.xml#thesisch6.xml" >up</a>] </p></div>
   <h2 class="chapterHead"><span class="titlemark">Chapter&#x00A0;6</span><br /><a 
  name="x37-570006"></a>PAC-Bayes bounds</h2>
<!--l. 2196--><p class="noindent">The work presented here is also published in <span class="cite">[<a 
href="thesisli2.xml#Xaveraging_tech"><span 
class="ecbx-1000">35</span></a>]</span>.
</p><!--l. 2198--><p class="indent">   PAC-Bayes bounds generalize the Occam&#x2019;s razor bound to algorithms
which output a <span 
class="ecti-1000">distribution </span>over classifiers rather than just a single classifier. A
distribution concentrated on a single classifier is a special case, which is why the result is a generalization.
Most learning algorithms do not output a distribution over base classifiers. Instead, they output
either a single classifier or an average over base classifiers. Nonetheless, PAC-Bayes bounds are
interesting for several reasons:
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x37-57002x1"></a>PAC-Bayes bounds are much tighter (in practice) than most common
        VC-related <span class="cite">[<a 
href="thesisli2.xml#XVapnik"><span 
class="ecbx-1000">51</span></a>]</span> approaches on continuous classifier spaces. This can be
        shown by application to stochastic neural networks (see section <a 
href="thesisch13.xml#x78-11800013">13<!--tex4ht:ref: SNN --></a>) as well
        as other classifiers. It can also be seen directly: when the PAC-Bayes
        bound is specialized to discrete hypothesis spaces, only <!--l. 2211--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>O</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mo 
>ln</mo><!--nolimits--><mi 
>m</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
        sample complexity is lost.
           </li>
        <li class="enumerate"><a 
  name="x37-57004x2"></a>Because the bound can be tight, the result motivates new learning
        algorithms which strongly limit the amount of overfitting that they
        incur.
           </li>
        <li class="enumerate"><a 
  name="x37-57006x3"></a>The result found here will turn out to be useful for averaging hypotheses.</li></ol>
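<p class="indent">   The remark in item 1 about discrete hypothesis spaces can be illustrated numerically. The sketch below is not taken from this chapter. It assumes the commonly stated relative-entropy form of the PAC-Bayes bound, in which the relative entropy between the empirical and true error rates is at most (KL(Q||P) + ln((m + 1)/delta))/m, and an Occam-style bound of the same shape with ln(1/P(h)) + ln(1/delta) in the numerator. The exact constants used later in the chapter may differ, and the function names are hypothetical.</p>

```python
import math


def occam_rhs(prior_h, m, delta):
    """Assumed Occam-style right-hand side for one hypothesis h with prior mass P(h)."""
    return (math.log(1.0 / prior_h) + math.log(1.0 / delta)) / m


def pac_bayes_point_mass_rhs(prior_h, m, delta):
    """Assumed PAC-Bayes right-hand side when Q puts all of its mass on h.

    For a point mass on h, KL(Q || P) reduces to ln(1 / P(h)).
    """
    return (math.log(1.0 / prior_h) + math.log((m + 1) / delta)) / m


def specialization_gap(prior_h, m, delta):
    """Extra slack paid by the specialized PAC-Bayes bound: ln(m + 1) / m."""
    return pac_bayes_point_mass_rhs(prior_h, m, delta) - occam_rhs(prior_h, m, delta)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the thesis.
    for m in (100, 1000, 10000):
        print(m, specialization_gap(prior_h=0.01, m=m, delta=0.05))
```

<p class="indent">   Under these assumptions the gap does not depend on the prior mass or on delta: it equals ln(m + 1)/m, which is one way to read the statement that only a logarithmic term in m is lost.</p>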
<!--l. 2217--><p class="nopar"> PAC-Bayes bounds were first introduced by McAllester <span class="cite">[<a 
href="thesisli2.xml#XPB"><span 
class="ecbx-1000">39</span></a>]</span>.
</p><!--l. 2220--><p class="indent">   There are three relatively independent observations in this chapter:
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x37-57008x1"></a>A quantitative improvement of the PAC-Bayes bound, obtained by
        retrofitting it with the relative entropy Chernoff bound <a 
href="thesisse10.xml#x16-24001r1">3.2.1<!--tex4ht:ref: eq-recb --></a>. This retrofit is not as trivial as might be
        expected, but it can be done. The result is the tightest known PAC-Bayes
        bound. In addition to the quantitative improvement, this tightening
        simplifies the proof and adds to our qualitative understanding of the
        bound.
           </li>
        <li class="enumerate"><a 
  name="x37-57010x2"></a>A method for (partially) derandomizing the PAC-Bayes stochastic
        hypothesis.
           </li>
        <li class="enumerate"><a 
  name="x37-57012x3"></a>A method for stochastic evaluation of the empirical error.</li></ol>
<!--l. 2230--><p class="nopar"> The first observation is the most important. Observation (3) is important for many
practical applications because it safely avoids a (sometimes) very complicated
evaluation problem. Observation (2) is of little theoretical interest, but it might interest
readers who feel reassured when every classifier randomized over has a low empirical
error rate.
</p><!--l. 2237--><p class="indent">   Figure  <a 
href="#x37-570131">6.0.1<!--tex4ht:ref: fig-pb-protocol --></a> shows what the PAC-Bayes bound looks like as an interactive proof of
learning.
</p>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x37-570131"></a>
<!--l. 2241--><p class="noindent"><img 
src="thesis10x.gif" alt="PIC" class="graphics" width="709.65125pt" height="405.515pt"  /><!--tex4ht:graphics  
name="thesis10x.gif" src="thesis-presentation/pac-bayes.eps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;6.0.1:  </td><td  
class="content"><a 
  name="x37-570131"></a>  The  PAC-Bayes  bound  can  be  viewed  as  a  new  style  for  a
proof  of  learning.  The  learner  must  commit  to  a  &#x201C;Prior&#x201D;  as  in  the  Occam&#x2019;s
Razor  Bound    <a 
href="thesisse20.xml#x27-36001r1">4.6.1<!--tex4ht:ref: th-ORB --></a>  before  seeing  examples,  but  it  does  not  commit  to  a
single  hypothesis.  Instead,  it  commits  to  a  distribution  over  hypotheses,  <!--l. 2249--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>q</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>,
and the bound applies to a randomization with respect to the distribution <!--l. 2249--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>q</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>.</td></tr></table></div><!--tex4ht:label?: x37-570131 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" />
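<p class="indent">   The protocol in Figure&#x00A0;6.0.1 can be sketched for a small finite hypothesis set. The code below is an illustration under stated assumptions, not the construction used in this thesis: it assumes the commonly stated relative-entropy form of the bound, with right-hand side (KL(Q||P) + ln((m + 1)/delta))/m, and it computes the empirical error of the randomized classifier exactly as a Q-weighted average, which is only feasible when the hypotheses can be enumerated. All function names are hypothetical, and the prior is assumed to be positive wherever the posterior is.</p>

```python
import math


def kl_bernoulli(p, q):
    """Relative entropy KL(p || q) between two coin biases p and q."""
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)  # keep the logarithms finite
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if 1.0 > p:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def kl_divergence(posterior, prior):
    """KL(Q || P) for two distributions over the same finite hypothesis list."""
    return sum(q * math.log(q / p) for q, p in zip(posterior, prior) if q > 0.0)


def invert_kl_upper(emp_err, rhs, tol=1e-9):
    """Upper end of the set of e with KL(emp_err || e) at most rhs, by bisection."""
    lo, hi = emp_err, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(emp_err, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return hi


def pac_bayes_bound(prior, posterior, emp_errors, m, delta):
    """Bound on the true error of a classifier drawn from the posterior Q.

    prior      -- P(h), fixed before any examples are seen
    posterior  -- Q(h), chosen after seeing the m examples
    emp_errors -- empirical error rate of each h on those m examples
    """
    gibbs_emp = sum(q * e for q, e in zip(posterior, emp_errors))
    rhs = (kl_divergence(posterior, prior) + math.log((m + 1) / delta)) / m
    return invert_kl_upper(gibbs_emp, rhs)


if __name__ == "__main__":
    # Illustrative numbers only; they are not results from the thesis.
    prior = [0.25, 0.25, 0.25, 0.25]
    posterior = [0.7, 0.1, 0.1, 0.1]
    emp_errors = [0.10, 0.20, 0.30, 0.40]
    print(pac_bayes_bound(prior, posterior, emp_errors, m=1000, delta=0.05))
```

<p class="indent">   The sketch makes the trade-off in the protocol visible: moving the posterior away from the prior can lower the averaged empirical error, but it raises the KL(Q||P) term that the learner pays for in the bound.</p>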
   <div class="sectionTOCS"><span class="sectionToc">&#x00A0;6.1.&#x00A0;&#x00A0;<a 
href="thesisse25.xml#x38-580006.1" name="QQ2-38-63">PAC-Bayes Basics</a></span><br /><span class="sectionToc">&#x00A0;6.2.&#x00A0;&#x00A0;<a 
href="thesisse26.xml#x39-590006.2" name="QQ2-39-64">A Tighter PAC-Bayes Bound</a></span><br /><span class="sectionToc">&#x00A0;6.3.&#x00A0;&#x00A0;<a 
href="thesisse27.xml#x41-600006.3" name="QQ2-41-65">PAC-Bayes
Approximations</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;6.3.1.&#x00A0;&#x00A0;<a 
href="thesisse27.xml#x41-610006.3.1" name="QQ2-41-66">Approximating the empirical error</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;6.3.2.&#x00A0;&#x00A0;<a 
href="thesisse27.xml#x41-620006.3.2" name="QQ2-41-67">Derandomizing
the PAC-Bayes bound</a></span><br /><span class="sectionToc">&#x00A0;6.4.&#x00A0;&#x00A0;<a 
href="thesisse28.xml#x42-630006.4" name="QQ2-42-68">Application of the PAC-Bayes bound</a></span><br />
   </div>



                                                                     

                                                                     
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisch7.xml" >next</a>] [<a 
href="thesisch5.xml" >prev</a>] [<a 
href="thesisch5.xml#tailthesisch5.xml" >prev-tail</a>] [<a 
href="thesisch6.xml" >front</a>] [<a 
href="thesispa2.xml#thesisch6.xml" >up</a>] </p></div><a 
  name="tailthesisch6.xml"></a>  
</body> 
</html> 
