<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>7 Averaging Bounds (Improved margin)</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisch8.xml" >next</a>] [<a 
href="thesisch6.xml" >prev</a>] [<a 
href="thesisch6.xml#tailthesisch6.xml" >prev-tail</a>] [<a 
href="#tailthesisch7.xml">tail</a>] [<a 
href="thesispa2.xml#thesisch7.xml" >up</a>] </p></div>
   <h2 class="chapterHead"><span class="titlemark">Chapter&#x00A0;7</span><br /><a 
  name="x43-640007"></a>Averaging Bounds (Improved margin)</h2>
<!--l. 2533--><p class="noindent">The work in this chapter is joint with Matthias Seeger and Nimrod Megiddo. It was first
presented at ICML <span class="cite">[<a 
href="thesisli2.xml#Xaveraging_icml"><span 
class="ecbx-1000">34</span></a>]</span>.
</p><!--l. 2536--><p class="indent">   Averaging bounds are specialized for averaging classifiers. An average has the form <!--l. 2538--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
>
            <mi>f</mi><mrow><mo class="MathClass-open">(</mo><mrow><mi>x</mi></mrow><mo class="MathClass-close">)</mo></mrow> <mo class="MathClass-rel">=</mo> <mo class="MathClass-op">&#x222B;</mo><mi>q</mi><mrow><mo class="MathClass-open">(</mo><mrow><mi>h</mi></mrow><mo class="MathClass-close">)</mo></mrow><mi>h</mi><mrow><mo class="MathClass-open">(</mo><mrow><mi>x</mi></mrow><mo class="MathClass-close">)</mo></mrow><mi>d</mi><mi>h</mi>
<mtext class="textrm">&#x000A0;or&#x000A0;</mtext>
<mi>f</mi><mrow><mo class="MathClass-open">(</mo><mrow><mi>x</mi></mrow><mo class="MathClass-close">)</mo></mrow> <mo class="MathClass-rel">=</mo><munderover><mo class="MathClass-op">&#x2211;</mo><mrow><mi>i</mi><mo class="MathClass-rel">=</mo><mn>1</mn></mrow><mrow><mi>N</mi></mrow></munderover><mi>q</mi><mrow><mo class="MathClass-open">(</mo><mrow><msub><mrow><mi>h</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow><mo class="MathClass-close">)</mo></mrow><msub><mrow><mi>h</mi></mrow><mrow><mi>i</mi></mrow></msub><mrow><mo class="MathClass-open">(</mo><mrow><mi>x</mi></mrow><mo class="MathClass-close">)</mo></mrow>
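</mrow></math> where <!--l. 2540--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow><mi>q</mi><mrow><mo class="MathClass-open">(</mo><mrow><mi>h</mi></mrow><mo class="MathClass-close">)</mo></mrow> <mo class="MathClass-rel">&#x2265;</mo> <mn>0</mn></mrow></math> and <!--l. 2540--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow><mo class="MathClass-op">&#x222B;</mo><mi>q</mi><mrow><mo class="MathClass-open">(</mo><mrow><mi>h</mi></mrow><mo class="MathClass-close">)</mo></mrow><mi>d</mi><mi>h</mi> <mo class="MathClass-rel">=</mo> <mn>1</mn></mrow></math> (in the finite case, <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow><mi>q</mi><mrow><mo class="MathClass-open">(</mo><mrow><msub><mrow><mi>h</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow><mo class="MathClass-close">)</mo></mrow> <mo class="MathClass-rel">&#x2265;</mo> <mn>0</mn></mrow></math> and <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow><msubsup><mo class="MathClass-op">&#x2211;</mo><mrow><mi>i</mi><mo class="MathClass-rel">=</mo><mn>1</mn></mrow><mrow><mi>N</mi></mrow></msubsup><mi>q</mi><mrow><mo class="MathClass-open">(</mo><mrow><msub><mrow><mi>h</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow><mo class="MathClass-close">)</mo></mrow> <mo class="MathClass-rel">=</mo> <mn>1</mn></mrow></math>). Averaging classifiers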
have the form: <!--l. 2542--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">       <mrow 
>
                                           <mi 
>c</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <!--mstyle 
class="text"--><mtext class="textrm">sign</mtext><!--/mstyle--> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>f</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></mfenced>
</mrow></math>
Averaging bounds are especially interesting because many learning algorithms use
averaging. These include:
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x43-64002x1"></a>Boosting <span class="cite">[<a 
href="thesisli2.xml#XBoosting"><span 
class="ecbx-1000">18</span></a>]</span>
           </li>
        <li class="enumerate"><a 
  name="x43-64004x2"></a>Bayes-Optimal learning (see section 6.7 of <span class="cite">[<a 
href="thesisli2.xml#XTom"><span 
class="ecbx-1000">37</span></a>]</span>)
           </li>
                                                                     

                                                                     
        <li class="enumerate"><a 
  name="x43-64006x3"></a>Support Vector Machines <span class="cite">[<a 
href="thesisli2.xml#XSVM"><span 
class="ecbx-1000">8</span></a>]</span>
           </li>
        <li class="enumerate"><a 
  name="x43-64008x4"></a>Bagging <span class="cite">[<a 
href="thesisli2.xml#XBagging"><span 
class="ecbx-1000">6</span></a>]</span>
           </li>
        <li class="enumerate"><a 
  name="x43-64010x5"></a>Maximum Entropy classification <span class="cite">[<a 
href="thesisli2.xml#XME"><span 
class="ecbx-1000">24</span></a>]</span></li></ol>
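<p class="indent">   As a rough illustration of the finite form of the average and of the classifier <i>c</i>(<i>x</i>)&#x00A0;=&#x00A0;sign(<i>f</i>(<i>x</i>)) defined above, the Python sketch below may help. It assumes, for illustration only, that each hypothesis is a Python callable returning &#x2212;1 or 1 and that the weights <i>q</i>(<i>h</i><sub>i</sub>) are given as a list <code>q</code>; the names <code>average</code> and <code>classify</code> are hypothetical, and mapping a tie <i>f</i>(<i>x</i>)&#x00A0;=&#x00A0;0 to 0 is an assumed convention for sign.</p>

```python
import math
from typing import Callable, Sequence

# Hypothetical representation: a hypothesis maps an input x to a label in {-1, +1}.
Hypothesis = Callable[[float], int]


def average(q: Sequence[float], hs: Sequence[Hypothesis], x: float) -> float:
    """Finite averaging: f(x) = sum_i q(h_i) * h_i(x)."""
    if len(q) != len(hs):
        raise ValueError("need exactly one weight per hypothesis")
    if not all(w >= 0 for w in q) or not math.isclose(sum(q), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    return sum(w * h(x) for w, h in zip(q, hs))


def classify(q: Sequence[float], hs: Sequence[Hypothesis], x: float) -> int:
    """Averaging classifier c(x) = sign(f(x)); a tie f(x) = 0 is mapped to 0 here."""
    f = average(q, hs, x)
    if f > 0:
        return 1
    if f == 0:
        return 0
    return -1


# Usage with three hypothetical threshold rules on the real line.
hs = [
    lambda x: 1 if x > 0 else -1,
    lambda x: 1 if x > 1 else -1,
    lambda x: 1 if x > -1 else -1,
]
q = [0.5, 0.25, 0.25]
print(average(q, hs, 0.5))   # 0.5*1 + 0.25*(-1) + 0.25*1 = 0.5
print(classify(q, hs, 0.5))  # 1
```

<p class="indent">   Because the weights are nonnegative and sum to one, <i>f</i>(<i>x</i>) always lies in [&#x2212;1,&#x00A0;1]; the classifier uses only its sign, while the bounds below also use its magnitude.</p>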
<!--l. 2553--><p class="nopar"> Viewed as an interactive proof of learning (see Figure&#x00A0;<a 
href="#x43-640111">7.0.1<!--tex4ht:ref: fig-averaging-protocol --></a>), the bound presented here is
almost the same as the PAC-Bayes bound, except that it applies to the <span 
class="ecti-1000">average </span>over the
posterior rather than to stochastic choices from the posterior.
</p>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x43-640111"></a>
<!--l. 2560--><p class="noindent"><img 
src="thesis11x.gif" alt="PIC" class="graphics" width="709.65125pt" height="405.515pt"  /><!--tex4ht:graphics  
name="thesis11x.gif" src="thesis-presentation/averaging.eps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;7.0.1: </td><td  
class="content"><a 
  name="x43-640111"></a> For the averaging bound, the learner must commit to a prior measure <!--l. 2567--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>,
receive examples, and then commit to a posterior measure <!--l. 2567--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>q</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>.
The true error bound applies to the <span 
class="ecti-1000">average </span>over <!--l. 2567--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>q</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
rather than to stochastic choices from it, as in the PAC-Bayes bound (Chapter&#x00A0;<a 
href="thesisch6.xml#x37-570006">6<!--tex4ht:ref: sec-PB --></a>).</td></tr></table></div><!--tex4ht:label?: x43-640111 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" />
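<p class="indent">   The difference between the average over the posterior and stochastic choices from it can be seen on a small constructed example. The Python sketch below computes the empirical error of the stochastic classifier, which draws <i>h</i> from <i>q</i> and predicts <i>h</i>(<i>x</i>), and of the averaging classifier sign(<i>f</i>(<i>x</i>)). It assumes, for illustration only, a finite set of hypotheses given as callables returning &#x2212;1 or 1, weights given as exact fractions, and hypothetical function names. In this toy case each of three hypotheses errs on a different point, so the stochastic classifier errs one third of the time while the average is always correct; the example is not meant to show that averaging always helps.</p>

```python
import random
from fractions import Fraction


def gibbs_error(q, hs, sample):
    """Empirical error of the stochastic classifier that draws h ~ q, then predicts h(x).

    By linearity this equals sum_i q_i * (empirical error of h_i).
    """
    n = len(sample)
    return sum(
        w * Fraction(sum(1 for x, y in sample if h(x) != y), n)
        for w, h in zip(q, hs)
    )


def averaged_error(q, hs, sample):
    """Empirical error of c(x) = sign(f(x)); a tie f(x) = 0 counts as a mistake."""
    mistakes = 0
    for x, y in sample:
        f = sum(w * h(x) for w, h in zip(q, hs))
        if not y * f > 0:
            mistakes += 1
    return Fraction(mistakes, len(sample))


def stochastic_predict(q, hs, x, rng):
    """A single prediction of the stochastic classifier: sample h from q, then apply it."""
    h = rng.choices(hs, weights=[float(w) for w in q], k=1)[0]
    return h(x)


# Toy example: hypothesis j errs only on the point x == j; every label is +1.
hs = [lambda x, j=j: -1 if x == j else 1 for j in range(3)]
q = [Fraction(1, 3)] * 3
sample = [(0, 1), (1, 1), (2, 1)]
print(gibbs_error(q, hs, sample))     # 1/3
print(averaged_error(q, hs, sample))  # 0
print(stochastic_predict(q, hs, 0, random.Random(0)))  # a random draw from {-1, 1}
```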
<!--l. 2571--><p class="indent">   The bound in this chapter is a qualitative improvement on earlier averaging
bounds. Among the averaging algorithms listed above, the improvement is most
interesting for Maximum Entropy classification and Bayes-optimal classification.
All currently known specialized averaging bounds are parameterized by the
&#x201C;margin&#x201D;. For this chapter only, suppose that the label and the hypotheses take the value <!--l. 2576--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mn>1</mn></mrow></math> or <!--l. 2576--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>1</mn></mrow></math>. (<!--l. 2576--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>y</mi> <mo 
class="MathClass-rel">&#x2208;</mo><mrow><mo 
class="MathClass-open">{</mo><mrow><mo 
class="MathClass-bin">&#x2212;</mo><mn>1</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">}</mo></mrow></mrow></math> and <!--l. 2577--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>h</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2208;</mo><mrow><mo 
class="MathClass-open">{</mo><mrow><mo 
class="MathClass-bin">&#x2212;</mo><mn>1</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">}</mo></mrow></mrow></math>). The margin of an
example is then defined as <!--l. 2578--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">       <mrow 
>
                                            <mi 
>t</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-punc">,</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mi 
>y</mi><mi 
>f</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow>
</mrow></math>
Some simple observations are immediate:
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x43-64013x1"></a>The margin is bounded. <!--l. 2583--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>t</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-punc">,</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2208;</mo> <mrow><mo 
class="MathClass-open">[</mo><mrow><mo 
class="MathClass-bin">&#x2212;</mo><mn>1</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
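class="MathClass-close">]</mo></mrow></mrow></math>, since <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><mi>f</mi><mrow><mo class="MathClass-open">(</mo><mrow><mi>x</mi></mrow><mo class="MathClass-close">)</mo></mrow></mrow></math> is a weighted average of values in {&#x2212;1,&#x00A0;1}.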
           </li>
        <li class="enumerate"><a 
  name="x43-64015x2"></a>If the classifier is correct, the margin is positive. <!--l. 2584--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>c</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mi 
>y</mi> <mo 
class="MathClass-rel">&#x21D2;</mo> <mi 
>t</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-punc">,</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2208;</mo> <mrow><mo 
class="MathClass-open">(</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow></mrow></math></li></ol>
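<p class="indent">   The margin definition and the two observations above can be checked numerically. The Python sketch below is a minimal illustration under assumed conventions: finitely many hypotheses given as callables returning &#x2212;1 or 1, a weight list <code>q</code> that is nonnegative and sums to 1, and hypothetical function names. It computes <i>t</i>(<i>x</i>,<i>y</i>)&#x00A0;=&#x00A0;<i>y</i><i>f</i>(<i>x</i>) and tests both observations on a handful of points.</p>

```python
from typing import Callable, Iterable, Sequence

# Hypothetical representation: a hypothesis maps x to a label in {-1, +1}.
Hypothesis = Callable[[float], int]


def average(q: Sequence[float], hs: Sequence[Hypothesis], x: float) -> float:
    """f(x) = sum_i q(h_i) * h_i(x), assuming q is nonnegative and sums to 1."""
    return sum(w * h(x) for w, h in zip(q, hs))


def margin(q: Sequence[float], hs: Sequence[Hypothesis], x: float, y: int) -> float:
    """t(x, y) = y * f(x)."""
    return y * average(q, hs, x)


def observations_hold(q, hs, xs: Iterable[float]) -> bool:
    """Check both observations for every x in xs and both labels y in {-1, +1}."""
    for x in xs:
        f = average(q, hs, x)
        for y in (-1, 1):
            t = margin(q, hs, x, y)
            if abs(t) > 1:
                return False  # observation 1 would fail: t outside [-1, 1]
            correct = (f > 0 and y == 1) or (0 > f and y == -1)
            if correct and not t > 0:
                return False  # observation 2 would fail: correct but t not in (0, 1]
    return True


hs = [
    lambda x: 1 if x > 0 else -1,
    lambda x: 1 if x > 1 else -1,
    lambda x: 1 if x > -1 else -1,
]
q = [0.5, 0.25, 0.25]
print(margin(q, hs, 0.5, 1))   # 0.5
print(margin(q, hs, 0.5, -1))  # -0.5
print(observations_hold(q, hs, [-2.0, -0.5, 0.5, 2.0]))  # True
```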
<!--l. 2585--><p class="nopar"> The <span 
class="ecti-1000">error </span>at a given margin is the quantity that averaging bounds actually
use. The empirical error at margin <!--l. 2587--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>&#x03B8;</mi></mrow></math> is defined by: <!--l. 2588--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
                                                                     

                                                                     
<mrow 
><msub><mrow 
>
                        <mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>&#x03B8;</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo><msub><mrow 
><mo 
> Pr</mo></mrow><mrow 
><mi 
>S</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>t</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-punc">,</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B8;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow>
</mrow></math> and the true error at
margin <!--l. 2590--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mi 
>&#x03B8;</mi></mrow></math> is defined
by: <!--l. 2591--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">        <mrow 
>
                                     <msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>&#x03B8;</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo><msub><mrow 
><mo 
> Pr</mo></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>t</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-punc">,</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B8;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow>
</mrow></math> Note that <!--l. 2593--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mn>0</mn></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> (the true error
rate of <!--l. 2593--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>c</mi></mrow></math>).
</p><!--l. 2595--><p class="indent">   The &#x201C;margin&#x201D; is a useful way to parameterize our learning algorithms. It will turn
out that the sample complexity will be low (and the guarantees we can make
better) when the margin of most of the training examples is large. There is,
however, a price associated with using the margin: some hypotheses have no
notion of a margin. Thus the theory in this chapter is less general than appears
elsewhere.
</p>
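<p class="indent">   The empirical error at margin &#x03B8; defined above is the fraction of training examples whose margin is at most &#x03B8;. The Python sketch below computes it for a finite average under the same illustrative assumptions (hypotheses as callables returning &#x2212;1 or 1, a nonnegative weight list <code>q</code> summing to 1, a sample given as a list of (<i>x</i>,&#x00A0;<i>y</i>) pairs, hypothetical function names). The true error at margin &#x03B8; depends on the unknown distribution <i>D</i> and cannot be computed this way; applying the same function to a fresh sample drawn from <i>D</i> only estimates it.</p>

```python
from typing import Callable, Sequence, Tuple

# Hypothetical representation: a hypothesis maps x to a label in {-1, +1}.
Hypothesis = Callable[[float], int]
Example = Tuple[float, int]  # a labeled example (x, y) with y in {-1, +1}


def average(q: Sequence[float], hs: Sequence[Hypothesis], x: float) -> float:
    """f(x) = sum_i q(h_i) * h_i(x), assuming q is nonnegative and sums to 1."""
    return sum(w * h(x) for w, h in zip(q, hs))


def empirical_margin_error(q, hs, sample: Sequence[Example], theta: float) -> float:
    """Fraction of (x, y) in the sample S whose margin y * f(x) is at most theta."""
    if not sample:
        raise ValueError("the sample S must be nonempty")
    hits = sum(1 for x, y in sample if theta >= y * average(q, hs, x))
    return hits / len(sample)


hs = [
    lambda x: 1 if x > 0 else -1,
    lambda x: 1 if x > 1 else -1,
    lambda x: 1 if x > -1 else -1,
]
q = [0.5, 0.25, 0.25]
# The margins on this sample are 1, -0.5, 0.5, 1, -1 respectively.
S = [(-2.0, -1), (-0.5, 1), (0.5, 1), (2.0, 1), (1.5, -1)]
for theta in (-1.0, 0.0, 0.5, 1.0):
    print(theta, empirical_margin_error(q, hs, S, theta))
# -1.0 0.2
# 0.0 0.4
# 0.5 0.6
# 1.0 1.0
```

<p class="indent">   At &#x03B8;&#x00A0;=&#x00A0;0 the value 0.4 coincides with the training error of the averaging classifier on this sample (the examples at <i>x</i>&#x00A0;=&#x00A0;&#x2212;0.5 and <i>x</i>&#x00A0;=&#x00A0;1.5 are misclassified), mirroring the identity for the true error at margin 0 noted above; the value can only grow as &#x03B8; increases.</p>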
   <div class="sectionTOCS"><span class="sectionToc">&#x00A0;7.1.&#x00A0;&#x00A0;<a 
href="thesisse29.xml#x44-650007.1" name="QQ2-44-71">Earlier Results</a></span><br /><span class="sectionToc">&#x00A0;7.2.&#x00A0;&#x00A0;<a 
href="thesisse30.xml#x45-660007.2" name="QQ2-45-72">A generalized averaging bound</a></span><br /><span class="sectionToc">&#x00A0;7.3.&#x00A0;&#x00A0;<a 
href="thesisse31.xml#x46-670007.3" name="QQ2-46-73">Proof of main
theorem</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;7.3.1.&#x00A0;&#x00A0;<a 
href="thesisse31.xml#x46-680007.3.1" name="QQ2-46-74">Definitions and observations</a></span><br /><span class="subsectionToc">&#x00A0;&#x00A0;&#x00A0;7.3.2.&#x00A0;&#x00A0;<a 
href="thesisse31.xml#x46-690007.3.2" name="QQ2-46-75">The Proof</a></span><br /><span class="sectionToc">&#x00A0;7.4.&#x00A0;&#x00A0;<a 
href="thesisse32.xml#x47-700007.4" name="QQ2-47-76">Methods for
tightening</a></span><br /><span class="sectionToc">&#x00A0;7.5.&#x00A0;&#x00A0;<a 
href="thesisse33.xml#x48-710007.5" name="QQ2-48-77">Final thoughts for Averaging Bounds</a></span><br />
   </div>




                                                                     

                                                                     
<a 
  name="tailthesisch7.xml"></a>  
</body> 
</html> 
