<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>7.2 A generalized averaging bound</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse31.xml" >next</a>] [<a 
href="thesisse29.xml" >prev</a>] [<a 
href="thesisse29.xml#tailthesisse29.xml" >prev-tail</a>] [<a 
href="#tailthesisse30.xml">tail</a>] [<a 
href="thesisch7.xml#thesisse30.xml" >up</a>] </p></div>
   <h3 class="sectionHead"><span class="titlemark">7.2. </span> <a 
  name="x45-660007.2"></a>A generalized averaging bound</h3>
<!--l. 2636--><p class="noindent">Before discussing the main theorem, it is important to note that the averaging classifier <!--l. 2637--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>c</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
<span 
class="ecti-1000">implies </span>a distribution over the base hypothesis space <!--l. 2638--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>H</mi></mrow></math>. This implied
distribution is <!--l. 2638--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><msub><mrow 
><mi 
>q</mi></mrow><mrow 
><mi 
>c</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
where <!--l. 2639--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">        <mrow 
>
                                    <mi 
>c</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <!--mstyle 
class="text"--><mtext class="textrm">sign</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mo 
class="MathClass-op">&#x222B;</mo>
  <mi 
>h</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><msub><mrow 
><mi 
>q</mi></mrow><mrow 
><mi 
>c</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>d</mi><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow>
</mrow></math> The
distribution <!--l. 2641--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><msub><mrow 
><mi 
>q</mi></mrow><mrow 
><mi 
>c</mi></mrow></msub 
></mrow></math>
is used in the following theorem.
</p>
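<p class="indent">   For a finite hypothesis set, the integral in the definition above reduces to a weighted vote. The following is a minimal Python sketch under that assumption, not the construction used in this thesis: the names <code>averaging_classifier</code> and <code>stumps</code>, the threshold classifiers, the weights, and the tie-breaking rule sign(0) = +1 are all hypothetical choices made for illustration.
</p>
<pre>
```python
def averaging_classifier(hypotheses, weights):
    """Build c(x) = sign(sum_h q_c(h) * h(x)) for a finite hypothesis set.

    hypotheses: callables returning -1 or +1 (illustrative assumption).
    weights:    non-negative numbers, normalized here to give q_c.
    """
    total = sum(weights)
    if not (total > 0 and min(weights) >= 0):
        raise ValueError("weights must be non-negative with a positive sum")
    q_c = [w / total for w in weights]  # the implied distribution q_c(h)

    def c(x):
        vote = sum(q * h(x) for q, h in zip(q_c, hypotheses))
        # Assumption: a tied vote is mapped to +1.
        return 1 if vote >= 0 else -1

    return c


# Three hypothetical threshold classifiers on the real line.
stumps = [lambda x, t=t: 1 if x > t else -1 for t in (0.0, 1.0, 2.0)]
c = averaging_classifier(stumps, [0.2, 0.5, 0.3])
predictions = [c(x) for x in (-1.0, 0.5, 1.5, 3.0)]
print(predictions)
```
</pre>
<p class="indent">   The normalized weights play the role of the implied distribution: any weighted vote with non-negative weights can be read as an average under some distribution over the base hypotheses.
</p>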
   <div class="newtheorem">
<!--l. 2643--><p class="noindent"><span class="head">
<a 
  name="x45-66001r1"></a>
  <span 
class="eccc-1000">T<small 
class="small-caps">H</small><small 
class="small-caps">E</small><small 
class="small-caps">O</small><small 
class="small-caps">R</small><small 
class="small-caps">E</small><small 
class="small-caps">M</small> </span>7.2.1<span 
class="eccc-1000">.</span></span>
</p><!--l. 2644--><p class="indent">   <span 
class="ecti-1000">(Relative Entropy Averaging Theorem) For all distributions </span><!--l. 2645--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math><span 
class="ecti-1000">,</span>
<span 
class="ecti-1000">for all </span><!--l. 2645--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B4;</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <mrow><mo 
class="MathClass-open">(</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow></mrow></math><span 
class="ecti-1000">:</span>
</p><!--l. 2647--><p class="indent">   <!--l. 2647--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
                                                                     

                                                                     
<mrow 
>
    <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>&#x2203;</mi><mi 
>c</mi><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B8;</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <mrow><mo 
class="MathClass-open">(</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow> <mo 
class="MathClass-punc">:</mo>  <!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--> <mfenced separators="" 
open="("  close=")" ><mrow><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover></mrow><mrow 
><mi 
>&#x03B8;</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></mfenced> <mo 
class="MathClass-rel">&#x2265;</mo> <mi 
>O</mi> <mfenced separators="" 
open="("  close=")" ><mrow><mfrac><mrow 
><mfrac><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mfenced separators="" 
open="("  close=")" ><mrow><msub><mrow 
><mi 
>q</mi></mrow><mrow 
><mi 
>c</mi></mrow></msub 
><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow></mfenced></mrow> 
    <mrow 
><msup><mrow 
><mi 
>&#x03B8;</mi></mrow><mrow 
><mn>2</mn></mrow></msup 
></mrow></mfrac>      <mo 
> ln</mo><!--nolimits--><mi 
>m</mi> <mo 
class="MathClass-bin">+</mo><mo 
> ln</mo><!--nolimits--><mi 
>m</mi> <mo 
class="MathClass-bin">+</mo><mo 
> ln</mo><!--nolimits--> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac></mrow>
                <mrow 
><mi 
>m</mi></mrow></mfrac>              </mrow></mfenced></mrow></mfenced> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
</p>
   </div>
   <div class="proof">
<!--l. 2652--><p class="indent">   <span class="head">
   <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">O</small><small 
class="small-caps">F</small>.</span> </span>Given in the next section. <span class="qed"><span 
class="msam-10">&#x25AB;</span></span>
</p>
   </div>
<!--l. 2654--><p class="indent">   The main theorem uses a KL-divergence based pseudodistance, which is hard to
interpret intuitively. To gain intuition, we can loosen the bound using the following
inequality.
</p><!--l. 2658--><p class="indent">   <!--l. 2658--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">        <mrow 
>
                                <!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>&#x03B8;</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2265;</mo> <mn>2</mn><msup><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>&#x03B8;</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-bin">&#x2212;</mo> <mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mrow 
><mn>2</mn></mrow></msup 
>
</mrow></math>
</p><!--l. 2662--><p class="indent">   This relaxation gives us an immediate corollary.
</p>
   <div class="newtheorem">
<!--l. 2664--><p class="noindent"><span class="head">
<a 
  name="x45-66002r2"></a>
  <span 
class="eccc-1000">C<small 
class="small-caps">O</small><small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">L</small><small 
class="small-caps">L</small><small 
class="small-caps">A</small><small 
class="small-caps">R</small><small 
class="small-caps">Y</small> </span>7.2.2<span 
class="eccc-1000">.</span></span>
                                                                     

                                                                     
</p><!--l. 2665--><p class="indent">   <span 
class="ecti-1000">(Relative Entropy Averaging Theorem) For all distributions </span><!--l. 2665--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math><span 
class="ecti-1000">,</span>
<span 
class="ecti-1000">for all </span><!--l. 2666--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B4;</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <mrow><mo 
class="MathClass-open">(</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow></mrow></math><span 
class="ecti-1000">:</span>
</p>
   </div>
   <div class="newtheorem">
<!--l. 2668--><p class="noindent"><span class="head">
<a 
  name="x45-66003r3"></a>
  <span 
class="eccc-1000">T<small 
class="small-caps">H</small><small 
class="small-caps">E</small><small 
class="small-caps">O</small><small 
class="small-caps">R</small><small 
class="small-caps">E</small><small 
class="small-caps">M</small> </span>7.2.3<span 
class="eccc-1000">.</span></span>
</p><!--l. 2669--><p class="indent">   <!--l. 2669--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
>
    <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>&#x2203;</mi><mi 
>c</mi><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B8;</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <mrow><mo 
class="MathClass-open">(</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow> <mo 
class="MathClass-punc">:</mo>  <mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2265;</mo><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover></mrow><mrow 
><mi 
>&#x03B8;</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>c</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-bin">+</mo> <mi 
>O</mi> <mfenced separators="" 
open="("  close=")" ><mrow><msqrt><mi 
></mi>
 <mrow><mfrac><mrow 
> <mfrac> <mrow 
> <!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--> <mfenced separators="" 
open="("  close=")" ><mrow><msub><mrow 
><mi 
>q</mi></mrow><mrow 
><mi 
>c</mi> </mrow> </msub 
> <mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi>  </mrow></mfenced></mrow> 
    <mrow 
><msup><mrow 
><mi 
>&#x03B8;</mi></mrow><mrow 
><mn>2</mn></mrow></msup 
></mrow></mfrac>      <mo 
> ln</mo><!--nolimits--><mi 
>m</mi> <mo 
class="MathClass-bin">+</mo><mo 
> ln</mo><!--nolimits--><mi 
>m</mi> <mo 
class="MathClass-bin">+</mo><mo 
> ln</mo><!--nolimits--> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac></mrow>
                <mrow 
><mi 
>m</mi></mrow></mfrac></mrow></msqrt>              </mrow></mfenced></mrow></mfenced> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
</p>
   </div>
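<p class="indent">   The relaxation used to obtain the corollary can be checked numerically. The sketch below is an illustration rather than part of the proof: it treats both arguments of the KL-divergence as Bernoulli parameters, and the helper names <code>kl_bernoulli</code> and <code>quadratic_lower_bound</code> as well as the grid resolution are assumptions introduced here.
</p>
<pre>
```python
import math


def kl_bernoulli(q, p):
    """KL(q || p) between two Bernoulli distributions, using 0 * ln 0 = 0."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log(a / b)
    return term(q, p) + term(1.0 - q, 1.0 - p)


def quadratic_lower_bound(q, p):
    """The relaxed quantity 2 * (q - p)^2 from the inequality."""
    return 2.0 * (q - p) ** 2


# Empirical margin error rates in [0, 1]; true error rates strictly inside
# (0, 1) so that every logarithm is finite.
empirical_rates = [i / 100 for i in range(0, 101)]
true_rates = [i / 100 for i in range(1, 100)]

# Smallest gap KL - 2 * (difference)^2 over the grid; it should not be
# negative beyond floating-point rounding.
smallest_gap = min(
    kl_bernoulli(q, p) - quadratic_lower_bound(q, p)
    for q in empirical_rates
    for p in true_rates
)
print(smallest_gap >= -1e-12)
print(kl_bernoulli(0.1, 0.2), quadratic_lower_bound(0.1, 0.2))
```
</pre>
<p class="indent">   Because the KL-divergence dominates the squared difference, an upper bound on the KL term yields an upper bound on the difference between the two error rates after taking a square root, which is the form stated in the corollary.
</p>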
<!--l. 2673--><p class="indent">   This theorem improves upon theorem   <a 
href="thesisse29.xml#x44-65001r1">7.1.1<!--tex4ht:ref: th-margin --></a> because <!--l. 2673--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>q</mi></mrow><mrow 
><mi 
>c</mi></mrow></msub 
><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> is used instead of <!--l. 2674--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mo 
>ln</mo><!--nolimits--><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>H</mi><mo 
class="MathClass-rel">&#x2223;</mo></mrow></math>. For the case of a uniform
distribution on <!--l. 2675--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>H</mi><mo 
class="MathClass-rel">&#x2223;</mo></mrow></math>
different base classifiers, these results will agree when the average is over just
one classifier. As the average becomes &#x201C;broader&#x201D; the results will improve.
In the limit when the average is over nearly all classifiers, the term <!--l. 2678--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>q</mi></mrow><mrow 
><mi 
>c</mi></mrow></msub 
><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> will be nearly
<!--l. 2678--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">     <mrow 
><mn>0</mn></mrow></math>.
</p><!--l. 2680--><p class="indent">   The theorems are stated in an asymptotic fashion, which is not very useful in
practical applications. Section  <a 
href="thesisse32.xml#x47-700007.4">7.4<!--tex4ht:ref: sec-tighten --></a> gives some ideas of how to tighten the result, and the
non-asymptotic form (<a 
href="thesisse31.xml#x46-69012r15">7.3.15<!--tex4ht:ref: eq-finalbound --></a>) given at the end of the proof can be used directly in
practice.
</p><!--l. 2685--><p class="indent">   The improved averaging bound applies to averages over continuous hypothesis
spaces. In this setting, the average needs to be an integral over an uncountably infinite
set of hypotheses; otherwise the KL-divergence will not be finite. It is exactly
because of this limitation that the improvements of this bound are most applicable to
Bayes Optimal and Maximum Entropy classifiers.
</p><!--l. 2691--><p class="indent">   In practice, the limitation may not be a significant problem because machine
learning algorithms over large hypothesis spaces typically have some parameter
stability. In other words, a small shift in the parameters of the learned model
produces a small change in the predictions of the hypothesis. With this
stability, we can convert any average over a finite set of hypotheses into an
average over an infinite set of hypotheses without significantly altering the
predictions of the average. This technique is explored in chapter  <a 
href="thesisch13.xml#x78-11800013">13<!--tex4ht:ref: SNN --></a> with positive
results.
</p><!--l. 2701--><p class="indent">
                                                                     

                                                                     
</p>
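<p class="indent">   The behaviour of the KL term described in this section can be illustrated in the discrete uniform case. The sketch below assumes a uniform prior over n base classifiers and an average that is uniform over k of them, for which the KL-divergence equals ln(n/k). The names <code>kl_discrete</code> and <code>uniform_average</code> and the value n = 1000 are hypothetical choices for illustration.
</p>
<pre>
```python
import math


def kl_discrete(q, p):
    """KL(q || p) for two discrete distributions, using 0 * ln 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def uniform_average(n, k):
    """Distribution that is uniform over the first k of n hypotheses."""
    return [1.0 / k] * k + [0.0] * (n - k)


n = 1000
prior = [1.0 / n] * n  # uniform prior p over n base classifiers

# Compare the computed KL term with the closed form ln(n / k).
for k in (1, 10, 100, 1000):
    print(k, kl_discrete(uniform_average(n, k), prior), math.log(n / k))
```
</pre>
<p class="indent">   With k = 1 the term equals ln n, matching the bound that depends on the size of the hypothesis space, and with k = n it is zero. For fixed k the term grows without bound as n grows, which is a finite analogue of the limitation discussed above for averages over a finite set drawn from a continuous hypothesis space.
</p>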
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse31.xml" >next</a>] [<a 
href="thesisse29.xml" >prev</a>] [<a 
href="thesisse29.xml#tailthesisse29.xml" >prev-tail</a>] [<a 
href="thesisse30.xml" >front</a>] [<a 
href="thesisch7.xml#thesisse30.xml" >up</a>] </p></div><a 
  name="tailthesisse30.xml"></a>  
</body> 
</html> 
