<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>6.2 A Tighter PAC-Bayes Bound</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <h3 class="sectionHead"><span class="titlemark">6.2. </span> <a 
  name="x39-590006.2"></a>A Tighter PAC-Bayes Bound</h3>
<!--l. 2309--><p class="noindent">We can tighten this bound by using a more accurate tail bound on the Binomial
distribution. The proof of the improved result is not a simple matter of replacing
the Hoeffding bound with the relative entropy Chernoff bound  <a 
href="thesisse10.xml#x16-24001r1">3.2.1<!--tex4ht:ref: eq-recb --></a>, but
it can still be carried out.
</p>
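<p class="indent">   As a rough numerical illustration of why the choice of tail bound matters, the following Python sketch compares the exact lower tail of a Binomial with two upper bounds on it. The closed forms used for the Hoeffding bound and the relative entropy Chernoff bound are the standard ones and are assumed here to match those referenced above; the function names and the sample values of m and p are hypothetical.</p>
<pre class="verbatim"><![CDATA[
```python
import math


def binary_kl(q: float, p: float) -> float:
    """KL(q || p) in nats between Bernoulli(q) and Bernoulli(p), with 0 ln 0 = 0.

    Assumes 0 < p < 1.
    """
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def exact_lower_tail(m: int, k: int, p: float) -> float:
    """Exact Pr[Binomial(m, p) <= k]."""
    return sum(math.comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(k + 1))


def hoeffding_lower_tail(m: int, k: int, p: float) -> float:
    """Hoeffding bound on Pr[Binomial(m, p) <= k], valid for k/m <= p."""
    return math.exp(-2.0 * m * (p - k / m) ** 2)


def chernoff_lower_tail(m: int, k: int, p: float) -> float:
    """Relative entropy Chernoff bound on the same tail, valid for k/m <= p."""
    return math.exp(-m * binary_kl(k / m, p))


if __name__ == "__main__":
    m, p = 100, 0.3  # hypothetical sample size and true error rate
    for k in (0, 5, 15, 25):
        print(k, exact_lower_tail(m, k, p),
              chernoff_lower_tail(m, k, p), hoeffding_lower_tail(m, k, p))
```
]]></pre>
<p class="indent">   By Pinsker's inequality the relative entropy bound is never larger than the Hoeffding bound, and the gap tends to be largest when the observed error rate is close to zero.</p>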
   <div class="newtheorem">
<!--l. 2314--><p class="noindent"><span class="head">
<a 
  name="x39-59001r1"></a>
  <span 
class="eccc-1000">T<small 
class="small-caps">H</small><small 
class="small-caps">E</small><small 
class="small-caps">O</small><small 
class="small-caps">R</small><small 
class="small-caps">E</small><small 
class="small-caps">M</small> </span>6.2.1<span 
class="eccc-1000">.</span></span>
</p><!--l. 2315--><p class="indent">   <span 
class="ecti-1000">(Relative Entropy PAC-Bayes bound) For all binary loss functions, </span><!--l. 2316--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>l</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi><mo 
class="MathClass-punc">,</mo><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-punc">,</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math><span 
class="ecti-1000">,</span>
<span 
class="ecti-1000">for all &#x201C;priors&#x201D; </span><!--l. 2316--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
<span 
class="ecti-1000">and for all </span><!--l. 2316--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B4;</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <mrow><mo 
class="MathClass-open">(</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow></mrow></math><span 
class="ecti-1000">:</span>
</p><!--l. 2318--><p class="indent">   <!--l. 2318--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
>
           <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>&#x2203;</mi><mi 
>q</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-punc">:</mo>  <!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover></mrow><mrow 
><mi 
>q</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>q</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2265;</mo> <mfrac><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>q</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-bin">+</mo><mo 
> ln</mo><!--nolimits--> <mfrac><mrow 
><mi 
>m</mi></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac> </mrow> 
       <mrow 
><mi 
>m</mi> <mo 
class="MathClass-bin">&#x2212;</mo> <mn>1</mn></mrow></mfrac>       </mrow></mfenced> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
</p>
   </div>
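<p class="indent">   Theorem 6.2.1 constrains the true error only implicitly, through a KL divergence. One common way to turn it into an explicit upper bound, sketched here under the assumption that the KL divergence is taken between two Bernoulli parameters, is to invert it numerically by bisection. The helper names and the example inputs are hypothetical.</p>
<pre class="verbatim"><![CDATA[
```python
import math


def binary_kl(q: float, p: float) -> float:
    """KL(q || p) in nats between Bernoulli(q) and Bernoulli(p), with 0 ln 0 = 0."""
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def kl_upper_inverse(q_hat: float, budget: float, tol: float = 1e-12) -> float:
    """Approximate the largest e in [q_hat, 1) with KL(q_hat || e) <= budget."""
    lo, hi = q_hat, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(q_hat, mid) > budget:
            hi = mid  # mid violates the constraint, so the answer lies below it
        else:
            lo = mid  # mid satisfies the constraint, so the answer is at least mid
    return lo


def pac_bayes_kl_bound(q_hat: float, kl_q_p: float, m: int, delta: float) -> float:
    """Upper bound on the average true error implied by Theorem 6.2.1.

    q_hat  : average empirical error of the posterior q
    kl_q_p : KL(q || p) between posterior and prior, in nats
    m      : number of examples (m >= 2)
    delta  : allowed failure probability
    """
    budget = (kl_q_p + math.log(m / delta)) / (m - 1)
    return kl_upper_inverse(q_hat, budget)


if __name__ == "__main__":
    # Hypothetical inputs: 1000 examples, KL(q||p) = 5 nats, delta = 0.05.
    for q_hat in (0.0, 0.05, 0.2):
        print(q_hat, pac_bayes_kl_bound(q_hat, 5.0, 1000, 0.05))
```
]]></pre>
<p class="indent">   When the average empirical error is zero, the inversion has a closed form: the bound equals 1 &#x2212; exp(&#x2212;B), where B is the right-hand side of the inequality in the theorem.</p>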
<!--l. 2322--><p class="indent">   This bound is always at least as tight as the original PAC-Bayes bound <span class="cite">[<a 
href="thesisli2.xml#XPB"><span 
class="ecbx-1000">39</span></a>]</span>
and sometimes much tighter, such as when the average empirical error is near <!--l. 2324--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>0</mn></mrow></math>.
In particular, when the average empirical error is zero (<!--l. 2324--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>q</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mn>0</mn></mrow></math>) the
bound can be significantly tighter as shown in figure  <a 
href="thesisse12.xml#x18-260021">3.4.1<!--tex4ht:ref: fig-bounds --></a> on page  <a 
href="thesisse12.xml#x18-260021">56<!--tex4ht:ref: fig-bounds --></a>.
                                                                     

                                                                     
</p><!--l. 2328--><p class="indent">   One interesting new feature of this PAC-Bayes bound is &#x201C;dimensional
consistency&#x201D;<a 
href="thesis40.xml" name="thesis40.xml" ><sup>1</sup></a>.
In particular, each side of the inequality is an expectation of log probabilities, measured in &#x201C;nats&#x201D;.
Rewriting, with high probability the following holds approximately: <!--l. 2332--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
>
                      <mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>q</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>q</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>q</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow>
</mrow></math>
There is a coding theory interpretation of KL divergence: <!--l. 2334--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>q</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo></mrow></math>
the expected number of <span 
class="ecti-1000">extra </span>bits (nats, when natural logarithms are used) required to encode symbols drawn from <!--l. 2336--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>q</mi></mrow></math> given a code designed for
symbols drawn from <!--l. 2336--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mi 
>p</mi></mrow></math>
rather than from <!--l. 2337--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mi 
>q</mi></mrow></math>.
</p><!--l. 2339--><p class="indent">   Under this coding theory interpretation, the inequality above says approximately:
&#x201C;With high probability the number of <span 
class="ecti-1000">extra </span>bits required to encode the empirical errors is
less than the number of <span 
class="ecti-1000">extra </span>bits required to encode hypotheses drawn from the
posterior.&#x201D;
</p><!--l. 2344--><p class="indent">   The PAC-Bayes bound is tightened by reproving a technical lemma
about distributions. The proof relies on two lemmas. The first is Lemma 22 from <span class="cite">[<a 
href="thesisli2.xml#XPB"><span 
class="ecbx-1000">39</span></a>]</span>,
which states:
</p>
   <div class="newtheorem">
<!--l. 2348--><p class="noindent"><span class="head">
<a 
  name="x39-59003r2"></a>
  <span 
class="eccc-1000">L<small 
class="small-caps">E</small><small 
class="small-caps">M</small><small 
class="small-caps">M</small><small 
class="small-caps">A</small> </span>6.2.2<span 
class="eccc-1000">.</span></span>
</p><!--l. 2349--><p class="indent">   <span 
class="ecti-1000">For all </span><!--l. 2349--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B2;</mi> <mo 
class="MathClass-rel">&#x003E;</mo> <mn>0</mn><mo 
class="MathClass-punc">,</mo><mi 
>K</mi> <mo 
class="MathClass-rel">&#x003E;</mo> <mn>0</mn></mrow></math>
<span 
class="ecti-1000">and </span><!--l. 2349--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>Q</mi><mo 
class="MathClass-punc">,</mo><mi 
>P</mi><mo 
class="MathClass-punc">,</mo><mi 
>y</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <msup><mrow 
><mi 
>R</mi></mrow><mrow 
><mi 
>n</mi></mrow></msup 
></mrow></math>
                                                                     

                                                                     
<span 
class="ecti-1000">satisfying </span><!--l. 2350--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><msub><mrow 
><mi 
>P</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
> <mo 
class="MathClass-rel">&#x003E;</mo> <mn>0</mn><mo 
class="MathClass-punc">,</mo><msub><mrow 
><mi 
>Q</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
> <mo 
class="MathClass-rel">&#x003E;</mo> <mn>0</mn></mrow></math>
<span 
class="ecti-1000">and </span><!--l. 2350--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><msub><mrow 
><mo 
class="MathClass-op">&#x2211;</mo>
  </mrow><mrow 
><mi 
>i</mi></mrow></msub 
><msub><mrow 
><mi 
>Q</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
> <mo 
class="MathClass-rel">=</mo> <mn>1</mn></mrow></math><span 
class="ecti-1000">,</span>
<span 
class="ecti-1000">if </span><!--l. 2351--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">      <mrow 
><msubsup><mrow 
>
                                      <mo 
class="MathClass-op">&#x2211;</mo>
                      </mrow><mrow 
><mi 
>i</mi><mo 
class="MathClass-rel">=</mo><mn>1</mn></mrow><mrow 
><mi 
>n</mi></mrow></msubsup 
><msub><mrow 
><mi 
>P</mi></mrow><mrow 
>
<mi 
>i</mi></mrow></msub 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>&#x03B2;</mi><msub><mrow 
><mi 
>y</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
>
    </mrow></msup 
> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>K</mi>
</mrow></math>
<span 
class="ecti-1000">then</span>
</p><!--l. 2355--><p class="indent">   <!--l. 2355--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
><msubsup><mrow 
>
                    <mo 
class="MathClass-op">&#x2211;</mo>
                       </mrow><mrow 
><mi 
>i</mi><mo 
class="MathClass-rel">=</mo><mn>1</mn></mrow><mrow 
><mi 
>n</mi></mrow></msubsup 
><msub><mrow 
><mi 
>Q</mi></mrow><mrow 
>
<mi 
>i</mi></mrow></msub 
><msub><mrow 
><mi 
>y</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
> <mo 
class="MathClass-rel">&#x2264;</mo> <mfrac><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>Q</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>P</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-bin">+</mo><mo 
> ln</mo><!--nolimits--><mi 
>K</mi></mrow> 
         <mrow 
><mi 
>&#x03B2;</mi></mrow></mfrac>
</mrow></math>
</p>
   </div>
   <div class="proof">
<!--l. 2360--><p class="indent">   <span class="head">
   <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">O</small><small 
class="small-caps">F</small>.</span> </span>Given in <span class="cite">[<a 
href="thesisli2.xml#XPB"><span 
class="ecbx-1000">39</span></a>]</span>. <span class="qed"><span 
class="msam-10">&#x25AB;</span></span>
</p>
   </div>
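<p class="indent">   Lemma 6.2.2 can be checked numerically on random inputs. The sketch below is only a sanity check, not a proof. It takes K to be its smallest admissible value, namely the sum on the left of the lemma's hypothesis, and all function names are hypothetical.</p>
<pre class="verbatim"><![CDATA[
```python
import math
import random


def kl_divergence(Q, P):
    """Sum_i Q_i ln(Q_i / P_i); P needs positive entries but not unit total mass."""
    return sum(q * math.log(q / p) for q, p in zip(Q, P))


def lemma_sides(Q, P, y, beta):
    """Return (lhs, rhs) of Lemma 6.2.2 with K = sum_i P_i exp(beta * y_i)."""
    K = sum(p * math.exp(beta * yi) for p, yi in zip(P, y))
    lhs = sum(q * yi for q, yi in zip(Q, y))
    rhs = (kl_divergence(Q, P) + math.log(K)) / beta
    return lhs, rhs


def random_instance(rng, n):
    """Draw a random (Q, P, y) with Q normalized and P merely positive."""
    raw = [rng.random() + 0.01 for _ in range(n)]
    total = sum(raw)
    Q = [r / total for r in raw]
    P = [rng.random() + 0.01 for _ in range(n)]
    y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return Q, P, y


if __name__ == "__main__":
    rng = random.Random(0)
    Q, P, y = random_instance(rng, n=5)
    lhs, rhs = lemma_sides(Q, P, y, beta=2.0)
    print("lhs =", lhs, "rhs =", rhs)
```
]]></pre>
<p class="indent">   The two sides coincide when each Q<sub>i</sub> is proportional to P<sub>i</sub> exp(&#x03B2; y<sub>i</sub>), which is the case in which the Jensen step behind the lemma is tight.</p>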
<!--l. 2362--><p class="indent">   We will need to prove the following lemma in order to tighten the PAC-Bayes bound.
It is analogous to Lemma 17 from <span class="cite">[<a 
href="thesisli2.xml#XPB"><span 
class="ecbx-1000">39</span></a>]</span>.
                                                                     

                                                                     
</p>
   <div class="newtheorem">
<!--l. 2365--><p class="noindent"><span class="head">
<a 
  name="x39-59004r3"></a>
  <span 
class="eccc-1000">L<small 
class="small-caps">E</small><small 
class="small-caps">M</small><small 
class="small-caps">M</small><small 
class="small-caps">A</small> </span>6.2.3<span 
class="eccc-1000">.</span></span>
</p><!--l. 2366--><p class="indent">   <span 
class="ecti-1000">For all &#x201C;priors&#x201D; </span><!--l. 2366--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math><span 
class="ecti-1000">,</span>
<span 
class="ecti-1000">for all </span><!--l. 2366--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B4;</mi><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B1;</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <mrow><mo 
class="MathClass-open">(</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math><span 
class="ecti-1000">:</span>
<!--l. 2367--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">    <mrow 
>
                         <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow><msub><mrow 
><mi 
>E</mi></mrow><mrow 
><mi 
>p</mi></mrow></msub 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
> <mo 
class="MathClass-rel">&#x003E;</mo>  <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B1;</mi><mi 
>&#x03B4;</mi></mrow></mfrac></mrow></mfenced> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
</p>
   </div>
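<p class="indent">   The constants in Lemma 6.2.3 can be related to those of Theorem 6.2.1 by a short calculation. The sketch below assumes (this step is not spelled out in the text above) that Lemma 6.2.2 is applied with &#x03B2; = (1 &#x2212; &#x03B1;)m and K = 1/(&#x03B1;&#x03B4;), and then checks that the choice &#x03B1; = 1/m reproduces the right-hand side of the theorem. The function names and example values are hypothetical.</p>
<pre class="verbatim"><![CDATA[
```python
import math


def generic_rhs(kl_q_p: float, m: int, delta: float, alpha: float) -> float:
    """(KL(q||p) + ln K) / beta with beta = (1 - alpha) * m and K = 1 / (alpha * delta)."""
    beta = (1.0 - alpha) * m
    K = 1.0 / (alpha * delta)
    return (kl_q_p + math.log(K)) / beta


def theorem_rhs(kl_q_p: float, m: int, delta: float) -> float:
    """Right-hand side of Theorem 6.2.1: (KL(q||p) + ln(m / delta)) / (m - 1)."""
    return (kl_q_p + math.log(m / delta)) / (m - 1)


if __name__ == "__main__":
    kl_q_p, m, delta = 5.0, 1000, 0.05  # hypothetical values
    print(generic_rhs(kl_q_p, m, delta, alpha=1.0 / m))
    print(theorem_rhs(kl_q_p, m, delta))
```
]]></pre>
<p class="indent">   Other values of &#x03B1; trade the logarithmic term against the factor multiplying m, so the same helper can be used to compare alternatives.</p>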
   <div class="proof">
<!--l. 2372--><p class="indent">   <span class="head">
   <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">O</small><small 
class="small-caps">F</small>.</span> </span>For any given hypothesis <!--l. 2372--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>h</mi></mrow></math> we
will first prove the following inequality. </p><table class="equation"><tr><td> <a 
  name="x39-59005r1"></a>
                                                                     

                                                                     
<!--l. 2373--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">     
                                <mi 
>&#x2200;</mi><mi 
>h</mi> <msub><mrow 
><mi 
>E</mi></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
> <mo 
class="MathClass-rel">&#x2264;</mo> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B1;</mi></mrow></mfrac>
</math>
<!--l. 2376--><p class="nopar"></p></td><td class="eq-no">(6.2.1)</td></tr></table>
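<p class="indent">   Equation 6.2.1 can be checked exactly for small m by summing over the Binomial distribution of the number of errors. The sketch below assumes the one-sided convention for KL used in this section (zero whenever the empirical rate exceeds the true rate); the function names and test values are hypothetical.</p>
<pre class="verbatim"><![CDATA[
```python
import math


def one_sided_kl(q: float, p: float) -> float:
    """Bernoulli KL(q || p) in nats, set to 0 when q >= p (one-sided convention)."""
    if q >= p:
        return 0.0
    total = (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    if q > 0.0:
        total += q * math.log(q / p)
    return total


def expected_exponential(m: int, p: float, alpha: float) -> float:
    """Exact E[exp((1 - alpha) * m * KL(k/m || p))] for k ~ Binomial(m, p)."""
    return sum(
        math.comb(m, k) * p**k * (1.0 - p) ** (m - k)
        * math.exp((1.0 - alpha) * m * one_sided_kl(k / m, p))
        for k in range(m + 1)
    )


if __name__ == "__main__":
    m, p = 50, 0.2  # hypothetical sample size and true error rate
    for alpha in (0.5, 0.1, 1.0 / m):
        print(alpha, expected_exponential(m, p, alpha), "bound:", 1.0 / alpha)
```
]]></pre>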
The lemma then follows by averaging over the prior, exchanging the order of the two expectations, and applying Markov's inequality: <!--l. 2378--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
>
                  <mo 
class="MathClass-rel">&#x21D2;</mo><mi 
>&#x2200;</mi><mi 
>p</mi> <msub><mrow 
><mi 
>E</mi></mrow><mrow 
><mi 
>p</mi></mrow></msub 
><msub><mrow 
><mi 
>E</mi></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
> <mo 
class="MathClass-rel">&#x2264;</mo> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B1;</mi></mrow></mfrac>
</mrow></math>
<!--l. 2381--><p class="indent">   <!--l. 2381--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">       <mrow 
>
                        <mo 
class="MathClass-rel">&#x21D2;</mo><mi 
>&#x2200;</mi><mi 
>p</mi> <msub><mrow 
><mi 
>E</mi></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
><msub><mrow 
><mi 
>E</mi></mrow><mrow 
><mi 
>p</mi></mrow></msub 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
> <mo 
class="MathClass-rel">&#x2264;</mo> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B1;</mi></mrow></mfrac>
</mrow></math> <!--l. 2383--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
                                                                     

                                                                     
<mrow 
>
               <mo 
class="MathClass-rel">&#x21D2;</mo><mi 
>&#x2200;</mi><mi 
>p</mi> <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow><msub><mrow 
><mi 
>E</mi></mrow><mrow 
><mi 
>p</mi></mrow></msub 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
> <mo 
class="MathClass-rel">&#x2265;</mo> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B1;</mi><mi 
>&#x03B4;</mi></mrow></mfrac></mrow></mfenced> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
</p><!--l. 2387--><p class="indent">   Consequently, we need only prove equation  <a 
href="#x39-59005r1">6.2.1<!--tex4ht:ref: eq-expectation --></a>. Given the hypothesis, we have a fixed true error rate, <!--l. 2388--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>, and the empirical error rate <!--l. 2388--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> is distributed as a Binomial count
of errors divided by the number of examples. Let <!--l. 2389--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mi 
>R</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
be the random variable with a cumulative distribution given by the
relative entropy Chernoff bound for a hypothesis with true error <!--l. 2391--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>.
In other words, define a cumulative distribution function on <!--l. 2392--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mrow><mo 
class="MathClass-open">[</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow></mrow></math> according to: <!--l. 2393--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
>
                              <msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mfenced separators="" 
open="("  close=")" ><mrow><mi 
>R</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow></mfenced></mrow></msup 
>
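</mrow></math> (here <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow 
><mi 
>p</mi></mrow></math> abbreviates the true error <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow 
><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>, not the prior; note also that we
defined <!--l. 2395--><math 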
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> so that
it is always <!--l. 2395--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mn>0</mn></mrow></math>
when <!--l. 2396--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
> <mfrac><mrow 
><mi 
>k</mi></mrow>
<mrow 
><mi 
>m</mi></mrow></mfrac> <mo 
class="MathClass-rel">&#x003E;</mo> <mi 
>p</mi></mrow></math>).
Note that the relative entropy Chernoff bound implies <!--l. 2397--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>R</mi></mrow></math> satisfies: <!--l. 2398--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
                                                                     

                                                                     
<mrow 
>
                      <!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>m</mi><mi 
>R</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mfenced separators="" 
open="("  close=")" ><mrow><mi 
>R</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow></mfenced></mrow></msup 
>
</mrow></math> whenever
<!--l. 2400--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">     <mrow 
><mi 
>m</mi><mi 
>R</mi></mrow></math>
is an integer.
</p><!--l. 2402--><p class="indent">   Since <!--l. 2402--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
></mrow></math>
increases monotonically with decreasing <!--l. 2403--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>, and the lower tail of
<!--l. 2404--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>R</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
is at least as heavy as that of the Binomial, taking the expectation under it can only increase the value. In other words: <!--l. 2405--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
>
             <msub><mrow 
><mi 
>E</mi></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
> <mo 
class="MathClass-rel">&#x2264;</mo> <msub><mrow 
><mi 
>E</mi></mrow><mrow 
>
<mi 
>R</mi></mrow></msub 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>R</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
>
</mrow></math> The probability density
function of <!--l. 2407--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>R</mi></mrow></math>
                                                                     

                                                                     
is given by: <!--l. 2408--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">        <mrow 
>
<mi 
>f</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mfenced separators="" 
open="{"  close="" ><mrow><mtable  
equalrows="false" equalcolumns="false" class="array"><mtr><mtd 
class="array"  columnalign="left"><mn>0</mn>                               </mtd><mtd 
class="array"  columnalign="left"><!--mstyle 
class="text"--><mtext class="textrm">for</mtext><!--/mstyle--><mspace width="2.77626pt" class="tmspace"/><mi 
>x</mi> <mo 
class="MathClass-rel">&#x2265;</mo> <mi 
>p</mi></mtd>
</mtr><mtr><mtd 
class="array"  columnalign="left">                            </mtd><mtd 
class="array"  columnalign="left">         </mtd>
</mtr><mtr><mtd 
class="array"  columnalign="left"><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>m</mi><mfrac><mrow 
><mi 
>&#x2202;</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow>
     <mrow 
><mi 
>&#x2202;</mi><mi 
>x</mi></mrow></mfrac>     <msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
></mtd><mtd 
class="array"  columnalign="left"><!--mstyle 
class="text"--><mtext class="textrm">for</mtext><!--/mstyle--><mspace width="2.77626pt" class="tmspace"/><mi 
>x</mi> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>p</mi></mtd></mtr> <!--ll--></mtable>                                                       </mrow></mfenced>
</mrow></math>
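Taking the expectation with respect to this distribution, with <math xmlns="http://www.w3.org/1998/Math/MathML" mode="inline"><mrow><mi>p</mi> <mo class="MathClass-rel">=</mo> <mi>e</mi><mrow><mo class="MathClass-open">(</mo><mrow><mi>h</mi></mrow><mo class="MathClass-close">)</mo></mrow></mrow></math>, gives us: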
</p><!--l. 2415--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">     
<mtable 
class="eqnarray-star" columnalign="right center left" >
<mtr><mtd 
class="eqnarray-1"> <msub><mrow 
><mi 
>E</mi></mrow><mrow 
><mi 
>R</mi></mrow></msub 
> <mfenced separators="" 
open="["  close="]" ><mrow><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>R</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
></mrow></mfenced></mtd><mtd 
class="eqnarray-2">     <mo 
class="MathClass-rel">=</mo></mtd><mtd 
class="eqnarray-3"><msubsup><mrow 
>    <mo 
class="MathClass-op">&#x222B;</mo>
                     </mrow><mrow 
><mn>0</mn></mrow><mrow 
><mn>1</mn></mrow></msubsup 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
><mi 
>f</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>d</mi><mi 
>x</mi>                </mtd><mtd 
class="eqnarray-4"> <mtext class="eqnarray"></mtext></mtd>
</mtr><mtr><mtd 
class="eqnarray-1">                          </mtd><mtd 
class="eqnarray-2">     </mtd><mtd 
class="eqnarray-3">                                                    </mtd><mtd 
class="eqnarray-4"> <mtext class="eqnarray"></mtext></mtd>
</mtr><mtr><mtd 
class="eqnarray-1">                          </mtd><mtd 
class="eqnarray-2">   <mo 
class="MathClass-rel">=</mo></mtd><mtd 
class="eqnarray-3"><msubsup><mrow 
>    <mo 
class="MathClass-op">&#x222B;</mo>
                     </mrow><mrow 
><mn>0</mn></mrow><mrow 
><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msubsup 
> <mo 
class="MathClass-bin">&#x2212;</mo> <mi 
>m</mi><mfrac><mrow 
><mi 
>&#x2202;</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow>
       <mrow 
><mi 
>&#x2202;</mi><mi 
>x</mi></mrow></mfrac>      <msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
><mi 
>d</mi><mi 
>x</mi></mtd><mtd 
class="eqnarray-4"> <mtext class="eqnarray"></mtext></mtd>
</mtr><mtr><mtd 
class="eqnarray-1">                          </mtd><mtd 
class="eqnarray-2">   <mo 
class="MathClass-rel">=</mo></mtd><mtd 
class="eqnarray-3"><msubsup><mrow 
>    <mfenced separators="" 
open=""  close="|" ><mrow> <mfrac><mrow 
><mn>1</mn></mrow>
<mrow 
><mi 
>&#x03B1;</mi></mrow></mfrac><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B1;</mi><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
></mrow></mfenced> </mrow><mrow 
>
<mn>0</mn></mrow><mrow 
><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msubsup 
>                                 </mtd><mtd 
class="eqnarray-4"> <mtext class="eqnarray"></mtext></mtd>
</mtr><mtr><mtd 
class="eqnarray-1">                                      </mtd><mtd 
class="eqnarray-2">     <mo 
class="MathClass-rel">&#x2264;</mo></mtd><mtd 
class="eqnarray-3">  <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B1;</mi></mrow></mfrac>                                               </mtd><mtd 
class="eqnarray-4"> <mtext class="eqnarray"></mtext></mtd>        </mtr></mtable>
</math>
<!--l. 2421--><p class="nopar">
This finishes the proof of the technical lemma. <span class="qed"><span 
class="msam-10">&#x25AB;</span></span>
</p>
   </div>
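<p class="indent">   As a sanity check on the last steps of this computation, the integral can be compared numerically with its closed form. The following Python sketch is not part of the original proof: the function names (binary_kl, binary_kl_dx, lemma_integral, lemma_closed_form), the midpoint rule, and any sample values of m, &#x03B1; and e(h) are assumptions made for illustration, and logarithms are taken to be natural.
</p>
<pre class="verbatim">
```python
import math


def binary_kl(x, p):
    """Relative entropy KL(x || p) between Bernoulli(x) and Bernoulli(p), in nats."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / p)
    if x != 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - p))
    return total


def binary_kl_dx(x, p):
    """Partial derivative of KL(x || p) with respect to x, for x strictly inside (0, 1)."""
    return math.log(x / p) - math.log((1.0 - x) / (1.0 - p))


def lemma_integral(m, alpha, p, steps=200000):
    """Midpoint-rule estimate of the integral over [0, p] of
    -m * dKL(x || p)/dx * exp(-alpha * m * KL(x || p))."""
    width = p / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * width  # midpoint of the k-th cell, never exactly 0 or p
        total += -m * binary_kl_dx(x, p) * math.exp(-alpha * m * binary_kl(x, p))
    return total * width


def lemma_closed_form(m, alpha, p):
    """(1 / alpha) * exp(-alpha * m * KL(x || p)) evaluated between x = 0 and x = p."""
    return (1.0 - math.exp(-alpha * m * binary_kl(0.0, p))) / alpha
```
</pre>
<p class="indent">   Under these assumptions, lemma_integral(m, alpha, p) and lemma_closed_form(m, alpha, p) should agree up to discretization error, and both should stay below 1/alpha, which is the inequality used in the final line of the proof.
</p>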
<!--l. 2424--><p class="indent">   We can now prove the relative entropy PAC-Bayes bound, Theorem&#x00A0;<a 
href="#x39-59001r1">6.2.1<!--tex4ht:ref: th-repbb --></a>.
</p>
   <div class="proof">
                                                                     

                                                                     
<!--l. 2427--><p class="indent">   <span class="head">
   <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">O</small><small 
class="small-caps">F</small>.</span> </span>First, we can specialize lemma&#x00A0; <a 
href="#x39-59004r3">6.2.3<!--tex4ht:ref: lem-tech --></a> with <!--l. 2427--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B1;</mi> <mo 
class="MathClass-rel">=</mo>  <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>m</mi></mrow></mfrac></mrow></math>
to get that, with probability at least <!--l. 2428--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mn>1</mn> <mo 
class="MathClass-bin">&#x2212;</mo> <mi 
>&#x03B4;</mi></mrow></math>
<!--l. 2429--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">    <mrow 
>
                              <msub><mrow 
><mi 
>E</mi></mrow><mrow 
><mi 
>p</mi></mrow></msub 
><msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-bin">&#x2212;</mo><mn>1</mn></mrow><mo 
class="MathClass-close">)</mo></mrow><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
> <mo 
class="MathClass-rel">&#x2264;</mo> <mfrac><mrow 
><mi 
>m</mi></mrow> 
 <mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac>
</mrow></math>
Apply lemma <a 
href="#x39-59003r2">6.2.2<!--tex4ht:ref: lem-opt --></a> with <!--l. 2431--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>K</mi> <mo 
class="MathClass-rel">=</mo> <mfrac><mrow 
><mi 
>m</mi></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac> </mrow></math>,
<!--l. 2431--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">    <mrow 
><mi 
>&#x03B2;</mi> <mo 
class="MathClass-rel">=</mo> <mi 
>m</mi> <mo 
class="MathClass-bin">&#x2212;</mo> <mn>1</mn></mrow></math>,
<!--l. 2432--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">    <mrow 
><msub><mrow 
><mi 
>Q</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
> <mo 
class="MathClass-rel">=</mo> <mi 
>q</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>h</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
and <!--l. 2432--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><msub><mrow 
><mi 
>y</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
> <mo 
class="MathClass-rel">=</mo> <!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>h</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>h</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
to get: <!--l. 2433--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">      <mrow 
>
          <msub><mrow 
><mi 
>E</mi></mrow><mrow 
><mi 
>q</mi></mrow></msub 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo><msubsup><mrow 
> <mo 
class="MathClass-op">&#x2211;</mo>
  </mrow><mrow 
><mi 
>i</mi><mo 
class="MathClass-rel">=</mo><mn>1</mn></mrow><mrow 
><mi 
>n</mi></mrow></msubsup 
><msub><mrow 
><mi 
>Q</mi></mrow><mrow 
>
<mi 
>i</mi></mrow></msub 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>h</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>h</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <mfrac><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>q</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-bin">+</mo><mo 
> ln</mo><!--nolimits--> <mfrac><mrow 
><mi 
>m</mi></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac> </mrow> 
       <mrow 
><mi 
>m</mi> <mo 
class="MathClass-bin">&#x2212;</mo> <mn>1</mn></mrow></mfrac>
</mrow></math>
                                                                     

                                                                     
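Since the relative entropy is jointly convex in its two arguments, Jensen&#x2019;s inequality, combined with the previous bound, gives us: <!--l. 2436--><math 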
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">      <mrow 
>
                <!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>q</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>q</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <msub><mrow 
><mi 
>E</mi></mrow><mrow 
><mi 
>q</mi></mrow></msub 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <mfrac><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>q</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-bin">+</mo><mo 
> ln</mo><!--nolimits--> <mfrac><mrow 
><mi 
>m</mi></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac> </mrow> 
       <mrow 
><mi 
>m</mi> <mo 
class="MathClass-bin">&#x2212;</mo> <mn>1</mn></mrow></mfrac>
</mrow></math>
which proves the theorem for a finite hypothesis space. For the infinite case, the bound follows by taking a sequence of
limits, just as in <span class="cite">[<a 
href="thesisli2.xml#XPB"><span 
class="ecbx-1000">39</span></a>]</span>. <span class="qed"><span 
class="msam-10">&#x25AB;</span></span>
</p>
   </div>
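<p class="indent">   To see how the bound just proved might be evaluated in practice, the following Python sketch computes the right-hand side (KL(q||p) + ln(m/&#x03B4;))/(m &#x2212; 1) and then inverts the binary relative entropy to obtain an upper bound on the true error rate. It is only an illustration under stated assumptions: the function names (binary_kl, pac_bayes_rhs, kl_upper_inverse) and the use of bisection are not taken from the text, and logarithms are assumed to be natural.
</p>
<pre class="verbatim">
```python
import math


def binary_kl(x, p):
    """Relative entropy KL(x || p) between Bernoulli(x) and Bernoulli(p), in nats."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / p)
    if x != 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - p))
    return total


def pac_bayes_rhs(kl_q_p, m, delta):
    """Right-hand side of the bound: (KL(q || p) + ln(m / delta)) / (m - 1)."""
    return (kl_q_p + math.log(m / delta)) / (m - 1)


def kl_upper_inverse(emp_error, budget, iterations=100):
    """Largest e in [emp_error, 1] whose KL(emp_error || e) does not exceed budget.

    KL(emp_error || e) is increasing in e on this interval, so bisection applies.
    """
    low, high = emp_error, 1.0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        # The guard keeps binary_kl away from a second argument equal to 1.
        if mid != 1.0 and budget >= binary_kl(emp_error, mid):
            low = mid
        else:
            high = mid
    return low
```
</pre>
<p class="indent">   For example, kl_upper_inverse(emp_error, pac_bayes_rhs(kl_q_p, m, delta)) returns the largest averaged true error rate compatible with the bound. Because KL(x||e) is at least 2(x &#x2212; e)<sup>2</sup>, the result is never larger than emp_error plus the square root of half the right-hand side, which is the form a Hoeffding-style bound would take.
</p>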
   <a 
  name="tailthesisse26.xml"></a>  
</body> 
</html> 
