<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>4.6 Incorporating a &#x201C;Prior&#x201D;</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse19.xml" >prev</a>] [<a 
href="thesisse19.xml#tailthesisse19.xml" >prev-tail</a>] [<a 
href="#tailthesisse20.xml">tail</a>] [<a 
href="thesisch4.xml#thesisse20.xml" >up</a>] </p></div>
   <h3 class="sectionHead"><span class="titlemark">4.6. </span> <a 
  name="x27-360004.6"></a>Incorporating a &#x201C;Prior&#x201D;</h3>
<!--l. 1263--><p class="noindent">In constructing the discrete hypothesis bound ( <a 
href="thesisse16.xml#x23-32001r1">4.2.1<!--tex4ht:ref: th-DHSCP --></a>), we made an arbitrary
choice: every hypothesis received the same error allowance. In practice, we will often
wish to allocate the allowance differently. The next
theorem is essentially a restatement of the discrete hypothesis bound with this
choice made explicit. The same bound using the Hoeffding approximation has appeared
elsewhere <span class="cite">[<a 
href="thesisli2.xml#XBEHW"><span 
class="ecbx-1000">5</span></a>]</span><span class="cite">[<a 
href="thesisli2.xml#XPB"><span 
class="ecbx-1000">39</span></a>]</span>.
</p>
   <div class="newtheorem">
<!--l. 1270--><p class="noindent"><span class="head">
<a 
  name="x27-36001r1"></a>
  <span 
class="eccc-1000">T<small 
class="small-caps">H</small><small 
class="small-caps">E</small><small 
class="small-caps">O</small><small 
class="small-caps">R</small><small 
class="small-caps">E</small><small 
class="small-caps">M</small> </span>4.6.1<span 
class="eccc-1000">.</span></span>
</p><!--l. 1271--><p class="indent">   <span 
class="ecti-1000">(Occam&#x2019;s Razor Bound) For all hypothesis spaces, </span><!--l. 1271--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>H</mi></mrow></math><span 
class="ecti-1000">,</span>
<span 
class="ecti-1000">for all &#x201C;priors&#x201D; </span><!--l. 1272--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
<span 
class="ecti-1000">over the hypothesis space, </span><!--l. 1272--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>H</mi></mrow></math><span 
class="ecti-1000">,</span>
<span 
class="ecti-1000">for all </span><!--l. 1272--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B4;</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <mrow><mo 
class="MathClass-open">(</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow></mrow></math><span 
class="ecti-1000">:</span>
<!--l. 1273--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">    <mrow 
>
                   <mi 
>&#x2200;</mi><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>&#x2203;</mi><mi 
>h</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <mi 
>H</mi> <mo 
class="MathClass-punc">:</mo>  <mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2265;</mo><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0304;</mo></mover> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B4;</mi><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></mfenced></mrow></mfenced> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
<span 
class="ecti-1000">where </span><!--l. 1275--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0304;</mo></mover> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo> <mfrac><mrow 
><mi 
>k</mi></mrow> 
<mrow 
><mi 
>m</mi></mrow></mfrac><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B4;</mi></mrow></mfenced> <mo 
class="MathClass-rel">&#x2261;</mo><msub><mrow 
><mo 
> max</mo></mrow><mrow 
><mi 
>p</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">{</mo><mrow><mi 
>p</mi> <mo 
class="MathClass-punc">:</mo>  <!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mi 
>&#x03B4;</mi></mrow><mo 
class="MathClass-close">}</mo></mrow></mrow></math>
</p>
   </div>
<!--l. 1277--><p class="indent">   It is very important to notice that the &#x201C;prior&#x201D;  <!--l. 1277--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> must
be selected <span 
class="ecti-1000">before </span>seeing the examples.
                                                                     

                                                                     
</p>
   <div class="proof">
<!--l. 1281--><p class="indent">   <span class="head">
   <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">O</small><small 
class="small-caps">F</small>.</span> </span>The proof again starts with the basic observation that: <!--l. 1282--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">      <mrow 
>
                                  <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2265;</mo><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0304;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B4;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></mfenced> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
then, we apply the union bound in a <span 
class="ecti-1000">nonuniform </span>manner. In particular, we allocate
confidence <!--l. 1285--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B4;</mi><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
to hypothesis <!--l. 1285--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>h</mi></mrow></math>.
Since <!--l. 1285--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>p</mi></mrow></math>
is normalized, we know that <!--l. 1287--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">      <mrow 
><msub><mrow 
>
                                             <mo 
class="MathClass-op">&#x2211;</mo>
                        </mrow><mrow 
><mi 
>h</mi></mrow></msub 
><mi 
>&#x03B4;</mi><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
which implies that the union bound completes the proof. <span class="qed"><span 
class="msam-10">&#x25AB;</span></span>
</p>
   </div>
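<p class="indent">   As an illustration, the bound of theorem 4.6.1 can be evaluated numerically. The following is a minimal sketch, not a definitive implementation. It assumes that Bin(m, k, p) denotes the probability of observing k or fewer errors in m independent draws when the true error rate is p (the definition is given elsewhere in the thesis, not in this section). The function names <code>binomial_tail</code>, <code>error_upper_bound</code> and <code>occam_bound</code>, and the use of bisection, are choices made for this example only.
</p>
<pre class="verbatim">
```python
from math import comb


def binomial_tail(m, k, p):
    """Probability of at most k errors in m independent draws with error rate p.

    Assumed meaning of Bin(m, k, p) for this sketch.
    """
    return sum(comb(m, i) * p**i * (1.0 - p)**(m - i) for i in range(k + 1))


def error_upper_bound(m, k, delta, tol=1e-10):
    """Largest p whose binomial tail is still at least delta.

    The tail is non-increasing in p, so bisection on [0, 1] locates the
    crossing point to within tol. The upper end of the final interval is
    returned so that the result errs on the conservative side.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binomial_tail(m, k, mid) >= delta:
            lo = mid  # mid is still consistent with seeing only k errors
        else:
            hi = mid
    return hi


def occam_bound(m, k, delta, prior_h):
    """Occam's Razor bound: hypothesis h is allocated confidence delta * p(h)."""
    return error_upper_bound(m, k, delta * prior_h)


# Example: 100 training examples, 5 training errors, delta = 0.05,
# and a hypothetical prior mass p(h) = 0.01 fixed before seeing the data.
bound = occam_bound(m=100, k=5, delta=0.05, prior_h=0.01)
```
</pre>
<p class="indent">   A hypothesis with smaller prior mass receives a smaller share of the confidence and therefore a looser bound, which is the trade-off the nonuniform union bound makes explicit.
</p>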
<!--l. 1291--><p class="indent">   Once again, we can relax the Occam&#x2019;s Razor bound (theorem  <a 
href="#x27-36001r1">4.6.1<!--tex4ht:ref: th-ORB --></a>) with the
relative entropy Chernoff bound ( <a 
href="thesisse10.xml#x16-24001r1">3.2.1<!--tex4ht:ref: eq-recb --></a>) to get a somewhat more tractable
expression.
</p>
   <div class="newtheorem">
<!--l. 1295--><p class="noindent"><span class="head">
<a 
  name="x27-36002r2"></a>
  <span 
class="eccc-1000">C<small 
class="small-caps">O</small><small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">L</small><small 
class="small-caps">L</small><small 
class="small-caps">A</small><small 
class="small-caps">R</small><small 
class="small-caps">Y</small> </span>4.6.2<span 
class="eccc-1000">.</span></span>
</p><!--l. 1296--><p class="indent">   <span 
class="ecti-1000">(Relative Entropy Occam&#x2019;s Razor Bound) For all hypothesis spaces, </span><!--l. 1297--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>H</mi></mrow></math><span 
class="ecti-1000">,</span>
<span 
class="ecti-1000">for all &#x201C;priors&#x201D; </span><!--l. 1297--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
<span 
class="ecti-1000">over the hypothesis space, </span><!--l. 1298--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>H</mi></mrow></math><span 
class="ecti-1000">,</span>
<span 
class="ecti-1000">for all </span><!--l. 1298--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B4;</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <mrow><mo 
class="MathClass-open">(</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">]</mo></mrow></mrow></math><span 
class="ecti-1000">:</span>
<!--l. 1299--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">    <mrow 
>
                  <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>&#x2203;</mi><mi 
>h</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <mi 
>H</mi> <mo 
class="MathClass-punc">:</mo>  <!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2265;</mo> <mfrac><mrow 
><mo 
>ln</mo><!--nolimits-->   <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></mfrac> <mo 
class="MathClass-bin">+</mo><mo 
> ln</mo><!--nolimits--> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac></mrow> 
        <mrow 
><mi 
>m</mi></mrow></mfrac>       </mrow></mfenced> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
</p>
   </div>
   <div class="proof">
<!--l. 1304--><p class="indent">   <span class="head">
   <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">O</small><small 
class="small-caps">F</small>.</span> </span>Approximate the Binomial tail with ( <a 
href="thesisse10.xml#x16-24001r1">3.2.1<!--tex4ht:ref: eq-recb --></a>) and solve for the minimum.
<span class="qed"><span 
class="msam-10">&#x25AB;</span></span>
</p>
   </div>
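<p class="indent">   The relaxed bound of corollary 4.6.2 can also be inverted numerically. The sketch below is illustrative only: it treats KL as the relative entropy between two Bernoulli distributions and searches for the largest true error rate whose divergence from the empirical error stays within the allowance (ln 1/p(h) + ln 1/&#x03B4;)/m. The names <code>kl_bernoulli</code> and <code>kl_occam_bound</code> are hypothetical helpers introduced for this example.
</p>
<pre class="verbatim">
```python
from math import log


def kl_bernoulli(q, p):
    """Relative entropy KL(q || p) between Bernoulli(q) and Bernoulli(p).

    Uses the convention 0 * ln(0) = 0. Requires p strictly between 0 and 1
    whenever the corresponding term is active.
    """
    total = 0.0
    if q > 0.0:
        total += q * log(q / p)
    if 1.0 - q > 0.0:
        total += (1.0 - q) * log((1.0 - q) / (1.0 - p))
    return total


def kl_occam_bound(m, k, delta, prior_h, tol=1e-10):
    """Largest p above k/m with KL(k/m || p) within the Occam allowance."""
    budget = (log(1.0 / prior_h) + log(1.0 / delta)) / m
    q = k / m
    lo, hi = q, 1.0
    # KL(q || p) increases with p on [q, 1], so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if budget >= kl_bernoulli(q, mid):
            lo = mid
        else:
            hi = mid
    return hi


# Example with the same hypothetical numbers as a 5-error hypothesis
# on 100 examples, delta = 0.05 and prior mass p(h) = 0.01.
kl_bound = kl_occam_bound(m=100, k=5, delta=0.05, prior_h=0.01)
```
</pre>
<p class="indent">   When the empirical error is zero the divergence reduces to ln(1/(1 - p)), so the inversion has the closed form 1 - exp(-budget), which offers a quick check on the numerical search.
</p>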
<!--l. 1306--><p class="indent">   The Occam&#x2019;s Razor bound is often nonvacuous for discrete learning algorithms such as
decision lists and decision trees. The next chapter will discuss a particular motivated choice of the
measure <!--l. 1308--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
which can lead to much tighter bounds in practice.
</p><!--l. 1311--><p class="indent">   The application of the Occam&#x2019;s Razor bound is somewhat more complicated than the
application of the test set bound. Pictorially, the protocol for bound application is given
in figure  <a 
href="#x27-360031">4.6.1<!--tex4ht:ref: fig-training-set --></a>. </p><hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x27-360031"></a>
                                                                     

                                                                     
<!--l. 1315--><p class="noindent"><img 
src="thesis4x.gif" alt="PIC" class="graphics" width="678.53499pt" height="405.515pt"  /><!--tex4ht:graphics  
name="thesis4x.gif" src="thesis-presentation/training_set.eps"  
-->
<br />  </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;4.6.1:   </td><td  
class="content"><a 
  name="x27-360031"></a> In order to apply this bound it is necessary that
the choice of &#x201C;prior&#x201D; be made before seeing any training examples.
Then, the bound is calculated based upon the chosen hypothesis.
Note that it <span 
class="ecti-1000">is </span>&#x201C;legal&#x201D; to choose the hypothesis based upon the prior <!--l. 1322--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
as well as the empirical error <!--l. 1322--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>S</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>.</td></tr></table></div><!--tex4ht:label?: x27-360031 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" />
<!--l. 1326--><p class="indent">   Examples of calculating these bounds are detailed in appendix section
<a 
href="thesisse61.xml#x87-13000016.2">16.2<!--tex4ht:ref: sec-train-bound-calc --></a>.
</p><!--l. 1328--><p class="indent">   The &#x201C;Occam&#x2019;s Razor bound&#x201D; is strongly related to compression.
In particular, for any self-terminating description language with description length (in bits) <!--l. 1329--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>d</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>, we can associate a
&#x201C;prior&#x201D; <!--l. 1330--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <msup><mrow 
><mn>2</mn></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>d</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></msup 
></mrow></math> with the
property that <!--l. 1330--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><msub><mrow 
><mo 
class="MathClass-op">&#x2211;</mo>
  </mrow><mrow 
><mi 
>h</mi></mrow></msub 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <mn>1</mn></mrow></math>.
Consequently, hypotheses with short descriptions will tend to have tighter bounds, and the
penalty term, <!--l. 1332--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mo 
>ln</mo><!--nolimits-->   <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></mfrac></mrow></math>,
is the description length measured in &#x201C;nats&#x201D; (the base-e analogue of bits).
</p>
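<p class="indent">   The link to compression can be made concrete with a small sketch. The four-hypothesis code below (codewords 0, 10, 110, 111) is a hypothetical example, as are the names <code>prior_from_code_length</code> and <code>penalty_nats</code>. It shows that a self-terminating code yields a &#x201C;prior&#x201D; whose total mass is at most one, and that the penalty term equals the description length converted from bits to nats.
</p>
<pre class="verbatim">
```python
from math import log


def prior_from_code_length(bits):
    """Prior mass p(h) = 2^(-d(h)) for a description of d(h) bits."""
    return 2.0 ** (-bits)


def penalty_nats(bits):
    """Penalty term ln(1 / p(h)), which equals d(h) * ln 2."""
    return log(1.0 / prior_from_code_length(bits))


# Hypothetical self-terminating code: h1 = 0, h2 = 10, h3 = 110, h4 = 111.
# No codeword is a prefix of another, so the masses sum to at most one.
code_lengths = {"h1": 1, "h2": 2, "h3": 3, "h4": 3}

kraft_sum = sum(prior_from_code_length(d) for d in code_lengths.values())
penalties = {h: penalty_nats(d) for h, d in code_lengths.items()}
```
</pre>
<p class="indent">   Each additional bit of description adds ln 2 to the penalty term, so shorter descriptions translate directly into a smaller allowance in the relative entropy bound.
</p>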
<a 
  name="tailthesisse20.xml"></a>  
</body> 
</html> 
