<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>3.2 Approximation techniques</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse11.xml" >next</a>] [<a 
href="thesisse9.xml" >prev</a>] [<a 
href="thesisse9.xml#tailthesisse9.xml" >prev-tail</a>] [<a 
href="#tailthesisse10.xml">tail</a>] [<a 
href="thesisch3.xml#thesisse10.xml" >up</a>] </p></div>
   <h3 class="sectionHead"><span class="titlemark">3.2. </span> <a 
  name="x16-240003.2"></a>Approximation techniques</h3>
<!--l. 631--><p class="noindent">Exact calculation of <!--l. 631--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
(covered in the next subsection) can require computation at least proportional to <!--l. 632--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>m</mi></mrow></math>, which is often too
expensive. For the bounds in this thesis, we need only an upper bound on the quantity <!--l. 634--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>. Several
inequalities are commonly used for this purpose. The first is the Hoeffding inequality <span class="cite">[<a 
href="thesisli2.xml#XHoeffding"><span 
class="ecbx-1000">23</span></a>]</span>. Assuming
that <!--l. 636--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
> <mfrac><mrow 
><mi 
>k</mi></mrow>
<mrow 
><mi 
>m</mi></mrow></mfrac> <mo 
class="MathClass-rel">&#x003C;</mo> <mi 
>p</mi></mrow></math>,
we have: <!--l. 637--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">        <mrow 
>
                                     <!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mn>2</mn><mi 
>m</mi><msup><mrow 
><mfenced separators="" 
open="("  close=")" ><mrow><mi 
>p</mi><mo 
class="MathClass-bin">&#x2212;</mo><mfrac><mrow 
><mi 
>k</mi></mrow>
<mrow 
><mi 
>m</mi></mrow></mfrac></mrow></mfenced></mrow><mrow 
><mn>2</mn></mrow></msup 
>
                  </mrow></msup 
>
</mrow></math>
Intuitively, this inequality can be seen as fitting a Gaussian to the Binomial distribution with <!--l. 640--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>p</mi> <mo 
class="MathClass-rel">=</mo> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mn>2</mn></mrow></mfrac></mrow></math>. For any
particular <!--l. 640--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>m</mi></mrow></math>,
the variance of the Binomial distribution is maximized when <!--l. 641--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>p</mi> <mo 
class="MathClass-rel">=</mo> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
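><mn>2</mn></mrow></mfrac></mrow></math>,
giving the empirical error rate a variance of <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><mfrac><mrow><mn>1</mn></mrow><mrow><mn>4</mn><mi>m</mi></mrow></mfrac></mrow></math>;
the Hoeffding exponent is exactly that of a Gaussian with this worst-case variance.
Therefore, the Hoeffding inequality is relatively tight when <!--l. 642--><math 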
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>p</mi> <mo 
class="MathClass-rel">=</mo> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mn>2</mn></mrow></mfrac></mrow></math>.
Unfortunately, the Hoeffding inequality is not tight enough for our purposes. In
machine learning, our goal is to find a hypothesis whose true error rate is far from <!--l. 645--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mfrac><mrow 
><mn>1</mn></mrow>
<mrow 
><mn>2</mn></mrow></mfrac></mrow></math>, which is precisely
where the Hoeffding inequality becomes loose.
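</p>
<p class="indent">   As a rough numerical illustration, the Python sketch below computes Bin(m, k, p) exactly and compares it with the Hoeffding bound. It assumes that Bin(m, k, p) is the probability of observing k or fewer errors in m independent trials of a coin with bias p, that is, the sum of C(m, j) p<sup>j</sup>(1 &#x2212; p)<sup>m&#x2212;j</sup> over j = 0, ..., k; the function names are illustrative and not part of this thesis.</p>
<pre class="verbatim">
```python
import math


def bin_tail(m: int, k: int, p: float) -> float:
    """Exact Bin(m, k, p): the probability of k or fewer errors in m trials.

    Assumes Bin(m, k, p) = sum_{j=0}^{k} C(m, j) p^j (1 - p)^(m - j).
    """
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k + 1))


def hoeffding_bound(m: int, k: int, p: float) -> float:
    """Hoeffding upper bound e^{-2m(p - k/m)^2} on Bin(m, k, p); needs k/m below p."""
    if k / m >= p:
        raise ValueError("the Hoeffding bound requires k/m below p")
    return math.exp(-2 * m * (p - k / m) ** 2)


if __name__ == "__main__":
    m = 100
    # Both cases keep the gap p - k/m at 0.1, so only the location of p changes.
    for k, p in ((40, 0.5), (5, 0.15)):
        exact = bin_tail(m, k, p)
        bound = hoeffding_bound(m, k, p)
        print(f"p={p:.2f} k={k}: exact={exact:.3e} hoeffding={bound:.3e} "
              f"ratio={bound / exact:.1f}")
```
</pre>
<p class="indent">   Both cases hold the gap p &#x2212; k/m fixed at 0.1, so the Hoeffding bound is the same (e<sup>&#x2212;2</sup>) in each, yet its ratio to the exact value is more than an order of magnitude larger at p = 0.15 than at p = 1/2.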
</p><!--l. 647--><p class="indent">   Another bound, known as the &#x201C;realizable bound,&#x201D; applies only when <!--l. 648--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>k</mi> <mo 
class="MathClass-rel">=</mo> <mn>0</mn></mrow></math>. The realizable
bound is: <!--l. 649--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">        <mrow 
>
                                 <!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mn>0</mn><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <msup><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn> <mo 
class="MathClass-bin">&#x2212;</mo> <mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mrow 
><mi 
>m</mi></mrow></msup 
> <mo 
class="MathClass-rel">&#x2264;</mo> <msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>m</mi><mi 
>p</mi></mrow></msup 
>
</mrow></math>
The realizable bound is noticeably tighter when the true error rate is small, because its exponent is proportional to <!--l. 652--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mfenced separators="" 
open="|"  close="|" ><mrow><mi 
>p</mi> <mo 
class="MathClass-bin">&#x2212;</mo> <mfrac><mrow 
><mi 
>k</mi></mrow> 
<mrow 
><mi 
>m</mi></mrow></mfrac></mrow></mfenced></mrow></math> rather than
<!--l. 652--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">     <mrow 
><msup><mrow 
><mfenced separators="" 
open="("  close=")" ><mrow><mi 
>p</mi> <mo 
class="MathClass-bin">&#x2212;</mo> <mfrac><mrow 
><mi 
>k</mi></mrow> 
<mrow 
><mi 
>m</mi></mrow></mfrac></mrow></mfenced> </mrow><mrow 
><mn>2</mn></mrow></msup 
></mrow></math>.
The disadvantage of the realizable bound is that it applies only in
a very limited setting: when our empirical error rate happens to be <!--l. 654--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>0</mn></mrow></math>.
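</p>
<p class="indent">   A similar sketch, under the same assumption about Bin(m, k, p), compares the exact probability (1 &#x2212; p)<sup>m</sup> of zero errors with the realizable bound and with the Hoeffding bound at k = 0. The realizable bound follows from the elementary inequality 1 &#x2212; p &#x2264; e<sup>&#x2212;p</sup>; the function names are again illustrative.</p>
<pre class="verbatim">
```python
import math


def realizable_exact(m: int, p: float) -> float:
    """Bin(m, 0, p) = (1 - p)^m: the probability of zero errors in m trials."""
    return (1 - p) ** m


def realizable_bound(m: int, p: float) -> float:
    """Realizable bound e^{-mp}; it follows from 1 - p being at most e^{-p}."""
    return math.exp(-m * p)


def hoeffding_at_zero(m: int, p: float) -> float:
    """Hoeffding bound specialized to k = 0, i.e. e^{-2mp^2}."""
    return math.exp(-2 * m * p**2)


if __name__ == "__main__":
    m = 100
    for p in (0.01, 0.05, 0.1, 0.3):
        print(f"p={p:.2f}: exact={realizable_exact(m, p):.3e} "
              f"realizable={realizable_bound(m, p):.3e} "
              f"hoeffding={hoeffding_at_zero(m, p):.3e}")
```
</pre>
<p class="indent">   Since mp &#x2265; 2mp<sup>2</sup> exactly when p &#x2264; 1/2, the realizable bound is at least as tight as the Hoeffding bound whenever the true error rate is at most 1/2. At m = 100 and p = 0.05, for example, it gives e<sup>&#x2212;5</sup> &#x2248; 0.0067, while the Hoeffding bound gives only e<sup>&#x2212;0.5</sup> &#x2248; 0.61.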
</p><!--l. 656--><p class="indent">   Fortunately, there is a quickly computable bound that combines
the generality of the Hoeffding bound with the tightness of the
realizable bound: the relative entropy Chernoff bound <span class="cite">[<a 
href="thesisli2.xml#XChernoff"><span 
class="ecbx-1000">7</span></a>]</span>, which holds for <!--l. 658--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mfrac><mrow 
><mi 
>k</mi></mrow>
<mrow 
><mi 
>m</mi></mrow></mfrac> <mo 
class="MathClass-rel">&#x003C;</mo> <mi 
>p</mi></mrow></math>:
</p>
   <table class="equation"><tr><td> <a 
  name="x16-24001r1"></a>
<!--l. 660--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">     
                                    <!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>m</mi><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mfenced separators="" 
open="("  close=")" ><mrow> <mfrac><mrow 
><mi 
>k</mi></mrow>
<mrow 
><mi 
>m</mi></mrow></mfrac><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow></mfenced></mrow></msup 
>
</math>
<!--l. 663--><p class="nopar"></p></td><td class="eq-no">(3.2.1)</td></tr></table>
Here <!--l. 664--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><!--mstyle 
class="text"--><mtext class="textrm">KL</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>q</mi><mo 
class="MathClass-rel">&#x2223;</mo><mo 
class="MathClass-rel">&#x2223;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mi 
>q</mi><mo 
>ln</mo><!--nolimits--> <mfrac><mrow 
><mi 
>q</mi></mrow> 
<mrow 
><mi 
>p</mi></mrow></mfrac> <mo 
class="MathClass-bin">+</mo> <mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn> <mo 
class="MathClass-bin">&#x2212;</mo> <mi 
>q</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
>ln</mo><!--nolimits--> <mfrac><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>q</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow> 
<mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></mfrac></mrow></math>
is the Kullback-Leibler (KL) divergence between a coin of bias <!--l. 665--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>q</mi></mrow></math> and another coin
of bias <!--l. 666--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>p</mi></mrow></math>.
The relative entropy Chernoff bound is as tight as the Hoeffding bound when <!--l. 667--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>p</mi></mrow></math> is near <!--l. 667--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mfrac><mrow 
><mn>1</mn></mrow>
<mrow 
><mn>2</mn></mrow></mfrac></mrow></math> and as tight as the
realizable bound when <!--l. 668--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mi 
>k</mi> <mo 
class="MathClass-rel">=</mo> <mn>0</mn></mrow></math>.
For intermediate cases, the relative entropy Chernoff bound interpolates smoothly
between these two behaviors.
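<p class="indent">   The bound (3.2.1) is equally easy to evaluate. The sketch below, under the same assumption about Bin(m, k, p) and with illustrative function names, computes KL(q||p) with the convention 0 ln 0 = 0 and prints the exact tail, the relative entropy Chernoff bound, and the Hoeffding bound side by side. Two standard facts account for the behavior just described: Pinsker&#x2019;s inequality gives KL(q||p) &#x2265; 2(p &#x2212; q)<sup>2</sup>, so the Chernoff bound is never looser than the Hoeffding bound, and at k = 0 the bound equals e<sup>m ln(1&#x2212;p)</sup> = (1 &#x2212; p)<sup>m</sup>, which is the exact value.</p>
<pre class="verbatim">
```python
import math


def bin_tail(m: int, k: int, p: float) -> float:
    """Exact Bin(m, k, p) = sum_{j=0}^{k} C(m, j) p^j (1 - p)^(m - j)."""
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k + 1))


def kl_bernoulli(q: float, p: float) -> float:
    """KL(q || p) between coins of bias q and p, using the convention 0 ln 0 = 0.

    Assumes q lies in [0, 1] and p lies strictly inside (0, 1).
    """
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if 1 - q > 0:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def chernoff_bound(m: int, k: int, p: float) -> float:
    """Relative entropy Chernoff bound e^{-m KL(k/m || p)}; needs k/m below p."""
    if k / m >= p:
        raise ValueError("the Chernoff bound requires k/m below p")
    return math.exp(-m * kl_bernoulli(k / m, p))


def hoeffding_bound(m: int, k: int, p: float) -> float:
    """Hoeffding bound e^{-2m(p - k/m)^2}, for comparison."""
    if k / m >= p:
        raise ValueError("the Hoeffding bound requires k/m below p")
    return math.exp(-2 * m * (p - k / m) ** 2)


if __name__ == "__main__":
    m = 100
    for k, p in ((40, 0.5), (5, 0.15), (0, 0.05)):
        print(f"k={k} p={p:.2f}: exact={bin_tail(m, k, p):.3e} "
              f"chernoff={chernoff_bound(m, k, p):.3e} "
              f"hoeffding={hoeffding_bound(m, k, p):.3e}")
```
</pre>
<p class="indent">   In the last case (k = 0), the Chernoff value coincides with the exact tail probability up to floating-point rounding.</p>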
                                                                     

                                                                     
<!--l. 671--><p class="indent">   The choice among these bounds matters because much of the
learning theory literature (see <span class="cite">[<a 
href="thesisli2.xml#XValiant"><span 
class="ecbx-1000">50</span></a>]</span>, <span class="cite">[<a 
href="thesisli2.xml#XHaussler"><span 
class="ecbx-1000">20</span></a>]</span>, <span class="cite">[<a 
href="thesisli2.xml#XPB"><span 
class="ecbx-1000">39</span></a>]</span> for examples) works with either
the realizable bound or the Hoeffding bound, or both. In contrast, we will
work with either the relative entropy Chernoff bound or the exact tail probability, <!--l. 675--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>.
This approach has several advantages:
</p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x16-24003x1"></a>Sometimes a new approach to producing a bound appears better
        than previous approaches, but the apparent benefit can be traced simply to
        the use of a tighter bound on <!--l. 680--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
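class="MathClass-close">)</mo></mrow></mrow></math>. Working with a tight bound on this quantity from the start keeps such effects from being mistaken for genuine improvements.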
           </li>
        <li class="enumerate"><a 
  name="x16-24005x2"></a>All of the bounds presented here can be applied directly in numerical
        calculations.
           </li>
        <li class="enumerate"><a 
  name="x16-24007x3"></a>We avoid the need to state two versions of the same theorem: one for the
        realizable (<!--l. 683--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mn>0</mn></mrow></math>
        empirical error) case and one for the agnostic (arbitrary empirical error)
        case.</li></ol>
<!--l. 685--><p class="nopar"> The principal <span 
class="ecti-1000">dis</span>advantage of this approach is that both the relative entropy Chernoff bound
and <!--l. 687--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
are not analytically invertible. This is a theoretical disadvantage
because we cannot express our &#x201C;precision&#x201D; parameter, <!--l. 689--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>&#x03B5;</mi></mrow></math>, in closed form in terms
of <!--l. 690--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>&#x03B4;</mi></mrow></math>.
Nonetheless, this is not a severe computational disadvantage because the quantity <!--l. 691--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> is monotonically decreasing
in <!--l. 691--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>p</mi></mrow></math>,
so a binary search can solve the inequality numerically to any desired precision. The process of (and
need for) inversion is discussed next.
</p>
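<p class="indent">   The binary search mentioned above can be sketched as follows. As a hypothetical formulation of the inversion, the sketch finds the largest p for which Bin(m, k, p) &#x2265; &#x03B4;; since Bin(m, k, p) decreases monotonically in p, bisection on [0, 1] brackets this value. The same routine inverts the relative entropy Chernoff bound by searching over [k/m, 1], where KL(k/m||p) increases with p. All function names are illustrative, and Bin(m, k, p) is again assumed to be the probability of k or fewer errors in m trials.</p>
<pre class="verbatim">
```python
import math


def bin_tail(m: int, k: int, p: float) -> float:
    """Exact Bin(m, k, p) = sum_{j=0}^{k} C(m, j) p^j (1 - p)^(m - j)."""
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k + 1))


def kl_bernoulli(q: float, p: float) -> float:
    """KL(q || p) for coins of bias q and p (0 ln 0 = 0); infinite if p = 1 and q != 1."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if 1 - q > 0:
        if p >= 1:
            return math.inf
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total


def invert_decreasing(f, target: float, lo: float = 0.0, hi: float = 1.0,
                      iters: int = 60) -> float:
    """Bisection for the largest p in [lo, hi] with f(p) >= target.

    Assumes f is decreasing on [lo, hi] and f(lo) >= target. Returns the
    right end of the final bracket, so the answer errs on the large
    (conservative) side by at most (hi - lo) / 2**iters.
    """
    if f(hi) >= target:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) >= target:
            lo = mid
        else:
            hi = mid
    return hi


def bin_inverse(m: int, k: int, delta: float) -> float:
    """Largest p with Bin(m, k, p) >= delta, found by binary search on [0, 1]."""
    return invert_decreasing(lambda p: bin_tail(m, k, p), delta)


def chernoff_inverse(m: int, k: int, delta: float) -> float:
    """Largest p with e^{-m KL(k/m || p)} >= delta, searched over [k/m, 1]."""
    q = k / m
    return invert_decreasing(lambda p: math.exp(-m * kl_bernoulli(q, p)), delta, lo=q)


if __name__ == "__main__":
    m, k, delta = 100, 5, 0.05
    print(f"exact inversion:    {bin_inverse(m, k, delta):.4f}")
    print(f"Chernoff inversion: {chernoff_inverse(m, k, delta):.4f}")
```
</pre>
<p class="indent">   Because the relative entropy Chernoff bound is an upper bound on Bin(m, k, p) for p above k/m, the Chernoff-based value can never be smaller than the exact one; the difference is the price paid for avoiding the exact binomial sum.</p>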
<a
  name="tailthesisse10.xml"></a>  
</body> 
</html> 
