<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>4.1 Simple Holdout</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse16.xml" >next</a>] [<a 
href="#tailthesisse15.xml">tail</a>] [<a 
href="thesisch4.xml#thesisse15.xml" >up</a>] </p></div>
   <h3 class="sectionHead"><span class="titlemark">4.1. </span> <a 
  name="x22-300004.1"></a>Simple Holdout</h3>
<!--l. 875--><p class="noindent">The simplest bound arises for the classical technique of splitting the data set into two pieces: a training
set of size <!--l. 876--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">train</mtext><!--/mstyle--></mrow></msub 
></mrow></math> and a
test set of size <!--l. 877--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></math>.
In this setting, the following simple bound applies:
</p>
   <div class="newtheorem">
<!--l. 880--><p class="noindent"><span class="head">
<a 
  name="x22-30001r1"></a>
  <span 
class="eccc-1000">T<small 
class="small-caps">H</small><small 
class="small-caps">E</small><small 
class="small-caps">O</small><small 
class="small-caps">R</small><small 
class="small-caps">E</small><small 
class="small-caps">M</small> </span>4.1.1<span 
class="eccc-1000">.</span></span>
</p><!--l. 881--><p class="indent">   <span 
class="ecti-1000">(Holdout Sample Complexity) Let </span><!--l. 881--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
<span 
class="ecti-1000">be the empirical error on the test set and </span><!--l. 882--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
<span 
class="ecti-1000">be the true error rate of the hypothesis. Then: </span><!--l. 884--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
>
                <mi 
>&#x2200;</mi><mi 
>h</mi> <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow> <msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2265;</mo><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0304;</mo></mover> <mfenced separators="" 
open="("  close=")" ><mrow><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><mo 
class="MathClass-punc">,</mo><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B4;</mi></mrow></mfenced></mrow></mfenced> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
<span 
class="ecti-1000">where </span><!--l. 886--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
                                                                     

                                                                     
<mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0304;</mo></mover> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo> <mfrac><mrow 
><mi 
>k</mi></mrow> 
<mrow 
><mi 
>m</mi></mrow></mfrac><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B4;</mi></mrow></mfenced> <mo 
class="MathClass-rel">&#x2261;</mo><msub><mrow 
><mo 
> max</mo></mrow><mrow 
><mi 
>p</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">{</mo><mrow><mi 
>p</mi> <mo 
class="MathClass-punc">:</mo>  <!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mi 
>&#x03B4;</mi></mrow><mo 
class="MathClass-close">}</mo></mrow></mrow></math>
</p>
   </div>
   <div class="proof">
<!--l. 889--><p class="indent">   <span class="head">
   <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">O</small><small 
class="small-caps">F</small>.</span> </span>The proof is a direct identification with the Binomial distribution. For any
distribution over <!--l. 890--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-punc">,</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
pairs and any hypothesis, <!--l. 890--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>h</mi></mrow></math>,
there exists some probability, <!--l. 891--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>,
that the hypothesis predicts incorrectly. We can regard this event as a coin flip with
bias <!--l. 892--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>.
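Since each example is drawn independently, the number of test errors follows a Binomial
distribution with this bias. Because the Binomial tail probability decreases as the bias grows, the
true error can reach the bound only when the observed number of errors falls so far into the lower tail
that a count this small or smaller has probability at most <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><mi>&#x03B4;</mi></mrow></math> under the true error rate, and this
event occurs with probability at most <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><mi>&#x03B4;</mi></mrow></math>. <span class="qed"><span 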
class="msam-10">&#x25AB;</span></span>
</p>
   </div>
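<p class="indent">   Theorem 4.1.1 has no closed form, but the bound is easy to compute numerically. The following is a minimal Python sketch under stated assumptions: it takes Bin(m, k, p) to be the probability of observing k or fewer errors in m independent trials with error probability p, and the names <code>binomial_tail</code> and <code>holdout_bound</code> are hypothetical. Because Bin(m, k, p) decreases in p, bisection on p recovers the largest bias still consistent with the observed error count at level &#x03B4;.</p>

```python
import math


def binomial_tail(m: int, k: int, p: float) -> float:
    """Assumed Bin(m, k, p): probability of k or fewer errors in m
    independent trials, each an error with probability p."""
    if k >= m or p == 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_m_fact = math.lgamma(m + 1)
    total = 0.0
    for j in range(k + 1):
        # log of C(m, j) * p**j * (1 - p)**(m - j), kept in log space to avoid overflow
        log_term = (log_m_fact - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                    + j * log_p + (m - j) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def holdout_bound(m: int, k: int, delta: float, iters: int = 100) -> float:
    """Largest p with Bin(m, k, p) = delta, located by bisection.

    Bin(m, k, p) decreases in p, so the loop keeps Bin(m, k, lo) at least
    delta and Bin(m, k, hi) below delta.
    """
    if k >= m:
        return 1.0  # every example misclassified: no error rate can be ruled out
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if binomial_tail(m, k, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return hi  # upper end of the final bracket, so rounding errs on the safe side


# Example: 1 error observed on 100 held-out examples, with delta = 0.05.
print(holdout_bound(100, 1, 0.05))
```

<p class="indent">   Working in log space keeps the individual Binomial terms from overflowing on large test sets, and returning the upper end of the bisection bracket ensures that numerical rounding never makes the reported bound optimistic. For 1 error in 100 test examples at &#x03B4; = 0.05 the sketch gives a bound a little under 0.05.</p>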
<!--l. 897--><p class="indent">   The holdout theorem (<a 
href="#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>)
has two immediate corollaries that are mathematically simpler, although not as tight. The first corollary
applies to the limited &#x201C;realizable&#x201D; setting where you happen to observe <!--l. 899--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>0</mn></mrow></math> test
errors.
</p>
   <div class="newtheorem">
<!--l. 902--><p class="noindent"><span class="head">
<a 
  name="x22-30002r2"></a>
  <span 
class="eccc-1000">C<small 
class="small-caps">O</small><small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">L</small><small 
class="small-caps">L</small><small 
class="small-caps">A</small><small 
class="small-caps">R</small><small 
class="small-caps">Y</small> </span>4.1.2<span 
class="eccc-1000">.</span></span>
</p><!--l. 903--><p class="indent">   <span 
class="ecti-1000">(Realizable Holdout Sample Complexity) </span><!--l. 904--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
                                                                     

                                                                     
<mrow 
>
                 <mi 
>&#x2200;</mi><mi 
>h</mi> <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><mi 
>D</mi></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mn>0</mn><mo 
class="MathClass-rel">&#x2223;</mo><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2265;</mo> <mfrac><mrow 
><mo 
>ln</mo><!--nolimits--> <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac></mrow> 
<mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></mfrac></mrow></mfenced> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
</p>
   </div>
   <div class="proof">
<!--l. 909--><p class="indent">   <span class="head">
   <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">O</small><small 
class="small-caps">F</small>.</span> </span>Specializing theorem  <a 
href="#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a> to the zero empirical error case, we get: <!--l. 910--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
>
                 <!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><mo 
class="MathClass-punc">,</mo><mn>0</mn><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B5;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <msup><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>1</mn> <mo 
class="MathClass-bin">&#x2212;</mo> <mi 
>&#x03B5;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></msup 
> <mo 
class="MathClass-rel">&#x2264;</mo> <msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mi 
>&#x03B5;</mi><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></msup 
>
</mrow></math>
Setting the right-hand side equal to <!--l. 912--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>&#x03B4;</mi></mrow></math>
and solving for <!--l. 912--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>&#x03B5;</mi></mrow></math>
gives the result. <span class="qed"><span 
class="msam-10">&#x25AB;</span></span>
</p>
   </div>
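<p class="indent">   The cost of the simplification in corollary 4.1.2 can be checked numerically. With zero test errors, theorem 4.1.1 reduces to solving (1 &#x2212; &#x03B5;)<sup>m_test</sup> = &#x03B4;, whose exact solution is &#x03B5; = 1 &#x2212; &#x03B4;<sup>1/m_test</sup>. The sketch below compares this with ln(1/&#x03B4;)/m_test for a few test set sizes at an assumed &#x03B4; = 0.05; both function names are hypothetical.</p>

```python
import math


def realizable_bound(m_test: int, delta: float) -> float:
    """Corollary 4.1.2: ln(1/delta) / m_test."""
    return math.log(1.0 / delta) / m_test


def exact_zero_error_bound(m_test: int, delta: float) -> float:
    """Theorem 4.1.1 with zero test errors: solves (1 - eps)**m_test = delta."""
    return 1.0 - delta ** (1.0 / m_test)


delta = 0.05
for m_test in (10, 100, 1000):
    exact = exact_zero_error_bound(m_test, delta)
    loose = realizable_bound(m_test, delta)
    print(f"m_test={m_test:5d}  exact={exact:.4f}  ln(1/delta)/m_test={loose:.4f}")
```

<p class="indent">   Because (1 &#x2212; &#x03B5;)<sup>m</sup> is at most e<sup>&#x2212;&#x03B5;m</sup>, the corollary is never smaller than the exact value. The gap is noticeable only for small test sets: roughly 0.30 versus 0.26 at m_test = 10, while at m_test = 1000 both round to 0.0030.</p>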
<!--l. 915--><p class="indent">   A second corollary applies to all results, not just those where we observe <!--l. 915--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>0</mn></mrow></math>
errors.
</p>
   <div class="newtheorem">
<!--l. 918--><p class="noindent"><span class="head">
                                                                     

                                                                     
<a 
  name="x22-30003r3"></a>
  <span 
class="eccc-1000">C<small 
class="small-caps">O</small><small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">L</small><small 
class="small-caps">L</small><small 
class="small-caps">A</small><small 
class="small-caps">R</small><small 
class="small-caps">Y</small> </span>4.1.3<span 
class="eccc-1000">.</span></span>
</p><!--l. 919--><p class="indent">   <span 
class="ecti-1000">(Agnostic Holdout Sample Complexity) </span><!--l. 920--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">
<mrow 
>
                <mi 
>&#x2200;</mi><mi 
>h</mi> <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-bin">&#x2212;</mo><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2265;</mo><msqrt><mi 
></mi>
 <mrow> <mfrac> <mrow 
> <mo 
> ln</mo> <!--nolimits--> <mfrac> <mrow 
> <mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac></mrow>
<mrow 
><mn>2</mn><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></mfrac></mrow></msqrt></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
</p>
   </div>
   <div class="proof">
<!--l. 925--><p class="indent">   <span class="head">
   <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">O</small><small 
class="small-caps">F</small>.</span> </span>Loosening theorem <a 
href="#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a> with the Hoeffding approximation for <!--l. 925--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
>   <mfrac><mrow 
><mi 
>k</mi></mrow>
<mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></mfrac> <mo 
class="MathClass-rel">&#x003C;</mo> <mi 
>&#x03B5;</mi></mrow></math>,
we get: <!--l. 927--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">      <mrow 
>
                             <!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B5;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2264;</mo> <msup><mrow 
><mi 
>e</mi></mrow><mrow 
><mo 
class="MathClass-bin">&#x2212;</mo><mn>2</mn><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><msup><mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>&#x03B5;</mi><mo 
class="MathClass-bin">&#x2212;</mo>  <mfrac><mrow 
><mi 
>k</mi></mrow>
<mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
>
<!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></mfrac></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mrow 
><mn>2</mn></mrow></msup 
>
                                  </mrow></msup 
>
</mrow></math>
Using the inversion lemma <a 
href="thesisse12.xml#x18-26001r1">3.4.1<!--tex4ht:ref: lem-inversion --></a> we can set this equal to <!--l. 929--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B4;</mi></mrow></math>,
and solve for <!--l. 930--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>&#x03B5;</mi></mrow></math>
                                                                     

                                                                     
to get the result. <span class="qed"><span 
class="msam-10">&#x25AB;</span></span>
</p>
   </div>
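<p class="indent">   To see how much the Hoeffding loosening costs, the sketch below computes both upper bounds for a hypothetical test set of 1000 examples containing 100 errors, at &#x03B4; = 0.05. It includes an assumed cumulative Binomial Bin(m, k, p) (probability of k or fewer errors) and a bisection search for the exact bound of theorem 4.1.1; <code>hoeffding_bound</code> is an illustrative name for the value obtained by solving corollary 4.1.3 for the true error, namely the test error rate plus the square root of ln(1/&#x03B4;)/(2 m_test).</p>

```python
import math


def binomial_tail(m: int, k: int, p: float) -> float:
    """Assumed Bin(m, k, p): probability of k or fewer errors in m trials."""
    if k >= m or p == 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_m_fact = math.lgamma(m + 1)
    total = 0.0
    for j in range(k + 1):
        # log of C(m, j) * p**j * (1 - p)**(m - j)
        log_term = (log_m_fact - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                    + j * log_p + (m - j) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def holdout_bound(m: int, k: int, delta: float, iters: int = 100) -> float:
    """Exact bound of theorem 4.1.1: largest p with Bin(m, k, p) = delta."""
    if k >= m:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if binomial_tail(m, k, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return hi


def hoeffding_bound(m_test: int, k: int, delta: float) -> float:
    """Corollary 4.1.3 solved for the true error (may exceed 1 for small m_test)."""
    return k / m_test + math.sqrt(math.log(1.0 / delta) / (2.0 * m_test))


m_test, k, delta = 1000, 100, 0.05
print(f"exact Binomial bound: {holdout_bound(m_test, k, delta):.4f}")
print(f"Hoeffding bound:      {hoeffding_bound(m_test, k, delta):.4f}")
```

<p class="indent">   The Hoeffding expression depends only on the observed rate and m_test, which makes it convenient for analysis, but it ignores the variance of the Binomial; for error rates well below 1/2 the exact bound is noticeably smaller.</p>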
   <div class="newtheorem">
<!--l. 932--><p class="noindent"><span class="head">
<a 
  name="x22-30004r4"></a>
  <span 
class="eccc-1000">R<small 
class="small-caps">E</small><small 
class="small-caps">M</small><small 
class="small-caps">A</small><small 
class="small-caps">R</small><small 
class="small-caps">K</small> </span>4.1.4<span 
class="eccc-1000">.</span></span>
</p><!--l. 933--><p class="indent">   Similar theorems apply to bound <!--l. 933--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-bin">&#x2212;</mo> <msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>.
</p>
   </div>
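<p class="indent">   Remark 4.1.4 can be made concrete in the same numerical way. The sketch below assumes that the mirror-image bound is the smallest p for which observing k or more errors in m independent trials still has probability &#x03B4;; the function names are hypothetical, and the code illustrates that reading rather than reproducing the thesis's own statement of these theorems.</p>

```python
import math


def binomial_tail(m: int, k: int, p: float) -> float:
    """Assumed Bin(m, k, p): probability of k or fewer errors in m trials."""
    if k >= m or p == 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_m_fact = math.lgamma(m + 1)
    total = 0.0
    for j in range(k + 1):
        # log of C(m, j) * p**j * (1 - p)**(m - j)
        log_term = (log_m_fact - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                    + j * log_p + (m - j) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def lower_holdout_bound(m: int, k: int, delta: float, iters: int = 100) -> float:
    """Smallest p for which k or more errors still has probability delta."""
    if k == 0:
        return 0.0  # zero observed errors are consistent with a zero error rate
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        # Pr(k or more errors) = 1 - Bin(m, k - 1, mid), which increases with mid
        if 1.0 - binomial_tail(m, k - 1, mid) >= delta:
            hi = mid
        else:
            lo = mid
    return lo  # lower end of the final bracket, so the bound errs on the safe side


print(lower_holdout_bound(100, 1, 0.05))
```

<p class="indent">   Pairing this lower bound with the upper bound of theorem 4.1.1, each at level &#x03B4;, gives an interval that contains the true error with probability at least 1 &#x2212; 2&#x03B4; by the union bound.</p>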
<!--l. 935--><p class="indent">   How tight is the test sample complexity theorem <a 
href="#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>? It is very tight. Let us
define <!--l. 937--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display"><mrow>
<mover accent="true"><mrow><mi>e</mi></mrow><mo class="MathClass-op">&#x0304;</mo></mover><mrow><mo class="MathClass-open">(</mo><mrow><msub><mrow><mi>m</mi></mrow><mrow><mtext class="textrm">test</mtext></mrow></msub><mo class="MathClass-punc">,</mo><msub><mrow><mover accent="true"><mrow><mi>e</mi></mrow><mo class="MathClass-op">&#x0302;</mo></mover></mrow><mrow><mtext class="textrm">test</mtext></mrow></msub><mrow><mo class="MathClass-open">(</mo><mrow><mi>h</mi></mrow><mo class="MathClass-close">)</mo></mrow><mo class="MathClass-punc">,</mo><mi>&#x03B4;</mi></mrow><mo class="MathClass-close">)</mo></mrow>
<mo class="MathClass-rel">&#x2261;</mo><msub><mrow><mo> max</mo></mrow><mrow><mi>p</mi></mrow></msub><mrow><mo class="MathClass-open">{</mo><mrow><mi>p</mi> <mo class="MathClass-punc">:</mo> <mtext class="textrm">Bin</mtext><mrow><mo class="MathClass-open">(</mo><mrow><msub><mrow><mi>m</mi></mrow><mrow><mtext class="textrm">test</mtext></mrow></msub><mo class="MathClass-punc">,</mo><msub><mrow><mi>m</mi></mrow><mrow><mtext class="textrm">test</mtext></mrow></msub><msub><mrow><mover accent="true"><mrow><mi>e</mi></mrow><mo class="MathClass-op">&#x0302;</mo></mover></mrow><mrow><mtext class="textrm">test</mtext></mrow></msub><mrow><mo class="MathClass-open">(</mo><mrow><mi>h</mi></mrow><mo class="MathClass-close">)</mo></mrow><mo class="MathClass-punc">,</mo><mi>p</mi></mrow><mo class="MathClass-close">)</mo></mrow> <mo class="MathClass-rel">&#x2265;</mo> <mi>&#x03B4;</mi></mrow><mo class="MathClass-close">}</mo></mrow>
</mrow></math>
as our true error bound. We wish to know how much <!--l. 939--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> and <!--l. 939--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0304;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><mo 
class="MathClass-punc">,</mo><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B4;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> differ.
Applying the Hoeffding approximation, we know that with high probability, <!--l. 941--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x2265;</mo><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0304;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
><mo 
class="MathClass-punc">,</mo><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B4;</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-bin">&#x2212;</mo> <mn>2</mn><msqrt><mi 
></mi>
 <mrow> <mfrac> <mrow 
> <mo 
>ln</mo> <!--nolimits--> <mfrac> <mrow 
> <mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac></mrow>
<mrow 
><mn>2</mn><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></mfrac></mrow></msqrt></mrow></math>. Thus the region in which <!--l. 942--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> is confined with high
confidence is of size <!--l. 943--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mn>2</mn><msqrt><mi 
></mi>
 <mrow> <mfrac> <mrow 
> <mo 
>ln</mo> <!--nolimits--> <mfrac> <mrow 
> <mn>1</mn></mrow> 
<mrow 
><mi 
>&#x03B4;</mi></mrow></mfrac></mrow>
<mrow 
><mn>2</mn><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></mfrac></mrow></msqrt></mrow></math>
or smaller.
</p><!--l. 945--><p class="indent">   It is common practice in machine learning to use the Gaussian approximation
when reporting error bars. The practice is reasonably safe because it is usually pessimistic.
However, this can occasionally lead to embarrassing results where error rates such as <!--l. 948--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>0</mn><mo 
class="MathClass-punc">.</mo><mn>0</mn><mn>1</mn> <mo 
class="MathClass-bin">&#x00B1;</mo> <mn>0</mn><mo 
class="MathClass-punc">.</mo><mn>0</mn><mn>2</mn></mrow></math> are
reported, even though an error rate can never be negative. The test sample complexity theorem <span 
class="ecti-1000">never </span>produces an upper bound greater than <!--l. 949--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
                                                                     

                                                                     
<mrow 
><mn>1</mn></mrow></math> or lower bound
less than <!--l. 950--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mn>0</mn></mrow></math>
because it is computed directly from the Binomial distribution. This approach is the &#x201C;right&#x201D; way
to report test-set-based error rates, given the assumption of independence. Appendix Section
<a 
href="thesisse60.xml#x86-12900016.1">16.1<!--tex4ht:ref: sec-test-bound-calc --></a> documents how to apply this bound. Pictorially we can represent this as in figure
<a 
href="#x22-300051">4.1.1<!--tex4ht:ref: fig-holdout-protocol --></a>. </p><hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x22-300051"></a>
                                                                     

                                                                     
<!--l. 956--><p class="noindent"><img 
src="thesis2x.gif" alt="PIC" class="graphics" width="673.51624pt" height="404.51125pt"  /><!--tex4ht:graphics  
name="thesis2x.gif" src="thesis-presentation/test_set.eps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;4.1.1: </td><td  
class="content"><a 
  name="x22-300051"></a> In this diagram, time increases downward. The only
requirement for applying this bound is that the learner must commit to a hypothesis
without knowledge of the test examples. Similar diagrams for other bounds will
be presented later (and they are somewhat more complicated). We can think of
the bound as a technique by which the &#x201C;Learner&#x201D; can convince the &#x201C;Verifier&#x201D; that
learning has occurred. Each of the proofs in this thesis can be thought of as a
communication protocol for an interactive proof of learning by the Learner.</td></tr></table></div><!--tex4ht:label?: x22-300051 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" />
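<p class="indent">   The contrast with Gaussian error bars can be reproduced for a hypothetical test set of 100 examples containing a single error. The sketch below assumes a Gaussian interval of the observed error rate plus or minus two standard errors; the multiplier and the function names are illustrative assumptions. The Gaussian interval extends below zero, whereas the Binomial bound, computed from an assumed cumulative Bin(m, k, p), always lies in [0, 1].</p>

```python
import math


def binomial_tail(m: int, k: int, p: float) -> float:
    """Assumed Bin(m, k, p): probability of k or fewer errors in m trials."""
    if k >= m or p == 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_m_fact = math.lgamma(m + 1)
    total = 0.0
    for j in range(k + 1):
        # log of C(m, j) * p**j * (1 - p)**(m - j)
        log_term = (log_m_fact - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                    + j * log_p + (m - j) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def holdout_bound(m: int, k: int, delta: float, iters: int = 100) -> float:
    """Largest p with Bin(m, k, p) = delta; always between 0 and 1."""
    if k >= m:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if binomial_tail(m, k, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return hi


def gaussian_interval(m: int, k: int, z: float = 2.0):
    """Observed error rate plus or minus z standard errors (assumed convention)."""
    p_hat = k / m
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / m)
    return p_hat - half_width, p_hat + half_width


low, high = gaussian_interval(100, 1)
print(f"Gaussian interval:   [{low:.4f}, {high:.4f}]")
print(f"Binomial upper bound (delta = 0.05): {holdout_bound(100, 1, 0.05):.4f}")
```

<p class="indent">   With one error in 100 examples the Gaussian interval is essentially the 0.01 &#x00B1; 0.02 of the text, so its lower end is negative. The Binomial search, by construction, can only return values inside [0, 1].</p>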
<!--l. 970--><p class="indent">   Some results of applying the simple test set bound are presented in figure <a 
href="thesisse55.xml#x76-1110013">12.3.3<!--tex4ht:ref: fig-simple-holdout --></a>. In summary, the test set bound tends to work quite well in practice
when sufficient examples are available.
</p><!--l. 975--><p class="indent">   Given that the bounds for the simple holdout technique are so tight,
why is further work needed? The holdout technique has one serious drawback: it requires <!--l. 977--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></math> otherwise
unused examples. This can strongly degrade the learned hypothesis, because adding those <!--l. 979--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></math>
examples to the training set could reduce the true error of the learned hypothesis from <!--l. 980--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>0</mn><mo 
class="MathClass-punc">.</mo><mn>5</mn></mrow></math> to <!--l. 981--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>0</mn><mo 
class="MathClass-punc">.</mo><mn>0</mn></mrow></math> on
some learning problems.
</p><!--l. 983--><p class="indent">   There is another reason why training-set-based bounds are important. Many learning
algorithms implicitly assume that the training error &#x201C;behaves like&#x201D; the true error when
choosing the hypothesis. With too few training examples,
there may be very little relationship between the behavior of the training error
and the true error. Training-error-based bounds can be used <span 
class="ecti-1000">in </span>the training
algorithm.
</p><!--l. 990--><p class="indent">   There are two basic approaches to this difficulty:
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x22-30007x1"></a>Try to reduce <!--l. 993--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><msub><mrow 
><mi 
>m</mi></mrow><mrow 
><!--mstyle 
class="text"--><mtext class="textrm">test</mtext><!--/mstyle--></mrow></msub 
></mrow></math>
        using more sophisticated holdout techniques.
           </li>
        <li class="enumerate"><a 
  name="x22-30009x2"></a>Do not use a holdout set. Instead, train and test on the same set of
        examples using a more sophisticated bound.</li></ol>
<!--l. 996--><p class="nopar"> Before turning to approach (2), we make a few comments about approach (1) to
illustrate the variety of theoretical difficulties that arise when using it.
</p>
   <h4 class="subsectionHead"><span class="titlemark">4.1.1. </span> <a 
  name="x22-310004.1.1"></a>Cross Validation</h4>
<!--l. 1004--><p class="noindent">One of the standard techniques for attempting to improve on the holdout
bound is cross validation. K-fold cross validation divides the data into <!--l. 1005--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>K</mi></mrow></math> folds of size <!--l. 1006--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mfrac><mrow 
><mi 
>m</mi></mrow>
<mrow 
><mi 
>K</mi></mrow></mfrac></mrow></math> (assume <!--l. 1006--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>m</mi></mrow></math> is divisible by <!--l. 1006--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>K</mi></mrow></math> for simplicity). Then,
for every fold <!--l. 1007--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>i</mi></mrow></math>,
hold out fold <!--l. 1007--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>i</mi></mrow></math>,
train on the remainder of the data and test on fold <!--l. 1008--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
                                                                     

                                                                     
<mrow 
><mi 
>i</mi></mrow></math>.
Denote the hypotheses found by training as <!--l. 1009--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>h</mi></mrow><mrow 
><mn>1</mn></mrow></msub 
><mo 
class="MathClass-punc">,</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">,</mo><msub><mrow 
><mi 
>h</mi></mrow><mrow 
><mi 
>K</mi></mrow></msub 
></mrow></math> and their respective
holdout errors as <!--l. 1010--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mn>1</mn></mrow></msub 
><mo 
class="MathClass-punc">,</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">,</mo><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>K</mi></mrow></msub 
></mrow></math>.
Also let <!--l. 1010--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>c</mi><mi 
>v</mi></mrow></msub 
> <mo 
class="MathClass-rel">=</mo>  <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>K</mi></mrow></mfrac><msubsup><mrow 
> <mo 
class="MathClass-op">&#x2211;</mo>
    </mrow><mrow 
><mi 
>i</mi><mo 
class="MathClass-rel">=</mo><mn>1</mn></mrow><mrow 
><mi 
>K</mi></mrow></msubsup 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow></math>.
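</p>
<p class="indent">   The procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not code from this thesis: each example is assumed to be a pair <code>(x, y)</code> with a real-valued feature <code>x</code> and a label <code>y</code> in {0, 1}; the data are shuffled before being cut into <code>K</code> equal folds; and the threshold learner <code>train_threshold</code> and the helpers <code>holdout_error</code> and <code>k_fold_cv</code> are hypothetical names introduced only for this example. Any learner with the same calling convention could be passed as <code>train</code>.</p>

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]          # (feature x, label y in {0, 1}); assumed format
Hypothesis = Callable[[float], int]  # maps x to a predicted label


def train_threshold(data: Sequence[Example]) -> Hypothesis:
    """Hypothetical learner: threshold rule x >= t with the fewest training errors."""
    candidates = sorted({x for x, _ in data}) + [float("inf")]  # inf = always predict 0

    def training_errors(t: float) -> int:
        return sum(int(x >= t) != y for x, y in data)

    best_t = min(candidates, key=training_errors)
    return lambda x: int(x >= best_t)


def holdout_error(h: Hypothesis, fold: Sequence[Example]) -> float:
    """Fraction of the held-out examples that h misclassifies."""
    return sum(h(x) != y for x, y in fold) / len(fold)


def k_fold_cv(data: Sequence[Example], K: int,
              train: Callable[[Sequence[Example]], Hypothesis],
              seed: int = 0) -> Tuple[List[Hypothesis], List[float], float]:
    """Return (h_1..h_K, e_hat_1..e_hat_K, e_hat_cv) for K-fold cross validation."""
    m = len(data)
    if m % K != 0:
        raise ValueError("this sketch assumes m is divisible by K")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # random split into K folds of size m / K
    size = m // K
    folds = [shuffled[i * size:(i + 1) * size] for i in range(K)]

    hypotheses: List[Hypothesis] = []
    errors: List[float] = []
    for i in range(K):
        # Hold out fold i, train on the other K - 1 folds, test on fold i.
        rest = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        h_i = train(rest)
        hypotheses.append(h_i)
        errors.append(holdout_error(h_i, folds[i]))  # e_hat_i
    e_hat_cv = sum(errors) / K  # average of the K holdout errors
    return hypotheses, errors, e_hat_cv


# Usage: noiseless one-dimensional data with m = 100 and K = 5.
data = [(i / 100, int(i >= 50)) for i in range(100)]
hs, errs, e_cv = k_fold_cv(data, K=5, train=train_threshold)
print(errs, e_cv)
```

<p class="noindent">In this sketch, calling <code>k_fold_cv</code> with <code>K=len(data)</code> gives the leave-one-out variant, with one training run per example.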
</p><!--l. 1012--><p class="indent">   There are several variations of cross validation. If <!--l. 1012--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>K</mi> <mo 
class="MathClass-rel">=</mo> <mi 
>m</mi></mrow></math>,
the procedure is often called &#x201C;leave-one-out cross validation&#x201D;. In one
variant, you train on all of the data to learn a new hypothesis, <!--l. 1014--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>h</mi></mrow></math>, and assume a true error rate
near <!--l. 1015--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>c</mi><mi 
>v</mi> </mrow></msub 
></mrow></math>. In another variant,
you predict according to <!--l. 1016--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><msub><mrow 
><mi 
>h</mi></mrow><mrow 
><mi 
>c</mi><mi 
>v</mi></mrow></msub 
> <mo 
class="MathClass-rel">=</mo> <mtext class="textrm">Uniform</mtext><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>h</mi></mrow><mrow 
><mn>1</mn></mrow></msub 
><mo 
class="MathClass-punc">,</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">,</mo><msub><mrow 
><mi 
>h</mi></mrow><mrow 
><mi 
>K</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>.
The latter variant is simpler to analyze because linearity of expectation implies that <!--l. 1017--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>c</mi><mi 
>v</mi></mrow></msub 
></mrow></math> is an unbiased
estimate of <!--l. 1018--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0304;</mo></mover></mrow><mrow 
><mi 
>c</mi><mi 
>v</mi></mrow></msub 
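></mrow></math>. Indeed, fold <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><mi>i</mi></mrow></math> is not used to train <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><msub><mrow><mi>h</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow></math>,
so each <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><msub><mrow><mover 
accent="true"><mrow><mi>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow><mi>i</mi></mrow></msub></mrow></math>
is an unbiased estimate of the true error rate of <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><msub><mrow><mi>h</mi></mrow><mrow><mi>i</mi></mrow></msub></mrow></math>,
and the true error rate of <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><msub><mrow><mi>h</mi></mrow><mrow><mi>c</mi><mi>v</mi></mrow></msub></mrow></math>
is the average of these <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline"><mrow><mi>K</mi></mrow></math> true error rates.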
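</p>
<p class="indent">   The unbiasedness claim can be checked numerically. The Python sketch below is an illustration, not a proof, and all of its ingredients are assumptions made for this example: a synthetic distribution in which <code>x</code> is uniform on [0, 1] and the label <code>int(x &gt;= 0.5)</code> is flipped with probability 0.1, chosen so that the true error of any threshold rule has a closed form; a hypothetical threshold learner <code>fit_threshold</code>; and hypothetical helpers <code>draw</code>, <code>true_error</code>, and <code>cv_trial</code>. Each trial draws a fresh sample, runs <code>K</code>-fold cross validation, and records both the average holdout error and the exact true error of the stochastic classifier that predicts with a uniformly chosen <code>h_i</code>.</p>

```python
import random
import statistics
from typing import List, Tuple

NOISE = 0.1  # assumed label-noise rate of the synthetic distribution D


def draw(m: int, rng: random.Random) -> List[Tuple[float, int]]:
    """Assumed D: x ~ Uniform[0, 1], y = [x >= 0.5] flipped with probability NOISE."""
    sample = []
    for _ in range(m):
        x = rng.random()
        y = int(x >= 0.5)
        if NOISE > rng.random():
            y = 1 - y  # label noise
        sample.append((x, y))
    return sample


def fit_threshold(data: List[Tuple[float, int]]) -> float:
    """Hypothetical learner: threshold t of the rule x >= t with fewest training errors."""
    candidates = sorted(x for x, _ in data) + [2.0]  # t = 2.0 always predicts 0
    return min(candidates, key=lambda t: sum(int(x >= t) != y for x, y in data))


def true_error(t: float) -> float:
    """Exact error of the rule x >= t under D: NOISE + (1 - 2 * NOISE) * |t - 0.5|."""
    gap = min(abs(t - 0.5), 0.5)  # mass of x where the rule disagrees with [x >= 0.5]
    return NOISE + (1 - 2 * NOISE) * gap


def cv_trial(m: int, K: int, rng: random.Random) -> Tuple[float, float]:
    """One K-fold run on a fresh sample: returns (e_hat_cv, true error of h_cv)."""
    data = draw(m, rng)  # i.i.d., so contiguous slices are already random folds
    size = m // K
    e_hats, e_trues = [], []
    for i in range(K):
        fold = data[i * size:(i + 1) * size]
        rest = data[:i * size] + data[(i + 1) * size:]
        t_i = fit_threshold(rest)  # h_i never sees fold i
        e_hats.append(sum(int(x >= t_i) != y for x, y in fold) / size)
        e_trues.append(true_error(t_i))
    # h_cv = Uniform(h_1, ..., h_K) predicts with a uniformly chosen h_i, so its
    # true error is the plain average of the K true errors.
    return sum(e_hats) / K, sum(e_trues) / K


rng = random.Random(0)
pairs = [cv_trial(m=50, K=5, rng=rng) for _ in range(500)]
bias = statistics.mean(e_hat - e_true for e_hat, e_true in pairs)
print(f"average of e_hat_cv - e(h_cv) over 500 runs: {bias:+.4f}")
```

<p class="noindent">The average difference over many trials is expected to be close to zero even though individual trials can deviate noticeably; unbiasedness by itself says nothing about the variance of the estimate.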
</p><!--l. 1020--><p class="indent">   Strong results are known for cross validation with nearest-neighbor, kernel, and
histogram classifiers <span class="cite">[<a 
href="thesisli2.xml#XDevroye"><span 
class="ecbx-1000">11</span></a>]</span>. For arbitrary classifiers, however, only very weak bounds are known
on the variance of cross validation. These include &#x201C;sanity-check bounds&#x201D; <span class="cite">[<a 
href="thesisli2.xml#XSanity"><span 
class="ecbx-1000">27</span></a>]</span>, which state that cross validation is
not much worse than a holdout set, as well as some slightly stronger results <span class="cite">[<span 
class="ecbx-1000">?</span>]</span> and
<span class="cite">[<a 
href="thesisli2.xml#XKalai"><span 
class="ecbx-1000">25</span></a>]</span>.
</p>
   <div class="newtheorem">
<!--l. 1027--><p class="noindent"><span class="head">
<a 
  name="x22-31001r5"></a>
  <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">B</small><small 
class="small-caps">L</small><small 
class="small-caps">E</small><small 
class="small-caps">M</small> </span>4.1.5<span 
class="eccc-1000">.</span></span>
</p><!--l. 1028--><p class="indent">   (Open) Construct a bound on the deviation of cross validation for arbitrary
classifiers which is a quantitative improvement on the results of <span class="cite">[<span 
class="ecbx-1000">?</span>]</span>.
</p>
   </div>
<a 
  name="tailthesisse15.xml"></a>   
</body> 
</html> 
