<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>4.5 Structural Risk Minimization</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse20.xml" >next</a>] [<a 
href="thesisse18.xml" >prev</a>] [<a 
href="thesisse18.xml#tailthesisse18.xml" >prev-tail</a>] [<a 
href="#tailthesisse19.xml">tail</a>] [<a 
href="thesisch4.xml#thesisse19.xml" >up</a>] </p></div>
   <h3 class="sectionHead"><span class="titlemark">4.5. </span> <a 
  name="x26-350004.5"></a>Structural Risk Minimization</h3>
<!--l. 1221--><p class="noindent">Structural Risk Minimization <span class="cite">[<a 
href="thesisli2.xml#XVapnik"><span 
class="ecbx-1000">51</span></a>]</span> (SRM) is a technique used in the learning theory
community to avoid the difficulties associated with convergence on hypothesis sets
that are too &#x201C;large&#x201D;. SRM works with a sequence of nested hypothesis sets, <!--l. 1224--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mn>1</mn></mrow></msub 
> <mo 
class="MathClass-rel">&#x2282;</mo> <msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mn>2</mn></mrow></msub 
> <mo 
class="MathClass-rel">&#x2282;</mo> <mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo> <mo 
class="MathClass-rel">&#x2282;</mo> <msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mi 
>l</mi></mrow></msub 
></mrow></math>. For
each hypothesis set, the discrete hypothesis bound (<a 
href="thesisse16.xml#x23-32001r1">4.2.1<!--tex4ht:ref: th-DHSCP --></a>) on the difference between
empirical and true error holds. For &#x201C;small&#x201D; hypothesis sets this bound may be tight,
while for large hypothesis sets it may be inherently loose. However, we also expect
the best hypothesis in a hypothesis set to improve as the set becomes
larger. This naturally induces a trade-off: there will be some hypothesis set <!--l. 1230--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow></math> for
which the true error bound is minimized.
</p><!--l. 1232--><p class="indent">   We can&#x2019;t simply apply the discrete hypothesis bound to the meta-algorithm which
picks the algorithm (and associated hypothesis space) with the smallest true error
bound since this meta-algorithm could, potentially, output any hypothesis in <!--l. 1235--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mi 
>l</mi></mrow></msub 
></mrow></math>.
The simplest way to retrofit the bound to cover all hypothesis sets is
a theorem which essentially states that we can guarantee
<span 
class="ecti-1000">nearly </span>the same bound as would apply on the smallest hypothesis space <!--l. 1238--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow></math> containing the output
hypothesis, <!--l. 1238--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>h</mi></mrow></math>.
</p>
   <div class="newtheorem">
<!--l. 1240--><p class="noindent"><span class="head">
<a 
  name="x26-35001r1"></a>
  <span 
class="eccc-1000">T<small 
class="small-caps">H</small><small 
class="small-caps">E</small><small 
class="small-caps">O</small><small 
class="small-caps">R</small><small 
class="small-caps">E</small><small 
class="small-caps">M</small> </span>4.5.1<span 
class="eccc-1000">.</span></span>
</p><!--l. 1241--><p class="indent">   <span 
class="ecti-1000">(Structural Risk Minimization) Let </span><!--l. 1241--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>i</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
<span 
class="ecti-1000">be some measure across the </span><!--l. 1242--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>l</mi></mrow></math>
<span 
class="ecti-1000">hypothesis sets with </span><!--l. 1242--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><msubsup><mrow 
><mo 
class="MathClass-op">&#x2211;</mo>
  </mrow><mrow 
><mi 
>i</mi><mo 
class="MathClass-rel">=</mo><mn>1</mn></mrow><mrow 
><mi 
>l</mi></mrow></msubsup 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>i</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mn>1</mn></mrow></math><span 
class="ecti-1000">.</span>
<span 
class="ecti-1000">Then: </span><!--l. 1243--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">      <mrow 
>
        <mi 
>&#x2200;</mi><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>i</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-punc">:</mo>  <msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><msup><mrow 
><mi 
>D</mi></mrow><mrow 
><mi 
>m</mi></mrow></msup 
></mrow></msub 
> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>&#x2203;</mi><mi 
>h</mi> <mo 
class="MathClass-rel">&#x2208;</mo> <msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
> <mo 
class="MathClass-rel">&#x2208;</mo><mrow><mo 
class="MathClass-open">{</mo><mrow><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mn>1</mn></mrow></msub 
><mo 
class="MathClass-punc">,</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">,</mo><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mi 
>l</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">}</mo></mrow> <mo 
class="MathClass-punc">:</mo>  <mi 
>e</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">&#x003E;</mo><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0304;</mo></mover> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo>&#x0302;</mo></mover><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-punc">,</mo> <mfrac><mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>i</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>&#x03B4;</mi></mrow> 
<mrow 
><mo 
class="MathClass-rel">&#x2223;</mo><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
><mo 
class="MathClass-rel">&#x2223;</mo></mrow></mfrac></mrow></mfenced></mrow></mfenced><mo 
class="MathClass-rel">&#x2264;</mo> <mi 
>&#x03B4;</mi>
</mrow></math>
<span 
class="ecti-1000">where </span><!--l. 1245--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0304;</mo></mover> <mfenced separators="" 
open="("  close=")" ><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo> <mfrac><mrow 
><mi 
>k</mi></mrow> 
<mrow 
><mi 
>m</mi></mrow></mfrac><mo 
class="MathClass-punc">,</mo><mi 
>&#x03B4;</mi></mrow></mfenced> <mo 
class="MathClass-rel">&#x2261;</mo><msub><mrow 
><mo 
> min</mo></mrow><mrow 
><mi 
>p</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">{</mo><mrow><mi 
>p</mi> <mo 
class="MathClass-punc">:</mo>  <!--mstyle 
class="text"--><mtext class="textrm">Bin</mtext><!--/mstyle--><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>m</mi><mo 
class="MathClass-punc">,</mo><mi 
>k</mi><mo 
class="MathClass-punc">,</mo><mi 
>p</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo> <mi 
>&#x03B4;</mi></mrow><mo 
class="MathClass-close">}</mo></mrow></mrow></math><span 
class="ecti-1000">.</span>
</p>
   </div>
   <div class="proof">
<!--l. 1248--><p class="indent">   <span class="head">
   <span 
class="eccc-1000">P<small 
class="small-caps">R</small><small 
class="small-caps">O</small><small 
class="small-caps">O</small><small 
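class="small-caps">F</small>.</span> </span>Apply the discrete hypothesis bound (<a 
href="thesisse16.xml#x23-32001r1">4.2.1<!--tex4ht:ref: th-DHSCP --></a>) to each hypothesis set <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow></math> with confidence parameter <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>p</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>i</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mi 
>&#x03B4;</mi></mrow></math> in place of <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>&#x03B4;</mi></mrow></math>, then apply the union bound over the <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>l</mi></mrow></math> sets: the failure probabilities sum to at most <math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>&#x03B4;</mi></mrow></math>. <span class="qed"><span 
class="msam-10">&#x25AB;</span></span>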
</p>
   </div>
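<p class="indent">   As a rough numerical illustration of the theorem, the sketch below evaluates the bound for the best hypothesis in each of three nested sets and reports which set gives the smallest bound. It assumes that Bin(m, k, p) is the probability of at most k errors in m independent draws at true error rate p. The names <code>binom_cdf</code>, <code>e_bar</code> and <code>srm_bound</code>, the set sizes, the uniform p(i) and the error counts are all hypothetical and chosen only for illustration. The exact binomial sum used here is practical only for moderate m.
</p>
<pre class="verbatim">
```python
import math


def binom_cdf(m, k, p):
    """Bin(m, k, p): probability of at most k errors in m draws at error rate p."""
    return sum(math.comb(m, j) * p**j * (1.0 - p) ** (m - j) for j in range(k + 1))


def e_bar(m, k, delta, tol=1e-12):
    """Upper bound on the true error: the p that solves Bin(m, k, p) = delta.

    Bin(m, k, p) is decreasing in p, so bisection on [k/m, 1] locates it.
    """
    if k >= m:
        return 1.0
    lo, hi = k / m, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binom_cdf(m, k, mid) > delta:
            lo = mid  # tail probability still above delta: solution is at larger p
        else:
            hi = mid
    return hi


def srm_bound(m, k, i, sizes, prior, delta):
    """Bound for a hypothesis with k training errors taken from H_i (i is 1-based)."""
    return e_bar(m, k, prior[i - 1] * delta / sizes[i - 1])


# Illustrative numbers only (assumptions, not taken from the text).
m, delta = 200, 0.05
sizes = [2**4, 2**8, 2**16]      # |H_1|, |H_2|, |H_3|
prior = [1 / 3, 1 / 3, 1 / 3]    # p(i), sums to 1
errors = [30, 12, 8]             # training errors of the best h in each H_i

bounds = [srm_bound(m, k, i + 1, sizes, prior, delta) for i, k in enumerate(errors)]
best = min(range(len(bounds)), key=bounds.__getitem__) + 1
print(bounds, best)
```
</pre>
<p class="indent">   The index <code>best</code> marks the hypothesis set at which the trade-off between a small training error and a small hypothesis set is resolved for these particular numbers.
</p>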
<!--l. 1250--><p class="indent">   The SRM bound is slightly inefficient in the sense that the bound for all hypotheses in <!--l. 1251--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mn>2</mn></mrow></msub 
></mrow></math> includes a bound for
every hypothesis in <!--l. 1251--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mn>1</mn></mrow></msub 
></mrow></math>.
This effect is typically small because the size of the hypothesis sets usually
grows exponentially, implying that the extra confidence given to a hypothesis <!--l. 1253--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>h</mi></mrow></math> in <!--l. 1254--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mn>1</mn></mrow></msub 
></mrow></math> by the bounds used on
hypothesis sets <!--l. 1254--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mn>2</mn></mrow></msub 
><mo 
class="MathClass-punc">,</mo><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mn>3</mn></mrow></msub 
><mo 
class="MathClass-punc">,</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo></mrow></math>
is small relative to the confidence given by the bound for <!--l. 1255--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mn>1</mn></mrow></msub 
></mrow></math>.
One can remove this slack in the Structural Risk Minimization bound by
&#x201C;cutting out&#x201D; the nested portion of each hypothesis set in the formulation of <!--l. 1257--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mn>1</mn></mrow></msub 
><mo 
class="MathClass-punc">,</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">,</mo><msub><mrow 
><mi 
>H</mi></mrow><mrow 
><mi 
>l</mi></mrow></msub 
></mrow></math>. We
will call this Disjoint Structural Risk Minimization (also mentioned in <span class="cite">[<span 
class="ecbx-1000">?</span>]</span>).
</p>
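<p class="indent">   The size of the slack can be checked with a small calculation. The sketch below reads &#x201C;cutting out the nested portion&#x201D; as replacing each H<sub>i</sub> by the hypotheses that first appear in it, so that the i-th piece has size |H<sub>i</sub>| minus |H<sub>i-1</sub>|; this reading, the function names and the set sizes are assumptions made for illustration. It compares the confidence parameter p(i)&#x03B4; divided by the set size under the nested and the disjoint formulations.
</p>
<pre class="verbatim">
```python
def disjoint_sizes(nested_sizes):
    """Sizes of the disjoint pieces H_1, H_2 minus H_1, H_3 minus H_2, ...

    nested_sizes holds |H_1|, |H_2|, ... for strictly growing nested sets.
    """
    pieces = [nested_sizes[0]]
    for prev, cur in zip(nested_sizes, nested_sizes[1:]):
        if prev >= cur:
            raise ValueError("nested set sizes must be strictly increasing")
        pieces.append(cur - prev)
    return pieces


def per_hypothesis_delta(sizes, prior, delta):
    """Confidence p(i) * delta / size_i given to each hypothesis of the i-th set."""
    return [p * delta / n for p, n in zip(prior, sizes)]


# Illustrative numbers only (assumptions, not taken from the text).
nested = [2**4, 2**8, 2**16]     # |H_1|, |H_2|, |H_3|, growing exponentially
prior = [1 / 3, 1 / 3, 1 / 3]    # p(i), sums to 1
delta = 0.05

nested_delta = per_hypothesis_delta(nested, prior, delta)
disjoint_delta = per_hypothesis_delta(disjoint_sizes(nested), prior, delta)

# Factor by which the disjoint formulation enlarges each confidence parameter.
gain = [d / n for d, n in zip(disjoint_delta, nested_delta)]
print(gain)
```
</pre>
<p class="indent">   With exponentially growing sets every factor stays close to 1, which matches the observation that the slack in the nested bound is typically small.
</p>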
   <a 
  name="tailthesisse19.xml"></a>  
</body> 
</html> 
