<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>13.2 Experimental Results</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse59.xml" >next</a>] [<a 
href="thesisse57.xml" >prev</a>] [<a 
href="thesisse57.xml#tailthesisse57.xml" >prev-tail</a>] [<a 
href="#tailthesisse58.xml">tail</a>] [<a 
href="thesisch13.xml#thesisse58.xml" >up</a>] </p></div>
   <h3 class="sectionHead"><span class="titlemark">13.2. </span> <a 
  name="x80-12300013.2"></a>Experimental Results</h3>
   <hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x80-1230011"></a>
<!--l. 5166--><p class="indent">   <img 
src="thesis30x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis30x.gif" src="bound_vs_epoch.ps"  
--><img 
src="thesis31x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis31x.gif" src="bound_vs_epoch_easy.ps"  
-->
<br /> </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;13.2.1:  </td><td  
class="content"><a 
  name="x80-1230011"></a>Plot of errors and true error bounds for the neural network
(NN) and the stochastic neural network (SNN). The graph exhibits overfitting
after approximately 6000 pattern presentations. The slope of the neural network
true error bound is positive because the size of the weights is gradually
increasing. Note that a true error bound of &#x201C;100&#x201D; implies that a factor of <!--l. 5175--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>1</mn><mn>0</mn><msup><mrow 
><mn>0</mn></mrow><mrow 
><mn>2</mn></mrow></msup 
></mrow></math>
more examples are required in order to make a non-vacuous bound. The graph on
the right expands the vertical scale by excluding the poor true error bound. </td></tr></table></div><!--tex4ht:label?: x80-1230011 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" />
<!--l. 5179--><p class="indent">   How well can we bound the true error rate of a stochastic neural network?
The answer is <span 
class="ecti-1000">much </span>better than we can bound the true error rate of a neural
network.
</p><!--l. 5183--><p class="indent">   Our experiments use a synthetic data set with 25 input dimensions
and one output dimension. Most of the input dimensions are uninformative: they are simply random numbers drawn
from a <!--l. 5185--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>N</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mn>0</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
Gaussian. One of the 25 input dimensions is dependent on the label. First, the label <!--l. 5186--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>y</mi></mrow></math> is drawn uniformly from <!--l. 5187--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mrow><mo 
class="MathClass-open">{</mo><mrow><mo 
class="MathClass-bin">&#x2212;</mo><mn>1</mn><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">}</mo></mrow></mrow></math>, then the special dimension
is drawn from a <!--l. 5187--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mi 
>N</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>y</mi><mo 
class="MathClass-punc">,</mo><mn>1</mn></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
Gaussian. Note that this learning problem cannot be solved perfectly because some
examples will be drawn from the tail of the Gaussian on the wrong side of zero.
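</p><p class="indent">   The sampling procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the position of the label-dependent dimension (<code>special_index</code>), the seed, and the function names are hypothetical and are not taken from the experiments themselves.
</p>
<pre class="verbatim">
```python
import math
import random


def make_dataset(n_examples=100, n_inputs=25, special_index=0, seed=0):
    """Return a list of (x, y) pairs drawn as described in the text."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examples):
        # The label is drawn uniformly from {-1, 1}.
        y = rng.choice([-1, 1])
        # Every input starts as an independent N(0, 1) draw.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
        # The one informative input is redrawn from N(y, 1).
        # Which index is "special" is an assumption of this sketch.
        x[special_index] = rng.gauss(float(y), 1.0)
        data.append((x, y))
    return data


def threshold_rule_error():
    """Error of predicting sign(x[special_index]): P(N(1, 1) is negative) = Phi(-1)."""
    return 0.5 * math.erfc(1.0 / math.sqrt(2.0))


train = make_dataset()
```
</pre>
<p class="indent">   Under the stated distributions, thresholding the special dimension at zero is the best available rule, and it still errs with probability Phi(-1), roughly 0.159. This is the sense in which the problem cannot be solved perfectly.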
</p><!--l. 5191--><p class="indent">   The &#x201C;ideal&#x201D; neural net for solving this problem is a single-node perceptron. We
will instead use a two-layer neural net with two hidden nodes. This overly large neural net
creates the potential for significant overfitting, which makes the bound prediction
problem interesting. It is also somewhat more &#x201C;realistic&#x201D; if the neural net structure does
not exactly fit the learning problem.
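</p><p class="indent">   As a rough sketch of the architecture just described, the following Python code evaluates a network with 25 inputs, 2 hidden nodes and one output. The tanh hidden units, the sign output, the bias terms, the initialisation scale, and the way the stochastic network is formed (independent Gaussian noise with a single standard deviation <code>sigma</code> added to every weight before each prediction) are assumptions made for illustration; the text above fixes only the layer sizes.
</p>
<pre class="verbatim">
```python
import math
import random

N_INPUTS, N_HIDDEN = 25, 2


def init_weights(rng, scale=0.1):
    """Small random weights; the last entry of each row is a bias (assumed)."""
    hidden = [[rng.gauss(0.0, scale) for _ in range(N_INPUTS + 1)]
              for _ in range(N_HIDDEN)]
    output = [rng.gauss(0.0, scale) for _ in range(N_HIDDEN + 1)]
    return hidden, output


def predict(weights, x):
    """Deterministic network: tanh hidden layer, sign of a linear output."""
    hidden, output = weights
    # zip stops at the shorter sequence, so the bias is added separately.
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
         for row in hidden]
    score = sum(w * v for w, v in zip(output, h)) + output[-1]
    return 1 if score >= 0 else -1


def stochastic_predict(weights, x, sigma, rng):
    """Stochastic network (assumed form): perturb every weight, then predict."""
    hidden, output = weights
    noisy_hidden = [[w + rng.gauss(0.0, sigma) for w in row] for row in hidden]
    noisy_output = [w + rng.gauss(0.0, sigma) for w in output]
    return predict((noisy_hidden, noisy_output), x)


rng = random.Random(0)
weights = init_weights(rng)
x = [rng.gauss(0.0, 1.0) for _ in range(N_INPUTS)]
label = predict(weights, x)
noisy_label = stochastic_predict(weights, x, sigma=0.05, rng=rng)
```
</pre>
<p class="indent">   Counting the assumed bias terms, this network has 2 x 26 + 3 = 55 adjustable parameters, far more than the single informative input requires.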
</p><!--l. 5197--><p class="indent">   All of our data sets will use just <!--l. 5197--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mn>1</mn><mn>0</mn><mn>0</mn></mrow></math>
examples. Constructing a non-vacuous bound for a continuous hypothesis space at <!--l. 5198--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>1</mn><mn>0</mn><mn>0</mn></mrow></math>
examples is quite challenging, as indicated by Figure  <a 
href="#x80-1230011">13.2.1<!--tex4ht:ref: bound_vs_epoch --></a>. Conventional bounds are
hopelessly loose while the stochastic neural network bound is still not as tight as might
be desired. There are several notable things about this figure.
           </p><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x80-123003x1"></a>The SNN upper bound is <span 
class="ecti-1000">2-3  </span>orders of magnitude lower than the NN
        upper bound.
           </li>
        <li class="enumerate"><a 
  name="x80-123005x2"></a>The SNN performs better than expected. In particular, the SNN true error
        rate is within <!--l. 5207--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mn>1</mn><mi 
>%</mi></mrow></math>
        of the NN true error rate. This is surprising considering that we fixed the
        difference in empirical error rates at <!--l. 5208--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mn>1</mn><mi 
>%</mi></mrow></math>.
           </li>
        <li class="enumerate"><a 
  name="x80-123007x3"></a>The SNN bound has a minimum at 12000 pattern presentations, which
        weakly predicts the overfitting point of 6000 for both the SNN and the
        NN.</li></ol>
<!--l. 5211--><p class="nopar"> The comparison between the neural network bound and the stochastic
neural network bound is not quite &#x201C;fair&#x201D; due to the form of the bound. In
particular, the stochastic neural network bound can never return a value greater
than &#x201C;always err&#x201D;. This implies that when the bound is near the value of &#x201C;<!--l. 5215--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>1</mn></mrow></math>&#x201D;, it
                                                                     

                                                                     
is difficult to judge how rapidly extra examples will improve the stochastic
neural network bound. We can judge the sample complexity of the stochastic
bound by plotting the value of the numerator in equation  <a 
href="thesisse57.xml#x79-121002r1">13.1.1<!--tex4ht:ref: big_bound --></a>. Figure  <a 
href="#x80-1230082">13.2.2<!--tex4ht:ref: complexity_vs_epoch --></a>
plots the complexity versus the number of pattern presentations in training.
</p><hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
<a 
  name="x80-1230082"></a>
                                                                     

                                                                     
<!--l. 5222--><p class="noindent"><img 
src="thesis32x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis32x.gif" src="complexity.ps"  
-->
<br />  </p><div align="center" class="caption"><table class="caption" 
><tr valign="baseline" class="caption"><td class="id">Figure&#x00A0;13.2.2:   </td><td  
class="content"><a 
  name="x80-1230082"></a>We plot the &#x201C;complexity&#x201D; of the stochastic network
model (numerator of equation <a 
href="thesisse57.xml#x79-121002r1">13.1.1<!--tex4ht:ref: big_bound --></a>) vs. training epoch. Note that the
complexity increases with more training, as expected, and stays below <!--l. 5228--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>1</mn><mn>0</mn><mn>0</mn></mrow></math>,
implying non-vacuous bounds on a training set of size <!--l. 5228--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mn>1</mn><mn>0</mn><mn>0</mn></mrow></math>.</td></tr></table></div><!--tex4ht:label?: x80-1230082 -->
                                                                     

                                                                     
   </td></tr></table></div><hr class="endfigure" />
<!--l. 5232--><p class="indent">   The stochastic bound is a radical improvement on the neural network bound but it is
not yet a perfectly tight bound. Given that we do not have a perfectly tight bound, one
important question arises: does the minimum of the stochastic bound predict the
minimum of the true error rate (as estimated from a large holdout data set)? In particular,
can we use the stochastic bound to determine when we should cease training? The
stochastic bound depends upon (1) the complexity, which increases with training time,
and (2) the training error, which decreases with training time. This dependence results in
a minimum, which for our problem occurs at approximately 12000 pattern presentations.
The point of minimal true error (for the stochastic and deterministic neural networks)
occurs at approximately 6000 pattern presentations, indicating that the stochastic bound
weakly predicts the point of minimum error. The neural network bound has no such
minimum.
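</p><p class="indent">   The stopping rule discussed above amounts to evaluating the bound at a series of checkpoints and stopping at the smallest value. A minimal sketch follows. The bound used here, training error plus sqrt(complexity / (2m)), is a simplified stand-in chosen for illustration and is not equation 13.1.1; the checkpoint values are made-up numbers, not measurements from these experiments.
</p>
<pre class="verbatim">
```python
import math


def stand_in_bound(train_error, complexity, m):
    """Illustrative bound: empirical error plus a complexity penalty (an assumption)."""
    return train_error + math.sqrt(complexity / (2.0 * m))


def best_checkpoint(checkpoints, m, bound=stand_in_bound):
    """Return the (presentations, bound value) pair with the smallest bound."""
    scored = [(p, bound(err, comp, m)) for p, err, comp in checkpoints]
    return min(scored, key=lambda pair: pair[1])


# Hypothetical (presentations, training error, complexity) triples.
toy_checkpoints = [
    (1000, 0.30, 5.0),
    (2000, 0.20, 10.0),
    (3000, 0.12, 18.0),
    (4000, 0.10, 40.0),
]
stop_at, bound_value = best_checkpoint(toy_checkpoints, m=100)
```
</pre>
<p class="indent">   Because the training error term falls while the complexity term rises, the toy bound is smallest at an intermediate checkpoint, which mirrors the qualitative behaviour described above.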
</p><!--l. 5245--><p class="indent">   Is the choice of 1% increased empirical error optimal? In general, the &#x201C;optimal&#x201D;
choice of the extra error rate depends upon the learning problem. Since the
stochastic neural network bound (corollary  <a 
href="thesisse57.xml#x79-121001r3">13.1.3<!--tex4ht:ref: co-snnb --></a>) holds for all multidimensional
Gaussian distributions, we are free to optimize the choice of distribution in any way
we desire. Figure  <a 
href="#x80-12300013.2">13.2<!--tex4ht:ref: q_opt --></a>.2 shows the resulting bound for different choices of <!--l. 5250--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>q</mi></mrow></math>. The bound has a
minimum at <!--l. 5250--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mn>0</mn><mo 
class="MathClass-punc">.</mo><mn>0</mn><mn>3</mn></mrow></math>
extra error indicating that a slightly lower bound is possible if we accept a larger training
error. Also note that the complexity always decreases with increasing entropy in the
distribution of our stochastic neural net. The existence of a minimum in Figure  <a 
href="#x80-12300013.2">13.2<!--tex4ht:ref: q_opt --></a>.2 is
the &#x201C;right&#x201D; behavior: the increased empirical error rate is significant in the calculation of
the true error bound. </p><hr class="figure" /><div align="center" class="figure" 
><table class="figure"><tr class="figure"><td class="figure" 
>
                                                                     

                                                                     
                                                                     

                                                                     
<!--l. 5257--><p class="noindent"><img 
src="thesis33x.gif" alt="PIC" class="graphics" width="252.945pt" height="361.34999pt"  /><!--tex4ht:graphics  
name="thesis33x.gif" src="error_vs_bound.ps"  
-->
</p><!--l. 5259--><p class="noindent">Plot of the stochastic neural net (SNN) bound for &#x201C;posterior&#x201D; distributions chosen
according to the extra empirical error they introduce.
                                                                     

                                                                     
</p>
   </td></tr></table></div><hr class="endfigure" />
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse59.xml" >next</a>] [<a 
href="thesisse57.xml" >prev</a>] [<a 
href="thesisse57.xml#tailthesisse57.xml" >prev-tail</a>] [<a 
href="thesisse58.xml" >front</a>] [<a 
href="thesisch13.xml#thesisse58.xml" >up</a>] </p></div><a 
  name="tailthesisse58.xml"></a>  
</body> 
</html> 
