<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>2.3 Overview of the document</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <h3 class="sectionHead"><span class="titlemark">2.3. </span> <a 
  name="x13-140002.3"></a>Overview of the document</h3>
<!--l. 483--><p class="noindent">This document is primarily about the theory of sample complexity for answering the
question &#x201C;Have we learned?&#x201D;. However, we do not neglect the experimental side. In
particular, following the theory we present results from applying sample
complexity bounds to machine learning problems. These are the &#x2018;best known
results&#x2019; in terms of bound tightness and should be taken as a guide and a challenge to
others working on sample complexity bounds.
</p><!--l. 490--><p class="indent">   All of the sample complexity bounds presented here fall within the paradigm of
classical (non-Bayesian) statistics. Despite this, Bayesians may be interested in the
results. In particular, these bounds make use of a &#x2018;prior&#x2019; and a
&#x2018;posterior&#x2019;, interpreted in a <span 
class="ecti-1000">classical </span>manner.
</p><!--l. 496--><p class="indent">   In order to make this thesis more coherent, previous work (of which there is quite a
bit) will be integrated into the presentation rather than separated into a section of its
own. Credit will be given at the time the work is introduced.
</p><!--l. 500--><p class="indent">   The document is organized into three parts:
</p><!--l. 502--><ol type="1" class="enumerate1" start="1" 
>
        <li class="enumerate"><a 
  name="x13-14002x1"></a>Introductory material.
           </li>
        <li class="enumerate"><a 
  name="x13-14004x2"></a>New results on sample complexity.
           </li>
        <li class="enumerate"><a 
  name="x13-14006x3"></a>Experimental results from applying sample complexity bounds.</li></ol>
<!--l. 506--><p class="nopar"> What follows is a brief chapter-by-chapter summary of the theoretical results in this
thesis. Much of the work can be summarized as approaches that use extra information
(in the algorithm, in the error rates of hypotheses which are <span 
class="ecti-1000">not </span>chosen, in test
sets, etc.) in order to construct tighter bounds on the future error rate of a
hypothesis.
</p><!--l. 513--><p class="indent">   The principal <span 
class="ecti-1000">practical </span>result of this thesis is a program, &#x2018;bound&#x2019;,
which automatically applies any of several theorems to calculate a bound on
the future error rate of a particular hypothesis. The use of this program will be
demonstrated as the bounds are presented.
</p>
   <h4 class="subsectionHead"><span class="titlemark">2.3.1. </span> <a 
  name="x13-150002.3.1"></a>Microchoice Bounds</h4>
<!--l. 521--><p class="noindent">The Microchoice technique allows for online construction of a &#x201C;prior&#x201D; which can be used
in the Occam&#x2019;s Razor bound (<a 
href="thesisse20.xml#x27-36001r1">4.6.1<!--tex4ht:ref: th-ORB --></a>). Empirical testing (presented in chapter 11) shows
that this approach is practical and yields useful results on decision trees for real-world
problems drawn from the UCI database of problems. The Adaptive Microchoice bounds
further extend this approach and can result in functional improvements over the Occam&#x2019;s
Razor bound.
</p>
   <h4 class="subsectionHead"><span class="titlemark">2.3.2. </span> <a 
  name="x13-160002.3.2"></a>PAC-Bayes Bounds</h4>
<!--l. 531--><p class="noindent">PAC-Bayes bounds are a new approach (first presented by David McAllester <span class="cite">[<a 
href="thesisli2.xml#XPB"><span 
class="ecbx-1000">39</span></a>]</span>) for
dealing with continuously parameterized classifiers such as (stochastic) neural networks.
This chapter refines and improves the PAC-Bayes bound, giving it an information-theoretic
interpretation. Empirical results (presented in chapter 12) show that the refined
PAC-Bayes bound produces <span 
class="ecti-1000">nonvacuous </span>bounds on realistic learning
problems. Nonvacuous bounds for continuous-valued classifiers are currently
rare.
</p>
   <h4 class="subsectionHead"><span class="titlemark">2.3.3. </span> <a 
  name="x13-170002.3.3"></a>Averaging Bounds</h4>
<!--l. 542--><p class="noindent">Averaging bounds deal with classifiers that predict according to a weighted
majority vote over some other set of classifiers. Averaging is a very common technique in
practice (see <span class="cite">[<span 
class="ecbx-1000">?</span>]</span> for examples), so results specialized for averaging classifiers are useful.
This chapter states and proves a bound on averaging classifiers which shows that
&#x201C;hypothesis space complexity&#x201D; <span 
class="ecti-1000">decreases </span>as averaging becomes more uniform. Prior
theoretical work <span class="cite">[<span 
class="ecbx-1000">?</span>]</span> was of the form &#x201C;averaging does not increase the hypothesis space
complexity much&#x201D;.
</p>
   <h4 class="subsectionHead"><span class="titlemark">2.3.4. </span> <a 
  name="x13-180002.3.4"></a>Shell Bounds</h4>
<!--l. 553--><p class="noindent">Shell bounds are a new approach which trades extra information for tighter
bounds. Empirical results (presented in chapter 11) show this can be a useful
approach. In order to ameliorate the increased information requirements, a
sampled version of the bound is stated which allows for smooth interpolation
between simpler bounds and the full shell bound. In addition, the shell bound has
been extended to continuous spaces with an approach similar to PAC-Bayes
bounds.
</p>
   <h4 class="subsectionHead"><span class="titlemark">2.3.5. </span> <a 
  name="x13-190002.3.5"></a>Bracketing Covering Number Bounds</h4>
<!--l. 563--><p class="noindent">The results of this chapter are entirely theoretical. They point out an approach which
may lead to a useful technique for bounds on continuous-valued hypothesis spaces.
Using a stronger notion of cover (the bracketing cover), simplified bounds on the
future error rate of continuous classifiers can be constructed. This approach can
be significantly tighter than the standard covering number approach. More
work on calculating bracketing covers is needed before these results can be
applied.
</p>
   <h4 class="subsectionHead"><span class="titlemark">2.3.6. </span> <a 
  name="x13-200002.3.6"></a>Progressive Validation</h4>
<!--l. 574--><p class="noindent">Progressive Validation is a variant of the holdout bound (<a 
href="thesisse15.xml#x22-30001r1">4.1.1<!--tex4ht:ref: th-hb --></a>) which is roughly twice
as efficient in its use of examples. Progressive Validation is not used in the
experimental results chapters because more work is required in order to prove the
bound without a Hoeffding-like approximation.
</p>
   <h4 class="subsectionHead"><span class="titlemark">2.3.7. </span> <a 
  name="x13-210002.3.7"></a>Combining training and test sets</h4>
<!--l. 582--><p class="noindent">The idea behind this chapter is that it should be possible to combine <span 
class="ecti-1000">any </span>test set bound
with <span 
class="ecti-1000">any </span>training-set-based bound in order to derive a bound with more robust behavior.
A general technique is stated and proved to work. Empirical results (in chapter 11) show
that this approach can work well in practice.
</p>
   <a name="tailthesisse8.xml"></a>
</body> 
</html> 
