<?xml version="1.0"?> 
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "mathml.dtd"> 
<?xml-stylesheet type="text/css" href="thesis.css"?> 
<html  
xmlns="http://www.w3.org/1999/xhtml"  
><head><title>1.4 The oblivious passive supervised learning model</title> 
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> 
<meta name="generator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<meta name="originator" content="TeX4ht (http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html)" /> 
<!-- 3,early_,early^,xhtml,mozilla --> 
<meta name="src" content="thesis.tex" /> 
<meta name="date" content="2002-08-28 13:56:00" /> 
<link rel="stylesheet" type="text/css" href="thesis.css" /> 
</head><body 
>
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse5.xml" >next</a>] [<a 
href="thesisse3.xml" >prev</a>] [<a 
href="thesisse3.xml#tailthesisse3.xml" >prev-tail</a>] [<a 
href="#tailthesisse4.xml">tail</a>] [<a 
href="thesisch1.xml#thesisse4.xml" >up</a>] </p></div>
   <h3 class="sectionHead"><span class="titlemark">1.4. </span> <a 
  name="x8-70001.4"></a>The oblivious passive supervised learning model</h3>
<!--l. 236--><p class="noindent">Obliviousness is modeled by an unknown distribution <!--l. 236--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>D</mi></mrow></math>
over examples. Here, an &#x201C;example&#x201D; is just a vector of observations. Since this
is a supervised learning model, all of our examples will split into two parts, <!--l. 238--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-punc">,</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> where <!--l. 239--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>x</mi></mrow></math> is the &#x201C;input&#x201D;
and <!--l. 239--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mi 
>y</mi></mrow></math>
is the &#x201C;output&#x201D; (the thing we wish to predict). A quick example is
predicting whether precipitation will be in the form of rain or snow (&#x201C;<!--l. 241--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>y</mi></mrow></math>&#x201D; value) given the
temperature (&#x201C;<!--l. 242--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mi 
>x</mi></mrow></math>&#x201D;
value).
</p><!--l. 244--><p class="indent">   We will typically state theorems for binary-valued <!--l. 244--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>y</mi></mrow></math>. This
restriction can be removed by generalizing the sample complexity bounds, but we keep it
for simplicity of presentation.
</p><!--l. 248--><p class="indent">   The fundamental assumption we will make in all of our sample complexity bounds is
that all examples are drawn independently from the unknown distribution <!--l. 249--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>D</mi></mrow></math>. This
assumption must be stated explicitly and always kept in mind when considering the
relevance of sample complexity bounds.
</p>
   <div class="newtheorem">
<!--l. 253--><p class="noindent"><span class="head">
<a 
  name="x8-7001r1"></a>
  <span 
class="eccc-1000">A<small 
class="small-caps">X</small><small 
class="small-caps">I</small><small 
class="small-caps">O</small><small 
class="small-caps">M</small> </span>1.4.1<span 
class="eccc-1000">.</span></span>
</p><!--l. 254--><p class="indent">   <span 
class="ecti-1000">All examples are drawn independently from an unknown distribution </span><!--l. 254--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">      <mrow 
><mi 
>D</mi></mrow></math><span 
class="ecti-1000">.</span>
</p>
   </div>
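<p class="indent">   As a minimal sketch of what the axiom asks for, the following Python code draws
examples independently from one fixed distribution over labeled (x, y) pairs. The
distribution itself (temperature uniform on [-10, 10] degrees Celsius, snow with
probability 0.9 below freezing and 0.1 above) and the names draw_example and draw_sample
are assumptions made for illustration only; in the model, D is unknown.
</p>
<pre class="verbatim">
```python
import random


def draw_example(rng):
    """Draw one labeled example (x, y) from a hypothetical distribution D.

    x is a temperature in degrees Celsius; y is 1 for snow and 0 for rain.
    """
    x = rng.uniform(-10.0, 10.0)
    # Snow is likely below freezing and unlikely above it, but never certain,
    # so y is not a deterministic function of x.
    p_snow = 0.1 if x > 0.0 else 0.9
    y = 1 if p_snow > rng.random() else 0
    return x, y


def draw_sample(m, seed=0):
    """Draw m examples independently from the same distribution D."""
    rng = random.Random(seed)
    return [draw_example(rng) for _ in range(m)]


sample = draw_sample(5)
for x, y in sample:
    print(round(x, 1), "snow" if y == 1 else "rain")
```
</pre>
<p class="indent">   Every call to draw_example uses the same distribution and ignores all earlier
draws, which is the sense of independence used in the axiom.
</p>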
<!--l. 256--><p class="indent">   With the exception of this assumption, all of the other parameters in our bounds will
be verifiable at the time the bound is applied.
</p><!--l. 259--><p class="indent">   Note that we use a distribution over <span 
class="ecti-1000">labeled </span>examples and not a combination of a
distribution over the input space along with a function from the input space to the
output space as in many other formulations. This choice is made because it is both more
general and mathematically simpler.
</p><!--l. 264--><p class="indent">   The number of samples, <!--l. 264--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mi 
>m</mi></mrow></math>,
required for learning is the fundamental quantity we will be concerned with. In
particular, we will not be concerned with the time complexity or the space complexity of
learning algorithms. This choice is made for simplicity, and it implies that
the relationship between sample complexity bounds and learning algorithms will be
similar to the relationship between information theory and the coding of information for
transmission across a noisy channel.
</p><!--l. 272--><p class="indent">   Any learning algorithm must output some hypothesis, <!--l. 272--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>h</mi></mrow></math>, for
predicting the output given the input. This hypothesis is essentially a program
that maps each input to a predicted output. The hypothesis may or may not be
randomized: it might choose an output deterministically or according to some
randomization.
</p><!--l. 277--><p class="indent">   The next item to quantify is learning. When has learning occurred?
We will say that learning has occurred when the <span 
class="ecti-1000">true error </span>is
significantly less than the error of a uniform random prediction. The true error <!--l. 279--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> is defined in the
following way: <!--l. 281--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">        <mrow 
>
                                        <msub><mrow 
><mi 
>e</mi></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo><msub><mrow 
><mo 
> Pr</mo></mrow><mrow 
><mi 
>D</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2260;</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow>
</mrow></math>
</p><!--l. 285--><p class="indent">   Unfortunately, the true error is not an observable quantity in our model because the distribution,
<!--l. 286--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">     <mrow 
><mi 
>D</mi></mrow></math>, is
unknown. However, there is a related quantity which is observable. Given a sample set <!--l. 287--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>S</mi></mrow></math> of <!--l. 287--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi><mo 
class="MathClass-punc">,</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> pairs <!--l. 287--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mrow><mo 
class="MathClass-open">{</mo><mrow><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>x</mi></mrow><mrow 
><mn>1</mn></mrow></msub 
><mo 
class="MathClass-punc">,</mo><msub><mrow 
><mi 
>y</mi></mrow><mrow 
><mn>1</mn></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-punc">,</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">,</mo><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>x</mi></mrow><mrow 
><mi 
>m</mi></mrow></msub 
><mo 
class="MathClass-punc">,</mo><msub><mrow 
><mi 
>y</mi></mrow><mrow 
><mi 
>m</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow><mo 
class="MathClass-close">}</mo></mrow></mrow></math>, the <span 
class="ecti-1000">empirical error</span>, <!--l. 288--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><msub><mrow 
><mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>S</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>, is defined similarly
as: <!--l. 289--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="display">        <mrow 
><msub><mrow 
>
                        <mover 
accent="true"><mrow 
><mi 
>e</mi></mrow><mo 
class="MathClass-op">&#x0302;</mo></mover></mrow><mrow 
><mi 
>S</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo><msub><mrow 
><mo 
> Pr</mo></mrow><mrow 
><mi 
>S</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>x</mi></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2260;</mo><mi 
>y</mi></mrow><mo 
class="MathClass-close">)</mo></mrow> <mo 
class="MathClass-rel">=</mo>  <mfrac><mrow 
><mn>1</mn></mrow> 
<mrow 
><mi 
>m</mi></mrow></mfrac><msubsup><mrow 
> <mo 
class="MathClass-op">&#x2211;</mo>
    </mrow><mrow 
><mi 
>i</mi><mo 
class="MathClass-rel">=</mo><mn>1</mn></mrow><mrow 
><mi 
>m</mi></mrow></msubsup 
><mi 
>I</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><mi 
>h</mi><mrow><mo 
class="MathClass-open">(</mo><mrow><msub><mrow 
><mi 
>x</mi></mrow><mrow 
>
<mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow><mo 
class="MathClass-rel">&#x2260;</mo><msub><mrow 
><mi 
>y</mi></mrow><mrow 
><mi 
>i</mi></mrow></msub 
></mrow><mo 
class="MathClass-close">)</mo></mrow>
</mrow></math> where <!--l. 291--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>I</mi><mrow><mo 
class="MathClass-open">(</mo><mrow></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math> is the indicator function, which
maps &#x201C;true&#x201D; to <!--l. 291--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">       <mrow 
><mn>1</mn></mrow></math>
and &#x201C;false&#x201D; to <!--l. 292--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><mn>0</mn></mrow></math>.
Here <!--l. 292--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">        <mrow 
><msub><mrow 
><mo 
>Pr</mo></mrow><mrow 
><mi 
>S</mi></mrow></msub 
><mrow><mo 
class="MathClass-open">(</mo><mrow><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo><mo 
class="MathClass-punc">.</mo></mrow><mo 
class="MathClass-close">)</mo></mrow></mrow></math>
is a probability taken with respect to the uniform distribution over the set of examples, <!--l. 293--><math 
xmlns="http://www.w3.org/1998/Math/MathML" 
mode="inline">
<mrow 
><mi 
>S</mi></mrow></math>.
</p>
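<p class="indent">   The two error definitions above can be made concrete with a small Python sketch.
The toy distribution D, the hypothesis h, and all function names below are hypothetical
and chosen only for illustration. The true error can be computed here only because the
sketch fixes D; in the model, D is unknown and only the empirical error is observable.
</p>
<pre class="verbatim">
```python
import random

# Hypothetical toy distribution D over labeled examples (x, y):
# each key is an (x, y) pair and each value is its probability under D.
D = {
    ("cold", "snow"): 0.4,
    ("cold", "rain"): 0.1,
    ("warm", "snow"): 0.1,
    ("warm", "rain"): 0.4,
}


def h(x):
    """A deterministic hypothesis: predict snow exactly when it is cold."""
    return "snow" if x == "cold" else "rain"


def true_error(h, D):
    """e_D(h): probability under D that h(x) differs from y."""
    return sum(p for (x, y), p in D.items() if h(x) != y)


def empirical_error(h, S):
    """Fraction of the pairs (x, y) in S on which h(x) differs from y."""
    return sum(1 for x, y in S if h(x) != y) / len(S)


def draw_sample(D, m, seed=0):
    """Draw m examples independently from D."""
    rng = random.Random(seed)
    pairs = list(D.keys())
    weights = list(D.values())
    return rng.choices(pairs, weights=weights, k=m)


S = draw_sample(D, m=1000)
print("true error:", true_error(h, D))
print("empirical error:", empirical_error(h, S))
```
</pre>
<p class="indent">   With this choice of D, the true error of h is 0.1 + 0.1 = 0.2, and the empirical
error on 1000 independent draws is typically close to that value.
</p>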
   <div class="crosslinks"><p class="noindent">[<a 
href="thesisse5.xml" >next</a>] [<a 
href="thesisse3.xml" >prev</a>] [<a 
href="thesisse3.xml#tailthesisse3.xml" >prev-tail</a>] [<a 
href="thesisse4.xml" >front</a>] [<a 
href="thesisch1.xml#thesisse4.xml" >up</a>] </p></div><a 
  name="tailthesisse4.xml"></a>  
</body> 
</html> 
