updates notebook

YamingZhang · Dec 10, 2015 · 9f0aca0
1 parent 0cda29e
commit 9f0aca0

Showing 2 changed files with 95 additions and 93 deletions.
diff --git a/TempConv.ipynb b/TempConv.ipynb
@@ -13,7 +13,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": 1,
    "metadata": {
     "collapsed": false
    },
@@ -163,7 +163,7 @@
     "collapsed": true
    },
    "source": [
-    "The output of the above gives us our feature map $[\\widehat{\\mathbf{c_1}}, \\widehat{\\mathbf{c_2}}, \\ldots, \\widehat{\\mathbf{c_{d'}}}]$. Finally we add a logistic regression layer (with dropout) for predicting the sentiment from this vector of features."
+    "The output of the above gives us our feature map $[\\widehat{\\mathbf{c_1}}, \\widehat{\\mathbf{c_2}}, \\ldots, \\widehat{\\mathbf{c_{d'}}}]$. Finally we add a logistic regression layer for predicting the sentiment from this vector of features."
    ]
   },
   {
@@ -176,7 +176,6 @@
    "source": [
     "logistic = nn.Sequential()\n",
     "\n",
-    "logistic:add(nn.Dropout(0.5))\n",
     "logistic:add(nn.Linear(nd, nY))\n",
     "logistic:add(nn.LogSoftMax())\n",
     "\n",
@@ -200,10 +199,10 @@
     {
      "data": {
       "text/plain": [
-       "-1.2024 -0.3573\n",
-       "-0.6194 -0.7727\n",
-       "-2.2890 -0.1069\n",
-       "-2.4340 -0.0918\n",
+       "-0.6319 -0.7584\n",
+       "-0.5740 -0.8285\n",
+       "-0.6088 -0.7853\n",
+       "-0.7404 -0.6481\n",
        "[torch.DoubleTensor of size 4x2]\n",
        "\n"
       ]
@@ -221,8 +220,31 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "As expected, we get (log) prediction probabilities for 2 classes for each input.\n",
-    "\n",
+    "As expected, we get (log) prediction probabilities for 2 classes for each input."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Include a negative-log-likelihood criterion:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "criterion = nn.ClassNLLCriterion()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
     "We can also implement these modules on GPUs. Specifically, we include the `cudnn` package, which provides GPU-optimized versions of some of the above modules. One thing that requires modification is the convolution step: `cudnn` has no built-in `TemporalConvolution` module, so we adapt `SpatialConvolution` by reshaping our feature map matrix.\n",
     "\n",
     "Here's the full implementation on `cudnn` (using batch mode):"
@@ -245,133 +267,113 @@
     "\n",
     "nd = 10\n",
     "h = 3\n",
+    "S = 10\n",
     "conv = nn.Sequential()\n",
-    "conv:add(nn.Reshape(1, nV, d, false))\n",
+    "conv:add(nn.Reshape(1, S, d, false))\n",
     "conv:add(cudnn.SpatialConvolution(1, nd, d, h))\n",
-    "conv:add(nn.Reshape(nd, nV-h+1, false))\n",
+    "conv:add(nn.Reshape(nd, S-h+1, false))\n",
     "conv:add(cudnn.ReLU())\n",
     "conv:add(nn.Max(3))\n",
     "\n",
     "cudnn_model:add(conv)\n",
     "\n",
     "logistic = nn.Sequential()\n",
     "\n",
-    "dropout_p = 0.5\n",
-    "logistic:add(nn.Dropout(0.5))\n",
     "logistic:add(nn.Linear(nd, nY))\n",
     "logistic:add(cudnn.LogSoftMax())\n",
     "\n",
     "cudnn_model:add(logistic)\n",
     "\n",
+    "criterion = nn.ClassNLLCriterion()\n",
+    "\n",
     "-- Move to GPU\n",
-    "cudnn_model:cuda()"
+    "cudnn_model:cuda()\n",
+    "criterion:cuda()"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Include a negative-log-likelihood criterion:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "criterion = nn.ClassNLLCriterion()"
+    "## Training"
    ]
   },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": []
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Training"
+    "We perform training with `adadelta`. In each epoch, we create a closure that returns the loss and its gradient with respect to the parameters."
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": []
-  },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 8,
    "metadata": {
     "collapsed": false
    },
    "outputs": [
     {
      "data": {
       "text/plain": [
-       " 1  1  1  2  3  4  7  1  1  1\n",
-       " 1  1  1  5  8  4  7  1  1  1\n",
-       " 1  1  1  9  8  4  7  1  1  1\n",
-       " 1  1  1  6  8  4  7  1  1  1\n",
-       "[torch.DoubleTensor of size 4x10]\n",
-       "\n",
-       " 1\n",
-       " 2\n",
-       " 2\n",
-       " 2\n",
-       "[torch.DoubleTensor of size 4]\n",
-       "\n"
+       "Epoch:\t1\t1.0637046591443\t\n",
+       "Epoch:\t2\t0.92574699271742\t\n",
+       "Epoch:\t3\t0.80757029787584\t\n"
       ]
      },
-     "execution_count": 16,
+     "execution_count": 8,
      "metadata": {},
      "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "print(X, y)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [
+    },
+    {
+     "data": {
+      "text/plain": [
+       "Epoch:\t4\t0.70017095481893\t\n",
+       "Epoch:\t5\t0.62009300393375\t\n",
+       "Epoch:\t6\t0.53988474604789\t\n",
+       "Epoch:\t7\t0.48039989542535\t\n"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "text/plain": [
+       "Epoch:\t8\t0.42860197930536\t\n",
+       "Epoch:\t9\t0.38423555419475\t\n",
+       "Epoch:\t10\t0.34843210201123\t\n",
+       "Epoch:\t11\t0.31601698906774\t\n",
+       "Epoch:\t12\t0.29034451988978\t\n"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
+    {
+     "data": {
+      "text/plain": [
+       "Epoch:\t13\t0.26565687644075\t\n",
+       "Epoch:\t14\t0.24537762059148\t\n",
+       "Epoch:\t15\t0.22573977925808\t\n",
+       "Epoch:\t16\t0.20809219151101\t\n",
+       "Epoch:\t17\t0.19274937869357\t\n"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    },
     {
      "data": {
       "text/plain": [
-       "Epoch:\t1\t1.2071026258965\t\n",
-       "Epoch:\t2\t0.55329548576161\t\n",
-       "Epoch:\t3\t1.0799819021057\t\n",
-       "Epoch:\t4\t0.60753844090924\t\n",
-       "Epoch:\t5\t0.5422639899197\t\n",
-       "Epoch:\t6\t0.85666510359066\t\n",
-       "Epoch:\t7\t0.46099172270831\t\n",
-       "Epoch:\t8\t0.44800228949465\t\n",
-       "Epoch:\t9\t0.74064167750147\t\n",
-       "Epoch:\t10\t0.41474709839995\t\n",
-       "Epoch:\t11\t0.68156844923208\t\n",
-       "Epoch:\t12\t0.33358810087793\t\n",
-       "Epoch:\t13\t0.40544432852648\t\n",
-       "Epoch:\t14\t0.63637760405637\t\n",
-       "Epoch:\t15\t0.37222033442753\t\n",
-       "Epoch:\t16\t0.30155980691742\t\n",
-       "Epoch:\t17\t0.88969589482503\t\n",
-       "Epoch:\t18\t0.30856292827357\t\n",
-       "Epoch:\t19\t0.44125077919892\t\n",
-       "Epoch:\t20\t0.38404520878646\t\n"
+       "Epoch:\t18\t0.17790250866323\t\n",
+       "Epoch:\t19\t0.16478886826202\t\n",
+       "Epoch:\t20\t0.15308249381824\t\n"
       ]
      },
-     "execution_count": 20,
+     "execution_count": 8,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -407,13 +409,13 @@
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "metadata": {
     "collapsed": true
    },
-   "outputs": [],
-   "source": []
+   "source": [
+    "Note that the training loss goes down after every epoch, as expected."
+   ]
   }
  ],
  "metadata": {
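
Note on the training cell: the diff does not show the cell's source, so the following is only a minimal CPU sketch of what an `adadelta` closure loop typically looks like with Torch7's `optim` package. It is not the notebook's actual code. The toy features `X`, labels `y`, the sizes, and the names `feval`, `state`, and `losses` are assumptions made for illustration. Only the logistic layer is trained here, and `cudnn` is left out so the sketch runs without a GPU.

```lua
-- Minimal CPU sketch; the data and sizes below are made up for illustration.
require 'nn'
require 'optim'

torch.manualSeed(1)

local nd, nY = 10, 2                 -- feature size and number of classes
local X = torch.randn(4, nd)         -- hypothetical batch of 4 feature vectors
local y = torch.Tensor{1, 2, 2, 2}   -- hypothetical class labels (1-based)

local model = nn.Sequential()
model:add(nn.Linear(nd, nY))
model:add(nn.LogSoftMax())
local criterion = nn.ClassNLLCriterion()

-- Flatten all weights and gradients into two vectors that optim can update.
local params, gradParams = model:getParameters()
local state = {}                     -- adadelta keeps its running averages here

local losses = {}
for epoch = 1, 20 do
  -- Closure evaluated by optim: returns the loss and d(loss)/d(params).
  local function feval(x)
    if x ~= params then params:copy(x) end
    gradParams:zero()                -- gradients accumulate, so reset them
    local output = model:forward(X)
    local loss = criterion:forward(output, y)
    model:backward(X, criterion:backward(output, y))
    return loss, gradParams
  end

  -- An empty config table leaves adadelta on its default settings.
  local _, fs = optim.adadelta(feval, params, {}, state)
  losses[epoch] = fs[1]
  print('Epoch:', epoch, fs[1])
end
```

The closure is rebuilt every epoch to mirror the wording of the notebook text. Defining it once outside the loop would behave the same way here, since it only closes over `model`, `criterion`, `X`, and `y`.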

diff --git a/TempConv.pdf b/TempConv.pdf