From 359197b039f7dfb60fb9bc83f44f75b5477faf78 Mon Sep 17 00:00:00 2001
From: Evan Shelhamer <shelhamer@imaginarynumber.net>
Date: Sun, 7 Sep 2014 18:59:40 +0200
Subject: [PATCH] [docs] include cuDNN in installation and performance
 reference

---
 docs/installation.md         | 18 ++++++++++++------
 docs/performance_hardware.md | 20 ++++++++++++++++----
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/docs/installation.md b/docs/installation.md
index 82a22d5..dbf73d2 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -15,7 +15,7 @@ We have installed Caffe on Ubuntu 14.04, Ubuntu 12.04, OS X 10.9, and OS X 10.8.
 
 Caffe depends on several software packages.
 
-* [CUDA](https://developer.nvidia.com/cuda-zone) library version 6.0, 5.5, or 5.0 and the latest driver version for CUDA 6 or 319.* for CUDA 5 (and NOT 331.*)
+* [CUDA](https://developer.nvidia.com/cuda-zone) library version 6.5 (recommended), 6.0, 5.5, or 5.0 and the latest driver version for CUDA 6 or 319.* for CUDA 5 (and NOT 331.*)
 * [BLAS](http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) (provided via ATLAS, MKL, or OpenBLAS).
 * [OpenCV](http://opencv.org/).
 * [Boost](http://www.boost.org/) (>= 1.55, although only 1.55 is tested)
@@ -25,13 +25,17 @@ Caffe depends on several software packages.
 * For the MATLAB wrapper
     * MATLAB with the `mex` compiler.
 
-**CPU-only Caffe**: for cold-brewed CPU-only Caffe uncomment the `CPU_ONLY := 1` in `Makefile.config` to configure and build Caffe without CUDA. This is helpful for cloud or cluster deployment.
+**cuDNN Caffe**: for fastest operation, Caffe is accelerated by drop-in integration of [NVIDIA cuDNN](https://developer.nvidia.com/cudnn). To speed up your Caffe models, install cuDNN, then uncomment the `USE_CUDNN := 1` flag in `Makefile.config` when installing Caffe. Acceleration is automatic.
+
+**CPU-only Caffe**: for cold-brewed CPU-only Caffe uncomment the `CPU_ONLY := 1` flag in `Makefile.config` to configure and build Caffe without CUDA. This is helpful for cloud or cluster deployment.
 
 ### CUDA and BLAS
 
 Caffe requires the CUDA `nvcc` compiler to compile its GPU code and CUDA driver for GPU operation.
 To install CUDA, go to the [NVIDIA CUDA website](https://developer.nvidia.com/cuda-downloads) and follow installation instructions there. Install the library and the latest standalone driver separately; the driver bundled with the library is usually out-of-date. **Warning!** The 331.* CUDA driver series has a critical performance issue: do not use it.
 
+For best performance, Caffe can be accelerated by [NVIDIA cuDNN](https://developer.nvidia.com/cudnn). Register for free at the cuDNN site, install it, then continue with these installation instructions. To compile with cuDNN, set the `USE_CUDNN := 1` flag in your `Makefile.config`.
+
 Caffe requires BLAS as the backend of its matrix and vector computations.
 There are several implementations of this library.
 The choice is yours:
@@ -92,7 +96,7 @@ Keep reading to find out how to manually build and install the Google flags libr
 On **CentOS / RHEL / Fedora**, most of the dependencies can be installed with
 
     sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel
-    
+
 The Google flags library, Google logging library and LMDB already made their ways into newer versions of **CentOS / RHEL / Fedora** so it is better to first attempt to install them using `yum`
 
     sudo yum install gflags-devel glog-devel lmdb-devel
@@ -192,7 +196,7 @@ If you're not using Anaconda, include `hdf5` in the list above.
 **Note** that in order to build the caffe python wrappers you must install boost using the --with-python option:
 
     brew install --build-from-source --with-python --fresh -vd boost
-    
+
 **Note** that Homebrew maintains itself as a separate git repository and making the above `brew edit FORMULA` changes will change files in your local copy of homebrew's master branch. By default, this will prevent you from updating Homebrew using `brew update`, as you will get an error message like the following:
 
     $ brew update
@@ -201,7 +205,7 @@ If you're not using Anaconda, include `hdf5` in the list above.
     Please, commit your changes or stash them before you can merge.
     Aborting
     Error: Failure while executing: git pull -q origin refs/heads/master:refs/remotes/origin/master
-    
+
 One solution is to commit your changes to a separate Homebrew branch, run `brew update`, and rebase your changes onto the updated master, as follows:
 
     cd /usr/local
@@ -213,7 +217,7 @@ One solution is to commit your changes to a separate Homebrew branch, run `brew
     git rebase master caffe
     # Resolve any merge conflicts here
     git checkout caffe
-    
+
 At this point, you should be running the latest Homebrew packages and your Caffe-related modifications will remain in place. You may still get the following error:
 
     $ brew update
@@ -240,6 +244,8 @@ The defaults should work, but uncomment the relevant lines if using Anaconda Pyt
     make test
     make runtest
 
+To compile with cuDNN acceleration, you should uncomment the `USE_CUDNN := 1` flag in `Makefile.config`.
+
 If there is no GPU in your machine, you should switch to CPU-only Caffe by uncommenting `CPU_ONLY := 1` in `Makefile.config`.
 
 To compile the Python and MATLAB wrappers do `make pycaffe` and `make matcaffe` respectively.
diff --git a/docs/performance_hardware.md b/docs/performance_hardware.md
index 6238543..b35246f 100644
--- a/docs/performance_hardware.md
+++ b/docs/performance_hardware.md
@@ -4,7 +4,7 @@ title: Performance and Hardware Configuration
 
 # Performance and Hardware Configuration
 
-To measure performance on different NVIDIA GPUs we use the Caffe reference ImageNet model.
+To measure performance on different NVIDIA GPUs we use CaffeNet, the Caffe reference ImageNet model.
 
 For training, each time point is 20 iterations/minibatches of 256 images for 5,120 images total. For testing, a 50,000 image validation set is classified.
 
@@ -14,11 +14,16 @@ For training, each time point is 20 iterations/minibatches of 256 images for 5,1
 
 Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees ~1 GB of GPU memory.
 
-Best settings with ECC off and maximum clock speed:
+Best settings with ECC off and maximum clock speed in standard Caffe:
 
 * Training is 26.5 secs / 20 iterations (5,120 images)
 * Testing is 100 secs / validation set (50,000 images)
 
+Best settings with Caffe + [cuDNN acceleration](http://nvidia.com/cudnn):
+
+* Training is 19.2 secs / 20 iterations (5,120 images)
+* Testing is 60.7 secs / validation set (50,000 images)
+
 Other settings:
 
 * ECC on, max speed: training 26.7 secs / 20 iterations, test 101 secs / validation set
@@ -50,12 +55,19 @@ but note that this configuration resets across driver reloading / rebooting. Inc
 Training: 26.26 secs / 20 iterations (5,120 images).
 Testing: 100 secs / validation set (50,000 images).
 
+cuDNN Training: 20.25 secs / 20 iterations (5,120 images).
+cuDNN Testing: 66.3 secs / validation set (50,000 images).
+
+
 ## NVIDIA K20
 
 Training: 36.0 secs / 20 iterations (5,120 images).
-Testing: 133 secs / validation set (50,000 images)
+Testing: 133 secs / validation set (50,000 images).
 
 ## NVIDIA GTX 770
 
 Training: 33.0 secs / 20 iterations (5,120 images).
-Testing: 129 secs / validation set (50,000 images)
+Testing: 129 secs / validation set (50,000 images).
+
+cuDNN Training: 24.3 secs / 20 iterations (5,120 images).
+cuDNN Testing: 104 secs / validation set (50,000 images).
-- 
2.7.4