[docs] include cuDNN in installation and performance reference

author Evan Shelhamer <shelhamer@imaginarynumber.net>

Sun, 7 Sep 2014 16:59:40 +0000 (18:59 +0200)

committer Evan Shelhamer <shelhamer@imaginarynumber.net>

Sun, 7 Sep 2014 17:56:45 +0000 (19:56 +0200)
diff --git a/docs/installation.md b/docs/installation.md
index 82a22d5..dbf73d2 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -15,7 +15,7 @@ We have installed Caffe on Ubuntu 14.04, Ubuntu 12.04, OS X 10.9, and OS X 10.8.
  
  Caffe depends on several software packages.
  
-* [CUDA](https://developer.nvidia.com/cuda-zone) library version 6.0, 5.5, or 5.0 and the latest driver version for CUDA 6 or 319.* for CUDA 5 (and NOT 331.*)
+* [CUDA](https://developer.nvidia.com/cuda-zone) library version 6.5 (recommended), 6.0, 5.5, or 5.0 and the latest driver version for CUDA 6 or 319.* for CUDA 5 (and NOT 331.*)
  * [BLAS](http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) (provided via ATLAS, MKL, or OpenBLAS).
  * [OpenCV](http://opencv.org/).
  * [Boost](http://www.boost.org/) (>= 1.55, although only 1.55 is tested)
@@ -25,13 +25,17 @@ Caffe depends on several software packages.
  * For the MATLAB wrapper
      * MATLAB with the `mex` compiler.
  
-**CPU-only Caffe**: for cold-brewed CPU-only Caffe uncomment the `CPU_ONLY := 1` in `Makefile.config` to configure and build Caffe without CUDA. This is helpful for cloud or cluster deployment.
+**cuDNN Caffe**: for fastest operation Caffe is accelerated by drop-in integration of [NVIDIA cuDNN](https://developer.nvidia.com/cudnn). To speed up your Caffe models, install cuDNN then uncomment the `USE_CUDNN := 1` flag in `Makefile.config` when installing Caffe. Acceleration is automatic.
+
+**CPU-only Caffe**: for cold-brewed CPU-only Caffe uncomment the `CPU_ONLY := 1` flag in `Makefile.config` to configure and build Caffe without CUDA. This is helpful for cloud or cluster deployment.
  
  ### CUDA and BLAS
  
  Caffe requires the CUDA `nvcc` compiler to compile its GPU code and CUDA driver for GPU operation.
  To install CUDA, go to the [NVIDIA CUDA website](https://developer.nvidia.com/cuda-downloads) and follow installation instructions there. Install the library and the latest standalone driver separately; the driver bundled with the library is usually out-of-date. **Warning!** The 331.* CUDA driver series has a critical performance issue: do not use it.
  
+For best performance, Caffe can be accelerated by [NVIDIA cuDNN](https://developer.nvidia.com/cudnn). Register for free at the cuDNN site, install it, then continue with these installation instructions. To compile with cuDNN, set the `USE_CUDNN := 1` flag in your `Makefile.config`.
+
  Caffe requires BLAS as the backend of its matrix and vector computations.
  There are several implementations of this library.
  The choice is yours:
@@ -92,7 +96,7 @@ Keep reading to find out how to manually build and install the Google flags libr
  On **CentOS / RHEL / Fedora**, most of the dependencies can be installed with
  
      sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel
-    
+
  The Google flags library, Google logging library and LMDB already made their ways into newer versions of **CentOS / RHEL / Fedora** so it is better to first attempt to install them using `yum`
  
      sudo yum install gflags-devel glog-devel lmdb-devel
@@ -192,7 +196,7 @@ If you're not using Anaconda, include `hdf5` in the list above.
  **Note** that in order to build the caffe python wrappers you must install boost using the --with-python option:
  
      brew install --build-from-source --with-python --fresh -vd boost
-    
+
  **Note** that Homebrew maintains itself as a separate git repository and making the above `brew edit FORMULA` changes will change files in your local copy of homebrew's master branch. By default, this will prevent you from updating Homebrew using `brew update`, as you will get an error message like the following:
  
      $ brew update
@@ -201,7 +205,7 @@ If you're not using Anaconda, include `hdf5` in the list above.
      Please, commit your changes or stash them before you can merge.
      Aborting
      Error: Failure while executing: git pull -q origin refs/heads/master:refs/remotes/origin/master
-    
+
  One solution is to commit your changes to a separate Homebrew branch, run `brew update`, and rebase your changes onto the updated master, as follows:
  
      cd /usr/local
@@ -213,7 +217,7 @@ One solution is to commit your changes to a separate Homebrew branch, run `brew
      git rebase master caffe
      # Resolve any merge conflicts here
      git checkout caffe
-    
+
  At this point, you should be running the latest Homebrew packages and your Caffe-related modifications will remain in place. You may still get the following error:
  
      $ brew update
@@ -240,6 +244,8 @@ The defaults should work, but uncomment the relevant lines if using Anaconda Pyt
      make test
      make runtest
  
+To compile with cuDNN acceleration, you should uncomment the `USE_CUDNN := 1` switch in `Makefile.config`.
+
  If there is no GPU in your machine, you should switch to CPU-only Caffe by uncommenting `CPU_ONLY := 1` in `Makefile.config`.
  
  To compile the Python and MATLAB wrappers do `make pycaffe` and `make matcaffe` respectively.
diff --git a/docs/performance_hardware.md b/docs/performance_hardware.md
index 6238543..b35246f 100644
--- a/docs/performance_hardware.md
+++ b/docs/performance_hardware.md
@@ -4,7 +4,7 @@ title: Performance and Hardware Configuration
  
  # Performance and Hardware Configuration
  
-To measure performance on different NVIDIA GPUs we use the Caffe reference ImageNet model.
+To measure performance on different NVIDIA GPUs we use CaffeNet, the Caffe reference ImageNet model.
  
  For training, each time point is 20 iterations/minibatches of 256 images for 5,120 images total. For testing, a 50,000 image validation set is classified.
  
@@ -14,11 +14,16 @@ For training, each time point is 20 iterations/minibatches of 256 images for 5,1
  
  Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees ~1 GB of GPU memory.
  
-Best settings with ECC off and maximum clock speed:
+Best settings with ECC off and maximum clock speed in standard Caffe:
  
  * Training is 26.5 secs / 20 iterations (5,120 images)
  * Testing is 100 secs / validation set (50,000 images)
  
+Best settings with Caffe + [cuDNN acceleration](http://nvidia.com/cudnn):
+
+* Training is 19.2 secs / 20 iterations (5,120 images)
+* Testing is 60.7 secs / validation set (50,000 images)
+
  Other settings:
  
  * ECC on, max speed: training 26.7 secs / 20 iterations, test 101 secs / validation set
@@ -50,12 +55,19 @@ but note that this configuration resets across driver reloading / rebooting. Inc
  Training: 26.26 secs / 20 iterations (5,120 images).
  Testing: 100 secs / validation set (50,000 images).
  
+cuDNN Training: 20.25 secs / 20 iterations (5,120 images).
+cuDNN Testing: 66.3 secs / validation set (50,000 images).
+
+
  ## NVIDIA K20
  
  Training: 36.0 secs / 20 iterations (5,120 images).
-Testing: 133 secs / validation set (50,000 images)
+Testing: 133 secs / validation set (50,000 images).
  
  ## NVIDIA GTX 770
  
  Training: 33.0 secs / 20 iterations (5,120 images).
-Testing: 129 secs / validation set (50,000 images)
+Testing: 129 secs / validation set (50,000 images).
+
+cuDNN Training: 24.3 secs / 20 iterations (5,120 images).
+cuDNN Testing: 104 secs / validation set (50,000 images).
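
The timings added to `docs/performance_hardware.md` are easier to compare as speedups and throughput. The following is a minimal sketch that only does arithmetic on the figures quoted in the hunks above. The dictionary labels and the helper names `speedup` and `throughput` are assumptions made for this illustration and are not part of Caffe. The GPU heading for the second set of numbers falls outside the hunk, so it is labeled generically rather than guessed.

```python
# Timings in seconds, copied from the performance_hardware.md hunks above.
# Each pair is (standard Caffe, Caffe + cuDNN). The labels are descriptive
# names chosen for this sketch, not identifiers from the Caffe docs.
TIMINGS = {
    "best settings (ECC off, max clock)": {
        "train": (26.5, 19.2),
        "test": (100.0, 60.7),
    },
    "second section (GPU heading not in hunk)": {
        "train": (26.26, 20.25),
        "test": (100.0, 66.3),
    },
    "NVIDIA GTX 770": {
        "train": (33.0, 24.3),
        "test": (129.0, 104.0),
    },
}

# Workload sizes stated in the docs: 20 minibatches of 256 images for
# training, and a 50,000 image validation set for testing.
IMAGES = {"train": 20 * 256, "test": 50000}


def speedup(baseline_secs, cudnn_secs):
    """Return how many times faster the cuDNN run is than the baseline."""
    return baseline_secs / cudnn_secs


def throughput(images, secs):
    """Return images processed per second."""
    return images / secs


if __name__ == "__main__":
    for label, runs in TIMINGS.items():
        print(label)
        for phase, (base, cudnn) in runs.items():
            print(
                f"  {phase}: {speedup(base, cudnn):.2f}x speedup, "
                f"{throughput(IMAGES[phase], base):.1f} -> "
                f"{throughput(IMAGES[phase], cudnn):.1f} images/sec"
            )
```

Running the script prints one block per configuration. On these figures the testing speedup is larger than the training speedup in the first two configurations and smaller on the GTX 770.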