From 359197b039f7dfb60fb9bc83f44f75b5477faf78 Mon Sep 17 00:00:00 2001
From: Evan Shelhamer
Date: Sun, 7 Sep 2014 18:59:40 +0200
Subject: [PATCH] [docs] include cuDNN in installation and performance reference

---
 docs/installation.md         | 18 ++++++++++++------
 docs/performance_hardware.md | 20 ++++++++++++++++----
 2 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/docs/installation.md b/docs/installation.md
index 82a22d5..dbf73d2 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -15,7 +15,7 @@ We have installed Caffe on Ubuntu 14.04, Ubuntu 12.04, OS X 10.9, and OS X 10.8.
 
 Caffe depends on several software packages.
 
-* [CUDA](https://developer.nvidia.com/cuda-zone) library version 6.0, 5.5, or 5.0 and the latest driver version for CUDA 6 or 319.* for CUDA 5 (and NOT 331.*)
+* [CUDA](https://developer.nvidia.com/cuda-zone) library version 6.5 (recommended), 6.0, 5.5, or 5.0 and the latest driver version for CUDA 6 or 319.* for CUDA 5 (and NOT 331.*)
 * [BLAS](http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) (provided via ATLAS, MKL, or OpenBLAS).
 * [OpenCV](http://opencv.org/).
 * [Boost](http://www.boost.org/) (>= 1.55, although only 1.55 is tested)
@@ -25,13 +25,17 @@ Caffe depends on several software packages.
 * For the MATLAB wrapper
     * MATLAB with the `mex` compiler.
 
-**CPU-only Caffe**: for cold-brewed CPU-only Caffe uncomment the `CPU_ONLY := 1` in `Makefile.config` to configure and build Caffe without CUDA. This is helpful for cloud or cluster deployment.
+**cuDNN Caffe**: for fastest operation Caffe is accelerated by drop-in integration of [NVIDIA cuDNN](https://developer.nvidia.com/cudnn). To speed up your Caffe models, install cuDNN then uncomment the `USE_CUDNN := 1` flag in `Makefile.config` when installing Caffe. Acceleration is automatic.
+
+**CPU-only Caffe**: for cold-brewed CPU-only Caffe uncomment the `CPU_ONLY := 1` flag in `Makefile.config` to configure and build Caffe without CUDA. This is helpful for cloud or cluster deployment.
 
 ### CUDA and BLAS
 
 Caffe requires the CUDA `nvcc` compiler to compile its GPU code and CUDA driver for GPU operation.
 To install CUDA, go to the [NVIDIA CUDA website](https://developer.nvidia.com/cuda-downloads) and follow installation instructions there. Install the library and the latest standalone driver separately; the driver bundled with the library is usually out-of-date. **Warning!** The 331.* CUDA driver series has a critical performance issue: do not use it.
 
+For best performance, Caffe can be accelerated by [NVIDIA cuDNN](https://developer.nvidia.com/cudnn). Register for free at the cuDNN site, install it, then continue with these installation instructions. To compile with cuDNN, set the `USE_CUDNN := 1` flag in your `Makefile.config`.
+
 Caffe requires BLAS as the backend of its matrix and vector computations.
 There are several implementations of this library.
 The choice is yours:
@@ -92,7 +96,7 @@ Keep reading to find out how to manually build and install the Google flags libr
 On **CentOS / RHEL / Fedora**, most of the dependencies can be installed with
 
     sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel
-    
+
 The Google flags library, Google logging library and LMDB already made their ways into newer versions of **CentOS / RHEL / Fedora** so it is better to first attempt to install them using `yum`
 
     sudo yum install gflags-devel glog-devel lmdb-devel
@@ -192,7 +196,7 @@ If you're not using Anaconda, include `hdf5` in the list above.
 **Note** that in order to build the caffe python wrappers you must install boost using the --with-python option:
 
     brew install --build-from-source --with-python --fresh -vd boost
-    
+
 **Note** that Homebrew maintains itself as a separate git repository and making the above `brew edit FORMULA` changes will change files in your local copy of homebrew's master branch. By default, this will prevent you from updating Homebrew using `brew update`, as you will get an error message like the following:
 
     $ brew update
@@ -201,7 +205,7 @@ If you're not using Anaconda, include `hdf5` in the list above.
     Please, commit your changes or stash them before you can merge.
     Aborting
     Error: Failure while executing: git pull -q origin refs/heads/master:refs/remotes/origin/master
-    
+
 One solution is to commit your changes to a separate Homebrew branch, run `brew update`, and rebase your changes onto the updated master, as follows:
 
     cd /usr/local
@@ -213,7 +217,7 @@ One solution is to commit your changes to a separate Homebrew branch, run `brew
     git rebase master caffe
     # Resolve any merge conflicts here
     git checkout caffe
-    
+
 At this point, you should be running the latest Homebrew packages and your Caffe-related modifications will remain in place. You may still get the following error:
 
     $ brew update
@@ -240,6 +244,8 @@ The defaults should work, but uncomment the relevant lines if using Anaconda Pyt
     make test
     make runtest
 
+To compile with cuDNN acceleration, you should uncomment the `USE_CUDNN := 1` switch in `Makefile.config`.
+
 If there is no GPU in your machine, you should switch to CPU-only Caffe by uncommenting `CPU_ONLY := 1` in `Makefile.config`.
 
 To compile the Python and MATLAB wrappers do `make pycaffe` and `make matcaffe` respectively.
diff --git a/docs/performance_hardware.md b/docs/performance_hardware.md
index 6238543..b35246f 100644
--- a/docs/performance_hardware.md
+++ b/docs/performance_hardware.md
@@ -4,7 +4,7 @@ title: Performance and Hardware Configuration
 
 # Performance and Hardware Configuration
 
-To measure performance on different NVIDIA GPUs we use the Caffe reference ImageNet model.
+To measure performance on different NVIDIA GPUs we use CaffeNet, the Caffe reference ImageNet model.
 
 For training, each time point is 20 iterations/minibatches of 256 images for 5,120 images total. For testing, a 50,000 image validation set is classified.
 
@@ -14,11 +14,16 @@ For training, each time point is 20 iterations/minibatches of 256 images for 5,1
 
 Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees ~1 GB of GPU memory.
 
-Best settings with ECC off and maximum clock speed:
+Best settings with ECC off and maximum clock speed in standard Caffe:
 
 * Training is 26.5 secs / 20 iterations (5,120 images)
 * Testing is 100 secs / validation set (50,000 images)
 
+Best settings with Caffe + [cuDNN acceleration](http://nvidia.com/cudnn):
+
+* Training is 19.2 secs / 20 iterations (5,120 images)
+* Testing is 60.7 secs / validation set (50,000 images)
+
 Other settings:
 
 * ECC on, max speed: training 26.7 secs / 20 iterations, test 101 secs / validation set
@@ -50,12 +55,19 @@ but note that this configuration resets across driver reloading / rebooting. Inc
 Training: 26.26 secs / 20 iterations (5,120 images).
 Testing: 100 secs / validation set (50,000 images).
 
+cuDNN Training: 20.25 secs / 20 iterations (5,120 images).
+cuDNN Testing: 66.3 secs / validation set (50,000 images).
+
+
 ## NVIDIA K20
 
 Training: 36.0 secs / 20 iterations (5,120 images).
-Testing: 133 secs / validation set (50,000 images)
+Testing: 133 secs / validation set (50,000 images).
 
 ## NVIDIA GTX 770
 
 Training: 33.0 secs / 20 iterations (5,120 images).
-Testing: 129 secs / validation set (50,000 images)
+Testing: 129 secs / validation set (50,000 images).
+
+cuDNN Training: 24.3 secs / 20 iterations (5,120 images).
+cuDNN Testing: 104 secs / validation set (50,000 images).
-- 
2.7.4
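
The timings added to `docs/performance_hardware.md` are easier to compare as speedup ratios and images per second. The following is a minimal sketch in Python, not part of the patch: the numbers are copied from the diff, while the labels and the helper names `speedup` and `throughput` are made up for illustration. Only the GTX 770 heading is visible in the hunks, so the other two entries are labeled by where they appear rather than by GPU name.

```python
# Image counts stated in the patched document.
TRAIN_IMAGES = 5120   # 20 iterations x 256 images per minibatch
TEST_IMAGES = 50000   # full validation set

# (label, phase, seconds in standard Caffe, seconds with cuDNN).
# Labels are illustrative: only "GTX 770" is named in the visible hunks.
TIMINGS = [
    ("ECC off, max clock", "train", 26.5, 19.2),
    ("ECC off, max clock", "test", 100.0, 60.7),
    ("section before NVIDIA K20", "train", 26.26, 20.25),
    ("section before NVIDIA K20", "test", 100.0, 66.3),
    ("GTX 770", "train", 33.0, 24.3),
    ("GTX 770", "test", 129.0, 104.0),
]


def speedup(baseline_secs, cudnn_secs):
    """Ratio of the standard-Caffe time to the cuDNN time (higher is better)."""
    return baseline_secs / cudnn_secs


def throughput(images, secs):
    """Images processed per second for one timed run."""
    return images / secs


def main():
    for label, phase, base, cudnn in TIMINGS:
        images = TRAIN_IMAGES if phase == "train" else TEST_IMAGES
        print(
            f"{label:<26} {phase:<5} {speedup(base, cudnn):.2f}x "
            f"({throughput(images, base):.0f} -> "
            f"{throughput(images, cudnn):.0f} images/s)"
        )


if __name__ == "__main__":
    main()
```

The K20 section is left out because the patch lists no cuDNN timings for it.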