Caffe depends on several software packages.
-* [CUDA](https://developer.nvidia.com/cuda-zone) library version 6.0, 5.5, or 5.0 and the latest driver version for CUDA 6 or 319.* for CUDA 5 (and NOT 331.*)
+* [CUDA](https://developer.nvidia.com/cuda-zone) library version 6.5 (recommended), 6.0, 5.5, or 5.0 and the latest driver version for CUDA 6 or 319.* for CUDA 5 (and NOT 331.*)
* [BLAS](http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) (provided via ATLAS, MKL, or OpenBLAS).
* [OpenCV](http://opencv.org/).
* [Boost](http://www.boost.org/) (>= 1.55, although only 1.55 is tested)
* For the MATLAB wrapper
* MATLAB with the `mex` compiler.
-**CPU-only Caffe**: for cold-brewed CPU-only Caffe uncomment the `CPU_ONLY := 1` in `Makefile.config` to configure and build Caffe without CUDA. This is helpful for cloud or cluster deployment.
+**cuDNN Caffe**: for fastest operation, Caffe is accelerated by drop-in integration of [NVIDIA cuDNN](https://developer.nvidia.com/cudnn). To speed up your Caffe models, install cuDNN, then uncomment the `USE_CUDNN := 1` flag in `Makefile.config` when installing Caffe. Acceleration is automatic.
+
+**CPU-only Caffe**: for cold-brewed CPU-only Caffe, uncomment the `CPU_ONLY := 1` flag in `Makefile.config` to configure and build Caffe without CUDA. This is helpful for cloud or cluster deployment.
### CUDA and BLAS
Caffe requires the CUDA `nvcc` compiler to compile its GPU code and CUDA driver for GPU operation.
To install CUDA, go to the [NVIDIA CUDA website](https://developer.nvidia.com/cuda-downloads) and follow installation instructions there. Install the library and the latest standalone driver separately; the driver bundled with the library is usually out-of-date. **Warning!** The 331.* CUDA driver series has a critical performance issue: do not use it.
+For best performance, Caffe can be accelerated by [NVIDIA cuDNN](https://developer.nvidia.com/cudnn). Register for free at the cuDNN site, install it, then continue with these installation instructions. To compile with cuDNN, set the `USE_CUDNN := 1` flag in your `Makefile.config`.
+
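The 331.* driver warning above can be turned into a quick check. The following is a minimal sketch, not part of Caffe: `is_bad_driver` is a hypothetical helper name, and it assumes you pass in the driver version string yourself (for example, the "Driver Version" value that `nvidia-smi` prints).

```shell
# Hypothetical helper (not part of Caffe): succeeds (exit status 0) when the
# given NVIDIA driver version string belongs to the 331.* series.
is_bad_driver() {
  case "$1" in
    331.*) return 0 ;;  # the series the text warns against
    *)     return 1 ;;  # any other series
  esac
}
```

For instance, `is_bad_driver "331.20" && echo "Driver 331.* detected: switch drivers"` prints the warning, while a 319.* version string prints nothing.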
Caffe requires BLAS as the backend of its matrix and vector computations.
There are several implementations of this library.
The choice is yours:
On **CentOS / RHEL / Fedora**, most of the dependencies can be installed with
sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel
-
+
The Google flags library, Google logging library, and LMDB have already made their way into newer versions of **CentOS / RHEL / Fedora**, so it is better to first attempt to install them using `yum`:
sudo yum install gflags-devel glog-devel lmdb-devel
**Note** that in order to build the Caffe Python wrappers you must install Boost using the `--with-python` option:
brew install --build-from-source --with-python --fresh -vd boost
-
+
**Note** that Homebrew maintains itself as a separate git repository, and making the above `brew edit FORMULA` changes will change files in your local copy of Homebrew's master branch. By default, this will prevent you from updating Homebrew using `brew update`, as you will get an error message like the following:
$ brew update
Please, commit your changes or stash them before you can merge.
Aborting
Error: Failure while executing: git pull -q origin refs/heads/master:refs/remotes/origin/master
-
+
One solution is to commit your changes to a separate Homebrew branch, run `brew update`, and rebase your changes onto the updated master, as follows:
cd /usr/local
git rebase master caffe
# Resolve any merge conflicts here
git checkout caffe
-
+
At this point, you should be running the latest Homebrew packages and your Caffe-related modifications will remain in place. You may still get the following error:
$ brew update
make test
make runtest
+To compile with cuDNN acceleration, you should uncomment the `USE_CUDNN := 1` switch in `Makefile.config`.
+
If there is no GPU in your machine, you should switch to CPU-only Caffe by uncommenting `CPU_ONLY := 1` in `Makefile.config`.
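Uncommenting a flag such as `USE_CUDNN := 1` or `CPU_ONLY := 1` can also be scripted. The sketch below is an illustration only: `enable_flag` is a hypothetical helper, and it assumes the flag appears in `Makefile.config` as a commented-out line of the form `# FLAG := 1`.

```shell
# Hypothetical helper (not part of Caffe): uncomment a "# FLAG := 1" line.
# Usage: enable_flag FLAG [path/to/Makefile.config]
enable_flag() {
  flag="$1"
  config="${2:-Makefile.config}"
  # Strip the leading "#" and any whitespace before the flag assignment.
  # -i.bak edits in place and leaves a backup copy named "$config.bak".
  sed -i.bak -E "s/^#[[:space:]]*(${flag}[[:space:]]*:=[[:space:]]*1)/\1/" "$config"
}
```

Running `enable_flag USE_CUDNN` from the Caffe source directory would then enable cuDNN before `make`; `enable_flag CPU_ONLY` does the same for a CPU-only build. Other lines in the file are left unchanged.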
To compile the Python and MATLAB wrappers do `make pycaffe` and `make matcaffe` respectively.
# Performance and Hardware Configuration
-To measure performance on different NVIDIA GPUs we use the Caffe reference ImageNet model.
+To measure performance on different NVIDIA GPUs we use CaffeNet, the Caffe reference ImageNet model.
For training, each time point is 20 iterations/minibatches of 256 images for 5,120 images total. For testing, a 50,000 image validation set is classified.
Performance is best with ECC off and boost clock enabled. While ECC makes a negligible difference in speed, disabling it frees ~1 GB of GPU memory.
-Best settings with ECC off and maximum clock speed:
+Best settings with ECC off and maximum clock speed in standard Caffe:
* Training is 26.5 secs / 20 iterations (5,120 images)
* Testing is 100 secs / validation set (50,000 images)
+Best settings with Caffe + [cuDNN acceleration](http://nvidia.com/cudnn):
+
+* Training is 19.2 secs / 20 iterations (5,120 images)
+* Testing is 60.7 secs / validation set (50,000 images)
+
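To compare the two sets of timings above, it can help to convert them into throughput and speedup. This is a small sketch using `awk` for the arithmetic; `throughput` and `speedup` are hypothetical helper names, and the inputs are the figures quoted above.

```shell
# Hypothetical helper: images per second from an image count and seconds.
throughput() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", n / t }'
}

# Hypothetical helper: ratio of a baseline time to a new time.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

throughput 5120 26.5   # standard Caffe training, images/sec
throughput 5120 19.2   # Caffe + cuDNN training, images/sec
speedup 26.5 19.2      # training speedup from cuDNN
speedup 100 60.7       # testing speedup from cuDNN
```

With these numbers, training goes from roughly 193 to 267 images per second (about 1.38x), and testing is about 1.65x faster.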
Other settings:
* ECC on, max speed: training 26.7 secs / 20 iterations, test 101 secs / validation set
Training: 26.26 secs / 20 iterations (5,120 images).
Testing: 100 secs / validation set (50,000 images).
+cuDNN Training: 20.25 secs / 20 iterations (5,120 images).
+cuDNN Testing: 66.3 secs / validation set (50,000 images).
+
+
## NVIDIA K20
Training: 36.0 secs / 20 iterations (5,120 images).
-Testing: 133 secs / validation set (50,000 images)
+Testing: 133 secs / validation set (50,000 images).
## NVIDIA GTX 770
Training: 33.0 secs / 20 iterations (5,120 images).
-Testing: 129 secs / validation set (50,000 images)
+Testing: 129 secs / validation set (50,000 images).
+
+cuDNN Training: 24.3 secs / 20 iterations (5,120 images).
+cuDNN Testing: 104 secs / validation set (50,000 images).