[gradclip] hot fix + unittests
gradient clipping works by adding extending the execution order of the
gradients to the last node, where the global norm is calculated and the
gradients are clipped and applied.
However, weight sharing use of gradients also relies on the last access
of the gradient and gradient clipping disturbs the balance of gradient
last access.
As a quick fix, if gradient clip is enabled, the last access is replaced
with second to last access.
A better way would be for clipping to be layer, and then last access by
clipping layer would be a valid access and balance to the system can be
maintained.
Unittests for gradient clipping is added with and without weight
sharing.
Signed-off-by: Parichay Kapoor <pk.kapoor@samsung.com>