Zachary DeVito [Fri, 5 Apr 2019 20:33:14 +0000 (13:33 -0700)]
slots with explicit value/setValue make more sense in future patches (#18468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18468
ghimport-source-id:
d4b41c521f2269a695e03c8e7d05d5542731ee48
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18469 Create Object that represents a Module
* **#18468 slots with explicit value/setValue make more sense in future patches**
* #18467 Make Object hold its ClassType
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.
Reviewed By: suo
Differential Revision:
D14613509
fbshipit-source-id:
9f2208d0efd01465c78cebdc3e8365a9e0adf9ff
Zachary DeVito [Fri, 5 Apr 2019 20:33:14 +0000 (13:33 -0700)]
Make Object hold its ClassType (#18467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18467
ghimport-source-id:
d51bdd64d2529d08c634c58df1a0870b54ad49fb
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18469 Create Object that represents a Module
* #18468 slots with explicit value/setValue make more sense in future patches
* **#18467 Make Object hold its ClassType**
* #18379 Enforce single parent for script submodules
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.
Currently it holds a symbol whose unqualified name is the name of the
class. This will get confusing when there are multiple possible registries,
and it makes getting the class type from the object difficult.
The pointer to the class is only 4 more bytes so this patch just puts
it in the object.
Reviewed By: suo
Differential Revision:
D14613510
fbshipit-source-id:
b35175ba4be83d2522deaa6dad5070d6ec691fed
Zachary DeVito [Fri, 5 Apr 2019 20:33:14 +0000 (13:33 -0700)]
Enforce single parent for script submodules (#18379) (#18860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18860
ghimport-source-id:
96305349bf3db564f43df2263b1e5bddcc9e9dae
Reviewed By: suo
Differential Revision:
D14780421
Pulled By: zdevito
fbshipit-source-id:
2bdd89b35866ba035ebea0adab037e441c1006e2
Stas Bekman [Fri, 5 Apr 2019 19:46:44 +0000 (12:46 -0700)]
CUDA_NVCC_EXECUTABLE is not needed, as nvcc is in PATH (#18958)
Summary:
As indicated by f0k: https://github.com/pytorch/pytorch/pull/18495#issuecomment-480178763
nvcc via ccache is already first in the PATH in the instructions I provided, so CUDA_NVCC_EXECUTABLE is not needed.
I re-built to test that it's so.
Thank you!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18958
Differential Revision:
D14810732
Pulled By: ezyang
fbshipit-source-id:
3758ae2253c745c5d7cfccedd49fa00cc4629965
Ahmad Salim Al-Sibahi [Fri, 5 Apr 2019 19:45:37 +0000 (12:45 -0700)]
Fix precision issue with expansion that prefers 'probs' over 'logits' (#18614)
Summary:
I have experienced that sometimes both were in `__dict__`, but it chose to copy `probs`, which loses precision compared to `logits`. This is especially important when training (Bayesian) neural networks or doing other types of optimization, since the loss is heavily affected.
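A minimal sketch of the concern (illustrative only, not from the patch; it assumes the current `torch.distributions` API):
```python
import torch
from torch.distributions import Bernoulli

# A distribution parameterized by an extreme logit.
d = Bernoulli(logits=torch.tensor([-100.0]))

# expand() copies the stored parameters to the new batch shape. If it copies
# `probs` (which has lost almost all precision in float32) instead of `logits`,
# the expanded distribution can no longer recover the original logit.
expanded = d.expand(torch.Size([3]))
print(expanded.logits)  # stays near -100 when `logits` is the parameter copied
```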
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18614
Differential Revision:
D14793486
Pulled By: ezyang
fbshipit-source-id:
d4ff5e34fbb4021ea9de9f58af09a7de00d80a63
Joakim Rishaug [Fri, 5 Apr 2019 19:44:49 +0000 (12:44 -0700)]
Method is supposed to be in-place (#18684)
Summary:
Tracing models that attempt to return this in-place value doesn't turn out well.
I haven't run any tests to confirm the results, to be honest, but regardless of the outcome, the operation happens in-place, so it should work as before.
Sample output from traced model attempting to set `max_norm` on `Embedding`:
```
a leaf Variable that requires grad has been used in an in-place operation. (check_inplace at /pytorch/torch/csrc/autograd/VariableTypeUtils.h:49)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f0ecc5cc021 in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f0ecc5cb8ea in /usr/local/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x38ab2f (0x7f0ecb55ab2f in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #3: torch::autograd::VariableType::embedding_renorm_(at::Tensor&, at::Tensor const&, double, double) const + 0x76 (0x7f0ecb5b5966 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #4: <unknown function> + 0x56c958 (0x7f0ecb73c958 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #5: <unknown function> + 0x672286 (0x7f0ecb842286 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #6: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x22 (0x7f0ecb83d842 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #7: <unknown function> + 0x65c6ac (0x7f0ecb82c6ac in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #8: <unknown function> + 0x3c8ab4 (0x7f0f06bc0ab4 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #9: <unknown function> + 0x3ad2c3 (0x7f0f06ba52c3 in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #10: <unknown function> + 0x11663e (0x7f0f0690e63e in /usr/local/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #39: python_call + 0x11 (0x5563c3c521c1 in uwsgi)
frame #40: uwsgi_request_wsgi + 0x100 (0x5563c3c54410 in uwsgi)
frame #41: wsgi_req_recv + 0xac (0x5563c3becabc in uwsgi)
frame #42: simple_loop_run + 0xc4 (0x5563c3c35be4 in uwsgi)
frame #43: simple_loop + 0x10 (0x5563c3c35a00 in uwsgi)
frame #44: uwsgi_ignition + 0x241 (0x5563c3c3a3a1 in uwsgi)
frame #45: uwsgi_worker_run + 0x275 (0x5563c3c3ec35 in uwsgi)
frame #46: <unknown function> + 0x8f22c (0x5563c3c3f22c in uwsgi)
frame #47: <unknown function> + 0x3c13e (0x5563c3bec13e in uwsgi)
frame #48: __libc_start_main + 0xf1 (0x7f0f138922e1 in /lib/x86_64-linux-gnu/libc.so.6)
frame #49: _start + 0x2a (0x5563c3bec16a in uwsgi)
:
operation failed in interpreter:
op_version_set = 0
def forward(self,
input_1: Tensor) -> Tensor:
_0 = torch.norm(self.item_embedding.weight, 2, 1, True)
_1 = torch.div(self.item_embedding.weight, _0)
m_weight = torch.t(_1)
input_2 = torch.contiguous(input_1)
weight_1 = torch.embedding_renorm_(self.item_embedding.weight, input_2, 1., 2.)
~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
x = torch.embedding(weight_1, input_2, -1, False, False)
input_3 = torch.div(x, torch.norm(x, 2, 2, True))
max_batch_size = ops.prim.NumToTensor(torch.size(input_3, 0))
hx = torch.zeros([2, int(max_batch_size), 70], dtype=6, layout=0, device=torch.device("cpu"))
_2 = [self.lstm_layer.weight_ih_l0, self.lstm_layer.weight_hh_l0, self.lstm_layer.weight_ih_l1, self.lstm_layer.weight_hh_l1]
input_4, _3, _4 = torch.lstm(input_3, [hx, hx], _2, False, 2, 0.10000000000000001, False, False, True)
input = torch.matmul(input_4, torch.t(self.rnn2item.weight))
tastevec = torch.div(input, torch.norm(input, 2, 2, True))
outputs = torch.matmul(tastevec, m_weight)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18684
Differential Revision:
D14782041
Pulled By: ezyang
fbshipit-source-id:
7b2fc19b7d5b6600263644498bb728319a19f39d
Summer Deng [Fri, 5 Apr 2019 19:44:09 +0000 (12:44 -0700)]
fix bug when falling back to acc32 when weight is prepacked (#18881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18878
When the weight is prepacked and it doesn't contain a prepacked weight for acc32, we shouldn't fall back to acc32.
TODO: add unit tests with better coverage
Reviewed By: feiyu1990
Differential Revision:
D14778810
fbshipit-source-id:
d49a8c4b7c815ab29b77feb53ee730ad63780488
Marek Kolodziej [Fri, 5 Apr 2019 19:43:02 +0000 (12:43 -0700)]
More numerically stable lerp (#18871)
Summary:
The C++ and CUDA implementations of lerp are not numerically stable. This is discussed on Wikipedia [here](https://en.wikipedia.org/wiki/Linear_interpolation#Programming_language_support). I checked the GPU SASS output and there's no overhead from using the more precise implementation, from Kepler all the way to Turing. I haven't looked at CPU ASM though.
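A minimal Python sketch of the difference (the actual change is in the C++/CUDA kernels; the two formulas below are the ones contrasted in the Wikipedia article):
```python
import torch

def lerp_naive(a, b, t):
    # a + t * (b - a): not guaranteed to return exactly b at t = 1
    return a + t * (b - a)

def lerp_precise(a, b, t):
    # (1 - t) * a + t * b: exact at both endpoints
    return (1 - t) * a + t * b

a, b, t = torch.tensor(1e8), torch.tensor(1.0), torch.tensor(1.0)
print(lerp_naive(a, b, t))    # 0.0 in float32 -- the endpoint is missed entirely
print(lerp_precise(a, b, t))  # 1.0
```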
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18871
Differential Revision:
D14793438
Pulled By: ezyang
fbshipit-source-id:
2ddc2e026c5285466cae7d1b4101174253100445
Pieter Noordhuis [Fri, 5 Apr 2019 19:13:31 +0000 (12:13 -0700)]
Increase default c10d/ProcessGroupGloo test timeout (#18916)
Summary:
See #18659.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18916
Differential Revision:
D14808749
Pulled By: pietern
fbshipit-source-id:
9a9c8beddb2dbbb1bf4c5e575743d9e1fa3f07fa
Ailing Zhang [Fri, 5 Apr 2019 18:57:17 +0000 (11:57 -0700)]
remove symbolic variable part 1 (#17986)
Summary:
As discussed with gchanan we should deduplicate symbolic_variable and symbolic_script to prepare for the future merge with derivatives.yaml.
This PR moves most easy formulas to symbolic_script.
TODO: run benchmarks to make sure no perf regression
cc: apaszke zdevito wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17986
Differential Revision:
D14766412
Pulled By: ailzhang
fbshipit-source-id:
d95a3f876e256c0f505779a71587c985571d3b8f
Edward Yang [Fri, 5 Apr 2019 18:55:38 +0000 (11:55 -0700)]
Revert D14742020: Wrap workaround for cpp custom types a bit prettier and add an example
Differential Revision:
D14742020
Original commit changeset:
0f2fd83ae56a
fbshipit-source-id:
5640255aef0319b7d8996e07132e87213130d31c
Karl Ostmo [Fri, 5 Apr 2019 18:26:31 +0000 (11:26 -0700)]
Decompose more Windows scripts (#18917)
Summary:
This PR:
* pulls four distinct installation steps out of `build_pytorch.bat` and into their own scripts.
* eliminates the copy step for helper scripts called by `win-build.sh` and `win-test.sh`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18917
Differential Revision:
D14807236
Pulled By: kostmo
fbshipit-source-id:
03e91a5834dfd6d68903ad9725eacc099bbf6d53
Dmytro Dzhulgakov [Fri, 5 Apr 2019 18:14:11 +0000 (11:14 -0700)]
Wrap workaround for cpp custom types a bit prettier and add an example (#18791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18791
A temporary demonstration of how to extend this hack further until custom C++ types are ready.
Reviewed By: jamesr66a
Differential Revision:
D14742020
fbshipit-source-id:
0f2fd83ae56ab2abe16977a1829ed421e6abe74b
bddppq [Fri, 5 Apr 2019 18:09:15 +0000 (11:09 -0700)]
Remove cuda::compat functions in aten (#18905)
Summary:
Looks like the issue of using `std::` functions is fixed in the new ROCm version
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18905
Differential Revision:
D14792943
Pulled By: bddppq
fbshipit-source-id:
af11acbb85872943f23b6e55415db1f0699e7b8f
Michael Suo [Fri, 5 Apr 2019 17:40:19 +0000 (10:40 -0700)]
fix side-effects and aliasing for custom ops (#18711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18711
ghimport-source-id:
c9caedc0660b2b7ba3730cd0e1a2e0e9c3cf422b
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18711 [jit] fix side-effects and aliasing for custom ops**
Previously we didn't track aliasing, mutation, or side effects for
custom ops. This PR adds in guards with the most conservative
assumptions possible: the op will
1) have side effects,
2) write to everything
3) produce a wildcard.
In order to tell whether a given operator is a custom op, this PR introduces
the concept of a "reserved" namespace (basically all our builtin namespaces).
Custom ops live in non-reserved namespaces, so a check on the namespace
is sufficient to tell whether a schema/node is "custom" or not.
This is just to get things correct for now. Follow-ups to this:
- Users should be able to specify aliasing/mutability without having to learn
the whole alias annotation schema.
- Relax assumptions a bit. In particular outputs can only alias input tensors,
they don't have to be wildcards.
Fixes #18490
Differential Revision:
D14730978
fbshipit-source-id:
540b47a24ccf24145051609bdcc99c97e46e0fe0
Elias Ellison [Fri, 5 Apr 2019 17:37:58 +0000 (10:37 -0700)]
Expand the list of ops that mutate an inputs shape (#18812)
Summary:
Expand the list of ops that resize an input in-place to include broadcasting ops and other ops that affect shape. Whoever is reviewing this PR: could you please look through the PyTorch in-place ops and see if I missed any?
Expanding the PR from: https://github.com/pytorch/pytorch/pull/17518
This is already being tested in test_resize_input_ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18812
Differential Revision:
D14793410
Pulled By: eellison
fbshipit-source-id:
125f4f5375ac1036fb96fabc9da2aaccc9adc778
J M Dieterich [Fri, 5 Apr 2019 17:11:43 +0000 (10:11 -0700)]
add launch bounds, enable more tests (#18909)
Summary:
Add launch bounds annotations for ROCm arising from maxThreadsPerBlock and apply threads use.
Enable tests that now work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18909
Differential Revision:
D14801490
Pulled By: ezyang
fbshipit-source-id:
b81c97fc783a2627bc7e31b32036a364cfe40cc7
Yinghai Lu [Fri, 5 Apr 2019 17:09:14 +0000 (10:09 -0700)]
Add backward pass to infer single missing input shape for Concat opportunistically (#18911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18911
Att.
Reviewed By: bddppq
Differential Revision:
D14791295
fbshipit-source-id:
4b7a775924f0eadb0cb73aa6c434a6a5be8b92be
Jiakai Liu [Fri, 5 Apr 2019 16:54:27 +0000 (09:54 -0700)]
change to use clang if NDK >= 18 (#18914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18914
ghimport-source-id:
4d9d9322ee5559d96e13533ec37ff3be86a0227c
Reviewed By: ezyang
Differential Revision:
D14794162
Pulled By: ljk53
fbshipit-source-id:
caac55e12b1e62bf6ebcc6e2062d5ed122ad4e64
Zachary DeVito [Fri, 5 Apr 2019 16:46:10 +0000 (09:46 -0700)]
Revert D14673459: [pytorch][PR] [jit] Replace Slot on script::Method with NamedIValue
Differential Revision:
D14673459
Original commit changeset:
21200180c47f
fbshipit-source-id:
9c01de4cf5bb7c87ac0c55705b901db990cd917b
Edward Yang [Fri, 5 Apr 2019 16:37:11 +0000 (09:37 -0700)]
Disable flaky test_proper_exit test. (#18950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18950
ghimport-source-id:
27bd575fd3c73a51ace1360aa020fa63a792a5d2
Differential Revision:
D14802009
Pulled By: ezyang
fbshipit-source-id:
051e1d038892c2c6e8337357fa80771b8dc42680
Edward Yang [Fri, 5 Apr 2019 16:33:08 +0000 (09:33 -0700)]
Checkout pytorch_sphinx_theme with https. (#18859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18859
ghimport-source-id:
fbbcb8a2dd9c9f0a317de489b6bbb83e9071a7d8
Differential Revision:
D14801989
Pulled By: ezyang
fbshipit-source-id:
a9bc02e1383adafcac01994e6346b28551d95c71
Pieter Noordhuis [Fri, 5 Apr 2019 16:04:43 +0000 (09:04 -0700)]
Add tests for reducer class (#18845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18845
This adds a few CPU-only test cases for the reducer class.
Reviewed By: mrshenli
Differential Revision:
D14768432
fbshipit-source-id:
c008a52206826304e634a95bc14167ed94c97662
Owen Anderson [Fri, 5 Apr 2019 15:34:41 +0000 (08:34 -0700)]
Fix a few instances of notifying on a CV while holding the lock (#18857)
Summary:
Fix a few instances of notifying on a condition variable while holding the lock, so that the lock is released before notifying. This avoids an extra thread suspension when the notified thread tries to grab the lock.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18857
Differential Revision:
D14779132
Pulled By: resistor
fbshipit-source-id:
b18a05c4c15be1426ebfdffac1c8f002b771cfd7
peter [Fri, 5 Apr 2019 14:44:43 +0000 (07:44 -0700)]
Unify caffe2 and libtorch build scripts on Windows (#18683)
Summary:
`scripts/build_windows.bat` is the original way to build Caffe2 on Windows, but since Caffe2 has been merged into libtorch, the build scripts should be unified because they actually do the same thing apart from some different flags.
The follow-up is to add the tests. It looks like the CI job for Caffe2 on Windows is defined [here](https://github.com/pytorch/ossci-job-dsl/blob/master/src/jobs/caffe2.groovy#L906). Could we make them a separate file, just like what we've done in `.jenkins/pytorch/win-build.sh`? There's a bunch of things we can do there, like using ninja and sccache to accelerate the build.
cc orionr yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18683
Differential Revision:
D14730188
Pulled By: ezyang
fbshipit-source-id:
ea287d7f213d66c49faac307250c31f9abeb0ebe
Gregory Chanan [Fri, 5 Apr 2019 14:18:39 +0000 (07:18 -0700)]
Simplify storage wrapping in TH. (#18855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18855
ghimport-source-id:
01faa229fa4db901ab8539d3778b716d909ba4cf
Reviewed By: dzhulgakov
Differential Revision:
D14790669
Pulled By: gchanan
fbshipit-source-id:
167b9bc9c9872743fa8f6040a26ddf7ff5789c27
Gregory Chanan [Fri, 5 Apr 2019 14:18:38 +0000 (07:18 -0700)]
Cache device on TensorImpl; clean up TensorImpl constructors. (#18833)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18833
ghimport-source-id:
6f2be25fcc5e6be3ffe20582e604bd2c1fbab66b
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.**
* #18832 [STACK] Disallow changing the device of a tensor via set_.
* #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.
1) We cache device on TensorImpl. This means we can access the device without a virtual function and allows us to more easily extend TensorImpls (because they don't need to figure out how to store the Device for themselves).
2) Clean up TensorImpl APIs. We had a constructor that took a TensorTypeId and an allocator and would allocate a Storage based on the recognized types of TensorTypeIds. Instead, we just have two different constructors: one for types with a storage, one without.
Reviewed By: dzhulgakov
Differential Revision:
D14766230
fbshipit-source-id:
745b8db84dcd6cb58f1a8675ad3ff8d033bc50df
Vitaly Fedyunin [Fri, 5 Apr 2019 13:19:58 +0000 (06:19 -0700)]
Revert "Adding pin_memory kwarg to zeros, ones, empty,... (#18854)
Summary:
This reverts commit c484cf43a02863efd2f4a76aad43246fb0191ab5.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18854
Differential Revision:
D14778393
Pulled By: VitalyFedyunin
fbshipit-source-id:
4b5a1f5b1c091bbc4a8e75614734cc011d26b452
Sebastian Messmer [Fri, 5 Apr 2019 08:46:58 +0000 (01:46 -0700)]
Silence compiler warnings (#18912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18912
We intentionally test a deprecated API, no need to show the warnings here.
Reviewed By: dzhulgakov
Differential Revision:
D14792617
fbshipit-source-id:
9ea2a4106d566064283726eed2c274b98f49a2e5
Dmytro Dzhulgakov [Fri, 5 Apr 2019 08:04:58 +0000 (01:04 -0700)]
ScriptModuleOp in caffe2 (#18716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18716
Might be useful as an intermediate stage for some systems that currently use Caffe2 nets as an execution mechanism.
Not sure it's a good idea altogether, please comment.
Limitations:
- only Tensor types as inputs/outputs
- the entire module is serialized as a zip archive inside a proto in the Caffe2 db; it'd be subject to the 4 GB limit and is likely very slow. For small models it'd work though.
- no autograd, though it can be attached in principle
- no way to retrieve parameters inside the script module from the C2 runtime's perspective (though they potentially can be alias-fetched and stored as individual blobs)
- after deserialization, the Python wrappers returned don't have the correct type (as we don't do the module_lookup trick)
Build-wise, I had to add a dependency from pybind_state to libtorch.so. I don't think we build the Caffe2 Python frontend independently anymore, so it should be fine.
Reviewed By: amirshim, houseroad
Differential Revision:
D14339599
fbshipit-source-id:
88a37a8abd1f1c4703e5ef937031f222535d4080
Karl Ostmo [Fri, 5 Apr 2019 07:49:06 +0000 (00:49 -0700)]
flake8 fix on extracted python script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18931
Differential Revision:
D14796114
Pulled By: kostmo
fbshipit-source-id:
25971be5a36fffc61e29db981af7298a0fe0ed8c
David Riazati [Fri, 5 Apr 2019 06:27:05 +0000 (23:27 -0700)]
Replace Slot on script::Method with NamedIValue (#18252)
Summary:
This refactor lets us track the types of initial values added onto a `Method`. The main motivation for this is the change in `module.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18252
Differential Revision:
D14673459
Pulled By: driazati
fbshipit-source-id:
21200180c47f25bb70898771adfb569856e6c34a
Karl Ostmo [Fri, 5 Apr 2019 04:05:13 +0000 (21:05 -0700)]
U/kostmo/windows offload scripts 3
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18754
Differential Revision:
D14794893
Pulled By: kostmo
fbshipit-source-id:
05187d9b53615ffbcc7253accdc692c4ecaf25d9
Tongzhou Wang [Fri, 5 Apr 2019 02:03:08 +0000 (19:03 -0700)]
fix lint in optim doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18883
Differential Revision:
D14793365
Pulled By: ezyang
fbshipit-source-id:
c1b46c98e3319badec3e0e772d0ddea24cbf9c89
Iurii Zdebskyi [Fri, 5 Apr 2019 01:23:38 +0000 (18:23 -0700)]
Fixed the comment to reference gist example instead of private repo (#18852)
Summary:
Replace link to a file in a private repo with a gist
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18852
Reviewed By: ezyang
Differential Revision:
D14778481
Pulled By: izdeby
fbshipit-source-id:
8389aa4bf115ddcfd85079cc2c861404efa678e7
Sepehr Sameni [Fri, 5 Apr 2019 01:07:54 +0000 (18:07 -0700)]
return missing keys from load_state_dict (#18668)
Summary:
Return missing_keys and unexpected_keys from load_state_dict so the user can handle them when strict mode is off; also remove an unused variable.
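A hedged usage sketch (in current PyTorch the result is a named tuple with `missing_keys` and `unexpected_keys` fields; the exact return type at the time of this PR may differ):
```python
import torch.nn as nn

model = nn.Linear(4, 2)
state = {"weight": model.weight.detach().clone()}  # deliberately omit "bias"

# With strict=False the call no longer raises; the result reports what didn't match.
result = model.load_state_dict(state, strict=False)
print(result.missing_keys)     # ['bias']
print(result.unexpected_keys)  # []
```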
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18668
Differential Revision:
D14782073
Pulled By: ezyang
fbshipit-source-id:
ab3b855eb77bb7422594d971988067e86eef20f2
Junjie Bai [Fri, 5 Apr 2019 00:21:41 +0000 (17:21 -0700)]
Fix caffe2 miopen conv transpose gradient op for case of no dX gradient
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18809
Reviewed By: ezyang
Differential Revision:
D14759762
Pulled By: bddppq
fbshipit-source-id:
ff795b7e58c82f67a1d7284b5ab06b0e0e5fd3ae
Brennan Vincent [Fri, 5 Apr 2019 00:18:11 +0000 (17:18 -0700)]
don't attempt to multiply by a sparse matrix (#18737)
Summary:
Tested by running the script in #16562, and there was no error.
Then:
```
>>> print(mat.grad)
tensor([[1., 2., 3.],
[1., 2., 3.],
[1., 2., 3.]])
```
which is correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18737
Differential Revision:
D14773078
Pulled By: umanwizard
fbshipit-source-id:
8aa36eb6f6aa104263a467d9ac91d61b3bfd05f5
Wanchao Liang [Fri, 5 Apr 2019 00:00:46 +0000 (17:00 -0700)]
add Fast-RNN to AI-PEP
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18885
Reviewed By: hl475
Differential Revision:
D14728854
fbshipit-source-id:
7e7a2946929551963f7c938e3d82a260a9efdfbd
Pieter Noordhuis [Thu, 4 Apr 2019 21:14:50 +0000 (14:14 -0700)]
Allow override of backend in dist.new_group() (#18595)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18595
There is no need to force the backend to be the same as that of the global process group, as long as the backend is "nccl" or "gloo".
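A hedged sketch of what this allows (assumes a default process group already initialized with the NCCL backend and enough ranks):
```python
import torch.distributed as dist

# Default group initialized elsewhere, e.g.:
# dist.init_process_group(backend="nccl", init_method=..., rank=..., world_size=...)

# A subgroup can now use a different supported backend than the default group.
cpu_group = dist.new_group(ranks=[0, 1], backend="gloo")
```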
Reviewed By: mrshenli
Differential Revision:
D14657204
fbshipit-source-id:
868817b9f219e3be8db0761a487f0027ed46663b
Lara [Thu, 4 Apr 2019 20:15:18 +0000 (13:15 -0700)]
ONNX Export All Cases of Softmax
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18482
Reviewed By: zrphercule
Differential Revision:
D14630697
Pulled By: houseroad
fbshipit-source-id:
c06f1e3bead10a265c5f4ac3723d49f4caf46801
Iurii Zdebskyi [Thu, 4 Apr 2019 20:01:10 +0000 (13:01 -0700)]
Added bool and half support for resize_as_ and view methods (#18821)
Summary:
Enabled **resize_as_** and **view** methods for bool and half tensors.
Tested via unit tests.
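A quick illustrative sketch (not from the patch):
```python
import torch

b = torch.tensor([True, False, True, False])
print(b.view(2, 2))  # view on a bool tensor

h = torch.ones(6, dtype=torch.half)
print(h.resize_as_(torch.empty(2, 3)).shape)  # resize_as_ on a half tensor
```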
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18821
Reviewed By: ezyang
Differential Revision:
D14762852
Pulled By: izdeby
fbshipit-source-id:
4312079fb4e893fea6f71ff4f163094b2674f1e8
Lu Fang [Thu, 4 Apr 2019 19:57:31 +0000 (12:57 -0700)]
update of fbcode/onnx to 079c2639f9bb79b1774d1e3bfa05b0c093816ca7 (#18841)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18841
Previous import was f0d7df2c643c4e37f1fd7735ef02c972c4d19fb5
Included changes:
- **[079c2639](https://github.com/onnx/onnx/commit/079c2639)**: update the squeeze and unsqueeze doc (#1905) <Lu Fang>
- **[a8b45d62](https://github.com/onnx/onnx/commit/a8b45d62)**: fix the ir_version onnx-operators.proto (#1903) <Lu Fang>
Reviewed By: zrphercule
Differential Revision:
D14767158
fbshipit-source-id:
2d772fece45e25d72bf1d10fad156189397f3f86
James Reed [Thu, 4 Apr 2019 19:53:44 +0000 (12:53 -0700)]
Actually model scalar type promotion in shape analysis (#18811)
Summary:
This was causing some numerical issues in the fuser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18811
Differential Revision:
D14767390
Pulled By: jamesr66a
fbshipit-source-id:
f1123d1aab5501abad850d2edc996f8aa8dafe04
Max Wang [Thu, 4 Apr 2019 19:42:12 +0000 (12:42 -0700)]
Add a .ctags.d/ toplevel directory (#18827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18827
ghimport-source-id:
38f857bc29b2c2c6a71069d00c4c69ed0bef1574
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18827 Add a .ctags.d/ toplevel directory**
Exclude build artifacts by default.
Reviewed By: ezyang
Differential Revision:
D14765721
fbshipit-source-id:
a785dbb2ef1df96af8e23cc65c8db2a6b67b4fce
Wanwannodao [Thu, 4 Apr 2019 19:40:46 +0000 (12:40 -0700)]
Fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18802
Differential Revision:
D14781874
Pulled By: ezyang
fbshipit-source-id:
0f94c40bd84c84558ea3329117580f6c749c019f
Xiaomeng Yang [Thu, 4 Apr 2019 18:46:37 +0000 (11:46 -0700)]
Add support for group ConvTranspose (#18794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18794
Add support for group ConvTranspose
Reviewed By: houseroad
Differential Revision:
D14741327
fbshipit-source-id:
5d947ca044bf8495dd7f8f56122441ebbcc6c7e4
Gregory Chanan [Thu, 4 Apr 2019 18:12:13 +0000 (11:12 -0700)]
Disallow changing the device of a tensor via set_. (#18832)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18832
ghimport-source-id:
fde4ad90541ba52dfa02bdd83466f17e6541e535
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.
* **#18832 [STACK] Disallow changing the device of a tensor via set_.**
* #18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.
This is necessary to cache the device on a TensorImpl.
Differential Revision:
D14766231
fbshipit-source-id:
bba61634b2d6252ac0697b96033c9eea680956e8
Karl Ostmo [Thu, 4 Apr 2019 17:38:09 +0000 (10:38 -0700)]
U/kostmo/win test offload scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18694
Differential Revision:
D14766339
Pulled By: kostmo
fbshipit-source-id:
a2300e72129979f866430ca5c09dd7fff6df0a89
Zachary DeVito [Thu, 4 Apr 2019 17:22:27 +0000 (10:22 -0700)]
Revert D14603722: Enforce single parent for script submodules
Differential Revision:
D14603722
Original commit changeset:
63ab5d0cccf7
fbshipit-source-id:
2c4174def102eda4589e08c4dbd67ce8af975199
Edward Yang [Thu, 4 Apr 2019 16:20:20 +0000 (09:20 -0700)]
Fix deviceCount on FakeGuardImpl. (#18745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18745
ghimport-source-id:
3ed111efe83b3061652869e33d9b5910b7daa732
Differential Revision:
D14759198
Pulled By: ezyang
fbshipit-source-id:
70a8db767f310fe0e0079c7b0693e9330d7cd472
Gregory Chanan [Thu, 4 Apr 2019 13:19:54 +0000 (06:19 -0700)]
Stop swapping in Storages of the wrong device for Tensors. (#18831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18831
ghimport-source-id:
2741e0d70ebe2c2217572c3af54ddd9d2047e342
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18833 [STACK] Cache device on TensorImpl; clean up TensorImpl constructors.
* #18832 [STACK] Disallow changing the device of a tensor via set_.
* **#18831 [STACK] Stop swapping in Storages of the wrong device for Tensors.**
This is necessary to support device caching, see https://github.com/pytorch/pytorch/pull/18751 and https://github.com/pytorch/pytorch/pull/18578.
In library code, we potentially swap in Storages with the wrong device when device_guard is False. This happens as follows with "view-like" operations.
1) We allocate a tensor on the 'wrong' device (because device_guard is false).
2) We swap out the 'wrong' storage with the 'right' storage using e.g. THCTensor_setStorage.
Instead, we can just construct the Tensor with the correct Storage from the beginning. This is what we do with 'view'.
Note there are two other "view-like" cases where this happens:
1) unfold
2) set_()
Because these aren't performance critical, I just added the device_guard instead of applying the above correction.
For completeness, this also includes a test that all `device_guard: false` functions behave properly under these conditions.
Reviewed By: dzhulgakov
Differential Revision:
D14766232
fbshipit-source-id:
0865c3ddae3f415df5da7a9869b1ea9f210e81bc
Roy Li [Thu, 4 Apr 2019 09:21:09 +0000 (02:21 -0700)]
Pass ScalarType separately from Type in python constructors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17786
Reviewed By: ezyang
Differential Revision:
D14379075
fbshipit-source-id:
3abf066563b789a30cafe5b0c868a41326f5b833
Roy Li [Thu, 4 Apr 2019 09:21:09 +0000 (02:21 -0700)]
Store ScalarType and Backend instead of Type in TensorIterator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/17601
Reviewed By: ezyang
Differential Revision:
D14274754
fbshipit-source-id:
b08880ae586b6ae57d4c0bbeb203796d087926c4
Roy Li [Thu, 4 Apr 2019 09:21:09 +0000 (02:21 -0700)]
Introduce DeprecatedTypeProperties class (#17991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17991
Changes:
- Breaks BC: Tensor::type() now returns DeprecatedTypeProperties& rather than Type&.
- Added DeprecatedTypeProperties; it serves as a temporary replacement for Type as the return value of Tensor::type(). This contributes to making Type just for dispatch purposes so that we can make it dtype agnostic.
- Tensor::dispatch_type() now returns Type& like Tensor::type() used to do.
- Changed callsites of Tensor::type() appropriately.
Reviewed By: ezyang
Differential Revision:
D14443117
fbshipit-source-id:
239ccb7a09626279a71d1a37f8f82e7f57bf7d9e
Bram Wasti [Thu, 4 Apr 2019 07:24:16 +0000 (00:24 -0700)]
Fix to handle null strides in DLPack tensor (#18510)
Summary:
DLPack can have non-strided tensors, which are represented by a nullptr in place of dl_tensor.strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18510
Differential Revision:
D14647328
Pulled By: bwasti
fbshipit-source-id:
5364282810a5772cfc2319fc8133fe86fdd84dd1
Yinghai Lu [Thu, 4 Apr 2019 07:19:21 +0000 (00:19 -0700)]
Add shape inference function for Split (#18838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18838
It turns out that we don't have a shape inference function for the `Split` op at all. This diff adds one.
Reviewed By: bertmaher
Differential Revision:
D14766871
fbshipit-source-id:
535cb4f24bdada603c76579e00e7a39aee93e19f
Lu Fang [Thu, 4 Apr 2019 06:14:07 +0000 (23:14 -0700)]
Fix the duplication problem in _unique_state_dict (#18139)
Summary:
Since parameter.data creates a new torch.Tensor each time it is accessed, we currently get duplicate tensors when calling _unique_state_dict. Try to deduplicate before creating the new tensor.
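A small sketch of why identity-based deduplication fails here (illustrative, not from the patch):
```python
import torch.nn as nn

m = nn.Linear(2, 2)
p = m.weight

# Every access to .data builds a fresh Tensor wrapper around the same storage,
# so deduplication by object identity treats them as distinct tensors.
print(p.data is p.data)                        # False
print(p.data.data_ptr() == p.data.data_ptr())  # True: same underlying memory
```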
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18139
Reviewed By: dzhulgakov
Differential Revision:
D14511262
Pulled By: houseroad
fbshipit-source-id:
cb69795d0b6509721220650bbb19edeb3459a503
Jongsoo Park [Thu, 4 Apr 2019 05:50:05 +0000 (22:50 -0700)]
fold col offset into bias; optimize A symmetric quant (#17026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17026
D14013931 was for FC. This diff makes similar optimizations for Conv.
A subtle difference is that in FC, once we fold col_offset into bias during the pre-processing step, we can treat everything as if A_zero_offset == 0 (symmetric quantization of A).
In Conv, we can't do this because padding still needs to use the original A_zero_offset.
From the requantization point of view, once col_offset is folded into bias, we can treat it as if we're doing symmetric A quantization.
But for steps involving padding, like im2col, im2col fused with packing, and direct conv for depth-wise/group convolution, we still need to pass the original A_zero_offset.
Reviewed By: jianyuh
Differential Revision:
D14020276
fbshipit-source-id:
c29caefd1127bbc6aff0e9d535939bb0c1ecb66c
Michael Suo [Thu, 4 Apr 2019 05:18:09 +0000 (22:18 -0700)]
fix flake8 lint (#18835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18835
ghimport-source-id:
7b1f433ae51232822704d62699233688072cbc23
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18835 fix flake8 lint**
* #18826 [jit] run cpp tests for non-cuda builds in test_jit.py
...again
Reviewed By: ZolotukhinM
Differential Revision:
D14766790
fbshipit-source-id:
29361a407589092831dfbc3c5d63d2834934cd02
Michael Suo [Thu, 4 Apr 2019 05:18:09 +0000 (22:18 -0700)]
run cpp tests for non-cuda builds in test_jit.py (#18826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18826
ghimport-source-id:
7ffa3bc7ef7402a6d6eb6ba5849e197019d77bf8
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18826 [jit] run cpp tests for non-cuda builds in test_jit.py**
We did all the work of nicely separating our cpp tests that don't require
CUDA, but they aren't run from test_jit.py if CUDA is missing.
Reviewed By: ZolotukhinM
Differential Revision:
D14766287
fbshipit-source-id:
9326b3a5c90f6c20fc8cfaf1a1885a363b91f30a
Lu Fang [Thu, 4 Apr 2019 04:29:36 +0000 (21:29 -0700)]
Fix the linter (#18842)
Summary:
Remove extra empty line
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18842
Differential Revision:
D14767334
Pulled By: houseroad
fbshipit-source-id:
63224bc407949949e1eb5123d3f151e4ac8f6988
Zachary DeVito [Thu, 4 Apr 2019 03:21:27 +0000 (20:21 -0700)]
Enforce single parent for script submodules (#18379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18379
ghimport-source-id:
9895ecc1ff7897e98853dc00675341f36726e7c7
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18379 Enforce single parent for script submodules**
* #18378 Unify namespace of script::Module
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.
The assumption that a ScriptModule has a single parent is present in our serialization format, and likely a few other places. It is not enforced on creation of script module hierarchies though, meaning that associated problems (e.g. replicating a module twice in the output format) will not be caught until much later in the development cycle.
This patch enforces the property when a submodule is registered.
It also removes NamedModule since it is no longer necessary in this regime.
This will also allow easy discovery of a module's fully-qualified name without needing to traverse the Module hierarchy.
Differential Revision:
D14603722
fbshipit-source-id:
63ab5d0cccf7d66c7833e0adf9023024ca9607cb
Elias Ellison [Thu, 4 Apr 2019 00:09:37 +0000 (17:09 -0700)]
Allow ints, floats, and tensors in conditional (#18755)
Summary:
Per our offline discussion, allow Tensors, ints, and floats to be cast to bool when used in a conditional
Fix for https://github.com/pytorch/pytorch/issues/18381
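A minimal sketch of the behavior this enables in TorchScript (assumes implicit bool conversion of a zero-dim tensor; handling of multi-element tensors may differ):
```python
import torch

@torch.jit.script
def pick(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # A zero-dim tensor used directly as the condition is implicitly cast to bool.
    if x.sum():
        return x
    return y

print(pick(torch.ones(3), torch.zeros(3)))
```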
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18755
Reviewed By: driazati
Differential Revision:
D14752476
Pulled By: eellison
fbshipit-source-id:
149960c92afcf7e4cc4997bccc57f4e911118ff1
Wanchao Liang [Wed, 3 Apr 2019 23:50:46 +0000 (16:50 -0700)]
Fix layernorm ad formula on weight and bias (#18233)
Summary:
Fix the layernorm formula when weight and bias are passed in.
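A hedged sketch of a check that exercises the layer_norm backward with weight and bias (this goes through eager autograd; the actual fix here is to the scripted AD formula in symbolic_script):
```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8, dtype=torch.double, requires_grad=True)
w = torch.randn(8, dtype=torch.double, requires_grad=True)
b = torch.randn(8, dtype=torch.double, requires_grad=True)

# gradcheck compares analytical gradients against numerical ones, including
# the gradients with respect to the weight and bias parameters.
assert torch.autograd.gradcheck(lambda x, w, b: F.layer_norm(x, [8], w, b), (x, w, b))
```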
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18233
Differential Revision:
D14760375
Pulled By: wanchaol
fbshipit-source-id:
d6bd3b137bc04c391aa5c24d021d1f811ba2a877
Zachary DeVito [Wed, 3 Apr 2019 22:58:08 +0000 (15:58 -0700)]
Unify namespace of script::Module (#18378)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18378
ghimport-source-id:
55c29bb436a2153d29ff2f4488d99d8863c187b1
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18379 Enforce single parent for script submodules
* **#18378 Unify namespace of script::Module**
* #18314 Add ability to specialize class types to ArgumentSpec
* #18226 Add Slot type to abstract the raw pointers being used for slots.
This removes individual OrderedDicts in favor of a single unified namespace for all things in a script::Module. This removes a whole class of bugs where, for instance, both a method and a parameter could get the same name.
Since we no longer have to expose OrderedDict::Item objects, a lot of downstream code can be simplified.
We also no longer double-store names (both in the dictionary key and in the object itself).
Differential Revision:
D14603723
fbshipit-source-id:
b5f7551b3074679623edd6ea70269830353b4d4c
Vitaly Fedyunin [Wed, 3 Apr 2019 22:26:34 +0000 (15:26 -0700)]
Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance (#18648)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18648
ghimport-source-id:
1cf4a8fe91492621e02217f38cae5d7e0699fb05
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18661 Step 7: remove _unique
* #18655 Step 6: Rename _unique2 to unique and add int? dim
* #18654 Step 5: remove _unque_dim in favor of unique_dim
* #18651 Step 4: add support for unique with dim=None
* #18650 Step 3: Add support for return_counts to torch.unique for dim not None
* #18649 Step 2: Rename _unique_dim2_temporary_will_remove_soon to unique_dim
* **#18648 Step 1: Secretly add return_counts to unique, and refactor unique_dim for performance**
`unique` is fragile; I previously tried to change it in #18391 and #17097, and they all passed OSS tests but finally got reverted due to internal failures. My previous work of refactoring unique, #18459, is based on #18391, and after #18391 got reverted, I could not work on #18459. To continue working on #18459, #18391, and #17097 without worrying about internal failures, I am suggesting the following steps for the improvement of `unique` and `unique_dim`. soumith Please take this; there is no need to put #18391 back.
The motivation is basically to move forward as much as possible without causing any internal failures. So I will try to divide it into steps and sort from low probability of internal failure to high probability. (I don't know what the internal failure is, so I have to guess.) Let's merge this PR stack one by one until we encounter an internal failure.
Step 1: Create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon`, and keep `_unique` and `_unique_dim` unchanged. The backends of these two functions and of `_unique` and `_unique_dim` are all the same; the only difference is that the temporary ones support `return_counts` while `_unique` and `_unique_dim` do not. Step one is mostly #18391 + #18459. The cuda8 errors have been fixed. At this point, there is no user-visible API change, so no docs are updated. `torch.unique` does not support `return_counts` yet, and `return_counts` is tested through the newly added temporary operators. This step just adds two new ATen operators, so there shouldn't be any internal failure.
Step 2: Rename `_unique_dim2_temporary_will_remove_soon` to `unique_dim`. This should cause no internal failure either, because there is no change to existing operators. The only thing to worry about is deleting `unique_dim` from the Python side, because we don't want users to use it. At this point, C++ users have `return_counts` support for `unique_dim`.
Step 3: Update the docs of `torch.unique` and use `unique_dim` inside `torch.unique` to support `return_counts`. In the docs, we should say that `torch.unique` with None dim support does not support `return_counts` yet. This might cause internal failure.
Step 4: Rename `_unique2_temporary_will_remove_soon` to `_unique2` and use `_unique2` inside `torch.unique` to support `return_counts`. Update the docs saying that `torch.unique` with None dim now supports `return_counts`. This might cause internal failure.
Step 5: Remove `_unique_dim`. This might cause internal failure.
Step 6: Rename `_unique2` to `unique` and add an optional `dim` argument to make it look like the signature of Python's `torch.unique`. Inside `torch.unique`, use `unique` and get rid of `unique_dim`. Unbind `unique_dim` entirely from Python at codegen. This is likely to cause internal failure.
Step 7: Remove `_unique`. This is very likely to cause internal failure.
This PR
======
This PR is for step 1. This create two new ATen operators, `_unique2_temporary_will_remove_soon` and `_unique_dim2_temporary_will_remove_soon` and implement `return_counts` inside them and do refactor for performance improvements.
Please review ngimel VitalyFedyunin. They are mostly copied from #18391 and #18459, so the review should be easy.
Below is a benchmark on a tensor of shape `torch.Size([15320, 2])`:
Before
---------
```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
```
```
1.0.1
192 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
548 ms ± 3.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
```
```
1.0.1
226 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
302 µs ± 7.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
After
-------
```python
print(torch.__version__)
%timeit a.unique(dim=0, sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(dim=0, sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique_dim2_temporary_will_remove_soon(a, dim=0, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```
```
1.1.0a0+83ab8ac
190 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
237 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
219 µs ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
263 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
```python
print(torch.__version__)
%timeit a.unique(sorted=True, return_inverse=False); torch.cuda.synchronize()
%timeit a.unique(sorted=True, return_inverse=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=False, return_counts=True); torch.cuda.synchronize()
%timeit torch._unique2_temporary_will_remove_soon(a, sorted=True, return_inverse=True, return_counts=True); torch.cuda.synchronize()
```
```
1.1.0a0+83ab8ac
232 µs ± 2.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
301 µs ± 1.65 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
264 µs ± 7.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
339 µs ± 9.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Differential Revision:
D14730905
fbshipit-source-id:
10026b4b98628a8565cc28a13317d29adf1225cc
Shen Li [Wed, 3 Apr 2019 21:37:54 +0000 (14:37 -0700)]
Support replicating multi-GPU modules (#18687)
Summary:
If the input `network` resides on multiple GPUs, `devices` must be a 2D list with `devices[0]` matching `network`'s devices. See #18591
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18687
Differential Revision:
D14706162
Pulled By: mrshenli
fbshipit-source-id:
dca630d3308f2dbcf8b75629c452d7a64092ba42
Wanchao Liang [Wed, 3 Apr 2019 21:07:31 +0000 (14:07 -0700)]
flake8 fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18810
Differential Revision:
D14758293
Pulled By: wanchaol
fbshipit-source-id:
975abe4fc5dc0dc4d43af61ec0f987e2c5670874
Gregory Chanan [Wed, 3 Apr 2019 20:55:03 +0000 (13:55 -0700)]
Remove `device_guard: False` from native_functions that don't have a … (#18803)
Summary:
…tensor.
There's nothing to device_guard on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18803
Reviewed By: ezyang
Differential Revision:
D14748091
Pulled By: gchanan
fbshipit-source-id:
ed6f16d6f4d3f07b6d5ad9696f71a14333c228b8
Edward Yang [Wed, 3 Apr 2019 20:38:56 +0000 (13:38 -0700)]
Switch our Linux machine AMI to a newer image. (#18433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18433
ghimport-source-id:
1c92f98b091232c0045a2e1db75d19c1f258ac1f
Differential Revision:
D14748827
Pulled By: ezyang
fbshipit-source-id:
a459451058cf5560811403bafb96c6ff083d7e3a
Jerry Zhang [Wed, 3 Apr 2019 20:13:26 +0000 (13:13 -0700)]
QTensor (#18230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18230
Implement a minimal QTensor API to unblock other workstreams in quantization.
Changes:
- Added Quantizer which represents different quantization schemes
- Added qint8 as a data type for QTensor
- Added a new ScalarType QInt8
- Added QTensorImpl for QTensor
- Added the following user-facing APIs (see the usage sketch below):
- quantize_linear(scale, zero_point)
- dequantize()
- q_scale()
- q_zero_point()
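A hedged usage sketch using present-day names (the ops listed above were later renamed, e.g. `quantize_linear` became `quantize_per_tensor`; the signatures in this initial diff may differ):
```python
import torch

x = torch.randn(2, 3)

# Quantize to an 8-bit affine-quantized tensor, inspect its parameters, round-trip it.
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.qint8)
print(q.q_scale(), q.q_zero_point())
print(q.dequantize())  # back to a float tensor, up to quantization error
```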
Reviewed By: dzhulgakov
Differential Revision:
D14524641
fbshipit-source-id:
c1c0ae0978fb500d47cdb23fb15b747773429e6c
Dmytro Dzhulgakov [Wed, 3 Apr 2019 20:12:28 +0000 (13:12 -0700)]
Enforce import order to make protobuf cpp implementation in python work (#18560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18560
We have to import the Python protobuf here **before** we load the cpp extension.
Otherwise it breaks under certain build conditions if the cpp implementation of protobuf is used. Presumably there's some registry in the protobuf library and the Python side has to initialize the dictionary first, before static initialization in the Python extension does so. Otherwise, duplicated protobuf descriptors will be created, and that can lead to obscure errors like
Parameter to MergeFrom() must be instance of same class: expected caffe2.NetDef got caffe2.NetDef.
I think it also fixes https://github.com/facebookarchive/caffe2/issues/1573
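A hedged sketch of the ordering constraint (module paths are illustrative and from memory; the actual change lives in the caffe2 package's import logic):
```python
# Import the generated Python protobuf modules first, so their descriptors are
# registered on the Python side...
from caffe2.proto import caffe2_pb2  # noqa: F401

# ...and only then the module that loads the pybind11/C++ extension, whose static
# initialization would otherwise register duplicate descriptors.
from caffe2.python import core  # noqa: F401
```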
Reviewed By: ezyang, iroot900
Differential Revision:
D14622054
fbshipit-source-id:
2499eb88ecdee85ff8d845859048f7ae5da2a480
Lu Fang [Wed, 3 Apr 2019 20:08:21 +0000 (13:08 -0700)]
Pin onnx ir_version to 4 (#18768)
Summary:
To make test_operators.py more stable. In the future, we will bump this up manually, and I think that's acceptable, since ir_version shouldn't be bumped too often.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18768
Reviewed By: zrphercule
Differential Revision:
D14741514
Pulled By: houseroad
fbshipit-source-id:
0369dbc55424e345a113e49fc104a441ea290d58
Soumith Chintala [Wed, 3 Apr 2019 19:44:35 +0000 (12:44 -0700)]
fix nccl compilation to make sure it compiles for architectures that pytorch compiles for (#18739)
Summary:
resubmit of https://github.com/pytorch/pytorch/pull/18704 with additional fixes
Fixes https://github.com/pytorch/pytorch/issues/18359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18739
Differential Revision:
D14737274
Pulled By: soumith
fbshipit-source-id:
cfbbbf68b098594bd045861d1b2c085da693ea51
Soumith Chintala [Wed, 3 Apr 2019 19:27:19 +0000 (12:27 -0700)]
push magma init into lazyInitCUDA (#18527)
Summary:
Tries to fix C++ API's usage of MAGMA-based functions.
Attempts to fix https://github.com/pytorch/pytorch/issues/18074
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18527
Differential Revision:
D14691694
Pulled By: soumith
fbshipit-source-id:
dd04e74418e486d73ea4a92193ddf79352ed71ba
Jerry Zhang [Wed, 3 Apr 2019 19:01:57 +0000 (12:01 -0700)]
For some files that are touched by the QTensor diff (#18765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18765
att
Reviewed By: ZolotukhinM
Differential Revision:
D14733442
fbshipit-source-id:
525002034e6dccc2045da645e1193671fd0474b3
Wanchao Liang [Wed, 3 Apr 2019 18:18:05 +0000 (11:18 -0700)]
Fix contiguous AD and Autogradzero inconsistency (#18633)
Summary:
Fixes #17962
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18633
Differential Revision:
D14700449
Pulled By: wanchaol
fbshipit-source-id:
3d15d67c01b69b28394a0f2f001db90ed9fd31dc
Iurii Zdebskyi [Wed, 3 Apr 2019 17:53:11 +0000 (10:53 -0700)]
Added indexing for bool tensors and bool Indices (#18583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18583
ghimport-source-id:
2b1941449827f4ab632fa0f5c8cf0791a6be0845
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18583 Added indexing for bool tensors and bool Indices**
* #18505 Added numpy conversion
* #18166 Bool Tensor for CUDA
-----------
This PR enables bool tensor indexing and indexing with bool indices. This is a part of Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
a) CPU [Done]
b) CUDA [In review]
3. Tensor Conversions. [In review]
4. Tensor Indexing. [This PR]
5. Tensor Operations.
6. Back compatibility related changes.
TODO:
As a follow-up, we should move the nonzero method from TH to ATen to make the code cleaner.
Change:
```
v = torch.tensor([True, False, True], dtype=torch.bool)
boolIndices = torch.tensor([True, False, False], dtype=torch.bool)
v[boolIndices]
-> tensor([True], dtype=torch.bool)
v = torch.randn(5, 7, 3)
boolIndices = torch.tensor([True, False, True, True, False], dtype=torch.bool)
v[boolIndices]
->
tensor([[[ 0.5885, -0.3322, 0.7388],
[ 1.1182, 0.7808, -1.1492],
[-0.7952, 0.5255, -0.0251],
[ 0.7128, 0.8099, 1.2689],
[-0.7018, -1.4733, -0.3732],
[ 0.4503, 0.4986, -1.1605],
[ 0.3348, -1.3767, -0.2976]],
[[-2.0303, -0.4720, -0.1448],
[-0.1914, -0.6821, 2.0061],
[-1.0420, -0.1872, -0.3438],
[ 1.7587, -0.4183, -0.7577],
[ 1.0094, -0.1950, -0.2430],
[ 0.1174, 0.3308, -0.5700],
[ 0.1110, -0.2714, 1.3006]],
[[-0.1946, -1.4747, -0.4650],
[-1.0567, 1.0110, -0.2809],
[ 0.3729, -0.5699, 0.0815],
[-0.7733, -0.8316, 0.1674],
[ 1.2000, -0.3745, -1.1679],
[ 1.7105, 0.9851, -0.1907],
[-1.1077, 0.2086, -0.0548]]])
```
Differential Revision:
D14673403
fbshipit-source-id:
2b88ec2c7eb26a4f5ef64f8707fb68068d476fc9
Lu Fang [Wed, 3 Apr 2019 17:51:41 +0000 (10:51 -0700)]
add an assertion to check the param num (#18145)
Summary:
Introduce this check to see whether it will break any existing workflow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18145
Reviewed By: dzhulgakov
Differential Revision:
D14511711
Pulled By: houseroad
fbshipit-source-id:
a7bb6ac84c9133fe94d3fe2f1a8566faed14a136
Jiakai Liu [Wed, 3 Apr 2019 17:42:30 +0000 (10:42 -0700)]
add Android NDK param to CI docker build script (#18782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18782
ghimport-source-id:
6c4bde7dc835b59209c1d5f7b243f00c9fe99de2
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18782 [pytorch] add Android NDK param to CI docker build script**
Inspired by discussion: https://github.com/pytorch/pytorch/pull/16242
Reviewed By: dreiss
Differential Revision:
D14739471
fbshipit-source-id:
0a081045186cbf359eb3cdadee722741cd8cd62f
Gu, Jinghui [Wed, 3 Apr 2019 17:29:19 +0000 (10:29 -0700)]
Upgrade mkldnn-bridge for dnnlowp support (#16308)
Summary:
The mkldnn-bridge is upgraded in this PR to support DNNLOWP operators.
Meanwhile, APIs have been updated in caffe2 to use the latest version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16308
Differential Revision:
D14697018
Pulled By: yinghai
fbshipit-source-id:
ca952589098accb08295fd5aa92924c61e74d69c
Michael Kösel [Wed, 3 Apr 2019 17:11:33 +0000 (10:11 -0700)]
add 'abs' builtin
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18502
Differential Revision:
D14750173
Pulled By: eellison
fbshipit-source-id:
359cf08938ada442ca1a3b3ea14022ce10229499
kshitij12345 [Wed, 3 Apr 2019 16:16:29 +0000 (09:16 -0700)]
Fix dense Embedding to work with double backward (#9078)
Summary:
Fixes: #6469
1. `ATen/native/native_functions.yaml` had [dispatch](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/native_functions.yaml#L451-L455) variants for `embedding_dense_backward`, however `embedding_backward` explicitly made a [call](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/Embedding.cpp#L35-L45) to it, thus leading to an error.
2. In the case of a CUDA tensor, the function used to crash on dereferencing the indices' data [pointer](https://github.com/pytorch/pytorch/blob/03e7953a98875c0164cb8e2c19b45800e85f4347/aten/src/ATen/native/Embedding.cpp#L93).
Both have been fixed and checked (on CUDA and CPU) against:
1. As mentioned in the issue
```
import torch
class Test(torch.nn.Module):
def __init__(self):
super(Test,self).__init__()
self.embd = torch.nn.Embedding(1000, 100)
self.dense = torch.nn.Linear(100, 1)
def forward(self, inp):
inp = self.embd(inp)
return self.dense(inp)
test = Test()
inp = torch.tensor([0,1,2,1,1])
out = test(inp)
raw_loss = out.mean(dim=0)
loss_grad = torch.autograd.grad(outputs=raw_loss,
inputs=list(test.parameters()),
retain_graph=True, create_graph=True, only_inputs=True)
norm = sum([param.norm()**2 for param in loss_grad])
loss = raw_loss + norm
loss.backward(retain_graph=True)
print(test.embd.weight.grad)
```
2. Test Script
```
import torch
import time
start = time.time()
l = [1,1]*100
input = torch.tensor([[1,0],[1,0]],device='cpu')
embedding_matrix = torch.tensor([[1.0,3.0],[2.0,4]],requires_grad=True,device='cpu')
sq = embedding_matrix * embedding_matrix
emb = torch.nn.functional.embedding(input, sq,scale_grad_by_freq=False)
print('Embedding Matrix')
print(embedding_matrix)
print('-----------------')
sum_ = emb.sum()#prod.sum()
loss_grad, = torch.autograd.grad(outputs=sum_,inputs=embedding_matrix,create_graph=True)
print('Gradient')
print(loss_grad)
print('-----------------')
sum2_ = sum_ + loss_grad.sum()
print(sum2_)
sum2_.backward()
print(embedding_matrix.grad)
print(time.time() - start)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9078
Reviewed By: ezyang
Differential Revision:
D14691901
Pulled By: soumith
fbshipit-source-id:
78e2612ba39080be564c876311671eb5a0119a0f
Shen Li [Wed, 3 Apr 2019 16:06:09 +0000 (09:06 -0700)]
Highlight NCCL all_reduce and all_gather requirements (#18741)
Summary:
See #18689
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18741
Differential Revision:
D14726874
Pulled By: mrshenli
fbshipit-source-id:
a92404c653e3c62fc23fa3ccacfb3b2959b2e307
svcscm [Wed, 3 Apr 2019 15:25:14 +0000 (08:25 -0700)]
Updating submodules
Reviewed By: zpao
fbshipit-source-id:
ea0b06ce68d3fd6092eaea7c835a8b51c1120ea0
peter [Wed, 3 Apr 2019 15:19:45 +0000 (08:19 -0700)]
Make it possible for users for select /Zi or /ZI over /Z7 when using MSVC (#18790)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/18701.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18790
Differential Revision:
D14748195
Pulled By: ezyang
fbshipit-source-id:
e50df1b5ca199a88d7b5ea3ea45d25d23cd31a27
Jongsoo Park [Wed, 3 Apr 2019 14:55:02 +0000 (07:55 -0700)]
use optimization in D14020675 (#16945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16945
As title
Reviewed By: jianyuh
Differential Revision:
D14020769
fbshipit-source-id:
fc0f05fcc57bfe9b4aa0c5750060d7b2ba57dd7a
Gregory Chanan [Wed, 3 Apr 2019 14:52:54 +0000 (07:52 -0700)]
Add device and dtype to storage. (#18749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18749
ghimport-source-id:
9026a037f5e11cdb9ccd386f4b6b5768b9c3259b
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18751 Disallow changing the device of a tensor via set_.
* #18750 Use non-legacy constructors for tensor deserialization.
* **#18749 Add device and dtype to storage.**
The goal here is to fix our serialization, which currently depends on the legacy constructors. Having dtype and device on Storage allows us to use the non-legacy constructors.
This fits somewhat with our goal of removing Storage, by having Storage act like a Tensor.
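For orientation, a minimal sketch of the legacy vs. non-legacy construction paths referred to above; these are the public Python constructors, not the internal Storage changes made in this patch:
```
import torch

# legacy, type-encoded constructor (the path deserialization used to depend on)
legacy = torch.FloatTensor(3)

# non-legacy constructor: dtype and device are passed explicitly, which is what
# carrying dtype/device on Storage enables internally
modern = torch.empty(3, dtype=torch.float32, device="cpu")
```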
Differential Revision:
D14729516
fbshipit-source-id:
bf4a3e8669ad4859931f4a3fa56df605cbc08dcb
Gregory Chanan [Wed, 3 Apr 2019 14:51:15 +0000 (07:51 -0700)]
Use non-legacy constructors for tensor deserialization. (#18750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18750
ghimport-source-id:
f1475cfb67841c41d9867d4429ba9125d5c7dd07
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18751 Disallow changing the device of a tensor via set_.
* **#18750 Use non-legacy constructors for tensor deserialization.**
* #18749 Add device and dtype to storage.
Deserialization currently uses legacy constructors. This is bad because we need to maintain them, but there is a more immediate problem:
1) We are trying to implement device caching on TensorImpl to get rid of a virtual dispatch.
2) That doesn't work if the device of the Tensor underlying a Variable can be changed.
3) Deserialization currently does exactly that.
So the plan is to change deserialization, then enforce that we don't change the device out from underneath a Variable.
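A minimal sketch of the user-facing serialization round trip this code path serves (plain `torch.save`/`torch.load` shown only for orientation; the patch itself changes the internal constructors used during load):
```
import io
import torch

t = torch.randn(2, 3, dtype=torch.float64)
buf = io.BytesIO()
torch.save(t, buf)

buf.seek(0)
# deserialization must reconstruct the tensor with the right dtype and device
loaded = torch.load(buf, map_location="cpu")
assert loaded.dtype == torch.float64
```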
Differential Revision:
D14729513
fbshipit-source-id:
090d6cdb375b94dc1bf4f554b2df243952b8cdc6
Iurii Zdebskyi [Wed, 3 Apr 2019 14:22:38 +0000 (07:22 -0700)]
Added numpy conversion (#18505)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18505
ghimport-source-id:
f3c9b9251e5793f9e192f587194ddfebb45facc1
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18505 [WIP]Added numpy conversion**
* #18166 Bool Tensor for CUDA
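A minimal sketch of the conversion this adds, assuming it covers `Tensor.numpy()` and `torch.from_numpy` for bool dtypes (illustrative):
```
import numpy as np
import torch

t = torch.tensor([True, False, True])
a = t.numpy()                                      # numpy array with dtype=bool
back = torch.from_numpy(np.array([True, False]))   # tensor with dtype=torch.bool
print(a.dtype, back.dtype)
```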
Differential Revision:
D14646403
fbshipit-source-id:
79d39d692c778ce1981c1d35b1c33e3d93111041
Gregory Chanan [Wed, 3 Apr 2019 14:05:16 +0000 (07:05 -0700)]
Remove THTensor_(newUnfold). (#18773)
Summary:
It's not used and unfold's use of `device_guard: False` is scary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18773
Differential Revision:
D14736526
Pulled By: gchanan
fbshipit-source-id:
6281a284bee45fa5038783e4c1ed4d1ed7ca81ab
mingzhe0908 [Wed, 3 Apr 2019 05:49:49 +0000 (22:49 -0700)]
temp fix for flake8 error (#18788)
Summary:
Fix lint error
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18788
Reviewed By: houseroad
Differential Revision:
D14741840
Pulled By: mingzhe09088
fbshipit-source-id:
1fa630e3c6e606e3d78fe8293e5b0e7ea1b78da3
Igor Fedan [Wed, 3 Apr 2019 04:10:22 +0000 (21:10 -0700)]
Fix flake8 issues
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18762
Reviewed By: houseroad
Differential Revision:
D14734152
Pulled By: ifedan
fbshipit-source-id:
5adf123f88273895ad34ee9041896358d686de08
Jerry Zhang [Wed, 3 Apr 2019 03:54:28 +0000 (20:54 -0700)]
Change ReinitializeTensor to use C10_LOG_FIRST_N (#18531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18531
Currently we use C10_LOG_EVERY_MS to log the data type change, but it pollutes the logs of some services;
we would like to change it to C10_LOG_FIRST_N to prevent that.
Reviewed By: dzhulgakov
Differential Revision:
D14647704
fbshipit-source-id:
b84e4002bd4aa94d616133cd1049c3d4ab05386e
Yinghai Lu [Wed, 3 Apr 2019 03:52:58 +0000 (20:52 -0700)]
Add support for getting TensorProto argument (#18364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18364
As titled.
Reviewed By: bddppq
Differential Revision:
D14584784
fbshipit-source-id:
03f9207d5cf4f7f4b812428a931edbcdcb21ca8d
Michael Suo [Wed, 3 Apr 2019 01:06:07 +0000 (18:06 -0700)]
make test module hook use save/load (#18284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18284
ghimport-source-id:
5a92c03fda19072ffb6afd40e0f56806716c7be6
Stack from [ghstack](https://github.com/ezyang/ghstack):
* #18296 [jit] Add namespacing for ScriptClasses
* **#18284 [jit] make test module hook use save/load**
* #18211 [jit] Turn script_type_parser into a class
* #18148 [jit] python interop for script classes
Instead of python-printing and comparing strings (which does not capture dependency information, etc.), use save/load on in-memory buffers and compare the main module contents inside the buffer.
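A minimal sketch of the save/load-on-a-buffer pattern described above, using the public `torch.jit` API rather than the internal test hook:
```
import io
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

m = torch.jit.script(M())

# round-trip through an in-memory buffer instead of comparing python-printed source
buf = io.BytesIO()
torch.jit.save(m, buf)
buf.seek(0)
reloaded = torch.jit.load(buf)
print(reloaded(torch.ones(2)))
```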
Reviewed By: ailzhang
Differential Revision:
D14581129
fbshipit-source-id:
52264ae9ce076775ab3fd1a0c32c8d6f6677a903
Zachary DeVito [Wed, 3 Apr 2019 00:33:06 +0000 (17:33 -0700)]
Add ability to specialize class types to ArgumentSpec (#18314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18314
ghimport-source-id:
8cecb768d476ab19c9460f39c8f94a764e4cb052
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18314 Add ability to specialize class types to ArgumentSpec**
* #18226 Add Slot type to abstract the raw pointers being used for slots.
Differential Revision:
D14574395
fbshipit-source-id:
cc3af6e56e9ae52990f4a1ad56ecceaa2d493577
Mingzhe Li [Wed, 3 Apr 2019 00:03:23 +0000 (17:03 -0700)]
Operator-level performance microbenchmarks (#18740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18740
Test utilities for writing Caffe2/PyTorch performance microbenchmarks. Brief description of the file structure:
* benchmark_core.py: core utilities for running microbenchmark tests
* benchmark_caffe2.py: Caffe2-specific benchmark utilities
* benchmark_pytorch.py: PyTorch-specific benchmark utilities
* benchmark_runner.py: main function. Currently it can run the microbenchmark tests in a stand-alone mode. The next step is to integrate this with AI-PEP.
The utilities are located at https://github.com/pytorch/pytorch/tree/master/test so they have access to both the Caffe2 and PyTorch Python frontends.
Includes two operator microbenchmarks that support both Caffe2 and PyTorch:
* MatMul
* Add
Reference: PyTorch benchmarks: https://github.com/pytorch/benchmark/tree/master/timing/python. In this work, we start with two example binary operators, MatMul and Add, but eventually we should also cover unary operators as in the PyTorch benchmark repo.
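As a rough standalone stand-in (not the new benchmark_runner API, whose interface is not shown here), the same two ops can be timed with plain `timeit`:
```
import timeit
import torch

a, b = torch.randn(64, 64), torch.randn(64, 64)

# time 1000 iterations of each op on the PyTorch eager frontend
matmul_s = timeit.timeit(lambda: torch.matmul(a, b), number=1000)
add_s = timeit.timeit(lambda: torch.add(a, b), number=1000)
print(f"matmul: {matmul_s:.4f}s  add: {add_s:.4f}s for 1000 iterations")
```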
Reviewed By: zheng-xq
Differential Revision:
D13887111
fbshipit-source-id:
b7a56b95448c9ec3e674b0de0ffb96af4439bfce
Iurii Zdebskyi [Tue, 2 Apr 2019 23:10:43 +0000 (16:10 -0700)]
Bool Tensor for CUDA (#18166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18166
ghimport-source-id:
a8e2ba2d966e49747a55701c4f6863c5e24d6f14
Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#18166 Bool Tensor for CUDA**
* #18165 Resolved comments from Bool Tensor for CPU PR
------
This PR enables bool tensor creation and some basic operations for the CUDA backend. This is part of the Bool Tensor feature implementation work. The whole plan looks like this:
1. Storage Implementation [Done]
2. Tensor Creation.
a) CPU [Done]
b) CUDA [This PR]
3. Tensor Conversions.
4. Tensor Indexing.
5. Tensor Operations.
6. Back compatibility related changes.
Change:
Enable bool tensor in CUDA with the following operations:
torch.zeros
torch.tensor
torch.ones
torch.rand/rand_like/randint/randint_like
torch.full
torch.full_like
torch.empty
torch.empty_like
Tested via unit tests and local scripts.
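A minimal sketch of the newly enabled creation ops, assuming a CUDA device is available:
```
import torch

z = torch.zeros(2, 3, dtype=torch.bool, device="cuda")
o = torch.ones(2, 3, dtype=torch.bool, device="cuda")
f = torch.full((2, 3), True, dtype=torch.bool, device="cuda")
t = torch.tensor([True, False], device="cuda")
print(z.dtype, o.device, f.any().item(), t)
```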
Differential Revision:
D14605104
fbshipit-source-id:
b7d7340a7d70edd03a109222d271e68becba762c