62aba2ab6b  Guillaume GARDET  2024-08-31 09:11:26 +0000
- Enable sle15_python_module_pythons.
- GCC 9.3 or newer is required, regardless of whether CUDA is enabled
  (see https://github.com/pytorch/pytorch/blob/v2.3.1/CMakeLists.txt#L48).
  Therefore, for SLE15 we went with GCC 11, as it seems to be the most common one.
- Use the %gcc_version macro for Tumbleweed.
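As a side note to the GCC bump above: the compiler an installed PyTorch build was produced with can be inspected at runtime. The sketch below is illustrative and not part of the package; torch.__config__.show() does return the build configuration, but the regex used to pull a GCC version out of that free-form text is an assumption of this sketch.

    import re
    import torch

    # Dump the build configuration reported by the installed PyTorch.
    config = torch.__config__.show()
    print(config)

    # Best-effort extraction of the GCC version from the free-form build info.
    # The exact wording of this text is not guaranteed, so treat this as a heuristic.
    match = re.search(r"GCC\s+([\d.]+)", config)
    if match:
        version = match.group(1)
        print(f"Built with GCC {version}")
        # PyTorch 2.3.x declares GCC >= 9.3 as a hard build requirement.
        assert tuple(int(p) for p in version.split(".")[:2]) >= (9, 3)
    else:
        print("Could not determine the compiler from torch.__config__.show()")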
c138972860  Dominique Leuenberger  2024-07-25 13:38:58 +0000
- Accepting request 1189413 from science:machinelearning
9c8ce17a59  Christian Goll  2024-07-19 12:15:19 +0000
- update to 2.3.1 with the following summarized highlights:
  * from 2.0.x:
    - torch.compile is the main API for PyTorch 2.0, which wraps your model and
      returns a compiled model. It is a fully additive (and optional) feature, and
      hence 2.0 is 100% backward compatible by definition.
    - Accelerated Transformers introduce high-performance support for training and
      inference using a custom kernel architecture for scaled dot product attention
      (SDPA). The API is integrated with torch.compile(), and model developers may
      also use the scaled dot product attention kernels directly by calling the new
      scaled_dot_product_attention() operator.
  * from 2.1.x:
    - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint
      for saving/loading distributed training jobs on multiple ranks in parallel, and
      torch.compile support for the NumPy API.
    - In addition, this release offers numerous performance improvements (e.g. CPU
      inductor improvements, AVX512 support, scaled-dot-product-attention support) as
      well as a prototype release of torch.export, a sound full-graph capture
      mechanism, and torch.export-based quantization.
  * from 2.2.x:
    - 2x performance improvements to scaled_dot_product_attention via
      FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time
      compilation and deployment tool built for non-Python server-side deployments.
  * from 2.3.x:
    - support for user-defined Triton kernels in torch.compile, allowing users to
      migrate their own Triton kernels from eager mode without experiencing
      performance complications or graph breaks. As well, Tensor Parallelism improves
      the experience for training Large Language Models using native PyTorch
      functions, which has been validated on training.
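To make the 2.x highlights above more concrete, here is a minimal Python sketch of torch.compile() wrapping a module that calls scaled_dot_product_attention() directly. The TinyAttention module, its dimensions, and the dummy input are illustrative assumptions, not code from the package or from upstream.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyAttention(nn.Module):
        """Single-head attention block using the fused SDPA operator directly."""

        def __init__(self, dim: int):
            super().__init__()
            self.qkv = nn.Linear(dim, 3 * dim)
            self.out = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, dim)
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # Dispatches to FlashAttention / memory-efficient kernels where available.
            y = F.scaled_dot_product_attention(q, k, v)
            return self.out(y)

    model = TinyAttention(dim=64)

    # torch.compile() is additive: the compiled module is a drop-in replacement for
    # the eager one, so simply not calling it keeps the code on the eager path.
    compiled = torch.compile(model)

    x = torch.randn(2, 16, 64)
    with torch.no_grad():
        print(compiled(x).shape)  # torch.Size([2, 16, 64])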
aed2b2a069  Christian Goll  2020-02-25 12:33:19 +0000
- updated the requirement for examples and converters
0a90b0a626  Christian Goll  2020-02-21 15:50:33 +0000
- updated to stable release 1.4.0, which has as Highlights:
  * Distributed Model Parallel Training
  * Pruning functionalities have been added to PyTorch
- New Features:
  * torch.optim.lr_scheduler now supports “chaining”
  * torch.distributed.rpc is a newly introduced package
- full Changelog listed in the releases file or under
  https://github.com/pytorch/pytorch/releases
- added files:
  * skip-third-party-check.patch, which is a patch to skip the check of disabled
    dependencies
  * QNNPACK-7d2a4e9931a82adc3814275b6219a03e24e36b4c.tar.gz, which is part of pytorch
    but developed in a different repo
- removed patch files:
  * fix-build-options.patch
  * honor-PSIMD-env.patch
  * removed-some-tests.patch
- Requires python-PeachPy on x86_64 only, as it is optional and available on x86_64
  only
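A minimal sketch of the two 1.4.0 features named above, scheduler “chaining” and the pruning utilities; the model, optimizer settings, and pruning amount are arbitrary illustrative choices, not taken from the package.

    import torch
    from torch import nn, optim
    from torch.nn.utils import prune

    model = nn.Linear(10, 2)
    opt = optim.SGD(model.parameters(), lr=0.1)

    # Scheduler "chaining": each scheduler's step() is applied on top of the
    # learning rate left behind by the previous one in the same epoch.
    mult = optim.lr_scheduler.MultiplicativeLR(opt, lr_lambda=lambda epoch: 0.95)
    step = optim.lr_scheduler.StepLR(opt, step_size=5, gamma=0.5)

    for epoch in range(10):
        opt.step()          # a real loop would compute a loss and backprop first
        mult.step()
        step.step()

    # Pruning: zero out the 30% of weights with the smallest L1 magnitude.
    prune.l1_unstructured(model, name="weight", amount=0.3)
    print(model.weight)                  # masked view of the original weights
    print(dict(model.named_buffers()))   # now contains 'weight_mask'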
3756bcc792  Christian Goll  2020-01-14 14:16:39 +0000
- updated the requirement for examples and converters
10e2e125b6  Christian Goll  2020-01-14 14:16:11 +0000
- Requires python-PeachPy on x86_64 only, as it is optional and available on x86_64
  only
ba0a6629a5  Christian Goll  2019-08-28 11:58:32 +0000
- moved libraries to a separate package; unfortunately they have no version, so these
  are plain .so files
- THNN.h and THCUNN.h are interpreted by Python and so cannot be part of the devel
  package