python-torch/python-torch.spec

425 lines
14 KiB
RPMSpec
Raw Normal View History

#
# spec file for package python-torch
#
# Copyright (c) 2024 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
# Please submit bugfixes or comments via https://bugs.opensuse.org/
#
%define srcname pytorch
%define pname torch
%global flavor @BUILD_FLAVOR@%{nil}
%if "%{flavor}" == "standard"
%bcond_with cuda
%endif
%if "%{flavor}" == "cuda-10-2"
%bcond_without cuda
%define cudaver 10-2
%endif
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%bcond_with mpi
%if "%{flavor}" == "openmpi4"
%bcond_without mpi
%bcond_without openmpi4
%global mpi_flavor openmpi
%define mpi_ext 4
%endif
%if "%{flavor}" == "vulkan"
%bcond_without vulkan
%global pkg_suffix -vulkan
%endif
%if %{with mpi}
%global pkg_suffix %{?mpi_flavor:-%{mpi_flavor}%{?mpi_ext}}
%define pkg_prefix %{_libdir}/mpi/gcc/%{mpi_flavor}%{?mpi_ext}
%define pkg_bindir %{pkg_prefix}/bin/
%define pkg_libdir %{pkg_prefix}/%{_lib}/
%define pkg_incdir %{pkg_prefix}/include/
%define pkg_datadir %{pkg_prefix}/share/
%define pkg_sysconfdir %{pkg_prefix}/etc/
%define pkg_skeldir %{pkg_prefix}/etc/skel/
%define package_name %{pname}%{?pkg_suffix}
%endif
%define FP16_version 4dfe081
%define FXdiv_version b408327
%define QNNPACK_version 7d2a4e9
%define XNNPACK_version fcbf55a
%define cpuinfo_version d6860c4
%define flatbuffers 01834de
%define foxi_version c278588
%define fmt_version e69e5f9
%define gemmlowp_version 3fb5c
%define gloo_version 5354032
%define kineto 3f30237
%define libnop 910b558
%define onnx_version 990217f
%define pocketfft 9d3ab05
%define psimd_version 072586a
%define pthreadpool_version 4fe0e1e
%define pybind11_version 3e9dfa2
%define sleef_version e0a003e
%define tensorpipe 52791a2
Name: python-torch%{?pkg_suffix}
Version: 2.3.1
Release: 0
Summary: Deep learning framework aka pytorch/Caffe2
License: Apache-2.0 AND BSD-2-Clause AND BSD-3-Clause AND MIT AND Zlib AND BSL-1.0
Group: Development/Languages/Python
URL: https://pytorch.org
Source0: https://github.com/pytorch/pytorch/archive/v%{version}.tar.gz#/%{srcname}-%{version}.tar.gz
Source1: releases.html
# License10: BSD-3-Clause
Source10: https://github.com/facebookincubator/gloo/archive/%{gloo_version}.tar.gz#/gloo-%{gloo_version}.tar.gz
# License12: BSD-2-Clause
Source12: https://github.com/pytorch/cpuinfo/archive/%{cpuinfo_version}.tar.gz#/cpuinfo-%{cpuinfo_version}.tar.gz
# License13: BSL-1.0
Source13: https://github.com/zdevito/sleef/archive/%{sleef_version}.tar.gz#/sleef-%{sleef_version}.tar.gz
# License14: BSD-3-Clause
Source14: https://github.com/pybind/pybind11/archive/%{pybind11_version}.tar.gz#/pybind11-%{pybind11_version}.tar.gz
# License15: MIT
Source15: https://github.com/onnx/onnx/archive/%{onnx_version}.tar.gz#/onnx-%{onnx_version}.tar.gz
#License16: BSD-2-Clause
Source16: https://github.com/Maratyszcza/pthreadpool/archive/%{pthreadpool_version}.tar.gz#/pthreadpool-%{pthreadpool_version}.tar.gz
# License17: MIT
Source17: https://github.com/Maratyszcza/FXdiv/archive/%{FXdiv_version}.tar.gz#/FXdiv-%{FXdiv_version}.tar.gz
# License18: MIT
Source18: https://github.com/Maratyszcza/psimd/archive/%{psimd_version}.tar.gz#/psimd-%{psimd_version}.tar.gz
# License19: MIT
Source19: https://github.com/Maratyszcza/FP16/archive/%{FP16_version}.tar.gz#/FP16-%{FP16_version}.tar.gz
# License20: Apache-2.0
Source20: https://github.com/google/gemmlowp/archive/%{gemmlowp_version}.tar.gz#/gemmlowp-%{gemmlowp_version}.tar.gz
# License21: MIT
Source21: https://github.com/houseroad/foxi/archive/%{foxi_version}.tar.gz#/foxi-%{foxi_version}.tar.gz
# License22: MIT
Source22: https://github.com/pytorch/QNNPACK/archive/%{QNNPACK_version}.tar.gz#/QNNPACK-%{QNNPACK_version}.tar.gz
# License23: BSD-3-Clause
Source23: https://github.com/google/XNNPACK/archive/%{XNNPACK_version}.tar.gz#/XNNPACK-%{XNNPACK_version}.tar.gz
# License 25: MIT
Source25: https://github.com/fmtlib/fmt/archive/%{fmt_version}.tar.gz#/fmt-%{fmt_version}.tar.gz
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
# License 26: BSD-3-Clause
Source26: https://github.com/mreineck/pocketfft/archive/%{pocketfft}.tar.gz#/pocketfft-%{pocketfft}.tar.gz
# License 27: BSD-3-Clause
Source27: https://github.com/pytorch/kineto/archive/%{kineto}.tar.gz#/kineto-%{kineto}.tar.gz
# License 28: Apache-2.0
Source28: https://github.com/google/flatbuffers/archive/%{flatbuffers}.tar.gz#/flatbuffers-%{flatbuffers}.tar.gz
# License 29: BSD-3-Clause
Source29: https://github.com/pytorch/tensorpipe/archive/%{tensorpipe}.tar.gz#/tensorpipe-%{tensorpipe}.tar.gz
# License 30: Apache-2.0
Source30: https://github.com/google/libnop/archive/%{libnop}.tar.gz#/libnop-%{libnop}.tar.gz
Patch1: skip-third-party-check.patch
Patch2: fix-setup.patch
# A python call to cmake fails with a return code of 1 on this arch, disable it for now.
# and 32-bit arm is not supported
ExcludeArch: %ix86 %{arm}
BuildRequires: %{python_module Gloo}
%ifarch x86_64
BuildRequires: %{python_module PeachPy}
%endif
BuildRequires: %{python_module PyYAML}
BuildRequires: %{python_module devel}
BuildRequires: %{python_module hypothesis}
BuildRequires: %{python_module numpy-devel}
BuildRequires: %{python_module opcodes}
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
BuildRequires: %{python_module pip}
BuildRequires: %{python_module protobuf}
BuildRequires: %{python_module psutil}
BuildRequires: %{python_module py-cpuinfo}
BuildRequires: %{python_module setuptools}
BuildRequires: %{python_module typing_extensions}
BuildRequires: %{python_module typing}
%if 0%{?suse_version} <= 1500
# Python 3.6 still need dataclasses
BuildRequires: %{python_module dataclasses}
%ifarch aarch64
# XNNPACK uses +dotprod modifier which requires GCC8+
BuildRequires: gcc8
BuildRequires: gcc8-c++
%endif
%endif
BuildRequires: cmake >= 3.5
BuildRequires: eigen3-devel
BuildRequires: fdupes
BuildRequires: gcc-c++
BuildRequires: glog-devel
BuildRequires: gtest
BuildRequires: leveldb-devel
BuildRequires: libnuma-devel
BuildRequires: libopenblas_pthreads-devel
BuildRequires: libuv-devel
BuildRequires: lmdb-devel
BuildRequires: ninja
BuildRequires: openblas-devel
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
BuildRequires: opencv-devel
BuildRequires: openssl-devel
BuildRequires: protobuf-c
BuildRequires: protobuf-devel
BuildRequires: python-rpm-macros
BuildRequires: snappy-devel
%if %{with cuda}
BuildRequires: cuda-compiler-%cudaver
BuildRequires: cuda-cudart-dev-%cudaver
BuildRequires: cuda-libraries-dev-%cudaver
BuildRequires: cuda-misc-headers-%cudaver
BuildRequires: cuda-nsight-%cudaver
BuildRequires: cuda-toolkit-%cudaver
%if 0%{?suse_version} > 1500
BuildRequires: gcc7
BuildRequires: gcc7-c++
%endif
BuildRequires: libcudnn7-devel
BuildRequires: libnccl-devel
%endif
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%if %{with openmpi4}
BuildRequires: openmpi4-devel
Conflicts: %{python_module torch}
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%endif
%if %{with vulkan}
BuildRequires: VulkanMemoryAllocator-devel
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
BuildRequires: shaderc
BuildRequires: vulkan-devel
Conflicts: %{python_module torch}
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%endif
Requires: python-numpy
Requires: python-protobuf
Requires: python-six
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
Requires: python-typing_extensions
Provides: python-caffe2%{?pkg_suffix} = %version
Provides: python-pytorch%{?pkg_suffix} = %version
%if "%flavor" == ""
ExclusiveArch: do_not_build
%endif
%python_subpackages
%description
PyTorch enables fast, flexible experimentation and efficient production through
a hybrid front-end, distributed training, and ecosystem of tools and libraries.
The library is developed by Facebook and other groups.
PyTorch provides two high-level features:
* Tensor computing (like NumPy) with strong acceleration via graphics
* processing units (GPU) Deep neural networks built on a tape-based autodiff
system
%package devel
Summary: Headers for C/C++, cmake build description and libraries needed for development
Group: Development/Languages/Python
Requires: python-torch = %{version}
%description devel
Although the Python interface is more polished and the primary focus of
development, PyTorch also has a C++ frontend. This package contains the header
to access the C/C++ interface.
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%package converters
Summary: Converters for onnx and caffe2
Group: Development/Languages/Python
BuildArch: noarch
Requires: python3-click
Requires: python3-onnx
Requires: python3-pip
Requires: python3-pname
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%description converters
Converter from caffe2 to onnx and from caffe2 to onnx formated files.
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%package examples
Summary: Examples which can be used for testing
Group: Development/Languages/Python
BuildArch: noarch
Recommends: python3-lmdb
Recommends: python3-networkx
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%description examples
This example files can be used to start an own pytorch/caffe2 project.
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%package -n libtorch%{?pkg_suffix}
Summary: Library which used by %{name}
Group: Development/Libraries/Python
%if %{with openmpi4}
Conflicts: libtorch
%endif
%if %{with vulkan}
Conflicts: libtorch
%endif
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%description -n libtorch%{?pkg_suffix}
Library which is used by %{name}
%prep
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%define make_depend_src() test -e $(basename %1| sed 's/-.*//') && rmdir %{?2}%{!?2:$(basename %1| sed 's/-.*//')}; tar xzf %1; mv $(basename %{1} | sed 's/\.tar\.gz//' )* %{?2}%{!?2:$(basename %1| sed 's/-.*//')}
%define make_depend_src_uppercase() rmdir -p $(basename %1| sed 's/-.*//'| tr '[:upper:]' '[:lower:]'); tar xzf %1; mv $(basename %1 | cut -f 1 -d '.' ) $(basename %1| sed 's/-.*//'| tr '[:upper:]' '[:lower:]')
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%autosetup -p1 -n %{srcname}-%{version}
cp %{S:1} releases.html
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%if %{with vulkan}
sed -i '/-Werror=return-type/d' CMakeLists.txt
%endif
cd third_party
rmdir python-peachpy/
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
rmdir eigen/
%make_depend_src %{SOURCE10}
%make_depend_src %{SOURCE12}
%make_depend_src %{SOURCE13}
%make_depend_src %{SOURCE14}
%make_depend_src %{SOURCE15}
%make_depend_src %{SOURCE16}
%make_depend_src %{SOURCE17}
%make_depend_src %{SOURCE18}
%make_depend_src %{SOURCE19}
%make_depend_src %{SOURCE20} gemmlowp/gemmlowp
%make_depend_src %{SOURCE21}
%make_depend_src %{SOURCE22}
%make_depend_src %{SOURCE23}
%make_depend_src %{SOURCE25}
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%make_depend_src %{SOURCE26}
%make_depend_src %{SOURCE27}
%make_depend_src %{SOURCE28}
%make_depend_src %{SOURCE29}
# getting the vendoring of the vendored source working, this is
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
# insanity at the next level. My onlu exuse is that libnop is header only
rmdir tensorpipe/third_party/libnop
%make_depend_src %{SOURCE30} tensorpipe/third_party/libnop
%build
%define buildvars \
export USE_NNPACK=OFF \
%if %{with cuda} \
export USE_CUDA=ON \
export USE_CUDNN=ON \
export USE_SYSTEM_NCCL=ON \
export PATH="/usr/local/cuda-10.1/bin:$PATH" \
export CPLUS_INCLUDE_PATH="/usr/local/cuda-10.1/include" \
export C_INCLUDE_PATH="/usr/local/cuda-10.1/include" \
export LD_LIBRARY_PATH="/usr/local/cuda-10.1/lib" \
export NCCL_INCLUDE_DIR="/usr/include/" \
%if 0%{?suse_version} > 1500 \
export CC=gcc-7 \
export CXX=g++-7 \
%endif \
%else \
%if 0%{?suse_version} <= 1500 \
%ifarch aarch64 \
export CC=gcc-8 \
export CXX=g++-8 \
%endif \
%endif \
export USE_CUDA=OFF \
export USE_CUDNN=OFF \
%endif \
export USE_KINETO=OFF \
export USE_LEVELDB=ON \
export USE_LMDB=ON \
export USE_FBGEMM=OFF \
export USE_SYSTEM_BENCHMARK=ON \
export USE_SYSTEM_EIGEN_INSTALL=ON \
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
export USE_OPENCV=ON \
export USE_TBB=OFF \
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
export USE_MKLDNN=OFF \
export USE_KINETO=OFF \
export USE_DISTRIBUTED=ON \
export TP_BUILD_LIBUV=OFF \
export USE_NCCL=OFF \
%if %{with mpi} \
export USE_MPI=ON \
export MPIEXEC_EXECUTABLE="%{pkg_bindir}/mpiexec" \
%else \
export USE_GLOO=ON \
%endif \
export BLAS=OpenBLAS \
export BUILD_CUSTOM_PROTOBUF=OFF \
export BUILD_TEST=OFF \
export MAX_JOBS=%{?jobs} \
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%if %{with vulkan} \
export USE_VULKAN=ON \
export CXXFLAGS="-I /usr/ -Wno-error=return-type" \
%endif \
%buildvars
%python_build
%install
%buildvars
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%python_install -q
%python_expand %fdupes %{buildroot}%{$python_sitearch}
install -m 755 -D caffe2/python/examples/* -t %{buildroot}%{_docdir}/%{name}/
install -m 644 -D %{buildroot}%{python_sitearch}/torch/lib/* %{buildroot}/%{_libdir}
#rm -r %{buildroot}%{python_sitearch}/torch/lib
#cd %{buildroot}/%{_libdir}
#rm libtorch.so
#ln -s libtorch.so.1 libtorch.so
#cd -
#for file in $(find %{buildroot}%{python_sitearch} -type f -name \*.py -perm 644 -size +1b); do
#%{__grep} '/usr/bin/env ' $file && sed -i 's@/usr/bin/env python@/usr/bin/python@' $file && chmod 755 $file
#done
#
#%check
#export LD_LIBRARY_PATH=%{buildroot}/%{_libdir}
#%%python_expand PYTHONPATH=%{buildroot}%{$python_sitearch} $python test/run_test.py
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%python_clone -a %{buildroot}%{_bindir}/torchrun
%python_clone -a %{buildroot}%{_bindir}/convert-caffe2-to-onnx
%python_clone -a %{buildroot}%{_bindir}/convert-onnx-to-caffe2
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%post -n libtorch%{?pkg_suffix} -p /sbin/ldconfig
%postun -n libtorch%{?pkg_suffix} -p /sbin/ldconfig
%files %{python_files}
%defattr(-,root,root)
%doc README.md NOTICE releases.html
%license LICENSE
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%{python_sitearch}/torch*
%{python_sitearch}/torchgen/
%{python_sitearch}/functorch/
%exclude %{python_sitearch}/torch/share
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%exclude %{python_sitearch}/torch/include
%exclude %{python_sitearch}/torch/_inductor/codegen
%exclude %{python_sitearch}/torch/utils/benchmark/utils/
%exclude %{python_sitearch}/torchgen/packaged/ATen/templates
%exclude %{python_sitearch}/torchgen/packaged/autograd/templates
%python_alternative %{_bindir}/torchrun
%files %{python_files devel}
%{python_sitearch}/torch/share
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%{python_sitearch}/torch/include
%{python_sitearch}/torch/_inductor/codegen
%{python_sitearch}/torch/utils/benchmark/utils/
%{python_sitearch}/torchgen/packaged/ATen/templates
%{python_sitearch}/torchgen/packaged/autograd/templates
%files %{python_files converters}
%python_alternative %{_bindir}/convert-caffe2-to-onnx
%python_alternative %{_bindir}/convert-onnx-to-caffe2
%files %{python_files examples}
%{_docdir}/%{name}
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%exclude %{_docdir}/%{name}/NOTICE
%exclude %{_docdir}/%{name}/README.md
%exclude %{_docdir}/%{name}/releases.html
- update to 2.3.1 with following summarized highlights: * from 2.0.x: - torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SPDA). The API is integrated with torch.compile() and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operato * from 2.1.x: - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API. - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization. * from 2.2.x: - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-python server-side deployments. * from 2.3.x: - support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance complications or graph breaks. As well, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
2024-07-19 14:15:19 +02:00
%files -n libtorch%{?pkg_suffix}
%{_libdir}/*.so*
%changelog