Christian Goll
9c8ce17a59
* from 2.0.x:
  - torch.compile is the main API for PyTorch 2.0; it wraps your model and returns a compiled model. It is a fully additive (and optional) feature, and hence 2.0 is 100% backward compatible by definition.
  - Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SDPA). The API is integrated with torch.compile(), and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operator (see the sketch after this changelog).
* from 2.1.x:
  - automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API.
  - In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization.
* from 2.2.x:
  - 2x performance improvements to scaled_dot_product_attention via FlashAttention-v2 integration, as well as AOTInductor, a new ahead-of-time compilation and deployment tool built for non-Python server-side deployments.
* from 2.3.x:
  - support for user-defined Triton kernels in torch.compile, allowing users to migrate their own Triton kernels from eager mode without experiencing performance regressions or graph breaks. In addition, Tensor Parallelism improves the experience for training Large Language Models using native PyTorch functions, which has been validated on training runs for 100B parameter models.

OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/python-torch?expand=0&rev=32
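For orientation, a minimal sketch of the two headline 2.0 APIs named above; the toy model, tensor shapes, and variable names are illustrative assumptions, not taken from this package:

    import torch
    import torch.nn.functional as F

    # torch.compile wraps an existing model and returns a compiled one;
    # the eager model keeps working, which is why the feature is additive.
    model = torch.nn.Linear(64, 64)
    compiled_model = torch.compile(model)
    y = compiled_model(torch.randn(8, 64))

    # The fused SDPA kernels can also be called directly through the new
    # scaled_dot_product_attention() operator; layout is
    # (batch, heads, seq_len, head_dim).
    q = k = v = torch.randn(1, 4, 16, 64)
    out = F.scaled_dot_product_attention(q, k, v)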
From 65b21aea6c04d8e8760aa79e06de31a35e05c954 Mon Sep 17 00:00:00 2001
From: Christian Goll <cgoll@suse.com>
Date: Wed, 10 Jul 2024 16:08:39 +0200
Subject: [PATCH] fix setup

Signed-off-by: Christian Goll <cgoll@suse.com>
---
 cmake/Dependencies.cmake | 1 -
 setup.py                 | 2 +-
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/cmake/Dependencies.cmake b/cmake/Dependencies.cmake
index a96075245ae..f0c29dacfb8 100644
--- a/cmake/Dependencies.cmake
+++ b/cmake/Dependencies.cmake
@@ -1403,7 +1403,6 @@ if(USE_DISTRIBUTED AND USE_TENSORPIPE)
     set(TP_USE_CUDA ON CACHE BOOL "" FORCE)
     set(TP_ENABLE_CUDA_IPC ON CACHE BOOL "" FORCE)
   endif()
-  set(TP_BUILD_LIBUV ON CACHE BOOL "" FORCE)
   add_compile_options(-DTORCH_USE_LIBUV)
   include_directories(BEFORE SYSTEM ${CMAKE_CURRENT_LIST_DIR}/../third_party/tensorpipe/third_party/libuv/include)
   set(TP_STATIC_OR_SHARED STATIC CACHE STRING "" FORCE)
diff --git a/setup.py b/setup.py
index 8d1aaff5668..c8b9a4fb507 100644
--- a/setup.py
+++ b/setup.py
@@ -591,7 +591,7 @@ class build_ext(setuptools.command.build_ext.build_ext):
     def run(self):
         # Report build options. This is run after the build completes so # `CMakeCache.txt` exists and we can get an
         # accurate report on what is used and what is not.
-        cmake_cache_vars = defaultdict(lambda: False, cmake.get_cmake_cache_variables())
+        cmake_cache_vars = defaultdict(lambda: "False", cmake.get_cmake_cache_variables())
         if cmake_cache_vars["USE_NUMPY"]:
             report("-- Building with NumPy bindings")
         else:
--
2.43.0
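As a note on the setup.py hunk: it swaps the defaultdict factory from the boolean False to the string "False", i.e. the value handed back for any CMake variable absent from CMakeCache.txt. A small standalone sketch of that behavior (the cache contents here are made up for illustration):

    from collections import defaultdict

    # Factory returning the boolean False: missing keys are falsy,
    # so tests like `if cmake_cache_vars["USE_NUMPY"]:` take the else branch.
    cache = defaultdict(lambda: False, {"USE_CUDA": "ON"})
    print(bool(cache["USE_NUMPY"]))   # False

    # Factory returning the string "False": missing keys yield "False";
    # note that any non-empty string is truthy in Python.
    cache = defaultdict(lambda: "False", {"USE_CUDA": "ON"})
    print(bool(cache["USE_NUMPY"]))   # True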