* Major CUDA improvements including Blackwell native build fixes,
experimental MXFP4 support, optimized CUMSUM paths, new ops
(FILL, DIAG, TRI, CUMSUM), FA/MMA overflow fixes, better GPU
utilization defaults, and multiple correctness and stability
fixes.
* Significant Vulkan backend work with new operators, faster
FA/MMV/MMVQ paths, async tensor and event support, rope and MoE
improvements, reduced data races, better logging, and numerous
performance optimizations.
* CPU and GGML backend enhancements covering ARM64, RVV, RISC-V,
ZenDNN, and Hexagon, with new and optimized kernels, improved
repack logic, allocator fixes, graph reuse, and better error
handling.
* Expanded support and fixes across Metal, HIP, SYCL, OpenCL,
CANN, WebGPU, and Hexagon backends.
* Added and improved support for many models and architectures
including Qwen3-Next, Nemotron v2/v3, Llama 4 scaling, GLM4V,
MiMo-V2-Flash, Granite Embeddings, KORMo, Rnj-1, LFM2 text/
audio/MoE, Mistral and Mistral-Large variants, DeepSeek
variants, ASR conformer models, and multimodal pipelines.
* Fixed multiple model issues such as missing tensors,
division-by-zero errors, rope scaling regressions, MoE edge
cases, bidirectional architectures, and multimodal loading
errors.
* Server and router improvements including safer multithreading,
race-condition fixes, multi-model routing, preset cascading,
startup model loading, auto-sleep on idle, improved speculative
decoding, better RPC validation, and friendlier error handling.
* CLI and argument-parsing improvements with new flags and negated flags.

OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/llamacpp?expand=0&rev=120

#
# spec file for package llamacpp
#
# Copyright (c) 2025 SUSE LLC and contributors
# Copyright (c) 2025 Eyad Issa <eyadlorenzo@gmail.com>
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#


%global backend_dir %{_libdir}/ggml

%global llama_sover 0.0.%{version}
%global llama_sover_suffix 0

%global mtmd_sover 0.0.%{version}
%global mtmd_sover_suffix 0

%global ggml_sover 0.9.4
%global ggml_sover_suffix 0
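
# For reference: with Version 7540 the soversion macros above expand so that
# the packaged libraries are libllama.so.0.0.7540, libmtmd.so.0.0.7540 and
# libggml*.so.0.9.4, alongside the .so.0 symlinks listed in the file lists below.
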
Name:           llamacpp
Version:        7540
Release:        0
Summary:        Inference of Meta's LLaMA model (and others) in pure C/C++
License:        MIT
URL:            https://github.com/ggml-org/llama.cpp
Source:         %{URL}/archive/b%{version}/%{name}-%{version}.tar.gz
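# With the macros above, the Source URL expands to
# https://github.com/ggml-org/llama.cpp/archive/b7540/llamacpp-7540.tar.gz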
|
BuildRequires: cmake >= 3.14
|
|
BuildRequires: gcc-c++
|
|
BuildRequires: git
|
|
BuildRequires: ninja
|
|
BuildRequires: pkgconfig
|
|
BuildRequires: shaderc
|
|
BuildRequires: pkgconfig(OpenCL)
|
|
BuildRequires: pkgconfig(libcurl)
|
|
BuildRequires: pkgconfig(vulkan)
|
|
# 32bit seems not to be supported anymore
|
|
ExcludeArch: %{ix86} %{arm}
|
|
|
|
%description
The llama.cpp library provides a C++ interface for running inference
with large language models (LLMs). Initially designed to support Meta's
LLaMA model, it has since been extended to work with a variety of other models.

This package includes the llama-cli tool to run inference using the library.

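# Not used at build time: a minimal usage sketch for the packaged CLI,
# assuming a user-supplied GGUF model (the path below is only a placeholder):
#   llama-cli -m /path/to/model.gguf -p "Hello"
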
%package devel
Summary:        Development files for llama.cpp
Obsoletes:      libllama < 7266
Obsoletes:      libmtmd < 7266

%description devel
Development files for llama.cpp.

%package -n libllama%{llama_sover_suffix}
Summary:        A C++ interface for running inference with large language models

%description -n libllama%{llama_sover_suffix}
The llama.cpp library provides a C++ interface for running inference
with large language models (LLMs). Initially designed to support Meta's
LLaMA model, it has since been extended to work with a variety of other models.

This package includes the shared libraries necessary for running applications
that depend on libllama.so.

%package -n libggml%{ggml_sover_suffix}
Summary:        A tensor library for C++
Requires:       libggml-cpu
Recommends:     libggml-opencl
Recommends:     libggml-vulkan

%description -n libggml%{ggml_sover_suffix}
A tensor library for C++. It was originally created to support the llama.cpp
and WhisperCpp projects.

%package -n libggml-base%{ggml_sover_suffix}
Summary:        A tensor library for C++ (base)

%description -n libggml-base%{ggml_sover_suffix}
A tensor library for C++. It was originally created to support the llama.cpp
and WhisperCpp projects.

This package includes the base shared library for ggml.

%package -n libggml-cpu
Summary:        A tensor library for C++ (CPU backend)

%description -n libggml-cpu
A tensor library for C++. It was originally created to support the llama.cpp
and WhisperCpp projects.

This package includes the CPU backend for ggml.

%package -n libggml-vulkan
Summary:        A tensor library for C++ (Vulkan backend)

%description -n libggml-vulkan
A tensor library for C++. It was originally created to support the llama.cpp
and WhisperCpp projects.

This package includes the Vulkan backend for ggml.

%package -n libggml-opencl
Summary:        A tensor library for C++ (OpenCL backend)

%description -n libggml-opencl
A tensor library for C++. It was originally created to support the llama.cpp
and WhisperCpp projects.

This package includes the OpenCL backend for ggml.

%package -n ggml-devel
Summary:        Development files for ggml
Obsoletes:      libggml < 7266
Obsoletes:      libggml-base < 7266

%description -n ggml-devel
A tensor library for C++. It was originally created to support the llama.cpp
and WhisperCpp projects.

This package includes the development files necessary for building applications
that depend on ggml.

%package -n libmtmd%{mtmd_sover_suffix}
Summary:        Library to run multimodal inference models

%description -n libmtmd%{mtmd_sover_suffix}
libmtmd is the modern library designed to replace the original llava.cpp
implementation for handling multimodal inputs.

Built upon clip.cpp (similar to llava.cpp), libmtmd offers several advantages:
- Unified Interface: Aims to consolidate interaction for various multimodal models.
- Improved UX/DX: Features a more intuitive API, inspired by the Processor class
  in the Hugging Face transformers library.
- Flexibility: Designed to support multiple input types (text, audio, images) while
  respecting the wide variety of chat templates used by different models.

%package -n libllava
Summary:        Library to run multimodal inference models

%description -n libllava
Library to handle multimodal inputs for llama.cpp.

%ldconfig_scriptlets -n libllama%{llama_sover_suffix}
%ldconfig_scriptlets -n libggml%{ggml_sover_suffix}
%ldconfig_scriptlets -n libggml-base%{ggml_sover_suffix}
%ldconfig_scriptlets -n libmtmd%{mtmd_sover_suffix}

%prep
%autosetup -p1 -n llama.cpp-b%{version}

%build

%define _lto_cflags %{nil}
%define __builder ninja

mkdir -p %{_libdir}

%cmake \
    -DCMAKE_SKIP_RPATH=ON \
    -DLLAMA_BUILD_TESTS=OFF \
    -DLLAMA_BUILD_EXAMPLES=OFF \
    -DLLAMA_BUILD_TOOLS=ON \
    -DLLAMA_CURL=ON \
    -DGGML_NATIVE=OFF \
    -DGGML_CPU=ON \
    -DGGML_VULKAN=ON \
    -DGGML_OPENCL=ON \
    -DGGML_BACKEND_DL=ON \
    -DGGML_BACKEND_DIR="%{backend_dir}" \
    -DGGML_OPENCL_USE_ADRENO_KERNELS=OFF \
    -DLLAMA_BUILD_NUMBER=%{version} \
    -DLLAMA_VERSION="0.0.%{version}" \
    %{nil}

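# With GGML_BACKEND_DL=ON the CPU, Vulkan and OpenCL backends are built as
# runtime-loadable modules installed into the ggml backend directory set by
# GGML_BACKEND_DIR; libggml discovers and loads them at runtime through its
# backend registry instead of linking them into consumers.
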
%cmake_build

%install
%cmake_install

# dev scripts
rm %{buildroot}%{_bindir}/convert_hf_to_gguf.py

%files
%doc README.md
%license LICENSE
%{_bindir}/llama-*

%files devel
%license LICENSE
%{_includedir}/llama*
%{_includedir}/mtmd*
%{_libdir}/cmake/llama
%{_libdir}/pkgconfig/llama.pc
# libllama symlinks
%{_libdir}/libllama.so
%{_libdir}/libllama.so.0
# libmtmd symlinks
%{_libdir}/libmtmd.so
%{_libdir}/libmtmd.so.0

%files -n libllama%{llama_sover_suffix}
%license LICENSE
%{_libdir}/libllama.so.%{llama_sover}

%files -n libggml%{ggml_sover_suffix}
%license LICENSE
%{_libdir}/libggml.so.%{ggml_sover}

%files -n libggml-base%{ggml_sover_suffix}
%license LICENSE
%{_libdir}/libggml-base.so.%{ggml_sover}

%files -n libggml-cpu
%license LICENSE
%dir %{backend_dir}
%{backend_dir}/libggml-cpu.so

%files -n libggml-vulkan
%license LICENSE
%dir %{backend_dir}
%{backend_dir}/libggml-vulkan.so

%files -n libggml-opencl
%license LICENSE
%dir %{backend_dir}
%{backend_dir}/libggml-opencl.so

%files -n ggml-devel
%license LICENSE
%{_includedir}/ggml*.h
%{_includedir}/gguf.h
%{_libdir}/cmake/ggml
%{_libdir}/libggml.so
%{_libdir}/libggml.so.0
%{_libdir}/libggml-base.so
%{_libdir}/libggml-base.so.0

%files -n libmtmd%{mtmd_sover_suffix}
%license LICENSE
%{_libdir}/libmtmd.so.%{mtmd_sover}

%changelog