- Update to version 7540:
* Major CUDA improvements including Blackwell native build fixes,
experimental MXFP4 support, optimized CUMSUM paths, new ops
(FILL, DIAG, TRI, CUMSUM; see the sketch after this list),
FA/MMA overflow fixes, better GPU utilization defaults, and
multiple correctness and stability fixes.
* Significant Vulkan backend work with new operators, faster
FA/MMV/MMVQ paths, async tensor and event support, rope and MoE
improvements, reduced data races, better logging, and numerous
performance optimizations.
* CPU and GGML backend enhancements covering ARM64, RVV, RISC-V,
ZenDNN, and Hexagon, with new and optimized kernels, improved
repack logic, allocator fixes, graph reuse, and better error
handling.
* Expanded support and fixes across Metal, HIP, SYCL, OpenCL,
CANN, WebGPU, and Hexagon backends.
* Added and improved support for many models and architectures
including Qwen3-Next, Nemotron v2/v3, Llama 4 scaling, GLM4V,
MiMo-V2-Flash, Granite Embeddings, KORMo, Rnj-1, LFM2
text/audio/MoE, Mistral and Mistral-Large variants, DeepSeek
variants, ASR conformer models, and multimodal pipelines.
* Fixed multiple model issues such as missing tensors,
division-by-zero errors, rope scaling regressions, MoE edge
cases, bidirectional architectures, and multimodal loading
errors.
* Server and router improvements including safer multithreading,
race-condition fixes, multi-model routing, preset cascading,
startup model loading, auto-sleep on idle, improved speculative
decoding, better RPC validation, and friendlier error handling.
* CLI and argument-parsing improvements, including new flags and
negated variants of existing flags.
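As referenced in the CUDA bullet above, a hedged sketch of one of
the new ops. The changelog does not give C signatures; the sketch
assumes CUMSUM follows ggml's standard unary-op pattern (a single
ggml_cumsum(ctx, tensor) graph op), which should be verified
against ggml.h of this release:

    // Hedged sketch only: ggml_cumsum() is assumed to follow the
    // standard ggml unary-op signature; check ggml.h before use.
    #include "ggml.h"

    struct ggml_tensor * prefix_sums(struct ggml_context * ctx,
                                     struct ggml_tensor * x) {
        // cumulative sum along the first dimension of x, one of
        // the ops that gained an optimized CUDA path here
        return ggml_cumsum(ctx, x);
    }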
OBS-URL: https://build.opensuse.org/request/show/1324424
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/llamacpp?expand=0&rev=24
OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/llamacpp?expand=0&rev=120
- Update to version 6937:
* New model: Janus Pro
* New model: Minimax M2
* New model: Granite Hybrid nano types
* New model: support for qwen3vl series
* New model: support for CogVLM model
* New model: LightOnOCR-1B model
* New model: BailingMoeV2 support
* New model: Granite Hybrid types
* New model: Support home-cooked Mistral Small Omni
* New model: Support LiquidAI LFM2-MoE hybrid model
* New model: Granite docling + Idefics3 preprocessing (SmolVLM)
* New model: EmbeddingGemma, adding support for
SentenceTransformers Dense modules
* Server improvements, OpenAI API compatibility, optimizations,
and bug fixes
* Vulkan backend improvements, optimizations, and bug fixes
* OpenCL backend fixes
* CPU backend optimizations
* Multimodal (mtmd) improvements
* WebUI enhancements
* Architecture-specific improvements
* llama core improvements
* Memory management improvements
* Conversion and quantization tools enhancements
* Grammar and sampling improvements
* Chat and prompts enhancements
* General fixes and improvements
* RPC improvements and bug fixes
* Full commit log:
OBS-URL: https://build.opensuse.org/request/show/1315691
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/llamacpp?expand=0&rev=22
OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/llamacpp?expand=0&rev=115
- Update to version 6269:
* Model and conversion: support for Seed-OSS, GPT-OSS
response_format, interns1-mini, Ernie 4.5, gpt-oss type
strings, improved Mistral templates, new model conversion
tool/example with torch-cpu.
* Vulkan backend: multiple optimizations (rms_norm, mul_mat_id,
synchronization, conv2d, subgroup ops) and new ops (exp,
conv_2d_dw f16, ggml_mean); see the sketch after this list.
* GGML/CPU: added conv3d op, WebGPU quantization support,
Q5_0/Q5_1 on s390x, mxfp4 intrinsics on ppc64le.
* Server and chat: multimodal completion and embeddings JSON
support, improved OpenAI API compatibility and usage statistics,
context shift disabled by default, fixes for task ordering,
webui issues, and debug assertions, and a clarified
reasoning_format.
* KV cache: unified handling improvements, support for reuse,
removal of deprecated APIs, simplifications.
* Miscellaneous: fixed logging of non-ASCII characters, removed
deprecated or unused code and build artifacts.
* Full commit log:
https://github.com/ggml-org/llama.cpp/compare/b6188...b6269
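As referenced in the Vulkan bullet above, a small hedged sketch
composing two of the ops that gained Vulkan kernels in this
update; ggml_exp() and ggml_mean() are existing ggml graph ops,
and ctx is assumed to be an ordinary ggml_context:

    // Hedged sketch: builds graph nodes for exp and mean, both
    // of which can now be offloaded to the Vulkan backend.
    #include "ggml.h"

    struct ggml_tensor * mean_exp(struct ggml_context * ctx,
                                  struct ggml_tensor * x) {
        // mean(exp(x)): ggml_mean() reduces along the first
        // dimension of its input
        return ggml_mean(ctx, ggml_exp(ctx, x));
    }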
OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/llamacpp?expand=0&rev=106
- Add GGML_NATIVE=OFF build flag so binaries are built without
host-specific -march=native optimizations and remain portable
- Update to version 5889:
* Remove Kompute support
* Prevent integer overflow in gguf tensor size calculation
(bsc#1246377) (CVE-2025-53630) (GHSA-vgg9-87g3-85w8); see the
sketch after this entry.
* Improved build-time messaging for ggml_set_rows.
* Enhanced test coverage for LFM2 and added LFM2 to
documentation.
* Synchronized ggml updates and improved Vulkan backend
(bilinear interpolation, ggml_roll, SET_ROWS, optimizations).
* Fixed pooled embedding output in server and improved prompt
processing.
* Added support for LiquidAI LFM2 hybrid family and Falcon-H1
models.
* Improved HIP, OpenCL, and SYCL backend compatibility
and features.
* Added new vocabularies and model support
(midm-2.0, skt/A.X-4.0, SmolLM3, hunyuan moe, Granite Four).
* Various bug fixes, optimizations, and documentation improvements
across backends and models.
* Full changelog:
https://github.com/ggml-org/llama.cpp/compare/b5812...b5889
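As referenced in the CVE bullet above, a generic sketch of the
overflow-check pattern this class of fix uses. It illustrates the
idea only, not the actual upstream patch, and the helper name is
made up for this example:

    // Illustrative only -- not the upstream patch. Validate an
    // attacker-controlled element count against the type size
    // before multiplying, so the byte count cannot wrap around.
    #include <stdbool.h>
    #include <stdint.h>

    static bool tensor_nbytes_checked(uint64_t n_elements,
                                      uint64_t type_size,
                                      uint64_t * out_nbytes) {
        if (type_size == 0) {
            return false;
        }
        if (n_elements > UINT64_MAX / type_size) {
            return false; // n_elements * type_size would overflow
        }
        *out_nbytes = n_elements * type_size;
        return true;
    }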
OBS-URL: https://build.opensuse.org/request/show/1292534
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/llamacpp?expand=0&rev=13
OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/llamacpp?expand=0&rev=91
- Update to version 5812:
* Mamba-2 Support: Initial integration of Mamba-2 architecture.
* Added support for ERNIE 4.5 0.3B, NeoBERT, Arcee AI's AFM,
Gemma3n text-only, and dots.llm1 architectures
* Vulkan Improvements: Support for softmax/FlashAttention
batch/broadcast, fused RMS_NORM+MUL, and better memory handling
* GGML Backend: Added REGLU/GEGLU/SWIGLU ops, ggml_set_rows, and
improved SYCL/OpenCL/Metal support
* Server Improvements: Jinja template kwargs, draft model cache
params, and Unix socket support
* Quantization: User-defined layer pruning and KV override fixes
* Optimizations: Batched Vulkan mul_mat_id splitting
and ARM hsum reduction
* Added GGML version function (see the sketch after this list)
* Full changelog:
https://github.com/ggml-org/llama.cpp/compare/b5699...b5812
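As referenced above, a minimal sketch of the new version
introspection. ggml_version() returning a version string is taken
from this update; ggml_commit() is an assumption believed to have
been added alongside it (check ggml.h):

    // Minimal sketch: print the ggml library version at runtime.
    // ggml_commit() is an assumption; drop it if ggml.h lacks it.
    #include <stdio.h>
    #include "ggml.h"

    int main(void) {
        printf("ggml version: %s\n", ggml_version());
        printf("ggml commit:  %s\n", ggml_commit());
        return 0;
    }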
OBS-URL: https://build.opensuse.org/request/show/1290235
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/llamacpp?expand=0&rev=12
OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/llamacpp?expand=0&rev=89
- Update to version 5699:
* vocab : prevent integer overflow during load
(bsc#1244714) (CVE-2025-49847)
* batch : add LLAMA_BATCH_DEBUG environment variable
* batch : auto-gen positions + verify multi-sequence input
* common : suggest --jinja when autodetection fails
* ggml-cpu: fix uncaught underscore terminators
* kv-cache : fix use-after-move of defrag info
* llama : rework embeddings logic (see the sketch after this
list)
* llama-chat : do not throw when tool parsing fails
* llama-chat : fix multiple system message for gemma, orion
* model : Add support for Arcee AI's upcoming AFM model
* model : add dots.llm1 architecture support
* model : add NeoBERT
* server : when listening on a unix domain socket, don't print
http:// and port
* quantize : change int to unsigned int for KV overrides
* Full changelog:
https://github.com/ggml-org/llama.cpp/compare/b5657...b5699
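As referenced in the embeddings bullet above, a hedged sketch of
fetching a pooled per-sequence embedding after decode.
llama_get_embeddings_seq() is an existing llama.h call; the
surrounding setup (model load, a context created with embeddings
enabled) is omitted:

    // Hedged sketch: ctx is assumed to come from a context with
    // embeddings enabled; returns NULL when no pooled embedding
    // exists for the given sequence.
    #include "llama.h"

    const float * sequence_embedding(struct llama_context * ctx,
                                     llama_seq_id seq_id) {
        return llama_get_embeddings_seq(ctx, seq_id);
    }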
OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/llamacpp?expand=0&rev=87
- Update to version 5516:
* llama : remove llama_kv_cache_view API
* model : disable SWA for Phi models
* kv-cache : simplify the interface
* server : Add the endpoints /api/tags and /api/chat
* ggml : add ggml_gelu_erf() (see the sketch after this list)
* hparams : support models for which all layers use SWA
* opencl: fix a couple of crashes
* opencl: Add support for multiple devices
* mtmd : add ultravox audio input
* server : support audio input
* server: streaming of tool calls and thoughts when jinja is on
* mtmd : support Qwen 2.5 Omni
* ggml : riscv: add xtheadvector support
* opencl : various optimizations
* Full changelog:
https://github.com/ggml-org/llama.cpp/compare/b5426...b5516
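As referenced above, a minimal CPU-only sketch (not from
upstream) of the new ggml_gelu_erf() op: the exact GELU,
0.5*x*(1 + erf(x/sqrt(2))), as opposed to the tanh approximation
used by ggml_gelu(). Buffer size and input values are
illustrative only:

    #include <stdio.h>
    #include "ggml.h"
    #include "ggml-cpu.h"

    int main(void) {
        struct ggml_init_params params = {
            /*.mem_size   =*/ 16*1024*1024, // 16 MiB arena
            /*.mem_buffer =*/ NULL,
            /*.no_alloc   =*/ false,
        };
        struct ggml_context * ctx = ggml_init(params);

        struct ggml_tensor * x =
            ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
        float * xd = (float *) x->data;
        xd[0] = -2.0f; xd[1] = -0.5f; xd[2] = 0.5f; xd[3] = 2.0f;

        // build and run a one-node graph: y = gelu_erf(x)
        struct ggml_tensor * y = ggml_gelu_erf(ctx, x);
        struct ggml_cgraph * gf = ggml_new_graph(ctx);
        ggml_build_forward_expand(gf, y);
        ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);

        for (int i = 0; i < 4; i++) {
            printf("gelu_erf(%+.1f) = %+.6f\n",
                   xd[i], ((float *) y->data)[i]);
        }

        ggml_free(ctx);
        return 0;
    }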
OBS-URL: https://build.opensuse.org/request/show/1280718
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/llamacpp?expand=0&rev=9
OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/llamacpp?expand=0&rev=82