846ae27b53
- Update to version 7540:
  * Major CUDA improvements including Blackwell native build fixes, experimental MXFP4 support, optimized CUMSUM paths, new ops (FILL, DIAG, TRI, CUMSUM), FA/MMA overflow fixes, better GPU utilization defaults, and multiple correctness and stability fixes.
  * Significant Vulkan backend work with new operators, faster FA/MMV/MMVQ paths, async tensor and event support, rope and MoE improvements, reduced data races, better logging, and numerous performance optimizations.
  * CPU and GGML backend enhancements covering ARM64, RVV, RISC-V, ZenDNN, and Hexagon, with new and optimized kernels, improved repack logic, allocator fixes, graph reuse, and better error handling.
  * Expanded support and fixes across Metal, HIP, SYCL, OpenCL, CANN, WebGPU, and Hexagon backends.
  * Added and improved support for many models and architectures including Qwen3-Next, Nemotron v2/v3, Llama 4 scaling, GLM4V, MiMo-V2-Flash, Granite Embeddings, KORMo, Rnj-1, LFM2 text/audio/MoE, Mistral and Mistral-Large variants, DeepSeek variants, ASR conformer models, and multimodal pipelines.
  * Fixed multiple model issues such as missing tensors, division-by-zero errors, rope scaling regressions, MoE edge cases, bidirectional architectures, and multimodal loading errors.
  * Server and router improvements including safer multithreading, race-condition fixes, multi-model routing, preset cascading, startup model loading, auto-sleep on idle, improved speculative decoding, better RPC validation, and friendlier error handling.
  * CLI and argument-parsing improvements with new flags, negated
Eyad Issa 2025-12-26 02:15:13 +00:00
763622b525
Accepting request 1321203 from science:machinelearning
Ana Guerrero 2025-12-05 15:56:38 +00:00
f050e5debb
- Switch to .so versioning, following upstream
- Update to version 7266:
  * Added support for several new and updated models including Ministral3, Qwen3 Next, RND1 Diffusion LM, AfmoeForCausalLM, openPangu-Embedded, and improved detection for GigaChat3-10-A1.8B.
  * Server improvements: multi-model API, Anthropic Messages API, task generator API, HTTP interface split, jinja enabled by default.
  * Chat and parsing improvements: generalized XML-style tool-call parsing, composable PEG parser combinators.
  * WebUI enhancements: restored HTML in Markdown tables, rehype plugin improvements, attachment-handling UX improvements, Harmony tool-call visualization, new keyboard shortcuts, clickability fixes, autoscroll toggle, and new “Continue” action.
  * CUDA backend improvements: FP16 restrictions, memory bandwidth improvements, stream-based concurrency, MMQ and fusion fixes, rope fusion corrections, improved handling of nb00/nb02, and various stability fixes.
  * Vulkan backend improvements: new operators, improved FA and MMVQ support, async graph_compute, conv2d spec constants, i32 copy support.
  * GGML and CPU backend updates: expanded RVV, ARM64, RISC-V feature detection; new CPU intrinsic implementations; improved GEMM/GEMV repack kernels; ops additions.
  * OpenCL, SYCL, HIP, MUSA, and Hexagon improvements: expanded operator support, new kernels, fallback logic for older SoCs, buffer handling fixes.
  * MTMD (multimodal) improvements: warmup toggles, CLI log-noise reduction, image embedding size fixes and audio model patch fixes.
  * General performance, stability, and correctness improvements across CPU, GPU, schedulers, memory management, kv-cache, async behavior, thread safety, and operator fusion.
  * Full commit log: https://github.com/ggml-org/llama.cpp/compare/b6937...b7266
Eyad Issa 2025-12-04 14:34:53 +00:00
a541da59b8
Accepting request 1315691 from science:machinelearning
Ana Guerrero 2025-11-06 17:12:47 +00:00
b89927e8a7
- Update to version 6937:
  * New model: Janus Pro
  * New model: Minimax M2
  * New model: Granite Hybrid nano types
  * New model: support for qwen3vl series
  * New model: support for CogVLM model
  * New model: LightOnOCR-1B model
  * New model: BailingMoeV2 support
  * New model: Granite Hybrid types
  * New model: Support home-cooked Mistral Small Omni
  * New model: Support LiquidAI LFM2-MoE hybrid model
  * New model: Granite docling + Idefics3 preprocessing (SmolVLM)
  * New model: EmbeddingGemma, adding support for SentenceTransformers Dense Modules
  * Server improvements, OpenAI API compatibility, optimizations, and bug fixes
  * Vulkan backend improvements, optimizations, and bug fixes
  * OpenCL backend fixes
  * CPU backend optimizations
  * Multimodal (mtmd) improvements
  * WebUI enhancements
  * Architecture-specific improvements
  * llama core improvements
  * Memory management improvements
  * Conversion and quantization tools enhancements
  * Grammar and sampling improvements
  * Chat and prompts enhancements
  * General fixes and improvements
  * RPC improvements and bug fixes
  * Full commit log:
Eyad Issa 2025-11-03 18:57:48 +00:00
29895c65c7
Accepting request 1302234 from science:machinelearning
Ana Guerrero 2025-09-02 15:58:24 +00:00
f6cca5429e
Accepting request 1301212 from science:machinelearning
Ana Guerrero 2025-08-25 18:38:58 +00:00
c5d8653d73
- Update to version 6269:
  * Model and conversion: support for Seed-OSS, GPT-OSS response_format, interns1-mini, Ernie 4.5, gpt-oss type strings, improved Mistral templates, new model conversion tool/example with torch-cpu.
  * Vulkan backend: multiple optimizations (rms_norm, mul_mat_id, synchronization, conv2d, subgroup ops), new ops (exp, conv_2d_dw f16, ggml_mean).
  * GGML/CPU: added conv3d op, WebGPU quantization support, Q5_0/Q5_1 on s390x, mxfp4 intrinsics on ppc64le.
  * Server and chat: multimodal completion and embeddings JSON support, improved OpenAI API compatibility and usage statistics, disabled context shift by default, fixed ordering of tasks, webui issues, debug assertions, clarified reasoning_format.
  * KV cache: unified handling improvements, support for reuse, removal of deprecated APIs, simplifications.
  * Miscellaneous: fixed logging of non-ASCII characters, removed deprecated or unused code and build artifacts.
  * Full commit log: https://github.com/ggml-org/llama.cpp/compare/b6188...b6269
Eyad Issa 2025-08-25 14:14:05 +00:00
3764c5b78a
- Update to version 6188:
  * Vulkan backend improvements: larger workgroups, optimized argsort, fused adds, bounds checking, out-of-bounds and compile warning fixes, performance logging.
  * OpenCL backend: initial FA and mxfp4 support.
  * Model support: vision LiquidAI LFM2-VL family, 18-layer Gemma 3-270m model type.
  * Common: fixed double BOS, improved chat templates, added override-tensor and CPU MoE draft parameters.
  * GGML: initial IBM zDNN backend, rope_multi update, conv_1d_dw bug fix, block_iq4_nlx8 repack, improved Mistral integration.
  * Server: SWA checkpoints, -td/-tbd parameters, harmony thought message filtering.
  * Perplexity: improved error hints and constraint reporting.
  * GPT-OSS: harmony parsing implemented.
- Add LLAMA_BUILD_NUMBER and LLAMA_VERSION to the build
Eyad Issa 2025-08-17 22:18:58 +00:00
755973372c
- Update to version 6139:
  * opencl: allow mixed f16/f32 add (#15140)
  * mtmd : Fix MinicpmV model converter and clip to avoid using hardcode. (#14750)
  * chat : hotfix gpt-oss jinja raising an exception (#15243)
  * server : allow specifying reasoning_format in HTTP request (#15238)
  * kv-cache : fix seq_rm with seq_id == -1 (#15226)
  * kv-cache : log (debug) all streams in find_slot (#15176)
  * convert : improve Mistral models integration (#14737)
  * kleidiai: fix unsigned overflow bug (#15150)
Eyad Issa 2025-08-12 18:01:43 +00:00
0e11fa8fd1
Add LLAMA_BUILD_NUMBER and LLAMA_VERSION to the build
Eyad Issa 2025-08-12 17:38:59 +00:00
02bb0a433c
- Update to version 6121:
  * Support intern-s1
  * opencl: add swiglu_oai and add_id
  * vulkan: support fattn sinks
  * vulkan: Add env var to disable host visible vidmem
  * ggml: Skip backend library linking code when GGML_BACKEND_DL=ON
  * ggml : fix fallback to CPU for unsupported ops
  * Various bug fixes
  * Full changelog: https://github.com/ggml-org/llama.cpp/compare/b6100...b6121
Eyad Issa 2025-08-08 23:44:39 +00:00
21e3ba7e90
- Add GGML_NATIVE=OFF build flag
Eyad Issa 2025-07-13 15:12:56 +00:00
287ac0c443
- Add GGML_NATIVE=OFF build flag
- Update to version 5889:
  * Prevent integer overflow in gguf tensor size calculation (bsc#1246377) (CVE-2025-53630) (GHSA-vgg9-87g3-85w8)
  * Improved build-time messaging for ggml_set_rows.
  * Enhanced test coverage for LFM2 and added LFM2 to documentation.
  * Synchronized ggml updates and improved Vulkan backend (bilinear interpolation, ggml_roll, SET_ROWS, optimizations).
  * Fixed pooled embedding output in server and improved prompt processing.
  * Added support for LiquidAI LFM2 hybrid family and Falcon-H1 models.
  * Improved HIP, OpenCL, and SYCL backend compatibility and features.
  * Added new vocabularies and model support (midm-2.0, skt/A.X-4.0, SmolLM3, hunyuan moe, Granite Four).
  * Various bug fixes, optimizations, and documentation improvements across backends and models.
  * Full changelog: https://github.com/ggml-org/llama.cpp/compare/b5812...b5889
Eyad Issa 2025-07-13 15:11:35 +00:00
aee11711a1
Accepting request 1290235 from science:machinelearning
Ana Guerrero 2025-07-06 15:07:53 +00:00
7027db2e08
- Update to version 5812:
  * Mamba-2 Support: Initial integration of Mamba-2 architecture.
  * Added support for ERNIE 4.5 0.3B, NeoBERT, Arcee AI's AFM, Gemma3n text-only, and dots.llm1 architectures
  * Vulkan Improvements: Support for softmax/FlashAttention batch/broadcast, fused RMS_NORM+MUL, and better memory handling
  * GGML Backend: Added REGLU/GEGLU/SWIGLU ops, ggml_set_rows, and improved SYCL/OpenCL/Metal support
  * Server Improvements: Jinja template kwargs, draft model cache params, and Unix socket support
  * Quantization: User-defined layer pruning and KV override fixes
  * Optimizations: Batched Vulkan mul_mat_id splitting and ARM hsum reduction
  * Added GGML version function (see the sketch after this entry)
  * Full changelog: https://github.com/ggml-org/llama.cpp/compare/b5699...b5812
Eyad Issa 2025-07-03 00:33:30 +00:00
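The "Added GGML version function" item above refers to the version/commit accessors in ggml.h. Below is a minimal sketch of querying them from C; it assumes the accessors are named ggml_version() and ggml_commit() and return static strings, which matches current ggml headers but is not spelled out in this changelog.

    /* ggml_version_info.c - print the ggml library version and commit.
     * Assumes the ggml_version()/ggml_commit() accessors from ggml.h,
     * each returning a static, NUL-terminated string. */
    #include <stdio.h>
    #include "ggml.h"

    int main(void) {
        printf("ggml version: %s\n", ggml_version());
        printf("ggml commit:  %s\n", ggml_commit());
        return 0;
    }

Build against the packaged headers and link libggml; the exact link line (e.g. cc ggml_version_info.c -lggml) depends on how the distribution splits the libraries.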
e84b2edce8
Accepting request 1286807 from science:machinelearning
Ana Guerrero 2025-06-20 14:48:56 +00:00
05fa0fbdf4
- Update to 5699:
  * vocab : prevent integer overflow during load (bsc#1244714) (CVE-2025-49847)
  * batch : add LLAMA_BATCH_DEBUG environment variable
  * batch : auto-gen positions + verify multi-sequence input
  * common : suggest --jinja when autodetection fails
  * ggml-cpu: fix uncaught underscore terminators
  * kv-cache : fix use-after-move of defrag info
  * llama : rework embeddings logic
  * llama-chat : do not throw when tool parsing fails
  * llama-chat : fix multiple system message for gemma, orion
  * model : Add support for Arcee AI's upcoming AFM model
  * model : add dots.llm1 architecture support
  * model : add NeoBERT
  * server : When listening on a unix domain socket don't print http:// and port
  * quantize : change int to unsigned int for KV overrides
  * Full changelog: https://github.com/ggml-org/llama.cpp/compare/b5657...b5699
Eyad Issa 2025-06-19 00:59:30 +00:00
d0e896b3f4
- Update to 5516:
  * llama : remove llama_kv_cache_view API
  * model : disable SWA for Phi models
  * kv-cache : simplify the interface
  * server : Add the endpoints /api/tags and /api/chat (see the sketch after this entry)
  * ggml : add ggml_gelu_erf()
  * hparams : support models for which all layers use SWA
  * opencl: fix couple crashes
  * opencl: Add support for multiple devices
  * mtmd : add ultravox audio input
  * server : support audio input
  * server: streaming of tool calls and thoughts when jinja is on
  * mtmd : support Qwen 2.5 Omni
  * ggml : riscv: add xtheadvector support
  * opencl : various optimizations
  * Full changelog: https://github.com/ggml-org/llama.cpp/compare/b5426...b5516
Eyad Issa 2025-05-27 22:55:37 +00:00
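The new /api/tags and /api/chat endpoints mirror the Ollama HTTP API. The sketch below posts a single user message to /api/chat with libcurl; the JSON body shape, the model name, and the 127.0.0.1:8080 address are assumptions for illustration, not taken from this changelog.

    /* chat_request.c - minimal POST against llama-server's /api/chat endpoint.
     * The Ollama-style body, the model name and the host/port are assumptions;
     * adjust them to match how llama-server was actually started. */
    #include <stdio.h>
    #include <curl/curl.h>

    int main(void) {
        const char *body =
            "{\"model\":\"default\","
            "\"messages\":[{\"role\":\"user\",\"content\":\"Hello\"}],"
            "\"stream\":false}";

        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        struct curl_slist *hdrs = curl_slist_append(NULL, "Content-Type: application/json");
        curl_easy_setopt(curl, CURLOPT_URL, "http://127.0.0.1:8080/api/chat");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

        CURLcode res = curl_easy_perform(curl); /* response body goes to stdout */
        if (res != CURLE_OK)
            fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));

        curl_slist_free_all(hdrs);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return res == CURLE_OK ? 0 : 1;
    }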
8dfa0f3a34
Accepting request 1278459 from science:machinelearning
Ana Guerrero 2025-05-20 10:19:52 +00:00
fce1fbe866
- Use source urls instead of obs_scm
- Update to version 5327:
  * A new binary llama-mtmd-cli is introduced to replace llava-cli, minicpmv-cli, gemma3-cli (#13012) and qwen2vl-cli (#13141); libllava will be deprecated
  * Full changes here: https://github.com/ggml-org/llama.cpp/compare/b5158...b5321
- Disable patch 0001-dl-load-path.patch
Eyad Issa 2025-05-09 11:00:51 +00:00
7dc7ca652b
Accepting request 1253529 from science:machinelearning
Ana Guerrero 2025-03-17 21:17:23 +00:00
3567886aa8
* common : refactor '-o' option
* common : add llama.vim preset for Qwen2.5 Coder
* common : add --system-prompt parameter, replace behavior of -p in conversation mode
* cmake : install ggml-cpp.h as a public header file
* hparams : add SWA rope parameters
* ggml : upgrade init_tensor API to return a ggml_status
* ggml : aarch64: implement SVE kernels for q2_k_q8_k vector dot
* ggml : aarch64: implement SVE kernels for q3_K_q8_K vector dot
* ggml-cpu : faster AVX2 variant for IQ1_M (#12216)
* ggml-cpu : faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions
* ggml-cpu : Support s390x SIMD Instruction Set
* ggml-cpu : Add CPU backend support for KleidiAI library
* ggml-backend : keep paths in native string type when possible
* llama : Add Gemma 3 support (+ experimental vision capability)
* llama : add Phi-4-mini support
* llama : expose llama_model_n_head_kv in the API (see the sketch after this entry)
* llama : skip loading unused tensors
* llama : fix indentation in llama-grammar
* main : add -sysf / --system-prompt-file
* main : allow preloading conversation with -p and add -st / --single-turn
* main : use jinja chat template system prompt by default
* main : update outdated system prompt message
* opencl : use OpenCL C standard supported by the device
* opencl : Noncontiguous norm, rms_norm, disable fp16 for some ops
* run : allow to customize prompt by env var LLAMA_PROMPT_PREFIX
* run : add --chat-template-file
Eyad Issa 2025-03-16 16:17:09 +00:00
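The "expose llama_model_n_head_kv in the API" item above adds a model introspection getter. A hedged sketch of calling it follows; the model path is a command-line placeholder, and the surrounding load/free calls (llama_model_load_from_file, llama_model_free) follow current llama.h naming rather than anything spelled out in this entry.

    /* kv_heads.c - load a GGUF model and print its number of KV heads.
     * llama_model_n_head_kv() is the getter named in the changelog; the
     * load/free helpers used here follow current llama.h conventions. */
    #include <stdio.h>
    #include "llama.h"

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
            return 1;
        }

        llama_backend_init();

        struct llama_model_params mparams = llama_model_default_params();
        struct llama_model *model = llama_model_load_from_file(argv[1], mparams);
        if (!model) {
            fprintf(stderr, "failed to load %s\n", argv[1]);
            return 1;
        }

        printf("n_head_kv: %d\n", llama_model_n_head_kv(model));

        llama_model_free(model);
        llama_backend_free();
        return 0;
    }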
e58053caf7
Accepting request 1253454 from home:zzndb001:test
Eyad Issa 2025-03-16 15:59:22 +00:00
eede873f48
- Update to version 4501:
  * Optimizations to Vulkan kernels
  * Add internlm3 support
  * Add llama_model_load_from_splits (see the sketch after this entry)
  * ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot
  * cli : auto activate conversation mode if chat template is available (#11214)
  * common : support tag-based --hf-repo like on ollama
  * cli: reset color before exiting
Eyad Issa 2025-01-17 15:47:06 +00:00
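llama_model_load_from_splits lets a model that ships as several GGUF split files be loaded by handing over all the paths at once. The sketch below assumes the (paths, count, params) signature from llama.h; the file names are placeholders.

    /* load_splits.c - load a model split across multiple GGUF files.
     * The file names are placeholders; the (paths, n_paths, params)
     * signature is assumed to match the declaration in llama.h. */
    #include <stdio.h>
    #include "llama.h"

    int main(void) {
        const char *paths[] = {
            "model-00001-of-00002.gguf",  /* placeholder split files */
            "model-00002-of-00002.gguf",
        };
        const size_t n_paths = sizeof(paths) / sizeof(paths[0]);

        llama_backend_init();

        struct llama_model_params mparams = llama_model_default_params();
        struct llama_model *model = llama_model_load_from_splits(paths, n_paths, mparams);
        if (!model) {
            fprintf(stderr, "failed to load split model\n");
            return 1;
        }
        fprintf(stderr, "split model loaded\n");

        llama_model_free(model);
        llama_backend_free();
        return 0;
    }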
5511b5bcca
- Update to version 4458
- Add 0002-build-main-cli.patch to only build necessary binaries
Eyad Issa 2025-01-12 23:07:31 +00:00
00e88b0361
- Disable LTO, as it was causing some issues with dynamic loading of backends
- Disable dynamic loading of backends for now
Eyad Issa 2024-12-20 01:58:08 +00:00
85d16cf50f
- Update to version 4326:
  * Introducing experimental OpenCL backend
  * Vulkan backend improvements and optimizations
  * Update documentation for server streaming mode
  * Improve -ctv -ctk CLI arguments
  * Load all backends from a user-provided search path at runtime (see the sketch after this entry)
  * Server improvements and optimizations
  * Various ops optimizations
  * Various server fixes
  * Automatic selection of best CPU backend
Eyad Issa 2024-12-14 03:39:40 +00:00
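The "load all backends from a user-provided search path" item ties in with the backends being split into separate packages: each backend is a shared library that can be discovered at runtime. A sketch using the ggml-backend.h registry API follows; ggml_backend_load_all_from_path() and the ggml_backend_reg_* enumeration calls match current headers, but treat the exact names as an assumption, and the directory argument is whatever path the caller chooses.

    /* list_backends.c - load ggml backends from a directory and list them.
     * Pass the directory holding the backend shared libraries (for example
     * the distribution's ggml plugin directory); the path is caller-chosen. */
    #include <stdio.h>
    #include "ggml-backend.h"

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s /path/to/backend/dir\n", argv[0]);
            return 1;
        }

        /* Scan the directory and dlopen every backend library found there. */
        ggml_backend_load_all_from_path(argv[1]);

        /* Enumerate whatever ended up in the backend registry. */
        size_t n = ggml_backend_reg_count();
        printf("registered backends: %zu\n", n);
        for (size_t i = 0; i < n; ++i) {
            ggml_backend_reg_t reg = ggml_backend_reg_get(i);
            printf("  %s\n", ggml_backend_reg_name(reg));
        }
        return 0;
    }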
04014e5bb2
- Update to version 4304:
  * bug-fix: snprintf prints NULL in place of the last character (#10419)
  * docs: fix server documentation formatting (#10776)
  * ggml: load all backends from a user-provided search path (#10699)
  * vulkan: request round-to-even for fp16 in im2col/rope_head (#10767)
  * vulkan: dynamic subgroup size for the remaining k quants (#10745)
  * imatrix : Add imatrix to --no-context-shift (#10766)
  * CUDA: rename macros to avoid conflicts with WinAPI (#10736)
  * server : add flag to disable the web-ui (#10762) (#10751)
  * vulkan: disable spirv-opt for coopmat shaders (#10763)
  * CUDA: fix shared memory access condition for mmv (#10740)
  * Changes to CMakePresets.json to add ninja clang target on windows (#10668)
  * vulkan: fix compile warnings (#10731)
  * cmake : simplify msvc charsets (#10672)
  * server : fix format_infill (#10724)
  * server : bring back info of final chunk in stream mode (#10722)
  * Vulkan: fix NaN in tanh.comp with AMD proprietary driver on Windows (#10723)
  * llama : use cmake for swift build (#10525)
  * vulkan: compile a test shader in cmake to check for coopmat2 support (#10713)
  * llama : add 128k yarn context for Qwen (#10698)
  * server : (refactor) no more json in server_task input (#10691)
Eyad Issa 2024-12-11 20:42:43 +00:00
da906242c2
- Split backends into different packages
- Added llama-server, llama-perplexity and llama-bench binaries
Eyad Issa 2024-12-07 19:39:12 +00:00
7e098f1e2e
- Removed ggml-amx.so, as it is now included in the CPU backend
- Update to version 4230:
Eyad Issa 2024-11-30 19:46:19 +00:00
201a708682
- Update to version 4219:
  * sycl : Reroute permuted mul_mats through oneMKL (#10408)
  * CANN: RoPE operator optimization (#10563)
  * vulkan: get the first command buffer submitted sooner (#10499)
  * llava: return false instead of exit (#10546)
  * ggml : remove redundant copyright notice + update authors
  * llama : add missing model types
  * server : (tests) don't use thread for capturing stdout/stderr, bump openai client library (#10568)
  * common: fix warning message when no GPU found (#10564)
  * docs: fix outdated usage of llama-simple (#10565)
  * ci : fix tag name in cuda and hip releases (#10566)
  * ggml : fix row condition for i8mm kernels (#10561)
  * cmake : fix ARM feature detection (#10543)
  * ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)
  * kompute : improve backend to pass test_backend_ops (#10542)
  * CANN: Update cann.md to display correctly in CLion (#10538)
  * CANN: Fix SOC_TYPE compile bug (#10519)
  * CANN: ROPE operator optimization (#10540)
  * common : fix duplicated file name with hf_repo and hf_file (#10550)
  * Add some minimal optimizations for CDNA (#10498)
  * ci : faster CUDA toolkit installation method and use ccache (#10537)
  * metal : fix group_norm support condition (#0)
  * sync : ggml
  * Do not include arm_neon.h when compiling CUDA code (ggml/1028)
  * vulkan: define all quant data structures in types.comp (#10440)
Eyad Issa 2024-11-29 11:37:13 +00:00
e85c193b31
- Update to version 4153:
  * ci: Update oneAPI runtime dll packaging (#10428)
  * GitHub: ask for more info in issue templates (#10426)
  * CANN: Support Ascend310P to accelerate F32 and F16 Model (#10216)
  * cuda : optimize argmax (#10441)
  * llama : handle KV shift for recurrent models (#10402)
  * sync : ggml
  * ggml/sched : do not skip views in pre-assignments
  * ggml-opt: fix data corruption (ggml/1022)
  * vulkan: predicate max operation in soft_max shaders/soft_max (#10437)
  * cmake: add link dependencies to cmake find pkg (#10433)
  * llama : add .clang-format file (#10415)
  * vulkan: copy iq4_nl LUT into shared memory (#10409)
  * vulkan: further optimize mul_mat_vec using larger loads (#10387)
  * update rel to 4040 (#10395)
  * Fix missing file renames in Makefile due to changes in commit ae8de6d50a (#10413)
  * add cmake rvv support (#10411)
  * sync : ggml
  * metal : fix offset integer overflows in im2col (ggml/1015)
  * metal : add GGML_UNARY_OP_ELU kernel (ggml/1018)
  * cmake: force MSVC compiler charset to utf-8 (#9989)
  * Add required ggml-base and backend libs to cmake pkg (#10407)
  * cuda : fix CUDA_FLAGS not being applied (#10403)
  * llama : add check for KV cache shifts (#10401)
Eyad Issa 2024-11-23 14:31:07 +00:00
d3180eea0d
- Update to version 4130:
  * llama : add OLMo November 2024 support (#10394)
  * sycl : Add option to set the SYCL architecture for all targets (#10266)
  * vulkan: Optimize soft_max (#10301)
  * sycl: Revert MUL_MAT_OP support changes (#10385)
Eyad Issa 2024-11-19 13:11:16 +00:00
9896add9b7
- Lower required CMake version to 3.14
Eyad Issa 2024-11-18 20:57:49 +00:00
e3999f6d6e
- Re-enable Vulkan backend
- Update to version 4126:
  * cuda : only use native when supported by cmake (#10389)
  * Skip searching root path for cross-compile builds (#10383)
  * vulkan: remove use of null initializer (#10372)
  * flake.lock: Update (#10346)
  * Vulkan: Fix device info output format specifiers (#10366)
  * docker: use GGML_NATIVE=OFF (#10368)
Eyad Issa 2024-11-18 19:44:04 +00:00
7a7caa5b37
- Disable the Vulkan backend because of a vsnprintf bug affecting it: https://github.com/ggerganov/llama.cpp/issues/10375
- Remove libllava packaging (for now)
- Update to version 4120:
  * CUDA: fix MMV kernel being used for FP16 src1 (#10357)
  * CMake: fix typo in comment [no ci] (#10360)
  * llama : only use default buffer types for the KV cache (#10358)
  * gitignore : ignore local run scripts [no ci]
  * metal : refactor kernel args into structs (#10238)
  * ggml : fix undefined reference to 'getcpu' (#10354)
  * CUDA: remove DMMV, consolidate F16 mult mat vec (#10318)
  * CMake: default to -arch=native for CUDA build (#10320)
  * ggml : fix possible buffer use after free in sched reserve (#9930)
  * ggml : inttypes.h -> cinttypes (#0)
  * ggml : adapt AMX to tensor->grad removal (#0)
  * make : add ggml-opt (#0)
  * tests : remove test-grad0
  * ggml : fix compile warnings (#0)
  * ggml: new optimization interface (ggml/988)
  * scripts : update sync
  * docs : vulkan build instructions to use git bash mingw64 (#10303)
  * llama/ex: remove --logdir argument (#10339)
  * llamafile : fix include path (#0)
  * make : auto-determine dependencies (#0)
- Split libllama into libllama and libllava
- Add Vulkan support
Eyad Issa 2024-11-18 10:01:01 +00:00