forked from pool/openvino
- More GenAI coverage and framework integrations to minimize code
changes
* New models supported:
+ On CPUs & GPUs: Qwen3-Embedding-0.6B, Qwen3-Reranker-0.6B,
Mistral-Small-24B-Instruct-2501.
+ On NPUs: Gemma-3-4b-it and Qwen2.5-VL-3B-Instruct.
* Preview: Mixture of Experts (MoE) models optimized for CPUs
and GPUs, validated for Qwen3-30B-A3B.
* GenAI pipeline integrations: Qwen3-Embedding-0.6B and
Qwen3-Reranker-0.6B for enhanced retrieval and ranking, and
Qwen2.5-VL-7B for video pipelines.
- Broader LLM model support and more model compression
techniques
* Gold support for Windows ML* enables developers to deploy AI
models and applications effortlessly across CPUs, GPUs, and
NPUs on Intel® Core™ Ultra processor-powered AI PCs.
* The Neural Network Compression Framework (NNCF) ONNX backend
now supports INT8 static post-training quantization (PTQ)
and INT8/INT4 weight-only compression, ensuring accuracy
parity with OpenVINO IR format models. Support for the
SmoothQuant algorithm has been added for INT8 quantization
(a compression sketch follows these notes).
* Accelerated multi-token generation for GenAI, leveraging
optimized GPU kernels to deliver faster inference, smarter
KV-cache reuse, and scalable LLM performance.
* GPU plugin updates include improved performance with prefix
caching for chat history scenarios and enhanced LLM accuracy
with dynamic quantization support for INT8.
- More portability and performance to run AI at the edge, in the
cloud, or locally.
* Announcing support for Intel® Core™ Ultra Processor Series 3.
* Encrypted blob format support added for secure model
deployment with OpenVINO GenAI. Model weights and artifacts
are stored and transmitted in an encrypted format, reducing
the risk of IP theft during deployment. Developers can deploy
with minimal code changes using OpenVINO GenAI pipelines (see
the pipeline sketch after these notes).
* OpenVINO™ Model Server and OpenVINO™ GenAI now extend
support for Agentic AI scenarios with new features such as
output parsing and improved chat templates for reliable
multi-turn interactions, and preview functionality for the
Qwen3-30B-A3B model. OVMS also introduces a preview for
audio endpoints.
* NPU deployment is simplified with batching support, enabling
seamless model execution across Intel® Core™ Ultra
processors without depending on specific driver versions:
models are reshaped to batch_size=1 before compilation.
* The improved NVIDIA Triton Server* integration with the
OpenVINO backend now enables developers to use Intel GPUs
or NPUs for deployment.
OBS-URL: https://build.opensuse.org/package/show/science:machinelearning/openvino?expand=0&rev=44
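
The GenAI pipeline and deployment items above refer to the OpenVINO
GenAI Python API. The following is a minimal sketch of running a text
generation pipeline from an exported model directory; the model path,
device, and prompt are placeholders, and encrypted-blob handling is
omitted since its exact API is not described in these notes.

    # Minimal sketch of an OpenVINO GenAI text-generation pipeline.
    # "./Qwen3-0.6B-int4-ov" is a placeholder for a directory holding an
    # exported OpenVINO model; the device could also be "GPU" or "NPU".
    import openvino_genai as ov_genai

    pipe = ov_genai.LLMPipeline("./Qwen3-0.6B-int4-ov", "CPU")
    print(pipe.generate("What is OpenVINO?", max_new_tokens=100))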
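
The NNCF ONNX item likewise maps onto the nncf Python API. Below is a
sketch of INT4 weight-only compression, assuming nncf.compress_weights
accepts an onnx.ModelProto as the ONNX-backend note implies; the file
names are placeholders.

    # Sketch of INT4 weight-only compression of an ONNX model with NNCF.
    # Assumes the NNCF ONNX backend takes an onnx.ModelProto directly;
    # "model.onnx" and "model_int4.onnx" are placeholder paths.
    import onnx
    import nncf

    model = onnx.load("model.onnx")
    compressed = nncf.compress_weights(model,
                                       mode=nncf.CompressWeightsMode.INT4_SYM)
    onnx.save(compressed, "model_int4.onnx")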
<services>
  <service name="obs_scm" mode="manual">
    <param name="url">https://github.com/openvinotoolkit/openvino.git</param>
    <param name="scm">git</param>
    <param name="revision">2025.4.0</param>
    <param name="version">2025.4.0</param>
    <param name="submodules">enable</param>
    <param name="filename">openvino</param>
    <param name="exclude">.git</param>
  </service>
  <service name="tar" mode="buildtime" />
  <service name="recompress" mode="buildtime">
    <param name="file">*.tar</param>
    <param name="compression">zstd</param>
  </service>
</services>