diff --git a/_service b/_service
index ce5f506..8ec83c9 100644
--- a/_service
+++ b/_service
@@ -2,8 +2,8 @@
   <service name="obs_scm" mode="manual">
     <param name="url">https://github.com/openvinotoolkit/openvino.git</param>
     <param name="scm">git</param>
-    <param name="revision">2025.2.0</param>
-    <param name="version">2025.2.0</param>
+    <param name="revision">2025.4.0</param>
+    <param name="version">2025.4.0</param>
     <param name="submodules">enable</param>
     <param name="filename">openvino</param>
     <param name="exclude">.git</param>
diff --git a/openvino-2025.2.0.obscpio b/openvino-2025.2.0.obscpio
deleted file mode 100644
index ff93bc6..0000000
--- a/openvino-2025.2.0.obscpio
+++ /dev/null
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:6c75c27293662056f9098ecf9b0dfbeacf948983df5807a63610313678024adf
-size 743258127
diff --git a/openvino-2025.4.0.obscpio b/openvino-2025.4.0.obscpio
new file mode 100644
index 0000000..451c150
--- /dev/null
+++ b/openvino-2025.4.0.obscpio
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:deda1db3ae8e8acb506d8937ff4709332bfa0380de14393c6f030b88dd2fc5c4
+size 753350671
diff --git a/openvino.changes b/openvino.changes
index 4b0bbf8..8c8b5bc 100644
--- a/openvino.changes
+++ b/openvino.changes
@@ -1,3 +1,102 @@
+-------------------------------------------------------------------
+Tue Dec  2 22:43:52 UTC 2025 - Alessandro de Oliveira Faria <cabelo@opensuse.org>
+
+- Update to 2025.4.0
+- More GenAI coverage and framework integrations to minimize code
+  changes
+  * New models supported:
+    + On CPUs & GPUs: Qwen3-Embedding-0.6B, Qwen3-Reranker-0.6B,
+      Mistral-Small-24B-Instruct-2501.
+    + On NPUs: Gemma-3-4b-it and Qwen2.5-VL-3B-Instruct.
+  * Preview: Mixture of Experts (MoE) models optimized for CPUs
+    and GPUs, validated for Qwen3-30B-A3B.
+  * GenAI pipeline integrations: Qwen3-Embedding-0.6B and
+    Qwen3-Reranker-0.6B for enhanced retrieval/ranking, and
+    Qwen2.5-VL-7B for the video pipeline.
+- Broader LLM model support and more model compression techniques
+  * The Neural Network Compression Framework (NNCF) ONNX backend
+    now supports INT8 static post-training quantization (PTQ)
+    and INT8/INT4 weight-only compression to ensure accuracy
+    parity with OpenVINO IR format models. SmoothQuant algorithm
+    support added for INT8 quantization.
+  * Accelerated multi-token generation for GenAI, leveraging
+    optimized GPU kernels to deliver faster inference, smarter
+    KV-cache reuse, and scalable LLM performance.
+  * GPU plugin updates include improved performance with prefix
+    caching for chat history scenarios and enhanced LLM accuracy
+    with dynamic quantization support for INT8.
+- More portability and performance to run AI at the edge, in the
+  cloud, or locally.
+  * Announcing support for Intel® Core™ Ultra Processor Series 3.
+  * Encrypted blob format support added for secure model
+    deployment with OpenVINO GenAI. Model weights and artifacts
+    are stored and transmitted in an encrypted format, reducing
+    the risk of IP theft during deployment. Developers can deploy
+    with minimal code changes using OpenVINO GenAI pipelines.
+  * OpenVINO™ Model Server and OpenVINO™ GenAI now extend
+    support for Agentic AI scenarios with new features such as
+    output parsing and improved chat templates for reliable
+    multi-turn interactions, and preview functionality for the
+    Qwen3-30B-A3B model. OVMS also introduces a preview for
+    audio endpoints.
+  * NPU deployment is simplified with batch support, enabling
+    seamless model execution across Intel® Core™ Ultra
+    processors while eliminating driver dependencies. Models
+    are reshaped to batch_size=1 before compilation.
+  * The improved NVIDIA Triton Server* integration with the
+    OpenVINO backend now enables developers to utilize Intel
+    GPUs or NPUs for deployment.
+
+-------------------------------------------------------------------
+Sun Sep  7 01:21:19 UTC 2025 - Alessandro de Oliveira Faria <cabelo@opensuse.org>
+
+- Update to 2025.3.0
+- More GenAI coverage and framework integrations to minimize code
+  changes
+  * New models supported: Phi-4-mini-reasoning, AFM-4.5B,
+    Gemma-3-1B-it, Gemma-3-4B-it, and Gemma-3-12B.
+  * NPU support added for: Qwen3-1.7B, Qwen3-4B, and Qwen3-8B.
+  * LLMs optimized for NPU now available in the OpenVINO Hugging
+    Face collection.
+- Broader LLM model support and more model compression techniques
+  * The NPU plug-in adds support for longer contexts of up to
+    8K tokens, dynamic prompts, and dynamic LoRA for improved
+    LLM performance.
+  * The NPU plug-in now supports dynamic batch sizes by reshaping
+    the model to a batch size of 1 and concurrently managing
+    multiple inference requests, enhancing performance and
+    optimizing memory utilization.
+  * Accuracy improvements for GenAI models on both built-in
+    and discrete graphics achieved through the implementation
+    of the per-channel key cache compression technique, in
+    addition to the existing per-token KV cache compression
+    method.
+  * OpenVINO™ GenAI introduces TextRerankPipeline for improved
+    retrieval relevance and RAG pipeline accuracy, plus
+    Structured Output for enhanced response reliability and
+    function calling while ensuring adherence to predefined
+    formats.
+- More portability and performance to run AI at the edge,
+  in the cloud, or locally.
+  * Announcing support for Intel® Arc™ Pro B-Series
+    (B50 and B60).
+  * Preview: Hugging Face models that are GGUF-enabled for
+    OpenVINO GenAI are now supported by the OpenVINO™ Model
+    Server for popular LLM model architectures such as
+    DeepSeek Distill, Qwen2, Qwen2.5, and Llama 3.
+    This functionality reduces memory footprint and
+    simplifies integration for GenAI workloads.
+  * With improved reliability and tool call accuracy,
+    the OpenVINO™ Model Server boosts support for
+    agentic AI use cases on AI PCs, while enhancing
+    performance on Intel CPUs, built-in GPUs, and NPUs.
+  * INT4 data-aware weights compression, now supported in the
+    Neural Network Compression Framework (NNCF) for ONNX
+    models, reduces memory footprint while maintaining
+    accuracy and enables efficient deployment in
+    resource-constrained environments.
+
 -------------------------------------------------------------------
 Wed Jun 25 01:09:14 UTC 2025 - Alessandro de Oliveira Faria <cabelo@opensuse.org>
 
@@ -186,7 +285,7 @@ Mon Apr 14 06:52:03 UTC 2025 - Alessandro de Oliveira Faria <cabelo@opensuse.org>
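A note on the NNCF items in the two changelog entries above: INT8 static post-training quantization and INT8/INT4 weight-only compression of ONNX models both go through the public nncf Python API. Below is a minimal sketch of that workflow, assuming a local model.onnx whose input tensor is named "input"; the file names, input shape, sample count, and the INT4_SYM mode choice are illustrative assumptions, not details from this changelog.

# Sketch only: NNCF ONNX backend, per the 2025.4.0/2025.3.0 notes above.
# "model.onnx", the input name "input", and the shapes are placeholders.
import numpy as np
import onnx
import nncf

model = onnx.load("model.onnx")

# A few hundred representative inputs for calibration (random here only
# to keep the sketch self-contained; real calibration data should be used).
calibration_data = [np.random.rand(1, 3, 224, 224).astype(np.float32)
                    for _ in range(300)]

def transform_fn(sample):
    # Map each calibration sample to the model's input tensor name.
    return {"input": sample}

# INT8 static post-training quantization.
int8_model = nncf.quantize(model, nncf.Dataset(calibration_data, transform_fn))
onnx.save(int8_model, "model_int8.onnx")

# INT4 weight-only compression of the same model. The data-free variant is
# shown; the data-aware variant from the 2025.3.0 entry additionally takes
# a dataset argument.
int4_model = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT4_SYM)
onnx.save(int4_model, "model_int4.onnx")

Both outputs stay in ONNX format, which is the substance of the accuracy-parity claim: the same compressed artifact can be run as ONNX or converted to OpenVINO IR.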
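Likewise, the NPU bullets about Qwen3-1.7B/4B/8B and longer contexts concern LLMs executed through OpenVINO GenAI. A minimal text-generation sketch with openvino_genai, assuming a model already exported to OpenVINO IR in a local directory; the directory name, device choice, and token budget are placeholders.

# Sketch only: basic OpenVINO GenAI text generation.
# "Qwen3-4B-int4-ov" stands in for a local OpenVINO IR model directory.
import openvino_genai

pipe = openvino_genai.LLMPipeline("Qwen3-4B-int4-ov", "NPU")

# Short smoke test; swap "NPU" for "CPU" or "GPU" to target other plugins.
print(pipe.generate("What is OpenVINO?", max_new_tokens=64))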