-------------------------------------------------------------------
Sat Jun 01 21:12:20 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.1.40:
* New model: Codestral: Codestral is Mistral AI's first-ever code
model designed for code generation tasks.
* New model: IBM Granite Code: now in 3B and 8B parameter sizes.
* New model: Deepseek V2: A Strong, Economical, and Efficient
Mixture-of-Experts Language Model
* Fixed out of memory and incorrect token issues when running
Codestral on 16GB Macs
* Fixed issue where full-width characters (e.g. Japanese,
Chinese, Russian) were deleted at end of the line when using
ollama run
-------------------------------------------------------------------
Wed May 29 11:38:26 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.1.39:
* New model: Cohere Aya 23: A new state-of-the-art, multilingual
LLM covering 23 different languages.
* New model: Mistral 7B 0.3: A new version of Mistral 7B with
initial support for function calling.
* New model: Phi-3 Medium: a 14B-parameter, lightweight,
state-of-the-art open model by Microsoft.
* New model: Phi-3 Mini 128K and Phi-3 Medium 128K: versions of
the Phi-3 models that support a context window size of 128K
* New model: Granite code: A family of open foundation models by
IBM for Code Intelligence
* It is now possible to import and quantize Llama 3 and its
finetunes from Safetensors format to Ollama.
* Full changelog at
https://github.com/ollama/ollama/releases/tag/v0.1.39
-------------------------------------------------------------------
Wed May 22 18:05:30 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Added 15.6 build
-------------------------------------------------------------------
Thu May 16 19:55:51 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.1.38:
* New model: Falcon 2: A new 11B-parameter causal decoder-only
model built by TII and trained over 5T tokens.
* New model: Yi 1.5: A new high-performing version of Yi, now
licensed as Apache 2.0. Available in 6B, 9B and 34B sizes.
* Added ollama ps command
* Added /clear command
* Fixed issue where switching loaded models on Windows would take
several seconds
* Running /save will no longer abort the chat session if an
incorrect name is provided
* The /api/tags API endpoint will now correctly return an empty
list [] instead of null if no models are provided
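
To illustrate the /api/tags fix above, here is a minimal Python sketch of a
client that treats both response shapes the same way. It assumes the response
is a JSON object with a "models" key whose entries carry a "name" field; the
helper name model_names and the sample payloads are hypothetical, and the code
parses canned strings rather than calling a live server.

```python
import json


def model_names(body: str) -> list[str]:
    """Return model names from an /api/tags-style JSON body.

    A null "models" value (the behaviour before 0.1.38) is treated the
    same as an empty list (the behaviour from 0.1.38 on).
    """
    payload = json.loads(body)
    models = payload.get("models") or []  # None and [] both become []
    return [m["name"] for m in models]


# Hypothetical sample payloads, not captured from a real server.
old_style = '{"models": null}'
new_style = '{"models": []}'
populated = '{"models": [{"name": "llama3:latest"}]}'

for sample in (old_style, new_style, populated):
    print(model_names(sample))
```

Clients that only talk to 0.1.38 or later can iterate over "models" directly;
the "or []" guard matters only when older servers may still be in use.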
-------------------------------------------------------------------
Sun May 12 19:05:53 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.1.37:
* Fixed issue where models with uppercase characters in the name
would not show with ollama list
* Fixed usage string for ollama create
* Fixed finish_reason being "" instead of null in the
OpenAI-compatible chat API.
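
As a hedged sketch of why the finish_reason fix above matters: a streaming
client usually decides whether a response is complete by checking that field.
The Python snippet below assumes the OpenAI-style chunk layout
(choices[0].finish_reason); the helper name is_finished and the sample chunks
are hypothetical.

```python
import json


def is_finished(chunk_json: str) -> bool:
    """Report whether a chat completion chunk marks the end of generation.

    JSON null (the value from 0.1.37 on) and "" (the value before the
    fix) are both treated as "not finished yet".
    """
    chunk = json.loads(chunk_json)
    reason = chunk["choices"][0].get("finish_reason")
    return reason not in (None, "")


# Hypothetical sample chunks, not captured from a real server.
in_progress_old = '{"choices": [{"delta": {"content": "Hi"}, "finish_reason": ""}]}'
in_progress_new = '{"choices": [{"delta": {"content": "Hi"}, "finish_reason": null}]}'
final_chunk = '{"choices": [{"delta": {}, "finish_reason": "stop"}]}'

for sample in (in_progress_old, in_progress_new, final_chunk):
    print(is_finished(sample))
```

A client that tests "finish_reason is not None" alone would have stopped
reading after the first chunk from a pre-0.1.37 server, since "" is not null.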
-------------------------------------------------------------------
Sun May 12 15:20:28 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Use obs_scm service instead of the deprecated tar_scm
- Use zstd for vendor tarball compression
-------------------------------------------------------------------
Sun May 12 01:39:26 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.1.36:
* Fixed exit status 0xc0000005 error with AMD graphics cards on Windows
* Fixed rare out of memory errors when loading a model to run with CPU
- Update to version 0.1.35:
* New model: Llama 3 ChatQA: A model from NVIDIA based on Llama
3 that excels at conversational question answering (QA) and
retrieval-augmented generation (RAG).
* Quantization: ollama create can now quantize models when
importing them using the --quantize or -q flag
* Fixed issue where inference subprocesses wouldn't be cleaned up
on shutdown.
* Fixed a series of out of memory errors when loading models on
multi-GPU systems
* Ctrl+J characters will now properly add newlines in ollama run
* Fixed issues when running ollama show for vision models
* OPTIONS requests to the Ollama API will no longer result in
errors
* Fixed issue where partially downloaded files wouldn't be
cleaned up
* Added a new done_reason field in responses describing why
generation stopped
* Ollama will now more accurately estimate how much memory
is available on multi-GPU systems especially when running
different models one after another
- Update to version 0.1.34:
* New model: Llava Llama 3
* New model: Llava Phi 3
* New model: StarCoder2 15B Instruct
* New model: CodeGemma 1.1
* New model: StableLM2 12B
* New model: Moondream 2
* Fixed issues with LLaVa models where they would respond
incorrectly after the first request
* Fixed out of memory errors when running large models such as
Llama 3 70B
* Fixed various issues with Nvidia GPU discovery on Linux and
Windows
* Fixed a series of Modelfile errors when running ollama create
* Fixed no slots available error that occurred when cancelling a
request and then sending follow up requests
* Improved AMD GPU detection on Fedora
* Improved reliability when using the experimental
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS flags
* ollama serve will now shut down quickly, even if a model is
loading
- Update to version 0.1.33:
* New model: Llama 3
* New model: Phi 3 Mini
* New model: Moondream
* New model: Llama 3 Gradient 1048K
* New model: Dolphin Llama 3
* New model: Qwen 110B
* Fixed issues where the model would not terminate, causing the
API to hang.
* Fixed a series of out of memory errors on Apple Silicon Macs
* Fixed out of memory errors when running Mixtral architecture
models
* Added experimental concurrency features:
~ OLLAMA_NUM_PARALLEL: Handle multiple requests simultaneously
for a single model
~ OLLAMA_MAX_LOADED_MODELS: Load multiple models simultaneously
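
The two experimental concurrency variables listed for 0.1.33 are read from
the environment of the server process. The Python sketch below shows one way
to build such an environment; it assumes both variables take positive integer
values, and the helper name concurrency_env is hypothetical.

```python
import os
from typing import Mapping, Optional


def concurrency_env(
    num_parallel: int,
    max_loaded_models: int,
    base: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a copy of the environment with the concurrency variables set.

    Assumption: both settings are positive integers.
    """
    if num_parallel < 1 or max_loaded_models < 1:
        raise ValueError("both values must be positive integers")
    env = dict(os.environ if base is None else base)
    env["OLLAMA_NUM_PARALLEL"] = str(num_parallel)
    env["OLLAMA_MAX_LOADED_MODELS"] = str(max_loaded_models)
    return env


# Built on an empty base so the printed result shows only the two variables.
print(concurrency_env(4, 2, base={}))
```

The resulting mapping could be passed as env= to
subprocess.Popen(["ollama", "serve"], ...); on installations that use the
packaged systemd service, the variables would instead be set in the unit's
environment.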
-------------------------------------------------------------------
Tue Apr 23 02:26:34 UTC 2024 - rrahl0@disroot.org
- Update to version 0.1.32:
* scale graph based on gpu count
* Support unicode characters in model path (#3681)
* darwin: no partial offloading if required memory greater than system
* update llama.cpp submodule to `7593639` (#3665)
* fix padding in decode
* Revert "cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)" (#3662)
* Added Solar example at README.md (#3610)
* Update langchainjs.md (#2030)
* Added MindsDB information (#3595)
* examples: add more Go examples using the API (#3599)
* Update modelfile.md
* Add llama2 / torch models for `ollama create` (#3607)
* Terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading (#3653)
* app: gracefully shut down `ollama serve` on windows (#3641)
* types/model: add path helpers (#3619)
* update llama.cpp submodule to `4bd0f93` (#3627)
* types/model: make ParseName variants less confusing (#3617)
* types/model: remove (*Digest).Scan and Digest.Value (#3605)
* Fix rocm deps with new subprocess paths
* mixtral mem
* Revert "types/model: remove (*Digest).Scan and Digest.Value (#3589)"
* types/model: remove (*Digest).Scan and Digest.Value (#3589)
* types/model: remove DisplayLong (#3587)
* types/model: remove MarshalText/UnmarshalText from Digest (#3586)
* types/model: init with Name and Digest types (#3541)
* server: provide helpful workaround hint when stalling on pull (#3584)
* partial offloading
* refactor tensor query
* api: start adding documentation to package api (#2878)
* examples: start adding Go examples using api/ (#2879)
* Handle very slow model loads
* fix: rope
* Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564)
* build.go: introduce a friendlier way to build Ollama (#3548)
* update llama.cpp submodule to `1b67731` (#3561)
* ci: use go-version-file
* Correct directory reference in macapp/README (#3555)
* cgo quantize
* no blob create if already exists
* update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to avoid compiler errors (#3528)
* Docs: Remove wrong parameter for Chat Completion (#3515)
* no rope parameters
* add command-r graph estimate
* Fail fast if mingw missing on windows
* use an older version of the mac os sdk in release (#3484)
* Add test case for context exhaustion
* CI missing archive
* fix dll compress in windows building
* CI subprocess path fix
* Fix CI release glitches
* update graph size estimate
* Fix macOS builds on older SDKs (#3467)
* cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)
* feat: add OLLAMA_DEBUG in ollama server help message (#3461)
* Revert options as a ref in the server
* default head_kv to 1
* fix metal gpu
* Bump to b2581
* Refined min memory from testing
* Release gpu discovery library after use
* Safeguard for noexec
* Detect too-old cuda driver
* Integration test improvements
* Apply 01-cache.diff
* Switch back to subprocessing for llama.cpp
* Simplify model conversion (#3422)
* fix generate output
* update memory calculations
* refactor model parsing
* Add chromem-go to community integrations (#3437)
* Update README.md (#3436)
* Community Integration: CRAG Ollama Chat (#3423)
* Update README.md (#3378)
* Community Integration: ChatOllama (#3400)
* Update 90_bug_report.yml
* Add gemma safetensors conversion (#3250)
* CI automation for tagging latest images
* Bump ROCm to 6.0.2 patch release
* CI windows gpu builds
* Update troubleshooting link
* fix: trim quotes on OLLAMA_ORIGINS
- add set_version to automatically switch over to the newer version
-------------------------------------------------------------------
Tue Apr 16 10:52:25 UTC 2024 - bwiedemann@suse.com
- Update to version 0.1.31:
* Backport MacOS SDK fix from main
* Apply 01-cache.diff
* fix: workflows
* stub stub
* mangle arch
* only generate on changes to llm subdirectory
* only generate cuda/rocm when changes to llm detected
* Detect arrow keys on windows (#3363)
* add license in file header for vendored llama.cpp code (#3351)
* remove need for `$VSINSTALLDIR` since build will fail if `ninja` cannot be found (#3350)
* change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347)
* malformed markdown link (#3358)
* Switch runner for final release job
* Use Rocky Linux Vault to get GCC 10.2 installed
* Revert "Switch arm cuda base image to centos 7"
* Switch arm cuda base image to centos 7
* Bump llama.cpp to b2527
* Fix ROCm link in `development.md`
* adds ooo to community integrations (#1623)
* Add cliobot to ollama supported list (#1873)
* Add Dify.AI to community integrations (#1944)
* enh: add ollero.nvim to community applications (#1905)
* Add typechat-cli to Terminal apps (#2428)
* add new Web & Desktop link in readme for alpaca webui (#2881)
* Add LibreChat to Web & Desktop Apps (#2918)
* Add Community Integration: OllamaGUI (#2927)
* Add Community Integration: OpenAOE (#2946)
* Add Saddle (#3178)
* tlm added to README.md terminal section. (#3274)
* Update README.md (#3288)
* Update README.md (#3338)
* Integration tests conditionally pull
* add support for libcudart.so for CUDA devices (adds Jetson support)
* llm: prevent race appending to slice (#3320)
* Bump llama.cpp to b2510
* Add Testcontainers into Libraries section (#3291)
* Revamp go based integration tests
* rename `.gitattributes`
* Bump llama.cpp to b2474
* Add docs for GPU selection and nvidia uvm workaround
* doc: faq gpu compatibility (#3142)
* Update faq.md
* Better tmpdir cleanup
* Update faq.md
* update `faq.md`
* dyn global
* llama: remove server static assets (#3174)
* add `llm/ext_server` directory to `linguist-vendored` (#3173)
* Add Radeon gfx940-942 GPU support
* Wire up more complete CI for releases
* llm,readline: use errors.Is instead of simple == check (#3161)
* server: replace blob prefix separator from ':' to '-' (#3146)
* Add ROCm support to linux install script (#2966)
* .github: fix model and feature request yml (#3155)
* .github: add issue templates (#3143)
* fix: clip memory leak
* Update README.md
* add `OLLAMA_KEEP_ALIVE` to environment variable docs for `ollama serve` (#3127)
* Default Keep Alive environment variable (#3094)
* Use stdin for term discovery on windows
* Update ollama.iss
* restore locale patch (#3091)
* token repeat limit for prediction requests (#3080)
* Fix iGPU detection for linux
* add more docs for the modelfile message command (#3087)
* warn when json format is expected but not mentioned in prompt (#3081)
* Adapt our build for imported server.cpp
* Import server.cpp as of b2356
* refactor readseeker
* Add docs explaining GPU selection env vars
* chore: fix typo (#3073)
* fix gpu_info_cuda.c compile warning (#3077)
* use `-trimpath` when building releases (#3069)
* relay load model errors to the client (#3065)
* Update troubleshooting.md
* update llama.cpp submodule to `ceca1ae` (#3064)
* convert: fix shape
* Avoid rocm runner and dependency clash
* fix `03-locale.diff`
* Harden for deps file being empty (or short)
* Add ollama executable peer dir for rocm
* patch: use default locale in wpm tokenizer (#3034)
* only copy deps for `amd64` in `build_linux.sh`
* Rename ROCm deps file to avoid confusion (#3025)
* add `macapp` to `.dockerignore`
* add `bundle_metal` and `cleanup_metal` functions to `gen_darwin.sh`
* tidy cleanup logs
* update llama.cpp submodule to `77d1ac7` (#3030)
* disable gpu for certain model architectures and fix divide-by-zero on memory estimation
* Doc how to set up ROCm builds on windows
* Finish unwinding idempotent payload logic
* update llama.cpp submodule to `c2101a2` (#3020)
* separate out `isLocalIP`
* simplify host checks
* add additional allowed hosts
* Update docs `README.md` and table of contents
* add allowed host middleware and remove `workDir` middleware (#3018)
* decode ggla
* convert: fix default shape
* fix: allow importing a model from name reference (#3005)
* update llama.cpp submodule to `6cdabe6` (#2999)
* Update api.md
* Revert "adjust download and upload concurrency based on available bandwidth" (#2995)
* cmd: tighten up env var usage sections (#2962)
* default terminal width, height
* Refined ROCm troubleshooting docs
* Revamp ROCm support
* update go to 1.22 in other places (#2975)
* docs: Add LLM-X to Web Integration section (#2759)
* fix some typos (#2973)
* Convert Safetensors to an Ollama model (#2824)
* Allow setting max vram for workarounds
* cmd: document environment variables for serve command
* Add Odin Runes, a Feature-Rich Java UI for Ollama, to README (#2440)
* Update api.md
* Add NotesOllama to Community Integrations (#2909)
* Added community link for Ollama Copilot (#2582)
* use LimitGroup for uploads
* adjust group limit based on download speed
* add new LimitGroup for dynamic concurrency
* refactor download run
-------------------------------------------------------------------
Wed Mar 06 23:51:28 UTC 2024 - computersemiexpert@outlook.com
- Update to version 0.1.28:
* Fix embeddings load model behavior (#2848)
* Add Community Integration: NextChat (#2780)
* prepend image tags (#2789)
* fix: print usedMemory size right (#2827)
* bump submodule to `87c91c07663b707e831c59ec373b5e665ff9d64a` (#2828)
* Add ollama user to video group
* Add env var so podman will map cuda GPUs
-------------------------------------------------------------------
Tue Feb 27 08:33:15 UTC 2024 - Jan Engelhardt <jengelh@inai.de>
- Edit description, answer _what_ the package is and use nominal
phrase. (https://en.opensuse.org/openSUSE:Package_description_guidelines)
-------------------------------------------------------------------
Fri Feb 23 21:13:53 UTC 2024 - Loren Burkholder <computersemiexpert@outlook.com>
- Added the Ollama package
- Included a systemd service