-------------------------------------------------------------------
Wed May 22 18:05:30 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Added build for openSUSE Leap 15.6
-------------------------------------------------------------------
Thu May 16 19:55:51 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.1.38:
* New model: Falcon 2: A new 11B-parameter causal decoder-only
model built by TII and trained on over 5T tokens.
* New model: Yi 1.5: A new high-performing version of Yi, now
licensed under Apache 2.0. Available in 6B, 9B and 34B sizes.
* Added ollama ps command (see the usage sketch after this entry)
* Added /clear command
* Fixed issue where switching loaded models on Windows would take
several seconds
* Running /save will no longer abort the chat session if an
incorrect name is provided
* The /api/tags API endpoint will now correctly return an empty
list [] instead of null if no models are provided
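  Illustrative usage of the new command and the corrected endpoint (a
  sketch only; the JSON shape shown is an assumption, not quoted from
  the release notes):
    $ ollama ps
    $ curl -s http://localhost:11434/api/tags
    # with no models installed, the response now contains "models": []
    # rather than "models": null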
-------------------------------------------------------------------
Sun May 12 19:05:53 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.1.37:
* Fixed issue where models with uppercase characters in the name
would not show with ollama list
* Fixed usage string for ollama create
* Fixed finish_reason being "" instead of null in the
OpenAI-compatible chat API (illustrated after this entry).
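  A minimal request against the OpenAI-compatible endpoint to observe
  the corrected field (a sketch; the model name is an example):
    $ curl http://localhost:11434/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "llama3", "messages": [{"role": "user", "content": "Hello"}]}'
    # streamed chunks now report "finish_reason": null until the final
    # chunk, which reports e.g. "finish_reason": "stop"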
-------------------------------------------------------------------
Sun May 12 15:20:28 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Use obs_scm service instead of the deprecated tar_scm
- Use zstd for vendor tarball compression
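  A minimal _service sketch matching this setup (the revision is an
  example and the exact compression parameter values are assumptions):
    <services>
      <service name="obs_scm">
        <param name="url">https://github.com/ollama/ollama.git</param>
        <param name="scm">git</param>
        <param name="revision">v0.1.37</param>
      </service>
      <!-- vendor Go dependencies; compression value is an assumption -->
      <service name="go_modules">
        <param name="compression">zst</param>
      </service>
      <service name="tar" mode="buildtime"/>
      <service name="recompress" mode="buildtime">
        <param name="file">*.tar</param>
        <param name="compression">zst</param>
      </service>
    </services>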
-------------------------------------------------------------------
Sun May 12 01:39:26 UTC 2024 - Eyad Issa <eyadlorenzo@gmail.com>
- Update to version 0.1.36:
* Fixed exit status 0xc0000005 error with AMD graphics cards on Windows
* Fixed rare out of memory errors when loading a model to run with CPU
- Update to version 0.1.35:
* New models: Llama 3 ChatQA: A model from NVIDIA based on Llama
3 that excels at conversational question answering (QA) and
retrieval-augmented generation (RAG).
* Quantization: ollama create can now quantize models when
importing them using the --quantize or -q flag (see the sketch
below)
* Fixed issue where inference subprocesses wouldn't be cleaned up
on shutdown.
* Fixed a series of out-of-memory errors when loading models on
multi-GPU systems
* Ctrl+J characters will now properly add newlines in ollama run
* Fixed issues when running ollama show for vision models
* OPTIONS requests to the Ollama API will no longer result in
errors
* Fixed issue where partially downloaded files wouldn't be
cleaned up
* Added a new done_reason field in responses describing why
generation stopped (example below)
* Ollama will now more accurately estimate how much memory
is available on multi-GPU systems especially when running
different models one after another
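  Illustrative import with on-the-fly quantization (a sketch; the model
  name, Modelfile path and quantization type are examples):
    $ ollama create my-model -f ./Modelfile --quantize q4_0
    # equivalently: ollama create my-model -f ./Modelfile -q q4_0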
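  A sketch of where the new field shows up (the model name and the
  value shown are examples):
    $ curl -s http://localhost:11434/api/generate \
        -d '{"model": "llama3", "prompt": "Hi", "stream": false}'
    # the response now includes "done": true and e.g. "done_reason": "stop"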
- Update to version 0.1.34:
* New model: Llava Llama 3
* New model: Llava Phi 3
* New model: StarCoder2 15B Instruct
* New model: CodeGemma 1.1
* New model: StableLM2 12B
* New model: Moondream 2
* Fixed issues with LLaVa models where they would respond
incorrectly after the first request
* Fixed out of memory errors when running large models such as
Llama 3 70B
* Fixed various issues with Nvidia GPU discovery on Linux and
Windows
* Fixed a series of Modelfile errors when running ollama create
* Fixed no slots available error that occurred when cancelling a
request and then sending follow up requests
* Improved AMD GPU detection on Fedora
* Improved reliability when using the experimental
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS flags
* ollama serve will now shut down quickly, even if a model is
loading
- Update to version 0.1.33:
* New model: Llama 3
* New model: Phi 3 Mini
* New model: Moondream
* New model: Llama 3 Gradient 1048K
* New model: Dolphin Llama 3
* New model: Qwen 110B
* Fixed issues where the model would not terminate, causing the
API to hang.
* Fixed a series of out of memory errors on Apple Silicon Macs
* Fixed out of memory errors when running Mixtral architecture
models
* Added experimental concurrency features (example after this list):
~ OLLAMA_NUM_PARALLEL: Handle multiple requests simultaneously
for a single model
~ OLLAMA_MAX_LOADED_MODELS: Load multiple models simultaneously
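  Example of enabling the experimental settings when starting the
  server (the values are illustrative only):
    $ OLLAMA_NUM_PARALLEL=4 OLLAMA_MAX_LOADED_MODELS=2 ollama serve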
-------------------------------------------------------------------
Tue Apr 23 02:26:34 UTC 2024 - rrahl0@disroot.org
- Update to version 0.1.32:
* scale graph based on gpu count
* Support unicode characters in model path (#3681)
* darwin: no partial offloading if required memory greater than system
* update llama.cpp submodule to `7593639` (#3665)
* fix padding in decode
* Revert "cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)" (#3662)
* Added Solar example at README.md (#3610)
* Update langchainjs.md (#2030)
* Added MindsDB information (#3595)
* examples: add more Go examples using the API (#3599)
* Update modelfile.md
* Add llama2 / torch models for `ollama create` (#3607)
* Terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading (#3653)
* app: gracefully shut down `ollama serve` on windows (#3641)
* types/model: add path helpers (#3619)
* update llama.cpp submodule to `4bd0f93` (#3627)
* types/model: make ParseName variants less confusing (#3617)
* types/model: remove (*Digest).Scan and Digest.Value (#3605)
* Fix rocm deps with new subprocess paths
* mixtral mem
* Revert "types/model: remove (*Digest).Scan and Digest.Value (#3589)"
* types/model: remove (*Digest).Scan and Digest.Value (#3589)
* types/model: remove DisplayLong (#3587)
* types/model: remove MarshalText/UnmarshalText from Digest (#3586)
* types/model: init with Name and Digest types (#3541)
* server: provide helpful workaround hint when stalling on pull (#3584)
* partial offloading
* refactor tensor query
* api: start adding documentation to package api (#2878)
* examples: start adding Go examples using api/ (#2879)
* Handle very slow model loads
* fix: rope
* Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564)
* build.go: introduce a friendlier way to build Ollama (#3548)
* update llama.cpp submodule to `1b67731` (#3561)
* ci: use go-version-file
* Correct directory reference in macapp/README (#3555)
* cgo quantize
* no blob create if already exists
* update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to avoid compiler errors (#3528)
* Docs: Remove wrong parameter for Chat Completion (#3515)
* no rope parameters
* add command-r graph estimate
* Fail fast if mingw missing on windows
* use an older version of the mac os sdk in release (#3484)
* Add test case for context exhaustion
* CI missing archive
* fix dll compress in windows building
* CI subprocess path fix
* Fix CI release glitches
* update graph size estimate
* Fix macOS builds on older SDKs (#3467)
* cmd: provide feedback if OLLAMA_MODELS is set on non-serve command (#3470)
* feat: add OLLAMA_DEBUG in ollama server help message (#3461)
* Revert options as a ref in the server
* default head_kv to 1
* fix metal gpu
* Bump to b2581
* Refined min memory from testing
* Release gpu discovery library after use
* Safeguard for noexec
* Detect too-old cuda driver
* Integration test improvements
* Apply 01-cache.diff
* Switch back to subprocessing for llama.cpp
* Simplify model conversion (#3422)
* fix generate output
* update memory calcualtions
* refactor model parsing
* Add chromem-go to community integrations (#3437)
* Update README.md (#3436)
* Community Integration: CRAG Ollama Chat (#3423)
* Update README.md (#3378)
* Community Integration: ChatOllama (#3400)
* Update 90_bug_report.yml
* Add gemma safetensors conversion (#3250)
* CI automation for tagging latest images
* Bump ROCm to 6.0.2 patch release
* CI windows gpu builds
* Update troubleshooting link
* fix: trim quotes on OLLAMA_ORIGINS
- Added the set_version service to automatically switch over to the
newer version
-------------------------------------------------------------------
Tue Apr 16 10:52:25 UTC 2024 - bwiedemann@suse.com
- Update to version 0.1.31:
* Backport MacOS SDK fix from main
* Apply 01-cache.diff
* fix: workflows
* stub stub
* mangle arch
* only generate on changes to llm subdirectory
* only generate cuda/rocm when changes to llm detected
* Detect arrow keys on windows (#3363)
* add license in file header for vendored llama.cpp code (#3351)
* remove need for `$VSINSTALLDIR` since build will fail if `ninja` cannot be found (#3350)
* change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347)
* malformed markdown link (#3358)
* Switch runner for final release job
* Use Rocky Linux Vault to get GCC 10.2 installed
* Revert "Switch arm cuda base image to centos 7"
* Switch arm cuda base image to centos 7
* Bump llama.cpp to b2527
* Fix ROCm link in `development.md`
* adds ooo to community integrations (#1623)
* Add cliobot to ollama supported list (#1873)
* Add Dify.AI to community integrations (#1944)
* enh: add ollero.nvim to community applications (#1905)
* Add typechat-cli to Terminal apps (#2428)
* add new Web & Desktop link in readme for alpaca webui (#2881)
* Add LibreChat to Web & Desktop Apps (#2918)
* Add Community Integration: OllamaGUI (#2927)
* Add Community Integration: OpenAOE (#2946)
* Add Saddle (#3178)
* tlm added to README.md terminal section. (#3274)
* Update README.md (#3288)
* Update README.md (#3338)
* Integration tests conditionally pull
* add support for libcudart.so for CUDA devices (adds Jetson support)
* llm: prevent race appending to slice (#3320)
* Bump llama.cpp to b2510
* Add Testcontainers into Libraries section (#3291)
* Revamp go based integration tests
* rename `.gitattributes`
* Bump llama.cpp to b2474
* Add docs for GPU selection and nvidia uvm workaround
* doc: faq gpu compatibility (#3142)
* Update faq.md
* Better tmpdir cleanup
* Update faq.md
* update `faq.md`
* dyn global
* llama: remove server static assets (#3174)
* add `llm/ext_server` directory to `linguist-vendored` (#3173)
* Add Radeon gfx940-942 GPU support
* Wire up more complete CI for releases
* llm,readline: use errors.Is instead of simple == check (#3161)
* server: replace blob prefix separator from ':' to '-' (#3146)
* Add ROCm support to linux install script (#2966)
* .github: fix model and feature request yml (#3155)
* .github: add issue templates (#3143)
* fix: clip memory leak
* Update README.md
* add `OLLAMA_KEEP_ALIVE` to environment variable docs for `ollama serve` (#3127)
* Default Keep Alive environment variable (#3094)
* Use stdin for term discovery on windows
* Update ollama.iss
* restore locale patch (#3091)
* token repeat limit for prediction requests (#3080)
* Fix iGPU detection for linux
* add more docs on for the modelfile message command (#3087)
* warn when json format is expected but not mentioned in prompt (#3081)
* Adapt our build for imported server.cpp
* Import server.cpp as of b2356
* refactor readseeker
* Add docs explaining GPU selection env vars
* chore: fix typo (#3073)
* fix gpu_info_cuda.c compile warning (#3077)
* use `-trimpath` when building releases (#3069)
* relay load model errors to the client (#3065)
* Update troubleshooting.md
* update llama.cpp submodule to `ceca1ae` (#3064)
* convert: fix shape
* Avoid rocm runner and dependency clash
* fix `03-locale.diff`
* Harden for deps file being empty (or short)
* Add ollama executable peer dir for rocm
* patch: use default locale in wpm tokenizer (#3034)
* only copy deps for `amd64` in `build_linux.sh`
* Rename ROCm deps file to avoid confusion (#3025)
* add `macapp` to `.dockerignore`
* add `bundle_metal` and `cleanup_metal` funtions to `gen_darwin.sh`
* tidy cleanup logs
* update llama.cpp submodule to `77d1ac7` (#3030)
* disable gpu for certain model architectures and fix divide-by-zero on memory estimation
* Doc how to set up ROCm builds on windows
* Finish unwinding idempotent payload logic
* update llama.cpp submodule to `c2101a2` (#3020)
* separate out `isLocalIP`
* simplify host checks
* add additional allowed hosts
* Update docs `README.md` and table of contents
* add allowed host middleware and remove `workDir` middleware (#3018)
* decode ggla
* convert: fix default shape
* fix: allow importing a model from name reference (#3005)
* update llama.cpp submodule to `6cdabe6` (#2999)
* Update api.md
* Revert "adjust download and upload concurrency based on available bandwidth" (#2995)
* cmd: tighten up env var usage sections (#2962)
* default terminal width, height
* Refined ROCm troubleshooting docs
* Revamp ROCm support
* update go to 1.22 in other places (#2975)
* docs: Add LLM-X to Web Integration section (#2759)
* fix some typos (#2973)
* Convert Safetensors to an Ollama model (#2824)
* Allow setting max vram for workarounds
* cmd: document environment variables for serve command
* Add Odin Runes, a Feature-Rich Java UI for Ollama, to README (#2440)
* Update api.md
* Add NotesOllama to Community Integrations (#2909)
* Added community link for Ollama Copilot (#2582)
* use LimitGroup for uploads
* adjust group limit based on download speed
* add new LimitGroup for dynamic concurrency
* refactor download run
-------------------------------------------------------------------
Wed Mar 06 23:51:28 UTC 2024 - computersemiexpert@outlook.com
- Update to version 0.1.28:
* Fix embeddings load model behavior (#2848)
* Add Community Integration: NextChat (#2780)
* prepend image tags (#2789)
* fix: print usedMemory size right (#2827)
* bump submodule to `87c91c07663b707e831c59ec373b5e665ff9d64a` (#2828)
* Add ollama user to video group
* Add env var so podman will map cuda GPUs
-------------------------------------------------------------------
Tue Feb 27 08:33:15 UTC 2024 - Jan Engelhardt <jengelh@inai.de>
- Edited the description to answer _what_ the package is and to use
a nominal phrase.
(https://en.opensuse.org/openSUSE:Package_description_guidelines)
-------------------------------------------------------------------
Fri Feb 23 21:13:53 UTC 2024 - Loren Burkholder <computersemiexpert@outlook.com>
- Added the Ollama package
- Included a systemd service
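  A minimal sketch of such a unit (paths, user/group and directives
  are assumptions, not the packaged file):
    [Unit]
    Description=Ollama API server
    After=network-online.target

    [Service]
    ExecStart=/usr/bin/ollama serve
    User=ollama
    Group=ollama
    Restart=always

    [Install]
    WantedBy=multi-user.target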