#Installation bash script to solve the local model problem
Taking cues from rvm, nvm, and ollama.ai—all of which utilize installation shell scripts to set up dependencies—I propose adopting a similar strategy for OI. This approach is likely to address the majority of dependency-related issues that our users are presently encountering.
For Ollama
curl https://ollama.ai/install.sh | sh
For RVM
\curl -sSL https://get.rvm.io | bash
For NVM
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.5/install.sh | bash
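A rough sketch of what an equivalent entry point for OI might look like, in the same spirit as the scripts above. This is only a dry run under stated assumptions: the package name `open-interpreter` and the helper names `find_python` and `install_command` are hypothetical and not taken from any existing OI script. It only prints the command it would run rather than installing anything.

```shell
#!/usr/bin/env bash
# Hypothetical install.sh sketch for OI (dry run only).
# Assumption: OI is installed from pip under the package name below.
OI_PACKAGE="open-interpreter"

# Print the first Python 3 interpreter found on PATH; return 1 if none exists.
find_python() {
  local candidate
  for candidate in python3 python; do
    if command -v "$candidate" >/dev/null 2>&1 &&
       "$candidate" -c 'import sys; sys.exit(0 if sys.version_info[0] == 3 else 1)' >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Build the pip command for the given interpreter without executing it.
install_command() {
  echo "$1 -m pip install --upgrade $OI_PACKAGE"
}

if py="$(find_python)"; then
  echo "Would run: $(install_command "$py")"
else
  echo "No Python 3 interpreter found; install Python first." >&2
fi
```

A real script would run the command instead of echoing it, and would probably be served from a URL so it can be piped into `bash` like the examples above.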
We should be able to use a Petals swarm server for local models and sidestep this entirely
What if the user doesn't want to depend on a Petals swarm server?
I think they mainly want it to actually work
These local installs are killers.
I don't see why they would care how the code runs so long as it works, and then we call everything through an endpoint
I have no idea of the comparative memory footprints
This would also allow them to have a local cluster of all their machines
The server is a one-liner
Would that solve the CUDA library version problem as well?
I can look into it.
yes
I think so
if the swarm is running it's running
this avoids sooo many issues
and we don't maintain it tee hee
It crashed on my Apple M2 Mac 😂
2023-09-28 17:24:23.551 python3[77884:1005009] Error = Error Domain=com.apple.appleneuralengine Code=6 "createProgramInstanceForModel:modelToken:qos:isPreCompiled:enablePowerSaving:skipPreparePhase:statsMask:memoryPoolID:enableLateLatch:modelIdentityStr:owningPid:cacheUrlIdentifier:aotCacheUrlIdentifier:error:: Program load failure (0xF0004)" UserInfo={NSLocalizedDescription=createProgramInstanceForModel:modelToken:qos:isPreCompiled:enablePowerSaving:skipPreparePhase:statsMask:memoryPoolID:enableLateLatch:modelIdentityStr:owningPid:cacheUrlIdentifier:aotCacheUrlIdentifier:error:: Program load failure (0xF0004)}
/AppleInternal/Library/BuildRoots/fe2afe83-06e7-11ee-80c3-f6357a1003e8/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Runtimes/MPSRuntime/Operations/GPURegionOps.mm:548: failed assertion `ANE load failed!'
[1] 77884 abort python3 -m petals.cli.run_server petals-team/StableBeluga2
not as smooth as what we have right now.
I can run a local model in OI currently, but this Petals swarm server is not working.
Smooth until they can't get the libs or CUDA installed. Don't know how much of a problem this still is
That's what the bash script would solve.
For Ollama
curl https://ollama.ai/install.sh | sh
It does solve that, if you look into the script
We just need to adapt it for OI
Basically check the conditions and apply the fixes we already suggest in the GitHub issues.
Seems to work for Intel silicon
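A minimal sketch of the "check the condition" step, assuming the script only needs to tell apart Apple Silicon, Linux with an NVIDIA GPU, and everything else. The function name `detect_backend` and the labels `metal`, `cuda`, and `cpu` are made up for illustration; the actual fixes to apply per case would come from the GitHub issues.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick a backend label from OS, CPU architecture, and GPU presence.
# Arguments default to the live system, so fixed values can be passed in for testing.
detect_backend() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  local has_nvidia="${3:-}"

  # If the caller did not say, treat a working nvidia-smi on PATH as "NVIDIA GPU present".
  if [ -z "$has_nvidia" ]; then
    if command -v nvidia-smi >/dev/null 2>&1; then
      has_nvidia=yes
    else
      has_nvidia=no
    fi
  fi

  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    echo "metal"   # Apple Silicon
  elif [ "$os" = "Linux" ] && [ "$has_nvidia" = "yes" ]; then
    echo "cuda"    # Linux with an NVIDIA driver installed
  else
    echo "cpu"     # Fallback: no GPU acceleration assumed
  fi
}

echo "Detected backend: $(detect_backend)"
```

Each label could then map to one set of install steps, so a user on an M2 and a user with a CUDA mismatch end up on different paths.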
python -m petals.cli.run_server codellama/CodeLlama-7b-Instruct-hf --new_swarm --num_blocks 3
Sep 28 12:33:27.809 [INFO] Running Petals 2.2.0
Most of the people who ask in the channel can solve their problem after following what we tell them there. The bash script would be a step closer to finding out why it breaks and fixing it.
loc("cast"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/fe2afe83-06e7-11ee-80c3-f6357a1003e8/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":745:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x98xi1>'
Sure, fewer code layers should be more performant. I suppose it comes down to how well the install scripts work
CPU only
python -m petals.cli.run_server codellama/CodeLlama-7b-Instruct-hf --new_swarm --num_blocks 3
I can see the incremental improvement along the way.
I couldn't use my GPU for this on my old model
No reason not to. If we support Petals they could go either way according to their whimsical nature
I can use it with LM Studio and OI, so it shouldn't be a problem with the computer.
Looks nice, too bad I can't run it.