#Installation bash script to solve the local model problem

tender knoll
#

Like other projects that depend on the GPU libraries being installed correctly before OI itself is installed, we could ship a shell script that handles the pre-checks. It would save our users from many of the problems they keep running into.
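A rough sketch of what such a pre-check could look like. This is not an existing OI script: the function names `classify_backend` and `detect_nvidia` are made up for illustration, and it only assumes that `nvidia-smi` on the PATH is a reasonable sign that NVIDIA drivers are present.

```shell
#!/usr/bin/env bash
# Pre-install check sketch. Function names are illustrative, not part of OI.

# Map OS, CPU architecture, and NVIDIA driver presence to a backend label.
classify_backend() {
  local os="$1" arch="$2" has_nvidia="$3"
  if [ "$has_nvidia" = "yes" ]; then
    echo "cuda"
  elif [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    echo "metal"
  else
    echo "cpu"
  fi
}

# Report "yes" if nvidia-smi is on PATH, "no" otherwise.
detect_nvidia() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "yes"
  else
    echo "no"
  fi
}

backend="$(classify_backend "$(uname -s)" "$(uname -m)" "$(detect_nvidia)")"
echo "Detected backend: $backend"
```

Keeping the classification separate from the detection means the decision logic can be exercised with fake inputs, for example `classify_backend Darwin arm64 no`, without needing the matching hardware.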

summer bough
#

We should be able to use a Petals swarm server for local models and sidestep this entirely.

tender knoll
#

What if the user doesn't want to depend on a Petals swarm server?

summer bough
#

I think they mainly want it to actually work

#

These local installs are killers.

#

I don't see why they'd care how the code runs so long as it works, and then we call everything through an endpoint

#

I have no idea of the comparative memory footprints

#

This would also allow them to have a local cluster of all their machines

#

The server is a one-liner

tender knoll
#

Would that solve the CUDA library version problem as well?

#

I can look into it.

summer bough
#

yes

#

I think so

#

if the swarm is running it's running

#

this avoids sooo many issues

#

and we don't maintain it tee hee

tender knoll
#

It crashed on my Apple M2 Mac 😂

summer bough
#

let me try on my apple intel

#

That would be an issue

tender knoll
#

2023-09-28 17:24:23.551 python3[77884:1005009] Error = Error Domain=com.apple.appleneuralengine Code=6 "createProgramInstanceForModel:modelToken:qos:isPreCompiled:enablePowerSaving:skipPreparePhase:statsMask:memoryPoolID:enableLateLatch:modelIdentityStr:owningPid:cacheUrlIdentifier:aotCacheUrlIdentifier:error:: Program load failure (0xF0004)" UserInfo={NSLocalizedDescription=createProgramInstanceForModel:modelToken:qos:isPreCompiled:enablePowerSaving:skipPreparePhase:statsMask:memoryPoolID:enableLateLatch:modelIdentityStr:owningPid:cacheUrlIdentifier:aotCacheUrlIdentifier:error:: Program load failure (0xF0004)}
/AppleInternal/Library/BuildRoots/fe2afe83-06e7-11ee-80c3-f6357a1003e8/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Runtimes/MPSRuntime/Operations/GPURegionOps.mm:548: failed assertion `ANE load failed!'
[1] 77884 abort python3 -m petals.cli.run_server petals-team/StableBeluga2

summer bough
#

--new_swarm

#

That doesn't look like swarm mode

tender knoll
#

not as smooth as what we have right now.

#

I can run local models in OI currently, but this Petals swarm server is not working.

summer bough
#

Smooth until they can't get the libs or CUDA installed. I don't know how much of a problem this still is.

tender knoll
#

That's what the bash script would solve.

summer bough
#

yes, I see. That assumes a solution exists.

#

Perhaps that is a good assumption

tender knoll
#

It does solve that, if you look into the script.

#

We just need to adapt it for OI.

#

Basically, check the conditions and apply the fixes we already suggest in the GitHub issues.

summer bough
#

Seems to work on Intel silicon

python -m petals.cli.run_server codellama/CodeLlama-7b-Instruct-hf --new_swarm --num_blocks 3
Sep 28 12:33:27.809 [INFO] Running Petals 2.2.0

tender knoll
#

Most people who ask in the channel can solve their problem after following the steps we give them there. The bash script would be a step closer to finding the cause and fixing it.

#

loc("cast"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/fe2afe83-06e7-11ee-80c3-f6357a1003e8/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":745:0)): error: 'anec.gain_offset_control' op result #0 must be 4D/5D memref of 16-bit float or 8-bit signed integer or 8-bit unsigned integer values, but got 'memref<1x1x1x98xi1>'

summer bough
#

Sure, fewer code layers should be more performant. I suppose it comes down to how well the install scripts work

#

CPU only

python -m petals.cli.run_server codellama/CodeLlama-7b-Instruct-hf --new_swarm --num_blocks 3

tender knoll
#

I can see the incremental improvement along the way.

summer bough
#

I couldn't use my GPU for this on my old model

#

No reason not to. If we support petals they could do either way according to their whimsical nature

tender knoll
#

I can use it with LM Studio and OI, so it shouldn't be a problem with the computer.

summer bough
#

options

#

what's LM studio

tender knoll
summer bough
#

Looks nice, too bad I can't run it.

tender knoll
#

@summer bough After switching to a clean environment, I can install Petals now.
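For anyone hitting the same install failure, here is a hedged sketch of the "clean environment" route. It assumes that a clean environment means a fresh Python virtual environment and that Petals is installed from PyPI; the thread does not say exactly what was used. `ENV_DIR`, `DRY_RUN`, `run`, and `setup_clean_env` are illustrative names only.

```shell
#!/usr/bin/env bash
# Sketch: install Petals into a fresh virtual environment.
# ENV_DIR and DRY_RUN are illustrative names, not part of Petals or OI.

ENV_DIR="${ENV_DIR:-.petals-venv}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

run() {
  # Print the command; execute it only when DRY_RUN=0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

setup_clean_env() {
  # Create the environment, then install into it via its own interpreter.
  run python3 -m venv "$ENV_DIR"
  run "$ENV_DIR/bin/python" -m pip install --upgrade pip
  run "$ENV_DIR/bin/python" -m pip install petals
}

setup_clean_env
```

By default this only prints the steps. Setting `DRY_RUN=0` before running it would actually create the environment and install the package.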

#

Using petals-team/StableBeluga2 gives around 5 tokens/sec, which is not bad.