Comment on My Couples Retreat With 3 AI Chatbots and the Humans Who Love Them
Powderhorn@beehaw.org 4 weeks ago
Is that really serendipity, though? There’s a huge gap between asking a predictive model to be spontaneous and actual spontaneity.
Still, I’m curious what you run locally. I have a Pixel 6 Pro, so while it has a Tensor chip, it wasn’t designed for this use case.
TehPers@beehaw.org 4 weeks ago
You could see if a friend can run an inference server for you. Maybe someone you know can run Open WebUI or something?
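If a friend does end up hosting something, most self-hosted stacks (Ollama, Open WebUI, etc.) can expose an OpenAI-compatible endpoint, so talking to it from your own device is just an HTTP call. A minimal sketch, with a made-up URL, key, and model name standing in for whatever the server actually runs:

```python
# Minimal sketch: querying a friend's self-hosted inference server over an
# OpenAI-compatible chat endpoint. The URL, key, and model name are
# placeholders, not real values.
import requests

BASE_URL = "http://friends-server.example:8080/v1"  # hypothetical address
API_KEY = "replace-with-your-key"                   # hypothetical key

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3.2:3b",  # whatever model the server actually hosts
        "messages": [{"role": "user", "content": "Give me a spontaneous idea."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```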
jarfil@beehaw.org 1 week ago
I’m running ollama in termux on a Samsung Galaxy A35 with 8GB of RAM (+8GB of swap, which is useless for AI), and the Ollama app. Models up to 3GB work reasonably fine on just the CPU.
Serendipity is a side effect of the temperature setting. LLMs randomly jump between related concepts, which surfaces stuff you might, or might not, have thought of by yourself. It isn’t 100% spontaneous, but on average it delivers “more than nothing”. Between that and bouncing ideas off it, they have a use.
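That temperature knob is easy to poke at directly. A minimal sketch against Ollama’s REST API (default port 11434), assuming one of those ~3GB models is already pulled locally; the model name and prompt are just examples:

```python
# Minimal sketch: hitting a local Ollama instance (e.g. the one running in
# Termux) over its REST API and raising the temperature option, which is the
# knob behind the "serendipity" described above.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default port
    json={
        "model": "llama3.2:3b",              # example: a ~3GB model that fits in RAM
        "prompt": "Suggest an unexpected angle on couples retreats.",
        "stream": False,
        "options": {"temperature": 1.2},     # higher = more random jumps between concepts
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Higher temperature flattens the sampling distribution over next tokens, which is where those random jumps between related concepts come from; lower values make the output more repeatable and less “serendipitous”.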
With 12GB RAM, you might be able to load models up to 7GB or so… but without tensor acceleration, they’ll likely be pretty sluggish. 3GB chain-of-thought (CoT) models already take a while to go through their paces on just the CPU.
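As a rough back-of-the-envelope for those size limits: a quantized model’s footprint is roughly parameter count × bits per weight ÷ 8, plus runtime overhead. The figures below are illustrative assumptions, not measurements:

```python
# Rough sizing sketch: weights take about params * bits_per_weight / 8 bytes,
# and the runtime (KV cache, buffers) adds some overhead on top.
# All numbers here are illustrative assumptions, not benchmarks.
def approx_model_gb(params_billion: float, bits_per_weight: float, overhead_gb: float = 1.0) -> float:
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1GB
    return weights_gb + overhead_gb

print(approx_model_gb(3, 4))   # ~2.5GB -> comfortable on an 8GB phone
print(approx_model_gb(7, 4))   # ~4.5GB -> feasible with 12GB RAM
print(approx_model_gb(7, 8))   # ~8.0GB -> already pushing a 12GB device
```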