Comment on My Couples Retreat With 3 AI Chatbots and the Humans Who Love Them
Powderhorn@beehaw.org 4 weeks ago
Is that really serendipity, though? There’s a huge gap between asking a predictive model to be spontaneous and actual spontaneity.
Still, I’m curious what you run locally. I have a Pixel 6 Pro, so while it has a Tensor chip, it wasn’t designed for this use case.
TehPers@beehaw.org 4 weeks ago
You could see if a friend can run an inference server for you. Maybe someone you know can run Open WebUI or something?
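If a friend does end up hosting something, most self-hosted stacks (Ollama, Open WebUI, etc.) can expose an OpenAI-compatible endpoint, so talking to it from your own device is just an HTTP call. A minimal sketch, with a made-up URL, key, and model name standing in for whatever the server actually runs:

```python
# Minimal sketch: querying a friend's self-hosted inference server over an
# OpenAI-compatible chat endpoint. The URL, key, and model name are
# placeholders, not real values.
import requests

BASE_URL = "http://friends-server.example:8080/v1"  # hypothetical address
API_KEY = "replace-with-your-key"                   # hypothetical key

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3.2:3b",  # whatever model the server actually hosts
        "messages": [{"role": "user", "content": "Give me a spontaneous idea."}],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```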
jarfil@beehaw.org 1 week ago
I’m running ollama in termux on a Samsung Galaxy A35 with 8GB of RAM (+8GB of swap, which is useless for AI), and the Ollama app. Models up to 3GB work reasonably fine on just the CPU.
Serendipity is a side effect of the temperature setting. LLMs randomly jump between related concepts, which surfaces stuff you might, or might not, have thought of by yourself. It isn’t 100% spontaneous, but on average it delivers “more than nothing”. Between that and bouncing ideas off it, they have a use.
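That temperature knob is easy to poke at directly. A minimal sketch against Ollama’s REST API (default port 11434), assuming one of those ~3GB models is already pulled locally; the model name and prompt are just examples:

```python
# Minimal sketch: hitting a local Ollama instance (e.g. the one running in
# Termux) over its REST API and raising the temperature option, which is the
# knob behind the "serendipity" described above.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",  # Ollama's default port
    json={
        "model": "llama3.2:3b",              # example: a ~3GB model that fits in RAM
        "prompt": "Suggest an unexpected angle on couples retreats.",
        "stream": False,
        "options": {"temperature": 1.2},     # higher = more random jumps between concepts
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```

Higher temperature flattens the sampling distribution over next tokens, which is where those random jumps between related concepts come from; lower values make the output more repeatable and less “serendipitous”.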
With 12GB RAM, you might be able to load models up to 7GB or so… but without tensor acceleration, they’ll likely be pretty sluggish. 3GB chain-of-thought (CoT) models already take a while to go through their paces on just the CPU.
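As a rough back-of-the-envelope for those size limits: a quantized model’s footprint is roughly parameter count × bits per weight ÷ 8, plus runtime overhead. The figures below are illustrative assumptions, not measurements:

```python
# Rough sizing sketch: weights take about params * bits_per_weight / 8 bytes,
# and the runtime (KV cache, buffers) adds some overhead on top.
# All numbers here are illustrative assumptions, not benchmarks.
def approx_model_gb(params_billion: float, bits_per_weight: float, overhead_gb: float = 1.0) -> float:
    weights_gb = params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1GB
    return weights_gb + overhead_gb

print(approx_model_gb(3, 4))   # ~2.5GB -> comfortable on an 8GB phone
print(approx_model_gb(7, 4))   # ~4.5GB -> feasible with 12GB RAM
print(approx_model_gb(7, 8))   # ~8.0GB -> already pushing a 12GB device
```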