Comment on “My Couples Retreat With 3 AI Chatbots and the Humans Who Love Them”
jarfil@beehaw.org 4 weeks ago
You can use local AI as a sort of “private companion”. I have a few smaller versions on my smartphone; they aren’t as great as the online versions, and they run slower… but you decide the system prompt (not the company behind it), and they work just fine for bouncing ideas around.
NotebookLM is a great tool for interacting with large amounts of data. You can bet Google is using every interaction to train its LLMs: everything you say is going to be analyzed, classified, and fed back in as some form of training data, hopefully anonymized (…but have you read their privacy policy? I haven’t; “accept”…).
All chatbots are prompted by the company to be somewhat sycophantic so that you come back; the cases where they were “too sycophantic” were just a matter of dialing it too far. Again, you can avoid that with your own system prompt… or at least add an initial prompt in the config, if you have the option, to somewhat counteract the company’s prompt.
If you want serendipity, you can ask a chatbot to be more spontaneous and suggest more random things. They’re generally happy to oblige… but the company-hosted ones are cut off from anything that could even remotely be considered “harmful”. That includes NSFW, medical, some chemistry and physics, random hypotheticals, and so on.
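For example, something like this against a local Ollama server (which is what I run, see below) sets your own system prompt and bumps the temperature; the model name and prompts are just placeholders, and it assumes the default port 11434:

```python
import requests

# Talk to a local Ollama server (default port 11434): no cloud, no company prompt.
OLLAMA_URL = "http://localhost:11434/api/chat"

payload = {
    "model": "llama3.2:3b",  # placeholder: any small model you have pulled locally
    "messages": [
        # Your own system prompt, not the vendor's: ask for blunt, non-sycophantic answers.
        {"role": "system", "content": "Be blunt and critical. Do not flatter me. "
                                      "Point out weaknesses in my ideas."},
        {"role": "user", "content": "Give me three unusual angles on my project idea."},
    ],
    "options": {"temperature": 1.2},  # higher temperature = more "serendipitous" jumps
    "stream": False,
}

reply = requests.post(OLLAMA_URL, json=payload, timeout=300).json()
print(reply["message"]["content"])
```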
Powderhorn@beehaw.org 4 weeks ago
Is that really serendipity, though? There’s a huge gap between asking a predictive model to be spontaneous and actual spontaneity.
Still, I’m curious what you run locally. I have a Pixel 6 Pro, so while it has Google’s Tensor SoC, it wasn’t designed for this use case.
jarfil@beehaw.org 1 week ago
I’m running Ollama in Termux on a Samsung Galaxy A35 with 8GB of RAM (+8GB of swap, which is useless for AI), plus the Ollama app. Models up to about 3GB work reasonably well on just the CPU.
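If you want to gauge how a given model does on CPU, the Ollama server reports token counts and timings in its responses; a rough sketch (default port assumed, model name is a placeholder):

```python
import requests

# Ask the local Ollama server (default port 11434) for a short completion
# and read back its timing stats to estimate tokens/second on CPU.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:3b",  # placeholder: any ~3GB model you have pulled
        "prompt": "Summarize the idea of running LLMs on a phone in two sentences.",
        "stream": False,
    },
    timeout=600,
).json()

print(resp["response"])
# eval_count / eval_duration (nanoseconds) come back in Ollama's generate response.
tok_per_s = resp["eval_count"] / (resp["eval_duration"] / 1e9)
print(f"~{tok_per_s:.1f} tokens/s")
```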
Serendipity is a side effect of the temperature setting. LLMs randomly jump between related concepts, which exposes stuff you might, or might not, have thought about by yourself. It isn’t 100% spontaneous, but on average it delivers “more than nothing”. Between that and bouncing ideas off it, they have their uses.
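The mechanism is just temperature-scaled sampling: the model’s logits get divided by the temperature before the softmax, which sharpens or flattens the distribution over next tokens. A toy sketch with made-up logits, nothing model-specific:

```python
import math
import random

def sample_next(logits, temperature=1.0):
    """Sample an index from a temperature-scaled softmax over raw logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                              # subtract max for numeric stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0], probs

# Toy logits for four candidate "next concepts".
logits = [4.0, 3.0, 2.0, 1.0]
for t in (0.2, 0.8, 1.5):
    _, probs = sample_next(logits, temperature=t)
    print(t, [round(p, 2) for p in probs])
# Low temperature: nearly always the top candidate.
# High temperature: the tail gets real probability mass, i.e. the "random jumps".
```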
With 12GB of RAM, you might be able to load models up to 7GB or so… but without tensor acceleration, they’ll likely be pretty sluggish. 3GB chain-of-thought (CoT) models already take a while to go through their paces on just the CPU.
TehPers@beehaw.org 4 weeks ago
You could see if a friend can run an inference server for you. Maybe someone you know can run Open WebUI or something?
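If they do, recent Open WebUI versions expose an OpenAI-compatible chat endpoint you can call remotely; roughly something like this, where the URL, endpoint path, API key, and model name are all placeholders you’d get from your friend’s instance:

```python
import requests

# Placeholders: your friend's Open WebUI instance and an API key they generate for you.
BASE_URL = "https://friends-server.example.com"
API_KEY = "sk-..."  # deliberately elided

resp = requests.post(
    f"{BASE_URL}/api/chat/completions",  # OpenAI-compatible endpoint; path may vary by version
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3.2:3b",          # placeholder: whatever model the server hosts
        "messages": [{"role": "user", "content": "Hello from a borrowed GPU!"}],
    },
    timeout=120,
).json()

print(resp["choices"][0]["message"]["content"])
```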