Comment on Is there any actual standalone AI software?
Rhaedas@fedia.io 3 months ago
The AI, image, and audio models that can run on a typical PC have all been broken down from originally larger models. How this is done affects what the models can do and how good the output is, but the open source community has come a long way in making impressive stuff. The first question is really about hardware: do you have an Nvidia GPU that can support these types of generation? They can be done on the CPU alone, but it's painfully slow.
If so, then I would highly recommend looking into Ollama for running AI models (using WSL if you're on Windows) and ComfyUI for graphical generation. Don't let ComfyUI's complicated workflows scare you; if you start from the basics, with plenty of YouTube help out there, it will make sense. As for TTS, there's a constant stream of "new stuff" out there, but for actual local processing in "real time" (it still takes a bit) I have yet to find anything to replace my Coqui TTS copy with Jenny as the model voice. It may take some digging and work to get that together, since it's older and not supported anymore.
hendrik@palaver.p3x.de 3 months ago
I don't think they break them down. For most models the math requires you to start at the beginning and train each model individually from the ground up.
But sure, a smaller model generally isn't as capable as a bigger one. And you can't train them indefinitely. So for a model series you'll maybe use the same dataset but feed more into the super big variant and not so much into the tiny one.
Rhaedas@fedia.io 3 months ago
The breaking down I mentioned is the quantization that forms a smaller model from the larger one. I didn't want to get technical because I don't understand the math details myself past how to use them. :)
hendrik@palaver.p3x.de 3 months ago
Ah, sure. I think a good way to phrase it is to say they lower the precision. That's basically what they do: convert the high-precision numbers to lower-precision formats. That makes the computations faster and the files smaller.
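To make "lowering the precision" a bit more concrete, here's a minimal sketch in plain Python. It assumes a simple symmetric 8-bit scheme with one scale factor for the whole list; real LLM quantization formats are more elaborate (per-block scales, 4-bit variants and so on), and the function names and sample weights are made up for illustration.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole list; fall back to 1.0 if every weight is zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in quantized]


# Hypothetical weights, standing in for one tiny slice of a model.
weights = [0.12, -0.98, 0.5, 0.003, -0.25]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each value now fits in 1 byte instead of the 4 bytes of a 32-bit float,
# at the cost of a small rounding error per weight.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The rounding error per weight is at most half of `scale`, which is why the model keeps working but gets slightly less accurate.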
And it also doesn't apply equally to text, audio and images. As far as I know quantization is mainly used with LLMs. It's also possible with image and audio models, but generally they don't do that. As far as I remember it leads to degradation and distortions pretty fast. There are other methods, like pruning, used with generative image models. That brings down their size substantially.
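For comparison, here's a rough sketch of the idea behind pruning. It assumes plain magnitude pruning (zeroing the weights closest to zero), which is only one of several pruning approaches and not necessarily what any particular image model uses; the function name and numbers are hypothetical.

```python
def magnitude_prune(weights, fraction):
    """Zero out the given fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * fraction)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Hypothetical weights; prune the half that contributes the least.
weights = [0.12, -0.98, 0.5, 0.003, -0.25, 0.04]
pruned = magnitude_prune(weights, 0.5)
print(pruned)
```

Zeroing weights alone doesn't shrink the file; the savings come from storing the result sparsely or from removing whole channels or layers, which this sketch doesn't show.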