Because it’s difficult to fit into a game. You need a pretty good GPU, and a lot of its memory will be taken up by the LLM running locally. That means you basically cannot run it alongside other fancy graphics, so you end up with a game that doesn’t look demanding but still has high GPU requirements.
Also, it’s quite difficult to steer the NPCs to be consistent. In my free time I’m working on a small project right now for a game centered around LLM NPCs, but it’s a lot of work to keep them consistent with the world you place them in. They always take a “yes, and” approach, so it’s easy to end up in situations where they make up things that contradict the reality of the game.
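Roughly what I mean by steering (a sketch with made-up facts and a made-up helper function, just to illustrate the idea): bake the world’s hard facts and an explicit “don’t invent things” rule into the system prompt instead of hoping the model stays in bounds.

```python
# Hypothetical sketch: assembling a system prompt that pins an LLM NPC
# to established game facts instead of letting it "yes, and" its way
# into contradictions. All names and facts here are invented examples.

WORLD_FACTS = [
    "The ferry to the mainland has been out of service for ten years.",
    "Magic does not exist in this world.",
    "The blacksmith, Tormund, is the player's only source of weapon repairs.",
]

def build_npc_system_prompt(name: str, persona: str, facts: list[str]) -> str:
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (
        f"You are {name}, an NPC in a video game. {persona}\n\n"
        "Hard rules about the game world (never contradict these, and never "
        "invent new locations, characters, items, or services):\n"
        f"{fact_lines}\n\n"
        "If the player asks about something not covered above, deflect in "
        "character ('I wouldn't know about that') instead of making it up."
    )

prompt = build_npc_system_prompt(
    "Tormund", "You are a gruff but fair blacksmith.", WORLD_FACTS
)
```

It doesn’t fully solve the problem, but an explicit deflection instruction gives the model a sanctioned alternative to inventing things.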
Coelacanth@feddit.nu 21 hours ago
I’m actually also working on a project using LLMs to talk to NPCs, though mine doesn’t use local models: it calls online models through a proxy using API keys, which lets you use much larger and better models.
But yeah, it’s been interesting digging deep into the precise construction of the prompts to get the NPCs talking and behaving exactly the way you want, and to make them as real and lifelike as possible.
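For what it’s worth, the proxy setup is mostly just building an OpenAI-style chat request and pointing it at a different base URL. A rough sketch (the proxy URL, model name, and key are placeholders, not real endpoints):

```python
# Hypothetical sketch of routing NPC dialogue through an OpenAI-compatible
# proxy endpoint. URL, model name, and API key are placeholders.
import json

PROXY_URL = "https://my-llm-proxy.example/v1/chat/completions"  # placeholder

def build_chat_request(api_key: str, system_prompt: str,
                       history: list[dict]) -> tuple[dict, bytes]:
    """Return (headers, body) for an OpenAI-style chat completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "some-large-model",  # whatever the proxy exposes
        "messages": [{"role": "system", "content": system_prompt}] + history,
        "temperature": 0.8,
    }
    return headers, json.dumps(payload).encode("utf-8")

headers, body = build_chat_request(
    "sk-placeholder",
    "You are a wary trader NPC.",
    [{"role": "user", "content": "Got anything for sale?"}],
)
# `body` would then be POSTed to PROXY_URL, e.g. with urllib.request.
```

The nice part is that swapping models or providers is just a config change on the proxy side, while the game only ever sees one endpoint.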
ByteSorcerer@beehaw.org 4 hours ago
I’ve also experimented with this. In my experience, getting the NPCs to behave the way you want with just a prompt is hard and inconsistent, and quickly falls apart when the conversation gets longer.
I’ve gotten much better results by starting from a small model and fine-tuning it on lore-accurate conversations (you can use your conversations with larger models as training materials for that). In theory you can improve it further with RLHF, but I haven’t tried that myself yet.
The downside of this is of course that you’re limited to open-weight models for which you have enough compute to fine-tune them. If you don’t have a good GPU, the free Google Colab sessions can give you access to a GPU with 15GB of VRAM. The free tier has a daily limit on GPU time though, so set up your training code to save checkpoints regularly so that you can continue the training on another day if you run out. Using LoRA instead of doing a full fine-tune can also reduce the memory and compute required (or in other words, it lets you use a larger and better model with your available resources).
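The checkpoint-and-resume part is worth getting right before you start burning GPU hours. A framework-agnostic sketch of the pattern (in a real fine-tune you’d save model and optimizer state, e.g. via the HF Trainer’s `save_steps` and `resume_from_checkpoint`; here a plain dict stands in for that state):

```python
# Sketch: resumable training loop, so a Colab time limit doesn't cost
# you your progress. The "training step" is a stand-in for the real thing.
import json
import os

CKPT = "checkpoint.json"

def load_checkpoint() -> dict:
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {"step": 0, "loss_history": []}

def save_checkpoint(state: dict) -> None:
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CKPT)  # atomic rename, so a timeout can't corrupt it

def train(total_steps: int, save_every: int = 100) -> dict:
    state = load_checkpoint()           # resumes where the last run stopped
    for step in range(state["step"], total_steps):
        loss = 1.0 / (step + 1)         # stand-in for a real training step
        state["step"] = step + 1
        state["loss_history"].append(loss)
        if state["step"] % save_every == 0:
            save_checkpoint(state)
    save_checkpoint(state)
    return state

state = train(total_steps=250)
```

If the session gets killed mid-run, rerunning the same script picks up from the last saved step instead of step zero.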
Coelacanth@feddit.nu 3 hours ago
Well, what I’m working on is a mod for STALKER Anomaly, and most large models already seem to have good enough awareness of the setting of the STALKER games. I can imagine it’s a much bigger challenge if you’re making your own game set in your own unique world. I still need to insert some minor game information into the prompt, but only a paragraph or so detailing some important game mechanics.
Getting longer-term interactions to work right is actually what I’ve been working on the last few weeks: implementing a long-term memory for game characters, using LLM calls to condense raw events into summaries that can be fed back into future prompts to retain context. The basics of this system were actually already in place, created by the original mod author; I just expanded it into a full hierarchical memory system with long- and mid-term memories.
But it turns out creating and refining the LLM prompts for memory management is harder than implementing the memory function itself!
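In rough outline, the structure looks something like this (the class, thresholds, and names are my own invention for illustration; the `summarize` callback is where the actual LLM call, and all the hard prompt work, would live):

```python
# Sketch of a hierarchical NPC memory: raw events get condensed into
# mid-term summaries, and those get condensed again into long-term ones.
# The summarize step is a pluggable LLM call; a stub is used below.
from typing import Callable

class NpcMemory:
    def __init__(self, summarize: Callable[[list[str]], str],
                 raw_limit: int = 5, mid_limit: int = 3):
        self.summarize = summarize       # LLM call: events -> one summary
        self.raw_events: list[str] = []  # verbatim recent events
        self.mid_term: list[str] = []    # condensed summaries
        self.long_term: list[str] = []   # summaries of summaries
        self.raw_limit = raw_limit
        self.mid_limit = mid_limit

    def record(self, event: str) -> None:
        self.raw_events.append(event)
        if len(self.raw_events) >= self.raw_limit:
            # Condense a batch of raw events into one mid-term memory.
            self.mid_term.append(self.summarize(self.raw_events))
            self.raw_events.clear()
        if len(self.mid_term) >= self.mid_limit:
            # Condense mid-term memories into one long-term memory.
            self.long_term.append(self.summarize(self.mid_term))
            self.mid_term.clear()

    def context_block(self) -> str:
        """Text to prepend to future prompts so the NPC retains context."""
        return "\n".join(self.long_term + self.mid_term + self.raw_events)

# A real summarizer would prompt the LLM; this stub just counts items.
memory = NpcMemory(summarize=lambda items: f"summary of {len(items)} memories")
for i in range(20):
    memory.record(f"event {i}")
```

The structure itself is simple; as the comment above says, the hard part is the summarization prompt, which decides what survives each condensation step.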