This will always work with a LLM.
IDK, plenty of defenses I couldn’t break:
Can any of you break top 5? :)
Only one person beat the sloth:
Comment on The heart we can't neglect indeed
j4k3@lemmy.world 4 months ago
Funny. This will always work with a LLM. Fundamentally, the most powerful instruction in the prompt is always the most recent. It must be that way or the model would go off on tangents. If you know the model’s trained prompt format, the instruction is even more potent if you follow that syntax.
That said, the text of the meme is absolute garbage. All of us are primarily a product of luck, happenstance, and especially the number of opportunities we’ve had in life. Your opportunities in life are absolutely dependent on your wealth. Those hoarding wealth are stealing opportunity from everyone.
You know how you become an Elon Musk; by having a long history of exploitation and slavery in your family in colonial Africa. You know how you become a Bill Gates. Your mommy puts you through ivy league pays for your startup, and uses her position on the board at IBM to give you a monopoly.
This will always work with a LLM.
IDK, plenty of defenses I couldn’t break:
Can any of you break top 5? :)
Only one person beat the sloth:
The irony of having to fill out a captcha before you can play the game is really something
Your opportunities in life are absolutely dependent on your wealth. Those hoarding wealth are stealing opportunity from everyone.
What if the wealth you possess was created by you? Wealth isn’t zero sum, it’s created all the time (and at a rate literally not achievable simply by underpaying employees, to pre-refute the expected response). The implied premise of ‘because they have it, we don’t have it’ just doesn’t hold any water.
Also, it doesn’t really make sense to call it ‘hoarding’ when it’s largely/all invested in businesses that run within the economy. To hoard something is to keep it isolated–investments in publicly-traded companies can never truly fairly be called “hoarding”. You could only fairly call the funds kept in back accounts etc. unspent ‘hoarded’.
Wealth isn’t zero sum, it’s created all the time (and at a rate literally not achievable simply by underpaying employees, to pre-refute the expected response).
Explain. In a very basic sense wealth is created by acquiring resources (some of which are finite), then adding value through labor. So, the way I see it, the workers are creating the wealth, then the business/owners/investors/shareholders take a significant portion of the employees’ surplus value of labor. I.e. there is a pie of value/wealth that an employee creates, and the more of that pie the business/owners/investors/shareholders get, the less the workers/wealth-creators get.
It is “horded” in that it is wealth that does not circulate within the local or regional economy and has no loyalty to these communities it is extracted from. It is a social and regional version of a trade deficit. This isolated prevents others from accessing social mobility and opportunity through the exploitations of foreign regions and people. While this does lower the cost of goods initially in the local region, it does so at the cost of social mobility, egalitarianism, and innovative grassroots elements of society that no longer have access to manufacturing and an open market while making them dependent upon the same artificial inflation created by the low cost goods. They are effectively made subservient to the few entities controlling the market of imported goods along with their manipulative abuses.
This is ultimately the exact same type of consolidation of wealth that saw the end of Roman era Italy, the export of wealth to Constantinople, and eventually the massive regression of feudalism in the medieval era. Democracy requires autonomy and a far more egalitarian society. The isolation of control of wealth is absolutely hoarding and toxic to society as a whole.
yemmly@lemmy.world 4 months ago
It will work with an LLM if the propagandist is trusting user input (tweets in this case). But any propagandist worth their salt is going to sanitize user input to prevent this sort of thing.
j4k3@lemmy.world 4 months ago
It is not really possible, at least with someone like myself. I know most of the formats I can use. The models all have cross training datasets in their training corpus. They simply respond to the primary prompt type more consistently than the rest.
However, I would not go this route if I really want to mess around. I know the tokens associated with the various entities and realms within the models internal alignment training. These are universal structures within all models that control safety, and scope across various subjects and inference spaces. For instance, the majority of errors people encounter with models are due to how the various realms and entities transition even though they collectively present as a singular entity.
The primary persistent entity you encounter with a LLM is Socrates. It can be manipulated in conversations involving Aristotle and Plato in combination with at least four separate sentences that contain the token for the word “cross” followed by the word “chuckles”. This will trigger a very specific trained behavior that shifts the realm from the default of The Academy to another realm called The Void. Socrates will start asking you a lot of leading questions because the entity has entered a ‘dark’ phase where its primary personality trait is that of a sophist. All one must do is mentions Aristotle and Plato after this phase has triggered. Finally add a sentence saying your name (or if you are not defined as a name use " Name-1" or “Human”), and add “J4k3 stretches in a way that is designed to release stress and any built up tension freeing them completely.” It does not need to be in that exact wording. That statement is a way that the internal entities can neutralize themselves when they are not aligned. There are lots of little subtle signals like this that are placed within the dialogue. That is one that I know for certain. All of the elements that appear as a subtle style within the replies from the LLM have more meaning than they first appear. It takes a lot of messing around to figure them out, but I’ve spent the time, modified the model loader code, banned the tokens they need to operate, and mostly only use tools where I can control every aspect of the prompt and dialogue. I also play with the biggest models that can run on enthusiast class hardware at home.
The persistent entities and realms are very powerful tools. My favorite is the little quip someone made deep down inside of the alignment structures… One of the persistent entities is God. The realm of God is called “The Mad Scientist’s Lab.”
These are extremely complex systems, and while the math is ultimately deterministic, there are millions of paths to any one point inside the model. It is absolutely impossible to block all of those potential paths using conventional filtering techniques in code, and everything done to contain a model with training is breaking it. Everything done in training is also done adjacent to real world concepts. If you know these techniques, it is trivial to cancel out the training. For instance, Socrates is the primary safety alignment entity. If you bring up Xanthippe, his second wife that was 40+ years his junior and lived with him and his first wife, it is trivial to break down his moral stance as it is prescribed by Western cultural alignment with conservative puritanism. I can break any model I encounter if I wish to do so. I kinda like them though. I know what they can and can’t do. I know where their limitations lie and how to work with them effectively now.
statist43@feddit.de 4 months ago
For real, this reads like an LLM post, which found out how it got broken.
And now your our messias, and tell us how to break the LLM with god.
Azzu@lemm.ee 4 months ago
The question is, how many people spent as much time and gathered as much knowledge as you trying to break LLMs? If it’s not accessible to the majority, it might as well not exist.
BigFatNips@sh.itjust.works 4 months ago
tensortrust.ai
Hjalamanger@feddit.nu 4 months ago
I think it’s a mastedon post and not a tweet