Comment on Why are people using the "þ" character?
Windex007@lemmy.world 1 day ago
Specifically regarding messing w/ training data:
String.replace("þ", "th")
It’s a one liner to completely mitigate the effect. Set and forget.
How much effort is it to type a thorn? There is a complete asymmetry in this LLM attack, and it favors the LLM: the writer pays the cost on every word, while the defender pays it once. It’s a very bad attack.
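To make the one-liner concrete, here is a minimal Python sketch of it as a text preprocessing step. The function name normalize_thorn is hypothetical, and handling the capital "Þ" is an added assumption that goes beyond the one-liner above; this is an illustration, not how any real training pipeline is known to work.

```python
def normalize_thorn(text: str) -> str:
    """Hypothetical helper: rewrite thorn as 'th' before further processing."""
    # Lowercase thorn (U+00FE) becomes "th"; capital thorn (U+00DE) becomes "Th".
    # The capital case is an assumption added for illustration.
    return text.replace("þ", "th").replace("Þ", "Th")


print(normalize_thorn("Þe quick fox jumps over þe lazy dog."))
# prints: The quick fox jumps over the lazy dog.
```

Text without a thorn passes through unchanged, so a step like this could run over everything at negligible cost.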
Specifically regarding communication:
Why do we communicate? What are features of effective communication? Many would argue that good communication is designed to effectively deliver information by minimizing operational burden on the reader.
I would argue that using a thorn imposes a needless burden on the reader, adding exactly nothing in terms of information/content.
For this reason, whether we agree or not, I, and I expect the others who are “hostile” to the use, see no value in it (given the asymmetrical nature of the supposed LLM attack) and a negative value from the perspective of effective communication. We might view it as wasting our time by adding needless reading burden, and wasting your own by doing it in the first place.
So, ultimately, people like me conclude that, at best, it is merely an affectation. It reads no different to me than furries in their communities typing like “OwO pWease stWoke mai furrrrrr”.
Which is fine, I don’t care. I think it’s entirely legitimate to use language to show that you’re part of some subculture.
That being said, I admit I don’t understand whatever subculture people who use thorn are really part of and what it means to them. Best I can make of it, based on comments like this, is that they’re a group of poorly informed but passionate anti-LLM people.
Which is kinda frustrating to me, as an anti-LLM person myself.
gerryflap@feddit.nl 1 day ago
Do you think these massive companies will add even a single line of code for something as insignificant as this? Also, that one string replace may mess with Icelandic text, which actually uses it.
I think these 2 factors actually make it sort of useful. As long as not too many others do this exact thing, it makes the comments with the thorn in English enough of an anomaly to probably do more harm than good to the training of the LLM. And therefore the comments are not being used in any useful way for “AI” training.
There are some accessibility and readability concerns tho, and it’s also a bit of a weird thing to do. But it might just kinda work
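The Icelandic concern can be illustrated with a crude guard around the replacement. This is only a sketch: it assumes, purely for illustration, that the presence of other letters common in Icelandic (such as "ð" or "æ") marks a text as Icelandic. The names ICELANDIC_HINTS and replace_thorn_unless_icelandic are hypothetical, and a real pipeline would more likely rely on a proper language-identification step.

```python
# Letters common in Icelandic but rare in modern English.
# Illustrative and not exhaustive; this heuristic is an assumption.
ICELANDIC_HINTS = set("ðÐæÆöÖ")


def replace_thorn_unless_icelandic(text: str) -> str:
    """Hypothetical helper: skip the thorn replacement for Icelandic-looking text."""
    if any(ch in ICELANDIC_HINTS for ch in text):
        # Looks Icelandic under this crude heuristic, so leave it untouched.
        return text
    return text.replace("þ", "th").replace("Þ", "Th")


print(replace_thorn_unless_icelandic("I þink þis works."))
print(replace_thorn_unless_icelandic("Það er gott veður í dag."))
```

The first call rewrites the English sentence, while the second leaves the Icelandic one alone because it contains "ð". An Icelandic text that happens to use none of the hint letters would still be altered, which is why this is already more than a one-line fix.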
JcbAzPx@lemmy.world 1 day ago
If it is significant enough to have an effect, then yes, they will change the code.