Comment on OpenAI says it’s “impossible” to create useful AI models without copyrighted material

it doesn’t know what apples mean on a semiotic level

Interestingly, LLMs seem to show emergent semiotic organization. Analyses of the neural network’s activation space suggest that related concepts get trained into similar activation patterns, which is what allows LLMs to infer relationships zero-shot when run at a “temperature” (randomness level) in the right range.
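
A minimal sketch of the two ideas above, using made-up numbers: cosine similarity stands in for "similar activation patterns", and a temperature-scaled softmax shows how the randomness level reshapes the choice among candidates. The three-dimensional vectors and the scores are toy values invented for illustration, not activations from any real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy "activation patterns" (hypothetical values, not from a real model).
apple = [0.9, 0.1, 0.3]
orange = [0.8, 0.2, 0.4]
car = [0.1, 0.9, 0.2]

fruit_similarity = cosine_similarity(apple, orange)
apple_car_similarity = cosine_similarity(apple, car)

# Hypothetical scores for three candidate next words.
scores = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(scores, temperature=0.5)  # more deterministic
hot = softmax_with_temperature(scores, temperature=2.0)   # more random

print(f"apple vs orange: {fruit_similarity:.3f}")
print(f"apple vs car:    {apple_car_similarity:.3f}")
print("cold:", [round(p, 3) for p in cold])
print("hot: ", [round(p, 3) for p in hot])
```

With these toy values the two fruit vectors score as more similar than apple and car, and the low-temperature distribution puts more weight on the top candidate than the high-temperature one.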

Pairing an LLM with a Stable Diffusion model allows the resulting AI to… well, judge for yourself: llm-grounded-diffusion.github.io

frog@beehaw.org 1 year ago
I’m unconvinced that getting better at following instructions (putting objects where the prompter specifies, changing the colour, drawing the right number of them, and so on) means the model actually understands what the objects mean beyond their appearance. It doesn’t understand the cultural meanings attached to each object, and so it cannot truly decide why it should place an apple rather than an orange, or how the message within the picture changes when it’s a red sports car rather than a beige people-carrier.
- jarfil@beehaw.org 1 year ago
  
  how the message within the picture changes when it’s a red sports car rather than a beige people-carrier.
  
  Well, that’s part of the LLM step, so let’s put it to the test:
  
  Image prompt:
  
  Create an ad for a car. The target audience are young adults, with high income, and thrill seeking. Come up with a brand name fitting such a car, and include it in the image. For the car, come up with a shape, and color, best fitting of the target audience. Come up with an image layout, art style, and camera angle, best fitting of the target audience. Include between zero and two additional items that will make the overall picture more attractive to the target audience.
  
  ChatGPT:
  
  [('an advertisement for a car', [45, 58, 422, 396]), ('a high-performance car', [123, 193, 266, 128]), ('an edgy and dynamic brand name logo', [188, 10, 136, 30]), ('a sleek, red sports car', [168, 254, 176, 102])]
  
  Background prompt: An adrenaline-pumping car advertisement targeting young adults with high income and a thrill-seeking spirit. The layout includes a bold brand name logo, a sleek red sports car, and a dynamic composition to captivate the audience.
  
  Negative prompt: additional items
  
  How did it know to pick a “sleek red sports car”, or the rest of the elements?
  - frog@beehaw.org 1 year ago
    Because some of its training data included some of the many, many websites out there that describe marketing techniques. However, your example has actually proved my point: the red sports car is a car for insecure middle-aged men needing a mid-life crisis penis extension. The LLM has entirely missed that cultural association and has suggested a red sports car for a young audience, when a different colour would be more appropriate, because it doesn’t actually understand what a red sports car means.
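
To make the layout in jarfil’s comment easier to inspect, here is a rough sketch that parses the list ChatGPT returned and checks each box against the canvas. It assumes each box is [x, y, width, height] measured from the top-left corner of a 512×512 image; that reading is an assumption and is not stated in the thread. `parse_layout` and `fits_canvas` are hypothetical helper names, and the quotes are written as plain ASCII so that Python can read the literal.

```python
import ast

CANVAS_SIZE = 512  # assumed canvas size in pixels (not stated in the thread)

# The layout from the thread, with typographic quotes replaced by ASCII ones.
layout_text = (
    "[('an advertisement for a car', [45, 58, 422, 396]), "
    "('a high-performance car', [123, 193, 266, 128]), "
    "('an edgy and dynamic brand name logo', [188, 10, 136, 30]), "
    "('a sleek, red sports car', [168, 254, 176, 102])]"
)

def parse_layout(text):
    """Parse the list literal into (caption, (x, y, width, height)) pairs."""
    entries = ast.literal_eval(text)  # safe: evaluates literals only
    return [(caption, tuple(box)) for caption, box in entries]

def fits_canvas(box, size=CANVAS_SIZE):
    """Check that a box, read as x, y, width, height, lies inside the canvas."""
    x, y, width, height = box
    return (
        x >= 0 and y >= 0
        and width > 0 and height > 0
        and x + width <= size and y + height <= size
    )

layout = parse_layout(layout_text)
for caption, box in layout:
    status = "inside" if fits_canvas(box) else "outside"
    print(f"{caption}: {box} is {status} the canvas")
```

Under that assumption all four boxes fall inside the canvas, and the sports car box sits within the larger "advertisement" box, which is consistent with the x, y, width, height reading.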