The result of asking on The Orange Site™️ why Stable Diffusion has no train-from-scratch example code (related to #PDDiffusion ):
- One guy excited that I was talking about actually using licensed images
- A handful of people telling me to just fine-tune because I'll never be able to afford to scale up
- The usual smoldering tire fire of arguments between people who hate Copilot, people who hate copyright, and people who don't understand how diffusion models work
That last one is the thing that really bugs me.
Art generators do not "take references" in the same way a human does. You have a model that does noise prediction (a U-Net), and a pair of models (CLIP) that score how well an image matches a text description.
What the generator does is take a noisy starting image, ask CLIP how closely it matches the guidance text, and then the U-Net uses that signal to condition its noise prediction.
This is entirely a hack that just so happens to work for """drawing""" an image.
@kmeisthax Indeed it’s not like the conscious inspiration of having a reference. But IMO there is a valid analogy to how the brain unconsciously builds up a concept of ‘what an <insert object> looks like’, based on every time it’s seen that object, in art or (unlike models) in real life.