If I jump into the actual set of images it read from, a LOT of them are #maps. Which explains why the from-scratch CLIP trained #PDDiffusion likes to draw maps, but not why the OpenAI CLIP trained one generates pixel nonsense.
Actually, no, it doesn't explain it, because these are all clearly labeled as maps and #CLIP should be able to distinguish between the maps, portraits, and landscapes in the set.
Hey, remember how #PDDiffusion was spitting out nothing but maps?
Well, I retrained on OpenAI CLIP, and now it's spitting out nothing but nonsense. The attached image is supposed to be a "landscape painting of a forest".
#PDDiffusion 90k finished #training today, and the results are...
Uh... it literally forgot how to #draw anything that isn't a map. The prompt for this was "forest landscape painting". All the training data from the 29k version is still there.
I'm retraining with #OpenAI #CLIP instead of my own from-scratch model to try and narrow down the cause of this model forgetting literally everything. #ai #aiart
I decided to screw the VAE training for now and just start scraping images again #PDDiffusion #aiart
I have to babysit the scraper because the wikitext parsing still hits corner cases and crashes because, say, this CHEEKY FUCKER decided he was going to be painted in 176X
https://commons.wikimedia.org/wiki/File:Aleksy_Bobry%C5%84ski.jpeg
I thought the X years were only invented in 200X
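Commons date fields really do contain decade-only years like "176X" when the exact year is unknown. A minimal sketch of tolerating them instead of crashing — this is a hypothetical helper (`parse_year`, `DATE_RE` are my names), not the actual PDDiffusion parser:

```python
import re

# Accept a plain year ("1845") or a decade placeholder ("176X").
DATE_RE = re.compile(r"^(\d{2,3})(\d|[Xx])$")

def parse_year(raw: str):
    """Return (year, is_approximate), or None for unparseable input."""
    m = DATE_RE.match(raw.strip())
    if not m:
        return None  # don't crash the scraper on weird dates
    prefix, last = m.groups()
    if last in "Xx":
        # "176X" -> 1760, flagged as approximate (start of the decade)
        return int(prefix + "0"), True
    return int(prefix + last), False
```

Flagging the approximate dates instead of discarding them keeps the label ("painted in the 1760s") usable for captions.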
Neural networks are exceptionally good at pattern recognition. The process of training involves running inputs forward through the network to the output neurons, calculating the loss function, and then backpropagating the error to update the weights.
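That loop can be sketched in a few lines. This is a toy example (a single linear neuron fit with gradient descent under MSE), not anything from PDDiffusion itself, but the forward → loss → backprop → update order is the same:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))          # toy inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                        # targets the network should learn

w = np.zeros(3)                       # weights to be trained
lr = 0.1
losses = []
for step in range(100):
    pred = X @ w                      # 1. forward pass: input -> output
    err = pred - y
    losses.append(np.mean(err ** 2))  # 2. compute the loss (MSE)
    grad = 2 * X.T @ err / len(X)     # 3. backpropagate: dLoss/dw
    w -= lr * grad                    # 4. step the weights down the gradient
```

Catastrophic forgetting, like the all-maps run above, is what happens when later updates in this loop overwrite whatever the earlier ones learned.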
You know big tech has pissed people off when Y Combinator and the Writers Guild are sponsoring the same petition
And also... Type-Moon? Like, the "Visual Novels You Heard About On 4chan In 2006" company?
"painting circa 1800 portrait of a paintress oil on canvas"
So, CLIP isn't broken after all. #PDDiffusion 's label set is so narrow and full of such specific phrases that prompt engineering is hilariously critical to getting anything useful out of it - even with the improved wikitext parser. Descriptions aren't good enough.
Definitely going to have to build a manual labeling tool at some point, because there are entire styles of things in the dataset that you just can't recall right now.
"a dense forest covered in fog and mountains"
Yes, this is exactly what I wanted, #PDDiffusion . #aiart #derplearning
So, #PDDiffusion 's U-Net finished training. I handed it four prompts and it actually spat out vaguely-related images.
- "1800s portrait" - It doesn't quite understand figures yet, but it at least knows chiaroscuro.
- "a historical portrait of king george" - This looks like an illustration of a savannah or grasslands
- "landscape painting of a forest" - A blue cave. Is this the 9th circle of hell?
- "a guinea pig" - A map. I guess it assumes I wanted the country of Guinea?
Administrator of fantranslation.org and maintainer of various Free Software projects and at least one ROM hack. All opinions are solely my own.