If I jump into the actual set of images it read from, a LOT of them are #maps. Which explains why the from-scratch CLIP trained #PDDiffusion likes to draw maps, but not why the OpenAI CLIP trained one generates pixel nonsense.
Actually, no, it doesn't explain it, because these are all clearly labeled as maps and #CLIP should be able to distinguish between the maps, portraits, and landscapes in the set.
Hey, remember how #PDDiffusion was spitting out nothing but maps?
Well, I retrained on OpenAI CLIP, and now it's spitting out nothing but nonsense. The attached image is supposed to be a "landscape painting of a forest".
Uh... it literally forgot how to #draw anything that isn't a map. The prompt for this was "forest landscape painting". All the training data from the 29k version is still there.
I have to babysit the scraper because the wikitext parsing still hits corner cases and crashes because, say, this CHEEKY FUCKER decided he was going to be painted in 176X.
I thought the X years were only invented in 200X
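For what it's worth, a date like "176X" doesn't have to crash a scraper. Here is a minimal sketch of one way to handle it, assuming the date has already been pulled out of the wikitext as a plain string. The name `parse_fuzzy_year` and the decision to return a year range are my own illustration, not the actual PDDiffusion parser.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: up to four digits, optionally followed by X placeholders,
# e.g. "176X" (a decade) or "17XX" (a century).
_FUZZY_YEAR = re.compile(r"^\s*(\d{1,4})([Xx]{0,3})\s*$")


def parse_fuzzy_year(text: str) -> Optional[Tuple[int, int]]:
    """Return an inclusive (earliest, latest) year range, or None if unparseable."""
    match = _FUZZY_YEAR.match(text)
    if match is None:
        return None  # Signal "unknown" instead of raising, so the scrape keeps going.
    digits, placeholders = match.groups()
    if len(digits) + len(placeholders) > 4:
        return None  # Something like "1760XX" is not a year we understand.
    span = 10 ** len(placeholders)  # Each X widens the range by a factor of ten.
    earliest = int(digits) * span
    return earliest, earliest + span - 1


print(parse_fuzzy_year("176X"))
print(parse_fuzzy_year("circa 1800"))
```

Returning `None` for anything unrecognized means the caller can log the oddball value and skip that field rather than needing a babysitter.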
You know big tech has pissed people off when Y Combinator and the Writers Guild are sponsoring the same petition
And also... Type-Moon? Like, the "Visual Novels You Heard About On 4chan In 2006" company?
"painting circa 1800 portrait of a paintress oil on canvas"
So, CLIP isn't broken after all. #PDDiffusion 's label set is so narrow, and so full of specific phrases, that prompt engineering is hilariously critical to getting anything useful out of it, even with the improved wikitext parser. Descriptions alone aren't good enough.
Definitely going to have to build a manual labeling tool at some point, because there are entire styles of things in the dataset that you just can't recall with a prompt right now.
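Short of a full labeling tool, one cheap way to see which phrases the label set leans on is to count them. This is a rough sketch, assuming the labels are available as a list of plain strings. `phrase_counts` and the sample labels below are made up for illustration and are not taken from the real dataset.

```python
import re
from collections import Counter
from typing import Iterable


def phrase_counts(labels: Iterable[str], n: int = 2) -> Counter:
    """Count every run of n consecutive words across all labels."""
    counts: Counter = Counter()
    for label in labels:
        # Lowercase and keep only alphanumeric runs, dropping punctuation.
        words = re.findall(r"[a-z0-9]+", label.lower())
        counts.update(
            " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts


# Hypothetical labels, only here to show the call.
sample_labels = [
    "Portrait of a paintress, oil on canvas",
    "Landscape with forest, oil on canvas",
    "Map of Guinea",
]

for phrase, count in phrase_counts(sample_labels, n=3).most_common(3):
    print(count, phrase)
```

Phrases near the top of the list are the ones a prompt probably needs to contain to land anywhere near the training labels, and styles that never show up are candidates for manual labeling.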
"a guinea pig"
So, #PDDiffusion 's U-Net finished training. I handed it four prompts and it actually spat out vaguely-related images.
- "1800s portrait" - It doesn't quite understand figures yet, but it at least knows chiaroscuro.
- "a historical portrait of king george" - This looks like an illustration of a savannah or grasslands.
- "landscape painting of a forest" - A blue cave. Is this the 9th circle of hell?
- "a guinea pig" - A map. I guess it assumes I wanted the country of Guinea?
Actually these aren't that long. Why the heck can't I clone this?
Of course, it runs slow as heck and it has all sorts of fun graphical glitches
Administrator of fantranslation.org and maintainer of various Free Software projects and at least one ROM hack. All opinions are solely my own.