CLIP is training. Finally.

Training times are already kinda high: probably one hour per epoch. I think part of it is that some of the Wikimedia Commons imagery needs to be downscaled in advance, because there are some absurdly large images in that dataset.
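The pre-downscaling step could look something like this, a minimal sketch using Pillow. The directory name, target size, and helper name are all hypothetical, not from my actual pipeline:

```python
from pathlib import Path

from PIL import Image

# Assumed cap on the longest image side; pick whatever the training
# pipeline's input resolution actually needs.
MAX_SIDE = 1024


def downscale_in_place(path: Path, max_side: int = MAX_SIDE) -> None:
    """Shrink an image file so its longest side is at most max_side."""
    with Image.open(path) as im:
        if max(im.size) <= max_side:
            return  # already small enough, skip the re-encode
        # thumbnail() resizes in place and preserves aspect ratio
        im.thumbnail((max_side, max_side))
        im.save(path)


if __name__ == "__main__":
    # Hypothetical directory holding the Commons images
    for p in Path("commons_images").glob("*.jpg"):
        downscale_in_place(p)
```

Doing this once up front means the data loader never has to decode a gigapixel scan mid-epoch, which is where a lot of the wall-clock time probably goes.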

✅ Set up a U-Net trainer
✅ Set up a CLIP trainer
❌ Set up conditional U-Net training for txt2image
❌ Test with some actual prompts
❌ Calculate how expensive it is to scale this up
