Oh no, I forgot to save the image preprocessor config when training the vocabulary
No matter: we're just using the defaults from CLIPFeatureExtractor, so I can just copy the preprocessor_config.json from OpenAI CLIP (they're the same, and uncopyrightable)
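(For reference, the stock CLIP preprocessor_config.json looks roughly like this — values quoted from memory of the openai/clip-vit-base-patch32 repo, so double-check before trusting them:)

```json
{
  "crop_size": 224,
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_resize": true,
  "feature_extractor_type": "CLIPFeatureExtractor",
  "image_mean": [0.48145466, 0.4578275, 0.40821073],
  "image_std": [0.26862954, 0.26130258, 0.27577711],
  "resample": 3,
  "size": 224
}
```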
...Oh no, it's not actually trying to read the file, is it? #PDDiffusion
And now the U-Net training loop is choking because CLIP wants everything on the CPU for some reason...
Might as well just move the CLIP step into dataset loading at this point
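(Sketching what that looks like: the conditioning step runs inside the dataset's `__getitem__`, so by the time a batch reaches the U-Net loop it's just plain tensors. Everything here is hypothetical — `CaptionedImageDataset` and `encode_caption` are stand-ins, not PDDiffusion's real classes:)

```python
# Hypothetical sketch: run the CLIP step during dataset loading instead of
# inside the U-Net training loop, so the CPU-bound encode happens alongside
# the rest of the data pipeline.
class CaptionedImageDataset:
    def __init__(self, samples, encode_caption):
        # samples: list of (image, caption) pairs
        # encode_caption: callable standing in for the CLIP text encoder
        self.samples = samples
        self.encode_caption = encode_caption

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        image, caption = self.samples[idx]
        # CLIP runs here, on the loader side; the training step only
        # ever sees the precomputed embedding.
        return {
            "pixel_values": image,
            "text_embedding": self.encode_caption(caption),
        }

# Stub encoder standing in for the real CLIP text model
fake_encode = lambda text: [float(len(text))]
ds = CaptionedImageDataset([("img0", "a cat")], fake_encode)
batch = ds[0]
```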