Oh no, I forgot to save the image preprocessor config when training the vocabulary

No matter. We're just using the defaults from CLIPFeatureExtractor, so I can copy the preprocessor_config.json from OpenAI CLIP (the contents are the same, and uncopyrightable)
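For reference, a rough sketch of recreating that file by hand. The values below are what I understand the CLIPFeatureExtractor defaults to be (224px bicubic resize and center crop, plus OpenAI's normalization constants), so diff them against the upstream preprocessor_config.json before relying on them. The helper name and the "my-clip-model" directory are made up for illustration.

```python
import json
import tempfile
from pathlib import Path

# Assumed CLIPFeatureExtractor defaults; verify against the upstream file.
CLIP_PREPROCESSOR_DEFAULTS = {
    "crop_size": 224,
    "do_center_crop": True,
    "do_normalize": True,
    "do_resize": True,
    "feature_extractor_type": "CLIPFeatureExtractor",
    "image_mean": [0.48145466, 0.4578275, 0.40821073],
    "image_std": [0.26862954, 0.26130258, 0.27577711],
    "resample": 3,  # PIL's BICUBIC filter
    "size": 224,
}


def write_preprocessor_config(model_dir: Path) -> Path:
    """Hypothetical helper: drop a preprocessor_config.json into model_dir."""
    model_dir.mkdir(parents=True, exist_ok=True)
    target = model_dir / "preprocessor_config.json"
    target.write_text(json.dumps(CLIP_PREPROCESSOR_DEFAULTS, indent=2) + "\n")
    return target


# Write into a throwaway directory and read it back to check the round trip.
with tempfile.TemporaryDirectory() as tmp:
    written = write_preprocessor_config(Path(tmp) / "my-clip-model")
    loaded = json.loads(written.read_text())
```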

...Oh no, it's not actually trying to read the file, is it?

Never mind, it turns out the U-Net part of PDDiffusion has an os.chdir("output") right at the start that throws off every relative path after it
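A minimal reproduction of that kind of bug, not PDDiffusion's actual code: once the working directory changes, a relative path that used to resolve stops resolving, while a path made absolute beforehand keeps working. The directory layout and the helper name here are invented for the demo.

```python
import os
import tempfile
from pathlib import Path


def resolve_before_chdir(relative_path: str) -> Path:
    """Hypothetical helper: pin a path to an absolute one while the cwd is still right."""
    return Path(relative_path).resolve()


with tempfile.TemporaryDirectory() as project:
    start = os.getcwd()
    os.chdir(project)
    try:
        # Fake project layout: a config file next to an "output" directory.
        Path("output").mkdir()
        Path("preprocessor_config.json").write_text("{}")

        # Resolve the config path BEFORE anything changes the working directory.
        config_path = resolve_before_chdir("preprocessor_config.json")

        os.chdir("output")  # the troublesome call

        # The relative lookup now searches inside output/ and misses the file.
        relative_found = Path("preprocessor_config.json").exists()
        # The absolute path is unaffected by the chdir.
        absolute_found = config_path.exists()
    finally:
        # Restore the cwd so the temp directory can be cleaned up.
        os.chdir(start)
```

Here relative_found ends up False and absolute_found ends up True. The other way out is to drop the chdir entirely and join "output" onto paths explicitly.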

And now the U-Net training loop is choking because CLIP wants everything on the CPU for some reason...

Might as well just move the CLIP step into dataset loading at this point
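Roughly what I mean, as a sketch with a stand-in encoder rather than the real CLIP model: run the encoding once while the dataset is being built, so the training loop only ever sees finished embeddings and never has to call CLIP. fake_clip_encode and PrecomputedCaptionDataset are hypothetical names; a real version would call the CLIP model on the CPU at this point and store its output.

```python
from typing import Callable, List, Sequence, Tuple

Embedding = List[float]


def fake_clip_encode(caption: str) -> Embedding:
    """Hypothetical stand-in for CLIP: returns [character count, word count]."""
    return [float(len(caption)), float(caption.count(" ") + 1)]


class PrecomputedCaptionDataset:
    """Encodes every caption once at load time and stores the results."""

    def __init__(
        self,
        items: Sequence[Tuple[str, str]],
        encode: Callable[[str], Embedding],
    ) -> None:
        # The encoder runs here, during loading, not inside the training loop.
        self.items = [(image_id, encode(caption)) for image_id, caption in items]

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, index: int) -> Tuple[str, Embedding]:
        return self.items[index]


dataset = PrecomputedCaptionDataset(
    [("img0", "a red barn"), ("img1", "sailing ship at dusk")],
    fake_clip_encode,
)

# The training loop would just iterate over ready-made embeddings.
for image_id, embedding in dataset:
    pass
```

The trade-off is that embeddings are fixed at load time, which is fine as long as CLIP itself is frozen during U-Net training.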

