Ok, you know how I've been banging my head against an inscrutable setup bug for the past week or so?

The fix? There's an on-by-default option in TrainingArguments called remove_unused_columns. It deletes data the model doesn't know about.

Except I'm using Dataset transforms to transform all the columns into the names that the model wants, and those run AFTER the Trainer decides to delete ALL THE DATA in the dataset!
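The ordering problem above can be imitated in a few lines of plain Python. This is only a rough simulation, not the Trainer's actual code: `forward`, `prune_columns`, `rename_transform`, and the column names are all hypothetical stand-ins, and the pruning is assumed to work by keeping only the keys that the model's forward signature accepts.

```python
import inspect

# Hypothetical stand-in for a model's forward(); parameter names are illustrative.
def forward(input_ids, pixel_values, labels=None):
    return None

def prune_columns(rows, forward_fn):
    """Rough imitation of column pruning: keep only keys that forward() accepts."""
    accepted = set(inspect.signature(forward_fn).parameters)
    return [{k: v for k, v in row.items() if k in accepted} for row in rows]

def rename_transform(row):
    """Hypothetical on-the-fly transform: raw column names -> model argument names."""
    mapping = {"caption_ids": "input_ids", "image": "pixel_values"}
    return {mapping[k]: v for k, v in row.items() if k in mapping}

raw_rows = [{"caption_ids": [1, 2, 3], "image": [0.5, 0.25]}]

# Pruning first: no raw name matches forward(), so every column is dropped
# before the transform ever sees it.
pruned_then_transformed = [rename_transform(r) for r in prune_columns(raw_rows, forward)]

# Transform first (or pruning disabled): the renamed columns survive.
transformed_then_pruned = prune_columns([rename_transform(r) for r in raw_rows], forward)

print(pruned_then_transformed)
print(transformed_then_pruned)
```

In the real setup, the equivalent of skipping the pruning step is passing `remove_unused_columns=False` to `TrainingArguments`.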


I'll give it that it tried to warn me about unnamed columns, but the warning was buried in the middle of about 40 other lines of logspew, and I had already gone back and forth checking, rechecking, and renaming columns to try to get it to work


Next problem: the CLIP tokenizer is refusing to pad things even though it knows the maximum length and the padding token, so torch.tensor shits its pants
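For context on why unpadded output breaks tensor creation: a tensor needs every row to have the same length, so ragged token lists have to be padded (and clipped) to a rectangle first. The sketch below shows that step in plain Python. `pad_to_length` is a hypothetical helper, and the token ids, `max_length=4`, and `pad_id=0` are made-up values, not CLIP's real ones.

```python
def pad_to_length(sequences, max_length, pad_id):
    """Clip each sequence to max_length, then right-pad it with pad_id."""
    padded = []
    for seq in sequences:
        clipped = seq[:max_length]
        padded.append(clipped + [pad_id] * (max_length - len(clipped)))
    return padded

# Made-up token ids of unequal length, as a tokenizer returns without padding.
ragged = [[5, 6, 7], [8, 9]]

# Every row now has the same length, so it can be turned into a tensor.
rectangular = pad_to_length(ragged, max_length=4, pad_id=0)
print(rectangular)
```

Hugging Face tokenizers do not pad unless asked to on each call, so if that is the cause here, passing `padding="max_length"` together with `max_length` and `truncation=True` in the tokenizer call may be what is missing.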
