Thinking about scaling up ...

The current size of my scraped subset of Wikimedia Commons images is around 33GB, and it will get way larger. We already know from my escapades that local storage is hella expensive in AWS but S3 is super cheap.

My escapades with Paparouna CI have also told me that spot pricing for EC2 is hella cheap.

But these won't jibe well - keeping all the data in S3 means idle time at startup while the instances grab their pieces of the dataset.
