
I've been modifying an LSTM GAN model to use a transformer in the encoder and it seems to do much worse, or at least the training dynamics are very different. Transformers perform great when they work, but in my experience it seems to be a lot harder to get them to work. Can anyone corroborate that, or is it likely that I'm doing something fundamentally wrong? To be clear, I'm not implementing the transformer myself but using PyTorch's Transformer classes as drop-in replacements for my LSTM-based encoder and decoder. I've been trying lots of variations of the hyperparameters, position-encoding methods, etc., but it always either doesn't train at all (generator/discriminator divergence) or it produces blurry images. (The "prenet" and "postnet" are the same as in my reference model, so I find this surprising.) Really frustrating when all the latest results say this should work amazingly well.
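
For concreteness, the kind of swap I mean looks roughly like this (a simplified sketch with made-up dimensions; the actual prenet/postnet and GAN training loop are omitted, and the names here are just placeholders):

    import math
    import torch
    import torch.nn as nn

    class SinusoidalPositionalEncoding(nn.Module):
        # Fixed sin/cos position encoding added to the input features.
        def __init__(self, d_model, max_len=5000):
            super().__init__()
            pos = torch.arange(max_len).unsqueeze(1)
            div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
            pe = torch.zeros(max_len, d_model)
            pe[:, 0::2] = torch.sin(pos * div)
            pe[:, 1::2] = torch.cos(pos * div)
            self.register_buffer("pe", pe)

        def forward(self, x):  # x: (batch, seq_len, d_model)
            return x + self.pe[: x.size(1)]

    class TransformerEncoderBlock(nn.Module):
        # Hypothetical drop-in for an LSTM encoder: same (batch, seq, features)
        # in and out, so the surrounding prenet/postnet stay unchanged.
        def __init__(self, d_model=256, nhead=4, num_layers=4):
            super().__init__()
            self.pos = SinusoidalPositionalEncoding(d_model)
            layer = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=nhead, dim_feedforward=4 * d_model,
                dropout=0.1, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

        def forward(self, x):
            return self.encoder(self.pos(x))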

Tons of articles like this on "how transformers work", very few on "tips for getting transformers to work in practice."



I'm mostly working on fairly simple image segmentation tasks but in my experience just replacing convolutional layers with attention layers + position embeddings works well. Using convolutional embeddings before the transformer encoder also helps.
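
Something along these lines (a rough sketch, not my actual code; the patch size, image size, and dimensions are placeholders):

    import torch
    import torch.nn as nn

    class ConvEmbedTransformer(nn.Module):
        # A strided conv turns the image into a grid of patch embeddings,
        # learned position embeddings are added, and a standard transformer
        # encoder runs over the flattened patches.
        def __init__(self, in_ch=3, d_model=256, patch=8, img_size=128,
                     nhead=4, num_layers=6):
            super().__init__()
            n_patches = (img_size // patch) ** 2
            self.patch_embed = nn.Conv2d(in_ch, d_model, kernel_size=patch, stride=patch)
            self.pos_embed = nn.Parameter(torch.zeros(1, n_patches, d_model))
            layer = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=nhead, dim_feedforward=4 * d_model,
                dropout=0.1, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

        def forward(self, x):                 # x: (batch, in_ch, H, W)
            x = self.patch_embed(x)           # (batch, d_model, H/patch, W/patch)
            x = x.flatten(2).transpose(1, 2)  # (batch, n_patches, d_model)
            return self.encoder(x + self.pos_embed)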

It still takes a lot more epochs to train, though, so you might have to decrease the learning rate of your discriminator by a lot.
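
I.e. something like separate optimizers with a much smaller step size for the discriminator (the networks and numbers below are just stand-ins so the snippet runs):

    import torch
    import torch.nn as nn

    # Stand-ins for the real generator/discriminator.
    generator = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 784))
    discriminator = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

    # The point: the discriminator gets a much lower learning rate than the
    # generator so it doesn't overpower the slower-training transformer.
    g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-5, betas=(0.5, 0.999))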


I admit I do run out of patience when it's been running for quite a while and seems to be really far behind where my LSTM solution was at the equivalent number of iterations. I often stop, adjust things, and try again, when maybe it just needs to run longer. I will try that, thanks.



