A lot of code carries multiple FOSS licenses that are not contaminating the way the GPL is. GPL violations do occur in code, but they have nothing to do with the training data.
For example, many academic data sets are not public domain and can't be used in a commercial context. A GPL claim on that data is often an argument over which thief showed up first.
Rule #24: A lawyer's Strategic Truth is to never lie, but also to avoid voluntarily disclosing information that may help opponents.
Thus, a business will never disclose they paid a fool to break laws for them... =3