I recently implemented Fish for a project and found it adequate for TTS but wildly impressive at voice cloning. My POC originally required 3-10 audio samples, but I removed the minimum because it could usually one-shot it from a single sample.
The model is good, but I will say their inference code leaves a lot to be desired. I had to rewrite large portions of it for simple things like correct chunking and streaming. The advertised expressive keywords are very much hit and miss, and the devs have gone dark unfortunately.
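For anyone wondering what "correct chunking" means here, this is a rough sketch of the idea, not Fish's code and not my actual rewrite: split the input on sentence boundaries and yield chunks lazily so synthesis can start before the whole text is processed. The `chunk_text` name and the `max_chars` limit are made up for illustration.

```python
import re
from typing import Iterator


def chunk_text(text: str, max_chars: int = 200) -> Iterator[str]:
    """Yield sentence-aligned chunks of at most max_chars where possible.

    Hypothetical helper: the name and the default limit are assumptions,
    not part of any TTS library.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            yield current
        # A single sentence longer than max_chars is passed through whole
        # rather than being cut mid-word.
        current = sentence
    if current:
        yield current
```

Because it is a generator, each chunk can be handed to the synthesizer as soon as it is produced, which is the part that matters for streaming.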
Welp, I guess it's time to start pulling all the uv deps out of our builds and enjoy the extra 5 minutes of calm per deploy. I'm not gonna do another VC-fueled supply chain poisoning switcheroo under duress of someone else's time crunch to start churning profit.
> It is exceptionally easy to tell someone else to spend time and money for a cause you philosophically agree with.
It's advice I have followed myself, and at some expense.
> What will you, specifically, do to help this person in this case?
In this case I would be willing to help the NPM listings stay under his control and all of the other places he is already using the name "deepkit". I would help him expand that footprint if needed. I would help amplify his voice by publishing a blog entry. I don't have a large blog, but adding your voice has value. Right now this company sees this as X risk and Y cost and those numbers are low. If they have an invalid trademark ruling they may be able to force the issue eventually in some places, but don't make it easy for them.
If a prison analogy is needed, it's not what you see in the movies. Nobody shows up to prison and does the whole "fight the biggest guy" because they want to look tough etc. In the real world that gets you stabbed in your sleep. What happens instead is bullies who can and will win the fight in the long run are deterred by having to do the work and move to the weakest targets.
I don't know how much work the adversarial company has budgeted, what they think X or Y are, etc. What I think is important is to raise the cost, which is measured in many ways besides money.