No. Basically, requests are processed together in batches, and the order they're listed in affects the results, because the grid of tiles the GPU ends up processing is different depending on the order the requests came in.
So if you want batching + determinism, you need the same batch with the same order, which obviously doesn't work when there are N+1 clients instead of just one.
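A toy illustration of why order matters (my own sketch, nothing to do with any specific inference engine): floating-point addition isn't associative, so summing the same values grouped differently, which is effectively what a different batch layout does to a GPU reduction, can change the result.

```python
# Same four values, two different groupings of the additions.
# 1e16 + 1.0 rounds back to 1e16 in float64, so the grouping decides
# whether the small terms survive the reduction or get absorbed.
vals = [1e16, 1.0, -1e16, 1.0]

order_a = ((vals[0] + vals[1]) + vals[2]) + vals[3]  # left-to-right
order_b = (vals[0] + vals[2]) + (vals[1] + vals[3])  # pairwise grouping

print(order_a, order_b)  # 1.0 2.0 -- same inputs, different answers
```

Scale that up to thousands of parallel reductions per attention layer and you get outputs that depend on how requests were packed into the batch.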
Small, subtle errors that only show up on certain execution paths could be one reason. You might place things differently on the GPU depending on how large the batch is, if you've found one approach to be faster for batch_size < 1024 but another for batch_size > 1024. As the number of concurrent incoming requests goes up, you increase batch_size. That's just one possibility; there could be a multitude of reasons, and it's really hard to reason about until you sit with the data in front of you. vLLM has had bugs with this sort of thing too, so it wouldn't surprise me.
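To make the batch-size-dependent dispatch concrete, here's a hypothetical sketch (names and threshold invented, not from any real kernel): both reduction strategies are "correct", but they group the additions differently, so results can drift between the small-batch and large-batch paths.

```python
THRESHOLD = 1024  # hypothetical cutoff where the "wide" strategy wins

def sum_sequential(xs):
    # Simple running sum, as a small-batch path might do it.
    total = 0.0
    for x in xs:
        total += x
    return total

def sum_pairwise(xs):
    # Tree reduction, as a wide parallel kernel might do it.
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])

def reduce_batch(xs):
    # The dispatch described above: strategy depends on batch size.
    if len(xs) < THRESHOLD:
        return sum_sequential(xs)
    return sum_pairwise(xs)

xs = [1e16, 1.0, -1e16, 1.0]
print(sum_sequential(xs), sum_pairwise(xs))  # 1.0 vs 0.0
```

Both functions are reasonable implementations of "sum", yet they disagree on the same input, so a bug (or just drift) can hide entirely on one side of the threshold.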
No, I'm not sure how that'd make sense. Either you're making the correct (expected) calculations, or you're getting them wrong. Depending on the type of error and how wrong it is, the output could range from "used #2 in attention instead of #1", so "blue" instead of "Blue" or whatever, all the way to completely incoherent, garbled text.
I accept that errors are more likely to decrease "intelligence". But I don't see how increased load, through batching, is any more likely to increase errors than to decrease them.
Apart from the skepticism and anti-AI hype, one downside that gets pointed out is that AI has the potential to dissuade people from learning. Why go through the pains of learning something when an AI can do it faster and better?
I do get the argument that it's a tool and everyone will have to adapt around it. But at some point it can be extremely demoralizing that a PhD project that took months or years can be done by AI in a fraction of the time.
3 minutes is too long for exploratory searches, where I'm not sure what I'm even looking for. And 3 minutes feels too short for deep research, where I'm expected to trust some complex result that I either don't know enough about myself (that's why I'm searching for it) or know well enough that the AI probably can't do anything I couldn't do within a couple of minutes.
I think the sweet spot for AI results is around 10-30 seconds. That's fast enough that I'm willing to wait for the results even if I'm not sure I'm exploring the right topic, and fast enough that even if I knew exactly what to search for, it can give me summarized results faster than I could read through them on my own.
Can someone explain to me why you would want to do something like in the example of calculating age based on birthdate? Why wouldn't you do that within an app or within code rather than having a database function?
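For reference, the app-side version the question has in mind is short; a sketch in Python (my own illustration, not from the article):

```python
from datetime import date

def age(birthdate, today=None):
    # Age in whole years as of `today`.
    today = today or date.today()
    years = today.year - birthdate.year
    # Subtract one if this year's birthday hasn't happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years

print(age(date(1990, 6, 15), today=date(2024, 6, 14)))  # 33
print(age(date(1990, 6, 15), today=date(2024, 6, 15)))  # 34
```

The usual argument for putting this in the database instead is that the logic then lives next to the data, so every client (reports, ad-hoc SQL, other apps) computes it the same way; but I'd be curious to hear the article's reasoning too.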
Do we really know why LLMs seem to score highest on Python-related coding tasks? I would think there are equally good examples of JavaScript/C++/Java code to train on, but I always see Python with the highest scores.
I'm genuinely asking this out of curiosity and a bit of naivety — are there as many international students pursuing advanced degrees in China as there are in the U.S.? I don't know the answer and would love to hear from folks who do.
I believe this is actually the second time Google has tried to buy this company, too. They had to give them an offer that was too good to refuse.
While it seems like we aren't getting many people in the comments who have actually used the product, I can tell you it checks a lot of the boxes that help people sleep better at night with customer data in the cloud.