
The problem is that they have a lot of time to report their purchases. If they were required to report before they purchased, the problem would probably resolve itself.


The S&P 500 is probably the most popular investment in America, perhaps aside from housing. It wouldn't hurt to have lawmakers' fortunes broadly aligned with the market rather than narrowly aligned with specific corporations.


That can go very poorly, as it did in 1929, 1965, 2000, etc., and it is going to go poorly again and show everyone why it is a bad idea.

https://www.currentmarketvaluation.com/models/s&p500-mean-re...


Whenever the S&P 500 craters, it is the best time to buy. Warren Buffett talks about it all the time: "Don't bet against America." There is no other large, rich nation on Earth with such a dynamic economy. That (partly) explains why the US can bounce back so quickly (compared to other countries) from devastating economic downturns.


Awesome! If you run into any problems or have questions, feel free to open an issue or drop by the Discord server [1].

[1] https://discord.gg/zbBHRUpwf4


Hi, we don't have reliable documentation for the HTTP API endpoints yet, mostly because they are still subject to change.

However, to briefly provide some context: `/_train_model` returns a stream of line-delimited JSON objects, one per gradient step, as the model trains on the provided trajectories, so the client can monitor progress. The final version of this endpoint may provide the option for both streaming & non-streaming responses, and/or potentially return a "training job" that can be polled instead.
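As a rough sketch of how a client might consume that stream today (the payload shape and per-step keys below are illustrative, not the final schema):

```python
import json
import requests

# Hypothetical payload shape; the real request body is still subject to change.
payload = {"trajectories": []}  # fill with your collected trajectories

with requests.post(
    "http://localhost:8000/_train_model", json=payload, stream=True
) as resp:
    resp.raise_for_status()
    # One JSON object per line, emitted after each gradient step.
    for line in resp.iter_lines():
        if not line:
            continue
        step = json.loads(line)
        print(step)  # e.g. loss/step metrics; key names are not final
```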


Contributor here, we developed the Agent Reinforcement Trainer (ART) library to make it easy to train LLMs for anything.

No callbacks or straitjacket flows. Instead we serve an OpenAI API-compatible endpoint that you can use as a drop-in replacement for any proprietary APIs you may be hitting.

After collecting responses from the inference API, you can tune the model with your own custom rewards and repeat the process as long as you like, until performance converges. We believe this level of flexibility will make it easier for you to train state-of-the-art models for your own use cases, much like Kyle's new email agent[1].
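As a rough sketch of that loop (the base URL, model name, and the `train_on_rollouts` helper below are illustrative stand-ins, not the exact ART API):

```python
from openai import OpenAI

# Point the standard OpenAI client at the ART inference server;
# the endpoint is OpenAI API-compatible, so this is a drop-in swap.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

prompts = ["Write a haiku about reinforcement learning."]  # your task prompts

def my_reward_fn(prompt: str, completion: str) -> float:
    """Custom reward; here just a toy length-based score."""
    return min(len(completion) / 100, 1.0)

def train_on_rollouts(rollouts: list[tuple[str, str, float]]) -> None:
    """Stand-in for the ART training call; not the exact library API."""
    ...

for _ in range(3):  # repeat until performance converges
    # 1. Collect responses from the inference API.
    rollouts = []
    for prompt in prompts:
        response = client.chat.completions.create(
            model="my-tunable-model",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
        )
        rollouts.append((prompt, response.choices[0].message.content))

    # 2. Score each rollout with your own custom reward.
    rewarded = [(p, c, my_reward_fn(p, c)) for p, c in rollouts]

    # 3. Hand the rewarded rollouts back for a training step.
    train_on_rollouts(rewarded)
```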

Also happy to answer any questions you have about the framework.

[1] https://openpipe.ai/blog/art-e-mail-agent


I could see training your own email agent being beneficial for products like this:

https://x.com/advaitpaliwal/status/1913290027897131084


I know Google DeepMind ran experiments with 10M-token contexts a while ago, but I think this will be the first legit, released 10M context window model.


We used about 58 hours on 4xH100s and about 19 hours on 8xH100s to get the very best result with the 32B model. We trained for about another 16 hours before finishing the run, but we could have stopped earlier once it was apparent the model was regressing. Actual dollar costs are provider dependent.
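For a rough back-of-the-envelope estimate (the per-GPU-hour rate below is an assumption, not what we actually paid):

```python
# Back-of-the-envelope cost estimate; the hourly rate is an assumption.
gpu_hours = 58 * 4 + 19 * 8          # ~232 + ~152 = ~384 H100-hours
assumed_rate = 2.50                  # USD per H100-hour (hypothetical)
print(f"~{gpu_hours} GPU-hours -> ~${gpu_hours * assumed_rate:,.0f}")
```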


Well, in this case there is a much more straightforward method: the same CP-SAT solver used to create the puzzles can solve them. This is more of a fun experiment to see if we can train LLMs to solve these kinds of logical deduction problems.
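For a flavor of the direct approach, here is a toy deduction solved with OR-Tools' CP-SAT (a made-up three-suspect puzzle, not our actual generator or puzzle format):

```python
from ortools.sat.python import cp_model

# Toy deduction puzzle: three suspects were in rooms 1-3, all different.
# Clues: Alice was not in room 1, and Bob was in a lower-numbered room than Carol.
model = cp_model.CpModel()
alice = model.NewIntVar(1, 3, "alice")
bob = model.NewIntVar(1, 3, "bob")
carol = model.NewIntVar(1, 3, "carol")

model.AddAllDifferent([alice, bob, carol])
model.Add(alice != 1)
model.Add(bob < carol)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print(solver.Value(alice), solver.Value(bob), solver.Value(carol))
```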


Technically, yes: a gradient step only counts as fully online if the data was sampled from the exact same weights.

With our training recipe this can easily be done by accumulating the gradients across the entire batch and doing only one optimizer step before sampling more responses.
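In rough PyTorch terms (a toy policy and surrogate loss, not our actual training loop), that fully online variant looks like this:

```python
import torch

# Minimal sketch: accumulate gradients over the whole sampled batch,
# take ONE optimizer step, then sample fresh data from the updated weights.
policy = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def sample_batch(n: int) -> torch.Tensor:
    # Stand-in for sampling responses from the current policy weights.
    return torch.randn(n, 4)

num_chunks = 4
for iteration in range(3):
    batch = sample_batch(32)                 # data from the current weights
    optimizer.zero_grad()
    for chunk in batch.chunk(num_chunks):
        loss = policy(chunk).pow(2).mean() / num_chunks  # toy surrogate loss
        loss.backward()                       # gradients accumulate
    optimizer.step()                          # single step keeps data on-policy
```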

In our experiments, however, we found the advantages of doing multiple gradient steps outweighed any potential drift in policy.

Ultimately, the online-ness of data is on a spectrum, and while more online data is better, other factors may be more important.


> only if you do a gradient step with data sampled from the exact same weights is it an online step.

Bit pedantic, but an amusing thought: wouldn't that imply that asynchronous actor-critic is an offline training methodology?


Yes, pedantically, it is! But as I said, everything's on a spectrum. Online-ish data can still work just fine.

