
The problem is that they have a lot of time to report their purchases. If they were required to report before they purchased, the problem would probably resolve itself.


The S&P 500 is probably the most popular investment in America, perhaps aside from housing. It wouldn't hurt to have lawmakers' fortunes broadly aligned with the market rather than narrowly aligned with specific corporations.


That can go very poorly, as it did in 1929, 1965, 2000, etc., and it is going to go poorly again and show everyone why it is a bad idea.

https://www.currentmarketvaluation.com/models/s&p500-mean-re...


Whenever the S&P 500 craters, it is the best time to buy. Warren Buffett talks about it all the time: "Don't bet against America." There is no other large, rich nation on Earth with such a dynamic economy. That (partly) explains why the US can bounce back so quickly (compared to other countries) from devastating economic downturns.


Awesome! If you run into any problems or have questions, feel free to open an issue or drop by the Discord server [1].

[1] https://discord.gg/zbBHRUpwf4


Hi, we don't have reliable documentation for the HTTP API endpoints yet, mostly because they are still subject to change.

However, to briefly provide some context: `/_train_model` returns a stream of line-delimited JSON objects, one per gradient step, as the model trains on the provided trajectories, so the client can monitor progress. The final version of this endpoint may provide the option for both streaming & non-streaming responses, and/or potentially return a "training job" that can be polled instead.
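As a rough sketch of how a client might consume that stream today (the payload shape and per-step keys below are illustrative, not the final schema):

```python
import json
import requests

# Hypothetical payload shape; the real request body is still subject to change.
payload = {"trajectories": []}  # fill with your collected trajectories

with requests.post(
    "http://localhost:8000/_train_model", json=payload, stream=True
) as resp:
    resp.raise_for_status()
    # One JSON object per line, emitted after each gradient step.
    for line in resp.iter_lines():
        if not line:
            continue
        step = json.loads(line)
        print(step)  # e.g. loss/step metrics; key names are not final
```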


Contributor here, we developed the Agent Reinforcement Trainer (ART) library to make it easy to train LLMs for anything.

No callbacks or straitjacket flows. Instead we serve an OpenAI API-compatible endpoint that you can use as a drop-in replacement for any proprietary APIs you may be hitting.

After collecting responses from the inference API, you can tune the model with your own custom rewards and repeat the process as long as you like, until performance converges. We believe this level of flexibility will make it easier for you to train state-of-the-art models for your own use cases, much like Kyle's new email agent[1].
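As a rough sketch of that loop (the base URL, model name, and the `train_on_rollouts` helper below are illustrative stand-ins, not the exact ART API):

```python
from openai import OpenAI

# Point the standard OpenAI client at the ART inference server;
# the endpoint is OpenAI API-compatible, so this is a drop-in swap.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

prompts = ["Write a haiku about reinforcement learning."]  # your task prompts

def my_reward_fn(prompt: str, completion: str) -> float:
    """Custom reward; here just a toy length-based score."""
    return min(len(completion) / 100, 1.0)

def train_on_rollouts(rollouts: list[tuple[str, str, float]]) -> None:
    """Stand-in for the ART training call; not the exact library API."""
    ...

for _ in range(3):  # repeat until performance converges
    # 1. Collect responses from the inference API.
    rollouts = []
    for prompt in prompts:
        response = client.chat.completions.create(
            model="my-tunable-model",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
        )
        rollouts.append((prompt, response.choices[0].message.content))

    # 2. Score each rollout with your own custom reward.
    rewarded = [(p, c, my_reward_fn(p, c)) for p, c in rollouts]

    # 3. Hand the rewarded rollouts back for a training step.
    train_on_rollouts(rewarded)
```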

Also happy to answer any questions you have about the framework.

[1] https://openpipe.ai/blog/art-e-mail-agent


I could see training your own email agent being beneficial for products like this:

https://x.com/advaitpaliwal/status/1913290027897131084


I know Google DeepMind ran experiments with 10M-token contexts a while ago, but I think this will be the first legit, released 10M context window model.


We used about 58 hours on 4xH100s and about 19 hours on 8xH100s to get the very best result with the 32B model. We trained for about another 16 hours before finishing the run, but we could have stopped earlier once it was apparent the model was regressing. Actual dollar costs are provider dependent.
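For a rough back-of-the-envelope estimate (the per-GPU-hour rate below is an assumption, not what we actually paid):

```python
# Back-of-the-envelope cost estimate; the hourly rate is an assumption.
gpu_hours = 58 * 4 + 19 * 8          # ~232 + ~152 = ~384 H100-hours
assumed_rate = 2.50                  # USD per H100-hour (hypothetical)
print(f"~{gpu_hours} GPU-hours -> ~${gpu_hours * assumed_rate:,.0f}")
```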


Well, in this case there is a much more straightforward method: the same CP-SAT solver used to create the puzzles can solve them. This is more of a fun experiment to see if we can train LLMs to solve these kinds of logical deduction problems.
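For a flavor of the direct approach, here is a toy deduction solved with OR-Tools' CP-SAT (a made-up three-suspect puzzle, not our actual generator or puzzle format):

```python
from ortools.sat.python import cp_model

# Toy deduction puzzle: three suspects were in rooms 1-3, all different.
# Clues: Alice was not in room 1, and Bob was in a lower-numbered room than Carol.
model = cp_model.CpModel()
alice = model.NewIntVar(1, 3, "alice")
bob = model.NewIntVar(1, 3, "bob")
carol = model.NewIntVar(1, 3, "carol")

model.AddAllDifferent([alice, bob, carol])
model.Add(alice != 1)
model.Add(bob < carol)

solver = cp_model.CpSolver()
status = solver.Solve(model)
if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    print(solver.Value(alice), solver.Value(bob), solver.Value(carol))
```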


Technically, yes: a gradient step only counts as fully online if the data was sampled from the exact same weights.

With our training recipe this can easily be done by accumulating the gradients across the entire batch and doing only one optimizer step before sampling more responses.
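In rough PyTorch terms (a toy policy and surrogate loss, not our actual training loop), that fully online variant looks like this:

```python
import torch

# Minimal sketch: accumulate gradients over the whole sampled batch,
# take ONE optimizer step, then sample fresh data from the updated weights.
policy = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def sample_batch(n: int) -> torch.Tensor:
    # Stand-in for sampling responses from the current policy weights.
    return torch.randn(n, 4)

num_chunks = 4
for iteration in range(3):
    batch = sample_batch(32)                 # data from the current weights
    optimizer.zero_grad()
    for chunk in batch.chunk(num_chunks):
        loss = policy(chunk).pow(2).mean() / num_chunks  # toy surrogate loss
        loss.backward()                       # gradients accumulate
    optimizer.step()                          # single step keeps data on-policy
```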

In our experiments, however, we found the advantages of doing multiple gradient steps outweighed any potential drift in policy.

Ultimately, the online-ness of data is on a spectrum, and while more online data is better, other factors may be more important.


> only if you do a gradient step with data sampled from the exact same weights is it an online step.

Bit pedantic, but an amusing thought: wouldn't that imply that asynchronous actor-critic is an offline training methodology?


Yes, pedantically, it is! But as I said, everything's on a spectrum. Online-ish data can still work just fine.

