
neehao · 2025-11-20T00:02:58 1763596978

sample output

```
{
  "question": "What is the distribution of Sepal Length?",
  "answer": "*Sepal Length*: mean=5.84, median=5.80, std=0.83, range=[4.30, 7.90]. N=150 [non-normal distribution].",
  "type": "distributional",
  "provenance": {
    "generated_at": "2025-11-19T19:21:28+00:00",
    "tool": "statqa",
    "tool_version": "0.2.0",
    "generation_method": "template",
    "analysis_type": "unknown",
    "variables": ["sepal_length"],
    "python_commands": [
      "valid_data.mean() # Result: 5.84",
      "valid_data.std() # Result: 0.83"
    ]
  },
  "visual": {
    "plot_type": "histogram",
    "caption": "Histogram showing sepal length distribution with mean=5.84 and std=0.83 (N=150). The data shows a approximately normal distribution.",
    "alt_text": "Histogram chart with sepal length values on x-axis and frequency density on y-axis, showing distribution shape with 150 observations.",
    "visual_elements": {
      "chart_type": "histogram",
      "x_axis": "Sepal Length",
      "y_axis": "Density",
      "key_features": ["distribution shape", "mean line"],
      "colors": ["blue bars", "red mean line"],
      "annotations": ["Mean: 5.84"]
    },
    "primary_plot": "/path/to/univariate_sepal_length.png",
    "generation_code": "plot_factory.plot_univariate(data['sepal_length'], sepal_length_var, 'plot.png')"
  },
  "vars": ["sepal_length"]
}
```

neehao · 2025-11-19T19:49:56 1763581796

Output includes the precise commands run to generate the number
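
A rough sketch of the idea, not the tool's actual implementation: each statistic is stored next to the expression that produced it, so a reader can re-run the command and check the number. The function name `summarize_with_provenance` and the record layout are hypothetical; only the standard-library `statistics` module is used.

```python
import statistics


def summarize_with_provenance(values):
    """Compute summary stats and record the expression behind each one.

    Hypothetical helper for illustration; the layout loosely mirrors the
    "python_commands" field in the sample output.
    """
    # Each step pairs a label and a human-readable expression with the
    # callable that actually computes the value.
    steps = [
        ("mean", "statistics.mean(values)", statistics.mean),
        ("std", "statistics.stdev(values)", statistics.stdev),
    ]
    stats = {}
    commands = []
    for name, expr, fn in steps:
        result = round(fn(values), 2)
        stats[name] = result
        commands.append(f"{expr}  # Result: {result:.2f}")
    return {"stats": stats, "provenance": {"python_commands": commands}}


record = summarize_with_provenance([5.1, 4.9, 4.7, 4.6, 5.0])
for line in record["provenance"]["python_commands"]:
    print(line)
```

Keeping the expression string and the callable side by side avoids `eval`, at the cost of having to keep the two in sync by hand.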

neehao · 2025-10-14T23:27:04 1760484424

By the way, I'm new enough to HN that I don't know if this is a policy or not. I revised the title for clarity.

fragmede · 2025-10-14T23:35:03 1760484903

the guidance on titles boils down to: > please use the original title, unless it is misleading or linkbait; don't editorialize.

But the guidelines (linked at the bottom of the main page) have the full official policy:

https://news.ycombinator.com/newsguidelines.html

neehao · 2025-09-25T01:01:00 1758762060

see also: https://http.cat/ (posted in comments of the blog)

neehao · 2025-09-24T23:24:26 1758756266

from a new customer perspective who just heard about midjourney, search may be a good spot to find alternate products. what google needs to know is if it is a navigational search and unless it keeps a long history, it may not know that. the simpler answer may be that companies who know you use a product like the one they make may just be willing to spend a bunch and google may be willing to add friction for the $.

neehao · 2025-09-13T22:57:05 1757804225

see https://pubmed.ncbi.nlm.nih.gov/17420199/

neehao · 2025-08-31T21:36:53 1756676213

As Tyler Cowen says, solve for the equilibrium.

"Many widely used machine-learning models rely on copyrighted data. For instance, Google finds the most relevant web pages for a search term by relying on a machine learning model trained on copyrighted web data. But the use of copyrighted data by machine learning models that generate content (or give answers to search queries than link to sites with the answers) poses new (reasonable) questions about fair use. By not sharing the proceeds, such systems also kill the incentives to produce original content on which they rely. For instance, if we don’t incentivize content producers, e.g., people who respond to Stack Overflow questions, the ability of these models to answer questions in new areas is likely to be lower. The concern about fair use can be addressed by training on data from content producers who have opted to share their data. The second problem is more challenging. How do you build a system that shares proceeds with content producers?"

https://www.gojiberries.io/generative-ai-and-the-market-for-...

skybrian · 2025-08-31T22:11:25 1756678285

AI companies are already paying for content to train on.

whimsicalism · 2025-09-01T07:41:30 1756712490

not in any serious manner. they are paying for content that has been successfully walled off, a la reddit, not for all that is copyrighted. it's the law of the jungle

observationist · 2025-08-31T21:41:01 1756676461

Content producers that publish their "content" to the public web aren't entitled to dictate what's done with that material.

There's a simple solution. People that publish things can put up a paywall and people can pay what the content is worth.

The thing that AI endangers is not valuable content, it's the SEO clickbait cashcow, and as far as I'm concerned, the faster AI kills that off, the better.

That monetization model is corrupt as hell, produces all sorts of perverse incentives, and is the epitome of the enshittification of the web.

Burn, baby, burn.

_Algernon_ · 2025-08-31T21:52:15 1756677135

Publishing publicly doesn't surrender copyright…

bgwalter · 2025-08-31T21:52:11 1756677131

Of course they are entitled. They have the copyright, so you cannot reproduce it anywhere by default and the "fair" use issue is not settled.

Valuable content is endangered because writers feel demotivated if their material is just stolen by overfunded big corporations.

Paywalls only work for known publications and not for someone who writes the perfect tutorial on how to solve boot issues in Debian. Why would anyone write that if it's just stolen and monetized without attribution?

neehao · 2025-08-31T04:38:23 1756615103

This article covers a lot of the points: https://www.gojiberries.io/building-together-separately-chal...

"A single page on Doordash can make upward of 1000 gRPC calls (see the interview). For many engineers, upward of a thousand network calls nicely illustrate the chaos and inefficiency unleashed by microservices. Engineers implicitly diff 1000+ gRPC calls with the orders of magnitude fewer calls made by a system designed by an architect looking at the problem afresh today. A 1000+ gRPC calls also seem like a perfect recipe for blowing up latency. There are more items in the debit column. Microservices can also increase the costs of monitoring, debugging, and deployment (and hence cause greater downtime and worse performance)."

neehao · 2025-08-19T04:20:11 1755577211

For the customer, breakfast can be expensive if they bought it from outside. I can imagine $50 for a filling breakfast for a family of 4. This is just good bundling.

neehao · 2025-08-13T01:43:21 1755049401

three points:

1. i have often wondered about whether rapid tech. progress makes underinvestment more likely.

2. ben evans frequently makes fun of the business value. pretty clear a lot of the models are commoditized.

3. strategically, the winners are platforms where the data are. if you have data in azure, that's where you will use your models. exclusive licensing could pull people to your cloud from on prem. so some gains may go to those companies ...