Hacker News | petercooper's favorites

I can't keep up with half the new features all the model companies keep rolling out. I wish they would solve that

If you're planning to cancel, here's how to avoid the 50% cancellation fee:

1. Go to cancel your subscription. You will see a screen with the cancellation fee. Continue.

2. They will offer you a new plan to avoid the cancellation fee. Choose the cheapest one to switch to a new plan.

3. Here's the loophole: You can cancel any plan for free within two weeks. Cancel the new plan within this period and get your money back for the first month of the new plan.

You've avoided the fee.

Source: https://www.reddit.com/r/VideoEditing/comments/k9kh6v/how_to...


The Xeon Phi uses a different computing architecture than either a CPU or a GPU, so some intuitions about using it will be off. The Phi is essentially a modern barrel processor built on the AMD64 ISA (it does not understand legacy x86 modes), with a really nice vector ALU implementation and memory bandwidth that is more like a GPU's than a CPU's. While it will run normal 64-bit software reasonably well with no special considerations, it will not be efficient without tweaking the code design. CPU-targeted code attempts to optimize IPC in a single thread; barrel processors are designed to hide latency and can't drive IPC through single-thread optimization.

The reason barrel processors are interesting is that they can be incredibly efficient with their clock cycles. Unlike either CPUs or GPUs, it is relatively easy to get sustained throughput that approaches the theoretical IPC of the silicon for diverse software. The Xeon Phi mentioned in the article has 114 ALUs; it is possible to ensure all of those ALUs are doing useful work every single clock cycle, unlike the much smaller number of ALUs in your CPU. CPUs and GPUs have higher theoretical throughput in some cases but various parts of their ALUs typically spend a significant part of their time idle.

Contrary to marketing, you do not want to program these like an ordinary CPU even though the cores are truly general purpose (unlike a GPU). Thread behavior is unlike CPUs or GPUs. Barrel processors cannot saturate a single core with a single thread! The Xeon Phi has 228 independent threads and you need to use them all the time.

The way barrel processors work is if the hardware supports N threads then each clock cycle you can saturate all the ALUs if some subset M of those threads are not stalled. The M-of-N ratio varies by barrel processor design but is typically 20-50% in my experience. Each clock cycle, a core selects an immediately runnable operation from the basket of threads it can see and executes it; as long as something is runnable in that basket, the core will do real work that clock cycle. Xeon Phi has a 50% M-of-N requirement, so you need a minimum of 114 threads that are not stalled every clock cycle to saturate the processor. The way you ensure that you hit the 114 threshold is to schedule all 228 hardware thread slots with useful work.
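The M-of-N arithmetic is easy to see in a toy simulation. This is a sketch only: the uniform, independent stall probability is invented for illustration (real stalls are correlated), and the numbers match the Phi figures above.

```python
import random

def utilization(n_threads, needed, stall_prob, cycles, seed=0):
    """Toy barrel-processor model. Each cycle, every hardware thread is
    independently stalled (e.g. waiting on memory) with probability
    stall_prob; the core saturates when at least `needed` threads are
    runnable. Returns the average fraction of issue capacity used."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(cycles):
        runnable = sum(rng.random() > stall_prob for _ in range(n_threads))
        total += min(runnable / needed, 1.0)
    return total / cycles

# All 228 thread slots scheduled: stalls are hidden, utilization near 100%.
print(utilization(228, 114, stall_prob=0.3, cycles=2000))
# Only 114 slots scheduled: every stall now costs issue capacity.
print(utilization(114, 114, stall_prob=0.3, cycles=2000))
```

The point of the sketch: with twice as many threads as the saturation threshold, individual threads can stall 30% of the time and the core still almost never goes idle.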

For programming, this changes the way you reason about locality and concurrency. Once you can assume that some percentage of threads can be safely stalled or blocked with no impact on throughput, you design your algorithms and data structures differently. A little additional latency on a subset of threads won't hurt performance, especially if it increases task concurrency. On a CPU, stalled threads are expensive, since they lead to idle cores or context switches. For architectures like the Phi, you design your data structures and algorithms around relatively small, semi-independent work units so that there is always a large number of tasks that can be assigned to a thread, even though this reduces locality. Below a certain threshold, thread concurrency is approximately free because the cores will schedule around stalls due to contention.

I like barrel processors quite a lot. Once you get used to the model, it is an easier architecture with which to achieve efficient massively threaded parallelism and throughput than either CPUs or GPUs. More importantly, they are hard to beat for efficiency for general purpose computing when software is designed for the architecture since so few clock cycles are wasted.


You need different indexing algorithms for different use cases - brute-force indexing, for example, is "SOTA" when it comes to recall (100%). If you have multiple use cases or if you might have domain shift, you'll want a vector database that supports multiple indexes.

Here's my 2¢:

- If you're just playing around with vector search locally and have a very small dataset, use brute-force search. Don't worry about indexes until later.

- If you have plenty of RAM and CPU cores and would like to squeeze out the most performance, use ScaNN or HNSW plus some form of quantization (product quantization or scalar quantization).

- If you have limited RAM, use IVF plus PQ or SQ.

- If you want to maintain reasonable latency but aren't very concerned about throughput, use a disk-based index such as DiskANN or Starling. https://arxiv.org/pdf/2401.02116.pdf

- If you have a GPU, use GPU-specific indexes. CAGRA (supported in Milvus!) seems to be one of the best. https://arxiv.org/abs/2308.15136

All of these indexes are supported in Milvus (https://milvus.io/docs/index.md), so you can pick and choose the right one for your application. Tree-based indexes such as Annoy don't seem to have a sweet spot just yet, but I think there's room for improvement in this subvertical.
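For the first bullet, brute-force search really is just a matrix multiply. A minimal numpy sketch, using random unit vectors as stand-ins for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 384))             # stand-in embeddings
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

def search(query, k=5):
    """Exact nearest neighbours by cosine similarity: 100% recall."""
    q = query / np.linalg.norm(query)
    scores = corpus @ q               # cosine similarity (rows are unit-norm)
    top = np.argsort(-scores)[:k]
    return top, scores[top]

ids, scores = search(corpus[42])
print(ids[0])  # 42: a vector is its own nearest neighbour
```

On a few thousand vectors this is fast enough that an index buys you nothing; the indexes above start paying off as the corpus grows.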


Let me narrow my guess: They hit 4 years, 206 days and 16 hours . . . or 40,000 hours.

And that they were sold by HP or Dell, and manufactured by SanDisk.

Do I win a prize?

(None of us win prizes on this one).


That's the alarm system telling you to wake up and start looking around for a change. For whatever reason, we seem to have a meta-cognitive system that watches for stagnation and creates unease when it is detected. I'm convinced that you can't disable it entirely, but you can self-medicate or ignore it. Neither route is great, as you're aware on some level that you're refusing your destiny (of sorts).

I think about this quote a lot:

"i keep re-encountering with a shock the way that most people do not know, at all, that the problem the entire universe is devoted to, that it crashes us into walls, throws us off cliffs, tortures and murders us to try to solve, is that of escaping local maxima"

- https://twitter.com/chaosprime/status/1248861223501942784?re...

Thus, I'd implore you to stick with the feeling and use it as impetus to change for something new.


For sure. As someone who has stayed away from this part of the biz, infrastructure always looked like makework to me. Necessary, mind you, but still makework that was indicative of poor/insufficient computing primitives for services.

> it seems like JVM is experience a bit of a comeback with a few companies adopting Kotlin for backend

JDK 21 adding virtual threads, and Spring Boot 3.2 using them with one line of config, is huge. We can now write simple code that looks blocking and let the runtime handle it, instead of writing async/await everywhere. Personally I'm loving Spring Boot 3.2 with Kotlin, especially for the fact that I can bundle scheduled jobs, API, and frontend all in one place for my indie projects. Plus the JVM world of devs seems to have a somewhat decent appreciation of how to make web services that aren't rife with unnecessary coupling.

The bad? There is a legendary amount of cargo-culting blog posts and questionable advice around Spring Boot, often for older versions.


Usually when you’re building container images at any scale you’d use something like buildx, kaniko or buildah which allow you to easily set multiple target architectures (AMD64, ARM64) for your images.

There's basically no overhead to running an application in a container, if that's what you mean by overhead. If you mean the image size: well, you only need to add the things you need. The problem is that people tend to abuse images and install a lot of packages in the final image that absolutely aren't required.

Rust or golang at the core would be nice as Python can be slow at times. I do wish more folks would give Tauri a go for GUI apps.


Their org chart includes "Aella", who is apparently acting as their "media advisor" [0]. That tells you just about everything you need to know about this organization.

[0] https://nitter.net/Aella_Girl/status/1705772547781140541


I track almost all of my expenditures. At a restaurant I used to visit fairly regularly I noticed that my costs were often off by approximately ±25 cents. This was inconsistent and not always in one party's favour. Upon reporting and investigation it turned out that (a) I routinely wrote 'math' along with a total instead of a tip and (b) one of the waiters only approximated the math required.

Nothing nefarious and I'm not even sure if I won or lost out of this - but it was interesting to catch.


Hi, in case you face these challenges again, put the target system into DFU mode and connect the configuring system using the appropriate USB-C port (only one will work).

The DFU key combination is finicky for portable machines:

1. Connect a charger (preferably MagSafe, so you can watch the power LED) and your configuring system.

2. Press and hold power to be sure the system is off (if doing this turned the system on, repeat this step).

3. Press left Ctrl+Option and right Shift at the same time as the power button, count ten seconds, then let go of everything but power until the device shows up as "DFU" in Configurator (you may be prompted to allow more accessories to connect to your configuring system before it does).

If asked to perform a software update before/during reviving, choose "Quit and update" and start the process again. If you upgraded to 13.6 or 12.5 before facing these issues, you may have to enter recoveryOS instead of booting normally and perform a system upgrade to Sonoma.

If done correctly (without choosing Restore), you will not lose data. If you can't do these steps yourself or think you will have trouble walking a family member through them, the Apple Store can do a revive for you (be explicit that they are only to revive the machine, not restore or replace).

Full details at https://support.apple.com/guide/apple-configurator-mac/reviv...


I’d love to decode some radio signals - just to go through the “motions” and try my hand at it.

Is something like an AM radio signal decoding relatively easy to understand? What about FM? And NTSC?

Is the python code clear-ish? Complicated? Full of math I’m not appreciating?

Thanks!

Ok… it’s not exactly trivial: https://pysdr.org/content/rds.html#fm-demodulation
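AM, by contrast, is the gentle on-ramp: a simple envelope detector is enough. A toy numpy sketch on a synthetic signal (no SDR hardware needed; all the frequencies here are arbitrary):

```python
import numpy as np

fs = 100_000                                    # sample rate, Hz
t = np.arange(0, 0.1, 1 / fs)
message = np.sin(2 * np.pi * 100 * t)           # 100 Hz "audio" tone
carrier = np.cos(2 * np.pi * 10_000 * t)        # 10 kHz carrier
am = (1 + 0.5 * message) * carrier              # AM modulation

# Envelope detector: rectify, then low-pass (moving average over one
# carrier period) to strip the carrier and keep the slow envelope.
rectified = np.abs(am)
kernel = np.ones(10) / 10                       # 10 samples = one carrier cycle
envelope = np.convolve(rectified, kernel, mode="same")

# The recovered envelope tracks the original message closely.
demod = envelope - envelope.mean()
corr = np.corrcoef(demod, message)[0, 1]
print(round(corr, 2))
```

FM needs phase tracking (hence the pysdr link above), but this is the whole of AM in a dozen lines.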


A friend of mine convinced me this was the FB strategy to get people to resign instead of needing to fire them (a costly legal minefield).

As a founder, the act of paralyzing the company to effectively "bleed the fat off the bone", while cost-effective, is insane, because nothing gets done during those low-morale months. I refused to believe this could be a valid strategy.

Then I was countered with: this org is already not shipping features or anything customers want anyway, so it's not like it has any productivity to lose. They then named Twitter as exhibit A of the argument.

I had no retort to that.


In January, 2022, I experienced a 100% blockage of my Left Anterior Descending artery (a "widowmaker" heart attack), and I experienced cardiac arrest for a minute or two while on the exam table in a local emergency room. CPR was administered, and I was shocked back into rhythm, whereupon I regained consciousness immediately.

During the time I was dead, I have a memory. It is a singular experience of non-existence. Everything was black, and warm, and comfortable. It was silent, and there was no pain or concern for anything that had been going on. I didn't even think about it. It was as though I were in the most effective sensory deprivation tank ever.

Then they shocked me, and it hurt like hell. I woke up, looked at the doctor, and he said, "you went away for a few there", and I said, "Oh, did I?".

A couple minutes later, they "lost" me, and I went back to the blackness, but for a much shorter period of time, and I was shocked almost immediately. I had the wherewithal, upon being shocked, to say, in a very annoyed voice, 'Ouch.' which apparently caused some people in the ER room to laugh.

I ended up crashing 6 times that day, and each time I underwent cardiac arrest, it was like slipping back asleep into the deepest dream.

I used to be afraid to die. Now, I'm not, but I'm afraid to leave behind the people that I love, because I want to spend time with them, and I don't want them to have to go through me leaving them again.

Update: Here's a link to my tweet when I first talked publicly about it - https://twitter.com/standaloneSA/status/1478436334347816960


Did you consider pre-processing each chunk separately to generate useful information (summary, title, topics) that would enrich embeddings and aid retrieval? Embeddings only capture surface form: "third letter of the second word" won't match the embedding for the letter "t". Information has surface and depth. We get at depth through chain-of-thought, but that requires first digesting the raw text with an LLM.

Even LLMs are dumb during training but smart during inference. So to make more useful training examples, we need to first "study" them with a model, making the implicit explicit, before training. This allows training to benefit from inference-stage smarts.

Hopefully we avoid cases where "A is B" fails to recall "B is A" (the reversal curse). The reversal should be predicted during "study" and get added to the training set, reducing fragmentation. Fragmented data in the dataset remains fragmented in the trained model. I believe many of the problems of RAG are related to data fragmentation and superficial presentation.

A RAG system should have an ingestion LLM step for retrieval augmentation and probably hierarchical summarisation up to a decent level. It will be adding insight into the system by processing the raw documents into a more useful form.
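A sketch of what that ingestion step might look like. Everything here is hypothetical: `call_llm` is a stub standing in for whatever model API you use, and the prompt wording is invented.

```python
def call_llm(prompt: str) -> str:
    # Stub: a real system would call an LLM here. We return a fixed
    # string so the sketch runs without any model dependency.
    return "summary / title / topics / facts restated in both directions"

def enrich(chunk: str) -> dict:
    """Pre-process a raw chunk into surface text plus derived 'depth'.

    The derived text makes implicit facts explicit (including reversals,
    to dodge the reversal curse) before anything is embedded or indexed."""
    derived = call_llm(
        "Summarise this chunk, give it a title, list its topics, and "
        "restate each 'A is B' fact as 'B is A':\n" + chunk
    )
    return {"raw": chunk, "derived": derived}

record = enrich("Tom Cruise's mother is Mary Lee Pfeiffer.")
# Embed record["raw"] + record["derived"], so retrieval can match the
# explicit, de-fragmented form rather than only the surface text.
print(sorted(record))
```

The same enriched records feed hierarchical summarisation: summarise groups of chunks, then summarise the summaries, indexing each level.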


Yes, you can do this.

This is how you would implement a "medium term" memory. Folks in the "sentence-transformer" world have known this forever, yet the wider NLP world ignores it in the context of "chatbots" despite how powerful that and related concepts like "soft-prompts/textual inversion" are.

It's a wonderful technique and the fact that it's not used in ChatGPT and other tools like it is a shocking shame.


Lots of these services:

- Fluidstack: https://fluidstack.io

- Vast: https://vast.ai

- QBlocks: https://qblocks.cloud

- RunPod: https://runpod.io

- Sonm: https://sonm.com (blockchain)

- Golem: https://golem.network (blockchain)

- Rentaflop: https://rentaflop.com (rendering specific, blockchain)

- RNDR: https://rendertoken.com (rendering specific, blockchain)

If you want HPC specific cloud providers:

- Crusoe Cloud: https://crusoecloud.com

- Coreweave: https://coreweave.com

- Lambda Labs: https://lambdalabs.com

- Paperspace: https://paperspace.com

As others have pointed out, the decentralized clouds can't offer high performance interconnects (e.g. InfiniBand) that a lot of folks are using for LLM training. There are definitely initiatives underway to reduce dependence on these interconnects and build performant distributed training (again, some threads below mention this), but I think it's mostly academic at this point.

Disclosure: I run product at Crusoe Cloud, which aims to provide ML training at half the cost of a hyperscaler, while also being carbon reducing (https://crusoecloud.com/climate-impact/).


Oracle Database 12.2.

It is close to 25 million lines of C code.

What an unimaginable horror! You can't change a single line of code in the product without breaking 1000s of existing tests. Generations of programmers have worked on that code under difficult deadlines and filled the code with all kinds of crap.

Very complex pieces of logic, memory management, context switching, etc. are all held together with thousands of flags. The whole code is ridden with mysterious macros that one cannot decipher without picking up a notebook and expanding relevant parts of the macros by hand. It can take a day or two to really understand what a macro does.

Sometimes one needs to understand the values and the effects of 20 different flags to predict how the code would behave in different situations. Sometimes 100s too! I am not exaggerating.

The only reason why this product is still surviving and still works is due to literally millions of tests!

Here is how the life of an Oracle Database developer is:

- Start working on a new bug.

- Spend two weeks trying to understand the 20 different flags that interact in mysterious ways to cause this bug.

- Add one more flag to handle the new special scenario. Add a few more lines of code that checks this flag and works around the problematic situation and avoids the bug.

- Submit the changes to a test farm consisting of about 100 to 200 servers that would compile the code, build a new Oracle DB, and run the millions of tests in a distributed fashion.

- Go home. Come the next day and work on something else. The tests can take 20 hours to 30 hours to complete.

- Go home. Come the next day and check your farm test results. On a good day, there would be about 100 failing tests. On a bad day, there would be about 1000 failing tests. Pick some of these tests randomly and try to understand what went wrong with your assumptions. Maybe there are some 10 more flags to consider to truly understand the nature of the bug.

- Add a few more flags in an attempt to fix the issue. Submit the changes again for testing. Wait another 20 to 30 hours.

- Rinse and repeat for another two weeks until you get the mysterious incantation of the combination of flags right.

- Finally one fine day you would succeed with 0 tests failing.

- Add a hundred more tests for your new change to ensure that the next developer who has the misfortune of touching this new piece of code never ends up breaking your fix.

- Submit the work for one final round of testing. Then submit it for review. The review itself may take another 2 weeks to 2 months. So now move on to the next bug to work on.

- After 2 weeks to 2 months, when everything is complete, the code would be finally merged into the main branch.

The above is a non-exaggerated description of the life of a programmer in Oracle fixing a bug. Now imagine what horror it is going to be to develop a new feature. It takes 6 months to a year (sometimes two years!) to develop a single small feature (say something like adding a new mode of authentication like support for AD authentication).

The fact that this product even works is nothing short of a miracle!

I don't work for Oracle anymore. Will never work for Oracle again!


The cool thing about working with JSON is to store each JSON document as-is in one column, then make virtual columns that extract the specific fields you want to query (using some combination of json_extract), then index those columns.

This makes for super-fast search, and the best part is you don't have to choose what to index at insert time; you can always make more virtual columns when you need them.

(You can still also search non-indexed, raw json, although it may take a long time for large collections).
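A minimal sqlite3 sketch of that pattern. The table and field names are invented for illustration, and it needs SQLite 3.31+ for generated columns:

```python
import sqlite3, json

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (body TEXT)")     # raw JSON stored as-is

# A virtual column extracted from the JSON, added after the fact...
db.execute("""ALTER TABLE docs ADD COLUMN author TEXT
              GENERATED ALWAYS AS (json_extract(body, '$.author')) VIRTUAL""")
# ...then indexed for fast search.
db.execute("CREATE INDEX idx_docs_author ON docs(author)")

db.execute("INSERT INTO docs (body) VALUES (?)",
           (json.dumps({"author": "alice", "n": 1}),))
db.execute("INSERT INTO docs (body) VALUES (?)",
           (json.dumps({"author": "bob", "n": 2}),))

rows = db.execute("SELECT body FROM docs WHERE author = 'alice'").fetchall()
print(len(rows))  # 1
```

Because the column is VIRTUAL, it costs nothing at insert time and can be added whenever a new query pattern shows up.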

I love SQLite so much.


Don't lowball your consulting rate!

* Your price sends a signal that attracts particular clients, and the ones you attract with lowball prices are the worst, most demanding kind.

* Your ordinary good clients aren't shopping on price, beyond a vague notion of what the normal range of prices for your field is. Remember: they're generally not spending their own money.

* Any client big enough to have a purchasing department is never going to let you get your rate back; their whole job is to prevent vendors from ever raising rates.

* There are a zillion things you're selling as a consultant that you don't realize you're selling, from schedule flexibility and freedom to fire at will to answering phone calls about the project deliverable 3 weeks after the project is done to not having to pay benefits and payroll taxes to documentation, and you're much more likely to forget to capture this stuff in your prices than you are to overcapture it.

I'm not sure I've ever met a new consultant that had unrealistically high rates. But most new consultants I've met have had unrealistically low rates.

If a client balks at your rate, you can still get the project cost where they need it to be: negotiate on scope instead of rate.

Never bill hourly. Hourly is cursed.


This tutorial is very complex. Here's how to get free semantic search with much less complexity:

  1. Install sentence-transformers [1]
  2. Initialize the MiniLM model - `model = SentenceTransformer('all-MiniLM-L6-v2')`
  3. Embed your corpus [2]
  4. Embed your queries, then search the corpus
This runs on CPU (~750 sentences per second) or GPU (~18k sentences per second). You can use paragraphs instead of sentences if you need more text. The embeddings are accurate [3] and only 384 dimensions, so they're space-efficient [4].

Here's how to handle persistence. I recommend starting with the simplest strategy, and only getting more complex if you need higher performance:

  - Just save the embedding tensors to disk, and load them if you need them later.
  - Use Faiss to store the embeddings (it will use an index to retrieve them faster) [5]
  - Use pgvector, an extension for postgres that stores embeddings
  - If you really need it, use something like qdrant/weaviate/pinecone, etc.
This setup is much simpler and cheaper than using a ton of cloud services to do embeddings. I don't know why people make semantic search so complex.
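A sketch of the simplest persistence option from the list above, with random unit vectors standing in for the output of `model.encode(...)`:

```python
import numpy as np, tempfile, os

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 384)).astype(np.float32)  # stand-in for model.encode(corpus)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Simplest persistence: one .npy file on disk.
path = os.path.join(tempfile.mkdtemp(), "corpus_emb.npy")
np.save(path, emb)

# Later (another process, another day): load and brute-force search.
loaded = np.load(path)
query = loaded[7]                     # stand-in for model.encode(query)
scores = loaded @ query               # cosine similarity, rows are unit-norm
print(int(np.argmax(scores)))  # 7
```

Only once this stops being fast enough is it worth reaching for Faiss, pgvector, or a dedicated vector database.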

I've used it for https://www.endless.academy, and https://www.dataquest.io and it's worked well in production.

[1] https://www.sbert.net/

[2] https://www.sbert.net/examples/applications/semantic-search/...

[3] https://huggingface.co/blog/mteb

[4] https://medium.com/@nils_reimers/openai-gpt-3-text-embedding...

[5] https://github.com/facebookresearch/faiss


Hi, I'm the creator of Anime.js. Firstly, thank you for the submission. I honestly didn't anticipate being featured on the homepage of HN without any major update to the library! To those wondering why the project hasn't been updated recently: it's simply because I've been working on a new version (V4) for the last two years. The core library has been completely rewritten; I'm currently in the testing and documentation phase, and it should be ready for release by this summer.

Some of the new features of V4 include:

* Improved performance: The library has been entirely rewritten with performance optimization and low memory usage in mind.

* New ESM-first module API: Import only what you need and benefit from improved tree shaking.

* Better timelines: New options for animation positioning, allowing looping animations in timelines and improved parameters inheritance.

* Additive animations: A new tween composition mode that lets you smoothly blend concurrently running animations.

* Variable framerate: Specify different framerates for each animation or set it globally.

* New callbacks: onTick, onLoop, onRender, etc.

* Value modifiers: A function to alter an animated property's numerical values before rendering.

* Animate from: Animation from a given value to the current target value.

* Improved documentation: A new design with enhanced animations demos and more in-depth explanations.

* Unit tests

And much more! I can't wait to share it with you!


Please sell something business-like that people can purchase and expense.

A book, software, something. I can't easily expense Patreon, and others may have a similar issue. (Useful "free SaaS" where all there is is a cup-of-coffee button makes me sad.)


> My script read through each of the products we had responses for, called OpenAI's embedding api and loaded it into Pinecone - with a reference to the Supabase response entry.

OpenAI and the Pinecone database are not really needed for this task. A simple SBERT encoding of the product texts, followed by storing the vectors in a dense numpy array or faiss index would be more than sufficient. Especially if one is operating in batch mode, the locality and simplicity can’t be beat and you can easily scale to 100k-1M texts in your corpus on commodity hardware/VPS (though NVME disk will see a nice performance gain over regular SSD)


I run a service with an API that can help you get spot prices: https://ec2.shop/

Simply do:

  curl 'https://ec2.shop?region=us-west-2&filter=m5&json' | jq

You can pipe it into whatever store your system uses to get real-time prices without dealing with the AWS Price API.


I have trouble answering this question myself, and I created it!

The problem I have is that it can be applied to too many different problems.

I personally have used it for the following (a truncated summary):

- Publishing data online to allow other people to explore it, for example https://scotrail.datasette.io and https://russian-ira-facebook-ads.datasettes.com/

- Building websites, by combining it with custom templates. https://datasette.io and https://www.niche-museums.com and https://til.simonwillison.net are three examples

- Building my own combined search engine over a bunch of different data. https://github-to-sqlite.dogsheep.net is this for my GitHub issues and commits and issue comments across 100+ projects

- Similarly, building a code search engine across multiple repos (partly to demonstrate how far you can go with custom plugins): https://ripgrep.datasette.io

- Any time I have a CSV file I open it in the Datasette Desktop macOS app first to start exploring it: https://datasette.io/desktop

- As a prototyping tool. It's the fastest way I know of to get from some data files (CSV or JSON) to a working JSON API - and a GraphQL API too using this plugin: https://datasette.io/plugins/datasette-graphql

- Messing around with geospatial data - here's a write-up of my favourite experiment with that so far: https://simonwillison.net/2021/Jan/24/drawing-shapes-spatial...

This is a bewilderingly wide array of things! And I keep on finding new problems I can apply it to.

Of course, if all you have is a hammer, everything looks like a nail. But thanks to the plugin system (and the amazing flexibility of SQLite under the hood) I can reshape my hammer to do all sorts of other things!

Maybe it's more of a sonic screwdriver.

I've been trying to capture some of this at https://datasette.io/for

This is one of my biggest marketing challenges for the project though. If someone asks you for an elevator pitch, you need to do better than spending 15 minutes talking through a wide-ranging bulleted list!


> the Monte Carlo method lets you punch above your weight class in terms of measuring and predicting phenomena that are too complex, or too expensive to deterministically model.

Yes, this is exactly why I like it. At AWS, we've used Monte Carlo simulations quite extensively to model the behavior of complex distributed systems and distributed databases. These are typically systems with complex interactions between many components, each linked by a network with complex behavior of its own. Latency and response time distributions are typically multi-modal, and hard to deal with analytically.

One direction I'm particularly excited by in this niche is converging simulation tools and model checking tools. For example, we could have a tool like P use the same specification for exhaustive model checking, fuzzing invariants, and doing MC (and MCMC) to produce statistical models of things like latency and availability.
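As a toy illustration of why MC fits this niche, here's a hypothetical service whose per-call latency is bimodal (cache hit vs. miss). The distribution of end-to-end latency is awkward to handle analytically but trivial to sample; all the numbers are invented:

```python
import random

def call_latency(rng):
    """One dependency call: bimodal latency, usually a fast cache hit,
    occasionally a slow backend fetch."""
    if rng.random() < 0.9:
        return rng.gauss(2.0, 0.3)    # ms, cache hit
    return rng.gauss(40.0, 5.0)       # ms, cache miss

def request_latency(rng):
    # A request makes three dependency calls in sequence.
    return sum(call_latency(rng) for _ in range(3))

def percentile(samples, p):
    s = sorted(samples)
    return s[int(p * len(s))]

rng = random.Random(1)
samples = [request_latency(rng) for _ in range(100_000)]
print(round(percentile(samples, 0.50), 1))  # median: usually all hits
print(round(percentile(samples, 0.99), 1))  # p99: dominated by miss tails
```

Swapping the stub latency functions for measured distributions (or for a model-checker-driven state machine, per the P idea above) is where this gets genuinely useful.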


I store my authorized_keys in DNS TXT records, that are DNSSEC signed, with a validating resolver on the box. I then just use "/usr/bin/hesinfo %u ssh" as my AuthorizedKeysCommand in OpenSSH.

I wrote a little tool that allowed you to "#include" other DNS records etc, but "hesinfo" is generally easily installable/available so it's just easier.


I've had a lot of success with this!

Run highly targeted advertisements, and specifically include in the advertisement that you are looking for beta test users to talk to - and what kind of users you are looking for.

The most success we had was with LinkedIn Lead Gen forms. We got meetings with about 50 people at $25-$50 a pop. Be very personal and transparent - there are lots of somewhat bored professionals out there who would love nothing more than try out your app and give you feedback.

Another option is to sponsor a professional mixer event in your area. Depending on the event, you might be able to get a 5 minute speaking spot for less than a few hundred bucks.


Feel free to upgrade that to a very, very direct influence. In 2010 after I quit being a salaryman I wanted nothing other than to go into semi-retirement on Bingo Card Creator (“sip iced cocoa and play video games all day”).

A long-ranging conversation with Joel on, among other things, confluence of Catholic theology and the Talmud plus the memorable phrase “Shouldn’t you apply your skills to something which isn’t totally bullshit?” caused me to have a sharp reassessment of life and career goals, but for which it is unlikely I would have made a serious go of my consultancy, successfully launched my following few companies, or continued writing at anything like pace observed over the interval.

