Hacker News | sakras's comments

Yeah, I kind of think the authors didn't conduct a thorough enough literature review here. There are well-known relations between the number of hash functions you use and the FPR, cache-blocking and register-blocking are classic techniques ("Cache-, Hash-, and Space-Efficient Bloom Filters" by Putze et al.), and there are even ways of generating patterns from only a single hash function that work well (shamelessly shilling my own blog post on the topic: https://save-buffer.github.io/bloom_filter.html)

I also find the use of atomics to build the filter confusing here. If you're doing a join, you're presumably doing a batch of hashes, so it'd be much more efficient to partition your Bloom filter, lock the partitions, and do a bulk insertion.


Your blog post is great! Except for one detail: you have used modulo n. If n is not known at compile time, multiply+shift is much faster [1]. Division and modulo (remainder) are slow, except on Apple silicon (I don't know what they did there). BTW for blocked Bloom filters, there are some SIMD variants that seem to be simpler than yours [2] (maybe I'm wrong; I didn't look at the details, it just seems yours uses more code). I implemented a register-blocked one in Java here [3].
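To make the multiply+shift point concrete, here's a minimal Python sketch of the range-reduction trick described in [1] (the function name and the 32-bit-hash assumption are mine, not from the post):

```python
def fastrange32(x: int, n: int) -> int:
    """Map a 32-bit hash x into [0, n) without division.

    x * n is a 64-bit product; its high 32 bits equal
    floor(x / 2**32 * n), a fair scaling of x into the range.
    """
    return (x * n) >> 32

# Unlike x % n, this is one multiply and one shift on hardware, which is
# why it beats modulo/division when n isn't a compile-time constant.
assert fastrange32(0, 1000) == 0
assert fastrange32(2**32 - 1, 1000) == 999
```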

Bulk insertion: yes, if there are many keys, bulk insertion is faster. For xor filters, I used radix sort before insertion [4] (I should have documented the code better), but for fuse filters and blocked Bloom filters it might not be worth it, unless the filter is huge.

[1] https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-... [2] https://github.com/FastFilter/fastfilter_cpp/blob/master/src... [3] https://github.com/FastFilter/fastfilter_java/blob/master/fa... [4] https://github.com/FastFilter/fastfilter_cpp/blob/master/src...


Very interesting blog post. I’d never seen that method for quickly computing the patterns. I thought I had done a lot of research on bloom filters, too!

Have you tried SolveSpace? It's easily my favorite open source CAD program. The main things it's missing are shells, fillets, and chamfers. But I've been able to 3D print quite a few parts using it!

You might want to check out Dune3D. It advertises itself as combining the constraint solver from SolveSpace with an OpenCASCADE geometry kernel supporting fillets and chamfers. :)

Haven't used it much apart from some minor tests (I tend to prefer MoI3D, but that's in a different category in several ways...), but as far as FOSS solid modelers go, it seems like the most promising to me. I do remember some small UI quirks, but overall it felt very approachable and streamlined, and looking at the GitHub repo, development is active. FreeCAD IMHO is just too sprawling and complex, with seemingly little thought paid to UI/UX.


Agreed: The Dune3D developers made the wise decision to start from scratch implementing a parametric modeling UI. Extremely robust software; very fast, and almost intuitive (high praise for CAD).

The problem with FreeCAD, on the other hand, is that it's a "just two more weeks and it'll be great" solution.

The developers are clearly talented in a raw-math kind of way, but FreeCAD offers the eternal promise of usability in the next release, while never delivering it.

Those who are profoundly cynical might consider the possibility that the legacy CAD industry has infiltrated the FreeCAD development team and run Pied-Piper ops there to prevent a Blender-moment stealing their revenue.

This would perfectly explain why the FreeCAD experience is so consistently bizarre.


>Those who are profoundly cynical might consider the possibility that the legacy CAD industry has infiltrated the FreeCAD development team and run Pied-Piper ops there to prevent a Blender-moment stealing their revenue.

If you've been around on the FreeCAD forums, you'll see that the majority of users essentially believe that all comparisons of FreeCAD with commercial CAD software are illegitimate, and they become incredibly defensive. They have developed a huge arsenal of coping strategies to avoid improving FreeCAD, and the results speak for themselves.

It's like they've got the Steve Jobs attitude but without the good taste that justified it.


>They have developed a huge arsenal of coping strategies to avoid improving FreeCAD and the results speak for themselves.

Exactly. These FreeCAD "strategies" you mention align perfectly with the objectives of the legacy CAD industry: to delay, break, and obfuscate open-source CAD.

In other words: The FreeCAD team may not be infiltrated by the legacy-CAD industry, but its behavior is entirely consistent with such a state.

One solution is to fork the behemoth; but if FreeCAD is a hedge-maze-by-design, the only way to win is not to play the game: Build alternatives elsewhere, from scratch.

FreeCAD feels like a time-drainer honeypot. Though whether by accident or malice is unknown.


Meh, if you gauge the FreeCAD development mindset off of the forums, you are misleading yourself. That was certainly the case 3 or 4 years ago, but it would seem that the core contributors have mostly moved away from the forum as a platform due to the very toxic mentality you mention. GitHub is the most concrete view into things, and a lot of free-flowing discussion happens on Discord.

The mindset against usability improvements that was prevalent back then has largely shifted. The hard part is that the program's complexity makes a single sweeping overhaul incredibly unlikely, so incremental jumps and improvements will probably continue. Seems to me like things are headed in a pretty healthy direction when comparing the last few versions.


This. I just can’t bring myself to use FreeCAD for anything. It’s been almost a decade of occasional attempts during vacation breaks and it is still one of the worst, most counter-intuitive pieces of 3D software I’ve ever used (and I paid my way through college doing early multimedia work, some 30 years ago).

Dune3D is by the same developer as HorizonEDA, a KiCad alternative.

Has anyone tried that too?


I was excited about Dune3D, but one of the things I needed to do was import an SVG as a path to extrude (or similar), and I couldn't see a way to do it.

I managed to do it (painfully) with FreeCAD, so that's what I settled on.

Does anyone know if that's a feature yet?


Dune 3D developer here. Use inkscape to convert the SVG path to DXF and import that.

Oh awesome, dxf import.[0] Nice, that solves it.

Gonna check out dune3d for my next side project!

[0] https://docs.dune3d.org/en/latest/dxf-import.html


Solvespace is nice, but missing fillets and chamfers is kind of a deal-breaker. Last time I tried it it also had issues with small holes turning into diamonds.

That said, pre-1.0 FreeCAD had a terrible UX so it was the best FOSS CAD option.

With the 1.0 release of FreeCAD the UX is much better though. There are still a few WTFs (e.g. it took me quite a while to figure out rollback is done via right-click->set tip, or something like that)... But overall it's better than Solvespace now.


If you want a solvespace with chamfers and fillets, then give Dune 3D a try.

Disclaimer: Dune 3D developer here.


Ooo interesting. The screenshots look suspiciously basic but Horizon EDA is pretty great so I'll give it a try!

Set tip makes sense if you think of the steps taken to build up a part as a history. Setting the tip isn't a rollback. It is saying "I want to insert a new step in the history".

Yeah, I use FreeCAD when I need fillets/chamfers... before that, I usually model my 3d printer stuff using OpenSCAD.

Yeah, I actually have. I really liked the concept, but I designed a cylinder with many holes (think a robust sieve) and it just crashed when the number of holes grew too great. Even the OpenCL/MP version. It felt unstable in other ways too, so I did not make it my go-to tool. Sadly it also seems it's not being developed much.

EDIT: Missing fillets and chamfers were also a big problem for me - probably I'm just a newbie maker and want unreasonable things, but still.


Just checked it out [1] but it appears the last version released was in 2022? Makes me wonder if it is still active.

[1] https://solvespace.com/index.pl


We have been very close to version 3.2 final for far too long. Development has slowed but not stopped. I would try a nightly/development build.

Thanks!

Giving it a quick look, seems like they've addressed a lot of the shortcomings of Parquet which is very exciting. In no particular order:

- Parquet metadata is Thrift, but with comments saying "if this field exists, this other field must exist", and no code actually verifying the fact, so I'm pretty sure you could feed it bogus Thrift metadata and crash the reader.

- Parquet metadata must be parsed out, meaning you have to: allocate a buffer, read the metadata bytes, and then dynamically keep allocating a whole bunch of stuff as you parse the metadata bytes, since you don't know the size of the materialized metadata! Too many heap allocations! This file format's Flatbuffers approach seems to solve this as you can interpret Flatbuffer bytes directly.

- The encodings are much more powerful. I think a lot of people in the database community have been saying that we need composable/recursive lightweight encodings for a long time. BtrBlocks was the first such format that was open in my memory, and then FastLanes followed up. Both of these were much better than Parquet by itself, so I'm glad ideas from those two formats are being taken up.

- Parquet did the Dremel record-shredding thing which just made my brain explode and I'm glad they got rid of it. It seemed to needlessly complicate the format with no real benefit.

- Parquet datapages might contain different numbers of rows, so you have to scan the whole ColumnChunk to find the row you want. Here it seems like you can just jump to the DataPage (IOUnit) you want.

- They got rid of the heavyweight compression and just stuck with the Delta/Dictionary/RLE stuff. Heavyweight compression never did anything anyway, and was super annoying to implement, and basically required you to pull in 20 dependencies.

Overall great improvement, I'm looking forward to this taking over the data analytics space.
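To illustrate what the composable lightweight encodings buy you, here's a toy Python sketch (mine, not the format's actual code): delta-encoding a sorted ID column leaves mostly-constant gaps, which RLE then collapses, which is the "recursive" composition idea from BtrBlocks/FastLanes.

```python
from itertools import groupby

def delta_encode(xs):
    """Keep the first value, then successive differences."""
    return [xs[0]] + [b - a for a, b in zip(xs, xs[1:])] if xs else []

def rle_encode(xs):
    """Collapse runs into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(xs)]

# A sorted ID column with one gap: delta makes it mostly 1s, RLE packs it.
ids = [100, 101, 102, 103, 110, 111, 112]
deltas = delta_encode(ids)    # [100, 1, 1, 1, 7, 1, 1]
packed = rle_encode(deltas)   # [(100, 1), (1, 3), (7, 1), (1, 2)]
```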


> - They got rid of the heavyweight compression and just stuck with the Delta/Dictionary/RLE stuff. Heavyweight compression never did anything anyway, and was super annoying to implement, and basically required you to pull in 20 dependencies.

"Heavyweight compression" as in zstd and brotli? That's very useful for columns of non-repeated strings. I get compression ratios in the order of 1% on some of those columns, because they are mostly ASCII and have lots of common substrings.


I think the wasm compiler is going to bring in more dependencies than the ‘heavy’ compression would have.

I think that more expensive compression may have made more of a difference 15 years ago, when CPU was more plentiful compared to network or disk bandwidth.


> I'm looking forward to this taking over the data analytics space

Parquet is surprisingly arcane. There are a lot of unpleasant and poorly documented details one has to be aware of in order to use it efficiently.


For compression, has the world settled with zstd now?


I think it’s a pretty common choice when you want compression in a new format or protocol. It works better for compressing chunks of your data, rather than one large file, when you want to maintain some kind of index or random access. Similarly, if you have many chunks then you can parallelise decompression (I’m not sure any kind of parallelism support should have been built into the zstd format, though it is useful for command-line uses).
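A rough sketch of that chunks-plus-index idea, using zlib from the Python stdlib as a stand-in codec (the layout is conceptually the same for any frame-based format like zstd; the chunk size and index shape here are made up for illustration):

```python
import zlib

def compress_chunks(data: bytes, chunk_size: int = 4096):
    """Compress fixed-size chunks independently and keep an offset index,
    so any one chunk can be decompressed without touching the others."""
    frames, index, off = [], [], 0
    for i in range(0, len(data), chunk_size):
        frame = zlib.compress(data[i:i + chunk_size])
        index.append((off, len(frame)))
        frames.append(frame)
        off += len(frame)
    return b"".join(frames), index

def read_chunk(blob: bytes, index, chunk_no: int) -> bytes:
    off, length = index[chunk_no]
    return zlib.decompress(blob[off:off + length])

data = bytes(range(256)) * 64          # 16 KiB of sample data
blob, idx = compress_chunks(data)
assert read_chunk(blob, idx, 2) == data[8192:12288]
```

Independent chunks also mean each one can be decompressed on a different thread, which is the parallelism point above.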

A big problem for some people is that Java support is hard, as the implementation isn't portable, so e.g. making a Java web server compress its responses with zstd isn't so easy.


Java can just use native libraries, there are plenty of Java projects that do that.

It's not like it's 1999 and there is still some Sun dogma against doing this.


Sure, I don't want to make a big deal about this but I have observed Java projects choosing to not support zstd for portability (or software packaging) reasons.


Well, convenience is also a factor in some cases. Much easier to schlep a "pure-Java" jar around.


Depends on the use-case. For transparent filesystem compression I would still recommend lz4 over zstd because speed matters more than compression ratio in that use case.


Most definitely not settled, but it's a good default



They added a wasm dependency though.

Build teams, weep in fear...


I see you guys are using Egg/Egglog! I've been mildly interested in egraphs for quite a while, glad to see they're gaining traction!


Right, my first thought when reading the blurb was "kinda sounds like e-graphs?"


e-graphs are awesome! none of this would be possible without them.


Likely performance - LLVM is somewhat notorious for being slower than ideal.


Likely refers to Machine IR, a lower-level representation that normal LLVM IR lowers to?


Typically requests are binned by context length so that they can be batched together. So you might have a 10k bin and a 50k bin and a 500k bin, and then you drop context past 500k. So the costs are fixed per-bin.
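A toy sketch of that binning scheme (the bin boundaries come from the numbers above; the truncate-past-the-largest-bin rule is my assumption):

```python
import bisect
from collections import defaultdict

BINS = [10_000, 50_000, 500_000]   # hypothetical bin boundaries

def assign_bin(context_len: int) -> int:
    """Smallest bin that fits the request; past the largest bin, truncate."""
    i = bisect.bisect_left(BINS, context_len)
    return BINS[i] if i < len(BINS) else BINS[-1]

# Group incoming requests so each batch shares a padded length / fixed cost.
requests = [3_000, 48_000, 200_000, 750_000]
batches = defaultdict(list)
for r in requests:
    batches[assign_bin(r)].append(r)
# batches: {10000: [3000], 50000: [48000], 500000: [200000, 750000]}
```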


Makes sense, and each model has a max context length, so they could charge per token assuming full context by model if they wanted to assume worst case.


I intuitively think about linear regression as attaching a spring between every point and your regression line (and constraining the spring to be vertical). When the line settles, that's your regression! Also gives a physical intuition about what happens to the line when you add a point. Adding a point at the very end will "tilt" the line, while adding a point towards the middle of your distribution will shift it up or down.

A while ago I think I even proved to myself that this hypothetical mechanical system is mathematically equivalent to doing a linear regression, since the system naturally tries to minimize the potential energy.
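Here's a small numerical check of that claim (with made-up data): the springs' total potential energy is k/2 times the squared-residual loss, so letting the line "settle" by following the spring forces, i.e. gradient descent, lands on the least-squares line.

```python
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

# Let the line y = a*x + b settle: each spring's force is the gradient
# of its potential energy (y_i - (a*x_i + b))^2.
a, b, lr = 0.0, 0.0, 0.01
for _ in range(20_000):
    da = sum(2 * ((a * x + b) - y) * x for x, y in zip(xs, ys))
    db = sum(2 * ((a * x + b) - y) for x, y in zip(xs, ys))
    a, b = a - lr * da, b - lr * db

# Closed-form least squares for comparison.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a_star = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b_star = my - a_star * mx
assert abs(a - a_star) < 1e-6 and abs(b - b_star) < 1e-6
```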


Perfect analogy! The cool part is that your model also gives good intuition about the gradient descent part. The springs' forces are the gradients, and the act of the line "snapping" into place is the gradient descent process.

Technically, physical springs will also have momentum and overshoot/oscillate. But even this is something that is used in practice: gradient descent with momentum.


Maybe I'm missing something, but why do people expect PoW to be effective against companies whose whole existence revolves around acquiring more compute?


I was under the impression that the bad crawlers exist because it's cheaper to reload the data all the time than to cache it somewhere. If this changes the cost balance, those companies might decide to download only once instead of over and over again, which would probably be satisfactory to everyone.


I've been beating the drum about this to everyone who will listen lately, but I'll beat it here too! Why don't we use seL4 for everything? People are talking about moving to a smart grid, having IoT devices everywhere, putting chips inside of peoples' brains (!!!), cars connect to the internet, etc.

Anyway, it's insane that we have a mathematically proven secure kernel and we don't use it! Surely there's a startup in this somewhere..


Rewriting all software would cost infinite money.


New smart grids with new software do not require a rewrite!


Almost all vulnerabilities are in apps and libraries which seL4 does little or nothing to solve. The only solution is secure coding across the entire stack which will reveal that much of the existing code is so low-quality that it just has to be thrown away and rewritten.


They will!

