
You can put the emphasis on the high-order bits, but that makes decoding more complex. With little endian, the decoder builds low to high, which is MUCH easier to deal with, especially on spillover.

For example, with ULEB128 [1], you just read 7 bits at a time, going higher and higher up the value you're reconstituting. If the value grows too big and you need to spill over into the next one (such as with big integer implementations), you just fill the last bits of the old value, then put the remaining bits in the next value and continue on.
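
For instance, here's a minimal ULEB128 decoder sketch in Python (the function name and error handling are mine, not from any particular library):

  def decode_uleb128(data):
      # Unsigned LEB128: each byte carries 7 payload bits, low groups first;
      # a set high bit means "more bytes follow".
      result = 0
      shift = 0
      for byte in data:
          result |= (byte & 0x7F) << shift  # new bits slot in above the old ones
          shift += 7
          if not (byte & 0x80):             # high bit clear: this was the last byte
              return result
      raise ValueError("truncated ULEB128 value")

  decode_uleb128(bytes([0xE5, 0x8E, 0x26]))  # == 624485

Note that previously decoded bits are never touched again; each new byte only ORs in above them.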

With a big endian encoding method (e.g. the VLQ used in the MIDI format), you start from the high bits and work your way down, which is fine until your value spills over. Because you only have the high bits decoded at the time of the spillover, you now have to keep shifting bits along each of your already decoded big integer portions until you finally decode the lowest bit. This of course gets progressively slower as the bits and your big integer portions pile up.
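
For contrast, a big-endian VLQ decoder sketch. Python's unbounded ints hide the cost, but the shift on every byte is exactly the operation that would ripple through all the limbs of a fixed-width big-integer representation:

  def decode_vlq(data):
      # Big-endian VLQ (MIDI-style): high groups arrive first, so every new
      # byte forces a shift of everything decoded so far.
      result = 0
      for byte in data:
          result = (result << 7) | (byte & 0x7F)  # reshifts ALL prior bits
          if not (byte & 0x80):
              return result
      raise ValueError("truncated VLQ value")

  decode_vlq(bytes([0x81, 0x00]))  # == 128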

Encoding is easier too, since you don't need to check whether, for example, a uint64 value can be encoded in 1, 2, 3, 4, 5, 6, 7 or 8 bytes. Just encode the low 8 bits, shift the source right by 8, and repeat until the source value is 0. Then backtrack to the as-yet-blank encoded length field in your message and stuff in how many bytes you encoded. You just got the length calculation for free. Use a scheme where you only encode up to 60-bit values, place the length field in the low 4 bits, and Robert's your father's brother!
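
A sketch of that encoder, assuming a made-up layout with a one-byte length prefix (the 60-bit variant described above would instead pack the count into the low 4 bits of the value itself):

  def encode_le(value):
      # Hypothetical layout: one length byte, then the value little-endian.
      out = bytearray([0])          # placeholder for the length field
      while value:
          out.append(value & 0xFF)  # emit the low 8 bits
          value >>= 8               # and move on up
      out[0] = len(out) - 1         # backfill: the byte count came for free
      return bytes(out)

  encode_le(8)       # b'\x01\x08'
  encode_le(0x1234)  # b'\x02\x34\x12'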

For data that is right-heavy (i.e. the fully formed data always has real data on the right side and blank filler on the left, such as the uint32 value 8, which is actually 0x00000008), you want a little endian scheme. For data that is left-heavy, you want a big endian scheme. Since most of the data we deal with is right-heavy, little endian is the way to go.

You can see how this has influenced my encoding design in [2] [3] [4].

[1] https://en.wikipedia.org/wiki/LEB128

[2] https://github.com/kstenerud/concise-encoding/blob/master/cb...

[3] https://github.com/kstenerud/compact-float/blob/master/compa...

[4] https://github.com/kstenerud/compact-time/blob/master/compac...


> Resource bounded languages such as Hume, which allow better analysis and implementation techniques by restricting how expressive the language is.

Hume seems to use a finite state machine and recursive functions as its "substrate": https://www.macs.hw.ac.uk/~greg/hume/

There's a more intuitive way: in a virtual machine interpreter loop, increment a counter (commonly called "gas") at every instruction step (or each block of instructions), and halt execution once it exceeds a budget. Different instructions can have different costs.

The Ethereum Virtual Machine (EVM) uses this for example: https://blockgeeks.com/guides/ethereum-gas/

Stackless Python as well: https://stackless.readthedocs.io/en/latest/library/stackless...

This can even work recursively: https://esolangs.org/wiki/RarVM
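
A toy sketch of such a gas-metered interpreter loop (opcodes and costs invented for illustration; real VMs like the EVM are of course far more involved):

  class OutOfGas(Exception):
      pass

  COSTS = {"push": 1, "add": 1, "mul": 3}  # per-instruction prices (made up)

  def run(program, gas_limit):
      gas_used = 0
      stack = []
      for op, arg in program:
          gas_used += COSTS[op]  # meter before executing
          if gas_used > gas_limit:
              raise OutOfGas(f"budget of {gas_limit} exceeded at {op!r}")
          if op == "push":
              stack.append(arg)
          elif op == "add":
              stack.append(stack.pop() + stack.pop())
          elif op == "mul":
              stack.append(stack.pop() * stack.pop())
      return stack

  run([("push", 2), ("push", 3), ("mul", None)], gas_limit=10)  # [6]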


There are quite a few services that should be able to solve this problem (turning a PDF into a web form and collecting signatures). Here are a few of the services I'm aware of:

* https://www.hellosign.com/products/helloworks

* https://www.useanvil.com

* https://www.pandadoc.com

* https://www.pdffiller.com

* https://www.platoforms.com

* JotForm (https://www.jotform.com/help/433-How-to-Add-an-E-Signature-t...)

* https://www.webmerge.me

(I know about all these because I'm working on a PDF generation service for developers called DocSpring [1]. I'm also working on e-signature support [2], but that's still under development, and still won't be a perfect fit for your use-case.)

[1] https://docspring.com

[2] https://docspring.com/docs/data_requests.html


I'm not a Google fan by any means, but you might explore the Content Safety API [1] they're offering to partners. Thorn might also offer something similar [2].

[1] https://www.blog.google/around-the-globe/google-europe/using...

[2] https://www.thorn.org/


For anyone more deeply interested in this topic, I recommend reading this blog post from Archagon. It describes the different alternatives (OT, CmRDT, CvRDT, diff sync) for writing a collaborative editor. And unlike academic papers, it is written the way a programmer actually does research and thinks about a problem in real life, so it's very natural to follow, even if it's long and complex.

http://archagon.net/blog/2018/03/24/data-laced-with-history/

(I am not affiliated in any way, just enjoyed it very much)


Several interesting open source projects are working with CRDTs to make state synchronization in distributed systems an easier problem to deal with:

  - Braid HTTP (https://braid.news/)
  - Automerge (https://github.com/automerge/automerge)
  - Gun (https://gun.eco/)
  - Yjs (http://y-js.org/)
  - Noms (https://github.com/attic-labs/noms)
  - DAT (https://dat.foundation/)
Personally, I'm most excited for Braid's effort to bring state sync to HTTP through the IETF process, as well as Automerge's progress in P2P via Hypermerge (and its star app, Pushpin). I'd also like to see Gun succeed, but have had a hard time getting started due to visually distracting quirks in its documentation.
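
For anyone wondering what these projects share under the hood, here's the "hello world" of state-based CRDTs, a grow-only counter; this is my own minimal sketch, not code from any of the libraries above:

  class GCounter:
      # Each replica only increments its own slot; merging takes per-node
      # maxima, so merge is commutative, associative and idempotent --
      # replicas converge no matter the order or duplication of syncs.
      def __init__(self, node_id):
          self.node_id = node_id
          self.counts = {}  # node_id -> count

      def increment(self):
          self.counts[self.node_id] = self.counts.get(self.node_id, 0) + 1

      def merge(self, other):
          for node, n in other.counts.items():
              self.counts[node] = max(self.counts.get(node, 0), n)

      def value(self):
          return sum(self.counts.values())

  a, b = GCounter("a"), GCounter("b")
  a.increment(); b.increment(); b.increment()
  a.merge(b); a.value()  # 3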


> backspace on a family emoji will eliminate family members one by one.

Unexpectedly sinister.


A codepoint is the smallest unit of meaning in unicode. A byte is just a number that might (or might not) have meaning in a specific unicode encoding (also depending on what other bytes it's next to).

A codepoint is the smallest unit that has a graphical representation you can print on screen.

A codepoint is the smallest unit that allows APIs that are agnostic to encoding, just in terms of the semantic content of unicode. If you want to write any kind of algorithm in terms of the actual character meaning (characters represented), you want a codepoint abstraction. Most unicode algorithms -- like for collation, normalization, regexp character classes -- are in terms of codepoints.

If you split a unicode string on codepoints, the results are always valid unicode strings. If you split a unicode string on bytes, they may not be.
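
A quick illustration in Python, where str is a sequence of codepoints:

  s = "héllo"
  s[:2]                   # 'hé' -- any codepoint split is a valid string
  s.encode("utf-8")[:2]   # b'h\xc3' -- cuts the two-byte 'é' in half...
  s.encode("utf-8")[:2].decode("utf-8")  # ...so this raises UnicodeDecodeError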

Human written language is complicated. Unicode actually does a pretty amazing job of providing an abstraction for dealing with it, but it's still complicated. It's true that it would be a (common) misconception to think that a codepoint always represents "one block on the screen", a "user-perceived character" (a "grapheme cluster"). If you start really getting into it, you realize "a user-perceived character" is a more complex concept than you thought/would like -- not because of unicode, but because of the complexities of global written human language and what software wants to do with it. But most people who have tried writing internationalized text manipulation of any kind with an API that is only in terms of bytes will know that a codepoint abstraction is definitely superior.

If you do need "user-perceived characters" aka "grapheme clusters" -- unicode has an algorithm for that, based on data for each codepoint in the unicode database (https://unicode.org/reports/tr29/). It can be locale-dependent (whereas codepoints are locale-independent). And guess what, the algorithm is in terms of codepoints -- if you wanted to implement the algorithm, you would usually want an API based on a codepoint abstraction to start with.
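
For example, Python's stdlib normalization works purely at the codepoint level:

  import unicodedata

  s = "e\u0301"                    # 'é' as 'e' + combining acute accent
  len(s)                           # 2 codepoints
  len(s.encode("utf-8"))           # 3 bytes
  unicodedata.normalize("NFC", s)  # 'é' as the single codepoint U+00E9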

The "grapheme cluster" abstraction is necessarily more expensive to deal with than the "codepoint" abstraction (which is itself necessarily more expensive than "bytes") -- "codepoint" is quite often the right balance. I suppose if computers were or got another couple of magnitudes faster, we might all want/demand more widespread implementation of "grapheme cluster" as the abstraction for many more things -- but it'd still be described and usually implemented in terms of the "codepoint" abstraction, and you'd still need the codepoint abstraction for many things, such as normalization. But yes, it would be nice if more platforms/libraries provided "grapheme cluster" abstraction too. But it turns out you can mostly get by with "codepoint". You can't really even get by with just bytes if you want to do any kind of text manipulation or analysis (such as regexp). And codepoint is the abstraction on which "grapheme cluster" is built, it's the lower level and simpler abstraction, so is the first step -- and some platforms have only barely gotten there. A "grapheme cluster" is made up of codepoints.

I suppose one could imagine some system that isn't unicode that doesn't use a "codepoint" abstraction but somehow only had "user-perceived characters"... but it would get pretty crazy, for a variety of reasons, including but not limited to the fact that "user-perceived character" can be locale-dependent. "codepoint" is a very good and useful abstraction, and is the primary building block of unicode, so it makes sense that unicode-aware platform APIs also use it as a fundamental unit. A codepoint is the unit on which you can look up metadata in the unicode database, for normalization, upper/lowercasing, character classes for regexps, collation (sort order), etc. Unicode is designed to let you do an awful lot with codepoints, in as performant a manner as unicode could figure out.
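
To see the codepoint / grapheme-cluster gap directly, the third-party regex module implements the TR29 segmentation as \X (the stdlib re module does not):

  import regex  # third-party; supports \X for extended grapheme clusters

  family = "\U0001F469\u200D\U0001F469\u200D\U0001F467"  # woman+ZWJ+woman+ZWJ+girl
  len(family)                   # 5 codepoints
  regex.findall(r"\X", family)  # one grapheme cluster: the family emoji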


In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

"What are you doing?", asked Minsky.

"I am training a randomly wired neural net to play Tic-tac-toe", Sussman replied.

"Why is the net wired randomly?", asked Minsky.

"I do not want it to have any preconceptions of how to play", Sussman said.

Minsky then shut his eyes.

"Why do you close your eyes?" Sussman asked his teacher.

"So that the room will be empty."

At that moment, Sussman was enlightened.

RIP.


Good points, but it's amusing that his solution to #1 didn't lock down the patch version or the distro around it. I think that also makes a decent case for Nix[0], which solves #1-#3 by default (since choosing a particular version of Nixpkgs locks down the whole environment, and Nix treats the build as a DAG of dependencies rather than a linear history). It also supports exporting Docker images, while preserving Nix's richer build caching.[1]

[0]: https://nixos.org/nix/

[1]: https://grahamc.com/blog/nix-and-layered-docker-images


> Democracy is the only system that has ever served any kind of majority of the population.

I disagree. Sortition / demarchy, which is the selection of officials at random from the population (much like a grand jury), has been used throughout history, and is a strong contender. Its proponents tend to cite diversity of representation, anti-corruption, and the minimisation of factionalism as advantages.


Suz Hinton (https://www.twitch.tv/noopkat) has a great article about her Twitch live coding setup: https://medium.com/@suzhinton/my-twitch-live-coding-setup-b2...

Also, check out the programming streamers on belly.io! Many of us are happy to answer questions and help out new streamers. :)


Not really a guide. Just a few bullet points.

Start messing around in OBS (https://obsproject.com) and get as comfortable as you can using it. You can compose scenes, transition between them, set up your audio and video encoding, and preview everything without streaming. You can make local recordings to test things like volume levels and audio sync. It's fantastic software, and it's how you will operate your stream.

For programming, things are as simple as they come: Have a main scene that does display capture and perhaps overlay a camera. You can add more bells and whistles if you want; the tools are pretty intuitive. I recommend also making scenes for "Starting Soon", "BRB" and "Stream Over". Having a browser in guest/incognito mode is a good idea. Be mindful of stuff like API keys, secrets, passwords and personal information.

Once you are ready to make the leap, you can link your Twitch account to OBS, and when you press "Start Streaming" you'll be live on your channel shortly.

Before you do though, you'll want to spend a little time in your Twitch dashboard to set up stuff like titles, categories, tags etc.

There is a lot more to it that you'll figure out along the way. Live streaming is an iterative process and a skill / hobby that you perfect over time. Have fun!

