
Question is, can I use my spare APU for this? On second thought, giving it more VRAM in the BIOS only means reserving more system RAM for it. So it would just be a RAM disk with extra steps.

Nevertheless, neat!


My first time in the UK shook my world: Where is the bread? This is it? That is not bread! After that visit I realized that Germany has a rich baking culture. And yes, we do love a hard crust once in a while. So please, leave the Knäusele for me if you don't like it.


First thing we did after moving to the UK was to buy a bread baking machine. Not quite bakery-level bread, but good enough for daily consumption, and it leaves a nice smell in the house.


Same here, when I went to university in England, after growing up in Germany, I was shocked to find that all the bread I was able to find was crap.


I'm curious: Why are you running a fake HTTP server? To slow down bots for a while, or does it serve a higher purpose?


Slow down bots, and to see what garbage they're sending. I keep track of POST data and HTTP headers.

I'm on CenturyLink fiber, they are apparently so incompetent they can't take my money for a static IP address, so whoever is doing HTTP requests is blindly trolling CenturyLink residential IPv4 blocks. Nobody doing this is up to anything legit.
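
For anyone wondering what the logging half of such a setup could look like, here is a minimal sketch using only Python's standard library. It is not the actual setup described above: the port, the `MAX_BODY` cap, and the names `describe_request`, `LoggingHandler`, and `serve` are all made up for illustration, and slowing bots down is not shown.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MAX_BODY = 64 * 1024  # assumed cap so a huge POST cannot flood the log


def describe_request(method, path, headers, body):
    """Turn one request into a single JSON log line."""
    return json.dumps({
        "method": method,
        "path": path,
        # dict() keeps only the last value of a repeated header
        "headers": dict(headers),
        # latin-1 maps every byte to a character, so garbage never raises
        "body": body.decode("latin-1"),
    })


class LoggingHandler(BaseHTTPRequestHandler):
    def _log_and_reply(self):
        try:
            length = int(self.headers.get("Content-Length", 0))
        except ValueError:
            length = 0  # a malformed Content-Length is treated as "no body"
        body = self.rfile.read(min(max(length, 0), MAX_BODY))
        print(describe_request(self.command, self.path, self.headers.items(), body))
        reply = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    # Same treatment for GET and POST
    do_GET = do_POST = _log_and_reply


def serve(port=8080):
    """Blocks forever; call it explicitly to start listening."""
    HTTPServer(("", port), LoggingHandler).serve_forever()
```

Calling `serve()` starts the listener; every GET or POST then shows up as one JSON line on stdout.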


I can agree on the fundamentals, even though I took OOP and CS in college, but that was a long time ago and not my main field of study. I find it hard to apply to JS though, as you said, it is not easy to comprehend.

Are you suggesting a different technology stack for the fundamentals? Others seem to go the same way, but they are suggesting it within the ecosystem of JS, just not with the overhead of the new technologies.


> Are you suggesting a different technology stack for the fundamentals?

It's not that important in general. But I assumed that if you're already overwhelmed by JS infra, you might get a smoother initial experience with a less messy, yet still interpreted, dynamic language like Ruby.


It does and it sets a good perspective on the problem at hand. Thank you.


You are right, but my colleague and I are short on time unfortunately, because there is so much work to do. I already asked him if he could include me in his next project, just to watch what he is doing from the boilerplate up. Maybe this clears up some of the question marks. And thank you for the resources. I already use regexr.com for visualisation, but this seems better for learning.


Thanks, I just needed the hint to take a step back and reevaluate.

Having a wiki is a good tip. I have already started to maintain a small database with snippets I use often. But it's just that. My next focus will be a project that does not have too many features. Baby steps it is.


You can do it. It's normal to feel overwhelmed. I was in that position once but I really had no choice, I had to press forward or starve, so I eventually made it through.


Thanks for the kind words. Slowly going into the water is my approach as well, but sometimes it just gets to me. My learning projects die on the hill, because of the frustration I have on the job with these techniques. Plus, overwhelming ecosystem.

About my regex problem: This is a structural mess. JSON/XML with HTML code in the data fields. We process them and send them to multiple job boards. Our clients mainly use HRM software or some CMS, some of which are only able to spit out whatever HTML is displayed on their career sites. This code often does not even have classes or IDs. Most of the time we are cobbling together whatever is between two headlines, praying those won't change. But they do, because the recruiters put fields where they do not belong. I call myself a code cleaner, not a web dev, nowadays. We are not able to use APIs, because the receiving job boards either don't offer one, the client doesn't, or it's just not worth it financially.

I will take a step back and reevaluate my situation.


I spent a decade parsing text-with-angle-brackets with regexes, and it sucks. It’s always tempting to try an html parser but if the code is written by a human (or worse, a mixture of human and machine, especially if the machine involves MS Word) it just doesn’t work.

I’d suggest rather than attempting to do big regexes that capture a bunch of stuff in one call, break it down to a bunch of smaller, more targeted calls - one call to capture the text of the whole record, another with 3 variants to get the title, another with 2 variants to pick up a tag line, etc.
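
A rough sketch of that idea in Python, in case it helps. Everything about the input is an assumption made up for illustration: records separated by `<hr>`, titles in `<h1>`, `<h2>`, or `<strong>`, tag lines in `<em>` or `<i>`. The point is only the shape: one call to isolate a record, then a short list of small variants per field, tried in order.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL

# Call 1: isolate whole records (assumed here to be separated by <hr>).
RECORD_SPLIT = re.compile(r"<hr\s*/?>", FLAGS)

# Call 2: three small variants for the title, most specific first.
TITLE_VARIANTS = [
    re.compile(r"<h1\b[^>]*>(.*?)</h1>", FLAGS),
    re.compile(r"<h2\b[^>]*>(.*?)</h2>", FLAGS),
    re.compile(r"<strong\b[^>]*>(.*?)</strong>", FLAGS),
]

# Call 3: two small variants for the tag line.
TAGLINE_VARIANTS = [
    re.compile(r"<em\b[^>]*>(.*?)</em>", FLAGS),
    re.compile(r"<i\b[^>]*>(.*?)</i>", FLAGS),
]


def first_match(variants, text):
    """Return group 1 of the first variant that matches, else None."""
    for pattern in variants:
        match = pattern.search(text)
        if match:
            return match.group(1).strip()
    return None


def parse_records(text):
    """Split into records, then extract each field with its own variants."""
    records = []
    for chunk in RECORD_SPLIT.split(text):
        if not chunk.strip():
            continue
        records.append({
            "title": first_match(TITLE_VARIANTS, chunk),
            "tagline": first_match(TAGLINE_VARIANTS, chunk),
        })
    return records


SAMPLE = (
    "<h1>Baker (m/f/d)</h1><p><em>Fresh bread daily</em></p><p>Tasks ...</p>"
    "<hr>"
    "<p><strong>Web Developer</strong></p><p><i>Remote first</i></p>"
)
```

When a source changes its markup, only one short variant list needs a new entry, and a field that comes back as `None` tells you which one.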


Essentially, this is what I do. First matching with a broader regex ruleset, then working down to the next one, and so on. But with more complexity of code comes more breakage down the line. I went into full maze mode yesterday and questioned everything after that, so this is what my sanity looked like this morning.

Regex isn't really the problem though (even though it technically should not be the solution in this case either, but I cannot dictate the tech stack). It was just the last straw in my frustration with the situation and with myself for not being able to do what my colleague does, even though I want to. I felt the need for help, and I got it. Awesome community around here.


Thank you for the context! What you're doing is actually much harder than regular web dev. It's a specialized kind of data processing, often called an "extract, transform, load" (ETL) workflow.

Most web devs don't need to do that, and that you're willing to tackle it at all just shows how willing to learn you are, despite the frustration.

If you hate this situation, it's totally understandable lol. That kind of work has all the tedium of dealing with someone else's arcane data format, and none of the joy of seeing your creativity come to life. Some people love that sort of work, and specialize in it, becoming backend people or DB engineers or data scientists or the such, but it's not usually what web devs are known for (who tend to focus instead on UIs and some level of design and interactive stateful apps). Nothing wrong if ETL just isn't your cup of tea. I'd go crazy if I had to do that often, too.

Anyhow, if I'm understanding you right, you have HTML embedded in JSON and/or XML. Do you know what "escaping" is in the text embedding sense? Like if you have quotes inside quotes, or tag brackets inside tags, how to separate each layer of embedding? If your JSON and XML files are cleanly escaped, you should be able to (as a first step) just iterate through the files and get the HTML parts out (without regex).

Like if the HTML is just a data string inside JSON, you can transform the JSON into an array of HTML strings using array.map() or Object.values(obj).map().
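
Here is the same step as a small Python sketch (the JS version is the map() call above). The feed layout and the field names `jobs` and `description` are hypothetical. Note that the JSON parser undoes the escaping (`\"` and `\/`) on its own, so no regex is involved.

```python
import json

# Hypothetical feed; the escaped quotes and slashes are what a JSON
# exporter typically produces for HTML inside a string field.
RAW = r'{"jobs": [{"title": "Baker", "description": "<p class=\"intro\">Bake bread<\/p>"}, {"title": "Dev", "description": "<ul><li>Write code<\/li><\/ul>"}]}'


def html_fields(json_text, field="description"):
    """Parse the JSON and return the HTML string of every job that has the field."""
    data = json.loads(json_text)
    return [job[field] for job in data["jobs"] if field in job]
```

`html_fields(RAW)` gives back a plain list of unescaped HTML strings, ready for the next step.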

In the XML, if the HTML is stored in CDATA fields, you can access it using an "XPath" selector... you know how CSS has selectors that let you say headings should be styled one way, paragraphs another? XML has its own selector language that lets you directly target a certain node inside the document, without using regex, by specifying the hierarchical path that takes you there (like a CDATA inside a description inside a job inside a company, or whatever). Although there is a learning curve to XPath, it is much more suited to the task than regex, because the regex can't easily account for the complexity within XML (especially when there's nested layers).
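
To make that concrete, a minimal sketch with Python's built-in `xml.etree.ElementTree`, which supports a limited subset of XPath. The document structure (feed, company, job, description) is invented to match the example path above.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout; the HTML lives in CDATA inside <description>.
XML = """<feed>
  <company name="Example GmbH">
    <job id="1">
      <title>Baker</title>
      <description><![CDATA[<p>Bake <b>bread</b></p>]]></description>
    </job>
    <job id="2">
      <title>Dev</title>
      <description><![CDATA[<ul><li>Write code</li></ul>]]></description>
    </job>
  </company>
</feed>"""

root = ET.fromstring(XML)

# Path from the root: every description inside a job inside a company.
all_html = [d.text for d in root.findall("./company/job/description")]

# Same path, narrowed to one job with an attribute predicate.
job2_html = root.find("./company/job[@id='2']/description").text
```

The parser hands the CDATA content back as ordinary text, so `all_html` is already a list of HTML strings. Fuller XPath (functions, axes) needs a third-party library; the standard library covers simple paths like these.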

It would help if you can post some example snippets, but that might be better suited for Stack than HN (though feel free to link to it here).

Once you have the HTML out, then you can run it through a sanitizer -- that's an optional step, but would let you strip out unnecessary divs, old font tags, whatever, keeping only basic formatting (headers, paragraphs, links, bold, etc.) which should be much cleaner to hand off to your clients. That would be much easier to embed on someone else's site vs a scraped page with all the HTML mess from someone else's framework.
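
To show what a sanitizer does, here is a toy allowlist version built on Python's standard `html.parser`. The tag list, the allowed link prefixes, and the names `AllowlistSanitizer` and `sanitize` are my assumptions, and this is a sketch of the idea only, not something to rely on for security; for real use, pick a maintained sanitizer library.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist: basic formatting only.
ALLOWED_TAGS = {"h1", "h2", "h3", "p", "a", "b", "strong", "i", "em",
                "ul", "ol", "li", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_LINK_PREFIXES = ("http://", "https://", "mailto:")
DROP_CONTENT = {"script", "style"}  # drop these tags and their text


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # unknown tag is dropped, its text content is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()) or value is None:
                continue
            if name == "href" and not value.lower().startswith(SAFE_LINK_PREFIXES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Wrapper divs, font tags, and event-handler attributes disappear, while paragraphs, bold text, and plain links survive.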

I know there is a lot of complexity in each of those steps, but there are great tools and documentation for each step of the way. That's just to get you started.

At the end of the day what you're doing isn't really a Javascript issue at all, it's just a different kind of work that Javascript happens to be able to handle if you really need it to (but so can Python or Java or specialized command line tools like jq). It's a different body of work, which is why your casual web dev skills aren't providing easy answers. It's OK! You can learn it once and make it work (and then decide never to do that again, like I did lol). Or switch tracks, totally up to you :)

But feel free to ask here or on Stack if you have followups!


You are much appreciated. I didn't even know there is a term for this part of my work.

Down the line, we do everything you cautiously described. We extract single fields with pointers (for lack of a better term, English is not my main language) to the XML/JSON fields we want to extract. Our software then lets us use JS snippets to manipulate the contents. Problem is, once you define a rule, it may get 80-90% right across hundreds of datasets. But breakage is not an option most of the time. It's Pareto principle work: 80% of the work in 20% of the time, the remaining 20% in 80% of the time. In the end, they are just snippets, then a giant gap, then the projects my colleague does.

I get where you are coming from, regarding "never to do that again". This is not the only work I do. I also build HTML from customer demands, many of which are PDFs meant for print use, but not for the web. I like it, but I only scratch the surface of what might be. Thanks to the resources in this thread, I have a good insight into what is to come. So, thanks again.


Thanks for the info. Are Storage Boxes fully enabled Nextcloud instances with the known web UI?


The storage shares are Nextcloud instances. The storage boxes, on the other hand, can only be accessed via SCP, FTP, SFTP, rsync, borg, WebDAV and SMB.


We wrote a quick comparison/contrast of the two products here: https://docs.hetzner.com/robot/storage-box/faq/storage-box-v... --Katie


While I see your points, your statements are - as you said yourself - a good example of negativity. Covid was not that hard on me personally. I got to spend more time with my wife and less with colleagues at work while working remote. My commute dropped to zero, and I had much more free time on my hands. I got to work out indoors and outdoors whenever possible. Dropped 60 pounds from March 2020, after entering the first lockdown. I even got closer to my family during these times. And I could finally tackle some of the projects I had left for dead.

I cannot claim to be the epitome of positivity though. I have to work on my views of people in general. Anti-Vaxxers and climate change have left me with deep emotional cuts, consider it weltschmerz.

Are you by any chance driven by outside 'pings'? Extroverted people who rely on direct feedback from others really had/have a bad time. Maybe this realisation, or thinking about it, may help you understand why you are feeling this way. In the end, every negative situation, e.g. a global pandemic, needs some form of personal resilience. How to get there is unfortunately on you to figure out.

