Show HN: SEO for JavaScript, HTML5 and Single Page Applications (snapsearch.io)
50 points by CMCDragonkai on May 19, 2014 | 56 comments


Am I crazy for thinking that SEO is a huge reason to NOT do a Single Page App for a public facing website that you want Google to index?

It seems like the wrong business choice from a marketing perspective that we are now looking for a tech solution to solve a problem we created ourselves. Single Page Apps feel like building Flash web apps 5 years ago. It's a valid choice, but with clear drawbacks.


You are not crazy. The single page app trend is solidly in the "peak of inflated expectations" phase of the hype cycle ( http://en.wikipedia.org/wiki/Hype_cycle ), which is why there exists a subculture of developers who are trying to use the approach for everything, even where it isn't appropriate.


I think the single page app trend is coinciding with the growth of JavaScript and HTML5. It shows that the websites of the future will be more and more like web applications, and they need dynamic UI. The thing that's stopping us is the SEO problem, which I hope is no longer a problem with SnapSearch.


> the websites of the future will be more and more like web applications, and they need dynamic UI. The thing that's stopping us is the SEO problem, which I hope is no longer a problem with SnapSearch.

This juxtaposition doesn't make sense. If the "dynamic UI" can't be done in a progressively enhanced way, why can it work in a gracefully degrading post-hoc way?

The main point of your software is that the output proves that progressive enhancement was the correct way to build the website in the first place.


I don't expect QuickBooks or Photoshop to be SEO friendly... anyone who expects a web-based app to be is doing it wrong. Use hashbang URLs or working pushState URLs if you want SEO. Also, Google does run JS against your rendered content, just not as often.
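
For the pushState route, the enhancement is only a few lines on top of ordinary anchor links. A rough sketch, where loadContent and the data-internal attribute are made-up names for your app's own render step:

    // Intercept clicks on ordinary <a href="/..."> links and swap content in place,
    // while keeping real, crawlable URLs in the address bar.
    document.addEventListener('click', function (e) {
      var link = e.target.closest('a[data-internal]');
      if (!link) return;
      e.preventDefault();
      history.pushState({}, '', link.getAttribute('href'));
      loadContent(link.getAttribute('href')); // made-up, app-specific render function
    });

    window.addEventListener('popstate', function () {
      loadContent(location.pathname); // keep the back/forward buttons working
    });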


Why not just render the public pages of your app out to HTML, or use one of the PhantomJS services to do it? Then when the spiders come up the waterspout, have your web server tell them where to go for your site's links.

If you don't want to do it yourself, there are quite a few companies that can help: http://scotch.io/tutorials/javascript/angularjs-seo-with-pre...
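
For example, the "tell them where to go" part can be a tiny middleware in front of your app. A rough Express-style sketch, where the snapshots directory and the bot list are placeholders:

    var express = require('express');
    var path = require('path');
    var app = express();

    var BOT_UA = /googlebot|bingbot|facebookexternalhit|twitterbot/i;

    app.use(function (req, res, next) {
      if (BOT_UA.test(req.headers['user-agent'] || '')) {
        // Serve a pre-rendered static snapshot to crawlers...
        return res.sendFile(path.join(__dirname, 'snapshots', req.path, 'index.html'));
      }
      next(); // ...and the normal JS-driven app to everyone else.
    });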


There are some important technical and operational differences between SnapSearch and using PhantomJS. I answered this in a comment below: https://news.ycombinator.com/item?id=7765731


Agreed, it's maybe not the best solution for building a public website, but I think it has its place for internal and non-exposed apps (CMS, admin interfaces, dashboards, etc.) that the public viewer would normally never see.


I suppose React.js and similar approaches would make it easier, but I also don't immediately see the use case for building single-page apps that need indexation.


Well, SnapSearch allows you to make dynamic web applications without worrying about SEO problems. It's not so much a technical solution to a technical problem that we created ourselves; it's that search engines and social network robots have not caught up to next-generation websites (which I believe will be a fully dynamic user experience).

We're just pushing the web ahead.


If you're building JavaScript-heavy/single-page applications, you should be applying the principle of progressive enhancement anyway; there is no reason not to do so.

Progressive enhancement brings a number of benefits, of which friendliness to search engines is just one. User has JS turned off? Page still works. Request to CDN fails? Page still works. Silly lint error slips through to production, breaking all your JS? Page still works. Weird cross-browser issue breaks your JS in some clients? Page still works.

The most frequent arguments against progressive enhancement, which have been prevalent amongst a number of back-end and product people I've worked with, are that it is a waste of time and that it adds too much extra effort. I disagree with both. It is not always necessary to reach full feature parity between your JS and non-JS implementations; for instance, an auto-suggest input in your JS code can happily be a vanilla input in the non-JS version. But if you do just enough to ensure that your application is functional without JavaScript enabled, my (subjective, biased) experience has been that it doesn't take too long.
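
To make the auto-suggest example concrete, the enhancement can be as small as this (a sketch only; the field name, the datalist id and fetchSuggestions are all invented). Without JS the form still submits normally; with JS you layer suggestions on top:

    var input = document.querySelector('input[name="city"]'); // assumed to exist in the server-rendered HTML
    if (input) {
      var list = document.createElement('datalist');
      list.id = 'city-suggestions';
      document.body.appendChild(list);
      input.setAttribute('list', list.id); // link the plain input to the datalist

      input.addEventListener('input', function () {
        // fetchSuggestions is a made-up helper that queries your own endpoint.
        fetchSuggestions(input.value).then(function (cities) {
          list.innerHTML = cities.map(function (c) {
            return '<option value="' + c + '"></option>';
          }).join('');
        });
      });
    }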


A web site can be Javascript-heavy[0] without being a web application[1]. There's a big difference.

If you just need auto-completion, modal dialogs and similar features then you shouldn't build a web application, and you definitely should make your site work without Javascript.

On the other hand, if you are building Google Maps, then you should build a web application. Google Maps doesn't work when you disable Javascript.

Supporting clients without Javascript in web applications often doesn't make sense. If it does, you should consider building a standard web site. Just imagine building Google Maps without Javascript. Or better yet, imagine using a version of Google Maps without Javascript. I'm sure that there are people that could pull that off and still end up with something good but it's not the case for most teams (not enough time, resources or skill) building web applications.

[0]: What's considered Javascript-heavy anyways? I think that including support for standard stuff like auto-completion, modal dialogs or even just including bootstrap's javascript these days makes your site Javascript-heavy.

[1]: By this I mean a single page web application that uses Javascript for state management/routing, rendering, communication with the backend, etc. You should be able to deploy an application like this on S3 and it should work. That's not a requirement but I'm just trying to point out that it's an application built completely with Javascript, with no backend support (except for the api that it communicates with).


> On the other hand, if you are building Google Maps, then you should build a web application. Google Maps doesn't work when you disable Javascript.

The "web application" label you use here means what? That it's okay to ignore web development best practices like progressive enhancement, because "web application"?

That Google Maps ignored progressive enhancement is not a good reason for labelling it as a web application.

Is Fix My Street (http://fixmystreet.com/) a web application? It uses a Google Maps-type interface so you can report broken street lights, fly-tipping and so on:

https://www.mysociety.org/2011/07/08/technical-fixmystreet-m...

> it's an application built completely with Javascript, with no backend support (except for the api that it communicates with).

It's a circular definition. Surely an API requires a backend? And surely that API predominantly uses HTTP?

There isn't much difference between returning structured data, and returning an HTML view of structured data. That's just a twiddling of views from a controller (assuming a well-structured codebase). Both deal with an HTTP request and emit an HTTP response, one just does an extra step of rendering HTML.
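
To illustrate the "twiddling of views" point with a hedged Express-style sketch (findArticle and the template name are invented), one controller can serve both representations through content negotiation:

    // One controller, two representations of the same structured data.
    app.get('/articles/:id', function (req, res) {
      findArticle(req.params.id, function (err, article) {
        if (err || !article) return res.status(404).send('Not found');
        res.format({
          'text/html':        function () { res.render('article', article); }, // full page for browsers/crawlers
          'application/json': function () { res.json(article); }               // structured data for Ajax callers
        });
      });
    });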


My problem with this approach is most people don't/can't pull it off properly, with the end result being mediocre/sub-standard in either case. It's like when iPhone competitors tried to compete by producing duo keyboard/touch-screen phones. They just ended up losing both ways.

Pick one and do the best design/implementation you can pull off. One design/implementation is hard enough, and unless you don't plan to do any real hustling (like getting out of the office and doing some face-to-face marketing and lead generation), SEO shouldn't be that high on your list anyway. It's good if SEO works for you, but it's not the end of the world if it doesn't.


I agree, the goal of Snapsearch is to allow the possibility of going full steam ahead with heavy javascript, and not having to worry about trying to make it all compatible with non-js clients like search engines. It's all about reducing the workload in making single page apps.


> the goal of Snapsearch is to allow the possibility of going full steam ahead with heavy javascript, and not having to worry about trying to make it all compatible with non-js clients like search engines.

The people who can't build websites in a progressively enhanced way are the same ones who also can't get other fundamental concepts working properly, like:

* using HTTP URLs in links (or even using anchor elements as links)

* providing text equivalents for non-text content

* not relying on colour alone to convey content

* using appropriate semantic structure to mark up content.

There's not much that can be done to save a developer at that point. The best your tool can do is expose more inferior/low-quality content to a search engine (inaccessible in terms of Perceivability, Operability and Understandability - since your solution only deals with the Robustness principle of accessibility), after it has been safely constrained inside a JavaScript-mandated environment.

When JavaScript dependency becomes an acceptable starting point, semantic structure and good markup principles get dumped just as quickly, because they are interdependent.


Easier said than done if you're using something like Angular or Ember.


If angular or ember make it difficult to follow best practices, are they the best tool for the job?

Less trollishly, although I have no experience with Angular or Ember, I have built a progressively enhanced application with Knockout and I thought it went quite well. As with any progressively enhanced application, it requires that your back-end be set up to respond with both markup and JSON responses, but the data- attributes responsible for binding behaviour are just ignored by the browser when script is not running.
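
For illustration, the pattern looks roughly like this (the field name is invented): the server renders the real value into the markup, and Knockout only takes over when script actually runs.

    <!-- Server-rendered value is already in the markup; data-bind is inert without JS. -->
    <span data-bind="text: total">42.50</span>

    <script>
      // When script does run, Knockout re-binds the same element to live data.
      ko.applyBindings({ total: ko.observable('42.50') });
    </script>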

Perhaps my experience is too simplistic, though. Can you give me an example of the kind of things that angular and ember make difficult to progressively enhance?


Yehuda Katz, the lead on Ember.js, was categorical about the use of Ember.js when you require progressive enhancement: don't use Ember.js.

So this is just a case of using the wrong tools for the job, buying the hype without validating the requirements.


Don't you generally use those for web apps where SEO and search engine indexation is less important? I'm not very experienced in this area, so I find it hard to think of an example where you need indexation/SEO-optimization for the kinds of things you do with Angular or Ember.

I'm very curious to hear of some examples.


Content sites should be able to take advantage of JavaScript technologies more fully. It's all about user experience in the end. There are all sorts of new ways of navigating or presenting content that aren't just static document after static document. In these cases, SEO is a requirement.


Ah, good point.

And come to think of it, I'm working on a basic blog and I'm using javascript (react.js, specifically) to make the client-side experiences smoother... I had completely forgotten about that.

Thanks for the insight.


I'm finding a handy benefit of progressive enhancement is that it forces two distinct server-side web controllers for each piece of functionality - a server-rendered representation and an Ajax representation.

That's useful in separating the HTML generation from the business logic in a way that's obviously evident. So it helps me structure the code in a good, clean way.


Am I crazy for thinking that we as web developers should NOT HAVE TO care about this? It's Google's and Bing's and Yahoo's problem to go build decent spiders that run multiple versions of the most recent web browsers inside them, and they do have the computing power that's needed.

All spiders/crawlers should see webpages just as human users see them - that's the point! They should do whatever it takes to get this working, regardless of the effort, because this is why they get paid (indirectly, through ads, but still...). And even this is not enough: it's any respectable search engine's duty to try and develop near-human-level AIs to be able to also extract meaningful information from visual structure, from pictures, and from hard-to-understand prose. And longer term, it will be their duty to develop above-human-level AIs that are even better at this than humans, so they can provide even better search results.

Since when are we, web developers, expected to do the search engines' work for them?!

I expect this is just a transition phase and things will get on the right track soon.

(Note: and no, I'm not against helping search engines. Providing semantic markup, microformats and all this is great. But there is a difference between helping them and doing the work for them, on our dime!)

EDIT+: @OP: I'm not being critical; what you and prerender.io are doing is awesome and immensely helpful for web developers! But you do realize that you're doing what search engines should be doing themselves, right? And when they wake up and realize they should get into the game, they will basically put you out of business... or buy your business, which I think is what you expect :) Anyway, good luck with it!


Hey, I understand where you're coming from. I had the same impression before I began on SnapSearch. Right now SPAs are still a minority of the websites in the world, but eventually everybody will start making web applications.

But do note, as I said in the other comments, it's not just search engines. There'll be a day when curling sites will require a JS VM. There's a big difference between a standard curl and a full-blown Chrome/Firefox-level VM. Considering the number of sites Google crawls per second, their computing costs would explode compared to what they're doing now.

And this doesn't even address all the social network bots, and pretty much all the other bots doing stuff other than search indexing.

So I think we've still got quite some time before we're redundant.

In any case, the question for you is: are you going to wait for Google to implement this before you start using single page application technologies, which can give you an edge when it comes to customer user experience?


Why would curl need a full VM? Just curl the API/service endpoints for the data you want. If you need more, that's what PhantomJS is for.
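
For reference, the PhantomJS side is only a few lines. A minimal sketch: the URL and the fixed delay are placeholders, and a real setup would wait for the app to signal that it has finished rendering rather than sleeping.

    // save as render.js and run with: phantomjs render.js
    var page = require('webpage').create();
    page.open('http://example.com/#!/about', function (status) {
      if (status !== 'success') { phantom.exit(1); }
      // crude: give the app's JS a moment to render, then dump the DOM
      setTimeout(function () {
        console.log(page.content);
        phantom.exit();
      }, 2000);
    });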


> Since when are we, web developers, expected to do the search engines' work for them?!

The Web is a trifecta of HTTP, URLs and hyperlinks. That is the fundamental baseline of what it is to be Web. That's what makes it accessible to anyone with a basic internet connection. With that universality comes an overall improvement in education, which improves the quality of life of people on this planet. Not just your little home-town, but the planet.

The Web isn't an artificial intelligence. Not even in the dark days of RDF, replete with logical reasoning engines and ontologies. That isn't going to be your saviour.

So far, there's nothing in the HTTP spec that stipulates a JavaScript-capable execution environment. The Web is built to be extensible, which is why techniques like progressive enhancement are just as fundamental. Assume the trifecta of HTTP, URLs and hyperlinks is present and build with that, then enhance with the litany of technologies and shiny toys built on top of the Web stack. Don't just assume those extra building blocks are always going to be there.

So what you are doing, by requiring JavaScript at the HTTP level, isn't Web Development. It's probably more akin to the death-throes of the last Flash developers struggling to justify their existence.

If you need your site to be indexable, searchable and universal, a JavaScript-dependent solution is sub-standard.

These "post-implementation" Javascript static site scrapers, they emulate progressive enhancement badly, because they are backwards. They are about as qualitatively as useful as building a separate accessible website to your main website, you know for them blind people with their talking browsers. It's a band-aid at the point you realised you did not spend the requisite amount of time understanding the environment in which your "website" runs.

And, please, read the full Google crawlable AJAX specification before you throw it at me. I have. Including the part that says you really should have built it properly in the first place.


We as web developers need to create websites that make money. Part of that is to create websites which are visible and indexable by search engines.

In the UK, people have been naming their businesses "AAA Whatever" to appear first in the Yellow Pages for like 20 years. I don't see how this is much different from what we are doing now.


The search engines (Google and Bing) both run JS content. Hell, Bing's IE browsers show up in Google Analytics. I worked on a site where we cleaned up our URL structure with proper redirects etc., and Bing went crazy with it and totally threw off our analytics (tons of unique clients, no retention, high drop-off).

I also know that Google loads content in a headless browser as well. If you use pushState or hashbang URLs that load properly, and have proper output, Google and Bing handle it fine.

Over 35k results for photogallery.classiccars.com in Google. (I didn't design it, just implemented it a couple of years ago.)

If you want an SPA interface, be prepared to do the work for SEO. If you can't handle that, then deal with the consequences... I don't expect actual applications to be SEO friendly.


Will Google eventually begin javascript-enabled scraping themselves?


They may one day. But it's a big step up in computational overhead from straight curling to running a JS VM for everything. Right now SPAs are still a minority on the web. But I guess the question is: will you wait for Google to support your site and lose out on the awesomeness of single page applications?


I don't know the specifics, but they already have been for years (at least to some extent). Several of my clients have found content deep inside public-facing client-side apps indexed in situations where we didn't care if it was indexed, but assumed it wouldn't/couldn't be.


Do you have any examples? Because I've been building single page apps for 2 years, and none of them are indexed. Also, indexing for search is not the only thing; sometimes clients need Google adverts, which require the Google media bot to check your site's content. One of my clients could not get any data out, and the site could not get approved by Google for the adverts. But with SnapSearch, the Google media bot was able to extract the necessary information.

Check your Google Webmaster Tools: you can force a scrape from the Google bots, and you can see that SPA sites are not discoverable.


I can't share the specific examples, unfortunately.

You're right that if you force a Googlebot request from Webmaster Tools, it's essentially just how the basic Googlebot would wget a page on your site. We know they're doing more than that though, because it would be trivial to hide cloaking and malware from Google if that were the only tool they used for scraping.

What Google actually does for crawling content seems to be a bit more nuanced. If you watch the server-side logs on a site that Google crawls often, you'll sometimes see a pattern of Googlebot crawling a page and then an immediate subsequent request from a Google IP with a Chrome UA. I'm sure that's partially just to assess malware/cloaking, but it's probably also related to developments like this one: https://twitter.com/mattcutts/status/131425949597179904

They're definitely doing more than just the ?_escaped_fragment_ rigamarole that almost no one uses.

Fair point about the AdSense crawler though. It seems to be just about as dumb as a rock in my experience.


I feel like applications and tooling to pre-render your HTML (e.g. React.js) are more interesting. It's definitely more work currently (when done right), but it seems like the most natural flow. Always render HTML, let the client side link it up, and everybody wins.
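
As a rough sketch of that flow with React on the server: the route, component and bundle path are invented, the app object is assumed to be an Express app, and the exact render function name depends on your React version.

    var React = require('react');
    var ReactDOMServer = require('react-dom/server');
    var ArticlePage = require('./components/ArticlePage'); // hypothetical component

    app.get('/articles/:id', function (req, res) {
      // Render the component to plain HTML on the server...
      var html = ReactDOMServer.renderToString(
        React.createElement(ArticlePage, { id: req.params.id })
      );
      // ...ship real HTML, then let the client-side bundle take over the same markup.
      res.send('<!doctype html><div id="app">' + html + '</div>' +
               '<script src="/bundle.js"></script>');
    });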


I think this should be more of a problem for Google than for web developers. Web developers have the right to choose new techniques they like; it's Google's responsibility to scrape your content well.

But now things are reversed. It's kind of a monopoly, isn't it?


Developers have always been free to choose techniques. Search engines owe you, the developer, nothing. If Google et al can't scrape your content, they'll scrape a competitor who better serves parse-able content. If you want to play ball with a particular search engine, you have to serve content that they can parse at the time.


But it's not just Google now. It's anybody who does any kind of crawling on the web, which includes social media robots (that acquire images from the link), personal web crawlers, private internal search engines, data mining, even advertisement bots that check content. Would they all have to embed and maintain/update a JS VM?


When you scrape the HTML from a single page app (let's say the default app homepage), do you re-write the JavaScript links to be HTML links with unique URLs?

If not, how does Google find anything more than the app homepage?


The links on your homepage should already be "<a></a>" anchor links. When Google hits your page and it's served up by SnapSearch, it will automatically find those anchor tags and navigate to the subpages. If you're talking about buttons, perhaps Google will click on those buttons, but any links should be represented by anchor elements.
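
To illustrate the difference (illustrative markup only; app.go is a made-up client-side router call):

    <!-- Crawlable: a real anchor with a real href; your router can still intercept the click. -->
    <a href="/pricing">Pricing</a>

    <!-- Not crawlable: nothing here tells a robot where this "link" goes. -->
    <span onclick="app.go('pricing')">Pricing</span>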

This works for SnapSearch.io. Check on Google: search for https://snapsearch.io and all the subpages of SnapSearch are indexed. It also works for http://dreamitapp.com

So to answer your question, there's no need to rewrite your links.


Sitemap works


Yep, a sitemap can also work. But it's not needed, since Google will follow your anchor links anyway. In the future SnapSearch will also be able to create sitemaps dynamically.


I think the logo could use a little work. The lightning bolt doesn't come off as an S, so it looks more like "napSearch".


Thanks for the feedback :). I'll look into it. Did you know the bolt is also a tiger stripe? That's why I got the tiger theme.


Hey HN, Roger here, happy to answer any questions.


This seems exactly like prerender.io but with more code to insert (harder to use) and more expensive.


I calculated the average price per usage versus Prerender.io for average sites, and it comes out cheaper than or equivalent to the standard price of Prerender.io.

Basically, take the number of Prerender.io usages you make per month and compare it yourself.

What do you mean by more code to insert? I do know of Prerender.io, but the amount of code depends on which framework/language you're using. If it's Node, it's pretty much one line.

The extra code is just options that allow you to customise the way the interceptor works; these options include things like blacklists, whitelists, regexes, extension ignoring, client-side caching, etc. But the basic usage is just key + email, and you're good to go.

Also note that unlike Prerender, our cache storage is free, so we don't track how many pages you have.


I see three lines of code whereas prerender.io is one line. I guess that's because they don't need you to configure the client with the API key or email.

That's not really a huge difference, although it is still a little bit more effort.

It really comes down to price I think.

I can't really tell for sure what calculation to use to compare the pricing. Looks like you might be cheaper.

You might consider doing a blog post or something comparing the pricing, though, because I think other people know about Prerender too, and having a simple explanation that shows your pricing is better would make it easier for people to make that decision.


Ah, well Prerender does use a single file, which means they haven't broken down their middleware into 3 components. All SnapSearch middleware is broken down (OOP style) into Detector, Client and Interceptor.

I recognised that this OOP strictness leads to a bit more verbosity.

Therefore, for PHP I built a Stack interceptor. For Ruby, there's a Rack alternative. NodeJS has the expressConnector. Python doesn't have a single thing that brings them together atm.

These integration points make the entire thing a one-liner to integrate.

If you're referring to Node, here's the connectInterceptor. Usage is in the README: https://github.com/SnapSearch/SnapSearch-Client-Node/blob/ma...

Also sure, I'm going to get a blog up soon. Thanks for the advice. I'll do one on comparing pricing.

But do note that the fact we use Firefox instead of QtWebKit means we get 6-week release cycles, so our scrapers keep up with the latest HTML5 developments, unlike the slower QtWebKit of PhantomJS. So we do have a technical advantage :)


Same question as the one asked some months ago [1]:

> Why should someone use your service instead of existing ones? What is your added-value compared for example to seo4ajax.com?

It looks like the innovation is just to use Firefox instead of PhantomJS. Maybe I am wrong.

[1] https://news.ycombinator.com/item?id=6781843


Hey gildas, I answered a similar question here https://news.ycombinator.com/item?id=7765731


Is there any difference in functionality or added benefit of using your service over prerender?


Here's a shortlist of differences and benefits:

1. SnapSearch uses load-balanced Firefox, which allows us to keep up with the latest in HTML5, unlike the QtWebKit of PhantomJS.

2. SnapSearch does metered billing, which comes out cheaper for average sites, or sites that have lots of pages but don't send out tons of requests. We charge on usages, not cached requests, and not on cache storage.

3. SnapSearch's middleware is more robust, with 200 robots that it checks and a regularly updated robot list. There's a lot of flexibility in the code.

4. We provide a bit more analytics in the control panel, so you can see things like the proportion of requests from multiple domains, requests/usages from search engines over a month, a progress bar on usages, and a cache storage list of all the snapshots you have so you can precache pages if you wish.

Try it out, it's free for 1000 usages per month!

Also note that our API supports custom javascript callbacks, which allows you to do all sorts of interesting things prior to the caching of the snapshot.

Furthermore, we also support soft 404s and redirections (including meta tag, header and JS redirections, synchronous and asynchronous). All of it is explained in the documentation.

Flash support coming soon too.


Dear N00bs, SPAs work great w/ SEO.

Hashbangs! (e.g. https://github.com/puppetMaster3/ModuleBU/blob/master/latest... in the AppBu section; GitHub has examples).

Just because Twitter or Facebook programmers don't know how to do it doesn't mean it can't be done - then again, they do hire people with zero experience straight out of school.

Cheers.


Downvotes?

It upsets you that I show you how to do SEO with an SPA?

Says a bit about you, I think.




