
A lot of large enterprises won't be comfortable hosting their data outside of their own data centers. The killer application is a portable, on-premises version of this functionality without the high price.


I've heard this argument time and time again in the context of various solutions/technologies. I still feel today that for 95% of the companies that "feel this way", it's simply the result of foolish paranoia among older upper management. It's the type of thing that separates a "Fast Company" from stodgy companies that are likely to be disrupted.


Yes, the foolish neckbeards who aren't agile and dynamic, don't use git and aren't iterative, but care deeply about obeying regulations and not sending data to the /dev/null that mongodb on US-EAST is.

This mindset screams "I AM IN THE VALLEY AND EVERYONE WHO ISNT UNDER 30 AND USES APPLE PRODUCTS DOESNT GET WEB2.0" (also caps lock is cruise control for cool).

Apologies for the negativity. I think I get it: I want my data to be in the cloud, and easily accessible and all that jazz, but I want to keep it secret and safe, and most importantly I want it to be mine.

Says the guy who just signed up for the iCloud today... ;-)


I think you're projecting a bit. FWIW I'm pushing 30, have worked on the East coast for finance as well as "in the valley", and generally fall on Yegge's "conservative" side of the spectrum.

A lot of other commenters immediately jumped to the medical records argument, but all I was saying is that for a LOT of companies that make the "we have to have everything on site" argument...it's just not true.


Projecting? Unsure in which direction you mean, but fwiw I'm pushing 30 myself (26) and use Apple products.

But I agree with you, the medical records argument is kind of boring. Still, not everything needs to be outsourced; there is value in keeping things on site, if for nothing else than job creation!

My pet peeve here is that for a while now (and the trend is accelerating, fast) we haven't seen any problem at all, long or short term, with simply "shipping it to the cloud", where "it" is everything from medical records to phone contact lists to personal communications with our significant others.

We as a community are quickly eroding any expectation of privacy and security all in the name of being agile. I guess it just rubs me the wrong way.

I should tweet about it on my iPhone, and then copy it to a file for posterity and upload it to my Google Drive...


I'm kind of confused. You're obviously a rockstar ninja, but why are you just learning about client side encryption now?


That's not really a solution.

The problem falls into two categories. On one hand you have non-technical end users: it takes a non-trivial amount of time to train them to roll their own crypto, if you will, and it's also hard to convince them it's worth it (this is a fair point, as security is a cost/benefit trade-off between ease of use and not getting caught with your ass in the wind).

On the other hand, you have companies using outsourced services, and with SaaS/PaaS/*aaS becoming all the rage, it's very important in my opinion that those service providers shoulder some of the responsibility for not letting their customers serve their own users in a manner that's not conducive to security and privacy.

Punting this problem up the stack, with it most often ending up on the end users' desks, is IMNHO a bad idea, since then, as is the case now, all those good things crypto promises are the exception rather than the norm.

This is obviously much much much more complicated in practice, but I at least see this problem reflected in the "to the cloud!" mentality.

EDIT: Complete rewrite.


Why does nontechnical users' inability to use crypto impact a business's decision on whether or not they should use externally hosted backend services?


Rewrote my previous comment, as muddling up both cases as a single obtuse analogy was a mistake on my part.

But can't we say that we have both a moral and an ethical obligation to protect our nontechnical users or our fellow developers from mistakes, lack of training or, in the worst case, maleficence?

The business decision to use externally hosted backend services, whatever they may be, must take into account what data goes into them, what comes out of them, and how it's computed on by both you and the provider, with respect to who the real end user is and how the data is going to live on.

And here, I think, is the crux of the problem: those questions and their solutions are generally very hard when put into practice (I don't have a silver bullet, or even something vaguely resembling a mold for one), so it's not very conducive to being a "Fast" company.

For example, being European, it scares me a great deal that companies, schools and the public sector are increasingly punting the business decision of "how to handle email" to "let's use gmail".

That in no way takes into account my concerns (and often I do not have a choice in the matter of using these services) since my mail, and by extension a large part of my life is being handed to a for profit US corporation who "does no evil".

I use gmail privately though, since I did this particular cost/benefit analysis and decided that I don't really care if Google reads my mailing list traffic...


What is a solution? AWS is a general use compute resource. There is no reason that they should enforce crypto anywhere other than SSH/etc. That is obviously in the domain of the dependent service to decide and implement. Encryption has a cost/benefit ratio that is different for every client, there's no reason everyone should have to pay and use encryption resources if they don't need them.

I find your observation that this problem is reflected especially in service-oriented architectures questionable. Centralizing all resources (including documentation: http://aws.amazon.com/security/) makes it easier to enforce best practices and standard interfaces. But just because they can doesn't mean it's always a good idea to do that.


Some of those industries have legal and/or regulatory reasons for not hosting all their data on AWS. In addition, AWS isn't always the economic boon it's made out to be. Those fees can really add up once you start moving enough data around.

I'm a big fan of AWS. But, like any other tool, it's not meant for every job.


Best I can figure, this will cost you around $180K for 44TB for three years. I think that's actually a very low estimate. The pricing is confusing. :-)

A Dell MD1220 with 24 Crucial M4 512GB SSDs will run you $12,600. That's 12TB. Multiply by 4, enable compression, etc etc.

You could buy two of those setups, pay for power, cabinet space, and bandwidth, have a ridiculous amount of IO available, with single-digit millisecond latency, pushing 6Gbps, and still have money to burn. And it'll take you (much) less time to unbox and set up than it will to push that much data up to AWS.

Granted, this new service is probably only 50% more expensive than hosting your own, and if you have zero IT staff, it might make sense in some scenarios, but it's definitely not a no-brainer.

It doesn't need to compete with Teradata. It needs to compete with Dell, and in that field, it's still the more expensive option by a significant price margin, as well as being (odds are good) at least a couple orders of magnitude slower.
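
For anyone who wants to check the back-of-the-envelope math above, here is a minimal Python sketch. The dollar figures and drive counts are the ones quoted in the comment; the variable names are made up for illustration, capacity is raw (before any RAID overhead or compression), and operating costs are deliberately left out because the comment doesn't put a number on them.

```python
# Figures quoted in the comment above; names are illustrative only.
AWS_3YR_COST = 180_000       # "$180K for 44TB for three years"
ENCLOSURE_COST = 12_600      # Dell MD1220 with 24 x 512GB SSDs
DRIVES_PER_ENCLOSURE = 24
DRIVE_GB = 512
ENCLOSURES_PER_SETUP = 4     # "Multiply by 4"
SETUPS = 2                   # "You could buy two of those setups"

# Raw capacity, before RAID overhead or compression.
raw_tb_per_enclosure = DRIVES_PER_ENCLOSURE * DRIVE_GB / 1024
raw_tb_per_setup = raw_tb_per_enclosure * ENCLOSURES_PER_SETUP

# Hardware only: no power, cabinet space, bandwidth, or staff.
hardware_cost = ENCLOSURE_COST * ENCLOSURES_PER_SETUP * SETUPS
left_over = AWS_3YR_COST - hardware_cost

print(f"raw capacity per enclosure: {raw_tb_per_enclosure:.0f} TB")
print(f"raw capacity per setup:     {raw_tb_per_setup:.0f} TB")
print(f"hardware for two setups:    ${hardware_cost:,}")
print(f"left for three years of power, space, bandwidth: ${left_over:,}")
```

Whether the remainder really is "money to burn" depends on what power, rack space, bandwidth and admin time cost you over three years, which is exactly the part the estimate leaves open.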


For some industries that's true - for others, being "Fast" isn't the priority.


I work for a company that's in direct competition with Amazon (in one of the many areas Amazon operates in).

Handing our customer lists, source code, finance and sales data to Amazon in plaintext form seems naive to me. There's lots of people at Amazon, and it only takes one ambitious middle manager who wants to get noticed by cleverly anticipating the competition. Most likely there's no audit trail, and no chance of getting caught.

What do you think, am I unreasonably cynical?


Yes. The actual ssh keys to AWS servers will be very heavily guarded by people in AWS whose job is to not let anyone see them.

They will not have any skin in the middle manager's personal game, so his only other resort is straightforward hacking, which he could do to your data center anyway.

Nah. The cloud is as safe as your data center, with the exception of bad apples at Amazon (same diff at your data center). It's servers, in data centers, virtualised.

At this level, I suspect you will not even multi tenant with others above a certain price point.


Amazon's external-facing security seems robust, but most places I've worked have given a lot of trust to people on the inside. I've worked at places where all developers get read access to all databases - and managers who are former developers usually retain that access.

Amazon might have good internal security procedures, but this stuff can't be audited effectively; we can only take Amazon's word for it. Taking their word for it, with the security of all your customers' data, is a big ask.


That's not how it works. EC2 can't log in to your virts. S3 can't (trivially) read your unencrypted bits, and they certainly can't get to your ec2 dom0. Everything is pretty well firewalled. There's no back door access. If service A uses service B they hit the same public API as every other customer. Beyond that AMZN is really three companies; Amazon.com (retail), AWS, & Amazon Digital (kindle/vod/etc). That all said, you own your availability (and risk assessment etc).

Edit: "this stuff can't be audited effectively, we can only take amazon's word for it". Or a trusted third party's. Go ask your AWS sales rep about PCI, FISMA, etc.


Maybe

Things you can do:

- Use another provider

- Disk encryption / DB encryption

- Audit audit audit
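
On the "audit" item: one common way to make an access log tamper-evident is to hash-chain its entries, so that editing an old record breaks every hash after it. The Python sketch below is only an illustration of that idea under my own assumptions; the field names, the `append_entry` and `verify` helpers, and the sample actors are hypothetical, and nothing here reflects how Amazon or any other provider actually logs access.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Hash a canonical JSON encoding so the result is reproducible.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, actor: str, action: str) -> None:
    """Append a record that commits to the hash of the previous record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})


def verify(log: list) -> bool:
    """Recompute every hash and check that each entry links to the one before."""
    prev_hash = GENESIS
    for entry in log:
        body = {"actor": entry["actor"], "action": entry["action"], "prev": entry["prev"]}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_entry(log, "dev-42", "read customers table")
append_entry(log, "mgr-7", "export sales report")
intact = verify(log)

log[0]["action"] = "read nothing"  # someone rewrites history
tampered_detected = not verify(log)
print(intact, tampered_detected)
```

A chain like this only helps if someone outside the system keeps a copy of the latest hash, and it says nothing about access that never got logged in the first place, which is the harder half of the problem raised upthread.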


Good luck talking a military customer into hosting their data in some "cloud" somewhere. Pretty sure banks and other industries will also feel that way. The market there is to deliver and support a small cloud infrastructure they can host and use themselves on their closed network.

So instead of building 1 cloud storage service, you need to effectively build a cloud storage factory, so you can deploy N cloud storage services on demand.

At that point you also potentially deliver a product (a rack of hardware) not just a pure service.


Sometimes you just have to follow the specifications or follow the rules or obey the laws. They don't always make sense.


Totally agree, and they'll just be left in the dust while paying bucket loads for services that we can now acquire for a great deal less. At the end of the day, Amazon should have no problems finding clients interested in this service.


Question: would you trust all your medical data/history and shopping data/history tied up in their cloud?


What makes you think it isn't already?

Hospitals & doctors outsource, and as long as the provider is HIPAA compliant (which AWS is[1]), your data is probably out there already.

[1] http://awsmedia.s3.amazonaws.com/AWS_HIPAA_Whitepaper_Final....


What's the threat model?

Careless or corrupt health staff releasing my information without my permission? Well, it doesn't really matter where the data is stored.


What's the difference? We trust such data to be in all sorts of insecure places. Do you really think one of the low-paid secretaries at a medical office isn't easily bribable? Or that every IT system your data eventually touches has Bruce Schneier doing their security?


All the people I wouldn't want to have access to my medical history (governments, insurance companies, and doctors) already do. Ditto for shopping data. The legal/security front for that sort of data is entirely pointless, as the bad guys are authorized parties, so you may as well save some money by putting it in the cloud.

There were always only two defensible privacy fronts: keeping the data off electronic records or filling the records with shit.


What are the regulations on things like health records, personal information, etc... stuff that has tight restrictions on how the data is handled? Can these types of data be stored on Amazon or similar services and still be in compliance with data protection laws?


For health records the regulations are a bit of a confusing mess when it comes to cloud storage. Basically, it boils down to "whatever your organization's legal team says". In theory, if data is encrypted in transit, encrypted at rest, and access is limited/logged, then it should meet US HIPAA requirements. However, that may not be enough to satisfy a particularly conservative legal department. There are also nuances about who holds the encryption keys, how they are managed, etc... Notably, Amazon won't actually stick their neck out and certify AWS as HIPAA compliant through a business associate agreement (interestingly Microsoft will for Azure: http://www.windowsazure.com/en-us/support/trust-center/compl...). I've been told by consultants that Amazon has so much business it's just not worth their time to bother with the headache of setting up such agreements.


Yep [1]. SAS 70, HIPAA, PCI DSS, FISMA, etc.

[1] http://aws.amazon.com/security/#certifications


Depends on a variety of factors, including which regulations govern the data. Some privacy laws require that such records not leave the country in which they're obtained. Other records have strict rules about disposition or "destruction" of the record. It's a complex field and wide open with questions.

From courts to records managers/custodians everyone is still trying to understand those questions. In my experience, when in doubt, big business decides the safest legal answer is "probably not".


Another barrier will be the integration with their current "business intelligence" solutions. Microstrategy and Jaspersoft support is a good start, but what about Microsoft, Oracle and SAP offerings?

Switching to Amazon would involve rewriting your ETL process, and retooling your reporting software, and converting all your existing, currently-used reports.

A huge expense in data warehousing projects isn't the hardware - it's the consultants, the time, and the people to support the thing. I'm sure this is a great solution for companies looking to start a data warehouse, or maybe companies looking to revamp their reporting environment completely... but other than that it'd be a hard sell...


I bet they're getting ready for the upcoming "big data" explosion and not so much eyeing migration from already deployed data warehouses.


Cloudera is doing just that with the recent announcement / open sourcing of Impala. Based on Amazon's description of their hosted product, the technology is very similar. Impala is still in beta, and columnar storage (trevni/avro) is right around the corner...with that, you can do petabyte scale queries for a very low cost.

https://github.com/cloudera/impala


Platfora is doing some interesting work with interactive, in-memory BI for Hadoop. They essentially do away with the traditional DW/ETL model and create ephemeral in-memory 'lenses' for querying and visualization.


Impala is married to Hadoop. What if your data infrastructure isn't built on HBase and is too complex/large to integrate easily? Would Impala still serve that purpose?


Impala doesn't require HBase to operate; it can use raw HDFS. As a simple example, if you had a few terabytes of TSV files, you could easily copy the raw data into HDFS and then create a simple schema around it. All queries on this data would run in parallel across all the nodes in the cluster, partly thanks to the distributed nature of HDFS.
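
To make the "queries run in parallel across the nodes" point a bit more concrete, here is a toy Python sketch of the scatter/gather pattern: each worker scans its own block of TSV rows and returns a partial aggregate, and a coordinator merges the partials. This is not Impala's API or implementation. The column layout, the `scan_block` and `query` helpers, and the sample rows are all made up, and threads stand in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical TSV blocks, one per "node"; columns: user, country, bytes.
BLOCKS = [
    "alice\tUS\t120\nbob\tDE\t300\n",
    "carol\tUS\t80\ndave\tFR\t50\n",
    "erin\tDE\t200\n",
]


def scan_block(block: str) -> Counter:
    """Partial aggregate for one block: total bytes per country."""
    totals = Counter()
    for line in block.splitlines():
        _user, country, nbytes = line.split("\t")
        totals[country] += int(nbytes)
    return totals


def query(blocks: list) -> dict:
    """Roughly: SELECT country, SUM(bytes) FROM t GROUP BY country."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        # Scatter: every block is scanned independently.
        for partial in pool.map(scan_block, blocks):
            # Gather: the coordinator adds up the partial sums.
            merged.update(partial)
    return dict(merged)


result = query(BLOCKS)
print(result)
```

The "schema" here is just the positional unpacking in `scan_block`; the point is only that no single worker ever needs to see the whole data set to answer a GROUP BY.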


If your data is too difficult to integrate into HDFS (doesn't have to be HBase) using existing Hadoop tools, I suspect you're going to have to do some work to use that data on any platform.


Most enterprises already outsource a lot of their IT, data warehousing especially, whether it's to a big player like IBM or boutique consultancies.

For most applications does it really matter whether the data sits in your data center or Amazon's? Nope... 'cause the organisation your company contracted to manage it already has full access to all your secrets.

So really Amazon is just another IT outsourcer except you don't need a long drawn out sales process.


There are lots of startups hosting their services on AWS. Their data is already on AWS. Amazon's Redshift makes perfect sense for them.

Many startups like Qubole have already been working on providing such cloud-based solutions for data analysis.



