This has the potential to really disrupt the enterprise data warehouse sector. All the MPP vendors today (HP Vertica, EMC Greenplum, Teradata) have exorbitant pricing and ridiculous licensing. At $1,000 per TB per year, I would be really worried if I were Teradata (not so much if I were IBM).
A lot of large enterprises won't be comfortable hosting their data outside of their own data centers. The killer application is making a portable, on premises version of this functionality without the high price.
I've heard this argument time and time again in the context of various solutions/technologies. I still feel today that for 95% of the companies that "feel this way", it's simply the result of foolish paranoia among older upper management. The type of thing that separates a "Fast Company" from stodgy companies that are likely to be disrupted.
Yes, the foolish neckbeards who aren't agile and dynamic, don't use git, and aren't iterative, but care deeply about obeying regulations and not sending data to the /dev/null that MongoDB on US-EAST is.
This mindset screams "I AM IN THE VALLEY AND EVERYONE WHO ISNT UNDER 30 AND USES APPLE PRODUCTS DOESNT GET WEB2.0" (also caps lock is cruise control for cool).
Apologies for the negativity. I think I get it: I want my data to be in the cloud, easily accessible, and all that jazz, but I want to keep it secret and safe, and most importantly I want it to be mine.
Says the guy who just signed up for the iCloud today... ;-)
I think you're projecting a bit. FWIW I'm pushing 30, have worked on the East coast for finance as well as "in the valley", and generally fall on Yegge's "conservative" side of the spectrum.
A lot of other commenters immediately jumped to the medical records argument, but all I was saying is that for a LOT of companies that make the "we have to have everything on site" argument...it's just not true.
Projecting? I'm unsure in which direction you mean, but FWIW I'm pushing 30 myself (26) and use Apple products.
But I agree with you, the medical records argument is kind of boring. Still, not everything needs to be outsourced; there is value in keeping things on site, if only for job creation!
My pet peeve here is that for a while now (and trending upward, fast) we've seen no problem at all, long or short term, with simply "shipping it to the cloud", whether it's medical records, phone contact lists, or personal communications with our significant others.
We as a community are quickly eroding any expectation of privacy and security all in the name of being agile. I guess it just rubs me the wrong way.
I should tweet about it on my iPhone, then copy it to a file for posterity and upload it to my Google Drive...
The problem falls into two categories. On one hand, you have non-technical end users: it takes a non-trivial amount of time to train them to roll their own crypto, if you will, and it's also hard to convince them it's worth it (this is a fair point, as security is a cost/benefit trade-off between ease of use and not getting caught with your ass in the wind).
On the other hand, you have companies using outsourced services, and with SaaS/PaaS/*aaS becoming all the rage, it's very important in my opinion that those service providers shoulder some of the responsibility, and not let their users operate (or serve their users, etc.) in a manner that's not conducive to security/privacy.
Punting this problem up the stack, where it most often ends up on end users' desks, is IMNSHO a bad idea, since then, as now, all the good things crypto promises are the exception rather than the norm.
This is obviously much, much more complicated in practice, but I at least see this problem reflected in the "to the cloud!" mentality.
Why does nontechnical users' inability to use crypto impact a business's decision on whether or not they should use externally hosted backend services?
Rewrote my previous comment, as muddling up both cases as a single obtuse analogy was a mistake on my part.
But can't we say that we have both a moral and an ethical obligation to protect our non-technical users and our fellow developers from mistakes, lack of training, or, in the worst case, malfeasance?
The business decision to use an externally hosted backend service, whatever it may be, must take into account what data goes into it, what comes out of it, and how it's computed on by both you and the provider, with respect to who the real end user is and how the data is going to live on.
And here, I think, is the crux of the problem: those questions and their solutions are generally very hard to put into practice (I don't have a silver bullet, or even something vaguely resembling a mold for one), so it's not very conducive to being a "Fast" company.
For example, being European, it scares me a great deal that companies, schools and the public sector are increasingly punting the business decision of "how to handle email" to "let's use gmail".
That in no way takes into account my concerns (and often I don't have a choice in the matter of using these services), since my mail, and by extension a large part of my life, is being handed to a for-profit US corporation that "does no evil".
I use Gmail privately though, since I did this particular cost/benefit analysis and decided that I don't really care if Google reads my mailing list traffic...
What is a solution? AWS is a general use compute resource. There is no reason that they should enforce crypto anywhere other than SSH/etc. That is obviously in the domain of the dependent service to decide and implement. Encryption has a cost/benefit ratio that is different for every client, there's no reason everyone should have to pay and use encryption resources if they don't need them.
I find your observation that this problem is especially reflected in service-oriented architectures questionable. Centralizing all resources (including documentation: http://aws.amazon.com/security/) makes it easier to enforce best practices and standard interfaces. But just because they can doesn't mean it's always a good idea to do that.
Some of those industries have legal and or regulatory reasons for not hosting all their data on AWS. In addition, AWS isn't always the economic boon it's made out to be. Those fees can really add up once you start moving enough data around.
I'm a big fan of AWS. But, like any other tool, it's not meant for every job.
Best I can figure, this will cost you around $180K for 44TB over three years. I think that's actually a very low estimate. The pricing is confusing. :-)
A Dell MD1220 with 24 Crucial M4 512GB SSDs will run you $12,600. That's 12TB. Multiply by 4, enable compression, etc etc.
You could buy two of those setups; pay for power, cabinet space, and bandwidth; have a ridiculous amount of IO available, with single-digit-millisecond latency, pushing 6Gbps; and still have money to burn. And it'll take you (much) less time to unbox and set up than it will to push that much data up to AWS.
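The arithmetic above can be sanity-checked with a quick back-of-envelope script. All figures are the rough estimates from this comment; the colo cost is an invented placeholder, not a quote:

```python
# Back-of-envelope comparison of the figures above (rough estimates
# from the comment, not vendor quotes).
redshift_3yr = 180_000          # ~$180K for 44 TB over three years (estimated)

md1220_price = 12_600           # Dell MD1220 + 24x 512 GB Crucial M4 SSDs
md1220_tb = 12                  # raw capacity per enclosure
enclosures = 4                  # 4 x 12 TB = 48 TB raw
hardware = md1220_price * enclosures

# Hypothetical colo overhead: power, cabinet space, bandwidth.
colo_per_month = 1_500
colo_3yr = colo_per_month * 12 * 3

# Two full setups, per the comment, for redundancy.
diy_3yr = hardware * 2 + colo_3yr

print(f"Redshift (est.):       ${redshift_3yr:,}")
print(f"DIY, redundant (est.): ${diy_3yr:,}")
```

Even with a generous colo allowance, the DIY route comes in under the estimated Redshift bill, which is the comment's point; the real gap depends heavily on staffing costs the script ignores.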
Granted, this new service is probably only 50% more expensive than hosting your own, and if you have zero IT staff it might make sense in some scenarios, but it's definitely not a no-brainer.
It doesn't need to compete with Teradata. It needs to compete with Dell, and in that field it's still the more expensive option by a significant margin, as well as being (odds are good) at least a couple of orders of magnitude slower.
I work for a company that's in direct competition with Amazon (in one of the many areas Amazon operates in).
Handing our customer lists, source code, finance and sales data to Amazon in plaintext form seems naive to me. There's lots of people at Amazon, and it only takes one ambitious middle manager who wants to get noticed by cleverly anticipating the competition. Most likely there's no audit trail, and no chance of getting caught.
Yes. The actual SSH keys to AWS servers will be very heavily guarded by people in AWS whose job is to not let anyone see them.
They will not have any skin in the "middle managers" personal game and so his only other resort is straightforward hacking which he could do in your data center anyway.
Nah. The cloud is as safe as your data center, with the exception of bad apples at Amazon (same diff at your data center). It's servers, in data centers, virtualised.
At this level, I suspect you will not even be multi-tenanted with others above a certain price point.
Amazon's external-facing security seems robust, but most places I've worked have given a lot of trust to people on the inside. I've worked at places where all developers get read access to all databases - and managers who are former developers usually retain that access.
Amazon might have good internal security procedures - but this stuff can't be audited effectively, we can only take amazon's word for it. Taking their word for it, with the security of all your customers' data, is a big ask.
That's not how it works. EC2 can't log in to your virts. S3 can't (trivially) read your unencrypted bits, and they certainly can't get to your ec2 dom0.
Everything is pretty well firewalled. There's no back door access. If service A uses service B they hit the same public API as every other customer. Beyond that AMZN is really three companies; Amazon.com (retail), AWS, & Amazon Digital (kindle/vod/etc).
That all said, you own your availability (and risk assessment etc).
Edit: " this stuff can't be audited effectively, we can only take amazon's word for it". Or a trusted third party. Go ask your aws sales rep about pci, fisma, etc.
Good luck talking a military customer into hosting their data in some "cloud" somewhere. Pretty sure banks and other industries will also feel that way. The market there is to deliver and support a small cloud infrastructure they can host and use themselves on their closed network.
So instead of building 1 cloud storage service, you need to effectively build a cloud storage factory, so you can deploy N cloud storage services on demand.
At that point you also potentially deliver a product (a rack of hardware) not just a pure service.
Totally agree, and they'll just be left in the dust while paying bucketloads for services that we can now acquire for far less. At the end of the day, Amazon should have no problem finding clients interested in this service.
What's the difference? We trust such data to be in all sorts of insecure places. Do you really think one of the low-paid secretaries at a medical office isn't easily bribable? Or that every IT system your data eventually touches has Bruce Schneier doing their security?
All the people I wouldn't want to have access to my medical history (governments, insurance companies, and doctors) already do. Ditto for shopping data. The legal/security front for that sort of data is entirely pointless, as the bad guys are authorized parties, so you may as well save some money by putting it in the cloud.
There were always only two defensible privacy fronts: keeping the data off electronic records or filling the records with shit.
What are the regulations on things like health records, personal information, etc.? Stuff that has tight restrictions on how the data is handled. Can these types of data be stored on Amazon or similar services and still be in compliance with data protection laws?
For health records the regulations are a bit of a confusing mess when it comes to cloud storage. Basically, it boils down to "whatever your organization's legal team says". In theory, if data is encrypted in transit, encrypted at rest, and access is limited/logged, then it should meet US HIPAA requirements. However, that may not be enough to satisfy a particularly conservative legal department. There are also nuances about who holds the encryption keys, how are they managed, etc... Notably, Amazon won't actually stick their neck out and certify AWS as HIPAA compliant through a business associates agreement (interestingly Microsoft will for Azure: http://www.windowsazure.com/en-us/support/trust-center/compl...). I've been told by consultants that Amazon has so much business it's just not worth their time to bother with the headache of setting up such agreements.
Depends on a variety of factors including which regulations are governing the data. Some privacy laws require such records can't leave the country in which they're obtained. Other records have strict rules about disposition or "destruction" of the record. It's a complex field and wide open with questions.
From courts to records managers/custodians everyone is still trying to understand those questions. In my experience, when in doubt, big business decides the safest legal answer is "probably not".
Another barrier will be the integration with their current "business intelligence" solutions. Microstrategy and Jaspersoft support is a good start, but what about Microsoft, Oracle and SAP offerings?
Switching to Amazon would involve rewriting your ETL process, and retooling your reporting software, and converting all your existing, currently-used reports.
A huge expense in data warehousing projects isn't the hardware - it's the consultants, the time, and the people to support the thing. I'm sure this is a great solution for companies looking to start a data warehouse, or maybe companies looking to revamp their reporting environment completely... but other than that it'd be a hard sell...
Cloudera is doing just that with the recent announcement / open sourcing of Impala. Based on Amazon's description of their hosted product, the technology is very similar. Impala is still in beta, and columnar storage (trevni/avro) is right around the corner...with that, you can do petabyte scale queries for a very low cost.
Platfora is doing some interesting work with interactive, in-memory BI for Hadoop. They essentially do away with the traditional DW/ETL model and create ephemeral in-memory 'lenses' for querying and visualization.
Impala is married to Hadoop. What if your data infrastructure isn't built on HBase and is too complex/large to integrate easily? Would Impala still serve that purpose?
Impala doesn't require HBase to operate, it can use raw HDFS. Simple example, if you had a few terabytes of TSV files, you could easily copy the raw data into HDFS and then create a simple schema around it. All queries on this data would be in parallel across all the nodes in the cluster, this is partly due to the distributed nature of HDFS.
If your data is too difficult to integrate into HDFS (doesn't have to be HBase) using existing Hadoop tools, I suspect you're going to have to do some work to use that data on any platform.
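As a rough local analogy for the "copy raw TSV in and wrap a schema around it" workflow described above (this is my own sketch, not Impala itself): SQLite stands in for the distributed HDFS/Impala layer, and the file layout and column names are invented for illustration.

```python
import csv
import os
import sqlite3
import tempfile

# Raw tab-separated "log" data, as it might land in HDFS.
rows = [("2012-11-28", "widget", 3),
        ("2012-11-28", "gadget", 5),
        ("2012-11-29", "widget", 2)]
path = os.path.join(tempfile.mkdtemp(), "sales.tsv")
with open(path, "w", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

# "Create a simple schema around it" and load the raw file as-is.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (day TEXT, item TEXT, qty INTEGER)")
with open(path, newline="") as f:
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                   csv.reader(f, delimiter="\t"))

# Plain SQL over what started life as flat TSV.
total = db.execute(
    "SELECT SUM(qty) FROM sales WHERE item = 'widget'").fetchone()[0]
print(total)  # 5
```

The difference, of course, is that Impala runs the same kind of query in parallel across every node holding a block of the file, which is what makes it work at terabyte scale.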
Most enterprises already outsource a lot of their IT, data warehousing especially whether its to a big player like IBM or boutique consultancies.
For most applications does it really matter whether the data sits in your data center or Amazons? Nope... cause the organisation your company contracted to manage already has full access to all your secrets.
So really Amazon is just another IT outsourcer except you don't need a long drawn out sales process.
Have to say that this is pretty amazing. The price is so low that it's a no-brainer to just give it a try. For the same 2TB capability, a Vertica license would run between $20-40K, with high annual subscription fees.
The bigger question for me is how Amazon has been able to figure out the technical details necessary to run this kind of service at this price. It's just ridiculous. Talk about taking the oxygen out of the market...
Does anyone have insight into how painful it is for non-technical people to query their data warehouses?
I'm building a tool that allows business people and non-technical analysts to query their data warehouses using natural language. (Currently, you must ask a technical person to write ad-hoc queries for you, or build you a dashboard. This bogs down your data people.)
Does anyone have insight into the demand for such a product?
[edit: I'd love to chat with anyone with insight into this topic. Reach me at Joseph at metaoptimize dot com]
Most of the time it's usually easier to have people learn a touch of SQL and ask developers for the harder queries. We used Tableau, and after a couple of weeks they had every query they ever wanted saved.
They call me up, ask me to do a "quick report across the inventory db with the project cost data." I send it off to them. If they like it, we push a report (maybe with a couple of parameters) into production.
My gut is that we aren't lacking for good technical options in analytics and data warehousing. To be honest, the lion's share of my work in data warehousing is helping the users know what questions to ask.
But there is lots of room, and probably several excellent businesses (from lifestyle-sized to 8 digits) to be built on good BI.
> Does anyone have insight into how painful it is for non-technical people to query their data warehouses?
Depends. Back when I did DW stuff my general workflow was to speak with the analysts about what they were trying to accomplish. From there I would create the cubes and additional metrics. I would also set up all the processing schedules at this time. The analysts would then use an Excel plugin that provided a pivot table interface to any cubes for which they had access. It worked pretty well.
For straight data access I would teach them basic SQL and/or build SQL templates for them that they could extend.
My goal was always to teach a man to fish and get out of the way.
The closest things I've seen are exploratory data visualization products such as Tableau (which is pretty awesome). The downside (or partial downside) is that it can end up writing some nasty, non-performant queries in certain cases.
Yes, it can be crazy painful, to the point that non-technical people just don't run the query unless it is business-burning critical. In large part this is because the technical people are often tasked on projects from IT, and getting them to do a query requires middle-management, department-to-department deal making, which is slow and painful.
At ExxonMobil, a place I worked, you're going to have VPs asking each other, and IT is going to hedge with, "Yeah, if we do this then project X will be late" (it's going to be late anyway, but they've kept quiet about it and no one knows).
My personal solution when I needed a query was to bring a six pack of beer down to IT friday afternoon, mostly because I wouldn't be given access to write queries because we had BI software.
I would suggest reading some books on the topic of Dimensional Modeling [1], such as "The Data Warehouse Toolkit" [2]. The critical thing you need to expose to your users is the ability to ask for things which make sense in their world but are actually really difficult for even an engineer to code. Things like: "Show me average 9am-12pm sales on Mondays, Wednesdays and Fridays for 1st quarter, 2012."
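To make that concrete, here is a toy sketch (mine, not from the book) of why a date/time dimension turns that question into a plain join-and-filter: all the awkward calendar logic lives in the dimension table. The tiny tables and column names are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_time (
    time_id  INTEGER PRIMARY KEY,
    day_name TEXT,     -- 'Mon', 'Wed', ...
    hour     INTEGER,  -- 0-23
    quarter  TEXT      -- '2012Q1', ...
);
CREATE TABLE fact_sales (time_id INTEGER, amount REAL);
""")
db.executemany("INSERT INTO dim_time VALUES (?,?,?,?)", [
    (1, "Mon", 10, "2012Q1"),   # qualifies
    (2, "Wed", 11, "2012Q1"),   # qualifies
    (3, "Tue", 10, "2012Q1"),   # wrong day
    (4, "Mon", 14, "2012Q1"),   # outside 9am-12pm
])
db.executemany("INSERT INTO fact_sales VALUES (?,?)",
               [(1, 100.0), (2, 300.0), (3, 999.0), (4, 999.0)])

# "Average 9am-12pm sales on Mon/Wed/Fri for Q1 2012" is now trivial:
avg = db.execute("""
    SELECT AVG(f.amount)
    FROM fact_sales f JOIN dim_time t USING (time_id)
    WHERE t.quarter = '2012Q1'
      AND t.day_name IN ('Mon', 'Wed', 'Fri')
      AND t.hour BETWEEN 9 AND 11
""").fetchone()[0]
print(avg)  # 200.0
```

Without the precomputed day-of-week, hour, and quarter columns, every one of those predicates would be a fiddly date-math expression repeated in every report.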
Speaking as someone who does his fair share of dimensional modeling, I would just point out that the example you cite could only involve two tables in a well designed dimensional model (sales fact and time/date dimension, I reckon). The challenge is in getting to that point.
To speak to OP's point about difficulty in querying data warehouses, most business intelligence tools that I'm aware of provide semantic layer[1]-type capabilities, whereby the user interface of the tool is presented in the language of the business domain. Nevertheless, I agree that this is still difficult work, unfortunately. That it is getting more complicated in some respects, such as through unstructured data, doesn't help either.
I guess I wasn't clear enough if it came across as though my example were complex. It's one easily solved via DM, and one that's extremely hard to execute in most non-dimensionally-modeled setups. That's exactly why I'm a huge advocate of DM instead of just throwing a ton of servers, Hadoop, and MR at everything.
> Does anyone have insight into the demand for such a product?
Enormous, and there are dozens of such tools available.
Most of them work best if you build an actual data warehouse -- dimensionally structured, not normalised. This is because they can easily build query forms using the DW dimensions in a language that makes sense to end users.
You might want to look into rjmetrics or chart.io and see what they offer. I've been integrating with both, and it seems one of their goals is to (after the connection and datasources are set-up - that still requires technical knowledge) allow non-technical people access to analyze the data.
I'm curious what technology they are using to power it. According to the website, the technology described seems very similar to what Cloudera recently open sourced (Impala), which sits along side Hadoop allowing ad-hoc MPP style querying on petabytes of data.
I'm guessing it is quite a bit different from that. It is a relational data warehouse. It supports a Postgres protocol and API, which sounds more like what Netezza has built. In fact, I would expect Netezza to be one of the most likely companies to partner with Amazon at this kind of price-point.
It should be interesting to see whether this will be a viable competitor to column-oriented SQL engines like Vertica or other OLAP solutions like SAP HANA. It would be nice if there were a simple SQL-based OLAP solution I could spin up for offline reporting that scales to terabytes of data.
The answer is in the term "data warehousing" -- http://en.wikipedia.org/wiki/Data_warehouse -- which implies that you're going to be doing data mining on vast amounts of data, often historical data like logs or transaction histories.
Google has systems like this for analyzing its request logs. Think of how many HTTP requests hit Google's front-end servers per second or hour or day. Each one has a few dozen pieces of data associated with it -- URL, client IP, headers, etc. Suppose I want to make a bar chart of how many requests came from France containing a certain header, each day for the last year. The system can do this query quickly if the requests are already bucketed by time interval, organized by column, compressed, and stored so that exactly the information needed can be brought into RAM quickly.
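A toy illustration (my own, not Google's actual system) of why column-oriented storage makes that query cheap: counting matching requests per day only touches the two relevant columns, never the full request records.

```python
from collections import Counter

# Each column of the request log stored as its own array; a real
# system would also compress each column and bucket it by time.
# The values here are made up.
days    = ["11-26", "11-26", "11-27", "11-27", "11-27"]
country = ["FR",    "US",    "FR",    "FR",    "DE"]
# ...columns for URL, headers, etc. exist but are never read below.

# Requests from France, per day: scans only `days` and `country`.
hits = Counter(d for d, c in zip(days, country) if c == "FR")
print(hits)  # Counter({'11-27': 2, '11-26': 1})
```

With row-oriented storage, the same count would drag every header and URL through RAM just to inspect one field; laying the data out by column means the scan is proportional to the columns you actually use.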
It is a little funny, when you step back, that "storing," "archiving," and "warehousing" are different things and Amazon has services for each. Try explaining the difference between S3, RDS, EBS, Glacier, and Redshift to a layperson.
Thanks for the response. Would it make sense to say that this is more likely to be used for metadata (i.e. analytics, logs, etc.) while a normal RDB (or NoSQL DB) would be used for application data (i.e. users, settings, etc.)?
This can scale up much more than a single RDS database, since it spreads the data across multiple machines, but it's not exactly a replacement for a MySQL database. It's also possible that this doesn't make use of EBS, which could make it perform more predictably and protect it from failure when EBS fails.
> It's also possible that this doesn't make use of EBS
This quote from the product page seems to indicate that EBS is not used for primary data storage: "it runs on hardware that is optimized for data warehousing, with local attached storage and 10GigE network connections between nodes."
Wow... I just finished reading a sci-fi book a few weeks ago: "Redshift Rendezvous" by John E. Stith. I wonder if this is where the name comes from? In the book, Redshift is the name of the spaceship that runs cargo missions through folded space; the obvious problem is that since you are traveling within just a few m/s of the speed of light, just walking on the ship while underway causes a color shift, thus redshift.
I read that Stith has a physics degree and worked as an engineer at NORAD's Cheyenne Mountain. That made me really interested in what novel he would come up with.
http://www.neverend.com/short-bio-john-e-stith
Redshift is a real physical phenomenon describing the way light wavelengths get "shifted" (stretched, to visualize it) towards the red when they come from something moving away from the observer.
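The standard definition, in terms of observed and emitted wavelengths:

```latex
z = \frac{\lambda_{\mathrm{observed}} - \lambda_{\mathrm{emitted}}}{\lambda_{\mathrm{emitted}}}
```

A positive z (redshift) means the source is receding; a negative z (blueshift) means it is approaching.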
Very cool that this will support regular SQL queries, and that queries can be sent using PostgreSQL drivers. PostgreSQL drivers are super stable and supported everywhere. Driver support is usually overlooked by "enterprise" data warehousing solutions. I recall that it was really hard to get the Vertica drivers installed and stable under Linux.
I took a few screenshots from the keynote and included one showing the mention of Postgresql and ODBC/JDBC support. Included here if you want to see for yourself: http://wp.me/p2sRpx-1e
I cannot find information on whether Redshift supports queries in MDX. Lots of DWs today run on Microsoft SQL Server Analysis Services, and its MDX spec is now supported by several DW vendors. MDX support would mean it would be easy to switch the DW engine while keeping your visualisation suite (or Excel, what the hell), making for an easy switch to the cloud: you'd just pick a different data source in your tool.
Looks impressive and very interesting, signed up to review and compare with Teradata/Netezza.
Can we run more complex in-database processes implemented as stored procedures on this platform or is it going to be limited to pure SQL querying/analytics?
And does anyone have an idea how to upload 1 TB of data to this service using Internet connection from your in-house company server? ;)
No. Spanner is a globally distributed database which supports transactions. It is meant for applications which need to make frequent updates to a database, but the storage for the database may be distributed around the world.
Redshift is a different usage model. You upload your data once, then ask questions of it - but you don't update it. Google does have something similar to Redshift: BigQuery (https://cloud.google.com/products/big-query).