[personal profile] 109
as I wrote before, the distributed storage space is very hot right now, and everyone and their dog is piling in. That does more harm than good, because the level of discussion drops and the noise level rises. For example, some clueless writer says: "Today, Amazon announced its second entry in to the world of cloud databases. Called Amazon Elastic MapReduce, this appears to be a hosted implementation of the Hadoop framework."

what specialist in their right mind would call Hadoop or MapReduce a database? You can't even call them by the more general word "storage".
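To make the distinction concrete, here is a minimal single-process sketch of the MapReduce programming model. It is a toy for illustration, not Hadoop's API, and the names `map_reduce`, `word_count_map` and `word_count_reduce` are hypothetical. The point it shows: the model only transforms input records into output, and where the input comes from or where the result is kept is entirely up to the caller. Nothing here persists anything.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Toy MapReduce: map, shuffle (group by key), reduce. No storage involved."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each input record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_map(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word, lowercased.
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    # Sum the 1s emitted for this word.
    return sum(counts)


lines = ["Hadoop is not a database", "MapReduce is not a database either"]
counts = map_reduce(lines, word_count_map, word_count_reduce)
print(counts["database"])  # 2
```

In a real deployment the input would be read from, and the output written to, a separate file system (HDFS in Hadoop's case); the computation model itself stays the same.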

(no subject)

Date: 2009-04-04 07:13 pm (UTC)
From: [identity profile] jdevelop.livejournal.com
There's BigTable in there, and that's already somewhere close.

(no subject)

Date: 2009-04-04 08:43 pm (UTC)
From: [identity profile] 109.livejournal.com
Thanks! This comment is an excellent confirmation of my thesis.

(no subject)

Date: 2009-04-05 07:39 am (UTC)
From: [identity profile] 109.livejournal.com
so HBase works on top of the Hadoop file system. Does that make Hadoop a database? Is Windows a database because SQL Server works on top of NTFS?

is hadoop a database?

Date: 2009-04-08 09:16 pm (UTC)
From: [identity profile] katsnelson.livejournal.com
I will be the first to say that Hadoop is not a database, at least not the way we DBMS people (I spent the last 16 years working on DB2) think of databases. However, when I talk to our customers, they DO consider Hadoop to be a solution to the same set of problems they use DB2 for. So, in their minds it is a database management system, or maybe a data processing system.
In DB2 we have a feature called the Data Partitioning Feature, which lets one distribute data across a cluster of independent database nodes. This is a shared-nothing approach, i.e. each node is responsible for its own portion of the data. When a query comes in, it is split up and executed on multiple nodes.
It is not MapReduce, but the point is that the use case is the same, i.e. running complex data processing tasks against very large data sets.
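The shared-nothing idea described in this comment can be illustrated with a toy sketch: rows are assigned to nodes by hashing a partitioning key, each node aggregates only its own rows, and a coordinator merges the partial results. This is an illustration under simplifying assumptions (in-process "nodes", `zlib.crc32` as the partitioning hash, a SUM-by-group query); it is not how DB2's Data Partitioning Feature is actually implemented, and the class, function and column names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


class Node:
    """Hypothetical shared-nothing node: it only ever sees its own rows."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.rows: List[dict] = []

    def local_sum_by(self, group_col: str, value_col: str) -> Dict[str, int]:
        # Executes against local data only; no other node's rows are visible.
        partial: Dict[str, int] = defaultdict(int)
        for row in self.rows:
            partial[row[group_col]] += row[value_col]
        return dict(partial)


def node_for(key, num_nodes: int) -> int:
    # Deterministic hash partitioning on the partitioning key.
    return zlib.crc32(str(key).encode("utf-8")) % num_nodes


def load(nodes: List[Node], rows: List[dict], partition_col: str) -> None:
    # Each row is stored on exactly one node.
    for row in rows:
        nodes[node_for(row[partition_col], len(nodes))].rows.append(row)


def distributed_sum_by(nodes: List[Node], group_col: str, value_col: str) -> Dict[str, int]:
    # Coordinator: send the same sub-query to every node, then merge the partials.
    total: Dict[str, int] = defaultdict(int)
    for node in nodes:
        for group, subtotal in node.local_sum_by(group_col, value_col).items():
            total[group] += subtotal
    return dict(total)


nodes = [Node(i) for i in range(4)]
sales = [
    {"customer_id": 1, "region": "EU", "amount": 100},
    {"customer_id": 2, "region": "US", "amount": 250},
    {"customer_id": 3, "region": "EU", "amount": 50},
    {"customer_id": 4, "region": "APAC", "amount": 75},
    {"customer_id": 5, "region": "US", "amount": 25},
]
load(nodes, sales, partition_col="customer_id")
totals = distributed_sum_by(nodes, "region", "amount")
print(sorted(totals.items()))  # [('APAC', 75), ('EU', 150), ('US', 275)]
```

Because SUM can be computed piecewise, each node pre-aggregates locally and the coordinator only merges small partial results. A query that needs rows from several partitions at once (for example, a join on a column other than the partitioning key) would additionally require shipping data between nodes.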

(no subject)

Date: 2009-04-08 10:05 pm (UTC)
From: [identity profile] 109.livejournal.com
Well, neither Hadoop nor MapReduce offer any persistence by themselves, wouldn't you agree?

Anyway, I am very interested in the Data Partitioning Feature you described. Where can I read more about it?

I agree

Date: 2009-04-16 03:21 am (UTC)
From: [identity profile] katsnelson.livejournal.com
No argument about Hadoop/MapReduce not being a persistent data store on their own, but they are typically used with HDFS/GFS.
You can find more info on the DB2 Database Partitioning Feature in this free Redbook: http://www.redbooks.ibm.com/abstracts/sg246917.html. It is a bit dated and covers several partitioning options, but if you ignore table partitioning and multidimensional clustering, you will get a good idea of the database partitioning that DB2 does. Or you can read this: http://www.ibmpressbooks.com/articles/article.asp?p=375537&seqNum=6
