Greenplum vs PostgreSQL

What are the arguments for and against using Greenplum instead of PostgreSQL in a web app (Django) environment?

My gut reaction is to prefer PostgreSQL's open-source approach and huge knowledge base.

My configuration (though I'd love to hear about any other configuration) is a medium-sized business with 2 web servers and (at the moment) 2 database servers.

Areas to contrast are binary data crunching, the number of nodes in the replication setup, and my personal favorite: community support and skilled engineer support.

What are the pros and cons of using Greenplum instead of PostgreSQL?

Best answer

I don't know much about Greenplum, except from quickly skimming the link you sent. A data warehouse is not the same thing as a transactional operational data store. The former is for ad hoc queries, statistical analysis, dimensional analysis, and read-mostly access to historical data. The latter is for real-time reads and writes of operational data. They're complementary.

I'm guessing that you want PostgreSQL.

Who is pushing Greenplum on you and why? If it's being presented as an alternative, I'd dig deeper and rebut the argument.

Answers

Greenplum is an MPP (massively parallel processing) adaptation of PostgreSQL. It's optimized for warehousing and/or analytics on large data sets and would not perform that well in a transactional environment. If you need a large DW environment, look at Greenplum. If you need OLTP or smaller database sizes (under 10 TB), look at PostgreSQL.

Greenplum is an MPP analytical (OLAP) DBMS. PostgreSQL is an OLTP DBMS. In general, there is no single solution on the market that is good at both OLAP and OLTP at the same time. You can find my thoughts on it here

A web app backend will always create an OLTP workload. Greenplum has a big overhead for transaction processing because it is a distributed system, so don't expect it to deliver more than 500-600 TPS. Postgres, in contrast, can reach hundreds of thousands of TPS with the right tuning.

On the other hand, when you need an OLAP workload, Postgres can offer only single-host processing: no partitioning with dynamic partition elimination, no compression, no columnar store. Greenplum, meanwhile, can crunch your data in parallel across the cluster.

So the solution you are looking for is a typical data warehouse case: use an OLTP solution for the high transactional workload, extract the data to the DWH with ETL/ELT, and then run complex data-crunching queries on it.
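To make the OLTP-to-DWH flow concrete, here is a minimal sketch of one ETL step. Everything in it is an assumption made for illustration: the `orders` and `daily_sales` tables, the function names, and the use of two in-memory SQLite databases as stand-ins for the PostgreSQL (OLTP) and Greenplum (DWH) sides, so that the example runs without any server.

```python
import sqlite3


def extract_daily_totals(oltp):
    """Extract + transform: aggregate raw order rows into one row per day."""
    return oltp.execute(
        "SELECT order_date, COUNT(*), SUM(amount) "
        "FROM orders GROUP BY order_date ORDER BY order_date"
    ).fetchall()


def load_daily_totals(dwh, rows):
    """Load: upsert the aggregated rows into the warehouse table."""
    dwh.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "order_date TEXT PRIMARY KEY, order_count INTEGER, revenue REAL)"
    )
    dwh.executemany("INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?)", rows)
    dwh.commit()


def run_etl(oltp, dwh):
    """Run one extract-transform-load pass and return the number of rows loaded."""
    rows = extract_daily_totals(oltp)
    load_daily_totals(dwh, rows)
    return len(rows)


def make_demo_oltp():
    """Build a tiny stand-in for the transactional database (hypothetical schema)."""
    oltp = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    oltp.executemany(
        "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
    )
    oltp.commit()
    return oltp


if __name__ == "__main__":
    oltp = make_demo_oltp()
    dwh = sqlite3.connect(":memory:")
    run_etl(oltp, dwh)
    print(dwh.execute("SELECT * FROM daily_sales ORDER BY order_date").fetchall())
```

In a real setup the web app would keep writing to the transactional database while a scheduled job runs something like `run_etl`, and the heavy analytical queries would hit only the warehouse copy.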

At the moment both PostgreSQL and Greenplum are open source products, so you are free to choose either of them, but of course the PostgreSQL community is currently the bigger one.

Since Greenplum uses parallel processing, there is overhead in running lots of tiny read queries, because the master node needs to communicate with the underlying data nodes to retrieve an answer to each of them. For a query that takes milliseconds, expect Greenplum to be an order of magnitude slower.
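A back-of-the-envelope model shows why the same overhead hurts tiny queries but barely matters for long ones. This is a toy sketch, not a description of how Greenplum schedules work: the fixed 20 ms coordination cost, the 8 segments, and the assumption that work splits perfectly evenly are all made-up numbers for illustration.

```python
def estimated_latency_ms(work_ms, segments, dispatch_overhead_ms):
    """Toy model: a fixed coordination cost plus the work split evenly across segments."""
    return dispatch_overhead_ms + work_ms / segments


def speedup(work_ms, segments, dispatch_overhead_ms):
    """Single-node time divided by modeled cluster time (below 1 means slower)."""
    return work_ms / estimated_latency_ms(work_ms, segments, dispatch_overhead_ms)


if __name__ == "__main__":
    # Assumed figures: 8 segments, 20 ms of master-to-segment coordination per query.
    for label, work_ms in [("2 ms lookup", 2), ("1 hour report", 3_600_000)]:
        print(label, round(speedup(work_ms, segments=8, dispatch_overhead_ms=20), 3))
```

Under these assumptions the 2 ms lookup comes out roughly ten times slower on the cluster, while the hour-long report gets close to the full 8x speedup.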

If you are looking for a PostgreSQL-based data warehousing solution, I would also look at GridSQL. It is a parallelization layer over multiple PostgreSQL instances, and is free and open source.

As mentioned in other comments, it will not perform well for many small millisecond queries, but it will help you greatly with long-running queries. GridSQL also does not include DW optimizations such as the columnar storage that Greenplum has, but you can take advantage of constraint exclusion partitioning (e.g., subtables by date range) combined with parallelism to get your query results faster.
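The idea behind constraint exclusion can be sketched in a few lines: each subtable declares the date range it holds, and a query only needs to touch the subtables whose range overlaps the requested one. This is a plain-Python illustration of the pruning logic, not GridSQL or PostgreSQL code; the partition names and date ranges are hypothetical.

```python
from datetime import date

# Hypothetical monthly subtables, each covering the half-open range [start, end).
PARTITIONS = {
    "orders_2024_01": (date(2024, 1, 1), date(2024, 2, 1)),
    "orders_2024_02": (date(2024, 2, 1), date(2024, 3, 1)),
    "orders_2024_03": (date(2024, 3, 1), date(2024, 4, 1)),
}


def partitions_to_scan(partitions, query_start, query_end):
    """Return the subtables whose range overlaps the query range [query_start, query_end)."""
    return [
        name
        for name, (start, end) in partitions.items()
        if start < query_end and query_start < end
    ]


if __name__ == "__main__":
    # A query for mid-February only needs the February subtable.
    print(partitions_to_scan(PARTITIONS, date(2024, 2, 10), date(2024, 2, 20)))
```

The surviving subtables can then be scanned in parallel, which is where the combination with a parallelization layer pays off.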

You can even use it on a single multi-core server, as PostgreSQL will only use a single core when processing a query (this applies to versions before 9.6, which introduced parallel query).

I think Greenplum takes better advantage of parallel processing. It's based on PostgreSQL, though.

Greenplum has a free community edition. You can always download and test in your own environment.

If any data crunching takes longer than an hour, you'll get linear performance boosts for every core you add. It's not really worth the effort for anything that takes less time to crunch through.




