Is Pentaho ETL and Data Analyzer a good choice?

I was looking for an ETL tool and found a lot about Pentaho Kettle on Google.

I also need a data analyzer that runs on a star schema, so that business users can explore the data and generate any kind of report or matrix. Again, Pentaho Analyzer looks good.

The rest of the application will be developed in Java, and the application should be database agnostic.

Is Pentaho good enough, or are there other tools I should check?

Answers

Pentaho seems to be pretty solid, offering the whole suite of BI tools, with improved integration reportedly on the way. But...the chances are that companies wanting to go the open source route for their BI solution are also most likely to end up using open source database technology...and in that sense "database agnostic" can easily be a double-edged sword. For instance, you can develop a cube in Microsoft's Analysis Services in the comfortable knowledge that whatever MDX/XMLA your cube sends to the database will be interpreted consistently, holding very little in the way of nasty surprises.

Compare that to the Pentaho stack, which will typically end up interacting with PostgreSQL or MySQL. I can't vouch for how PostgreSQL performs in the OLAP realm, but I do know from experience that MySQL - for all its undoubted strengths - has "issues" with the types of SQL that typically crop up all over the place in an OLAP solution (you can't get far in a cube without using GROUP BY or COUNT DISTINCT). So part of what you save in licence costs will almost certainly be used to solve issues arising from the fact that Pentaho doesn't always know which database it is talking to - robbing Peter to (at least partially) pay Paul, so to speak.
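To make the GROUP BY / COUNT DISTINCT point concrete, here is a minimal sketch of the kind of aggregate query an OLAP layer over a star schema tends to send to the database. The table and column names (`fact_sales`, `dim_product`, and so on) are made up for illustration, and SQLite is used only because it ships with Python; it says nothing about how MySQL or PostgreSQL would handle the same query.

```python
import sqlite3

# Hypothetical star schema: one fact table and one dimension table.
# All names and rows here are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        sale_id     INTEGER PRIMARY KEY,
        product_id  INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER,
        amount      REAL
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'books'), (3, 'games');
    INSERT INTO fact_sales VALUES
        (1, 1, 100, 10.0),
        (2, 2, 100, 15.0),
        (3, 2, 101, 15.0),
        (4, 3, 100, 40.0);
""")

# Join the fact table to a dimension, group by a dimension level,
# and compute both a plain sum and a distinct-count measure.
rows = conn.execute("""
    SELECT d.category,
           SUM(f.amount)                 AS total_amount,
           COUNT(DISTINCT f.customer_id) AS distinct_customers
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

for category, total_amount, distinct_customers in rows:
    print(category, total_amount, distinct_customers)
```

Running a handful of queries shaped like this against each candidate database, at realistic data volumes, is a cheap way to find out early whether the SQL your cube generates is going to be a problem.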

Unfortunately, more info is needed. For example:

  • will you need to exchange data with well-known apps (Oracle Financials, Remedy, etc.)? If so, you can save a ton of time & money with an ETL solution that already has support for that interface built in.
  • what database products (and versions) and file types do you need to talk to?
  • do you need to support querying of web services?
  • do you need near real-time trickling of data?
  • do you need rule-level auditing & counts that account for every single row?
  • do you need delta processing?
  • what kinds of machines do you need this to run on? Linux? Windows? Mainframe?
  • what kind of version control, testing and build processes will this tool have to comply with?
  • what kind of performance & scalability do you need?
  • do you mind if the database ends up driving the transformations?
  • do you need this to run in userspace?
  • do you need to run parts of it on various networks disconnected from the rest? (not uncommon for extract processes)
  • how many interfaces, and of what complexity, do you need to support?

You can spend a lot of time deploying and learning an ETL tool - only to discover that it really doesn't meet your needs very well. You're best off taking a couple of hours to figure that out first.

I've used Talend before with some success. You create your translation by chaining operations together in a graphical designer. There were definitely some WTFs and it was difficult to deal with multi-line records, but it worked well otherwise.
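For readers unsure what the delta processing and row-count auditing questions in the list above are getting at, here is a rough sketch in plain Python. It is not how any particular ETL tool does it; the function names, the `id` key column, and the sample rows are all hypothetical. It compares two snapshots of a source table, classifies each row, and then checks that every row is accounted for.

```python
import hashlib

def row_hash(row):
    """Fingerprint of a row's non-key columns (hypothetical helper)."""
    payload = "|".join(str(row[k]) for k in sorted(row) if k != "id")
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def compute_delta(previous, current):
    """Compare two snapshots keyed by 'id' and classify the rows."""
    prev = {r["id"]: row_hash(r) for r in previous}
    curr = {r["id"]: row_hash(r) for r in current}
    inserts = [r for r in current if r["id"] not in prev]
    updates = [r for r in current
               if r["id"] in prev and prev[r["id"]] != curr[r["id"]]]
    deletes = [r for r in previous if r["id"] not in curr]
    unchanged = len(current) - len(inserts) - len(updates)
    return inserts, updates, deletes, unchanged

# Invented sample data: yesterday's and today's extract of one table.
previous = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Rome"},
    {"id": 3, "name": "Cy", "city": "Lima"},
]
current = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Paris"},
    {"id": 4, "name": "Di", "city": "Kyiv"},
]

inserts, updates, deletes, unchanged = compute_delta(previous, current)

# Audit: every row in the current snapshot falls in exactly one bucket.
assert len(inserts) + len(updates) + unchanged == len(current)
print(len(inserts), len(updates), len(deletes), unchanged)
```

If you need this kind of bookkeeping, check whether the tool you are evaluating gives you the per-step row counts out of the box or whether you would have to build them yourself.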

Talend also generates Java and you can access the ETL processes remotely. The tool is also free, although they provide enterprise training and support.

There are lots of choices. Look at BIRT, Talend and Pentaho, if you want free tools. If you want much more robustness, look at Tableau and BIRT Analytics.




