How to connect to Hadoop/Hive from .NET

I am working on a solution where I will have a Hadoop cluster running Hive, and I want to send jobs and Hive queries from a .NET application to be processed, and get notified when they are done. I can't find any way of interfacing with Hadoop other than directly from a Java app. Is there an API I can access that I am just not finding?

Best answer

Apparently it is possible to connect to Hadoop with non-Java solutions; see "Do I have to write my application in Java?"

Other answers

With Hadoop, there is no straightforward way to connect from C#, because Hadoop's communication layer works only with Java and is not cross-platform. It is probably possible, but only in very non-trivial ways. I know there is a patch that adds Protocol Buffers support to Hadoop, but at the time of writing (Aug 2011) it has not been released yet.

With Hive the situation is better, because Hive has a Thrift interface, and Thrift supports C#. You can download the Hive Thrift interface definitions and generate a C# client on your own, but be aware that the generated code requires some hacking. Instead, I would recommend downloading the DLL from https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll or using the NuGet package manager and searching for "hive": http://nuget.org/List/Packages/Hive.Sharp.Lib Disclaimer: I'm the author.

  1. There is the Hortonworks ODBC driver. I haven't used it personally, but it should let you work with Hive as with any other ODBC data source. Once the ODBC driver is installed, you can use the OdbcConnection class to connect to Hive.

  2. As noted in other answers, you can use the Thrift API. For that you need to generate C# classes from the interface definition files, which you can download from the Hive source repository. This approach works for me.

  3. You can use IKVM to convert the Hadoop client Java libraries into .NET assemblies that you can use from C#. I haven't used IKVM with the Hive client, but I've converted some other Hadoop client library with IKVM and, surprisingly, it worked.

EDIT:

  4. There's also Apache Templeton, which allows submitting Hive jobs (as well as Pig and MapReduce jobs) through a REST interface. The drawback is that it spawns an additional map task to submit the Hive job, which makes it slower.
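To make the Templeton option concrete, here is a minimal C# sketch, not a tested client. It assumes Templeton (now called WebHCat) is listening on its default port 50111 without Kerberos, so a plain `user.name` parameter is accepted; the host, user, query, and callback URL in the usage line are hypothetical. It posts a HiveQL statement to the `/templeton/v1/hive` resource, and the optional `callback` field asks WebHCat to call your application when the job finishes, which addresses the "notify me when done" part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch only: submits a HiveQL statement to WebHCat (Templeton) over REST.
// Assumption: WebHCat on default port 50111, no Kerberos (plain user.name).
public static class TempletonHive
{
    // Hive submission resource: POST /templeton/v1/hive
    public static Uri HiveEndpoint(string host, int port = 50111) =>
        new Uri($"http://{host}:{port}/templeton/v1/hive");

    // Form fields for the request. "statusdir" is an HDFS directory where
    // WebHCat writes the job's stdout, stderr and exit code.
    public static Dictionary<string, string> BuildHiveForm(
        string userName, string query, string statusDir, string callbackUrl = null)
    {
        var form = new Dictionary<string, string>
        {
            ["user.name"] = userName,
            ["execute"] = query,
            ["statusdir"] = statusDir
        };
        // Optional: WebHCat calls this URL when the job completes;
        // a "$jobId" placeholder inside it is replaced with the job id.
        if (!string.IsNullOrEmpty(callbackUrl))
            form["callback"] = callbackUrl;
        return form;
    }

    // Posts the form; on success WebHCat answers with JSON containing the job "id".
    public static async Task<string> SubmitAsync(
        HttpClient http, string host, Dictionary<string, string> form)
    {
        using var content = new FormUrlEncodedContent(form);
        using var response = await http.PostAsync(HiveEndpoint(host), content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Usage might look like `await TempletonHive.SubmitAsync(new HttpClient(), "headnode", TempletonHive.BuildHiveForm("hive", "SELECT COUNT(*) FROM mytable;", "/tmp/hive-status", "http://myapp.example.com/hive-done/$jobId"))`, where every name is a placeholder. If a callback endpoint is impractical, polling WebHCat's job-status resource with the returned id is the alternative.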

It is possible to access Hive from C# by using Microsoft's ODBC connector. Download the NuGet package "Microsoft.Hadoop.Hive" and follow the example provided at http://msdn.microsoft.com/en-us/library/dn749834.aspx

The trick lies in building the connection string. The best way I came up with was to download the Microsoft Hive ODBC Driver (http://www.microsoft.com/en-us/download/details.aspx?id=40886), install it, then use the Server Explorer inside Visual Studio to add a new connection and let it build the connection string for me. To do this, I used the following steps:

  • Change the data source to "Microsoft ODBC Data Source" and make sure you're using the ".NET Framework Data Provider for ODBC" as the data provider.

[Screenshot: Change Data Source dialog window]

  • Under the "Data source specification" section, check "Use connection string", then click the "Build" button.

[Screenshot: Add Connection dialog window]

  • Under the "Machine Data Source" tab, select the "Sample Microsoft Hive DSN" data source name, then click the "OK" button.

[Screenshot: Select Data Source dialog window]

  • A window titled "Microsoft Hive ODBC Driver Connection Dialog" will open. Enter an optional description, then type in the path to your Hive server, the port you will be using, and what database it should connect to. Indicate the Hive Server Type, and specify an authentication mechanism to use, then fill out the appropriate fields.

[Screenshot: Microsoft Hive ODBC Driver Connection dialog window]

  • Finally, click the "Test" button at the bottom to make sure you're able to connect. If the test succeeds, click the "OK" button, and you'll be back in the "Modify Connection" window. Enter the login information for your Hive service here.

Either use this data source, or copy the connection string it has built for you and use it within your application.
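Once the driver and DSN exist, the query side is ordinary ADO.NET over `System.Data.Odbc`. The sketch below is illustrative rather than authoritative: it assumes the "Sample Microsoft Hive DSN" configured in the steps above (or a connection string copied from Server Explorer instead of the builder), uses the standard ODBC `UID`/`PWD` keywords whose applicability depends on the authentication mechanism you chose in the driver dialog, and runs `SHOW TABLES` as an example query. On .NET Core/.NET 5+, `System.Data.Odbc` comes from the NuGet package of the same name.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Odbc;

// Sketch: query Hive through the Microsoft Hive ODBC Driver via System.Data.Odbc.
// Assumption: a DSN set up as in the Server Explorer steps, or a copied connection string.
public static class HiveOdbcSample
{
    // Builds a DSN-based connection string with standard ODBC credential keywords.
    public static string BuildConnectionString(string dsn, string user, string password)
    {
        var builder = new OdbcConnectionStringBuilder { Dsn = dsn };
        builder["UID"] = user;      // standard ODBC user-name keyword
        builder["PWD"] = password;  // standard ODBC password keyword
        return builder.ConnectionString;
    }

    // Opens a connection, runs SHOW TABLES and collects the first column of each row.
    public static List<string> ListTables(string connectionString)
    {
        var tables = new List<string>();
        using (var connection = new OdbcConnection(connectionString))
        using (var command = new OdbcCommand("SHOW TABLES", connection))
        {
            connection.Open();
            using (OdbcDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                    tables.Add(reader.GetString(0)); // first column: table name
            }
        }
        return tables;
    }
}
```

For example, `HiveOdbcSample.ListTables(HiveOdbcSample.BuildConnectionString("Sample Microsoft Hive DSN", "myuser", "mypassword"))` would list the tables in the DSN's default database, with the credentials being placeholders.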

The Thrift API is another way for other languages to access HDFS and Hive.

See if this helps; I have tried connecting to Hadoop via C#:

How to communicate to Hadoop via Hive using .NET/C#

Use the Hbase.Net library from https://hbasenet.codeplex.com/

Then you can connect to HBase/Hive as shown below:

        Client c = new Client("10.20.14.179", 9090, 1000000);
        var cli = c.TotalClients;
        var tableList = c.GetTableNames();

FYI, we are using the Hortonworks Sandbox, and it connects fine.

In the example above, 10.20.14.179 is the host and 9090 is the port.

Also, the following from https://community.hortonworks.com/questions/25101/is-there-a-way-to-connect-to-hbase-using-c.html might help:

There is no native C# HBase client. However, there are several options for interacting with HBase from C#.

  1. C# HBase Thrift client - Thrift allows for defining service endpoints and data models in a common format and using code generators to create language-specific bindings. HBase provides a Thrift server and definitions. There are many examples online for creating a C# HBase Thrift client.

  2. Marlin - Marlin is a C# client for interacting with Stargate (the HBase REST API) that ultimately became hbase-sdk-for-net. I have not personally tested this against HBase 1.x+, but since it uses Stargate, I expect it to work. If you are planning to use Stargate and implement your own client (which I would recommend over Thrift), make sure to use protobufs to avoid the JSON serialization overhead. Using an HTTP-based approach also makes it much easier to load-balance requests over multiple gateways.

  3. Phoenix Query Server - Phoenix is a SQL skin on HBase. Phoenix Query Server is a REST API for submitting SQL queries to Phoenix. Here is some example code; however, I have not yet tested it.

  4. Simba HBase ODBC Driver - Uses ODBC to connect to HBase. I've heard positive feedback on this approach, especially when used from tools like Tableau. This is not open source and requires purchasing a license.
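As a starting point for the "implement your own Stargate client" route in option 2, here is a minimal C# sketch using only `HttpClient`; it does not use Marlin or hbase-sdk-for-net. It assumes an HBase REST server on its default port 8080 with no authentication (adjust the host and port for your cluster, e.g. the sandbox). A GET on the server's root path returns the list of tables. The sketch requests JSON for readability; following the advice above you would switch the `Accept` header to `application/x-protobuf` and decode the protobuf response, which is not shown here.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Sketch: list HBase tables through the REST gateway (Stargate) with plain HttpClient.
// Assumption: REST server on default port 8080, no authentication.
public static class HBaseRestSample
{
    // GET on the root path "/" of the REST server lists the tables.
    public static HttpRequestMessage BuildListTablesRequest(string host, int port = 8080)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, new Uri($"http://{host}:{port}/"));
        // JSON for readability; "application/x-protobuf" avoids the JSON overhead.
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        return request;
    }

    // Sends the request and returns the raw response body (a JSON table list).
    public static async Task<string> ListTablesAsync(HttpClient http, string host, int port = 8080)
    {
        using var request = BuildListTablesRequest(host, port);
        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

Because each call is a stateless HTTP request, the same code can target a load balancer in front of several REST gateways, which is the advantage option 2 points out.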
