How to merge some files using mapreduce?

I am going to merge some small files which are under the same dir using map/reduce. I saw someone say that using streaming would be very simple, but right now I can only use the Java map/reduce API. My current solution is to read the files record by record and write them into the same output file, but I think that is inefficient. Can I use a whole file's content as the mapper's value, so I can improve the I/O efficiency? Thanks for your reply!

Best answer

The "inefficient" way to do this is to just cat the files and write them back:

hadoop fs -cat /path/to/files/*.txt | hadoop fs -put - /output/path/blobbed.txt

Chances are you'll find that this does just fine, even with bigger data sets.
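If the job really has to stay in Java rather than the shell, the same "just concatenate, don't parse" idea from the answer above can be sketched locally with plain NIO. This is a minimal illustration, not the answer's own code; the class and file names are made up, and a real Hadoop job would instead use a custom `InputFormat` to hand whole files to the mapper:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MergeFiles {

    // Concatenate every regular file under `dir` into `out`.
    // Each input file is copied as one bulk stream (Files.copy),
    // rather than being read and rewritten record by record.
    static void merge(Path dir, Path out) throws IOException {
        List<Path> parts;
        try (Stream<Path> listing = Files.list(dir)) {
            parts = listing.filter(Files::isRegularFile)
                           .sorted()              // deterministic output order
                           .collect(Collectors.toList());
        }
        try (OutputStream sink = Files.newOutputStream(out)) {
            for (Path part : parts) {
                Files.copy(part, sink);            // one whole-file copy
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny demo: two small "part" files merged into one output file.
        Path dir = Files.createTempDirectory("parts");
        Files.writeString(dir.resolve("part-0.txt"), "first record\n");
        Files.writeString(dir.resolve("part-1.txt"), "second record\n");
        Path out = Files.createTempFile("merged", ".txt");
        merge(dir, out);
        System.out.print(Files.readString(out));
    }
}
```

Copying each file as one stream is the same win the question is after: the cost is dominated by sequential I/O instead of per-record parsing and writing.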

Other answers

No answers yet




Related questions
Spring Properties File

I have this J2EE web application developed using the Spring framework. I have a problem with rendering messages in nihongo characters from the properties file. I tried converting the file to ascii using ...

Logging a global ID in multiple components

I have a system which contains multiple applications connected together using JMS and Spring Integration. Messages get sent along a chain of applications. [App A] -> [App B] -> [App C] We set a ...

Java Library Size

If I'm given two Java libraries in Jar format, one having no bells and whistles, and the other having lots of them that will mostly go unused... my question is: How will the larger, mostly unused ...

How to get the Array Class for a given Class in Java?

I have a Class variable that holds a certain type and I need to get a variable that holds the corresponding array class. The best I could come up with is this: Class arrayOfFooClass = java.lang....

SQLite , Derby vs file system

I'm working on a Java desktop application that reads and writes from/to different files. I think a better solution would be to replace the file system by a SQLite database. How hard is it to migrate ...
