A Unix-like system will typically have a limit on the number of file handles open at any given time; on my Linux box, for example, it's currently 1024 (as reported by ulimit -n), though I could raise it within reason. There are good reasons for these limits, as open files are a burden on the system.
You haven't yet responded to my question about whether there are multiple occurrences of the same key in your input, i.e. whether several separate batches of data may need to be concatenated into each file. If that isn't the case, Pace's answer is handily the best you can do: all that work needs to be done anyway, and there's no sense in setting up a huge administration around such a simple sequence of events.
But if there are multiple messages in your input for the same key, it would be efficient to keep a large number of files open. I'd advise against trying to keep all 6000 open at once, though. Instead, I'd go for something like 500, opened on a first-come-first-served basis: you open files for the first 500 (or so) distinct message keys, chew through your entire input file writing data into those 500, then close them all upon hitting EOF on input. You will also need to keep a HashSet of keys already processed, because you then re-read your input file from the start, processing the next batch of 500 keys you didn't catch on the first round.
Rationale: Opening and closing a file is (usually) a costly operation; you do NOT want to open and close thousands of files more than once each if you can help it. So you keep as many handles open as possible, all of which get filled on a single pass through your input. Streaming sequentially through a single input file, on the other hand, is quite efficient, and even if you have to make 12 passes through your input file (6000 keys at 500 per pass), the time to do so will be almost negligible compared to the time needed to open and close 6000 other files.
Pseudocode:
processedSet = [ ]            // keys fully handled on earlier passes
keysWaiting = true
MAXFILE = 500
handlesMap = [ ]              // messageKey -> open file handle
while (keysWaiting) {
    keysWaiting = false
    open/rewind input file
    while (not EOF(input file)) {
        read message
        if (handlesMap.containsKey(messageKey)) {
            write data to handlesMap.get(messageKey)
        } else if (processedSet.contains(messageKey)) {
            continue // already processed on an earlier pass
        } else if (handlesMap.size() < MAXFILE) {
            handlesMap.put(messageKey, new FileOutputStream(messageKey + ".dat"))
            processedSet.add(messageKey)
            write data to handlesMap.get(messageKey)
        } else {
            keysWaiting = true // no free handle; defer this key to a later pass
        }
    }
    for all handlesMap.values() {
        close file handle
    }
    handlesMap.clear()
}
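For concreteness, here is a minimal runnable Java sketch of the same idea. The input format (one "key&lt;TAB&gt;data" record per line), the file name input.txt, and the class name KeySplitter are assumptions for illustration; adapt them to your actual message layout.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class KeySplitter {
    private static final int MAXFILE = 500; // handles kept open per pass

    public static void main(String[] args) throws IOException {
        Set<String> processed = new HashSet<>(); // keys finished on earlier passes
        boolean keysWaiting = true;

        while (keysWaiting) {
            keysWaiting = false;
            Map<String, Writer> handles = new HashMap<>();
            // One sequential pass over the input per batch of MAXFILE keys.
            try (BufferedReader in = new BufferedReader(new FileReader("input.txt"))) {
                String line;
                while ((line = in.readLine()) != null) {
                    int tab = line.indexOf('\t');
                    if (tab < 0) continue; // skip malformed lines
                    String key = line.substring(0, tab);
                    String data = line.substring(tab + 1);

                    Writer out = handles.get(key);
                    if (out != null) {
                        out.write(data);           // handle already open: append
                        out.write('\n');
                    } else if (processed.contains(key)) {
                        // already written out in full on an earlier pass
                    } else if (handles.size() < MAXFILE) {
                        out = new BufferedWriter(new FileWriter(key + ".dat"));
                        handles.put(key, out);
                        processed.add(key);
                        out.write(data);
                        out.write('\n');
                    } else {
                        keysWaiting = true; // no free handle; defer to a later pass
                    }
                }
            }
            // Close the whole batch before rewinding for the next pass.
            for (Writer out : handles.values()) {
                out.close();
            }
        }
    }
}

Note that each key's output file is created exactly once across all passes (guarded by the processed set), so a plain FileWriter without append mode is enough: all data for a key is collected during the single pass on which its handle is open.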