I'm using Hadoop to update some records in a MySQL database. The issue I'm seeing is that, in certain cases, multiple reducers are launched for the same key set. I've seen up to 2 reducers running on different slaves for the same key, which leads to both reducers updating the same record in the database.
I was thinking of turning off autocommit mode to alleviate this issue and doing the commit as part of the "cleanup" operation in the reducer, but I was wondering what to do with the reducer(s) that lag behind. Would the cleanup operation still be called for them? If so, is there a way to tell whether the reducer finished normally or not, since I'd like to call "rollback" on the reducer(s) that didn't finish processing the data entirely?
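
To make the idea concrete, here is a rough sketch of what I have in mind, written in plain Java without the Hadoop classes so that it stands on its own. Everything in it is hypothetical: `TransactionalReducerSketch`, its `run`/`reduce`/`cleanup` methods (which only imitate the reducer lifecycle), the `finishedNormally` flag, and the `recordingConnection` stub that stands in for a real MySQL connection. It is only meant to show the commit-or-rollback decision, not how Hadoop actually treats a lagging reducer.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.List;

// Hypothetical stand-in for a reducer: it imitates a setup/reduce/cleanup
// lifecycle but does not depend on any Hadoop class.
class TransactionalReducerSketch {
    private final Connection conn;
    private boolean finishedNormally = false;

    // Plays the role of setup(): switch off autocommit so that all updates
    // belong to one transaction.
    TransactionalReducerSketch(Connection conn) throws SQLException {
        this.conn = conn;
        conn.setAutoCommit(false);
    }

    // Imitation of the framework's run loop (an assumption, not Hadoop code):
    // the flag is set only after every key has been processed.
    void run(List<String> keys) throws SQLException {
        try {
            for (String key : keys) {
                reduce(key);
            }
            finishedNormally = true;
        } finally {
            cleanup();
        }
    }

    // Placeholder for reduce(): a real reducer would run its UPDATE here.
    // A null key simulates a failure part-way through the data.
    void reduce(String key) throws SQLException {
        if (key == null) {
            throw new SQLException("simulated failure while processing a key");
        }
    }

    // Commit only if the whole key set was processed, otherwise roll back.
    void cleanup() throws SQLException {
        if (finishedNormally) {
            conn.commit();
        } else {
            conn.rollback();
        }
    }

    // Test stub: a Connection proxy that records the names of the methods
    // called on it. Only void methods are invoked here, so returning null is safe.
    static Connection recordingConnection(List<String> calls) {
        return (Connection) Proxy.newProxyInstance(
                TransactionalReducerSketch.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                (proxy, method, args) -> {
                    calls.add(method.getName());
                    return null;
                });
    }
}
```

With a real connection in place of the stub, a run that processes every key would end in `commit`, and a run that throws part-way through would end in `rollback`. What this sketch cannot tell me is whether a reducer that Hadoop stops from the outside would reach `cleanup` at all, which is the part I'm unsure about.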