Missing Relay Log File after MySQL Crash

One of our slave databases crashed a few days ago due to disk issues.

Once everything was operational again, I noticed that replication had stopped on this slave. I tried to restart it, but without success.

I then checked the MySQL log and found the following:

110606 17:01:30 [ERROR] Failed to open the relay log './forbes-relay-bin.785212' (relay_log_pos 14931)
110606 17:01:30 [ERROR] Could not find target log during relay log initialization
110606 17:01:30 [ERROR] Failed to initialize the master info structure

MySQL was complaining that it couldn’t find the relay log file, and sure enough, when I looked in the directory, the file wasn’t there.

The issue

I searched around for a solution, but a clear answer was hard to find. Even the one that looked promising, from this post on the MySQL forum, turned out to be wrong.

The proposed solution from that post is:

CHANGE MASTER TO the position on the master where the slave was, the slave will get the transactions from the master binary logs again and you will lose nothing. Requires that the master still has those binary logs.

However, this is NOT the problem: the slave has not lost track of the master bin log position.

The issue is: the relay log is now outdated and needs to be regenerated.

Regenerate slave’s relay log files

Here are the steps to regenerate the log files:

  1. On the crashed slave, record the last master bin log file and position from before the crash.
  2. The slave should not be running. If it is, issue the STOP SLAVE command.
  3. Issue the RESET SLAVE command.
  4. Go to the MySQL data directory and make sure that the relay logs, master.info and relay-log.info have been deleted. RESET SLAVE should delete them (*).
  5. Issue CHANGE MASTER TO with the master bin log file and position that you recorded in step 1.
  6. Issue START SLAVE.

(*) Note: The first time I did this, MySQL didn’t delete the relay logs. Issuing RESET SLAVE a second time fixed it.
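As a rough sketch, the steps above could look like this in the mysql command-line client. The bin log file name and position below are hypothetical placeholders, not values from this incident. The sketch also assumes that the position recorded in step 1 is the one the slave had last executed, which SHOW SLAVE STATUS reports as Relay_Master_Log_File and Exec_Master_Log_Pos.

```sql
-- Step 1: record the master bin log file and position the slave had executed
-- (Relay_Master_Log_File and Exec_Master_Log_Pos in the output).
SHOW SLAVE STATUS\G

-- Steps 2 and 3: stop replication, then discard the old relay logs
-- together with master.info and relay-log.info.
STOP SLAVE;
RESET SLAVE;

-- Step 5: point the slave back at the recorded position.
-- 'mysql-bin.000123' and 4567 are hypothetical placeholders;
-- substitute the values recorded in step 1.
CHANGE MASTER TO
  MASTER_LOG_FILE = 'mysql-bin.000123',
  MASTER_LOG_POS  = 4567;

-- Step 6: start replication and confirm that both threads are running
-- (Slave_IO_Running and Slave_SQL_Running should both show Yes).
START SLAVE;
SHOW SLAVE STATUS\G
```

Depending on the MySQL version, RESET SLAVE may also clear the connection settings, in which case MASTER_HOST, MASTER_USER and MASTER_PASSWORD would need to be given again in the CHANGE MASTER TO statement.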

Replication should now start again. You will notice that the relay log files are renumbered from the beginning, i.e. starting with the suffix 000001.