MySQL Master Crash, Slave Replication: Best Practices for Restoring the Master from the Slave

asenapconquidu
Aug 18, 2023
7 min read

We are building a simple master/slave MySQL configuration using asynchronous replication, with MySQL Enterprise 5.5.17 on both servers and InnoDB-based tables. If the master server crashes, we would like to offer our users the possibility of recovering the master from the most up-to-date database contents on the slave. On the master server, the database and the binary logs are stored on different disk devices for better reliability.

Suppose M1 crashes, you fail over to M2, and you then bring M1 back up. Your goal is to reestablish circular replication. After a crash, there is the possibility that replication has lost its place. Here is what to do:

A MySQL slave normally stores its position in the files master.info and relay-log.info, which are updated by the slave IO thread and the slave SQL thread respectively. The file master.info contains connection info and the replication coordinates showing how many events have been fetched from the master binary log. The file relay-log.info, on the other hand, records the position up to which the slave has applied those events. The MySQL reference manual describes these slave status logs in more detail.

This is a nice feature, but I wonder: will it cause a performance problem on the slave? In theory, if replication data is written to the InnoDB log file transactionally (which I assume goes through the InnoDB log buffer), will more flushing be done?

On a slave, replication involves 2 threads: the IO thread which copies the binary log of the master to a local copy called the relay log and the SQL thread which then executes the queries written in the relay log. The current position of each thread is stored in a file: master.info for the IO thread and relay-log.info for the SQL thread.
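Both positions can also be read at runtime. A minimal check, assuming a slave that has already been pointed at a master:

```sql
-- Run on the slave. In the output:
--   Master_Log_File / Read_Master_Log_Pos       : how far the IO thread has read
--   Relay_Master_Log_File / Exec_Master_Log_Pos : how far the SQL thread has executed
--   Slave_IO_Running / Slave_SQL_Running        : whether each thread is running
SHOW SLAVE STATUS;
```

The gap between the read position and the executed position is the part of the relay log that has been copied but not yet applied.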

So far, so good. The first problem is that these files are not synced to disk each time they are written to: whenever there is a crash, the stored positions are likely to be incorrect. MySQL 5.5 has a fix for this: you can set sync_master_info = 1 and sync_relay_log_info = 1 to make sure both files are written and synced to disk after each transaction. Syncing is not free of course, but if you have a write-back cache, these settings can be valuable.
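As a sketch, the two settings can be applied on a running slave like this. Both variables are global and dynamic, so no restart is needed, but the change is lost at the next restart unless the same values are also added to my.cnf:

```sql
-- Run on the slave: sync both status files to disk after every event group.
SET GLOBAL sync_master_info = 1;
SET GLOBAL sync_relay_log_info = 1;

-- Confirm the new values.
SHOW GLOBAL VARIABLES LIKE 'sync_%_info';
```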

But wait, even with sync_master_info = 1 and sync_relay_log_info = 1, bad things can happen. The reason is that replication information is written after the transaction is committed. So if a crash occurs after the transaction is committed and before the replication information is updated, the replication information will be wrong when the server restarts and a transaction could be executed twice. The effect will depend on the transaction: replication may still run fine, it may break, or inconsistencies may even be silently created.

MySQL 5.6 tackles this problem by letting us store replication information in tables instead of files (mysql.slave_relay_log_info table is created when relay_log_info_repository = TABLE and mysql.slave_master_info table is created with master_info_repository = TABLE). The idea is simple: we can include the update of the replication information inside the transaction, making sure it is always in sync with the data.
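A minimal sketch of switching to table-based repositories on a running MySQL 5.6 slave. It assumes the replication threads may be stopped briefly, since both variables can only be changed while they are stopped:

```sql
-- Run on the slave (MySQL 5.6 or later).
STOP SLAVE;
SET GLOBAL master_info_repository = 'TABLE';
SET GLOBAL relay_log_info_repository = 'TABLE';
START SLAVE;

-- The position update is only transactional if the tables use InnoDB.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'mysql'
  AND TABLE_NAME IN ('slave_master_info', 'slave_relay_log_info');

-- Position last applied by the SQL thread, as stored in the table.
SELECT Master_log_name, Master_log_pos
FROM mysql.slave_relay_log_info;
```

To keep the setting across restarts, the same two variables also have to be set in my.cnf.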

How often the IO thread position is written to the table is controlled by sync_master_info. The default in MySQL 5.6 is 10,000, meaning that the IO thread position is only updated every 10,000 events. This is clearly not enough to make the slave crash-safe. One solution is to set sync_master_info = 1 but, as mentioned, it may have a performance impact (this is why 1 is not the default setting).

However there is a more elegant solution by using relay_log_recovery = ON, which will require a MySQL restart. This setting makes sure that when the server starts up, position for the IO thread is recovered from the slave_relay_log_info table, which is always up-to-date. Thus you do not even need to store IO thread information in a table for the slave to be crash-safe. In other words, setting master_info_repository = TABLE is not necessary.
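Because relay_log_recovery is read at startup, it is set in the mysqld section of my.cnf rather than with SET GLOBAL. After the restart, a quick check along these lines shows whether the crash-safe combination is in effect:

```sql
-- Run on the slave after the restart.
-- For the setup described above, expect relay_log_recovery = ON
-- and relay_log_info_repository = TABLE.
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('relay_log_recovery',
                        'relay_log_info_repository',
                        'master_info_repository');
```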

%master_info%: These settings play no role in crash-safe replication or recovery. Master info (in table or file) seems to have no functional purpose. This is especially true with relay_log_recovery = ON, which makes relay logs disposable, so IO thread status (which updates master info) no longer matters. Master info is also out of date unless sync_master_info is a very low value (like 1), but a very low value should not be used because it can cause too much overhead (on storage, replication, or both). Moreover, master info is available and always up-to-date in SHOW SLAVE STATUS. So master info (in table or file) seems to have no purpose, but since the variables exist nonetheless I suggest master_info_repository = TABLE for uniformity with relay_log_info_repository = TABLE, and sync_master_info = 0 (the default value in MySQL 5.5; MySQL 5.6 defaults to 10,000).

Is there anything I can do on the slave that would essentially "roll back" to a given point in time, where I could then reset the master log number, position, etc.? If not, is there anything at all that I can do to get back in sync?

Yes, in theory pt-table-sync can fix any amount of replication drift, but it's not necessarily the most efficient way to correct large discrepancies. At some point, it's quicker and more efficient to trash the outdated replica and reinitialize it using a new backup from the master.
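If you do reinitialize the replica from a new backup, the last step is to point it at the coordinates that match that backup. A sketch, in which the log file name and position are placeholders rather than values from a real server:

```sql
-- Run on the slave after loading the fresh backup.
-- 'mysql-bin.000042' and 4 are placeholders: use the binary log
-- coordinates recorded on the master when the backup was taken.
STOP SLAVE;
CHANGE MASTER TO
  MASTER_LOG_FILE = 'mysql-bin.000042',
  MASTER_LOG_POS  = 4;
START SLAVE;
```

The host, user, and password from the previous CHANGE MASTER TO are kept when they are not specified again. By default, setting new coordinates also discards the existing relay logs.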

As you can see, Exec_Master_Log_Pos has been updated to the correct position to resume replication, i.e. 17048324. Further, as you can see in the error log, the binary log overwritten message is also there.

For larger database setups, companies use replication to ensure that data is passed to another server. This server is usually a secondary physical machine that imports data from the main publisher. The configuration is a "master-slave" setup where the master database is the original storage machine and the slave is the recipient of the replicated data. Replication configurations assume that you have two different MySQL servers set up. This article explains the benefits of replication and how to set up your MySQL environment.

Replication is also a type of disaster recovery database backup that's more efficient than storing data to disks. With replication, you can restore your master server with replicated data instead of digging into backup files.

The basic configuration is master-slave where the master handles the write transactions and the slave server only reads the data into a mirrored database. You can also set up master-master solutions, but this is for more advanced enterprise platforms. With a master-master setup, you can create a load balanced environment where the servers share the load between multiple transactions. The MySQL servers have a load balancer between the application and the databases, and the load balancer sends requests to the database that can handle each transaction with the best performance.

The binary log file is necessary for the master server in a replicated environment. You can also cap the size of each log file and control how many days of logs are kept. It's this file that feeds the slave server, so you should keep a relatively large amount of binary log data to ensure that the records needed for replication stay available.
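A sketch of how the current binary log settings can be inspected and the retention period adjusted on the MySQL 5.5 and 5.6 servers discussed here. The value 7 is only an illustration:

```sql
-- Run on the master.
SHOW BINARY LOGS;                               -- existing binary log files and their sizes
SHOW GLOBAL VARIABLES LIKE 'max_binlog_size';   -- size at which the server rotates to a new file
SHOW GLOBAL VARIABLES LIKE 'expire_logs_days';  -- 0 means logs are never purged automatically

-- Keep seven days of binary logs (illustrative value).
SET GLOBAL expire_logs_days = 7;
```

The retention period should comfortably exceed the longest time a slave is expected to be offline or lagging.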

When the slave server connects to its master replication service, it connects in the same way a client connects to the server. It requests a connection on the configured port, which is 3306 by default in MySQL. It's important to remember that any maximum connection threshold reached can cause issues with the replication service, so consider your replication service when you set up a maximum connection configuration.
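To see how much headroom the master has, the connection limit can be compared with actual usage. A minimal check:

```sql
-- Run on the master.
SHOW GLOBAL VARIABLES LIKE 'max_connections';    -- configured ceiling
SHOW GLOBAL STATUS LIKE 'Threads_connected';     -- connections open right now
SHOW GLOBAL STATUS LIKE 'Max_used_connections';  -- high-water mark since startup

-- Each connected slave shows up with 'Binlog Dump' in the Command column.
SHOW PROCESSLIST;
```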

When the slave connects, it reads the binary log file for a list of events. It's important to keep the interval between the slave's reads short. If the slave only reads every other day, you run the risk that the master no longer has the needed events in its binary logs. The result is a replication error and possible data loss.

The first step in setting up a replication system is to give your server an ID. This is done in the configuration file discussed in the previous chapter, in the mysqld section: a my.cnf for the master sets a unique server-id and enables the binary log with log-bin.
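Once the server has been restarted with its new configuration, a quick query can confirm that the settings were picked up:

```sql
-- Run on the master after the restart.
-- server_id should be the unique, non-zero value you configured,
-- and log_bin should be 1 (binary logging enabled).
SELECT @@global.server_id AS server_id,
       @@global.log_bin   AS log_bin;
```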

With the master database server configured, the administrator needs to create a user that has replication access. We discussed how to create a MySQL user, but creating this user is somewhat different in syntax. The following command creates your replication user.
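A sketch of such a statement, in which the account name, the host pattern, and the password are all placeholders to replace with your own values:

```sql
-- Run on the master.
-- 'repl', the 192.0.2.% host pattern, and the password are placeholders.
CREATE USER 'repl'@'192.0.2.%' IDENTIFIED BY 'choose-a-strong-password';
```

Restricting the host part to the slave's address or subnet keeps the account from being usable from anywhere else.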

The next line of code grants the user access to the replication database. The granted right is replication slave on all databases on the MySQL server. In this example, the user has the right to ping the master server for the data needed for the replicated tables.
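As a sketch, assuming a hypothetical replication account named 'repl'@'192.0.2.%' already exists on the master:

```sql
-- Run on the master. REPLICATION SLAVE is a global privilege,
-- so it has to be granted ON *.*
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.0.2.%';

-- Verify what the account is allowed to do.
SHOW GRANTS FOR 'repl'@'192.0.2.%';
```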

After you've set up your user, you need to create a backup. This backup will be transferred to the slave server. Since the slave server needs records from the master database, using this mysqldump output will reduce the load time of new data on the slave server. This is especially beneficial if you have an extremely large database that you want to replicate.
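The backup is only useful for replication if you know which binary log position it corresponds to. One way to capture that, sketched here on the assumption that a short write pause on the master is acceptable:

```sql
-- Session 1 on the master: block writes and keep this session open.
FLUSH TABLES WITH READ LOCK;

-- Note the File and Position columns; the slave will start from them.
SHOW MASTER STATUS;

-- ... run mysqldump from a second session while the lock is held ...

-- Back in session 1, once the dump has finished:
UNLOCK TABLES;
```

mysqldump also has a --master-data option that writes these coordinates into the dump file itself.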

The slave's configuration file differs from the master's, starting with the server ID: each of your MySQL servers should have a different ID assigned. The slave server is set with the same binary log settings, but its data is read-only, because the slave is completely dependent on the master. You don't want users inserting records on the slave without first inserting them on the master. Since the master doesn't poll the slave for data, inserting data on the slave leaves you with data on one server and not the other. The synchronization is one way, so data should only flow from master to slave.
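Putting the slave side together might look like the following sketch. The host name, account, password, and binary log coordinates are placeholders, and the backup from the master is assumed to be loaded already:

```sql
-- Run on the slave.
-- read_only blocks writes from ordinary users; it does not restrict
-- the replication threads or accounts with the SUPER privilege.
SET GLOBAL read_only = ON;

-- All values below are placeholders for your own environment.
CHANGE MASTER TO
  MASTER_HOST     = 'master.example.com',
  MASTER_PORT     = 3306,
  MASTER_USER     = 'repl',
  MASTER_PASSWORD = 'choose-a-strong-password',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS  = 4;

START SLAVE;
```

Setting read_only in my.cnf as well keeps the slave read-only after a restart.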

Christine Bell
