Posted over 4 years ago by Tony Darnell
In a couple of previous posts, I explained how to get an “Always Free” Oracle Cloud compute instance and how to install MySQL on it – as well as how to add a web server. I started my IT career (way back in 1989) using a (dumb) terminal and a 2400-baud modem to access a server. While I still use a terminal window and the command line, it is always nice to have access to a GUI. In this post, I will show you how to install and use a GUI on your Oracle Cloud compute instance so you can use a Virtual Network Computing (VNC) application to connect to your “Always Free” (or not-free) Oracle Cloud compute instance.

VNC is a graphical desktop-sharing system that uses the Remote Frame Buffer protocol to remotely control another computer. In other words, it is (almost) like having a monitor connected to your compute instance. Installing everything you need should take about twenty minutes (only because one yum install takes 13-15 minutes).

First, you will need to create your “Always Free” Oracle Cloud account and at least one free compute instance. (Of course, this will also work on a paid compute instance.) If you need help creating your free compute instance, you can follow the instructions in the first part of this post (installing MySQL is optional). Once you have your compute instance ready to go, or if you already have a compute instance running, you can continue with this post.

VNC Viewer

I am using a Mac, so I can use the Screen Sharing application that comes with the operating system (OS). If you don’t have a Mac, you will need to find a VNC application for your OS. I have also used the free (non-commercial-use only) version of VNC Connect from RealVNC, but you will need to buy a copy if you are using it for work. There are several free ones available, such as TeamViewer, TightVNC and TigerVNC. If you don’t use a Mac, I won’t be able to show you how to install or set up the VNC viewer you decide to use, but it should be easy to do. 
Whichever VNC app you choose should provide you with instructions. You should only have to input localhost and the port number, 5901.

Installing what you need on your compute instance

Log in to your compute instance. When I created my compute instance, I chose to install Oracle Linux. These instructions should work for other flavors of Linux, but if not, you can look for the equivalent packages for your OS and you might have to modify a few things.

You can use yum (a tool for downloading, installing, deleting, querying, and managing software packages) to install the GNOME desktop, which is a free and open-source desktop environment. This installation is 678 megabytes in size, and it will take about 13-15 minutes. You can remove the -y option from the yum command if you want to answer the single installation question of “Is this ok?” yourself.

Note: Normally I would post the entire output from a command, but the output is almost 6,000 lines long. I will replace the majority of the screen output with three dots (…).

$ sudo yum -y groups install "Server with GUI" --skip-broken
Loaded plugins: langpacks, ulninfo
Repository ol7_latest is listed more than once in the configuration
...
Transaction Summary
==================================================
Install  209 Packages (+659 Dependent packages)
Upgrade              (   3 Dependent packages)
Total download size: 678 M
Is this ok [y/d/N]: y
Downloading packages:
...
Complete!

Install the TigerVNC server. (I will suppress most of this output as well.)

$ sudo yum -y install tigervnc-server
Loaded plugins: langpacks, ulninfo
...
Resolving Dependencies
--> Running transaction check
---> Package tigervnc-server.x86_64 0:1.8.0-17.0.1.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved
====================================================================
 Package          Arch    Version           Repository  Size
====================================================================
Installing:
 tigervnc-server  x86_64  1.8.0-17.0.1.el7  ol7_latest  215 k

Transaction Summary
====================================================================
Install  1 Package
Total download size: 215 k
Installed size: 509 k
Downloading packages:
tigervnc-server-1.8.0-17.0.1.el7.x86_64.rpm | 215 kB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : tigervnc-server-1.8.0-17.0.1.el7.x86_64  1/1
  Verifying  : tigervnc-server-1.8.0-17.0.1.el7.x86_64  1/1

Installed:
  tigervnc-server.x86_64 0:1.8.0-17.0.1.el7

Complete!

Note: I believe the duplicate “Repository xxx is listed more than once in the configuration” messages are caused by a bug in yum. You can ignore them.

Configure the VNC server

You will be required to provide a password that you will need to remember in order to access this server via VNC. You can also enter a “view-only password” if you want someone to be able to connect to the server without being able to control anything (they can only view the screen). I skipped this option.

$ vncserver
You will require a password to access your desktops.
Password:
Verify:
Would you like to enter a view-only password (y/n)?
n
A view-only password is not used
xauth:  file /home/opc/.Xauthority does not exist

New 'instance-20191113-1544:1 (opc)' desktop is instance-20191113-1544:1

Creating default startup script /home/opc/.vnc/xstartup
Creating default config /home/opc/.vnc/config
Starting applications specified in /home/opc/.vnc/xstartup
Log file is /home/opc/.vnc/instance-20191113-1544:1.log

SSH Tunnel for VNC

I am going to create a tunnel for VNC through SSH, so I can punch through the VNC port, and so all of the data will be sent through an encrypted tunnel.

Note: There is an alternate way to access your compute instance via VNC by creating an Instance Console Connection, but it only provides you with a console connection (non-GUI). If you want to do that, instructions are available via this blog.

In a terminal window, issue the following command, with your public IP address at the end. This will create the SSH tunnel for you to use for VNC.

$ ssh -L 5901:localhost:5901 [email protected]

Now you are ready to connect to the instance using VNC. On the Mac, I open the Screen Sharing application, click the menu option “Connection” and then “New”. In the “Connect to” box, I enter localhost:5901, press “Connect”, and then enter the password I used when I ran the vncserver command earlier. If you are using another VNC viewer, simply enter localhost:5901, or you might have to enter localhost and the port 5901 in separate fields. Then, just open the connection with your VNC application, and you should see the Oracle Linux GUI appear. You will have to go through a few setup screens the first time you connect.

Alternatives to GNOME

If you are using the “Always Free” OCI instances, you can also consider two lightweight alternative GUIs: MATE or Xfce.

Multiple connections

If you want to have multiple connections open to separate OCI instances at the same time, you can do this by changing the outgoing port on your SSH tunnel. 
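To make the port mapping concrete, here is a minimal sketch of a helper that prints the tunnel command for the Nth simultaneous connection. The helper name `vnc_tunnel` and the example IP addresses (from the 203.0.113.0/24 documentation range) are hypothetical; the `opc` user is the default on Oracle Linux OCI instances.

```shell
# Hypothetical helper: print the SSH tunnel command for the Nth simultaneous
# VNC connection. The local (outgoing) port is 5900 + N, while the remote
# vncserver display :1 always listens on port 5901.
vnc_tunnel() {
  local n="$1" host="$2"
  printf 'ssh -L %d:localhost:5901 opc@%s\n' "$((5900 + n))" "$host"
}

vnc_tunnel 1 203.0.113.10   # first instance:  point your VNC viewer at localhost:5901
vnc_tunnel 2 203.0.113.11   # second instance: point your VNC viewer at localhost:5902
```

Running the printed command (or removing the `printf` indirection) opens the tunnel; each additional instance simply gets the next local port.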
The syntax for the ssh tunnel command contains two port numbers. The outgoing port (on your machine) is 5902, and the destination/incoming port is 5901. So, to open a second SSH tunnel to connect to a separate OCI instance, you can run this:

$ ssh -L 5902:localhost:5901 [email protected]

This syntax follows the ssh man page: -L [bind_address:]port:host:hostport.

Now you know how to use VNC to connect to your Oracle Cloud compute instance.

Many thanks to Philippe Vanhaesendonck for sharing his technical expertise.

Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc. MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn. Tony is the author of Twenty Forty-Four: The League of Patriots (visit http://2044thebook.com for more information) and the editor/illustrator of the NASA Graphics Standards Manual Remastered Edition (visit https://amzn.to/2oPFLI0 for more information).
Posted over 4 years ago by MySQL Performance Blog
As Grafana powers our star product – Percona Monitoring and Management (PMM) – we have developed a lot of experience creating Grafana dashboards over the last few years. In this article, I will share some of the considerations for designing Grafana dashboards. As usual, questions of design are quite subjective, and I do not expect you to choose to apply all of these to your dashboards, but I hope they will help you to think through your dashboard design better.

Design Practical Dashboards

Grafana features many panel types, and even more are available as plugins. It may be very attractive to use many of them in your dashboards, with many different visualization options. Do not! Stick to a few data visualization patterns, and only add additional visualizations when they provide practical value, not because they are cool. The Graph and Singlestat panel types probably cover 80% of use cases.

Do Not Place Too Many Graphs Side by Side

This will probably depend a lot on how your dashboards are used. If your dashboard is designed for large screens placed on the wall, you may be able to fit more graphs side by side; if your dashboard needs to scale down to a lower-resolution laptop screen, I would suggest sticking to 2-3 graphs in a row.

Use Proper Units

Grafana allows you to specify a unit for the data type displayed. Use it! Without a unit set, values will not be properly shortened and will be very hard to read; compare such a panel to one with the unit configured.

Mind Decimals

You can specify the number of digits to display after the decimal point, or leave it at the default. I found the default picking does not always work very well; for example, a panel axis can end up with far too many digits after the decimal point. 
Grafana also often picks three digits after the decimal point, as in the table below, which I find inconvenient – at a glance, it is hard to tell whether we’re dealing with a decimal point or with “,” as a thousands separator, so I may be looking at 2462 GiB there. While it is not feasible in this case, there are cases, such as data rates, where a 1000x difference in values is quite possible. Instead, I prefer setting it to one decimal (or zero if that is enough), which makes it clear that we’re not looking at thousands.

Label Your Axis

You can label your axis, which especially makes sense if the presentation is not as obvious as in this example: we’re using a negative value to plot writes to a swap file.

Use Shared Crosshair or Tooltip

In Dashboard Settings, you will find the “Graph Tooltip” option, which can be set to “Default”, “Shared Crosshair” or “Shared Tooltip”. The shared crosshair shows a line matching the same time on all panels, while the shared tooltip shows the tooltip value on all panels at the same time. You can pick what makes sense for you; my favorite is the tooltip setting, because it allows me to visually compare the same point in time without making the dashboard too slow and busy. Note there is a handy shortcut, Ctrl-O, which allows you to cycle between these settings for any dashboard.

Pick Colors

If you’re displaying truly dynamic information, you will likely have to rely on Grafana’s automatic color assignment, but if not, you can pick specific colors for all values being plotted. This will prevent colors from potentially being re-assigned to different values without you planning to do so. When picking colors, you also want to make sure they make logical sense. For example, I think for free memory “green” is a better color than “red”. As you pick the colors, use the same colors for the same type of information when you show it on different panels if possible, because it makes the dashboard easier to understand. 
I would even suggest sticking to the same (or similar) colors for the same kind of data – if you have many panels which show disk input and output, using similar colors across them can be a good idea.

Fill Stacking Graphs

Grafana does not require it, but I would suggest you use filling when you display stacked data, and avoid filling when you’re plotting multiple independent values. Take a look at these graphs: in the first graph, I need to look at the actual plotted value to understand what I’m looking at. In the second graph, that line value is meaningless; what is valuable is the filled amount, and I can see at a glance what amount of the cache (the blue value) has shrunk. I prefer using a fill factor of 6+ so it is easier to match the fill colors with the colors in the legend. For the same reason, I prefer not to use the fill gradient on such graphs, as it makes it much harder to see the color and the filled volume.

Do Not Abuse Double Axes

Graphs that use a double axis are much harder to understand. I used to use them very often, but now I avoid them when possible, only using them when I absolutely want to limit the number of panels. Note that in this case I think a gradient fits OK, because there is only one value displayed as a line, so you can’t get confused about whether to look at the total value or the “filled volume”.

Separate Data of Different Scales on Different Graphs

I used to plot InnoDB rows read and written on the same graph. It is quite common for reads to be 100x higher in volume than writes, crowding the writes out and making even significant changes in writes very hard to see. Splitting them onto different graphs solved this issue.

Consider Staircase Graphs

In monitoring applications, we often display average rates computed over a period of time. If this is the case, we do not know how the rate was changing within that period, and it would be misleading to show that. This especially makes sense if you’re displaying only a few data points. 
Let’s look at this graph, viewed at one-hour resolution. It visually suggests that the number of rows read was falling from 16:00 to 18:00. If we compare it to the staircase graph, that graph simply shows us that the value for 18:00 was higher than for 17:00, but makes no claim about the change within the hour. This display, however, has another issue. Let’s look at the same data set at 5-minute resolution: we can see the average value from 16:00 to 17:00 was lower than from 17:00 to 18:00, but this is NOT what the lower-resolution staircase graph shows – there, the value for 17:00 to 18:00 is actually lower! The reason is that if we compute rate() on the Prometheus side over one hour at 17:00, it will be returned as a data point for 17:00, even though this average rate is really for 16:00 to 17:00, while the staircase graph will plot it from 17:00 to 18:00 until a new value is available. It is off by one hour. To fix it, you need to shift the data appropriately. In Prometheus, which we use in PMM, I can use the offset operator to shift the data so it is displayed correctly.

Provide Multiple Resolutions

I’m a big fan of being able to see the data on the same dashboard at different resolutions, which can be done through a special dashboard variable of type “Interval”. High-resolution data can provide a great level of detail but can be very volatile. While a lower resolution can hide this level of detail, it does show trends better.

Multiple Aggregates for the Same Metrics

To get even more insight, you can consider plotting the same metric with different aggregates applied to it. In this case, we are looking at the same variable – threads_running – but at its average value over a period of time versus its max (peak) value. Both of them are meaningful in different ways. You can also notice here that points are used for the max value instead of a line. 
This is in general a good practice for highly volatile data, as plotting a line for something which changes wildly is messy and does not provide much value.

Use Help and Panel Links

If you fill out a description for the panel, it will be visible when you place your mouse over the tiny “i” sign. This is very helpful for explaining what the panel shows and how to use its data. You can use Markdown for formatting. You can also provide one or more panel links that you can use for additional help or drill-down. With newer Grafana versions, you can even define a more advanced drill-down, which can contain different URLs based on the series you are looking at, as well as other templating variables.

Summary

This list of considerations and best practices for designing Grafana dashboards is by no means complete, but I hope you pick up an idea or two which will allow you to create better dashboards!
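Several of the settings above (units, decimals, axis labels, per-series colors, shared tooltip) live in the panel JSON. Here is an illustrative fragment in the style of the Grafana 6-era graph panel; the field names follow that panel model, but the series names and hex colors are my own examples, so treat this as a sketch rather than a complete panel definition:

```json
{
  "yaxes": [
    { "format": "bytes", "decimals": 1, "label": "Memory" },
    { "format": "short", "show": false }
  ],
  "tooltip": { "shared": true },
  "aliasColors": { "Free Memory": "#7EB26D", "Used Memory": "#E24D42" }
}
```

Editing these fields through the panel editor UI is equivalent; the JSON view is mainly useful for reviewing or templating many panels at once.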
Posted over 4 years ago by Frederic Descamps
In the first part of this article related to the check plugin, we discovered information retrieved from the binary logs. This part is about what Performance_Schema and SYS can provide us about the queries hitting the MySQL database. Currently, three methods are available:

getSlowQuery()
getQueryTempDisk()
getFullTableScanQuery()

The method names should be self-explanatory. The fully-qualified calls are ext.check.getSlowQuery(), ext.check.getQueryTempDisk() and ext.check.getFullTableScanQuery(). Some methods allow a select parameter if only SELECT statements should be returned. When only one query is returned (the default), it’s also possible to interactively run several actions:

EXPLAIN (traditional MySQL query execution plan)
EXPLAIN FORMAT=JSON
EXPLAIN FORMAT=TREE
EXPLAIN ANALYZE

A video illustrating these operations is available in the original post. Don’t hesitate to try those plugins and report any bugs, enhancements, feature requests – and your own plugins! The GitHub repository is https://github.com/lefred/mysqlshell-plugins
Posted over 4 years ago by Severalnines
In the first part of this blog, we covered a deployment walkthrough of MySQL InnoDB Cluster, with an example of how applications can connect to the cluster via a dedicated read/write port. In this operation walkthrough, we are going to show examples of how to monitor, manage and scale the InnoDB Cluster as part of ongoing cluster maintenance operations. We’ll use the same cluster that we deployed in the first part of the blog. The following diagram shows our architecture: a three-node MySQL Group Replication setup and one application server running MySQL Router. All servers are running Ubuntu 18.04 Bionic.

MySQL InnoDB Cluster Command Options

Before we move further with some examples and explanations, it's good to know that you can get an explanation of each function of the cluster component by using the help() function, as shown below:

$ mysqlsh
MySQL|localhost:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");
MySQL|db1:3306 ssl|JS> cluster = dba.getCluster();
MySQL|db1:3306 ssl|JS> cluster.help()

The following list shows the available functions in MySQL Shell 8.0.18, for MySQL Community Server 8.0.18:

addInstance(instance[, options]) - Adds an instance to the cluster.
checkInstanceState(instance) - Verifies the instance GTID state in relation to the cluster.
describe() - Describes the structure of the cluster.
disconnect() - Disconnects all internal sessions used by the cluster object.
dissolve([options]) - Deactivates replication and unregisters the ReplicaSets from the cluster.
forceQuorumUsingPartitionOf(instance[, password]) - Restores the cluster from quorum loss.
getName() - Retrieves the name of the cluster.
help([member]) - Provides help about this class and its members.
options([options]) - Lists the cluster configuration options.
rejoinInstance(instance[, options]) - Rejoins an instance to the cluster.
removeInstance(instance[, options]) - Removes an instance from the cluster.
rescan([options]) - Rescans the cluster. 
resetRecoveryAccountsPassword(options) - Resets the password of the recovery accounts of the cluster.
setInstanceOption(instance, option, value) - Changes the value of a configuration option in a cluster member.
setOption(option, value) - Changes the value of a configuration option for the whole cluster.
setPrimaryInstance(instance) - Elects a specific cluster member as the new primary.
status([options]) - Describes the status of the cluster.
switchToMultiPrimaryMode() - Switches the cluster to multi-primary mode.
switchToSinglePrimaryMode([instance]) - Switches the cluster to single-primary mode.

We are going to look into most of the functions available to help us monitor, manage and scale the cluster.

Monitoring MySQL InnoDB Cluster Operations

Cluster Status

To check the cluster status, first use the MySQL Shell command line and connect as clusteradmin@{one-of-the-db-nodes}:

$ mysqlsh
MySQL|localhost:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");

Then, create an object called "cluster" from the "dba" global object, which provides access to InnoDB Cluster administration functions using the AdminAPI (check out the MySQL Shell API docs):

MySQL|db1:3306 ssl|JS> cluster = dba.getCluster();

Then, we can use the object name to call the API functions:

MySQL|db1:3306 ssl|JS> cluster.status()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db1:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "db1:3306": {"address": "db1:3306", "mode": "R/W", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18"},
            "db2:3306": {"address": "db2:3306", "mode": "R/O", "readReplicas": {}, "replicationLag": "00:00:09.061918", "role": "HA", "status": "ONLINE", "version": "8.0.18"},
            "db3:3306": {"address": "db3:3306", "mode": "R/O", "readReplicas": {}, "replicationLag": "00:00:09.447804", "role": "HA", "status": "ONLINE", "version": "8.0.18"}
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db1:3306"
}

The output is pretty long, but we can filter it by using the map structure. For example, if we would like to view the replication lag for db3 only, we could do the following:

MySQL|db1:3306 ssl|JS> cluster.status().defaultReplicaSet.topology["db3:3306"].replicationLag
00:00:09.447804

Note that replication lag is something that will happen in Group Replication, depending on the write intensity of the primary member in the replica set and the group_replication_flow_control_* variables. We are not going to cover this topic in detail here. Check out this blog post to understand more about Group Replication performance and flow control.

Another similar function is describe(), but this one is a bit simpler. It describes the structure of the cluster, including all its information, ReplicaSets and instances:

MySQL|db1:3306 ssl|JS> cluster.describe()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "topology": [
            {"address": "db1:3306", "label": "db1:3306", "role": "HA"},
            {"address": "db2:3306", "label": "db2:3306", "role": "HA"},
            {"address": "db3:3306", "label": "db3:3306", "role": "HA"}
        ],
        "topologyMode": "Single-Primary"
    }
}

Similarly, we can filter the JSON output using the map structure:

MySQL|db1:3306 ssl|JS> cluster.describe().defaultReplicaSet.topologyMode
Single-Primary

When the primary node went down (in this case, db1), the output returned the following:

MySQL|db1:3306 ssl|JS> cluster.status()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db2:3306",
        "ssl": "REQUIRED",
        "status": "OK_NO_TOLERANCE",
        "statusText": "Cluster is NOT tolerant to any failures. 1 member is not active",
        "topology": {
            "db1:3306": {"address": "db1:3306", "mode": "n/a", "readReplicas": {}, "role": "HA", "shellConnectError": "MySQL Error 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104", "status": "(MISSING)"},
            "db2:3306": {"address": "db2:3306", "mode": "R/W", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18"},
            "db3:3306": {"address": "db3:3306", "mode": "R/O", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18"}
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db2:3306"
}

Pay attention to the status OK_NO_TOLERANCE: the cluster is still up and running, but it cannot tolerate any further failure while one of the three nodes is unavailable. The primary role has been taken over by db2 automatically, and database connections from the application will be rerouted to the correct node if they connect through MySQL Router. Once db1 comes back online, we should see the following status:

MySQL|db1:3306 ssl|JS> cluster.status()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "db2:3306",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "db1:3306": {"address": "db1:3306", "mode": "R/O", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18"},
            "db2:3306": {"address": "db2:3306", "mode": "R/W", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18"},
            "db3:3306": {"address": "db3:3306", "mode": "R/O", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18"}
        },
        "topologyMode": "Single-Primary"
    },
    "groupInformationSourceMember": "db2:3306"
}

It shows that db1 is now available, but serves as a secondary with read-only enabled. The primary role is still assigned to db2 until something goes wrong with that node, at which point the cluster will automatically fail over to the next available node.

Check Instance State

We can check the state of a MySQL node before planning to add it to the cluster by using the checkInstanceState() function. It compares the GTIDs the instance has executed with the executed/purged GTIDs on the cluster to determine if the instance is valid for the cluster. The following shows the instance state of db3 when it was in standalone mode, before it was part of the cluster:

MySQL|db1:3306 ssl|JS> cluster.checkInstanceState("db3:3306")
Cluster.checkInstanceState: The instance 'db3:3306' is a standalone instance but is part of a different InnoDB Cluster (metadata exists, instance does not belong to that metadata, and Group Replication is not active).

If the node is already part of the cluster, you should get the following:

MySQL|db1:3306 ssl|JS> cluster.checkInstanceState("db3:3306")
Cluster.checkInstanceState: The instance 'db3:3306' already belongs to the ReplicaSet: 'default'.

Monitor Any "Queryable" State

With MySQL Shell, we can use the built-in \show and \watch commands to monitor any administrative query in real time. For example, we can get the real-time value of threads connected by using:

MySQL|db1:3306 ssl|JS> \show query SHOW STATUS LIKE '%thread%';

Or get the current MySQL processlist:

MySQL|db1:3306 ssl|JS> \show query SHOW FULL PROCESSLIST

We can then use the \watch command to run a report in the same way as the \show command, but it refreshes the results at regular intervals until you cancel the command with Ctrl + C, as shown in the following examples:

MySQL|db1:3306 ssl|JS> \watch query SHOW STATUS LIKE '%thread%';
MySQL|db1:3306 ssl|JS> \watch query --interval=1 SHOW FULL PROCESSLIST

The default refresh interval is 2 seconds. You can change the value by using the --interval flag and specifying a value from 0.1 up to 86400. 
MySQL InnoDB Cluster Management Operations Primary Switchover Primary instance is the node that can be considered as the leader in a replication group, that has the ability to perform read and write operations. Only one primary instance per cluster is allowed in single-primary topology mode. This topology is also known as replica set and is the recommended topology mode for Group Replication with protection against locking conflicts. To perform primary instance switchover, login to one of the database nodes as the clusteradmin user and specify the database node that you want to promote by using the setPrimaryInstance() function: MySQL|db1:3306 ssl|JS> shell.connect("clusteradmin@db1:3306"); MySQL|db1:3306 ssl|JS> cluster.setPrimaryInstance("db1:3306"); Setting instance 'db1:3306' as the primary instance of cluster 'my_innodb_cluster'... Instance 'db2:3306' was switched from PRIMARY to SECONDARY. Instance 'db3:3306' remains SECONDARY. Instance 'db1:3306' was switched from SECONDARY to PRIMARY. WARNING: The cluster internal session is not the primary member anymore. For cluster management operations please obtain a fresh cluster handle using .getCluster(). The instance 'db1:3306' was successfully elected as primary. We just promoted db1 as the new primary component, replacing db2 while db3 remains as the secondary node. Shutting Down the Cluster The best way to shut down the cluster gracefully by stopping the MySQL Router service first (if it's running) on the application server: $ myrouter/stop.sh The above step provides cluster protection against accidental writes by the applications. Then shutdown one database node at a time using the standard MySQL stop command, or perform system shutdown as you wish: $ systemctl stop mysql Starting the Cluster After a Shutdown If your cluster suffers from a complete outage or you want to start the cluster after a clean shutdown, you can ensure it is reconfigured correctly using dba.rebootClusterFromCompleteOutage() function. 
It simply brings a cluster back ONLINE when all members are OFFLINE. In the event that a cluster has completely stopped, the instances must be started and only then can the cluster be started. Thus, ensure all MySQL servers are started and running. On every database node, see if the mysqld process is running: $ ps -ef | grep -i mysql Then, pick one database server to be the primary node and connect to it via MySQL shell: MySQL|JS> shell.connect("clusteradmin@db1:3306"); Run the following command from that host to start them up: MySQL|db1:3306 ssl|JS> cluster = dba.rebootClusterFromCompleteOutage() You will be presented with the following questions: After the above completes, you can verify the cluster status: MySQL|db1:3306 ssl|JS> cluster.status() At this point, db1 is the primary node and the writer. The rest will be the secondary members. If you would like to start the cluster with db2 or db3 as the primary, you could use the shell.connect() function to connect to the corresponding node and perform the rebootClusterFromCompleteOutage() from that particular node. You can then start the MySQL Router service (if it's not started) and let the application connect to the cluster again. Setting Member and Cluster Options To get the cluster-wide options, simply run: MySQL|db1:3306 ssl|JS> cluster.options() The above will list out the global options for the replica set and also individual options per member in the cluster. This function changes an InnoDB Cluster configuration option in all members of the cluster. The supported options are: clusterName: string value to define the cluster name. exitStateAction: string value indicating the group replication exit state action. memberWeight: integer value with a percentage weight for automatic primary election on failover. failoverConsistency: string value indicating the consistency guarantees that the cluster provides. consistency: string value indicating the consistency guarantees that the cluster provides. 
expelTimeout: integer value to define the time period in seconds that cluster members should wait for a non-responding member before evicting it from the cluster. autoRejoinTries: integer value to define the number of times an instance will attempt to rejoin the cluster after being expelled. disableClone: boolean value used to disable the clone usage on the cluster. Similar to other function, the output can be filtered in map structure. The following command will only list out the options for db2: MySQL|db1:3306 ssl|JS> cluster.options().defaultReplicaSet.topology["db2:3306"] You can also get the above list by using the help() function: MySQL|db1:3306 ssl|JS> cluster.help("setOption") The following command shows an example to set an option called memberWeight to 60 (from 50) on all members: MySQL|db1:3306 ssl|JS> cluster.setOption("memberWeight", 60) Setting the value of 'memberWeight' to '60' in all ReplicaSet members ... Successfully set the value of 'memberWeight' to '60' in the 'default' ReplicaSet. We can also perform configuration management automatically via MySQL Shell by using setInstanceOption() function and pass the database host, the option name and value accordingly: MySQL|db1:3306 ssl|JS> cluster = dba.getCluster() MySQL|db1:3306 ssl|JS> cluster.setInstanceOption("db1:3306", "memberWeight", 90) The supported options are: exitStateAction: string value indicating the group replication exit state action. memberWeight: integer value with a percentage weight for automatic primary election on failover. autoRejoinTries: integer value to define the number of times an instance will attempt to rejoin the cluster after being expelled. label a string identifier of the instance. Switching to Multi-Primary/Single-Primary Mode By default, InnoDB Cluster is configured with single-primary, only one member capable of performing reads and writes at one given time. This is the safest and recommended way to run the cluster and suitable for most workloads.  
However, if the application logic can handle distributed writes, it's probably a good idea to switch to multi-primary mode, where all members in the cluster are able to process reads and writes at the same time. To switch from single-primary to multi-primary mode, simply use the switchToMultiPrimaryMode() function: MySQL|db1:3306 ssl|JS> cluster.switchToMultiPrimaryMode() Switching cluster 'my_innodb_cluster' to Multi-Primary mode... Instance 'db2:3306' was switched from SECONDARY to PRIMARY. Instance 'db3:3306' was switched from SECONDARY to PRIMARY. Instance 'db1:3306' remains PRIMARY. The cluster successfully switched to Multi-Primary mode. Verify with: MySQL|db1:3306 ssl|JS> cluster.status() { "clusterName": "my_innodb_cluster", "defaultReplicaSet": { "name": "default", "ssl": "REQUIRED", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "db1:3306": { "address": "db1:3306", "mode": "R/W", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18" }, "db2:3306": { "address": "db2:3306", "mode": "R/W", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18" }, "db3:3306": { "address": "db3:3306", "mode": "R/W", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18" } }, "topologyMode": "Multi-Primary" }, "groupInformationSourceMember": "db1:3306" } In multi-primary mode, all nodes are primary and able to process reads and writes. 
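The "can tolerate up to ONE failure" wording in statusText comes from quorum arithmetic: Group Replication needs a majority of members to stay available. A minimal Python sketch of that calculation (illustrative only, not MySQL's code):

```python
def tolerated_failures(members: int) -> int:
    """Group Replication requires a majority (quorum) of members, so a
    group of n members survives floor((n-1)/2) simultaneous failures."""
    if members < 1:
        raise ValueError("a group needs at least one member")
    return (members - 1) // 2

# A three-node cluster "can tolerate up to ONE failure", as status() reports:
print(tolerated_failures(3))  # 1
# Five members tolerate two failures; four members still only tolerate one,
# which is why odd-sized groups are the usual recommendation:
print(tolerated_failures(5), tolerated_failures(4))  # 2 1
```

This also explains the OK_NO_TOLERANCE status seen earlier: with one of three members down, the remaining two still hold quorum but cannot afford another loss.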
When sending a new connection via MySQL Router on the single-writer port (6446), the connection will be sent to only one node, in this example, db1: (app-server)$ for i in {1..3}; do mysql -usbtest -p -h192.168.10.40 -P6446 -e 'select @@hostname, @@read_only, @@super_read_only'; done +------------+-------------+-------------------+ | @@hostname | @@read_only | @@super_read_only | +------------+-------------+-------------------+ | db1 | 0 | 0 | +------------+-------------+-------------------+ +------------+-------------+-------------------+ | @@hostname | @@read_only | @@super_read_only | +------------+-------------+-------------------+ | db1 | 0 | 0 | +------------+-------------+-------------------+ +------------+-------------+-------------------+ | @@hostname | @@read_only | @@super_read_only | +------------+-------------+-------------------+ | db1 | 0 | 0 | +------------+-------------+-------------------+ If the application connects to the multi-writer port (6447), the connection will be load balanced via a round-robin algorithm across all members: (app-server)$ for i in {1..3}; do mysql -usbtest -ppassword -h192.168.10.40 -P6447 -e 'select @@hostname, @@read_only, @@super_read_only'; done +------------+-------------+-------------------+ | @@hostname | @@read_only | @@super_read_only | +------------+-------------+-------------------+ | db2 | 0 | 0 | +------------+-------------+-------------------+ +------------+-------------+-------------------+ | @@hostname | @@read_only | @@super_read_only | +------------+-------------+-------------------+ | db3 | 0 | 0 | +------------+-------------+-------------------+ +------------+-------------+-------------------+ | @@hostname | @@read_only | @@super_read_only | +------------+-------------+-------------------+ | db1 | 0 | 0 | +------------+-------------+-------------------+ As you can see from the output above, all nodes are capable of processing reads and writes with read_only = OFF.
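The multi-writer port's behavior above can be pictured as a simple rotation over the member list. A minimal Python sketch (member list assumed; this is an illustration of round-robin selection, not MySQL Router's actual implementation):

```python
from itertools import cycle

# Hypothetical member list; Router discovers the real one from cluster metadata.
members = ["db1:3306", "db2:3306", "db3:3306"]

def round_robin(members):
    """Yield members in endless rotation, the way the multi-writer
    port (6447) spreads incoming connections in this walkthrough."""
    yield from cycle(members)

picker = round_robin(members)
# Four consecutive connections wrap around to the first member again:
print([next(picker) for _ in range(4)])
```

The actual order Router starts from depends on its runtime state, which is why the blog's output began with db2 rather than db1.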
You can distribute safe writes across all members by connecting to the multi-writer port (6447), and send conflicting or heavy writes to the single-writer port (6446). To switch back to single-primary mode, use the switchToSinglePrimaryMode() function and specify one member as the primary node. In this example, we chose db1: MySQL|db1:3306 ssl|JS> cluster.switchToSinglePrimaryMode("db1:3306"); Switching cluster 'my_innodb_cluster' to Single-Primary mode... Instance 'db2:3306' was switched from PRIMARY to SECONDARY. Instance 'db3:3306' was switched from PRIMARY to SECONDARY. Instance 'db1:3306' remains PRIMARY. WARNING: Existing connections that expected a R/W connection must be disconnected, i.e. instances that became SECONDARY. The cluster successfully switched to Single-Primary mode. At this point, db1 is the primary node, configured with read-only disabled, and the rest are configured as secondaries with read-only enabled. MySQL InnoDB Cluster Scaling Operations Scaling Up (Adding a New DB Node) When adding a new instance, the node has to be provisioned before it's allowed to participate in the replication group. The provisioning process will be handled automatically by MySQL. You can also check beforehand whether the node is valid to join the cluster by using the checkInstanceState() function, as previously explained. To add a new DB node, use the addInstance() function and specify the host: MySQL|db1:3306 ssl|JS> cluster.addInstance("db3:3306") The following is what you would get when adding a new instance: Verify the new cluster size with: MySQL|db1:3306 ssl|JS> cluster.status() //or cluster.describe() MySQL Router will automatically include the added node, db3, into the load balancing set.
Scaling Down (Removing a Node) To remove a node, connect to any of the DB nodes except the one that we are going to remove and use the removeInstance() function with the database instance name: MySQL|db1:3306 ssl|JS> shell.connect("clusteradmin@db1:3306"); MySQL|db1:3306 ssl|JS> cluster = dba.getCluster() MySQL|db1:3306 ssl|JS> cluster.removeInstance("db3:3306") The following is what you would get when removing an instance: Verify the new cluster size with: MySQL|db1:3306 ssl|JS> cluster.status() //or cluster.describe() MySQL Router will automatically exclude the removed node, db3, from the load balancing set. Adding a New Replication Slave We can scale out the InnoDB Cluster with an asynchronous replication slave that replicates from any of the cluster nodes. A slave is loosely coupled to the cluster and will be able to handle a heavy load without affecting the performance of the cluster. The slave can also be a live copy of the database for disaster recovery purposes. In multi-primary mode, you can use the slave as a dedicated MySQL read-only processor to scale out the read workload, perform analytics operations, or serve as a dedicated backup server. On the slave server, download the latest APT config package, install it (choose MySQL 8.0 in the configuration wizard), install the APT key, update the repository list and install MySQL server: $ wget https://repo.mysql.com/apt/ubuntu/pool/mysql-apt-config/m/mysql-apt-config/mysql-apt-config_0.8.14-1_all.deb $ dpkg -i mysql-apt-config_0.8.14-1_all.deb $ apt-key adv --recv-keys --keyserver ha.pool.sks-keyservers.net 5072E1F5 $ apt-get update $ apt-get -y install mysql-server mysql-shell Modify the MySQL configuration file to prepare the server as a replication slave.
Open the configuration file via text editor: $ vim /etc/mysql/mysql.conf.d/mysqld.cnf And append the following lines: server-id = 1044 # must be unique across all nodes gtid-mode = ON enforce-gtid-consistency = ON log-slave-updates = OFF read-only = ON super-read-only = ON expire-logs-days = 7 Restart the MySQL server on the slave to apply the changes: $ systemctl restart mysql On one of the InnoDB Cluster servers (we chose db3), create a replication slave user, followed by a full MySQL dump: $ mysql -uroot -p mysql> CREATE USER 'repl_user'@'192.168.0.44' IDENTIFIED BY 'password'; mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'192.168.0.44'; mysql> exit $ mysqldump -uroot -p --single-transaction --master-data=1 --all-databases --triggers --routines --events > dump.sql Transfer the dump file from db3 to the slave: $ scp dump.sql root@slave:~ And perform the restoration on the slave: $ mysql -uroot -p < dump.sql With master-data=1, our MySQL dump file automatically configures the GTID executed and purged values. We can verify them with the following statement on the slave server after the restoration: $ mysql -uroot -p mysql> show global variables like '%gtid_%'; +----------------------------------+----------------------------------------------+ | Variable_name | Value | +----------------------------------+----------------------------------------------+ | binlog_gtid_simple_recovery | ON | | enforce_gtid_consistency | ON | | gtid_executed | d4790339-0694-11ea-8fd5-02f67042125d:1-45886 | | gtid_executed_compression_period | 1000 | | gtid_mode | ON | | gtid_owned | | | gtid_purged | d4790339-0694-11ea-8fd5-02f67042125d:1-45886 | +----------------------------------+----------------------------------------------+ Looks good.
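The check above boils down to comparing two GTID sets. A simplified Python sketch of that comparison, using the values from the output (it handles only single-interval, comma-separated GTID sets, which is a simplifying assumption; real GTID sets can contain multiple intervals per UUID):

```python
def parse_gtid_set(value: str) -> dict:
    """Parse a GTID set like 'uuid:1-45886' into {uuid: set_of_txn_ids}.
    Simplified sketch: assumes one interval per comma-separated entry."""
    gtids = {}
    for entry in value.split(","):
        uuid, _, interval = entry.strip().partition(":")
        start, _, end = interval.partition("-")
        end = end or start  # a lone number means a single transaction
        gtids[uuid] = set(range(int(start), int(end) + 1))
    return gtids

executed = parse_gtid_set("d4790339-0694-11ea-8fd5-02f67042125d:1-45886")
purged = parse_gtid_set("d4790339-0694-11ea-8fd5-02f67042125d:1-45886")

# After restoring a --master-data dump, gtid_purged should equal gtid_executed,
# so MASTER_AUTO_POSITION can resume replication right after the dump point.
print(executed == purged)  # True
```

If the two sets diverge (for example, writes landed on the slave after the restore), replication with MASTER_AUTO_POSITION = 1 will fail or skip transactions, which is exactly what checkInstanceState() guards against on the cluster side.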
We can then configure the replication link and start the replication threads on the slave: mysql> CHANGE MASTER TO MASTER_HOST = '192.168.10.43', MASTER_USER = 'repl_user', MASTER_PASSWORD = 'password', MASTER_AUTO_POSITION = 1; mysql> START SLAVE; Verify the replication state and ensure the following statuses return 'Yes': mysql> show slave status\G ... Slave_IO_Running: Yes Slave_SQL_Running: Yes ... At this point, our architecture is now looking like this: Common Issues with MySQL InnoDB Clusters Memory Exhaustion When using MySQL Shell with MySQL 8.0, we were constantly getting the following error when the instances were configured with 1GB of RAM: Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug (MySQL Error 1135) Upgrading each host to 2GB of RAM solved the problem. Apparently, MySQL 8.0 components require more RAM to operate efficiently. Lost Connection to MySQL Server In case the primary node goes down, you would probably see the "lost connection to MySQL server" error when trying to query something in the current session: MySQL|db1:3306 ssl|JS> cluster.status() Cluster.status: Lost connection to MySQL server during query (MySQL Error 2013) MySQL|db1:3306 ssl|JS> cluster.status() Cluster.status: MySQL server has gone away (MySQL Error 2006) The solution is to re-declare the object once more: MySQL|db1:3306 ssl|JS> cluster = dba.getCluster() MySQL|db1:3306 ssl|JS> cluster.status() At this point, it will connect to the newly promoted primary node to retrieve the cluster status. Node Eviction and Expulsion In the event that communication between nodes is interrupted, the problematic node will be evicted from the cluster without any delay, which is not ideal if you are running on an unstable network.
This is what it looks like on db2 (the problematic node): 2019-11-14T07:07:59.344888Z 0 [ERROR] [MY-011505] [Repl] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.' 2019-11-14T07:07:59.371966Z 0 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.' Meanwhile from db1, it saw db2 go offline: 2019-11-14T07:07:44.086021Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address db2:3306 has become unreachable.' 2019-11-14T07:07:46.087216Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: db2:3306' To tolerate a bit of delay on node eviction, we can set a higher timeout value before a node is expelled from the group. The default value is 0, which means expel immediately. Use the setOption() function to set the expelTimeout value, for example to 30 seconds: MySQL|db1:3306 ssl|JS> cluster.setOption("expelTimeout", 30) However, thanks to Frédéric Descamps from Oracle who pointed this out: instead of relying on expelTimeout, it's recommended to set the autoRejoinTries option instead. The value represents the number of times an instance will attempt to rejoin the cluster after being expelled. A good number to start with is 3, meaning the expelled member will try to rejoin the cluster 3 times, waiting 5 minutes after each unsuccessful auto-rejoin attempt before the next try. To set this value cluster-wide, we can use the setOption() function: MySQL|db1:3306 ssl|JS> cluster.setOption("autoRejoinTries", 3) WARNING: Each cluster member will only proceed according to its exitStateAction if auto-rejoin fails (i.e. all retry attempts are exhausted). Setting the value of 'autoRejoinTries' to '3' in all ReplicaSet members ... Successfully set the value of 'autoRejoinTries' to '3' in the 'default' ReplicaSet.
Conclusion For MySQL InnoDB Cluster, most of the management and monitoring operations can be performed directly via MySQL Shell (only available from MySQL 5.7.21 and later). Tags:  MySQL innodb group replication clustering
Posted over 4 years ago by Severalnines
In the first part of this blog, we covered a deployment walkthrough of MySQL InnoDB Cluster with an example of how applications can connect to the cluster via a dedicated read/write port. In this operation walkthrough, we are going to show examples of how to monitor, manage and scale the InnoDB Cluster as part of the ongoing cluster maintenance operations. We’ll use the same cluster that we deployed in the first part of the blog. The following diagram shows our architecture: We have a three-node MySQL Group Replication setup and one application server running MySQL Router. All servers are running on Ubuntu 18.04 Bionic. MySQL InnoDB Cluster Command Options Before we move further with some examples and explanations, it's good to know that you can get an explanation of each function of the cluster component by using the help() function, as shown below: $ mysqlsh MySQL|localhost:3306 ssl|JS> shell.connect("clusteradmin@db1:3306"); MySQL|db1:3306 ssl|JS> cluster = dba.getCluster(); MySQL|db1:3306 ssl|JS> cluster.help() The following list shows the available functions on MySQL Shell 8.0.18, for MySQL Community Server 8.0.18: addInstance(instance[, options])- Adds an Instance to the cluster. checkInstanceState(instance)- Verifies the instance gtid state in relation to the cluster. describe()- Describe the structure of the cluster. disconnect()- Disconnects all internal sessions used by the cluster object. dissolve([options])- Deactivates replication and unregisters the ReplicaSets from the cluster. forceQuorumUsingPartitionOf(instance[, password])- Restores the cluster from quorum loss. getName()- Retrieves the name of the cluster. help([member])- Provides help about this class and it's members options([options])- Lists the cluster configuration options. rejoinInstance(instance[, options])- Rejoins an Instance to the cluster. removeInstance(instance[, options])- Removes an Instance from the cluster. rescan([options])- Rescans the cluster.
resetRecoveryAccountsPassword(options)- Reset the password of the recovery accounts of the cluster. setInstanceOption(instance, option, value)- Changes the value of a configuration option in a Cluster member. setOption(option, value)- Changes the value of a configuration option for the whole cluster. setPrimaryInstance(instance)- Elects a specific cluster member as the new primary. status([options])- Describe the status of the cluster. switchToMultiPrimaryMode()- Switches the cluster to multi-primary mode. switchToSinglePrimaryMode([instance])- Switches the cluster to single-primary mode. We are going to look into most of the functions available to help us monitor, manage and scale the cluster. Monitoring MySQL InnoDB Cluster Operations Cluster Status To check the cluster status, first launch the MySQL Shell command line and connect as clusteradmin@{one-of-the-db-nodes}: $ mysqlsh MySQL|localhost:3306 ssl|JS> shell.connect("clusteradmin@db1:3306"); Then, create an object called "cluster" from the "dba" global object, which provides access to InnoDB cluster administration functions using the AdminAPI (check out the MySQL Shell API docs): MySQL|db1:3306 ssl|JS> cluster = dba.getCluster(); We can then use the object name to call the API functions: MySQL|db1:3306 ssl|JS> cluster.status() { "clusterName": "my_innodb_cluster", "defaultReplicaSet": { "name": "default", "primary": "db1:3306", "ssl": "REQUIRED", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "db1:3306": { "address": "db1:3306", "mode": "R/W", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18" }, "db2:3306": { "address": "db2:3306", "mode": "R/O", "readReplicas": {}, "replicationLag": "00:00:09.061918", "role": "HA", "status": "ONLINE", "version": "8.0.18" }, "db3:3306": { "address": "db3:3306", "mode": "R/O", "readReplicas": {}, "replicationLag": "00:00:09.447804", "role": "HA",
"status": "ONLINE", "version": "8.0.18" } }, "topologyMode": "Single-Primary" }, "groupInformationSourceMember": "db1:3306" } The output is pretty long but we can filter it out by using the map structure. For example, if we would like to view the replication lag for db3 only, we could do like the following: MySQL|db1:3306 ssl|JS> cluster.status().defaultReplicaSet.topology["db3:3306"].replicationLag 00:00:09.447804 Note that replication lag is something that will happen in group replication, depending on the write intensivity of the primary member in the replica set and the group_replication_flow_control_* variables. We are not going to cover this topic in detail here. Check out this blog post to understand further on the group replication performance and flow control. Another similar function is the describe() function, but this one is a bit more simple. It describes the structure of the cluster including all its information, ReplicaSets and Instances: MySQL|db1:3306 ssl|JS> cluster.describe() { "clusterName": "my_innodb_cluster", "defaultReplicaSet": { "name": "default", "topology": [ { "address": "db1:3306", "label": "db1:3306", "role": "HA" }, { "address": "db2:3306", "label": "db2:3306", "role": "HA" }, { "address": "db3:3306", "label": "db3:3306", "role": "HA" } ], "topologyMode": "Single-Primary" } } Similarly, we can filter the JSON output using map structure: MySQL|db1:3306 ssl|JS> cluster.describe().defaultReplicaSet.topologyMode Single-Primary When the primary node went down (in this case, is db1), the output returned the following: MySQL|db1:3306 ssl|JS> cluster.status() { "clusterName": "my_innodb_cluster", "defaultReplicaSet": { "name": "default", "primary": "db2:3306", "ssl": "REQUIRED", "status": "OK_NO_TOLERANCE", "statusText": "Cluster is NOT tolerant to any failures. 
1 member is not active", "topology": { "db1:3306": { "address": "db1:3306", "mode": "n/a", "readReplicas": {}, "role": "HA", "shellConnectError": "MySQL Error 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 104", "status": "(MISSING)" }, "db2:3306": { "address": "db2:3306", "mode": "R/W", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18" }, "db3:3306": { "address": "db3:3306", "mode": "R/O", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18" } }, "topologyMode": "Single-Primary" }, "groupInformationSourceMember": "db2:3306" } Pay attention to the status OK_NO_TOLERANCE: the cluster is still up and running, but it can't tolerate any further failure now that one out of three nodes is unavailable. The primary role has been taken over by db2 automatically, and the database connections from the application will be rerouted to the correct node if they connect through MySQL Router. Once db1 comes back online, we should see the following status: MySQL|db1:3306 ssl|JS> cluster.status() { "clusterName": "my_innodb_cluster", "defaultReplicaSet": { "name": "default", "primary": "db2:3306", "ssl": "REQUIRED", "status": "OK", "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.", "topology": { "db1:3306": { "address": "db1:3306", "mode": "R/O", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18" }, "db2:3306": { "address": "db2:3306", "mode": "R/W", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18" }, "db3:3306": { "address": "db3:3306", "mode": "R/O", "readReplicas": {}, "replicationLag": null, "role": "HA", "status": "ONLINE", "version": "8.0.18" } }, "topologyMode": "Single-Primary" }, "groupInformationSourceMember": "db2:3306" } It shows that db1 is now available but serves as a secondary with read-only enabled.
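Because cluster.status() returns a JSON document, the same map-style drill-down works from any scripting language, which is handy for alerting scripts. A minimal Python sketch over a trimmed copy of the output above (only a subset of fields is reproduced here):

```python
import json

# A trimmed cluster.status() document, matching the shape of the output above.
status = json.loads("""
{
  "defaultReplicaSet": {
    "primary": "db2:3306",
    "status": "OK_NO_TOLERANCE",
    "topology": {
      "db1:3306": {"mode": "n/a", "status": "(MISSING)"},
      "db2:3306": {"mode": "R/W", "status": "ONLINE"},
      "db3:3306": {"mode": "R/O", "status": "ONLINE"}
    }
  }
}
""")

# The same drill-down as cluster.status().defaultReplicaSet.topology["db3:3306"]:
replica_set = status["defaultReplicaSet"]
print(replica_set["topology"]["db3:3306"]["mode"])  # R/O

# List members that are not ONLINE, e.g. to page an operator:
missing = [host for host, member in replica_set["topology"].items()
           if member["status"] != "ONLINE"]
print(missing)  # ['db1:3306']
```

In practice you would feed this from `mysqlsh --execute "print(cluster.status())"` or similar rather than a hard-coded string; the JSON shape is the same either way.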
The primary role is still assigned to db2 until something goes wrong with that node, at which point the primary role will automatically fail over to the next available node. Check Instance State We can check the state of a MySQL node before planning to add it to the cluster by using the checkInstanceState() function. It compares the instance's executed GTIDs with the executed/purged GTIDs of the cluster to determine whether the instance is valid to join the cluster. The following shows the instance state of db3 when it was in standalone mode, before it became part of the cluster: MySQL|db1:3306 ssl|JS> cluster.checkInstanceState("db3:3306") Cluster.checkInstanceState: The instance 'db3:3306' is a standalone instance but is part of a different InnoDB Cluster (metadata exists, instance does not belong to that metadata, and Group Replication is not active). If the node is already part of the cluster, you should get the following: MySQL|db1:3306 ssl|JS> cluster.checkInstanceState("db3:3306") Cluster.checkInstanceState: The instance 'db3:3306' already belongs to the ReplicaSet: 'default'. Monitor Any "Queryable" State With MySQL Shell, we can now use the built-in \show and \watch commands to monitor any administrative query in real time. For example, we can get the real-time value of threads connected by using: MySQL|db1:3306 ssl|JS> \show query SHOW STATUS LIKE '%thread%'; Or get the current MySQL processlist: MySQL|db1:3306 ssl|JS> \show query SHOW FULL PROCESSLIST We can then use the \watch command to run a report in the same way as the \show command, but it refreshes the results at regular intervals until you cancel the command using Ctrl + C, as shown in the following examples: MySQL|db1:3306 ssl|JS> \watch query SHOW STATUS LIKE '%thread%'; MySQL|db1:3306 ssl|JS> \watch query --interval=1 SHOW FULL PROCESSLIST The default refresh interval is 2 seconds. You can change the value by using the --interval flag and specifying a value from 0.1 up to 86400.
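Conceptually, \watch is just "re-run the report every N seconds". A minimal Python sketch of that loop, with a stub in place of a real query (illustrative helper only, not part of mysqlsh; a real report callable would query the server):

```python
import time

def watch(report, interval=2.0, iterations=3):
    """Re-run a report callable at a fixed interval and collect its results,
    loosely mimicking MySQL Shell's \\watch (which defaults to 2 seconds).
    mysqlsh itself loops until Ctrl + C; we cap iterations to stay runnable."""
    results = []
    for _ in range(iterations):
        results.append(report())
        time.sleep(interval)
    return results

# Stand-in for e.g. a SHOW STATUS LIKE '%thread%' query.
samples = watch(lambda: {"Threads_connected": 5}, interval=0.01, iterations=3)
print(len(samples))  # 3
```

The 0.1-86400 bounds on --interval map directly onto the `interval` argument here; anything shorter than the query's own runtime would simply make the loop run back-to-back.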
MySQL InnoDB Cluster Management Operations Primary Switchover The primary instance is the node considered the leader of the replication group, with the ability to perform read and write operations. Only one primary instance per cluster is allowed in single-primary topology mode. This topology is also known as a replica set and is the recommended topology mode for Group Replication, with protection against locking conflicts. To perform a primary instance switchover, log in to one of the database nodes as the clusteradmin user and specify the database node that you want to promote by using the setPrimaryInstance() function: MySQL|db1:3306 ssl|JS> shell.connect("clusteradmin@db1:3306"); MySQL|db1:3306 ssl|JS> cluster.setPrimaryInstance("db1:3306"); Setting instance 'db1:3306' as the primary instance of cluster 'my_innodb_cluster'... Instance 'db2:3306' was switched from PRIMARY to SECONDARY. Instance 'db3:3306' remains SECONDARY. Instance 'db1:3306' was switched from SECONDARY to PRIMARY. WARNING: The cluster internal session is not the primary member anymore. For cluster management operations please obtain a fresh cluster handle using .getCluster(). The instance 'db1:3306' was successfully elected as primary. We just promoted db1 as the new primary, replacing db2, while db3 remains a secondary node. Shutting Down the Cluster The best way to shut down the cluster gracefully is to stop the MySQL Router service first (if it's running) on the application server: $ myrouter/stop.sh The above step provides cluster protection against accidental writes by the applications. Then shut down one database node at a time using the standard MySQL stop command, or perform a system shutdown as you wish: $ systemctl stop mysql Starting the Cluster After a Shutdown If your cluster suffers from a complete outage or you want to start the cluster after a clean shutdown, you can ensure it is reconfigured correctly using the dba.rebootClusterFromCompleteOutage() function.
It simply brings a cluster back ONLINE when all members are OFFLINE. In the event that a cluster has completely stopped, the instances must be started first, and only then can the cluster be started. Thus, ensure all MySQL servers are started and running. On every database node, check whether the mysqld process is running:

$ ps -ef | grep -i mysql

Then, pick one database server to be the primary node and connect to it via MySQL Shell:

MySQL|JS> shell.connect("clusteradmin@db1:3306");

Run the following command from that host to bring the cluster back up:

MySQL|db1:3306 ssl|JS> cluster = dba.rebootClusterFromCompleteOutage()

You will be presented with a series of confirmation questions. After the reboot completes, you can verify the cluster status:

MySQL|db1:3306 ssl|JS> cluster.status()

At this point, db1 is the primary node and the writer. The rest will be the secondary members. If you would like to start the cluster with db2 or db3 as the primary, you could use the shell.connect() function to connect to the corresponding node and run rebootClusterFromCompleteOutage() from that particular node. You can then start the MySQL Router service (if it's not already started) and let the application connect to the cluster again.

Setting Member and Cluster Options

To get the cluster-wide options, simply run:

MySQL|db1:3306 ssl|JS> cluster.options()

The above lists the global options for the replica set, as well as the individual options per member in the cluster. To change a cluster-wide option, use the setOption() function, which changes an InnoDB Cluster configuration option in all members of the cluster. The supported options are:

clusterName: string value to define the cluster name.
exitStateAction: string value indicating the group replication exit state action.
memberWeight: integer value with a percentage weight for automatic primary election on failover.
failoverConsistency: string value indicating the consistency guarantees that the cluster provides (deprecated alias of consistency).
consistency: string value indicating the consistency guarantees that the cluster provides.
expelTimeout: integer value to define the time period in seconds that cluster members should wait for a non-responding member before evicting it from the cluster.
autoRejoinTries: integer value to define the number of times an instance will attempt to rejoin the cluster after being expelled.
disableClone: boolean value used to disable clone usage on the cluster.

As with other functions, the output can be filtered as a map structure. The following command lists only the options for db2:

MySQL|db1:3306 ssl|JS> cluster.options().defaultReplicaSet.topology["db2:3306"]

You can also get the above list by using the help() function:

MySQL|db1:3306 ssl|JS> cluster.help("setOption")

The following shows an example of setting the memberWeight option to 60 (from the default of 50) on all members:

MySQL|db1:3306 ssl|JS> cluster.setOption("memberWeight", 60)
Setting the value of 'memberWeight' to '60' in all ReplicaSet members ...

Successfully set the value of 'memberWeight' to '60' in the 'default' ReplicaSet.

We can also set an option on a single member by using the setInstanceOption() function, passing the database host, the option name, and the value:

MySQL|db1:3306 ssl|JS> cluster = dba.getCluster()
MySQL|db1:3306 ssl|JS> cluster.setInstanceOption("db1:3306", "memberWeight", 90)

The supported options are:

exitStateAction: string value indicating the group replication exit state action.
memberWeight: integer value with a percentage weight for automatic primary election on failover.
autoRejoinTries: integer value to define the number of times an instance will attempt to rejoin the cluster after being expelled.
label: a string identifier of the instance.

Switching to Multi-Primary/Single-Primary Mode

By default, InnoDB Cluster is configured in single-primary mode, with only one member capable of performing reads and writes at any given time. This is the safest and recommended way to run the cluster, and suitable for most workloads.
However, if the application logic can handle distributed writes, it's probably a good idea to switch to multi-primary mode, where all members in the cluster are able to process reads and writes at the same time. To switch from single-primary to multi-primary mode, simply use the switchToMultiPrimaryMode() function:

MySQL|db1:3306 ssl|JS> cluster.switchToMultiPrimaryMode()
Switching cluster 'my_innodb_cluster' to Multi-Primary mode...

Instance 'db2:3306' was switched from SECONDARY to PRIMARY.
Instance 'db3:3306' was switched from SECONDARY to PRIMARY.
Instance 'db1:3306' remains PRIMARY.

The cluster successfully switched to Multi-Primary mode.

Verify with:

MySQL|db1:3306 ssl|JS> cluster.status()
{
    "clusterName": "my_innodb_cluster",
    "defaultReplicaSet": {
        "name": "default",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "db1:3306": {
                "address": "db1:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            },
            "db2:3306": {
                "address": "db2:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            },
            "db3:3306": {
                "address": "db3:3306",
                "mode": "R/W",
                "readReplicas": {},
                "replicationLag": null,
                "role": "HA",
                "status": "ONLINE",
                "version": "8.0.18"
            }
        },
        "topologyMode": "Multi-Primary"
    },
    "groupInformationSourceMember": "db1:3306"
}

In multi-primary mode, all nodes are primary and able to process reads and writes.
When sending a new connection via MySQL Router on the single-writer port (6446), the connection will be sent to only one node, in this example db1:

(app-server)$ for i in {1..3}; do mysql -usbtest -p -h192.168.10.40 -P6446 -e 'select @@hostname, @@read_only, @@super_read_only'; done
+------------+-------------+-------------------+
| @@hostname | @@read_only | @@super_read_only |
+------------+-------------+-------------------+
| db1        |           0 |                 0 |
+------------+-------------+-------------------+
+------------+-------------+-------------------+
| @@hostname | @@read_only | @@super_read_only |
+------------+-------------+-------------------+
| db1        |           0 |                 0 |
+------------+-------------+-------------------+
+------------+-------------+-------------------+
| @@hostname | @@read_only | @@super_read_only |
+------------+-------------+-------------------+
| db1        |           0 |                 0 |
+------------+-------------+-------------------+

If the application connects to the multi-writer port (6447), the connection will be load balanced via a round-robin algorithm to all members:

(app-server)$ for i in {1..3}; do mysql -usbtest -ppassword -h192.168.10.40 -P6447 -e 'select @@hostname, @@read_only, @@super_read_only'; done
+------------+-------------+-------------------+
| @@hostname | @@read_only | @@super_read_only |
+------------+-------------+-------------------+
| db2        |           0 |                 0 |
+------------+-------------+-------------------+
+------------+-------------+-------------------+
| @@hostname | @@read_only | @@super_read_only |
+------------+-------------+-------------------+
| db3        |           0 |                 0 |
+------------+-------------+-------------------+
+------------+-------------+-------------------+
| @@hostname | @@read_only | @@super_read_only |
+------------+-------------+-------------------+
| db1        |           0 |                 0 |
+------------+-------------+-------------------+

As you can see from the output above, all nodes are capable of processing reads and writes with read_only = OFF.
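The two routing behaviours observed above can be sketched as follows. This is an illustrative model, not MySQL Router's actual implementation; the node ordering in the round-robin pool simply mirrors the output above.

```python
from itertools import cycle

# Illustrative sketch of the two routing strategies seen above (NOT MySQL
# Router's real code): the single-writer port pins every connection to the
# first available node, while the multi-writer port hands out back ends in
# round-robin order.

class FirstAvailable:
    """Single-writer behaviour: always route to the first healthy node."""
    def __init__(self, nodes):
        self.nodes = nodes
    def route(self):
        return self.nodes[0]          # always db1 while it is healthy

class RoundRobin:
    """Multi-writer behaviour: rotate through the pool, one per connection."""
    def __init__(self, nodes):
        self._cycle = cycle(nodes)
    def route(self):
        return next(self._cycle)

single_writer = FirstAvailable(["db1", "db2", "db3"])   # port 6446
multi_writer = RoundRobin(["db2", "db3", "db1"])        # port 6447

print([single_writer.route() for _ in range(3)])  # ['db1', 'db1', 'db1']
print([multi_writer.route() for _ in range(3)])   # ['db2', 'db3', 'db1']
```

The three identical db1 rows on port 6446 and the db2/db3/db1 rotation on port 6447 correspond to the two strategies in this sketch.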
You can distribute safe writes to all members by connecting to the multi-writer port (6447), and send the conflicting or heavy writes to the single-writer port (6446). To switch back to single-primary mode, use the switchToSinglePrimaryMode() function and specify one member as the primary node. In this example, we chose db1:

MySQL|db1:3306 ssl|JS> cluster.switchToSinglePrimaryMode("db1:3306");
Switching cluster 'my_innodb_cluster' to Single-Primary mode...

Instance 'db2:3306' was switched from PRIMARY to SECONDARY.
Instance 'db3:3306' was switched from PRIMARY to SECONDARY.
Instance 'db1:3306' remains PRIMARY.

WARNING: Existing connections that expected a R/W connection must be disconnected, i.e. instances that became SECONDARY.

The cluster successfully switched to Single-Primary mode.

At this point, db1 is the primary node, configured with read-only disabled, and the rest will be configured as secondaries with read-only enabled.

MySQL InnoDB Cluster Scaling Operations

Scaling Up (Adding a New DB Node)

When adding a new instance, the node has to be provisioned first before it's allowed to participate in the replication group. The provisioning process is handled automatically by MySQL. You can also check first whether the node is valid to join the cluster by using the checkInstanceState() function, as previously explained. To add a new DB node, use the addInstance() function and specify the host:

MySQL|db1:3306 ssl|JS> cluster.addInstance("db3:3306")

The following is what you would get when adding a new instance: Verify the new cluster size with:

MySQL|db1:3306 ssl|JS> cluster.status() //or cluster.describe()

MySQL Router will automatically include the added node, db3, in the load-balancing set.
Scaling Down (Removing a Node)

To remove a node, connect to any of the DB nodes except the one that we are going to remove, and use the removeInstance() function with the database instance name:

MySQL|db1:3306 ssl|JS> shell.connect("clusteradmin@db1:3306");
MySQL|db1:3306 ssl|JS> cluster = dba.getCluster()
MySQL|db1:3306 ssl|JS> cluster.removeInstance("db3:3306")

The following is what you would get when removing an instance: Verify the new cluster size with:

MySQL|db1:3306 ssl|JS> cluster.status() //or cluster.describe()

MySQL Router will automatically exclude the removed node, db3, from the load-balancing set.

Adding a New Replication Slave

We can scale out the InnoDB Cluster with an asynchronous replication slave that replicates from any of the cluster nodes. A slave is loosely coupled to the cluster and will be able to handle a heavy load without affecting the performance of the cluster. The slave can also be a live copy of the cluster for disaster recovery purposes. In multi-primary mode, you can use the slave as a dedicated MySQL read-only server to scale out the read workload, perform analytics operations, or act as a dedicated backup server. On the slave server, download the latest APT config package, install it (choose MySQL 8.0 in the configuration wizard), install the APT key, update the package list, and install the MySQL server and shell:

$ wget https://repo.mysql.com/apt/ubuntu/pool/mysql-apt-config/m/mysql-apt-config/mysql-apt-config_0.8.14-1_all.deb
$ dpkg -i mysql-apt-config_0.8.14-1_all.deb
$ apt-key adv --recv-keys --keyserver ha.pool.sks-keyservers.net 5072E1F5
$ apt-get update
$ apt-get -y install mysql-server mysql-shell

Modify the MySQL configuration file to prepare the server as a replication slave.
Open the configuration file with a text editor:

$ vim /etc/mysql/mysql.conf.d/mysqld.cnf

And append the following lines:

server-id = 1044 # must be unique across all nodes
gtid-mode = ON
enforce-gtid-consistency = ON
log-slave-updates = OFF
read-only = ON
super-read-only = ON
expire-logs-days = 7

Restart the MySQL server on the slave to apply the changes:

$ systemctl restart mysql

On one of the InnoDB Cluster servers (we chose db3), create a replication slave user, followed by a full MySQL dump:

$ mysql -uroot -p
mysql> CREATE USER 'repl_user'@'192.168.0.44' IDENTIFIED BY 'password';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'192.168.0.44';
mysql> exit
$ mysqldump -uroot -p --single-transaction --master-data=1 --all-databases --triggers --routines --events > dump.sql

Transfer the dump file from db3 to the slave:

$ scp dump.sql root@slave:~

And perform the restoration on the slave:

$ mysql -uroot -p < dump.sql

With master-data=1, our MySQL dump file will automatically configure the GTID executed and purged values. We can verify them with the following statement on the slave server after the restoration:

$ mysql -uroot -p
mysql> show global variables like '%gtid%';
+----------------------------------+----------------------------------------------+
| Variable_name                    | Value                                        |
+----------------------------------+----------------------------------------------+
| binlog_gtid_simple_recovery      | ON                                           |
| enforce_gtid_consistency         | OFF                                          |
| gtid_executed                    | d4790339-0694-11ea-8fd5-02f67042125d:1-45886 |
| gtid_executed_compression_period | 1000                                         |
| gtid_mode                        | OFF                                          |
| gtid_owned                       |                                              |
| gtid_purged                      | d4790339-0694-11ea-8fd5-02f67042125d:1-45886 |
| session_track_gtids              | OFF                                          |
+----------------------------------+----------------------------------------------+

Looks good.
We can then configure the replication link and start the replication threads on the slave:

mysql> CHANGE MASTER TO MASTER_HOST = '192.168.10.43', MASTER_USER = 'repl_user', MASTER_PASSWORD = 'password', MASTER_AUTO_POSITION = 1;
mysql> START SLAVE;

Verify the replication state and ensure the following statuses return 'Yes':

mysql> show slave status\G
...
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...

At this point, our architecture now looks like this:

Common Issues with MySQL InnoDB Clusters

Memory Exhaustion

When using MySQL Shell with MySQL 8.0, we were constantly getting the following error when the instances were configured with 1GB of RAM:

Can't create a new thread (errno 11); if you are not out of available memory, you can consult the manual for a possible OS-dependent bug (MySQL Error 1135)

Upgrading each host to 2GB of RAM solved the problem. Apparently, MySQL 8.0 components require more RAM to operate efficiently.

Lost Connection to MySQL Server

In case the primary node goes down, you would probably see the "lost connection to MySQL server" error when trying to query something in the current session:

MySQL|db1:3306 ssl|JS> cluster.status()
Cluster.status: Lost connection to MySQL server during query (MySQL Error 2013)

MySQL|db1:3306 ssl|JS> cluster.status()
Cluster.status: MySQL server has gone away (MySQL Error 2006)

The solution is to re-declare the object once more:

MySQL|db1:3306 ssl|JS> cluster = dba.getCluster()
MySQL|db1:3306 ssl|JS> cluster.status()

At this point, it will connect to the newly promoted primary node to retrieve the cluster status.

Node Eviction and Expulsion

In the event that communication between nodes is interrupted, the problematic node will be evicted from the cluster without any delay, which is not good if you are running on an unstable network.
This is what it looks like on db2 (the problematic node):

2019-11-14T07:07:59.344888Z 0 [ERROR] [MY-011505] [Repl] Plugin group_replication reported: 'Member was expelled from the group due to network failures, changing member status to ERROR.'
2019-11-14T07:07:59.371966Z 0 [ERROR] [MY-011712] [Repl] Plugin group_replication reported: 'The server was automatically set into read only mode after an error was detected.'

Meanwhile, db1 saw db2 go offline:

2019-11-14T07:07:44.086021Z 0 [Warning] [MY-011493] [Repl] Plugin group_replication reported: 'Member with address db2:3306 has become unreachable.'
2019-11-14T07:07:46.087216Z 0 [Warning] [MY-011499] [Repl] Plugin group_replication reported: 'Members removed from the group: db2:3306'

To tolerate some delay in node eviction, we can set a higher timeout value before a node is expelled from the group. The default value is 0, which means expel immediately. Use the setOption() function to set the expelTimeout value: Thanks to Frédéric Descamps from Oracle who pointed this out: instead of relying on expelTimeout, it's recommended to set the autoRejoinTries option. The value represents the number of times an instance will attempt to rejoin the cluster after being expelled. A good number to start with is 3, meaning the expelled member will try to rejoin the cluster 3 times, waiting 5 minutes between unsuccessful auto-rejoin attempts. To set this value cluster-wide, we can use the setOption() function:

MySQL|db1:3306 ssl|JS> cluster.setOption("autoRejoinTries", 3)
WARNING: Each cluster member will only proceed according to its exitStateAction if auto-rejoin fails (i.e. all retry attempts are exhausted).

Setting the value of 'autoRejoinTries' to '3' in all ReplicaSet members ...

Successfully set the value of 'autoRejoinTries' to '3' in the 'default' ReplicaSet.
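The auto-rejoin behaviour described above can be sketched as a simple retry loop. This is an illustrative model only, not Group Replication's actual code; the try_rejoin and wait callables are stand-ins so the loop can be exercised without a live cluster.

```python
# Illustrative sketch of the autoRejoinTries behaviour: an expelled member
# retries the rejoin up to `tries` times, waiting five minutes between
# unsuccessful attempts. If every attempt fails, the member falls back to
# its exitStateAction. NOT Group Replication's real implementation.

def auto_rejoin(try_rejoin, tries=3, wait=lambda seconds: None):
    for attempt in range(1, tries + 1):
        if try_rejoin():
            return attempt            # rejoined on this attempt
        if attempt < tries:
            wait(300)                 # 5 minutes between attempts
    return None                       # exhausted; exitStateAction applies

# Simulate a network blip that heals before the third attempt.
outcomes = iter([False, False, True])
assert auto_rejoin(lambda: next(outcomes)) == 3
```

With autoRejoinTries=3 as set above, a transient network partition of up to roughly ten minutes can heal before the member gives up and its exitStateAction takes over.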
Conclusion For MySQL InnoDB Cluster, most of the management and monitoring operations can be performed directly via MySQL Shell (available from MySQL 5.7.21 onward). Tags:  MySQL innodb group replication clustering
Posted over 4 years ago by Dave Stokes
LIVE WEBINAR: Wednesday, November 27, 2019, 9AM PDT: Migrating to MySQL 8.0 from 5.6 or 5.7. Planning to upgrade to 8.0? Join us to learn the easy steps and best practices for a smooth upgrade. Register now! All opinions expressed in this blog are those of Dave Stokes, who is actually amazed to find anyone else agreeing with him
Posted over 4 years ago by parvez alam
Today, I will describe MySQL database connectivity with Golang. MySQL is the most popular open-source relational database. I will show you, step by step, how to connect Golang with a MySQL database. I assume you have configured a Golang environment on your system; if not, please configure one first. Golang has a MySQL […] The post How to Connect Golang with MySQL appeared first on Phpflow.com.
Posted over 4 years ago by MySQL Performance Blog
In this blog post, we will see how to use perf (a.k.a. perf_events) together with Flame Graphs. They are used to generate a graphical representation of which functions are being called within our software of choice. Percona Server for MySQL is used here, but the approach can be extended to any software you can take a resolved stack trace from. Before moving forward, a word of caution: as with any profiling tool, DON'T run this in production systems unless you know what you are doing.

Installing Packages Needed

For simplicity, I'll use commands for CentOS 7, but things should be the same for Debian-based distros (apt-get install linux-tools-$(uname -r) instead of the yum command is the only difference in the steps). To install perf, simply issue:

SHELL> sudo yum install -y perf

To get the Flame Graphs project:

SHELL> mkdir -p ~/src
SHELL> cd ~/src
SHELL> git clone https://github.com/brendangregg/FlameGraph

That's it! We are good to go.

Capturing Samples

Flame Graphs are a way of visualizing data, so we need some samples to base them on. There are three ways in which we can do this. (Note that we will use the -p flag to only capture data from our process of interest, but we can potentially capture data from all the running processes if needed.)
1- Capture for a set amount of time only (ten seconds here):

SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld) -- sleep 10

2- Capture until we send the interrupt signal (CTRL-C):

SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld)

3- Capture for the whole lifetime of the process:

SHELL> sudo perf record -a -F 99 -g -- /sbin/mysqld \
--defaults-file=/etc/percona-server.conf.d/mysqld.cnf --user=mysql

or

SHELL> sudo perf record -a -F 99 -g -p $(pgrep -x mysqld) -- mysql -e "SELECT * FROM db.table"

In the first case of the third variant, we are forced to capture data from all processes, since it's impossible to know the process ID (PID) beforehand (with the command executed, we are actually starting the MySQL service). This type of command comes in handy when you want to have data from the exact beginning of the process, which is not possible otherwise. In the second case of the third variant, we are running a query against an already-running MySQL service, so we can use the -p flag to capture data on the server process. This is handy if you want to capture data at the exact moment a job is running, for instance.

Preparing the Samples

After the initial capture, we will need to make the collected data "readable", because perf record stores it in binary format. For this we will use:

SHELL> sudo perf script > perf.script

It reads perf.data by default, which is also the default output file name perf record uses; these can be overridden with the -i and -o flags, respectively. We will now be able to read the generated text file, as it will be in a human-readable form. However, when doing so, you will quickly realize why we need to aggregate all this data into a more intelligible form.

Generating the Flame Graphs

We can do the following in a one-liner, by piping the output of the first command as input to the second. Since we didn't add the FlameGraph git folder to our path, we will need to use full paths.
SHELL> ~/src/FlameGraph/stackcollapse-perf.pl perf.script | ~/src/FlameGraph/flamegraph.pl > flamegraph.svg

We can now open the .svg file in any browser and start analyzing the information-rich graphs.

How Does it Look?

As an example, I will leave full commands, their outputs, and a screenshot of a flame graph generated by the process, using data capture method #2. We will run an INSERT INTO ... SELECT query against the database, so we can then analyze its execution.

SHELL> time sudo perf record -a -F 99 -g \
-p $(pgrep -x mysqld) \
-- mysql test -e "INSERT INTO joinit SELECT NULL, uuid(), time(now()),  (FLOOR( 1 + RAND( ) *60 )) FROM joinit;"
Warning: PID/TID switch overriding SYSTEM
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 1.909 MB perf.data (8214 samples) ]

real 1m24.366s
user 0m0.133s
sys 0m0.378s

SHELL> sudo perf script | \
~/src/FlameGraph/stackcollapse-perf.pl | \
~/src/FlameGraph/flamegraph.pl > mysql_select_into_flamegraph.svg

The keen-eyed reader will notice we went one step further here and joined steps #2 and #3 via a pipe (|) to avoid writing to and reading from the perf.script output file. Additionally, there are time outputs so we can get an estimate of the amount of data the tool generates (~2MB in 1min 25secs); this will, of course, vary depending on many factors, so take it with a pinch of salt, and test in your own environment. The resulting flame graph is: One clear candidate for optimization is the work around write_record: if we can make that function faster, there is a lot of potential for reducing overall execution time (boxed in blue in the bottom left corner, we can see a total of ~60% of the samples were taken in this code path). In the last section below we link to a blog post explaining more on how to interpret a Flame Graph, but for now, know that you can mouse over the function names and the information shown at the bottom left corner changes dynamically.
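For intuition, the folding step that stackcollapse-perf.pl performs can be sketched in a few lines. This is a simplification: the real Perl script parses perf script's multi-line event format, while here the stacks are assumed to be already parsed into frame lists.

```python
from collections import Counter

# Simplified sketch of what stackcollapse-perf.pl produces: each sampled
# stack is folded into one "root;child;leaf count" line, which
# flamegraph.pl then turns into the SVG. Input stacks here are assumed to
# be pre-parsed lists of frames, root first.

def collapse(stacks):
    counts = Counter(";".join(frames) for frames in stacks)
    return sorted(f"{folded} {n}" for folded, n in counts.items())

samples = [
    ["main", "handle_query", "write_record"],
    ["main", "handle_query", "write_record"],
    ["main", "handle_query", "read_record"],
]
print(collapse(samples))
# ['main;handle_query;read_record 1', 'main;handle_query;write_record 2']
```

In the rendered graph, each folded line becomes one tower of boxes, and the count determines the box width; that is why wide plateaus such as the write_record path stand out as optimization candidates.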
You may also visualize it better with the following guides in place:

Conclusion

For the Support team, we use this procedure in many cases where we need an in-depth view of what MySQL is executing, and for how long. This way, we can have better insight into what operations are behind a specific workload and act accordingly. This procedure can be used either for optimizing or troubleshooting, and is a very powerful tool in our tool belt! It's known that humans are better at processing images than text, and this tool exploits that brilliantly, in my opinion.

Related links

Interpreting Flame Graphs (scroll down to the "Flame Graph Interpretation" section)
Flame Graphs 201, a great webinar by Marcos, if you want to dig deeper into this
Of course, Brendan Gregg (the mastermind behind the Flame Graph project) has even more information on this
Posted over 4 years ago by ScaleGrid.io
By having appropriate indexes on your MySQL tables, you can greatly enhance the performance of SELECT queries. But did you know that adding indexes to your tables is in itself an expensive operation, which may take a long time to complete depending on the size of your tables? During this time, you are also likely to experience degraded query performance, as your system resources are busy with index-creation work. In this blog post, we discuss an approach to optimize the MySQL index creation process in such a way that your regular workload is not impacted.

MySQL Rolling Index Creation

We call this approach a 'Rolling Index Creation': if you have a MySQL master-slave replica set, you can create the index one node at a time in a rolling fashion. You should create the index only on the slave nodes so the master's performance is not impacted. When index creation is completed on the slaves, we demote the current master and promote one of the up-to-date slaves as the new master. At this time, index building continues on the original master node (which is now a slave). There will be a short duration (tens of seconds) during which you will lose connectivity to your database due to the failover, but this can be overcome by having application-level retries.

Performance Benefits of Rolling Index Creation

We did a small experiment to understand the performance benefits of Rolling Index Creation. The test utilized a MySQL dataset created using Sysbench, which had 3 tables with 50 million rows each.
We generated load on the MySQL master with 30 clients running a balanced workload (50% reads and 50% writes) for 10 minutes, and at the same time built a simple secondary index on one of the tables in two scenarios: creating the index directly on the master, and creating the index on the slave.

MySQL Test Bed Configuration

MySQL Instance Type: EC2 instance m4.large with 8GB RAM
Deployment Type: 2 Node Master-Slave Set with Semisynchronous Replication
MySQL Version: 5.7.25

Performance Results

Scenario                   Workload Throughput (Queries Per Second)   95th Percentile Latency
Index Creation on Master   453.63                                     670 ms
Rolling Index Creation     790.03                                     390 ms

Takeaway

By running index creation directly on the MySQL master, we could achieve only about 60% of the throughput that was achieved by running index creation on the MySQL slave through a rolling operation. The 95th percentile latency of queries was also 1.8 times higher when the index creation happened on the master server.

Automating the Rolling Index Creation

ScaleGrid automates Rolling Index Creation for your MySQL deployment with a simple user interface to initiate it. In the UI above, you can select your database and table name, and 'Add Index' as the Alter Table operation. Then, specify a column name and an index name, and an Alter Table command will be generated and displayed for you. Once you click Create, the index creation will happen one node at a time in a rolling fashion. Additionally, ScaleGrid also supports other simple Alter Table operations, like adding a new column to your table in a rolling fashion. Stay tuned for my follow-on blog post with more details!
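The rolling procedure described in this post can be sketched as a plan generator. Note that this only illustrates the ordering (slaves first, then failover, then the demoted master); the node names and step format are hypothetical, and ScaleGrid's actual automation is not shown here.

```python
# Hypothetical sketch of the rolling order described above: build the index
# on every slave first, fail over to an up-to-date slave, then build on the
# demoted master. Node names and the (node, action) step format are
# illustrative assumptions, not ScaleGrid's real implementation.

def rolling_index_plan(master, slaves, ddl):
    steps = [(node, ddl) for node in slaves]       # slaves first
    new_master = slaves[0]                         # promote an up-to-date slave
    steps.append(("failover", f"{master}->{new_master}"))
    steps.append((master, ddl))                    # old master, now a slave
    return steps

plan = rolling_index_plan(
    "db1", ["db2"], "ALTER TABLE sbtest1 ADD INDEX idx_c (c)")
for node, action in plan:
    print(node, action)
```

At no point does the current master run the ALTER while serving the workload, which is exactly why the rolling scenario sustained higher throughput and lower latency in the results above.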