
News

Posted over 4 years ago by Olivier Dasini
Presentation of some of the new features of MySQL 8.0.18 released on October 14, 2019.
Posted over 4 years ago by MySQL Performance Blog
Making backups over the network can be done in two ways: either save on disk and then transfer, or just transfer without saving. Both ways have their strong and weak points. The second way, particularly, is highly dependent on the upload speed, which would either reduce or increase the backup time. Other factors that influence it are chunk size and the number of upload threads.

Percona XtraBackup 2.4.14 has gained S3 streaming, which is the capability to upload backups directly to S3-compatible storage without saving locally first. This feature was developed because we wanted to improve the upload speeds of backups in Percona Operator for XtraDB Cluster.

There are many implementations of S3-compatible storage: AWS S3, Google Cloud Storage, Digital Ocean Spaces, Alibaba Cloud OSS, MinIO, and Wasabi. We've measured the speed of AWS CLI, gsutil, MinIO client, rclone, gof3r and the xbcloud tool (part of Percona XtraBackup) on AWS (in single and multi-region setups) and on Google Cloud. XtraBackup was compared in two variants: a default configuration and one with tuned chunk size and number of upload threads. Here are the results.

AWS (Same Region)

The backup data was streamed from an AWS EC2 instance to AWS S3, both in the us-east-1 region.

tool         | settings               | CPU  | max mem | speed    | speed comparison
-------------|------------------------|------|---------|----------|-----------------
AWS CLI      | default settings       | 66%  | 149Mb   | 130MiB/s | baseline
AWS CLI      | 10Mb block, 16 threads | 68%  | 169Mb   | 141MiB/s | +8%
MinIO client | not changeable         | 10%  | 679Mb   | 59MiB/s  | -55%
rclone rcat  | not changeable         | 102% | 7138Mb  | 139MiB/s | +7%
gof3r        | default settings       | 69%  | 252Mb   | 97MiB/s  | -25%
gof3r        | 10Mb block, 16 threads | 77%  | 520Mb   | 108MiB/s | -17%
xbcloud      | default settings       | 10%  | 96Mb    | 25MiB/s  | -81%
xbcloud      | 10Mb block, 16 threads | 60%  | 185Mb   | 134MiB/s | +3%

Tip: If you run MySQL on an EC2 instance and make backups inside one region, take snapshots instead.

AWS (From US to EU)

The backup data was streamed from AWS EC2 in us-east-1 to AWS S3 in eu-central-1.

tool         | settings               | CPU | max mem | speed    | speed comparison
-------------|------------------------|-----|---------|----------|-----------------
AWS CLI      | default settings       | 31% | 149Mb   | 61MiB/s  | baseline
AWS CLI      | 10Mb block, 16 threads | 33% | 169Mb   | 66MiB/s  | +8%
MinIO client | not changeable         | 3%  | 679Mb   | 20MiB/s  | -67%
rclone rcat  | not changeable         | 55% | 9307Mb  | 77MiB/s  | +26%
gof3r        | default settings       | 69% | 252Mb   | 97MiB/s  | +59%
gof3r        | 10Mb block, 16 threads | 77% | 520Mb   | 108MiB/s | +77%
xbcloud      | default settings       | 4%  | 96Mb    | 10MiB/s  | -84%
xbcloud      | 10Mb block, 16 threads | 59% | 417Mb   | 123MiB/s | +101%

Tip: Think about disaster recovery and what you will do when a whole region is not available. It makes little sense to back up to the same region; always transfer backups to another region.

Google Cloud (From US to EU)

The backup data was streamed from a Compute Engine instance in us-east1 to Cloud Storage in europe-west3. Interestingly, Google Cloud Storage supports both its native protocol and the S3 (interoperability) API, so Percona XtraBackup can transfer data to Google Cloud Storage directly via the S3 (interoperability) API.

tool        | settings                            | CPU | max mem | speed    | speed comparison
------------|-------------------------------------|-----|---------|----------|-----------------
gsutil      | not changeable, native protocol     | 8%  | 246Mb   | 23MiB/s  | baseline
rclone rcat | not changeable, native protocol     | 6%  | 61Mb    | 16MiB/s  | -30%
xbcloud     | default settings, s3 protocol       | 3%  | 97Mb    | 9MiB/s   | -61%
xbcloud     | 10Mb block, 16 threads, s3 protocol | 50% | 417Mb   | 133MiB/s | +478%

Tip: A cloud provider can block your account for many reasons: human or robot mistakes, content abuse after an account is hacked, an expired credit card, sanctions, etc.
Think about disaster recovery and what you will do when a cloud provider blocks your account; it may make sense to back up to another cloud provider or on-premises.

Conclusion

The xbcloud tool (part of Percona XtraBackup) is 2-5 times faster with tuned settings than native cloud vendor tools over long distances, and it is 14% faster and requires 20% less memory than analogs with the same settings. xbcloud is also the most reliable tool for transferring backups to S3-compatible storage, for two reasons:

- It calculates md5 sums during the upload, stores them in a .md5/filename.md5 file, and verifies the sums on download (gof3r does the same).
- It sends data in 10Mb chunks and resends a chunk if any network failure happens.

PS: Please find instructions on GitHub if you would like to reproduce this article's results.
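For reference, the tuned xbcloud runs above differ from the defaults mainly in upload concurrency. A minimal sketch of a streaming backup to S3-compatible storage looks roughly like this; the bucket, endpoint, and backup name are placeholders, and the 10Mb chunk size mentioned above is xbcloud's default, so only the thread count is changed here:

# Stream the backup straight to S3-compatible storage with 16 upload threads
# (access keys can also be passed with --s3-access-key / --s3-secret-key)
xtrabackup --backup --stream=xbstream --extra-lsndir=/tmp --target-dir=/tmp | \
  xbcloud put --storage=s3 \
    --s3-endpoint='s3.amazonaws.com' \
    --s3-bucket='my-backup-bucket' \
    --parallel=16 \
    "$(date -I)-full_backup"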
Posted over 4 years ago by The Pythian Group
It's not uncommon these days for us to use a high availability stack for MySQL consisting of Orchestrator, Consul, and ProxySQL. You can read more details about this stack in Matthias Crauwels' blog post How To Autoscale ProxySQL In The Cloud as well as Ivan Groenwold's post on MySQL High Availability With ProxySQL, Consul And Orchestrator. The high-level concept is simply that Orchestrator monitors the state of the MySQL replication topology and reports changes to Consul, which in turn can update ProxySQL hosts using a tool called consul-template.

Until now we've typically implemented the ProxySQL portion of this stack using an autoscaling group of sorts, due to the high levels of CPU usage that can be associated with ProxySQL. It's better to be able to scale up and down as traffic increases and decreases because this ensures you're not paying for resources that you don't need; however, this comes with a few disadvantages. The first is the amount of time it takes to scale up. If you are using an autoscaling group and a new instance is launched, the following steps need to be taken:

1. There will be a request to your cloud service provider for a new VM instance.
2. Once the instance is up and running as part of the group, it will need to install ProxySQL along with supporting packages such as consul (agent) and consul-template.
3. Once the packages are installed, they will need to be configured to work with the consul server nodes as well as the ProxySQL nodes that are participating in the ProxySQL cluster.
4. The new ProxySQL host will announce to Consul that it's available, which in turn will update all the other participating nodes in the ProxySQL cluster.

This can take time. Provisioning a new VM instance usually happens fairly quickly, normally within a couple of minutes, but sometimes there can be unexpected delays. You can speed up package installation by using a custom machine image, but there is operational overhead in keeping images up to date with the latest versions of the installed packages, so it may be easier to do this with a script that always installs the latest versions. All in all, you can expect a scale up to take more than a minute.

The next issue is how deterministic this solution is. If you're not using a custom machine image, you'll need to pull down your config and template files from somewhere, most likely a storage bucket, and there's a chance that those files could be overwritten – meaning that the next time an instance is launched by the autoscaler it may not have the same configuration as the rest of the hosts participating in the ProxySQL cluster.

We can take this already impressive stack another step further using Docker containers and Kubernetes. For those of you who are unfamiliar with containerization: a container is similar to a virtual machine snapshot, but it is not a full snapshot that would include the OS; instead, it contains just the binaries required to run your process. You create this image using a Dockerfile, typically starting from a specified Linux distribution, and then use verbs like RUN, COPY, and USER to specify what should be included in your container "image". Once this image is constructed, it can be centrally located in a repository and made available for use by machines running a containerization platform like Docker.
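As a rough illustration of that last step, building an image and pushing it to a central registry (here Google Container Registry) looks roughly like the following sketch; the project and image names are hypothetical, not the ones used later in this post:

# Build the image from the directory containing the Dockerfile
docker build -t gcr.io/my-project/proxysql:v1 .

# Allow docker to authenticate against Google Container Registry, then push
gcloud auth configure-docker
docker push gcr.io/my-project/proxysql:v1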
This method of deployment has become more and more popular in recent years because containers are lightweight, and you know that if the container works on one system it will work exactly the same way when it's moved to a different system, thus reducing common issues like dependency and configuration variations from host to host.

Given that we want to be able to scale up and down, it's safe to say we're going to want to run more than one container. That's where Kubernetes comes into play. Kubernetes is a container management platform that operates on an array of hosts (virtual or physical) and distributes containers on them as specified by your configuration, typically a YAML-format Kubernetes deployment file. If you're using Google Kubernetes Engine (GKE) on Google Cloud Platform (GCP), this is even easier as the vast majority of the work in creating a Kubernetes deployment (referred to as a 'workload' in GKE) YAML is handled for you via a simple UI within the GCP Console. If you want to learn more about Docker or Kubernetes, I highly recommend Nigel Poulton's video content on Pluralsight. For now, let's stick to learning about ProxySQL on this platform.

If we want ProxySQL to run in Kubernetes and operate with our existing stack with Consul and Orchestrator, we're going to need to keep best practices in mind for our containers:

- Each container should run only a single process. We know that we're working with ProxySQL, consul (agent), and consul-template, so these will each need to be in their own container.
- The primary process running in each container should run as PID 1.
- The primary process running in each container should not run as root.
- Log output from the primary process in the container should be sent to STDOUT so that it can be collected by Docker logs.
- Containers should be as deterministic as possible – meaning they should run the same (or at least as much as possible) regardless of what environment they are deployed in.

The first thing in the list above that pops out is the need to have ProxySQL, consul-template, and consul (agent) isolated within their own containers. These are going to need to work together, given that consul (agent) is acting as our communication conduit back to the consul (server) hosts and consul-template is what updates ProxySQL based on changes to keys and values in Consul. So how can they work together like this if they are in separate containers?

The solution is provided by Kubernetes. When you're thinking about Docker, the smallest computational unit is the container; however, when you're thinking about Kubernetes, the smallest computational unit is the pod, which can contain one or more containers. Any containers operating within the same pod can communicate with one another using localhost ports. So in this case, assuming you're using default ports, the consul-template container can communicate with the consul (agent) container using localhost port 8500, and it can communicate with the ProxySQL container using port 6032, given that these three containers will be working together in the same pod.

So let's start looking at some code, starting with the simplest container and then working our way to the most complex.

Consul (Agent) Container

Below is a generic version of the Dockerfile that I'm using for consul (agent). The objective is to install Consul and then instruct it to connect as an agent to the Consul cluster comprised of the consul (server) nodes.
FROM centos:7

RUN yum install -q -y unzip wget && \
    yum clean all

RUN groupadd consul && \
    useradd -r -g consul -d /var/lib/consul consul

RUN mkdir /opt/consul && \
    mkdir /etc/consul && \
    mkdir /var/log/consul && \
    mkdir /var/lib/consul && \
    chown -R consul:consul /opt/consul && \
    chown -R consul:consul /etc/consul && \
    chown -R consul:consul /var/log/consul && \
    chown -R consul:consul /var/lib/consul

RUN wget -q -O /opt/consul/consul.zip https://releases.hashicorp.com/consul/1.6.1/consul_1.6.1_linux_amd64.zip && \
    unzip /opt/consul/consul.zip -d /opt/consul/ && \
    rm -f /opt/consul/consul.zip && \
    ln -s /opt/consul/consul /usr/local/bin/consul

COPY supportfiles/consul.conf.json /etc/consul/

USER consul

ENTRYPOINT ["/usr/local/bin/consul", "agent", "--config-file=/etc/consul/consul.conf.json"]

Simply put, the code above follows these instructions:

1. Start from CentOS 7. This is a personal preference of mine. There are probably more lightweight distributions that can be considered, such as Alpine as recommended by Google, but I'm not the best OS nerd out there so I wanted to stick with what I know.
2. Install our dependencies, which in this case are unzip and wget.
3. Create our consul user, group, and directory structure.
4. Install consul.
5. Copy over the consul config file from the host where the Docker build is being performed.
6. Switch to the consul user.
7. Start consul (agent).

Now let's check the code and see if it matches best practices.

- Container runs a single process: The ENTRYPOINT runs Consul directly, meaning that nothing else is being run. Keep in mind that ENTRYPOINT specifies what should be run when the container starts. This means that when the container starts it won't have to install anything because the packages come with the image as designated by the Dockerfile, but we still need to launch Consul when the container starts.
- Process should be PID 1: Any process run by ENTRYPOINT will run as PID 1.
- Process should not be run as root: We switched to the consul user prior to the entrypoint.
- Log output should go to STDOUT: If you run Consul using the command noted in the ENTRYPOINT, you'll see that log output goes to STDOUT.
- Should be as deterministic as possible: We've copied the configuration file into the container, meaning that the container doesn't have to get support files from anywhere else before Consul starts. The only way the nature of Consul will change is if we recreate the container image with a new configuration file.

There's really nothing special about the Consul configuration file that gets copied into the container. You can see an example of this by checking the aforementioned blog posts by Matthias or Ivan for this particular HA stack.

ProxySQL Container

Below is a generic version of the Dockerfile that I'm using for ProxySQL. The objective is to install ProxySQL and make it available to receive traffic requests on 6033 for write traffic, 6034 for read traffic, and 6032 for the admin console, which is how consul-template will interface with ProxySQL.

FROM centos:7

RUN groupadd proxysql && \
    useradd -r -g proxysql proxysql

RUN yum install -q -y https://github.com/sysown/proxysql/releases/download/v2.0.6/proxysql-2.0.6-1-centos67.x86_64.rpm mysql curl && \
    yum clean all

COPY supportfiles/* /opt/supportfiles/
COPY startstop/* /opt/

RUN chmod +x /opt/entrypoint.sh

RUN chown proxysql:proxysql /etc/proxysql.cnf

USER proxysql

ENTRYPOINT ["/opt/entrypoint.sh"]

Simply put, the code above follows these instructions:

1. Start from CentOS 7.
2. Create our ProxySQL user and group.
3. Install ProxySQL and dependencies, in this case curl, which will be used to poll the GCP API in order to determine what region the ProxySQL cluster is in. We'll cover this in more detail below.
4. Move our configuration files and entrypoint script to the container.
5. Make sure that the ProxySQL config file is readable by ProxySQL.
6. Switch to the ProxySQL user.
7. Start ProxySQL via the entrypoint script that's provided with the container.

In my use case, I have multiple ProxySQL clusters – one per GCP region. They have to be logically grouped together in order to ensure they route read traffic to replicas within the local region but send write traffic to the master regardless of what region it's in. In my solution, a hostgroup is noted for read replicas in each region, so my mysql_query_rules table needs to be configured accordingly. The MySQL hosts will be added to different hostgroups, but the routing to each hostgroup remains consistent. Given that it's highly unlikely to change, I have mysql_query_rules configured in the configuration file. This means that I need to select the correct configuration file based on my region before starting ProxySQL, and this is where my entrypoint script comes into play. Let's have a look at a simplified and more generic version of my code:

#!/bin/bash

dataCenter=$(curl http://metadata.google.internal/computeMetadata/v1/instance/zone -H "Metadata-Flavor: Google" | awk -F "/" '{print $NF}' | cut -d- -f1,2)

...

case $dataCenter in
  us-central1)
    cp -f /opt/supportfiles/proxysql-us-central1.cnf /etc/proxysql.cnf
    ;;
  us-east1)
    cp -f /opt/supportfiles/proxysql-us-east1.cnf /etc/proxysql.cnf
    ;;
esac

...

exec proxysql -c /etc/proxysql.cnf -f -D /var/lib/proxysql

The script starts by polling the GCP API to determine what region the container has been launched in. Based on the result, it copies the correct config file to the appropriate location and then starts ProxySQL.

Let's see how the combination of the Dockerfile and the entrypoint script allows us to meet best practices.

- Container runs a single process: ENTRYPOINT calls the entrypoint.sh script, which does some conditional logic based on the regional location of the container and then ends by running ProxySQL. This means that at the end of the process ProxySQL will be the only process running.
- Process should be PID 1: The command "exec" at the end of the entrypoint script starts ProxySQL as PID 1.
- Process should not be run as root: We switched to the proxysql user prior to the entrypoint.
- Log output should go to STDOUT: If you run ProxySQL using the command noted at the end of the entrypoint script, you'll see that log output goes to STDOUT.
- Should be as deterministic as possible: We've copied the potential configuration files into the container. Unlike Consul, there are multiple configuration files and we need to determine which will be used based on the region that the container lives in, but the configuration files themselves will not change unless the container image itself is updated. This ensures that all containers running within the same region will behave the same.

Consul-template Container

Below is a generic version of the Dockerfile that I'm using for consul-template. The objective is to install consul-template and have it act as the bridge between Consul (via the consul (agent) container) and ProxySQL, updating ProxySQL as needed when keys and values change in Consul.
FROM centos:7

RUN yum install -q -y unzip wget mysql nmap-ncat curl && \
    yum clean all

RUN groupadd consul && \
    useradd -r -g consul -d /var/lib/consul consul

RUN mkdir /opt/consul-template && \
    mkdir /etc/consul-template && \
    mkdir /etc/consul-template/templates && \
    mkdir /etc/consul-template/config && \
    mkdir /opt/supportfiles && \
    mkdir /var/log/consul/ && \
    chown -R consul:consul /etc/consul-template && \
    chown -R consul:consul /etc/consul-template/templates && \
    chown -R consul:consul /etc/consul-template/config && \
    chown -R consul:consul /var/log/consul

RUN wget -q -O /opt/consul-template/consul-template.zip https://releases.hashicorp.com/consul-template/0.22.0/consul-template_0.22.0_linux_amd64.zip && \
    unzip /opt/consul-template/consul-template.zip -d /opt/consul-template/ && \
    rm -f /opt/consul-template/consul-template.zip && \
    ln -s /opt/consul-template/consul-template /usr/local/bin/consul-template

RUN chown -R consul:consul /opt/consul-template

COPY supportfiles/* /opt/supportfiles/
COPY startstop/* /opt/

RUN chmod +x /opt/entrypoint.sh

USER consul

ENTRYPOINT ["/opt/entrypoint.sh"]

Simply put, the code above follows these instructions:

1. Start from CentOS 7.
2. Install our dependencies, which are unzip, wget, mysql (client), nmap-ncat, and curl.
3. Create our consul user and group.
4. Create the consul-template directory structure.
5. Download and install consul-template.
6. Copy the configuration file, template files, and entrypoint script to the container.
7. Make the entrypoint script executable.
8. Switch to the consul user.
9. Start consul-template via the entrypoint script that's provided with the container.

Much like our ProxySQL container, we really need to look at the entrypoint here in order to get the whole story. Remember, this is multi-region, so there is additional logic that has to be considered when working with template files.

#!/bin/bash

dataCenter=$(curl http://metadata.google.internal/computeMetadata/v1/instance/zone -H "Metadata-Flavor: Google" | awk -F "/" '{print $NF}' | cut -d- -f1,2)

...

cp /opt/supportfiles/consul-template-config /etc/consul-template/config/consul-template.conf.json

case $dataCenter in
  us-central1)
    cp /opt/supportfiles/template-mysql-servers-us-central1 /etc/consul-template/templates/mysql_servers.tpl
    ;;
  us-east1)
    cp /opt/supportfiles/template-mysql-servers-us-east1 /etc/consul-template/templates/mysql_servers.tpl
    ;;
esac

cp /opt/supportfiles/template-mysql-users /etc/consul-template/templates/mysql_users.tpl

### Ensure that proxysql has started
while ! nc -z localhost 6032; do
  sleep 1;
done

### Ensure that consul agent has started
while ! nc -z localhost 8500; do
  sleep 1;
done

exec /usr/local/bin/consul-template --config=/etc/consul-template/config/consul-template.conf.json

This code is very similar to the entrypoint file used for ProxySQL, in the sense that it checks which region the container is in and then moves configuration and template files into the appropriate locations. But there is some additional logic here that checks to ensure that ProxySQL is up and listening on port 6032 and that consul (agent) is up and listening on port 8500. The reason for this is that consul-template needs to be able to communicate with both of these processes. You really have no assurance as to which container is going to load in what order in a pod, so to avoid excessive errors in the consul-template log, I have it wait until it knows that its dependent services are running.
Let's go through our best practices checklist one more time against our consul-template container code.

- Container runs a single process: ENTRYPOINT calls the entrypoint.sh script, which does some conditional logic based on the regional location of the container and then ends by running consul-template. This means that at the end of the process consul-template will be the only process running.
- Process should be PID 1: The command "exec" at the end of the entrypoint script starts consul-template as PID 1.
- Process should not be run as root: We switched to the consul user prior to the entrypoint.
- Log output should go to STDOUT: If you run consul-template using the command noted at the end of the entrypoint script, you'll see that log output goes to STDOUT.
- Should be as deterministic as possible: Just like ProxySQL and consul (agent), all the supporting files are packaged with the container. Yes, there is logic to determine which files should be used, but you have the assurance that the files won't change unless you create a new version of the container image.

Putting it all together

Okay, we have three containers that represent the three processes that we need to package together so ProxySQL can work as part of our HA stack. Now we need to put it all together in a pod so that Kubernetes can run it against our resources. In my use case, I'm running this on GCP, meaning that once my containers have been built they need to be pushed up to the Google Container Registry. Once that's done we can create our workload to run our pod and specify how many pods we want to run. Getting this up and running can be done with just a few short and simple steps:

1. Create a Kubernetes cluster if you don't already have one. This is what provisions the Cloud Compute VMs that the pods will run on.
2. Push your three Docker images to the Google Container Registry. This makes the images available for use by the Kubernetes engine.
3. Create your Kubernetes workload, which can be done simply via the user interface in the GCP console. All that's required is selecting the latest version of the three containers that you've pushed up to the registry, optionally applying some metadata like an application name, Kubernetes namespace, and labels, then selecting which cluster you want to run the workload on.

Once you click deploy, the containers will spin up and, assuming there are no issues bringing the containers online, you'll quickly have a functioning ProxySQL pod in Kubernetes that follows these high-level steps:

1. The pod is started.
2. The three containers start. In Kubernetes, pods are fully atomic: either all the containers start without error, or the pod will not consider itself started.
3. The consul-template container polls consul (agent) and ProxySQL on their respective ports until it's confirmed that those processes have started, and then consul-template starts.
4. Consul-template creates the new SQL files meant to configure ProxySQL based on the contents of the Consul key/value store.
5. Consul-template runs the newly created SQL files against ProxySQL via its admin interface.
6. The pod is now ready to receive traffic.

The YAML

During the process of creating your workload, or even after the fact, you'll be able to see the YAML that you'd normally have to create with standard Kubernetes deployments. Let's have a look at the YAML that was created for my particular deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2019-10-16T15:41:37Z"
  generation: 64
  labels:
    app: pythian-proxysql
    env: sandbox
  name: pythian-proxysql
  namespace: pythian-proxysql
  resourceVersion: "7516809"
  selfLink: /apis/apps/v1/namespaces/pythian-proxysql/deployments/pythian-proxysql
  uid: 706c6284-f02b-11e9-8f3e-42010a800050
spec:
  minReadySeconds: 10
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: pythian-proxysql
      env: sandbox
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: pythian-proxysql
        env: sandbox
    spec:
      containers:
      - image: gcr.io/pythian-proxysql/pythian-proxysql-proxysql@sha256:3ba95101eb7a5aac58523e4c6489956869865452d1cbdbd32b4186a44f2a4500
        imagePullPolicy: IfNotPresent
        name: pythian-proxysql-proxysql-sha256
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - image: gcr.io/pythian-proxysql/pythian-proxysql-consul-agent@sha256:7c66fa5e630c4a0d70d662ec8e9d988c05bd471b43323a47e240294fc00a153d
        imagePullPolicy: IfNotPresent
        name: pythian-proxysql-consul-agent-sha256
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - image: gcr.io/pythian-proxysql/pythian-proxysql-consul-template@sha256:1e70f4b96614dfd865641bf75784d895a794775a6c51ce6b368387591f3f1918
        imagePullPolicy: IfNotPresent
        name: pythian-proxysql-consul-template-sha256
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 2
  collisionCount: 1
  conditions:
  - lastTransitionTime: "2019-10-16T15:41:37Z"
    lastUpdateTime: "2019-11-11T15:56:55Z"
    message: ReplicaSet "pythian-proxysql-8589fdbf54" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2019-11-11T20:41:31Z"
    lastUpdateTime: "2019-11-11T20:41:31Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 64
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2

The first thing I have to point out is that this is a LOT of YAML that we didn't have to write, given that it was all handled by the Google Kubernetes Engine. This is a huge part of easing the process and allows us to get our solution working so quickly. However, despite the fact that a lot of YAML is created for us, there are still occasions where we may need to modify it manually, such as working with Kubernetes Container Lifecycle Hooks, or working with requests or limits for hardware resources for individual containers in our pod.

How do I access my ProxySQL instance?

One consideration for Kubernetes is that when pods are started and stopped they get an ephemeral IP address, so you don't want to have your applications connect to your pods directly.
Kubernetes has a feature called a "service" that allows your pods to be exposed via a consistent network interface. This service can also handle load balancing, which is what I'm planning on using with my Kubernetes deployment. Adding a service to your GKE workload is very simple and can be done with a few clicks.

Autoscaling

As noted earlier in this post, before the implementation of Kubernetes for this solution, it was recommended to use cloud compute autoscaling groups in order to handle fluctuations in traffic. We're going to want to include the same strategy with Kubernetes to ensure we have enough pods available to handle traffic demand. Including autoscaling in your workload is also fairly simple and can be done via the console UI (see the command-line sketch below).

One important thing to note about scaling with Kubernetes is the time it takes to scale up and down. In the intro section of this post, I noted the process of adding and removing nodes from an autoscaling group and how that can take minutes to complete, depending on how quickly your cloud provider can stand up a new instance and the complexity of your configuration. With Kubernetes, I've seen my pods scale up in as little as three seconds and scale down in less than one second. This is part of what makes this solution so powerful.

Considerations for Connections During Scale Up and Down

One important thing to note is that, with pods being added to and removed from the workload, your connections to ProxySQL via the exposed service can be interrupted. It's noted in the autoscaling documentation that this can cause disruption, and your application needs to be able to handle it much in the same way it would for a cloud compute autoscaling group. You'll want to ensure that your application has retry-on-database-failure logic built in before incorporating Kubernetes autoscaling (or any autoscaling, for that matter) as part of your data platform.
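As a rough command-line equivalent of the service and autoscaler described above, the same result can be achieved with kubectl. This is a hedged sketch only: the service name, port choice, and CPU threshold are hypothetical, and the GCP console UI accomplishes the same thing.

# Expose the ProxySQL pods behind a load-balanced service (write port 6033 shown)
kubectl -n pythian-proxysql expose deployment pythian-proxysql \
  --name=pythian-proxysql-svc --type=LoadBalancer --port=6033 --target-port=6033

# Scale the deployment between 2 and 10 pods based on CPU utilization
kubectl -n pythian-proxysql autoscale deployment pythian-proxysql \
  --min=2 --max=10 --cpu-percent=70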
Considerations for MySQL users in ProxySQL

There are three tables that are replicated when working with ProxySQL cluster: mysql_servers, mysql_query_rules, and mysql_users – meaning that when a change to any of these tables is made on one of the nodes in the cluster, it is replicated to all the other nodes. We really don't need to worry about this for mysql_servers, given that all nodes get their mysql_servers information from Consul via consul-template, so I've disabled this clustering feature. With my particular use case I don't need to worry about mysql_query_rules either because, as noted earlier in this post, my traffic is routed based on the port it is sent to. The rules for this are simple and should not change, so I have them in the configuration file and have disabled replicating this table as well.

The last table to consider is mysql_users, and this is where things get interesting. Remember that with Kubernetes it's possible to have persistent storage, but we really want our containers to be as stateless as possible, so if we were to follow the Docker and Kubernetes philosophy as closely as possible we wouldn't want our data to persist. This falls into the whole cattle-vs-pets discussion when working with containers, but I digress. Let's assume we've opted NOT to persist our ProxySQL data, typically stored in SQLite, and we lose all of the pods in our Kubernetes cluster. Unlikely, but we always need to be ready for disaster.

When the first pod comes up, it starts with a blank slate. This isn't a problem for mysql_servers, considering the pod will get its initial data from Consul via consul-template, and it gets its mysql_query_rules data from the config file. However, there is no source of truth for mysql_users data, so all that data would be lost. In this case, we need to incorporate some source of truth for the ProxySQL mysql_users table.

It's possible to use a cloud compute VM with ProxySQL installed as an ever-present member of the cluster that could seed data for new joining pods, but that breaks our construct of working specifically with containers. Plus, if you have a multi-cluster configuration like I do, where there is one cluster in each region, then you need one ProxySQL "master host" in each region, which is a bit of a waste considering it's just acting as a source of truth for mysql_users, which will likely be the same across all clusters. My solution, in this case, is to leverage the source of truth that we already have in place: Consul. If it's already acting as a source of truth for mysql_servers, there's no reason why it can't act as a source of truth for this as well. All I need to do is have my MySQL users and password hashes (always stay secure) in Consul and use consul-template to create them on a new ProxySQL host, or change them as keys and values change. You may have noticed this in the entrypoint script in my consul-template container.

To Cluster or Not To Cluster?

I mentioned before that ProxySQL cluster handles the replication of three tables: mysql_users, mysql_query_rules, and mysql_servers. Considering that all three of these tables now have their own source of truth, we really don't need to worry about replicating this data. As changes are reported to Consul, it will update all the ProxySQL pods, considering that all of them have consul (agent) and consul-template containers as part of the pod. With this in mind, I've opted to rely on my constructed sources of truth and reduce solution complexity by removing ProxySQL clustering; however, this is going to vary from use case to use case.

Conclusion

The solution implemented in this use case required the inclusion of a lot of technologies that MySQL DBAs may or may not be familiar with: ProxySQL, Orchestrator, Consul, GTIDs, etc. We've made this solution a little more complex by adding Docker and Kubernetes to the stack, but I personally believe this complexity is worth it considering the higher degree of idempotency built into the solution, the lack of need for ProxySQL clustering, and the speed at which scale up and scale down occur.

One last consideration is the simple need to learn how to incorporate containers into your stack. This is not my first blog post on container philosophy and implementation, and I believe that containers are going to become a greater part of the landscape for all of us – even us, the database professionals with our highly stateful technological challenges. If you have not already started educating yourself on these technologies, I would highly encourage you to do so in order to better prepare yourself for the shift from "Database Administrator" to "Database Reliability Engineer".
Posted over 4 years ago by Selva
Problem: We are running a 5-node Percona cluster on Ubuntu 16.04, configured with master-slave replication. We suddenly got an alert for a broken replica on the slave server, which was earlier configured with normal replication. We tried to sync the data and reconfigure the replication, but were unable to fix it immediately due to huge transactions and GTID-enabled servers. So we decided to use the innobackupex tool, and the problem was fixed in 2 hours. We followed all the steps from the Percona documentation; here is the experience in my environment.

Steps involved in repairing the broken replication:

1. Back up the master server
2. Prepare the backup
3. Restore and configure the replication
4. Check the replication status

1. Back up the master server

We need to copy the complete master server database onto the slave, so we take a full backup from the master server. Before proceeding with the backup we should check the disk space available for it, because it is a file-system-level backup. We created a specific user for taking the backup from the master server; once the backup is completed you will get a "completed OK" message.

2. Prepare the backup for restore

We need to prepare the backup to apply the transaction logs to the data files; once this reports OK, the data files are ready to restore. Before moving the prepared files to the slave server, verify the GTID information from xtrabackup_binlog_info.

3. Restore and configure the replication

We can restore the backup into the existing data directory, or else create a new data directory and move the files into it. We created a new directory and changed the datadir value in the mysqld configuration file. Once the data directory is changed, we need to change the owner and permissions of the MySQL data directory:

chown -R mysql:mysql /mnt/mysqldatanew

Then restart the service with the new data directory. Once it has started, log in with the master's MySQL root user password; because we took a file-level backup from the master, the metadata will be the same as the master's. Then execute the commands to configure the replication (a hedged example is sketched below).

4. Check the replication status

Once the slave is configured, verify the replication status. The slave should also start retrieving new transactions.

Thanks for Reading !!!
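The original post illustrated these steps with screenshots that are not reproduced here. As a rough, hedged sketch of what steps 1-4 typically look like with XtraBackup 2.4 and GTID-based replication (paths, host names, and credentials are placeholders and should be adapted to your environment):

# 1. Backup on the master
innobackupex --user=backupuser --password='***' /backup/full

# 2. Prepare the backup (apply the transaction logs)
innobackupex --apply-log /backup/full/<timestamp>

# Note the GTID position recorded by the backup
cat /backup/full/<timestamp>/xtrabackup_binlog_info

# 3. Copy the prepared files to the slave's new datadir, fix ownership, restart
rsync -av /backup/full/<timestamp>/ /mnt/mysqldatanew/
chown -R mysql:mysql /mnt/mysqldatanew
systemctl restart mysql

# Configure replication using the GTID from xtrabackup_binlog_info
mysql -uroot -p -e "RESET MASTER;
  SET GLOBAL gtid_purged='<gtid_from_xtrabackup_binlog_info>';
  CHANGE MASTER TO MASTER_HOST='master-host', MASTER_USER='repl',
    MASTER_PASSWORD='***', MASTER_AUTO_POSITION=1;
  START SLAVE;"

# 4. Check replication status
mysql -uroot -p -e "SHOW SLAVE STATUS\G"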
Posted over 4 years ago by Mark Callaghan
Reducing response time variance is important for some workloads. This post explains sources of variance for workloads with high write rates when the index structure is an LSM or a B-Tree. I previously wrote about this in my post on durability debt.

Short summary:

- For a given write rate, stalls are more likely with a B-Tree than an LSM.
- Many RocksDB write stalls can be avoided via configuration.
- Write stalls with a B-Tree are smaller but more frequent versus an LSM.
- Write stalls are more likely when the redo log isn't forced on commit.
- The worst-case difference between an LSM and a B-Tree is larger when the working set isn't cached.
- Life is easier but more expensive when the working set fits in cache.
- Less write amplification saves IO for other uses.

Less short summary:

1. Write stalls for an LSM occur when compaction has trouble keeping up with the incoming write rate. The worst stalls occur at write rates that a B-Tree could not sustain. One way to mitigate stalls is to reduce the write rate. Another way is to use an index structure that doesn't support or is inefficient for range scans (see index+log).
2. The cost of configuring RocksDB to avoid write stalls is more CPU overhead on reads, as there will be more data in the upper levels of the LSM. I am partly to blame for the default configuration in RocksDB that throttles writes when the LSM tree gets too much data in the L0, L1 and L2. But that configuration can be changed.
3. SQLite4 has a clever LSM designed for systems that don't allow background threads. It implements a pay-as-you-go approach to durability debt. A traditional LSM takes the opposite approach – it defers the IO cost to the background. RocksDB has optional write throttling and work has been done to smooth its impact, but the problem is not solved.
4. A B-Tree in the worst case (buffer pool full and mostly dirty, working set not cached) also implements a pay-as-you-go approach.
5. I almost always disable sync-on-commit for benchmarks because I want to observe how the DBMS behaves under stress, and less commit latency means more writes/second and more IO stress. See item #6 where I argue that it is good to not have the working set cached.
6. A common rule of thumb has been to keep all indexes in cache or all of the working set in cache. That simplifies tuning and makes it easier to avoid performance problems. But that also might force a deployment to use 2X more HW than it needs, because NAND flash SSDs are everywhere and the response time difference between reading from RAM and reading from NAND flash might not matter for many applications. But if you are using a DBMS in the cloud that charges by the IO, then keeping the working set in RAM might be a good idea.
7. An LSM usually has less write-amp than a B-Tree, so the IO capacity it saves can be used elsewhere to support more read or write transactions.

Worst case behavior

I am wary of "faster is better." I prefer nuance, but I also know that people don't have time to read long blog posts like this or long performance reports. Here I explain worst-case behavior in terms of IO overheads. Worst-case behavior isn't the only way to judge an index structure, but it helps me to explain performance. Another way is to measure the average amount of IO per transaction (in operations and KB) and treat IO efficiency as important.

I describe worst-case behavior for a write operation under a few scenarios. By worst case I mean the largest amount of IO done in the foreground (the thread handling the write), as that determines the response time.
I ignore the work done in the background, which favors an LSM because it defers more work to the background. For a B-Tree I ignore undo and page splits. The write is a SQL update, which is read-modify-write, as opposed to a blind write like a Put with RocksDB. Finally, I assume the update isn't to an indexed column. The scenarios are:

1. Cached, PK only – working set cached, PK index only
2. Not cached, PK only – working set not cached, PK index only
3. Cached, PK and secondary index – working set cached, PK and non-unique secondary index
4. Not cached, PK and secondary index – working set not cached, PK and non-unique secondary index

PK only

For the cached, PK only scenario, neither an LSM nor a B-Tree does IO in the foreground, with the exception of the redo log fsync. Stalls are unlikely for both, but more likely with a B-Tree, especially when the DBMS storage uses a spinning disk.

- An LSM writes the redo log buffer, optionally syncs the redo log and then does an insert into the memtable. Both memtable flush and Ln:Ln+1 compaction are deferred to background threads. If memtable flush were too slow, there are write stalls until flush catches up, to avoid too many memtables wasting memory.
- A B-Tree modifies a page in the buffer pool, writes the redo log buffer and optionally syncs the redo log. If checkpoint were too slow, a full redo log can't be rotated until checkpoint catches up and there are write stalls.

For the not cached, PK only scenario the work done in the foreground is 1 IO/update for an LSM and 2 IO/update for a B-Tree. Here a B-Tree uses a pay-as-you-go model.

- An LSM reads a page into the block cache and then repeats the work described in cached, PK only.
- A B-Tree finds a dirty page to evict, writes that page back to storage, then reads the desired page into that slot in the buffer pool and repeats the work described in cached, PK only.

PK and secondary index

For the cached, PK and secondary index scenario there is approximately twice as much work to be done per update compared to the cached, PK only scenario, so stalls are more likely here. But other than the optional redo fsync there is no foreground IO for the LSM or the B-Tree.

- An LSM repeats the work explained in the cached, PK only scenario. For the secondary index it does an additional insert to the memtable, which is also logged as redo. This can double the demand for compaction.
- A B-Tree repeats the work explained in the cached, PK only scenario. For the secondary index it makes an additional page dirty in the buffer pool. This can double the demand for page write back.

For the not cached, PK and secondary index scenario the foreground IO difference between an LSM and a B-Tree is more significant – 1 IO for the LSM vs 4 IOs for the B-Tree – ignoring the redo log overhead. The IO difference is reduced from 1:4 to approximately 1:2 for a B-Tree like InnoDB that implements a change buffer.

- An LSM does the union of the work described in the not cached, PK only and cached, PK and secondary index scenarios. Ignoring the optional redo fsync, the cost is 1 read IO for the PK index and no reads for the secondary index, because non-unique secondary index maintenance is read-free.
- A B-Tree repeats the work explained in the not cached, PK only scenario, but this is done for both the PK and secondary indexes. Thus the cost is 2 IOs to write back dirty pages and then 2 IOs to read pages from the PK and secondary indexes into the buffer pool and make them dirty – which then requires redo log writes. So the cost is 4 IOs, ignoring the redo log.
Make writes fast: LSM

Writes can be fast with an LSM because most of the IO cost is deferred, but that also increases the need to throttle writes. Life is good as long as that deferred cost can be repaid fast enough; otherwise there will be more response time variance.

Flush and compaction are the deferred cost for an LSM write. Flush means writing the memtable to an SST on storage. Compaction means merging SSTs to move flushed data from the root to the leaf of the LSM tree. Compaction costs more than flush. RocksDB can stall writes when compaction doesn't keep up with ingest. Ingest creates durability debt, compaction reduces it, and write stalls are there to bound the debt. Write stalls are enabled by default but can be disabled by configuration. Putting a bound on durability debt also puts a bound on read latency by reducing the number of SSTs that can exist in the L0, L1 and L2. So if you want to support extremely high write rates, then choose one of: read stalls or write stalls.

Make writes fast: B-Tree

Writes can also be fast with a B-Tree, as there are no page reads/writes to/from storage when the working set is cached and background page write back is fast enough. In that case the only IO work in the foreground is the optional redo log fsync.

Page write back is the primary deferred cost for a B-Tree write. Most of my B-Tree experience is with InnoDB, which does fuzzy checkpoint. The goal is to flush dirty pages before the current redo log segment gets full. Using larger redo log segments lets InnoDB defer write back for a longer time, increasing the chance that more transactions will modify the page – reducing write amplification and helping performance.

Purge can be an additional deferred cost for a B-Tree write. I use the InnoDB name here, as Postgres calls this vacuum. This is the process of reclaiming space from deleted rows that are no longer visible to open MVCC snapshots. The LSM equivalent of purge is checking the open snapshot list during compaction for KV pairs that are not the latest version of a given key, to determine whether that version is still needed.

When write back and purge are fast enough, write stalls should be infrequent with a B-Tree. But write back isn't always fast enough. A B-Tree write stall occurs when a write transaction must read a page into the buffer pool prior to modifying that page but 1) the buffer pool is full and 2) write back must be done for a dirty page before the memory can be reused.

Other

A few other comments that didn't have a place above:

- In this post I assume the B-Tree uses no-force, but there is at least one nice OSS B-Tree that uses force.
- Making commit slower is another way to throttle writes and reduce the chance of stalled writes. Examples of this include redo log fsync, semisync or synchronous replication.
- The InnoDB change buffer is a wonderful feature that reduces the IO overhead for write-heavy workloads.
- NAND flash GC stalls are another source of write stalls. I wish more were written about this topic.
- Stalls during TRIM when using an LSM with NAND flash are another source of stalls. I wish there were more TRIM benchmarks. Smart friends tell me that NAND flash devices vary widely in their ability to handle TRIM, and they display different stall behavior when their TRIM capacity has been exceeded. Some of us were spoiled by FusionIO.
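The post notes above that many RocksDB write stalls can be avoided via configuration. As a hedged illustration of the knobs involved (this is a sketch for a MyRocks deployment, the thresholds are arbitrary examples rather than recommendations, and the same column-family options can be set through the RocksDB API in other embedders), these are the L0 and pending-compaction settings that control when writes are slowed or stopped:

# my.cnf fragment (MyRocks): raise the thresholds that trigger write slowdowns/stops
[mysqld]
rocksdb_default_cf_options="write_buffer_size=128m;level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=20;level0_stop_writes_trigger=36;soft_pending_compaction_bytes_limit=64g;hard_pending_compaction_bytes_limit=256g"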
Posted over 4 years ago by Pinterest Engineering
Qi Li | Software Engineer, Real-time Analytics. At Pinterest, backend core services are in charge of various operations on pins, boards, and users from both Pinners and internal services. While Pinners'...
Posted over 4 years ago by Frederic Descamps
What is great about MySQL Shell plugins is that they provide you with an infinite range of possibilities. While I was writing part I and part II of the check plugin, I realized I could extend it even more. The new methods I added to the plugin are especially useful when you are considering MySQL InnoDB Cluster in Multi-Primary mode, but not only then.

Let's have a look at these new methods. These 4 new methods target large queries or large transactions, and it's also possible to find eventual hot spots.

Let's see the first two, more basic, methods in action. The first method (getQueryUpdatingSamePK()) in fact finds the eventual hot spot. The second one just shows the query updating the most records in the schema named big.

Now, let's have a look at the other two methods, which I find more interesting. The first one shows the transaction modifying the most records – not the statement, but the transaction. As you can see, the plugin tells you which thread it was and how many rows were affected. It also gives you the possibility to see all the statements inside that transaction, and I think this is cool! The last method provides the same behavior but for the transaction containing the biggest number of statements.

MySQL 8.0 is really great and so is the Shell! You can find this plugin and others on GitHub: https://github.com/lefred/mysqlshell-plugins and don't forget that pull requests are always welcome!
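The examples in the original post are screenshots of an interactive session, which are not reproduced here. For a rough idea of how such a method is called once the plugins from the repository above are installed (the `check` object name is an assumption based on that plugin collection; only getQueryUpdatingSamePK() is named in the post):

# Start MySQL Shell in Python mode and call the plugin method interactively
mysqlsh root@localhost:3306 --py
> check.getQueryUpdatingSamePK()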
Posted over 4 years ago by Yves Trudeau
If you do a quick web search about UUIDs and MySQL, you'll get a fair number of results. Here are just a few examples:

- Storing UUID and Generated Columns
- Storing UUID Values in MySQL
- Illustrating Primary Key models in InnoDB and their impact on disk usage
- MySQL UUID Smackdown: UUID vs. INT for Primary Key
- GUID/UUID Performance Breakthrough
- To UUID or not to UUID?

So, does a well-covered topic like this one need any more attention? Well, apparently – yes. Even though most posts warn people against the use of UUIDs, they are still very popular. This popularity comes from the fact that these values can easily be generated by remote devices, with a very low probability of collision. With this post, my goal is to summarize what has already been written by others and, hopefully, bring in a few new ideas.

What are UUIDs?

UUID stands for Universally Unique IDentifier and is defined in RFC 4122. It is a 128-bit number, normally written in hexadecimal and split by dashes into five groups. A typical UUID value looks like:

yves@laptop:~$ uuidgen
83fda883-86d9-4913-9729-91f20973fa52

There are officially 5 types of UUID values, version 1 to 5, but the most common are: time-based (version 1 or version 2) and purely random (version 4). The time-based UUIDs encode the number of 100ns intervals since October 15, 1582 in 7.5 bytes (60 bits), which is split in a "time-low"-"time-mid"-"time-hi" fashion. The missing 4 bits is the version number used as a prefix to the time-hi field. This yields the 64 bits of the first 3 groups. The last 2 groups are the clock sequence, a value incremented every time the clock is modified, and a host unique identifier. Most of the time, the MAC address of the main network interface of the host is used as the unique identifier.

There are important points to consider when you use time-based UUID values:

- It is possible to determine the approximate time when the value was generated from the first 3 fields
- There are many repetitive fields between consecutive UUID values
- The first field, "time-low", rolls over every 429s
- The MySQL UUID function produces version 1 values

Here's an example using the "uuidgen" Unix tool to generate time-based values:

yves@laptop:~$ for i in $(seq 1 500); do echo "$(date +%s): $(uuidgen -t)"; sleep 1; done
1573656803: 572e4122-0625-11ea-9f44-8c16456798f1
1573656804: 57c8019a-0625-11ea-9f44-8c16456798f1
1573656805: 586202b8-0625-11ea-9f44-8c16456798f1
...
1573657085: ff86e090-0625-11ea-9f44-8c16456798f1
1573657086: 0020a216-0626-11ea-9f44-8c16456798f1
...
1573657232: 56b943b2-0626-11ea-9f44-8c16456798f1
1573657233: 57534782-0626-11ea-9f44-8c16456798f1
1573657234: 57ed593a-0626-11ea-9f44-8c16456798f1
...

The first field rolls over (at t=1573657086) and the second field is incremented. It takes about 429s to see similar values again for the first field. The third field changes only about once per year. The last field is static on a given host; the MAC address is used on my laptop:

yves@laptop:~$ ifconfig | grep ether | grep 8c
        ether 8c:16:45:67:98:f1  txqueuelen 1000  (Ethernet)

The other frequently seen UUID version is 4, the purely random one. By default, the Unix "uuidgen" tool produces UUID version 4 values:

yves@laptop:~$ for i in $(seq 1 3); do uuidgen; done
6102ef39-c3f4-4977-80d4-742d15eefe66
14d6e343-028d-48a3-9ec6-77f1b703dc8f
ac9c7139-34a1-48cf-86cf-a2c823689a91

The only "repeated" values are the version, "4", at the beginning of the 3rd field, and the two variant bits at the beginning of the 4th field; the other 122 bits are random.
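To illustrate the first point above – that the generation time can be recovered from a version 1 value – here is a small worked example against one of the UUIDs generated above. The constant 122192928000000000 is the number of 100ns intervals between the UUID epoch (1582-10-15) and the Unix epoch:

mysql> SELECT FROM_UNIXTIME(
    ->          (CAST(CONV(CONCAT(SUBSTR(uuid,16,3), SUBSTR(uuid,10,4), SUBSTR(uuid,1,8)), 16, 10) AS UNSIGNED)
    ->           - 122192928000000000) DIV 10000000) AS generated_at
    ->   FROM (SELECT '572e4122-0625-11ea-9f44-8c16456798f1' AS uuid) t;

With the session time zone set to UTC this returns 2019-11-13 14:53:23, which is the 1573656803 timestamp printed next to that UUID in the listing above.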
What is so Wrong with UUID Values?

In order to appreciate the impact of using UUID values as a primary key, it is important to review how InnoDB organizes the data. InnoDB stores the rows of a table in the b-tree of the primary key. In database terminology, we call this a clustered index. The clustered index orders the rows automatically by the primary key.

When you insert a new row with a random primary key value, InnoDB has to find the page where the row belongs, load it in the buffer pool if it is not already there, insert the row and then, eventually, flush the page back to disk. With purely random values and large tables, all b-tree leaf pages are susceptible to receive the new row; there are no hot pages. Rows inserted out of primary key order cause page splits, which lead to a low filling factor. For tables much larger than the buffer pool, an insert will very likely need to read a table page from disk. The page in the buffer pool where the new row has been inserted will then be dirty. The odds that the page will receive a second row before it needs to be flushed to disk are very low. Most of the time, every insert will cause two IOPs – one read and one write. The first major impact is on the rate of IOPs, and it is a major limiting factor for scalability. The only way to get decent performance is thus to use storage with low latency and high endurance.

That's where you'll see the second major performance impact. With a clustered index, the secondary indexes use the primary key values as pointers. While the leaves of the b-tree of the primary key store rows, the leaves of the b-tree of a secondary index store primary key values. Let's assume a table of 1B rows having UUID values as primary key and five secondary indexes. If you read the previous paragraph, you know the primary key values are stored six times for each row. That means a total of 6B char(36) values representing 216 GB. That is just the tip of the iceberg, as tables normally have foreign keys, explicit or not, pointing to other tables. When the schema is based on UUID values, all these columns and the indexes supporting them are char(36). I recently analyzed a UUID-based schema and found that about 70 percent of the storage was used for these values.

As if that's not enough, there's a third important impact of using UUID values. Integer values are compared up to 8 bytes at a time by the CPU, but UUID values are compared char per char. Databases are rarely CPU bound, but nevertheless this adds to the latencies of the queries. If you are not convinced, look at this performance comparison between integers and strings:

mysql> select benchmark(100000000,2=3);
+--------------------------+
| benchmark(100000000,2=3) |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (0.96 sec)

mysql> select benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='df878007-80da-11e9-93dd-00163e000003');
+----------------------------------------------------------------------------------------------------+
| benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='df878007-80da-11e9-93dd-00163e000003') |
+----------------------------------------------------------------------------------------------------+
|                                                                                                    0 |
+----------------------------------------------------------------------------------------------------+
1 row in set (27.67 sec)

Of course, the above example is a worst-case scenario, but it at least gives the span of the issue. Comparing integers is about 28 times faster.
As if that’s not enough, there is a third important impact of using UUID values. Integer values are compared up to 8 bytes at a time by the CPU, but UUID values are compared character by character. Databases are rarely CPU bound, but nevertheless this adds to the latency of the queries. If you are not convinced, look at this performance comparison between integers and strings:

mysql> select benchmark(100000000,2=3);
+--------------------------+
| benchmark(100000000,2=3) |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (0.96 sec)

mysql> select benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='df878007-80da-11e9-93dd-00163e000003');
+-----------------------------------------------------------------------------------------------------+
| benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='df878007-80da-11e9-93dd-00163e000003')   |
+-----------------------------------------------------------------------------------------------------+
|                                                                                                     0 |
+-----------------------------------------------------------------------------------------------------+
1 row in set (27.67 sec)

Of course, the above example is a worst-case scenario, but it at least gives a sense of the scale of the issue. Comparing integers is about 28 times faster. Even when the difference appears early in the strings, the comparison is still about 2.5 times slower:

mysql> select benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='ef878007-80da-11e9-93dd-00163e000003');
+-----------------------------------------------------------------------------------------------------+
| benchmark(100000000,'df878007-80da-11e9-93dd-00163e000002'='ef878007-80da-11e9-93dd-00163e000003')   |
+-----------------------------------------------------------------------------------------------------+
|                                                                                                     0 |
+-----------------------------------------------------------------------------------------------------+
1 row in set (2.45 sec)

Let’s explore a few solutions to address those issues.

Size of the Values

The default representation for UUID, hash, and token values is often the hexadecimal notation. With a cardinality (the number of possible values) of only 16 per byte, it is far from efficient. What about using another representation like base64 or even straight binary? How much do we save? How is the performance affected?

Let’s begin with the base64 notation. The cardinality of each character is 64, so base64 needs 4 characters to encode 3 bytes of actual value. A UUID value consists of 16 bytes of data; dividing by 3 leaves a remainder of 1, and to handle that, the base64 encoding adds ‘==’ at the end:

mysql> select to_base64(unhex(replace(uuid(),'-','')));
+------------------------------------------+
| to_base64(unhex(replace(uuid(),'-',''))) |
+------------------------------------------+
| clJ4xvczEeml1FJUAJ7+Fg==                 |
+------------------------------------------+
1 row in set (0.00 sec)

If the length of the encoded entity is known, like for a UUID, we can remove the ‘==’, as it is just dead weight. A UUID encoded in base64 thus has a length of 22 characters.

The next logical step is to store the value directly in binary format. This is the most compact format, but displaying the values in the mysql client is less convenient.

So, how does the size impact performance? To illustrate the impact, I inserted random UUID values in a table with the following definition…

CREATE TABLE `data_uuid` (
  `id` char(36) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

… for the default hexadecimal representation. For base64, the ‘id’ column is defined as char(22), while binary(16) is used for the binary example. The database server has a buffer pool size of 128MB and its IOPS are limited to 500. The insertions are done over a single thread.

[Figure: Insertion rates for tables using different representations for UUID values]

In all cases, the insertion rate is at first CPU bound, but as soon as the table is larger than the buffer pool, insertion rapidly becomes IO bound. This is expected and shouldn’t surprise anyone. The use of a smaller representation for the UUID values just allows more rows to fit in the buffer pool, but in the long run it doesn’t really help the performance, as the random insertion order dominates. If you are using random UUID values as primary keys, your performance is limited by the amount of memory you can afford.
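For reference, here is a hedged sketch showing the three representations side by side; it only uses built-in MySQL string functions, and the column aliases are made up for the example:

SET @u = UUID();
SELECT @u                                                            AS id_hex36,     -- 36-char hex with dashes
       TRIM(TRAILING '=' FROM TO_BASE64(UNHEX(REPLACE(@u,'-','')))) AS id_base64,    -- 22 chars once '==' is trimmed
       UNHEX(REPLACE(@u,'-',''))                                     AS id_binary16,  -- raw 16 bytes (may display as unreadable characters)
       LENGTH(UNHEX(REPLACE(@u,'-','')))                             AS binary_length;

The binary column is the one that would be stored in a binary(16) primary key; the unreadable display in the client is the “less convenient” trade-off mentioned above.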
Option 1: Saving IOPs with Pseudo-Random Order

As we have seen, the most important issue is the random nature of the values. A new row may end up in any of the table leaf pages. So unless the whole table is loaded in the buffer pool, an insert means a read IOP and eventually a write IOP.

My colleague David Ducos gave a nice solution to this problem, but some customers do not want to allow for the possibility of extracting information from the UUID values, like, for example, the generation timestamp. What if we just reduce the randomness of the values somewhat, so that a prefix of a few bytes is constant for a time interval? During the time interval, only a fraction of the whole table, corresponding to the cardinality of the prefix, would need to be in memory to save the read IOPs. This would also increase the likelihood that a page receives a second write before being flushed to disk, thus reducing the write load. Let’s consider the following UUID generation function:

drop function if exists f_new_uuid;

delimiter ;;
CREATE DEFINER=`root`@`%` FUNCTION `f_new_uuid`() RETURNS char(36)
    NOT DETERMINISTIC
BEGIN
    DECLARE cNewUUID char(36);
    DECLARE cMd5Val char(32);

    set cMd5Val = md5(concat(rand(),now(6)));
    set cNewUUID = concat(left(md5(concat(year(now()),week(now()))),4),
                          left(cMd5Val,4),'-',
                          mid(cMd5Val,5,4),'-4',
                          mid(cMd5Val,9,3),'-',
                          mid(cMd5Val,13,4),'-',
                          mid(cMd5Val,17,12));

    RETURN cNewUUID;
END;;
delimiter ;

The first four characters of the UUID value come from the MD5 hash of the concatenation of the current year and week number. This value is, of course, static over a week. The remainder of the UUID value comes from the MD5 of a random value and the current time at a precision of 1us. The third field is prefixed with a “4” to indicate it is a version 4 UUID type. There are 65536 possible prefixes so, during a week, only 1/65536 of the table rows are required in memory to avoid a read IOP upon insertion. That’s much easier to manage: a 1TB table needs only about 16MB in the buffer pool to support the inserts.

Option 2: Mapping UUIDs to Integers

Even if you use pseudo-ordered UUID values stored using binary(16), it is still a very large data type that will inflate the size of the dataset. Remember, the primary key values are used as pointers in the secondary indexes by InnoDB. What if we store all the UUID values of a schema in a mapping table? The mapping table will be defined as:

CREATE TABLE `uuid_to_id` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `uuid` char(36) NOT NULL,
  `uuid_hash` int(10) unsigned GENERATED ALWAYS AS (crc32(`uuid`)) STORED NOT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_hash` (`uuid_hash`)
) ENGINE=InnoDB AUTO_INCREMENT=2590857 DEFAULT CHARSET=latin1;

It is important to notice that the uuid_to_id table does not enforce the uniqueness of uuid. The idx_hash index acts a bit like a bloom filter. We’ll know for sure a UUID value is not present in the table when there is no matching hash value, but we’ll have to validate against the stored UUID value when there is a matching hash.
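Before wrapping this lookup in a function, it may help to see the query pattern the mapping table relies on. A small sketch (the UUID literal is just an example): probe the small idx_hash index first, then confirm the match against the full UUID string, since different UUIDs can hash to the same crc32 value.

-- An empty result guarantees the UUID is not in the table;
-- a non-empty result returns the integer id used everywhere else in the schema.
SELECT id
FROM uuid_to_id
WHERE uuid_hash = crc32('83fda883-86d9-4913-9729-91f20973fa52')
  AND uuid = '83fda883-86d9-4913-9729-91f20973fa52';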
To help us here, let’s create a SQL function:

DELIMITER ;;
CREATE DEFINER=`root`@`%` FUNCTION `f_uuid_to_id`(pUUID char(36)) RETURNS int(10) unsigned
    DETERMINISTIC
BEGIN
    DECLARE iID int unsigned;
    DECLARE iOUT int unsigned;

    select get_lock('uuid_lock',10) INTO iOUT;

    SELECT id INTO iID
    FROM uuid_to_id
    WHERE uuid_hash = crc32(pUUID) and uuid = pUUID;

    IF iID IS NOT NULL THEN
        select release_lock('uuid_lock') INTO iOUT;
        SIGNAL SQLSTATE '23000'
            SET MESSAGE_TEXT = 'Duplicate entry', MYSQL_ERRNO = 1062;
    ELSE
        insert into uuid_to_id (uuid) values (pUUID);
        select release_lock('uuid_lock') INTO iOUT;
        set iID = last_insert_id();
    END IF;

    RETURN iID;
END ;;
DELIMITER ;

The function checks if the UUID value passed already exists in the uuid_to_id table; if it does, it raises a duplicate-entry error, otherwise it inserts the UUID value and returns the last_insert_id value. To protect against the concurrent submission of the same UUID values, I added a database lock. The database lock limits the scalability of the solution. If your application cannot submit the same request twice within a very short time frame, the lock could be removed. I also have another version of the function with no lock calls, using a small dedup table where recent rows are kept for only a few seconds. See my github if you are interested.

Results for the Alternate Approaches

Now, let’s have a look at the insertion rates using these alternate approaches.

[Figure: Insertion rates on tables using UUID values as primary keys, alternative solutions]

The pseudo-order results are great. Here I modified the algorithm to keep the UUID prefix constant for one minute instead of one week, in order to better fit the test environment. Even though the pseudo-order solution performs well, keep in mind that it still bloats the schema, and overall the performance gains may not be that great. The mapping to integer values, although the insert rates are smaller due to the additional DMLs required, decouples the schema from the UUID values. The tables now use integers as primary keys. This mapping removes nearly all the scalability concerns of using UUID values. Still, even on a small VM with limited CPU and IOPS, the UUID mapping technique yields nearly 4000 inserts/s. Put into context, this means 14M rows per hour, 345M rows per day and 126B rows per year. Such rates likely fit most requirements. The only limiting factor for growth is the size of the hash index. When the hash index becomes too large to fit in the buffer pool, performance will start to decrease.

Other Options than UUID Values?

Of course, there are other possibilities to generate unique IDs. The method used by the MySQL function UUID_SHORT() is interesting. A remote device like a smartphone could use the UTC time instead of the server uptime. Here’s a proposal:

(Seconds since January 1st 1970) << 32
+ (lower 2 bytes of the wifi MAC address) << 16
+ 16_bits_unsigned_int++;

The 16-bit counter should be initialized at a random value and allowed to roll over. The odds of two devices producing the same ID are very small: the IDs would have to be generated at approximately the same time, both devices would need the same lower MAC bytes, and their 16-bit counters would have to be at the same value.

Notes

All the data related to this post can be found in my github.
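As a final illustration of the mapping approach, here is a hedged sketch of how an application table could consume f_uuid_to_id(); the orders table and its columns are made up for the example:

CREATE TABLE orders (
  id int unsigned NOT NULL,
  order_date datetime NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- The client still supplies a UUID, but only the small integer id is stored
-- in the schema and in every secondary index that references it.
INSERT INTO orders (id, order_date)
VALUES (f_uuid_to_id('83fda883-86d9-4913-9729-91f20973fa52'), NOW());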
Posted over 4 years ago by Tony Darnell
In a couple previous posts, I explained how to get an “Always Free” Oracle Cloud compute instance and how to install MySQL on it, as well as how to add a web server. I started my IT career (way back in 1989) using a (dumb) terminal and a 2400-baud modem to access a server. While I still use a terminal window and the command line, it is always nice to have access to a GUI. In this post, I will show you how to install and use a GUI on your Oracle Cloud compute instance so you can use a Virtual Network Computing (VNC) application to connect to your “Always Free” (or not-free) Oracle Cloud compute instance.

VNC is a graphical desktop-sharing system that uses the Remote Frame Buffer protocol to remotely control another computer. In other words, it is (almost) like having a monitor connected to your compute instance. Installing everything you need should take about twenty minutes (only because one yum install takes 13-15 minutes).

First, you will need to create your “Always Free” Oracle Cloud account, and at least one free compute instance. (Of course, this will also work on a paid compute instance.) If you need help creating your free compute instance, you can follow the instructions in the first part of this post (installing MySQL is optional). Once you have your compute instance ready to go, or if you already have a compute instance running, you can continue with this post.

VNC Viewer

I am using a Mac, so I can use the Screen Sharing application that comes with the operating system (OS). If you don’t have a Mac, you will need to find a VNC application for your OS. I have also used the free (non-commercial-use only) version of VNC Connect from RealVNC, but you will need to buy a copy if you are using it for work. There are several free ones available, such as TeamViewer, TightVNC and TigerVNC. If you don’t use a Mac, I won’t be able to show you how to install or set up the VNC viewer you decide to use, but it should be easy to do. Whichever VNC app you choose should provide you with instructions. You should only have to input localhost and the port number of 5901.

Installing what you need on your compute instance

Log in to your compute instance. When I created my compute instance, I chose to install Oracle Linux. These instructions should work for any other flavor of Linux, but if not, you can look for similar packages for your OS and you might have to modify a few things.

Change your directory to the yum repo directory, and then download the yum repo file from yum.oracle.com for your version of Oracle Linux.

$ cd /etc/yum.repos.d
$ sudo wget http://yum.oracle.com/public-yum-ol7.repo
--2019-11-20 00:01:31--  http://yum.oracle.com/public-yum-ol7.repo
Resolving yum.oracle.com (yum.oracle.com)... 69.192.108.102
Connecting to yum.oracle.com (yum.oracle.com)|69.192.108.102|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16402 (16K) [text/plain]
Saving to: ‘public-yum-ol7.repo’

100%[=======================================>] 16,402      --.-K/s   in 0s

2019-11-20 00:01:31 (412 MB/s) - ‘public-yum-ol7.repo’ saved [16402/16402]

Next, install the GNOME desktop via yum. This installation is 678 megabytes in size, and it will take about 13-15 minutes. You can remove the -y option from your yum command if you want to answer the single installation question “Is this ok?” yourself.

Note: Normally I would post the entire output from a command, but the output is over 6,000 lines long. I will replace the majority of the screen output with three dots (…).
$ sudo yum -y groups install "Server with GUI" --skip-broken
Loaded plugins: langpacks, ulninfo
Repository ol7_latest is listed more than once in the configuration
...
Transaction Summary
==================================================
Install  209 Packages (+659 Dependent packages)
Upgrade              (   3 Dependent packages)

Total download size: 678 M
Is this ok [y/d/N]: y
Downloading packages:
...
Complete!

Install the TigerVNC server (I will suppress most of this output as well):

$ sudo yum -y install tigervnc-server
Loaded plugins: langpacks, ulninfo
...
Resolving Dependencies
--> Running transaction check
---> Package tigervnc-server.x86_64 0:1.8.0-17.0.1.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================
 Package            Arch     Version             Repository    Size
====================================================================
Installing:
 tigervnc-server    x86_64   1.8.0-17.0.1.el7    ol7_latest   215 k

Transaction Summary
====================================================================
Install  1 Package

Total download size: 215 k
Installed size: 509 k
Downloading packages:
tigervnc-server-1.8.0-17.0.1.el7.x86_64.rpm            | 215 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : tigervnc-server-1.8.0-17.0.1.el7.x86_64       1/1
  Verifying  : tigervnc-server-1.8.0-17.0.1.el7.x86_64       1/1

Installed:
  tigervnc-server.x86_64 0:1.8.0-17.0.1.el7

Complete!

Note: I believe the display of duplicate messages “Repository xxx is listed more than once in the configuration” is a bug in yum. You can ignore these messages.

Configure the VNC server

You will be required to provide a password, which you will need to remember in order to access this server via VNC. You can also enter a “view-only password” if you want someone to be able to connect to the server but you don’t want them to be able to control anything (they can only view the screen). I skipped this option.

$ vncserver

You will require a password to access your desktops.

Password:
Verify:
Would you like to enter a view-only password (y/n)? n
A view-only password is not used
xauth:  file /home/opc/.Xauthority does not exist

New 'instance-20191113-1544:1 (opc)' desktop is instance-20191113-1544:1

Creating default startup script /home/opc/.vnc/xstartup
Creating default config /home/opc/.vnc/config
Starting applications specified in /home/opc/.vnc/xstartup
Log file is /home/opc/.vnc/instance-20191113-1544:1.log

SSH Tunnel for VNC

I am going to create a tunnel for VNC through SSH, so I can punch through the VNC port and also send all of the data through an encrypted tunnel.

Note: There is an alternate way to access your compute instance via VNC by creating an Instance Console Connection, but it only provides you with a console connection (non-GUI). If you want to do that, instructions are available via this blog.

In a terminal window, issue the following command, with your public IP address at the end. This will create the SSH tunnel for you to use for VNC.

$ ssh -L 5901:localhost:5901 [email protected]

Now you are ready to connect to the instance using VNC. For the Mac, I open the Screen Sharing application, click on the menu option “Connection” and then down to “New”. In the “Connect to” box, I enter localhost:5901 and press “Connect”. Then enter the password you used when you ran the vncserver command earlier.
If you are using another VNC viewer, simply enter localhost:5901, or you might have to enter localhost and the port 5901 in separate fields. Then, just open the connection with your VNC application, and you should see the Oracle Linux GUI appear:

Creating a security rule for multiple connections

Since the VNC traffic travels inside the SSH tunnel, I don’t have to create a security rule for port 5901. But there is one issue you will have if you want to open multiple VNC windows at a time. By using the localhost:5901 SSH tunnel, you can only open one VNC window at a time on port 5901 on your local computer. But you can use port forwarding on your Oracle Cloud compute instances (via the security list) so you can connect to multiple compute instances at a time from your local computer. To do this, create a Security List Ingress Rule so that your compute instance will accept another incoming port (for example, port 5902) while you direct the connection to port 5901.

You will need to create a security rule to allow ingress traffic on port 5902. From the Oracle Cloud menu (top left of your screen), go down to Networking and over to Virtual Cloud Networks. You will be presented with a list of the Virtual Cloud Networks (VCN) you have already created, and if you are doing this from the beginning, you should only have one VCN listed. Click on the VCN name that begins with VirtualCloudNetwork.

On the left, you will see a menu like this. Click on “Security Lists”:

To the right of the above menu, you will see a list of the security lists you have already created, and if you are doing this from the beginning, you should only have one security list available. Click on the security list name that begins with Default Security List for VirtualCloudNetwork, where the VirtualCloudNetwork name matches your VirtualCloudNetwork name.

You are going to need to add an Ingress Rule, so click on the “Add Ingress Rules” button:

Fill out the form like this, and then click on “Add Ingress Rules”.

Note: You do not want to click on the “Stateless” box. A stateless rule means that you would also need to create an egress rule for the outbound port 5901 traffic. If you leave this unchecked, the rule that is created will be a “stateful” rule, which means that if you allow inbound traffic on port 5902, the outbound response traffic is also automatically allowed. From Oracle’s documentation:

“Marking a security rule as stateful indicates that you want to use connection tracking for any traffic that matches that rule. This means that when an instance receives traffic matching the stateful ingress rule, the response is tracked and automatically allowed back to the originating host, regardless of any egress rules applicable to the instance. And when an instance sends traffic that matches a stateful egress rule, the incoming response is automatically allowed, regardless of any ingress rules. For more details, see Connection Tracking Details for Stateful Rules.”

To use port 5902, which is redirected to port 5901, your security list should look like this:

Then, in a terminal window, you will need to use this command to open the SSH tunnel, where the outgoing port is 5902, and the destination/incoming port is 5901.

$ ssh -L 5902:localhost:5901 [email protected]

This syntax follows the ssh man page: -L [bind_address:]port:host:hostport.

Now you know how to use VNC to connect to your Oracle Cloud compute instance.

Tony Darnell is a Principal Sales Consultant for MySQL, a division of Oracle, Inc.
MySQL is the world’s most popular open-source database program. Tony may be reached at info [at] ScriptingMySQL.com and on LinkedIn. Tony is the author of Twenty Forty-Four: The League of Patriots. Visit http://2044thebook.com for more information. Tony is the editor/illustrator for the NASA Graphics Standards Manual Remastered Edition. Visit https://amzn.to/2oPFLI0 for more information.