News
Posted 4 days ago by MySQL Performance Blog
In this blog post, we’ll look at how ClickHouse performs in a general analytical workload, using the star schema benchmark test.

We have mentioned ClickHouse in some recent posts (ClickHouse: New Open Source Columnar Database, Column Store Database Benchmarks: MariaDB ColumnStore vs. ClickHouse vs. Apache Spark), where it showed excellent results. ClickHouse by itself seems to be an event-oriented RDBMS, as its name suggests (clicks). Its primary use case, Yandex Metrica (a system similar to Google Analytics), also points to an event-based nature. We can also see there is a requirement for date-stamped columns. It is possible, however, to use ClickHouse in a general analytical workload. This blog post shares my findings. For these tests, I used the Star Schema Benchmark, slightly modified to accommodate ClickHouse specifics.

First, let’s talk about schemas. We need to adjust to ClickHouse data types. For example, the biggest fact table in SSB is “lineorder”. Below is how it is defined for Amazon Redshift (as taken from https://docs.aws.amazon.com/redshift/latest/dg/tutorial-tuning-tables-create-test-data.html):

```
CREATE TABLE lineorder (
  lo_orderkey INTEGER NOT NULL,
  lo_linenumber INTEGER NOT NULL,
  lo_custkey INTEGER NOT NULL,
  lo_partkey INTEGER NOT NULL,
  lo_suppkey INTEGER NOT NULL,
  lo_orderdate INTEGER NOT NULL,
  lo_orderpriority VARCHAR(15) NOT NULL,
  lo_shippriority VARCHAR(1) NOT NULL,
  lo_quantity INTEGER NOT NULL,
  lo_extendedprice INTEGER NOT NULL,
  lo_ordertotalprice INTEGER NOT NULL,
  lo_discount INTEGER NOT NULL,
  lo_revenue INTEGER NOT NULL,
  lo_supplycost INTEGER NOT NULL,
  lo_tax INTEGER NOT NULL,
  lo_commitdate INTEGER NOT NULL,
  lo_shipmode VARCHAR(10) NOT NULL
);
```

For ClickHouse, the table definition looks like this:

```
CREATE TABLE lineorderfull (
  LO_ORDERKEY UInt32,
  LO_LINENUMBER UInt8,
  LO_CUSTKEY UInt32,
  LO_PARTKEY UInt32,
  LO_SUPPKEY UInt32,
  LO_ORDERDATE Date,
  LO_ORDERPRIORITY String,
  LO_SHIPPRIORITY UInt8,
  LO_QUANTITY UInt8,
  LO_EXTENDEDPRICE UInt32,
  LO_ORDTOTALPRICE UInt32,
  LO_DISCOUNT UInt8,
  LO_REVENUE UInt32,
  LO_SUPPLYCOST UInt32,
  LO_TAX UInt8,
  LO_COMMITDATE Date,
  LO_SHIPMODE String
) Engine=MergeTree(LO_ORDERDATE,(LO_ORDERKEY,LO_LINENUMBER),8192);
```

From this we can see that we need to use data types like UInt8 and UInt32, which are somewhat unusual in the database world.

The second table (Redshift definition):

```
CREATE TABLE customer (
  c_custkey INTEGER NOT NULL,
  c_name VARCHAR(25) NOT NULL,
  c_address VARCHAR(25) NOT NULL,
  c_city VARCHAR(10) NOT NULL,
  c_nation VARCHAR(15) NOT NULL,
  c_region VARCHAR(12) NOT NULL,
  c_phone VARCHAR(15) NOT NULL,
  c_mktsegment VARCHAR(10) NOT NULL
);
```

For ClickHouse, I defined it as:

```
CREATE TABLE customerfull (
  C_CUSTKEY UInt32,
  C_NAME String,
  C_ADDRESS String,
  C_CITY String,
  C_NATION String,
  C_REGION String,
  C_PHONE String,
  C_MKTSEGMENT String,
  C_FAKEDATE Date
) Engine=MergeTree(C_FAKEDATE,(C_CUSTKEY),8192);
```

For reference, the full schema for the benchmark is here: https://github.com/vadimtk/ssb-clickhouse/blob/master/create.sql.

For this table, we need to define an artificial column C_FAKEDATE Date in order to use ClickHouse’s most advanced engine (MergeTree). I was told by the ClickHouse team that they plan to remove this limitation in the future.

To generate data acceptable to ClickHouse, I made modifications to ssb-dbgen. You can find my version here: https://github.com/vadimtk/ssb-dbgen. The most notable change is that ClickHouse can’t accept dates in CSV files formatted as “19971125”. It has to be “1997-11-25”.
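For instance, such a date conversion can also be done inside ClickHouse itself at load time. The staging table below is a hypothetical sketch (it is not part of the benchmark schema), just to illustrate the idea:

```
-- Hypothetical sketch: load the raw value as a String first, then convert it
-- with string functions while inserting into the real MergeTree table.
-- Columns not listed in the INSERT receive their default values.
CREATE TABLE lineorder_staging (
  LO_ORDERKEY UInt32,
  LO_ORDERDATE_RAW String  -- holds values like '19971125'
) Engine=Log;

INSERT INTO lineorderfull (LO_ORDERKEY, LO_ORDERDATE)
SELECT
  LO_ORDERKEY,
  toDate(concat(substring(LO_ORDERDATE_RAW, 1, 4), '-',
                substring(LO_ORDERDATE_RAW, 5, 2), '-',
                substring(LO_ORDERDATE_RAW, 7, 2)))
FROM lineorder_staging;
```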
This is something to keep in mind when loading data into ClickHouse. It is possible to do some preformatting on load, but I don’t have experience with that. A common approach is to create a staging table with data types that match the loaded data, and then convert the values using SQL functions when inserting into the main table (as sketched above).

Hardware Setup

One of the goals of this benchmark is to see how ClickHouse scales on multiple nodes. I used a setup of one node, and then compared it to a setup of three nodes. Each node has 24 cores of “Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz” CPUs, and the data is located on very fast PCIe Flash storage.

For the SSB benchmark I use a scale factor of 2500, which provides (in raw data):

- Table lineorder: 15 billion rows, raw size 1.7TB
- Table customer: 75 million rows

When loaded into ClickHouse, the table lineorder takes 464GB, which corresponds to a 3.7x compression ratio.

We compare a one-node setup (table names lineorderfull, customerfull) vs. a three-node setup (table names lineorderd, customerd).

Single Table Operations

Query:

```
SELECT
    toYear(LO_ORDERDATE) AS yod,
    sum(LO_REVENUE)
FROM lineorderfull
GROUP BY yod
```

One node:

7 rows in set. Elapsed: 9.741 sec. Processed 15.00 billion rows, 90.00 GB (1.54 billion rows/s., 9.24 GB/s.)

Three nodes:

7 rows in set. Elapsed: 3.258 sec. Processed 15.00 billion rows, 90.00 GB (4.60 billion rows/s., 27.63 GB/s.)

We see a speedup of practically three times. Handling 4.6 billion rows/s is blazingly fast!

One Table with Filtering

```
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorderfull
WHERE (toYear(LO_ORDERDATE) = 1993)
    AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3))
    AND (LO_QUANTITY < 25)
```

One node:

1 rows in set. Elapsed: 3.175 sec. Processed 2.28 billion rows, 18.20 GB (716.60 million rows/s., 5.73 GB/s.)

Three nodes:

1 rows in set. Elapsed: 1.295 sec. Processed 2.28 billion rows, 18.20 GB (1.76 billion rows/s., 14.06 GB/s.)

It’s worth mentioning that during the execution of this query, ClickHouse was able to use ALL 24 cores on each box. This confirms that ClickHouse is a massively parallel processing system.

Two Tables (Independent Subquery)

In this case, I want to show how ClickHouse handles independent subqueries:

```
SELECT sum(LO_REVENUE)
FROM lineorderfull
WHERE LO_CUSTKEY IN (
    SELECT C_CUSTKEY AS LO_CUSTKEY
    FROM customerfull
    WHERE C_REGION = 'ASIA'
)
```

One node:

1 rows in set. Elapsed: 28.934 sec. Processed 15.00 billion rows, 120.00 GB (518.43 million rows/s., 4.15 GB/s.)

Three nodes:

1 rows in set. Elapsed: 14.189 sec. Processed 15.12 billion rows, 121.67 GB (1.07 billion rows/s., 8.57 GB/s.)

Here, however, we do not see the close-to-3x speedup on three nodes, because of the data transfer required to match LO_CUSTKEY with C_CUSTKEY.

Two Tables JOIN

With a subquery that uses columns to return results, or for GROUP BY, things get more complicated. In this case we want to GROUP BY a column from the second table.

First, ClickHouse doesn’t support traditional subquery syntax, so we need to use JOIN. Second, for JOINs, ClickHouse strictly prescribes how the query must be written (a limitation that will also be changed in the future). Our JOIN should look like:

```
SELECT
    C_REGION,
    sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorderfull
ANY INNER JOIN
(
    SELECT
        C_REGION,
        C_CUSTKEY AS LO_CUSTKEY
    FROM customerfull
) USING (LO_CUSTKEY)
WHERE (toYear(LO_ORDERDATE) = 1993)
    AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3))
    AND (LO_QUANTITY < 25)
GROUP BY C_REGION
```

One node:

5 rows in set. Elapsed: 31.443 sec. Processed 2.35 billion rows, 28.79 GB (74.75 million rows/s., 915.65 MB/s.)

Three nodes:

5 rows in set. Elapsed: 25.160 sec. Processed 2.58 billion rows, 33.25 GB (102.36 million rows/s., 1.32 GB/s.)

In this case the speedup is not even two times. This corresponds to the random data distribution of the tables lineorderd and customerd. Both tables were defined as:

```
CREATE TABLE lineorderd AS lineorder ENGINE = Distributed(3shards, default, lineorder, rand());
CREATE TABLE customerd AS customer ENGINE = Distributed(3shards, default, customer, rand());
```

Here rand() means that records are distributed randomly across the three nodes, so when we perform a JOIN on LO_CUSTKEY=C_CUSTKEY, matching records might be located on different nodes. One way to deal with this is to co-locate the data by the join key. For example:

```
CREATE TABLE lineorderLD AS lineorderL ENGINE = Distributed(3shards, default, lineorderL, LO_CUSTKEY);
CREATE TABLE customerLD AS customerL ENGINE = Distributed(3shards, default, customerL, C_CUSTKEY);
```

Three Tables JOIN

This is where it becomes very complicated. Let’s consider the query that you would normally write:

```
SELECT sum(LO_REVENUE), P_MFGR, toYear(LO_ORDERDATE) yod
FROM lineorderfull, customerfull, partfull
WHERE C_REGION = 'ASIA'
    AND LO_CUSTKEY = C_CUSTKEY
    AND P_PARTKEY = LO_PARTKEY
GROUP BY P_MFGR, yod
ORDER BY P_MFGR, yod;
```

With ClickHouse’s limitations on JOIN syntax, the query becomes:

```
SELECT
    sum(LO_REVENUE),
    P_MFGR,
    toYear(LO_ORDERDATE) AS yod
FROM
(
    SELECT
        LO_PARTKEY,
        LO_ORDERDATE,
        LO_REVENUE
    FROM lineorderfull
    ALL INNER JOIN
    (
        SELECT
            C_REGION,
            C_CUSTKEY AS LO_CUSTKEY
        FROM customerfull
    ) USING (LO_CUSTKEY)
    WHERE C_REGION = 'ASIA'
)
ALL INNER JOIN
(
    SELECT
        P_MFGR,
        P_PARTKEY AS LO_PARTKEY
    FROM partfull
) USING (LO_PARTKEY)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC
```

By writing queries this way, we force ClickHouse to use the prescribed JOIN order: at this moment there is no optimizer in ClickHouse, and it is totally unaware of data distribution. There is also not much speedup when we compare one node vs. three nodes.

One node execution time:

35 rows in set. Elapsed: 697.806 sec. Processed 15.08 billion rows, 211.53 GB (21.61 million rows/s., 303.14 MB/s.)

Three nodes execution time:

35 rows in set. Elapsed: 622.536 sec. Processed 15.12 billion rows, 211.71 GB (24.29 million rows/s., 340.08 MB/s.)

There is a way to make the query faster for this 3-way JOIN, however. (Thanks to Alexander Zaytsev from https://www.altinity.com/ for the help!)

Optimized query:

```
SELECT
    sum(revenue),
    P_MFGR,
    yod
FROM
(
    SELECT
        LO_PARTKEY AS P_PARTKEY,
        toYear(LO_ORDERDATE) AS yod,
        SUM(LO_REVENUE) AS revenue
    FROM lineorderfull
    WHERE LO_CUSTKEY IN
    (
        SELECT C_CUSTKEY
        FROM customerfull
        WHERE C_REGION = 'ASIA'
    )
    GROUP BY
        P_PARTKEY,
        yod
)
ANY INNER JOIN partfull USING (P_PARTKEY)
GROUP BY
    P_MFGR,
    yod
ORDER BY
    P_MFGR ASC,
    yod ASC
```

Optimized query time:

One node:

35 rows in set. Elapsed: 106.732 sec. Processed 15.00 billion rows, 210.05 GB (140.56 million rows/s., 1.97 GB/s.)

Three nodes:

35 rows in set. Elapsed: 75.854 sec. Processed 15.12 billion rows, 211.71 GB (199.36 million rows/s., 2.79 GB/s.)

That’s an improvement of about 6.5 times compared to the original query. This shows the importance of understanding data distribution, and of writing the optimal query to process the data.

Another option for dealing with JOIN complexity, and improving performance, is to use ClickHouse’s dictionaries. These dictionaries are described here: https://www.altinity.com/blog/2017/4/12/dictionaries-explained.
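To give a flavor of the dictionary approach, here is a hypothetical sketch (the dictionary name and its configuration are assumptions, not part of this benchmark): with customer attributes exposed as an external dictionary keyed by C_CUSTKEY, the region lookup can replace the JOIN entirely.

```
-- Hypothetical sketch: assumes a dictionary named 'customer' is declared in
-- ClickHouse's dictionary configuration, with C_CUSTKEY as its numeric key
-- and C_REGION as a String attribute.
SELECT
    dictGetString('customer', 'C_REGION', toUInt64(LO_CUSTKEY)) AS region,
    sum(LO_EXTENDEDPRICE * LO_DISCOUNT)
FROM lineorderfull
WHERE (toYear(LO_ORDERDATE) = 1993)
    AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3))
    AND (LO_QUANTITY < 25)
GROUP BY region
```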
I will review dictionary performance in future posts.

Another traditional way to deal with JOIN complexity in an analytics workload is denormalization. We can move some columns (for example, P_MFGR from the last query) to the fact table (lineorder); a sketch follows the observations below.

Observations

- ClickHouse can handle general analytical queries (it requires special schema design and considerations, however)
- Linear speedup is possible, but it depends on query design and requires advance planning: proper speedup depends on data locality
- ClickHouse is blazingly fast (beyond what I’ve seen before) because it can use all available CPU cores for a query, as shown above (24 cores for a single server and 72 cores for three nodes)
- Multi-table JOINs are cumbersome and require manual work to achieve better performance, so consider using dictionaries or denormalization
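To make the denormalization option concrete, here is a minimal hypothetical sketch (the LO_P_MFGR column name is made up; in practice the value would be populated at load time from the part data):

```
-- Hypothetical sketch: carry the manufacturer on the fact table so the
-- three-way JOIN above collapses into a single-table aggregation.
ALTER TABLE lineorderfull ADD COLUMN LO_P_MFGR String;

-- The C_REGION filter from the original query would similarly need a
-- denormalized column, or the IN subquery shown earlier.
SELECT
    sum(LO_REVENUE),
    LO_P_MFGR,
    toYear(LO_ORDERDATE) AS yod
FROM lineorderfull
GROUP BY LO_P_MFGR, yod
ORDER BY LO_P_MFGR, yod
```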
Posted 4 days ago by Yngve Svendsen
Debian logo by Software in the Public Interest, Inc. (CC-BY-SA)

MySQL and Debian 9 “Stretch”

Debian 9 (“Stretch”) was released on June 17, and we congratulate the Debian community on another iteration of a central Linux distro. Now, the Debian release team some time ago decided not to ship MySQL as part of the distro, instead replacing it […]
Posted 5 days ago by RoseHosting
LAMP (Linux, Apache, MySQL, PHP) is a combination of open source software, typically installed on a server used for hosting dynamic websites and web applications. It includes the Linux operating system, the Apache web server, MySQL for data storage and management, and PHP for handling the dynamic content. In this tutorial, we will go through a complete LAMP installation on an Ubuntu 16.04 based VPS.

Requirements: SSH access with root privileges to your server, as well as an Ubuntu 16.04 VPS, is required for you to be able to complete all the steps in this tutorial.

1. Update your system […]
Posted 5 days ago by Nilnandan Joshi
Recently, I had to resize a VirtualBox VDI disk, as 10GB was not sufficient for a CentOS 6.7 VM. I would just like to mention the simple steps for how to do it.

Shut down the VM and quit VirtualBox.

Open a terminal and go to the path below (the default location). From here you can run the command to resize the VDI file.

```
Nils-Mac:MacOS nilnandan$ pwd
/Applications/VirtualBox.app/Contents/Resources/VirtualBoxVM.app/Contents/MacOS
Nils-Mac:MacOS nilnandan$
```

Start another terminal with Command+T and find the directory where the VDI file is actually stored. The default location would be:

```
Nils-Mac:centos67-mysql nilnandan$ pwd
/Users/nilnandan/VirtualBox VMs/centos67-mysql
Nils-Mac:centos67-mysql nilnandan$ ls -alh
total 21925920
drwx------   7 nilnandan  staff   238B Aug 11 18:15 .
drwx------  12 nilnandan  staff   408B Aug 11 17:50 ..
drwx------   6 nilnandan  staff   204B Aug 11 17:50 Logs
drwx------   2 nilnandan  staff    68B Aug 11 17:47 Snapshots
-rw-------   1 nilnandan  staff    10G Aug 11 18:20 centos67-3.vdi
-rw-------   1 nilnandan  staff   7.7K Aug 11 18:15 centos67-mysql.vbox
-rw-------   1 nilnandan  staff   7.7K Aug 11 17:50 centos67-mysql.vbox-prev
Nils-Mac:centos67-mysql nilnandan$
```

Now, from the location in step 2, run the command to resize the disk. I’m going to increase the size from 10GB to 25GB.

```
Nils-Mac:MacOS nilnandan$ VBoxManage modifyhd --resize 25600 /Users/nilnandan/VirtualBox VMs/centos67-mysql/centos67-3.vdi
Oracle VM VirtualBox Command Line Management Interface Version 5.0.12
(C) 2005-2015 Oracle Corporation
All rights reserved.

Usage:
VBoxManage modifymedium  [disk|dvd|floppy] <uuid|filename>
                         [--type normal|writethrough|immutable|shareable|
                          readonly|multiattach]
                         [--autoreset on|off]
                         [--property <name=[value]>]
                         [--compact]
                         [--resize <megabytes>|--resizebyte <bytes>]

Syntax error: Invalid parameter 'VMs/centos67-mysql/centos67-3.vdi'
Nils-Mac:MacOS nilnandan$
```

I got this error because there is a space between “VirtualBox” and “VMs”, so I have to use an escape character:

```
Nils-Mac:MacOS nilnandan$ VBoxManage modifyhd --resize 25600 /Users/nilnandan/VirtualBox\ VMs/centos67-mysql/centos67-3.vdi
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Nils-Mac:MacOS nilnandan$
```

Now, when you check the size, it will be:

```
Nils-Mac:MacOS nilnandan$ VBoxManage showhdinfo /Users/nilnandan/VirtualBox\ VMs/centos67-mysql/centos67-3.vdi
...
Location:       /Users/nilnandan/VirtualBox VMs/centos67-mysql/centos67-3.vdi
Storage format: VDI
Format variant: dynamic default
Capacity:       25600 MBytes
Size on disk:   10706 MBytes
Encryption:     disabled
...
Nils-Mac:MacOS nilnandan$
```

It’s done. Just open VirtualBox and start the VM.
Posted 5 days ago by Mark Callaghan
I used Linux perf to get more details on system performance while running the point-query test with modern sysbench. This is for an in-memory workload, and is part of my series on low-concurrency CPU regressions for bug 86215.

tl;dr:

- I have more explaining to do.
- The increase in instructions/query explains the decrease in queries/second from MySQL 5.6 to 5.7 to 8.0.

Configuration

I tested MySQL using upstream 5.6.35, 5.7.17 and 8.0.1. For 8.0.1 I used the latin1 charset and latin1_swedish_ci collation. My servers are described here and are named i3 NUC and i5 NUC. My usage of sysbench, including the tests run, and the my.cnf used for each database engine, are described here for MySQL 5.6, 5.7 and 8. The my.cnf files I share were for the i3 NUC. For the i5 NUC, the InnoDB buffer pool and IO capacity options were increased using these values. The my.cnf files are here for 5.0, 5.1 and 5.5. I use the same server for mysqld and the sysbench clients. The binlog is enabled but sync-on-commit is disabled.

Sysbench is run with 4 tables and 1M rows per table. After loading the tables via the sysbench prepare option, I ran the update-index test for a few minutes and then ran point-query with 1 client. After letting point-query warm up for 60 seconds, I used this script to collect data from Linux perf. Output from Linux perf is here: MySQL 5.6.35 for the i3 NUC and i5 NUC, MySQL 5.7.17 for the i3 NUC and i5 NUC, MySQL 8.0.1 for the i3 NUC and i5 NUC.

Results

The charts below show the queries/second and instructions/query for each server/release with the point-query sysbench test and 1 client. For the i3 NUC the regression is from MySQL 5.7.17 to 8.0.1. For the i5 NUC it is from 5.6.35 to 5.7.17. Hopefully I will explain why the results are different between the i3 and i5 NUC servers. I am still collecting results from the CPUs I use at work, and they don't match what I report here. So I have some explaining to do. But it is clear to me that the increase in instructions/query explains the decrease in queries/second. Note that I run the sysbench client on the same host as mysqld, so the instructions/query overhead includes the sysbench client, and the real regression from MySQL 5.6 to 8.0 would be larger were that excluded.
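For readers unfamiliar with the workload: to my understanding, sysbench's point-query test issues single-row primary-key lookups of roughly this shape against the sbtest tables (a sketch, not taken from the post; the literal id value is arbitrary):

```
-- Approximate shape of the point-query statement: one PK lookup per query.
SELECT c FROM sbtest1 WHERE id = 500000;
```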
Posted 5 days ago by MariaDB
Secure Binlog Server: Encrypted Binary Logs and SSL Communication

The 2.1.3 GA release of MariaDB MaxScale introduces the following key features for the secure setup of the MaxScale Binlog Server:

- The binlog cache files in the MaxScale host can now be encrypted.
- The MaxScale binlog server also uses SSL in communication with the master and the slave servers.

The MaxScale binlog server can optionally encrypt the events received from the master server: the setup requires a MariaDB (from 10.1.7) master server with encryption active and the mariadb10-compatibility=On option set in maxscale.cnf. This way both the master and MaxScale will have encrypted events stored in the binlog files.

How does binary log encryption work in MariaDB Server and in MaxScale? Let’s look at the MariaDB Server 10.1 or 10.2 implementation first. The encryption applies to stored events, not to their transmission over the network: SSL must be enabled in order to secure the network layer.

- Each binlog file holds a new special binlog event, type 0xa4 (164), called the START_ENCRYPTION event: it is written to disk but never sent to connected slave servers.
- Each binlog event is encrypted/decrypted using an AES key and an initialization vector (IV) built from the “nonce” data in the START_ENCRYPTION event and the current event position in the file.
- The encrypted event has the same size as a non-encrypted event.

Let’s start with the MariaDB Server 10.1.7 configuration in my.cnf:

```
[mysqld]
...
encrypt-binlog=1
plugin-load-add=file_key_management.so
file_key_management_encryption_algorithm=aes_cbc
file_key_management_filename = /some_path/keys.txt
```

Binlog files can be optionally encrypted by specifying encrypt-binlog=1. An encryption key facility has to be added as well. The easiest one to set up is the key file, loaded with plugin-load-add=file_key_management.so and pointed at by file_key_management_filename = /some_path/keys.txt.

The keys listed in the key file can be specified per table, but the system XtraDB/InnoDB tablespace and binary log files always use key number 1, so it must always exist. The last parameter is the AES encryption algorithm, AES_CBC (default) or AES_CTR:

file_key_management_encryption_algorithm=aes_cbc | aes_ctr

There is another key management solution (Eperi Gateway for Databases), which allows storing keys in a key server, preventing an attacker with file system access from unauthorized database file reading. For additional information, please read more about MariaDB data-at-rest encryption.

MariaDB MaxScale implementation: the implementation follows the MariaDB Server implementation above.

- MaxScale adds its own START_ENCRYPTION event (which has the same size but different content from the master’s).
- The encryption algorithm can be selected: AES_CBC or AES_CTR.
- The event size on disk is the same as the “clear” event.
- Only key file management is currently supported: no key rotation.

A closer look at the new event might be interesting for most readers. START_ENCRYPTION is 36 or 40 bytes in size, depending on whether CRC32 is used:

- Replication header: 19 bytes
- 1 byte encryption schema (1 is for system files)
- 4 bytes encryption key version (this allows key rotation)
- 12 bytes of random NONCE bytes (the first part of the IV)

As the saved event on disk has the same size as the clear event, the event size is in the “clear” in the replication header. Moving some bytes in the replication event header is required in order to encrypt/decrypt the event header and content.
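As a quick sanity check of the master-side setup, one can inspect the server variable and the plugin state (a hedged sketch; it assumes a MariaDB 10.1.7+ server configured as above):

```
-- On the MariaDB master: verify binlog encryption is on and the key
-- management plugin is loaded and active.
SHOW GLOBAL VARIABLES LIKE 'encrypt_binlog';

SELECT PLUGIN_NAME, PLUGIN_STATUS
FROM information_schema.PLUGINS
WHERE PLUGIN_NAME = 'file_key_management';
```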
MariaDB MaxScale configuration:

```
Encrypt_binlog = On|Off
Encryption_algorithm = aes_ctr OR aes_cbc
Encryption_key_file = /maxscale_path/enc_key.txt
```

The specified key file must have this format: a line with

1;HEX(KEY)

The id is the scheme identifier, which must have the value 1 for binlog encryption; the ';' is a separator; and HEX(KEY) contains the hex representation of the KEY. The KEY must be exactly 16, 24 or 32 bytes in size, and the selected algorithm (aes_ctr or aes_cbc) will be used with 128, 192 or 256-bit ciphers accordingly.

Note: the key file has the same format as MariaDB Server 10.1, so it's possible to use an existing (unencrypted) key file, which could contain several scheme;key lines: only the key with id 1 will be parsed, and if it is not found, an error will be reported.

Example:

```
#
# This is the Encryption Key File
# key id 1 is for binlog files encryption: it's mandatory
# The keys come from a 32bytes value, 64 bytes with HEX format
#
2;abcdef1234567890abcdef12345678901234567890abcdefabcdef1234567890
1;5132bbabcde33ffffff12345ffffaaabbbbbbaacccddeee11299000111992aaa
3;bbbbbbbbbaaaaaaabbbbbccccceeeddddd3333333ddddaaaaffffffeeeeecccd
```

See all the encryption options in “router options”:

```
[BinlogServer]
type=service
router=binlogrouter
version_string=10.1.17-log
router_options=server-id=93,mariadb10-compatibility=On,encrypt_binlog=On,encryption_key_file=/home/maxscale/binlog/keys.txt,encryption_algorithm=aes_cbc
```

Enable SSL communication to the master and from connecting slaves

We have seen how to secure data-at-rest; now we would like to show how to secure the network connections between all the components in the Binlog Server setup: the master, MaxScale and the slaves. SSL between the master and MaxScale is required, along with SSL communication with the slave servers.

How do we use SSL in the master connection? We have to add some new options to the CHANGE MASTER TO command issued to the MySQL listener of the Binlog Server. The options are used in the following examples.

All options:

```
MySQL [(none)]> CHANGE MASTER TO
    MASTER_SSL = 1,
    MASTER_SSL_CERT='/home/maxscale/packages/certificates/client/client-cert.pem',
    MASTER_SSL_CA='/home/maxscale/packages/certificates/client/ca.pem',
    MASTER_SSL_KEY='/home/maxscale/packages/certificates/client/client-key.pem',
    MASTER_TLS_VERSION='TLSv12';
```

Or just some of them:

```
MySQL [(none)]> CHANGE MASTER TO MASTER_TLS_VERSION='TLSv12';
MySQL [(none)]> CHANGE MASTER TO MASTER_SSL = 0;
```

Some constraints: in order to enable/re-enable master SSL communication, the MASTER_SSL=1 option is required, and all certificate options must be explicitly set in the same CHANGE MASTER TO command. New certificate option changes take effect after a MaxScale restart, or after MASTER_SSL=1 with the new options.
The current master connection options can be checked with SHOW SLAVE STATUS:

```
MySQL> SHOW SLAVE STATUS\G
...
Master_SSL_Allowed: Yes
Master_SSL_CA_File: /home/mpinto/packages/certificates/client/ca.pem
Master_SSL_CA_Path:
Master_SSL_Cert: /home/mpinto/packages/certificates/client/client-cert.pem
Master_SSL_Cipher:
Master_SSL_Key: /home/mpinto/packages/certificates/client/client-key.pem
...
```

Setting up the Binlog Server listener for slave servers:

```
[Binlog_server_listener]
type=listener
service=BinlogServer
protocol=MySQLClient
address=192.168.100.10
port=8808
authenticator=MySQL
ssl=required
ssl_cert=/usr/local/mariadb/maxscale/ssl/crt.maxscale.pem
ssl_key=/usr/local/mariadb/maxscale/ssl/key.csr.maxscale.pem
ssl_ca_cert=/usr/local/mariadb/maxscale/ssl/crt.ca.maxscale.pem
ssl_version=TLSv12
```

The final step is for each slave that connects to the Binlog Server: the slave itself should connect to MaxScale using SSL options in the CHANGE MASTER TO … SQL command (see the sketch below). The setup is done! Encryption for data-at-rest and for data-in-transit is now complete!
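For completeness, here is a hypothetical sketch of that last slave-side step. The user, password and certificate paths are placeholders; the host and port come from the listener section above:

```
-- On each slave: point replication at the MaxScale listener, with SSL.
CHANGE MASTER TO
    MASTER_HOST = '192.168.100.10',
    MASTER_PORT = 8808,
    MASTER_USER = 'repl_user',          -- placeholder replication account
    MASTER_PASSWORD = 'repl_password',  -- placeholder
    MASTER_SSL = 1,
    MASTER_SSL_CA   = '/path/to/ca.pem',
    MASTER_SSL_CERT = '/path/to/client-cert.pem',
    MASTER_SSL_KEY  = '/path/to/client-key.pem';
START SLAVE;
```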
Posted 5 days ago by Mike Frank
Often databases contain data that needs to be proven as valid and authentic. We want to ensure that a known person or other sender (e.g., a trusted app) of the information can’t deny the content, and that the content can’t change without that sender’s consent.…
Posted 6 days ago by Severalnines
While the idea of containers has been around since the early days of Unix, Docker made waves in 2013 when it hit the market with its innovative solution. Having begun as an open source project, Docker lets you add your stacks and applications to containers that share a common operating system kernel. This gives you a lightweight virtualized system with almost zero overhead. Docker also lets you bring containers up or down in seconds, making for rapid deployment of your stack.

Severalnines, like many other companies, got excited early on about Docker, and began experimenting and developing ways to deploy advanced open source database configurations using Docker containers. We also released, early on, a Docker image of ClusterControl that lets you use the management and monitoring functionality of ClusterControl with your existing database deployments. Here are just some of the resources we’ve developed for Docker over the last few years...

Severalnines on Docker Hub

In addition to the ClusterControl Docker image, we have also provided a series of images to help you get started on Docker with other open source database technologies, like Percona XtraDB Cluster and MariaDB. Check Out the Docker Images

ClusterControl on Docker Documentation

For detailed instructions on how to install ClusterControl utilizing the Docker image, click on the link below. Read More

Top Blogs

MySQL on Docker: Running Galera Cluster on Kubernetes
In our previous posts, we showed how one can run Galera Cluster on Docker Swarm, and discussed some of the limitations with regards to production environments. Kubernetes is widely used as an orchestration tool, and we’ll see whether we can leverage it to achieve production-grade Galera Cluster on Docker. Read More

MySQL on Docker: Swarm Mode Limitations for Galera Cluster in Production Setups
This blog post explains some of the Docker Swarm Mode limitations in handling Galera Cluster natively in production environments. Read More

MySQL on Docker: Composing the Stack
Docker 1.13 introduces a long-awaited feature called Compose-file support. Compose-file defines everything about an application: services, databases, volumes, networks, and dependencies can all be defined in one place. In this blog, we’ll show you how to use Compose-file to simplify the Docker deployment of MySQL containers. Read More

MySQL on Docker: Deploy a Homogeneous Galera Cluster with etcd
Our journey to make Galera Cluster run smoothly on Docker containers continues. Deploying Galera Cluster on Docker is tricky when using orchestration tools. With this blog, find out how to deploy a homogeneous Galera Cluster with etcd. Read More

MySQL on Docker: Introduction to Docker Swarm Mode and Multi-Host Networking
This blog post covers the basics of managing MySQL containers on top of Docker swarm mode and overlay network. Read More

MySQL on Docker: Single Host Networking for MySQL Containers
This blog covers the basics of how Docker handles single-host networking, and how MySQL containers can leverage that. Read More

MySQL on Docker: Building the Container Image
In this post, we will show you two ways to build a MySQL Docker image: changing a base image and committing, or using a Dockerfile. We’ll show you how to extend the Docker team’s MySQL image, and add Percona XtraBackup to it. Read More

MySQL Docker Containers: Understanding the Basics
In this post, we will cover some basics around running MySQL in a Docker container. It walks you through how to properly fire up a MySQL container, change configuration parameters, how to connect to the container, and how the data is stored. Read More

ClusterControl: Single Console for Your Entire Database Infrastructure
Find out what else is new in ClusterControl. Install ClusterControl for FREE.

ClusterControl on Docker

ClusterControl provides advanced management and monitoring functionality to get your MySQL replication and clustered instances up and running, using proven methodologies that you can depend on. Used in conjunction with other orchestration tools for deployment to the containers, ClusterControl makes managing your open source databases easy, with point-and-click interfaces and no need for specialized knowledge about the technology.

ClusterControl delivers an array of features to help manage and monitor your open source database environments:

- Management & Monitoring: ClusterControl provides management features to repair and recover broken nodes, as well as test and automate MySQL upgrades.
- Advanced Monitoring: ClusterControl provides a unified view of all MySQL nodes and clusters across all your data centers, and lets you drill down into individual nodes for more detailed statistics.
- Automatic Failure Detection and Handling: ClusterControl takes care of your replication cluster’s health. If a master failure is detected, ClusterControl automatically promotes one of the available slaves to ensure your cluster is always up.

Learn more about how ClusterControl can enhance performance here, or pull the Docker image here.

We hope that these resources prove useful! Happy Clustering!
Posted 6 days ago by MySQL Performance Blog
Join Percona’s Senior Technical Operations Engineer, Daniel Kowalewski, as he presents “Deploying MySQL in Production” on Thursday, June 22, 2017 at 11:00 am PDT / 2:00 pm EDT (UTC-7). Register Now.

MySQL is famous for being something you can install and get going in less than five minutes, in terms of development. But normally you want to run MySQL in production, and at scale. This requires some planning and knowledge. So why not learn the best practices around installation, configuration, deployment and backup? This webinar is a soup-to-nuts talk that will have you going from zero to hero in no time. It includes discussion of the best practices for installation, configuration, taking backups, monitoring, etc. Register for the webinar here.

Daniel Kowalewski, Senior Technical Operations Engineer: Daniel has been designing and deploying solutions around MySQL for over ten years. He lives for those magic moments where response time drops by 90%, and loves adding more “nines” to everything.
Posted 6 days ago by Yves Trudeau
In this blog, we’ll look at different MySQL high availability options.

The dynamic MySQL ecosystem is rapidly evolving many technologies built around MySQL. This is especially true for the technologies involved with the high availability (HA) aspects of MySQL. When I joined Percona back in 2009, some of these HA technologies were very popular, but they have since been almost forgotten. During the same interval, new technologies have emerged. In order to give some perspective to the reader, and hopefully help to make better choices, I’ll review the MySQL HA landscape as it is in 2017. This review will be in three parts. The first part (this post) will cover the technologies that have been around for a long time: the elders. The second part will focus on the technologies that are very popular today: the adults. Finally, the last part will try to extrapolate which technologies could become popular in the upcoming years: the babies.

Quick disclaimer: I am reporting on the technologies I see the most. There are likely many other solutions not covered here, but I can’t talk about technologies I have barely or never used. Apart from the RDS-related technologies, all the technologies covered are open source. The target audience for this post is people relatively new to MySQL.

The Elders

Let’s define the technologies in the elders group. These are technologies that anyone involved with MySQL for the last ten years is sure to be aware of. I could have called this group the “classics”. I include the following technologies in this group:

- Replication
- Shared storage
- NDB Cluster

Let’s review these technologies in the following sections.

Replication

Simple replication topology

MySQL replication is very well known. It is one of the main features behind the wide adoption of MySQL. Replication gets used almost everywhere. The reasons for that are numerous:

- Replication is simple to set up. There are tons of how-to guides and scripts available to add a slave to a MySQL server. With Amazon RDS, adding a slave is just a few clicks.
- Slaves allow you to easily scale reads. The slaves are accessible and can be used for reads. This is the most common way of scaling up a MySQL database.
- Slaves have little impact on the master. Apart from the added network traffic, the presence of slaves does not impact the master's performance significantly.
- It is well known. No surprises here.
- Used for failover. Your master died? Promote a slave and use it as your new master.
- Used for backups. You don’t want to overload your master with the backups, so run them off a slave.

Of course, replication also has some issues:

- Replication can lag. Replication used to be single-threaded, which means a master with a concurrent load could easily outpace a slave. MySQL 5.6 and MariaDB 10.0 introduced some parallelism to the slave, and newer versions have further improved this, to the point where today’s slaves are many times faster than they were.
- Slaves can diverge. When you modify data on the master, the slave must perform the exact same update. That seems easy, but there are many ways an update can be non-deterministic with statement-based replication. Many such issues have been fixed, and the introduction of row-based replication has been another big step forward. Still, if you write directly to a slave you are asking for trouble. There is a read_only setting, but if the MySQL user has the “SUPER” privilege it is simply ignored. That’s why there is now the “super_read_only” setting (see the sketch after this list). Tools like pt-table-checksum and pt-table-sync from the Percona Toolkit exist to detect and fix divergence.
- Replication can impact the master. I wrote above that the presence of slaves does not affect the master much, but logging changes can be more problematic. The most common issue is InnoDB table-level locking for auto_increment values with statement-based replication: only one thread can insert new rows at a time. You can avoid this issue with row-based replication and properly configured settings.
- Data can get lost. Replication is asynchronous, which means the master will reply “done” after a commit statement even though the slaves have not received the updates yet. Some transactions can get lost if the master crashes.
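To make the divergence safeguard concrete, here is a minimal sketch (super_read_only is available in MySQL 5.7 and was also backported to Percona Server 5.6):

```
-- On a slave: reject writes from normal users and, with super_read_only,
-- from SUPER users as well. Replication threads can still apply changes.
SET GLOBAL read_only = ON;
SET GLOBAL super_read_only = ON;
```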
Although an old technology, a lot of work has been done on replication. It is miles away from the replication implementation of 5.0.x. Here’s a list, likely incomplete, of the evolution of replication:

- Row-based replication (since 5.1). The binary internal representation of the rows is sent instead of the SQL statements. This makes replication more robust against slave divergence.
- Global transaction ID (since 5.6). Transactions are uniquely identified. Replication can be set up without knowing the binlog file and offset.
- Checksums (since 5.6). Binlog events have checksum values to validate their integrity.
- Semi-sync replication (since 5.5). An addition to the replication protocol that makes the master aware of the reception of events by the slaves. This helps to avoid losing data when a master crashes.
- Multi-source replication (since 5.7). Allows a slave to have more than one master.
- Multi-threaded replication (since 5.6). Allows a slave to use multiple threads. This helps to limit slave lag.

Managing replication is a tedious job, and the community has written many tools to manage it:

- MMM. An old Perl tool that used to be quite popular, but had many issues. Now rarely used.
- MHA. The most popular tool to manage replication. It excels at reconfiguring replication without losing data, and does a decent job at handling failover. It is also simple. No wonder it is popular.
- PRM. A Pacemaker-based solution developed to replace MMM. It’s quite good at failover, but not as good as MHA at reconfiguring replication. It’s also quite complex, thanks to Pacemaker. Not used much.
- Orchestrator. The new cool tool. It can manage complex topologies, and has a nice web-based interface to monitor and control the topology.

Shared Storage

Simple shared storage topology

Back when I was working for MySQL ten years ago, shared storage HA setups were very common. A shared storage HA cluster uses one copy of the database files, shared between two servers. One server is active, the other one is passive. In order to be shared, the database files reside on a device that can be mounted by both servers. The device can be physical (like a SAN) or logical (like a Linux DRBD device). On top of that, you need a cluster manager (like Pacemaker) to handle the resources and failovers.

This solution is very popular because it allows for failover without losing any transactions. The main drawback of this setup is the need for an idle standby server. The standby server cannot have any other assigned duties, since it must always be ready to take over the MySQL server. A shared storage solution is also obviously not resilient to file-level corruption (although that situation is exceptional). Finally, it doesn’t play well with a cloud-based environment.

Today, newly-deployed shared storage HA setups are rare.
The only ones I encountered over the last year were either old implementations needing support, or new setups deployed because of existing corporate technology stacks. That should tell you something about the technology’s loss of popularity.

NDB Cluster

A simple NDB Cluster topology

NDB Cluster is a distributed clustering solution that has been around for a long time. I personally started working with this technology back in 2008. An NDB Cluster has three types of nodes: SQL, management and data. A full HA cluster requires a minimum of four nodes.

An NDB Cluster is not a general-purpose database, due to its distributed nature. For suitable workloads, it is extraordinarily good. For unsuitable workloads, it is miserable. A suitable workload for an NDB Cluster has high concurrency, with a high rate of small, primary-key-oriented transactions. Reaching one million trx/s on an NDB Cluster is nothing exceptional. At the other end of the spectrum, a poor workload for an NDB Cluster is a single-threaded report query on a star-like schema. I have seen some extreme cases where just the network time of a reporting query amounted to more than 20 minutes.

Although NDB Clusters have improved, and are still improving, their usage has been pushed toward niche-type applications. Overall, the technology is losing ground, and is now mostly used for telcos and online gaming applications.