
News

Posted over 5 years ago by Mark Callaghan
An LSM like RocksDB has much better write and space efficiency than a B-Tree. That means with RocksDB you will use less SSD than with a B-Tree, and the SSD will either last longer or you can use lower endurance SSD. But this efficiency comes at a cost. In the worst case RocksDB might use 2X more CPU/query than a B-Tree, which means that worst-case QPS with RocksDB might be half of what it is with a B-Tree. In the examples below I show that RocksDB can use ~2X more comparisons per query than a B-Tree, and it is nice when results in practice can be explained by theory.

But first I want to explain the context where this matters and where it doesn't matter. It matters when the CPU overhead from RocksDB is a significant fraction of the query response time -- so the workload needs to be CPU bound (cached working set). It isn't a problem for workloads that are IO-bound. Many performance results have been published on my blog and more are coming later this year. With fast storage devices available I recommend that database workloads strive to be IO-bound to avoid using too much memory+power.

Basic Operations

Where does the CPU overhead come from? I started to explain this in my review of the original LSM paper. My focus in this post is on SELECT statements in SQL, but note that writes in SQL also do reads from the storage engine (updates have a WHERE clause, searches are done to find matching rows and appropriate leaf pages, queries might be done to determine whether constraints will be violated, etc.).

With RocksDB, data can be found in the memtable and in sorted runs from the L0 to Lmax (Lmax is the max/largest level in the LSM tree). There are 3 basic operations used to search for data: get, range seek and range next. Get searches for an exact match. Range seek positions iterators at the start of a range scan. Range next gets the next row from a range scan.

A few comments before I explain the CPU overhead:

- I will try to be clear when I refer to the keys for a SQL table (SQL key) and the key as used by RocksDB (RocksDB key). Many SQL indexes can be stored in the same LSM tree (column family) and they are distinguished by prepending the index ID as the first bytes in the RocksDB key. The prepended bytes aren't visible to the user.
- Get is used for an exact match on a PK index but not for an exact match on a secondary index, because with MyRocks the RocksDB key for a secondary index entry is made unique by appending the PK to it. A SQL SELECT that does an exact match on the secondary key searches for a prefix of the RocksDB key, and that requires a range scan.
- With RocksDB a bloom filter can be on a prefix of the key or the entire key (prefix vs whole key). A prefix bloom filter can be used for range and point queries. A whole key bloom filter will only be used for point queries on a PK. It won't be used for range queries, nor for point queries on a secondary index (because a secondary index exact match uses a range scan internally).
- Today bloom filters are configured per column family. A good use for column families is to put secondary indexes into separate ones that are configured with prefix bloom filters. Eventually we expect to make this easier in RocksDB.

Get

A point query is evaluated by RocksDB by searching sorted runs in order: first the memtable, then runs in the L0, then the L1 and so on until the Lmax is reached. The search stops as soon as the key is found, whether from a tombstone or a key-value pair.
The work done to search each sorted run varies. I use comparisons and bloom filter probes as the proxy for work. I ignore memory system behavior (cache misses) but assume that the skiplist isn't great in that regard:

- memtable - this is a skiplist and I assume the search cost is log2(N) when it has N entries.
- L0 - for each sorted run, first check the bloom filter; if the key might exist then do binary search on the block index and then binary search within the data block. The cost is a bloom filter probe and possibly log2(M) comparisons when an L0 sorted run has M entries.
- L1 to Ln - each of these levels is range partitioned into many SST files, so the first step is a binary search to find the SST that might contain the key, then a check of the bloom filter for that SST; if the key might exist then do binary search on the block index for that SST and then binary search within the data block. The binary search cost to find the SST gets larger for the larger levels, and that cost doesn't go away when bloom filters are used. For RocksDB with a 1T database, per-level fanout=8 and SST size=32M, the number of SSTs per level is 8 in L1, 64 in L2, 512 in L3, 4096 in L4 and 32768 in L5. The number of comparisons before the bloom filter check is 3 for L1, 6 for L2, 9 for L3 and 12 for L4. I assume there is no bloom filter on L5 because it is the max level.

A bloom filter check isn't free. Fortunately, RocksDB makes the cost less than I expected by limiting the bits that are set for a key, and must be checked on a probe, to one cache line. See the code in AddHash. For now I assume that the cost of a bloom filter probe is equivalent to a few (< 5) comparisons given the cache line optimization.

A Get operation on a B-Tree with 8B rows needs ~33 comparisons. With RocksDB it might need 20 comparisons for the memtable, bloom filter probes for 4 SSTs in L0, 3+6+9+12 SST search comparisons for L1 to L4, bloom filter probes for L1 to L4, and then ~33 comparisons to search the max level (L5). So it is easy to see that the search cost might be double the cost for a B-Tree.

The search cost for Get with RocksDB can be reduced by configuring the LSM tree to use fewer sorted runs, although we are still figuring out how much it can be reduced. With this example, about 1/3 of the comparisons are from the memtable, another 1/3 are from the max level and the remaining 1/3 are from L0 to L4.

Range Seek and Next

The cost for a range scan has two components: range seek to initialize the scan and range next to produce each row. For range seek, an iterator is positioned within each sorted run in the LSM tree. For range next, the merging iterator combines the per-run iterators. For RocksDB the cost of range seek depends on the number of sorted runs, while the cost of range next does not (for leveled compaction, assuming uniform distribution).

Range seek does binary search per sorted run. This is similar to a point query without a bloom filter. The cost across all sorted runs depends on the number of sorted runs. In the example above, where RocksDB has a memtable, 4 SSTs in the L0, L1 to L5 and 8B rows, that requires 20 comparisons for the memtable, 4 x 20 comparisons for the L0 SSTs, and 21 + 24 + 27 + 30 + 33 comparisons for L1 to L5. The total is 235 comparisons. There isn't a way to avoid this, and for short range queries the cost of range seek dominates the cost of range next.
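As a quick sanity check of those counts, the arithmetic can be reproduced with a few SELECTs (a back-of-envelope sketch that uses MySQL itself as a calculator; the 8B rows, fanout=8 and per-level SST counts are taken from the example above):

-- ~33 comparisons for a B-Tree point lookup on 8B rows
SELECT CEILING(LOG2(8000000000)) AS btree_comparisons;                   -- 33
-- comparisons to locate the SST in L1..L4 (8, 64, 512 and 4096 SSTs per level)
SELECT LOG2(8) + LOG2(64) + LOG2(512) + LOG2(4096) AS sst_search_l1_l4;  -- 3+6+9+12 = 30
-- range seek: memtable + 4 L0 runs + binary search in L1..L5
SELECT 20 + 4 * 20 + (21 + 24 + 27 + 30 + 33) AS range_seek_total;       -- 235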
While this overhead is significant for short range queries with embedded RocksDB and an in-memory workload, it is harder to notice with MyRocks because there is a lot of CPU overhead above MyRocks from optimize, parse and client RPC for a short range query. It is easier to notice the difference in CPU overhead between MyRocks and a B-Tree with longer range scans.

The cost for range next is interesting. LevelDB has a comment that suggests using a heap, but the merging iterator code uses N-1 comparisons per row produced, which means the overhead depends on the number of sorted runs. The cost of range next in RocksDB is much less dependent on the number of sorted runs because it uses an optimized binary heap, and the number of comparisons to produce a row depends on the node that produces the winner. Only one comparison is done if the root produces the winner and the root produced the previous winner. Two comparisons are done if the root produces the winner but did not produce the previous winner. More comparisons are needed in other cases. The code is optimized for long runs of winners from one iterator, and that is likely with leveled compaction because Lmax is ~10X larger than the next largest level and usually produces most winners. Using a simulation with uniform distribution, the expected number of comparisons per row is <= 1.55 regardless of the number of sorted runs for leveled compaction. The optimization in the binary heap used by the merging iterator is limited to remembering the comparison result between the two children of the root node. Were that extended to remembering comparison results between any two sibling nodes in the heap, the expected number of comparisons would drop from ~1.55 to ~1.38.

For a B-Tree:

- The overhead for range seek is similar to a point query -- ~33 comparisons when there are 8B rows. There are no merging iterators.
- The overhead for range next is low -- usually a move to the next row in the current leaf page.
Posted over 5 years ago by Severalnines
One of the cool features in Galera is automatic node provisioning and membership control. If a node fails or loses communication, it will be automatically evicted from the cluster and remain non-operational. As long as the majority of nodes is still communicating (Galera calls this the PC - primary component), there is a very high chance the failed node will automatically rejoin, resync and resume replication once the connectivity is back.

Generally, all Galera nodes are equal. They hold the same data set and the same master role, capable of handling reads and writes simultaneously, thanks to Galera group communication and the certification-based replication plugin. Therefore, there is actually no failover from the database point of view due to this equilibrium. Only the application side requires failover, to skip the non-operational nodes while the cluster is partitioned.

In this blog post, we are going to look into how Galera Cluster performs node and cluster recovery in case a network partition happens. Just as a side note, we covered a similar topic in this blog post some time back. Codership explains Galera's recovery concept in great detail in the documentation page, Node Failure and Recovery.

Node Failure and Eviction

In order to understand recovery, we have to understand how Galera detects node failure and handles the eviction process first. Let's put this into a controlled test scenario so we can understand the eviction process better. Suppose we have a three-node Galera Cluster as illustrated below:

The following command can be used to retrieve our Galera provider options:

mysql> SHOW VARIABLES LIKE 'wsrep_provider_options'\G

It's a long list, but we just need to focus on some of the parameters to explain the process:

evs.inactive_check_period = PT0.5S;
evs.inactive_timeout = PT15S;
evs.keepalive_period = PT1S;
evs.suspect_timeout = PT5S;
evs.view_forget_timeout = P1D;
gmcast.peer_timeout = PT3S;

First of all, Galera follows ISO 8601 formatting to represent duration. P1D means the duration is one day, while PT15S means the duration is 15 seconds (note the time designator, T, that precedes the time value). For example, if one wanted to increase evs.view_forget_timeout to one day and a half, one would set P1DT12H, or PT36H.

Considering that none of the hosts have been configured with any firewall rules, we use the following script, called block_galera.sh, on galera2 to simulate a network failure to/from this node:

#!/bin/bash
# block_galera.sh
# galera2, 192.168.55.172
iptables -I INPUT -m tcp -p tcp --dport 4567 -j REJECT
iptables -I INPUT -m tcp -p tcp --dport 3306 -j REJECT
iptables -I OUTPUT -m tcp -p tcp --dport 4567 -j REJECT
iptables -I OUTPUT -m tcp -p tcp --dport 3306 -j REJECT
# print timestamp
date

By executing the script, we get the following output:

$ ./block_galera.sh
Wed Jul 4 16:46:02 UTC 2018

The reported timestamp can be considered as the start of the cluster partitioning, where we lose galera2 while galera1 and galera3 are still online and accessible. At this point, our Galera Cluster architecture looks something like this:

From Partitioned Node Perspective

On galera2, you will see some printouts inside the MySQL error log. Let's break them out into several parts.
The downtime started around 16:46:02 UTC and, after gmcast.peer_timeout=PT3S, the following appears:

2018-07-04 16:46:05 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') connection to peer 8b2041d6 with addr tcp://192.168.55.173:4567 timed out, no messages seen in PT3S
2018-07-04 16:46:05 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.55.173:4567
2018-07-04 16:46:06 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') connection to peer 737422d6 with addr tcp://192.168.55.171:4567 timed out, no messages seen in PT3S
2018-07-04 16:46:06 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 8b2041d6 (tcp://192.168.55.173:4567), attempt 0

Once evs.suspect_timeout = PT5S has passed, both galera1 and galera3 are suspected as dead by galera2:

2018-07-04 16:46:07 140454904243968 [Note] WSREP: evs::proto(62116b35, OPERATIONAL, view_id(REG,62116b35,54)) suspecting node: 8b2041d6
2018-07-04 16:46:07 140454904243968 [Note] WSREP: evs::proto(62116b35, OPERATIONAL, view_id(REG,62116b35,54)) suspected node without join message, declaring inactive
2018-07-04 16:46:07 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 737422d6 (tcp://192.168.55.171:4567), attempt 0
2018-07-04 16:46:08 140454904243968 [Note] WSREP: evs::proto(62116b35, GATHER, view_id(REG,62116b35,54)) suspecting node: 737422d6
2018-07-04 16:46:08 140454904243968 [Note] WSREP: evs::proto(62116b35, GATHER, view_id(REG,62116b35,54)) suspected node without join message, declaring inactive

Then, Galera revises the current cluster view and the position of this node:

2018-07-04 16:46:09 140454904243968 [Note] WSREP: view(view_id(NON_PRIM,62116b35,54) memb { 62116b35,0 } joined { } left { } partitioned { 737422d6,0 8b2041d6,0 })
2018-07-04 16:46:09 140454904243968 [Note] WSREP: view(view_id(NON_PRIM,62116b35,55) memb { 62116b35,0 } joined { } left { } partitioned { 737422d6,0 8b2041d6,0 })

With the new cluster view, Galera performs a quorum calculation to decide whether this node is part of the primary component. If the new component sees "primary = no", Galera demotes the local node state from SYNCED to OPEN:

2018-07-04 16:46:09 140454288942848 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-07-04 16:46:09 140454288942848 [Note] WSREP: Flow-control interval: [16, 16]
2018-07-04 16:46:09 140454288942848 [Note] WSREP: Trying to continue unpaused monitor
2018-07-04 16:46:09 140454288942848 [Note] WSREP: Received NON-PRIMARY.
2018-07-04 16:46:09 140454288942848 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 2753699)

With the latest change to the cluster view and node state, Galera returns the post-eviction cluster view and global state as below:

2018-07-04 16:46:09 140454222194432 [Note] WSREP: New cluster view: global state: 55238f52-41ee-11e8-852f-3316bdb654bc:2753699, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2018-07-04 16:46:09 140454222194432 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
You can see that the following global status values of galera2 changed during this period:

mysql> SELECT * FROM information_schema.global_status WHERE variable_name IN ('WSREP_CLUSTER_STATUS','WSREP_LOCAL_STATE_COMMENT','WSREP_CLUSTER_SIZE','WSREP_EVS_DELAYED','WSREP_READY');
+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| VARIABLE_NAME             | VARIABLE_VALUE                                                                                                                    |
+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+
| WSREP_CLUSTER_SIZE        | 1                                                                                                                                 |
| WSREP_CLUSTER_STATUS      | non-Primary                                                                                                                       |
| WSREP_EVS_DELAYED         | 737422d6-7db3-11e8-a2a2-bbe98913baf0:tcp://192.168.55.171:4567:1,8b2041d6-7f62-11e8-87d5-12a76678131f:tcp://192.168.55.173:4567:2 |
| WSREP_LOCAL_STATE_COMMENT | Initialized                                                                                                                       |
| WSREP_READY               | OFF                                                                                                                               |
+---------------------------+-----------------------------------------------------------------------------------------------------------------------------------+

At this point, the MySQL/MariaDB server on galera2 is still accessible (the database is listening on 3306 and Galera on 4567) and you can query the mysql system tables and list out the databases and tables. However, when you jump into the non-system tables and make a simple query like this:

mysql> SELECT * FROM sbtest1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

you will immediately get an error indicating that WSREP is loaded but not ready for use by this node, as reported by the wsrep_ready status. This is because the node lost its connection to the Primary Component and entered the non-operational state (the local node status changed from SYNCED to OPEN). Data reads from nodes in a non-operational state are considered stale, unless you set wsrep_dirty_reads=ON to permit reads (see the sketch after the summary below), although Galera still rejects any command that modifies or updates the database.

Finally, Galera will keep on listening and reconnecting to other members in the background indefinitely:

2018-07-04 16:47:12 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 8b2041d6 (tcp://192.168.55.173:4567), attempt 30
2018-07-04 16:47:13 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 737422d6 (tcp://192.168.55.171:4567), attempt 30
2018-07-04 16:48:20 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 8b2041d6 (tcp://192.168.55.173:4567), attempt 60
2018-07-04 16:48:22 140454904243968 [Note] WSREP: (62116b35, 'tcp://0.0.0.0:4567') reconnecting to 737422d6 (tcp://192.168.55.171:4567), attempt 60

The eviction process performed by Galera group communication for the partitioned node during a network issue can be summarized as follows:

1. Disconnects from the cluster after gmcast.peer_timeout.
2. Suspects other nodes after evs.suspect_timeout.
3. Retrieves the new cluster view.
4. Performs quorum calculation to determine the node's state.
5. Demotes the node from SYNCED to OPEN.
6. Attempts to reconnect to the primary component (other Galera nodes) in the background.
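As referenced above, here is a minimal sketch of permitting stale reads on the partitioned node. This assumes a Galera build that exposes the wsrep_dirty_reads variable with session scope, as recent MariaDB Galera and MySQL-wsrep releases do:

-- on galera2, while it is non-Primary
SET SESSION wsrep_dirty_reads = ON;  -- allow possibly stale reads in this session
SELECT * FROM sbtest1 LIMIT 1;       -- now returns rows instead of ERROR 1047
-- any INSERT/UPDATE/DELETE is still rejected until the node rejoins the Primary Component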
From Primary Component Perspective

On galera1 and galera3 respectively, after gmcast.peer_timeout=PT3S, the following appears in the MySQL error log:

2018-07-04 16:46:05 139955510687488 [Note] WSREP: (8b2041d6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.55.172:4567
2018-07-04 16:46:06 139955510687488 [Note] WSREP: (8b2041d6, 'tcp://0.0.0.0:4567') reconnecting to 62116b35 (tcp://192.168.55.172:4567), attempt 0

Once evs.suspect_timeout = PT5S has passed, galera2 is suspected as dead by galera3 (and galera1):

2018-07-04 16:46:10 139955510687488 [Note] WSREP: evs::proto(8b2041d6, OPERATIONAL, view_id(REG,62116b35,54)) suspecting node: 62116b35
2018-07-04 16:46:10 139955510687488 [Note] WSREP: evs::proto(8b2041d6, OPERATIONAL, view_id(REG,62116b35,54)) suspected node without join message, declaring inactive

Galera checks whether the other nodes respond to group communication; on galera3, it finds galera1 in a primary and stable state:

2018-07-04 16:46:11 139955510687488 [Note] WSREP: declaring 737422d6 at tcp://192.168.55.171:4567 stable
2018-07-04 16:46:11 139955510687488 [Note] WSREP: Node 737422d6 state prim

Galera revises the cluster view of this node (galera3):

2018-07-04 16:46:11 139955510687488 [Note] WSREP: view(view_id(PRIM,737422d6,55) memb { 737422d6,0 8b2041d6,0 } joined { } left { } partitioned { 62116b35,0 })
2018-07-04 16:46:11 139955510687488 [Note] WSREP: save pc into disk

Galera then removes the partitioned node from the Primary Component:

2018-07-04 16:46:11 139955510687488 [Note] WSREP: forgetting 62116b35 (tcp://192.168.55.172:4567)

The new Primary Component now consists of two nodes, galera1 and galera3:

2018-07-04 16:46:11 139955502294784 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2

The Primary Component members exchange state with each other to agree on the new cluster view and global state:

2018-07-04 16:46:11 139955502294784 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2018-07-04 16:46:11 139955510687488 [Note] WSREP: (8b2041d6, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-07-04 16:46:11 139955502294784 [Note] WSREP: STATE EXCHANGE: sent state msg: b3d38100-7f66-11e8-8e70-8e3bf680c993
2018-07-04 16:46:11 139955502294784 [Note] WSREP: STATE EXCHANGE: got state msg: b3d38100-7f66-11e8-8e70-8e3bf680c993 from 0 (192.168.55.171)
2018-07-04 16:46:11 139955502294784 [Note] WSREP: STATE EXCHANGE: got state msg: b3d38100-7f66-11e8-8e70-8e3bf680c993 from 1 (192.168.55.173)

Galera calculates and verifies the quorum of the state exchange between online members:

2018-07-04 16:46:11 139955502294784 [Note] WSREP: Quorum results: version = 4, component = PRIMARY, conf_id = 27, members = 2/2 (joined/total), act_id = 2753703, last_appl. = 2753606, protocols = 0/8/3 (gcs/repl/appl), group UUID = 55238f52-41ee-11e8-852f-3316bdb654bc
2018-07-04 16:46:11 139955502294784 [Note] WSREP: Flow-control interval: [23, 23]
2018-07-04 16:46:11 139955502294784 [Note] WSREP: Trying to continue unpaused monitor

Galera updates the new cluster view and global state after the galera2 eviction:

2018-07-04 16:46:11 139955214169856 [Note] WSREP: New cluster view: global state: 55238f52-41ee-11e8-852f-3316bdb654bc:2753703, view# 28: Primary, number of nodes: 2, my index: 1, protocol version 3
2018-07-04 16:46:11 139955214169856 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-07-04 16:46:11 139955214169856 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-07-04 16:46:11 139955214169856 [Note] WSREP: Assign initial position for certification: 2753703, protocol version: 3
2018-07-04 16:46:11 139956691814144 [Note] WSREP: Service thread queue flushed.

Galera then cleans up the partitioned node (galera2) from the active list:

2018-07-04 16:46:14 139955510687488 [Note] WSREP: cleaning up 62116b35 (tcp://192.168.55.172:4567)

At this point, both galera1 and galera3 will be reporting a similar global status:

mysql> SELECT * FROM information_schema.global_status WHERE variable_name IN ('WSREP_CLUSTER_STATUS','WSREP_LOCAL_STATE_COMMENT','WSREP_CLUSTER_SIZE','WSREP_EVS_DELAYED','WSREP_READY');
+---------------------------+------------------------------------------------------------------+
| VARIABLE_NAME             | VARIABLE_VALUE                                                   |
+---------------------------+------------------------------------------------------------------+
| WSREP_CLUSTER_SIZE        | 2                                                                |
| WSREP_CLUSTER_STATUS      | Primary                                                          |
| WSREP_EVS_DELAYED         | 1491abd9-7f6d-11e8-8930-e269b03673d8:tcp://192.168.55.172:4567:1 |
| WSREP_LOCAL_STATE_COMMENT | Synced                                                           |
| WSREP_READY               | ON                                                               |
+---------------------------+------------------------------------------------------------------+

Both list the problematic member in the wsrep_evs_delayed status. Since the local state is "Synced", these nodes are operational and you can redirect client connections from galera2 to any of them. If this step is inconvenient, consider using a load balancer sitting in front of the database to simplify the connection endpoint for the clients.

Node Recovery and Joining

A partitioned Galera node will keep attempting to establish a connection with the Primary Component indefinitely. Let's flush the iptables rules on galera2 to let it connect with the remaining nodes:

# on galera2
$ iptables -F

Once the node is capable of connecting to one of the other nodes, Galera starts re-establishing the group communication automatically:

2018-07-09 10:46:34 140075962705664 [Note] WSREP: (1491abd9, 'tcp://0.0.0.0:4567') connection established to 8b2041d6 tcp://192.168.55.173:4567
2018-07-09 10:46:34 140075962705664 [Note] WSREP: (1491abd9, 'tcp://0.0.0.0:4567') connection established to 737422d6 tcp://192.168.55.171:4567
2018-07-09 10:46:34 140075962705664 [Note] WSREP: declaring 737422d6 at tcp://192.168.55.171:4567 stable
2018-07-09 10:46:34 140075962705664 [Note] WSREP: declaring 8b2041d6 at tcp://192.168.55.173:4567 stable

Node galera2 will then connect to one of the Primary Component nodes (in this case galera1, node ID 737422d6) to get the current cluster view and node states:

2018-07-09 10:46:34 140075962705664 [Note] WSREP: Node 737422d6 state prim
2018-07-09 10:46:34 140075962705664 [Note] WSREP: view(view_id(PRIM,1491abd9,142) memb { 1491abd9,0 737422d6,0 8b2041d6,0 } joined { } left { } partitioned { })
2018-07-09 10:46:34 140075962705664 [Note] WSREP: save pc into disk

Galera will then perform state exchange with the rest of the members that can form the Primary Component:

2018-07-09 10:46:34 140075954312960 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2018-07-09 10:46:34 140075954312960 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 4b23eaa0-8322-11e8-a87e-fe4e0fce2a5f
2018-07-09 10:46:34 140075954312960 [Note] WSREP: STATE EXCHANGE: sent state msg: 4b23eaa0-8322-11e8-a87e-fe4e0fce2a5f
2018-07-09 10:46:34 140075954312960 [Note] WSREP: STATE EXCHANGE: got state msg: 4b23eaa0-8322-11e8-a87e-fe4e0fce2a5f from 0 (192.168.55.172)
2018-07-09 10:46:34 140075954312960 [Note] WSREP: STATE EXCHANGE: got state msg: 4b23eaa0-8322-11e8-a87e-fe4e0fce2a5f from 1 (192.168.55.171)
2018-07-09 10:46:34 140075954312960 [Note] WSREP: STATE EXCHANGE: got state msg: 4b23eaa0-8322-11e8-a87e-fe4e0fce2a5f from 2 (192.168.55.173)

The state exchange allows galera2 to calculate the quorum and produces the following result:

2018-07-09 10:46:34 140075954312960 [Note] WSREP: Quorum results: version = 4, component = PRIMARY, conf_id = 71, members = 2/3 (joined/total), act_id = 2836958, last_appl. = 0, protocols = 0/8/3 (gcs/repl/appl), group UUID = 55238f52-41ee-11e8-852f-3316bdb654bc

Galera then promotes the local node state from OPEN to PRIMARY, to start and establish the node's connection to the Primary Component:

2018-07-09 10:46:34 140075954312960 [Note] WSREP: Flow-control interval: [28, 28]
2018-07-09 10:46:34 140075954312960 [Note] WSREP: Trying to continue unpaused monitor
2018-07-09 10:46:34 140075954312960 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2836958)

As reported by the line above, Galera calculates how far the node is behind the cluster. This node requires a state transfer to catch up from writeset number 2761994 to 2836958:

2018-07-09 10:46:34 140075929970432 [Note] WSREP: State transfer required: Group state: 55238f52-41ee-11e8-852f-3316bdb654bc:2836958 Local state: 55238f52-41ee-11e8-852f-3316bdb654bc:2761994
2018-07-09 10:46:34 140075929970432 [Note] WSREP: New cluster view: global state: 55238f52-41ee-11e8-852f-3316bdb654bc:2836958, view# 72: Primary, number of nodes: 3, my index: 0, protocol version 3
2018-07-09 10:46:34 140075929970432 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-07-09 10:46:34 140075929970432 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-07-09 10:46:34 140075929970432 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-07-09 10:46:34 140075929970432 [Note] WSREP: Assign initial position for certification: 2836958, protocol version: 3

Galera prepares the IST listener on port 4568 on this node and asks any Synced node in the cluster to become a donor. In this case, Galera automatically picks galera3 (192.168.55.173), though it could also pick a donor from the list under wsrep_sst_donor (if defined) for the syncing operation:

2018-07-09 10:46:34 140075996276480 [Note] WSREP: Service thread queue flushed.
2018-07-09 10:46:34 140075929970432 [Note] WSREP: IST receiver addr using tcp://192.168.55.172:4568
2018-07-09 10:46:34 140075929970432 [Note] WSREP: Prepared IST receiver, listening at: tcp://192.168.55.172:4568
2018-07-09 10:46:34 140075954312960 [Note] WSREP: Member 0.0 (192.168.55.172) requested state transfer from '*any*'. Selected 2.0 (192.168.55.173)(SYNCED) as donor.

It will then change the local node state from PRIMARY to JOINER.
At this stage, galera2 is granted the state transfer request and starts to cache write-sets:

2018-07-09 10:46:34 140075954312960 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2836958)
2018-07-09 10:46:34 140075929970432 [Note] WSREP: Requesting state transfer: success, donor: 2
2018-07-09 10:46:34 140075929970432 [Note] WSREP: GCache history reset: 55238f52-41ee-11e8-852f-3316bdb654bc:2761994 -> 55238f52-41ee-11e8-852f-3316bdb654bc:2836958
2018-07-09 10:46:34 140075929970432 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset

Node galera2 starts receiving the missing writesets from the selected donor's gcache (galera3):

2018-07-09 10:46:34 140075954312960 [Note] WSREP: 2.0 (192.168.55.173): State transfer to 0.0 (192.168.55.172) complete.
2018-07-09 10:46:34 140075929970432 [Note] WSREP: Receiving IST: 74964 writesets, seqnos 2761994-2836958
2018-07-09 10:46:34 140075593627392 [Note] WSREP: Receiving IST... 0.0% ( 0/74964 events) complete.
2018-07-09 10:46:34 140075954312960 [Note] WSREP: Member 2.0 (192.168.55.173) synced with group.
2018-07-09 10:46:34 140075962705664 [Note] WSREP: (1491abd9, 'tcp://0.0.0.0:4567') connection established to 737422d6 tcp://192.168.55.171:4567
2018-07-09 10:46:41 140075962705664 [Note] WSREP: (1491abd9, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-07-09 10:46:44 140075593627392 [Note] WSREP: Receiving IST... 36.0% (27008/74964 events) complete.
2018-07-09 10:46:54 140075593627392 [Note] WSREP: Receiving IST... 71.6% (53696/74964 events) complete.
2018-07-09 10:47:02 140075593627392 [Note] WSREP: Receiving IST...100.0% (74964/74964 events) complete.
2018-07-09 10:47:02 140075929970432 [Note] WSREP: IST received: 55238f52-41ee-11e8-852f-3316bdb654bc:2836958
2018-07-09 10:47:02 140075954312960 [Note] WSREP: 0.0 (192.168.55.172): State transfer from 2.0 (192.168.55.173) complete.

Once all the missing writesets are received and applied, Galera promotes galera2 to JOINED (up to seqno 2837012):

2018-07-09 10:47:02 140075954312960 [Note] WSREP: Shifting JOINER -> JOINED (TO: 2837012)
2018-07-09 10:47:02 140075954312960 [Note] WSREP: Member 0.0 (192.168.55.172) synced with group.

The node applies any cached writesets in its slave queue and finishes catching up with the cluster. Its slave queue is now empty. Galera then promotes galera2 to SYNCED, indicating the node is now operational and ready to serve clients:

2018-07-09 10:47:02 140075954312960 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 2837012)
2018-07-09 10:47:02 140076605892352 [Note] WSREP: Synchronized with group, ready for connections

At this point, all nodes are back to operational. You can verify this with the following statement on galera2:

mysql> SELECT * FROM information_schema.global_status WHERE variable_name IN ('WSREP_CLUSTER_STATUS','WSREP_LOCAL_STATE_COMMENT','WSREP_CLUSTER_SIZE','WSREP_EVS_DELAYED','WSREP_READY');
+---------------------------+----------------+
| VARIABLE_NAME             | VARIABLE_VALUE |
+---------------------------+----------------+
| WSREP_CLUSTER_SIZE        | 3              |
| WSREP_CLUSTER_STATUS      | Primary        |
| WSREP_EVS_DELAYED         |                |
| WSREP_LOCAL_STATE_COMMENT | Synced         |
| WSREP_READY               | ON             |
+---------------------------+----------------+

wsrep_cluster_size is reported as 3 and the cluster status is Primary, indicating galera2 is part of the Primary Component. wsrep_evs_delayed has also been cleared and the local state is now Synced.
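A quick way to watch the slave queue drain during this catch-up phase is the wsrep_local_recv_queue status counter, which reports the length of the receive (slave) queue:

mysql> SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';
-- a steadily shrinking value means the node is catching up; 0 means the queue is empty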
The recovery process flow for the partitioned node during a network issue can be summarized as follows:

1. Re-establishes group communication with the other nodes.
2. Retrieves the cluster view from one of the Primary Component nodes.
3. Performs state exchange with the Primary Component and calculates the quorum.
4. Changes the local node state from OPEN to PRIMARY.
5. Calculates the gap between the local node and the cluster.
6. Changes the local node state from PRIMARY to JOINER.
7. Prepares the IST listener/receiver on port 4568.
8. Requests state transfer via IST and picks a donor.
9. Starts receiving and applying the missing writesets from the chosen donor's gcache.
10. Changes the local node state from JOINER to JOINED.
11. Catches up with the cluster by applying the cached writesets in the slave queue.
12. Changes the local node state from JOINED to SYNCED.

Cluster Failure

A Galera Cluster is considered failed if no primary component (PC) is available. Consider a similar three-node Galera Cluster as depicted in the diagram below:

A cluster is considered operational if all nodes, or a majority of the nodes, are online. Online means they are able to see each other through Galera's replication traffic or group communication. If no traffic is coming in and out from a node, the cluster sends a heartbeat beacon for the node to respond to in a timely manner. Otherwise, it will be put into the delayed or suspected list according to how the node responds.

If a node goes down, let's say node C, the cluster will remain operational because nodes A and B are still in quorum with 2 votes out of 3 to form a primary component. You should get the following cluster state on A and B:

mysql> SHOW STATUS LIKE 'wsrep_cluster_status';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_cluster_status | Primary |
+----------------------+---------+

Now suppose the primary switch went kaput, as illustrated in the following diagram:

At this point, every single node loses communication with the others, and the cluster state will be reported as non-Primary on all nodes (as happened to galera2 in the previous case). Every node calculates the quorum and finds out that it is in the minority (1 vote out of 3), thus losing the quorum; no Primary Component is formed and consequently all nodes refuse to serve any data. This is deemed a cluster failure.

Once the network issue is resolved, Galera will automatically re-establish communication between members, exchange node states and determine the possibility of reforming the primary component by comparing node state, UUIDs and seqnos. If the probability is there, Galera will merge the primary components, as shown in the following lines:

2018-06-27 0:16:57 140203784476416 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2018-06-27 0:16:57 140203784476416 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2018-06-27 0:16:57 140203784476416 [Note] WSREP: STATE EXCHANGE: sent state msg: 5885911b-795c-11e8-8683-931c85442c7e
2018-06-27 0:16:57 140203784476416 [Note] WSREP: STATE EXCHANGE: got state msg: 5885911b-795c-11e8-8683-931c85442c7e from 0 (192.168.55.171)
2018-06-27 0:16:57 140203784476416 [Note] WSREP: STATE EXCHANGE: got state msg: 5885911b-795c-11e8-8683-931c85442c7e from 1 (192.168.55.172)
2018-06-27 0:16:57 140203784476416 [Note] WSREP: STATE EXCHANGE: got state msg: 5885911b-795c-11e8-8683-931c85442c7e from 2 (192.168.55.173)
2018-06-27 0:16:57 140203784476416 [Warning] WSREP: Quorum: No node with complete state:

        Version      : 4
        Flags        : 0x3
        Protocols    : 0 / 8 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : SYNCED
        Prim UUID    : 5224a024-791b-11e8-a0ac-8bc6118b0f96
        Prim seqno   : 5
        First seqno  : 112714
        Last seqno   : 112725
        Prim JOINED  : 3
        State UUID   : 5885911b-795c-11e8-8683-931c85442c7e
        Group UUID   : 55238f52-41ee-11e8-852f-3316bdb654bc
        Name         : '192.168.55.171'
        Incoming addr: '192.168.55.171:3306'

        Version      : 4
        Flags        : 0x2
        Protocols    : 0 / 8 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : SYNCED
        Prim UUID    : 5224a024-791b-11e8-a0ac-8bc6118b0f96
        Prim seqno   : 5
        First seqno  : 112714
        Last seqno   : 112725
        Prim JOINED  : 3
        State UUID   : 5885911b-795c-11e8-8683-931c85442c7e
        Group UUID   : 55238f52-41ee-11e8-852f-3316bdb654bc
        Name         : '192.168.55.172'
        Incoming addr: '192.168.55.172:3306'

        Version      : 4
        Flags        : 0x2
        Protocols    : 0 / 8 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : SYNCED
        Prim UUID    : 5224a024-791b-11e8-a0ac-8bc6118b0f96
        Prim seqno   : 5
        First seqno  : 112714
        Last seqno   : 112725
        Prim JOINED  : 3
        State UUID   : 5885911b-795c-11e8-8683-931c85442c7e
        Group UUID   : 55238f52-41ee-11e8-852f-3316bdb654bc
        Name         : '192.168.55.173'
        Incoming addr: '192.168.55.173:3306'

2018-06-27 0:16:57 140203784476416 [Note] WSREP: Full re-merge of primary 5224a024-791b-11e8-a0ac-8bc6118b0f96 found: 3 of 3.
2018-06-27 0:16:57 140203784476416 [Note] WSREP: Quorum results: version = 4, component = PRIMARY, conf_id = 5, members = 3/3 (joined/total), act_id = 112725, last_appl. = 112722, protocols = 0/8/3 (gcs/repl/appl), group UUID = 55238f52-41ee-11e8-852f-3316bdb654bc
2018-06-27 0:16:57 140203784476416 [Note] WSREP: Flow-control interval: [28, 28]
2018-06-27 0:16:57 140203784476416 [Note] WSREP: Trying to continue unpaused monitor
2018-06-27 0:16:57 140203784476416 [Note] WSREP: Restored state OPEN -> SYNCED (112725)
2018-06-27 0:16:57 140202564110080 [Note] WSREP: New cluster view: global state: 55238f52-41ee-11e8-852f-3316bdb654bc:112725, view# 6: Primary, number of nodes: 3, my index: 2, protocol version 3

A good indicator that the re-bootstrapping process went OK is the following line in the error log:

[Note] WSREP: Synchronized with group, ready for connections

ClusterControl Auto Recovery

ClusterControl comes with node and cluster automatic recovery features, because it oversees and understands the state of all nodes in the cluster. Automatic recovery is enabled by default if the cluster is deployed using ClusterControl. To enable or disable the feature, simply click on the power icon in the summary bar as shown below: a green icon means automatic recovery is turned on, while red is the opposite.
You can monitor the recovery progress from the Activity -> Jobs dialog. In this case, galera2 was totally inaccessible due to the firewall blocking, thus forcing ClusterControl to report the following:

Related resources:
- Become a MySQL DBA blog series - Galera Cluster diagnostic logs
- Disaster Recovery Planning for MySQL & MariaDB
- How to Recover Galera Cluster or MySQL Replication from Split Brain Syndrome

The recovery process will only commence after a grace timeout (30 seconds), to give the Galera node a chance to recover itself beforehand. If ClusterControl fails to recover a node or cluster, it will first pull all MySQL error logs from all accessible nodes and will raise the necessary alarms to notify the user via email, or by pushing critical events to third-party integration modules like PagerDuty, VictorOps or Slack. Manual intervention is then required. For Galera Cluster, ClusterControl will keep trying to recover from the failure until you mark the node as under maintenance, or disable the automatic recovery feature.

ClusterControl's automatic recovery is one of the most popular features, as voted by our users. It helps you take the necessary actions quickly, with a complete report on what was attempted and recommended steps for further troubleshooting of the issue. Users with support subscriptions can look for extra hands by escalating the issue to our technical support team for assistance.

Conclusion

Galera's automatic node recovery and membership control are neat features that simplify cluster management, improve database reliability and reduce the risk of human error, which commonly haunts other open-source database replication technologies like MySQL Replication, Group Replication and PostgreSQL Streaming/Logical Replication.

Tags: galera, MySQL, MariaDB, recovery, network partitioning, galera cluster
Posted over 5 years ago by ScaleGrid.io
MySQL semisynchronous replication provides improved data integrity because when a commit returns successfully, it's known that the data exists in at least two places – the master and its slave. In this blog post, we review some of the MySQL configurations that influence the data integrity and performance aspects of semisynchronous replication. We'll be using the InnoDB storage engine and GTID-based replication in a 3-node replica set (master and 2 slaves), which ensures there is redundancy in the slaves. This means that if there are issues with one slave, we can fall back on the other.

Configurations Applicable to Both Master and Slave Nodes

innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

These configurations guarantee high durability and consistency settings for data. That is, each committed transaction is guaranteed to be present in the binary logs, and the logs are flushed to disk. Hence, in the case of a power failure or operating system crash, the data consistency of MySQL is always preserved.

Configurations on the Master Node

rpl_semi_sync_master_wait_for_slave_count: This option configures the number of slaves that must send an acknowledgment before a semisynchronous master can commit the transaction. In the 3-node replica set, we recommend setting this to 1, so that we always have an assurance that the data is available in at least one slave, while avoiding any performance impact involved in waiting for acknowledgment from both slaves.

rpl_semi_sync_master_timeout: This option configures the amount of time that a semisynchronous master waits for slave acknowledgment before switching back to asynchronous mode. We recommend setting this to a large number, so that there is no fallback to asynchronous mode, which would defeat our data integrity objective. Since we're operating with 2 slaves and rpl_semi_sync_master_wait_for_slave_count is set to 1, we can assume that at least one of the slaves will acknowledge within a reasonable amount of time, thereby minimizing the performance impact of this setting.

Configurations on the Slave Nodes

On the slaves, it's always important to track two positions very accurately: the current executed position of the SQL thread in the relay log, and the current position of the IO thread, which indicates how far the master binary file has been read and copied to the slave. The consequences of not maintaining these positions are quite obvious. If there's a slave crash and restart, the SQL thread can start processing transactions from a wrong offset, or the IO thread can start pulling data from a wrong position in the master binary logs. Both cases will lead to data corruption. It is important to ensure crash-safety of slaves through the following configurations:

relay_log_info_repository = TABLE
relay_log_recovery = ON

Setting relay_log_info_repository to TABLE ensures the position of the SQL thread is updated together with each transaction commit on the slave. However, it's difficult to maintain the exact position of the IO thread and flush it to disk, because reading the master binary log and writing to the slave relay log are not transaction-based. The impact on performance would be very high if the IO thread position had to be updated and flushed to disk after each write to the slave relay logs.
A more elegant solution is to set relay_log_recovery = ON, in which case, if there's a MySQL restart, the current relay logs are assumed to be corrupted and are freshly pulled from the master based on the SQL thread position.

Last but not least, it's important to note that semisynchronous replication only ensures that the data has 'reached' one of the slaves before the master commits the transaction; it does not mean that the transactions are committed on the slave. Hence, it's good to ensure that the SQL thread performs well. In the ideal case, the SQL thread moves hand in hand with the IO thread, so we get the benefit of the slave not only receiving the transactions, but also committing them. It's recommended to go with a multi-threaded slave configuration to get increased slave SQL thread performance. The important configurations for multi-threaded slaves are:

slave_parallel_workers: Set this to > 1 to enable multiple slave SQL thread workers. Based on the number of threads writing on the master, we can decide an optimal number for this so that the slave does not lag.
slave-parallel-type = LOGICAL_CLOCK
slave-preserve-commit-order = 1

The above configurations promise parallelism in the slave while, at the same time, preserving the order of transactions as seen on the master.

In summary, by using the above configurations on our MySQL replica set, we're able to maintain high data integrity along with optimal performance; a consolidated configuration sketch follows below. As always, if you have any questions, leave us a comment, reach out to us at @scalegridio on Twitter, or send us an email at [email protected].
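For reference, here is a consolidated my.cnf sketch of the settings discussed in this post. The values are illustrative rather than prescriptive: the semisync variables assume the semisync master/slave plugins are installed, rpl_semi_sync_master_timeout is in milliseconds, and slave_preserve_commit_order may additionally require log_bin and log_slave_updates on the slave:

[mysqld]
# durability, on all nodes
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

# master: semisynchronous replication
rpl_semi_sync_master_enabled = 1
rpl_semi_sync_master_wait_for_slave_count = 1
# very large timeout (ms) so we effectively never fall back to asynchronous mode
rpl_semi_sync_master_timeout = 100000000

# slaves: semisync acknowledgment plus crash safety
rpl_semi_sync_slave_enabled = 1
relay_log_info_repository = TABLE
relay_log_recovery = ON

# slaves: multi-threaded apply, preserving master commit order
slave_parallel_workers = 4
slave_parallel_type = LOGICAL_CLOCK
slave_preserve_commit_order = 1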
Posted over 5 years ago by MySQL Performance Blog
Ever since AMD released their EPYC CPU for servers I wanted to test it, but I did not have the opportunity until recently, when Packet.net started offering bare metal servers for a reasonable price. So I started a couple of instances to test Percona Server for MySQL under this CPU. In this benchmark, I discovered some interesting discrepancies in performance between AMD and Intel CPUs when running under systemd.

The setup

To test CPU performance, I used a read-only in-memory sysbench OLTP benchmark, as it burns CPU cycles and no IO is performed by Percona Server (a sketch of such an invocation appears just before the baseline results below).

For this benchmark I used Packet.net c2.medium.x86 instances powered by AMD EPYC 7401P processors. The OS is exposed to 48 CPU threads. For the OS I tried:

- Ubuntu 16.04 with default kernel 4.4, and upgraded to 4.15
- Ubuntu 18.04 with kernel 4.15

Percona Server was started from systemd and without systemd (for reasons which will become apparent later).

To have some points for comparison, I also ran a similar workload on my 2 socket Intel CPU server, with CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. I recognize this is not the most recent Intel CPU, but this was the best I had at the time, and it also gave 48 CPU threads.

Ubuntu 16

First, let's review the results for Ubuntu 16, in tabular format:

Threads | Ubuntu 16, kernel 4.4; systemd | Ubuntu 16, kernel 4.4; NO systemd | Ubuntu 16, kernel 4.15
1       | 943.44   | 948.7    | 899.82
2       | 1858.58  | 1792.36  | 1774.45
4       | 3533.2   | 3424.05  | 3555.94
8       | 6762.35  | 6731.57  | 7010.51
12      | 10012.18 | 9950.3   | 10062.82
16      | 13063.39 | 13043.55 | 12893.17
20      | 15347.68 | 15347.56 | 14756.27
24      | 16886.24 | 16864.81 | 16176.76
30      | 18150.2  | 18160.07 | 17860.5
36      | 18923.06 | 18811.64 | 19528.27
42      | 19374.86 | 19463.08 | 21537.79
48      | 20110.81 | 19983.05 | 23388.18
56      | 20548.51 | 20362.31 | 23768.49
64      | 20860.51 | 20729.52 | 23797.14
72      | 21123.71 | 21001.06 | 23645.95
80      | 21370    | 21191.24 | 23546.03
90      | 21622.54 | 21441.73 | 23486.29
100     | 21806.67 | 21670.38 | 23392.72
128     | 22161.42 | 22031.53 | 23315.33
192     | 22388.51 | 22207.26 | 22906.42
256     | 22091.17 | 21943.37 | 22305.06
512     | 19524.41 | 19381.69 | 19181.71

There are a few conclusions we can draw from this data:

- The AMD EPYC CPU scales quite well to the number of CPU threads.
- The recent kernel helps to boost the throughput.

Ubuntu 18.04

Now, let's review the results for Ubuntu 18.04:

Threads | Ubuntu 18, systemd | Ubuntu 18, NO systemd
1       | 833.14   | 843.68
2       | 1684.21  | 1693.93
4       | 3346.42  | 3359.82
8       | 6592.88  | 6597.48
12      | 9477.92  | 9487.93
16      | 12117.12 | 12149.17
20      | 13934.27 | 13933
24      | 15265.1  | 15152.74
30      | 16846.02 | 16061.16
36      | 18488.88 | 16726.14
42      | 20493.57 | 17360.56
48      | 22217.47 | 17906.4
56      | 22564.4  | 17931.83
64      | 22590.29 | 17902.95
72      | 22472.75 | 17857.73
80      | 22421.99 | 17766.76
90      | 22300.09 | 17773.57
100     | 22275.24 | 17646.7
128     | 22131.86 | 17411.55
192     | 21750.8  | 17134.63
256     | 21177.25 | 16826.53
512     | 18296.61 | 17418.72

This is where the result surprised me: on Ubuntu 18.04 with systemd, running Percona Server for MySQL as a service gave up to 24% better throughput than starting Percona Server for MySQL from a bash shell. I do not know exactly what causes this dramatic difference: systemd uses different slices for services and user commands, and somehow this affects the performance.
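The post does not include the exact benchmark invocation, but a read-only in-memory sysbench OLTP run of the kind described above might look like the following minimal sketch (table count, table size, duration and connection settings are assumptions, not the author's actual values; it also assumes a pre-created sbtest database):

# load the dataset, then run the CPU-bound read-only workload
sysbench oltp_read_only --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=secret \
  --tables=8 --table-size=1000000 prepare
sysbench oltp_read_only --mysql-host=127.0.0.1 --mysql-user=sbtest --mysql-password=secret \
  --tables=8 --table-size=1000000 --threads=48 --time=300 --report-interval=10 run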
Baseline benchmark

To establish a baseline, I ran the same benchmark on my Intel box running Ubuntu 16, and I tried two kernels: 4.13 and 4.15.

Threads | Ubuntu 16, kernel 4.13, systemd | Ubuntu 16, kernel 4.15, systemd | Ubuntu 16, kernel 4.15, NO systemd
1       | 820.07   | 798.42   | 864.21
2       | 1563.31  | 1609.96  | 1681.91
4       | 2929.63  | 3186.01  | 3338.47
8       | 6075.73  | 6279.49  | 6624.49
12      | 8743.38  | 9256.18  | 9622.6
16      | 10580.14 | 11351.31 | 11984.64
20      | 12790.96 | 12599.78 | 14147.1
24      | 14213.68 | 14659.49 | 15716.61
30      | 15983.78 | 16096.03 | 17530.06
36      | 17574.46 | 18098.36 | 20085.9
42      | 18671.14 | 19808.92 | 21875.84
48      | 19431.05 | 22036.06 | 23986.08
56      | 19737.92 | 22115.34 | 24275.72
64      | 19946.57 | 21457.32 | 24054.09
72      | 20129.7  | 21729.78 | 24167.03
80      | 20214.93 | 21594.51 | 24092.86
90      | 20194.78 | 21195.61 | 23945.93
100     | 20753.44 | 21597.26 | 23802.16
128     | 20235.24 | 20684.34 | 23476.82
192     | 20280.52 | 20431.18 | 23108.36
256     | 20410.55 | 20952.64 | 22775.63
512     | 20953.73 | 22079.18 | 23489.3

Here we see the opposite result with systemd: Percona Server running from a bash shell shows better throughput than the systemd service. So, for some reason, systemd works differently for AMD and Intel CPUs. Please let me know if you have any ideas on how to deal with the impact that systemd has on performance.

Conclusions

There are some conclusions from these results:

- AMD EPYC shows decent performance scalability; the new kernel helps to improve it.
- systemd shows different effects on throughput for AMD and Intel CPUs.
- With AMD, the throughput declines for a highly concurrent workload with 512 threads, while Intel does not show a decline.

The post AMD EPYC Performance Testing… or Don’t get on the wrong side of SystemD appeared first on Percona Database Performance Blog.
Posted over 5 years ago by Vlad Mihalcea
Introduction

When executing an entity query (e.g. JPQL, HQL or Criteria API), you can use any SQL function without having to register it as long as the function is passed directly to the WHERE clause of the underlying SQL statement. However, if the SQL function is used in the SELECT clause, and Hibernate has not … Continue reading The best way to use SQL functions in JPQL or Criteria API queries with JPA and Hibernate →

The post The best way to use SQL functions in JPQL or Criteria API queries with JPA and Hibernate appeared first on Vlad Mihalcea.
Posted over 5 years ago by MinervaDB
MySQL partitioning distributes the data of individual tables (typically we recommend partitioning large tables with complex I/O, for performance, scalability and manageability) across multiple files, based on a partitioning strategy/rules. In very simple terms, different portions of a table are stored as separate tables in different locations to distribute I/O optimally. The user-defined division of data by some rule is known as the partition function; in MySQL we partition data by a RANGE of values, a LIST of values, an internal hashing function or a linear hashing function.

Restricting query examination to the selected partitions that contain matching rows can increase query performance multiple times compared to the same query on a non-partitioned table. This methodology is also called partition pruning (trimming of unwanted partitions). Please find below an example of partition pruning:

CREATE TABLE tab1 (
  col1 VARCHAR(30) NOT NULL,
  col2 VARCHAR(30) NOT NULL,
  col3 TINYINT UNSIGNED NOT NULL,
  col4 DATE NOT NULL
)
PARTITION BY RANGE( col3 ) (
  PARTITION p0 VALUES LESS THAN (100),
  PARTITION p1 VALUES LESS THAN (200),
  PARTITION p2 VALUES LESS THAN (300),
  PARTITION p3 VALUES LESS THAN MAXVALUE
);

A SELECT query benefitting from partition pruning:

SELECT col1, col2, col3, col4
FROM tab1
WHERE col3 > 200 AND col3 < 250;

What is explicit partitioning in MySQL and how is it different from partition pruning?

In MySQL we can explicitly select partitions and subpartitions when executing a statement matching a given WHERE condition. This sounds very similar to partition pruning, but there is a difference:

- The partitions to be checked are explicitly mentioned in the query statement; in partition pruning it is automatic.
- With explicit partitioning, the explicit selection of partitions is supported for both queries and DML statements; partition pruning applies only to queries.
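Pruning being automatic does not mean it is invisible: you can check which partitions the optimizer will actually touch via the partitions column of EXPLAIN output. A small sketch against the tab1 table from above (in MySQL 5.6 and earlier, use EXPLAIN PARTITIONS instead):

EXPLAIN SELECT col1, col2, col3, col4 FROM tab1 WHERE col3 > 200 AND col3 < 250;
-- the partitions column should list only p2, since values 201-249 fall entirely in that partition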
SQL statements supporting explicit partition selection: SELECT, INSERT, UPDATE, DELETE, LOAD DATA, LOAD XML and REPLACE.

Explicit partition example:

CREATE TABLE customer (
  cust_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  cust_fname VARCHAR(25) NOT NULL,
  cust_lname VARCHAR(25) NOT NULL,
  cust_phone INT NOT NULL,
  cust_fax INT NOT NULL
)
PARTITION BY RANGE(cust_id) (
  PARTITION p0 VALUES LESS THAN (100),
  PARTITION p1 VALUES LESS THAN (200),
  PARTITION p2 VALUES LESS THAN (300),
  PARTITION p3 VALUES LESS THAN MAXVALUE
);

A query explicitly naming a partition:

mysql> SELECT * FROM customer PARTITION (p1);

RANGE partitioning

With RANGE partitioning you can partition values within a given range. Ranges should be contiguous but not overlapping, and are usually defined with the VALUES LESS THAN operator. The following examples explain how to create and use RANGE partitioning:

CREATE TABLE customer_contract (
  cust_id INT NOT NULL,
  cust_fname VARCHAR(30),
  cust_lname VARCHAR(30),
  st_dt DATE NOT NULL DEFAULT '1970-01-01',
  end_dt DATE NOT NULL DEFAULT '9999-12-31',
  contract_code INT NOT NULL,
  contract_id INT NOT NULL
)
PARTITION BY RANGE (contract_id) (
  PARTITION p0 VALUES LESS THAN (50),
  PARTITION p1 VALUES LESS THAN (100),
  PARTITION p2 VALUES LESS THAN (150),
  PARTITION p3 VALUES LESS THAN (200)
);

For example, suppose that you wish to partition based on the year the contract ended:

CREATE TABLE customer_contract (
  cust_id INT NOT NULL,
  cust_fname VARCHAR(30),
  cust_lname VARCHAR(30),
  st_dt DATE NOT NULL DEFAULT '1970-01-01',
  end_dt DATE NOT NULL DEFAULT '9999-12-31',
  contract_code INT NOT NULL,
  contract_id INT NOT NULL
)
PARTITION BY RANGE ( YEAR(end_dt) ) (
  PARTITION p0 VALUES LESS THAN (2001),
  PARTITION p1 VALUES LESS THAN (2002),
  PARTITION p2 VALUES LESS THAN (2003),
  PARTITION p3 VALUES LESS THAN (2004)
);

It is also possible to partition a table by RANGE based on the value of a TIMESTAMP column, using the UNIX_TIMESTAMP() function, as shown in this example:

CREATE TABLE sales_forecast (
  sales_forecast_id INT NOT NULL,
  sales_forecast_status VARCHAR(20) NOT NULL,
  sales_forecast_updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
)
PARTITION BY RANGE ( UNIX_TIMESTAMP(sales_forecast_updated) ) (
  PARTITION p0 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-01-01 00:00:00') ),
  PARTITION p1 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-04-01 00:00:00') ),
  PARTITION p2 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-07-01 00:00:00') ),
  PARTITION p3 VALUES LESS THAN ( UNIX_TIMESTAMP('2008-10-01 00:00:00') ),
  PARTITION p4 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-01-01 00:00:00') ),
  PARTITION p5 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-04-01 00:00:00') ),
  PARTITION p6 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-07-01 00:00:00') ),
  PARTITION p7 VALUES LESS THAN ( UNIX_TIMESTAMP('2009-10-01 00:00:00') ),
  PARTITION p8 VALUES LESS THAN ( UNIX_TIMESTAMP('2010-01-01 00:00:00') ),
  PARTITION p9 VALUES LESS THAN (MAXVALUE)
);

LIST partitioning

The difference between RANGE and LIST partitioning is that in LIST partitioning, each partition is defined by a selected list of values of a specific column.
You do this with PARTITION BY LIST(expr), where expr is the column selected for list partitioning. We explain LIST partitioning with the example below:

CREATE TABLE students (
  student_id INT NOT NULL,
  student_fname VARCHAR(30),
  student_lname VARCHAR(30),
  student_joined DATE NOT NULL DEFAULT '1970-01-01',
  student_separated DATE NOT NULL DEFAULT '9999-12-31',
  student_house INT,
  student_grade_id INT
)
PARTITION BY LIST(student_grade_id) (
  PARTITION P1 VALUES IN (1,2,3,4),
  PARTITION P2 VALUES IN (5,6,7),
  PARTITION P3 VALUES IN (8,9,10),
  PARTITION P4 VALUES IN (11,12)
);

HASH partitioning

HASH partitioning distributes data evenly among a predetermined number of partitions. In RANGE and LIST partitioning you must explicitly define the partitioning logic and in which partition a given column value, or set of column values, is stored; in HASH partitioning MySQL takes care of this. The following example explains HASH partitioning better:

CREATE TABLE store (
  store_id INT NOT NULL,
  store_name VARCHAR(30),
  store_location VARCHAR(30),
  store_started DATE NOT NULL DEFAULT '1997-01-01',
  store_code INT
)
PARTITION BY HASH(store_id)
PARTITIONS 4;

P.S.: If you do not include a PARTITIONS clause, the number of partitions defaults to 1.

LINEAR HASH partitioning

LINEAR HASH partitioning utilizes a linear powers-of-two algorithm, whereas HASH partitioning employs the modulus of the hashing function's value. Please find below a LINEAR HASH partitioning example:

CREATE TABLE store (
  store_id INT NOT NULL,
  store_name VARCHAR(30),
  store_location VARCHAR(30),
  store_started DATE NOT NULL DEFAULT '1997-01-01',
  store_code INT
)
PARTITION BY LINEAR HASH( YEAR(store_started) )
PARTITIONS 4;

KEY partitioning

KEY partitioning is very similar to HASH; the only difference is that the hashing function for KEY partitioning is supplied by MySQL. In the case of MySQL NDB Cluster, MD5() is used; for tables using other storage engines, the MySQL server uses a storage engine specific hashing function based on the same algorithm as PASSWORD().

CREATE TABLE contact (
  id INT NOT NULL,
  name VARCHAR(20),
  contact_number INT,
  email VARCHAR(50),
  UNIQUE KEY (id)
)
PARTITION BY KEY()
PARTITIONS 5;

P.S.: If the unique key column were not defined as NOT NULL, the previous statement would fail.

Subpartitioning

SUBPARTITIONING is also known as composite partitioning. You can partition a table combining RANGE and HASH for better results. The example below explains SUBPARTITIONING better:

CREATE TABLE purchase (id INT, item VARCHAR(30), purchase_date DATE)
PARTITION BY RANGE( YEAR(purchase_date) )
SUBPARTITION BY HASH( TO_DAYS(purchase_date) )
SUBPARTITIONS 2 (
  PARTITION p0 VALUES LESS THAN (2000),
  PARTITION p1 VALUES LESS THAN (2010),
  PARTITION p2 VALUES LESS THAN MAXVALUE
);

It is also possible to define subpartitions explicitly, using SUBPARTITION clauses to specify options for individual subpartitions:

CREATE TABLE purchase (id INT, item VARCHAR(30), purchase_date DATE)
PARTITION BY RANGE( YEAR(purchase_date) )
SUBPARTITION BY HASH( TO_DAYS(purchase_date) ) (
  PARTITION p0 VALUES LESS THAN (2000) (
    SUBPARTITION s0,
    SUBPARTITION s1
  ),
  PARTITION p1 VALUES LESS THAN (2010) (
    SUBPARTITION s2,
    SUBPARTITION s3
  ),
  PARTITION p2 VALUES LESS THAN MAXVALUE (
    SUBPARTITION s4,
    SUBPARTITION s5
  )
);

Things to remember:

- Each partition must have the same number of subpartitions.
- Each SUBPARTITION clause must include (at a minimum) a name for the subpartition.
Subpartitioning
Subpartitioning, also known as composite partitioning, lets you partition a table by combining RANGE and HASH for better results. The example below illustrates subpartitioning:

CREATE TABLE purchase (
    id INT,
    item VARCHAR(30),
    purchase_date DATE
)
PARTITION BY RANGE( YEAR(purchase_date) )
SUBPARTITION BY HASH( TO_DAYS(purchase_date) )
SUBPARTITIONS 2 (
    PARTITION p0 VALUES LESS THAN (2000),
    PARTITION p1 VALUES LESS THAN (2010),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);

It is also possible to define subpartitions explicitly, using SUBPARTITION clauses to specify options for individual subpartitions:

CREATE TABLE purchase (
    id INT,
    item VARCHAR(30),
    purchase_date DATE
)
PARTITION BY RANGE( YEAR(purchase_date) )
SUBPARTITION BY HASH( TO_DAYS(purchase_date) ) (
    PARTITION p0 VALUES LESS THAN (2000) (
        SUBPARTITION s0,
        SUBPARTITION s1
    ),
    PARTITION p1 VALUES LESS THAN (2010) (
        SUBPARTITION s2,
        SUBPARTITION s3
    ),
    PARTITION p2 VALUES LESS THAN MAXVALUE (
        SUBPARTITION s4,
        SUBPARTITION s5
    )
);

Things to remember:
- Each partition must have the same number of subpartitions.
- Each SUBPARTITION clause must include (at a minimum) a name for the subpartition. Beyond that, you may set any desired option for the subpartition or allow it to assume its default setting for that option.
- Subpartition names must be unique across the entire table.

For example, the following CREATE TABLE statement is valid in MySQL 5.7:

CREATE TABLE purchase (
    id INT,
    item VARCHAR(30),
    purchase_date DATE
)
PARTITION BY RANGE( YEAR(purchase_date) )
SUBPARTITION BY HASH( TO_DAYS(purchase_date) ) (
    PARTITION p0 VALUES LESS THAN (1990) (
        SUBPARTITION s0,
        SUBPARTITION s1
    ),
    PARTITION p1 VALUES LESS THAN (2000) (
        SUBPARTITION s2,
        SUBPARTITION s3
    ),
    PARTITION p2 VALUES LESS THAN MAXVALUE (
        SUBPARTITION s4,
        SUBPARTITION s5
    )
);

MySQL partitioning limitations
MySQL partitioning also has limitations; the main ones are listed below.

Every unique key on the table must use every column in the table's partitioning expression. The following table violates that rule, because column1 and column3 appear in the partitioning expression but neither unique key contains both:

CREATE TABLE tab3 (
    column1 INT NOT NULL,
    column2 DATE NOT NULL,
    column3 INT NOT NULL,
    column4 INT NOT NULL,
    UNIQUE KEY (column1, column2),
    UNIQUE KEY (column3)
)
PARTITION BY HASH(column1 + column3)
PARTITIONS 4;

Expect this error after running the statement above:

ERROR 1503 (HY000): A UNIQUE INDEX must include all columns in the table's partitioning function

The right way of doing it:

CREATE TABLE table12 (
    column1 INT NOT NULL,
    column2 DATE NOT NULL,
    column3 INT NOT NULL,
    column4 INT NOT NULL,
    UNIQUE KEY (column1, column2, column3)
)
PARTITION BY HASH(column3)
PARTITIONS 5;

CREATE TABLE table25 (
    column11 INT NOT NULL,
    column12 DATE NOT NULL,
    column13 INT NOT NULL,
    column14 INT NOT NULL,
    UNIQUE KEY (column11, column13)
)
PARTITION BY HASH(column11 + column13)
PARTITIONS 5;

Since a primary key is by definition a unique key, this restriction also includes the table's primary key, if it has one. The examples below illustrate this:

CREATE TABLE table55 (
    column11 INT NOT NULL,
    column12 DATE NOT NULL,
    column13 INT NOT NULL,
    column14 INT NOT NULL,
    PRIMARY KEY(column11, column12)
)
PARTITION BY HASH(column13)
PARTITIONS 4;

CREATE TABLE table65 (
    column20 INT NOT NULL,
    column25 DATE NOT NULL,
    column30 INT NOT NULL,
    column35 INT NOT NULL,
    PRIMARY KEY(column20, column30),
    UNIQUE KEY(column25)
)
PARTITION BY HASH( YEAR(column25) )
PARTITIONS 5;

Both of the above statements return this error:

ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function

The right way of doing it:

CREATE TABLE t45 (
    column50 INT NOT NULL,
    column55 DATE NOT NULL,
    column60 INT NOT NULL,
    column65 INT NOT NULL,
    PRIMARY KEY(column50, column55)
)
PARTITION BY HASH(column50 + YEAR(column55))
PARTITIONS 5;

CREATE TABLE table88 (
    column80 INT NOT NULL,
    column81 DATE NOT NULL,
    column82 INT NOT NULL,
    column83 INT NOT NULL,
    PRIMARY KEY(column80, column81, column82),
    UNIQUE KEY(column81, column82)
);

In t45 the primary key contains every column used in the partitioning expression, and table88 is not partitioned at all, so the restriction does not apply. Both statements are valid.
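If you want to double-check which partitioning clause MySQL actually stored for a table, SHOW CREATE TABLE prints it back. A quick check against the t45 table above (the exact formatting of the clause varies by server version):

SHOW CREATE TABLE t45\G
-- The output ends with the stored partitioning clause, e.g.:
-- /*!50100 PARTITION BY HASH (column50 + YEAR(column55)) PARTITIONS 5 */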
You can still successfully partition a MySQL table without unique keys (including tables that have no primary key), and you may use any column or columns in the partitioning expression, as long as the column type is compatible with the partitioning type. The example below partitions a table with no unique or primary keys:

CREATE TABLE table_has_no_pk (
    column10 INT,
    column11 INT,
    column12 VARCHAR(20)
)
PARTITION BY RANGE(column10) (
    PARTITION p0 VALUES LESS THAN (500),
    PARTITION p1 VALUES LESS THAN (600),
    PARTITION p2 VALUES LESS THAN (700),
    PARTITION p3 VALUES LESS THAN (800)
);

However, you cannot later add a unique key to a partitioned table unless the key includes all columns used by the table's partitioning expression. The statements below illustrate this:

ALTER TABLE table_has_no_pk ADD PRIMARY KEY(column10);
ALTER TABLE table_has_no_pk DROP PRIMARY KEY;
ALTER TABLE table_has_no_pk ADD PRIMARY KEY(column10, column11);
ALTER TABLE table_has_no_pk DROP PRIMARY KEY;

The next statement fails, because column10 is part of the partitioning key but is not part of the proposed primary key:

mysql> ALTER TABLE table_has_no_pk ADD PRIMARY KEY(column11);
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function

MySQL partitioning limitations (at the storage engine level)

InnoDB
- InnoDB foreign keys and MySQL partitioning are not compatible: partitioned InnoDB tables cannot have foreign key references, nor can they have columns referenced by foreign keys. In other words, you cannot partition InnoDB tables that have foreign keys or are referenced by them.
- InnoDB does not support the use of multiple disks for subpartitions (MyISAM does support this).
- Use ALTER TABLE … REBUILD PARTITION and ALTER TABLE … ANALYZE PARTITION rather than ALTER TABLE … OPTIMIZE PARTITION (a short sketch of these maintenance statements follows the function list below).

NDB storage engine
- Only partitioning by KEY (including LINEAR KEY) is supported.

FEDERATED storage engine
- Partitioning is not supported.

CSV storage engine
- Partitioning is not supported.

MERGE storage engine
- Tables using the MERGE storage engine cannot be partitioned, and partitioned tables cannot be merged.

The MySQL functions shown in the following list are allowed in partitioning expressions: ABS(), CEILING(), DATEDIFF(), DAY(), DAYOFMONTH(), DAYOFWEEK(), DAYOFYEAR(), EXTRACT(), FLOOR(), HOUR(), MICROSECOND(), MINUTE(), MOD(), MONTH(), QUARTER(), SECOND(), TIME_TO_SEC(), TO_DAYS(), TO_SECONDS(), UNIX_TIMESTAMP(), WEEKDAY(), YEAR(), YEARWEEK().
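To make the InnoDB maintenance recommendation above concrete, here is a minimal sketch against the table_has_no_pk table defined earlier; on InnoDB, the first two statements are the supported substitute for OPTIMIZE PARTITION:

ALTER TABLE table_has_no_pk REBUILD PARTITION p0, p1;   -- rebuilds the partitions' data and indexes
ALTER TABLE table_has_no_pk ANALYZE PARTITION p0, p1;   -- refreshes the index statistics
-- TRUNCATE PARTITION empties one partition without touching the others:
ALTER TABLE table_has_no_pk TRUNCATE PARTITION p3;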
MySQL partitioning and locks

Effect on DML statements:
- In MySQL 5.7, updating a partitioned MyISAM table causes only the affected partitions to be locked.
- SELECT statements (including those containing unions or joins) lock only those partitions that actually need to be read. This also applies to SELECT … PARTITION.
- An UPDATE prunes locks only for tables on which no partitioning columns are updated.
- REPLACE and INSERT lock only those partitions having rows to be inserted or replaced. However, if an AUTO_INCREMENT value is generated for any partitioning column, then all partitions are locked.
- INSERT … ON DUPLICATE KEY UPDATE is pruned as long as no partitioning column is updated.
- INSERT … SELECT locks only those partitions in the source table that need to be read, although all partitions in the target table are locked.
- Locks imposed by LOAD DATA statements on partitioned tables cannot be pruned.

Effect on DDL statements:
- CREATE VIEW does not cause any locks.
- ALTER TABLE … EXCHANGE PARTITION prunes locks; only the exchanged table and the exchanged partition are locked.
- ALTER TABLE … TRUNCATE PARTITION prunes locks; only the partitions to be emptied are locked.
- In addition, ALTER TABLE statements take metadata locks at the table level.

Effect on other statements:
- LOCK TABLES cannot prune partition locks.
- CALL stored_procedure(expr) supports lock pruning, but evaluating expr does not.
- DO and SET statements do not support partitioning lock pruning.

The post What is MySQL partitioning ? appeared first on MySQL Consulting, Support and Remote DBA Services.
Posted over 5 years ago by MySQL Performance Blog
The common wisdom with database performance management is that a “cold” database server has poor performance. Then, as it “warms up”, performance improves until you finally reach a completely warmed-up state with peak database performance. In other words, to get peak performance from MySQL you need to wait for the database to warm up.

This thinking comes from the point of view of database cache warmup. Indeed, from the cache standpoint, you start with an empty cache and over time the cache is filled with data. Moreover, the longer the database runs, the more statistics about data access patterns it has, and the better it can manage the database cache contents.

Over recent years, with the rise of SSDs, cache warmup has become less of an issue. High-performance NVMe storage can do more than 1GB/sec of reads, meaning you can warm up a 100GB database cache in less than two minutes. SSD IO latency also tends to be quite good, so you do not pay as high a penalty for a higher miss rate during the warm-up stage.

It is not all so rosy with database performance over time, though. Databases tend to delay work when possible, but there is only so much delaying you can do, and when the database cannot delay work any longer, performance tends to suffer. Here are some examples of delayed work:

- Checkpointing: depending on the database technology and configuration, checkpointing may be delayed for 30 minutes or more after database start
- The change buffer (InnoDB) can delay index maintenance work
- Pushing messages from buffers to leaves (TokuDB) can be delayed until space in the buffers is exhausted
- Compaction for RocksDB and other LSM-tree based systems can take quite a while to reach steady state

In all these cases, database performance can be a lot better almost immediately after start than when the database is completely “warmed up”.

An experiment with database warm up

Let's illustrate this with a little experiment, running sysbench against MySQL with the InnoDB storage engine for 15 minutes:

sysbench --db-driver=mysql --threads=200 --rand-type=uniform \
  --report-interval=10 --percentile=99 --time=900 \
  --mysql-user=root --mysql-password= \
  /usr/share/sysbench/oltp_update_index.lua --table_size=100000000 run

Let's look in detail at what happens during the run, using graphs from Percona Monitoring and Management. As you can see, the number of updates/sec we are doing actually gets worse (and more uneven) after the first 3 minutes, while the jump to peak performance is almost immediate. The log space usage explains some of this: in the first few minutes we did not need to flush as aggressively as we had to later.

On the InnoDB I/O graph we can see a couple of interesting things. First, you can see how quickly warm up happens: within 2 minutes the IO is already at half of its peak. You can also see the explanation for the little performance dip after the initial high performance (around 19:13); this is where we got close to using all the log space, so active flushing was required while, at the same time, a lot of IO was still needed for cache warmup.

“Reaching steady state” is another term commonly used to describe the stage after warm up completes. Note, though, that such a steady state is not guaranteed to be steady at all. In fact, the most typical steady state is unsteady; for example, you can see in this blog post that both InnoDB and MyRocks show quite a variance.
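If you want to watch both effects from the server itself rather than from PMM graphs, the standard InnoDB status counters are enough. A minimal sketch (these are stock MySQL status variables; sampling them periodically and diffing the values is just one way to do it):

-- Cache warmup: compare logical reads against reads that missed the
-- buffer pool and had to hit disk. The miss ratio falls as the cache warms.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';

-- Delayed work building up: redo log writes keep growing while
-- checkpointing is deferred; watch this counter alongside throughput.
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_written';

A falling miss ratio signals the cache warming up, while throughput dropping as Innodb_os_log_written keeps climbing is the “delayed work coming due” effect described above.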
Summary

While the term “database warm up” may imply that performance after warm up will be better, it is often not the case. “Reaching steady state” is a better term, as long as you understand that “steady” does not mean uniform performance.

The post When Database Warm Up is Not Really UP appeared first on Percona Database Performance Blog.
Posted over 5 years ago by Percona Community
Amazon RDS for MySQL offers the option to automate minor version upgrades using the minor version upgrade policy, a property that lets you decide whether Amazon is allowed to perform the upgrades on your behalf. Usually the goal is not to upgrade every RDS instance automatically, but to keep non-production deployments up to date automatically. This helps you address engine issues as soon as possible and improves the automation of your deployment process.

If you are using the AWS Command Line Interface (CLI) and you have an instance called test-rds01, it is as simple as setting [--auto-minor-version-upgrade | --no-auto-minor-version-upgrade]. For example:

aws rds modify-db-instance --db-instance-identifier test-rds01 \
  --apply-immediately --auto-minor-version-upgrade

And if you use the AWS Management Console, it is just a check box.

All sorted? Unfortunately not. The main problem is that Amazon performs those upgrades only in rare circumstances. From Amazon's documentation:

"Minor version upgrades only occur automatically if a minor upgrade replaces an unsafe version, such as a minor upgrade that contains bug fixes for a previous version. In all other cases, you must modify the DB instance manually to perform a minor version upgrade."

"If the new version fixes any vulnerabilities that were present in the previous version, then the auto minor version upgrade will automatically take place during the next weekly maintenance window on your DB instance. In all other cases, you should manually perform the minor version upgrade."

So in most scenarios the automatic upgrade is unlikely to happen, and the auto-minor-version-upgrade attribute is not the way to keep your MySQL running on RDS updated to the latest available minor version.

How to improve automation of minor version upgrades for Amazon RDS for MySQL

Let's say you want to reduce the time it takes for a newer minor version to reach your development environments, or even your production ones. How can you achieve that on RDS? First of all, you have to consider the delay it takes for a minor version to reach RDS, which can be anything between a few weeks and a few months. And you might not even notice that a new minor version is available, as it is not obvious how to be notified when one is.

What is the best way to be notified of new minor versions available on RDS MySQL? In the past you could (even automatically) monitor the release notes page, but that page is no longer used for RDS. Now you have to monitor the database announcements page, something that is hard to automate.

Any way to speed up minor version upgrades? You can use the AWS CLI, invoking the describe-db-engine-versions API, or write a simple Lambda function to retrieve the latest available minor version and act accordingly: you can, for example, notify your team of DBAs using Amazon Simple Notification Service (SNS), or you can automatically upgrade the instance. Let's first see how to achieve that using the command line:

aws --profile sandbox rds describe-db-engine-versions --engine 'mysql' \
  --engine-version '5.7' --query "DBEngineVersions[-1].EngineVersion"

The [-1] index selects the last element of the array, which is the latest version of the engine available on RDS. Today the result is "5.7.21", and a simple cron job can monitor this and trigger a notification on changes. Note that the same approach can be used to retrieve the latest available minor version for engines running MySQL 5.5 and MySQL 5.6, and for PostgreSQL engines too.
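Whichever mechanism you end up automating, it is worth verifying from inside MySQL that an instance is actually running the minor version you expect. A trivial check (the version string shown is just an example):

-- Run against the RDS endpoint after the upgrade / maintenance window:
SELECT VERSION();
-- e.g. 5.7.21-log; compare it with the latest minor reported by the CLI.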
If you want to automatically and immediately upgrade your instance, the logic can easily be written as a few lines of bash with a cron job on an EC2 instance. For example, the following function requires only the database instance identifier:

rds_minor_upgrade() {
  rds_instance_id=$1
  engine_version="5.7"
  # --output text makes the CLI return plain strings instead of JSON arrays
  rds_current_minor=$(aws rds describe-db-instances --output text \
    --db-instance-identifier "$rds_instance_id" \
    --query "DBInstances[].EngineVersion")
  rds_latest_minor=$(aws rds describe-db-engine-versions --output text \
    --engine 'mysql' --engine-version "$engine_version" \
    --query "DBEngineVersions[-1].EngineVersion")
  if [ "$rds_latest_minor" != "$rds_current_minor" ]; then
    aws rds modify-db-instance --apply-immediately \
      --engine-version "$rds_latest_minor" \
      --db-instance-identifier "$rds_instance_id"
  fi
}

Alternatively, you can write the code as a scheduled Lambda function in your favourite language. For example, using the AWS node.js SDK you can manage RDS and implement the logic above with rds.describeDBEngineVersions and rds.modifyDBInstance:

(...)
rds.describeDBEngineVersions(params, function(err, data) {
  (...)
});
(...)
var params = {
  DBInstanceIdentifier: 'test-rds01',
  ApplyImmediately: true,
  EngineVersion: '',
  (...)
};
rds.modifyDBInstance(params, function(err, data) {
  (...)
});

Speed up your minor upgrades!

To summarize, Amazon Web Services does not offer a real way to automatically upgrade an RDS instance to the latest available minor version in the most common scenarios, but it is very easy to achieve that by taking advantage of the AWS CLI or the many SDKs. The goal is not to upgrade every deployment automatically; you would not normally use this for production deployments. However, being able to monitor the latest available minor version on RDS and apply the change automatically for development and staging deployments can significantly reduce the time it takes to have MySQL up to date on RDS, and makes your upgrade process more automated.

The post How to Automate Minor Version Upgrades for MySQL on RDS appeared first on Percona Community Blog.
Posted over 5 years ago by MySQL Performance Blog
Announcing the opening of the call for papers for the Percona Live Europe Open Source Database Conference 2018 in Frankfurt, Germany. It will be open from now until August 10, 2018. The conference takes place November 5–7. Our theme this year is Connect. Accelerate. Innovate.

As a speaker at Percona Live Europe, you'll have the opportunity to CONNECT with your peers: open source database experts and enthusiasts who share your commitment to improving knowledge and exchanging ideas. ACCELERATE your projects and career by presenting at the premier open source database event, a great way to build your personal and company brands. And influence the evolution of the open source software movement by demonstrating how you INNOVATE!

Community initiatives remain core to the open source ethos, and we are proud of the contribution we make with Percona Live Europe in showcasing thought-leading practices in the open source database world. With a nod to innovation, this year we are introducing a business track for the first time, to benefit business leaders who are exploring the use of open source and are interested in learning more about its costs and benefits.

Speaking Opportunities
The Percona Live Europe Open Source Database Conference 2018 call for papers is open until August 10, 2018. We invite you to submit your speaking proposal for breakout, tutorial or lightning talk sessions. Classes and talks are invited for Foundation (entry level or of general interest to all), Core (intermediate) and Masterclass (advanced) levels. If selected, you will receive a complimentary full conference pass.

- Breakout Session. Broadly cover a technology area using specific examples. Sessions should be either 25 minutes or 50 minutes in length (including Q&A).
- Tutorial Session. Present a technical session that aims for a level between a training class and a conference breakout session. We encourage attendees to bring and use laptops for working on detailed and hands-on presentations. Tutorials will be three or six hours in length (including Q&A).
- Lightning Talk. Give a five-minute presentation focusing on one key point that interests the open source community: technical, lighthearted or entertaining talks on new ideas, a successful project, a cautionary story, a quick tip or demonstration.

Topics and Tracks
We want proposals that cover the many aspects of application development using all open source databases, as well as new and interesting ways to monitor and manage database environments. Did you just embrace open source databases this year? What are the technical and business values of moving to or using open source databases? How did you convince your company to make the move? Was there tangible ROI? Best practices and current trends, including design, application development, performance optimization, HA and clustering, cloud, containers and new technologies: what's holding your focus? Share your case studies, experiences and technical knowledge with an engaged audience of open source peers. In the submission entry you will be asked to indicate which of these tracks your proposal best fits: tutorial; business needs; case studies/use cases; operations; or developer.

A few ideas
The conference committee is looking for proposals that cover the many aspects of using, deploying and managing open source databases, including:
- Open source – Describe the technical and business values of moving to or using open source databases. How did you convince your company to make the move? Was there tangible ROI?
- Security – All of us have experienced security challenges. Whether they are initiated by legislation (GDPR), bugs (Meltdown/Spectre), experience (external attacks) or due diligence (planning for the worst), when do you have "enough" security? Are you finding that security requirements are preventing your ability to be agile?
- Serverless, Cloud or On-Premise – The technology landscape is no longer a simple one, and mixing infrastructures has almost become the norm. Are you designing data architectures for the new landscape, and eager to share your experience? Have microservices become an important part of your plans?
- MySQL – Do you have an opinion on what is new and exciting in MySQL? With the release of MySQL 8.0, are you using the latest features? How and why? Are they helping you solve any business issues, or making deployment of applications and websites easier, faster or more efficient? Did the new release get you to change to MySQL? What do you see as the biggest impact of the MySQL 8.0 release? Do you use MySQL in conjunction with other databases in your environment?
- MongoDB – How has the 3.6 release improved your experience in application development or time-to-market? How are the new features making your database environment better? What is it about MongoDB 4.0 that excites you? What are your experiences with Atlas? Have you moved to it, and has it lived up to its promises? Do you use MongoDB in conjunction with other databases in your environment?
- PostgreSQL – Why do you use PostgreSQL as opposed to other SQL options? Have you done a comparison or benchmark of PostgreSQL vs. other types of databases related to your tasks? Why, and what were the results? How does PostgreSQL help you with application performance or deployment? How do you use PostgreSQL in conjunction with other databases in your environment?
- SQL, NewSQL, NoSQL – It's become a perennial question without an easy answer. How do databases compare, how do you choose the right technology for the job, how do you trade off between features and their benefits when comparing databases? If you have ever tried a hybrid database approach in a single application, how did that work out? How nicely does MongoDB play with MySQL in the real world? Do you have anything to say about using SQL with NoSQL databases?
- High Availability – What choices are you making to ensure high availability? How do you find the balance between redundancy and cost? Are you using hot backups, and if so, what happened when you needed to roll back to them?
- Scalability – When did you recognize you needed to address data scale? Did your data growth take you by surprise, or were you always in control? Did it take a degradation in performance to get your management to sit up and take notice? How do you plan for scale if you can't predict demand?
- What the Future Holds – What do you see as the "next big thing"? What new and exciting features are going to be released? What's in your next release? What new technologies will affect the database landscape? AI? Machine learning? Blockchain databases? Let us know about innovations you see on the way.

How to respond to the call for papers
For information on how to submit your proposal, visit our call for papers page. The conference web pages will be updated throughout the next few weeks, and bios, synopses and slides will be published on those pages after the event.
Sponsorship
If you would like to obtain a sponsor pack for Percona Live Europe Open Source Database Conference 2018, you will find more information, including a prospectus, on our sponsorship page. You are welcome to contact me, Bronwyn Campbell, directly.

The post Percona Live Europe 2018 Call for Papers is Now Open appeared first on Percona Database Performance Blog.