Posted over 4 years ago by Admin
I will create a simple hello example using the Go programming language. Let's create a hello.go file in the Sublime editor and write the code below into it. import – This helps to import a package, much like an include or src; you can access other files' or packages' methods using import. main – The main method is the Go entry point for an application. The […]
The post Golang Hello Example appeared first on GolangLearn.
|
Posted over 4 years ago by Admin
This Go tutorial helps you get and set a Go cache. A cache is used to fetch data faster and improve performance. Go Cache is a great in-memory caching package for Go that can help speed up your application. The cache helps to bypass repeated parsing and minimizes web requests to the server. I am storing […]
The post Golang Cache Example appeared first on GolangLearn.
|
Posted over 4 years ago by Bhuvanesh R
Debezium provides an out-of-the-box CDC solution for various databases. In my last blog post, I published how to configure the Debezium MySQL connector. This is the next part of that post. Once we have deployed Debezium, we need some kind of monitoring to keep track of what's happening in the Debezium connector. Luckily, Debezium has its own metrics that are already integrated with the connectors. We just need to capture them using the JMX exporter agent. Here I have written how to monitor the Debezium MySQL connector with Prometheus and Grafana. The dashboard has only the basic metrics, but you can build your own dashboard for more detailed monitoring.
Reference: List of Debezium monitoring metrics
Install JMX exporter in Kafka Distributed connector:
All the connectors are managed by Kafka Connect (distributed or standalone). In our previous blog, we used the distributed Kafka Connect service, so we are going to modify the distributed service binary file.
Download the JMX exporter.
mkdir /opt/jmx/
cd /opt/jmx/
wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.12.0/jmx_prometheus_javaagent-0.12.0.jar
mv jmx_prometheus_javaagent-0.12.0.jar jmx-exporter.jar
Create config file.
vi /opt/jmx/config.yml
startDelaySeconds: 0
ssl: false
lowercaseOutputName: false
lowercaseOutputLabelNames: false
rules:
- pattern: "kafka.connect<type=connect-worker-metrics>([^:]+):"
  name: "kafka_connect_connect_worker_metrics_$1"
- pattern: "kafka.connect<type=connect-metrics, client-id=([^:]+)><>([^:]+)"
  name: "kafka_connect_connect_metrics_$2"
  labels:
    client: "$1"
- pattern: "debezium.([^:]+)<type=connector-metrics, context=([^,]+), server=([^,]+), key=([^>]+)><>RowsScanned"
  name: "debezium_metrics_RowsScanned"
  labels:
    plugin: "$1"
    name: "$3"
    context: "$2"
    table: "$4"
- pattern: "debezium.([^:]+)<type=connector-metrics, context=([^,]+), server=([^>]+)><>([^:]+)"
  name: "debezium_metrics_$4"
  labels:
    plugin: "$1"
    name: "$3"
    context: "$2"
Add the JMX exporter to the Kafka Connect binary file.
vi /usr/bin/connect-distributed
-- Find this line below export CLASSPATH
exec $(dirname $0)/kafka-run-class $EXTRA_ARGS org.apache.kafka.connect.cli.ConnectDistributed "$@"
-- Replace it with
exec $(dirname $0)/kafka-run-class $EXTRA_ARGS -javaagent:/opt/jmx/jmx-exporter.jar=7071:/opt/jmx/config.yml org.apache.kafka.connect.cli.ConnectDistributed "$@"
Restart the Distributed Connect Service.
systemctl restart confluent-connect-distributed
Verify the JMX Agent installation.
netstat -tulpn | grep 7071
tcp6 0 0 :::7071 :::* LISTEN 2885/java
Get the Debezium metrics.
curl localhost:7071 | grep debezium
debezium_metrics_NumberOfDisconnects{context="binlog",name="mysql-db01",plugin="mysql",} 0.0
You can view these metrics in your browser as well.
http://ip-of-the-connector-vm:7071/metrics
Install Prometheus
I'm using a separate server for Prometheus and Grafana.
Create a user for Prometheus:
sudo useradd --no-create-home --shell /bin/false prometheus
Create Directories for Prometheus:
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
Download the Prometheus binary files:
cd /tmp
wget https://github.com/prometheus/prometheus/releases/download/v2.15.0/prometheus-2.15.0.linux-amd64.tar.gz
tar -zxvf prometheus-2.15.0.linux-amd64.tar.gz
Copy the binary files to respective locations:
cd prometheus-2.15.0.linux-amd64
cp prometheus /usr/local/bin/
cp promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
cp -r consoles /etc/prometheus
cp -r console_libraries /etc/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus/consoles
sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
Create a Prometheus config file:
vi /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
Set permission for config file:
sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
Create a Prometheus systemctl file:
vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries
[Install]
WantedBy=multi-user.target
Start the Prometheus Service:
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
Add Debezium MySQL connector metrics to Prometheus:
vi /etc/prometheus/prometheus.yml
  - job_name: debezium
    scrape_interval: 5s
    static_configs:
      - targets:
          - debezium-node-ip:7071
Restart the Prometheus service:
sudo systemctl restart prometheus
Check the status:
Open the URL below in your browser.
http://IP_of-prometheus-ec2:9090/graph
Install Grafana:
wget https://dl.grafana.com/oss/release/grafana_6.5.2_amd64.deb
sudo dpkg -i grafana_6.5.2_amd64.deb
sudo systemctl daemon-reload
sudo systemctl start grafana-server
It'll start listening on port 3000. The default username and password are admin/admin. You can change them once you've logged in.
http://grafana-server-ip:3000
Add the Debezium MySQL Dashboard:
This dashboard is taken from the official Debezium examples repo, but it was built for MS SQL Server. With some changes and fixes, we can use the same dashboard for MySQL and other databases. I made it into a template.
In Grafana, add the Prometheus data source.
http://grafana-ip:3000/datasources
Click on Add data source and select Prometheus.
Name: Prometheus
URL: localhost:9090 (I have installed Grafana and Prometheus on the same server; if you have a different server for Prometheus, use that IP instead of localhost).
Click on Save & Test.
You'll get a pop-up message that it is connected.
Now go to the dashboards page and import the Template JSON.
http://grafan-ip:3000/dashboards
Click on the Import button. Copy the template JSON file from here, paste it in, or download the JSON file and use the upload button. Now the dashboard is ready, and you can see a few basic metrics.
Contribution:
Debezium is a great platform for anyone who wants to do real-time analytics, but in terms of monitoring I still feel it needs more contributions. This template is just a kickstart; we can build a more detailed monitoring dashboard for the Debezium connectors. Please feel free to contribute to the repo. Pull requests are welcome. Let's make Debezium more powerful.
Debezium Series blogs:
Build Production Grade Debezium Cluster With Confluent Kafka
Monitor Debezium MySQL Connector With Prometheus And Grafana
Debezium MySQL Snapshot From Read Replica With GTID
Debezium MySQL Snapshot From Read Replica And Resume From Master
Debezium MySQL Snapshot For AWS RDS Aurora From Backup Snaphot
RealTime CDC From MySQL Using AWS MSK With Debezium
|
Posted over 4 years ago by MySQL Performance Blog
While we’ve had MySQL Group Replication support in ProxySQL since version 1.3 (native as of v1.4), development has continued in subsequent versions. I’d like to describe a scenario of how latency can affect ProxySQL in a MySQL Group Replication
environment, and outline a few new features that might help mitigate those issues. Before we dive into the specifics of the discussion, however, let’s take a quick overview of ProxySQL and Group Replication for those who may not be familiar.
MySQL Group Replication
Similar in functionality to Percona XtraDB Cluster or Galera, MySQL Group Replication is the only synchronous native HA solution for MySQL*. With built-in automatic distributed recovery, conflict detection, and group membership, MySQL GR provides a completely native HA solution for MySQL environments.
ProxySQL
A high performance, high availability, protocol aware proxy for MySQL. It allows the shaping of database traffic by delaying, caching or rewriting queries on the fly. ProxySQL can also be used to create an environment where failovers will not affect your application, automatically removing (and adding back) database nodes from a cluster based on definable thresholds.
* There is technically one other native HA solution from Oracle – MySQL NDB Cluster. However, it is outside the scope of this article and not for most general use cases.
Test Case
I recently had an interesting case with a client who was having severe issues with latency due to network/storage stalls at the hypervisor level. The environment is fairly standard, with a single MySQL 8.x GR writer node, and two MySQL 8.x GR passive nodes. In front of the database cluster sits ProxySQL, routing traffic to the active writer and handling failover duties should one of the database nodes become unavailable. The latency always occurred in short spikes, ramping up and then falling off quickly (within seconds).
The latency and I/O stalls from the network/hypervisor were throwing ProxySQL a curveball in determining if a node was actually healthy or not, and the client was seeing frequent failovers of the active writer node – often multiple times per day. To dive a bit deeper into this, let’s examine how ProxySQL determines a node’s health at a high level.
PING – mysql-monitor_ping_timeout: issued on an open connection.
SELECT – mysql-monitor_groupreplication_healthcheck_timeout: gets the number of transactions a node is behind and identifies which node is the writer.
CONNECT – mysql-monitor_connect_timeout: tries to open new connections to the host and measures the timing.
In a perfect environment, these checks work as intended, and if a node is not reachable, or has fallen too far behind, ProxySQL is able to determine that and remove the node from the cluster. This is known as a hard_offline in ProxySQL, and means the node is removed from the routing table and all traffic to that node stops. If that node is the writer node, ProxySQL will then tee up one of the passive nodes as the active writer, and the failover is complete.
Many of the ProxySQL health checks have multiple variables to control the timeout behavior. For instance, mysql-monitor_ping_timeout sets the maximum timeout for a MySQL node to be unresponsive to a ping, and mysql-monitor_ping_max_failures sets how many times a MySQL node would have to fail a ping check before ProxySQL decides to mark it hard_offline and pull the node out of the cluster.
This wasn't the case for the Group Replication-specific health checks, however. Prior to version 2.0.7, the options for Group Replication checks were more limited. Note that we did not have the same max_failures setting for Group Replication that we had for standalone MySQL; we only had the timeout check:
mysql-monitor_groupreplication_healthcheck_timeout
Added in version 2.0.7 was a new variable, giving us the ability to retry multiple times before marking a GR node hard_offline:
mysql-monitor_groupreplication_healthcheck_max_timeout_count
By setting this variable it is possible to have the group replication health check fail a configurable number of times before pulling a node out of the cluster. While this is certainly more of a Band-Aid than an actual resolution, it would allow keeping a ProxySQL + GR environment up and running while work is being done to find the root cause of latency and prevent unnecessary flapping between active and passive nodes during short latency spikes and I/O stalls.
Another similar option is currently being implemented in ProxySQL 2.0.9 for the transactions_behind check. See below:
mysql-monitor_groupreplication_max_transactions_behind
Currently, if group replication max_transactions_behind exceeds the threshold once, the node is evicted from the cluster. The upcoming 2.0.9 release adds a variable that defines a count for this check, so that max_transactions_behind would have to fail more than once (x number of times) before eviction.
mysql-monitor_groupreplication_max_transactions_behind_count
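These monitor variables live in ProxySQL's admin interface. As a sketch of how they could be adjusted from an admin session (the count value 3 is an example only, and this assumes the standard admin interface on port 6032):

```sql
-- Connect to the ProxySQL admin interface first, e.g.:
--   mysql -u admin -padmin -h 127.0.0.1 -P 6032
-- Allow the GR health check to time out up to 3 times (example value)
-- before the node is marked hard_offline.
UPDATE global_variables
   SET variable_value = '3'
 WHERE variable_name = 'mysql-monitor_groupreplication_healthcheck_max_timeout_count';
-- Apply to the running configuration and persist it to disk.
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```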
In Summary
To be clear, the above settings will not fix any latency issues present in your environment. However, since latency can often be a hardware or network issue, and in many cases can take time to track down, these options may stabilize the environment by allowing you to relax ProxySQL’s health checks while the root cause investigation for the latency is underway.
|
Posted over 4 years ago by MySQL Performance Blog
In an earlier post, I discussed the Shared Responsibility Model in the cloud and how it relates to databases. With either IaaS or DBaaS deployments, much of the operational and security burden is shifted away from the DBA to the cloud provider. I
also noted that regardless of the deployment method, there is always a need for a DBA. In both cloud deployment models, notice the top user responsibility: customer data.
Let's review the major tasks of a DBA in the cloud and how that role differs between an IaaS and a DBaaS deployment.
DBA Responsibility in the Cloud
Application/Database
With the burden of hardware, OS, and physical security shifted to the cloud provider, the focus moves to the application data and performance. From the application perspective, here are the top areas of focus:
Schema design and review – ensuring optimal data types, indexing, etc.
Performance tuning – queries, system variables
Data archiving
Proactive optimization
While these aspects should always be the responsibility of the DBA, they are often overshadowed by operational tasks. Moving to the cloud allows DBAs to get back to what they should be focused on – the data in the database.
Access
Traditional DBAs were generally tasked with ensuring properly limited access to customer data. In most cases, this is done with some combination of firewalls and user grants. When moving to a cloud deployment, this will remain a responsibility of the DBA.
The cloud provider can provide you with the tools to manage the firewall (i.e. security groups) and within MySQL, you are still required to manage user grants. Note that this is NO DIFFERENT when compared to a traditional on-premise deployment.
Monitoring / Alerting
Finally, proper monitoring and alerting is the responsibility of the DBA. As a best practice, it is advised to capture metrics on everything and alert on as few metrics as needed. This is done to ensure proper trending can be reviewed (capture everything) while not overwhelming the pager with unactionable alerts (minimal alerts).
Some of this metric data is provided by the cloud provider via various monitoring portals (i.e. CloudWatch). However, it is up to the DBA to determine the proper thresholds and alerts. This can only be properly achieved after a thoughtful review of all the collected historical metrics. Once baselines are established, proper alerts can be set up.
I would also note that regardless of the deployment method (IaaS or DBaaS), a tool such as Percona Monitoring and Management (PMM) can be invaluable here.
Cost Control
One of the benefits of the cloud is the elasticity and ease of launching new resources. This can also lead to quite a headache in the finance department. The DBA should always be reviewing the systems to ensure:
You aren’t paying for unused resources
You are properly leveraging Reserved Instances where possible
Systems are tied to the proper teams for billing (i.e. tags, etc)
Understanding your systems and keeping them right-sized is an important role of the DBA. Along with right-sizing your instances, proper capacity planning is also critical in controlling cost.
In a survey of our users, 41% said they had to upgrade 5 times (or more) in the last 2 years, with the cost of the excess moves resulting in a 10x cost increase. Having the time to properly review your data growth patterns is critical when planning for future growth.
DBA Responsibility in IaaS
When considering an IaaS deployment, there are additional tasks that need to be managed by a DBA. Along with managing the customer data and access, here are some additional tasks needed in an IaaS environment:
Managing backups (verification, restoration, retention, etc)
Managing high availability
Patching the guest OS
Installing / updating MySQL
Verifying DR solution
These responsibilities aren’t unique to an IaaS deployment and generally mirror a traditional DBA (minus the hardware support). Automation is key to a successful IaaS deployment and having a DBA that truly understands the data access patterns and performance is critical.
Conclusion
Overall, the need for a DBA doesn’t go away when moving to a cloud environment (even when looking at a DBaaS deployment as noted by AWS). The benefit of the cloud does not lie in eliminating the position of a DBA, but rather allowing the DBA to focus on what is most important to your organization: the data. By removing the operational headaches and burdens from the team, you free up time to ensure the system is running at the optimal level.
Contact Percona today to see how we can help your team if you are moving or considering a move to the cloud. We have experts in both databases and cloud deployments that can advise and help in all phases of migration. How can we help you?
Companies are increasingly embracing database automation and the advantages offered by the cloud. Our new white paper discusses common database scenarios and the true cost of downtime to your business, including the potential losses that companies can incur without a well-configured database and infrastructure setup.
Download “The Hidden Costs of Not Properly Managing Your Databases”
|
Posted over 4 years ago by Frederic Descamps
As you could read in this previous post, PHP 7.4 is now completely supporting MySQL 8.0 and the new default authentication plugin.
I wanted to make a summary table providing an overview of all PHP versions and how they support MySQL 8.0 and the different authentication plugins.
As an RPM-based distribution user, I have been using the famous remi repository for many years, so I also used it to install PHP 7.4.0 and 7.4.1…
I created a new user to test connecting with PHP and then… I was surprised to see that I could not connect to MySQL using caching_sha2_password. Of course I checked whether my credentials were correct using the MySQL client… and I could connect… Then I tried my PHP script again and, new surprise, I could connect!?
I could connect because the password was cached. If I ran FLUSH PRIVILEGES, then the PHP script could not connect anymore.
The error was:
Trying with caching_sha2_password....
PHP Warning: mysqli::__construct(): (HY000/1045): Access denied for
user 'fred_secure'@'mysql-dc1-2' (using password: YES)
I discussed this with my colleagues. They tried with the same PHP version and they could not reproduce my error… but they were using Ubuntu.
What’s wrong ?
So I decided to compile PHP 7.4 from scratch on my CentOS 8 box… and… it worked as expected!
After a lot of debugging, testing many OpenSSL versions and compiling PHP more than 10 times… I was able to find the difference and compile an RPM based on Remi‘s spec file.
The problem was in mysqli.so.
I can’t yet explain why this is a problem, and I have already reported it to my colleagues, but the difference between the Ubuntu packages and my compiled-from-scratch version on one side, and the one installed from Remi’s repo on the other, is the absence of a value for mysqli.default_socket:
mysqli.default_socket => no value => no value
So, I rebuilt Remi’s package removing --with-mysql-sock=%{mysql_sock} \ and it worked!
I will now wait for feedback from the developers to understand the reason and see whether this is a bug. In the meantime, if you already want to use PHP 7.4.1 and MySQL 8.0 on a RedHat-based distribution, you will need a new php74-php-mysqlnd package.
You can download this one for el8 (RedHat, Oracle Linux and CentOS):
php74-php-mysqlnd-7.4.1-2.el8.remi.x86_64.rpm
The package is built in a way that you don’t need to update all PHP 7.4 packages, only the mysqlnd one, like this:
rpm -Uvh php74-php-mysqlnd-7.4.1-2.el8.remi.x86_64.rpm
Verifying… ################# [100%]
Preparing… ################# [100%]
Updating / installing…
1:php74-php-mysqlnd-7.4.1-2.el8.rem################# [ 50%]
Cleaning up / removing…
2:php74-php-mysqlnd-7.4.1-1.el8.rem################# [100%]
I hope this can help you if you faced some authentication issue with PHP 7.4 and MySQL 8.0.
|
Posted
over 4 years
ago
by
Searce Engineering
Credit: AWS
CDC is becoming more popular nowadays. Many organizations that want to build realtime/near-realtime data pipelines and reports are using CDC as the backbone powering their real-time reports. Debezium is an open source product from RedHat and it supports multiple databases (both SQL and NoSQL). Apache Kafka is the core of this.
Managing and scaling Kafka clusters is not easy for everyone. If you are using AWS for your infra, then let AWS manage the cluster: AWS MSK is a managed Kafka service. We are going to configure Debezium with AWS MSK.
Configuration File:
If you have already worked with AWS MSK, then you might be familiar with this configuration file. It is similar to an RDS parameter group, but here you need to upload a file with your parameter names and values. If you are using MSK for the first time, this may be a bit confusing; no worries, I’ll give you the steps to do it. You can change this configuration file even after you have launched the cluster, but it’s good practice to create the configuration file before launching the cluster. Unfortunately, the AWS CLI is the only way to create the configuration file.
1. Create a conf file:
Create a file on your Desktop or somewhere with the following parameters. For Debezium, the auto-create topic parameter is required, so I’ll use only that one for now; you can add more parameters if you want. Copy and paste the below content into a conf file called kafka-custom-conf.
If you didn’t enable the auto topic creation, then you’ll see the following error.
Dec 20 18:32:19 ip-172-31-44-220 connect-distributed[23563]: [2019-12-20 18:32:19,845] WARN [Producer clientId=connector-producer-mysql-connector-db01-0] Error while fetching metadata with correlation id 294 : {mysql-db01=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient:1051)
kafka-custom-conf file:
auto.create.topics.enable = true
zookeeper.connection.timeout.ms = 1000
2. Upload the conf file to AWS
Install AWS CLI on your workstation and run the following command.
NOTE: I’m going to use Kafka version 2.3.1; if you are going to use a different Kafka version, change the --kafka-versions value.
aws kafka create-configuration --name "kafka-231-custom-conf" --description "Example configuration description." --kafka-versions "2.3.1" --server-properties file://C:\Users\rbhuv\Desktop\kafka-custom-conf
Once you run the command, it’ll give you the following output.
{
  "Arn": "arn:aws:kafka:us-east-1:0000111222333:configuration/kafka-231-custom-conf/6061ca2d-10b7-46c6-81c0-7fae1b208452-7",
  "CreationTime": "2019-12-20T18:38:17.103000+00:00",
  "LatestRevision": {
    "CreationTime": "2019-12-20T18:38:17.103000+00:00",
    "Description": "Example configuration description.",
    "Revision": 1
  },
  "Name": "kafka-231-custom-conf"
}
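The ARN and Revision in this response are needed again in step 3. As a rough sketch (the sample payload below is trimmed from the output above), you could pull them out of the CLI’s JSON response and build the configuration-info.json used later:

```python
import json

# Trimmed sample of the `aws kafka create-configuration` response shown above.
cli_output = """
{
  "Arn": "arn:aws:kafka:us-east-1:0000111222333:configuration/kafka-231-custom-conf/6061ca2d-10b7-46c6-81c0-7fae1b208452-7",
  "LatestRevision": {"Revision": 1},
  "Name": "kafka-231-custom-conf"
}
"""

resp = json.loads(cli_output)

# This is exactly the shape configuration-info.json needs in step 3.
configuration_info = {"Arn": resp["Arn"], "Revision": resp["LatestRevision"]["Revision"]}
print(json.dumps(configuration_info, indent=2))
```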
3. (Optional) Update existing cluster with this conf file
If you are going to create a new cluster then ignore this step.
Note: if you forgot to take note of the configuration ARN (from step 2), you can get it from the CLI: aws kafka list-configurations
You also need the cluster’s current version (this is not the Kafka software version; it’s a version AWS assigns to your cluster). Run aws kafka list-clusters; this will give you the current version, like this: “CurrentVersion”: “K2EUQ1WTGCTBG2”
Create a configuration info file called configuration-info.json which contains the ARN of your new conf file.
{
  "Arn": "arn:aws:kafka:us-east-1:0000111222333:configuration/kafka-231-custom-conf/6061ca2d-10b7-46c6-81c0-7fae1b208452-7",
  "Revision": 1
}
Now run the below command to update your Kafka cluster configuration file with the new file.
aws kafka update-cluster-configuration --cluster-arn "arn:aws:kafka:us-east-1:0000111222333:cluster/searce-bigdata/599c6202-ec40-455a-afa8-d7c5916d7bc2-7" --configuration-info file://C:\Users\rbhuv\Desktop\configuration-info.json --current-version "K2EUQ1WTGCTBG2"
This will give you the following output.
{
  "ClusterArn": "arn:aws:kafka:us-east-1:0000111222333:cluster/searce-bigdata/599c6202-ec40-455a-afa8-d7c5916d7bc2-7",
  "ClusterOperationArn": "arn:aws:kafka:us-east-1:0000111222333:cluster-operation/searce-bigdata/599c6202-ec40-455a-afa8-d7c5916d7bc2-7/519396ad-1df2-46aa-8858-ba2c49f06c3c"
}
Launching the AWS MSK Cluster:
The MSK launch console is straightforward and you can select the options as you need. I’m just highlighting a few options where you need to focus.
Under the configuration choose Use a custom configuration supporting Apache Kafka 2.2.1
Then you can see the conf file which you created.
Under encryption, if you are not going to use TLS, select Both TLS encrypted and plaintext traffic allowed. In this blog, I’m not using any TLS connections from the connector to Kafka.
The rest of the options are straightforward, and you can select them per your requirements.
It’ll take 20 to 25 minutes to create the cluster.
Click on the cluster name and it’ll take you to the details page of the cluster. In the top right, select View client information. There you can see the Kafka bootstrap server endpoints and Zookeeper endpoints.
Setup Debezium MySQL Connector on EC2:
Install Java and Confluent Connector binaries:
sudo apt-get update
sudo apt-get install default-jre
wget -qO - https://packages.confluent.io/deb/5.3/archive.key | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/5.3 stable main"
sudo apt-get update && sudo apt-get install confluent-hub-client confluent-common confluent-kafka-2.12
Configure the Distributed connector properties:
Bootstrap Servers — Copy the Plaintext value from the MSK client information.
replication.factor — it should be >1, else you’ll get the following error. (Refer here)
Dec 20 11:42:36 ip-172-31-44-220 connect-distributed[2630]: [2019-12-20 11:42:36,290] WARN [Producer clientId=producer-3] Got error produce response with correlation id 844 on topic-partition connect-configs-0, retrying (2147482809 attempts left). Error: NOT_ENOUGH_REPLICAS (org.apache.kafka.clients.producer.internals.Sender:637)
/etc/kafka/connect-distributed.properties file:
vi /etc/kafka/connect-distributed.properties
bootstrap.servers=b-1.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-3.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-2.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092
group.id=debezium-cluster
offset.storage.replication.factor=2
config.storage.replication.factor=2
status.storage.replication.factor=2
plugin.path=/usr/share/java,/usr/share/confluent-hub-components
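As a quick sanity check against the NOT_ENOUGH_REPLICAS warning mentioned earlier, the three storage replication factors should not exceed the broker count listed in bootstrap.servers. A minimal sketch, assuming the simple key=value format above (the broker hostnames here are placeholders):

```python
# Broker hostnames below are placeholders; parsing assumes plain key=value lines.
conf = """\
bootstrap.servers=b-1.example.com:9092,b-3.example.com:9092,b-2.example.com:9092
group.id=debezium-cluster
offset.storage.replication.factor=2
config.storage.replication.factor=2
status.storage.replication.factor=2
"""

props = dict(line.split("=", 1) for line in conf.strip().splitlines())
brokers = props["bootstrap.servers"].split(",")

for storage in ("offset", "config", "status"):
    rf = int(props[f"{storage}.storage.replication.factor"])
    # A replication factor above the broker count leads to NOT_ENOUGH_REPLICAS.
    assert rf <= len(brokers), f"{storage}: factor {rf} exceeds {len(brokers)} brokers"

print(f"replication factors OK for {len(brokers)} brokers")
```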
Install Debezium MySQL connector and S3 connector:
confluent-hub install debezium/debezium-connector-mysql:latest
confluent-hub install confluentinc/kafka-connect-s3:latest
Start the connector service:
You can run the Confluent connector application via systemctl.
vi /lib/systemd/system/confluent-connect-distributed.service
[Unit]
Description=Apache Kafka - connect-distributed
Documentation=http://docs.confluent.io/
After=network.target

[Service]
Type=simple
User=cp-kafka
Group=confluent
ExecStart=/usr/bin/connect-distributed /etc/kafka/connect-distributed.properties
TimeoutStopSec=180
Restart=no

[Install]
WantedBy=multi-user.target
Start the service
systemctl enable confluent-connect-distributed
systemctl start confluent-connect-distributed
Configure MySQL Connector:
Note: before configuring the MySQL connector, make sure you have enabled binlog and that the MySQL port is accessible from the Debezium EC2 instance. You also need a MySQL user with the respective permissions (refer to the Debezium docs).
Create a file mysql.json (this is my example conf file; you can refer to the Debezium docs for the meaning of these parameters).
Note: from line number 14 onwards I have added some filters to bring only the new data from MySQL to my consumer app (by default, Debezium adds some metadata along with the MySQL data, but I don’t want that). Also make sure the bootstrap servers and MySQL credentials are correct.
{
  "name": "mysql-connector-db01",
  "config": {
    "name": "mysql-connector-db01",
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.server.id": "1",
    "tasks.max": "3",
    "database.history.kafka.bootstrap.servers": "b-1.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-3.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-2.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092",
    "database.history.kafka.topic": "schema-changes.mysql",
    "database.server.name": "mysql-db01",
    "database.hostname": "172.31.84.129",
    "database.port": "3306",
    "database.user": "bhuvi",
    "database.password": "your_strong_pass",
    "database.whitelist": "bhuvi,new,test",
    "internal.key.converter.schemas.enable": "false",
    "transforms.unwrap.add.source.fields": "ts_ms",
    "key.converter.schemas.enable": "false",
    "internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "internal.value.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "false",
    "internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "transforms": "unwrap",
    "transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState"
  }
}
“database.history.kafka.bootstrap.servers” — Kafka Servers IP.
“database.whitelist” — List of databases to get the CDC.
key.converter, value.converter and the transforms parameters — by default, the Debezium output has much more detailed information, but I don’t want all of it; I’m only interested in getting the new row and the timestamp when it is inserted.
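To see roughly what the unwrap transform produces, here is a simplified stand-in (not Debezium’s actual implementation): ExtractNewRecordState keeps only the after image of each change event, and transforms.unwrap.add.source.fields copies the source’s ts_ms into __ts_ms:

```python
# A raw Debezium change event is an envelope (trimmed for brevity).
event = {
    "before": None,
    "after": {"id": 1, "fn": "rohit", "ln": "ayare", "phone": 87611},
    "source": {"ts_ms": 0, "db": "test", "table": "rohi"},
    "op": "c",
}

def unwrap(envelope, add_source_fields=("ts_ms",)):
    """Simplified stand-in for io.debezium.transforms.ExtractNewRecordState."""
    record = dict(envelope["after"])       # keep only the new row image
    for field in add_source_fields:        # mirror transforms.unwrap.add.source.fields
        record[f"__{field}"] = envelope["source"][field]
    return record

print(unwrap(event))
# → {'id': 1, 'fn': 'rohit', 'ln': 'ayare', 'phone': 87611, '__ts_ms': 0}
```

This matches the flattened records shown in the consumer output later in the post.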
If you don’t want to customize anything, then just remove everything after database.whitelist.
Register the MySQL Connector:
curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" http://localhost:8083/connectors -d @mysql.json
Check the status:
curl GET localhost:8083/connectors/mysql-connector-db01/status
{
  "name": "mysql-connector-db01",
  "connector": {
    "state": "RUNNING",
    "worker_id": "172.31.44.151:8083"
  },
  "tasks": [
    { "id": 0, "state": "RUNNING", "worker_id": "172.31.44.151:8083" }
  ],
  "type": "source"
}
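If you want to script this health check, the status payload can be parsed to confirm that the connector and all of its tasks are RUNNING. A minimal sketch using the example payload above:

```python
import json

# Example payload from GET /connectors/mysql-connector-db01/status.
status = json.loads("""
{
  "name": "mysql-connector-db01",
  "connector": {"state": "RUNNING", "worker_id": "172.31.44.151:8083"},
  "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "172.31.44.151:8083"}],
  "type": "source"
}
""")

# Healthy only if the connector AND every task report RUNNING.
healthy = (status["connector"]["state"] == "RUNNING"
           and all(t["state"] == "RUNNING" for t in status["tasks"]))
print("healthy" if healthy else "needs attention")
```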
Test the MySQL Consumer:
Now insert something into any table in the bhuvi, new, or test databases (because we have whitelisted only these databases for capturing CDC).
use test; create table rohi (id int, fn varchar(10), ln varchar(10), phone int );
insert into rohi values (2, 'rohit', 'ayare','87611');
We can get these values from the Kafka brokers. Listen to the below topic:
mysql-db01.test.rohi — this is the combination of servername.databasename.tablename. The server name is what you specified as the server name in the mysql.json file.
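The topic naming convention can be sketched as a one-liner (the helper function is mine, just to make the convention explicit):

```python
def debezium_topic(server_name, database, table):
    """Debezium's per-table topic naming: servername.databasename.tablename."""
    return f"{server_name}.{database}.{table}"

# "mysql-db01" is the database.server.name from mysql.json.
print(debezium_topic("mysql-db01", "test", "rohi"))  # → mysql-db01.test.rohi
```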
kafka-console-consumer --bootstrap-server b-1.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-3.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092,b-2.searce-bigdata.XXXXXXXXX.kafka.us-east-1.amazonaws.com:9092 --topic mysql-db01.test.rohi --from-beginning
{"id":1,"fn":"rohit","ln":"ayare","phone":87611,"__ts_ms":0}
{"id":1,"fn":"rohit","ln":"ayare","phone":87611,"__ts_ms":0}
{"id":1,"fn":"rohit","ln":"ayare","phone":87611,"__ts_ms":0}
{"id":1,"fn":"rohit","ln":"ayare","phone":87611,"__ts_ms":0}
It’ll start copying the historical data and start capturing real-time CDC.
Setup S3 Sink connector in All Producer Nodes:
I want to send this data to an S3 bucket. Make sure the Debezium VM has an IAM role attached with write access to the target S3 bucket. Alternatively, install awscli and configure an access key and secret key (but that’s not recommended).
Create s3.json file.
{
  "name": "s3-sink-db01",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "s3.bucket.name": "searce-00000",
    "name": "s3-sink-db01",
    "tasks.max": "3",
    "s3.region": "us-east-1",
    "s3.part.size": "5242880",
    "s3.compression.type": "gzip",
    "timezone": "UTC",
    "locale": "en",
    "flush.size": "10000",
    "rotate.interval.ms": "3600000",
    "topics.regex": "mysql-db01.(.*)",
    "internal.key.converter.schemas.enable": "false",
    "key.converter.schemas.enable": "false",
    "internal.key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "internal.value.converter.schemas.enable": "false",
    "value.converter.schemas.enable": "false",
    "internal.value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "partitioner.class": "io.confluent.connect.storage.partitioner.HourlyPartitioner",
    "path.format": "YYYY/MM/dd/HH",
    "partition.duration.ms": "3600000",
    "rotate.schedule.interval.ms": "3600000"
  }
}
"topics.regex": "mysql-db01.(.*)" — it’ll send data only from topics whose names start with mysql-db01. In our case, all the MySQL-related topics start with this prefix.
"flush.size" — the data is uploaded to S3 only after this many records have been buffered, or after the "rotate.schedule.interval.ms" duration has elapsed.
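The two settings combine roughly like this (a simplified sketch of the flush decision, not Confluent’s actual implementation):

```python
def should_flush(buffered_records, ms_since_last_flush,
                 flush_size=10_000, rotate_schedule_interval_ms=3_600_000):
    """Flush when either the record count or the wall-clock interval is reached."""
    return (buffered_records >= flush_size
            or ms_since_last_flush >= rotate_schedule_interval_ms)

print(should_flush(10_000, 5_000))   # → True  (record count reached)
print(should_flush(42, 3_600_000))   # → True  (hourly rotation reached)
print(should_flush(42, 5_000))       # → False (keep buffering)
```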
Register this S3 sink connector:
curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" http://localhost:8083/connectors -d @s3.json
Check the Status:
curl GET localhost:8083/connectors/s3-sink-db01/status | jq
{
  "name": "s3-sink-db01",
  "connector": {
    "state": "RUNNING",
    "worker_id": "172.31.44.151:8083"
  },
  "tasks": [
    { "id": 0, "state": "RUNNING", "worker_id": "172.31.44.151:8083" },
    { "id": 1, "state": "RUNNING", "worker_id": "172.31.44.151:8083" },
    { "id": 2, "state": "RUNNING", "worker_id": "172.31.44.151:8083" }
  ],
  "type": "sink"
}
Test the S3 sync:
Insert 10,000 rows into the rohi table, then check the S3 bucket. It’ll save the data in JSON format with GZIP compression, in hourly partitions.
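With HourlyPartitioner and path.format YYYY/MM/dd/HH, object keys are laid out per hour. A sketch of the resulting key prefix, assuming the sink’s default topics.dir of topics (the strftime pattern below is the equivalent of the Joda-style path.format in s3.json):

```python
from datetime import datetime, timezone

def s3_prefix(topic, ts, path_format="%Y/%m/%d/%H"):
    """Approximates topics/<topic>/<YYYY/MM/dd/HH>/ as laid out by the S3 sink."""
    return f"topics/{topic}/{ts.strftime(path_format)}/"

ts = datetime(2019, 12, 20, 18, 0, tzinfo=timezone.utc)
print(s3_prefix("mysql-db01.test.rohi", ts))
# → topics/mysql-db01.test.rohi/2019/12/20/18/
```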
Monitoring the Debezium MySQL Connector:
Monitor Debezium MySQL Connector With Prometheus And Grafana
Conclusion:
The MySQL and S3 config files here are just examples for reference. If you want more customization or need help understanding a parameter, please refer to the Debezium documentation. Also, in this example blog I’m doing the S3 upload as a micro-batch (every 1 hr or 10,000 rows added/modified); if you want it closer to real-time, modify the config file accordingly.
If you want to do the same setup with Kafka in EC2 instead of AWS MSK please refer to the following link.
Build Production Grade Debezium Cluster With Confluent Kafka
RealTime CDC From MySQL Using AWS MSK With Debezium was originally published in Searce Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.
|