News
Posted 1 day ago by MySQL Performance Blog
Percona announces the GA release of Percona Server for MySQL 5.7.21-20 on February 19, 2018. Download the latest version from the Percona web site or the Percona Software Repositories. You can also run Docker containers from the images in the Docker Hub repository. Based on MySQL 5.7.21, including all the bug fixes in it, Percona Server for MySQL 5.7.21-20 is the current GA release in the Percona Server for MySQL 5.7 series. Percona provides completely open-source and free software.

New Features:
- A new string variable, version_suffix, allows changing the suffix of the Percona Server version string returned by the read-only version variable. version_comment has also been converted from a global read-only to a global read-write variable.
- A new keyring_vault_timeout variable allows setting the Vault server connection timeout in seconds. Bug fixed #298.

Bugs Fixed:
- The mysqld startup script was unable to detect the jemalloc library location for preloading, which prevented starting Percona Server on systemd-based machines. Bugs fixed #3784 and #3791.
- Fulltext search could find a word with punctuation marks in natural language mode only, but not in boolean mode. Bugs fixed #258 and #2501 (upstream #86164).
- Build errors were present on FreeBSD (caused by fixing bug #255 in Percona Server 5.6.38-83.0) and on macOS (caused by fixing bug #264 in Percona Server 5.7.20-19). Bugs fixed #2284 and #2286.
- A number of fixes were introduced to remove GCC 7 compilation warnings from the Percona Server build. Bugs fixed #3780 (upstream #89420, #89421, and #89422).
- A CMake error occurred when compiling with the bundled zlib. Bug fixed #302.
- A GCC 7 warning fix introduced a regression in Percona Server that led to a wrong SQL query being built to access the remote server when the Federated storage engine was used. Bug fixed #1134.
- It was possible to enable encrypt_binlog with no binary or relay logging enabled. Bug fixed #287.
- Long buffer wait times were occurring on busy servers during the IMPORT TABLESPACE command. Bug fixed #276.
- Server queries that contained JSON special characters and were logged by the Audit Log Plugin in JSON format produced invalid output due to a lack of escaping. Bug fixed #1115.
- Percona Server now uses Travis CI for additional tests. Bug fixed #3777.

Other bugs fixed: #257, #264, #1090 (upstream #78048), #1109, #1127, #2204, #2414, #2415, #3767, #3794, and #3804 (upstream #89598).

This release also contains fixes for the following CVE issues: CVE-2018-2565, CVE-2018-2573, CVE-2018-2576, CVE-2018-2583, CVE-2018-2586, CVE-2018-2590, CVE-2018-2612, CVE-2018-2600, CVE-2018-2622, CVE-2018-2640, CVE-2018-2645, CVE-2018-2646, CVE-2018-2647, CVE-2018-2665, CVE-2018-2667, CVE-2018-2668, CVE-2018-2696, CVE-2018-2703, CVE-2017-3737.

MyRocks Changes:
- A new behavior makes Percona Server fail to restart on detected data corruption; the rocksdb_allow_to_start_after_corruption variable can be passed to mysqld as a command-line parameter to switch off this restart failure.
- A new CMake option, ALLOW_NO_SSE42, was introduced to allow building MyRocks on hosts that do not support the SSE 4.2 instruction set, which makes MyRocks usable without FastCRC32-capable hardware. Bug fixed MYR-207.
- The rocksdb_bytes_per_sync and rocksdb_wal_bytes_per_sync variables are now dynamic.
- The rocksdb_flush_memtable_on_analyze variable has been removed.
- rocksdb_concurrent_prepare is now deprecated, as it has been renamed upstream to rocksdb_two_write_queues.
- The rocksdb_row_lock_deadlocks and rocksdb_row_lock_wait_timeouts global status counters were added to track the number of deadlocks and the number of row lock wait timeouts.
- Creating a table with a string column indexed with a non-binary collation now generates a warning about using an inefficient collation instead of an error. Bug fixed MYR-223.
TokuDB Changes:
- A memory leak caused by not destroying PFS key objects on shutdown was fixed in the PerconaFT library. Bug fixed TDB-98.
- A clang-format configuration was added to PerconaFT and TokuDB. Bug fixed TDB-104.
- A data race was fixed in the minicron utility of PerconaFT. Bug fixed TDB-107.
- Row count and cardinality could decrease to zero after a long-running REPLACE load.

Other bugs fixed: TDB-48, TDB-78, TDB-93, and TDB-99.

The release notes for Percona Server for MySQL 5.7.21-20 are available in the online documentation. Please report any bugs on the project bug tracking system.
Posted 2 days ago by Vicki Boykis
"Young Naturalists"Sergiy Grigoriev, 1948 pic.twitter.com/8KdwFS6PjU— SovietArtBot (@SovietArtBot) February 15, 2018

TLDR: I built a Twitter bot that tweets paintings from the WikiArt socialist realism category every 6 hours using Python and AWS Lambdas. The post outlines why I decided to do that, the architecture decisions I made, technical details on how the bot works, and my next steps for the bot. Follow @SovietArtBot here. Check out its website and code here.

Table of Contents

- Why build an art bot?
- Technical Goals
- Personal Goals
- Why Socialist Realism
- Breaking a Project into Chunks
- Requirements and Design: High-Level Bot Architecture
- Development: Pulling Paintings from WikiArt
- Development: Processing Paintings and Metadata Locally
- Development: Using S3 and Lambdas
- Development: Scheduling the Lambda
- Deployment: Bot Tweets!
- Where to Next?
- Testing and Maintenance
- Conclusion

Why build an art bot?

Often when you’re starting out as a data scientist or developer, people will give you the well-intentioned advice of “just picking a project and doing it” as a way of learning the skills you need. That advice can be hard and vague, particularly when you don’t have a lot of experience to draw from to figure out what’s even feasible given how much you know, and how that whole process should work. By writing out my process in detail, I’m hoping it helps more people understand:

1) The steps of a software project from beginning to end.
2) The process of putting out a minimum viable project that’s “good enough” and iterating over your existing code to add features.
3) Picking a project that you’re going to enjoy working on.
4) The joy of socialist realism art.

Technical Goals

I’ve been doing more software development as part of my data science workflows lately, and I’ve found that:

1) I really enjoy doing both the analytical and development pieces of a data science project.
2) The more development skills a data scientist is familiar with, the more valuable they are, because it ultimately means they can prototype production workflows and push their models into production more quickly, without having to wait for a data engineer.

A goal I’ve had recently is to take a full software development project from end to end, focusing on understanding modern production best practices, particularly in the cloud.

Personal Goals

But a project that’s just about “cloud architecture delivery” is really boring. In fact, I fell asleep just reading that last sentence. When I do a project, it has to have an interesting, concrete goal. To that end, I’ve been extremely interested in Twitter as a development platform. I wrote recently that one of the most important ways we can fix the internet is to get off Twitter. Easier said than done, because Twitter is still one of my favorite places on the internet. It’s where I get most of my news, where I find out about new blog posts, engage in discussions about data science, and a place where I’ve made a lot of friends that I’ve met in real life.

But Twitter is extremely noisy, lately to the point of being toxic. There are systemic ways that Twitter can take care of this problem, but I decided to try to tackle it on my own by starting #devart, a hashtag where people post classical works of art with their own tech-related captions to break up stressful content. There’s something extremely cathartic about being able to state a problem in technology well enough to ascribe a visual metaphor to it, then sharing it with other people who also appreciate that visual metaphor and find it funny and relatable.

"Waiting for the build to finish"Albert Anker, 1867 #devart pic.twitter.com/3tAO3idZq1— Vicki Boykis (@vboykis) December 28, 2017

Debugging a memory leak.
(Vincent van Gogh, 1890) #devart pic.twitter.com/A5eYvK1Zmq— Miria Grunick (@MiriaGrunick) December 9, 2017

"Migration plan from MySQL to Mongo"Alex Colville, 1954 #devart pic.twitter.com/ieCzv7Hfh8— Vicki Boykis (@vboykis) December 13, 2017

"Another day, another AGILE Sprint"~Peter Blume, circa 1944-8 #devart pic.twitter.com/GaiueyZniS— Trevor Grant (@rawkintrevo) December 19, 2017

"Machine learning engineers, data scientists, data analysts, research engineers, and statisticians hold forth on what each of their actual titles means in terms of responsibilities."Raphael, 1510 #devart pic.twitter.com/7uX0ompuAo— Vicki Boykis (@vboykis) January 14, 2018

another Friday night deploy #devart pic.twitter.com/8dSgMbZqwj— Dmitri Sotnikov ⚛ (@yogthos) January 11, 2018

VC runs into the founder of a blockchain AI startup at SoulCycle Palo Alto, 2017 #devart https://t.co/Kgnj4IcOlK— Alex Companioni (@achompas) February 9, 2018

C# Developer Contemplates Switching to Node.js - Laszlo Mednyanszky, 1898 #devart pic.twitter.com/FwyQxmXMt5— PoliticalMath (@politicalmath) November 27, 2017

And sometimes you just want to break up the angry monotony of text with art that moves you. Turns out I’m not the only one.

If you don't follow @vboykis, she does a wonderful #devart series where she tweets out art w/ hilarious tech-y titles. Chris would LOVE it.— PoliticalMath (@politicalmath) November 27, 2017

Vicki’s #devart project is my new favorite twitter thing. https://t.co/3MwTNNKJeB— Rian van der Merwe (@RianVDM) February 9, 2018

As I posted more #devart, I realized that I enjoyed looking at the source art almost as much as figuring out a caption, and that I enjoyed accounts like Archillect, Rabih Almeddine’s, and Soviet Visuals, which all tweet a lot of beautiful visual content with at least some level of explanation. I decided I wanted to build a bot that tweets out paintings. In particular, I was interested in socialist realism artworks.
Why Socialist Realism

Socialist realism is an art form that developed after the Russian Revolution. As the Russian monarchy fell, social boundaries dissolved, and people began experimenting with all kinds of new art forms, including futurism and abstractionism. I’ve previously written about this shift here. As the Bolsheviks consolidated power, they established Narkompros, a body to control the education and cultural values of what they deemed acceptable under the new regime, and the government laid out the new criteria for what was acceptable Soviet art.

Socialist realism as a genre had four explicit criteria, developed by the highest government officials, including Stalin himself. It was to be:

+ Proletarian: art relevant to the workers and understandable to them.
+ Typical: scenes of everyday life of the people.
+ Realistic: in the representational sense.
+ Partisan: supportive of the aims of the State and the Party.

In looking at socialist realism art, it’s obvious that the underlying goal is to promote communism. But just because the works are blatant propaganda doesn’t discount what I love about the genre, which is that it is indeed representative of what real people do in real life.

"PagerDuty"Vladimir Kutilin #devart pic.twitter.com/onfI7bOxJL— Vicki Boykis (@vboykis) February 14, 2018

These are people working, sleeping, laughing, frowning, arguing, and showing real emotion we don’t often see in art. They are relatable and humane, and reflect our humanity back to us. What I also strongly love about this genre of art is that women are depicted doing things other than sitting still to meet the artist’s gaze.

"Young, idealistic data scientists harvesting their first models for pickling"Tetyana Yablonska, 1966 pic.twitter.com/iSlWhTEeED— Vicki Boykis (@vboykis) October 6, 2017

So, what I decided is that I’d make a Twitter bot that tweets out one socialist realism work every couple of hours.
Here’s the final result:

"Young Naturalists"Sergiy Grigoriev, 1948 pic.twitter.com/8KdwFS6PjU— SovietArtBot (@SovietArtBot) February 15, 2018

There are several steps in traditional software development:

- Requirements
- Design
- Development
- Testing
- Deployment
- Maintenance

Breaking a Project into Chunks

This is a LOT to take in. When I first started, I made a list of everything that needed to be done: setting up AWS credentials, roles, and permissions, version control, writing the actual code, learning how to download images with requests, how to make the bot tweet on a schedule, and more. When you look at it from the top down, it’s overwhelming. But in “Bird by Bird,” one of my absolute favorite books about the writing process (but really about any creative process), Anne Lamott writes:

Thirty years ago my older brother, who was ten years old at the time, was trying to get a report on birds written that he’d had three months to write, which was due the next day. We were out at our family cabin in Bolinas, and he was at the kitchen table close to tears, surrounded by binder paper and pencils and unopened books on birds, immobilized by the hugeness of the task ahead. Then my father sat down beside him, put his arm around my brother’s shoulder, and said, “Bird by bird, buddy. Just take it bird by bird.”

And that’s how I view software development, too. One thing at a time, until you finish that, and then move on to the next piece. So, with that in mind, I decided I’d take the steps above from the traditional waterfall approach and mix them with the agile concept of making a lot of small, quick cycles of those steps to get closer to the end result.

Requirements and Design: High-Level Bot Architecture

I started building the app by working backwards from my requirements: a bot on Twitter, pulling painting images and metadata from some kind of database, on a timed schedule, either cron or something similar. This helped me figure out the design.
Since I would be posting to Twitter as my last step, it made sense to have the data already some place in the cloud. I also knew I’d eventually want to incorporate AWS because I didn’t want the code and data to be dependent on my local machine being on. I knew that I’d also need version control and continuous integration to make sure the bot was stable both on my local machine as I was developing it and on AWS as I pushed my code through, and so I didn’t have to manually put the code in the AWS console. Finally, I knew I’d be using Python, because I like Python, and also because it has good hooks into Twitter through the Twython API (thanks to Timo for pointing me to Twython over Tweepy, which is deprecated) and into AWS through the Boto library.

I’d start by getting the paintings and metadata about the paintings from a website that had a lot of good socialist realism paintings not bound by copyright. Then, I’d do something to those paintings to get the title, the painter, and the year so I could tweet all of that out. Then, I’d do the rest of the work in AWS. So my high-level flow went something like this:

Eventually, I’d refactor out the dependency on my local machine entirely and push everything to S3, but I didn’t want to spend any money in AWS before I figured out what kind of metadata the JSON returned. Beyond that, I didn’t have a specific idea of the tools I’d need, and made design and architecture choices as my intermediate goals became clearer to me.

Development: Pulling Paintings from WikiArt

Now, the development work began. WikiArt has an amazing, well-catalogued collection of artworks in every genre you can think of. It’s so well done that some researchers use the catalog for their papers on deep learning, as well. Some days, I go just to browse what’s new and get lost in some art. (Please donate to them if you enjoy them.)
WikiArt also has two aspects that were important to the project:

1) It has an explicit category for socialist realism art with a good number of works: around 500, which was not a large amount (if I wanted to tweet more than one image a day), but good enough to start with.
2) Every work has an image, title, artist, and year, which would be important for properly crediting it on Twitter.

My first step was to see if there was a way to access the site through an API, the most common way to pull any kind of content from websites programmatically these days. The problem with WikiArt is that it technically doesn’t have a readily-available public API, so people have resorted to really creative ways of scraping the site. But I really, really didn’t want to scrape, especially because the site has infinite-scroll Javascript elements, which are annoying to pick up in BeautifulSoup, the tool most people use for scraping in Python.

So I did some sleuthing, and found that WikiArt does have an API, even if it’s not official and, at this point, somewhat out of date. It had some important information on API rate limits, which tell us how often you can access the API without the site getting angry and kicking you out:

API calls: 10 requests per 2.5 seconds
Images downloading: 20 requests per second

and, even more importantly, on how to access a specific category through JSON-based query parameters. The documentation they had, though, was mostly at the artist level, so I had to do some trial and error to figure out the correct link I wanted, which was:

https://www.wikiart.org/en/paintings-by-style/socialist-realism?json=2&page=1

And with that, I was ready to pull the data.
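The rate limits and query parameters above translate into a small paging helper. This is a sketch under my own naming (`page_url` and `polite_delay` are not from the post; the endpoint and limits are the ones quoted above):

```python
WIKIART_STYLE_URL = "https://www.wikiart.org/en/paintings-by-style/socialist-realism"

def page_url(page):
    """Build the JSON query URL for a given results page."""
    return "%s?json=2&page=%d" % (WIKIART_STYLE_URL, page)

def polite_delay(requests_per_window=10, window_seconds=2.5):
    """Seconds to sleep between API calls to stay under 10 requests per 2.5 seconds."""
    return window_seconds / requests_per_window
```

Sleeping `polite_delay()` seconds (0.25s) between calls keeps a paging loop comfortably under the documented limit.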
I started by using the Python [Requests library](http://docs.python-requests.org/en/master/) to connect to the site and pull two things:

1) A JSON file that has all the metadata
2) All of the actual paintings as `png/jpg/jpeg` files

Development: Processing Paintings and Metadata Locally

The JSON I got back looked like this:

```json
{
  "ArtistsHtml": null,
  "CanLoadMoreArtists": false,
  "Paintings": [],
  "Artists": null,
  "AllArtistsCount": 0,
  "PaintingsHtml": null,
  "PaintingsHtmlBeta": null,
  "AllPaintingsCount": 512,
  "PageSize": 60,
  "TimeLog": null
}
```

Within the Paintings array, each painting looked like this:

```json
{
  "id": "577271cfedc2cb3880c2de61",
  "title": "Winter in Kursk",
  "year": "1916",
  "width": 634,
  "height": 750,
  "artistName": "Aleksandr Deyneka",
  "image": "https://use2-uploads8.wikiart.org/images/aleksandr-deyneka/winter-in-kursk-1916.jpg",
  "map": "0123**67*",
  "paintingUrl": "/en/aleksandr-deyneka/winter-in-kursk-1916",
  "artistUrl": "/en/aleksandr-deyneka",
  "albums": null,
  "flags": 2,
  "images": null
}
```

I also downloaded all the image files by returning `response.raw` and using the `shutil.copyfileobj` method. I decided not to do any more processing locally, since my goal was to eventually move everything to the cloud anyway, but I now had the files available to me for testing, so that I didn’t need to hit WikiArt and overload the website anymore.

I then uploaded both the JSON and the image files to the same S3 bucket with the boto client, which lets you write to S3 from Python:

```python
def upload_images_to_s3(directory):
    """
    Upload images to S3 bucket if they end with png or jpg
    :param directory: pathlib.Path of the local image directory
    :return: null
    """
    for f in directory.iterdir():
        if str(f).endswith(('.png', '.jpg', '.jpeg')):
            full_file_path = str(f.parent) + "/" + str(f.name)
            file_name = str(f.name)
            s3_client.upload_file(full_file_path, settings.BASE_BUCKET, file_name)
            print(f, "put")
```

As an aside, the `.iterdir()` method here is from the pretty great pathlib library, new to Python 3, which handles file operations better than os.
Check out more about it here.

Development: Using S3 and Lambdas

Now that I had my files in S3, I needed some way for Twitter to read them. To do that at a regular time interval, I decided on using an AWS Lambda function (not to be confused with Python lambda functions, a completely different animal). Because I was already familiar with Lambdas and their capabilities (see my previous post on AWS), they were a tool I could use without a lot of ramp-up time (a key component of architectural decisions).

Lambdas are snippets of code that you can run without needing to know anything about the machine that runs them. They’re triggered by other events firing in the AWS ecosystem. Or, they can be run on a cron-like schedule, which was perfect for what I wanted to do, since I needed the bot to post at an interval.

Lambdas look like this in Python:

```python
def handler_name(event, context):
    ...
    return some_value
```

The event is what you decide to do to trigger the function, and the context sets up all the runtime information needed to interact with AWS and run the function.

Because I wanted my bot to tweet both the artwork and some context around it, I’d need a way to match each picture with its metadata. To do this, I’d create key-value pairs, a common programming data model, where the key was the filename part of the image attribute, and the value was the title, year, and artistName, like this:

So, all in all, I wanted my lambda function to do several things. All of the code I wrote for that section is here.
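The filename-to-metadata matching described above can be sketched as a plain dictionary build (a minimal version of my own; the function name is mine, and the field names come from the WikiArt JSON shown earlier):

```python
def index_metadata(paintings):
    """Map image filename -> (artistName, title, year) so a tweet can credit the work."""
    indexed = {}
    for p in paintings:
        # the key is the filename portion of the image URL
        filename = p["image"].rsplit("/", 1)[-1]
        indexed[filename] = (p["artistName"], p["title"], p["year"])
    return indexed
```

For the Deyneka example above, this yields the key `winter-in-kursk-1916.jpg` pointing at its artist, title, and year.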
1) Open the S3 bucket object and inspect the contents of the metadata file.

Opening an S3 bucket within a lambda usually looks something like this:

```python
def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, download_path)
```

where the event is the JSON file that gets passed in from Lambda to signify that a trigger has occurred. Since our trigger is a timed event, the JSON file doesn’t have any information about a specific event and bucket, so we can ignore the event and write a function that simply opens a given bucket and key:

```python
try:
    data = s3.get_object(Bucket=bucket_name, Key=metadata)
    json_data = json.loads(data['Body'].read().decode('utf-8'))
except Exception as e:
    print(e)
    raise e
```

2) Pull out the metadata and put it into a dictionary with the filename as the key and the metadata as the value. We can pull it into a defaultdict (dictionaries preserve insertion order as of Python 3.6, but we’re still playing it safe here):

```python
indexed_json = defaultdict()
for value in json_data:
    artist = value['artistName']
    title = value['title']
    year = value['year']
    values = [artist, title, year]
    # return only image name at end of URL
    find_index = value['image'].rfind('/')
    img_suffix = value['image'][find_index + 1:]
    img_link = img_suffix
    try:
        indexed_json[img_link].append(values)
    except KeyError:
        indexed_json[img_link] = values
```

(By the way, a neat Python string utility that I didn’t know about before, and which really helped with the filename parsing, is [rsplit](http://python-reference.readthedocs.io/en/latest/docs/str/rsplit.html).)

3) Pick a random filename to tweet: `single_image_metadata = random.choice(list(indexed_json.items()))`

4) Tweet the image and associated metadata.

There are a couple of Python libraries available for Twitter.
I initially started using Tweepy, but much to my sadness, I found out it was no longer being maintained. (Thanks for the tip, Timo.) So I switched to Twython, which is a tad more convoluted, but is up to date. The final piece of code that actually ended up sending out the tweet is here:

```python
twitter = Twython(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_SECRET)

try:
    tmp_dir = tempfile.gettempdir()
    # clears out lambda dir from previous attempt,
    # in case testing lambdas keeps previous lambda state
    call('rm -rf /tmp/*', shell=True)
    path = os.path.join(tmp_dir, url)
    print(path)
    s3_resource.Bucket(bucket_name).download_file(url, path)
    print("file moved to /tmp")
    print(os.listdir(tmp_dir))
    with open(path, 'rb') as img:
        print("Path", path)
        twit_resp = twitter.upload_media(media=img)
        twitter.update_status(status="\"%s\"\n%s, %s" % (title, painter, year),
                              media_ids=twit_resp['media_id'])
except TwythonError as e:
    print(e)
```

What this does is take advantage of a Lambda’s temp space:

TIL that AWS Lambda Functions have miniature file systems that you can use as temporary storage (https://t.co/egCKwu6GJB).— Vicki Boykis (@vboykis) January 3, 2018

It pulls the file from S3 into the Lambda’s /tmp/ folder and matches it by filename with the metadata, which at this point is in key-value format. The `twitter.upload_media` method uploads the image and gets back a media id that is then passed into the `update_status` method as `twit_resp['media_id']`. And that’s it. The image and text are posted.

Development: Scheduling the Lambda

The second part was configuring the function to run on a schedule. Lambdas can be triggered by two things:

- An event occurring
- A timed schedule

Events can be anything from a file landing in an S3 bucket to polling a Kinesis stream. Scheduled events can be written either in cron or at a fixed rate.
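For reference, the two scheduling styles look like this as CloudWatch Events schedule expressions. The six-hour rate matches what the bot needed; the cron line is just an illustrative equivalent, not taken from the post:

```python
# CloudWatch Events schedule expressions
RATE_EXPRESSION = "rate(6 hours)"        # fixed rate: fire every six hours
CRON_EXPRESSION = "cron(0 */6 * * ? *)"  # cron form: fire at minute 0 of every 6th hour
```

Either string can be pasted into the rule's schedule field in the Lambda/CloudWatch console.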
I started out writing cron rules, but since my bot didn’t have any specific requirements, only that it needed to post every six hours, the fixed rate turned out to be enough for what I needed.

Finally, I needed to package the lambda for distribution. Lambdas run on Linux machines which don’t have a lot of Python libraries pre-installed (other than boto3, the Amazon Python client library I used previously that connects the Lambda to other parts of the AWS ecosystem, and json). In my script, I have a lot of library imports. Of these, Twython is an external library that needs to be packaged with the lambda and uploaded:

```python
from twython import Twython, TwythonError
```

Deployment: Bot Tweets!

So I packaged the Lambda based on those instructions, manually the first time, by uploading a zip file to the Lambda console. And, that’s it! My two one-off scripts were ready, and my bot was up and running. And here’s the final flow I ended up with:

Where to Next?

There’s a lot I still want to get to with Soviet Art Bot. The most important first step is tweaking the code so that no painting repeats more than once a week. That seems like the right amount of time for Twitter followers not to get annoyed. In parallel, I want to focus on testing and maintenance.

Testing and Maintenance

The first time I worked through the entire flow, I started by working in a local Python project I had started in PyCharm and had version-controlled on GitHub.

Me, trying to explain the cases when I use PyCharm, when I use Sublime Text, and when I use Jupyter Notebooks for development. pic.twitter.com/CEC0WlymlC— Vicki Boykis (@vboykis) January 18, 2018

So, when I made changes to any part of the process, my execution flow would be:

- Run the WikiArt download functionality locally
- Test the lambda “locally” with python-lambda-local
- Zip up the lambda and upload it to Lambda
- Make mistakes in the Lambda code
- Zip up the lambda and run it again
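The local testing step can also be approximated without any tooling, by importing the handler and calling it directly with a stub event. This is a sketch, not the bot's real handler (the body here is a stand-in; only the `(event, context)` calling convention is from the post):

```python
def handler(event, context):
    # stand-in body for the real bot handler
    return {"statusCode": 200, "source": event.get("source")}

# a scheduled CloudWatch event carries "aws.events" as its source
stub_event = {"source": "aws.events"}
print(handler(stub_event, None))
```

This is essentially what python-lambda-local automates: loading the module, building an event, and invoking the handler.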
This was not really an ideal workflow for me, because I didn’t want to have to manually re-upload the lambda every time, so I decided to use Travis CI, which integrates with GitHub really well. The problem is that there’s a lot of setup involved: virtualenvs, syncing AWS credentials, setting up IAM roles and profiles that allow Travis to access the lambda, setting up a test Twitter and AWS environment to test the Travis integration, and more. For now, the bot is working in production, and while it works, I’m going to continue to automate more and more parts of deployment in my dev branch. (This post was particularly helpful in zipping up a lambda, and my deploy script is here.)

After these two are complete, I want to:

1) Refactor the lambda code to take advantage of pathlib instead of os so my code is standardized (should be a pretty small change).

2) Source more paintings. WikiArt is fantastic, but it has only 500-ish paintings available in the socialist realism category. I’d like to find more sources with high-quality metadata and a significant collection of artworks. Then, I’d like to:

3) Create a front-end where anyone can upload a work of socialist realism for the bot to tweet out. This would probably be easier than customizing a scraper and would allow me to crowdsource data. As part of this process, I’d need a way to screen content before it got to my final S3 bucket. Which leads to:

4) Go through the current collection and make sure all artwork is relevant and SFW. See if there’s a way I can do that programmatically. And:

5) Machine learning and deep learning possibilities: look for a classifier to filter out artworks with nudity/questionable content and figure out how to decide what “questionable” means. Potentially with AWS Rekognition, or by building my own CNN.
Other machine learning opportunities:

- Mash with #devart to see if the bot can create fun headlines for paintings based on painting content
- Extract colors from artworks by genre and see how they differ between genres and decades

Conclusion

Software development can be a long, exhausting process with a lot of moving parts and decision-making involved, but it becomes much easier and more interesting if you break a project up into byte-sized chunks that you can continuously work on, to stop yourself from getting overwhelmed by the entire task at hand. The other part, of course, is that it has to be fun and interesting for you, so that you make it through all of the craziness with a fun, finished product at the end.
Posted 4 days ago by MySQL Performance Blog
In this blog post, we’ll look at how ZFS affects MySQL performance when the two are used together.

ZFS and MySQL have a lot in common, since they are both transactional software. Both have properties that, by default, favor consistency over performance. By doubling the complexity layers for getting committed data from the application to a persistent disk, we are logically doubling the amount of work within the whole system and reducing the output. From the ZFS layer, where is the bulk of the work really coming from?

Consider the comparative test below from a bare metal server. It has a reasonably tuned config (discussed in a separate post; results and scripts here). These numbers are from sysbench tests on hardware with six SAS drives behind a RAID controller with a write-back cache. Ext4 was configured as RAID10 softraid, while ZFS used the same layout (three striped pairs of mirrored VDEVs).

There are a few obvious observations here, one being that the ZFS results have a high variance between the median and the 95th percentile. This indicates a regular sharp drop in performance. However, the most glaring thing is that with write-only workloads such as update-index, overall performance could drop to 50%.

Looking further into the IO metrics for the update-index tests (95th percentile from /proc/diskstats), ZFS’s behavior tells us a few more things:

- ZFS batches writes better, with minimal increases in latency as the IO size per operation grows.
- ZFS reads are heavily scattered and random – the high response times and low read IOPS and throughput mean significantly more disk seeks.

If we focus on the second observation, there are a number of possible sources of random reads:

- InnoDB pages that are not in the buffer pool
- When ZFS records are updated, metadata also has to be read and updated

This means that for updates on cold InnoDB records, multiple random reads are involved that are not present with filesystems like ext4.
While ZFS has some tunables for improving synchronous reads, tuning them can be touch and go when trying to fit specific workloads. For this reason, ZFS introduced the L2ARC, where faster drives are used to cache frequently accessed data and serve reads at low latency. In upcoming posts, we’ll look in more detail at how ZFS affects MySQL, at the tests above and the configuration behind them, and at how we can further improve performance from here.
Posted 4 days ago by Mark Callaghan
MongoDB used to have a great story for sharded replica sets, but the storage engine, sharding and replica management code had significant room for improvement. Over the last few releases they have made remarkable progress on that, and the code is starting to match the story. I continue to be impressed by the rate at which they paid off their tech debt, and transactions coming to MongoDB 4.0 is one more example. It is time for us to do the same in the MySQL community.

I used to be skeptical about the market for sharded replica sets with MySQL. This is popular with the web-scale crowd, but that is a small market. Today I am less skeptical and assume the market extends far beyond web-scale. This can be true even if the market for replica sets, without sharding, is so much larger.

The market for replica sets is huge. For most users, if you need one instance of MySQL then you also need HA and disaster recovery. So you must manage failover, and for a long time (before crash-proof slaves and GTID) that was a lousy experience. It is better today thanks to cloud providers and DIY solutions, even if some assembly is required. Upstream is finally putting a solution together with MySQL Group Replication and other pieces.

But sharded replica sets are much harder, and even more so if you want to do cross-shard queries and transactions. While there have been many attempts at sharding solutions in the MySQL community, it is difficult to provide something that works across customers. Fortunately, Vitess has shown this can be done and already has many customers in production. ProxySQL and Orchestrator might also be vital pieces of this stack. I am curious to see how the traditional vendors (MySQL, MariaDB, Percona) respond to this progress.

Update: I think a binlog server should be part of the solution. But for that to happen we need a GPLv2 binlog server, and that has yet to be published.
Posted 4 days ago by Frederic Descamps
Today, let’s have a look at the top 10 new features in MySQL 8.0 that will improve a DBA’s life. Shrinking the list to only 10 items wasn’t an easy task, but here is the top 10:

1. Temporary Tables Improvements
2. Persistent global variables
3. No more MyISAM System Tables
4. Reclaim UNDO space from large transactions
5. UTF8 performance
6. Removing Query Cache
7. Atomic DDLs
8. Faster & More Complete Performance Schema (Histograms, Indexes, …) and Information Schema
9. ROLES
10. REDO & UNDO logs encrypted if tablespace is encrypted

Temporary Tables Improvements: Since 5.7, all internal temporary tables are created in a unique shared tablespace called “ibtmp1”. Additionally, the metadata for temp tables is stored in memory (no longer in .frm files). In MySQL 8.0, the MEMORY storage engine is also replaced as the default engine for internal temporary tables (those created by the Optimizer during JOIN, UNION, …) by the TempTable storage engine. This new engine provides more efficient storage for VARCHAR and VARBINARY columns (with MEMORY, the full maximum size is allocated).

Persistent Global Variables: With MySQL 8.0 it is now possible to set variables and make the change persist across server reboots. I’ve written a dedicated blog post that you can check for more information. Combined with the new RESTART command, this syntax makes it very easy to configure MySQL from its shell. This is a cloud-friendly feature!

No more MyISAM System Tables: With the new native Data Dictionary, we won’t need MyISAM system tables anymore! Those tables and the data dictionary tables are now created in a single InnoDB tablespace file named mysql.ibd in the data directory. This means that if you don’t explicitly use MyISAM tables (which is totally inadvisable if you care about your data), you can have a MySQL instance without any MyISAM table.
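The persistent global variables feature described in this post boils down to a single statement; here is a minimal sketch (the variable name and value are only an illustration, not taken from the post):

```sql
-- Persist a setting across restarts (MySQL 8.0); SET PERSIST applies the
-- change now AND writes it to mysqld-auto.cnf in the data directory.
SET PERSIST max_connections = 500;

-- Inspect what has been persisted so far.
SELECT * FROM performance_schema.persisted_variables;

-- Combined with the new RESTART command, the server can be reconfigured
-- and bounced entirely from the SQL shell:
RESTART;
```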
Reclaim UNDO space from large transactions: In MySQL 5.7, we already added the possibility to truncate undo spaces (innodb_undo_log_truncate, disabled by default). In MySQL 8.0, we changed the undo disk format to support a huge number of rollback segments per undo tablespace. Also, by default, the rollback segments are now created in two separate undo tablespaces instead of the InnoDB system tablespace (2 is now the minimum, and this setting is now dynamic). We also deprecated the variable to set that value (innodb_undo_tablespaces), as we will provide SQL commands giving DBAs a real interface to interact with UNDO tablespaces. Automatic truncation of undo tablespaces is also now enabled by default.

UTF8 Performance: The default character set has changed from latin1 to utf8mb4, as UTF8 is now much faster: up to 1800% faster on specific queries! Emojis are everywhere now, and MySQL supports them without problem!

Removing Query Cache: The first thing I always advised during a performance audit was to disable the Query Cache, as it didn’t scale by design. The MySQL QC created more issues than it solved. We decided to simply remove it in MySQL 8.0, as nobody should use it. If your workload requires a query cache, then you should have a look at ProxySQL as a query cache.

Atomic DDLs: With the new Data Dictionary, MySQL 8.0 now supports Atomic Data Definition Statements (atomic DDLs). This means that when a DDL is performed, the data dictionary updates, the storage engine operation and the writes in the binary log are combined into a single atomic transaction that is either fully executed or not at all. This provides better reliability, where unfinished DDLs don’t leave any incomplete data behind.

Faster & More Complete Performance Schema (Histograms, Indexes, …) and Information Schema: Many improvements were made to Performance Schema, such as fake indexes and histograms. With the contribution of fake indexes, queries like SELECT * FROM sys.session became 30x faster.
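The undo-space behavior described above can be inspected from the SQL shell; a small sketch using the variables the post names:

```sql
-- List the undo-related settings (MySQL 8.0).
SHOW VARIABLES LIKE 'innodb_undo%';

-- Automatic undo truncation is on by default in 8.0; in 5.7 it had to
-- be enabled explicitly:
SET GLOBAL innodb_undo_log_truncate = ON;
```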
Table scans are now avoided as much as possible, and the use of indexes greatly improves execution time. In addition, Performance Schema now provides histograms of statement latency, and the Optimizer can also benefit from these new histograms. Information Schema has also been improved by the use of the Data Dictionary: no more .frm files are needed to know a table’s definition. This also allows scaling to more than 1,000,000 tables!

ROLES: SQL roles have been added to MySQL 8.0. A role is a named collection of privileges. Like user accounts, roles can have privileges granted to and revoked from them. Roles can be activated by default or per session, and there is also the possibility to make roles mandatory.

REDO & UNDO logs encrypted if tablespace is encrypted: In MySQL 5.7, it was possible to encrypt an InnoDB tablespace for tables stored file-per-table. In MySQL 8.0 we completed this feature by adding encryption for UNDO and REDO logs too.

And once again, the list of improvements doesn’t end here. There are many other nice features. I would like to list some other important ones below (even if they are all important, of course): persistent auto increment, InnoDB self tuning, JSON performance, invisible indexes, a new lock for backup, Resource Groups, additional metadata in binary logs, and OpenSSL for the Community Edition too. Please check the online manual for more information about all these new features.
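The ROLES feature above can be sketched in a few statements (all of the names here are hypothetical, chosen only for illustration):

```sql
-- Create a role, grant privileges to it, and hand it to a user (MySQL 8.0).
CREATE ROLE 'app_read';
GRANT SELECT ON app_db.* TO 'app_read';

CREATE USER 'bob'@'%' IDENTIFIED BY 'secret';
GRANT 'app_read' TO 'bob'@'%';

-- Make the role active by default for the user...
SET DEFAULT ROLE 'app_read' TO 'bob'@'%';
-- ...or let the session activate it explicitly:
-- SET ROLE 'app_read';

-- Roles can also be made mandatory for every account:
SET PERSIST mandatory_roles = 'app_read';
```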
Posted 4 days ago by MySQL High Availability
Since January 2017, the MySQL Replication Team has been involved in processing many community contributions! We are really happy to receive contributions (and not only in the replication team), but this also implies a lot of work from our engineers: beyond resolving a bug or developing a new feature, code contributions need to be analyzed, and the code needs to be understood and validated.…
Posted 5 days ago by The Pythian Group
With support for multi-threaded replication starting in MySQL 5.7, operations on a slave are slightly different from single-threaded replication. Here is a list of operational tips for convenience:

1. Skip a statement for a specific channel.

Sometimes we might find that one of the channels stops replication due to some error, and we may want to skip the statement for that channel so that we can restart its slave. We need to be very careful not to skip a statement from another channel, since the command SET GLOBAL sql_slave_skip_counter = N is global. How can we make sure the global sql_slave_skip_counter is applied to a specific channel and not to the others? Here are the steps:

1.1: Stop the slaves on all channels with: stop slave;

1.2: Set the count of statements to skip with SET GLOBAL sql_slave_skip_counter = N; for example: SET GLOBAL sql_slave_skip_counter = 1;

1.3: Start the slave only on the channel we want to skip the statement on. This uses the global sql_slave_skip_counter = 1 to skip one statement and starts the slave on that channel. The syntax is start slave for channel 'channel-name'; for example: start slave for channel 'main';

1.4: Start the slaves on all the other channels with: start slave;

2.
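The four steps above can be condensed into a short session (the channel name 'main' is taken from the example in the post):

```sql
-- 1.1: stop the slave threads on all channels
STOP SLAVE;

-- 1.2: the skip counter is global, but it is consumed by whichever
--      channel's SQL thread starts next
SET GLOBAL sql_slave_skip_counter = 1;

-- 1.3: start only the failing channel, so it alone skips one statement
START SLAVE FOR CHANNEL 'main';

-- 1.4: start the remaining channels
START SLAVE;
```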
Check the status of replication, with detailed error messages, in the table performance_schema.replication_applier_status_by_worker:

mysql> select * from performance_schema.replication_applier_status_by_worker;
| CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP |
| metrics      | 1 | 1784802 | ON  | ANONYMOUS | 0    | | 0000-00-00 00:00:00 |
| accounting   | 1 | 1851760 | ON  | ANONYMOUS | 0    | | 0000-00-00 00:00:00 |
| main         | 1 | NULL    | OFF | ANONYMOUS | 1051 | Worker 0 failed executing transaction 'ANONYMOUS' at master log mysql-bin.019567, end_log_pos 163723076; Error 'Unknown table 'example.accounts'' on query. Default database: 'pythian'. Query: 'DROP TABLE `example`.`accounts` /* generated by server */' | 2018-02-14 23:57:52 |
| log          | 1 | 1784811 | ON  | ANONYMOUS | 0    | | 0000-00-00 00:00:00 |

Running the same query again later:

mysql> select * from performance_schema.replication_applier_status_by_worker;
| CHANNEL_NAME | WORKER_ID | THREAD_ID | SERVICE_STATE | LAST_SEEN_TRANSACTION | LAST_ERROR_NUMBER | LAST_ERROR_MESSAGE | LAST_ERROR_TIMESTAMP |
| metrics      | 1 | 1965646 | ON | ANONYMOUS | 0 | | 0000-00-00 00:00:00 |
| accounting   | 1 | 1965649 | ON | ANONYMOUS | 0 | | 0000-00-00 00:00:00 |
| main         | 1 | 1965633 | ON | ANONYMOUS | 0 | | 0000-00-00 00:00:00 |
| log          | 1 | 1965652 | ON | ANONYMOUS | 0 | | 0000-00-00 00:00:00 |

3. Check the status for a specific channel with show slave status for channel 'channel-name'\G:

mysql> show slave status for channel 'main'\G
*************************** 1.
row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: db-test-01.int.example.com
                  Master_User: replicator
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.019567
          Read_Master_Log_Pos: 869255591
               Relay_Log_File: db-test-02-relay-bin-example.000572
                Relay_Log_Pos: 45525401
        Relay_Master_Log_File: mysql-bin.019567
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table: test.sessions,test.metrics
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 869255591
              Relay_Log_Space: 869256195
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 4118338212
                  Master_UUID: b8cee5b1-3161-11e7-8109-3ca82a217b08
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set:
            Executed_Gtid_Set:
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name: main
           Master_TLS_Version:

I hope this short list of tips helps you enjoy multi-threaded replication.
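When many channels are configured, tip #2 can be narrowed down with a WHERE clause; a small convenience query (a sketch, not from the original post):

```sql
-- Show only the workers that have hit an error.
SELECT CHANNEL_NAME, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE, LAST_ERROR_TIMESTAMP
FROM performance_schema.replication_applier_status_by_worker
WHERE LAST_ERROR_NUMBER <> 0;
```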
Posted 5 days ago by The Pythian Group
Whenever we do upgrades for our clients from one major version of MySQL to another, we strongly recommend testing in two ways. First, a performance test between the old version and the new version, to make sure there aren’t going to be any unexpected issues with query processing rates. Second, a functional test to ensure that all queries running on the old version will not have syntax errors or problems with reserved words in the version we’re upgrading to.

If a client doesn’t have an appropriate testing platform for these types of tests, we leverage available tools to test to the best of our ability. More often than not this includes using pt-upgrade after capturing slow logs with long_query_time set to 0, in order to catch everything that’s running on the server for a period of time.

One of the issues you can run into with this sort of test is that it has to run the queries one at a time. If you have a query that takes much longer to run in the new version, this can slow things down considerably. This also gets a little frustrating if that long-running query is listed thousands of times in your slow query log. If your objective is a functional test, just ensuring that you won’t run into a syntax error in the new version, it makes no sense to run a query more than once: if it ran okay the first time, it should run okay every time, assuming that the literals in the query are also properly enclosed.

So instead of replaying the entire log against the target server, we can first use pt-query-digest to create a slow log that contains one of each type of query. Let’s take a look at the example below, where I created a slow log with 5 identical write queries and 5 identical read queries.

[root@cent5 slowlog]# cat ./testslow.log
......
use ptupgrade;
......
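Capturing “everything” as described above usually comes down to two settings; a minimal sketch (the restore value of 10 seconds is an assumption, not from the post):

```sql
-- Capture every query in the slow log for the test window.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 0;

-- ... let the production workload run for a while ...

-- Restore a typical threshold afterwards.
SET GLOBAL long_query_time = 10;
```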
# Time: 180207 12:55:05
# User@Host: root[root] @ localhost []
# Query_time: 0.000134 Lock_time: 0.000051 Rows_sent: 0 Rows_examined: 0
SET timestamp=1518026105;
insert into t1 (c1) values (1);
# Time: 180207 12:55:06
# User@Host: root[root] @ localhost []
# Query_time: 0.000126 Lock_time: 0.000049 Rows_sent: 0 Rows_examined: 0
SET timestamp=1518026106;
insert into t1 (c1) values (2);
# Time: 180207 12:55:08
# User@Host: root[root] @ localhost []
# Query_time: 0.000125 Lock_time: 0.000051 Rows_sent: 0 Rows_examined: 0
SET timestamp=1518026108;
insert into t1 (c1) values (3);
# Time: 180207 12:55:10
# User@Host: root[root] @ localhost []
# Query_time: 0.000130 Lock_time: 0.000052 Rows_sent: 0 Rows_examined: 0
SET timestamp=1518026110;
insert into t1 (c1) values (4);
# Time: 180207 12:55:12
# User@Host: root[root] @ localhost []
# Query_time: 0.000126 Lock_time: 0.000050 Rows_sent: 0 Rows_examined: 0
SET timestamp=1518026112;
insert into t1 (c1) values (5);
# Time: 180207 12:55:17
# User@Host: root[root] @ localhost []
# Query_time: 0.000134 Lock_time: 0.000055 Rows_sent: 2 Rows_examined: 10
SET timestamp=1518026117;
select c1 from t1 where c1 = 1;
# Time: 180207 12:55:19
# User@Host: root[root] @ localhost []
# Query_time: 0.000121 Lock_time: 0.000053 Rows_sent: 2 Rows_examined: 10
SET timestamp=1518026119;
select c1 from t1 where c1 = 2;
# Time: 180207 12:55:20
# User@Host: root[root] @ localhost []
# Query_time: 0.000118 Lock_time: 0.000052 Rows_sent: 2 Rows_examined: 10
SET timestamp=1518026120;
select c1 from t1 where c1 = 3;
# Time: 180207 12:55:22
# User@Host: root[root] @ localhost []
# Query_time: 0.000164 Lock_time: 0.000074 Rows_sent: 2 Rows_examined: 10
SET timestamp=1518026122;
select c1 from t1 where c1 = 4;
# Time: 180207 12:55:24
# User@Host: root[root] @ localhost []
# Query_time: 0.000121 Lock_time: 0.000052 Rows_sent: 2 Rows_examined: 10
SET timestamp=1518026124;
select c1 from t1 where c1 = 5;

I then used pt-query-digest to create a new version of
this slow query log with only 1 of each type of query.

[root@cent5 slowlog]# pt-query-digest --limit=100% --sample 1 --no-report --output slowlog ./testslow.log
# Time: 180207 12:55:05
# User@Host: root[root] @ localhost []
# Query_time: 0.000134 Lock_time: 0.000051 Rows_sent: 0 Rows_examined: 0
use ptupgrade;
insert into t1 (c1) values (1);
# Time: 180207 12:55:17
# User@Host: root[root] @ localhost []
# Query_time: 0.000134 Lock_time: 0.000055 Rows_sent: 2 Rows_examined: 10
use ptupgrade;
select c1 from t1 where c1 = 1;

You’ll notice that not only did we get 1 query of each type, pt-query-digest also added a use statement before each query, so MySQL knows which schema to run the query against when the log is replayed. You can now take this new slow log and run it via pt-upgrade against your target servers.

Conclusion: If you have a large slow query log file that you are trying to test against your server using a log replay tool like pt-upgrade, you can make your life a lot simpler by getting one sample of each query using pt-query-digest. In the field we’ve seen this reduce log file sizes from hundreds of gigs to less than a meg, and reduce log replay times from weeks to minutes. Please note that this is mainly something you’ll want to consider for functional testing, as you may want a lot of variety in your literals when doing a performance test.
Posted 5 days ago by MySQL Performance Blog
Join Percona Chief Evangelist Colin Charles as he covers happenings, gives pointers and provides musings on the open source database community. In case you missed last week’s column, don’t forget to read the fairly lengthy FOSDEM MySQL & Friends DevRoom summary.

From a Percona Live Santa Clara 2018 standpoint, beyond the tutorials getting picked and scheduled, the talks have also been picked and scheduled (so you were very likely getting acceptance emails from the Hubb.me system by Tuesday). The rejections have not gone out yet but will follow soon. I expect the schedule to go live either today (end of week) or early next week. The cheapest tickets end March 4, so don’t wait to register!

Amazon Relational Database Service has had a lot of improvements in 2017, and the excellent summary from Jeff Barr is worth a read: Amazon Relational Database Service – Looking Back at 2017. Plenty of improvements for the MySQL, MariaDB Server, PostgreSQL and Aurora worlds.

Spectre/Meltdown and its impact are still being discovered. You need to read Brendan Gregg’s amazing post: KPTI/KAISER Meltdown Initial Performance Regressions. And if you visit Percona Live, you’ll see an amazing keynote from him too! Are you still using MyISAM? MyISAM and KPTI – Performance Implications From The Meltdown Fix suggests switching to Aria or InnoDB.

Probably the biggest news this week, though? Transactions are coming to MongoDB 4.0. From the site: “MongoDB 4.0 will add support for multi-document transactions, making it the only database to combine the speed, flexibility, and power of the document model with ACID guarantees. Through snapshot isolation, transactions will provide a globally consistent view of data, and enforce all-or-nothing execution to maintain data integrity.” You want to read the blog post, MongoDB Drops ACID (the title works if you’re a native English speaker, but maybe not quite if you aren’t).
The summary diagram was a highlight for me, because you can see the building blocks plus future plans for MongoDB 4.2.

Releases

ProxySQL 1.4.6 – improvements and bug fixes; you can upgrade straight to 1.4.6 (you don’t, for example, have to go to 1.4.5 first).
MariaDB Server 10.2.13 – updated InnoDB (from MySQL 5.7.21), updated Galera wsrep library, fixes for slow starts, and more.
Percona Server for MySQL 5.6.39-83.1 – bug fixes, plus some TokuDB changes.

Link List

Compiling ProxySQL on FreeBSD – I’d be interested in knowing how many FreeBSD users actively want to deploy MySQL and her variants + ecosystem.
MySQL 8.0 Roles and Graphml – visualize roles; I like this (despite MariaDB Server having had roles since 10.0.5, this is not one of its available features).
TOP 10 MySQL 8.0 features for developers – if you haven’t already tried the second release candidate, this might be a good reason to try it. From the document store to JSON enhancements, CTEs, window functions and more, I suggest taking a look at this great list.
How To Enable Binary Logging On An Amazon RDS Read Replica.
Collect PostgreSQL Metrics with Percona Monitoring and Management (PMM) – this is largely thanks to external monitor support. Another feature I think people would benefit from: Amazon Aurora MySQL Monitoring with Percona Monitoring and Management (PMM).
From the just-for-fun department: MariaDB source visualisation with Gource. You see the source tree growing in the video but, as the commentary tells you, you don’t glean too much info from this. It would be nice to visualize how much the code-base has diverged.

Upcoming appearances

SCALE16x – Pasadena, California, USA – March 8-11 2018
FOSSASIA 2018 – Singapore – March 22-25 2018

Feedback

I look forward to feedback/tips via e-mail at colin.charles@percona.com or on Twitter @bytebot.