
News

Posted over 8 years ago
Since September, Pacemaker has started using Git for the 1.1 and devel trees. There were some minor technical advantages over Mercurial (which I still personally prefer), but mostly the decision was driven by the pain associated with switching between SCMs multiple times a day. The majority of development now happens on GitHub, which has some great features for reviewing patches and general collaboration. The Pacemaker tree is also periodically sync’d to the Cluster Labs server in case GitHub is unavailable for any reason. For those new to Git, GitHub has many tips for setting up Git, creating a local copy of the Pacemaker repo to work in, submitting your changes upstream (we use the Fork + Pull Model), and other assorted resources. Be sure to configure email and user information so you get credit for your hard work too!
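A first-time identity setup might look something like the following (a sketch, not the project's official instructions; the fork URLs in the comments are placeholders for your own GitHub fork):

```shell
# Identify yourself so your commits are credited correctly.
# (Use --global in real life; a throwaway local repo is used here so
# the sketch is safe to run anywhere.)
mkdir -p /tmp/pacemaker-demo && cd /tmp/pacemaker-demo
git init -q .
git config user.name "Your Name"
git config user.email "you@example.com"

# In the Fork + Pull model you would instead clone your own fork and
# track upstream, e.g.:
#   git clone git@github.com:YOUR_USER/pacemaker.git
#   git remote add upstream https://github.com/ClusterLabs/pacemaker.git

# Commits now carry your identity:
echo "demo $$" >> README && git add README
git commit -q -m "demo commit"
git log -1 --format='%an <%ae>'   # prints: Your Name <you@example.com>
```

With the identity configured, every patch you submit upstream is attributed to you.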
Posted over 8 years ago
Since it’s clearly not acceptable for our issue tracker to be offline for months at a time, it is time to replace the Bugzilla instance hosted by the Linux Foundation with something else. One candidate that came close was the GitHub issue tracker, but alas it doesn’t support attachments. The end result is that we now have an instance of Bugzilla v4 at http://bugs.clusterlabs.org. Bug numbers start at 5000. This avoids clashing with older ones and may enable us to import the old ones if the old instance ever comes back up. I would advise people to assume this won’t happen and to re-create any unresolved issues.
Posted over 8 years ago
The latest installment of the Pacemaker 1.0 release series is now ready for general consumption.

Changesets: 85
Diff: 500 files changed, 69642 insertions(+), 58270 deletions(-)

Thanks once again to the efforts of Keisuke MORI and NTT, the latest bug fixes have been back-ported from 1.1. Important changes since Pacemaker-1.0.10 include:

- cib: Repair the processing of updates sent from peer nodes
- crmd: All pending operations should be recorded, even recurring ones with high start delays
- crmd: Bug lf#2509 - Watch for config option changes from the CIB even if we’re not the DC
- crmd: Bug lf#2528 - Introduce a slight delay when creating a transition to allow attrd time to perform its updates
- crmd: Bug lf#2545 - Ensure notify variables are accurate for stop operations
- crmd: Bug lf#2559 - Fail actions that were scheduled for a failed/fenced node
- crmd: Cancel recurring operations while we’re still connected to the lrmd
- crmd: Don’t abort transitions when probes are completed on a node
- crmd: Ensure the CIB is always writable on the DC by removing a timing hole
- crmd: Update failcount for failed promote and demote operations
- PE: Bug lf#2495 - Prevent segfault by validating the contents of ordering sets
- PE: Bug lf#2508 - Correctly reconstruct the status of anonymous cloned groups
- PE: Bug lf#2544 - Prevent unstable clone placement by factoring in the current node’s score before all others
- PE: Bug lf#2554 - target-role alone is not sufficient to promote resources
- PE: Ensure fencing of the DC precedes the STONITH_DONE operation
- PE: Ensure that fencing has completed for stop actions on stonith-dependent resources (lf#2551)
- PE: Prevent clones from being stopped because resources colocated with them cannot be active
- PE: Prevent use-after-free resulting from unintended recursion when choosing a node to promote master/slave resources
- Shell: don’t create empty optional sections (bnc#665131)
- Tools: Bug lf#2528 - Make progress when attrd_updater is called repeatedly within the dampen interval but with the same value
- Tools: Prevent crm_resource commands from being lost due to the use of cib_scope_local

You can also see the full changelog. As per our release calendar, the next 1.0.x release is planned for mid-September. The source tarball is also available directly from Mercurial. Pre-built packages for Pacemaker and its immediate dependencies are available immediately for openSUSE 11.2, 11.3, Fedora-13 and EPEL-5 from the ClusterLabs Build Area. Users of more recent distributions are encouraged to use the latest 1.1.x, either from the 1.1 Build Area or the distribution directly. General installation instructions are available from the ClusterLabs wiki.
Posted almost 9 years ago
The latest installment of the Pacemaker 1.1 release series is now ready for general consumption.

Changesets: 184
Diff: 605 files changed, 46103 insertions(+), 26417 deletions(-)

As well as the usual round of bug fixes (see the full changelog), SUSE has implemented support for ACLs. This means that you can now delegate permission to control parts of the cluster (as defined by you) to non-root users. ACLs are still disabled by default, but you can read their documentation, provide feedback and decide if it’s something you want to use.

As per our release calendar, the next 1.1 release is planned for mid-April, and 1.0.11 should be available in March depending on how quickly we can get the bugfixes from 1.1 backported. Pre-built packages for Pacemaker and its immediate dependencies are available immediately for openSUSE 11.3, Fedora-14 and EPEL-5 from the ClusterLabs Build Area. The source tarball is also available directly from Mercurial. General installation instructions are available from the ClusterLabs wiki.
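To give a flavor of what ACL delegation looks like, a CIB fragment might read roughly as follows. This is a hypothetical sketch based on the 1.1-era ACL documentation; the role and user names are invented, and element names such as acl_role, acl_user and role_ref changed in later releases, so consult the current documentation before use:

```xml
<acls>
  <!-- A role that may read the whole CIB but write only one resource -->
  <acl_role id="operator">
    <read  id="operator-read"  xpath="/cib"/>
    <write id="operator-write" ref="my-resource"/>
  </acl_role>
  <!-- Grant that role to the (hypothetical) non-root user 'alice' -->
  <acl_user id="alice">
    <role_ref id="operator"/>
  </acl_user>
</acls>
```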
Posted about 9 years ago
One unexpected outcome from the recent Linux Plumbers conference was the contribution of a new logo to the project by NTT. Quite possibly you’re now wondering how this logo relates at all to clustering and the Pacemaker project. Don’t worry, they came up with a backstory too! In various forms of racing there is quite often someone/something setting a benchmark time or speed. This entity is often referred to as the pace-setter, pacemaker, or colloquially as a “rabbit”. The logo is therefore a stylized pair of rabbit ears, and the implication is that we’re setting new standards for cluster resource management. As well as the logo, NTT also contributed some very professional-looking banner images they’d created for a Japanese cluster site they’ve been busy building up. Even if you can’t speak Japanese, be sure to check out the shiny intro movie on the front page! I quite like the logo and the message, but I’m interested in the community’s reaction. I’ve created an online poll, so be sure to let us know what you think.
Posted about 9 years ago
It may have seemed quiet since July, but things were actually so busy that I couldn’t find the time to publicize our new releases. First up, the long-awaited 1.0.10 is finally here. Thanks once again to the hard work of Keisuke MORI from NTT, 1.0.10 contains all the bug fixes from the recent 1.1.3 and 1.1.4 releases. You can preview the list of updates with the new online change log. In addition to general bugfixes, the big news in 1.1.3 was the addition of a master control process and support for cman. Cman support allows us to run on top of a traditional RHCS cluster stack, replacing just the rgmanager component (more details on this in a subsequent post). 1.1.3 also introduced a new logging system inspired by the kernel and a PoC from Lars Ellenberg. It enables us to selectively enable logs for specific files, functions and even individual lines. Eventually this should result in less being logged by default. The successor to 1.1.3 was all about performance. In 1.1.4 we managed to speed up the CIB and Policy Engine by about 80% each. So if you have hundreds of resources, you really want to be using this version (the changes were far too invasive to consider including in a 1.0 release). Packages for all three releases are available from the rpm and rpm-next repositories on clusterlabs.org. In other news, I have also recently updated the release calendar for 2011.
Posted over 9 years ago
One question I still get a lot is what all these projects are/do and how they all relate. Here is the list of components that might make up a Pacemaker install:

- Pacemaker - Resource manager
- Corosync - Messaging layer
- Heartbeat - Also a messaging layer
- Resource Agents - Scripts that know how to control various services

Pacemaker is the thing that starts and stops services (like your database or mail server) and contains logic for ensuring both that they’re running, and that they’re only running in one location (to avoid data corruption). But it can’t do that without the ability to talk to instances of itself on the other node(s), which is where Heartbeat and/or Corosync come in. Think of Heartbeat and Corosync as dbus, but between nodes: somewhere that any node can throw messages on and know that they’ll be received by all its peers. This bus also ensures that everyone agrees who is (and is not) connected to the bus, and tells Pacemaker when that list changes. For two nodes Pacemaker could just as easily use sockets, but beyond that the complexity grows quite rapidly and is very hard to get right, so it really makes sense to use existing components that have proven to be reliable. You only need one of them though :-) Finally, in order to avoid teaching Pacemaker about every possible service that people might want to make highly available, we make use of the OCF standard to hide the details in scripts, which we call Resource Agents. Any series of command-line actions can easily be turned into a resource agent by adding them to an existing template. However, a collection of the most commonly useful ones is made available as part of the Resource Agents project. And of course pre-built packages for all of these come with most of the popular Linux distributions, including Fedora, openSUSE, SLES >= 10, RHEL >= 6, Debian, and Ubuntu.
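To make the resource-agent idea concrete, here is a minimal sketch of an OCF-style agent for a hypothetical "myapp" service. The action names and exit codes follow the OCF standard; the pidfile location, the use of sleep as a stand-in daemon, and the stub meta-data output are illustrative assumptions, not a real agent:

```shell
#!/bin/sh
# Minimal OCF-style resource agent sketch for a hypothetical "myapp".

OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

PIDFILE="${PIDFILE:-/tmp/myapp.pid}"

myapp_monitor() {
    # Running if the pidfile exists and the recorded process is alive.
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null && return $OCF_SUCCESS
    return $OCF_NOT_RUNNING
}

myapp_start() {
    myapp_monitor && return $OCF_SUCCESS   # start must be idempotent
    sleep 3600 >/dev/null 2>&1 &           # stand-in for the real daemon
    echo $! > "$PIDFILE"
    return $OCF_SUCCESS
}

myapp_stop() {
    if myapp_monitor; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
        rm -f "$PIDFILE"
    fi
    return $OCF_SUCCESS                    # stop must also be idempotent
}

case "${1:-}" in
    start)     myapp_start ;;
    stop)      myapp_stop ;;
    monitor)   myapp_monitor ;;
    meta-data) echo '<resource-agent name="myapp"/>' ;;  # stub; real agents emit full metadata
    "")        ;;                          # no action requested
    *)         exit $OCF_ERR_UNIMPLEMENTED ;;
esac
```

Pacemaker only ever calls the agent with actions like start, stop and monitor, and judges the result purely by the OCF exit code, which is what lets it manage any service without knowing its internals.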
Posted over 9 years ago
Over the last few days, I’ve spent a bunch of time improving Pacemaker’s performance in large clusters. This involved profiling the CIB and Policy Engine, identifying and optimizing hotspots, and improving algorithm designs. Since most of my work is done in virtual machines, it wasn’t possible to use oprofile. Strictly speaking oprofile worked, but without hardware performance counters the results weren’t very helpful. I also tried gprof, but that is more about counting calls than time spent. Eventually I switched to callgrind and, combined with a tool Tim found called Gprof2Dot and/or kcachegrind, finally got the data I was looking for. To do your own profiling, simply set PCMK_callgrind_enabled to either yes or to the name of a Pacemaker daemon you wish to profile, e.g. PCMK_callgrind_enabled=cib. Overall, the CIB (which is the main bottleneck in a large cluster) and the Policy Engine are about 70% faster. The improvements will be available when 1.1.4 is released next month, or from our 1.1 code repository right now. A summary of the various changes and a description of future work is below. Any assistance in further optimization would be appreciated :-)

- Andrew

PE

Use case:
- 100 nodes
- 100 clones, clone-max=100 (10,000 effective resources)
- 100 resource location constraints

Baseline with probes: 20-30 minutes
Baseline without probes: 28s

Phase 1: Use hashtables instead of lists for storing the available nodes for a resource.
New time without probes: 18s

Phase 2: Defer creation of deletion, promote and demote constraints until they are needed.
New time without probes: 13s

Phase 3: Use g_list_prepend() instead of g_list_append() for the list of ordering constraints.
New time without probes: 5s

Phase 4: New algorithm for determining which clone instances need probing.
New time with probes: 31s

Future work:
- Further improve the algorithm for determining which resources need to be probed
- Further optimize the algorithm for enforcing ordering constraints

CIB

The CIB was harder to profile. Rather than giving it one large task to chew through and seeing how long it took (using a few printf’s to provide granularity), I had to run it through a profiler while it was operating in a real cluster and see where most of the time was being spent.

Phase 1: Remove most uses of cib_msg_copy(), reducing the amount of needless copying.
Phase speedup: 10%

Phase 2: Compression costs a LOT, so don’t do it unless we’re hitting message limits. For now, use 256k as the threshold at which compression kicks in. The previous limit was 10k; compressing 184 of 1071 messages accounted for 23% of the total CPU used by the cib. Also, each time we validated the CIB, we were re-reading and re-parsing the RelaxNG schema, which accounted for 28% of the CIB’s CPU usage on the DC. We now read it once and cache the result for the life of the CIB process.
Phase speedup: 51%

Phase 3: Push detection of group and set ordering changes to (the less busy) slave instances. This detection was costing 15% of the CIB’s total CPU time on the DC.
Phase speedup: 15%

Future work: The majority of CPU spent by the CIB is in post-processing:
- Detecting what changed so we can minimize the network load: diff_xml_object, 35.5% CPU time
- Calculating the current digest so peers can verify the diffs and detect ordering changes: calculate_xml_digest, 31% CPU time
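A profiling session along the lines described above might look like the following sketch (it assumes valgrind, gprof2dot and graphviz are installed; the output filename is illustrative, since valgrind names it callgrind.out.&lt;pid&gt;):

```shell
# Ask Pacemaker to run the cib daemon under callgrind
# (PCMK_callgrind_enabled=yes would profile every daemon instead).
export PCMK_callgrind_enabled=cib

# ...restart the cluster, exercise it, then stop it so the profile
# data is flushed to disk...

out=callgrind.out.1234   # illustrative; the real file is callgrind.out.<pid>
if [ -f "$out" ]; then
    # Text summary of the hottest functions:
    callgrind_annotate "$out" | head -n 30
    # Or a call-graph picture via Gprof2Dot + graphviz:
    gprof2dot -f callgrind "$out" | dot -Tsvg -o cib-profile.svg
fi
```

The SVG call graph is what makes hotspots like diff_xml_object and calculate_xml_digest jump out visually.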
Posted over 9 years ago
The latest addition to the Pacemaker 1.1 series is a master control process (MCP) and associated init script. This means that Pacemaker is now started/stopped independently of the messaging layer. We anticipate that this should result in a simpler and more reliable startup/shutdown procedure when used in combination with Corosync. Forking inside a multi-threaded process like Corosync causes all sorts of pain. This has been problematic for Pacemaker as it needs a number of daemons to be spawned. Likewise, Corosync was never designed for staggered shutdown, something previously needed in order to prevent the cluster from leaving before Pacemaker could stop all active resources. By moving this functionality into the MCP, the whole system should become more reliable. It should be noted that when using the MCP, Corosync will refuse to shut down if Pacemaker is still running. Pacemaker will also naturally fail to start if Corosync isn’t active yet. So, starting with 1.1.3, the following Corosync-based options are possible:

1. corosync + pacemaker plugin (v0)
2. corosync + pacemaker plugin (v1) + mcp
3. corosync + cpg + cman + mcp
4. corosync + cpg + quorumd + mcp

Option ‘1’ corresponds to what people have been using since openais/corosync started being supported. If Pacemaker starts being supported in RHEL6, it’s probably going to look like option ‘3’. Option ‘4’ is what we’re all working towards. Anyone having startup or shutdown problems (with Pacemaker 1.1 or 1.0) should immediately move to clusters based on option ‘2’ or ‘3’. Both involve the new master control process and therefore benefit from the more reliable startup/shutdown design. Additionally, ‘3’ uses CPG for messaging (whereas ‘2’ still uses the plugin, which makes it compatible with nodes running option ‘1’). Unfortunately option ‘4’ isn’t fully baked yet; there are still a few kinks in the pacemaker/quorumd interaction to be worked out.
This will happen in the coming months; however, any assistance in this process would be highly appreciated.

To use option ‘2’, simply change ver: 0 to ver: 1 in the pacemaker service block of corosync.conf.

To use option ‘3’, you can either:
- use cluster.conf and service cman start, or
- add the cman bits to corosync.conf.

Using cluster.conf is the preferred approach. It’s far easier to maintain and automatically starts the necessary pieces for using GFS2.

Alternative 1 - Sample cluster.conf for a two-node cluster

Alternative 2 - Sample corosync.conf additions for a two-node CMAN cluster

Be sure to set nodename appropriately for each host.

cluster {
    name: beekhof
    clusternodes {
        clusternode {
            votes: 1
            nodeid: 1
            name: pcmk-1
        }
        clusternode {
            votes: 1
            nodeid: 2
            name: pcmk-2
        }
    }
    cman {
        expected_votes: 2
        cluster_id: 123
        nodename: pcmk-1
        two_node: 1
        max_queued: 10
    }
}

service {
    name: corosync_cman
    ver: 0
}

quorum {
    provider: quorum_cman
}
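For reference, the option ‘2’ change described above amounts to a pacemaker service block in corosync.conf along these lines (a sketch; only the ver value differs from the long-standing plugin setup):

```
service {
    # Load the Pacemaker plugin. ver: 1 means the plugin provides only
    # membership/messaging, and the daemons are started by the MCP instead.
    name: pacemaker
    ver: 1
}
```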
Posted over 9 years ago
The latest installment of the Pacemaker 1.0 stable series is now ready for general consumption. Coinciding with 1.0.9 is a new version of Corosync (1.2.5). Included in both are some important fixes that should resolve most of the startup issues people have been seeing. Also included in this release are the fixes for issues reported by Valgrind and Coverity. As per our release calendar, the next 1.0 release is planned for mid-September and 1.1.3 will be available in late July.

I’d like to particularly thank Keisuke MORI for his help with this release. Keisuke-san has taken on the role of Patch Manager for 1.0, so it is because of his hard work that we have backports of all the bugfixes from 1.1 :-) This change has enabled me to focus on 1.1 and, I hope, be slightly more responsive to bug reports and questions on the mailing list(s).

Pre-built packages for Pacemaker and its immediate dependencies are available immediately for openSUSE, SLES, Fedora, RHEL and CentOS from the ClusterLabs Build Area. Regular updaters may also have noticed the expanded version scheme used by packages on clusterlabs.org. The build scripts now automatically bump the version numbers when rebuilding the stack. This usually occurs when new versions of corosync, cluster-glue or heartbeat come out. Versions are now of the form x.y.z-a.b, where:

- x.y.z is the upstream version (this is the only time the tarball is changed)
- a indicates the number of spec file changes (i.e. changes to dependencies)
- b indicates how many times the package has been rebuilt with unchanged tarballs and spec files

So the version pacemaker-1.0.9-1.4 would mean the fourth rebuild of the initial spec file for the upstream version 1.0.9 of Pacemaker.

Debian users should check Martin’s repo for updates over the coming days, and Ubuntu fans can visit LaunchPad for 8.04 and 9.10 packages. The source tarball is also available directly from Mercurial. General installation instructions are available from the ClusterLabs wiki.

Release Statistics

Changesets: 152
Diff: 266 files changed, 14324 insertions(+), 3842 deletions(-)

Changes of note since Pacemaker-1.0.8:

- High: ais: Ensure the list of active processes sent to clients is always up-to-date
- High: ais: Fix previous commit, actually return a result in get_process_list()
- High: ais: Fix two more uses of getpwnam() in non-thread-safe locations
- High: ais: Look for the correct conf variable for turning on file logging
- High: ais: Need to find a better and thread-safe way to set core_uses_pid. Disable for now.
- High: ais: Use the threadsafe version of getpwnam
- High: cib: Also free query result for xpath operations that return more than one hit
- High: cib: Fix the application of unversioned diffs
- High: cib: Remove old developmental error logging
- High: Core: Bug lf#2414 - Prevent use-after-free reported by valgrind when doing xpath based deletions
- High: Core: Fix memory leak in replace_xml_child() reported by valgrind
- High: Core: fix memory leaks exposed by valgrind
- High: crmd: Bug 2401 - Improved detection of partially active peers
- High: crmd: Bug lf#2379 - Ensure the cluster terminates when the PE is not available
- High: crmd: Bug lf#2414 - Prevent use-after-free of the PE connection after it dies
- High: crmd: Bug lf#2439 - cancel_op() can also return HA_RSCBUSY
- High: crmd: Bug lf#2439 - Handle asynchronous notification of resource deletion events
- High: crmd: Do not allow the target_rc to be misused by resource agents
- High: crmd: Do not ignore action timeouts based on FSA state
- High: crmd: Ensure we don’t get stuck in S_PENDING if we lose an election to someone that never talks to us again
- High: crmd: Fix memory leaks exposed by valgrind
- High: crmd: Remove race condition that could lead to multiple instances of a clone being active on a machine
- High: crmd: Send erase_status_tag() calls to the local CIB when the DC is fenced, since there is no DC to accept them
- High: PE: Bug lf#1959 - Failed unmanaged resources should not prevent other services from shutting down
- High: PE: Bug lf#2383 - Combine failcounts for all instances of an anonymous clone on a host
- High: PE: Bug lf#2384 - Fix intra-set colocation and ordering
- High: PE: Bug lf#2403 - Enforce mandatory promotion (colocation) constraints
- High: PE: Bug lf#2412 - Correctly locate clone instances by their prefix
- High: PE: Bug lf#2422 - Ordering dependencies on partially active groups not observed properly
- High: PE: Bug lf#2424 - Use notify operation definition if it exists in the configuration
- High: PE: Bug lf#2433 - No services should be stopped until probes finish
- High: PE: Do not be so quick to pull the trigger on nodes that are coming up
- High: PE: Fix colocation for interleaved clones
- High: PE: Fix colocation with partially active groups
- High: PE: Fix memory leaks reported by valgrind
- High: PE: Make the current data set a global variable so it does not need to be passed around everywhere
- High: PE: Prevent endless loop when looking for operation definitions in the configuration
- High: PE: Rewrite native_merge_weights() to fix a use-after-free
- High: Shell: always reload status if working with the cluster (bnc#590035)
- High: Tools: crm_mon - fix memory leaks exposed by valgrind
- Medium: ais: Correctly set logfile permissions in all cases
- Medium: ais: create the final directory too for resource agents (bnc#603190)
- Medium: ais: Make sure debug messages make it into the logfiles too
- Medium: Build: Do not enable the -ansi compiler option by default, prevents use of strtoll()
- Medium: cib: Bug lf#2352 - Changes to group order are not detected or broadcast to peers
- Medium: cib: Correctly free the cib contents at signoff when in file-based mode
- Medium: cib: xpath - Allow all hits to be deleted, allow the no_children option to return multiple hits
- Medium: PE: Bug lf#2391 - Ensure important options (notify, unique, etc) are always exposed during resource operations
- Medium: PE: Bug lf#2410 - Do not complain about missing agents during probes of asymmetric clusters
- Medium: PE: Bug lf#2426 - stop-all-resources should not apply to stonith resources
- Medium: PE: Bug lf#2435 - Support colocation sets with negative scores
- Medium: PE: Check for use-of-NULL in dump_node_scores()
- Medium: PE: Do not overwrite existing meta attributes (like timeout) for notify operations
- Medium: PE: Ensure deallocated resources are stopped
- Medium: PE: If there are no compatible peers when interleaving clones, ensure the instance is stopped
- Medium: PE: Ignore colocation weights from clone instances
- Medium: RA: SystemHealth: exit properly when the required software is not installed (bnc#587940)
- Medium: Shell: do not error on missing resource agent with asymmetrical clusters (lf#2410)
- Medium: Shell: do not verify empty configurations (bnc#602711)
- Medium: shell: find hb_delnode in correct directory
- Medium: Shell: observe op_defaults when verifying primitives (bnc#590033)
- Medium: Shell: on no id match the first of property-like elements (lf#2420)
- Medium: Shell: skip resource checks for property-like elements (lf#2420)
- Medium: Shell: verify meta attributes and properties (bnc#589867)
- Medium: Shell: verify only changed elements on commit (bnc#590033)
- Medium: Tools: crm_mon: refresh screen on terminal resize (bnc#589811)