Posted 6 months ago by Stig Telfer
What is the single biggest avoidable toil in deployment of OpenStack? Hard to choose, but at StackHPC we've recently been looking at one of our pet grievances, around the way that we've been creating the images for provisioning an HPC-enabled overcloud.

An HPC-enabled overcloud might differ in various ways, in order to offer high performance connectivity, or greater efficiency - whether that be in compute overhead or data movement. In this specific instance, we are looking at incorporating Open Fabrics for binding the network-oriented data services that our hypervisor is providing to its guests.

Open Fabrics on Mellanox Ethernet

We take the view that CPU cycles spent in the hypervisor are taken from our clients, and we do what we can to minimise this. We've had good success in demonstrating the advantages of both SR-IOV and RDMA for trimming the fat from hypervisor data movement.

Remote DMA (RDMA) is supported by integrating packages from the Open Fabrics Enterprise Distribution (OFED), an alternative networking stack that bypasses the kernel's TCP/IP stack to deliver data directly to the processes requesting it. Mellanox produce their own version of OFED, developed and targeted specifically for their NICs.

TripleO and the Red Hat OpenStack Ecosystem

Red Hat's ecosystem is built upon TripleO, which uses DiskImage-Builder (DIB) - with a good deal of extra customisation in the form of DIB elements.

The TripleO project have done a load of good work to integrate the invocation of DIB into the OpenStack client. The images created in TripleO's process include the overcloud images used for hypervisors and controller nodes deployed using TripleO. Conventionally the same image is used for all overcloud roles, but as we've shown in previous articles we can build distinct images tailored to compute, control, networking or storage as required.
Introducing the Tar Pit

We'd been following a process of taking the output from the OpenStack client's openstack overcloud image build command (the overcloud images are in QCOW2 format at this point) and then using virt-customize to boot a captive VM in order to apply site-specific transformations, including the deployment of OFED.

We've previously covered the issues around creating Mellanox OFED packages specifically built for the kernel version embedded in OpenStack overcloud images. The repo produced is made available on our intranet, and accessed by the captive VM instantiated by virt-customize.

This admittedly works, but sucks in numerous ways:

- It adds a heavyweight extra stage to our deployment process (and one that requires a good deal of extra software dependencies).
- OFED really fattens up the image and this is probably the slowest possible way in which it could be integrated into the deployment.
- It adds significant complexity to scripting an automated ground-up redeployment.

The Rainy Day

Through our work on Kolla-on-Bifrost (a.k.a. Kayobe) we have been building our own DiskImage-Builder elements. Our deployments for the Square Kilometre Array telescope have had us looking again at the image building process. A quiet afternoon led us to put the work in to integrating our own HPC-specific DIB elements into a single-step process for generating overcloud images.

For TripleO deployments, we now integrate our steps into the invocation of the TripleO OpenStack CLI, as described in the TripleO online documentation. Here's how:

- We install our MLNX-OFED repo on an intranet webserver acting as a package repo as before. In TripleO this can easily be the undercloud seed node. It's best for future control plane upgrades if it is a server that is reachable from the OpenStack overcloud instances when they are active.

- We use a git repo of StackHPC's toolbox of DIB elements.

- We define some YAML for adding our element to TripleO's overcloud-full image build (call this overcloud-images-stackhpc.yaml):

    disk_images:
      -
        imagename: overcloud-full
        elements:
          - mlnx-ofed
        environment:
          # Example: point this to your intranet's unpacked MLNX-OFED repo
          DIB_MLNX_OFED_VERSION: 4.0-2
          DIB_MLNX_OFED_REPO:
          DIB_MLNX_OFED_DELETE_REPO: n
          DIB_MLNX_OFED_PKGLIST: "mlnx-ofed-hypervisor mlnx-fw-updater"

- Define some environment variables. Here we choose to build Ocata stable images. DiskImage-Builder doesn't extend any existing value assigned for ELEMENTS_PATH, so we must define all of TripleO's elements locations, plus our own:

    export STABLE_RELEASE="ocata"
    export DIB_YUM_REPO_CONF="/etc/yum.repos.d/delorean*"
    export ELEMENTS_PATH=/home/stack/stackhpc-image-elements/elements:\
    /usr/share/tripleo-image-elements:\
    /usr/share/instack-undercloud:\
    /usr/share/tripleo-puppet-elements

- Invoke the OpenStack client providing configurations - here for a CentOS overcloud image - plus our overcloud-images-stackhpc.yaml fragment:

    openstack overcloud image build \
      --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images.yaml \
      --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos7.yaml \
      --config-file /home/stack/stackhpc-image-elements/overcloud-images-stackhpc.yaml

All going to plan, the result is an RDMA-enabled overcloud image, done right (or at least, better than it was before).

Share and enjoy!

Further Reading

- SKA Telescope
- SKA Science Data Processor (SDP)
- Kayobe
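Because DiskImage-Builder does not extend an existing ELEMENTS_PATH, the environment step in the post above has to spell out every element directory by hand. A minimal sketch of one way to assemble that value from a list follows. The helper name build_elements_path is hypothetical (it is not part of DIB or TripleO), and warning about missing directories rather than failing is an assumption made here so the snippet can be tried on a machine without TripleO installed.

```shell
#!/usr/bin/env bash
# Sketch: join element directories into one colon-separated ELEMENTS_PATH.
# build_elements_path is a hypothetical helper, not a DIB or TripleO command.
set -euo pipefail

build_elements_path() {
    local joined=""
    local dir
    for dir in "$@"; do
        # Assumption: warn on stderr about a missing directory, but keep it
        # in the result so the caller still sees the full intended path.
        if [ ! -d "$dir" ]; then
            echo "warning: element directory not found: $dir" >&2
        fi
        # Prefix a colon only when something has already been joined.
        joined="${joined:+$joined:}$dir"
    done
    printf '%s\n' "$joined"
}

# The same four locations used in the post, site-specific elements first.
ELEMENTS_PATH="$(build_elements_path \
    /home/stack/stackhpc-image-elements/elements \
    /usr/share/tripleo-image-elements \
    /usr/share/instack-undercloud \
    /usr/share/tripleo-puppet-elements)"
export ELEMENTS_PATH
echo "$ELEMENTS_PATH"
```

Running this before the openstack overcloud image build invocation gives an early hint if one of the element directories is missing, which is cheaper than discovering it part-way through an image build.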
Posted 6 months ago by Mark Goddard
This post is the first in a series on HPC networking in OpenStack. In the series we'll discuss StackHPC's current and future work on integrating OpenStack with high performance network technologies. This post sets the scene and describes the varied networking capabilities of one of our recent OpenStack deployments, the Performance Prototype Platform (P3), built for the Square Kilometre Array (SKA) telescope's Science Data Processor (SDP).

(Not Too) Distant Cousins

There are many similarities between the cloud and HPC worlds, driving the adoption of OpenStack for scientific computing. Viewed from a networking perspective however, HPC clusters and modern cloud infrastructure can seem worlds apart.

OpenStack clouds tend to rely on overlay network technologies such as GRE and VXLAN tunnels to provide separation between tenants. These are often implemented in software, running atop a statically configured physical Ethernet fabric. Conversely, HPC clusters may feature a variety of physical networks, potentially including technologies such as InfiniBand and Intel Omni-Path Architecture. Low overhead access to these networks is crucial, with applications accessing the network directly in bare metal environments or via SR-IOV when running in virtual machines. Performance may be further enhanced by using NICs with support for Remote Direct Memory Access (RDMA).

Background: the SKA and its SDP

The SKA is an awe-inspiring project, to which any short description of ours is unlikely to do justice. Here's what the SKA website has to say:

    The Square Kilometre Array (SKA) project is an international effort to build the world’s largest radio telescope, with eventually over a square kilometre (one million square metres) of collecting area. The scale of the SKA represents a huge leap forward in both engineering and research & development towards building and delivering a unique instrument, with the detailed design and preparation now well under way.
    As one of the largest scientific endeavours in history, the SKA will bring together a wealth of the world’s finest scientists, engineers and policy makers to bring the project to fruition.

The SDP Consortium forms part of the SKA project, aiming to build a supercomputer-scale computing facility to process and store the data generated by the SKA telescope. The data ingested by the SDP is expected to exceed the global Internet traffic per day. Phew!

Image: The SKA will use around 3000 dishes, each 15 m in diameter. Credit: SKA Organisation

Performance Prototype Platform: a High Performance Melting Pot

The SDP architecture is still being developed, but is expected to incorporate the concept of a compute island, a scalable unit of compute resources and associated network connectivity. The SDP workloads will be partitioned and scheduled across these compute islands.

During its development, a complex project such as the SDP has many variables and unknowns. For the SDP this includes a variety of workloads and an assortment of new hardware and software technologies which are becoming available.

The Performance Prototype Platform (P3) aims to provide a platform that roughly models a single compute island, and allows SDP engineers to evaluate a number of different technologies against the anticipated workloads. P3 provides a variety of interesting compute, storage and network technologies including GPUs, NVMe memory, SSDs, high speed Ethernet and Infiniband.

OpenStack offers a compelling solution for managing the diverse infrastructure in the P3 system, and StackHPC is proud to have built an OpenStack management plane that allows the SDP team to get the most out of the system. The compute plane is managed as a bare metal compute resource using Ironic. The Magnum and Sahara services allow the SDP team to explore workloads based on container and data processing technologies, taking advantage of the native performance provided by bare metal compute.

How Many Networks?
The P3 system features multiple physical networks with different properties:

- 1GbE out of band management network for BMC management
- 10GbE control and provisioning network for bare metal provisioning, private workload communication and external network access
- 25/100GbE Bulk Data Network (BDN)
- 100Gbit/s EDR Infiniband Low Latency Network (LLN)

On this physical topology we provision a set of static VLANs for the control plane and external network access, and dynamic VLANs for use by workloads. Neutron manages the control/provisioning network switches, but due to current limitations in Ironic it cannot also manage the BDN or LLN, so these are provided as a shared resource.

The complexity of the networking in the P3 system means that automation is crucial to making the system manageable. With the help of Ansible's network modules, the Kayobe deployment tool is able to configure the physical and virtual networks of the switches and control plane hosts using a declarative YAML format.

Ironic's networking capabilities are improving rapidly, adding features such as multi-tenant network isolation and port groups, but still have a way to go to reach parity with VMs. In a later post we'll discuss the work being done upstream in Ironic by StackHPC to support multiple physical networks.

Next Time

In the next article in this series we'll discuss how the Kayobe project uses Ansible's network modules to define physical and virtual network infrastructure as code.

Further Reading

- SKA Telescope
- SKA Science Data Processor (SDP)
- OpenStack and HPC Network Fabrics
- Kayobe
- Zero touch provisioning of P3
Posted 6 months ago by Superuser
Containers align nicely with OpenStack, providing infrastructure, allowing them to share networking and storage with other types of computer resources in rich environments, says Thierry Carrez. The post What makes OpenStack relevant in a container-driven world appeared first on OpenStack Superuser.
Posted 6 months ago by JavaCruft
The Ubuntu OpenStack team is pleased to announce the general availability of the OpenStack Pike b2 milestone in Ubuntu 17.10 and for Ubuntu 16.04 LTS via the Ubuntu Cloud Archive. Ubuntu 16.04 LTS You can enable the Ubuntu Cloud Archive for OpenStack Pike on Ubuntu 16.04 LTS installations by running the following commands: sudo add-apt-repository […]
Posted 6 months ago by Rich Bowen
Posted 6 months ago by Anne Bertucio
The vendor-neutral certification offered by the OpenStack Foundation receives new features and upgrades as it reaches the 1,000 test taker mark. The post Certified OpenStack Administrator exam upgrades to Newton appeared first on OpenStack Superuser.
Posted 6 months ago by notartom
Now that Nova’s device role tagging feature talked about in a previous blog post is getting some real world usage, I’m starting to realise that it’s woefully under-documented and folks are having some misconceptions about what it is and how to use it. Let’s start with an example. You boot a VM with 3 network interfaces, … Continue reading Virtual device role tagging, better explained →
Posted 6 months ago by Superuser
Inspiration, tips and a kick in the pants for the July 14 deadline. The post How to craft a successful OpenStack Summit proposal appeared first on OpenStack Superuser.
Posted 6 months ago by Rich Bowen
Posted 6 months ago by Superuser
Superuser talks to Adrien Lebre, who co-chairs the OpenStack Fog, Edge and Distributed Computing Working Group. The post Clearing up why fog computing is important appeared first on OpenStack Superuser.