Red Hat OpenStack Platform 10

Replacing Networker Nodes

An end-to-end scenario on replacing a Red Hat OpenStack Platform Networker node

Abstract

This document explains how to replace a Red Hat OpenStack Platform 10 Networker node in an enterprise environment using the Red Hat OpenStack Platform director. This document was prepared for Telecom Italia to address Bugzilla #1578502. The procedure requires the hotfix for Bugzilla #1600178.

Preface

In certain circumstances, a node with a Networker profile, as described in Tagging Nodes into Profiles, might fail in a high availability cluster. In these situations, you must remove the node from the cluster and replace it with a new Networker node. This procedure assumes that you have already performed node discovery and that the replacement node can connect to the other nodes in the cluster over the network.

This section provides instructions on how to replace a Networker node. The process involves running the openstack overcloud deploy command to update the overcloud with a request to replace a Networker node.

Important

This procedure was prepared for Telecom Italia to address Bugzilla #1578502 and requires the hotfix for Bugzilla #1600178. The following procedure applies only to high availability environments. Do not use this procedure if you are using only one Networker node.

Chapter 1. Preliminary Checks

Before attempting to replace an overcloud Networker node, it is important to check the current state of the Red Hat OpenStack Platform environment. Checking the current state can help avoid complications during the Networker replacement process. Use the following list of preliminary checks to determine if it is safe to perform a Networker node replacement. Run all commands for these checks on the undercloud.

  1. Check the current status of the overcloud stack on the undercloud:

    [stack@director ~]$ source stackrc
    [stack@director ~]$ openstack stack list --nested

    The overcloud stack and its subsequent child stacks should have a status of either CREATE_COMPLETE or UPDATE_COMPLETE.
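
    As an additional quick filter, the following command prints only the status of any stack that is not in a COMPLETE state; if it produces no output, it is safe to proceed:

    [stack@director ~]$ openstack stack list --nested -f value -c "Stack Status" | grep -v COMPLETE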

  2. Perform a backup of the undercloud databases:

    [stack@director ~]$ mkdir /home/stack/backup
    [stack@director ~]$ sudo mysqldump --all-databases --quick --single-transaction | gzip > /home/stack/backup/dump_db_undercloud.sql.gz
  3. Ensure that the undercloud has at least 10 GB of free storage to accommodate image caching and conversion when provisioning the new node.
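
    For example, you can check the available space with df. In a default undercloud installation the image store resides on the root file system, but verify the relevant mount point in your environment:

    [stack@director ~]$ df -h /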
  4. Check the status of Pacemaker on the running Networker nodes. For example, if 192.168.0.47 is the IP address of a running Networker node, use the following command to get the Pacemaker status:

    [stack@director ~]$ ssh heat-admin@192.168.0.47 'sudo pcs status'

     Replace the example IP address with the IP address of a running Networker node. The output should show all services running on the existing nodes and stopped on the failed node.
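
     To surface only problem entries, you can filter the Pacemaker output (a convenience filter; adjust the pattern to your deployment):

     [stack@director ~]$ ssh heat-admin@192.168.0.47 'sudo pcs status' | grep -iE 'stopped|failed'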

  5. Check the following parameters on each node of the overcloud’s MariaDB cluster:

    • wsrep_local_state_comment: Synced
    • wsrep_cluster_size: 2

      Use the following command to check these parameters on each running Networker node, replacing the example 192.168.0.47 and 192.168.0.46 IP addresses with the IP addresses of nodes in the cluster:

      [stack@director ~]$ for i in 192.168.0.47 192.168.0.46 ; do echo "*** $i ***" ; ssh heat-admin@$i "sudo mysql -p\$(sudo hiera -c /etc/puppet/hiera.yaml mysql::server::root_password) --execute=\"SHOW STATUS LIKE 'wsrep_local_state_comment'; SHOW STATUS LIKE 'wsrep_cluster_size';\""; done
  6. Check the RabbitMQ status. For example, if 192.168.0.47 is the IP address of a running Networker node, use the following command to get the status:

    [stack@director ~]$ ssh heat-admin@192.168.0.47 "sudo rabbitmqctl cluster_status"

    The running_nodes key should only show the two available nodes and not the failed node.

  7. Check the nova-compute service on the director node:

    [stack@director ~]$ sudo systemctl status openstack-nova-compute
    [stack@director ~]$ openstack hypervisor list

    The output should show all non-maintenance mode nodes as up.

  8. Make sure all undercloud services are running:

    [stack@director ~]$ sudo systemctl -t service
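
    To list only units that have failed, you can run the following command; empty output means that all services are healthy:

    [stack@director ~]$ sudo systemctl --failed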

Chapter 2. Node Replacement

  1. Identify the index of the node to remove. The node index is the numeric suffix on the instance name in the openstack server list output. For example:

    [stack@director ~]$ openstack server list
    +--------------------------------------+------------------------+
    | ID                                   | Name                   |
    +--------------------------------------+------------------------+
    | 861408be-4027-4f53-87a6-cd3cf206ba7a | overcloud-compute-0    |
    | 0966e9ae-f553-447a-9929-c4232432f718 | overcloud-compute-1    |
    | 9c08fa65-b38c-4b2e-bd47-33870bff06c7 | overcloud-compute-2    |
    | a7f0f5e1-e7ce-4513-ad2b-81146bc8c5af | overcloud-controller-0 |
    | cfefaf60-8311-4bc3-9416-6a824a40a9ae | overcloud-controller-1 |
    | 97a055d4-aefd-481c-82b7-4a5f384036d2 | overcloud-controller-2 |
    | 844c9a88-713a-4ff1-8737-6410bf551d4f | overcloud-networker-0  |
    | aef7c27a-f0b4-4814-b0ff-aaf8d05ad721 | overcloud-networker-1  |
    | c2e40164-c659-4849-a28f-507eb7edb79f | overcloud-networker-2  |
    +--------------------------------------+------------------------+

    In this example, the aim is to remove the overcloud-networker-1 node and replace it with overcloud-networker-3. First, set the failed node into maintenance mode so that the director does not re-provision it. To do this, correlate the instance ID from the openstack server list output with the node ID from the openstack baremetal node list output. For example:

    [stack@director ~]$ openstack baremetal node list
    +------------------------+------+---------------------------------+
    | UUID                   | Name | Instance UUID                   |
    +------------------------+------+---------------------------------+
    | 36404147-7c8a-41e6-8c72| None | 7bee57cf-4a58-4eaf-b851         |
    | 91eb9ac5-7d52-453c-a017| None | None                            |
    | 75b25e9a-948d-424a-9b3b| None | None                            |
    | 038727da-6a5c-425f-bd45| None | 763bfec2-9354-466a-ae65         |
    | dc2292e6-4056-46e0-8848| None | 2017b481-706f-44e1-852a         |
    | c7eadcea-e377-4392-9fc3| None | 5f73c7d7-4826-49a5-b6be         |
    | da3a8d19-8a59-4e9d-923a| None | cfefaf60-8311-4bc3-9416         |
    | 807cb6ce-6b94-4cd1-9969| None | c07c13e6-a845-4791-9628         |
    | 0c245daa-7817-4ae9-a883| None | 844c9a88-713a-4ff1-8737         |
    | e6499ef7-3db2-4ab4-bfa7| None | aef7c27a-f0b4-4814-b0ff         |
    | 7545385c-bc49-4eb9-b13c| None | c2e40164-c659-4849-a28f         |
    +------------------------+------+---------------------------------+
    (truncated UUIDs)
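
    To find the bare metal node that backs a specific instance, you can filter the listing by the instance UUID. For example, using the example UUID of overcloud-networker-1:

    [stack@director ~]$ openstack baremetal node list | grep aef7c27a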
  2. Set the node into maintenance mode.

    [stack@director ~]$ openstack baremetal node maintenance set \
                  e6499ef7-3db2-4ab4-bfa7-ef59539bf972
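
    To confirm that the node entered maintenance mode, you can query its maintenance field; the command should print True:

    [stack@director ~]$ openstack baremetal node show e6499ef7-3db2-4ab4-bfa7-ef59539bf972 -f value -c maintenance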
  3. Tag the new node with the networker profile.

    [stack@director ~]$ openstack baremetal node set --property \
         capabilities='profile:networker,boot_option:local' \
         91eb9ac5-7d52-453c-a017-c0e3d823efd0
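
    You can verify the assignment with the director's profile overview; the new node should appear with the networker profile:

    [stack@director ~]$ openstack overcloud profiles list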
  4. Create a ~/templates/remove-networker.yaml YAML file that defines the index of the node to remove. In this example, index 1 corresponds to overcloud-networker-1:

    parameters:
      NetworkerRemovalPolicies:
        [{'resource_list': ['1']}]
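
    Optionally, you can confirm that the file parses as valid YAML (a quick sanity check using the Python interpreter on the undercloud):

    [stack@director ~]$ python -c "import yaml; print(yaml.safe_load(open('/home/stack/templates/remove-networker.yaml')))"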
  5. Create a ~/templates/node-count-networker.yaml file and set the total count of Networker nodes, including the replacement node. For example, if the cluster has three Networker nodes after the replacement, the file looks like this:

    parameter_defaults:
      OvercloudNetworkerFlavor: networker
      NetworkerCount: 3
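
    Before redeploying, you can confirm that the flavor referenced by OvercloudNetworkerFlavor exists on the undercloud:

    [stack@director ~]$ openstack flavor list | grep networker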
  6. Redeploy the overcloud including the node-count-networker.yaml and remove-networker.yaml environment files:

    [stack@director ~]$ openstack overcloud deploy --templates \
        -e ~/templates/node-count-networker.yaml \
        -e ~/templates/remove-networker.yaml [OTHER OPTIONS]

    If you passed any extra environment files or options when you created the overcloud, pass them again here to avoid making undesired changes to the overcloud. However, include -e ~/templates/remove-networker.yaml only for this deployment; remove it from subsequent deploy commands so that the node removal policy is not applied again.

The director removes the old node, creates a new one, and updates the overcloud stack. Check the status of the overcloud stack using the following command:

[stack@director ~]$ openstack stack list --nested

Verify that the new Networker node is listed and that the old one has been removed:

[stack@director ~]$ openstack server list
+--------------------------------------+------------------------+
| ID                                   | Name                   |
+--------------------------------------+------------------------+
| 861408be-4027-4f53-87a6-cd3cf206ba7a | overcloud-compute-0    |
| 0966e9ae-f553-447a-9929-c4232432f718 | overcloud-compute-1    |
| 9c08fa65-b38c-4b2e-bd47-33870bff06c7 | overcloud-compute-2    |
| a7f0f5e1-e7ce-4513-ad2b-81146bc8c5af | overcloud-controller-0 |
| cfefaf60-8311-4bc3-9416-6a824a40a9ae | overcloud-controller-1 |
| 97a055d4-aefd-481c-82b7-4a5f384036d2 | overcloud-controller-2 |
| 844c9a88-713a-4ff1-8737-6410bf551d4f | overcloud-networker-0  |
| c2e40164-c659-4849-a28f-507eb7edb79f | overcloud-networker-2  |
| 425a0828-b42f-43b0-940c-7fb02522753a | overcloud-networker-3  |
+--------------------------------------+------------------------+

Chapter 3. Neutron Cleanup and Rescheduling

After you replace a Networker node by using the previous procedure, remove all neutron agents registered for the removed node from the database. This ensures that they do not show up as dead agents, and that DHCP resources are automatically rescheduled to the other Networker nodes.

  1. Source the overcloudrc file to gain admin credentials for the overcloud:

    [stack@director ~]$ source ~/overcloudrc
  2. Verify that four agents exist for overcloud-networker-1 (metadata, L3, Open vSwitch, and DHCP) and that they are marked dead, as indicated by xxx in the alive column:

    [stack@director ~]$ neutron agent-list -c id -c binary -c host -c alive | grep overcloud-networker-1
    | 8377-66d75323e466 | neutron-metadata-agent    | overcloud-networker-1 | xxx |
    | b55d-797668c33670 | neutron-l3-agent          | overcloud-networker-1 | xxx |
    | 9dcb-00a9e32ecde4 | neutron-openvswitch-agent | overcloud-networker-1 | xxx |
    | be83-e4d932984654 | neutron-dhcp-agent        | overcloud-networker-1 | xxx |
    (truncated UUIDs)
  3. Capture the UUIDs of the agents registered for the removed overcloud-networker-1 node:

    [stack@director ~]$ AGENT_UUIDS=$(neutron agent-list -c id -c binary -c host -c alive -f value | grep overcloud-networker-1 | cut -d\  -f1)
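
    Before deleting anything, you can echo the captured UUIDs to confirm that only agents from the removed node were collected:

    [stack@director ~]$ echo $AGENT_UUIDS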
  4. Delete any remaining overcloud-networker-1 agents from the database.

    [stack@director ~]$ for agent in $AGENT_UUIDS; do neutron agent-delete $agent ; done
    Deleted agent(s): 5024f9b5-7ad9-4692-8377-66d75323e466
    Deleted agent(s): 9f49adba-50a1-48ca-b55d-797668c33670
    Deleted agent(s): b66221f8-61cf-4017-9dcb-00a9e32ecde4
    Deleted agent(s): b6b1e492-9420-4406-be83-e4d932984654
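
    Afterwards, verify that no agents remain registered for the removed node; the following command should produce no output:

    [stack@director ~]$ neutron agent-list -c id -c binary -c host | grep overcloud-networker-1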

Chapter 4. Rescheduling Tenant Routers

Reschedule all tenant routers on all Networker nodes.

  1. Verify that all the existing L3 agents are marked alive, as indicated by :-), and that the number of agents is correct. In the previous examples there were three Networker nodes, so there should be three neutron-l3-agent lines. For example:

    [stack@director ~]$ openstack network agent list -c ID -c Binary -c Host -c Alive | grep neutron-l3-agent
    | 41d3-ab4e-66f1267ce4f8 | neutron-l3-agent | overcloud-networker-0 | :-) |
    | 4ba6-9696-623759039af8 | neutron-l3-agent | overcloud-networker-2 | :-) |
    | 4112-b3e3-e93fb3826ce7 | neutron-l3-agent | overcloud-networker-3 | :-) |
    (UUID truncated)
  2. Ensure that all routers are scheduled to the correct number of agents. Start by setting the number of agents that should host each router. This value should match the max_l3_agents_per_router setting in the neutron configuration (the default is 3).

    [stack@director ~]$ export MAX_L3_AGENTS=3
    Warning

     If you are not using l3-ha, set MAX_L3_AGENTS to 1.

     Once the MAX_L3_AGENTS variable is set, continue by running the following script in the console (or from a Bash file). The script iterates over every tenant router and schedules it to additional, randomly chosen L3 agents until MAX_L3_AGENTS agents host it.

     MAX_L3_AGENTS=${MAX_L3_AGENTS:-3}
     L3_AGENT_UUIDS=$(openstack network agent list -c ID -c Binary -f value | grep neutron-l3-agent | cut -d\  -f1)
     ROUTER_UUIDS=$(openstack router list -c ID -f value)
     
     for router_id in $ROUTER_UUIDS; do
     
        echo "Processing router $router_id"
     
        # Agents currently hosting this router, and a shuffled list of all L3 agents.
        R_AGENTS=$(neutron l3-agent-list-hosting-router $router_id -f value -c id)
        SHUFF_AGENTS=$(shuf -e $L3_AGENT_UUIDS)
        N_AGENTS=$(echo $R_AGENTS | wc -w)
     
        if [ "$MAX_L3_AGENTS" -gt "$N_AGENTS" ]; then
            for agent_id in $SHUFF_AGENTS; do
     
                # Skip this agent, since the router is already scheduled to it.
                if echo "$R_AGENTS" | grep "$agent_id" >/dev/null ; then
                   continue
                fi
                neutron l3-agent-router-add $agent_id $router_id
     
                N_AGENTS=`expr $N_AGENTS + 1`
                if [ "$N_AGENTS" -ge "$MAX_L3_AGENTS" ]; then
                   break
                fi
            done
        fi
     done
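
     When the script completes, you can verify the result by counting the L3 agents that host each router (a minimal verification sketch; each count should equal MAX_L3_AGENTS, or 1 without l3-ha):

     for router_id in $(openstack router list -c ID -f value); do
        echo -n "$router_id agents: "
        neutron l3-agent-list-hosting-router $router_id -f value -c id | wc -l
     done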

Chapter 5. Rescheduling Tenant DHCP Services

OpenStack enables automatic DHCP failover by default. This procedure ensures that existing networks are properly scheduled to multiple DHCP agents.

  1. Configure an environment variable to match the NeutronDhcpAgentsPerNetwork (dhcp_agents_per_network) configuration setting in the overcloud deployment templates. The default is 3.

    [stack@director ~]$ export MAX_DHCP_AGENTS=3
  2. Once the MAX_DHCP_AGENTS variable is set, run the following script in the console (or from a Bash file). The script iterates over every network that has a DHCP-enabled subnet and schedules it to additional, randomly chosen DHCP agents until MAX_DHCP_AGENTS agents host it.

     MAX_DHCP_AGENTS=${MAX_DHCP_AGENTS:-3}
     DHCP_AGENT_UUIDS=$(openstack network agent list -c ID -c Binary -c Alive -f value | grep neutron-dhcp-agent | grep True | cut -d\  -f1)
     # De-duplicate networks that have more than one DHCP-enabled subnet.
     DHCP_NETWORK_UUIDS=$(openstack subnet list --dhcp -c Network -f value | sort -u)
     
     for network_id in $DHCP_NETWORK_UUIDS; do
     
        echo "Processing network $network_id"
     
        # Live agents currently hosting this network, and a shuffled list of all DHCP agents.
        NET_AGENTS=$(neutron dhcp-agent-list-hosting-net $network_id -c id -c alive -f value | grep ":-)" | cut -f1 -d\ )
        SHUFF_AGENTS=$(shuf -e $DHCP_AGENT_UUIDS)
        N_AGENTS=$(echo $NET_AGENTS | wc -w)
     
        if [ "$MAX_DHCP_AGENTS" -gt "$N_AGENTS" ]; then
            for agent_id in $SHUFF_AGENTS; do
     
                # Skip this agent, since the network is already scheduled to it.
                if echo "$NET_AGENTS" | grep "$agent_id" >/dev/null ; then
                   continue
                fi
                neutron dhcp-agent-network-add $agent_id $network_id
     
                N_AGENTS=`expr $N_AGENTS + 1`
                if [ "$N_AGENTS" -ge "$MAX_DHCP_AGENTS" ]; then
                   break
                fi
            done
        fi
     
     done
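
     As with the router rescheduling, you can verify the result by counting the DHCP agents that host each network (each count should reach MAX_DHCP_AGENTS, up to the number of available agents):

     for network_id in $(openstack subnet list --dhcp -c Network -f value | sort -u); do
        echo -n "$network_id agents: "
        neutron dhcp-agent-list-hosting-net $network_id -f value -c id | wc -l
     done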

Legal Notice

Copyright © 2018 Red Hat, Inc.
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.