Wednesday, December 31, 2014

x86 Hardware RAID Traps with BS_RAID_CHK


The following script is designed to run on Solaris x86 or Redhat systems with LSI and Adaptec hardware RAID controllers.  It checks for degraded states on those controllers using mpt-status, arcconf or raidctl, depending on the hardware vendor and OS.   If a degraded state is found, it sends an SNMP trap to your trap collector.

#!/usr/bin/perl
##########################################################
# Script checks status of hardware RAID on x86 hardware  #
# Supported OSes: Solaris x86, Redhat                    #
# Supported controllers: LSI & Adaptec                   #
# Note: Requires mpt-status for LSI                      #
# Note: Requires arcconf for Adaptec                     #
# Note: Requires raidctl for Solaris x86                 #
# Sends trap if degraded state                           #
##########################################################
use strict;
my $prefix = "bs_raid_chk";
my $servicechk = "unix_traps";
my $community = "asdpublic";
my $manager = "10.66.65.23";
my $raidctl = '/usr/sbin/raidctl';
my $mptstatus = '/usr/sbin/mpt-status';
my $arcconf = '/usr/StorMan/arcconf';
my (@components,@command);
my ($num,$status,$sendtrap,$volume);
### If Solaris System use this check ###
if (`uname -a` =~ /SunOS/) {
    if ( -e $raidctl) {
        @command = `$raidctl -S`;
        foreach (@command) {
            chomp();
            @components = split();
            $num = $components[1] + 2;
            $status = "$components[$#components] - Controller: $components[0], RAID: $components[$num], Number of Disks: $components[1]\n";
            if ($_ =~ /DEGRADED/) {
                system "/bin/rm /tmp/$prefix.* >/dev/null 2>&1";
                system "/bin/touch /tmp/$prefix.CRITICAL";
                system "/usr/sfw/bin/snmptrap -v 2c -c $community $manager '' .1.3.6.1.4.1.11.2.17.1.0.1005 .1.3.6.1.4.1.11.2.17 s \"$servicechk\" .1.3.6.1.4.1.11.2.17 s \"$status\"";
                exit;
            } elsif ($_ =~ /SYNC/) {
                system "/bin/rm /tmp/$prefix.* >/dev/null 2>&1";
                system "/bin/touch /tmp/$prefix.WARNING";
                system "/usr/sfw/bin/snmptrap -v 2c -c $community $manager '' .1.3.6.1.4.1.11.2.17.1.0.1006 .1.3.6.1.4.1.11.2.17 s \"$servicechk\" .1.3.6.1.4.1.11.2.17 s \"$status\"";
                exit;
            } elsif ($_ =~ /OPTIMAL/) {
                if ( !-e "/tmp/$prefix.OK" ) {
                    system "/bin/rm /tmp/$prefix.* >/dev/null 2>&1";
                    system "/bin/touch /tmp/$prefix.OK";
                    system "/usr/sfw/bin/snmptrap -v 2c -c $community $manager '' .1.3.6.1.4.1.11.2.17.1.0.1007 .1.3.6.1.4.1.11.2.17 s \"$servicechk\" .1.3.6.1.4.1.11.2.17 s \"$status\"";
                }
            }
        }
    }
}
### If Linux system use this check  ###
if (`uname -a` =~ /Linux/) {
    # If system has LSI controller, then mptstatus should be installed
    if (-e $mptstatus) {
        my $modstatus = `/sbin/lsmod |grep mptctl|wc -l`;
        chomp($modstatus);
        if ($modstatus eq "0") {
            my $modload = `/sbin/modprobe mptctl`;
            $modstatus = `/sbin/lsmod |grep mptctl|wc -l`;
            chomp($modstatus);
            if ($modstatus eq "0") { print "ABORT: Failed to load mptctl module.\n";exit;}
        }
        my $controller = `$mptstatus -p -s|grep Found`;
        chomp($controller);
        my ($id,$junk) = split(/,/,$controller);
        $id =~ s/Found SCSI id=//g;
        @command = `$mptstatus -i $id -s`;
        $status="";
        foreach (@command) {
            chomp();
            $status = "$status $_";   
        }
        $status = "$status";
        #print "$status\n";
        foreach (@command) {
            chomp();
            if ( $_ =~ /DEGRADED/ ) {
                system "/bin/rm /tmp/$prefix.* >/dev/null 2>&1";
                system "/bin/touch /tmp/$prefix.CRITICAL";   
                system "/usr/bin/snmptrap -v 2c -c $community $manager '' .1.3.6.1.4.1.11.2.17.1.0.1005 .1.3.6.1.4.1.11.2.17 s \"$servicechk\" .1.3.6.1.4.1.11.2.17 s \"$status\"";
                exit;
            } elsif ($_ =~ /SYNC/ ) {
                system "/bin/rm /tmp/$prefix.* >/dev/null 2>&1";
                system "/bin/touch /tmp/$prefix.WARNING";
                system "/usr/bin/snmptrap -v 2c -c $community $manager '' .1.3.6.1.4.1.11.2.17.1.0.1006 .1.3.6.1.4.1.11.2.17 s \"$servicechk\" .1.3.6.1.4.1.11.2.17 s \"$status\"";
                exit;
            } elsif ($_ =~ /OPTIMAL/ ) {
                if ( !-e "/tmp/$prefix.OK" ) {
                    system "/bin/rm /tmp/$prefix.* >/dev/null 2>&1";
                    system "/bin/touch /tmp/$prefix.OK";
                    system "/usr/bin/snmptrap -v 2c -c $community $manager '' .1.3.6.1.4.1.11.2.17.1.0.1007 .1.3.6.1.4.1.11.2.17 s \"$servicechk\" .1.3.6.1.4.1.11.2.17 s \"$status\"";
                }
            }
        }
    }
    # if system has Adaptec controller then arcconf should be installed
    if ( -e $arcconf ) {
        @command = `$arcconf getconfig 1|grep Status|grep :`;
        foreach (@command) {
            if (( $_ =~ /Controller Status/ ) && ($_ !~ /Optimal/ )) {
                $status = "Controller not optimal";
                system "/bin/rm /tmp/$prefix.* >/dev/null 2>&1";
                system "/bin/touch /tmp/$prefix.CRITICAL";
                system "/usr/bin/snmptrap -v 2c -c $community $manager '' .1.3.6.1.4.1.11.2.17.1.0.1005 .1.3.6.1.4.1.11.2.17 s \"$servicechk\" .1.3.6.1.4.1.11.2.17 s \"$status\"";
                exit;
            }
            if (( $_ =~ /  Status  / ) && ($_ !~ /Optimal/ )) {
                $status = "Battery not optimal";
                system "/bin/rm /tmp/$prefix.* >/dev/null 2>&1";
                system "/bin/touch /tmp/$prefix.WARNING";
                system "/usr/bin/snmptrap -v 2c -c $community $manager '' .1.3.6.1.4.1.11.2.17.1.0.1006 .1.3.6.1.4.1.11.2.17 s \"$servicechk\" .1.3.6.1.4.1.11.2.17 s \"$status\"";
                exit;
            }
            if (( $_ =~ /Status of logical device/) && ($_ !~ /Optimal/ )) {
                $status = "Logical HW RAID Volume not optimal";
                system "/bin/rm /tmp/$prefix.* >/dev/null 2>&1";
                system "/bin/touch /tmp/$prefix.CRITICAL";
                system "/usr/bin/snmptrap -v 2c -c $community $manager '' .1.3.6.1.4.1.11.2.17.1.0.1005 .1.3.6.1.4.1.11.2.17 s \"$servicechk\" .1.3.6.1.4.1.11.2.17 s \"$status\"";
                exit;
            }
            if ( $_ =~ /Optimal/ ) {
                if ( !-e "/tmp/$prefix.OK" ) {
                    $status = "Hardware RAID - OK";
                    system "/bin/rm /tmp/$prefix.* >/dev/null 2>&1";
                    system "/bin/touch /tmp/$prefix.OK";
                    system "/usr/bin/snmptrap -v 2c -c $community $manager '' .1.3.6.1.4.1.11.2.17.1.0.1007 .1.3.6.1.4.1.11.2.17 s \"$servicechk\" .1.3.6.1.4.1.11.2.17 s \"$status\"";
                }
            }
        }
    }   
}
exit; 


Perl Script to Generate Logon.bat for SAMBA Users


The following script will generate a vanilla logon.bat file for a SAMBA user, mapping a drive letter to each group share the user belongs to.

#!/usr/bin/perl
#########################################
# Usage: smb-logon-script <username>    #
#########################################

$startpath="/data/smb-logon-scripts";
$endpath="/data/netlogon/scripts";
$smbhost = "sambahost.domain.com";


@alpha = ("g".."t","v".."z","aa".."zz");
if (! defined $ARGV[0] ) {
        print " Usage: smb-logon-script \n";
        exit;
}
$username = $ARGV[0];
@group = `/usr/bin/getent group|/bin/grep $username |/bin/cut -d: -f1 -`;
$counter=0;
open FILE, ">$startpath/$username.bat.unix";
foreach $group (@group) {
        print FILE "net use $alpha[$counter]: \\\\$smbhost\\$group";
        $counter++;
}
close (FILE);
$convert = `/usr/bin/unix2dos < $startpath/$username.bat.unix > $endpath/$username.bat`;
exit;
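
As a rough illustration, if a hypothetical user jdoe belonged to the groups finance and engineering, a run would produce a logon.bat along these lines (drive letters are handed out starting at g: per the @alpha list above):

# ./smb-logon-script jdoe
# cat /data/netlogon/scripts/jdoe.bat
net use g: \\sambahost.domain.com\finance
net use h: \\sambahost.domain.com\engineering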

OpenStack Neutron Distributed Virtual Routing Architectural Overview (Icehouse vs Juno)

Layer 3 Routing in Icehouse:


Layer 3 Routing in Juno with DVR:


Sun Microsystems SunFire V100

Here is a trip down memory lane today with the old SunFire V100.    


And under the cover:




Ceph Architectural Overview


Monday, December 29, 2014

Persistently Bind Tape Devices in Solaris via Perl

The following script will look for fiber channel tape devices and then configure the devlink.tab file with the appropriate entries so the tape drives persistently bind to the same device names across reboots on a Solaris server.   This script was tested on Solaris 10.

#!/usr/bin/perl
use strict;
my($junk,$path,$devices,$dev,$file);
my(@devices,@file);
my $date = `date +%m%d%Y`;
chomp($date);
$file = `/usr/bin/cp /etc/devlink.tab /etc/devlink.tab.$date`;
@file = `cat /etc/devlink.tab`;
@file = grep !/type=ddi_byte:tape/, @file;
open (FILE,">/etc/devlink.tab.new");
print FILE @file;
close (FILE);
 
@devices = `ls -l /dev/rmt/*cbn|awk {'print \$9 \$11'}`;
open (FILE,">>/etc/devlink.tab.new");
foreach $devices (@devices) {
        chomp($devices);
        ($dev,$path) = split(/\.\.\/\.\./,$devices);
        $dev =~ s/cbn//g;
        $dev =~ s/\/dev\/rmt\///g;
        $path =~ s/:cbn//g;
        ($junk,$path) = split(/st\@/,$path);
        print FILE "type=ddi_byte:tape;addr=$path;\trmt/$dev\\M0\n";
}
close (FILE);
$file = `/usr/bin/mv /etc/devlink.tab.new /etc/devlink.tab`;
exit;
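
For reference, the tape entries appended to /etc/devlink.tab end up looking roughly like the lines below; the addr values here are made-up examples, as yours will come from the st@ portion of each drive's /devices path:

type=ddi_byte:tape;addr=w500104f000b1c2d3,0;    rmt/0\M0
type=ddi_byte:tape;addr=w500104f000b1c2d4,0;    rmt/1\M0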

Ceph Repair One Liner

The following one liner will look for inconsistent placement groups (PGs) in a Ceph PG dump and repair them.    A nice quick way to fix up inconsistencies!

ceph pg dump | grep -i incons | cut -f 1 | while read i; do ceph pg repair ${i} ; done

Solaris LUN Online Report

If you are using fiber channel storage with Solaris in a multipath configuration, sometimes before fabric or array maintenance you might want to check and confirm the status of all the paths on the Solaris client.   The following script uses luxadm to report the status of each path for every fiber channel device.

#!/usr/bin/perl
@luns = `/usr/sbin/luxadm probe | grep Logical | sed -e 's/.*\://g'|grep rdsk`;
foreach $lun (@luns) {
        chomp($lun);
        $lun2 = $lun;
        $lun2 =~ s/\/dev\/rdsk\///g;
        print "Disk:$lun2\t";
        @luxadm = `/usr/sbin/luxadm display $lun`;
        $pathcount = 0;
        foreach $luxadm (@luxadm) {
                chomp($luxadm);
                if ($luxadm =~ /State/) {
                        $luxadm =~ s/State//g;
                        $luxadm =~ s/^\s+//;
                        print "Path$pathcount:$luxadm\t";
                        $pathcount++;
                }
        }
        print "\n";
}

The output from the script will look something like the output below:

#perl pathfinder.pl
Disk:c6t60060E80054337000000433700000526d0s2    Path0:ONLINE    Path1:ONLINE    Path2:ONLINE    Path3:ONLINE
Disk:c6t60060E80054337000000433700000527d0s2    Path0:ONLINE    Path1:ONLINE    Path2:ONLINE    Path3:ONLINE
Disk:c6t60060E80054337000000433700000301d0s2    Path0:ONLINE    Path1:ONLINE    Path2:ONLINE    Path3:ONLINE
Disk:c6t60060E80054337000000433700000300d0s2    Path0:ONLINE    Path1:ONLINE    Path2:ONLINE    Path3:ONLINE
Disk:c6t60060E80054337000000433700000278d0s2    Path0:ONLINE    Path1:ONLINE    Path2:ONLINE    Path3:ONLINE
Disk:c6t60060E80054337000000433700000277d0s2    Path0:ONLINE    Path1:ONLINE    Path2:ONLINE    Path3:ONLINE
Disk:c6t60060E80054337000000433700000275d0s2    Path0:ONLINE    Path1:ONLINE    Path2:ONLINE    Path3:ONLINE
#

Cleanup Shared Memory Segments Solaris

If you have ever used an application in Solaris that uses shared memory and that application has a tendency to not clean up those memory segments properly on shutdown (SAP comes to mind), then this little Perl script is what you have been waiting for.

All this script does is take certain field output from the ipcs command and then iterate through it to determine if the memory is still actively in use by a process or if it can safely be purged.   I recommend testing this out with the $memclean line commented out to gain a good understanding before you remove the comment and allow the cleanup (#$memclean = `ipcrm -m $shmem`;).  This script was tested on Solaris 10.


#!/usr/bin/perl
@sms = `ipcs -pm|grep "^m"|awk {'print \$2":"\$7":"\$8'}`;
foreach $sms (@sms) {
        chomp($sms);
        ($shmem,$cpid,$lpid) = split(/:/,$sms);
        $cpids=` ps -ef|grep $cpid|grep -v grep >/dev/null 2>&1;echo \$?`;
        $lpids=` ps -ef|grep $lpid|grep -v grep >/dev/null 2>&1;echo \$?`;
        chomp($cpids,$lpids);
        if (($cpids eq "1") && ($lpids eq "1")) {
                $message = "Memory can be reclaimed";
                #$memclean = `ipcrm -m $shmem`;
        } else {
                $message = "Memory active";
        }
        print "$shmem - $cpid - $lpid - $cpids - $lpids - $message\n";
}

The output from the script will look similar to the following:

# perl mem_clean.pl
587203562 - 10885 - 17891 - 0 - 0 - Memory active
922746991 - 9728 - 10885 - 0 - 0 - Memory active
150995435 - 9728 - 10885 - 0 - 0 - Memory active
150995432 - 9728 - 17891 - 0 - 0 - Memory active
150995398 - 17421 - 17891 - 1 - 0 - Memory active
150995396 - 13421 - 13891 - 1 - 1 - Memory can be reclaimed
150995387 - 9728 - 10885 - 0 - 0 - Memory active
150995382 - 9728 - 10885 - 0 - 0 - Memory active
150995380 - 9728 - 10885 - 0 - 0 - Memory active
150995379 - 9728 - 10885 - 0 - 0 - Memory active
150995377 - 9728 - 17891 - 0 - 0 - Memory active
150995374 - 9728 - 17891 - 0 - 0 - Memory active
150995371 - 9727 - 10886 - 1 - 0 - Memory active
117440821 - 9728 - 10885 - 0 - 0 - Memory active
117440819 - 9728 - 10885 - 0 - 0 - Memory active
#

Saturday, December 20, 2014

Cleaning Up OpenStack Instances in Redhat Satellite or Spacewalk

When using OpenStack with instances that I wanted registered in Redhat Satellite or Spacewalk, I was left wondering what would happen to all those registered hosts once the instances were terminated in OpenStack.

If I chose to do nothing, the answer was that I would be left with orphaned hosts in Redhat Satellite or Spacewalk.  Over time this could lead to higher license costs if leveraging support for Redhat Linux, or just pure database bloat from having all these previously used instances still referenced in my database.

This issue bothered me and I wanted a mechanism that would clean up instances once they were terminated, but the question was how to go about it.

Well, I soon realized that OpenStack keeps a record of all the instances it ever created and/or terminated.  It was the terminated part that would be a key component of what I wanted to accomplish.  I figured if I could mine out the deleted instance data, I could cross-check it against Redhat Satellite or Spacewalk.

The Perl script below does just that.   I have it run every 24 hours out of cron.  It first goes into the OpenStack nova database and scrapes the instances table for any instances that were marked deleted in the last 24 hours.   Any instances it finds are put into an array, which I then enumerate through using the spacecmd tools, checking within Satellite or Spacewalk to see if each host is registered.  If the host is registered, I remove it, given that it is no longer a valid host that is up and running.

#!/usr/bin/perl
$cmd=`rm -r -f /root/.spacecmd/spacewalk.schmaustech.com`;
# TZ offset trick to get yesterday's date; the two-digit year (%y-%m-%d) still
# matches the YYYY-MM-DD deleted_at timestamps below as a substring
$yestdate=`TZ=CDT+24 /bin/date +%y-%m-%d`;
#$yestdate=`TZ=CDT /bin/date +%Y-%m-%d`;
chomp($yestdate);
@delhosts=`mysql -e "select hostname,uuid,deleted_at from nova.instances where deleted_at is not null"|grep $yestdate`;
foreach $delhost (@delhosts) {
        ($hostname,$uuid) = split(/\s+/,$delhost);
        $uuid2 = $uuid;
        $uuid2 =~ s/-//g;
        @cmdout=`spacecmd -q system_details $hostname.schmaustech.com`;
        foreach $cmd (@cmdout) {
                chomp($cmd);
                if ($cmd =~ /$uuid2/) {
                        $message = "Removing from satellite hostname: $hostname with UUID: $uuid...\n";
                        $cmdtmp = `logger $message`;
                        $cmdtmp = `spacecmd -y -q system_delete $hostname.schmaustech.com`;
                }
        }
}
exit;
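
Since the script only looks at instances deleted in the previous day, a simple daily cron entry is all it needs; the path below is just an example of where the script might live:

0 1 * * * /usr/bin/perl /root/scripts/satellite-cleanup.pl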

Configuring DVR in OpenStack Juno

Before Juno, when deploying OpenStack in production, there was always a painful point: the single l3-agent node, which caused two issues, a performance bottleneck and a single point of failure (albeit there were some non-standard ways around this).   Juno now comes with new Neutron features that provide an HA L3-agent and Distributed Virtual Routing (DVR).

DVR distributes East-West traffic via virtual routers running on the compute nodes. Virtual routers on the compute nodes also handle North-South floating IP traffic locally for VMs running on the same node. However, if a floating IP is not in use, VM-originated external SNAT traffic is still handled centrally by the virtual router on the controller/network node.  Together these changes spread the network traffic load across your compute nodes and your network controller nodes.

The HA L3 agent provides virtual router HA via VRRP. A virtual gateway IP is always available from one of the controller/network nodes, thus eliminating the single point of failure.

The following post walks through a complete DVR configuration in Juno.   In this example we used RHEL7 with Redhat’s RDO for Juno.

The host configuration is three nodes: one management node and two compute nodes.   Each node has a data interface for access to the node itself and a bridge interface for the floating-ip network that allows instances access outside of their private subnet to the physical network.

I ran through a standard packstack install, specifying GRE tunnels for connectivity between my management and compute nodes.  Be aware that the current version of DVR only supports GRE or VXLAN tunnels; VLANs are not yet supported.    I then configured the setup as if I were using standard neutron networking for a multi-tenant setup, that is, all my instances would route traffic through the l3-agent running on the management node (similar to the behavior in Icehouse and Havana).  Once I confirmed this legacy setup was working, I moved on to changing it to use DVR on the compute nodes.


On the management node where the neutron server runs edit the following files: neutron.conf, l3_agent.ini, ml2_conf.ini and ovs_neutron_plugin.ini

In /etc/neutron/neutron.conf

Edit the lines to state the following by either adding or uncommenting them:

router_distributed = True
dvr_base_mac = fa:16:3f:00:00:00

Note:  When creating a router as admin, one can override distributed routing by using the following flag:  "--distributed False"
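
For example, with admin credentials sourced, a legacy centralized router can still be created explicitly (the router name here is just an example), while omitting the flag picks up the router_distributed default:

neutron router-create --distributed False legacy-router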

In /etc/neutron/l3_agent.ini

Edit the line to state the following:

agent_mode = dvr_snat

Note:  This will provide the SNAT translation for any instances that do not get assigned a floating-ip.  They will route through the central l3-agent on the management node if they need outside access, but will not have a floating-ip associated.  Given that the l3-agent on the management node can be HA in Juno, this will still not be a single point of failure; however, we are not covering that topic in this article.

In /etc/neutron/plugins/ml2/ml2_conf.ini

Edit the line to state the following:

mechanism_drivers = openvswitch, l2population

In /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

Edit or add the lines to state the following:

l2_population = True
enable_distributed_routing = True

On each of the compute nodes do the following steps:

Make the ml2 plugin directory, copy over the ml2_conf.ini from the neutron node, and set up a softlink:

mkdir /etc/neutron/plugins/ml2
rsync -av root@ctl1:/etc/neutron/plugins/ml2 /etc/neutron/plugins
cd /etc/neutron
ln -s plugins/ml2/ml2_conf.ini plugin.ini

Copy over the metadata_agent.ini from the neutron server node:

rsync -av root@ctl1:/etc/neutron/metadata_agent.ini /etc/neutron

In /etc/neutron/l3_agent.ini

Edit the line to state the following:

agent_mode = dvr

In /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini

Edit or add the lines to state the following:

l2_population = True
enable_distributed_routing = True

One final step on the compute node is to associate the br-ex bridge with the physical interface on the compute node that will bridge the floating-ip’s to the physical VLAN.

ovs-vsctl add-port br-ex <physical-interface>

Restart the openstack services on the management node.

Restart the openstack services on the compute node as well.  Also ensure you start the l3-agent and metadata service on the compute node.
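
On an RDO/RHEL7 install, the restarts look roughly like the following; the unit names may vary slightly between distributions:

# On the management node:
systemctl restart neutron-server neutron-l3-agent neutron-metadata-agent neutron-openvswitch-agent

# On each compute node:
systemctl restart neutron-openvswitch-agent
systemctl enable neutron-l3-agent neutron-metadata-agent
systemctl start neutron-l3-agent neutron-metadata-agent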

If you plan on using Horizon to spin up instances and associate floating-ip’s, you will need to make the following edit in the Horizon code, as there is a bug:  https://bugs.launchpad.net/horizon/+bug/1388305.   Without the code update you will not see a list of valid ports to associate the floating-ip with on the instance.  The association does work from the CLI, however, without this modification.

Edit the following file:  /usr/share/openstack-dashboard/openstack_dashboard/api/neutron.py

Find the line:

p.device_owner == 'network:router_interface'

And replace it with:

p.device_owner == 'network:router_interface'   or p.device_owner == 'network:router_interface_distributed'

Restart the httpd service.

Once you have followed the steps above, you should be able to spin up an instance and associate a floating-ip with it, and that instance will be accessible via the compute node l3-agent.   You can confirm that the proper namespaces are set up by running the following on the compute node:

ip netns

fip-4a7697ba-c29c-4a19-9b92-2a9194e1d6de
qrouter-6b4a2758-3aa7-4603-9fcd-f86f05d0c62

The fip namespace handles the floating-ip traffic and the qrouter namespace is just like the namespaces previously seen on a network management node.  You can use ip netns exec commands to explore those namespaces and further troubleshoot should the configuration not be working.
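
For example, to look at the interfaces and routes inside the qrouter namespace shown above:

ip netns exec qrouter-6b4a2758-3aa7-4603-9fcd-f86f05d0c62 ip addr
ip netns exec qrouter-6b4a2758-3aa7-4603-9fcd-f86f05d0c62 ip route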

Another way to confirm that traffic is reaching your instance directly on the compute node is to use tcpdump to sniff the physical interface that bridges to the physical network for the floating-ip network.  While tcpdump is running, ping your instance from another host somewhere on your network and you will see the packets in the tcpdump output.
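
A rough example, assuming the bridged interface on the compute node is eth1 and the instance's floating-ip is 10.0.0.50 (both placeholders here):

tcpdump -nn -i eth1 icmp and host 10.0.0.50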

DVR promises to provide a convenient way of distributing network traffic load to the compute nodes that host the instances, and it helps to alleviate the bottleneck at the neutron management node.

Thursday, November 27, 2014

Lookup Tenant of Floating IP Address in OpenStack

Let's say your security team is doing routine scanning and they find that a few of the OpenStack instances running in your cloud are not passing the security test.  What do you do?

You whip up a quick and dirty bash script that takes the floating IP address as an argument and returns the name of the tenant that the IP address belongs to:

#!/bin/bash
FLOAT=`neutron floatingip-list |grep $1|awk -F '|' {'print $2'}`
TENANT=`neutron floatingip-show $FLOAT|grep tenant|awk -F '|' {'print $3'}`
keystone tenant-get $TENANT

Sample run:

 
# ./float2tenant.sh 10.63.10.193
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description | This is a sample project         |
|   enabled   |               True               |
|      id     | 981690ddbe5347bda5c73415134d6664 |
|     name    |            Project 1             |
+-------------+----------------------------------+

Friday, May 16, 2014

Faking Out Ceph-Deploy in OpenStack

I wanted to build a functional Ceph deployment for testing but did not have hardware to use, so I decided I would use my instances in OpenStack.   The image I used for this configuration was the stock RHEL 6.5 cloud image from Redhat.   However, when I went to do a ceph-deploy install on my monitor server, I ran into this:

[root@ceph-mon ceph]# ceph-deploy install ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.2): /usr/bin/ceph-deploy install ceph-mon
[ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster ceph hosts ceph-mon
[ceph_deploy.install][DEBUG ] Detecting platform for host ceph-mon ...
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph_deploy][ERROR ] UnsupportedPlatform: Platform is not supported:

It didn't really say what platform it thought it had found that was unsupported, but I knew that Redhat 6.5 was supported, so it really did not make any sense.   What I discovered, though, was that the following file was missing from my cloud image:

/etc/redhat-release

So I manually added it:

vi /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)

Then when I reran ceph-deploy it detected a supported platform:
 
[root@ceph-mon ceph]# ceph-deploy install ceph-mon
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.2): /usr/bin/ceph-deploy install ceph-mon
[ceph_deploy.install][DEBUG ] Installing stable version firefly on cluster ceph hosts ceph-mon
[ceph_deploy.install][DEBUG ] Detecting platform for host ceph-mon ...
[ceph-mon][DEBUG ] connected to host: ceph-mon
[ceph-mon][DEBUG ] detect platform information from remote host
[ceph-mon][DEBUG ] detect machine type
[ceph_deploy.install][INFO  ] Distro info: Red Hat Enterprise Linux Server 6.5 Santiago
[ceph-mon][INFO  ] installing ceph on ceph-mon
[ceph-mon][INFO  ] Running command: yum clean all


 

Cleaning Up Expired Tokens in OpenStack Keystone

Keystone is the OpenStack project that provides Identity, Token, Catalog and Policy services for use specifically by projects in the OpenStack family.  When a client obtains a token from Keystone, that token has a validity period before it expires.  However, even after it is marked expired, it is kept in OpenStack's MySQL database.  This can create issues if your environment is handing out a lot of tokens, since it causes the token table to keep growing.

To prevent this unbounded growth, you can run the following command from cron to clean up the expired tokens in the MySQL DB:

keystone-manage token_flush
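
For example, a root crontab entry along these lines would flush expired tokens once a day (the time of day is arbitrary):

0 4 * * * /usr/bin/keystone-manage token_flush >/dev/null 2>&1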

Thursday, May 15, 2014

OpenStack Cinder: VNX Volume Cleanup

I recently had an issue where multiple Cinder volumes in OpenStack (Havana) were stuck in a deleting state.   Unfortunately, the trick of putting them back into an Available state and trying to delete them again did not work.    However, I was able to come up with a solution to get the job completed and restore consistency.

In my example, my Cinder volumes were being provisioned on an EMC VNX, so the first step was to validate whether the volumes themselves had been removed from the VNX.

Cleanup on VNX:

1) Obtain the volume ID from either OpenStack Dashboard and/or CLI.
2) Log into Unisphere on the VNX that contains the volume pool where the volumes/luns for Cinder are being provisioned.
3) Select the volume pool and show the luns associated with the volume pool.
4) Filter on the luns using the volume ID obtained in step one.
5) Delete the lun.

Now that we have removed the reference on the VNX, we can continue with the cleanup on the OpenStack side within the database itself.  This involves editing three tables in the cinder MySQL database.

1) Obtain the volume ID from either the OpenStack Dashboard and/or the CLI.   Make note of the volume size as well.   You will also need to obtain the project/tenant ID that the volume belongs to.
2) Login to the OpenStack management controller that runs the MySQL DB.
3) Run the mysql command to access mysql.  Note your deployment may require password and hostname.
4) Select the cinder database with the following syntax:

 mysql>use cinder;

5) Next check if the volume id resides in the volume_admin_metadata table:

mysql> select * from volume_admin_metadata where volume_id="";

6) Delete the volume id if it does:

 mysql> delete from volume_admin_metadata where volume_id="";

7) Next check if the volume id resides in the volumes table:

 mysql> select * from volumes where id="";

8) Delete the volume id if it does:

mysql> delete from volumes where id="";

9) Next update the quota_usages table and reduce the values for the quota_usages fields for that project.  First get a listing to see where things are at:

mysql> select * from quota_usages where project_id="";

10) Then execute the update.  Depending on your setup, you will have to update multiple rows based on the output in step 9.  In the example below, since I was clearing out all volumes for a given project/tenant, I was able to get away with the following update:

mysql> update quota_usages set in_use='0' where project_id="";

However, in cases where you are removing just one volume, you will need to specify the project_id and resource type in the WHERE clause of your MySQL statement so you update the right in_use row.  Further, the new in_use value will be either the current number of GBs minus the removed volume's GBs, or the current number of volumes minus one, as shown below.
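
For example, removing a single 50GB volume would mean decrementing both the volumes and gigabytes rows for that project, roughly as follows (50 is just the example size, and the project_id is left blank to fill in):

mysql> update quota_usages set in_use=in_use-1 where project_id="" and resource="volumes";
mysql> update quota_usages set in_use=in_use-50 where project_id="" and resource="gigabytes";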

Once I completed this, my system was back in sync and the volumes stuck in Deleting status were gone.

Sunday, May 11, 2014

OpenStack Havana - Recovering Services Account When Deleted

I was working with one of my colleagues who had accidentally deleted the services account within OpenStack.   Unfortunately, if this happens, it tends to break your setup in a really big way.   Opening a case with Redhat, whose OpenStack distribution we were using, led to no results, but I managed to reverse engineer where the services account resided and reestablish it.  Here are the steps I took:

Symptoms:

1) In the web GUI the user gets "Oops something went wrong!" when trying to log in.   The user can get a valid token at the command line (keystone token-get) but authorization fails.
2) openstack-status shows the following:

                == Glance images ==
                Request returned failure status.
                Invalid OpenStack Identity credentials.
                ==Nova managed services ==
                ERROR: Unauthorized (HTTP 401)
                == Nova networks ==
                ERROR: Unauthorized (HTTP 401)
                == Nova instance flavors ==
                ERROR: Unauthorized (HTTP 401)
                == Nova instances ==
                ERROR: Unauthorized (HTTP 401)

Resolution:

Create New Services Project:

Create new "services" project/tenant via CLI (keystone tenant-create).
Obtain new "services" project/tenant ID via CLI (keystone tenant-list).
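
Roughly, with admin credentials sourced, that amounts to:

keystone tenant-create --name services --description "OpenStack services tenant"
keystone tenant-list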

Determine OLD_SERVICES_ID:

Determine the old project/tenant ID of the services project by looking at the default_project_id of the following users (nova, glance, neutron, heat, cinder) in the user table of the keystone database.   Their default_project_id should all be the same and is the ID of the previous services project that was removed, as shown in the query below.
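
A query along these lines against the keystone database will show that stale ID:

use keystone;
select name, default_project_id from user where name in ('nova','glance','neutron','heat','cinder');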


Edit MySQL Database:

use keystone;
update user set default_project_id="NEW_SERVICES_ID" where default_project_id="OLD_SERVICES_ID";
use ovs_neutron;
update networks set tenant_id="NEW_SERVICES_ID" where tenant_id="OLD_SERVICES_ID";
update subnets set tenant_id="NEW_SERVICES_ID" where tenant_id="OLD_SERVICES_ID";
update securitygroups set tenant_id="NEW_SERVICES_ID" where tenant_id="OLD_SERVICES_ID";