Release: Enterprise Chef 11.2.1

Enterprise Chef 11.2.1 is a critical bug-fix release for customers who installed Enterprise Chef 11.2.0. It corrects a single defect experienced by customers who upgraded from earlier releases.

Bug Fixes:

  • Fixes an issue where private-chef was unexpectedly changed to private_chef in upstart/runit configuration files

Notes:

If you upgraded to EC 11.2.0 from an earlier release, your servers may now have two runsvdir jobs configured in upstart:

  1. /etc/init/private-chef-runsvdir.conf
  2. /etc/init/private_chef-runsvdir.conf

The second file is incorrect; it was introduced by the EC 11.2.0 issue described above. In this state, you will see two runsvdir processes running, both reporting many errors:

ps:

root       924     1  0 05:20 ?        00:00:00 runsvdir -P /opt/opscode/service log: /lock: temporary failure runsv oc_id: fatal: unable to lock supervise/lock: temporary failure runsv couchdb: fatal: unable to lock supervise/lock: temporary failure runsv bookshelf: fatal: unable to lock supervise/lock: temporary failure runsv postgresql: fatal: unable to lock supervise/lock: temporary failure runsv opscode-certificate: fatal: unable to lock supervise/lock: temporary failure 
root       926     1  0 05:20 ?        00:00:00 runsvdir -P /opt/opscode/service log: ry failure runsv opscode-expander: fatal: unable to lock supervise/lock: temporary failure runsv opscode-solr: fatal: unable to lock supervise/lock: temporary failure runsv rabbitmq: fatal: unable to lock supervise/lock: temporary failure runsv oc_bifrost: fatal: unable to lock supervise/lock: temporary failure runsv opscode-chef-mover: fatal: unable to lock supervise/lock: temporary failure 
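To check whether a given server is in this state, the two symptoms above (the errant upstart file and more than one runsvdir process) can be tested from a script. This is a minimal sketch, assuming POSIX sh and the file path and process title shown above; count_runsvdir and classify are hypothetical helper names, not part of any Chef tool.

```shell
#!/bin/sh
# Sketch, not an official Chef tool: report whether this server shows the
# duplicate-runsvdir condition left behind by EC 11.2.0. Run as root.

ERRANT_CONF=/etc/init/private_chef-runsvdir.conf

# Reads `ps -ef` output on stdin and prints how many runsvdir processes
# supervise /opt/opscode/service. The [r] bracket keeps grep from counting
# its own command line; "|| true" keeps a zero count from failing the call.
count_runsvdir() {
  grep -c '[r]unsvdir -P /opt/opscode/service' || true
}

# $1: path of the errant upstart job file, $2: runsvdir process count.
# Prints "affected" and returns 1 if either symptom is present, else prints "ok".
classify() {
  if [ -e "$1" ] || [ "$2" -gt 1 ]; then
    echo affected
    return 1
  fi
  echo ok
}

classify "$ERRANT_CONF" "$(ps -ef | count_runsvdir)"
```

Run it as root on each backend and frontend; a non-zero exit status suggests the cleanup procedure applies to that server.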

Correcting the error:

HA

  1. On both the active/bootstrap and standby backends, remove the errant runsvdir config file:
    root@backend1# rm -f /etc/init/private_chef-runsvdir.conf
    root@backend2# rm -f /etc/init/private_chef-runsvdir.conf
  2. On the standby (non-bootstrap) backend, reboot the server to clear any remaining orphaned processes and bring runsvdir back to a working state:
    root@backend2# init 6
  3. On the standby backend, verify that there is only a single runsvdir process and that its log is error-free (all dots):
    root@backend2# ps -ef | grep '[r]unsvdir -P /opt/opscode/service'
    root       921     1  0 05:35 ?        00:00:00 runsvdir -P /opt/opscode/service log: ...........................................................................................................................................................................................................................................................................................................................................................................................................
    
    root@backend2# private-chef-ctl ha-status
    [OK] keepalived HA services enabled.
    [OK] DRBD disk replication enabled.
    [OK] DRBD partition /dev/opscode/drbd found.
    [OK] DRBD device /dev/drbd0 found.
    [OK] cluster status = backup
    [OK] did not find VIP IP address and I am not master
    [OK] found VRRP communications interface eth0
    [OK] my DRBD status is Connected/Secondary/UpToDate and I am not master
    [OK] my DRBD partition is not mounted and I am not master
    [OK] DRBD primary IP address pings
    [OK] DRBD secondary IP address pings
    [OK] bookshelf is not running, and I am not master.
    [OK] couchdb is not running, and I am not master.
    [OK] keepalived is running.
    [OK] nginx is not running, and I am not master.
    [OK] oc_bifrost is not running, and I am not master.
    [OK] oc_id is not running, and I am not master.
    [OK] opscode-account is not running, and I am not master.
    [OK] opscode-certificate is not running, and I am not master.
    [OK] opscode-erchef is not running, and I am not master.
    [OK] opscode-expander is not running, and I am not master.
    [OK] opscode-expander-reindexer is not running, and I am not master.
    [OK] opscode-org-creator is not running, and I am not master.
    [OK] opscode-solr is not running, and I am not master.
    [OK] opscode-webui is not running, and I am not master.
    [OK] postgresql is not running, and I am not master.
    [OK] rabbitmq is not running, and I am not master.
    [OK] redis_lb is not running, and I am not master.
    
    [OK] all checks passed. 
    
  4. On the active/bootstrap backend, trigger a failover by stopping keepalived, then reboot:
    root@backend1# private-chef-ctl stop keepalived
    ok: down: keepalived: 1s, normally up
    root@backend1# sleep 30
    root@backend1# init 6
    
  5. On the bootstrap backend (now the standby), verify that there is only a single runsvdir process and that its log is error-free (all dots):
    root@backend1# ps -ef | grep '[r]unsvdir -P /opt/opscode/service'
    root       921     1  0 05:35 ?        00:00:00 runsvdir -P /opt/opscode/service log: ...........................................................................................................................................................................................................................................................................................................................................................................................................
    
  6. On the active (non-bootstrap) backend, trigger another failover back to the bootstrap backend:
    root@backend2# private-chef-ctl restart keepalived
  7. Test the now-active bootstrap backend to ensure full functionality (note: you may need to point your api_fqdn at localhost in the server’s /etc/hosts file):
    root@backend1# private-chef-ctl ha-status
    [OK] keepalived HA services enabled.
    [OK] DRBD disk replication enabled.
    [OK] DRBD partition /dev/opscode/drbd found.
    [OK] DRBD device /dev/drbd0 found.
    [OK] cluster status = master
    [OK] found VIP IP address and I am master
    [OK] found VRRP communications interface eth0
    [OK] my DRBD status is Connected/Primary/UpToDate and I am master
    [OK] my DRBD partition is mounted and I am master
    [OK] DRBD primary IP address pings
    [OK] DRBD secondary IP address pings
    [OK] bookshelf is running correctly, and I am master.
    [OK] couchdb is running correctly, and I am master.
    [OK] keepalived is running.
    [OK] nginx is running correctly, and I am master.
    [OK] oc_bifrost is running correctly, and I am master.
    [OK] oc_id is running correctly, and I am master.
    [OK] opscode-account is running correctly, and I am master.
    [OK] opscode-certificate is running correctly, and I am master.
    [OK] opscode-chef-mover is running.
    [OK] opscode-erchef is running correctly, and I am master.
    [OK] opscode-expander is running correctly, and I am master.
    [OK] opscode-expander-reindexer is running correctly, and I am master.
    [OK] opscode-org-creator is running correctly, and I am master.
    [OK] opscode-solr is running correctly, and I am master.
    [OK] opscode-webui is running correctly, and I am master.
    [OK] postgresql is running correctly, and I am master.
    [OK] rabbitmq is running correctly, and I am master.
    [OK] redis_lb is running correctly, and I am master.
     
    [OK] all checks passed.
     
    root@backend1# private-chef-ctl test
    ...
    Finished in 1 minute 23.67 seconds
    116 examples, 0 failures, 3 pending
    
    • Note: pending examples (the "3 pending" above) are OK.
    • Note: this command may fail on the first attempt after a failover; please contact support if it continues to fail.
  8. On your frontends, follow the Standalone procedure below.
  9. Upgrade to Enterprise Chef 11.2.1 following the normal procedure.
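The visual check in steps 3 and 5 (exactly one runsvdir line whose log field is all dots) can also be scripted, which helps when comparing several servers. This is a minimal sketch, assuming POSIX sh and the ps process title format shown above; runsvdir_healthy is a hypothetical helper name, not part of private-chef-ctl.

```shell
#!/bin/sh
# Hypothetical helper, not part of private-chef-ctl.
# Reads `ps -ef` output on stdin and returns 0 only if exactly one runsvdir
# supervises /opt/opscode/service and its log field is nothing but dots.
runsvdir_healthy() {
  # Keep only runsvdir lines; [r] stops grep from matching its own entry in live ps output.
  matches=$(grep '[r]unsvdir -P /opt/opscode/service' || true)
  [ -n "$matches" ] || return 1
  # A second matching line means the duplicate runsvdir is still running.
  [ "$(printf '%s\n' "$matches" | grep -c .)" -eq 1 ] || return 1
  # runsvdir shows recent service errors after "log: "; a clean log is all dots.
  log_field=${matches#*log: }
  if printf '%s\n' "$log_field" | grep -q '[^. ]'; then
    return 1
  fi
  return 0
}

if ps -ef | runsvdir_healthy; then
  echo "runsvdir OK: single process, clean log"
else
  echo "runsvdir NOT OK: check the ps output and repeat the cleanup"
fi
```

The same check applies to step 5 of the Standalone procedure. It only inspects the process table; private-chef-ctl ha-status and private-chef-ctl test remain the authoritative functional checks.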

Standalone

  1. Stop the errant runsvdir process:
    # initctl status private_chef-runsvdir
    private_chef-runsvdir start/running, process 926
     
    # initctl stop private_chef-runsvdir
    private_chef-runsvdir stop/waiting
    
  2. Remove the errant runsvdir config file:
    # rm -f /etc/init/private_chef-runsvdir.conf
    
  3. Stop all private-chef services:
    # private-chef-ctl stop
    
  4. Reboot the server to clear any remaining orphaned processes and bring runsvdir back to a working state:
    # init 6
  5. Verify that there is only a single runsvdir process and that its log is error-free (all dots):
    # ps -ef | grep '[r]unsvdir -P /opt/opscode/service'
    root       921     1  0 05:35 ?        00:00:00 runsvdir -P /opt/opscode/service log: ...........................................................................................................................................................................................................................................................................................................................................................................................................
    
  6. Test your system to ensure full functionality (note: you may need to point your api_fqdn at localhost in the server’s /etc/hosts file):
    # private-chef-ctl test
    ...
    Finished in 1 minute 23.67 seconds
    116 examples, 0 failures, 3 pending
    

    Note: pending examples (the "3 pending" above) are OK.
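Standalone steps 1 through 4 can be collected into one script so that no step is skipped on a standalone server or frontend. This is a minimal sketch, not an official Chef tool: it assumes upstart's initctl and private-chef-ctl are on the PATH and that it runs as root, and the run wrapper and DRY_RUN variable are hypothetical conveniences that only print the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of Standalone steps 1-4, not an official Chef script.
# Dry run by default: set DRY_RUN=0 to actually execute the commands.
: "${DRY_RUN:=1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

fix_standalone() {
  # Step 1: stop the errant upstart job. initctl reports an error if it is
  # already stopped; the script has no `set -e`, so it carries on.
  run initctl stop private_chef-runsvdir
  # Step 2: remove the errant job definition so it is not started at boot.
  run rm -f /etc/init/private_chef-runsvdir.conf
  # Step 3: stop all private-chef services.
  run private-chef-ctl stop
  # Step 4: reboot to clear orphaned processes and restart runsvdir cleanly.
  run init 6
}

fix_standalone
```

Review the dry-run output first, then rerun with DRY_RUN=0. After the reboot, steps 5 and 6 (checking runsvdir and running private-chef-ctl test) still need to be done by hand.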

Irving Popovetsky

Irving leads the Customer Engineering team at Chef.