Sunday, June 28, 2015

Resolving unreach or unavail nodes in OpenLava-3.0

After configuring OpenLava-3.0 from the tarball by following the instructions in the OpenLava – Getting Started Guide, and after fixing the errors described in LM is Down Error Messages for OpenLava-3.0, you may still see a host reported as unreach or unavail in the bhosts output:

HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
compute-c00        unreach         -     16      0      0      0      0      0
headnode-h00       ok              -     16      0      0      0      0      0

Suggestions:
  1. Check the permissions on the directory where openlava-3.0 resides. Make sure the user and group openlava exist on both the HeadNode and the ComputeNode, and that openlava has permission on the folder:
    drwxr-xr-x. 10 openlava openlava 4096 Jun 26 00:32 openlava-3.0
  2. Install pdsh on all the compute nodes (see Installing pdsh to issue commands to a group of nodes in parallel in CentOS), then use pdcp to copy /etc/passwd, /etc/shadow and /etc/group to all the nodes:
    # pdcp -a /etc/passwd /etc
    # pdcp -a /etc/shadow /etc
    # pdcp -a /etc/group /etc
  3. Make sure /etc/hosts on both the HeadNode and the ComputeNode lists the short hostname of every node in the cluster. Refrain from putting 2 hostnames on one line.
  4. Check your firewall settings. Make sure ports 6322:6325 are open.
  5. Ensure the clocks on the clients and the HeadNode are synchronized with the designated NTP Server. If the NTP
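
Suggestions 1 and 3 can be checked quickly with a couple of small shell helpers. This is only a rough sketch, not part of OpenLava: the function names multi_name_hosts and owned_by are made up for illustration, and owned_by assumes GNU stat (as shipped with CentOS).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of OpenLava.

# Print every hosts-file line that maps one IP address to more than one name.
# Comments and blank lines are ignored.
multi_name_hosts() {
    # $1: path to a hosts file (defaults to /etc/hosts)
    sed 's/#.*//' "${1:-/etc/hosts}" | awk 'NF > 2 { print }'
}

# Succeed only if a directory is owned by the expected user and group.
# Relies on GNU stat's -c option.
owned_by() {
    # $1: directory, $2: expected owner written as "user:group"
    [ "$(stat -c '%U:%G' "$1")" = "$2" ]
}
```

Running multi_name_hosts with no argument on each node lists the lines in /etc/hosts that carry 2 or more hostnames; the usual localhost entries will show up as well and can be ignored. Something like owned_by /opt/openlava-3.0 openlava:openlava || echo "check ownership" covers suggestion 1, where /opt/openlava-3.0 is an assumed install path that should be replaced with wherever your openlava-3.0 folder actually lives.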
