createLocator did not throw an exception #341

Open
hcbfmbm opened this issue Nov 29, 2017 · 7 comments
Comments

hcbfmbm commented Nov 29, 2017

With SnappyData 0.9, the locator failed to start, and we got this message:
Starting Thrift server for SnappyData at address snappydata-locator-0/10.244.7.213[1527]
java.lang.Exception: No available status. Either status file ".snappylocator.ser" is not readable or reading the status file timed out.
    at com.gemstone.gemfire.internal.cache.CacheServerLauncher.waitForRunning(CacheServerLauncher.java:1489)
    at com.gemstone.gemfire.internal.cache.CacheServerLauncher.start(CacheServerLauncher.java:659)
    at com.pivotal.gemfirexd.tools.internal.GfxdServerLauncher.run(GfxdServerLauncher.java:796)
    at io.snappydata.tools.LocatorLauncher.run(GfxdLauncherOverrides.scala:73)
    at io.snappydata.tools.LocatorLauncher$.main(GfxdLauncherOverrides.scala:90)
    at io.snappydata.tools.LocatorLauncher.main(GfxdLauncherOverrides.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.gemstone.gemfire.internal.GemFireUtilLauncher.invoke(GemFireUtilLauncher.java:176)
    at com.pivotal.gemfirexd.tools.GfxdUtilLauncher.invoke(GfxdUtilLauncher.java:257)
    at io.snappydata.tools.SnappyUtilLauncher.invoke(SnappyUtilLauncher.scala:91)
    at io.snappydata.tools.SnappyUtilLauncher$.main(SnappyUtilLauncher.scala:137)
    at io.snappydata.tools.SnappyUtilLauncher.main(SnappyUtilLauncher.scala)

The reason is that the method "createLocator" in InternalLocator.java failed, so ".snappylocator.ser" was never generated, and "waitForRunning" in CacheServerLauncher.java then reported that ".snappylocator.ser" is not readable.
But we do not know why "createLocator" failed, because the method did not throw an exception. Could someone help by making "createLocator" throw an exception when it fails?
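
For illustration, the symptom described here (a launcher waiting for a status file that never appears) can be checked by hand from inside the container. The sketch below is only a rough stand-in for that wait, not SnappyData code: the function name `wait_for_status_file`, the one-second poll interval, and the default timeout are assumptions made for this example. Only the file name ".snappylocator.ser" comes from the error message.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of SnappyData: poll a working directory for
# the status file named in the error message, and list what the failed start
# left behind if the file never becomes readable.
wait_for_status_file() {
  local dir="$1" timeout="${2:-30}" waited=0
  local status_file="$dir/.snappylocator.ser"
  while [ "$waited" -lt "$timeout" ]; do
    if [ -r "$status_file" ]; then
      echo "status file present: $status_file"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "no readable status file in $dir after ${timeout}s" >&2
  # Show whatever exists in the directory (possibly nothing at all).
  ls -la "$dir" >&2
  return 1
}
```

With the working directory from this report it would be called as `wait_for_status_file /opt/snappydata/wd/locator 60`. A non-zero return only confirms that the file is missing or unreadable; it says nothing about why "createLocator" failed.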

sumwale commented Nov 29, 2017

@hcbfmbm Can you attach the locator logs?

hcbfmbm commented Nov 30, 2017

@sumwale The message above is the only information we could get. We start the locator with Kubernetes and Docker, and the message comes from the Docker logs. Because the locator failed to start, there is nothing in "start_snappylocator.log".

sumwale commented Nov 30, 2017

@hcbfmbm And there is no "snappylocator.log" either? Can you share the full conf/locators file used to start it? @ashetkar @dshirish have either of you tried running SnappyData on Kubernetes, and are there any known pitfalls?

@dshirish

@hcbfmbm
How are you launching the cluster? Could you share your yaml configuration files?

hcbfmbm commented Dec 1, 2017

@sumwale No "snappylocator.log" was generated. @dshirish the YAML is:

kind: StatefulSet
metadata:
  name: snappydata-locator
spec:
  serviceName: "snappydata-locator"
  replicas: 1
  template:
    metadata:
      labels:
        app: snappydata-locator
    spec:
      nodeSelector:
        snappydata-locator: locator
      containers:
      - name: snappydata-locator
        # Runs the current snappydata release
        image: ***.com/sensebd/sensesearch-cassandra:1.1.0.0-snappy-0.9
        imagePullPolicy: Always
        resources:
          requests:
            memory: "1024Mi"
            cpu: "200m"
        env:
          - name: MY_NODE_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
        ports:
        - containerPort: 9990
          name: locator
        - containerPort: 1527
          name: jdbc
#        livenessProbe:
#          tcpSocket:
#            port: 9990
#          initialDelaySeconds: 80
#        readinessProbe:
#          tcpSocket:
#            port: 9990
#          initialDelaySeconds: 80
        command:
          - "/bin/bash"
          - "-ecx"
          - |
            exec /bin/bash -c "umask 000 && mkdir -p /opt/snappydata/wd/locator && export USER_ID=$(id -u) && export GROUP_ID=$(id -g) && export LD_PRELOAD=/usr/lib64/libnss_wrapper.so && export NSS_WRAPPER_GROUP=/etc/group && start st_locator -peer-discovery-port=9990 -dir=/opt/snappydata/wd/locator -member-timeout=30000 -client-port=1527 -enable-network-partition-detection=true -bind-address=${MY_NODE_NAME} "
        lifecycle:
          preStop:
            exec:
              # exec hooks are not run in a shell, so each argument is a separate list item
              command:
              - /opt/snappydata/sbin/snappy-locator.sh
              - stop
              - "-dir=/opt/snappydata/wd/locator"
        volumeMounts:
          - name: snappydata-data
            #mountPath: /opt/snappydata/wd/locator
            mountPath: /opt/snappydata/wd
      volumes:
        - name: snappydata-data
          hostPath:
            path: /data/snappydata/wd
            #path: /data/snappydata/wd/locator
      terminationGracePeriodSeconds: 60

ashetkar commented Dec 1, 2017

@hcbfmbm
So you are using a custom Docker image of SnappyData with modifications to the "start" script? Can you share the script and the steps used to launch the cluster? That may help us diagnose the issue further.

While we could not reproduce the failure you are seeing, we found a few other issues with SnappyData's Docker image. We'll fix those and update you.

hcbfmbm commented Dec 6, 2017

@ashetkar In fact, we are not sure whether the problem comes from Kubernetes or from SnappyData. An error occurred while the locator was being created, but the method "createLocator" did not throw any exception, so we could not get any information about the cause. We suggest that "createLocator" throw an exception when it fails.
The start script is as follows:

#!/usr/bin/env bash

set +x

#
# Copyright (c) 2016 SnappyData, Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"); you
# may not use this file except in compliance with the License. You
# may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
# implied. See the License for the specific language governing
# permissions and limitations under the License. See accompanying
# LICENSE file.
#
cat /etc/passwd > /tmp/passwd
echo "$(id -u):x:$(id -u):$(id -g):dynamic uid:/opt/snappydata:/bin/false" >> /tmp/passwd

export NSS_WRAPPER_PASSWD=/tmp/passwd
export NSS_WRAPPER_GROUP=/etc/group
export LD_PRELOAD=/usr/lib64/libnss_wrapper.so

echo "starting sshd service"
# TODO: disable sshd
# service sshd start

HOSTNAME=`hostname`

IP=`ping -c 1 $HOSTNAME | grep "64 bytes from"|awk '{print $5}' | awk -F"(" '{print $2}' | awk -F")" '{print $1}'`
sbin=/opt/snappydata/sbin
die() { echo "ERROR: $*" 1>&2; exit 1; }

export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export SH_ALL_LIB_HOME=/usr/hdp_3rd/libs/feature_cpp
# value from yaml env
# export SH_MODEL_LIB_PATH=/usr/hdp_3rd/libs/model241
export LD_LIBRARY_PATH=${SH_MODEL_LIB_PATH}:${SH_ALL_LIB_HOME}/lib/iva-license/lib:${SH_ALL_LIB_HOME}/lib/license/verify_1.3.3_native:${SH_ALL_LIB_HOME}/lib/license/cuda8/lib64:${SH_ALL_LIB_HOME}/lib/license/cuda8/lib64/stubs:${SH_ALL_LIB_HOME}/lib/license/sdk/linux-x86_64:${SH_ALL_LIB_HOME}/lib/license/x86_64-linux-gnu:${SH_ALL_LIB_HOME}/lib/iva-license/lib:${SH_ALL_LIB_HOME}/lib/license/others/ubuntu1404

case "$1" in
   "")
      echo "Usage: start [server|lead|locator|all]"
      exit 1
      ;;
   server)
      echo $IP > /opt/snappydata/conf/servers
      $sbin/snappy-servers.sh start "${@:2}"
      ;;
   locator)
      echo $IP > /opt/snappydata/conf/locators
      $sbin/snappy-locators.sh start "${@:2}"
      ;;
   st_locator)
      # echo $IP > /opt/snappydata/conf/locators
      echo "params: ${@:2}"
      /opt/snappydata/sbin/snappy-locator.sh start "${@:2}"
      ;;
   st_leader)
      # echo $IP > /opt/snappydata/conf/locators
      echo "params: ${@:2}"
      # start dongle
      if [ -d /etc/init.d ];then
          script_dir=/etc/init.d
      elif [ -d /etc/rc.d ];then
          script_dir=/etc/rc.d
      else
          echo "Unsupported init script system!"
          echo "Aborting..."
          exit 1
      fi
      ${script_dir}/aksusbd start
      echo "aksusbd processes : $(ps -ef | grep aksusbd)"
      #
      /opt/snappydata/sbin/snappy-lead.sh start "${@:2}"
      ;;
   st_server)
      # echo $IP > /opt/snappydata/conf/locators
      echo "params: ${@:2}"
      /opt/snappydata/sbin/snappy-server.sh start "${@:2}"
      ;;
   lead)
      echo $IP > /opt/snappydata/conf/leads
      $sbin/snappy-leads.sh start "${@:2}"
      ;;
   all)
      $sbin/snappy-start-all.sh -client-bind-address=0.0.0.0 -prefer-netserver-ipaddress=true "${@:2}"
      ;;
   cmd)
      echo "starting terminal."
      ;;
   *)
      echo "Usage: start [server|lead|locator|all]"
      die "Invalid Argument"
      ;;
esac


tail -f /dev/null

Thanks a lot for your help!
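
One detail of the start script above may make the failure harder to see: it has no `set -e` and ends with `tail -f /dev/null`, so the container keeps running whether or not the start command succeeded. A minimal sketch of a wrapper that reports the exit status instead is shown below. The function name `run_start` is hypothetical, and the sketch assumes that `snappy-locator.sh start` returns a non-zero exit code on failure, which this thread does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of SnappyData: run a start command and
# report its exit status so that a failed start is visible to the caller.
run_start() {
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "start command failed with exit code $rc: $*" >&2
    return "$rc"
  fi
  echo "start command succeeded: $*"
}
```

In the `st_locator` branch this could be used as `run_start /opt/snappydata/sbin/snappy-locator.sh start "${@:2}" || exit 1`, so that the script stops before reaching `tail -f /dev/null` and the container exits with an error that Kubernetes can report.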
