
Splunk Operator: indexers don't start if search-heads still starting #1390

Open
yaroslav-nakonechnikov opened this issue Oct 18, 2024 · 5 comments

@yaroslav-nakonechnikov

Please select the type of request

Bug

Tell us more

Describe the request

[yn@ip-100-65-8-59 /]$ kubectl get pods -n splunk-operator
NAME                                                  READY   STATUS    RESTARTS   AGE
splunk-43105-cluster-manager-0                        1/1     Running   0          19m
splunk-43105-license-manager-0                        1/1     Running   0          30m
splunk-c-43105-standalone-0                           1/1     Running   0          30m
splunk-e-43105-deployer-0                             0/1     Running   0          8m15s
splunk-e-43105-search-head-0                          0/1     Running   0          8m15s
splunk-e-43105-search-head-1                          0/1     Running   0          8m15s
splunk-e-43105-search-head-2                          0/1     Running   0          8m15s
splunk-operator-controller-manager-58b545f67c-8rrhx   2/2     Running   0          31m

and then:

NAME                                                  READY   STATUS    RESTARTS   AGE
splunk-43105-cluster-manager-0                        1/1     Running   0          21m
splunk-43105-license-manager-0                        1/1     Running   0          32m
splunk-c-43105-standalone-0                           1/1     Running   0          32m
splunk-e-43105-deployer-0                             0/1     Running   0          11m
splunk-e-43105-search-head-0                          1/1     Running   0          11m
splunk-e-43105-search-head-1                          1/1     Running   0          11m
splunk-e-43105-search-head-2                          1/1     Running   0          11m
splunk-operator-controller-manager-58b545f67c-8rrhx   2/2     Running   0          34m
splunk-site3-43105-indexer-0                          0/1     Running   0          2m17s
splunk-site3-43105-indexer-1                          0/1     Running   0          2m17s
splunk-site3-43105-indexer-2                          0/1     Running   0          2m17s
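As a quick way to see which pods are gating the rollout, the not-ready pods can be filtered out of a listing like the one above (a minimal sketch using sample rows copied from this issue; in a live cluster you would pipe `kubectl get pods -n splunk-operator` into the `awk` instead of the here-doc):

```shell
# Print the name of every pod whose READY column (e.g. "0/1") shows fewer
# ready containers than total containers. The here-doc stands in for
# 'kubectl get pods -n splunk-operator' output.
not_ready=$(awk 'NR > 1 { split($2, a, "/"); if (a[1] != a[2]) print $1 }' <<'EOF'
NAME                           READY   STATUS    RESTARTS   AGE
splunk-43105-cluster-manager-0 1/1     Running   0          19m
splunk-e-43105-deployer-0      0/1     Running   0          8m15s
splunk-e-43105-search-head-0   0/1     Running   0          8m15s
EOF
)
echo "$not_ready"
```

Here that prints only the deployer and search-head pods, which is exactly the set the operator appears to be waiting on before reconciling the indexers.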

This is unbelievable: it is extremely strange that even in 2.6.1 there is still a dependency check between the Splunk search heads and the indexers!

Expected behavior
Indexers should start without depending on the search heads.

@yaroslav-nakonechnikov
Author

yaroslav-nakonechnikov commented Oct 18, 2024

This was already reported before in #1260, and there were two calls afterwards where I described why the dependency logic is broken for Kubernetes deployments.

Now we can test 2.6.1, and we still see that part of the platform can't start, purely because of this problematic logic.
Old support case: 3448046

@vivekr-splunk
Collaborator

@yaroslav-nakonechnikov we will get back to you with regard to this issue.

@yaroslav-nakonechnikov
Author

Sadly, this is extremely painful, because there can be failures like this:

FAILED - RETRYING: [localhost]: Initialize SHC cluster config (2 retries left).
FAILED - RETRYING: [localhost]: Initialize SHC cluster config (1 retries left).

TASK [splunk_search_head : Initialize SHC cluster config] **********************
fatal: [localhost]: FAILED! =>

{
  "attempts": 60,
  "changed": false,
  "cmd": [
    "/opt/splunk/bin/splunk",
    "init",
    "shcluster-config",
    "-auth",
    "admin:j3Q9SWJlLBOlc3RWejMnUb6e",
    "-mgmt_uri",
    "https://splunk-e-43345-search-head-1.splunk-e-43345-search-head-headless.splunk-operator.svc.cluster.local:8089",
    "-replication_port",
    "9887",
    "-replication_factor",
    "3",
    "-conf_deploy_fetch_url",
    "https://splunk-e-43345-deployer-service:8089",
    "-secret",
    "RNr25biFMA4Z3SUbXB3VGwW6",
    "-shcluster_label",
    "she_cluster"
  ],
  "delta": "0:00:00.806237",
  "end": "2024-10-31 08:05:54.588881",
  "rc": 24,
  "start": "2024-10-31 08:05:53.782644"
}
STDERR:

WARNING: Server Certificate Hostname Validation is disabled. Please see server.conf/[sslConfig]/cliVerifyServerName for details.
Login failed

MSG:

non-zero return code

PLAY RECAP *********************************************************************
localhost : ok=132 changed=11 unreachable=0 failed=1 skipped=68 rescued=0 ignored=0

As I understand it, the problem is in this task: https://github.com/splunk/splunk-ansible/blob/53a9a70897896e279b43478583b13256e75894a2/roles/splunk_search_head/tasks/search_head_clustering.yml#L6

The search heads are stuck in an infinite retry loop, which means none of the indexers ever start.
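The failure mode can be sketched as a retry loop that exhausts its attempts while the pod stays not Ready (a minimal shell sketch of the behaviour seen in the log above; `init_shcluster` is a hypothetical stub standing in for the real `/opt/splunk/bin/splunk init shcluster-config ...` call, which here always fails with rc=24 / "Login failed"):

```shell
# Sketch: the 'Initialize SHC cluster config' task retries up to 60 times.
# While every attempt fails, the search-head container never becomes Ready,
# and the operator never moves on to reconciling the indexer cluster.
attempts=0
max_attempts=60                    # matches "attempts": 60 in the error JSON

init_shcluster() { return 24; }    # stub: always fails, like the CLI in this issue

until init_shcluster; do
  attempts=$((attempts + 1))
  if [ "$attempts" -ge "$max_attempts" ]; then
    echo "fatal: Initialize SHC cluster config failed after $attempts attempts"
    break
  fi
done
```

After the 60th failure the task gives up ("FAILED! ... attempts: 60"), the pod is restarted, and the cycle repeats.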

This happened with splunk-operator 2.6.1 and Splunk 9.1.6.

@yaroslav-nakonechnikov
Author

It is extremely strange that the standalone instance started without issues:

NAME                                                  READY   STATUS    RESTARTS      AGE
splunk-43345-cluster-manager-0                        1/1     Running   1 (70m ago)   79m
splunk-43345-license-manager-0                        1/1     Running   0             79m
splunk-c-43345-standalone-0                           1/1     Running   0             79m
splunk-e-43345-deployer-0                             0/1     Running   0             66m
splunk-e-43345-search-head-0                          0/1     Running   3 (14m ago)   65m
splunk-e-43345-search-head-1                          0/1     Running   3 (14m ago)   65m
splunk-e-43345-search-head-2                          0/1     Running   3 (14m ago)   65m
splunk-operator-controller-manager-5c684d667d-smgdq   2/2     Running   0             80m

@yaroslav-nakonechnikov
Author

With this test I can confirm that 9.1.6 is not working at all in this setup.
