VNC container failed to start up because of missing X11 connection (timing issue) #353

Open
mikevader opened this issue Oct 18, 2022 · 8 comments

Comments

@mikevader

Bug description

When we start a new Selenium browser through the Moon UI, the browser pod is scheduled correctly, but the VNC connection cannot be established. We tested with Edge, Chrome and Firefox; the output below is always from Chrome.

From our analysis, X11 starts correctly but only becomes ready after the VNC server has already started, so the VNC server fails.

Analysis/Experienced Behaviour

According to kubectl, the vnc-server container was terminated. Output of oc describe pod chrome-...:
kubectl-describe-chrome-101-0-4951-54-1-e9865711-e22c-4e1e-9d28-462416abd57b.log

I attached the log of the VNC container. As far as I can tell, it says that VNC could not start because it could not connect to X11:
chrome-101-0-4951-54-1-e9865711-e22c-4e1e-9d28-462416abd57b-vnc-server.log

The X server container seems to work: I verified this by debugging the VNC container, where I could start VNC manually without problems. It looks to me like a timing issue. Here is what I did:

$ oc debug pod/chrome-101-0-4951-54-1-e9865711-e22c-4e1e-9d28-462416abd57b -c vnc-server
Starting pod/chrome-101-0-4951-54-1-e9865711-e22c-4e1e-9d28-462416abd57b-debug ...
Pod IP: ***
If you don't see a command prompt, try pressing enter.

$ x11vnc -xrandr -passwd selenoid -noxrecord -forever -display :0 -shared -rfbport 5900
18/10/2022 12:18:10 passing arg to libvncserver: -passwd
18/10/2022 12:18:10 passing arg to libvncserver: -rfbport
18/10/2022 12:18:10 passing arg to libvncserver: 5900
18/10/2022 12:18:10 x11vnc version: 0.9.16 lastmod: 2019-01-05  pid: 7
18/10/2022 12:18:10 Using X display :0
18/10/2022 12:18:10 rootwin: 0x50d reswin: 0x400001 dpy: 0x79c065a0
...
The VNC desktop is:      chrome-101-0-4951-54-1-e9865711-e22c-4e1e-9d28-462416abd57b:0
PORT=5900

Expected Behaviour

There should be either a flag for a startup delay, some kind of retry, or a mechanism that starts VNC only once X11 is ready.

Additional Context

  • OpenShift 4.8.24 (Kubernetes v1.23.5+012e945)
  • Moon 2.3.7
  • Selenium Browsers:
    • microsoft-edge-stable 101.0.1210.39-1
    • google-chrome-stable 101.0.4951.54-1
    • firefox-mozilla-build 99.0.0-1
@aandryashin
Member

aandryashin commented Oct 18, 2022 via email

@vania-pooh
Member

@mikevader i.e. aerokube/xvfb-server:<your-moon-release>.

@mikevader
Author

Yes, we are using the xvfb-server image, as you can see in the describe output. But looking at the checksums, the two images are identical.

@mikevader
Author

I created a workaround that confirmed my suspicion about the timing issue:

I wrapped your VNC image with the following Dockerfile:

FROM quay.io/aerokube/vnc-server:2.3.7
USER root

ADD wait-for-it.sh /opt
RUN chmod +x /opt/wait-for-it.sh

USER 4096

CMD ["x11vnc", "-xrandr", "-passwd", "selenoid", "-noxrecord", "-forever", "-display", ":0", "-shared", "-rfbport", "5900"]

ENTRYPOINT ["/opt/wait-for-it.sh", "localhost:6000", "-t", "10", "--", "/usr/sbin/init"]

With this, it works, but it is more of a hack than a fix. It would probably be a good idea to add something similar to the official vnc-server image. I could give it a go, but I could not find the source repositories for those images.
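For reference, the same wait can be sketched without depending on the external wait-for-it.sh script. This is a minimal sketch, assuming the image provides bash with /dev/tcp support (wait-for-it.sh is itself a bash script, so bash should already be there); the file name wait-x11.sh is hypothetical. It relies on the X11 convention that display :N listens on TCP port 6000 + N, which is why the workaround waits on localhost:6000 for display :0.

```shell
#!/usr/bin/env bash
# wait-x11.sh (hypothetical name): wait until the X server for display :0
# accepts TCP connections, then exec the command passed as arguments.
# Assumes bash with /dev/tcp support.

# wait_for_port HOST PORT TIMEOUT_SECONDS
# Returns 0 as soon as HOST:PORT accepts a TCP connection,
# or 1 once TIMEOUT_SECONDS have elapsed without success.
wait_for_port() {
  local host=$1 port=$2 timeout=$3
  local deadline=$((SECONDS + timeout))
  while (( SECONDS < deadline )); do
    # The subshell tries to open a TCP connection via bash's /dev/tcp;
    # the socket is closed again when the subshell exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 0.5
  done
  return 1
}

# Only act as an entrypoint when a command was passed in.
if (( $# > 0 )); then
  display_number=0                     # display :0, as in the x11vnc command above
  x11_port=$((6000 + display_number))  # X11 TCP port = 6000 + display number
  if ! wait_for_port localhost "$x11_port" 10; then
    echo "X server not reachable on localhost:${x11_port} after 10s" >&2
    exit 1
  fi
  exec "$@"
fi
```

Copied to /opt and made executable as in the Dockerfile above, the entrypoint would become ENTRYPOINT ["/opt/wait-x11.sh", "/usr/sbin/init"] with the CMD unchanged. This sketch exits non-zero if the X server never comes up, rather than starting x11vnc against a display that is not there.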

@vania-pooh
Member

@mikevader OK, another possible solution is to try a slightly older image: aerokube/xvfb-server:2.3.5.

@bukovjanmic

Any update on this? We seem to be running into the same issue with Moon 2.6.7 (using the official images): the vnc-server container sometimes (roughly 1 in 10 runs) exits with:

16/10/2024 11:45:12 passing arg to libvncserver: -passwd
16/10/2024 11:45:12 passing arg to libvncserver: -rfbport
16/10/2024 11:45:12 passing arg to libvncserver: 5900
16/10/2024 11:45:12 x11vnc version: 0.9.16 lastmod: 2019-01-05 pid: 11
16/10/2024 11:45:12 XOpenDisplay("127.0.0.1:0") failed.
16/10/2024 11:45:12 Trying again with XAUTHLOCALHOSTNAME=localhost ...

16/10/2024 11:45:12 ***************************************
16/10/2024 11:45:12 *** XOpenDisplay failed (127.0.0.1:0)

*** x11vnc was unable to open the X DISPLAY: "127.0.0.1:0", it cannot continue.
*** There may be "Xlib:" error messages above with details about the failure.
...

Maybe a simple loop with nslookup for the X server service should be added to the vnc container. Kubernetes does not guarantee the startup order of containers, so the X server may start too early (or too late).
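Another option, closer to the "retry" behaviour requested in the original report, is to restart x11vnc until it manages to open the display. A DNS lookup would only show that a name resolves, not that the X server is accepting connections; and since containers in one pod share a network namespace, x11vnc reaches the X server on 127.0.0.1 anyway, as the log above shows. The sketch below is hypothetical (the retry helper and its limits are made up for illustration) and assumes x11vnc exits with a non-zero status when XOpenDisplay fails.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper: re-run a command until it exits 0
# or the attempt limit is reached, pausing between attempts.

# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 on the first successful run, otherwise the last exit status.
retry() {
  local max=$1 delay=$2 attempt=1 status
  shift 2
  while true; do
    "$@" && return 0
    status=$?
    if (( attempt >= max )); then
      return "$status"
    fi
    echo "attempt ${attempt}/${max} failed (exit ${status}), retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Entrypoint mode, e.g. (hypothetical script name):
#   retry-wrapper.sh x11vnc -xrandr -passwd selenoid -noxrecord -forever \
#     -display :0 -shared -rfbport 5900
if (( $# > 0 )); then
  retry 20 0.5 "$@"   # about 10 seconds of attempts in total
  exit $?
fi
```

Because the attempt counter is never reset, an x11vnc that runs for a while and later exits with an error also consumes attempts; waiting for the X port before the first start avoids that ambiguity.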

@aandryashin
Member

aandryashin commented Oct 16, 2024 via email

@palec99

palec99 commented Nov 6, 2024

Hello,

We have tried version 2.7.2 and the problem is the same. @bukovjanmic suspects it is related to the container startup order, as mentioned in @mikevader's comment; so far, the workaround is to wait for the service to start.
