Factorio crashes under Steam Linux Runtime 1.0 if uid not in /etc/passwd, e.g. systemd-homed #705

Open
zhaoweny opened this issue Nov 8, 2024 · 54 comments

Comments

@zhaoweny

zhaoweny commented Nov 8, 2024

Your system information

  • Steam Runtime Version: (steam version 1730853027, steam-runtime_0.20241024.105847)
  • Distribution (e.g. Ubuntu 18.04): Arch Linux
  • Link to your full system information (Help -> Steam Runtime Diagnostics) in a Gist: see link
  • Have you checked for system updates?: yes - fresh install and up to date
  • What compatibility tool are you using?: Steam Linux Runtime
  • What versions are listed in steamapps/common/SteamLinuxRuntime/VERSIONS.txt? 0.20240806.0
  • What versions are listed in steamapps/common/SteamLinuxRuntime_soldier/VERSIONS.txt? 0.20240917.101880
  • What versions are listed in steamapps/common/SteamLinuxRuntime_sniper/VERSIONS.txt? 0.20240916.101795

Please describe your issue in as much detail as possible:

When launching Factorio on my physical Arch Linux machine, it crashes almost immediately. I tried reinstalling Steam, then reinstalling Arch Linux and Steam again, but the issue persists.

So far I have found 3 workarounds:

  • run steam with CLI flag -compat-force-slr off
  • modify the launch option to steam-runtime-launch-options -- %command% and configure container runtime to None
  • launch Factorio from a fresh Arch Linux virtual machine - factorio can launch normally

FYI same issue on Factorio Forum

attaching slr log file as requested: slr-app427520-t20241108T233119.log

Steps for reproducing this issue:

  • From a fresh Arch Linux install, install steam and Factorio
  • launch Factorio from steam library page

expected behavior: game loads up loading screen, lands me on main menu

actual behavior:

  • game crashes without game specific log file
  • the system journal shows the crash handler reporting that the game received a SIGSEGV
@smcv
Contributor

smcv commented Nov 8, 2024

launch Factorio from a fresh Arch Linux virtual machine

To clarify, is that running it by copying the installed game files and running it like an independent non-Steam game, without Steam being involved at all?

Or do you have both Steam and Factorio installed in the VM?

@smcv
Contributor

smcv commented Nov 8, 2024

modify the launch option to steam-runtime-launch-options -- %command% and configure container runtime to None

Since you've already discovered that developer tool: what happens if you switch the container runtime to SteamLinuxRuntime_sniper? Does that help any? (If you don't already have sniper installed, you can get it by running steam steam://install/1628350)

I don't see any immediately obvious problems in Factorio with bundled libraries or anything like that. Presumably it's making some sort of assumption about the host system that isn't true any more when it runs in a container, but it's hard to say what that assumption would be.

We've had people running Factorio successfully in Steam Linux Runtime 1.0 (scout) in the past (#262) but presumably it doesn't work in all system configurations.

@TTimo
Collaborator

TTimo commented Nov 8, 2024

I'm on Arch and Factorio launches just fine in scout SLR fwiw.

@zhaoweny
Author

zhaoweny commented Nov 8, 2024

About Arch Linux VM with Steam and Factorio:

I installed Steam, installed Factorio from Steam, and then launched Factorio from Steam in that virtual machine.
The VM is now nuked due to clean reinstall of Arch Linux, sadly.

About the developer tool:

No, changing runtime does not solve this issue for me.

@zhaoweny
Author

zhaoweny commented Nov 8, 2024

I made a test user with useradd instead of using my normal systemd-homed-managed account, and Factorio works for that user even with SLR. So the difference between working and not working can be summarized as:

| user home directory | uses container runtime | works as expected |
|---|---|---|
| btrfs subvolume | yes | yes |
| systemd-homed | yes | no |
| systemd-homed | no | yes |

The systemd-homed-managed user directory is located at /home/username.homedir and is bind-mounted to /home/username on unlock (I assume).

@smcv
Contributor

smcv commented Nov 8, 2024

Interesting...

Your SLR log says we're using /home/zhaow as your home directory. Is /home/zhaow a "real" directory, rather than a symbolic link? Does it contain everything that you think it should normally contain?

I also notice this in your log:

   0.000 Error MessageDialog.cpp:218: Unable to show message dialog. SDL Error: [zenity reported error or failed to launch: 255]

so maybe this means there's a problem with X11 or Wayland?

Do other Steam games launch successfully in the same runtime? Floating Point is a good one to test, because it's very small (and is free).

@smcv
Contributor

smcv commented Nov 8, 2024

It would be useful if you can get a new log with STEAM_LINUX_RUNTIME_VERBOSE=1 in addition to STEAM_LINUX_RUNTIME_LOG=1.

@zhaoweny
Author

zhaoweny commented Nov 8, 2024

about the /home/zhaow directory: it's a real directory as far as I can tell. here's some relevant console output:

$ file /home/zhaow
/home/zhaow: directory

$ realpath /home/zhaow
/home/zhaow

$ ls /home
zhaow  zhaow.homedir  zwydbg

$ mount | grep /home
/dev/nvme0n1p2 on /home type btrfs (rw,relatime,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=257,subvol=/@home)
/dev/nvme0n1p2 on /home/zhaow type btrfs (rw,nosuid,nodev,relatime,idmapped,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=264,subvol=/@home/zhaow.homedir)

I'll collect logs as soon as possible. Please wait for a moment

@smcv
Contributor

smcv commented Nov 8, 2024

it's a real directory as far as I can tell

Yes, I agree. (The reason I asked is that symlinks sometimes break container frameworks, including ours - but your home directory isn't a symlink, so that should be OK)

@zhaoweny
Author

zhaoweny commented Nov 8, 2024

I collected 3 log files this round:

The console log shows that I launched Factorio with the developer tool steam-runtime-launch-options and tested each runtime in turn. The last option, None, is the known-good configuration and works as expected.

@zhaoweny
Author

zhaoweny commented Nov 8, 2024

I'm abandoning this issue and I'll move to a normal non-homed user. Seems systemd-homed is not stable enough for my use cases :(

@zhaoweny zhaoweny closed this as completed Nov 8, 2024
@smcv
Contributor

smcv commented Nov 8, 2024

I notice that you're using AMDVLK, which has been known to cause weird issues in the past. We generally recommend Mesa's Vulkan driver for AMD GPUs.

I also notice that you're using a dual-GPU setup (discrete + integrated GPUs, both AMD) which can sometimes have weird effects.

It's weird that it makes a difference whether you're using systemd-homed, though... I wouldn't have expected that to have an effect.

When you tested without systemd-homed, was it with the same $HOME contents? Or is it possible that you might have been comparing a pre-existing user with non-default configuration in $HOME to a new user with all settings at defaults?

If someone can reproduce a similar issue, the next step would probably be to see whether this affects all games or just Factorio, and either reproduce the crash with something open-source that we can analyze (like maybe xterm), or strace something that is crashing to get an idea of what it's doing immediately before the crash.

@adomaskizogian

@smcv I can reproduce the issue. running steam -compat-force-slr off does solve it.

ubuntu 24.10. Up to date.
6.11.0-8-generic

how can I help

@zhaoweny
Author

zhaoweny commented Nov 9, 2024

after some sleep, I'm back :)

AMDVLK

This is because the archinstall script selects this Vulkan driver by default. It's now uninstalled.

dual gpu setup

my system is 7800X3D + 7900XTX, with monitor plugged into GPU directly, which means the integrated GPU is mostly idle.

$HOME content

it's slightly different - the zwydbg user is freshly created with useradd -m --btrfs-subvolume-home, but my normal zhaow user is also fresh, due to re-installation of Arch Linux

strace logs or something open source for analysis

I'm working on this. I obtained an strace log a while back when I reported this issue on the Factorio Forum. I'll try to get some fresh logs and reproduce the crash with another native title.

I should note that Floating Point works inside SLR; next target would be Dota 2 for me (since it's effectively open source to you guys, right?)

@zhaoweny
Author

zhaoweny commented Nov 9, 2024

some test result, by launching each game from steam:

  • Tiny Glade: works as expected
  • Euro Truck Simulator 2: works as expected
  • Don't Starve Together: works as expected
  • Dota 2: works as expected
  • Counter Strike 2: works as expected
  • Don't Starve: works as expected
  • Portal 2: works as expected
  • Civ 6: crashes and produces a coredump
  • Stellaris: game launcher works, the game crashes when I click Play in the launcher
  • Team Fortress 2: works as expected
  • Left 4 Dead 2: works as expected
  • Surviving Mars: works as expected
  • Stardew Valley: works as expected

@zhaoweny
Author

zhaoweny commented Nov 9, 2024

I ran Civ 6 and Factorio with strace -tt -ff -o _log_dir_/strace.log %command% and got strace logs for each game:

strace-logs.zip

you can use strace-log-merge to combine these logs according to strace man page

@zhaoweny
Author

zhaoweny commented Nov 9, 2024

I think Stellaris has the same issue as Civ 6 and Factorio. The Paradox launcher for Stellaris starts, but the game itself fails to launch. Here are the strace logs for Stellaris:
stellaris-strace_logs.zip

@zhaoweny
Author

zhaoweny commented Nov 9, 2024

One additional note, I tried to launch gdbserver inside the runtime to debug Factorio, but it would fail with error (something like unknown register ymm0h) even with SLR_sniper - which is annoying.

@zhaoweny
Author

re-opening this since my setup for reproducing the issue is still valid, hope we can solve this mystery together :)

@zhaoweny zhaoweny reopened this Nov 11, 2024
@smcv
Contributor

smcv commented Nov 11, 2024

Civ 6: crashes and produces a coredump

It's probably best if you can open a separate issue for this: if your issue with Factorio does not affect most of your games, then it seems likely that Civ 6, Factorio and Stellaris have different things going wrong.

And if I'm wrong about that and there is a common root cause, closing issues as duplicates is much easier than understanding an issue thread that has three separate conversations about three separate bugs :-)

Stellaris: game launcher works, the game crashes when I click Play in the launcher

Similarly this is probably best as its own issue.

@smcv
Contributor

smcv commented Nov 11, 2024

Civ 6 has had known compatibility problems in the past because it bundles a lot of libraries that it shouldn't, so definitely open a separate issue for that one instead of discussing Civ 6 further on this particular issue.

@zhaoweny
Author

zhaoweny commented Nov 12, 2024

OK, let’s focus on Factorio for now, as I’m not currently playing Civ 6 or Stellaris.

Could you please guide me on how to properly set up gdbserver with Steam Linux Runtime enabled to obtain a stack trace for the SIGSEGV error when launching Factorio? I’ve attempted this before, but my last try resulted in an internal error from gdbserver—specifically, it mentioned an unknown register ymm0h or something similar. I suspect that the game is utilizing AVX2 registers, and the gdbserver bundled with the Steam Linux Runtime may be outdated.

@zhaoweny
Author

I downloaded the steam-runtime SDK (soldier, according to /doc/reporting-steamlinuxruntime-bugs.md).
Then I started a shell with env PRESSURE_VESSEL_SHELL=instead steam-runtime-launch-options -- %command% from steam.
Finally I ran gdb bin/x64/factorio to launch factorio inside the SDK runtime.

I got this stack trace with SIGSEGV (finally!)

(gdb) bt
#0  Paths::getSystemWriteData () at /tmp/factorio-build-EZorjK/src/Paths.cpp:259
#1  0x00000000013bb28c in PathMacroReplacer::apply[abi:cxx11](re2::StringPiece const*) const () at /tmp/factorio-build-EZorjK/src/Info/PathMacroReplacer.cpp:12
#2  0x00000000023dcbec in ReplacerWrapper::operator()[abi:cxx11](re2::StringPiece const*) const () at /tmp/factorio-build-EZorjK/src/Info/MacroReplacer.cpp:12
#3  std::__invoke_impl<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ReplacerWrapper&, re2::StringPiece const*> ()
    at /opt/gcc-13.2.0/include/c++/13.2.0/bits/invoke.h:61
#4  std::__invoke_r<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ReplacerWrapper&, re2::StringPiece const*> ()
    at /opt/gcc-13.2.0/include/c++/13.2.0/bits/invoke.h:116
#5  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (re2::StringPiece const*), ReplacerWrapper>::_M_invoke(std::_Any_data const&, re2::StringPiece const*&&) () at /opt/gcc-13.2.0/include/c++/13.2.0/bits/std_function.h:291
#6  0x00000000012496ab in std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (re2::StringPiece const*)>::operator()(re2::StringPiece const*) const () at /opt/gcc-13.2.0/include/c++/13.2.0/bits/std_function.h:591
#7  RegexUtil::replace<2u>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, re2::RE2 const&, std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (re2::StringPiece const*)> const&) () at /tmp/factorio-build-EZorjK/src/Util/RegexUtil.hpp:17
#8  MacroReplacer::replace () at /tmp/factorio-build-EZorjK/src/Info/MacroReplacer.cpp:30
#9  0x00000000021c5881 in GlobalContext::init () at /tmp/factorio-build-EZorjK/src/GlobalContext.cpp:332
#10 0x00000000021db366 in MainLoop::run(Filesystem::Path const&, Filesystem::Path const&, bool, bool, std::function<void ()>, Filesystem::Path const&, MainLoop::HeavyMode) () at /tmp/factorio-build-EZorjK/src/MainLoop.cpp:286
#11 0x00000000021e837b in fmain () at /tmp/factorio-build-EZorjK/src/Main.cpp:1348
#12 0x00000000024241be in main () at /tmp/factorio-build-EZorjK/src/Main.cpp:1370

I'll report this issue to the Factorio devs and try to dig a little deeper.

@smcv
Contributor

smcv commented Nov 12, 2024

I see you've figured out a way to get a backtrace while I was writing this, but for completeness...

Could you please guide me on how to properly set up gdbserver with Steam Linux Runtime enabled to obtain a stack trace for the SIGSEGV error when launching Factorio?

To get a stack trace, it's often simpler if you can use a post-mortem crash analysis tool like systemd-coredump rather than fighting with gdb. Since you say you're an Arch user, https://wiki.archlinux.org/title/Core_dump might be useful.

Or, the next best thing is:

  1. Get a shell inside the game container.
  2. Run the game like gdbserver 127.0.0.1:12345 ./bin/x64/factorio
  3. Connect an external gdb to the gdbserver, e.g. see https://gitlab.steamos.cloud/steamrt/steam-runtime-tools/-/blob/main/docs/slr-for-game-developers.md#attaching-a-debugger-by-using-gdbserver

@smcv
Contributor

smcv commented Nov 12, 2024

I don't currently have access to Factorio the full game, but for what it's worth, the demo is working fine for me on Arch Linux under SLR 1.0.

However, I haven't yet tried it with a user that is managed by systemd-homed.

One thing I notice from your backtrace:

at /opt/gcc-13.2.0/include/...

This seems like it indicates that Factorio was compiled with a third-party compiler, and not with one of the ones we provide in the Steam Runtime SDK. The demo shows signs of having been compiled with the same compiler.

This hopefully shouldn't be a problem: the demo is statically linked with libstdc++, which probably means the full game is the same.

Looking at the demo executable with objdump -T -x, it looks like it's accidentally exporting libstdc++ data symbols like std::__cxx11::numpunct<char>::id, which is a possible cause of crashes if these symbols "interpose" symbols from the dynamically-linked libstdc++ that will be pulled in by your graphics drivers. If the full game is the same, this might be something that the developers should look into - hiding those symbols from the dynamic symbol table would probably be safer.

I don't have any real evidence that this is the reason for your crash, though.

@smcv
Contributor

smcv commented Nov 12, 2024

It's weird that it makes a difference whether you're using systemd-homed, though...

This is just speculation, but one thing that occurs to me is that systemd-homed creates a user with a large numeric uid (on my test system, my normal user has uid 1000 but the user created via systemd-homed has uid 60032) so if some component assumes that a uid will fit into a signed 16-bit integer, systemd-homed would break that assumption?
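
To make the arithmetic behind that speculation concrete, here is a minimal sketch. It assumes a hypothetical component that stores uids in a signed 16-bit integer; `fitsInInt16` is an invented name, and nothing here shows that any real component does this.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: would this uid survive being stored in a signed
// 16-bit integer? uid_t is an unsigned 32-bit type on Linux with glibc,
// so the sketch takes a uint32_t.
bool fitsInInt16(std::uint32_t uid) {
    return uid <= static_cast<std::uint32_t>(
                      std::numeric_limits<std::int16_t>::max());
}
```

With the uids quoted above, `fitsInInt16(1000)` is true and `fitsInInt16(60032)` is false, because the largest signed 16-bit value is 32767.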

@smcv
Contributor

smcv commented Nov 12, 2024

#0 Paths::getSystemWriteData () at /tmp/factorio-build-EZorjK/src/Paths.cpp:259

If the Factorio developers can tell us what's happening in that function (and, more specifically, around that line), that would probably be the most useful piece of information here.

@smcv
Contributor

smcv commented Nov 12, 2024

I can reproduce what appears to be the same crash by logging in as a user that is managed by systemd-homed, using the Factorio demo.

It is not necessary to be using a btrfs subvolume or any other elaborate storage mechanisms, but either systemd-homed is significant somehow, or using a freshly-created user is significant somehow: this is on the same Arch system where I was unable to reproduce the crash as my normal user, desktop, uid 1000.

Steps to reproduce:

  1. Have Steam installed and working
  2. As a user with administrative privileges: sudo systemctl enable --now systemd-homed.service
  3. As a user with administrative privileges: sudo homectl create --storage=directory usinghomed
  4. sudo passwd usinghomed and set a password (possibly unnecessary, I think I mistyped the password during initial account creation)
  5. Log out
  6. Log in as usinghomed for the first time
  7. Run steam from a terminal, and log in to it (I used my pre-existing Steam account)
  8. Install the free Factorio demo. See that it pulls in Steam Linux Runtime 1.0 (scout) as a dependency.
  9. Run the Factorio demo. It crashes with SIGSEGV, error messages are similar to what @zhaoweny reported
  10. Set its launch options to: STEAM_COMPAT_LAUNCHER_SERVICE=scout-in-container SRT_LAUNCHER_SERVICE_STOP_ON_EXIT=0 %command%
  11. Run it again. This time, after the actual game crashes, the command-launcher service continues to run
  12. In another terminal/tab, or via a ssh login as usinghomed, run: ~/.local/share/Steam/steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel/bin/steam-runtime-launch-client --list
  13. Still as usinghomed, run: ~/.local/share/Steam/steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel/bin/steam-runtime-launch-client --bus-name=com.steampowered.App452280 -- gdbserver 127.0.0.1:12345 ./bin/x64/factorio
  14. In another terminal/tab, or via a ssh login as usinghomed, run: gdb -ex 'target remote 127.0.0.1:12345'
  15. At the gdb prompt: cont
  16. When you are ready to end the debugging session, run: ~/.local/share/Steam/steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel/bin/steam-runtime-launch-client --bus-name=com.steampowered.App452280 --terminate (in Steam, the Stop button will change state back to Play)

Backtrace:

#0  0x00000000007bc9c0 in Paths::getSystemWriteData() ()
#1  0x0000000000f2ad54 in PathMacroReplacer::apply(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) const ()
#2  0x00000000014abf5c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > RegexUtil::regex_replace<char const*, std::__cxx11::regex_traits<char>, char, ReplacerWrapper>(char const*, char const*, std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> > const&, ReplacerWrapper) ()
#3  0x0000000000f18966 in MacroReplacer::replace(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const ()
#4  0x0000000001b3f82a in GlobalContext::init(bool, bool, bool, std::optional<WindowPositionData>) [clone .constprop.0] [clone .isra.0] ()
#5  0x0000000001a01f12 in MainLoop::run(Filesystem::Path const&, Filesystem::Path const&, bool, bool, std::function<void ()>, Filesystem::Path const&, MainLoop::HeavyMode) [clone .constprop.0] ()
#6  0x00000000006142a3 in main ()

@kisak-valve
Member

Hello @raiguard, this issue report might interest you, at least to listen in on the ongoing discussion.

@zhaoweny
Author

zhaoweny commented Nov 12, 2024

Factorio demo version has these lines in <Factorio game root>/config-path.cfg:

config-path=__PATH__executable__/../../config
use-system-read-write-data-directories=false

at first launch the game would generate lots of configuration files according to these two lines, including <game dir>/config/config.ini:

[path]
read-data=__PATH__executable__/../../data
write-data=__PATH__executable__/../..

If I change write-data to write-data=__PATH__system-write-data__ and run the game with ~/.steam/root/steamapps/common/SteamLinuxRuntime_sniper/run -- ./bin/x64/factorio, the game crashes, but gives me a different error message (maybe it's a different bug?):

   0.000 Error CrashHandler.cpp:639: Received SIGSEGV
   0.000 Error Util.cpp:100: Unexpected error occurred. If you're running the latest version of the game you can help us solve the problem by posting the contents of the log file on the Factorio forums.
Please also include the save file(s), any mods you may be using, and any steps you know of to reproduce the crash.

(zenity:226311): dbind-WARNING **: 21:41:44.872: Couldn't connect to accessibility bus: Failed to connect to socket /run/user/60104/at-spi/bus_1: No such file or directory

Edit: I'd like to explain this behavior from my POV.

  • config-path=__PATH__executable__/../../config states that config.ini should be written to config relative to game installation root
  • write-data states where to write game saves, etc. default value of __PATH__executable__/../.. means it's writing to game installation root; __PATH__system-write-data__ (default for steam version) means it's writing to ~/.factorio
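
As a rough illustration of the token expansion described in the two points above, here is a minimal sketch. It is not Factorio's code: `replaceAll`, `expandPathMacros` and their parameters are hypothetical, only the two tokens quoted above are handled, and treating `__PATH__executable__` as the directory containing the executable is an assumption.

```cpp
#include <string>

// Hypothetical helper: replace every occurrence of `token` in `value`.
static std::string replaceAll(std::string value, const std::string& token,
                              const std::string& replacement) {
    std::size_t pos = 0;
    while ((pos = value.find(token, pos)) != std::string::npos) {
        value.replace(pos, token.size(), replacement);
        pos += replacement.size();  // continue after the inserted text
    }
    return value;
}

// Expand the two path tokens seen in config.ini. The caller supplies both
// directories; how the game really computes them is not modelled here.
std::string expandPathMacros(const std::string& value,
                             const std::string& executableDir,
                             const std::string& systemWriteData) {
    std::string out = replaceAll(value, "__PATH__executable__", executableDir);
    return replaceAll(out, "__PATH__system-write-data__", systemWriteData);
}
```

For example, `expandPathMacros("__PATH__system-write-data__", "/games/factorio/bin/x64", "/home/zhaow/.factorio")` yields `/home/zhaow/.factorio`. The crash discussed here would sit in whatever computes that third argument.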

@smcv
Contributor

smcv commented Nov 12, 2024

If a Factorio developer can look at this issue, a tl;dr is:

This could either be a bug in some library that is part of the container runtime, or a bug in Factorio, or an assumption that it makes about the system becoming untrue, or some subtle interaction between multiple components.

The change that prompted this is that until recently, most native Linux Steam games on desktop were run in the legacy LD_LIBRARY_PATH-based runtime environment, with the Steam Linux Runtime 1.0 (scout) container runtime available to users on an opt-in basis. Since earlier this month, Steam now runs most native Linux Steam games in the Steam Linux Runtime 1.0 (scout) container runtime (which was already used more widely on Steam Deck).

@raiguard

raiguard commented Nov 12, 2024

Thanks for the ping, this is on my radar. Unfortunately this is badly timed, because I am currently on a 3-week vacation in Japan so my work output is a bit limited.

My ideal solution would be to not run the game against the steam Linux runtime at all - we provide a standalone version of the game and it works great.

I haven't been able to reproduce the issue on my laptop (Framework 13 running Fedora 41) yet, but I will work on it more tomorrow. Thanks for the detailed reproduction steps!

@smcv
Contributor

smcv commented Nov 12, 2024

__PATH__system-write-data__ (default for steam version) means it's writing to ~/.factorio

Perhaps the way this special token gets expanded in the Factorio code is relying on some assumption that is not true in the container environment.

I can see one possible issue with this:

If you are using systemd-homed, then your username does not appear in /etc/passwd, and functions like getpwnam() and getpwuid() rely on the nsswitch mechanism (specifically libnss_systemd) to be able to resolve the username to details like a uid and a home directory.

At the moment, we copy the system /etc/passwd into the container, but we do not have libnss_systemd available inside the container. As a result, glibc API calls like getpwuid(getuid()) will fail to find details of the current user.

Normally, this doesn't matter, because when application code wants to know about the current user, it usually only wants to know the home directory, which usually respects the HOME environment variable as higher-precedence than getpwuid(). We ensure that HOME is correctly set to the home directory (for example HOME=/home/usinghomed in my case) and this works OK. For example, GLib's g_get_home_dir() uses the HOME environment variable before falling back to getpwuid(), while SDL's SDL_GetUserFolder(SDL_FOLDER_HOME) only uses HOME and will just fail if it's unset.

However, if Factorio doesn't take HOME into account, relies on getpwuid(getuid()) or similar, and also doesn't take into account the possibility that getpwuid() might fail, then that would explain the symptoms we're seeing.

I would recommend that Factorio should try getenv("HOME") first, only falling back to the "official" home directory from getpwuid(getuid()) if HOME is unset. As well as hopefully fixing this crash, this will help to give it the behaviour that users normally expect when the HOME environment variable is set to somewhere different. In particular, Snap apps normally run with something like HOME=/home/me/snap/steam/common, even though the "official" home directory is something more like /home/me, and users will normally expect this to result in using a directory like /home/me/snap/steam/common/.factorio.
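
The precedence recommended above can be sketched as a small pure function. This is an illustration under assumptions, not Factorio's implementation: `chooseWriteDataDir` is a hypothetical name, the caller is assumed to pass the result of `getenv("HOME")` and the `pw_dir` from `getpwuid(getuid())` (or `nullptr` when either is unavailable), and treating an empty string like an unset value is a design choice of this sketch.

```cpp
#include <string>

// Hypothetical helper: choose the directory for ".factorio".
// homeEnv:   value of $HOME, or nullptr if unset
// passwdDir: pw_dir from the passwd lookup, or nullptr if the lookup failed
// Returns an empty string when neither source is usable, so the caller can
// report an error instead of dereferencing a null pointer.
std::string chooseWriteDataDir(const char* homeEnv, const char* passwdDir) {
    if (homeEnv && *homeEnv) {
        return std::string(homeEnv) + "/.factorio";  // $HOME wins
    }
    if (passwdDir && *passwdDir) {
        return std::string(passwdDir) + "/.factorio";  // "official" home
    }
    return std::string();
}
```

With the Snap example above, `chooseWriteDataDir("/home/me/snap/steam/common", "/home/me")` gives `/home/me/snap/steam/common/.factorio`. In the systemd-homed case, `chooseWriteDataDir("/home/usinghomed", nullptr)` still gives a usable path.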

We can also mitigate this from the Steam Runtime side, by programmatically generating an /etc/passwd with the contents that Factorio expects to see, instead of passing through the one from the host system as-is. Flatpak generates an /etc/passwd containing only the current user and nfsnobody - but I'm a little concerned that if we do that, we'll be breaking some other game's expectations, so instead we might have to do some sort of merge operation between the host /etc/passwd and that information.
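
The merge idea above could look roughly like the following. This is a sketch under assumptions, not how pressure-vessel does it: `mergePasswd`, `splitFields` and their parameters are hypothetical, the GECOS field is left empty, and an entry is appended only when no existing line already carries the uid.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one passwd(5) line into its colon-separated fields.
static std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream in(line);
    while (std::getline(in, field, ':')) fields.push_back(field);
    return fields;
}

// Hypothetical merge: keep the host passwd text as-is, and append a record
// for the current user only if its uid is not already present.
std::string mergePasswd(const std::string& hostPasswd, const std::string& name,
                        unsigned long uid, unsigned long gid,
                        const std::string& home, const std::string& shell) {
    const std::string wanted = std::to_string(uid);
    std::istringstream in(hostPasswd);
    std::string line;
    while (std::getline(in, line)) {
        const std::vector<std::string> f = splitFields(line);
        if (f.size() >= 3 && f[2] == wanted) return hostPasswd;  // uid known
    }
    std::string merged = hostPasswd;
    if (!merged.empty() && merged.back() != '\n') merged += '\n';
    merged += name + ":x:" + wanted + ":" + std::to_string(gid) + "::" +
              home + ":" + shell + "\n";
    return merged;
}
```

Keeping every host line untouched is the conservative choice here, since other games may expect entries that a Flatpak-style minimal file would drop.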

@zhaoweny
Author

zhaoweny commented Nov 12, 2024

I think I've got this minimized down to a non-game example. I searched around the Internet and landed on this post, which indicates that Factorio might be using getpwuid to obtain a passwd entry for a user, at least around the time of the original post (2021).

@smcv
Contributor

smcv commented Nov 12, 2024

My ideal solution would be to not run the game against the steam Linux runtime at all - we provide a standalone version of the game and it works great.

I'm sure it works great today, but the goal of the Steam Linux Runtime is that it still works in 10 years' time, and that's hard to achieve in a standalone Linux binary - assumptions about the underlying system that seem completely reasonable today are not going to remain true forever.

@smcv
Contributor

smcv commented Nov 12, 2024

I searched around the Internet and landed on this post, which indicates that Factorio is using getpwuid to obtain a passwd entry for a user, at least around the time of the original post (2021).

That's consistent with my theory in #705 (comment), and confirms that users do expect $HOME to be used as a higher precedence than whatever getpwuid() says.

@raiguard

raiguard commented Nov 12, 2024

However, if Factorio doesn't take HOME into account, relies on getpwuid(getuid()) or similar, and also doesn't take into account the possibility that getpwuid() might fail, then that would explain the symptoms we're seeing.

You are correct. Here is the entire contents of Paths::getSystemWriteData() on Linux:

Filesystem::Path Paths::getSystemWriteData()
{
  struct passwd* pw = getpwuid(getuid());
  const char* homedir = pw->pw_dir;
  return Filesystem::Path(homedir + std::string("/.factorio"));
}

Ironically enough, I actually did catch this flaw a few months ago, but the fix didn't get merged because it was bundled with a few other changes that were rejected (the change being that we would use $XDG_DATA_HOME instead of ~/.factorio by default, but was rejected because of potentially wreaking havoc with steam cloud saves, among other things). That was bad branch etiquette on my part.

I'm sure it works great today, but the goal of the Steam Linux Runtime is that it still works in 10 years' time, and that's hard to achieve in a standalone Linux binary - assumptions about the underlying system that seem completely reasonable today are not going to remain true forever.

Point. I tend to try not to think about the day when I inevitably stop working on Factorio. :)

@smcv
Contributor

smcv commented Nov 12, 2024

  struct passwd* pw = getpwuid(getuid());
  const char* homedir = pw->pw_dir;

Yeah, that's the segfault I expected: if getpwuid() fails, it will return NULL, and then the next line is a NULL dereference. I'd suggest something more like this (untested!):

  const char* homedir = getenv("HOME");
  if (!homedir) {
    uid_t uid = getuid();  /* getuid() returns a uid_t, not a pid_t */
    errno = 0;
    struct passwd* pw = getpwuid(uid);
    if (!pw) {
      errx(1, "Unable to find uid %lu: %s", (unsigned long) uid, errno ? strerror(errno) : "not found");
      /* or whatever way you prefer to handle fatal errors */
    }
    homedir = pw->pw_dir;
  }
  ...

(The error behaviour of getpwuid() is odd - it can return 0 without setting errno)
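
One way to sidestep the errno ambiguity is the reentrant `getpwuid_r()`, which reports errors through its return value. The following is a sketch only: `LookupStatus` and `lookupHome` are hypothetical names. It also assumes the POSIX convention of a zero return with a NULL result for "not found"; some implementations are known to return a nonzero error code for that case, which this sketch would report as `Error`.

```cpp
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

#include <cerrno>
#include <string>
#include <vector>

enum class LookupStatus { Found, NotFound, Error };

// Look up the home directory for `uid`. On success, store it in `homeOut`.
LookupStatus lookupHome(uid_t uid, std::string& homeOut) {
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    std::vector<char> buf(hint > 0 ? static_cast<std::size_t>(hint) : 1024);
    struct passwd pwd;
    struct passwd* result = nullptr;
    int rc;
    // ERANGE means the scratch buffer was too small: grow it and retry.
    while ((rc = getpwuid_r(uid, &pwd, buf.data(), buf.size(), &result)) ==
           ERANGE) {
        buf.resize(buf.size() * 2);
    }
    if (rc != 0) return LookupStatus::Error;               // lookup failed
    if (result == nullptr) return LookupStatus::NotFound;  // no such uid
    homeOut = result->pw_dir;
    return LookupStatus::Found;
}
```

A caller would typically try `getenv("HOME")` first and only call `lookupHome(getuid(), home)` as the fallback, treating anything other than `Found` as a fatal error.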

@smcv
Contributor

smcv commented Nov 12, 2024

@zhaoweny or @kisak-valve, can we perhaps retitle this to something like Factorio crashes under Steam Linux Runtime 1.0 if uid not in /etc/passwd, e.g. systemd-homed now that we know why it's crashing?

I'll look at mitigating this from the SLR side.

@zhaoweny zhaoweny changed the title Factorio crashes with SIGSEGV after recent steam client update, which enables scout runtime by default Factorio crashes under Steam Linux Runtime 1.0 if uid not in /etc/passwd, e.g. systemd-homed Nov 12, 2024
@zhaoweny
Author

I edited the title as you suggested, but I'd like to add that the behavior is the same across different Steam Linux Runtime versions.

@smcv
Contributor

smcv commented Nov 12, 2024

I edited the title as you suggested, but I'd like to add that the behavior is the same across different Steam Linux Runtime versions.

That makes sense, it's a problem with SLR in general rather than that version specifically. (But SLR 1.0 is (currently) the only one that is available for running Factorio without using unsupported tweaks, because SLR 3.0 is only meant to be for games whose developers have specifically told us they want a newer runtime, like CS2 and Retroarch.)

@smcv
Contributor

smcv commented Nov 12, 2024

In the short term, a workaround for this is to append a record for the systemd-homed-managed user to /etc/passwd, making sure to replace the home directory field with the user's intended home directory.

In some brief testing on Arch, the result of getent passwd "$(id -nu)" will show a home directory of /, which is unsuitable.

For example, on my test system, getent passwd says:

usinghomed:x:60032:60032:usinghomed:/:/usr/bin/systemd-home-fallback-shell

but when I log in as usinghomed, I get HOME=/home/usinghomed. So I appended this to /etc/passwd as a workaround:

usinghomed:x:60032:60032:usinghomed:/home/usinghomed:/usr/bin/systemd-home-fallback-shell

Obviously this workaround loses a few of the benefits of systemd-homed, so it would be better to make SLR mitigate this failure mode (in progress) or to teach Factorio to use $HOME.
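As an illustration of the record described above, here is a small sketch that builds a passwd(5)-style line from an existing entry while substituting the intended home directory. format_passwd_line is a hypothetical helper, not part of SLR or systemd, and hard-coding the password field as "x" is an assumption copied from the example record.

```c
#include <pwd.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write a passwd(5) line for `pw` into `out`, replacing
 * the home directory field with `home`.
 * Returns 0 on success, -1 if `home` is unusable or `out` is too small.
 */
static int format_passwd_line(const struct passwd *pw, const char *home,
                              char *out, size_t outlen)
{
    int n;

    /* A relative path, ':' or newline would produce a broken record. */
    if (home == NULL || home[0] != '/' || strpbrk(home, ":\n") != NULL)
        return -1;

    n = snprintf(out, outlen, "%s:x:%lu:%lu:%s:%s:%s\n",
                 pw->pw_name,
                 (unsigned long) pw->pw_uid,
                 (unsigned long) pw->pw_gid,
                 pw->pw_gecos ? pw->pw_gecos : "",
                 home,
                 pw->pw_shell ? pw->pw_shell : "");

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t) n >= outlen) ? -1 : 0;
}
```

Passing the entry returned by getpwuid(getuid()) together with getenv("HOME") should yield a line of the same shape as the one appended by hand above.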

@smcv
Contributor

smcv commented Nov 13, 2024

We can also mitigate this from the Steam Runtime side, by programmatically generating an /etc/passwd with the contents that Factorio expects to see, instead of passing through the one from the host system as-is.

I prototyped this and it seems to resolve the crash, at least for the demo.

If you're comfortable with using unreleased software, you can try this out by replacing steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel with the result of unpacking this build: https://gitlab.steamos.cloud/steamrt/steam-runtime-tools/-/jobs/800334/artifacts/raw/_build/pressure-vessel-bin.tar.gz. It would be useful if a user of systemd-homed could verify this with the full game.

This change will hopefully be part of the next Steam Linux Runtime 2.0 beta when it has been through review and more testing. Because of the way the container runtime works internally, this would be a change to SLR 2.0, and not SLR 1.0 as you might expect.

[note to self: this is !767 v4]

@smcv
Contributor

smcv commented Nov 13, 2024

@adomaskizogian, I don't have enough information about your system or your situation to guess whether you were experiencing the same bad interaction between systemd-homed and Factorio that was originally reported here, or something different.

If your issue was the same thing originally reported here, then the pressure-vessel build in #705 (comment) should hopefully resolve it.

Or, if that isn't it, please open a separate issue with the info/logs that are requested by the issue template, and we can look into that separately.

@smcv
Contributor

smcv commented Nov 13, 2024

@zhaoweny:

* Civ 6: **crashes and produces a coredump**

Looking at your strace logs, I think you might be correct to have thought that this is actually the same issue as Factorio, either in Civ 6 itself or in some library that it uses. The end of the log for process 134315 looks like the same order of operations I would expect from what Factorio does:

12:30:21.674785 getuid()                = 60104
12:30:21.674811 newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0644, st_size=505, ...}, 0) = 0
12:30:21.674839 newfstatat(AT_FDCWD, "/", {st_mode=S_IFDIR|0755, st_size=420, ...}, 0) = 0
12:30:21.674864 openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3
12:30:21.674889 fstat(3, {st_mode=S_IFREG|0644, st_size=505, ...}) = 0
12:30:21.674911 read(3, "# /etc/nsswitch.conf\n#\n# Example"..., 4096) = 505
12:30:21.674937 read(3, "", 4096)       = 0
12:30:21.674958 fstat(3, {st_mode=S_IFREG|0644, st_size=505, ...}) = 0
12:30:21.674980 close(3)                = 0
12:30:21.675005 openat(AT_FDCWD, "/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
12:30:21.675029 fstat(3, {st_mode=S_IFREG|0644, st_size=1307, ...}) = 0
12:30:21.675051 lseek(3, 0, SEEK_SET)   = 0
12:30:21.675073 read(3, "root:x:0:0::/root:/usr/bin/bash\n"..., 4096) = 1307
12:30:21.675101 read(3, "", 4096)       = 0
12:30:21.675121 close(3)                = 0
12:30:21.675144 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x20} ---

So it would be useful if you could retry Civ 6 with the pressure-vessel build from #705 (comment), or with the workaround from #705 (comment).

* Stellaris: game launcher works, **the game crashes when click play from the launcher**

Stellaris shows a similar pattern, so it would be useful if you could retry Stellaris in a similar way.

@emberfade

We can also mitigate this from the Steam Runtime side, by programmatically generating an /etc/passwd with the contents that Factorio expects to see, instead of passing through the one from the host system as-is.

I prototyped this and it seems to resolve the crash, at least for the demo.

If you're comfortable with using unreleased software, you can try this out by replacing steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel with the result of unpacking this build: https://gitlab.steamos.cloud/steamrt/steam-runtime-tools/-/jobs/800334/artifacts/raw/_build/pressure-vessel-bin.tar.gz. It would be useful if a user of systemd-homed could verify this with the full game.

I use systemd-homed and am affected by the crash as well. I can verify this fixes the issue and Factorio starts.

@zhaoweny
Author

replacing steamapps/common/SteamLinuxRuntime_soldier/pressure-vessel with the result of unpacking this build

I was busy playing Factorio last night (It's a great game!). I will try this fix tonight when I get home.

@zhaoweny
Author

I tested Civ 6, Stellaris, and Factorio (full game, version 2.0.17). They all work with the pressure-vessel fix. Thank you for your hard work and excellent support!

@smcv
Contributor

smcv commented Nov 14, 2024

@zhaoweny: Would you be able to get a backtrace from Civ 6 and Stellaris, with a method similar to what you did for Factorio in #705 (comment) ? If we can find out where their similar pattern is happening (in the main executable, or in some library that they use), that would give us better information to report to those games' developers.

You can use Properties → Installed Files → Verify integrity on Steam Linux Runtime 2.0 (soldier) to get it back to the version that has this bug.

@zhaoweny
Author

Backtrace for Stellaris and Civ 6

Sure, here's the backtrace (and a small section of disassembled code) for Stellaris:

Program received signal SIGSEGV, Segmentation fault.
0x000000000337df2b in GetUserDir(char const*, char*, int) ()
(gdb) bt
#0  0x000000000337df2b in GetUserDir(char const*, char*, int) ()
#1  0x000000000337e5e9 in VFSGetDefaultUserDir(char const*) ()
#2  0x00000000012eae08 in StartVFS(CString&, char const*, bool, CPdxArray<CString, int>&) ()
#3  0x00000000012f0032 in RunGame(int, char**) ()
#4  0x00000000012ea38c in main ()
(gdb) disassemble 
Dump of assembler code for function _Z10GetUserDirPKcPci:
   0x000000000337df10 <+0>:     push   %r15
   0x000000000337df12 <+2>:     push   %r14
   0x000000000337df14 <+4>:     push   %rbx
   0x000000000337df15 <+5>:     sub    $0x20,%rsp
   0x000000000337df19 <+9>:     mov    %rsi,%r14
   0x000000000337df1c <+12>:    mov    %rdi,%r15
   0x000000000337df1f <+15>:    call   0x120e7c0 <getuid@plt>
   0x000000000337df24 <+20>:    mov    %eax,%edi
   0x000000000337df26 <+22>:    call   0x120f1b0 <getpwuid@plt>
=> 0x000000000337df2b <+27>:    mov    0x20(%rax),%rsi

Here's Civ 6 under the same Steam Linux Runtime, with part of the backtrace and some disassembled code:

(gdb) bt
#0  0x0000000002caedac in ?? ()
#1  0x0000000002caef57 in ?? ()
#2  0x0000000002caf13a in ASL::ASL_GetAspyrDataPath() ()
#3  0x0000000002cb03e2 in ?? ()
#4  0x0000000002cb018d in ASL::ASL_GetJsonData(char const*) ()
#5  0x0000000002ccc829 in ?? ()
#6  0x0000000002ccc267 in ASL::Internal::Prefs::Prefs() ()
#7  0x0000000002cce8c6 in ASL::Internal::Prefs& ASL::ASL_Singleton<ASL::Internal::Prefs>::Create<ASL::Internal::Prefs>(long) ()
#8  0x0000000002cafb7f in ASL::ASL_Main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >&&, bool) ()
#9  0x0000000002caf7e2 in main ()

(gdb) display/20i ($pc - 0x10)
2: x/20i ($pc - 0x10)
   0x2caed9c:   push   %rbx
   0x2caed9d:   mov    %rdi,%r15
   0x2caeda0:   call   0x2c62d10 <getuid@plt>
   0x2caeda5:   mov    %eax,%edi
   0x2caeda7:   call   0x2c63780 <getpwuid@plt>
=> 0x2caedac:   mov    0x20(%rax),%r14

If disassembling code is not welcome here, please tell me :P
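Both faulting instructions above read from 0x20(%rax) immediately after the call to getpwuid(), and the earlier strace excerpt reports si_addr=0x20. That is consistent with reading pw_dir through a NULL struct passwd pointer, because on x86-64 Linux pw_dir sits 0x20 bytes into the structure. A small sketch to check the offset on a given platform; pw_dir_offset is a hypothetical helper name.

```c
#include <pwd.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: byte offset of pw_dir within struct passwd on the
 * platform this is compiled for.
 */
static size_t pw_dir_offset(void)
{
    return offsetof(struct passwd, pw_dir);
}
```

Printing the value with printf("0x%zx\n", pw_dir_offset()) should give 0x20 on x86-64 with glibc, since two pointers, a uid, a gid and one more pointer precede pw_dir. Other ABIs may differ.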

@smcv
Contributor

smcv commented Nov 14, 2024

Thanks! I was half expecting you to report two matching backtraces, indicating that Aspyr and Paradox were both linking to (or perhaps even bundling) the same utility library; but it seems that instead, they've each made the same mistake independently.

Do I assume correctly that both of those are somewhere inside their respective games' main executables?

@smcv
Contributor

smcv commented Nov 14, 2024

Reported to Aspyr, for Civ 6 (ticket 233908) and to Paradox, for Stellaris (ticket 308296). I'm assuming we don't need a support ticket for Factorio since a developer is already in this conversation.

For best robustness I'm hoping we can get this fixed from both sides, in SLR and in the affected games.

@raiguard

I have merged the fix into Factorio - the game will now prefer $HOME over the results of getpwuid and will have a better error message if the directory can't be determined.

However, I am unable to test this because, as I mentioned before, I am on vacation in Japan with limited resources. I would kindly ask those affected by this to test the next experimental release (2.0.20) when it is released and let me know if there are issues.

@Ealrann

Ealrann commented Nov 18, 2024

@raiguard
I just tested with the new 2.0.20, it's working 🎉
No need to run steam with -compat-force-slr off anymore to play Factorio
Thank you
