Better log management #46

Closed
norswap opened this issue Sep 27, 2023 · 5 comments · Fixed by #91

@norswap
Member

norswap commented Sep 27, 2023

Right now we log the output of every L2 component (and L1 when running a devnet) to dedicated files.

These logs are never rotated, so they can grow very large, consuming a lot of disk space and becoming unusable: opening a single file that is hundreds of MBs or even GBs in size tends to bork most tools.

Ideally, we would implement a configurable logging policy in the tool.

  • Let's investigate if there are existing tools that can help here.
  • If tools are not adequate, a very simple thing we can do is spin up a new thread that implements the logging policy without modifying the existing logic. The policies could periodically copy the current log file to a new file and clear the file being actively written. They could also compress, archive, or delete old log files.
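A minimal sketch of the copy-then-clear policy described in the second bullet, assuming a simple size threshold and a fixed number of gzip archives. The function name `rotate_if_needed` and the `path.N.gz` naming scheme are hypothetical. Truncating in place only behaves well for writers that reopen the file or write in append mode, a caveat that comes up later in this thread.

```python
import gzip
import os
import shutil


def rotate_if_needed(path: str, max_bytes: int, keep: int) -> bool:
    """Hypothetical policy: once `path` exceeds `max_bytes`, compress a copy into
    `path.1.gz`, shift older archives up by one, keep at most `keep` (>= 1) archives,
    then truncate the live file. Returns True if a rotation happened."""
    if not os.path.exists(path) or os.path.getsize(path) <= max_bytes:
        return False

    # Shift archives: path.(keep-1).gz -> path.keep.gz, ..., path.1.gz -> path.2.gz.
    # os.replace overwrites path.keep.gz, which drops the oldest archive.
    for i in range(keep - 1, 0, -1):
        older = f"{path}.{i}.gz"
        if os.path.exists(older):
            os.replace(older, f"{path}.{i + 1}.gz")

    # Compress a copy of the live log into the newest archive slot.
    with open(path, "rb") as src, gzip.open(f"{path}.1.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Clear the live file in place rather than deleting it.
    os.truncate(path, 0)
    return True
```

A background thread could call this for each component's log file every few seconds.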
@norswap
Member Author

norswap commented Nov 27, 2023

The best option is probably to use logrotate, and trigger it manually in a thread.
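A sketch of "trigger logrotate manually in a thread", assuming logrotate is installed and that we keep our own config and state files (the file names are hypothetical). `--state` is a real logrotate option that points it at a state file other than the system-wide one. The helpers `logrotate_command`, `run_logrotate`, and `start_periodic` are illustrative names.

```python
import subprocess
import threading


def logrotate_command(config_path: str, state_path: str) -> list[str]:
    # A private state file means we don't need access to the system-wide one.
    return ["logrotate", "--state", state_path, config_path]


def run_logrotate(config_path: str, state_path: str) -> int:
    # Invoke logrotate ourselves instead of relying on cron; return its exit code.
    result = subprocess.run(logrotate_command(config_path, state_path), check=False)
    return result.returncode


def start_periodic(task, interval_s: float) -> threading.Event:
    """Run `task()` every `interval_s` seconds on a daemon thread until the
    returned event is set. An exception raised by `task` ends the loop."""
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns True as soon as `stop` is set, which exits the loop.
        while not stop.wait(interval_s):
            task()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Usage might look like `stop = start_periodic(lambda: run_logrotate("logrotate.conf", "logrotate.state"), 60.0)`, followed by `stop.set()` on shutdown.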

@eviterin

I made this: https://github.com/eviterin/LogRotator/
Is this what you were looking for? If so, I can try to integrate it into this project in an upcoming PR.

@norswap
Member Author

norswap commented Dec 31, 2023

Hey Eviterin! Thanks for the link! I think I'd like to do something similar, but built in Python and integrated with our codebase (for instance, controlled via options in the config).
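One way "controlled via options in the config" could look is sketched below. The option keys (`log_max_bytes`, `log_keep`, `log_compress`) and their defaults are hypothetical, not the project's actual config schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRotationConfig:
    """Hypothetical rotation settings; names and defaults are illustrative."""

    max_bytes: int = 100 * 1024 * 1024  # rotate once a log exceeds 100 MiB
    keep: int = 5                       # number of rotated logs to retain
    compress: bool = True               # gzip rotated logs

    @classmethod
    def from_dict(cls, raw: dict) -> "LogRotationConfig":
        defaults = cls()
        compress = raw.get("log_compress", defaults.compress)
        # Reject strings like "false", which bool() would treat as True.
        if not isinstance(compress, bool):
            raise ValueError("log_compress must be true or false")
        cfg = cls(
            max_bytes=int(raw.get("log_max_bytes", defaults.max_bytes)),
            keep=int(raw.get("log_keep", defaults.keep)),
            compress=compress,
        )
        if cfg.max_bytes <= 0 or cfg.keep < 1:
            raise ValueError("log_max_bytes must be > 0 and log_keep >= 1")
        return cfg
```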

I think this needs to happen as part of a bigger refactor, as there are some issues with how we handle process outputs and logs at the moment (#59). I also noticed that if we delete log files, they are currently not re-created when new output comes in.

Feel free to have a look at these issues if you want and report here so we can pick a direction to go in. You can also message me on Telegram (same as github handle) if you have any question.

@eviterin

eviterin commented Jan 8, 2024

I see two ways to deal with the issue of deleted log files.

  1. Before writing to any log file, check if it exists. If it doesn't, reopen it. Create a function for this and refactor all log writes.
  2. Use logrotate to remove the need for manual deletion of files, and instead rotate them automatically when they get too large.
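Option 1 could be sketched as a small wrapper that every log write goes through. The class name `ReopeningLog` is hypothetical; it assumes POSIX semantics, where an open file can be deleted (on Windows, deleting an open file normally fails).

```python
import os


class ReopeningLog:
    """Hypothetical option-1 writer: keep a handle open, but reopen the path
    if the log file has been deleted since the last write."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.file = open(path, "ab")

    def write(self, data: bytes) -> None:
        # If the path vanished, the old handle would keep writing to an unlinked
        # file, so close it and create a fresh file at the same path.
        if not os.path.exists(self.path):
            self.file.close()
            self.file = open(self.path, "ab")
        self.file.write(data)
        self.file.flush()

    def close(self) -> None:
        self.file.close()
```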

I have verified that option 2 works, because logrotate doesn't actually remove the file. Instead, it copies the log file to a new file (e.g., logfile1), then truncates the original file to zero length. Thus, all processes can keep writing to a fresh file without interruption.

A bonus of option 2 is that you can tell logrotate to compress the rotated logs so that they aren't lost, as opposed to option 1, where files would be deleted manually by the user.
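If the tool generated its own logrotate config, it might render something like the sketch below. `size`, `rotate`, `compress`, `copytruncate`, and `missingok` are real logrotate directives, and `copytruncate` is the one that produces the copy-then-truncate behavior described above (without it, logrotate moves the original file aside). The function name and defaults are hypothetical, and paths containing spaces would need quoting.

```python
def render_logrotate_config(
    log_paths: list[str],
    size: str = "100M",
    rotate: int = 5,
    compress: bool = True,
    copytruncate: bool = True,
) -> str:
    """Hypothetical helper: emit one logrotate stanza covering all `log_paths`."""
    lines = [" ".join(log_paths) + " {"]
    lines.append(f"    size {size}")      # rotate once a file exceeds this size
    lines.append(f"    rotate {rotate}")  # number of rotated files to keep
    if compress:
        lines.append("    compress")      # gzip rotated files
    if copytruncate:
        lines.append("    copytruncate")  # copy, then truncate the original in place
    lines.append("    missingok")         # don't fail if a log doesn't exist yet
    lines.append("}")
    return "\n".join(lines) + "\n"
```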

On the other hand, option 1 doesn't rely on system-specific tools, whereas with option 2, Windows users are in shambles.

@norswap
Member Author

norswap commented Jan 15, 2024

This will be fixed in PR #91, but I'm sharing some lessons learned:

Regarding (2), logrotate only does this when the copytruncate directive is specified. However, this doesn't work well, because the writing process's file descriptor keeps its current offset: after truncation, its next write lands at the old offset, which leaves the file with a large empty prefix.
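The empty-prefix effect can be reproduced without logrotate. The sketch below assumes a POSIX system and simulates a child process whose output file was opened without `O_APPEND` (as with mode `"w"`), with `os.truncate` standing in for the truncate half of copytruncate. The function name is hypothetical.

```python
import os
import tempfile


def copytruncate_hole_demo() -> bytes:
    """Truncate a file underneath a writer whose fd lacks O_APPEND, then
    return the file's contents after the writer's next write."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "node.log")
        # Like a process whose stdout was redirected with open(path, "w").
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        try:
            os.write(fd, b"a" * 1000)    # the fd's offset is now 1000
            os.truncate(path, 0)         # copytruncate: (copy omitted) then truncate
            os.write(fd, b"new line\n")  # still written at offset 1000
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            return f.read()
```

The returned data is 1,000 zero bytes followed by `b"new line\n"`. A writer opened in append mode would not leave this gap, because each append-mode write is positioned at the current end of the file.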

Something akin to (1) is the solution, but it requires piping the output of the process to our own code. File descriptors are inflexible: they refer to a specific OS-level file (an inode), not to a path. If you move the underlying file, the fd keeps writing to it (at least on common filesystems). If you delete the file, the fd keeps writing to an inode that no longer has any path on disk.
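The following sketch demonstrates both behaviors on a POSIX system. It uses only `os` calls, and the function name is hypothetical. After the rename, writes follow the file to its new name. After the last name is deleted, `st_nlink` drops to 0, but writes still succeed, since the inode lives on until the fd is closed.

```python
import os
import tempfile


def fd_follows_inode() -> tuple[bytes, bool, int]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "node.log")
        moved = path + ".1"
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND)
        try:
            os.rename(path, moved)              # move the file aside
            os.write(fd, b"after move\n")       # lands in node.log.1
            path_exists = os.path.exists(path)  # False: nothing recreated node.log
            with open(moved, "rb") as f:
                moved_data = f.read()
            os.remove(moved)                    # delete the last remaining name
            links = os.fstat(fd).st_nlink       # 0: reachable only through the fd
            os.write(fd, b"lost\n")             # succeeds, but no path can reach it
        finally:
            os.close(fd)
    return moved_data, path_exists, links
```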

What I ended up doing is piping the output to our code and simply opening and closing the file (by path) every time we write.
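A minimal sketch of that approach, not the actual code from #91. It assumes one pumping loop per child process (in practice, each would run on its own thread) and merges stderr into stdout. The name `pump_output` is hypothetical.

```python
import subprocess


def pump_output(cmd: list[str], log_path: str) -> int:
    """Run `cmd`, forwarding its combined stdout/stderr to `log_path` and
    reopening the path for every line, so a deleted, moved, or truncated log
    is simply (re)created on the next write. Returns the exit code."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    assert proc.stdout is not None
    for line in proc.stdout:                # raw bytes, one line at a time
        with open(log_path, "ab") as log:   # append: always write at the current end
            log.write(line)
    return proc.wait()
```

Because the file is opened by path on each write, the rotation policy is free to move, truncate, or delete it at any time.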

I initially came up with a version that kept a file descriptor open as long as its file still existed on disk. This worked when the file was deleted, but not when it was moved or truncated. By default, I set the log policy to compress the rotated logs, so this worked, but I also let users override the settings, so if they disabled compression or used copytruncate, things would no longer work. By opening and closing the file every time, things always work.
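For illustration, the earlier approach might have looked like the sketch below. This is a reconstruction, not the code from the PR, and it assumes that "the file still exists" was checked via the open fd's link count. Deletion drops the link count to 0 and triggers a reopen. A rename leaves it at 1, so moves go unnoticed and writes keep following the moved file. The class name is hypothetical.

```python
import os


class InodeBoundLog:
    """Hypothetical earlier approach: keep one fd open while its file still
    has a directory entry; reopen the path only after the file is deleted."""

    FLAGS = os.O_WRONLY | os.O_CREAT | os.O_APPEND

    def __init__(self, path: str) -> None:
        self.path = path
        self.fd = os.open(path, self.FLAGS)

    def write(self, data: bytes) -> None:
        # st_nlink == 0 means the last name was removed (deleted); a rename
        # keeps st_nlink at 1, so a moved file is not detected here.
        if os.fstat(self.fd).st_nlink == 0:
            os.close(self.fd)
            self.fd = os.open(self.path, self.FLAGS)
        os.write(self.fd, data)

    def close(self) -> None:
        os.close(self.fd)
```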

I'm not sure about the performance implications, but all of these operations (checking whether a file exists on disk, opening a file, ...) require system calls anyway, so there should at least not be an order-of-magnitude difference.
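A quick way to check this on a given machine is a micro-benchmark like the one below, which compares a persistent handle against opening and closing the file for every write. The function name and the 81-byte line size are arbitrary choices, and timings depend heavily on the OS and filesystem, so no numbers are claimed here.

```python
import os
import tempfile
import timeit


def compare_write_strategies(n: int = 10_000) -> tuple[float, float]:
    """Return (seconds with a persistent handle, seconds with open/close per
    write) for `n` small appends to a temporary file."""
    line = b"x" * 80 + b"\n"
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "bench.log")

        def reopen_each_time() -> None:
            with open(path, "ab") as f:
                f.write(line)

        with open(path, "ab") as persistent:
            def keep_open() -> None:
                persistent.write(line)
                # Flush so that each call issues a write syscall, as the other variant does.
                persistent.flush()

            t_keep_open = timeit.timeit(keep_open, number=n)
        t_reopen = timeit.timeit(reopen_each_time, number=n)
    return t_keep_open, t_reopen
```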


2 participants