Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce frequency of progress indicator's output #197

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

romuald
Copy link

@romuald romuald commented Dec 30, 2024

I've noticed that when running fdupes on a large directory, the CPU usage of my terminal application (gnome-terminal) would ramp up to 50%.

This is mostly due to the fact that the progress indicator will print its output for every file, so a directory with 400k files would print 800k (build + progress) lines in a short time

This PR simply skip the progress print to only print it on every 65th iteration (65 to still show the progress indicator properly)

@jbruchon
Copy link

My first patch was like this. It was rejected. It's also not ideal and I eventually found a better way.

What you need to do is track the time in seconds and use gettimeofday() to check for a seconds change and update output every second instead. That minimizes output while still showing the user the program hasn't crashed or frozen. It's a little more work but not very much.

@adrianlopezroche
Copy link
Owner

Yes, I agree with Jody. Updating the progress indicator on every 65th iteration would cause the progress indicator to freeze very badly when there are lots of large potential duplicates in a row. I'd rather see updates at regular intervals so it's not as susceptible to freezing. I would accept a patch involving progress indicator updates every so often (preferably compile-time configurable with sub-second precision), barring of course any other problems with the code.

@romuald
Copy link
Author

romuald commented Dec 31, 2024

I was initially thinking of doing so, but though that hundreds of thousands of call to gettimeofday() would also take a toll on performance.

I'll try it and come back (hopefully)

Only print the progress indicator every 100 milliseconds to reduce CPU
usage.

Mostly noticeable for large directories
@romuald
Copy link
Author

romuald commented Dec 31, 2024

I've pushed an updated version, to my surprise it is actually slightly faster than the n-th version on my laptop (tested with a 360k files directory)

Note that I'm not an experimented C developer so I may be lacking some obvious things related to coherency with the existing code

I also failed to find out how to actually change the configure option during the ./configure, can you tell me how?

@@ -140,6 +145,38 @@ char *fmttime(time_t t) {
return buf;
}

// current timestamp in milliseconds
uint64_t now64(void)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was copied from another project of mine where I need a monotonic clock. I assume this project does not compile on Windows, but I did leave the WIN32 part

@jbruchon
Copy link

I'm hesitant to share my own stuff here, but I think it'll help enough to justify dropping it. This is the progress timer setup code I use today: https://codeberg.org/jbruchon/jdupes/src/branch/v1.x/jdupes.c#L643-L648

This is the timer check: https://codeberg.org/jbruchon/jdupes/src/branch/v1.x/jdupes.c#L800-L805

This is the libjodycode timer code called in each of those (note that jc_alarm_ring is a global integer): https://codeberg.org/jbruchon/libjodycode/src/branch/master/alarm.c

Feel free to steal as much of my code as you like to make the process easier (the licenses are the same). You'll want to dump the stuff that's #ifdef ON_WINDOWS if you do. It's POSIX-compliant so it'll work on any modern UNIX-like system. Using a pure POSIX alarm-based approach reduces the in-loop overhead to a single variable check, so no function calls like gettimeofday() are needed at all.

@adrianlopezroche
Copy link
Owner

adrianlopezroche commented Dec 31, 2024 via email

@adrianlopezroche
Copy link
Owner

adrianlopezroche commented Dec 31, 2024 via email

@jbruchon
Copy link

Spinning on gettimeofday for each loop still incurs overhead, especially compared to using a POSIX alarm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants