[not urgent] low priority fair share contribution #280

Open
KjongLehmann opened this issue Jul 6, 2015 · 21 comments

Comments

@KjongLehmann

Hi,
This is really nothing urgent, but it would be nice to revisit the question of whether jobs in the low-priority queue should contribute towards fair share, since all those jobs can be preempted by any other job. Right now users are getting punished for being nice.

@tatarsky
Contributor

tatarsky commented Jul 6, 2015

Are you seeing a problem with that, @KjongLehmann? I've already attempted to not penalize users for using the low priority queue, and if it's not working I will indeed revisit it, but I applied the config that Adaptive suggested.

@tatarsky
Contributor

tatarsky commented Jul 6, 2015

I assume you are seeing a problem, BTW, or you would not be opening an issue, but I wanted it clear that I attempted to make low priority "fairshare non-impacting" per a ticket with Adaptive quite a while ago.

@KjongLehmann
Author

I was getting ready to submit a few batch jobs and, out of curiosity, checked my fair share status, which is quite high considering that I have only run low priority jobs. So far it has not been a problem, but it may become one if things get busier again. That is why I marked it "not urgent". Maybe I misunderstand my fair share value?

@tatarsky
Contributor

tatarsky commented Jul 6, 2015

Are you seeing your lowpriority jobs being held due to fairshare? Because that was the config I was attempting per their instructions.

@akahles

akahles commented Jul 6, 2015

I think the point is different. It would be good if jobs run in the lowpriority queue did not contribute to the usage shown by mdiag -f. Right now, these jobs can increase my fairshare value and thus contribute to later batch jobs not being scheduled.

@tatarsky
Contributor

tatarsky commented Jul 6, 2015

I went around and around on this concept with Adaptive and didn't get much help. So while I understand what you are saying, support for what you describe (a queue completely untracked by fairshare) didn't seem easy to implement.

@akahles

akahles commented Jul 6, 2015

I see, but then there is no real benefit in the queue other than having a way to be nice ;). How tightly are the fairshare groups coupled to the actual Linux groups / users? Would it be possible to accumulate fairshare for a different user (e.g., me_lowprio instead of my regular user me) depending on the queue used? This way one's batch jobs (run as me) would not suffer from the load generated as me_lowprio.
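
A minimal sketch of this idea, in plain Python rather than Moab configuration: usage is charged to a different accounting identity depending on the queue. The `_lowprio` suffix follows the me_lowprio example above; the `batch` queue name, the function names, and the job tuples are assumptions made only for illustration, and nothing here reflects how Moab actually maps credentials.

```python
from collections import defaultdict

LOW_PRIORITY_QUEUE = "lowpriority"  # queue name as used in this thread


def accounting_key(user: str, queue: str) -> str:
    """Return the (hypothetical) identity that a job's usage is charged to."""
    if queue == LOW_PRIORITY_QUEUE:
        return f"{user}_lowprio"
    return user


def accumulate(jobs):
    """Sum processor-seconds per accounting identity.

    Each job is a (user, queue, processors, seconds) tuple.
    """
    usage = defaultdict(int)
    for user, queue, processors, seconds in jobs:
        usage[accounting_key(user, queue)] += processors * seconds
    return dict(usage)


# Made-up jobs: one regular batch job and one large low priority job.
jobs = [
    ("me", "batch", 4, 3600),
    ("me", "lowpriority", 32, 7200),
]
usage = accumulate(jobs)
```

With this split, the usage recorded under `me` reflects only the batch job, so the low priority load would not count against later batch submissions.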

@tatarsky
Contributor

tatarsky commented Jul 6, 2015

Please note the key word "easily". They were recommending switching to the "Hierarchical Fairshare/Share Trees" methods for tracking fairshare which exceeded my risk/change impact threshold.

@akahles

akahles commented Jul 6, 2015

Ah, I see :) Maybe it's worth getting an opinion from "leadership" here on how urgent this is. I did not want to spam @KjongLehmann's thread, btw. I just wanted to add my two cents and support @KjongLehmann: this would be a really nice feature to have.

@tatarsky
Contributor

tatarsky commented Jul 6, 2015

I understand, but the impression I got was that this would require a major change to the config. I did the closest thing I could come up with. I would recommend you do indeed discuss the priority in view of the remaining hours, particularly given that I don't know a darn thing about "Hierarchical Fairshare/Share Trees".

@KjongLehmann
Author

Thank you Andre for clarifying. That was in fact the feature I was wondering about. Certainly not urgent right now and also a rather specialized problem for a small subset of users.

@tatarsky
Contributor

tatarsky commented Jul 6, 2015

If it were as easy, guys, as something like "DOESNOTADDTOFAIRSHARE" on the lowpriority queue, I would have already done it ;) After numerous discussions, I did not get the impression that this was possible in the fairshare method currently being used (user/group). I will scan it again, however.

@KjongLehmann
Author

Thank you for looking into that! Much appreciated. Again, one of these nice-to-have items….

@jchodera
Member

jchodera commented Jul 6, 2015

I think it would be great to actually get this to work as intended, but @tatarsky is correct that we have to assess the cost to us. Perhaps this would be a great feature to have in a batch queue redesign/update discussion with @juanperin for when the MSK HPC staff assumes control of the batch queue system.

@tatarsky
Contributor

tatarsky commented Jul 6, 2015

I think it would be an excellent topic for somebody to start learning the ways of Moab. I spelled out what I wanted, per the above, to Adaptive and got told to switch to a different way of tracking fairshare than the one I was handed from SDSC.

But I'm not likely to do that lightly, or really without a test environment to tell me it's worth it. I'm casting around a few other waters.

Right now I believe the lowpriority queue IGNORES a user's fairshare, but usage there does indeed accrue, at what I believe is a lower weight value. I'm double-checking that with some runs myself shortly. But I don't know how to make it "not count at all".

@jchodera
Member

jchodera commented Jul 6, 2015

> But I'm not likely to do that lightly, or really without a test environment to tell me it's worth it. I'm casting around a few other waters.

This should likely be coupled with the purchase of a new Torque batch queue master node, since that would allow us to experiment without disrupting existing production systems.

@tatarsky
Contributor

tatarsky commented Jul 6, 2015

Agreed on the above. I am looking at the "Fairshare" stats files, which are emitted at 8:00 AM and 10:00 PM, to get some insight into the scoring weights of the classes (aka queues).

@tatarsky
Contributor

tatarsky commented Jul 8, 2015

My review continues at a moderate pace. I can clearly state that all usage of the cluster in the current fairshare mode is tracked in the DEDICATEDPES mode of operation (dedicated processor-equivalent seconds). Regardless of class (what Moab maps to a queue), your usage is accumulated (as you have clearly noted, and is now 100% confirmed).

The scheduling priority decisions made from that data are where I am re-reviewing my options, with both the user/group FS method currently in use and these Hierarchical Fairshare/Share Trees.
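
To make the accounting difference concrete, here is a rough Python model, not Moab's implementation, of usage accumulating per user regardless of class, next to the behavior requested in this issue, where a chosen class is left out. It assumes one processor-equivalent per processor, which ignores the other resources Moab can factor in; the function name, user names, the `batch` class name, and the job numbers are all made up for illustration.

```python
def dedicated_pe_seconds(jobs, excluded_classes=frozenset()):
    """Sum processors * seconds per user, skipping any excluded classes.

    Each job is a (user, job_class, processors, seconds) tuple.
    Simplifying assumption: one processor-equivalent per processor.
    """
    totals = {}
    for user, job_class, processors, seconds in jobs:
        if job_class in excluded_classes:
            continue  # hypothetical "does not count at all" behavior
        totals[user] = totals.get(user, 0) + processors * seconds
    return totals


# Made-up jobs for two hypothetical users.
jobs = [
    ("alice", "batch", 8, 1000),
    ("alice", "lowpriority", 8, 1000),
    ("bob", "batch", 4, 500),
]

# Behavior described above: every class counts toward the user's usage.
current = dedicated_pe_seconds(jobs)

# Behavior requested in this issue: lowpriority usage is not counted.
requested = dedicated_pe_seconds(jobs, excluded_classes={"lowpriority"})
```

In this toy model, excluding the lowpriority class halves the first user's recorded usage and leaves the second user's unchanged.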

@tatarsky
Contributor

tatarsky commented Aug 4, 2015

I have mostly confirmed that the fairshare system would probably need to be moved to this other method (the two are not runnable at the same time). But to truly confirm that, a test environment would be wise.

@jchodera
Member

jchodera commented Aug 4, 2015 via email

@tatarsky
Contributor

tatarsky commented Dec 4, 2015

This issue will be investigated via #349, although it is believed that a completely different fairshare system would be required to support the above request.
