Adding cost script to create a costs summary #26
base: main
scripts/README.md
Outdated
@@ -110,6 +110,16 @@ This functionality is also wrapped into estimate_billing.py under the
I'd still run these separately just to have both, but if you're only
after the CSV this may be more convenient.

# cost_script.py

This is a script to be used on the costs tsv.
suggested edit:
`Takes the output of costs_json_to_csv.py and collapses tasks that have been rerun (due to failure or preemption) and those that have been split into shards, giving one cost for the entire task.`
As of now, I have only collapsed tasks that were split into shards, not ones that were recorded as 'retry'.
So, if there were rows such as:
doBqsr.bqsr_shard-14_retry1
doBqsr.bqsr_shard-14
doBqsr.bqsr_shard-13_retry1
doBqsr.bqsr_shard-13
I collapsed them into:
doBqsr.bqsr
and
doBqsr.bqsr_retry1
That said, if we want to collapse all 4 of those into doBqsr.bqsr, I can change the code to do that.
I'll also edit my comment with what you mentioned and make it clearer!
Nah, I think upon further reflection, keeping retries separate is probably the right move here, so yeah, you got it right. Nice work!
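For reference, the behavior agreed on above (collapse shards, keep retries separate) could be sketched roughly as follows. This is not the code from cost_script.py: the `collapse_shards` helper, the `SHARD_RE` pattern, the `(task, cost)` row shape, and the cost values are all assumptions made for illustration, and it uses only the standard library rather than pandas.

```python
import re
from collections import defaultdict

# Assumed naming scheme: a "_shard-<N>" component inside the task name,
# as in the rows quoted in the comment above.
SHARD_RE = re.compile(r"_shard-\d+")


def collapse_shards(rows):
    """Sum costs per task after dropping the shard index.

    Retry suffixes such as "_retry1" are left untouched, so retries
    end up in their own row instead of being merged into the base task.
    """
    totals = defaultdict(float)
    for task, cost in rows:
        totals[SHARD_RE.sub("", task)] += cost
    return dict(totals)


# Made-up costs, purely to show the grouping.
rows = [
    ("doBqsr.bqsr_shard-14_retry1", 0.5),
    ("doBqsr.bqsr_shard-14", 1.0),
    ("doBqsr.bqsr_shard-13_retry1", 0.25),
    ("doBqsr.bqsr_shard-13", 2.0),
]
summary = collapse_shards(rows)
```

With these rows, `summary` has two keys, `doBqsr.bqsr` and `doBqsr.bqsr_retry1`. Merging retries into the base task as well would only require stripping the retry suffix in the same substitution.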
Created a cost script that parses the costs TSV file and adds up costs for the same task.
Edited the Dockerfile to add cost_script.py to it.
Edited the requirements.txt file so that the Docker image also installs numpy, pandas, and regex.