feat: add maxRetry and patch option #145

Open
wants to merge 1 commit into base: master

dongjiang1989
Contributor

@volcano-sh-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign kevin-wangzefeng
You can assign the PR to them by writing /assign @kevin-wangzefeng in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


type Patch struct {
	// +optional
	Spec *v1alpha1.JobSpec `json:"spec,omitempty"`
Member

Why do we need to store this field here? Can't we use the information we already have to look up the original JobTemplate object, or patch the JobTemplate directly?

Contributor Author

In my use case, I still need to patch some fields to override the content of the JobTemplate. This avoids small differences causing an explosion in the number of JobTemplates.
Use case:

apiVersion: flow.volcano.sh/v1alpha1
kind: JobTemplate
metadata:
  name: a
spec:
  tasks:
    - spec:
          containers:
            - image: tf2.16-cuda11-xxx:v1.0.1
              command:
                - sh
                - -c
                - ./script.py --dateset a.txt --mode 1
              imagePullPolicy: IfNotPresent
              name: nginx
              resources:
                requests:
                  cpu: "1"
          restartPolicy: OnFailure
---
apiVersion: flow.volcano.sh/v1alpha1
kind: JobFlow
metadata:
  name: test-flow1
  namespace: default
spec:
  jobRetainPolicy: delete
  flows:
    - name: a
---
apiVersion: flow.volcano.sh/v1alpha1
kind: JobFlow
metadata:
  name: test-flow2
  namespace: default
spec:
  jobRetainPolicy: delete
  flows:
    - name: a
      patch:
        spec:
          tasks:
            - spec:
                containers:
                  - image: tf2.16-cuda11-xxx:v1.0.2 # just different image
                    command:
                      - sh
                      - -c
                      - ./script.py --dateset b.txt --mode 2 # just different args
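
To make the trade-off in this use case concrete, here is a minimal sketch of how a patch could be overlaid on a template spec. It assumes JSON Merge Patch (RFC 7386) semantics, which is only one possible choice and is not confirmed as what this PR implements. The helper names `mergePatch` and `applyPatch`, and the field names in the sample documents, are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergePatch overlays patch onto base using JSON Merge Patch (RFC 7386)
// semantics: objects merge key by key, a null value deletes the key, and
// any other value (arrays included) replaces the base value wholesale.
// Hypothetical helper, not taken from the PR.
func mergePatch(base, patch map[string]any) map[string]any {
	out := make(map[string]any, len(base))
	for k, v := range base {
		out[k] = v
	}
	for k, pv := range patch {
		if pv == nil {
			delete(out, k)
			continue
		}
		if pm, ok := pv.(map[string]any); ok {
			// A non-object base value is treated as an empty object.
			bm, _ := out[k].(map[string]any)
			out[k] = mergePatch(bm, pm)
			continue
		}
		out[k] = pv
	}
	return out
}

// applyPatch decodes two JSON objects, merges them, and re-encodes the result.
func applyPatch(baseJSON, patchJSON string) (string, error) {
	var base, patch map[string]any
	if err := json.Unmarshal([]byte(baseJSON), &base); err != nil {
		return "", fmt.Errorf("decode base: %w", err)
	}
	if err := json.Unmarshal([]byte(patchJSON), &patch); err != nil {
		return "", fmt.Errorf("decode patch: %w", err)
	}
	merged, err := json.Marshal(mergePatch(base, patch))
	if err != nil {
		return "", fmt.Errorf("encode result: %w", err)
	}
	return string(merged), nil
}

func main() {
	// Illustrative documents only; they do not mirror the full JobSpec schema.
	base := `{"queue":"default","tasks":[{"image":"v1.0.1","name":"nginx"}]}`
	patch := `{"tasks":[{"image":"v1.0.2"}]}`

	merged, err := applyPatch(base, patch)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged) // prints {"queue":"default","tasks":[{"image":"v1.0.2"}]}
}
```

Note that under these semantics the `tasks` list is replaced as a whole, so the container `name` from the template is lost. Any patch mechanism for list-valued fields would have to decide how list items are matched, which is part of the extra complexity a patch option brings.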

Member

Sorry, but I don't see what patch adds. It seems to make the whole CR configuration more complicated.

@hwdef
Member

hwdef commented Dec 2, 2024

It doesn't look good. How about just adding a maxRetry field?

@hwdef
Member

hwdef commented Dec 2, 2024

type Flow struct {
	// +kubebuilder:validation:MinLength=1
	// +required
	Name string `json:"name"`
	// +optional
	DependsOn *DependsOn `json:"dependsOn,omitempty"`
	// +optional
	MaxRetry *MaxRetry `json:"maxRetry,omitempty"`
}
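
As a rough illustration of how the suggested field might behave, the sketch below serializes a `Flow` with and without `maxRetry` and checks a retry budget. The `DependsOn` and `MaxRetry` definitions here are simplified, hypothetical stand-ins (the thread does not show them), and `shouldRetry`, along with the rule that a nil `MaxRetry` means no retries, is an assumption rather than the project's behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DependsOn is a simplified, hypothetical stand-in for the real type.
type DependsOn struct {
	Targets []string `json:"targets,omitempty"`
}

// MaxRetry is a hypothetical shape; the thread does not define it.
type MaxRetry struct {
	Limit int32 `json:"limit"`
}

// Flow mirrors the struct proposed in the comment above.
type Flow struct {
	Name      string     `json:"name"`
	DependsOn *DependsOn `json:"dependsOn,omitempty"`
	MaxRetry  *MaxRetry  `json:"maxRetry,omitempty"`
}

// shouldRetry reports whether a flow that has already failed `attempts`
// times may run again. Assumption: a nil MaxRetry disables retries.
func shouldRetry(f Flow, attempts int32) bool {
	if f.MaxRetry == nil {
		return false
	}
	return attempts < f.MaxRetry.Limit
}

func main() {
	flows := []Flow{
		{Name: "a"},
		{
			Name:      "b",
			DependsOn: &DependsOn{Targets: []string{"a"}},
			MaxRetry:  &MaxRetry{Limit: 3},
		},
	}
	for _, f := range flows {
		out, err := json.Marshal(f)
		if err != nil {
			panic(err)
		}
		// Nil pointer fields tagged omitempty are left out of the JSON.
		fmt.Printf("%s retry after 1 failure: %v\n", out, shouldRetry(f, 1))
	}
}
```

Because both optional fields are pointers tagged `omitempty`, existing JobFlow manifests that never set `maxRetry` would serialize exactly as before, which keeps the change backward compatible at the API level.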

Signed-off-by: dongjiang <[email protected]>
@dongjiang1989 dongjiang1989 force-pushed the add-global-maxretry-and-patch branch from cebc0eb to 615f915 Compare December 3, 2024 07:52
@dongjiang1989
Copy link
Contributor Author

It doesn't look good. How about just adding a maxRetry field?

Thanks @hwdef @googs1025
Got it. I've just added the maxRetry field.

4 participants