no artifact logs are available when workflow is archived but still live #12948

Open
3 of 4 tasks
liudongqing opened this issue Apr 17, 2024 · 11 comments · May be fixed by #13873
Labels
area/ui area/workflow-archive P1 High priority. All bugs with >=5 thumbs up that aren’t P0, plus: Any other bugs deemed high priority solution/suggested A solution to the bug has been suggested. Someone needs to implement it. type/bug type/regression Regression from previous behavior (a specific type of bug)

Comments

@liudongqing

liudongqing commented Apr 17, 2024

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issue exists when I tested with :latest
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

We just upgraded Argo Workflows from 3.4.4 to 3.5.5. We have the workflow archive enabled:

persistence:
    connectionPool:
      maxIdleConns: 100
      maxOpenConns: 0
    # save the entire workflow into etcd and DB
    nodeStatusOffLoad: true
    # enable archiving of old workflows
    archive: true
    postgresql:

but we did not enable archived logs:

artifactRepository:
  # -- Archive the main container logs as an artifact
  archiveLogs: false

Before the upgrade, we could see the logs of a finished workflow (whether it succeeded or failed) in the UI (I guess the server gets the logs from the Pod). After the upgrade, the UI complains "no artifact logs are available" and no logs are returned.

Is this the expected result, or is there a configuration option that controls this behavior?

Version

v3.5.5

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter workflows that use private images.

any workflows

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
@agilgur5 agilgur5 changed the title " no artifact logs are available " when workflow is achived. no artifact logs are available when workflow is archived and artifact logs are disabled Apr 17, 2024
@agilgur5 agilgur5 added area/workflow-archive type/support User support issue - likely not a bug type/bug and removed type/bug type/support User support issue - likely not a bug labels Apr 17, 2024
@agilgur5
Contributor

agilgur5 commented Apr 17, 2024

> from UI(the server gets the log from pod I guess)

Correct, it retrieves Pod logs.

> but after upgrade, the UI will complain " no artifact logs are available " and no logs returned.

I'm not sure that this is related to the upgrade. Did you change your configuration after the upgrade, or before it?

An Archived Workflow is typically a deleted Workflow, therefore there are no Pods for it to retrieve logs from. So if you want logs for deleted Pods, you can either link to a log provider or use artifact logs. You don't have artifact logs, so the error message certainly sounds correct.

@liudongqing
Author

liudongqing commented Apr 17, 2024

> An Archived Workflow is typically a deleted Workflow, therefore there are no Pods for it to retrieve logs from. So if you want logs for deleted Pods, you can either link to a log provider or use artifact logs. You don't have artifact logs, so the error message certainly sounds correct.

We didn't change any configuration during the upgrade; the only change was the image tag, from "v3.4.4" to "v3.5.5". The problem is that the workflow is archived as soon as it finishes, so we have no chance to check the logs, even if it failed just 1 minute earlier. After enabling artifact logs, we can see the logs now.

Is it correct for a finished workflow to be archived immediately?

@agilgur5
Contributor

> Is it correct for a finished workflow became archived immediately?

A Workflow is labeled for archiving when it completes, and when that label is detected, archiving is kicked off.

That is generally independent of deletion, however, which is based on your TTL or retentionPolicy.
It sounds like you have a longer TTL potentially, and so you have Workflows that are simultaneously in the archive and still in the cluster? In that case, the pod logs should still be retrievable.

I think I see the issue here: it's probably not falling back to Pod logs properly in 3.5.

3.5 unified the Archived + Live UI into one page (#11121) so there is no distinction now in the UI. In particular, this line would previously only be triggered if you were navigating archived workflows specifically, but now it can be triggered on a live workflow that is also archived. The comment above that line is not quite correct in your case
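
The fallback order discussed above can be sketched as follows. This is an illustrative Python sketch only: the real UI is not written in Python, and every name here (`WorkflowState`, `pick_log_source`, `pick_log_source_suspected`) is hypothetical. The second function encodes the suspected 3.5 behavior as described in this thread, not a confirmed reading of the code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowState:
    """Hypothetical summary of what the UI knows about a workflow."""
    archived: bool           # a copy exists in the workflow archive
    live: bool               # the Workflow and its Pods still exist in the cluster
    has_artifact_logs: bool  # archiveLogs was enabled when the workflow ran


def pick_log_source(state: WorkflowState) -> Optional[str]:
    """Intended behavior: prefer Pod logs while the workflow is still live."""
    if state.live:
        return "pod"
    if state.has_artifact_logs:
        return "artifact"
    return None  # nothing to show: "no artifact logs are available"


def pick_log_source_suspected(state: WorkflowState) -> Optional[str]:
    """Suspected 3.5 behavior (assumption): archived is treated as deleted."""
    if state.archived:
        return "artifact" if state.has_artifact_logs else None
    return "pod" if state.live else None


# The case reported in this issue: archived, still live, archiveLogs disabled.
reported = WorkflowState(archived=True, live=True, has_artifact_logs=False)
print(pick_log_source(reported))            # pod
print(pick_log_source_suspected(reported))  # None
```

Under these assumptions, the two functions differ only for workflows that are both archived and still in the cluster, which matches the scope described in this thread.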

@agilgur5 agilgur5 added type/regression Regression from previous behavior (a specific type of bug) area/ui labels Apr 17, 2024
@agilgur5 agilgur5 added this to the v3.5.x patches milestone Apr 17, 2024
@agilgur5 agilgur5 added the P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important label Apr 17, 2024
@agilgur5 agilgur5 changed the title no artifact logs are available when workflow is archived and artifact logs are disabled no artifact logs are available when workflow is archived but still live Apr 17, 2024
@y-elip

y-elip commented Jul 2, 2024

@agilgur5 Hello, any idea when this regression will be fixed?
It is preventing us from updating to a newer version of Argo Workflows, because having access to the logs of completed or failed workflows is an important part of our daily routine.

@agilgur5 agilgur5 added solution/suggested A solution to the bug has been suggested. Someone needs to implement it. P1 High priority. All bugs with >=5 thumbs up that aren’t P0, plus: Any other bugs deemed high priority and removed P2 Important. All bugs with >=3 thumbs up that aren’t P0 or P1, plus: Any other bugs deemed important labels Jul 2, 2024
@agilgur5
Contributor

agilgur5 commented Jul 2, 2024

> Hello, any idea when this degradation will be fixed?

No, any updates would be in the thread. PRs welcome.

> having access to completed or failed workflows logs is important part of our daily routine

To be clear this only affects users of Archived Workflows with long Workflow or podGC TTLs. If you're not using Archived Workflows or have short TTLs, this doesn't affect you.

@miltalex
Member

miltalex commented Jul 4, 2024

I will have a look and check whether I can prepare a PR with a fix.

@miltalex
Member

Could I ask for some example configuration or a way to reproduce the issue consistently? I tried using archived workflows with different TTL values and strategies without much success and I feel some of my settings might be different from the ones that produce the above bug.

@y-elip

y-elip commented Jul 24, 2024

Sure. We are using Helm chart 0.41.11 with Argo Workflows 3.5.8:

persistence:
  archive: true
  postgresql:
    <postgresql related block>
controller:
    workflowDefaults:
      spec:
        ttlStrategy:
          secondsAfterSuccess: 432000
          secondsAfterFailure: 864000
          secondsAfterCompletion: 432000

@y-elip

y-elip commented Jul 24, 2024

I also forgot to mention this important part of the configuration:

artifactRepository:
    archiveLogs: false
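
To make the reproduction window concrete, the TTL values in the configuration above can be converted to days. This is a minimal Python sketch with a hypothetical helper name; it assumes the status-specific fields (`secondsAfterSuccess`, `secondsAfterFailure`) take precedence over `secondsAfterCompletion` when set, which is my understanding of `ttlStrategy` rather than something confirmed in this thread.

```python
from typing import Optional

SECONDS_PER_DAY = 86400


def effective_ttl_seconds(succeeded: bool,
                          seconds_after_completion: Optional[int] = None,
                          seconds_after_success: Optional[int] = None,
                          seconds_after_failure: Optional[int] = None) -> Optional[int]:
    """Return the TTL that applies to a finished workflow (assumed precedence)."""
    specific = seconds_after_success if succeeded else seconds_after_failure
    # Fall back to the generic completion TTL when no status-specific TTL is set.
    return specific if specific is not None else seconds_after_completion


# Values copied from the ttlStrategy block in the comment above.
ttl = dict(seconds_after_completion=432000,
           seconds_after_success=432000,
           seconds_after_failure=864000)

print(effective_ttl_seconds(True, **ttl) / SECONDS_PER_DAY)   # 5.0
print(effective_ttl_seconds(False, **ttl) / SECONDS_PER_DAY)  # 10.0
```

With these values, a finished workflow would stay in the cluster for roughly 5 to 10 days after it has already been archived, which is the "archived but still live" state this issue is about.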

@tooptoop4
Contributor

where @miltalex

@miltalex
Member

miltalex commented Nov 6, 2024

> where @miltalex

Sorry, I didn't make progress because I couldn't reproduce it consistently locally. Feel free to pick it up if you are interested.

Projects
None yet
5 participants