Fails at hog_big and hog_rest #48

Open
esud1 opened this issue Jan 10, 2025 · 6 comments

esud1 commented Jan 10, 2025

Hello, I tried to run FastOMA with both my own data and the test data, but it always runs into problems during the hog_big and hog_rest steps. FastOMA retried several times, but it never succeeded.

Here is the final output of the program running testdata:

Completed at    : 2025-01-10T11:17:09.619507716+01:00
Duration        : 1m 20s
Processes       : 6 (success), 4 (failed)
Output in       : FastOMA_output
Nextflow report : FastOMA_output/stats
Oops .. something went wrong

executor >  local (10)
[53/65cba0] check_input (1)                | 1 of 1 ✔
[f1/d20096] omamer_run (CHLTR.fa)          | 3 of 3 ✔
[b2/74dc33] infer_roothogs (1)             | 1 of 1 ✔
[f6/ffb18f] batch_roothogs (1)             | 1 of 1 ✔
[-        ] hog_big                        -
[9e/d67e39] hog_rest (1)                   | 4 of 4, failed: 4, retries: 3 ✘
[-        ] collect_subhogs                -
[-        ] ext…airwise_ortholog_relations -
[-        ] fastoma_report                 -
[06/7c415f] NOTE: Process `hog_rest (1)` terminated with an error exit status (1) -- Execution is retried (1)
[ca/1df564] NOTE: Process `hog_rest (1)` terminated with an error exit status (1) -- Execution is retried (2)
[c8/7e57be] NOTE: Process `hog_rest (1)` terminated with an error exit status (1) -- Execution is retried (3)
ERROR ~ Error executing process > 'hog_rest (1)'

Caused by:
  Process `hog_rest (1)` terminated with an error exit status (1)

Command executed:

  fastoma-infer-subhogs --input-rhog-folder /scratch/users/e/s/esud/data/phylogeny/cyano/add-heterotrophy/test_FastOMA/work/f6/ffb18f96588b930066471fe6e14465/rhogs_rest/0                                --species-tree species_tree_checked.nwk                               --output-pickles pickle_hogs                               -vv                               --msa-filter-method col-row-threshold                               --gap-ratio-row 0.3                               --gap-ratio-col 0.5                               --number-of-samples-per-hog 5                               --msa-write                               --gene-trees-write

Command exit status:
  1

Command output:
  there are  10 rhogs in the input folder
  there are  10 rhogs remained in the input folder ['E1027301', 'E1027829', 'E1027325', 'E1027309', 'E1027626']
Command error:
                             ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/app/lib/python3.11/site-packages/FastOMA/_infer_subhog.py", line 625, in process
      msa = self.align_subhogs()
            ^^^^^^^^^^^^^^^^^^^^
    File "/app/lib/python3.11/site-packages/FastOMA/_infer_subhog.py", line 382, in align_subhogs
      merged_msa = _wrappers.merge_msa(sub_msas)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/app/lib/python3.11/site-packages/FastOMA/_wrappers.py", line 55, in merge_msa
      merged = wrapper_mafft_merge()
               ^^^^^^^^^^^^^^^^^^^^^
    File "/app/lib/python3.11/site-packages/FastOMA/zoo/wrappers/aligners/mafft.py", line 123, in __call__
      raise WrapperError('Mafft did not compute any alignments. StdErr: {}'.format(error))
  FastOMA.zoo.wrappers.WrapperError: Mafft did not compute any alignments. StdErr: mktemp: failed to create directory via template '/tmpscratch/esud/8123824/mafft.XXXXXXXXXX': No such file or directory
  mktemp seems to be obsolete. Re-trying without -t
  mkdir: cannot create directory '/tmpscratch': Read-only file system
  mktemp: failed to create directory via template '/tmpscratch/esud/8123824/tmp/mafft.XXXXXXXXXX': No such file or directory
  /usr/bin/mafft: line 1111: /infile: Read-only file system
  /usr/bin/mafft: line 1112: /infile: Read-only file system
  /usr/bin/mafft: line 1113: /_addfile: Read-only file system
  /usr/bin/mafft: line 1121: /infile: Read-only file system
  /usr/bin/mafft: line 1123: /_aamtx: Read-only file system
  /usr/bin/mafft: line 1124: /_subalignmentstable: Read-only file system
  /usr/bin/mafft: line 1125: /_guidetree: Read-only file system
  /usr/bin/mafft: line 1126: /_codonpos: Read-only file system
  /usr/bin/mafft: line 1127: /_codonscore: Read-only file system
  /usr/bin/mafft: line 1128: /_seedtablefile: Read-only file system
  /usr/bin/mafft: line 1129: /_lara.params: Read-only file system
  /usr/bin/mafft: line 1130: /pdblist: Read-only file system
  /usr/bin/mafft: line 1131: /ownlist: Read-only file system
  /usr/bin/mafft: line 1132: /_externalanchors: Read-only file system
  OS = linux
 The number of physical cores =  128
  /usr/bin/mafft: line 1270: /infile: No such file or directory
  awk: cannot open /size (No such file or directory)
  awk: cannot open /size (No such file or directory)
  /usr/bin/mafft: line 1274: [: too many arguments
  /usr/bin/mafft: line 1279: [: too many arguments
  /usr/bin/mafft: line 1284: [: too many arguments
  /usr/bin/mafft: line 1289: [: -lt: unary operator expected
  /usr/bin/mafft: line 1294: [: -lt: unary operator expected
  /usr/bin/mafft: line 1301: [: -lt: unary operator expected
  /usr/bin/mafft: line 1308: [: -lt: unary operator expected
  grep: /infile: No such file or directory
  /usr/bin/mafft: line 1807: [: -gt: unary operator expected
  grep: /infile: No such file or directory
  /usr/bin/mafft: line 1816: [: -eq: unary operator expected
  /usr/bin/mafft: line 1823: [: too many arguments
  mv: cannot stat 'infile': No such file or directory
  inputfile = orig
  Cannot open orig

Here is my command line:

nextflow run dessimozlab/FastOMA -r dev \
    --input_folder .    \
    --species_tree species_tree.nwk     \
    --omamer_db ../FastOMA/LUCA.h5      \
    --output_folder FastOMA_output      \
    --report    \
    --write_msas        \
    --write_genetrees   \
    -profile singularity

Can you please help me identify the problem?

Thanks for your help!

sinamajidian (Collaborator) commented Jan 14, 2025

Hi @esud1!
Thanks for contacting us. It's very helpful that you tried the test data; however, I wasn't able to reproduce the error. Could you please share some details about your system: operating system, Python version, Singularity version, and how you installed Singularity (e.g. via Conda)?
Also, please share a screenshot of the following when you open Python on the command line in your terminal (the same environment as the one you run FastOMA in):

$ which python
$ python
import tempfile
tmpdir = tempfile.TemporaryDirectory()
print(tmpdir)

Or you could put the last three lines in a file code.py and run it with python code.py.
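
A minimal code.py along those lines could look like this (the extra print of tmpdir.name is just to show the resolved path; it is not required):

# code.py -- check that Python can create a temporary directory
import tempfile

tmpdir = tempfile.TemporaryDirectory()
print(tmpdir)        # e.g. <TemporaryDirectory '/tmp/tmpXXXXXXXX'>
print(tmpdir.name)   # the path that was actually created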

esud1 (Author) commented Jan 14, 2025

Hi,
Thank you for your reply!

I am running it on our HPC server, with OS Rocky Linux release 8.7.
Python version 3.6.8
Singularity: Apptainer version 1.1.5-1.el8, installed by default on the server

which python
/usr/bin/python

tmpdir:
[screenshot of the tempfile output attached]

sinamajidian (Collaborator) commented:

Sorry for the slow response.
It seems that mafft (the multiple sequence aligner used inside FastOMA) is trying to create a temporary folder, but there is an issue with the temporary-directory configuration on the HPC server as it is seen by mafft inside the Singularity environment.

If possible, cd to the folder where you ran FastOMA and open a shell in the container:

cd /scratch/users/e/s/esud/data/phylogeny/cyano/add-heterotrophy/test_FastOMA/work/singularity
singularity shell dessimozlab-fastoma-dev.img

Then you should see the Singularity> prompt. Now run:

echo -e ">r1\nAAAAAAA\n>r2\nAATGAAA" >  test.fa
mafft test.fa 

Could you please run each of the following commands one by one:

echo $HOME
MAFFT_TMPDIR="$HOME/maffttmp2"
mkdir -p "$MAFFT_TMPDIR"
echo $TMPDIR
mkdir -p "$TMPDIR/tmp"
echo $?
mktemp -dt "mafft.XXXXXXXXXX"
TMPFILE=`env TMPDIR="$MAFFT_TMPDIR" mktemp -dt "$mafft.XXXXXXXXXX"`
echo $?
echo $TMPFILE
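
For context, these checks mirror what the mafft wrapper script does internally: it creates a working directory with mktemp under $TMPDIR (or $MAFFT_TMPDIR, when that is set), which is the step that failed in your log. A condensed version of the same check, assuming $HOME is writable on your compute node, would be:

# point mafft's temp directory at a location that is known to be writable
# ($HOME/maffttmp2 is just an example path)
export MAFFT_TMPDIR="$HOME/maffttmp2"
mkdir -p "$MAFFT_TMPDIR"
# this reproduces the mktemp call that failed inside the container
TMPDIR="$MAFFT_TMPDIR" mktemp -dt "mafft.XXXXXXXXXX" && echo "temp dir creation works"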

(By the way, we have a drop-in Zoom meeting this Tuesday (first and third Tuesdays of each month at 3 PM CET). If you could join, we can discuss this further. Otherwise, we can continue the discussion here.)

Thanks for your patience,
Sina

esud1 (Author) commented Jan 21, 2025

Hi Sina,

Thanks for your response.

I can run them in the Singularity shell.
Here is the output:

Apptainer> echo -e ">r1\nAAAAAAA\n>r2\nAATGAAA" >  test.fa
Apptainer> mafft test.fa 
nthread = 0
nthreadpair = 0
nthreadtb = 0
ppenalty_ex = 0
stacksize: -1 kb
generating a scoring matrix for nucleotide (dist=200) ... done
Gap Penalty = -1.53, +0.00, +0.00



Making a distance matrix ..
    1 / 2
done.

Constructing a UPGMA tree (efffree=1) ... 
    0 / 2
done.

Progressive alignment 1/1... 
STEP     1 / 1  f
done.

disttbfast (nuc) Version 7.505
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
0 thread(s)


Strategy:
 FFT-NS-1 (Very fast but very rough)
 Progressive method (rough guide tree was used.)

If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.

The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --leavegappyregion option.

>r1
aaaaaaa
>r2
aatgaaa

This is the output for the tmp files and folders:

Apptainer> echo $HOME
/home/esud
Apptainer> MAFFT_TMPDIR="$HOME/maffttmp2"
Apptainer> mkdir -p "$MAFFT_TMPDIR"
Apptainer> echo $TMPDIR

Apptainer> mkdir -p "$TMPDIR/tmp"
Apptainer> echo $?
0
Apptainer> mktemp -dt "mafft.XXXXXXXXXX"
/tmp/mafft.dMXS7maR54
Apptainer> TMPFILE=`env TMPDIR="$MAFFT_TMPDIR" mktemp -dt "$mafft.XXXXXXXXXX"`
Apptainer> echo $?
0
Apptainer> echo $TMPFILE
/home/esud/maffttmp2/.icNDKchrL2

What should I do next?

alpae (Member) commented Jan 21, 2025

Hi @esud1 ,

Could it be that on your host system either $TMP or $TMPDIR is defined in your environment, but that folder is not mounted inside the Singularity container? From the initial output it looks like FastOMA tries to create a temporary file in /tmpscratch, which is not a standard location on Linux. On HPCs a different location is often used by setting $TMPDIR or $TMP. When you run Singularity/Apptainer containers, you might need to add this folder to the mount options. You can try this by adding, in the singularity profile of the nextflow.config file, a new line with containerOptions = "--bind /tmpscratch:/tmpscratch":

singularity {
   process {
     container = "$params.container_name:$params.container_version"
     containerOptions  = "--bind /tmpscratch:/tmpscratch"
   }
   singularity.enabled = true
   singularity.autoMounts = true
}

Make sure to use the proper value of $TMPDIR/$TMP instead of /tmpscratch.
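
To find out which path you need to bind, you can check on the host, in the same environment in which you submit the job (e.g. inside the sbatch script), with something like:

# run this on the host / inside the job script, not inside the container
echo "TMPDIR=$TMPDIR"
echo "TMP=$TMP"
# if this prints e.g. TMPDIR=/tmpscratch/<user>/<jobid>, then /tmpscratch is the folder to bind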

sinamajidian (Collaborator) commented:

Update for future reference:
During the meeting we found that the mafft temp-directory problem only occurred when FastOMA was run via sbatch (i.e. on a compute node); it worked fine on the login node. (We couldn't ssh into the compute node to see the exact error, due to the HPC configuration; no sinteractive was available.)

As Adrian suggested, the singularity section of FastOMA/nextflow.config should be edited to include:

containerOptions  = "--bind /scratch/users/e/s/esud/data/phylogeny/cyano/add-heterotrophy/test_FastOMA/:/tmpscratch" 
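
In context, the edited singularity profile of FastOMA/nextflow.config would then look roughly like the snippet below; this is a sketch based on Adrian's example above, with only the containerOptions line changed:

singularity {
   process {
     container = "$params.container_name:$params.container_version"
     // bind a writable host folder onto /tmpscratch inside the container
     containerOptions  = "--bind /scratch/users/e/s/esud/data/phylogeny/cyano/add-heterotrophy/test_FastOMA/:/tmpscratch"
   }
   singularity.enabled = true
   singularity.autoMounts = true
}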

Then run it as:

nextflow run dessimozlab/FastOMA -r dev \
    --input_folder .    \
    --species_tree species_tree.nwk     \
    --omamer_db ../FastOMA/LUCA.h5      \
    --output_folder FastOMA_output      \
    --report    \
    --write_msas        \
    --write_genetrees   \
    -profile singularity \
    -c FastOMA/nextflow.config

This appears to work on the test dataset.
Note that here Nextflow still downloads the FastOMA container image from Docker Hub, but it uses the local config file FastOMA/nextflow.config.
