Help with pipeline command #23
Regarding 2 and 3: 130.211.33.64 is the public IP address of the existing alignment cluster. A cluster with the aligner can be provisioned using the .sh files from the "aligner" directory. Regarding the files bwa.cgi and kalign.cgi, they are baked into the Docker image during the build stage; see the Dockerfile.
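A minimal sketch of what "baked into the image at build time" looks like in practice; the image tag, directory layout, and cgi-bin path below are assumptions for illustration, not the project's actual Dockerfile:

```bash
# The Dockerfile in the aligner directory would contain lines roughly like:
#   FROM google/cloud-sdk
#   COPY bwa.cgi kalign.cgi /usr/lib/cgi-bin/
# so the .cgi files end up inside the image when it is built:
docker build -t aligner-http ./aligner             # bakes the .cgi files into the image
docker run --rm aligner-http ls /usr/lib/cgi-bin/  # verify they are present in the image
```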
I'm a little confused about this as well; do you know what is being copied at that step?
"google/cloud-sdk" is a Docker image owned by Google with gcloud built inside.
Regarding 4.
Thank you for sharing those resistantGenes files!
so does
It's just a specific of Dockerfile syntax; it works in the following way: the "FROM" instruction defines the base image from which the new image is built.
Ah, I see now... I did not realise I should look at that git repo before. This is very helpful, thank you!
Should these scripts be loaded onto Cloud Shell and executed from there? Or are they deployed in another way?
Update: Running them from Cloud Shell did not appear to do anything.
Could you specify which script you ran and what the output was, please?
Apologies, realised I had to run either
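For reference, a minimal sketch of running a provisioning script from Cloud Shell; the repository path and project ID are placeholders, and provision.sh is only used as the example script name:

```bash
gcloud config set project YOUR_PROJECT_ID   # make sure the intended project is active
cd <repo>/aligner                           # directory containing the provisioning .sh files
bash provision.sh                           # provisions the alignment VM(s)
```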
Sorry, just ran them, and realised that the generated instance does not allow HTTP traffic: the instance has HTTP unchecked, but does have 2 other network tags. Is this a typo, and should provision.sh set different tags? Or am I looking for the wrong IP address to put into 2?
UPDATE: realised the
UPDATE 2: I have SSH-ed in and looked at the docker container with
Could you check the logs of the docker container with
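For reference, the usual way to inspect a container's logs after SSH-ing into the VM (the container name/ID is whatever `docker ps` reports):

```bash
docker ps                      # list running containers and their names/IDs
docker logs <container-id>     # dump the container's stdout/stderr
docker logs -f <container-id>  # or follow the log output live
```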
Output from
I figured the problem might be with the openssl code in
I have found these 2 threads which might be related, but their code has a slightly different format; not sure if they're used in the same context.
Realised I had mixed up the 2 entrypoint.sh files, and was accidentally running the one in the root folder instead of the /http folder. They have been fixed now, but provision_species.sh still does not produce an accessible external IP.
Turns out these problems were due to my adapter docker container not being set up properly. I have tried with allenday's original bwa, and it works. However, the external IP does not appear to be accessible. Is this the right IP to be looking at to substitute into (2)?
As I can see, external IP 34.85.27.91 is accessible now, isn't it?
Ah, it is! I guess the start-up time was longer than I expected. Thanks! Running the main
It looks like there are some newlines that are not escaped after parameters in the multiline command.
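To illustrate the failure mode: in a multiline shell command each continuation line must end with a bare backslash, and a space after the backslash (or a missing backslash) ends the command early, so the remaining flags are parsed as separate commands. The jar name and flags below are placeholders:

```bash
# Correct: each continuation line ends with "\" and nothing after it.
# A trailing space after "\" silently breaks the continuation, and the
# following --flags are then treated as new shell commands.
java -jar pipeline.jar \
  --project=my-project \
  --runner=DataflowRunner \
  --region=us-central1
```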
Yep, that was the problem: one of my lines had an extra space after the trailing backslash. Good thing we have your experience! For reference, this is the full code snippet I'm trying to run now:
Now I'm getting this error:
It appears Java wasn't installed by default on the cloud VM; will installing it be a fix? I was under the impression that Dataflow wasn't running directly on the cloud VM instance...
Update: Installed Java 11 and ran it again, and got this scary message:
According to the official Apache Beam documentation, the current SDK doesn't support Java 11 (https://beam.apache.org/roadmap/java-sdk/). The official docs currently recommend using Java 8 (https://beam.apache.org/get-started/quickstart-java/#set-up-your-development-environment).
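If the VM is Debian/Ubuntu-based, something along these lines installs Java 8 and switches the default java binary (package names may differ by distro):

```bash
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk   # install Java 8 alongside any existing JDK
sudo update-alternatives --config java  # interactively select the Java 8 binary
java -version                           # confirm it now reports 1.8.x
```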
Regarding the PubSub subscription, does it matter if it's a PUSH or PULL type?
Attempted running with the new jar (pushed yesterday), and came up with these errors that were not present before:
then it terminated with this chunk:
I didn't run into this with the previous one.
Update 1: In the meantime, it'd be good to hear everyone's input on whether I'm on the right troubleshooting track. I'm having doubts, since earlier I could run the old jar in the default region.
Update 2:
Update 3: But nothing seems to be happening. PubSub notifications are being sent properly. Firebase collections remain empty. Currently I have it set up to use the same PUSH subscription as the monitoring website. Does this pipeline require PULL? I did not notice PULL functions in the Java source, so I assumed it's PUSH.
It means that we used 20-second windows to gather the fastq data into a batch for alignment in the next step.
Please try with the PULL type.
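For reference, a subscription created without a push endpoint is a PULL subscription by default; the topic and subscription names here are placeholders:

```bash
# No --push-endpoint flag means this is a PULL subscription.
gcloud pubsub subscriptions create my-pull-sub \
  --topic=my-fastq-topic \
  --ack-deadline=60
```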
It's hard to say from the pipeline pictures what exactly goes wrong. Could you check the following things in the pipeline, please:
If the above two don't show the problem, the statistics for the other steps should be checked.
From the aligner "14m" step:
So I went back to check the compute instance and VPC rules; everything looks normal. This summary also shows proper tagging of rules and the correct external IP. What else could be wrong? Could it be the use of a custom network tag? Or could it be the Dataflow pipeline being unable to connect to external IPs?
I don't think there is a problem with the firewall rules; as we checked previously, that Apache page is available from the public internet. Maybe the load balancer's default timeout of 30 seconds is too small, and that's why connections timed out. Try to increase the timeout with gcloud:
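A sketch of the kind of adjustment meant, assuming an HTTP(S) load balancer whose backend service is named aligner-backend (the name and timeout value are placeholders):

```bash
# Raise the backend service timeout from the 30 s default to 10 minutes.
gcloud compute backend-services update aligner-backend \
  --global \
  --timeout=600
```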
Thanks! That got it through, as does the next step, but a later step still fails. Do you know what might be going on here? I assume the VM with the dockers has already finished its job by this stage?
UPDATE: Re-deployed the updated pipeline and japsa committed ~13 hours ago by Allen; still running into the same issue at
It looks like there were no matched sequences after the alignment stage. There is a step that skips all records where the reference name equals "*". I think that was the case here, because there is a log record about the number of SAMRecord items in this step, but no output or errors are logged. I've noticed that you were using the IP address of the bwa-resistance-genes VM while running the pipeline. Also, regarding the aligner: you should use the Load Balancer's IP address:
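If helpful, the load balancer's front-end IP (as opposed to an individual VM's IP) can be listed with gcloud; the format string simply selects the relevant columns:

```bash
# The IP_ADDRESS column of the load balancer's forwarding rule is the address
# the pipeline should point at for the aligner endpoints.
gcloud compute forwarding-rules list \
  --format="table(name, IPAddress, target)"
```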
I've checked VM instances, there is a deployed file
OK, so we will do a new build of japsa and the uber-jar.
No, 400 for GET requests is expected; you can check that it works by submitting a POST request with sample data:
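As a hypothetical example of such a check (the endpoint path and payload handling are assumptions, not the documented API of bwa.cgi):

```bash
# POST a small fastq sample to the aligner endpoint; a 200 response with
# SAM-like output indicates the CGI endpoint is wired up correctly.
curl -i -X POST "http://<load-balancer-ip>/cgi-bin/bwa.cgi" \
  --data-binary @sample.fastq
```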
Update:
It's not clear at the moment what the issue is there; I'll try to debug it.
I've committed a new build and created a separate issue #54 to deal with the failing test.
Does the same test fail for you when you build it too?
No, for me this test executes successfully. I'm not sure right now why it's failing in your case.
OK, I have ignored that and re-cloned the updated git repo. I am now running into this problem: the Alignment step is running much faster than real time (i.e. ~5 minutes have passed IRL, but it's showing 21 minutes).
As I can see, there is an element that passed the Alignment step, and finally there are results in Firestore.
Yes, there are. So are these errors safe to ignore? Or are we getting incomplete results?
I see, safe for us to ignore then. I have been unable to get the [visualization app](https://nano-stream1.appspot.com/) to show data from Firestore. From Firebase, I get this code:
And I have updated those
I've updated the Firestore security configuration to allow public read access, and added the collection and document names to the URL:
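For reference, the resulting URL scheme passes the collection via c and the document via d (placeholders shown below):

```bash
# c = Firestore collection name, d = document name within that collection
curl "https://nano-stream1.appspot.com/?c=<collection_name>&d=<document_name>"
```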
I see, thank you! The public read access part should probably be added to the readme. Regarding the URL, is there any way to make it more user-friendly? For example, add code that scans
I think this issue can be closed, and we can create other issues for individual feature requests and optimizations?
Yes, I'll close it now; it's becoming a bit hard to navigate this issue's page :)
Sorry, we noticed that the results were odd: only staph strains were detected, and in exactly equal proportions. Other bacteria like helicobacter were absent. So we ran the same .fastq file again; this time we got this error:
And many minutes later, no further entries were added to
@obsh sorry, just tagging you for notifications; not sure if closed topics still send them automatically. We ran the pipeline a few more times, and it is quite inconsistent. Sometimes it gives the above errors, sometimes it doesn't, even though I use the exact same deployment command each time.
No problem at all.
I've made adjustments to the alignment step in #65; it should work more stably now.
My first attempt today after a clean clone and
and 5 minutes later:
Hang on, Lachlan just told me this was a problem in japsa that was just fixed; let me try with the new release.
OK, it's running, but still inconsistently. Sometimes it works, sometimes it gives that 5-minute error. Also, I noticed that on a run that worked, 2 collections were generated, ~35 seconds apart. Is the sessioning being overzealous?
https://nano-stream1.appspot.com/?c=new_scanning_species_sequences_statistic&d=resultDocument--2019-02-14T03-31-42UTC
both generated from
UPDATE 1: It might be related to input .fastq size. It now always fails on
UPDATE 2: Stopped and re-deployed to try with another file, 189 KB, and got this error instead:
If I add the small
I've been trying to adapt various components to the new project_id and buckets. There are some parts of the pipeline command that I either need help with, am unsure where the files are, or don't know what the paths should be:

1. com.theappsolutions...
2. servicesUrl currently points to a generic Apache installation success page. Are there further setup steps associated? Also, should this IP address be changed, or remain exactly as is?
3. The bwa and kalign endpoints and DB.fasta files - where are they? I could not find them in the buckets nor in the docker allenday/bwa-http-docker, and they are probably not bucket paths since there's no gs://. Are the .cgi files generated by bwa-mem on install?
4. The resistantGenes stuff in the buckets - could you please share them, or where I could find them?