
Help with pipeline command #23

Closed
Firedrops opened this issue Feb 4, 2019 · 58 comments


@Firedrops
Contributor

Firedrops commented Feb 4, 2019

I've been trying to adapt various components to the new project_id and buckets. There are some parts of the pipeline command that I either need help with, am unsure where the files are, or don't know what the paths should be.

image

  1. Never mind - solved; ignore the rectangle around com.theappsolutions...
  2. The 2nd red rectangle: servicesUrl currently points to a generic Apache installation success page. Are there further setup steps required? Also, should this IP address be changed, or remain exactly as is?
  3. Where are the bwa and kalign endpoints and the DB.fasta files? I could not find them in the buckets or in the allenday/bwa-http-docker Docker image, and they are probably not bucket paths, since there's no gs:// prefix. Are the .cgi files generated by bwa-mem on install?
  4. The resistantGenes stuff in buckets - could you please share it, or say where I could find it?
@obsh
Collaborator

obsh commented Feb 4, 2019

Regarding 2 and 3:
The following options - "servicesUrl", "bwaEndpoint", "bwaDatabase", "kAlignEndpoint" - configure how HTTP requests are made from the data pipeline to the cluster of alignment machines.

so for:
--servicesUrl=http://130.211.33.64
--bwaEndpoint=/cgi-bin/bwa.cgi
--bwaDatabase=DB.fasta
The system will POST a request with the fastq data to http://130.211.33.64/cgi-bin/bwa.cgi, with the parameter database=DB.fasta

130.211.33.64 is the public IP address of the existing alignment cluster.
http://130.211.33.64/ returns the default Apache page, but there are working endpoints:
http://130.211.33.64/cgi-bin/bwa.cgi
and http://130.211.33.64/cgi-bin/kalign.cgi
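Put together, the request can be sketched like this (a shell sketch of how the three options combine; the variable names are illustrative, not actual pipeline code):

```shell
# Illustrative only: how servicesUrl, bwaEndpoint and bwaDatabase combine
# into the single HTTP request that the pipeline makes.
SERVICES_URL="http://130.211.33.64"
BWA_ENDPOINT="/cgi-bin/bwa.cgi"
BWA_DATABASE="DB.fasta"

# The pipeline POSTs the fastq data to servicesUrl + bwaEndpoint,
# passing bwaDatabase as the "database" form parameter.
echo "POST ${SERVICES_URL}${BWA_ENDPOINT} database=${BWA_DATABASE}"
# → POST http://130.211.33.64/cgi-bin/bwa.cgi database=DB.fasta
```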

A cluster with the aligner can be provisioned using the .sh files from the "aligner" directory.

Regarding the files bwa.cgi and kalign.cgi - they are baked into the Docker image during the build stage; see the Dockerfile:
https://github.com/allenday/bwa-http-docker/blob/master/http.Dockerfile

@Firedrops
Contributor Author

Regarding files bwa.cgi and kalign.cgi - they are baked into Docker image during build stage, see Dockerfile:
https://github.com/allenday/bwa-http-docker/blob/master/http.Dockerfile

I'm a little confused about this as well: do you know what is being copied at the COPY lines? Is it FROM google/cloud-sdk? What exactly is google/cloud-sdk, and can I look at what is there?

@obsh
Collaborator

obsh commented Feb 4, 2019

"google/cloud-sdk" is a docker image owned by Google with gcloud built inside.
See on dockerhub:
https://hub.docker.com/r/google/cloud-sdk/

@obsh
Collaborator

obsh commented Feb 4, 2019

Regarding 4.
For now I've uploaded the files to the "nano-stream" bucket; see:
https://console.cloud.google.com/storage/browser/nano-stream/ResistanceGenes/?project=nano-stream

@Firedrops
Contributor Author

Thank you for sharing those resistant files!

"google/cloud-sdk" is a docker image owned by Google with gcloud built inside.
See on dockerhub:
https://hub.docker.com/r/google/cloud-sdk/

So does COPY copy files like bwa.cgi from google/cloud-sdk into the Docker image being generated? That does not really make sense to me, because why would Google have those files in their image?

@obsh
Collaborator

obsh commented Feb 4, 2019

so does COPY copy the files like bwa.cgi from google/cloud-sdk to the docker image being generated? That does not really make sense to me, because why would Google have those files in their docker?

It's just the specifics of the Dockerfile syntax. It works in the following way: the "FROM" instruction defines the base image from which the new image is built.
The "COPY" instruction copies files from the local folder into the image. In our case these files are:
https://github.com/allenday/bwa-http-docker/blob/master/bwa.cgi
and
https://github.com/allenday/bwa-http-docker/blob/master/kalign.cgi
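In outline, the relevant part of that Dockerfile looks like this (a paraphrased sketch, not an exact copy of http.Dockerfile; the destination paths are illustrative):

```dockerfile
# Base image: Google's image with gcloud preinstalled (the FROM layer).
FROM google/cloud-sdk

# COPY takes files from the build context (the bwa-http-docker checkout),
# not from the base image, and places them into the new image.
COPY bwa.cgi /usr/lib/cgi-bin/
COPY kalign.cgi /usr/lib/cgi-bin/
```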

@Firedrops
Contributor Author

"COPY" instruction - copies files from the local folder into image. In our case this files are:

Ah, I see now... I did not realise I should look at that repo. This is very helpful. Thank you!

@Firedrops
Contributor Author

Firedrops commented Feb 5, 2019

Cluster with aligner could be provisioned using sh files from "aligner" directory.

Should these scripts be loaded onto cloud shell and executed from there? Or are they deployed in another way?

Update: Running them from cloud shell did not appear to do anything.

@obsh
Collaborator

obsh commented Feb 5, 2019

Could you specify which script you ran and what the output was, please?
These scripts can be run in any environment where you have gcloud installed and initialized.

@Firedrops
Contributor Author

Apologies, I realised I had to run either provision_species.sh or provision_resistance_genes.sh. I have tried provision_species.sh and it works. I think this issue can be closed now. Thanks!

@Firedrops
Contributor Author

Firedrops commented Feb 6, 2019

Sorry, I just ran them, and realised that the generated instance does not allow HTTP traffic.
I have checked the VPC rules and the rule exists

image

but the instance has HTTP unchecked, and it has 2 other network tags, http and http-server1, neither of which has VPC rules.

image

Is this a typo - should provision.sh have allow-http added to line 11, --tags http-server,http \?

Or am I looking at the wrong IP address to put into (2)?

UPDATE: realised the target tag of the allow-http rule is http-server, so it should be working. Any other ideas why I'm unable to access the VM's IP?

UPDATE 2: I have SSH-ed in and looked at the Docker container with docker ps; the container's status is always Restarting. I don't think this is expected behavior, but I'm not quite sure what the fix is yet. Might have something to do with #26

@obsh
Collaborator

obsh commented Feb 6, 2019

Could you check the logs of the Docker container with the docker logs command?
Meanwhile I'll run provision_resistance_genes.sh in a new GCP project to reproduce the issue.

@Firedrops
Contributor Author

Firedrops commented Feb 7, 2019

The output from docker logs mainly reveals repeats of these 2 chunks:

..............+++++
...............................................................+++++
e is 65537 (0x010001)
140680312165760:error:28069065:UI routines:UI_set_result:result too small:../crypto/ui/ui_lib.c:765:You must type in 4 to 1023 characters
140680312165760:error:28069065:UI routines:UI_set_result:result too small:../crypto/ui/ui_lib.c:765:You must type in 4 to 1023 characters
140680312165760:error:0906906F:PEM routines:PEM_ASN1_write_bio:read key:../crypto/pem/pem_lib.c:330:
Generating RSA private key, 2048 bit long modulus
........................................................................................................................+++++
................................................+++++
e is 65537 (0x010001)
unable to load Private Key
139751600240000:error:28069065:UI routines:UI_set_result:result too small:../crypto/ui/ui_lib.c:765:You must type in 4 to 1023 characters
139751600240000:error:06065064:digital envelope routines:EVP_DecryptFinal_ex:bad decrypt:../crypto/evp/evp_enc.c:536:
139751600240000:error:0906A065:PEM routines:PEM_do_header:bad decrypt:../crypto/pem/pem_lib.c:439:
Generating RSA private key, 2048 bit long modulus
.+++++
..................+++++

I figured the problem might be with the openssl code in entrypoint.sh, which is currently

openssl genrsa -des3 -passout pass:x -out /etc/apache2/ssl/pass.key 2048
openssl rsa -passin pass:x -in /etc/apache2/ssl/pass.key -out /etc/apache2/ssl/server.key
cat /tmp/ssl-info.txt | openssl req -new -key /etc/apache2/ssl/server.key -out /etc/apache2/ssl/server.csr
openssl x509 -req -days 365 -in /etc/apache2/ssl/server.csr -signkey /etc/apache2/ssl/server.key -out /etc/apache2/ssl/server.crt

I have found these 2 threads which might be related, but their code is in a slightly different format; I'm not sure if it's used in the same context:
Fedora "Issue"
Serverfault thread

@Firedrops
Contributor Author

Firedrops commented Feb 7, 2019

Realised I had mixed up the 2 entrypoint.sh files, and was accidentally running the one in the root folder instead of the /http folder. They have been fixed now, but provision_species.sh still does not produce an accessible external IP.

@Firedrops
Contributor Author

Turns out these problems were due to my adapter docker container not being set up properly. I have tried with allenday's original bwa, and it works. However, the external IP 34.85.27.91 is still not accessible.

Is this the right IP to substitute into (2) --servicesUrl=http://130.211.33.64?

@obsh
Collaborator

obsh commented Feb 7, 2019

As I can see, external IP 34.85.27.91 is accessible now, isn't it?
Right, this should be used as --servicesUrl=http://34.85.27.91

@Firedrops
Contributor Author

Firedrops commented Feb 7, 2019

Ah it is! I guess the start-up time was longer than I expected. Thanks!

Running the main java -cp (path to jar)... command on the Cloud Shell returns this:
-bash: --bwaEndpoint=/cgi-bin/bwa.cgi: No such file or directory
It seems like those are local paths, so this command should be run from inside the VM via SSH, is that correct?

@obsh
Collaborator

obsh commented Feb 7, 2019

It looks like some newlines after parameters in the multiline command are not escaped;
please check that each newline after a parameter is escaped like this:
--servicesUrl=http://34.85.27.91 \
--bwaEndpoint=/cgi-bin/bwa.cgi \
...
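The failure mode is easy to reproduce locally: a backslash only continues the line when the newline follows it immediately, so a stray space after the backslash (or a missing backslash) makes the shell treat the next --flag line as a separate command, producing exactly the "No such file or directory" error above. A minimal demonstration:

```shell
# Backslash-newline joins the three lines into a single printf command.
# If a space sneaks in after a backslash, the continuation breaks and the
# next line would be executed as its own command.
printf '%s %s\n' \
  first \
  second
# → first second
```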

@Firedrops
Contributor Author

Firedrops commented Feb 7, 2019

Yep, that was the problem: one of my lines had an extra space after the \. Good thing we have your experience!

For reference, this is the full code snippet I'm trying to run now:

java -cp /home/coingroupimb/git_larry_2019-02-06/NanostreamDataflowMain/build/NanostreamDataflowMain.jar \
  com.theappsolutions.nanostream.NanostreamApp \
  --runner=org.apache.beam.runners.dataflow.DataflowRunner \
  --project=nano-stream1 \
  --streaming=true \
  --processingMode=species \
  --inputDataSubscription=projects/nano-stream1/topics/file_upload \
  --alignmentWindow=20 \
  --statisticUpdatingDelay=30 \
  --servicesUrl=http://34.85.27.91 \
  --bwaEndpoint=/cgi-bin/bwa.cgi \
  --bwaDatabase=DB.fasta \
  --kAlignEndpoint=/cgi-bin/kalign.cgi \
  --outputFirestoreDbUrl=https://nano-stream1.firebaseio.com \
  --outputFirestoreSequencesStatisticCollection=resistant_sequences_statistic \
  --outputFirestoreSequencesBodiesCollection=resistant_sequences_bodies \
  --outputFirestoreGeneCacheCollection=resistant_gene_cache

now I'm getting this error:

Error occurred during initialization of VM
java.lang.Error: Properties init: Could not determine current working directory.
        at java.lang.System.initProperties(Native Method)
        at java.lang.System.initializeSystemClass(System.java:1166)

It appears Java wasn't installed by default on the cloud VM; will installing it be a fix? I had the idea that Dataflow wasn't running directly on the cloud VM instance...

Update: Installed Java 11 and ran it again, got this scary message:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.inject.internal.cglib.core.$ReflectUtils$1 (file:/home/coingroupimb/git_larry_2019-02-06/NanostreamDataflowMain/build/NanostreamDataflowMain.jar) to method java.lang.ClassLoader.defineClass(java.lang.String,byte[],int,int,java.security.ProtectionDomain)
WARNING: Please consider reporting this to the maintainers of com.google.inject.internal.cglib.core.$ReflectUtils$1
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
[main] INFO org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory - No tempLocation specified, attempting to use default bucket: dataflow-staging-us-central1-465460488211
[main] INFO org.apache.beam.runners.dataflow.options.DataflowPipelineOptions$StagingLocationFactory - No stagingLocation provided, falling back to gcpTempLocation
Exception in thread "main" java.lang.RuntimeException: Failed to construct instance from factory method DataflowRunner#fromOptions(interface org.apache.beam.sdk.options.PipelineOptions)
        at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:224)
        at org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:155)
        at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:55)
        at org.apache.beam.sdk.Pipeline.create(Pipeline.java:145)
        at com.theappsolutions.nanostream.NanostreamApp.main(NanostreamApp.java:80)
Caused by: java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:214)
        ... 4 more
Caused by: java.lang.IllegalArgumentException: Unable to use ClassLoader to detect classpath elements. Current ClassLoader is jdk.internal.loader.ClassLoaders$AppClassLoader@4b85612c, only URLClassLoaders are supported.
        at org.apache.beam.runners.core.construction.PipelineResources.detectClassPathResourcesToStage(PipelineResources.java:57)
        at org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:270)
        ... 9 more

@pseveryn
Contributor

pseveryn commented Feb 7, 2019

According to the official Apache Beam documentation, the current SDK doesn't support Java 11 (https://beam.apache.org/roadmap/java-sdk/). The official docs currently recommend using Java 8 (https://beam.apache.org/get-started/quickstart-java/#set-up-your-development-environment)

@Firedrops
Contributor Author

Regarding the PubSub subscription, does it matter if it's a PUSH or PULL type?

@Firedrops
Contributor Author

Firedrops commented Feb 8, 2019

Attempted running with the new jar (pushed yesterday); it came up with these errors that were not present before.
This line appeared around 10 times:

[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2019-02-08T03:29:15.620Z: Unable to bring up enough workers.  Will retry in 5 seconds.

then terminated with this chunk

[main] ERROR org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2019-02-08T03:29:37.738Z: Workflow failed. Causes: Unable to bring up enough workers: minimum 1, actual 0. Please check your quota and retry later, or please try in a different zone/region.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2019-02-08T03:29:37.929Z: Cleaning up.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2019-02-08T03:29:37.958Z: Worker pool stopped.
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2019-02-08T03:29:37.970Z: Stopping worker pool...
[main] INFO org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler - 2019-02-08T03:29:37.979Z: Worker pool stopped.
[main] INFO org.apache.beam.runners.dataflow.DataflowPipelineJob - Job 2019-02-07_19_26_52-9841017688330516146 failed with status FAILED.

I didn't run into this with the previous NanostreamDataflowMain.jar from a few days ago.

Update 1:
I have tried specifying --region=asia-northeast1 to match the region of our provisioning cluster, and ran into a quota issue where all 8 of our vCPU cores are being used by the provisioning instance.
I have submitted a request to increase the quota to 16 vCPUs; Google's e-mail says it will take about 2 business days to process.

In the meantime, it'd be good to hear everyone's input on whether I'm on the right troubleshooting track. I'm having doubts, since earlier I could run the old jar in the default region, us-central.

Update 2:
Rolled back to yesterday's version, before changes were made to the .jar; same issues as above regarding worker nodes and quotas. Anyone know what might be going on? Why did it deploy successfully ~3 hours ago, but never again?

image

Update 3:
Requesting further quota increases failed because Lachlan's GCP account was on the "free trial". I "activated" it (still free until we run out of the USD $300 credit) and it automatically increased the existing quota to 24 vCPUs, which is enough. The updated pipeline deployed successfully.

image

But nothing seems to be happening. PubSub notifications are being sent properly.

Firebase collections remain empty.

Currently I have it set up to use the same PUSH subscription as the monitoring website. Does this pipeline require PULL? I did not notice PULL functions in the java source, so I assumed it's PUSH.

--alignmentWindow=20 does this line mean that alignment only gets sent when there are 20 fastq files? Or every 20 seconds?

@pseveryn
Contributor

pseveryn commented Feb 8, 2019

Regarding the PubSub subscription, does it matter if it's a PUSH or PULL type?

You should use a PULL type. Here is an example of some of our subscriptions:
image

@pseveryn
Contributor

pseveryn commented Feb 8, 2019

--alignmentWindow=20 does this line mean that alignment only gets sent when there are 20 fastq files? Or every 20 seconds?

It means that we use 20-second windows to gather the fastq data into a batch for alignment in the next step
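So each incoming fastq record is bucketed by time into fixed 20-second windows: a record arriving at second T lands in the window starting at T - (T mod 20). A quick shell sketch of that arithmetic (illustrative only, not pipeline code):

```shell
ALIGNMENT_WINDOW=20   # seconds, from --alignmentWindow=20
T=47                  # example arrival time in seconds

# Fixed windowing: the window start is T rounded down to a multiple of 20.
START=$((T - T % ALIGNMENT_WINDOW))
echo "record at t=${T}s falls in window [${START}s, $((START + ALIGNMENT_WINDOW))s)"
# → record at t=47s falls in window [40s, 60s)
```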

@pseveryn
Contributor

pseveryn commented Feb 8, 2019

Currently I have it set up to use the same PUSH subscription as the monitoring website. Does this pipeline require PULL? I did not notice PULL functions in the java source, so I assumed it's PUSH.

Please try with the PULL type

@Firedrops
Contributor Author

I have switched to PULL, and after fixing some random problems with PubSub today, there is finally activity in Dataflow:

image

image

However, the Firestore remains empty except for the pre-generated file. It currently looks like this:

image

Are there further steps to set up the Firestore that we missed?

@obsh
Collaborator

obsh commented Feb 9, 2019

It’s hard to say from the pipeline pictures what exactly goes wrong.
The Alignment step looks a bit suspicious, as the wall time there is “14 min 51 sec” while for the next step it is “0 sec”.

Could you check the following things in the pipeline, please:

  1. general pipeline errors:

pipeline_errors

  2. statistics for the alignment step; click on the step to reveal statistics in the right panel:

step_statistics

If the above two don’t show the problem, the statistics for the other steps should be checked.

@Firedrops
Contributor Author

Firedrops commented Feb 10, 2019

From the aligner "14m" step:

image

So I went back to check the compute instance and VPC rules; everything looks normal. This summary also shows proper tagging of rules and the correct external IP.

image

What else could be wrong? Could using the custom tag http-server instead of default-allow-http be causing confusion?

image

Or could it be the Dataflow pipeline being unable to connect to external IPs?

@obsh
Collaborator

obsh commented Feb 10, 2019

Or could it be the Dataflow pipeline being unable to connect to external IPs?

I don't think there is a problem with the firewall rules, as we checked previously that the Apache page is available from the public internet.

Maybe the load balancer's default timeout of 30 seconds is too small, and that's why the connections timed out.

image

Try increasing the timeout with gcloud:

gcloud compute backend-services update bwa-resistance-genes-backend-service --timeout=600 --global

@Firedrops
Contributor Author

Firedrops commented Feb 11, 2019

Thanks! That got it through alignment, and it works

image

as does next step, Extract Sequences,

image

but Group by SAM reference doesn't seem to be receiving anything. The log remains empty even at Any log level.

image

Do you know what might be going on here? I assume the VM with the Docker containers has already finished its job by this stage?

UPDATE: Re-deployed the updated pipeline and the japsa committed ~13 hours ago by Allen; still running into the same issue at Group by SAM reference

@obsh
Collaborator

obsh commented Feb 11, 2019

It looks like there were no matched sequences after the alignment stage.
The “Extract Sequences” step has a condition:
https://github.com/allenday/nanostream-dataflow/blob/master/NanostreamDataflowMain/src/main/java/com/theappsolutions/nanostream/aligner/GetSequencesFromSamDataFn.java#L31
image

that skips all records where the reference name equals “*”.

I think this was the case, because there is a log record about the number of SAMRecord items in this step, but there is no output or errors logged.

I've noticed that you were using the IP address of the bwa-resistance-genes VM while running pipeline species--2019-02-11t13-46-23ddut in species mode. Could that be the cause - that there were no matches to the reference sequences?

Also, regarding the aligner - you should use the Load Balancer's IP address:
image
otherwise auto-scaling won't work, as all requests will hit a single server.
I'll update the documentation regarding the Load Balancer's IP.

@Firedrops
Contributor Author

Firedrops commented Feb 12, 2019

Also regarding aligner - you should use Load Balancer's IP address:
other way auto-scaling won't work, as all requests will hit single server.
I'll update documentation regarding Load Balancer's IP.

I've updated the IP to 35.201.96.177 and re-deployed the pipeline. I ran the timeout=600 command again as well, just in case. It now gets stuck at Alignment with
Status: 200, response length: 0
image

UPDATE 1: Since I didn't have this error before, I tried deploying with the old IP address again (of just the single VM instance). It also returned the same error.

UPDATE 2: Sending in a single big fastq file (~160 MB) stops the pipeline much earlier; even the Alignment step shows no logs at all instead of Status: 200. Is this an expected limitation?

UPDATE 3: I see a worker VM instance, species--2019-02-12t13-29-02111930-jvqa-harness-wdnc, running about 4 Docker containers with only 15 GB of memory. Is this causing a bug, as the database files total ~26 GB? 15 GB would have been enough for the previous database (~12 GB), but not for the new ones we're testing with. I've been looking for where this 15 GB was specified, but could not find it to change it.

UPDATE 4: I have cleared the previous provisioning and Dataflow setups and re-deployed with the original CombinedDatabases, which are only ~12 GB. Unfortunately, I'm seeing a new set of errors at the Alignment step:

image

UPDATE 5: I cleared and re-deployed provisioning and Dataflow AGAIN; I'm back to having only Status 200 errors, as before. It still doesn't fully work. Checking the provisioning VM instance's subpages, http://35.243.64.91/cgi-bin/bwa.cgi returns content, but http://35.243.64.91/cgi-bin/kalign.cgi returns HTTP ERROR 400. Could there be a problem there?

@obsh
Collaborator

obsh commented Feb 12, 2019

I've checked the VM instances; there is a deployed file genomeDB.fasta.
Therefore the pipeline should be run with --bwaDatabase=genomeDB.fasta; it's currently running with --bwaDatabase=DB.fasta, which results in an empty response.

@Firedrops
Contributor Author

OK, so we will do a new build of japsa and the uber-jar.
Should anything be done about http://35.243.64.91/cgi-bin/kalign.cgi returning HTTP ERROR 400, and/or the NCBI querying?

@obsh
Collaborator

obsh commented Feb 12, 2019

Should anything be done about http://35.243.64.91/cgi-bin/kalign.cgi returning HTTP ERROR 400

No, a 400 for GET requests is expected; you can check that it works by submitting a POST request with sample data:

curl -v -F fasta=@NanostreamDataflowMain/src/test/resources/kAlignResult.txt http://35.243.64.91/cgi-bin/kalign.cgi

Update:
By the way, you can check alignment with:

curl -v -F database='genomeDB.fasta' -F fastq=@NanostreamDataflowMain/src/test/resources/fasqQOutputData.txt http://35.243.64.91/cgi-bin/bwa.cgi

@obsh
Collaborator

obsh commented Feb 12, 2019

and/or the NCBI querying?

It's not clear at the moment what the issue is there; I'll try to debug it.

@Firedrops
Contributor Author

Firedrops commented Feb 12, 2019

I have attempted to build the uber-jar from the Cloud Shell after installing Maven.

At the last step, mvn clean package, after a few minutes of output, it ends with this:
image

Looking through the surefire-reports, the error is found in com.theappsolutions.nanostream.EndToEndPipelineTest.txt

-------------------------------------------------------------------------------
Test set: com.theappsolutions.nanostream.EndToEndPipelineTest
-------------------------------------------------------------------------------
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 55.248 s <<< FAILURE! - in com.theappsolutions.nanostream.EndToEndPipelineTest
testEndToEndPipelineSpeciesMode(com.theappsolutions.nanostream.EndToEndPipelineTest)  Time elapsed: 55.246 s  <<< ERROR!
org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: cannot encode a null List
	at com.theappsolutions.nanostream.EndToEndPipelineTest.testEndToEndPipelineSpeciesMode(EndToEndPipelineTest.java:114)
Caused by: java.lang.RuntimeException: org.apache.beam.sdk.coders.CoderException: cannot encode a null List
Caused by: org.apache.beam.sdk.coders.CoderException: cannot encode a null List

Is this something worth troubleshooting, or is it easier for you to do another build and commit?

@obsh
Collaborator

obsh commented Feb 12, 2019

I've committed a new build and created a separate issue, #54, to deal with the failing test.

@Firedrops
Contributor Author

Does the same test fail for you when you build it too?

@obsh
Copy link
Collaborator

obsh commented Feb 12, 2019

Does the same test fail for you when you build it too?

No, for me this test executes successfully. I'm not sure right now why it's failing in your case.
Maybe we'll exclude it from the default test suite, as this test makes real HTTP calls to the alignment endpoints.

@Firedrops
Contributor Author

Ok, I have ignored that and re-cloned the updated repo.

I am now running into this problem

image

The Alignment step is running much faster than real time (i.e. ~5 minutes have passed IRL, but it's showing 21 minutes).
The logs are split into 2 distinct sections:
the top section doesn't specify what error is occurring, but it's at MakeAlignmentViaHttpFn;
the bottom section is Status: 200, response length: 0, like yesterday, but I've made sure that --bwaDatabase=genomeDB.fasta.

@obsh
Collaborator

obsh commented Feb 13, 2019

As I can see, there is an element that passed the Alignment step, and finally there are results in Firestore.

@Firedrops
Contributor Author

Firedrops commented Feb 13, 2019

Yes, there are. So are these errors safe to ignore? Or are we getting incomplete results?

@obsh
Collaborator

obsh commented Feb 13, 2019

I think we have a logger misconfiguration, because these logs should actually be at "INFO" level.
image
And in the message text it says "INFO" for both types of message:
image

@Firedrops
Contributor Author

Firedrops commented Feb 13, 2019

I see, safe for us to ignore then.

I have been unable to get the [visualization app](https://nano-stream1.appspot.com/) to show data from Firestore.

From Firebase, I get this code

<script src="https://www.gstatic.com/firebasejs/5.8.2/firebase.js"></script>
<script>
  // Initialize Firebase
  var config = {
    apiKey: "AIzaSyDLtxwk4r3ahh-R7aTGIXlMvgrBi5pc_P0",
    authDomain: "nano-stream1.firebaseapp.com",
    databaseURL: "https://nano-stream1.firebaseio.com",
    projectId: "nano-stream1",
    storageBucket: "nano-stream1.appspot.com",
    messagingSenderId: "465460488211"
  };
  firebase.initializeApp(config);
</script>

And I have updated those var config values from the previous upwork-nano-stream everywhere I could find them. I have also tried pasting that entire code into sunburst.html's <head> section. Both cases only show No Data Available

@obsh
Collaborator

obsh commented Feb 13, 2019

I've updated Firestore security configuration to allow public read access:
image

And added collection and document names to URL:
https://nano-stream1.appspot.com/?c=resistant_sequences_statistic&d=resultDocument--2019-02-13T00-28-52UTC

@Firedrops
Contributor Author

Firedrops commented Feb 13, 2019

I see, thank you! The public read access part should probably be added to the readme.

Regarding the URL, is there any way to make it more user-friendly? For example, add code that scans resistant_sequences_statistic for entries and auto-generates hotlinks on nano-stream1.appspot.com, which would become just a 'hub'?

@Firedrops
Contributor Author

I think this issue can be closed, and we can create other issues for individual feature requests and optimizations?

@obsh
Collaborator

obsh commented Feb 13, 2019

I think this issue can be closed, and create other issues for individual feature requests and optimization?

Yes, I'll close it now; it's becoming a bit hard to navigate this issue's page)

The public read access part should probably be added to the readme.

Regarding the URL, is there any way to make it more user-friendly? For example, add code that scans resistant_sequences_statistic for entries and auto-generate hotlinks on nano-stream1.appspot.com, which will become just a 'hub'?

Totally agree, we'll work on it. #56 #57

@Firedrops
Contributor Author

Sorry, we noticed that the results were odd: only staph strains were detected, and in exactly equal proportions. Other bacteria, like Helicobacter, were absent.

So we ran the same .fastq file again; this time we got an error:

image

And many minutes later, no further entries were added to resistant_sequences_statistic

@Firedrops
Contributor Author

@obsh sorry, just tagging you for notifications; I'm not sure if closed topics still send them automatically.

We ran the pipeline a few more times; it is quite inconsistent. Sometimes it gives the above errors, sometimes it doesn't, even though I use the exact same deployment command each time.

@obsh
Collaborator

obsh commented Feb 14, 2019

@obsh sorry just tagging you for notifications, not sure if closed topics still send them automatically.

No problem at all.

We ran the pipeline a few more times, it is quite inconsistent. Sometimes it gives the above errors, sometimes they don't, even though I use the exact same deployment command each time.

I've made adjustments to the alignment step in #65; it should work more stably now.
The build file is updated, and I've also marked the integration test as ignored for now, so you can try to build the jar using mvn clean install.

@Firedrops
Contributor Author

Firedrops commented Feb 14, 2019

My first attempt today, after a clean clone and mvn clean install with no errors, gave the above error again:

 2019-02-14 (12:03:33) Processing stuck in step Alignment for at least 05m00s without outputting or completing in state pro...
Processing stuck in step Alignment for at least 05m00s without outputting or completing in state process
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
  at java.net.SocketInputStream.read(SocketInputStream.java:170)
  at java.net.SocketInputStream.read(SocketInputStream.java:141)
  at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
  at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
  at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
  at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
  at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
  at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
  at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
  at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
  at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
  at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
  at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
  at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
  at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
  at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:85)
  at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
  at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
  at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
  at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
  at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
  at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
  at com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
  at com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
  at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)
  at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn$DoFnInvoker.invokeProcessElement(Unknown Source)

and 5 minutes later:

2019-02-14 (12:08:33) Processing stuck in step Alignment for at least 10m00s without outputting or completing in state pro...
Processing stuck in step Alignment for at least 10m00s without outputting or completing in state process
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
  at java.net.SocketInputStream.read(SocketInputStream.java:170)
  at java.net.SocketInputStream.read(SocketInputStream.java:141)
  at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
  at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
  at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
  at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
  at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
  at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
  at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
  at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
  at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
  at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
  at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
  at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
  at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
  at org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:85)
  at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
  at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
  at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
  at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
  at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
  at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
  at com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
  at com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
  at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)
  at com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn$DoFnInvoker.invokeProcessElement(Unknown Source)

org.apache.http.client.ClientProtocolException: Unexpected response status: 502
        com.theappsolutions.nanostream.http.NanostreamResponseHandler.handleResponse(NanostreamResponseHandler.java:39)
        com.theappsolutions.nanostream.http.NanostreamResponseHandler.handleResponse(NanostreamResponseHandler.java:17)
        org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:223)
        org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
        org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
        com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
        com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
        com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)

Hang on, Lachlan just told me this was a problem in japsa that was just fixed, let me try with the new release.
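One thing worth noting about the traces above: both hangs sit inside SocketInputStream.socketRead0, which is what an HTTP read with no timeout looks like. A defensive client sets explicit connect/read timeouts so a dead alignment worker fails fast instead of stalling the Dataflow step for 5-10 minutes. The pipeline itself uses Apache HttpClient, so this JDK-only sketch is illustrative only; the endpoint URL is the one from this thread, and openConnection() does not touch the network, so the sketch runs offline.

```java
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative only: shows how explicit timeouts bound a hung HTTP
// call. openConnection() does not connect yet, so this runs offline;
// with these settings a real connect() would fail after 10 s and a
// silent read after 2 min, instead of blocking in socketRead0 forever.
public class TimeoutSketch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://130.211.33.64/cgi-bin/bwa.cgi");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(10_000);   // give up on connect after 10 s
        conn.setReadTimeout(120_000);     // give up on a silent read after 2 min
        System.out.println(conn.getConnectTimeout() + " " + conn.getReadTimeout());
    }
}
```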

@Firedrops
Contributor Author

Firedrops commented Feb 14, 2019

OK, it's running, but still inconsistently. Sometimes it works, sometimes it gives that 5-minute error. Also, I noticed that on a run that worked, 2 collections were generated, ~35 seconds apart. Is the sessioning being overzealous?

https://nano-stream1.appspot.com/?c=new_scanning_species_sequences_statistic&d=resultDocument--2019-02-14T03-31-42UTC
and
https://nano-stream1.appspot.com/?c=new_scanning_species_sequences_statistic&d=resultDocument--2019-02-14T03-31-07UTC

both generated from Erwinia_amylovora.fastq

UPDATE 1: It might be related to the input .fastq size. It now always fails on 20170731_GP01_MNP_nohuman, which is 866 KB. If I feed it another .fastq, regardless of size, after getting the 5-minute errors, I get these errors instead:

 2019-02-14 (14:15:05) java.net.SocketException: Broken pipe
java.net.SocketException: Broken pipe
        java.net.SocketOutputStream.socketWrite0(Native Method)
        java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
        java.net.SocketOutputStream.write(SocketOutputStream.java:153)
        org.apache.http.impl.io.SessionOutputBufferImpl.streamWrite(SessionOutputBufferImpl.java:124)
        org.apache.http.impl.io.SessionOutputBufferImpl.flushBuffer(SessionOutputBufferImpl.java:136)
        org.apache.http.impl.io.SessionOutputBufferImpl.write(SessionOutputBufferImpl.java:167)
        org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)
        org.apache.http.entity.mime.content.StringBody.writeTo(StringBody.java:174)
        org.apache.http.entity.mime.AbstractMultipartForm.doWriteTo(AbstractMultipartForm.java:134)
        org.apache.http.entity.mime.AbstractMultipartForm.writeTo(AbstractMultipartForm.java:157)
        org.apache.http.entity.mime.MultipartFormEntity.writeTo(MultipartFormEntity.java:113)
        org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:156)
        org.apache.http.impl.conn.CPoolProxy.sendRequestEntity(CPoolProxy.java:160)
        org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:238)
        org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
        org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
        org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
        org.apache.http.impl.execchain.ServiceUnavailableRetryExec.execute(ServiceUnavailableRetryExec.java:85)
        org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
        org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
        org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
        org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
        org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
        org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
        com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
        com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
        com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49) 

[screenshot of the error]

UPDATE 2: I stopped and re-deployed to try another file (189 KB), and got this error instead:

 2019-02-14 (14:22:06) org.apache.http.client.ClientProtocolException: Unexpected response status: 502
org.apache.http.client.ClientProtocolException: Unexpected response status: 502
        com.theappsolutions.nanostream.http.NanostreamResponseHandler.handleResponse(NanostreamResponseHandler.java:39)
        com.theappsolutions.nanostream.http.NanostreamResponseHandler.handleResponse(NanostreamResponseHandler.java:17)
        org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:223)
        org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
        org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
        com.theappsolutions.nanostream.util.HttpHelper.executeRequest(HttpHelper.java:105)
        com.theappsolutions.nanostream.http.NanostreamHttpService.generateAlignData(NanostreamHttpService.java:58)
        com.theappsolutions.nanostream.aligner.MakeAlignmentViaHttpFn.processElement(MakeAlignmentViaHttpFn.java:49)

[screenshot of the error]

If I add the small Erwinia file after this, I get the stuck-in-alignment 5-minute error again.
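A pattern that could smooth out the intermittent 502s above is retrying the alignment POST a few times with backoff before failing the element. This is a hypothetical sketch, not the pipeline's actual code: the `Request` interface and `sendOnce()` stand in for the real HTTP call, and the main method simulates a cluster that recovers on the third attempt.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: retry a request that answers HTTP 502 (as in
// the logs above), with linear backoff, giving up after maxAttempts.
public class RetrySketch {

    interface Request { int sendOnce(); } // returns an HTTP status code

    static int sendWithRetry(Request req, int maxAttempts, long backoffMillis)
            throws InterruptedException {
        int status = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = req.sendOnce();
            if (status != 502) return status;           // success or non-retryable
            if (attempt < maxAttempts) Thread.sleep(backoffMillis * attempt);
        }
        return status; // still 502 after all attempts
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate an alignment cluster that returns 502 twice, then recovers.
        Iterator<Integer> responses = List.of(502, 502, 200).iterator();
        int status = sendWithRetry(responses::next, 5, 1L);
        System.out.println(status);
    }
}
```

Apache HttpClient (used by the pipeline) already has a ServiceUnavailableRetryExec in its execution chain, visible in the stack traces, so the real fix may be a matter of configuring that retry strategy rather than hand-rolling a loop.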
