Fix local development mode to accept paris-traceroute archives #1025

Open
cristinaleonr opened this issue Oct 11, 2021 · 3 comments

@cristinaleonr
Contributor

Currently, etl_worker crashes in local development mode when a paris-traceroute archive is supplied as a URL.

Steps to reproduce:

  1. Navigate to cmd/etl_worker within the ETL project.
  2. Run go run ./etl_worker.go -service_port :8080 -output_dir ./output -output local.
  3. Open up another terminal and set the URL variable to some paris-traceroute archive (e.g., URL=gs://archive-measurement-lab/paris-traceroute/2019/11/19/20191119T000000Z-mlab1-ord03-paris-traceroute-0000.tgz).
  4. Run curl "http://localhost:8081/v2/worker?filename=$URL"
  5. The etl_worker crashes with:
2021/10/11 18:48:31 worker.go:174: <nil> creating parser for traceroute gs://archive-measurement-lab/paris-traceroute/2013/05/08/20130508T000000Z-mlab3-akl01-paris-traceroute-0000.tgz
2021/10/11 18:48:31 server.go:3159: http: panic serving [::1]:56948: runtime error: invalid memory address or nil pointer dereference
goroutine 135 [running]:
net/http.(*conn).serve.func1()
        /usr/local/go/src/net/http/server.go:1801 +0xb9
panic({0xd47080, 0x151a520})
        /usr/local/go/src/runtime/panic.go:1047 +0x266
github.com/m-lab/etl/task.(*Task).Close(0x0)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/task/task.go:67 +0x19
panic({0xd47080, 0x151a520})
        /usr/local/go/src/runtime/panic.go:1038 +0x215
github.com/m-lab/etl/task.(*Task).ProcessAllTests(0x4, 0x50)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/task/task.go:85 +0x4f
github.com/m-lab/etl/worker.DoGKETask(_, {{0xc0002ee150, 0x6f}, {0xc0002ee16d, 0x52}, {0xc0002ee155, 0x17}, {0x0, 0x0}, {0xc0002ee16d, ...}, ...})
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/worker/worker.go:209 +0x30
github.com/m-lab/etl/worker.ProcessGKETask({_, _}, {{0xc0002ee150, 0x6f}, {0xc0002ee16d, 0x52}, {0xc0002ee155, 0x17}, {0x0, 0x0}, ...}, ...)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/worker/worker.go:204 +0x4de
main.(*runnable).Run(0xc00000c3c0, {0xfb61e8, 0xc00019a000})
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/cmd/etl_worker/etl_worker.go:313 +0x2f6
main.handleLocalRequest({0xfb1500, 0xc0001f5ea0}, 0x0)
        /usr/local/google/home/cristinaleon/go/src/github.com/m-lab/etl/cmd/etl_worker/etl_worker.go:196 +0x189
net/http.HandlerFunc.ServeHTTP(0x0, {0xfb1500, 0xc0001f5ea0}, 0x0)
        /usr/local/go/src/net/http/server.go:2046 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc00022400f, {0xfb1500, 0xc0001f5ea0}, 0xc000432200)
        /usr/local/go/src/net/http/server.go:2424 +0x149
net/http.serverHandler.ServeHTTP({0xc0005bdb90}, {0xfb1500, 0xc0001f5ea0}, 0xc000432200)
        /usr/local/go/src/net/http/server.go:2878 +0x43b
net/http.(*conn).serve(0xc00045c000, {0xfb6258, 0xc0001335c0})
        /usr/local/go/src/net/http/server.go:1929 +0xb08
created by net/http.(*Server).Serve
        /usr/local/go/src/net/http/server.go:3033 +0x4e8

Note: this does not happen with other datatypes (e.g., PCAP, hopannotation1, scamper1).
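
The log line (`<nil> creating parser for traceroute`) and the trace (`(*Task).Close(0x0)`) suggest that no parser was created for this datatype, no error was reported, and a method was then called on a nil `*Task`. A minimal sketch of a guard for that failure mode follows. All names here (`Parser`, `Task`, `NewTask`, `newParser`, `ErrNoParser`) are hypothetical stand-ins for illustration, not the actual m-lab/etl types or functions.

```go
package main

import (
	"errors"
	"fmt"
)

// Parser and Task are hypothetical stand-ins, NOT the m-lab/etl types.
type Parser interface {
	Name() string
}

type simpleParser struct{ name string }

func (p simpleParser) Name() string { return p.name }

type Task struct {
	parser Parser
}

// Close tolerates a nil receiver, so a deferred Close cannot panic
// when task construction failed earlier.
func (t *Task) Close() error {
	if t == nil {
		return nil
	}
	return nil
}

// ErrNoParser is returned when a datatype has no parser on this path.
var ErrNoParser = errors.New("no parser available for datatype")

// newParser returns nil for datatypes this (assumed) path does not support.
func newParser(dataType string) Parser {
	switch dataType {
	case "pcap", "hopannotation1", "scamper1":
		return simpleParser{name: dataType}
	default:
		return nil
	}
}

// NewTask returns an explicit error instead of a nil task with a nil error.
func NewTask(dataType string) (*Task, error) {
	p := newParser(dataType)
	if p == nil {
		return nil, fmt.Errorf("%w: %s", ErrNoParser, dataType)
	}
	return &Task{parser: p}, nil
}

func main() {
	task, err := NewTask("traceroute")
	defer task.Close() // safe even when task is nil
	if err != nil {
		fmt.Println("rejecting request:", err)
		return
	}
	fmt.Println("processing with parser:", task.parser.Name())
}
```

With a guard like this, the handler could return an HTTP error for an unsupported datatype instead of panicking.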

@autolabel autolabel bot added the review/triage Team should review and assign priority label Oct 11, 2021
@stephen-soltesz
Contributor

Ah, so there are currently two processing paths in the etl_worker: one for the "v1" system, and another for the "v2" system.

  • The "v1" path (/worker) is used by the v1 pipeline for web100 data types (and any others not yet migrated to v2).
  • The "v2" path (/v2/worker) is used by the v2 pipeline and ordinarily runs in the GKE environment. v2 does not yet support all data types; the "Traceroute migration" work is intended, in part, to solve this.

So, try the same GCS URL with the resource path /worker instead.
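
For reference, here is a small sketch of building the request URL for either path. The helper `workerURL` and the constant names are hypothetical, and the host is an assumption: it should match whatever `-service_port` the local worker was started with.

```go
package main

import (
	"fmt"
	"net/url"
)

// Resource paths named in this thread.
const (
	v1WorkerPath = "/worker"    // v1 pipeline
	v2WorkerPath = "/v2/worker" // v2 pipeline
)

// workerURL is a hypothetical helper that builds a request URL for a
// locally running etl_worker, query-escaping the archive name.
func workerURL(host, path, archive string) string {
	q := url.Values{}
	q.Set("filename", archive)
	u := url.URL{Scheme: "http", Host: host, Path: path, RawQuery: q.Encode()}
	return u.String()
}

func main() {
	archive := "gs://archive-measurement-lab/paris-traceroute/2019/11/19/20191119T000000Z-mlab1-ord03-paris-traceroute-0000.tgz"
	host := "localhost:8080" // assumption: matches -service_port
	fmt.Println(workerURL(host, v1WorkerPath, archive))
	fmt.Println(workerURL(host, v2WorkerPath, archive))
}
```

The printed URLs can be passed to curl as in the reproduction steps; only the resource path differs between the two pipelines.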

@stephen-soltesz
Contributor

And, I think the next issue will be that the v1 system does not support local output.

@cristinaleonr
Contributor Author

Thanks for clarifying!

I tried with the /worker path. I think you're right about the v1 system not supporting local output, because now the output is this error:
2021/10/12 13:57:26 insert.go:299: InsertErr googleapi: Error 400: The destination table is invalid: projec_id , dataset_id base_tables, table_id: traceroute., invalid on traceroute_20191119
