Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong compilation order (uberjar, check) (WAS: with linux x86-64) #2508

Closed
ikitommi opened this issue Dec 16, 2018 · 15 comments · Fixed by #2518
Closed

Wrong compilation order (uberjar, check) (WAS: with linux x86-64) #2508

ikitommi opened this issue Dec 16, 2018 · 15 comments · Fixed by #2518

Comments

@ikitommi
Copy link

ikitommi commented Dec 16, 2018

Locally everything works fine, but when lein uberjar is run with default Clojure Docker container, the compilation order is different and the compilation fails as the Protocols satisfy? doesn't match to values.

I believe the compilation order should be the same (working one) on all platforms?

lein uberjar

Here's a sample project: https://github.com/ikitommi/uberjar-problem, with guides on how to reproduce this.

Original problem: metosin/muuntaja#90

lein check

Saw the same with lein check: works locally ok, but not on travis:

Compiling namespace muuntaja.format.msgpack
Compiling namespace muuntaja.interceptor
Compiling namespace muuntaja.middleware
Compiling namespace muuntaja.format.edn
Compiling namespace muuntaja.format.core ;; the defined protocol
Compiling namespace muuntaja.format.json
Compiling namespace muuntaja.format.transit
Compiling namespace muuntaja.parse
Compiling namespace muuntaja.core ;; fails here
Compiling namespace muuntaja.core
Compiling namespace muuntaja.format.core ;; the defined protocol
Compiling namespace muuntaja.format.edn
Compiling namespace muuntaja.format.json
Compiling namespace muuntaja.format.transit
Compiling namespace muuntaja.interceptor
Compiling namespace muuntaja.middleware
Compiling namespace muuntaja.parse
Compiling namespace muuntaja.protocols
Compiling namespace muuntaja.util
Compiling namespace muuntaja.format.cheshire
Compiling namespace muuntaja.format.yaml
@technomancy
Copy link
Owner

Can you provide a repro case outside docker?

@ikitommi
Copy link
Author

Only by setting the order manually to the same as compiled inside docker.

lein with-profile uberjar/fail uberjar

@technomancy
Copy link
Owner

I see; I misread this as the order of the tasks was different, not the order of the namespaces. This is probably due to implementation details of the underlying filesystem leaking thru, so we ought to explicitly sort them before compilation. Happy to take a patch for this.

@LoganGirard
Copy link

So if I'm understanding this issue correctly, the compilation order for namespaces is not identical across platforms?

Looking at the compile.clj source, it doesn't look like there is any strict ordering enforced by stale-namespaces. If that is the case, how does this error not occur more often by random chance? Adding a simple (sort ...) to the compilable-namespaces function still fails. Do we need to build a directed dependency graph of namespaces to correctly order compilation here?

@miikka
Copy link

miikka commented Dec 23, 2018

@LoganGirard, I believe you're correct.

There are a couple of things going on here:

  1. When you compile a namespace, Clojure reloads the namespace even if it was already loaded.
  2. When you reload a namespace with defprotocol, that protocol gets redefined.
  3. Any reify calls refer to the protocol definition they see at the load time, but any satisfy? calls refer the protocol definition that is available at the run time.
  4. In the example, muuntaja.core calls satisfy? on a reified object when you load the namespace.

Thus if you first compile a namespace that uses reify to implement a protocol (in the example muuntaja.format.edn) and then you compile a namespace that defines the protocol (muuntaja.format), then the reified objects will fail satisfy? calls (in muuntaja.core). The only way to get this right is to compile the namespaces in the dependency order.

Well, maybe Clojure could be changed so that reify would resolve the protocols/interfaces at the runtime so that reloading protocols wouldn't break stuff? I.e. late binding. The current behavior feels a bit surprising to me.

As for why this does not happen more often: probably not that many projects do the point 4? I don't know if the uberjar would actually work if the compilation would succeed.

@ikitommi
Copy link
Author

Thanks for the detailed explanation. Would CLJ-1814 help here? Still, don't think that issue is going to be resolved & deployed any time soon...

@technomancy
Copy link
Owner

how does this error not occur more often by random chance?

This failure does happen a lot! People come by IRC with broken protocols due to this problem pretty frequently, or at least they used to when IRC was more active. It's a big part of why I don't use protocols; they cause really weird, difficult-to-debug errors where two things look identical but behave differently depending on load order. Just a huge pain in the neck.

Do we need to build a directed dependency graph of namespaces to correctly order compilation?

There is one fix that will work for the uberjar case (though you'll still run into random bugs in the repl) is to avoid using :all for AOT and simply AOT-compiling your -main entry-point ns. As long as that ns calls :require correctly, all the other namespaces it uses will get AOT compiled transitively from that one compilation.

@LoganGirard
Copy link

So it sounds like this issue is ideally better supported by waiting on Clojure to support it? Any action for us at the moment, barring contributing to the clojure source?

@technomancy
Copy link
Owner

technomancy commented Dec 27, 2018 via email

@RutledgePaulV
Copy link

Sounds like this could explain some seemingly indeterministic behavior we've seen when using aot and protocols... now that this change has been merged do you anticipate another release soon? I'm eager to try it out.

@technomancy
Copy link
Owner

technomancy commented Jan 2, 2019 via email

@ikitommi
Copy link
Author

I would like to reopen this. The compilation-order is now consistent in 2.9.1, which means it fails on all environments. The compilation should be ordered by the dependencies.

@danielcompton danielcompton reopened this Mar 29, 2019
@mrrodriguez
Copy link

mrrodriguez commented Mar 29, 2019

@technomancy

There is one fix that will work for the uberjar case (though you'll still run into random bugs in the repl) is to avoid using :all for AOT and simply AOT-compiling your -main entry-point ns. As long as that ns calls :require correctly, all the other namespaces it uses will get AOT compiled transitively from that one compilation.

It only recently occurred to me that the :aot :all pattern was doing a forceful reload of all inter-related namespaces as it enumerated those in the project.

I'm glad you pointed out this "only provide a main entry point solution" and I strongly feel that that should be the way nearly anyone does :aot. There is a long history of Clojure Jira issues out there around problems with AOT compilation reloading namespaces. I've followed many of these and hit issues with them myself.
I do agree, it often comes down to records/protocols/types/interfaces, but my main takeaway has always been - try to avoid recompiling namespaces.

If you don't have a main entry point that works, that may be questionable as to why since it is a typically an "application" you AOT, not a library.

Also, doing :aot :all seems like it could potentially be wasting a lot of time to continually recompile namespaces that have already been compiled via dependencies of namespaces searched before.

Actually, in a typical ClojureScript build, you do typically just provide a "main entry point" and compilation happens in one pass from there. I see no reason why that wouldn't be the norm for Clojure.

@ikitommi

I would like to reopen this. The compilation-order is now consistent in 2.9.1, which means it fails on all environments. The compilation should be ordered by the dependencies.

Doing a sort in something like an alphabetical way can add consistency, but I don't think it solves the root problems with these overlapping compilation passes. Dependency order probably tends to work alright. I can't think of edge cases there. I still think it is doing more work than necessary and a main entry point is a better approach. Clojure AOT compilation semantics seem to strongly favor a single-pass entry point cycle since it doesn't do much to not break things with ns reloading.

@ikitommi
Copy link
Author

Thing is that many people use :aot :all and that's the default for example in the Luminus Template, which is very popular "Clojure Web Development Made Simple" template.

With the current setting, building an uberjar succeeds by default, but will fail if the user adds a namespace using some protocols and which happens to be named so that it is referenced before the namespaces defining those protocols.

"Don't use protocols of type X in a namespace that starts with alphabet < M" kinda problem.

@ikitommi ikitommi changed the title Wrong compilation order (uberjar, check) with linux x86-64 Wrong compilation order (uberjar, check) (WAS: with linux x86-64) Mar 29, 2019
saitouena added a commit to saitouena/leiningen that referenced this issue Jul 29, 2019
@technomancy
Copy link
Owner

The compilation should be ordered by the dependencies.

Sorry, but I don't think this is going to happen within Leiningen. It's easy to order the compilation by dependencies by specifying the entry points; then Clojure will just do the right thing for you. Using :aot :all is just not the right solution for this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants