Support Scala 2.12 #332

koertkuipers · 2019-06-14T17:37:28Z

Spark 2.4 has Scala 2.11 and 2.12 builds. Starting with Spark 3.0 Scala 2.12 will become default and Scala 2.11 will be dropped.

This depends on #184

This will require:

Changes to the gradle build support scala 2.11 and 2.12. See kafka for an example how to do this.
Some hopefully minor code changes for compilation in Scala 2.12
Updates/upgrades to dependencies. Some are not build for Scala 2.12 at all, other only have newer versions available for Scala 2.12

koertkuipers · 2019-06-14T17:41:49Z

Starting from #184 first issue i ran into is gradle-scalastyle-plugin not found for scala 2.12. resolved by adding this maven repo:

        maven {
            name = "ngbinh"
            url = "https://dl.bintray.com/ngbinh/maven"
        }

koertkuipers · 2019-06-14T17:43:20Z

To stop scala compiler from exiting with strange messages like assertion failed: ClassBType.info not yet assigned: Lorg/apache/spark/rdd/RDD; i removed -optimize flag

tovbinm · 2019-06-14T20:32:37Z

Just a heads up - with merged #274 and with Scala 2.12 compile the majority of the tests will fail, e.g. UnaryLamdaTransformerTest, due to stage serialization issues. We need to update all lambda functions with concrete classes as explained in #274

tovbinm · 2019-06-14T20:32:59Z

Thanks for making the effort to upgrade and support both Scala 2.11/12 versions @koertkuipers !!

koertkuipers · 2019-06-16T20:42:42Z

after publishing my own in-house versions of xgboost and mleap it seems everything compiles and tests fine using scala 2.12. seemed a little too easy... wonder if i am not running tests correctly.

this is before #274
i get a bunch of merge conflicts with that one that i need to take a look at. since i didnt make any significant changes the merge conflicts are likely with spark-2.4 branch.

koertkuipers · 2019-06-16T20:57:22Z

xgboost not having scala version in package name doesnt help
e.g. xgboost4j-spark-0.90.jar vs xgboost4j-spark_2.11-0.90.jar

koertkuipers · 2019-06-17T18:38:43Z

see also dmlc/xgboost#4574

koertkuipers · 2019-06-21T00:25:13Z

i build xgboost 0.90 for scala 2.11/2.12 here and published it to our bintray

i build mleap for scala 2.11/2.12 here and published it also to our bintray

my branch for transmogrifai for scala 2.12 is feat-scala212. its currently set to build for only scala 2.12 since i don't yet know how to do both with gradle in one pass but it also builds fine for scala 2.11 if you simply change scalaVersion and scalaVersionRevision

tovbinm · 2019-06-21T06:06:53Z

Nice! Please pull the latest from https://github.com/salesforce/TransmogrifAI/tree/mt/spark-2.4 since I just updated it.

I would like to try adding a cross build for 2.11/12

koertkuipers · 2019-06-21T14:20:00Z

i expected to see serialization issues after merging in https://github.com/salesforce/TransmogrifAI/tree/mt/spark-2.4 due to #274 but i don't 😕

i see you put a lot of functions in companion objects like:

object BinaryTransformerTest {
  def fn: (Real, RealNN) => Real = (i1, i2) => new Real(for {v1 <- i1.value; v2 <- i2.value} yield v1 / (v2 * v2))
}

do you expect this approach to work for scala 2.12?

tovbinm · 2019-06-21T14:30:35Z

I think it works only because models are serialized and loaded within the same code. If you train a model, save it then recompile the codebase and try to load the model - it would fail.

wsuchy · 2019-06-21T21:34:23Z

@koertkuipers I agree here with @tovbinm, the problem is that in Scala 2.12 all lambda functions are given random names for each time you compile your code. So model trained on your laptop will not load in your production environment due to missing references. We are currently working on the fix for that.

koertkuipers · 2019-06-22T16:11:07Z

@tovbinm @wsuchy is there any way you can think of to test this? would forking for tests do it?

tovbinm · 2019-06-23T05:18:24Z

The test should be constructed a follows:

Train several models (binclass, multi class and regression) with all feature types (example).
Save these models as a part of core/test/resources like this one.
Add a new test that would load these models and score them.

This way we would have a set of models what would contain stages saved from previous JVM runs and would allow us testing if the models would load & produce scores correctly.

tovbinm · 2019-06-23T05:22:15Z

I propose to follow the migration guide for 2.12 in #274 and replace all the lambda functions introduced with concrete classes.

koertkuipers · 2019-07-01T02:20:35Z

after publishing my own in-house versions of xgboost and mleap it seems everything compiles and tests fine using scala 2.12. seemed a little too easy... wonder if i am not running tests correctly.

this is before #274
i get a bunch of merge conflicts with that one that i need to take a look at. since i didnt make any significant changes the merge conflicts are likely with spark-2.4 branch.

i spoke to soon about everything testing fine. i was using compileTestScala but now i tried test which seems to run more tests and some fail. so i got some more work to do.

koertkuipers · 2019-07-07T04:44:00Z

all the tests except for one pass before #274. the only failure before #274 is:

  [-] RecordInsightsLOCO should return the most predictive features for data generated with a strong relation to the label (1.33 secs)

com.salesforce.op.stages.impl.insights.RecordInsightsLOCOTest > RecordInsightsLOCO should return the most predictive features for data generated with a strong relation to the label FAILED
    org.scalatest.exceptions.TestFailedException: 0.953500197613813 was not less than 0.8 The ratio of feature strengths between important and other features should be similar to the ratio of feature importances from Spark's RandomForest

after #274 i have lots of failures. but this is good news as this was expected and we know how to fix them.
the failures look like this:

    java.lang.RuntimeException: Failed to write out stage 'uid_1234'

        Caused by:
        java.lang.RuntimeException: Argument 'transformFn' [com.salesforce.op.stages.Lambdas$$$Lambda$5968/1132504838] cannot be serialized. Make sure com.salesforce.op.stages.Lambdas$$$Lambda$5968/1132504838 has either no-args ctor or is an object, and does not have any external dependencies, e.g. use any out of scope variables.

            Caused by:
            java.lang.RuntimeException: Failed to create an instance of class 'com.salesforce.op.stages.Lambdas$$$Lambda$5968/1132504838'. Class has to either have a no-args ctor or be an object.

                Caused by:
                java.lang.ClassNotFoundException: com.salesforce.op.stages.Lambdas$$$Lambda$5968/1132504838

koertkuipers · 2019-07-08T21:44:29Z

i tried to convert some lambdas to concrete classes as explained in #274 migration notes.

for example in PassengerFeaturesTest.scala this:

def genderFn: Passenger => MultiPickList = p => Set(p.getGender).toMultiPickList

i replaced by:

  class GenderFn extends Function1[Passenger, MultiPickList] {
    def apply(p: Passenger): MultiPickList = Set(p.getGender).toMultiPickList
  }

this did not work. i now get:

        java.io.NotSerializableException: com.salesforce.op.test.PassengerFeaturesTestLambdas$GenderFn

however it works if i add Serializable:

  class GenderFn extends Function1[Passenger, MultiPickList] with Serializable {
    def apply(p: Passenger): MultiPickList = Set(p.getGender).toMultiPickList
  }

koertkuipers · 2019-07-08T21:53:17Z

i tried to convert some lambdas to concrete classes as explained in #274 migration notes.

for example in PassengerFeaturesTest.scala this:

def genderFn: Passenger => MultiPickList = p => Set(p.getGender).toMultiPickList

i replaced by:

  class GenderFn extends Function1[Passenger, MultiPickList] {
    def apply(p: Passenger): MultiPickList = Set(p.getGender).toMultiPickList
  }

this did not work. i now get:

        java.io.NotSerializableException: com.salesforce.op.test.PassengerFeaturesTestLambdas$GenderFn

however it works if i add Serializable:

  class GenderFn extends Function1[Passenger, MultiPickList] with Serializable {
    def apply(p: Passenger): MultiPickList = Set(p.getGender).toMultiPickList
  }

i always thought scala functions extend serializable. but its a little more complex than that:

scala> class X extends Function1[String, String]{ def apply(s: String) = s + "!" }
defined class X

scala> val x = new X
x: X = <function1>

scala> x.asInstanceOf[Serializable]
java.lang.ClassCastException: X cannot be cast to scala.Serializable
  ... 32 elided

scala> val y = { (s: String) => s + "!" }
y: String => String = <function1>

scala> y.asInstanceOf[Serializable]
res6: Serializable = <function1>

koertkuipers · 2019-07-09T15:23:39Z

@koertkuipers I agree here with @tovbinm, the problem is that in Scala 2.12 all lambda functions are given random names for each time you compile your code. So model trained on your laptop will not load in your production environment due to missing references. We are currently working on the fix for that.

@wsuchy @tovbinm is what i am seeing the same issue? or is this something else?

i am seeing:

    java.lang.RuntimeException: Failed to write out stage 'uid_1234'

        Caused by:
        java.lang.RuntimeException: Argument 'transformFn' [com.salesforce.op.stages.Lambdas$$$Lambda$5968/1132504838] cannot be serialized. Make sure com.salesforce.op.stages.Lambdas$$$Lambda$5968/1132504838 has either no-args ctor or is an object, and does not have any external dependencies, e.g. use any out of scope variables.

            Caused by:
            java.lang.RuntimeException: Failed to create an instance of class 'com.salesforce.op.stages.Lambdas$$$Lambda$5968/1132504838'. Class has to either have a no-args ctor or be an object.

                Caused by:
                java.lang.ClassNotFoundException: com.salesforce.op.stages.Lambdas$$$Lambda$5968/1132504838

wsuchy · 2019-07-09T16:03:58Z

@koertkuipers the way I see this working is to explicitly wrap lambdas inside a specific object (one lambda per one object) e.g:

trait TMOGLambda extends Serializable{
  val f: (_) => _
}
object Lambda1 extends TMOGLambda {
 override val f =  (x: Int)=> x.toString
}

And then have this supported by serializer to check presence of that object and hardcode it to always point to f or whatever we name the property holding our lambda.

tovbinm · 2019-07-09T16:42:31Z

I think it cleaner to have classes extend FunctionX base classes, i.e.

class GenderFn extends Function1[Passenger, MultiPickList] with Serializable {
    def apply(p: Passenger): MultiPickList = Set(p.getGender).toMultiPickList
}

wsuchy · 2019-07-09T16:45:56Z

@tovbinm the problem will be supporting it from macros, as at the time we evaluate them there is type information.

koertkuipers · 2019-07-09T17:30:19Z

@tovbinm @wsuchy take a look at tresata-opensource@95095ed i simply replaced lambdas with concrete classes until all the tests passed. its not pretty but it works.

i am unsure if i have to add more tests for workflow independent model loading, e.g. load model from json files.

wsuchy · 2019-07-12T20:17:19Z

Found this out today:

In our hello world example, the code like this:

val sepalLength = FeatureBuilder.Real[Iris].extract(_.sepalLength.toReal).asPredictor

that result in json containing:

 "extractFn" : {
        "className" : "com.salesforce.hw.OpIrisSimple$$anonfun$5",
        "value" : {
          "className" : "com.salesforce.hw.OpIrisSimple$$anonfun$5"
        }
      },

Even though one can instantiate back the workflow, the names are still volatile and depend on the order of occurrence. I think we should throw an exception instead.

koertkuipers · 2019-08-08T21:06:50Z

ok we now have all tests pass in scala 2.12 and are fully caught up with master branch
see:
https://github.com/tresata-opensource/TransmogrifAI/tree/feat-scala212

tovbinm · 2019-08-08T23:47:40Z

That’s outstanding!! It would be great to have it merged into our codebase together with a cross build changes for Scala 2.11/2.12.

tovbinm · 2019-08-24T16:13:59Z

@koertkuipers are there any plans on your end to raise a PR with Scala 2.12 changes? ;)

koertkuipers · 2019-08-25T19:28:36Z

i don't know gradle well, so to make it cross-compile for 2.11 and 2.12 is probably not something i will get to.

…

On Sat, Aug 24, 2019, 12:14 Matthew Tovbin ***@***.***> wrote: @koertkuipers <https://github.com/koertkuipers> are there any plans on your end to raise a PR with Scala 2.12 changes? ;) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#332?email_source=notifications&email_token=AAGIQJEDYZYWRLIQT73OFF3QGFM47A5CNFSM4HYLISR2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5CDBOQ#issuecomment-524562618>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAGIQJH6JH5RQIOBI3JPAQ3QGFM47ANCNFSM4HYLISRQ> .

tovbinm · 2019-08-25T19:34:59Z

I can help to add it. Let’s start from what you already have.

Helw150 · 2019-08-30T22:25:19Z

edit: Ah I just saw that you are reverting back to Spark 2.3 so I'll assume this is blocked until the overall repo is ready for 2.4!

@tovbinm I'm not particularly well versed in Gradle, but cross compiled Transmogrifai is something I'm interested in having so I'm happy to spend some time this weekend taking a stab if you don't have free cycles.

Have you had a chance to try from @koertkuipers branch or should I start?

tovbinm · 2019-09-03T02:09:05Z

We are planning to keep Spark 2.3 with Scala 2.11 in our master branch. Additionally, we would maintain the Spark 2.4 with Scala 2.11/12 branch separately, until the team is ready to move to Spark 2.4.

I havent had a chance to review @koertkuipers branch. Where is it?

koertkuipers · 2019-09-03T02:47:04Z

We are planning to keep Spark 2.3 with Scala 2.11 in our master branch. Additionally, we would maintain the Spark 2.4 with Scala 2.11/12 branch separately, until the team is ready to move to Spark 2.4.

I havent had a chance to review @koertkuipers branch. Where is it?

its here:
https://github.com/tresata-opensource/TransmogrifAI/tree/feat-scala212
the changes are minimal

tovbinm · 2019-09-04T05:17:57Z

@koertkuipers it seems that you had to recompile xgboost with scala 2.12. would they not accept the PR as well so that we could use the official release?

koertkuipers · 2019-09-04T15:03:29Z

see:
dmlc/xgboost#4574
it has been merged but no release yet

wsuchy · 2020-05-01T17:37:37Z

@tovbinm @koertkuipers it looks like we've finally got unblocked: https://search.maven.org/artifact/ml.dmlc/xgboost-jvm_2.12

tovbinm · 2020-05-04T16:51:04Z

Sweet! For a start, we need to start on the cross version build, since I assume you folks are not ready for 2.12 yet.

nicodv · 2020-09-11T21:11:44Z

Baby steps: I've done some work cross-version building in a branch based on @koertkuipers fork: https://github.com/salesforce/TransmogrifAI/tree/ndv/scala212

I'm using https://github.com/ADTRAN/gradle-scala-multiversion-plugin, which seems to work well.

EDIT: this has been resolved ----> Looks like we're still blocked by MLeap (and the XGBoost component of that) for 2.12: combust/mleap#697

nicodv · 2020-09-12T05:02:02Z

An observation: the only version of XGBoost that is based on 2.12 (1.2.0) also requires Spark 3 (https://github.com/dmlc/xgboost/releases). So if we want to keep using XGBoost, there may be no way to separate the Scala 2.12 upgrade from the Spark 3 upgrade.

koertkuipers mentioned this issue Jun 18, 2019

Support Spark 3.0.0 #336

Open

tovbinm added the enhancement label Jun 23, 2019

tovbinm mentioned this issue Jul 10, 2019

Convert lambda functions into concrete classes to allow compatibility with Scala 2.11/2.12 #357

Merged

tovbinm mentioned this issue Sep 4, 2019

Spark 2.4 support #402

Merged

nicodv mentioned this issue Mar 18, 2021

[WIP] Scala 2.12 / Spark 3 upgrade #550

Open

Support Scala 2.12 #332

Support Scala 2.12 #332

Comments

koertkuipers commented Jun 14, 2019

koertkuipers commented Jun 14, 2019

koertkuipers commented Jun 14, 2019

tovbinm commented Jun 14, 2019

tovbinm commented Jun 14, 2019 • edited Loading

koertkuipers commented Jun 16, 2019

koertkuipers commented Jun 16, 2019

koertkuipers commented Jun 17, 2019

koertkuipers commented Jun 21, 2019

tovbinm commented Jun 21, 2019 • edited Loading

koertkuipers commented Jun 21, 2019

tovbinm commented Jun 21, 2019

wsuchy commented Jun 21, 2019

koertkuipers commented Jun 22, 2019

tovbinm commented Jun 23, 2019 • edited Loading

tovbinm commented Jun 23, 2019 • edited Loading

koertkuipers commented Jul 1, 2019

koertkuipers commented Jul 7, 2019

koertkuipers commented Jul 8, 2019 • edited Loading

koertkuipers commented Jul 8, 2019

koertkuipers commented Jul 9, 2019

wsuchy commented Jul 9, 2019

tovbinm commented Jul 9, 2019 • edited Loading

wsuchy commented Jul 9, 2019

koertkuipers commented Jul 9, 2019

wsuchy commented Jul 12, 2019

koertkuipers commented Aug 8, 2019

tovbinm commented Aug 8, 2019

tovbinm commented Aug 24, 2019

koertkuipers commented Aug 25, 2019 via email

tovbinm commented Aug 25, 2019

Helw150 commented Aug 30, 2019 • edited Loading

tovbinm commented Sep 3, 2019

koertkuipers commented Sep 3, 2019

tovbinm commented Sep 4, 2019 • edited Loading

koertkuipers commented Sep 4, 2019

wsuchy commented May 1, 2020

tovbinm commented May 4, 2020

nicodv commented Sep 11, 2020 • edited Loading

nicodv commented Sep 12, 2020 • edited Loading

tovbinm commented Jun 14, 2019 •

edited

Loading

tovbinm commented Jun 21, 2019 •

edited

Loading

tovbinm commented Jun 23, 2019 •

edited

Loading

tovbinm commented Jun 23, 2019 •

edited

Loading

koertkuipers commented Jul 8, 2019 •

edited

Loading

tovbinm commented Jul 9, 2019 •

edited

Loading

Helw150 commented Aug 30, 2019 •

edited

Loading

tovbinm commented Sep 4, 2019 •

edited

Loading

nicodv commented Sep 11, 2020 •

edited

Loading

nicodv commented Sep 12, 2020 •

edited

Loading