@hubte1g
Forked from JoshRosen/README.md
Last active August 29, 2015 14:08

Revisions

  1. @JoshRosen JoshRosen revised this gist Nov 5, 2014. 3 changed files with 245 additions and 9 deletions.
    238 changes: 237 additions & 1 deletion README.md
    @@ -1 +1,237 @@
    [TODO]
    # Confusing Scala serialization puzzle

    This writeup describes a tricky Scala serialization issue that we ran into while porting Spark to Scala 2.11.

    First, let's create a factory that builds anonymous functions:

    ```scala
    class FunctionFactory extends Serializable {
      // This method returns a function whose $outer scope is this class.
      // Note that the function does not reference any variables in this scope,
      // so the generated class won't contain any fields.
      def createFunc = (a: Int) => a
    }
    ```

    Let's also create a class that holds functions:

    ```scala
    class FunctionHolder(val func: Int => Int) extends Serializable { }
    ```

    Finally, let's put these pieces together by using the factory to create a function, putting that function into our function holder, then serializing the function holder using Java serialization:

    ```scala
    import java.io.{FileOutputStream, ObjectOutputStream}

    val functionFactory = new FunctionFactory()
    val functionHolder = new FunctionHolder(functionFactory.createFunc)
    val fs = new FileOutputStream("test.bin")
    val os = new ObjectOutputStream(fs)
    os.writeObject(functionHolder)
    os.close()
    fs.close()
    ```

    Let's read it back:

    ```scala
    import java.io.{FileInputStream, ObjectInputStream}

    val fs = new FileInputStream("test.bin")
    val is = new ObjectInputStream(fs)
    val functionHolder = is.readObject().asInstanceOf[FunctionHolder]
    is.close()
    ```

    So far, so good. Here's where things break: imagine that I want to deserialize my function in a remote JVM whose class path doesn't contain `FunctionFactory`. In Scala 2.10.4, I'm able to deserialize `test.bin` even after deleting `FunctionFactory.class`, whereas in Scala 2.11.2 this results in `ClassNotFoundException`.

    To be more concrete, let's run an actual example and compare the results on 2.10.4 and 2.11.2. In addition to this README, this gist contains a `Test.scala` file and `test.sh` script for running this. The test script is parameterized by different Scala versions and will use `sbt` to download the specified version, so we can test this with any Scala release.

    ### With 2.10.4

    ```
    Testing with Scala 2.10.4
    + rm test.bin
    + sbt '++ 2.10.4' clean compile
    [info] Loading global plugins from /Users/joshrosen/.dotfiles/.sbt/0.13/plugins
    [info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
    [info] Setting version to 2.10.4
    [info] Reapplying settings...
    [info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
    [success] Total time: 0 s, completed Nov 4, 2014 11:00:12 PM
    [info] Updating {file:/Users/joshrosen/Documents/gists/211/}root-211...
    [info] Resolving org.fusesource.jansi#jansi;1.4 ...
    [info] Done updating.
    [info] Compiling 1 Scala source to /Users/joshrosen/Documents/gists/211/target/scala-2.10/classes...
    [success] Total time: 2 s, completed Nov 4, 2014 11:00:15 PM
    + scala -cp target/scala-2.10/classes/ Main
    Serializing object to test.bin
    + scala -cp target/scala-2.10/classes/ Main read
    Deserializing object from test.bin
    + rm target/scala-2.10/classes/Functionfactory.class
    + scala -cp target/scala-2.10/classes/ Main read
    Deserializing object from test.bin
    ```

    ### With 2.11.2

    ```
    Testing with Scala 2.11.2
    + rm test.bin
    + sbt '++ 2.11.2' clean compile
    [info] Loading global plugins from /Users/joshrosen/.dotfiles/.sbt/0.13/plugins
    [info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
    [info] Setting version to 2.11.2
    [info] Reapplying settings...
    [info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/)
    [success] Total time: 0 s, completed Nov 4, 2014 11:00:54 PM
    [info] Updating {file:/Users/joshrosen/Documents/gists/211/}root-211...
    [info] Resolving jline#jline;2.12 ...
    [info] Done updating.
    [info] Compiling 1 Scala source to /Users/joshrosen/Documents/gists/211/target/scala-2.11/classes...
    [success] Total time: 3 s, completed Nov 4, 2014 11:00:57 PM
    + scala -cp target/scala-2.11/classes/ Main
    Serializing object to test.bin
    + scala -cp target/scala-2.11/classes/ Main read
    Deserializing object from test.bin
    + rm target/scala-2.11/classes/Functionfactory.class
    + scala -cp target/scala-2.11/classes/ Main read
    Deserializing object from test.bin
    java.lang.ClassNotFoundException: FunctionFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532)
    at java.lang.Class.getDeclaredConstructors(Class.java:1901)
    at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1749)
    at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72)
    at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:250)
    at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:248)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:247)
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:613)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at Main$.main(Test.scala:38)
    at Main.main(Test.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:68)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:99)
    at scala.reflect.internal.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:68)
    at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:99)
    at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:22)
    at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:39)
    at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:29)
    at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:39)
    at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:65)
    at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:87)
    at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:98)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:103)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)
    ```

    ## Understanding the exception

    Looking only at the `ClassNotFoundException`, your first instinct might be to see whether `FunctionFactory` objects are being included in the serialized data.

    If we disassemble the anonymous function, though, we see that it doesn't contain any fields. Here's the class generated by Scala 2.11.2:

    ```
    javap -p target/scala-2.11/classes/FunctionFactory\$\$anonfun\$createFunc\$1.class
    Compiled from "Test.scala"
    public final class FunctionFactory$$anonfun$createFunc$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
      public final int apply(int);
      public int apply$mcII$sp(int);
      public final java.lang.Object apply(java.lang.Object);
      public FunctionFactory$$anonfun$createFunc$1(FunctionFactory);
    }
    ```

    However, its constructor does reference `FunctionFactory`. This constructor isn't actually called during deserialization, though; it's only called from `createFunc` when creating the anonymous function.
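
    To illustrate why the constructor is not involved on the reading side, here is a minimal sketch. `Tracked`, `ConstructionCounter`, and `ConstructorDemo` are hypothetical names that do not appear in `Test.scala`, and the sketch assumes plain Java serialization through in-memory streams rather than a file:

    ```scala
    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

    // Hypothetical helper, not part of Test.scala: counts constructor invocations.
    object ConstructionCounter {
      var count = 0
    }

    // Hypothetical class: its constructor body bumps the counter.
    class Tracked(val label: String) extends Serializable {
      // Runs on `new Tracked(...)`, but not when Java serialization rebuilds the object.
      ConstructionCounter.count += 1
    }

    object ConstructorDemo {
      // Serialize to an in-memory buffer and read the object straight back.
      def roundTrip[T <: AnyRef](value: T): T = {
        val buffer = new ByteArrayOutputStream()
        val out = new ObjectOutputStream(buffer)
        try out.writeObject(value) finally out.close()
        val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
        try in.readObject().asInstanceOf[T] finally in.close()
      }

      def main(args: Array[String]): Unit = {
        val original = new Tracked("a")
        val copy = roundTrip(original)
        println(s"label=${copy.label}, constructor calls=${ConstructionCounter.count}")
      }
    }
    ```

    The counter only increases for the explicit `new Tracked("a")`, not for the deserialized copy, because Java serialization rebuilds a `Serializable` object without running that class's own constructor.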

    Let's compare this to the class generated by Scala 2.10.4:

    ```
    javap -p target/scala-2.10/classes/FunctionFactory\$\$anonfun\$createFunc\$1.class
    Compiled from "Test.scala"
    public final class FunctionFactory$$anonfun$createFunc$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
      public static final long serialVersionUID;
      public final int apply(int);
      public int apply$mcII$sp(int);
      public final java.lang.Object apply(java.lang.Object);
      public FunctionFactory$$anonfun$createFunc$1(FunctionFactory);
    }
    ```

    The key difference is that Scala 2.10.4 defined a `serialVersionUID`, whereas 2.11.2 did not.
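
    The same check can be made from code instead of reading `javap` output. The sketch below is only an illustration: `WithExplicitUid`, `WithoutExplicitUid`, and `SuidInspection` are hypothetical names, and the annotated class stands in for a class file that carries a `serialVersionUID` field like the 2.10.4 closure above:

    ```scala
    import java.io.ObjectStreamClass

    // Hypothetical example classes; neither appears in Test.scala.
    @SerialVersionUID(42L)
    class WithExplicitUid extends Serializable

    class WithoutExplicitUid extends Serializable

    object SuidInspection {
      // True if the class file itself carries a serialVersionUID field,
      // which is what javap showed for the closure class compiled by 2.10.4.
      def declaresSuid(cls: Class[_]): Boolean =
        cls.getDeclaredFields.exists(_.getName == "serialVersionUID")

      // The UID that Java serialization will use: the declared value if there is one,
      // otherwise a default computed reflectively from the class's members.
      def effectiveSuid(cls: Class[_]): Long =
        ObjectStreamClass.lookup(cls).getSerialVersionUID

      def main(args: Array[String]): Unit = {
        for (cls <- Seq[Class[_]](classOf[WithExplicitUid], classOf[WithoutExplicitUid])) {
          println(s"${cls.getName}: declared=${declaresSuid(cls)}, suid=${effectiveSuid(cls)}")
        }
      }
    }
    ```

    Pointing `declaresSuid` at the generated closure class (loaded by name with `Class.forName`) should give the same answer as the two `javap` listings, assuming the class files from the matching Scala version are on the class path.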

    Since our class does not define a `serialVersionUID`, Java [attempts to calculate one](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/io/ObjectStreamClass.java#615) when deserializing our object in order to determine whether the serialized representation is compatible with its version of the function class. When calculating the default SUID, Java [reflectively inspects the class's constructors](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/io/ObjectStreamClass.java#1749). In our case, the anonymous function's constructor has a parameter `$outer` of type `FunctionFactory`, which triggers class loading of the missing `FunctionFactory` class. We can see this from the error stack trace:

    ```
    java.lang.ClassNotFoundException: FunctionFactory
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532)
    at java.lang.Class.getDeclaredConstructors(Class.java:1901)
    at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1749)
    at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72)
    at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:250)
    at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:248)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:247)
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:613)
    [...]
    ```
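
    The reflective step in that trace can be reproduced in isolation. The following sketch uses hypothetical classes `Outer` and `Outer#Inner` as stand-ins for `FunctionFactory` and its closure class, and assumes (as with the closure's `$outer` parameter) that the inner class's constructor takes the enclosing instance as a parameter:

    ```scala
    // Hypothetical stand-ins for FunctionFactory and its generated closure class.
    class Outer {
      class Inner extends Serializable
    }

    object ConstructorTypes {
      // Lists the parameter type names of every declared constructor.
      // Class.getDeclaredConstructors is the call that appears in the stack trace;
      // the JVM has to load each parameter type to build the constructor objects.
      def constructorParamTypes(cls: Class[_]): Seq[Seq[String]] =
        cls.getDeclaredConstructors.toSeq.map(_.getParameterTypes.toSeq.map(_.getName))

      def main(args: Array[String]): Unit =
        constructorParamTypes(classOf[Outer#Inner])
          .foreach(params => println(params.mkString("(", ", ", ")")))
    }
    ```

    If `Outer.class` were deleted before running this, the `getDeclaredConstructors` call is where one would expect the class loading to fail, which matches the frames shown above.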

    ## Finding where the Scala behavior changed

    Our fancy testing script can help to find the specific Scala 2.11 release or milestone that changed this behavior. The test passes on `2.11.0` but fails on `2.11.1`. There were only a [few changes](https://github.com/scala/scala/compare/v2.11.0...v2.11.1) between those releases, including three commits related to `serialVersionUID`:

    - [SI-8549 Serialization: fix regression with @SerialVersionUID / start enforcing backwards compatibility](https://github.com/scala/scala/pull/3711)
    - [SI-8574 Copy @SerialVersionUID, etc, to specialized subclasses](https://github.com/scala/scala/pull/3738)
    - [Avoid ClassfileAnnotation warning for @SerialVersionUID](https://github.com/scala/scala/pull/4028)

    My hunch is that the problem was introduced by one of these commits, but I'm unfamiliar with the Scala compiler internals and could be totally wrong.


    ## Appendix: the original Spark reproduction of this issue

    Here's a rough outline of the original Spark reproduction of this issue:

    ```scala
    package org.apache.spark

    import org.scalatest.FunSuite

    import org.apache.spark.{SparkConf, SparkContext}

    class ReproSuite extends FunSuite {
      test("Reproducing Scala 2.11 failure") {
        val sc = new SparkContext("local-cluster[%d, 1, 512]".format(2),
          "test", new SparkConf())
        sc.parallelize(1 to 10, 10).map(x).collect()
      }

      // Simply moving this to the same scope as the test fixes it
      def x = (k: Int) => k
    }
    ```

    In this test, the `map(x)` function is executed in a separate JVM (launched by Spark's `local-cluster` mode) that contains our test classes (such as `ReproSuite.class`) but not our third-party test dependencies (such as ScalaTest's `FunSuite` class).

    14 changes: 7 additions & 7 deletions Test.scala
    @@ -4,28 +4,28 @@ import java.io.FileOutputStream
    import java.io.FileInputStream


    class OuterScopeThatDefinesFunction extends Serializable {
    class FunctionFactory extends Serializable {
      // This method returns a function whose $outer scope is this class.
      // Note that the function does not reference any variables in this scope,
      // so the generated class won't contain any fields.
      def func = (a: Int) => a
      def createFunc = (a: Int) => a
    }

    // This class is a simple wrapper that holds a function:
    class UsesFunction(val func: Int => Int) extends Serializable { }
    class FunctionHolder(val func: Int => Int) extends Serializable { }

    object Main {
      def main(args: Array[String]) {
        if (args.size == 0) {
          println("Serializing object to test.bin")
          // Define a function in an outer scope
          val functionDefinitionScope = new OuterScopeThatDefinesFunction()
          val functionFactory = new FunctionFactory()
          // Put that function into an object
          val c = new UsesFunction(functionDefinitionScope.func)
          val functionHolder = new FunctionHolder(functionFactory.createFunc)
          // Serialize that object
          val fs = new FileOutputStream("test.bin")
          val os = new ObjectOutputStream(fs)
          os.writeObject(c)
          os.writeObject(functionHolder)
          os.close()
          fs.close()
        }
    @@ -35,7 +35,7 @@ object Main {
          // Read the object serialized in the previous step
          val fs = new FileInputStream("test.bin")
          val os = new ObjectInputStream(fs)
          val c = os.readObject().asInstanceOf[UsesFunction]
          val functionHolder = os.readObject().asInstanceOf[FunctionHolder]
        }
      }
    }
    2 changes: 1 addition & 1 deletion test.sh
    @@ -14,5 +14,5 @@ rm test.bin
    sbt "++ $SCALA_VERSION" clean compile
    scala -cp target/scala-*/classes/ Main
    scala -cp target/scala-*/classes/ Main read
    rm target/scala-*/classes/OuterScopeThatDefinesFunction.class
    rm target/scala-*/classes/FunctionFactory.class
    scala -cp target/scala-*/classes/ Main read
  2. @JoshRosen JoshRosen revised this gist Nov 5, 2014. 2 changed files with 21 additions and 15 deletions.
    28 changes: 17 additions & 11 deletions Test.scala
    @@ -1,23 +1,28 @@
    import java.io.ObjectOutputStream
    import java.io.ObjectOutputStream
    import java.io.ObjectInputStream
    import java.io.FileOutputStream
    import java.io.FileInputStream

    class A {}

    class B extends A with Serializable {
      def f = (a: Int) => a
    class OuterScopeThatDefinesFunction extends Serializable {
      // This method returns a function whose $outer scope is this class.
      // Note that the function does not reference any variables in this scope,
      // so the generated class won't contain any fields.
      def func = (a: Int) => a
    }

    class C(val func: Int => Int) extends Serializable { }
    // This class is a simple wrapper that holds a function:
    class UsesFunction(val func: Int => Int) extends Serializable { }

    object D {
    object Main {
      def main(args: Array[String]) {
        if (args.size == 0) {
          println("Writing object to test.bin")
          val b = new B()
          val c = new C(b.f)
          println("Serializing object to test.bin")
          // Define a function in an outer scope
          val functionDefinitionScope = new OuterScopeThatDefinesFunction()
          // Put that function into an object
          val c = new UsesFunction(functionDefinitionScope.func)
          // Serialize that object
          val fs = new FileOutputStream("test.bin")
          val os = new ObjectOutputStream(fs)
          os.writeObject(c)
    @@ -26,10 +31,11 @@ object D {
        }

        else {
          println("Reading object from test.bin")
          println("Deserializing object from test.bin")
          // Read the object serialized in the previous step
          val fs = new FileInputStream("test.bin")
          val os = new ObjectInputStream(fs)
          val c = os.readObject().asInstanceOf[C]
          val c = os.readObject().asInstanceOf[UsesFunction]
        }
      }
    }
    8 changes: 4 additions & 4 deletions test.sh
    100644 → 100755
    @@ -12,7 +12,7 @@ set -x

    rm test.bin
    sbt "++ $SCALA_VERSION" clean compile
    scala -cp target/scala-*/classes/ D
    scala -cp target/scala-*/classes/ D read
    rm target/scala-*/classes/A.class
    scala -cp target/scala-*/classes/ D read
    scala -cp target/scala-*/classes/ Main
    scala -cp target/scala-*/classes/ Main read
    rm target/scala-*/classes/OuterScopeThatDefinesFunction.class
    scala -cp target/scala-*/classes/ Main read
  3. @JoshRosen JoshRosen created this gist Nov 5, 2014.
    1 change: 1 addition & 0 deletions README.md
    @@ -0,0 +1 @@
    [TODO]
    35 changes: 35 additions & 0 deletions Test.scala
    @@ -0,0 +1,35 @@
    import java.io.ObjectOutputStream
    import java.io.ObjectOutputStream
    import java.io.ObjectInputStream
    import java.io.FileOutputStream
    import java.io.FileInputStream

    class A {}

    class B extends A with Serializable {
      def f = (a: Int) => a
    }

    class C(val func: Int => Int) extends Serializable { }

    object D {
      def main(args: Array[String]) {
        if (args.size == 0) {
          println("Writing object to test.bin")
          val b = new B()
          val c = new C(b.f)
          val fs = new FileOutputStream("test.bin")
          val os = new ObjectOutputStream(fs)
          os.writeObject(c)
          os.close()
          fs.close()
        }

        else {
          println("Reading object from test.bin")
          val fs = new FileInputStream("test.bin")
          val os = new ObjectInputStream(fs)
          val c = os.readObject().asInstanceOf[C]
        }
      }
    }
    18 changes: 18 additions & 0 deletions test.sh
    @@ -0,0 +1,18 @@
    #!/usr/bin/env bash

    if [ -z "$1" ]; then
      echo "Usage: $0 scalaVersion"
      exit 1
    fi

    SCALA_VERSION=$1
    echo "Testing with Scala $1"

    set -x

    rm test.bin
    sbt "++ $SCALA_VERSION" clean compile
    scala -cp target/scala-*/classes/ D
    scala -cp target/scala-*/classes/ D read
    rm target/scala-*/classes/A.class
    scala -cp target/scala-*/classes/ D read