|
|
@@ -1 +1,237 @@ |
|
|
[TODO] |
|
|
# Confusing Scala serialization puzzle |
|
|
|
|
|
This writeup describes a tricky Scala serialization issue that we ran into while porting Spark to Scala 2.11. |
|
|
|
|
|
First, let's create a factory that builds anonymous functions: |
|
|
|
|
|
```scala |
|
|
class FunctionFactory extends Serializable { |
|
|
// This method returns a function whose $outer scope is this class. |
|
|
// Note that the function does not reference any variables in this scope, |
|
|
// so the generated class won't contain any fields. |
|
|
def createFunc = (a: Int) => a |
|
|
} |
|
|
``` |
|
|
|
|
|
Let's also create a class that holds functions: |
|
|
|
|
|
```scala |
|
|
class FunctionHolder(val func: Int => Int) extends Serializable { } |
|
|
``` |
|
|
|
|
|
Finally, let's put these pieces together by using the factory to create a function, putting that function into our function holder, then serializing the function holder using Java serialization: |
|
|
|
|
|
```scala |
|
|
val functionFactory = new FunctionFactory() |
|
|
val functionHolder = new FunctionHolder(functionFactory.createFunc) |
|
|
val fs = new FileOutputStream("test.bin") |
|
|
val os = new ObjectOutputStream(fs) |
|
|
os.writeObject(functionHolder) |
|
|
os.close() |
|
|
fs.close() |
|
|
``` |
|
|
|
|
|
Let's read it back: |
|
|
|
|
|
```scala |
|
|
val fs = new FileInputStream("test.bin") |
|
|
val os = new ObjectInputStream(fs) |
|
|
val functionHolder = os.readObject().asInstanceOf[FunctionHolder] |
|
|
``` |
|
|
|
|
|
So far, so good. Here's where things break: imagine that I want to deserialize my function in a remote JVM whose class path doesn't contain `FunctionFactory`. In Scala 2.10.4, I'm able to deserialize `test.bin` even after deleting `FunctionFactory.class`, whereas in Scala 2.11.2 this results in `ClassNotFoundException`. |
|
|
|
|
|
To be more concrete, let's run an actual example and compare the results on 2.10.4 and 2.11.2. In addition to this README, this gist contains a `Test.scala` file and `test.sh` script for running this. The test script is parameterized by different Scala versions and will use `sbt` to download the specified version, so we can test this with any Scala release. |
|
|
|
|
|
### With 2.10.4 |
|
|
|
|
|
``` |
|
|
Testing with Scala 2.10.4 |
|
|
+ rm test.bin |
|
|
+ sbt '++ 2.10.4' clean compile |
|
|
[info] Loading global plugins from /Users/joshrosen/.dotfiles/.sbt/0.13/plugins |
|
|
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/) |
|
|
[info] Setting version to 2.10.4 |
|
|
[info] Reapplying settings... |
|
|
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/) |
|
|
[success] Total time: 0 s, completed Nov 4, 2014 11:00:12 PM |
|
|
[info] Updating {file:/Users/joshrosen/Documents/gists/211/}root-211... |
|
|
[info] Resolving org.fusesource.jansi#jansi;1.4 ... |
|
|
[info] Done updating. |
|
|
[info] Compiling 1 Scala source to /Users/joshrosen/Documents/gists/211/target/scala-2.10/classes... |
|
|
[success] Total time: 2 s, completed Nov 4, 2014 11:00:15 PM |
|
|
+ scala -cp target/scala-2.10/classes/ Main |
|
|
Serializing object to test.bin |
|
|
+ scala -cp target/scala-2.10/classes/ Main read |
|
|
Deserializing object from test.bin |
|
|
+ rm target/scala-2.10/classes/Functionfactory.class |
|
|
+ scala -cp target/scala-2.10/classes/ Main read |
|
|
Deserializing object from test.bin |
|
|
``` |
|
|
|
|
|
### With 2.11.2 |
|
|
|
|
|
``` |
|
|
Testing with Scala 2.11.2 |
|
|
+ rm test.bin |
|
|
+ sbt '++ 2.11.2' clean compile |
|
|
[info] Loading global plugins from /Users/joshrosen/.dotfiles/.sbt/0.13/plugins |
|
|
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/) |
|
|
[info] Setting version to 2.11.2 |
|
|
[info] Reapplying settings... |
|
|
[info] Set current project to root-211 (in build file:/Users/joshrosen/Documents/gists/211/) |
|
|
[success] Total time: 0 s, completed Nov 4, 2014 11:00:54 PM |
|
|
[info] Updating {file:/Users/joshrosen/Documents/gists/211/}root-211... |
|
|
[info] Resolving jline#jline;2.12 ... |
|
|
[info] Done updating. |
|
|
[info] Compiling 1 Scala source to /Users/joshrosen/Documents/gists/211/target/scala-2.11/classes... |
|
|
[success] Total time: 3 s, completed Nov 4, 2014 11:00:57 PM |
|
|
+ scala -cp target/scala-2.11/classes/ Main |
|
|
Serializing object to test.bin |
|
|
+ scala -cp target/scala-2.11/classes/ Main read |
|
|
Deserializing object from test.bin |
|
|
+ rm target/scala-2.11/classes/Functionfactory.class |
|
|
+ scala -cp target/scala-2.11/classes/ Main read |
|
|
Deserializing object from test.bin |
|
|
java.lang.ClassNotFoundException: FunctionFactory |
|
|
at java.net.URLClassLoader$1.run(URLClassLoader.java:366) |
|
|
at java.net.URLClassLoader$1.run(URLClassLoader.java:355) |
|
|
at java.security.AccessController.doPrivileged(Native Method) |
|
|
at java.net.URLClassLoader.findClass(URLClassLoader.java:354) |
|
|
at java.lang.ClassLoader.loadClass(ClassLoader.java:425) |
|
|
at java.lang.ClassLoader.loadClass(ClassLoader.java:358) |
|
|
at java.lang.Class.getDeclaredConstructors0(Native Method) |
|
|
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532) |
|
|
at java.lang.Class.getDeclaredConstructors(Class.java:1901) |
|
|
at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1749) |
|
|
at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72) |
|
|
at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:250) |
|
|
at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:248) |
|
|
at java.security.AccessController.doPrivileged(Native Method) |
|
|
at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:247) |
|
|
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:613) |
|
|
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622) |
|
|
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) |
|
|
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771) |
|
|
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) |
|
|
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) |
|
|
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) |
|
|
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) |
|
|
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) |
|
|
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) |
|
|
at Main$.main(Test.scala:38) |
|
|
at Main.main(Test.scala) |
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) |
|
|
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) |
|
|
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) |
|
|
at java.lang.reflect.Method.invoke(Method.java:606) |
|
|
at scala.reflect.internal.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:68) |
|
|
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31) |
|
|
at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:99) |
|
|
at scala.reflect.internal.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:68) |
|
|
at scala.reflect.internal.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:99) |
|
|
at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:22) |
|
|
at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:39) |
|
|
at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:29) |
|
|
at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:39) |
|
|
at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:65) |
|
|
at scala.tools.nsc.MainGenericRunner.run$1(MainGenericRunner.scala:87) |
|
|
at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:98) |
|
|
at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:103) |
|
|
at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala) |
|
|
``` |
|
|
|
|
|
## Understanding the exception |
|
|
|
|
|
Looking only at the `ClassNotFoundException`, your first instinct might be to see whether `FunctionFactory` objects are being included in the serialized data. |
|
|
|
|
|
If we disassemble the anonymous function, though, we see that it doesn't contain any fields. Here's the class generated by Scala 2.11.2: |
|
|
|
|
|
``` |
|
|
javap -p target/scala-2.11/classes/FunctionFactory\$\$anonfun\$createFunc\$1.class |
|
|
Compiled from "Test.scala" |
|
|
public final class FunctionFactory$$anonfun$createFunc$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable { |
|
|
public final int apply(int); |
|
|
public int apply$mcII$sp(int); |
|
|
public final java.lang.Object apply(java.lang.Object); |
|
|
public FunctionFactory$$anonfun$createFunc$1(FunctionFactory); |
|
|
} |
|
|
``` |
|
|
|
|
|
However, its constructor does reference `FunctionFactory`. This constructor isn't actually called during deserialization, though; it's only called from `createFunc` when creating the anonymous function. |
|
|
|
|
|
Let's compare this to the class generated by Scala 2.10.4: |
|
|
|
|
|
``` |
|
|
javap -p target/scala-2.10/classes/FunctionFactory\$\$anonfun\$createFunc\$1.class |
|
|
Compiled from "Test.scala" |
|
|
public final class FunctionFactory$$anonfun$createFunc$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable { |
|
|
public static final long serialVersionUID; |
|
|
public final int apply(int); |
|
|
public int apply$mcII$sp(int); |
|
|
public final java.lang.Object apply(java.lang.Object); |
|
|
public FunctionFactory$$anonfun$createFunc$1(FunctionFactory); |
|
|
} |
|
|
``` |
|
|
|
|
|
The key difference is that Scala 2.10.4 defined a `serialVersionUID`, whereas 2.11.2 did not. |
|
|
|
|
|
Since our class does not define a `serialVersionUID`, Java [attempts to calculate one](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/io/ObjectStreamClass.java#615) when deserializing our object in order to determine whether the serialized representation is compatible with its version of the function class. When calculating the default SUID, Java [reflectively inspects the class's constructors](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/io/ObjectStreamClass.java#1749). In our case, the anonymous function's constructor has a parameter `$outer` of type `FunctionFactory`, which triggers class loading of the missing `FunctionFactory` class. We can see this from the error stack trace: |
|
|
|
|
|
``` |
|
|
java.lang.ClassNotFoundException: FunctionFactory |
|
|
at java.net.URLClassLoader$1.run(URLClassLoader.java:366) |
|
|
at java.net.URLClassLoader$1.run(URLClassLoader.java:355) |
|
|
at java.security.AccessController.doPrivileged(Native Method) |
|
|
at java.net.URLClassLoader.findClass(URLClassLoader.java:354) |
|
|
at java.lang.ClassLoader.loadClass(ClassLoader.java:425) |
|
|
at java.lang.ClassLoader.loadClass(ClassLoader.java:358) |
|
|
at java.lang.Class.getDeclaredConstructors0(Native Method) |
|
|
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2532) |
|
|
at java.lang.Class.getDeclaredConstructors(Class.java:1901) |
|
|
at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1749) |
|
|
at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72) |
|
|
at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:250) |
|
|
at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:248) |
|
|
at java.security.AccessController.doPrivileged(Native Method) |
|
|
at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:247) |
|
|
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:613) |
|
|
[...] |
|
|
``` |
|
|
|
|
|
## Finding where the Scala behavior changed |
|
|
|
|
|
Our fancy testing script can help to find the specific Scala 2.11 release or milestone that changed this behavior. The test passes on `2.10.0` buts fails on `2.11.1`. There were only a [few changes](https://github.com/scala/scala/compare/v2.11.0...v2.11.1) between those releases, including three commits related to `serialVersionUID`: |
|
|
|
|
|
- [SI-8549 Serialization: fix regression with @SerialVersionUID / start enforcing backwards compatibility](https://github.com/scala/scala/pull/3711) |
|
|
- [SI-8574 Copy @SerialVersionUID, etc, to specialized subclasses](https://github.com/scala/scala/pull/3738) |
|
|
- [Avoid ClassfileAnnotation warning for @SerialVersionUID](https://github.com/scala/scala/pull/4028) |
|
|
|
|
|
My hunch is that the problem was introduced by one of these commits, but I'm unfamiliar with the Scala compiler internals and could be totally wrong. |
|
|
|
|
|
|
|
|
## Appendix: the original Spark reproduction of this issue |
|
|
|
|
|
Here's a rough outline of the original Spark reproduction of this issue: |
|
|
|
|
|
```scala |
|
|
package org.apache.spark |
|
|
|
|
|
import org.scalatest.FunSuite |
|
|
|
|
|
import org.apache.spark.{SparkConf, SparkContext} |
|
|
|
|
|
class ReproSuite extends FunSuite { |
|
|
test("Reproducing Scala 2.11 failure") { |
|
|
val sc = new SparkContext("local-cluster[%d, 1, 512]".format(2), |
|
|
"test", new SparkConf()) |
|
|
sc.parallelize(1 to 10, 10).map(x).collect() |
|
|
} |
|
|
|
|
|
// Simply moving this to the same scope as the test fixes it |
|
|
def x = (k: Int) => k |
|
|
} |
|
|
``` |
|
|
|
|
|
In this test, the `map(x)` function is executed in a separate JVM (launched by Spark's `local-cluster` mode) that contains our test classes (such as `ReproSuite.class`) but not our third-party test dependencies (such as ScalaTest's `FunSuite` class). |
|
|
|