Description
We hit an out-of-memory issue caused by unstable streaming, which resulted in Spark cancelling the job. The driver kept running with a stopped context. We need to recreate the context in the event that it's stopped unexpectedly.
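A minimal sketch of the kind of recovery loop this implies, assuming a hypothetical `createStreamingContext()` factory, a placeholder app name, and a placeholder socket source (the real pipeline wiring lives in the Fortis codebase); a stopped StreamingContext cannot be restarted, so the driver has to build a fresh one each time:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ResilientDriver {
  // Hypothetical factory standing in for the real Fortis pipeline setup.
  // SparkContext.getOrCreate reuses a live SparkContext, or builds a new one
  // if the previous one was stopped along with the streaming context.
  def createStreamingContext(): StreamingContext = {
    val sc  = SparkContext.getOrCreate(new SparkConf().setAppName("fortis-stream"))
    val ssc = new StreamingContext(sc, Seconds(30))
    // Placeholder source/sink so the context has an output operation.
    ssc.socketTextStream("localhost", 9999).print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    while (true) {
      val ssc = createStreamingContext()
      try {
        ssc.start()
        ssc.awaitTermination() // returns, or rethrows, once the context stops
      } catch {
        case e: Exception =>
          // Log and fall through so we rebuild the context instead of dying.
          System.err.println(s"Streaming context stopped with error: ${e.getMessage}")
      }
      // A StreamingContext cannot be started again after stopping, so loop
      // around and create a fresh one instead of trying to restart it.
    }
  }
}
```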
Adding context.
From Kevin:
I believe JC Jimenez may have handled this back when he added the Stream Change listener stuff. JC, does that sound familiar? The behavior I was seeing was that StreamingContext (and also SparkContext?) had auto-stopped, but the driver didn't terminate.
From JC:
Yeah, I think you are correct, the context would close but it wasn’t possible to restart it. I think I ended up opting to exit with non-zero. However, the spark-submit tool may not have been restarting, despite having the --supervise arg. I would test it to make sure it works. The supervise option didn’t seem to work in single-node land.
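For reference, the exit-with-non-zero approach JC describes boils down to something like the sketch below (SupervisedDriver and runUntilStopped are hypothetical names, not the actual Fortis code); the non-zero exit is what lets an external supervisor treat the driver as failed and relaunch it:

```scala
import org.apache.spark.streaming.StreamingContext

object SupervisedDriver {
  // Hypothetical helper; the real pipeline construction lives elsewhere.
  def runUntilStopped(ssc: StreamingContext): Unit = {
    ssc.start()
    ssc.awaitTermination() // blocks until the context stops, for any reason
    // A stopped context cannot be restarted in-process, so exit non-zero and
    // let the supervisor (e.g. spark-submit --supervise in cluster mode)
    // relaunch the driver.
    sys.exit(1)
  }
}
```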
Looks like the --supervise argument not working, as @jcjimenez observed, may be linked to the fact that it requires the job to run in Spark standalone mode via --deploy-mode cluster, which is incompatible with the --master local[*] setting used in single-node land. Source: StackOverflow + Spark docs
Note that --supervise and --deploy-mode cluster are already being set for Fortis in production by install-spark.sh, so we should be good here.
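For completeness, the flag combination the thread converges on looks roughly like the spark-submit invocation below (the master URL, class name, and jar path are placeholders, not the actual values from install-spark.sh):

```bash
# --supervise restarts the driver on failure, but only in Spark standalone or
# Mesos cluster deploy mode; it does not apply with --master local[*].
spark-submit \
  --master spark://spark-master:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.fortis.Driver \
  /path/to/fortis-assembly.jar
```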
Copied from CatalystCode/project-fortis-spark#98
