Description
We hit an out-of-memory issue caused by unstable streaming, which resulted in Spark cancelling the job. The driver kept running with a stopped context. We need to recreate the context in the event that it's stopped unexpectedly.
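A minimal sketch of the kind of recovery loop this implies, assuming a hypothetical `createStreamingContext()` factory, a placeholder app name, and a placeholder socket source (the real pipeline wiring lives in the Fortis codebase); a stopped StreamingContext cannot be restarted, so the driver has to build a fresh one each time:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ResilientDriver {
  // Hypothetical factory standing in for the real Fortis pipeline setup.
  // SparkContext.getOrCreate reuses a live SparkContext, or builds a new one
  // if the previous one was stopped along with the streaming context.
  def createStreamingContext(): StreamingContext = {
    val sc  = SparkContext.getOrCreate(new SparkConf().setAppName("fortis-stream"))
    val ssc = new StreamingContext(sc, Seconds(30))
    // Placeholder source/sink so the context has an output operation.
    ssc.socketTextStream("localhost", 9999).print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    while (true) {
      val ssc = createStreamingContext()
      try {
        ssc.start()
        ssc.awaitTermination() // returns, or rethrows, once the context stops
      } catch {
        case e: Exception =>
          // Log and fall through so we rebuild the context instead of dying.
          System.err.println(s"Streaming context stopped with error: ${e.getMessage}")
      }
      // A StreamingContext cannot be started again after stopping, so loop
      // around and create a fresh one instead of trying to restart it.
    }
  }
}
```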
Adding context.
From Kevin:
I believe JC Jimenez may have handled this back when he added the Stream Change listener stuff. JC, does that sound familiar? The behavior I was seeing was that StreamingContext (and also SparkContext?) had auto-stopped, but the driver didn't terminate.
From JC:
Yeah, I think you are correct, the context would close but it wasn’t possible to restart it. I think I ended up opting to exit with non-zero. However, the spark-submit tool may not have been restarting, despite having the --supervise arg. I would test it to make sure it works. The supervise option didn’t seem to work in single-node land.
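For reference, the exit-with-non-zero approach JC describes boils down to something like the sketch below (SupervisedDriver and runUntilStopped are hypothetical names, not the actual Fortis code); the non-zero exit is what lets an external supervisor treat the driver as failed and relaunch it:

```scala
import org.apache.spark.streaming.StreamingContext

object SupervisedDriver {
  // Hypothetical helper; the real pipeline construction lives elsewhere.
  def runUntilStopped(ssc: StreamingContext): Unit = {
    ssc.start()
    ssc.awaitTermination() // blocks until the context stops, for any reason
    // A stopped context cannot be restarted in-process, so exit non-zero and
    // let the supervisor (e.g. spark-submit --supervise in cluster mode)
    // relaunch the driver.
    sys.exit(1)
  }
}
```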
Looks like the --supervise argument not working, as @jcjimenez observed, may be linked to the fact that it requires the job to run in Spark standalone mode via --deploy-mode cluster, which is incompatible with the --master local[*] setting used in single-node land. Source: StackOverflow + Spark docs
Note that --supervise and --deploy-mode cluster are already being set for Fortis in production by install-spark.sh, so we should be good here.
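For completeness, the flag combination the thread converges on looks roughly like the spark-submit invocation below (the master URL, class name, and jar path are placeholders, not the actual values from install-spark.sh):

```bash
# --supervise restarts the driver on failure, but only in Spark standalone or
# Mesos cluster deploy mode; it does not apply with --master local[*].
spark-submit \
  --master spark://spark-master:7077 \
  --deploy-mode cluster \
  --supervise \
  --class com.example.fortis.Driver \
  /path/to/fortis-assembly.jar
```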
Copied from CatalystCode/project-fortis-spark#98
