
Recreate SparkContext if job is cancelled unexpectedly #27

Description

@c-w

We hit an out-of-memory issue due to unstable streaming, which resulted in Spark cancelling the job. The driver kept running with a stopped context. We need to recreate the context in the event that it's stopped unexpectedly.
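
For illustration, a minimal sketch of what recreating the context in the driver could look like. The object name, app name, batch interval, and pipeline wiring below are placeholders, not the actual Fortis code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ResilientDriver {
  // Placeholder for the real pipeline wiring (DStreams and output operations).
  def attachStreams(ssc: StreamingContext): Unit = { /* build the pipeline here */ }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("fortis")
    while (true) {
      // Creating a new StreamingContext from the conf also creates a new
      // SparkContext, which is legal once the previous one has stopped.
      val ssc = new StreamingContext(conf, Seconds(30))
      attachStreams(ssc)
      ssc.start()
      try ssc.awaitTermination()
      catch { case e: Throwable => println(s"Context stopped: $e") }
      // Reaching this point means the context stopped (e.g. after an
      // OOM-triggered job cancellation); loop around and recreate it
      // instead of leaving the driver idle with a dead context.
    }
  }
}
```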


Adding context.

From Kevin:

I believe JC Jimenez may have handled this back when he added the Stream Change listener stuff. JC, does that sound familiar? The behavior I was seeing was that StreamingContext (and also SparkContext?) had auto-stopped, but the driver didn't terminate.

From JC:

Yeah, I think you are correct, the context would close but it wasn't possible to restart it. I think I ended up opting to exit with non-zero. However, the spark-submit tool may not have been restarting. (Despite having the --supervise arg). I would test it to make sure it works. The supervise option didn't seem to work in single-node land.
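
For reference, the exit-with-non-zero approach JC describes could look roughly like this (a minimal sketch; the object and method names are placeholders, and `ssc` is assumed to be the already-wired StreamingContext):

```scala
import org.apache.spark.streaming.StreamingContext

object SupervisedExit {
  // Run the job and exit non-zero once the context stops for any reason.
  def runUntilDead(ssc: StreamingContext): Unit = {
    ssc.start()
    try ssc.awaitTermination()
    catch { case e: Throwable => println(s"Streaming context died: $e") }
    // The context is stopped either way; exit non-zero so a supervised
    // spark-submit can restart the driver process.
    sys.exit(1)
  }
}
```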


It looks like the --supervise argument not working, as @jcjimenez observed, is likely because the flag only takes effect when the driver is deployed under the Spark standalone cluster manager via --deploy-mode cluster, which is incompatible with the --master local[*] setup used in single-node land. Source: StackOverflow + Spark Docs
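
For concreteness, the distinction looks like this (master host and jar path are placeholders):

```sh
# --supervise only takes effect with the standalone cluster manager
# and the driver deployed into the cluster:
spark-submit \
  --master spark://<master-host>:7077 \
  --deploy-mode cluster \
  --supervise \
  path/to/fortis-assembly.jar

# With --master local[*] there is no cluster manager process that
# could restart the driver, so --supervise has no effect.
```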

Note that --supervise and --deploy-mode cluster are already being set for Fortis in production by install-spark.sh, so we should be good here.


Copied from CatalystCode/project-fortis-spark#98
