Fix – ‘User application exited with status 1’ in Spark

When a user application terminates unexpectedly, the exit status it returns can point to a variety of underlying issues. In this article, we will discuss the “User application exited with status 1” error in Spark.

“User application exited with status 1” indicates that a user program has terminated with exit code 1. In Unix-based operating systems, a zero exit code conventionally signals successful completion, while any non-zero exit code indicates a failure.
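This exit-code convention can be demonstrated with a small, self-contained sketch (no Spark required): it launches two child Python processes and inspects their return codes, the same way YARN inspects the Spark driver's exit status.

```python
import subprocess
import sys

# Run two tiny child processes and check their exit codes.
# By convention: 0 = success, any non-zero value = failure.
ok = subprocess.run([sys.executable, "-c", "import sys; sys.exit(0)"])
bad = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])

print(ok.returncode)   # 0 -> success
print(bad.returncode)  # 1 -> failure
```

YARN applies exactly this rule to the driver process, which is why any stray `sys.exit(1)` surfaces as an application failure.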

Symptoms

In instances where errors arise during the execution of Spark applications, one might observe:

  • The application status is marked “FAILED”.
  • The application terminates abruptly with a non-zero exit code, such as “1”, indicating a malfunction.
  • Logs record that the Spark job stopped and the application halted.
  • The system shuts down the SparkContext, which marks the end of the job’s processes.
  • You will see the following error message in the Spark console or driver log:

ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark

Cause

This issue is usually caused by a problem in the code: an external task (such as a shell script or an Impala query) completes or fails with a non-zero exit code, and the driver exits with that code.
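The cause above can be sketched as follows. This is a minimal, hypothetical example (the helper name `run_external_step` and the failing command are illustrative, not from the original article): the driver runs an external task, and the task's non-zero exit status becomes the driver's own exit status.

```python
import subprocess
import sys

def run_external_step(cmd):
    """Run an external task (e.g. a shell script or an impala-shell call)
    and surface its exit status instead of silently swallowing it."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print("Step %r failed with exit code %d" % (cmd, result.returncode),
              file=sys.stderr)
    return result.returncode

# A child process that exits with 1, standing in for a failing script/query:
status = run_external_step([sys.executable, "-c", "raise SystemExit(1)"])
# If the driver then calls sys.exit(status), YARN reports
# "User application exited with status 1".
```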

Simulate the ERROR:

Example code to replicate the issue:

  • A Python script that uses Spark to calculate an approximation of Pi.
  • The script intentionally exits with status code 1 to simulate a failure condition.

from __future__ import print_function

import sys
from random import random
from operator import add

from pyspark.sql import SparkSession


if __name__ == "__main__":
    """
        Usage: pi [partitions]
    """
    spark = SparkSession\
        .builder\
        .appName("PythonPi")\
        .getOrCreate()

    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    # Intentional non-zero exit to simulate a failure. Note that this runs
    # before spark.stop(), so the context is torn down by the shutdown hook.
    sys.exit(1)

    spark.stop()  # never reached
  • After computing and printing the estimated value of Pi, the script calls sys.exit(1), deliberately terminating abnormally.
  • For a Spark application, any non-zero exit code means application failure.
  • The job will fail with the final status ‘User application exited with status 1’.

Steps to run the sample code:

Save the above code as pi.py and run the following command:

spark-submit --master yarn --deploy-mode cluster pi.py 10
22/11/22 12:10:54 INFO yarn.Client: Application report for application_166943433470_0001 (state: RUNNING)
22/11/22 12:10:55 INFO yarn.Client: Application report for application_166943433470_0001 (state: FINISHED)
22/11/22 12:10:55 INFO yarn.Client: 
	 client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
	 diagnostics: User application exited with status 1
	 ApplicationMaster host: 
	 ApplicationMaster RPC port: 36269
	 queue: root.users.test_spark
	 start time: 1669119000924
	 final status: FAILED
	 tracking URL: https://:8090/proxy/application_166943433470_0001/
	 user: test_spark
22/11/22 12:10:55 ERROR yarn.Client: Application diagnostics message: User application exited with status 1
Exception in thread "main" org.apache.spark.SparkException: Application application_1669092136770_0001 finished with failed status
	at org.apache.spark.deploy.yarn.Client.run(Client.scala:1155)
	at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1603)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:922)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:931)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/11/22 12:10:55 INFO util.ShutdownHookManager: Shutdown hook called

General Troubleshooting Steps:

  • Inspect the AM Container Logs: Look for messages indicating that the “application exited with status 1”. Such messages usually suggest that the application faced an unexpected issue causing it to terminate.
  • Correlation with Other Logs: Match the timestamp of the failure in the AM container log against logs from other containers. This can help identify whether events in other containers are connected to the failure.
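The correlation step above can be automated with a short sketch. The log lines below are hypothetical samples in the same format as the logs in this article; the five-second window is an assumption, not a Spark-defined value.

```python
from datetime import datetime, timedelta

# Hypothetical log lines from two containers (timestamp format matches
# the sample logs in this article).
am_log = ["22/11/22 12:10:54 ERROR yarn.ApplicationMaster: "
          "User application exited with status 1"]
executor_log = ["22/11/22 12:10:53 ERROR: Failed select * from table test_spark"]

def parse_ts(line):
    # The first 17 characters hold the "yy/mm/dd HH:MM:SS" timestamp.
    return datetime.strptime(line[:17], "%y/%m/%d %H:%M:%S")

failure_ts = parse_ts(am_log[0])

# Events in other containers within a few seconds of the AM failure are
# likely related to the root cause.
related = [line for line in executor_log
           if abs(parse_ts(line) - failure_ts) <= timedelta(seconds=5)]
```

Here the executor error one second before the AM failure would be flagged as a related event worth inspecting.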

Example Scenario

In this scenario, imagine that you are executing a SQL command from a Spark application over a JDBC connection. If the SQL command fails, it returns an error code to Spark. Upon receiving the error code, Spark marks the application as failed with the status “User application exited with status 1.”

When the subsequent attempt of the AM container also fails, the application is marked as a failure.

In this case, the code fails while connecting to an external service (such as Hive or Impala) to read data; the problem is not with the Spark code itself.

Using this method, we can find which part of the job/code is failing.
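The scenario can be sketched as follows. This is a simplified stand-in (the function `run_external_query` is hypothetical and always fails, mimicking the executor log below; there is no real JDBC call here): the driver catches the external failure, logs which step failed, and returns a non-zero code that becomes the application's exit status.

```python
import sys

def run_external_query(sql):
    """Hypothetical stand-in for a JDBC call to Hive/Impala.
    It always fails here, mimicking the executor-container error."""
    raise RuntimeError("Failed " + sql)

def main():
    try:
        run_external_query("select * from table test_spark")
    except RuntimeError as err:
        # Log *which* step failed, so the container logs point at the
        # external service rather than at the Spark code itself.
        print("External query failed: %s" % err, file=sys.stderr)
        return 1  # surfaces as "User application exited with status 1"
    return 0

# In the real driver script: sys.exit(main())
```

Because the driver exits with 1, the AM reports the familiar failure status even though the root cause is in the external service.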

Application Master Container Log

1st Attempt (container_02_13746384932_1323_01_000001)

ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark@32dwf242f{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
INFO ui.SparkUI: Stopped Spark web UI at

Executor Container log: In the executor container, you will see the exact SQL statement that failed when triggered from the Spark job.

ERROR: Failed select * from table test_spark

Application Master Container Log

2nd Attempt (container_02_13746384932_1323_02_000001)

ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark@d3423423{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
INFO ui.SparkUI: Stopped Spark web UI at 
INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors

Executor Container log:

ERROR: Failed select * from table test_spark

Resolution

When an application terminates prematurely, carefully inspect the code for unintended exit calls. An exit code of 1 in a Spark application signifies a non-successful completion, which Spark interprets as a failure and which warrants further investigation.

Final status: User application exited with status 1

Corrective Actions:

  • Change the code so that it does not return a non-zero exit code on successful runs; exit with 0 on success and reserve non-zero codes for genuine failures.
  • Proactively monitor for any unexpected behavior or exits during application runtime.
  • Document any anomalies and the steps taken for resolution.
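The first corrective action can be sketched as a corrected termination pattern. This is a minimal, Spark-free illustration (`compute_pi` stands in for the sampling job in pi.py above): stop the context in a `finally` block and exit non-zero only on a genuine failure.

```python
import sys

def compute_pi():
    """Stand-in for the Pi sampling job above (no Spark required here)."""
    return 3.141592

def main():
    exit_code = 0
    try:
        print("Pi is roughly %f" % compute_pi())
    except Exception as err:
        print("Job failed: %s" % err, file=sys.stderr)
        exit_code = 1  # non-zero only on a genuine failure
    finally:
        pass           # in the real script: spark.stop()
    return exit_code

# In the real driver script: sys.exit(main()) -- exiting 0 on success
# lets YARN report the application as SUCCEEDED.
```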

Additional points:

To collect the Spark application logs, use the following command:

yarn logs -applicationId <application ID> -appOwner <AppOwner>

where:

  • <application ID> is the ID of the corresponding application.
  • <AppOwner> is the user name of the user who submitted the job.

Happy Learning!!

Jerry Richard