When a user application halts and ceases operation, it can signal a variety of underlying issues. In this article, we will discuss the “User application exited with status 1” error in Spark.
“User application exited with status 1” indicates that the user program terminated with exit code 1. In Unix-based operating systems, a zero exit code generally indicates successful completion, while any non-zero exit code indicates a failure.
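This convention can be verified with a short standalone Python sketch (not part of the Spark job itself), which launches two child processes and inspects their exit statuses:

```python
import subprocess
import sys

# A child process that exits cleanly reports status 0 ...
ok = subprocess.run([sys.executable, "-c", "import sys; sys.exit(0)"])
print(ok.returncode)  # prints 0 -> success

# ... while any non-zero status is treated as a failure.
bad = subprocess.run([sys.executable, "-c", "import sys; sys.exit(1)"])
print(bad.returncode)  # prints 1 -> failure
```

Spark's YARN ApplicationMaster applies exactly this rule to the user application's exit status.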
Symptoms
In instances where errors arise during the execution of Spark applications, one might observe:
- The application status is marked “FAILED”.
- The job terminates abruptly with a non-zero exit code, such as “1”, indicating a malfunction.
- The logs record that the Spark job stopped and the application halted.
- The system shuts down the Spark context, which marks the end of the job’s processes.
- You will see the following error message in the Spark console or driver log:
ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark
Cause
This issue is typically caused by a code problem: the application itself, or an external task it launches (such as a shell script or an Impala query), finishes with a non-zero exit code.
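As a hedged illustration of the cause, the sketch below shows a driver that launches an external task and forwards its exit status; the simulated command stands in for a real shell script or query tool:

```python
import subprocess
import sys

def run_external_task(cmd):
    """Run an external task and return its exit status."""
    return subprocess.run(cmd).returncode

# Simulate a failing external task; a real job would run e.g. a shell
# script or an impala-shell query here.
status = run_external_task([sys.executable, "-c", "raise SystemExit(1)"])
print("external task exit status:", status)  # prints 1

# A driver that forwards this failure would now call sys.exit(status),
# which YARN reports as "User application exited with status 1" even
# though the Spark code itself may be fine.
```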
Simulate the error:
Example code to replicate the issue:
- A Python script that uses Spark to calculate an approximation of Pi.
- The script intentionally exits with a status code of 1 to simulate a failure condition.
from __future__ import print_function

import sys
from random import random
from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    """
    Usage: pi [partitions]
    """
    spark = SparkSession\
        .builder\
        .appName("PythonPi")\
        .getOrCreate()

    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    print("Pi is roughly %f" % (4.0 * count / n))

    # Deliberately exit with a non-zero status to reproduce the error;
    # note that spark.stop() below is never reached.
    sys.exit(1)
    spark.stop()
- After computing and printing the estimated value of Pi, the script calls sys.exit(1), deliberately terminating with a non-zero status.
- For a Spark application, any non-zero exit code means application failure.
- The job will fail with the final status ‘User application exited with status 1’.
Steps to run the sample code:
Save the above code as pi.py and run the command below:
spark-submit --master yarn --deploy-mode cluster pi.py 10
22/11/22 12:10:54 INFO yarn.Client: Application report for application_166943433470_0001 (state: RUNNING)
22/11/22 12:10:55 INFO yarn.Client: Application report for application_166943433470_0001 (state: FINISHED)
22/11/22 12:10:55 INFO yarn.Client:
client token: Token { kind: YARN_CLIENT_TOKEN, service: }
diagnostics: User application exited with status 1
ApplicationMaster host:
ApplicationMaster RPC port: 36269
queue: root.users.test_spark
start time: 1669119000924
final status: FAILED
tracking URL: https://:8090/proxy/application_166943433470_0001/
user: test_spark
22/11/22 12:10:55 ERROR yarn.Client: Application diagnostics message: User application exited with status 1
Exception in thread "main" org.apache.spark.SparkException: Application application_1669092136770_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1155)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1603)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:847)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:922)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:931)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/11/22 12:10:55 INFO util.ShutdownHookManager: Shutdown hook called
General Troubleshooting Steps:
- Inspect the AM Container Logs: Look for messages indicating that the “application exited with status 1”. Such messages usually suggest that the application faced an unexpected issue causing it to terminate.
- Correlation with Other Logs: Match the timestamp of failure in the AM container log with logs from other containers. This can help identify if there’s a connection between events leading to the breakdown.
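The correlation step above can be sketched in a few lines of Python. This is only an illustration: the log excerpt is hypothetical, and in practice you would read the file collected with the yarn logs command shown later in this article.

```python
import re

def failure_timestamp(lines):
    """Return the timestamp of the first ERROR line reporting exit status 1."""
    for line in lines:
        m = re.match(r"(\S+ \S+) ERROR .*exited with status 1", line)
        if m:
            return m.group(1)
    return None

# Hypothetical excerpt of an AM container log.
log_lines = [
    "22/11/22 12:10:54 INFO yarn.ApplicationMaster: Waiting for spark context",
    "22/11/22 12:10:55 ERROR yarn.ApplicationMaster: User application exited with status 1",
]

# Match this timestamp against other container logs from the same window.
print("failure at:", failure_timestamp(log_lines))  # prints the ERROR line's timestamp
```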
Example Scenario
In this scenario, imagine you are executing a SQL command from a Spark application over a JDBC connection. If the SQL command fails, it returns an error code to Spark; upon receiving it, Spark marks itself as failed with the status “User application exited with status 1.”
When the subsequent attempt of the AM container also fails, the application is marked as failed.
In this case, the code fails while connecting to an external service (such as Hive or Impala) to read data; it is not an issue with the Spark code itself.
Using this method, we can find which part of the job/code is failing.
Application Master Container Log
1st Attempt (container_02_13746384932_1323_01_000001)
ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark@32dwf242f{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
INFO ui.SparkUI: Stopped Spark web UI at
Executor container log: In the executor container, you will see the exact SQL statement that failed when it was triggered from the Spark job.
ERROR: Failed select * from table test_spark
Application Master Container Log
2nd Attempt (container_02_13746384932_1323_02_000001)
ERROR yarn.ApplicationMaster: User application exited with status 1
INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User application exited with status 1)
INFO spark.SparkContext: Invoking stop() from shutdown hook
INFO server.AbstractConnector: Stopped Spark@d3423423{HTTP/1.1, (http/1.1)}{0.0.0.0:0}
INFO ui.SparkUI: Stopped Spark web UI at
INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
Executor container log:
ERROR: Failed select * from table test_spark
Resolution
When an application terminates prematurely, carefully inspect the code for unintended exit signals. An exit code of 1 in a Spark application signifies a non-successful completion, which Spark interprets as a failure and which warrants further investigation.
Final status: User application exited with status 1
Corrective Actions:
- Change the code so that it returns a zero exit code on successful runs and reserves non-zero exit codes for genuine failures.
- Proactively monitor for any unexpected behavior or exits during application runtime.
- Document any anomalies and the steps taken for resolution.
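A minimal sketch of the corrective pattern follows. The main() body is a hypothetical placeholder for the real Spark logic; the point is that the driver computes its exit code explicitly, exiting 0 on success and non-zero only on genuine failure:

```python
def main():
    """Hypothetical job body; replace with the real Spark computation."""
    # ... run the Spark job here ...
    return True  # report success or failure explicitly

def exit_code(succeeded):
    # Exit 0 only on success; reserve non-zero codes for real failures,
    # so YARN no longer reports "User application exited with status 1"
    # for runs that actually completed.
    return 0 if succeeded else 1

try:
    succeeded = main()
except Exception as exc:
    print("job failed:", exc)
    succeeded = False

print("exit code:", exit_code(succeeded))  # prints 0 for this successful run
# A real driver would end with: sys.exit(exit_code(succeeded))
```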
Additional points:
To collect Spark application logs use the below command
yarn logs -applicationId <application ID> -appOwner <AppOwner>
where:
- <application ID> is the corresponding application ID.
- <AppOwner> is the username of the user who submitted the job.
Happy Learning!!
- Fix – ‘User application exited with status 1’ in Spark - March 21, 2024