[3-spark-fundamentals] Tests failing to initialise a Spark session #251

Open
alexgawrilow opened this issue Jan 15, 2025 · 1 comment

alexgawrilow commented Jan 15, 2025

When running pytest or python -m pytest, I get the following error:

pyspark.errors.exceptions.base.PySparkRuntimeError: [JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number.

For the Spark part, I am using the Docker setup, where I run docker compose up. For the Python part, I created a virtual environment and installed the requirements there.
I also tried to set the PYSPARK_SUBMIT_ARGS environment variable as suggested here.
Am I missing something in the setup, given that Spark runs through Docker while Python runs in a local virtual environment?

The first part with running Spark in a Jupyter Notebook works fine.

@peterbonnesoeur

This error typically occurs when the PySpark runtime cannot establish communication with the Java Spark backend.

For this exercise, Spark needs to be set up locally (as stated in the note in bootcamp/materials/3-spark-fundamentals/README.md).

To do this, first install Java on your machine (this is the part that is most likely failing), and then reinstall pyspark (it is listed in requirements.txt).
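As a quick way to check the Java part before re-running the tests, here is a minimal sketch that looks for a Java executable roughly the way Spark's launcher scripts do (JAVA_HOME first, then the PATH). The helper names find_java and describe_java are hypothetical and not part of the bootcamp repository or of PySpark, and the fallback to PATH when JAVA_HOME points at a missing directory is a simplification.

```python
import os
import shutil
import subprocess


def find_java(env=None, which=shutil.which):
    """Return a path to a java executable, or None if none is found.

    Hypothetical helper: checks JAVA_HOME/bin/java first, then falls back
    to searching the PATH. `env` and `which` are injectable for testing.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        # Try the Unix name first, then the Windows executable name.
        for name in ("java", "java.exe"):
            candidate = os.path.join(java_home, "bin", name)
            if os.path.isfile(candidate):
                return candidate
    # No usable JAVA_HOME: look for `java` on the PATH instead.
    return which("java")


def describe_java(java_path):
    """Return the `java -version` output, or a hint if Java is missing."""
    if java_path is None:
        return "No Java found: install a JDK, then reinstall pyspark."
    # `java -version` writes its report to stderr on most JDKs.
    result = subprocess.run(
        [java_path, "-version"], capture_output=True, text=True
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    print(describe_java(find_java()))
```

If this prints the "No Java found" hint inside the same virtual environment that runs pytest, the JAVA_GATEWAY_EXITED error is most likely caused by the missing local Java installation rather than by the Docker setup.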

Let me know if that helped.
