Solved (see solution)

[Bug] JAVA_GATEWAY_EXITED error

I am trying to start a SparkSession in VS Code using PySpark, but I always get the error “Py4JError: JAVA_GATEWAY_EXITED – Java gateway process exited before sending its port number”. I have already checked my installation: I have Java 17 installed (Spark recommends Java 17 or 21), JAVA_HOME is set correctly, Hadoop is installed and on the PATH, and I am using Python 3.13. Everything is configured on the system.

Even so, when I create the SparkSession in VS Code, the error appears. I have already reinstalled Java and reviewed all the environment variables (JAVA_HOME, PATH, Hadoop), but the Java gateway simply does not start when I run the code from VS Code. I need help understanding why Spark cannot start the Java gateway even though all the correct versions are installed.
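For anyone hitting the same thing, a minimal check of what the running kernel actually sees (an illustrative snippet, not part of my original setup steps) is:

import os
import sys
import shutil

print("Python:", sys.version)            # interpreter version the notebook/script is really using
print("Executable:", sys.executable)     # path of that interpreter
print("JAVA_HOME:", os.environ.get("JAVA_HOME"))  # what Spark will read at startup
print("java on PATH:", shutil.which("java"))      # where the java launcher resolves from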

my code:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("Teste")
    .master("local[*]")
    .getOrCreate()
)

print("SparkSession criada com sucesso!")

the error I get:

File c:\Users\Pichau\AppData\Local\Programs\Python\Python313\Lib\site-packages\pyspark\core\context.py:523, in SparkContext.getOrCreate(cls, conf)
521 with SparkContext._lock:
522 if SparkContext._active_spark_context is None:
--> 523 SparkContext(conf=conf or SparkConf())
524 assert SparkContext._active_spark_context is not None
525 return SparkContext._active_spark_context
...
112 errorClass="JAVA_GATEWAY_EXITED",
113 messageParameters={},
114 )

1 answer
solution!

AFTER MORE THAN 4 HOURS I FINALLY GOT IT!!!!
Please, if you can, ADD THIS TO THE COURSE, because there is no YouTube video explaining it, nothing on Reddit or Twitter, and not even ChatGPT could solve it; I had to put all of that together plus Copilot to fix it. In short, what I had to do was:
Download Python 3.11; Spark does not work with 3.13.
Download the Java JDK 17, since JDK 25 does not work.
Then I added everything correctly to PATH and JAVA_HOME. (I also created SPARK_HOME and HADOOP_HOME variables because of the videos I had watched, but I think in the end they were not needed; as far as I understood, they would only matter for running Spark from cmd.)
After that I had to add import os, import sys and then:
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
This forces PySpark to use the Python 3.11 interpreter. I also recommend creating a venv if you have more than one Python version installed.

Since I'm not a professional, I asked Copilot to write a summary for me, because I don't know that much about os and sys.
HERE IS ITS STEP-BY-STEP SUMMARY:
Short answer first — what I changed, and then an exact, copy-paste checklist you can use to create another project + venv that runs PySpark reliably.

What I changed earlier (summary)

  • Added a diagnostics cell to teste.ipynb that:
    • Prints Python version, executable, platform and CWD.
    • Verifies java -version, JAVA_HOME, and PySpark env vars.
    • Sets PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to the current interpreter if unset.
    • Uses a per-project Ivy dir and safer Spark builder options (local[1], spark.driver.host=127.0.0.1).
  • Verified SparkSession created successfully under Python 3.11 and Java 17.

If you want to reproduce this workflow for a new project, follow these steps in PowerShell.

Step-by-step (copy/paste in PowerShell)

  • Create project folder and enter it:

mkdir C:\path\to\NewProject
cd C:\path\to\NewProject

  • Create a venv (use the Python 3.11 executable if you have multiple Pythons):

If python is the 3.11 you want:

python -m venv .venv

If you have multiple Python installations, use the explicit path:

"C:\Path\To\Python311\python.exe" -m venv .venv

  • Activate the venv:

.\.venv\Scripts\Activate.ps1

  • Upgrade pip and install required packages:

pip install --upgrade pip
pip install pyspark notebook ipykernel

  • Register the venv as a Jupyter kernel (so notebooks can use it):

python -m ipykernel install --user --name newproject-py311 --display-name "Python (newproject-py311)"

  • (Optional) Save dependencies:

pip freeze > requirements.txt

  • Ensure Java (JDK) is installed and JAVA_HOME is set (replace path with your JDK):

java -version

Set JAVA_HOME for current user (replace with your JDK path)

[Environment]::SetEnvironmentVariable('JAVA_HOME', 'C:\Program Files\Eclipse Adoptium\jdk-17.0.17.10-hotspot', 'User')

Verify

java -version
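
As an extra sanity check that is not part of Copilot's original list (just an illustrative Python snippet you can run from the notebook later), you can confirm that JAVA_HOME really points at a directory containing the java launcher:

import os
from pathlib import Path

java_home = os.environ.get("JAVA_HOME")
if java_home is None:
    print("JAVA_HOME is not set")
else:
    # On Windows the launcher is java.exe; elsewhere it is just java.
    launcher = Path(java_home) / "bin" / ("java.exe" if os.name == "nt" else "java")
    print("JAVA_HOME:", java_home)
    print("java launcher exists:", launcher.exists())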

  • Start Jupyter and select the new kernel:

jupyter notebook

In the browser: New / select "Python (newproject-py311)"

Minimal diagnostic cell to add at the top of a notebook
(put this in the first code cell; it helps catch mismatched kernels)

import os, sys, shutil, subprocess, tempfile
print("Python:", sys.version)
print("Executable:", sys.executable)
print("CWD:", os.getcwd())
print("java:", shutil.which("java"))
try:
    p = subprocess.run(["java", "-version"], capture_output=True, text=True)
    print("java -version:", p.stderr.strip())
except Exception as e:
    print("java -version failed:", e)
print("JAVA_HOME:", os.environ.get("JAVA_HOME"))

# make sure PySpark uses this interpreter
os.environ.setdefault("PYSPARK_PYTHON", sys.executable)
os.environ.setdefault("PYSPARK_DRIVER_PYTHON", sys.executable)

Example safe Spark startup (put after diagnostics)

from pyspark.sql import SparkSession
import tempfile, os
ivy_dir = os.path.join(tempfile.gettempdir(), "ivy")
os.makedirs(ivy_dir, exist_ok=True)

spark = (
    SparkSession.builder
    .appName("MyProject")
    .master("local[1]")
    .config("spark.driver.host", "127.0.0.1")
    .config("spark.jars.ivy", ivy_dir)
    .config("spark.sql.shuffle.partitions", "1")
    .getOrCreate()
)

print("Spark version:", spark.version)

Quick checklist / tips

  • Always ensure the Jupyter kernel uses the same Python as the venv (sys.executable should be inside .venv; see the sketch after this list).
  • Prefer Python 3.10/3.11 for PySpark (you already used 3.11 and it worked).
  • Keep a small diagnostics cell at the top of new notebooks — it quickly shows mismatches.
  • Start with local[1] to verify things work, then switch to local[*] if you need more cores.
  • If you frequently create projects, copy a small template containing:
    • .venv creation + requirements.txt
    • diagnostics cell
    • spark startup cell
  • If you use Git, add .venv to .gitignore and keep requirements.txt under source control.
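
A minimal sketch of the kernel-vs-venv check mentioned above, assuming the virtual environment lives in a folder named .venv inside the project (adjust the folder name if yours differs):

import sys
from pathlib import Path

print("Interpreter:", sys.executable)
# True when the interpreter was launched from a virtual environment.
print("Running inside a venv:", sys.prefix != sys.base_prefix)
# Assumes the venv folder is called ".venv"; change the name if your layout differs.
print("Interpreter lives under .venv:", ".venv" in Path(sys.executable).parts)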