When running:
data = [('Zeca', '35'), ('Eva', '29')]
colNames = ['Nome', 'Idade']
df = spark.createDataFrame(data, colNames)
df.show()
The following error is displayed:
Traceback (most recent call last):
  File "/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/serializers.py", line 437, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/cloudpickle/cloudpickle_fast.py", line 72, in dumps
    cp.dump(obj)
  File "/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/cloudpickle/cloudpickle_fast.py", line 540, in dump
    return Pickler.dump(self, obj)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/cloudpickle/cloudpickle_fast.py", line 630, in reducer_override
    return self._function_reduce(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/cloudpickle/cloudpickle_fast.py", line 503, in _function_reduce
    return self._dynamic_function_reduce(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/cloudpickle/cloudpickle_fast.py", line 484, in _dynamic_function_reduce
    state = _function_getstate(func)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/cloudpickle/cloudpickle_fast.py", line 156, in _function_getstate
    f_globals_ref = _extract_code_globals(func.__code__)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/cloudpickle/cloudpickle.py", line 236, in _extract_code_globals
    out_names = {names[oparg] for _, oparg in _walk_global_ops(co)}
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/cloudpickle/cloudpickle.py", line 236, in <setcomp>
    out_names = {names[oparg] for _, oparg in _walk_global_ops(co)}
                 ~~~~~^^^^^^^
IndexError: tuple index out of range
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/serializers.py in dumps(self, obj)
    436         try:
--> 437             return cloudpickle.dumps(obj, pickle_protocol)
    438         except pickle.PickleError:

15 frames

IndexError: tuple index out of range

During handling of the above exception, another exception occurred:

PicklingError                             Traceback (most recent call last)
/content/spark-3.1.2-bin-hadoop2.7/python/pyspark/serializers.py in dumps(self, obj)
    445         msg = "Could not serialize object: %s: %s" % (e.__class__.__name__, emsg)
    446         print_exec(sys.stderr)
--> 447         raise pickle.PicklingError(msg)
    448
    449

PicklingError: Could not serialize object: IndexError: tuple index out of range
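The failure happens inside the cloudpickle copy bundled with Spark 3.1.2 while it serializes a function, which commonly points to the runtime's Python being newer than what that Spark release supports. A quick sanity check of what the notebook is actually using (only standard sys/pyspark attributes, nothing Colab-specific):

import sys
import pyspark

print(sys.version)          # Python version of the notebook runtime
print(pyspark.__version__)  # PySpark release that actually got imported
print(pyspark.__file__)     # pip-installed package vs. the unpacked tarball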
My notebook contains the following code:

!pip install pyspark==3.3.1

Install the dependencies:
!apt-get update -qq
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q https://archive.apache.org/dist/spark/spark-3.1.2/spark-3.1.2-bin-hadoop2.7.tgz
!tar xf spark-3.1.2-bin-hadoop2.7.tgz
!pip install -q findspark
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-3.1.2-bin-hadoop2.7"
import findspark
findspark.init()
from pyspark.sql import SparkSession
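Note that findspark.init() prepends the python directory under SPARK_HOME to sys.path, so the unpacked 3.1.2 distribution shadows the pyspark==3.3.1 installed by pip in the first cell. A minimal check of which copy wins the import:

import pyspark

# After findspark.init(), this path is expected to sit under
# /content/spark-3.1.2-bin-hadoop2.7 rather than site-packages.
print(pyspark.__file__)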
spark = SparkSession.builder \
    .master('local[*]') \
    .appName("Iniciando com Spark") \
    .config('spark.ui.port', '4050') \
    .getOrCreate()
!wget -q https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip
get_ipython().system_raw('./ngrok authtoken meutokenaqui')
get_ipython().system_raw('./ngrok http 4050 &')
spark
(From here on I had to change things, because the version shown in the course was not working.)

!pip install pyngrok

from pyngrok import ngrok
!ngrok authtoken "meutoken"
ngrok.connect(4050)
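For reference, with pyngrok the object returned by ngrok.connect carries the tunnel's public address, so the URL that forwards to the Spark UI (port 4050, matching spark.ui.port above) can be printed directly:

from pyngrok import ngrok

tunnel = ngrok.connect(4050)  # opens an HTTP tunnel to localhost:4050
print(tunnel.public_url)      # public address forwarding to the Spark UI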
And finally:
data = [('Zeca', '35'), ('Eva', '29')]
colNames = ['Nome', 'Idade']
df = spark.createDataFrame(data, colNames)
df.show()
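For reference, in an environment where serialization succeeds, df.show() for these two rows prints a table like:

+----+-----+
|Nome|Idade|
+----+-----+
|Zeca|   35|
| Eva|   29|
+----+-----+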