pyspark.sql.functions.shuffle

pyspark.sql.functions.shuffle(col, seed=None)

Array function: Generates a random permutation of the given array.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

The name of the column or expression to be shuffled.

seed : Column or int, optional

Seed value for the random generator.

New in version 4.0.0.

Returns
Column

A new column that contains an array of elements in random order.

Notes

The shuffle function is non-deterministic: when no seed is supplied, the order of the output array can differ from one execution to the next.
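
The role of the seed can be illustrated with a small plain-Python analogy. This is only a sketch built on the standard library's `random` module, not Spark's implementation; the helper name `permute` is hypothetical and exists only for this illustration. It shows the two properties the examples below rely on: a fixed seed makes the permutation repeatable, and a permutation reorders the elements without adding or dropping any.

```python
import random
from typing import List, Optional, TypeVar

T = TypeVar("T")


def permute(values: List[T], seed: Optional[int] = None) -> List[T]:
    """Return a shuffled copy of `values`.

    Hypothetical helper for illustration only; Spark does not use this code.
    """
    # With seed=None the generator is seeded from system entropy,
    # so each call may produce a different order.
    rng = random.Random(seed)
    out = list(values)  # copy, so the input list is left untouched
    rng.shuffle(out)
    return out


data = [1, 20, 3, 5]

# Same seed and same input: both calls yield the same permutation.
print(permute(data, seed=123) == permute(data, seed=123))

# Without a seed the order may vary, but the elements are always the same.
print(sorted(permute(data)) == sorted(data))
```

Both `print` calls output `True`. The specific order produced for a given seed in this sketch has no relation to the order Spark produces for the same seed.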

Examples

Example 1: Shuffling a simple array

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 20, 3, 5) AS data")
>>> df.select("*", sf.shuffle(df.data, sf.lit(123))).show()
+-------------+-------------+
|         data|shuffle(data)|
+-------------+-------------+
|[1, 20, 3, 5]|[5, 1, 20, 3]|
+-------------+-------------+

Example 2: Shuffling an array with null values

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 20, NULL, 5) AS data")
>>> df.select("*", sf.shuffle(sf.col("data"), 234)).show()
+----------------+----------------+
|            data|   shuffle(data)|
+----------------+----------------+
|[1, 20, NULL, 5]|[NULL, 5, 20, 1]|
+----------------+----------------+

Example 3: Shuffling an array with duplicate values

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 2, 2, 3, 3, 3) AS data")
>>> df.select("*", sf.shuffle("data", 345)).show()
+------------------+------------------+
|              data|     shuffle(data)|
+------------------+------------------+
|[1, 2, 2, 3, 3, 3]|[2, 3, 3, 1, 2, 3]|
+------------------+------------------+

Example 4: Shuffling an array without a seed (the output order varies between runs)

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 2, 2, 3, 3, 3) AS data")
>>> df.select("*", sf.shuffle("data")).show()
+------------------+------------------+
|              data|     shuffle(data)|
+------------------+------------------+
|[1, 2, 2, 3, 3, 3]|[3, 3, 2, 3, 2, 1]|
+------------------+------------------+