pyspark.sql.functions.shuffle

pyspark.sql.functions.shuffle(col, seed=None)

Array function: Generates a random permutation of the given array.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

col (Column or str): The name of the column or expression to be shuffled.
seed (Column or int, optional): Seed value for the random generator. New in version 4.0.0.

Returns

Column: A new column that contains an array of elements in random order.

Notes

The shuffle function is non-deterministic: the order of the output array can differ from one execution to the next, particularly when no seed is supplied (as in Example 4).
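
As a rough illustration of what a seed changes, the sketch below uses Python's standard-library random module rather than Spark. It is only an analogy under that assumption: shuffled_copy is a hypothetical helper, not part of PySpark, and it makes no claim about how Spark generates its permutations. It shows two properties any shuffle should have: a fixed seed reproduces the same permutation, and the result contains exactly the same elements as the input.

```python
import random


def shuffled_copy(values, seed=None):
    """Return a new list holding a random permutation of `values`.

    Hypothetical helper for illustration only; not a PySpark API.
    """
    rng = random.Random(seed)  # seed=None lets Python pick its own seed
    result = list(values)      # copy so the input list is left untouched
    rng.shuffle(result)        # in-place shuffle of the copy
    return result


data = [1, 2, 2, 3, 3, 3]

# Same seed and same input: the permutation is reproducible.
first = shuffled_copy(data, seed=345)
second = shuffled_copy(data, seed=345)
print(first == second)  # True

# A shuffle only reorders: the elements themselves are unchanged.
print(sorted(first) == sorted(data))  # True
```

Without a seed, repeated calls to shuffled_copy(data) may return different orderings, which mirrors the behavior described in the note above.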

Examples

Example 1: Shuffling a simple array

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 20, 3, 5) AS data")
>>> df.select("*", sf.shuffle(df.data, sf.lit(123))).show()
+-------------+-------------+
|         data|shuffle(data)|
+-------------+-------------+
|[1, 20, 3, 5]|[5, 1, 20, 3]|
+-------------+-------------+

Example 2: Shuffling an array with null values

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 20, NULL, 5) AS data")
>>> df.select("*", sf.shuffle(sf.col("data"), 234)).show()
+----------------+----------------+
|            data|   shuffle(data)|
+----------------+----------------+
|[1, 20, NULL, 5]|[NULL, 5, 20, 1]|
+----------------+----------------+

Example 3: Shuffling an array with duplicate values

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 2, 2, 3, 3, 3) AS data")
>>> df.select("*", sf.shuffle("data", 345)).show()
+------------------+------------------+
|              data|     shuffle(data)|
+------------------+------------------+
|[1, 2, 2, 3, 3, 3]|[2, 3, 3, 1, 2, 3]|
+------------------+------------------+

Example 4: Shuffling an array with a random seed (no seed argument; the output varies between runs)

>>> import pyspark.sql.functions as sf
>>> df = spark.sql("SELECT ARRAY(1, 2, 2, 3, 3, 3) AS data")
>>> df.select("*", sf.shuffle("data")).show() 
+------------------+------------------+
|              data|     shuffle(data)|
+------------------+------------------+
|[1, 2, 2, 3, 3, 3]|[3, 3, 2, 3, 2, 1]|
+------------------+------------------+