pyspark.sql.functions.regexp_instr
- pyspark.sql.functions.regexp_instr(str, regexp, idx=None)
Returns the position of the first substring in str that matches the Java regex regexp and corresponds to the regex group index.
New in version 3.5.0.
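- Parameters
str : Column or column name
target column to work on.
regexp : Column or column name
regex pattern to apply.
idx : Column or int, optional
matched group id.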
- Returns
Column
the position of the first substring in str that matches a Java regex and corresponds to the regex group index.
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("1a 2b 14m", r"\d+(a|b|m)")], ["str", "regexp"])
Example 1: Returns the position of the first substring in the str column (given by name) that matches the regex pattern (\d+(a|b|m)) (one or more digits followed by ‘a’, ‘b’, or ‘m’).
>>> df.select('*', sf.regexp_instr('str', sf.lit(r'\d+(a|b|m)'))).show()
+---------+----------+--------------------------------+
|      str|    regexp|regexp_instr(str, \d+(a|b|m), 0)|
+---------+----------+--------------------------------+
|1a 2b 14m|\d+(a|b|m)|                               1|
+---------+----------+--------------------------------+
Example 2: Returns the position of the first substring in the str column (given by name) that matches the regex pattern (\d+(a|b|m)) (one or more digits followed by ‘a’, ‘b’, or ‘m’), with the group index idx set to 1.
>>> df.select('*', sf.regexp_instr('str', sf.lit(r'\d+(a|b|m)'), sf.lit(1))).show()
+---------+----------+--------------------------------+
|      str|    regexp|regexp_instr(str, \d+(a|b|m), 1)|
+---------+----------+--------------------------------+
|1a 2b 14m|\d+(a|b|m)|                               1|
+---------+----------+--------------------------------+
Example 3: Returns the position of the first substring in the str column (given by name) that matches the regex pattern in the regexp Column.
>>> df.select('*', sf.regexp_instr('str', sf.col("regexp"))).show()
+---------+----------+----------------------------+
|      str|    regexp|regexp_instr(str, regexp, 0)|
+---------+----------+----------------------------+
|1a 2b 14m|\d+(a|b|m)|                           1|
+---------+----------+----------------------------+
Example 4: Returns the position of the first substring in the str Column that matches the regex pattern in the regexp column (given by name).
>>> df.select('*', sf.regexp_instr(sf.col("str"), "regexp")).show()
+---------+----------+----------------------------+
|      str|    regexp|regexp_instr(str, regexp, 0)|
+---------+----------+----------------------------+
|1a 2b 14m|\d+(a|b|m)|                           1|
+---------+----------+----------------------------+
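The position reported in the examples above counts from 1, not from 0. The following plain-Python sketch may help illustrate that convention without a Spark session. It is only an approximation: it uses Python's re module instead of the Java regex engine that Spark uses, the helper name regexp_instr_sketch is hypothetical, and it assumes that a string with no match yields 0. It does not model the idx argument.

```python
import re


def regexp_instr_sketch(s, pattern):
    """Hypothetical helper, not part of PySpark.

    Approximates the 1-based position of the first match of `pattern`
    in `s`, assuming 0 is returned when nothing matches.
    """
    m = re.search(pattern, s)
    # re reports 0-based offsets, so shift by one for a 1-based position.
    return m.start() + 1 if m else 0


# Same input as the DataFrame examples: the match "1a" starts at position 1.
print(regexp_instr_sketch("1a 2b 14m", r"\d+(a|b|m)"))  # 1
# The first match "2b" starts at the third character.
print(regexp_instr_sketch("x 2b", r"\d+(a|b|m)"))  # 3
# No match (assumed convention: 0).
print(regexp_instr_sketch("xyz", r"\d+(a|b|m)"))  # 0
```

Python and Java regex syntax agree for simple patterns such as this one, but they differ in some constructs, so results for more complex patterns should be checked against Spark itself.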