# spark is from the previous example
sc = spark.sparkContext
# A text dataset is pointed to by path.
# The path can be either a single text file or a directory of text files
path = "examples/src/main/resources/people.txt"
df1 = spark.read.text(path)
df1.show()
# +-----------+
# | value|
# +-----------+
# |Michael, 29|
# | Andy, 30|
# | Justin, 19|
# +-----------+
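# A minimal sketch (not part of the original example): the text source always
# produces a single string column named "value", so parsing it into typed
# columns is up to you. The column names "name" and "age" are illustrative.
from pyspark.sql.functions import split, trim

parsed = df1.select(
    trim(split(df1.value, ",").getItem(0)).alias("name"),
    trim(split(df1.value, ",").getItem(1)).cast("int").alias("age"))
parsed.show()
# +-------+---+
# |   name|age|
# +-------+---+
# |Michael| 29|
# |   Andy| 30|
# | Justin| 19|
# +-------+---+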
# You can use the 'lineSep' option to define the line separator.
# By default, the line separator covers all of `\r`, `\r\n` and `\n`.
df2 = spark.read.text(path, lineSep=",")
df2.show()
# +-----------+
# | value|
# +-----------+
# | Michael|
# | 29\nAndy|
# | 30\nJustin|
# | 19\n|
# +-----------+
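# A brief sketch (assuming Spark 2.4+, where 'lineSep' is also a write option):
# the same option controls the separator written between rows. The output path
# is illustrative.
df2.write.text("output_linesep", lineSep="|")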
# You can also use the 'wholetext' option to read each input file as a single row.
df3 = spark.read.text(path, wholetext=True)
df3.show()
# +--------------------+
# | value|
# +--------------------+
# |Michael, 29\nAndy...|
# +--------------------+
# "output" is a folder which contains multiple text files and a _SUCCESS file.
df1.write.csv("output")
# You can specify the compression format using the 'compression' option.
df1.write.text("output_compressed", compression="gzip")