Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add takestring! and make String(::Memory) not truncate #54372

Open
wants to merge 35 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 33 commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
bd1677e
Add take_string!
jakobnissen May 6, 2024
71a030f
Use take_string! on StringMemory uses in Base
jakobnissen May 6, 2024
d247d92
Rename to takestring!
jakobnissen May 10, 2024
699916e
Use more efficient fallback for String(::Memory) as suggested
jakobnissen May 10, 2024
1ab1785
Update docstring
jakobnissen May 10, 2024
50d62c8
Implement takestring(::IOBuffer) and unsafe_takestring!
jakobnissen May 10, 2024
9e14b92
Use (unsafe_)takestring! throughout Base
jakobnissen May 10, 2024
55f8d23
Fixup String(::Vector) impl
jakobnissen May 10, 2024
46d0357
Replace uses of String(_unsafe_take!(x))
jakobnissen May 10, 2024
2f36912
Fixup IOBuffer
jakobnissen May 10, 2024
d7174bf
Try fixing takestring of IOBuffer
jakobnissen May 10, 2024
94485e6
Fix IOBuffer again
jakobnissen May 10, 2024
a28c45d
Fix string from vector
jakobnissen May 10, 2024
ba78f83
Imports
jakobnissen May 10, 2024
ac5753b
Remove takestring!(::Memory)
jakobnissen May 10, 2024
4f838cd
Fold changes from #54438 into this PR
jakobnissen May 10, 2024
1ef6b59
Update NEWS.md
jakobnissen May 10, 2024
f3046c0
Define internal unsafe_takestring
jakobnissen May 12, 2024
1047577
Fix imports
jakobnissen May 12, 2024
513038e
More uses of takestring
jakobnissen May 12, 2024
ec6e128
Add imports
jakobnissen May 13, 2024
1d67150
More uses
jakobnissen May 13, 2024
1a498f4
More imports
jakobnissen May 13, 2024
7c7ce9c
Fix typo
jakobnissen May 13, 2024
0b9aff5
Imports
jakobnissen May 13, 2024
8fc3cfd
Don't reset non-writable streams
jakobnissen May 13, 2024
9948ec5
Do not copy string when possible
jakobnissen May 18, 2024
176506c
Add 1.12 compat
jakobnissen May 27, 2024
1689258
Try using unsafe_takestring internally in takestring
jakobnissen Jul 23, 2024
998c9f6
Check for init in takestring
jakobnissen Jul 23, 2024
c9bfe09
Address reviewer comments
jakobnissen Jul 24, 2024
67b4d9a
Use unsafe version of takestring more rarely
jakobnissen Jul 31, 2024
d7ff276
Fix error when rebasing
jakobnissen Sep 9, 2024
462fcae
Address some reviewer comments
jakobnissen Sep 10, 2024
35fe11c
Address review: Do not unsafe_take memory the user may hold externally
jakobnissen Oct 4, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,7 @@ New library features
* `RegexMatch` objects can now be used to construct `NamedTuple`s and `Dict`s ([#50988])
* `Lockable` is now exported ([#54595])
* New `ltruncate`, `rtruncate` and `ctruncate` functions for truncating strings to text width, accounting for char widths ([#55351])
* `takestring!(v)` creates a `String` from `v` (a `Vector{UInt8}` or `IOBuffer`), truncating `v` and reusing its memory if possible )([#54372]).

Standard library changes
------------------------
Expand Down
2 changes: 1 addition & 1 deletion base/compiler/ssair/show.jl
Original file line number Diff line number Diff line change
Expand Up @@ -312,7 +312,7 @@ function compute_ir_line_annotations(code::IRCode)
loc_method = string(" "^printing_depth, loc_method)
last_stack = stack
end
push!(loc_annotations, String(take!(buf)))
push!(loc_annotations, takestring!(buf))
push!(loc_lineno, (lineno != 0 && lineno != last_lineno) ? string(lineno) : "")
push!(loc_methods, loc_method)
(lineno != 0) && (last_lineno = lineno)
Expand Down
4 changes: 2 additions & 2 deletions base/errorshow.jl
Original file line number Diff line number Diff line change
Expand Up @@ -305,7 +305,7 @@ function showerror(io::IO, ex::MethodError)
iob = IOContext(buf, io) # for type abbreviation as in #49795; some, like `convert(T, x)`, should not abbreviate
show_signature_function(iob, Core.Typeof(f))
show_tuple_as_call(iob, :function, arg_types; hasfirst=false, kwargs = isempty(kwargs) ? nothing : kwargs)
str = String(take!(buf))
str = takestring!(buf)
str = type_limited_string_from_context(io, str)
print(io, str)
end
Expand Down Expand Up @@ -596,7 +596,7 @@ function show_method_candidates(io::IO, ex::MethodError, kwargs=[])
m = parentmodule_before_main(method)
modulecolor = get!(() -> popfirst!(STACKTRACE_MODULECOLORS), STACKTRACE_FIXEDCOLORS, m)
print_module_path_file(iob, m, string(file), line; modulecolor, digit_align_width = 3)
push!(lines, String(take!(buf)))
push!(lines, takestring!(buf))
push!(line_score, -(right_matches * 2 + (length(arg_types_param) < 2 ? 1 : 0)))
end
end
Expand Down
1 change: 1 addition & 0 deletions base/exports.jl
Original file line number Diff line number Diff line change
Expand Up @@ -638,6 +638,7 @@ export
split,
string,
strip,
takestring!,
textwidth,
thisind,
titlecase,
Expand Down
2 changes: 1 addition & 1 deletion base/filesystem.jl
Original file line number Diff line number Diff line change
Expand Up @@ -141,7 +141,7 @@ import .Base:
bytesavailable, position, read, read!, readavailable, seek, seekend, show,
skip, stat, unsafe_read, unsafe_write, write, transcode, uv_error,
setup_stdio, rawhandle, OS_HANDLE, INVALID_OS_HANDLE, windowserror, filesize,
isexecutable, isreadable, iswritable, MutableDenseArrayType
isexecutable, isreadable, iswritable, MutableDenseArrayType, unsafe_takestring!

import .Base.RefValue

Expand Down
4 changes: 2 additions & 2 deletions base/gmp.jl
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ import .Base: *, +, -, /, <, <<, >>, >>>, <=, ==, >, >=, ^, (~), (&), (|), xor,
bin, oct, dec, hex, isequal, invmod, _prevpow2, _nextpow2, ndigits0zpb,
widen, signed, unsafe_trunc, trunc, iszero, isone, big, flipsign, signbit,
sign, hastypemax, isodd, iseven, digits!, hash, hash_integer, top_set_bit,
clamp
clamp, unsafe_takestring

if Clong == Int32
const ClongMax = Union{Int8, Int16, Int32}
Expand Down Expand Up @@ -761,7 +761,7 @@ function string(n::BigInt; base::Integer = 10, pad::Integer = 1)
sv[i] = '0' % UInt8
end
isneg(n) && (sv[1] = '-' % UInt8)
String(sv)
unsafe_takestring(sv)
end

function digits!(a::AbstractVector{T}, n::BigInt; base::Integer = 10) where {T<:Integer}
Expand Down
2 changes: 1 addition & 1 deletion base/indices.jl
Original file line number Diff line number Diff line change
Expand Up @@ -132,7 +132,7 @@ function throw_promote_shape_mismatch(a::Tuple, b::Union{Nothing,Tuple}, i = not
if i ≢ nothing
print(msg, ", mismatch at dim ", i)
end
throw(DimensionMismatch(String(take!(msg))))
throw(DimensionMismatch(takestring!(msg)))
end

function promote_shape(a::Tuple{Int,}, b::Tuple{Int,})
Expand Down
2 changes: 1 addition & 1 deletion base/int.jl
Original file line number Diff line number Diff line change
Expand Up @@ -718,7 +718,7 @@ macro big_str(s)
is_prev_dot = (c == '.')
end
print(bf, s[end])
s = String(take!(bf))
s = unsafe_takestring!(bf)
end
n = tryparse(BigInt, s)
n === nothing || return n
Expand Down
12 changes: 6 additions & 6 deletions base/intfuncs.jl
Original file line number Diff line number Diff line change
Expand Up @@ -766,7 +766,7 @@ function bin(x::Unsigned, pad::Int, neg::Bool)
i -= 1
end
neg && (@inbounds a[1] = 0x2d) # UInt8('-')
String(a)
unsafe_takestring(a)
end

function oct(x::Unsigned, pad::Int, neg::Bool)
Expand All @@ -780,7 +780,7 @@ function oct(x::Unsigned, pad::Int, neg::Bool)
i -= 1
end
neg && (@inbounds a[1] = 0x2d) # UInt8('-')
String(a)
unsafe_takestring(a)
end

# 2-digit decimal characters ("00":"99")
Expand Down Expand Up @@ -850,7 +850,7 @@ function dec(x::Unsigned, pad::Int, neg::Bool)
a = StringMemory(n)
append_c_digits_fast(n, x, a, 1)
neg && (@inbounds a[1] = 0x2d) # UInt8('-')
String(a)
unsafe_takestring(a)
end

function hex(x::Unsigned, pad::Int, neg::Bool)
Expand All @@ -871,7 +871,7 @@ function hex(x::Unsigned, pad::Int, neg::Bool)
@inbounds a[i] = d + ifelse(d > 0x9, 0x57, 0x30)
end
neg && (@inbounds a[1] = 0x2d) # UInt8('-')
String(a)
unsafe_takestring(a)
end

const base36digits = UInt8['0':'9';'a':'z']
Expand All @@ -896,7 +896,7 @@ function _base(base::Integer, x::Integer, pad::Int, neg::Bool)
i -= 1
end
neg && (@inbounds a[1] = 0x2d) # UInt8('-')
String(a)
unsafe_takestring(a)
end

split_sign(n::Integer) = unsigned(abs(n)), n < 0
Expand Down Expand Up @@ -972,7 +972,7 @@ function bitstring(x::T) where {T}
x = lshr_int(x, 4)
i -= 4
end
return String(str)
return unsafe_takestring(str)
end

"""
Expand Down
23 changes: 11 additions & 12 deletions base/io.jl
Original file line number Diff line number Diff line change
Expand Up @@ -277,13 +277,13 @@ julia> io = IOBuffer();
julia> write(io, "JuliaLang is a GitHub organization.", " It has many members.")
56

julia> String(take!(io))
julia> takestring!(io)
"JuliaLang is a GitHub organization. It has many members."

julia> write(io, "Sometimes those members") + write(io, " write documentation.")
44

julia> String(take!(io))
julia> takestring!(io)
"Sometimes those members write documentation."
```
User-defined plain-data types without `write` methods can be written when wrapped in a `Ref`:
Expand Down Expand Up @@ -543,8 +543,8 @@ julia> rm("my_file.txt")
```
"""
readuntil(filename::AbstractString, delim; kw...) = open(io->readuntil(io, delim; kw...), convert(String, filename)::String)
readuntil(stream::IO, delim::UInt8; kw...) = _unsafe_take!(copyuntil(IOBuffer(sizehint=16), stream, delim; kw...))
readuntil(stream::IO, delim::Union{AbstractChar, AbstractString}; kw...) = String(_unsafe_take!(copyuntil(IOBuffer(sizehint=16), stream, delim; kw...)))
readuntil(stream::IO, delim::UInt8; kw...) = takestring!(copyuntil(IOBuffer(sizehint=16), stream, delim; kw...))
jakobnissen marked this conversation as resolved.
Show resolved Hide resolved
readuntil(stream::IO, delim::Union{AbstractChar, AbstractString}; kw...) = takestring!(copyuntil(IOBuffer(sizehint=16), stream, delim; kw...))
readuntil(stream::IO, delim::T; keep::Bool=false) where T = _copyuntil(Vector{T}(), stream, delim, keep)


Expand All @@ -566,10 +566,10 @@ Similar to [`readuntil`](@ref), which returns a `String`; in contrast,
```jldoctest
julia> write("my_file.txt", "JuliaLang is a GitHub organization.\\nIt has many members.\\n");

julia> String(take!(copyuntil(IOBuffer(), "my_file.txt", 'L')))
julia> takestring!(copyuntil(IOBuffer(), "my_file.txt", 'L'))
"Julia"

julia> String(take!(copyuntil(IOBuffer(), "my_file.txt", '.', keep = true)))
julia> takestring!(copyuntil(IOBuffer(), "my_file.txt", '.', keep = true))
"JuliaLang is a GitHub organization."

julia> rm("my_file.txt")
Expand Down Expand Up @@ -616,8 +616,7 @@ Logan
"""
readline(filename::AbstractString; keep::Bool=false) =
open(io -> readline(io; keep), filename)
readline(s::IO=stdin; keep::Bool=false) =
String(_unsafe_take!(copyline(IOBuffer(sizehint=16), s; keep)))
readline(s::IO=stdin; keep::Bool=false) = takestring!(copyline(IOBuffer(sizehint=16), s; keep))

"""
copyline(out::IO, io::IO=stdin; keep::Bool=false)
Expand All @@ -642,10 +641,10 @@ See also [`copyuntil`](@ref) for reading until more general delimiters.
```jldoctest
julia> write("my_file.txt", "JuliaLang is a GitHub organization.\\nIt has many members.\\n");

julia> String(take!(copyline(IOBuffer(), "my_file.txt")))
julia> takestring!(copyline(IOBuffer(), "my_file.txt"))
"JuliaLang is a GitHub organization."

julia> String(take!(copyline(IOBuffer(), "my_file.txt", keep=true)))
julia> takestring!(copyline(IOBuffer(), "my_file.txt", keep=true))
"JuliaLang is a GitHub organization.\\n"

julia> rm("my_file.txt")
Expand Down Expand Up @@ -1291,7 +1290,7 @@ function iterate(r::Iterators.Reverse{<:EachLine}, state)
buf.size = _stripnewline(r.itr.keep, buf.size, buf.data)
empty!(chunks) # will cause next iteration to terminate
seekend(r.itr.stream) # reposition to end of stream for isdone
s = String(_unsafe_take!(buf))
s = unsafe_takestring!(buf)
else
# extract the string from chunks[ichunk][inewline+1] to chunks[jchunk][jnewline]
if ichunk == jchunk # common case: current and previous newline in same chunk
Expand All @@ -1308,7 +1307,7 @@ function iterate(r::Iterators.Reverse{<:EachLine}, state)
end
write(buf, view(chunks[jchunk], 1:jnewline))
buf.size = _stripnewline(r.itr.keep, buf.size, buf.data)
s = String(_unsafe_take!(buf))
s = unsafe_takestring!(buf)

# overwrite obsolete chunks (ichunk+1:jchunk)
i = jchunk
Expand Down
74 changes: 72 additions & 2 deletions base/iobuffer.jl
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ julia> io = IOBuffer();
julia> write(io, "JuliaLang is a GitHub organization.", " It has many members.")
56

julia> String(take!(io))
julia> takestring!(io)
"JuliaLang is a GitHub organization. It has many members."

julia> io = IOBuffer(b"JuliaLang is a GitHub organization.")
Expand All @@ -84,7 +84,7 @@ IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=fa
julia> write(io, "JuliaLang is a GitHub organization.")
34

julia> String(take!(io))
julia> takestring!(io)
"JuliaLang is a GitHub organization"

julia> length(read(IOBuffer(b"data", read=true, truncate=false)))
Expand Down Expand Up @@ -489,6 +489,76 @@ function take!(io::IOBuffer)
return data
end

"Internal method. This method can be faster than takestring!, because it does not
reset the buffer to a usable state, and it does not check for io.reinit.
Using the buffer after calling unsafe_takestring! may cause undefined behaviour.
This function is meant to be used when the buffer is only used as a temporary
string builder, which is discarded after the string is built."
function unsafe_takestring!(io::IOBuffer)
start = io.seekable ? io.offset + 1 : io.ptr
nbytes = io.size - start + 1
iszero(nbytes) && return ""
# The C function can only copy from the start of the memory.
# Fortunately, in most cases, the offset will be zero.
return if isone(start)
ccall(:jl_genericmemory_to_string, Ref{String}, (Any, Int), io.data, nbytes)
else
mem = StringMemory(nbytes)
unsafe_copyto!(mem, 1, io.data, start, nbytes)
unsafe_takestring(mem)
end
end

"""
takestring!(io::IOBuffer) -> String

Return the content of `io` as a `String`, resetting the buffer to its initial
state.
This is preferred over calling `String(take!(io))` to create a string from
an `IOBuffer`.

# Examples
```jldoctest
julia> io = IOBuffer();

julia> write(io, [0x61, 0x62, 0x63]);

julia> s = takestring!(io)
"abc"

julia> isempty(take!(io)) # io is now empty
true
```

!!! compat "Julia 1.12"
This function requires at least Julia 1.12.
"""
function takestring!(io::IOBuffer)
jakobnissen marked this conversation as resolved.
Show resolved Hide resolved
# If the buffer has been used up and needs to be replaced, there are no bytes, and
# we can return an empty string without interacting with the buffer at all.
io.reinit && return ""

s = unsafe_takestring!(io)
jakobnissen marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the buffer is not writable, this must copy the data into a StringMemory, see line 471 in the take! function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, but perhaps I'm wrong. The reason it copies in line 471 is because, if the buffer is not writable, io.reinit is not set to true, and thus the returned value is an alias of the memory in the buffer. However, this is only safe is one can guarantee the returned Vector is never mutated. We can guarantee that when returning String, so it's fine that the returned string shares memory with the unwriteable buffer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will normally be fine, but the following example seg faults with this PR.

iow = IOBuffer()
write(iow, zeros(UInt8, 10_000_000))
v1 = take!(iow)
ior = IOBuffer(v1)
s1 = takestring!(ior)
s1 = nothing
GC.gc()
GC.gc()
GC.gc()
v1[1]

This is because after a call to unsafe_takestring(m::Memory{UInt8}) m becomes unsafe to read because its memory may get freed by Julia if the string isn't needed.


# Restore the buffer to a usable state, making it no longer undefined behaviour to
# use the buffer after the `unsafe_takestring!` call.
# Note that if the buffer is not writable, there is no need to reinitialize the buffer,
# since it doesn't matter that the returned string looks into the same memory -
# this is because the buffer is not mutable through either the string nor the buffer.
if io.writable
io.reinit = true
io.mark = -1
io.ptr = 1
io.size = 0
io.offset = 0
end
s
end

jakobnissen marked this conversation as resolved.
Show resolved Hide resolved
# Fallback methods
unsafe_takestring!(io::GenericIOBuffer) = takestring!(io)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unsafe_takestring!(io::GenericIOBuffer) = takestring!(io)

This fallback isn't needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it and the tests passed locally. There are also no functions that call unsafe_takestring! on a GenericIOBuffer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A problem could arise if you construct a GenericIOBuffer with a backing buffer that is not Vector{UInt8}. I'm not sure if this is exercised by the tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsafe_takestring! is an internal function, so it shouldn't be an issue.

takestring!(io::GenericIOBuffer) = String(take!(io))

"""
_unsafe_take!(io::IOBuffer)

Expand Down
6 changes: 3 additions & 3 deletions base/iostream.jl
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ julia> write(io, "JuliaLang is a GitHub organization.")
julia> truncate(io, 15)
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=15, maxsize=Inf, ptr=16, mark=-1)

julia> String(take!(io))
julia> takestring!(io)
"JuliaLang is a "

julia> io = IOBuffer();
Expand All @@ -107,7 +107,7 @@ julia> write(io, "JuliaLang is a GitHub organization.");

julia> truncate(io, 40);

julia> String(take!(io))
julia> takestring!(io)
"JuliaLang is a GitHub organization.\\0\\0\\0\\0\\0"
```
"""
Expand Down Expand Up @@ -456,7 +456,7 @@ function readuntil_string(s::IOStream, delim::UInt8, keep::Bool)
end
readuntil(s::IOStream, delim::AbstractChar; keep::Bool=false) =
isascii(delim) ? readuntil_string(s, delim % UInt8, keep) :
String(_unsafe_take!(copyuntil(IOBuffer(sizehint=70), s, delim; keep)))
takestring!(copyuntil(IOBuffer(sizehint=70), s, delim; keep))

function readline(s::IOStream; keep::Bool=false)
@_lock_ios s ccall(:jl_readuntil, Ref{String}, (Ptr{Cvoid}, UInt8, UInt8, UInt8), s.ios, '\n', 1, keep ? 0 : 2)
Expand Down
2 changes: 1 addition & 1 deletion base/logging/ConsoleLogger.jl
Original file line number Diff line number Diff line change
Expand Up @@ -141,7 +141,7 @@ function handle_message(logger::ConsoleLogger, level::LogLevel, message, _module
for (key, val) in kwargs
key === :maxlog && continue
showvalue(valio, val)
vallines = split(String(take!(valbuf)), '\n')
vallines = split(takestring!(valbuf), '\n')
if length(vallines) == 1
push!(msglines, (indent=2, msg=SubString("$key = $(vallines[1])")))
else
Expand Down
2 changes: 1 addition & 1 deletion base/pkgid.jl
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ function binpack(pkg::PkgId)
uuid = pkg.uuid
write(io, uuid === nothing ? UInt128(0) : UInt128(uuid))
write(io, pkg.name)
return String(take!(io))
return unsafe_takestring!(io)
end

function binunpack(s::String)
Expand Down
Loading