[Feature Request] [proposal] Make String.split() and .splitlines() return List[StringSlice] #3879
Comments
I think I'd rather see |
@owenhilyard I'd also like to have an iterator for them (see PR #3858), but we can't deviate from Python APIs too much in functions that are used so ubiquitously like these two |
We absolutely can. We can make it return a |
I agree in principle, but how would we set it up? It's not just a matter of implicit constructors in many use cases for these functions, e.g. some Python code which should work the same in Mojo:
    if __name__ == "__main__":
        data = "some variable user input with their name: MyName"
        words = data.split()
        idx = words.index("name:")
        print("hi", words[idx + 1], "welcome to Mojo 🔥")
Iterators in Mojo edit the iterator on every |
We can make a trait for iterators over things that support random access which requires indexing and |
Ok that's an interesting abstraction. But if we're talking about concrete steps then we would need to implement:
Then we'd need to add implicit constructors for IMO we should wait until we have flexible enough |
+1, I think there's a lot of stuff better done after some type system work. |
Hey folks, I just wanted to communicate outward the results of a Mojo standard library team discussion about this proposal:
|
Review Mojo's priorities
What is your request?
Make String.split() and .splitlines() return List[StringSlice]
What is your motivation for this change?
String.split() and .splitlines() are normally used as an intermediary step for other logic. I think once we set up some more implicit casting infrastructure, the negative impact on existing code would be minimal. Especially given that edge cases like: are quite simple to solve if we have an implicit constructor List[String].__init__(self, other: List[StringSlice[_]]), and only the line var lines: List[String] = data.split(" ") would need to be changed. This would also allow function signatures that receive List[String] to be passed a List[StringSlice].
Other than in-place mutation of a string, most code shouldn't be too impacted by this deviation. If anyone can think of more edge cases, please do write them below. I wouldn't want us to break every codebase out there because of a couple of "innocent" assumptions, and I'm not sure how "bad" it is to deviate from Python in this manner.
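To make the idea concrete, here is a rough Python sketch of the proposal's shape, not Mojo code and not the standard library's implementation. The names StrSlice, split_slices, and to_owned are hypothetical stand-ins for StringSlice, the proposed split() return value, and the implicit List[String] constructor. It assumes a non-empty separator and ignores the whitespace-splitting mode of split().

```python
from __future__ import annotations


class StrSlice:
    """Hypothetical non-owning view into a str: a source reference plus bounds."""

    __slots__ = ("source", "start", "end")

    def __init__(self, source: str, start: int, end: int) -> None:
        self.source = source
        self.start = start
        self.end = end

    def to_str(self) -> str:
        # The only place the characters are copied into a new string.
        return self.source[self.start:self.end]

    def __eq__(self, other: object) -> bool:
        # Compare by content so a slice can be matched against a plain str.
        if isinstance(other, StrSlice):
            return self.to_str() == other.to_str()
        if isinstance(other, str):
            return self.to_str() == other
        return NotImplemented


def split_slices(data: str, sep: str) -> list[StrSlice]:
    """Split on a non-empty separator, returning views instead of copies."""
    if not sep:
        raise ValueError("empty separator")
    out: list[StrSlice] = []
    start = 0
    while True:
        idx = data.find(sep, start)
        if idx == -1:
            # No more separators: the rest of the string is the last piece.
            out.append(StrSlice(data, start, len(data)))
            return out
        out.append(StrSlice(data, start, idx))
        start = idx + len(sep)


def to_owned(slices: list[StrSlice]) -> list[str]:
    """Stand-in for the proposed List[String] constructor taking slices."""
    return [s.to_str() for s in slices]
```

In this sketch, code that only reads the pieces (indexing, comparing, searching with index) keeps working on the list of views, and only code that needs owned strings pays for the copy through to_owned.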
Any other details?
In some benchmarks I've been doing, our current splitlines implementation is around 2x slower than similar code in Python. If the Mojo code is changed to do .as_string_slice().splitlines(), it's actually 16% faster than Python's implementation. My hypothesis is that this is because strings in Python are copy-on-write, so when building a string from another they are virtually the same as a StringSlice.
Another option would be to make String be like Python's, where the underlying buffer can be either owning or non-owning, using a data structure like what was proposed in #3797 (reference implementation in #3807).
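For anyone wanting to reproduce the Python side of such a comparison, a minimal timing harness could look like the following. This is an assumed setup, not the benchmark used for the numbers above: the input text, line count, and repeat counts in make_text and time_splitlines are arbitrary illustrative choices.

```python
import timeit


def make_text(n_lines: int = 10_000) -> str:
    """Build a synthetic multi-line input (arbitrary content, for illustration)."""
    return "\n".join(f"line {i} of some sample text" for i in range(n_lines))


def time_splitlines(text: str, repeat: int = 5, number: int = 20) -> float:
    """Return the best observed per-call time, in seconds, for str.splitlines."""
    runs = timeit.repeat(text.splitlines, repeat=repeat, number=number)
    # Taking the minimum reduces noise from other processes on the machine.
    return min(runs) / number


if __name__ == "__main__":
    seconds = time_splitlines(make_text())
    print(f"str.splitlines: {seconds * 1e6:.1f} us per call")
```

The Mojo side would need an equivalent harness over the same input for the ratio to be meaningful.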