[stdlib] Integrate `hint_trivial_type` in `List.__copyinit__` #3809
base: nightly
bc3ad12 to eb869a4
Hi @rd4com, I don't think this needed a benchmark; it's a bit overkill, but nice 🤣. FYI, I just opened PR #3810 to merge a change from another PR that got stuck long ago and that is useful for this kind of optimization. I was thinking of tackling these kinds of performance holes later on; I completely forgot we had the hint parameter 😆. If we combine that with your request #3581, we will be able to optimize this automatically without the hint 🚀.
@martinvuyk, great work on #3810!

    @parameter
    if hint_trivial_type or get_dtype[T] != DType.invalid:
        memcpy(self.data, other.data, 1)
    else:
        for i in range(len(existing)):
            ...

For the
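The branch in the snippet above chooses between one bulk copy and a per-element loop. As a rough analogy only (plain Python, not Mojo, and not the PR's implementation), the sketch below shows the two paths producing the same result. The helper names `copy_bulk`, `copy_elementwise`, and `copy_list` are hypothetical, and the branch here is an ordinary runtime `if`, whereas Mojo's `@parameter if` is resolved at compile time.

```python
def copy_elementwise(src: bytearray) -> bytearray:
    """Copy one element at a time, like invoking a copy constructor per item."""
    dst = bytearray(len(src))
    for i in range(len(src)):
        dst[i] = src[i]
    return dst


def copy_bulk(src: bytearray) -> bytearray:
    """Copy the whole buffer in one call, the analogue of a single memcpy."""
    return bytearray(src)


def copy_list(src: bytearray, hint_trivial_type: bool) -> bytearray:
    # Hypothetical dispatcher: a runtime branch standing in for the
    # compile-time `@parameter if hint_trivial_type` discussed above.
    if hint_trivial_type:
        return copy_bulk(src)
    return copy_elementwise(src)


data = bytearray(range(256)) * 16  # 4096 bytes, matching the size quoted in the PR
fast = copy_list(data, hint_trivial_type=True)
slow = copy_list(data, hint_trivial_type=False)
```

Both paths return an independent copy with identical contents; the bulk path is only valid when elements can be duplicated byte for byte, which is what the hint asserts.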
Yep, I'm pretty stoked too 😄
I was thinking more along the lines of having a private alias inside

    fn _get_same_bitwidth_dtype[T: AnyType, hint_trivial_type: Bool]() -> DType:
        @parameter
        if not hint_trivial_type:
            return DType.invalid
        alias width = bitwidthof[T]()

        @parameter
        if width == 64:
            return DType.uint64
        elif width == 32:
            return DType.uint32
        elif width == 16:
            return DType.uint16
        elif width == 8:
            return DType.uint8
        else:
            return DType.invalid

    struct List[T: CollectionElement, hint_trivial_type: Bool = False]:
        alias _D = (
            DType.get_dtype[T]() if DType.is_scalar[T]()
            else _get_same_bitwidth_dtype[T, hint_trivial_type]()
        )

This way the compiler runs the logic only once and not on every function call, and we'll reduce compile times.
We can totally add a temporary hack for it inside the
❤️🔥 ✅ auto-optimize @martinvuyk , it works! But if we change
(ping to @JoeLoser)
@rd4com that's cool, though I didn't mean to do it in this PR. It might extend it more, since I think this concept alone needs some independent review. For example:
this seems like a compiler frontend bug (the function is not getting executed for one version but it is for the other). Until PR #3810 gets merged, I think it's perfectly fine to keep using
26df10d to db7f797
Thanks, yep 👍 Just rebased and split the commits for another PR! (ping to @JoeLoser)
!sync
Thanks! I have an open PR split off from this for adding a test + the optimization for not destroying trivial elements when the hint is provided. I'll need to do a few things to this imported PR before we can land it:
Formally requesting changes until #3809 (comment) is addressed. I'll do some internal benchmarking with compile-times on e2e models like llama as well.
Signed-off-by: rd4com <[email protected]>
Signed-off-by: rd4com <[email protected]>
Signed-off-by: rd4com <[email protected]>
…al].__copyinit__` Signed-off-by: rd4com <[email protected]>
db7f797 to f08eea7
Hello,
Again, still learning (ping to @JoeLoser)
Signed-off-by: rd4com <[email protected]>
Signed-off-by: rd4com <[email protected]>
465b633 to 6cb64b4
num_repetitions=1,
max_runtime_secs=0.5,
min_runtime_secs=0.25,
min_warmuptime_secs=0,  # FIXME: adjust the values
Question: Any harm in using the default `BenchConfig` values?
It takes too much time to benchmark all the ⚙️ parametrized variants on my small laptop with the defaults (please replace it with what makes sense).
Would you prefer a commit with defaults?
!sync
Hello,

`List.__copyinit__` needed to implement `@parameter if hint_trivial_type`, so here is a PR 👍!

Because `String.__copyinit__` calls `List.__copyinit__`, and its buffer uses `hint_trivial_type`, there is a nice speedup from it!

🥳 Nice speedup for:
- `List[_, hint_trivial_type=True].__copyinit__`
- `String.__copyinit__` (57x, increases with length) (`var buffer: List[UInt8, hint_trivial_type=True]`)
- `fn(**kwargs)` should be faster too! (see PR "[stdlib] Add `Dict._resize_down` and `_under_load_factor()`" #3133, because it currently does `__copyinit__` of `DictEntry[K,V]`) (`_find_slot` just needs to use `Pointer` 👍)
- `hint_trivial_type=True`?

I need to learn `Benchmark`, but here is a simple one (`benchmarks/collections/bench_list`) (could somebody please help convert it to `Benchmark`?).

Is it possible to check whether it is faster and whether the benchmark is good?

The benchmark result for `Int` is lower than for `UInt8`; is it a bug?
- (4512122 ÷ 720588 = 6.2x) for size 4096
- (5425619 ÷ 102102 = 53x) for size 4096

results PR:

results currently:
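The two ratios quoted in the description can be re-derived with a few lines of Python. This is only an arithmetic check: the description does not state the timing units, and treating the larger number as the current measurement and the smaller one as the PR measurement is an assumption. The `speedup` helper is hypothetical.

```python
def speedup(baseline: float, optimized: float) -> float:
    """Return how many times faster the optimized measurement is."""
    return baseline / optimized


# Raw figures copied from the description, both for size 4096.
first = speedup(4512122, 720588)
second = speedup(5425619, 102102)

print(f"{first:.1f}x")   # prints "6.3x" (the description truncates this to 6.2x)
print(f"{second:.1f}x")  # prints "53.1x"
```

The second ratio is roughly 8.5 times the first, which is the gap the question about `Int` versus `UInt8` refers to.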