Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Integrate hint_trivial_type in List.__copyinit__ #3809

Open
wants to merge 6 commits into
base: nightly
Choose a base branch
from

Conversation

rd4com
Copy link
Contributor

@rd4com rd4com commented Nov 25, 2024

Hello,
List.__copyinit__ needed to implement @parameter if hint_trivial_type,
so here is a PR 👍 !

Because String.__copyinit__ does List.__copyinit__,
and that it's buffer uses hint_trivial_type,
there is a nice speedup from it!

🥳 nice speedup for:

  • List[_, hint_trivial_type=True].__copyinit__
  • String.__copyinit__ (57x, increase with length)
    (var buffer: List[UInt8, hint_trivial_type=True])
  • fn(**kwargs) should be faster too !
    (see PR [stdlib] Add Dict._resize_down and _under_load_factor() #3133, because it currently does __copyinit__ of DictEntry[K,V])
    (_find_slot just need to use Pointer 👍 )
  • More types that uses hint_trivial_type=True ?

I need to learn the Benchmark, but here is a simple one (benchmarks/collections/bench_list)
(please somebody help try to convert it to Benchmark)

Is it possible to check if it is faster or that the benchmark is good ?

The benchmark for Int is less than for UInt8, is it a bug ?

  • (4512122÷720588 = 6.2x) for size 4096
  • (5425619÷102102 = 53x) for size 4096

results PR:

iterations:  1024
list_hint_trivial_type
	 1		33281
	 2		43820
	 4		41106
	 8		28076
	 16		26829
	 32		27688
	 128	35084
	 256	46950
	 512	73477
	 1024	115724
	 2048	339350
	 4096	720588

string_copyinit
	 1		26295
	 2		28299
	 4		39899
	 8		40835
	 16		32544
	 32		27912
	 128	30006
	 256	31930
	 512	32685
	 1024	39246
	 2048	59733
	 4096	102102

results currently:

iterations:  1024
list_hint_trivial_type
	 1		43318
	 2		29643
	 4		30210
	 8		32667
	 16		42152
	 32		58145
	 128	157062
	 256	325332
	 512	534089
	 1024	1036117
	 2048	2079208
	 4096	4512122

string_copyinit
	 1		28839
	 2 	 	30548
	 4 	 	33308
	 8 	 	37221
	 16		47267
	 32		67287
	 128	190275
	 256	347621
	 512 	714950
	 1024 	1327402
	 2048 	2785542
	 4096	5425619

@rd4com rd4com requested a review from a team as a code owner November 25, 2024 20:57
@rd4com rd4com force-pushed the list_hint_trivial_type branch from bc3ad12 to eb869a4 Compare November 25, 2024 20:59
@rd4com rd4com changed the title Integrate hint_trivial_type in List.__copyinit__ [stdlib] Integrate hint_trivial_type in List.__copyinit__ Nov 25, 2024
@martinvuyk
Copy link
Contributor

martinvuyk commented Nov 25, 2024

Hi @rd4com, I don't think this needed a benchmark it's a bit overkill but nice 🤣.

FYI, I just opened PR #3810 to merge a change from another PR which got stuck long ago, that is useful for this kind of optimization. I was thinking of tackling these kind of holes in performance later on, I completely forgot we had the hint parameter 😆. If we combine that with your request #3581 we will be able to optimize this automatically without the hint 🚀.

@rd4com
Copy link
Contributor Author

rd4com commented Nov 26, 2024

@martinvuyk , great work on #3810 !
It can give the List the speedup even if hint_trivial_type == False 🥳
So users could get the speedup with a simple List[Float64]! (no hint_trivial_type)
(but not Int yet, maybe need a temporary thing for it)

@parameter
if hint_trivial_type or get_dtype[T]!=DType.invalid:
	memcpy(self.data, other.data, 1)
else:
	for i in range(len(existing)):
		...

For the is_trivial[T], it is probably somewhere in the future as trait (chat #nightly),
So let's just wait for features to naturally grow,
and auto-optimize for basic things like List[Int] and dtype how we currently can 👍
(that way users get the speedup before even learning trivial type 🥳 )

@martinvuyk
Copy link
Contributor

It can give the List the speedup even if hint_trivial_type == False 🥳 So users could get the speedup with a simple List[Float64]! (no hint_trivial_type) (but not Int yet, maybe need a temporary thing for it)

Yep, I'm pretty stoked too 😄

@parameter
if hint_trivial_type or get_dtype[T]!=DType.invalid:

I was thinking more along the lines of having a private alias inside List that has its trivial DType or DType.invalid

fn _get_same_bitwidth_dtype[T: AnyType, hint_trivial_type: Bool]() -> DType:
    @parameter
    if not hint_trivial_type:
        return DType.invalid

    alias width = bitwidthof[T]()
    
    @parameter
    if width == 64:
        return DType.uint64
    elif width == 32:
        return DType.uint32
    elif width == 16:
        return DType.uint16
    elif width == 8:
        return DType.uint8
    else:
        return DType.invalid


struct List[T: CollectionElement, hint_trivial_type: Bool = False]:
    alias _D = (
        DType.get_dtype[T]() if DType.is_scalar[T]()
        else _get_same_bitwidth_dtype[T, hint_trivial_type]()
    )

This way the compiler runs the logic only once and not on every function call, and we'll reduce compile times.

(but not Int yet, maybe need a temporary thing for it)

We can totally add a temporary hack for it inside the _get_same_bitwidth_dtype function using _type_is_eq until we have the is_trivial[T: AnyType]() -> Bool function

@rd4com
Copy link
Contributor Author

rd4com commented Nov 26, 2024

❤️‍🔥 ✅ auto-optimize hint_trivial_type
(speedup for List[Int], List[UInt8], List[Scalar])

@martinvuyk , it works!
(See # FIXME: Use dtypes checker of PR #3810 (DType.get_dtype[T]())

But if we change hint_trivial_type=False to hint_trivial_type=_trivial_default[T](),
there is a test error:

test/test_utils/types.mojo:159:8: note: candidate not viable: argument #2 cannot be converted from 'UnsafePointer[List[Int], alignment=1]' to 'UnsafePointer[List[Int, False]]'
struct ValueDestructorRecorder(ExplicitlyCopyable):

(ping to @JoeLoser)

@martinvuyk
Copy link
Contributor

@rd4com that's cool though I didn't mean to do it in this PR, it might extend it more since I think this concept alone needs some independent review, for example:

  • I think the alias should be private
  • The alias should be a DType for later rebinding after the checking e.g. @parameter if Self._D is not DType.invalid: memcpy(self.unsafe_ptr().bitcast[Scalar[Self._D]](), ...) ...
  • IMO the dtype getter should just be a helper func in the list.mojo file (maybe we'll take it to Span later on)

test/test_utils/types.mojo:159:8: note: candidate not viable: argument #2 cannot be converted from 'UnsafePointer[List[Int], alignment=1]' to 'UnsafePointer[List[Int, False]]'

this seems like a compiler frontend bug (the function is not getting executed for one version but it is for the other)

Until PR #3810 gets merged I think its perfectly fine to keep using hint_trivial_type as the parameter, the switch to the DType logic should be straightforward

@JoeLoser
Copy link
Collaborator

JoeLoser commented Dec 4, 2024

@rd4com this is great! To keep things self-contained, can we keep this PR focused on the original commit of eb869a4 and land that first? Then we can look at #3810 and integrate that in with your future commits here to extending the optimization to other known trivial types.

@JoeLoser JoeLoser added the waiting for response Needs action/response from contributor before a PR can proceed label Dec 4, 2024
@JoeLoser JoeLoser self-assigned this Dec 4, 2024
@rd4com rd4com force-pushed the list_hint_trivial_type branch from 26df10d to db7f797 Compare December 4, 2024 06:59
@rd4com
Copy link
Contributor Author

rd4com commented Dec 4, 2024

Thanks, yep 👍

Just rebased and split the commits for another PR !

(ping to @JoeLoser)

@JoeLoser
Copy link
Collaborator

JoeLoser commented Dec 6, 2024

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Dec 6, 2024
@JoeLoser
Copy link
Collaborator

JoeLoser commented Dec 7, 2024

Thanks, yep 👍

Just rebased and split the commits for another PR !

(ping to @JoeLoser)

Thanks! I have an open PR split off from this for adding a test + the optimization for not destroying trivial elements when the hint is provided. I'll need to do a few things to this imported PR before we can land it:

  • Measure compile time impact on e2e models (should be OK, but good to double check) for the runtime perf optimization in trivial element case.

Copy link
Collaborator

@JoeLoser JoeLoser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formally requesting changes until #3809 (comment) is addressed. I'll do some internal benchmarking with compile-times on e2e models like llama as well.

@rd4com rd4com force-pushed the list_hint_trivial_type branch from db7f797 to f08eea7 Compare December 17, 2024 19:53
@rd4com
Copy link
Contributor Author

rd4com commented Dec 17, 2024

Hello,
sorry for late commit, changes implemented 👍

  • ✅ [stdlib] Add parametrized benchmarks for List[Scalar[DT], is_trivial].__copyinit__

Again, still learning Benchmark (please have a look because it is slower than initial one)

(ping to @JoeLoser)

@rd4com rd4com force-pushed the list_hint_trivial_type branch from 465b633 to 6cb64b4 Compare December 18, 2024 16:04
num_repetitions=1,
max_runtime_secs=0.5,
min_runtime_secs=0.25,
min_warmuptime_secs=0, # FIXME: adjust the values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question Any harm in using the default BenchConfig values?

Copy link
Contributor Author

@rd4com rd4com Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It takes too much time to benchmark all the ⚙️ parametrized on my small laptop with defaults
(please replace it with what makes sense)

Would you prefer a commit with defaults ?

@JoeLoser
Copy link
Collaborator

!sync

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported-internally Signals that a given pull request has been imported internally. waiting for response Needs action/response from contributor before a PR can proceed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants