How to use functions from ML? #13

Open
mrericrichter opened this issue May 2, 2024 · 4 comments
Comments

@mrericrichter

Could you please provide an example (or even an implementation) of how to use functions from the ML package, e.g. the Microsoft.Spark.ML.Feature.Bucketizer class?

mrericrichter changed the title from "How to use function from ML?" to "How to use functions from ML?" on May 2, 2024
@GoEddie
Owner

GoEddie commented May 4, 2024

I was looking at how PySpark implements the ML functions. It seems to use numpy and do some of the work on the client, but I'm not 100% sure. I'll carry on trying to figure it out!

@mrericrichter
Author

Typically, these functions run on the worker nodes. However, Spark Connect currently seems to support SQL functions only. Every supported function carries a 'Supports Spark Connect' note in its documentation, e.g.
[image]

Functions from MLlib do not carry this note, e.g.
[image]

This seems to be a major limitation of Spark Connect as of today. The documentation says that more functions will be added in future versions of Spark.
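
As a possible stopgap for the Bucketizer case specifically: since SQL functions are supported, the same kind of bucketing can be written as a plain SQL CASE expression. Below is a minimal sketch in Python that only builds the expression string. The helper name `bucketizer_case_expr` is hypothetical (it is not part of Spark or of this library), and mapping out-of-range values to NULL is an assumption of the sketch, not necessarily what Bucketizer does by default.

```python
import math


def bucketizer_case_expr(column, splits):
    """Build a SQL CASE expression that buckets `column` by `splits`.

    Bucket i covers [splits[i], splits[i + 1]); the last bucket also
    includes its upper bound. Values outside the splits map to NULL
    (an assumption of this sketch).
    """
    if len(splits) < 2:
        raise ValueError("need at least two split points")
    if not all(math.isfinite(s) for s in splits):
        # 'inf' is not a portable SQL literal, so this sketch rejects it.
        raise ValueError("this sketch only handles finite split points")
    if any(a >= b for a, b in zip(splits, splits[1:])):
        raise ValueError("splits must be strictly increasing")

    branches = []
    last = len(splits) - 2  # index of the final bucket
    for i in range(len(splits) - 1):
        lo, hi = splits[i], splits[i + 1]
        upper_op = "<=" if i == last else "<"
        branches.append(
            f"WHEN {column} >= {lo} AND {column} {upper_op} {hi} THEN {i}"
        )
    return "CASE " + " ".join(branches) + " ELSE NULL END"


expr = bucketizer_case_expr("age", [0, 10, 20])
print(expr)
```

The resulting string can then be passed to whatever the client offers for raw SQL expressions. This covers simple bucketing only; it is not a replacement for the ML package.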

@GoEddie
Owner

GoEddie commented May 5, 2024

I'll keep this open and add them once they are available.

@GoEddie
Owner

GoEddie commented Oct 13, 2024

There is an open PR for adding ML functions to Spark Connect. Once it is merged, we should be able to implement the functions:

wbo4958/spark#5
