Fuzzy Matching #159

Open
josh1400 opened this issue Jul 28, 2021 · 5 comments

Comments

@josh1400

josh1400 commented Jul 28, 2021

I'm attempting to group duplicate customer names with varying spellings using agrep. I'm new to this so the following might not be ideal, but I'm having trouble just saving the results as a dataset.

This line of code creates a list that seems like it would be useful, but it doesn't save the dataset.

sapply(customer_data$End_Customer, agrep, customer_data$End_Customer)

image

This line of code doesn't show or save anything, but it does take just as long to run.

fuzzy <- sapply(customer_data$End_Customer, agrep, customer_data$End_Customer)
register("fuzzy")

I've had issues saving datasets in the past, but you helped me resolve them and I don't think I'm making the same mistake. I thought the issue might be that sapply creates a list rather than a data frame, so I tried converting the list with this code.

list <- sapply(customer_data$End_Customer, agrep, customer_data$End_Customer)
frame <- data.frame(matrix(unlist(list), nrow=length(list), byrow=TRUE),stringsAsFactors=FALSE)
register("frame")

This saved a dataset, but it didn't have the customer names like I see in the list.
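
For context, a minimal sketch of where the names go. It assumes base R's documented behaviour: sapply() names its result after a character input, and agrep() returns the positions of approximate matches. The `customers` vector is a hypothetical stand-in for customer_data$End_Customer, and `simplify = FALSE` is added here only so the toy result is always a list. Flattening with unlist() and matrix() keeps just the position numbers, whereas building a long table from the list names keeps each customer next to its matches.

```r
# Hypothetical stand-in for customer_data$End_Customer.
customers <- c("Acme Corp", "Acme Corp.", "Globex", "Globex Inc")

# Named list: one element per customer, holding the positions (not the
# names) of the entries that agrep() considers approximate matches.
matches <- sapply(customers, agrep, customers, simplify = FALSE)

# Long format: repeat each list name once per match, and translate the
# match positions back into customer names.
pairs <- data.frame(
  customer = rep(names(matches), lengths(matches)),
  match    = customers[unlist(matches, use.names = FALSE)],
  stringsAsFactors = FALSE
)

print(pairs)
```

Because the match vectors usually differ in length, a one-row-per-pair layout like this avoids forcing them into a rectangular matrix.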

I'll keep troubleshooting, but any advice is appreciated. :)

@josh1400
Author

I've made this work.

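# For each customer name, find the positions of approximate matches in the same column.
list <- sapply(wa$End_Customer, agrep, wa$End_Customer)
# Stack the named list into a two-column data frame (values = match position, ind = customer name).
result <- data.frame(stack(list))
register("result")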

image
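
A small sketch of how the stacked result might be read, assuming stack() returns its usual `values` and `ind` columns. The `customers` vector is a hypothetical stand-in for wa$End_Customer, and the `matched_name` column and `result_pairs` table are illustrative additions that are not part of the code above.

```r
# Hypothetical stand-in for wa$End_Customer.
customers <- c("Acme Corp", "Acme Corp.", "Globex", "Globex Inc")

matches <- sapply(customers, agrep, customers, simplify = FALSE)

# stack() gives two columns: `values` (positions in `customers`) and
# `ind` (a factor holding the name each position was matched from).
result <- data.frame(stack(matches))

# Translate the positions back into names so each row reads as a pair.
result$matched_name <- customers[result$values]

# Optionally drop rows where a name is paired with an identical string.
result_pairs <- result[as.character(result$ind) != result$matched_name, ]

print(result)
```

Every name matches itself, so the unfiltered table always has at least one row per customer. The filtered version keeps only the pairs whose spellings differ.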

@josh1400
Author

This is my first step in a full Fuzzy Matching solution. Any advice for finishing my code will be super helpful!
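
One possible next step, offered as a rough sketch and not a complete solution: label each name with the first entry in the data that it approximately matches. The `customers` vector and the `group` column are hypothetical. The approach assumes every name matches at least itself (so no missing values), and it does not merge chains of matches or correct for agrep() being asymmetric, so some variants can still end up in separate groups.

```r
# Hypothetical stand-in for the End_Customer column.
customers <- c("Acme Corp", "Acme Corp.", "Globex", "Globex Inc")

matches <- sapply(customers, agrep, customers, simplify = FALSE)

# For each name, take the smallest matching position, i.e. the first
# entry in data order that agrep() reports as an approximate match.
first_match <- vapply(matches, min, integer(1))

# Use the name at that position as a simple group label.
groups <- data.frame(
  customer = customers,
  group    = customers[first_match],
  stringsAsFactors = FALSE
)

print(groups)
```

Tightening or loosening agrep()'s `max.distance` argument changes which spellings fall into the same group, so the labels are worth reviewing by hand on a sample before relying on them.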

@vnijs
Contributor

vnijs commented Jul 28, 2021

The new MSBA cohort arrives on campus this week so I'm swamped. Have you considered posting this to StackOverflow? There are a lot of R folks that could help you there.

@josh1400
Author

I've gotten this far with help from StackOverflow. I'll post there and see what happens. If you can help when you have more time, this is something I'll be using for a lot of datasets at Micron.

Speaking of MSBA, I'd like to sit in on one of your classes. I haven't seen any possibilities yet from the emails Rady sends out. Do you plan on having any classes this upcoming school year where you dive more into R to add onto what I've learned since graduating?

@vnijs
Contributor

vnijs commented Jul 28, 2021

Reach out to me by email and we can discuss.
