Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added functionality to locate elements based on their tagname, attribute, and value. #32

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

hamzaaitbrik
Copy link

Hello, I hope you're doing fine.

I submitted an issue not along ago stating that I had issues trying to locate elements nested deeply into shadowRoots. I tried doing it with JavaScript, the same way I do it with Selenium, but that didn't work because the evaluate() method only returns serializable data, and I needed elements to interact with.

So, I decided to change the source code of the project to fix my issue, I ended up creating 4 other functions that can accurately, and efficiently locate elements in the whole DOM, including shadowRoots and iFrames, using only the tagname of the element, and a pair of attribute/value to narrow down the targeting.

The functions I created are locate(), locate_all(), locate_element_by_tagname_attribute_value(), and locate_elements_by_tagname_attribute_value().

The locate() function takes three arguments, tagname, attribute, and value. That way, if we're aiming to locate a button, and we know that our target button is the only button with a certain pair of attribute/value, we can effectively locate that button every single time, even if the DOM changes in case we're working on a website that changes its structure.

The locate_all() function takes three arguments as well, tagname, attribute, and value. It works the exact same way as the locate() function, the only difference is that we're able to retrieve a list of all elements of our specified tagname, and pair of attribute/value.

Feel free to examine the code I added, and let me know if I missed anything. I will be more than happy to modify on it to match the preferences of this project.

I thought of deleting the find() and select() functions as they won't be much of use compared to locate(), but that's your decision to make.

Looking forward for this pull request to get merged. I believe this will add great value!

Thanks,
Hamza

@stephanlensky
Copy link
Owner

Thank you for your contribution!

I'm interested in merging this, but am hesitant because adding another "find" function with a new name when find and select already exist sounds like it might be confusing for new users.

I spent a bit of time thinking how to solve this problem, but wasn't able to come up with anything. I'd be open to suggestions, though. I do think it's important we don't add a 3rd new function with ambiguous purpose compared to the other existing ones.

@hamzaaitbrik
Copy link
Author

Thanks for the reply @stephanlensky.

I don't know what you suggest. I believe the function I added is much more efficient at locating elements than the other two functions, especially if elements are nested deeply into shadowRoots, hence why I thought of making such thing.

The find() and select() functions are not able to locate elements within shadowRoots, which I stumble upon quite frequently when doing web automation.

If having three functions is not an option, I would suggest removing the find() function altogether, because that's the least important one of them. Selecting elements based on their text can be achieved using the locate() function with the arguments tagname=tagname, attribute=innerHTML or attribute=textContent, and value=ourTarget.

The only downside in only using the select() and locate() functions solely is having to inspect the webpage whenever we're looking for our targeted element. I believe that should be totally okay for people who are using this project, because they're developers. I myself do web automation, and I can't imagine myself writing a script for a webpage without inspecting it's DOM.

Let me know what you think about what I said, and whether there's any modifications I should add.

I am also looking forward into having this pull request merged into the main branch.

Cheers,
Hamza

@stephanlensky
Copy link
Owner

@hamzaaitbrik after giving this more thought, I think the best path is to integrate your functionality into the existing find/find_all functions, rather than creating a new function.

If you are familiar with the Python package beautifulsoup, my goal is for find/find_all to be very similar to the find/find_all functions from there.

More concretely, here is what I suggest:

  1. Modify the existing find/find_all functions to traverse shadow roots
  2. Add a new optional argument (we don't want to break existing users) to find/find_all: attrs.
    • This argument will be a dict mapping attribute name -> value
    • You can specify one or more attributes in this dict to filter results
    • Very similar to what you have now, but extended to work with multiple attributes instead of just one
  3. Make the existing text and best_match arguments optional, so that you have the option to filter only by attrs without searching for some specific text.

Let me know if you have any questions or if this makes sense, thanks.

@hamzaaitbrik
Copy link
Author

Hello @stephanlensky,

I have given what you said so much thought. But I can't comprehend how would such function work? The locate() function takes a tagname as argument, that way we know what we're targeting; and that's more technical than just trying to find something called 'Log In' or 'Submit,' which is not practical in my opinion.

I genuinely believe that this project should ditch the option of finding elements by text, because that's simply not as efficient as finding elements by tagname and attribute/value.

I would argue that whoever will base their work on this project will definitely need a bulletproof method to locate elements effectively, and text simply doesn't do it. I tried, tho; I tried looking for buttons with 'Log In' text with a modified find() method, but I don't accurately get my target element every time, I sometimes get div or span elements while I need a button to click.

This will obviously be a big merge, adding/changing functions is something users will have to adapt with.
What I suggest is, we ditch finding elements by text altogether, because simply that is not efficient. I will be more than happy to rewrite the documentation for the project explaining how the function responsible for finding elements using their tagname and attrs work.

I'm open to other suggestions, but finding elements with text just doesn't work, I would consider myself intermediate in web automation and I know that finding the exact same element every single time is a crucial for longevity.

Let me know what you think.

@stephanlensky
Copy link
Owner

stephanlensky commented Dec 22, 2024

@hamzaaitbrik please reread what I said carefully. Specifically:

Make the existing text and best_match arguments optional, so that you have the option to filter only by attrs without searching for some specific text.

I am suggesting the existing function be extended in such a way that searching by text is no longer required, and (if you desire) you can instead search just by attribute and tag name.

Attributes/attributes values can be provided as an optional dictionary. Similarly, tag name can be provided as an optional string.

This gives users the power to search however they want, whether that is by text, attribute or tag name, all without breaking any existing usages or adding a new function.

@hamzaaitbrik
Copy link
Author

Oh, okay! Gotcha now. Sorry, I totally missed the part where you said text could be optional. I will code it and try to make it as efficient as possible and submit the modified code. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants