keep trace of pattern used #26

Open
moodymudskipper opened this issue Sep 28, 2020 · 1 comment

Comments

@moodymudskipper
Owner

So far we don't keep track of which pattern matched each input.

  • unglue and unglue_vec can store the pattern as names (there will be duplicate names, but that doesn't matter)
  • unglue_data and unglue_unnest can store it in a separate column
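
Until something like this is built in, the "separate column" idea can be approximated from the outside. The following is only a rough sketch, not how the package would implement it: it calls `unglue_detect()` once per pattern and records the index of the first pattern that matches each string. It assumes that when several patterns match, the first one is the one used for extraction. The strings and patterns are the ones from the README example, and `pattern_id` is a name made up for illustration.

```r
library(unglue)

facts <- c("Antarctica is the largest desert in the world!",
           "The largest country in Europe is Russia!",
           "The smallest country in Europe is Vatican!",
           "Disneyland is the most visited place in Europe! Disneyland is in Paris!",
           "The largest island in the world is Green Land!")

patterns <- c("The {adjective} {place_type} in {bigger_place} is {place}!",
              "{place} is the {adjective} {place_type=[^ ]+} in {bigger_place}!{=.*}")

# pattern_id is a hypothetical helper vector, not part of unglue.
# Loop from the last pattern to the first so that, when several patterns
# match the same string, the earliest pattern index is the one that is kept.
pattern_id <- rep(NA_integer_, length(facts))
for (i in rev(seq_along(patterns))) {
  pattern_id[unglue_detect(facts, patterns[[i]])] <- i
}

# Attach the index as a separate column next to the extracted values.
res <- unglue_data(facts, patterns)
res$pattern <- pattern_id
```

Strings that match no pattern keep `NA` in the `pattern` column. The cost is one extra matching pass per pattern, which is probably fine for small inputs but is wasteful compared to tracking the pattern inside the package.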
@mathematiguy

mathematiguy commented Mar 26, 2022

I needed this, so I came up with a hack: I prefix the pattern variables with a pattern id, then gather, separate and spread to get the pattern id alongside the extracted values.

Using the example on the readme it looks like this:

> library(unglue)
> library(tidyverse)
> facts <- c("Antarctica is the largest desert in the world!",
+            "The largest country in Europe is Russia!",
+            "The smallest country in Europe is Vatican!",
+            "Disneyland is the most visited place in Europe! Disneyland is in Paris!",
+            "The largest island in the world is Green Land!")
> facts_df <- data.frame(id = 1:5, facts)
> 
> patterns <- c("The {p1_adjective} {p1_place_type} in {p1_bigger_place} is {p1_place}!",
+               "{p2_place} is the {p2_adjective} {p2_place_type=[^ ]+} in {p2_bigger_place}!{=.*}")
> unglue_data(facts, patterns) %>%
+     add_column(facts, .before=1) %>%
+     gather(key="variable", value="value", -facts) %>%
+     filter(!is.na(value)) %>%
+     separate(variable, sep="_", into=c("pattern", "variable"), extra="merge") %>%
+     spread(key=variable, value=value)
                                                                    facts
1                          Antarctica is the largest desert in the world!
2 Disneyland is the most visited place in Europe! Disneyland is in Paris!
3                                The largest country in Europe is Russia!
4                          The largest island in the world is Green Land!
5                              The smallest country in Europe is Vatican!
  pattern    adjective bigger_place      place place_type
1      p2      largest    the world Antarctica     desert
2      p2 most visited       Europe Disneyland      place
3      p1      largest       Europe     Russia    country
4      p1      largest    the world Green Land     island
5      p1     smallest       Europe    Vatican    country
> 

There might be gotchas that apply to more general cases that I haven't thought of, but I thought you might find this useful.
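
One gotcha that seems likely: `spread()` needs each output row to be uniquely identified, so the same input string appearing twice would probably break the reshape. Below is a hedged variant of the same hack that keys rows on a position-based `id` column and uses `pivot_longer()` / `pivot_wider()`, which supersede `gather()` / `spread()` in tidyr. It assumes every pattern variable is prefixed as `p<number>_`, as in the example above; the names `wide`, `long` and `result` are made up for illustration.

```r
library(unglue)
library(tidyr)

facts <- c("Antarctica is the largest desert in the world!",
           "The largest country in Europe is Russia!",
           "The smallest country in Europe is Vatican!",
           "Disneyland is the most visited place in Europe! Disneyland is in Paris!",
           "The largest island in the world is Green Land!")

patterns <- c("The {p1_adjective} {p1_place_type} in {p1_bigger_place} is {p1_place}!",
              "{p2_place} is the {p2_adjective} {p2_place_type=[^ ]+} in {p2_bigger_place}!{=.*}")

# Key each row by its position so that duplicated input strings stay distinct.
wide <- data.frame(id = seq_along(facts),
                   facts = facts,
                   unglue_data(facts, patterns))

# Split "p1_place_type" into pattern = "p1" and variable = "place_type".
# A regex is used rather than names_sep = "_" because the variable names
# themselves contain underscores.
long <- pivot_longer(wide,
                     cols = -c(id, facts),
                     names_to = c("pattern", "variable"),
                     names_pattern = "^(p[0-9]+)_(.*)$",
                     values_drop_na = TRUE)

# Back to one row per input string, with the pattern id as its own column.
result <- pivot_wider(long, names_from = variable, values_from = value)
```

This still relies on renaming the variables in every pattern, so it remains a workaround rather than a substitute for the package recording the matched pattern itself.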
