Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quickstart guide example fails #4

Open
mjbroerman opened this issue Sep 16, 2021 · 6 comments
Open

Quickstart guide example fails #4

mjbroerman opened this issue Sep 16, 2021 · 6 comments

Comments

@mjbroerman
Copy link

Hi Emily,

Lovely package. I wanted to use it for an upcoming project, and while I was kicking the tires, I hit this

> filepath <- system.file("", "ex-convo.yml", package = "convo")
> convo <- read_convo(filepath)
> write_pb(convo, c("IND_A", "AMT_B"), filename = "convo-validation.yml", path = tempdir())
 Error in if (step_list$column[[1]] == step_list$columns_expr) { : 
  missing value where TRUE/FALSE needed 
7.
get_column_text(step_list = step_list, expanded = expanded) 
6.
as_agent_yaml_list(agent = agent, expanded = expanded) 
5.
pointblank::yaml_write(., filename = "convo-validation.yml", 
    path = "/var/folders/bp/l5qt50g13sndlz0x48jqqv300000gn/T//RtmpqDnAMJ") 
4.
pointblank::create_agent(read_fn = ~setNames(as.data.frame(matrix(1, 
    ncol = 2)), c("IND_A", "AMT_B"))) %>% pointblank::col_vals_not_null(matches("^([A-Za-z]_){0}ID"), 
    step_id = 1) %>% pointblank::col_is_numeric(matches("^([A-Za-z]_){0}ID"), 
    step_id = 2) %>% pointblank::col_vals_between(matches("^([A-Za-z]_){0}ID"),  ... at <text>#1
3.
eval(parse(text = code)) 
2.
eval(parse(text = code)) 
1.
write_pb(convo, c("IND_A", "AMT_B"), filename = "convo-validation.yml", 
    path = tempdir()) 

Likewise when interrogating:

> pointblank::interrogate(agent)

── Interrogation Started - there are 13 steps ───────────────────────────────────────────────────────────────
Error: Only strings can be converted to symbols
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/rlang_error>
Only strings can be converted to symbols
Backtrace:
 1. pointblank::interrogate(agent)
 2. pointblank:::check_table_with_assertion(...)
 3. pointblank:::interrogate_not_null(agent = agent, idx = idx, table = table)
 4. pointblank:::get_column_as_sym_at_idx(agent = agent, idx = idx)
 5. rlang::sym(...)

I'd love to find out this was my misunderstanding, thanks!

@emilyriederer
Copy link
Owner

emilyriederer commented Sep 17, 2021

Thanks so much for reporting this, @mjbroerman -- and yikes 😬 ! Not a very "Quick Start" after all..

Could you please tell me what you get for packageVersion("pointblank") ? I know there have been some fabulous updates over there lately (some of which I even requested), and convo may be presuming an incorrect/outdate internal structure there

@mjbroerman
Copy link
Author

Yes that would've been helpful to include in the first place. A bit new to this! 😅

packageVersion("pointblank")
[1] ‘0.8.0’

Indeed, new to this space also and it's all quite exciting! So glad your package is a part of it!

@emilyriederer
Copy link
Owner

emilyriederer commented Sep 17, 2021

Thanks so much, @mjbroerman !

This seems to be a case of convo relying on a pointblank pattern that isn't supported anymore. The core issue in the errors you point out is that the controlled vocabulary specifies checks for variables that do not match anything in the data.

So, while the example given doesn't work, if the controlled vocabulary contains only entries that have a match in the data, the function operates correctly. We can see this if I subset the first level of the controlled vocabulary to only include IND and AMT which both have matches in the data.

library(convo)
filepath <- system.file("", "ex-convo.yml", package = "convo")
convo <- read_convo(filepath)
convo[[1]] <- convo[[1]][c("IND", "AMT")]
write_pb(convo, c("IND_A", "AMT_B"), filename = "convo-validation.yml", path = "../Desktop")

In the short term, if your validation checks only include stubs present in the dataset you are validating, you will not encounter this issue in practice. However, I am very glad to know about it because its clearly a problem that needs to be fixed!

In the next comment, I'm going to leave myself some notes on initial research into why this is happening under the hood. Feel free to ignore the next comment because my goal is that users shouldn't have to think about that level. 🙂

@emilyriederer
Copy link
Owner

emilyriederer commented Sep 17, 2021

Notes to self:

In pointblank 0.6.0 this works:

create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(starts_with("ID"), step_id = 1) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2) %>%
  interrogate()

Now in pointblank 0.8.0, we get an error when we attempt to check something that has no match in the data (the ID check in the example above).

Tentatively, it looks like the right way to fix this will be something like this:

create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(starts_with("ID"), step_id = 1, 
                    active = ~. %>% {length(starts_with("ID", vars = colnames(.))) > 0} ) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2) %>%
  interrogate()

However, I need to do more research to understand:

  • Why does active param seem to cause errors with write_yaml?
  • When is the function passed to active evaluated? Will this "round trip" correctly to and from YAML for portable use on a different dataset?

Reprex for self:

library(pointblank)

# no `active` doesn't interrogate ----
agent1 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(starts_with("ID"), step_id = 1) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)

res1 <- interrogate(agent1)

# `active` (FALSE function) does interrogate; doesn't write yaml ----
agent2 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(
    starts_with("ID"), step_id = 1,
    active = ~. %>% {length(starts_with("ID", vars = colnames(.))) > 0} ) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)

res2 <- interrogate(agent2)

yaml_write(agent2, filename = "pb-test-2.yml", path = tempdir())

# `active` (TRUE function) does interrogate; writes empty yaml ----
agent3 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(
    starts_with("AMT"), step_id = 1,
    active = ~. %>% {length(starts_with("AMT", vars = colnames(.))) > 0} ) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)

res3 <- interrogate(agent3)

yaml_write(agent3, filename = "pb-test-3.yml", path = tempdir())
readLines(file.path(tempdir(), "pb-test-3.yml"))

# `active` (value not function) with cols present does interrogate; writes yaml ----
agent4 <-
  create_agent(read_fn = ~data.frame(IND_A = 1, AMT_B = 2)) %>%
  col_vals_not_null(
    starts_with("AMT"), step_id = 1, active = FALSE) %>%
  col_vals_not_null(starts_with("IND"), step_id = 2)

res4 <- interrogate(agent4)

yaml_write(agent4, filename = "pb-test-4.yml", path = tempdir())
readLines(file.path(tempdir(), "pb-test-4.yml"))

@emilyriederer
Copy link
Owner

@emilyriederer
Copy link
Owner

emilyriederer commented Oct 2, 2021

Hey @mjbroerman ! This should be fixed now with the latest dev version of pointblank

There's a secondary issue about filepath escaping depending on your OS. If you wish to keep using the version of convo you installed, try this:

library(convo)
filepath <- system.file("", "ex-convo.yml", package = "convo")
convo <- read_convo(filepath)
tmp <- gsub("\\\\", "/", tempdir())
write_pb(convo, c("IND_A", "AMT_B"), filename = "convo-validation.yml", path = tmp)

Otherwise, try reinstalling convo and then the example precisely as it's in the guide should work again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants