Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#124] Allow indexing rules to be invoked manually (main) #141

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

korydraughn
Copy link
Contributor

@korydraughn korydraughn commented Feb 22, 2024

Still need to implement tests.

I've verified at full text indexing on data objects works. I'm pretty sure the other rules work since they were lifted directly from exec_rule_expression.

The rules can only be invoked via the main indexing plugin, not the elasticsearch plugin.

Some questions ...

  • Who should be allowed to invoke the indexing rules directly?
    • My immediate thought is only rodsadmins.
    • Is there a use case for non-rodsadmin users to be able to invoke the rules?
  • Should a subset or all indexing rules be exposed?
    • I can't think of a reason to not expose them yet. Giving rodsadmins the ability to invoke rules allows them to get out of bad situations, perhaps without having to go to the indexing service directly.

As implemented, the rules fire as the user who invoked them. Should any rules result in changing the identity of the RsComm to the user who owns the object(s)? Consider full text indexing being invoked on a data object which the rodsadmin doesn't have permissions on. I'll test this to find out what happens.

Thoughts?

@korydraughn
Copy link
Contributor Author

Attempting to do full text indexing on a data object without appropriate permissions results in a CAT_NO_ACCESS_PERMISSION. So, we have some more work to do.

@korydraughn
Copy link
Contributor Author

Attempting to do full text indexing on a data object without appropriate permissions results in a CAT_NO_ACCESS_PERMISSION. So, we have some more work to do.

To add to that, this is for the case where the rule is invoked as rods (a rodsadmin) on another user's data object via irule.

@trel
Copy link
Member

trel commented Feb 22, 2024

Who should be allowed to invoke the indexing rules directly?

Yes, rodsadmins-only seems good

Should a subset or all indexing rules be exposed?

Agreed, 'all' seems good

@korydraughn
Copy link
Contributor Author

korydraughn commented Feb 23, 2024

As implemented, the rules fire as the user who invoked them. Should any rules result in changing the identity of the RsComm to the user who owns the object(s)? Consider full text indexing being invoked on a data object which the rodsadmin doesn't have permissions on. I'll test this to find out what happens.

I've added logic that allows rodsadmins to invoke full text indexing on other user's data objects without needing explicit permission.

I've confirmed the drop to the C API works as intended. This was done manually using irule.

Comment on lines 427 to 429
// Take the max to avoid passing an integer that's less than zero to the
// the string_view constructor.
{"data", std::string_view(buffer.data(), std::max(0, bytes_read))}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a statement explaining how bytes_read can result in an integer less than zero.

#define IRODS_IO_TRANSPORT_ENABLE_SERVER_SIDE_API
#include <irods/dstream.hpp>
#include <irods/transport/default_transport.hpp>
#ifdef IRODS_HAS_FEATURE_ADMIN_MODE_FOR_DSTREAM_LIBRARIES
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature test macro name is just a placeholder until irods/irods#7530 is resolved.

This PR isn't blocked by that work. Once the irods/irods issue is resolved, these preprocessor macros can be updated to match the real feature test macro.

Comment on lines +836 to 874
// irule <text>
if (rule_text.find("@external rule {") != std::string_view::npos) {
const auto start = rule_text.find_first_of('{') + 1;
const auto end = rule_text.rfind(" }");

if (end == std::string_view::npos) {
auto msg = fmt::format("Received malformed rule text. "
"Expected closing curly brace following rule text [{}].",
rule_text);
log_re::error(msg);
return ERROR(SYS_INVALID_INPUT_PARAM, std::move(msg));
}

rule_text = rule_text.substr(start, end - start);
}
// irule -F <script>
else if (const auto external_pos = rule_text.find("@external\n"); external_pos != std::string_view::npos) {
// If there are opening and closing curly braces following the "@external\n" prefix, then we
// can assume that the rule text most likely represents a JSON string.
if (const auto start = rule_text.find_first_of('{'); start != std::string_view::npos) {
const auto end = rule_text.rfind(" }");

if (end == std::string_view::npos) {
auto msg = fmt::format("Received malformed rule text. "
"Expected closing curly brace following rule text [{}].",
rule_text);
log_re::error(msg);
return ERROR(SYS_INVALID_INPUT_PARAM, std::move(msg));
}

rule_text = rule_text.substr(start, end - start);
}
// Otherwise, the rule text must represent something else. In this case, simply strip the
// "@external\n" prefix from the rule text and let the JSON parser throw an exception if the
// rule text cannot be parsed. This allows the REP to fail without causing the agent to crash.
else {
rule_text = rule_text.substr(external_pos + 10);
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used this code in two plugins now. It probably needs to be provided by the irods-dev package so we avoid copying it everywhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make an issue - that seems good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to think maybe this should be a documentation exercise. That code makes a few assumptions about the input which isn't code for general purpose use.

Will think on it a little more.


json delay_obj;
delay_obj["rule-engine-operation"] = irods::indexing::policy::indexing;
if (irods::indexing::policy::object::index == op) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

irods::indexing::policy::object::index expands to the string, "irods_policy_indexing_object_index".

This is the rule name that must be used to do full text indexing of a single data object. It shares the same name as the rules which are fired as a result of triggering PEPs.

You can see the rest of the rule names here.

namespace object
{
static const std::string index{"irods_policy_indexing_object_index"};
static const std::string purge{"irods_policy_indexing_object_purge"};
} // namespace object
namespace metadata
{
static const std::string index{"irods_policy_indexing_metadata_index"};
static const std::string purge{"irods_policy_indexing_metadata_purge"};
} // namespace metadata
namespace collection
{
static const std::string index{"irods_policy_indexing_collection_index"};
static const std::string purge{"irods_policy_indexing_collection_purge"};
} // namespace collection

Are those the rule names we want admins to use or do we want to change them for manual execution contexts?

I can see value in the names staying as they are. It makes it easy for admins to know what happens because they will start to remember the names. However, admins may not be able to distinguish who/what invoked the rule.

All of that to say, perhaps the names should be changed to something like ...

  • indexing_index_data_object
  • indexing_index_collection
  • indexing_purge_data_object

Note: This PR doesn't support invoking indexing rules via the NREP yet. It should be doable, but that may require changing the the rule names for correct behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't feel too strongly either way yet. consistency across our plugins is where i think i'd find the most value/good.

also noting that it's interesting our namespacing is not in the same order as the rule names...
'policy' and 'indexing'... switched places...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the consistency thing.

As for the namespacing, that's not surprising to me. If the C++ code used irods::policy, the possibility of symbol collision rises since other plugins would likely follow suit and define things in the irods::policy namespace.

I don't know that "policy" is a term that's needed in the rule names since everything iRODS does is about policy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that convention started with policy composition... and hasn't really been codified/hardened yet. TBD...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants