[#124] Allow indexing rules to be invoked manually (main) #141

korydraughn · 2024-02-22T21:41:43Z

Still need to implement tests.

I've verified that full text indexing on data objects works. I'm pretty sure the other rules work since they were lifted directly from exec_rule_expression.

The rules can only be invoked via the main indexing plugin, not the elasticsearch plugin.

Some questions ...

Who should be allowed to invoke the indexing rules directly?
- My immediate thought is only rodsadmins.
- Is there a use case for non-rodsadmin users to be able to invoke the rules?

Should a subset or all indexing rules be exposed?
- I can't think of a reason to not expose them yet. Giving rodsadmins the ability to invoke rules allows them to get out of bad situations, perhaps without having to go to the indexing service directly.

As implemented, the rules fire as the user who invoked them. Should any rules result in changing the identity of the RsComm to the user who owns the object(s)? Consider full text indexing being invoked on a data object which the rodsadmin doesn't have permissions on. I'll test this to find out what happens.

Thoughts?

korydraughn · 2024-02-22T21:54:15Z

Attempting to do full text indexing on a data object without appropriate permissions results in a CAT_NO_ACCESS_PERMISSION. So, we have some more work to do.

korydraughn · 2024-02-22T22:24:31Z

Attempting to do full text indexing on a data object without appropriate permissions results in a CAT_NO_ACCESS_PERMISSION. So, we have some more work to do.

To add to that, this is for the case where the rule is invoked as rods (a rodsadmin) on another user's data object via irule.

trel · 2024-02-22T23:33:18Z

Who should be allowed to invoke the indexing rules directly?

Yes, rodsadmins-only seems good

Should a subset or all indexing rules be exposed?

Agreed, 'all' seems good

korydraughn · 2024-02-23T00:01:17Z

As implemented, the rules fire as the user who invoked them. Should any rules result in changing the identity of the RsComm to the user who owns the object(s)? Consider full text indexing being invoked on a data object which the rodsadmin doesn't have permissions on. I'll test this to find out what happens.

I've added logic that allows rodsadmins to invoke full text indexing on other users' data objects without needing explicit permission.

I've confirmed the drop to the C API works as intended. This was done manually using irule.
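
The behavior agreed on so far (rodsadmins only, with an admin path for objects the rodsadmin has no permissions on) can be sketched as a small decision function. This is only an illustration under stated assumptions: `user_type`, `invocation`, `decision`, and `decide` are hypothetical names, not types or functions from the plugin or from iRODS.

```cpp
// Hypothetical types for illustration; none of these names come from the plugin.
enum class user_type { rodsuser, rodsadmin };

enum class decision { deny, run_as_invoker, run_in_admin_mode };

struct invocation
{
    user_type invoker;            // who called the rule (e.g. via irule)
    bool invoker_can_read_object; // does the invoker hold permissions on the target?
};

// Assumed policy from the thread: only rodsadmins may invoke indexing rules
// directly, and a rodsadmin lacking permissions falls back to an admin path
// instead of failing with CAT_NO_ACCESS_PERMISSION.
constexpr decision decide(const invocation& inv)
{
    if (inv.invoker != user_type::rodsadmin) {
        return decision::deny;
    }

    return inv.invoker_can_read_object ? decision::run_as_invoker
                                       : decision::run_in_admin_mode;
}
```

The sketch keeps the "who may call" check separate from the "which identity or mode is used" choice, which matches how the two questions were raised separately above.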

korydraughn · 2024-02-23T00:08:03Z

libirods_rule_engine_plugin-elasticsearch.cpp

+					// Take the max to avoid passing an integer that's less than zero to the
+					// string_view constructor.
+					{"data", std::string_view(buffer.data(), std::max(0, bytes_read))}


Add a statement explaining how bytes_read can result in an integer less than zero.
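
For context on why the clamp matters, here is a minimal sketch. It assumes the read operation reports failure with a negative count; `fake_read` and `view_of` are hypothetical stand-ins, not the plugin's code. The length parameter of the `std::string_view` constructor is unsigned, so a negative `int` would be converted to a very large length.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a read call that reports failure with a negative
// count. The real read operation used by the plugin is not reproduced here.
int fake_read(char* dst, int capacity, bool fail)
{
    if (fail) {
        return -1;
    }

    constexpr std::string_view payload = "hello";
    const int n = std::min(capacity, static_cast<int>(payload.size()));
    std::copy_n(payload.data(), n, dst);
    return n;
}

// Clamp the count to zero before it is converted to std::size_t, so a failed
// read yields an empty view instead of an enormous length.
std::string_view view_of(const char* buf, int bytes_read)
{
    return std::string_view(buf, static_cast<std::size_t>(std::max(0, bytes_read)));
}
```

With this shape, a failed read produces an empty view, which is also why it is worth deciding separately whether an empty payload should be indexed at all.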

korydraughn · 2024-02-23T00:10:53Z

libirods_rule_engine_plugin-elasticsearch.cpp

-#define IRODS_IO_TRANSPORT_ENABLE_SERVER_SIDE_API
-#include <irods/dstream.hpp>
-#include <irods/transport/default_transport.hpp>
+#ifdef IRODS_HAS_FEATURE_ADMIN_MODE_FOR_DSTREAM_LIBRARIES


This feature test macro name is just a placeholder until irods/irods#7530 is resolved.

This PR isn't blocked by that work. Once the irods/irods issue is resolved, these preprocessor macros can be updated to match the real feature test macro.
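
A minimal sketch of how a feature test macro like this typically gates a code path. The macro name is the placeholder from the diff above. The fallback to the C API when the macro is undefined is an assumption based on the earlier comment about dropping to the C API, and `dstream_admin_mode_available` and `read_path` are hypothetical names.

```cpp
#include <string_view>

// Placeholder macro name taken from the diff; the final name depends on how
// irods/irods#7530 is resolved.
#ifdef IRODS_HAS_FEATURE_ADMIN_MODE_FOR_DSTREAM_LIBRARIES
inline constexpr bool dstream_admin_mode_available = true;
#else
inline constexpr bool dstream_admin_mode_available = false;
#endif

// Hypothetical selector: reports which read path would be compiled in.
// Assumption: without the feature, the code falls back to the C API.
constexpr std::string_view read_path()
{
    return dstream_admin_mode_available ? "dstream" : "c_api";
}
```

Keeping the macro check in one place like this would make the later rename a one-line change.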

korydraughn · 2024-02-23T00:13:37Z

libirods_rule_engine_plugin-indexing.cpp

+			// irule <text>
+			if (rule_text.find("@external rule {") != std::string_view::npos) {
+				const auto start = rule_text.find_first_of('{') + 1;
+				const auto end = rule_text.rfind(" }");
+
+				if (end == std::string_view::npos) {
+					auto msg = fmt::format("Received malformed rule text. "
+					                       "Expected closing curly brace following rule text [{}].",
+					                       rule_text);
+					log_re::error(msg);
+					return ERROR(SYS_INVALID_INPUT_PARAM, std::move(msg));
+				}
+
+				rule_text = rule_text.substr(start, end - start);
+			}
+			// irule -F <script>
+			else if (const auto external_pos = rule_text.find("@external\n"); external_pos != std::string_view::npos) {
+				// If there are opening and closing curly braces following the "@external\n" prefix, then we
+				// can assume that the rule text most likely represents a JSON string.
+				if (const auto start = rule_text.find_first_of('{'); start != std::string_view::npos) {
+					const auto end = rule_text.rfind(" }");
+
+					if (end == std::string_view::npos) {
+						auto msg = fmt::format("Received malformed rule text. "
+						                       "Expected closing curly brace following rule text [{}].",
+						                       rule_text);
+						log_re::error(msg);
+						return ERROR(SYS_INVALID_INPUT_PARAM, std::move(msg));
+					}
+
+					rule_text = rule_text.substr(start, end - start);
+				}
+				// Otherwise, the rule text must represent something else. In this case, simply strip the
+				// "@external\n" prefix from the rule text and let the JSON parser throw an exception if the
+				// rule text cannot be parsed. This allows the REP to fail without causing the agent to crash.
+				else {
+					rule_text = rule_text.substr(external_pos + 10);
+				}
 			}


I've used this code in two plugins now. It probably needs to be provided by the irods-dev package so we avoid copying it everywhere.

please make an issue - that seems good.

I'm starting to think maybe this should be a documentation exercise. That code makes a few assumptions about the input, so it isn't suitable for general-purpose use.

Will think on it a little more.
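
If the logic were ever factored out, a standalone helper might look roughly like the sketch below. `strip_external_wrapper` is a hypothetical name, and this is a simplification of the diff rather than a copy of it: it handles the `irule <text>` wrapper and the bare `@external\n` prefix, and it leaves out the brace-handling branch of the `irule -F` case. It also keeps the same assumption as the diff that the wrapper closes with `" }"` (a space followed by a brace).

```cpp
#include <optional>
#include <string_view>

// Hypothetical helper; returns std::nullopt when the text is malformed so the
// caller can log and return an error instead of crashing the agent.
std::optional<std::string_view> strip_external_wrapper(std::string_view rule_text)
{
    constexpr std::string_view inline_prefix = "@external rule {";
    constexpr std::string_view file_prefix = "@external\n";

    // irule <text>: keep what sits between the wrapper's opening brace and
    // the last " }" in the text.
    if (const auto pos = rule_text.find(inline_prefix); pos != std::string_view::npos) {
        const auto start = pos + inline_prefix.size();
        const auto end = rule_text.rfind(" }");

        if (end == std::string_view::npos || end < start) {
            return std::nullopt;
        }

        return rule_text.substr(start, end - start);
    }

    // irule -F <script>: strip the prefix and let the JSON parser judge the rest.
    if (const auto pos = rule_text.find(file_prefix); pos != std::string_view::npos) {
        return rule_text.substr(pos + file_prefix.size());
    }

    // No wrapper detected; hand the text back unchanged.
    return rule_text;
}
```

Returning `std::optional` keeps logging and the choice of error code (`SYS_INVALID_INPUT_PARAM` in the diff) with the caller, which is one way to make the helper less tied to a single plugin.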

korydraughn · 2024-02-23T00:38:55Z

libirods_rule_engine_plugin-indexing.cpp


-				json delay_obj;
-				delay_obj["rule-engine-operation"] = irods::indexing::policy::indexing;
+			if (irods::indexing::policy::object::index == op) {


irods::indexing::policy::object::index expands to the string, "irods_policy_indexing_object_index".

This is the rule name that must be used to do full text indexing of a single data object. It shares the same name as the rules which are fired as a result of triggering PEPs.

You can see the rest of the rule names here.

irods_capability_indexing/configuration.hpp

Lines 189 to 205 in 3f3529d

namespace object
{
static const std::string index{"irods_policy_indexing_object_index"};
static const std::string purge{"irods_policy_indexing_object_purge"};
} // namespace object

namespace metadata
{
static const std::string index{"irods_policy_indexing_metadata_index"};
static const std::string purge{"irods_policy_indexing_metadata_purge"};
} // namespace metadata

namespace collection
{
static const std::string index{"irods_policy_indexing_collection_index"};
static const std::string purge{"irods_policy_indexing_collection_purge"};
} // namespace collection

Are those the rule names we want admins to use or do we want to change them for manual execution contexts?

I can see value in the names staying as they are. It makes it easy for admins to know what happens because they will start to remember the names. However, admins may not be able to distinguish who/what invoked the rule.

All of that to say, perhaps the names should be changed to something like ...

indexing_index_data_object

indexing_index_collection

indexing_purge_data_object

Note: This PR doesn't support invoking indexing rules via the NREP yet. It should be doable, but that may require changing the rule names for correct behavior.
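
One way to get both properties (PEP-triggered names stay as they are, manual invocations are distinguishable) would be a thin alias layer. The sketch below is hypothetical and not part of this PR: `manual_rule_aliases` and `resolve_manual_rule` are made-up names, and the pairing of each proposed name with an existing rule name is an assumption read off the names themselves.

```cpp
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical alias table: proposed manual-invocation names on the left,
// existing policy rule names on the right. The pairing is assumed.
inline constexpr std::array<std::pair<std::string_view, std::string_view>, 3> manual_rule_aliases{{
    {"indexing_index_data_object", "irods_policy_indexing_object_index"},
    {"indexing_index_collection", "irods_policy_indexing_collection_index"},
    {"indexing_purge_data_object", "irods_policy_indexing_object_purge"},
}};

// Returns the existing policy rule name for a manual alias, or std::nullopt
// if the name is not a known alias.
inline std::optional<std::string_view> resolve_manual_rule(std::string_view name)
{
    for (const auto& [alias, policy] : manual_rule_aliases) {
        if (alias == name) {
            return policy;
        }
    }

    return std::nullopt;
}
```

With a table like this, a log line could record the alias that was used, which would show that an admin invoked the rule rather than a PEP.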

don't feel too strongly either way yet. consistency across our plugins is where i think i'd find the most value/good.

also noting that it's interesting our namespacing is not in the same order as the rule names...
'policy' and 'indexing'... switched places...

I agree on the consistency thing.

As for the namespacing, that's not surprising to me. If the C++ code used irods::policy, the possibility of symbol collision rises since other plugins would likely follow suit and define things in the irods::policy namespace.
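
To illustrate the collision point, here is a small sketch. The first namespace matches the one quoted above from configuration.hpp. The second, `irods::example_plugin`, and its rule name are hypothetical and stand in for any other plugin that defines its own `policy::object::index`.

```cpp
#include <string>

// From this plugin (see the configuration.hpp excerpt above).
namespace irods::indexing::policy::object
{
    static const std::string index{"irods_policy_indexing_object_index"};
} // namespace irods::indexing::policy::object

// Hypothetical second plugin. Because its symbols are nested under its own
// plugin namespace, the identical identifier "index" does not clash with the
// one above. Both living in irods::policy::object would be a redefinition.
namespace irods::example_plugin::policy::object
{
    static const std::string index{"irods_policy_example_object_index"};
} // namespace irods::example_plugin::policy::object
```

The rule name strings live in a single flat namespace on the server, so the plugin name has to appear inside the string itself, whatever order the C++ namespaces use.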

I don't know that "policy" is a term that's needed in the rule names since everything iRODS does is about policy.

i think that convention started with policy composition... and hasn't really been codified/hardened yet. TBD...


korydraughn added 2 commits February 22, 2024 16:35

[124] Allow indexing rules to be invoked manually.

1939a44

squash. clang-format

06155c7

korydraughn mentioned this pull request Feb 22, 2024

Add support for the ADMIN_KW to the dstream library irods/irods#7530

Open

2 tasks

korydraughn added 2 commits February 22, 2024 18:58

squash. admin mode for full text indexing via irule

1a5db05

squash. clang-format

f5ab547


korydraughn added 5 commits February 25, 2024 09:48

squash. bytes_read < 0 explanation.

75675fe

Should we do something different if the read op fails for full text indexing?

10ea1ab

(WIP) investigating why the data is not getting indexed when invoked through the NREP.

e9383bf

squash. wip - tests

eaab095

tests

2f8619e

SwooshyCueb mentioned this pull request Mar 12, 2024

[irods/irods#7265] Minor reorganization + CMake TLC #155

Merged
