Update nestcollectionsdriver.py #57

vsarathy1 · 2024-11-07T18:35:43Z

No description provided.

d-nagy

Hi @vsarathy1, looks pretty good overall. I had one concern about the regex substitution for making the transactions access the correct scope, and then this turned into a more substantial question about whether this is how we should deal with the transactions.

d-nagy · 2024-11-13T12:29:21Z

ch2driver/pytpcc/drivers/nestcollectionsdriver.py

+                    if self.schema == constants.CH2_DRIVER_SCHEMA["CH2P"]:
+                        stmt = re.sub("default:bench.ch2pp.", "default:bench.ch2p.", statement)


Do we not need the equivalent here for the original ch2 schema?

Also, in the regex pattern we should escape the .s so that we actually match literal . characters instead of anything.

i.e.

stmt = re.sub("default:bench\.ch2pp\.", "default:bench.ch2p.", statement)

An alternative approach to all of this could also be to supply a query_context when submitting N1QL queries (see https://docs.couchbase.com/cloud/n1ql/n1ql-manage/query-settings.html#query_context, and also https://docs.couchbase.com/cloud/n1ql/n1ql-language-reference/prepare.html#query-context for some info about query contexts for prepared statements)

With this approach, all the transaction statement definitions can just use not-fully-qualified table names like orders, and the query_context can be set to constants.CH2_NAMESPACE + ":" + constants.CH2_BUCKET + "." self.schema.

According to the second link above, prepared statements are also "stored relative to the current query context", which seems to mean that if we prepare queries while providing a specific query context like default:bench.ch2pp then we can prepare another query with the same name with a different query context and they won't interfere.

A prepared statement is created and stored relative to the current query context. You can create multiple prepared statements with the same name, each stored relative to a different query context. This enables you to run multiple instances of the same application against different datasets.

To execute a prepared statement, the query context must be the same as it was when the prepared statement was created; otherwise the prepared statement will not be found.

We have CH2_TXN_QUERIES for CH2, and CH2PP_TXN_QUERIES for both CH2P and CH2PP, so CH2 and CH2PP are taken care of correctly.
We handle it like this:
if self.schema == constants.CH2_DRIVER_SCHEMA["CH2"]:
txnQueries = CH2_TXN_QUERIES
else:
txnQueries = CH2PP_TXN_QUERIES
for txn, queries in txnQueries.items():
for query, statement in queries.items():
stmt = statement
if self.schema == constants.CH2_DRIVER_SCHEMA["CH2P"]:
stmt = re.sub("default:bench.ch2pp.", "default:bench.ch2p.", statement)

Instead of duplicating and adding another dictionary for CH2P, I replace ch2pp with ch2p.

Made the change to escape the .

Agreed. This was inherited from the nestcollectionsdriver written by the query service team for the transactions part, so that is the history of this.
Your suggestion to set the correct query context is fantastic, but I am not sure how to set this via the rest api. If you can find some way to do it, we can add it later, but for now, let us pass on this if ok with you.

d-nagy · 2024-11-13T12:33:29Z

ch2driver/pytpcc/drivers/nestcollectionsdriver.py

@@ -1094,34 +599,34 @@ def doDelivery(self, params):

        result = [ ]
        for d_id in range(1, constants.DISTRICTS_PER_WAREHOUSE+1):
-            rs1, status = runNQuery("begin", self.prepared_dict[ txn + "beginWork"],"",self.delivery_txtimeout, randomhost)
+            rs1, status = runNQuery("begin", self.prepared_dict[ self.schema + txn + "beginWork"],"",self.delivery_txtimeout, randomhost)


Instead of having self.schema + txn everywhere, we could just define txn = self.schema + "DELIVERY" initially.

d-nagy · 2024-11-13T12:37:24Z

ch2driver/pytpcc/drivers/nestcollectionsdriver.py

+                stTime = int(time.time() * 1000)
+                stTime = int(time.time_ns() / 1000000)


We only need one of these (same goes for eTime)

d-nagy · 2024-11-13T12:55:40Z

ch2driver/pytpcc/drivers/nestcollectionsdriver.py


                # In benchmark run mode, if the duration has elapsed, stop executing queries
                if duration is not None:
                    if start > endBenchmarkTime:
                        logging.debug("%s started at:   %s (started after the duration of the benchmark)" % (query_id_str, startTime))
                        break

-                logging.info("%s started at: %s" % (query_id_str, startTime))
+                logging.info("%s started at: %s %d" % (query_id_str, startTime, stTime))


What's the idea behind logging stTime? I presume the log line will look something like so:

11-12-2024 19:02:25 [runCH2Queries:1580] INFO : AClient 1:Loop 1:Q02: started at: 19:02:25 1731501546163

If we want to log the query start time with more precision, then we could do something like:

startTime = datetime.fromtimestamp(start).strftime("%H:%M:%S.%f")

Note that this does require from datetime import datetime.

This would make use log something like so:

11-12-2024 19:02:25 [runCH2Queries:1580] INFO : AClient 1:Loop 1:Q02: started at: 19:02:25.123456

All this applies to eTime too

good catch, this was some debugging I had put in for very short running queries while testing with very small data, and forgot to remove these.

Update nestcollectionsdriver.py

678ebcf

vsarathy1 requested a review from d-nagy November 7, 2024 18:35

d-nagy reviewed Nov 13, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update nestcollectionsdriver.py #57

Update nestcollectionsdriver.py #57

vsarathy1 commented Nov 7, 2024

d-nagy left a comment

d-nagy Nov 13, 2024

d-nagy Nov 13, 2024

d-nagy Nov 13, 2024

d-nagy Nov 13, 2024

vsarathy1 Nov 15, 2024

vsarathy1 Nov 15, 2024

vsarathy1 Nov 15, 2024

d-nagy Nov 13, 2024

vsarathy1 Nov 15, 2024

d-nagy Nov 13, 2024

d-nagy Nov 13, 2024

d-nagy Nov 13, 2024

vsarathy1 Nov 15, 2024

		if self.schema == constants.CH2_DRIVER_SCHEMA["CH2P"]:
		stmt = re.sub("default:bench.ch2pp.", "default:bench.ch2p.", statement)

		stTime = int(time.time() * 1000)
		stTime = int(time.time_ns() / 1000000)

Update nestcollectionsdriver.py #57

Are you sure you want to change the base?

Update nestcollectionsdriver.py #57

Conversation

vsarathy1 commented Nov 7, 2024

d-nagy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment