Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement progress handler interface #458

Open
wants to merge 23 commits into
base: main
Choose a base branch
from

Conversation

fractaledmind
Copy link
Contributor

This adds support for the progress_handler SQLite interface. Such a callback is useful as it allows for statement timeouts to be defined. For example, we could define a statement_timeout= method on the database instance like so:

def statement_timeout=( milliseconds )
  timeout_seconds = milliseconds.fdiv(1000)

  progress_handler do
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    if @statement_timeout_deadline.nil?
      @statement_timeout_deadline = now + timeout_seconds
    elsif now > @statement_timeout_deadline
      next false
    else
      true
    end
  end
end

Note: I don't know C, so this PR was written using pattern matching (primarily on the rb_sqlite3_busy_handler and busy_handler methods), some Googling (for how to get the correct format for rb_scan_args), and ChatGPT (to explain what all of this C code was doing).

@oldmoe
Copy link

oldmoe commented Jan 7, 2024

+1 to adding the progress handler, maybe we need to make sure this doesn't explode if sqlite is omitting the progress handler in the compilation process

@fractaledmind
Copy link
Contributor Author

+1 to adding the progress handler, maybe we need to make sure this doesn't explode if sqlite is omitting the progress handler in the compilation process

@oldmoe: Does c0d5144 resolve this issue do you think?

ext/sqlite3/database.c Outdated Show resolved Hide resolved
casperisfine pushed a commit to casperisfine/sqlite3-ruby that referenced this pull request Jan 9, 2024
Previous discussion in sparklemotion#458

Storing `VALUE self` as context for `rb_sqlite3_busy_handler` is unsafe
because as of Ruby 2.7, the GC compactor may move objects around
which can lead to this reference pointing to either another random
object or to garbage.

Instead we can store the callback reference inside the malloced
struct (`sqlite3Ruby`) which can't possibly move, and then inside
the handler, get the callback reference from that struct.

This however requires to define a mark function for the database
object, and while I was at it, I implemented compaction support
for it so we don't pin that proc.
@fractaledmind
Copy link
Contributor Author

Clearly didn't get it quite right with af1da42. Will study and try again.

@byroot
Copy link
Contributor

byroot commented Jan 9, 2024

You should start by rebasing on my PR, you'd have the new functions lined up etc.

@fractaledmind
Copy link
Contributor Author

I had a stray line in the allocate method. Removing that brings me to the same problems with TruffleRuby. Whenever you figure out how to solve that in your branch, I will rebase and bring that fix in then.

@byroot
Copy link
Contributor

byroot commented Jan 9, 2024

Removing that brings me to the same problems with TruffleRuby. Whenever you figure out how to solve that in your branch

Ah, I didn't notice there was an issue with TruffleRuby, seems pretty easy to fix though:

xternal LLVMFunction rb_gc_mark_movable cannot be found.

Need to test for that function. I'll also report it to Truffle, as it should be trivial for them to add it.

byroot added a commit to byroot/sqlite3-ruby that referenced this pull request Jan 9, 2024
Previous discussion in sparklemotion#458

Storing `VALUE self` as context for `rb_sqlite3_busy_handler` is unsafe
because as of Ruby 2.7, the GC compactor may move objects around
which can lead to this reference pointing to either another random
object or to garbage.

Instead we can store the callback reference inside the malloced
struct (`sqlite3Ruby`) which can't possibly move, and then inside
the handler, get the callback reference from that struct.

This however requires to define a mark function for the database
object, and while I was at it, I implemented compaction support
for it so we don't pin that proc.
@fractaledmind
Copy link
Contributor Author

All green. Thanks so much for the guidance @byroot.

@fractaledmind
Copy link
Contributor Author

Never seen this error before, but it looks to be completely unrelated to this PR (from sqlcipher (windows-latest, mingw)):

NOTE: ruby_memcheck is not available in this environment: cannot load such file -- ruby_memcheck
D:/a/sqlite3-ruby/sqlite3-ruby/vendor/bundle/ruby/3.4.0+0/gems/minitest-5.20.0/lib/minitest.rb:3: warning: mutex_m was loaded from the standard library, but is not part of the default gems since Ruby 3.4.0. Add mutex_m to your Gemfile or gemspec.
D:/ruby-mingw/lib/ruby/3.4.0+0/bundled_gems.rb:74:in `require': cannot load such file -- mutex_m (LoadError)

Copy link
Contributor

@byroot byroot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm no owner of this gem, and my interest for sqlite3 is rather limited, but if the goal is to implement a statement timeout, I'm not offering a Proc interface is the best.

To call back into Ruby you need to hold the GVL. Right now it's not a problem because the gem calls sqlite3_step with the GVL held. However there is demand for releasing the GVL when executing a query. I don't know if it will happen, but such API would render such change impossible.

If the main envisioned usage is a statement timeout, then it's probably better to implement it all in C so the overhead is much smaller and that it doesn't need to hold the GVL.

ext/sqlite3/database.c Outdated Show resolved Hide resolved
@fractaledmind
Copy link
Contributor Author

So, I'm no owner of this gem, and my interest for sqlite3 is rather limited, but if the goal is to implement a statement timeout, I'm not offering a Proc interface is the best.

To call back into Ruby you need to hold the GVL. Right now it's not a problem because the gem calls sqlite3_step with the GVL held. However there is demand for releasing the GVL when executing a query. I don't know if it will happen, but such API would render such change impossible.

If the main envisioned usage is a statement timeout, then it's probably better to implement it all in C so the overhead is much smaller and that it doesn't need to hold the GVL.

My thought was to provide layers of exposure. The progress_handler is a feature of SQLite that application or library devs might use in any number of ways. The way that I want to use it, in the context of Rails applications, is to add a statement timeout. But I don't want to only provide a public interface for a statement timeout and keep the lower-level SQLite feature hidden. It is a public SQLite interface, so I think it should be a public sqlite3-ruby interface.

However, when we turn to the PR for the higher-level statement_timeout feature, I would love to implement that in C and not in Ruby. My draft PR implements it in Ruby because that was the implementation I could build in an hour at this stage in my journey to learning how to write C extension code that glues SQLite and Ruby together. But I'm down to challenge myself and see if I can't implement the statement_timeout feature in pure C for that PR. I do not believe that the implementation details of the statement_timeout feature should impact this PR, however, as I believe this low-level, public SQLite interface should be exposed to Ruby developers.

@fractaledmind
Copy link
Contributor Author

Another great use-case for a raw progress_handler is allowing apps/projects using a Fiber Scheduler to switch fibers during query execution

@fractaledmind
Copy link
Contributor Author

@tenderlove @flavorjones: I believe this is ready to go.

@fractaledmind
Copy link
Contributor Author

Just realized that in addition to switching fibers, this also unlocks the ability for apps to switch threads while queries are executing 👀

@fractaledmind
Copy link
Contributor Author

@flavorjones @tenderlove: Merged changes from main, resolved a couple Standard issues, and added a test demonstrating that the progress_handler can be used to improve concurrency. I'm still reading and thinking and experimenting on how best to take advantage of this in various contexts (primarily Rails), but I think that this could be a major feature which at least partially addresses issues like #287.

Also, just to add some additional info, Simon Willison of Datasette did some research on approximately how long it takes for SQLite to execute 1,000 VM steps (simonw/datasette#1679 (comment)) and the approximate answer is 0.1ms.

@fractaledmind
Copy link
Contributor Author

This is going to be a really nice feature when we merge it. I've been experimenting with using this to support better CPU saturation in multi-threaded environments and the results are quite nice.

@noteflakes
Copy link

@fractaledmind you might want to look at how this is done in Extralite. There are some nuances that are important to keep in mind. The basic problem is that the SQLite VM instruction counter is reset for each query, so depending on the N parameter passed to sqlite3_progress_handler you'll might not get any calls to your progress handler:

// register progress handler and invoke it every 10 VM instructions
sqlite3_progress_handler(db, 10, my_progress_handler, NULL);
// running the following query repeatedly will never invoke the progress handler
while (true) sqlite3_exec(db, "select 1", NULL, NULL, NULL);

On the other hand, if you set a N number too low you risk seriously hurting the performance of your queries, since your progress handler is called too often!

So what I did was to set the N parameter (which I call tick) passed to sqlite3_progress_handler to a low value, and then increment a counter in the C-based progress handler. When the counter reaches a certain threshold (which I call period), I invoke the Ruby progress handler:

// from the Extralite code:
int Database_progress_handler(void *ptr) {
  Database_t *db = (Database_t *)ptr;
  db->progress_handler.tick_count += db->progress_handler.tick;
  if (db->progress_handler.tick_count < db->progress_handler.period)
    goto done;

  db->progress_handler.tick_count -= db->progress_handler.period;
  rb_funcall(db->progress_handler.proc, ID_call, 0);
done:
  return 0;
}

Also, note that the progress handler above always returns 0, which tells SQLite that the query can continue. Interrupting a query from the progress handler can be done by raising an exception, or by calling Database#interrupt, which also causes an exception to be raised. I believe this design fits in better with the Ruby exception handling system.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants