resume repairs automatically after errors #106

marcusb · 2015-06-29T10:01:10Z

The repair often ends up in the ERROR state if nodes are down or restarted. Sometimes the message is "Exception: null". After this happens, the repair must be resumed manually with spreaper. It would be preferable if it resumed automatically, perhaps after some delay.

rzvoncek · 2015-07-15T12:31:14Z

It looks like the "Exception: null" happened when one of the JMX calls failed (for mundane reasons).

#107 adds an extra check for this and also automatically resumes a run that is in the ERROR state.

Bj0rnen · 2015-07-15T13:24:33Z

We tweaked our approach to this. We agreed that ERROR should mean nothing other than "unrecoverable error", so we simply don't set the repair run to that state unless it hits a known unrecoverable error (a repair segment mismatch with the cluster topology is the only known one for now). Now we keep retrying if the run is hit by exceptions that we don't handle anywhere.

Hopefully that doesn't become a problem in and of itself. At least it's better than retrying when we already know that it's not going to work.
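
For illustration, the policy described above could be sketched roughly as follows. This is not the code from #107: every name here (`RepairRetrySketch`, `RunState`, `UnrecoverableRepairException`, `runWithRetry`) is hypothetical, and the attempt cap and fixed delay are assumptions added only to keep the example bounded.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch; none of these names come from the Reaper codebase. */
final class RepairRetrySketch {

    /** Simplified run states; ERROR is reserved for known unrecoverable failures. */
    enum RunState { RUNNING, DONE, ERROR }

    /** Stand-in for a known unrecoverable failure, e.g. segments not matching the cluster topology. */
    static final class UnrecoverableRepairException extends RuntimeException {
        UnrecoverableRepairException(String message) {
            super(message);
        }
    }

    /**
     * Runs one repair step. Only a known unrecoverable failure yields ERROR;
     * any other exception is treated as transient and retried after a delay.
     * If all attempts fail, the run stays RUNNING so it can be picked up again later.
     */
    static RunState runWithRetry(Callable<Void> step, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                step.call();
                return RunState.DONE;
            } catch (UnrecoverableRepairException e) {
                // Retrying cannot help here, so this is the only path to ERROR.
                return RunState.ERROR;
            } catch (Exception e) {
                // Unhandled exception (for example a failed remote call): wait, then retry.
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag and stop retrying without marking ERROR.
                        Thread.currentThread().interrupt();
                        return RunState.RUNNING;
                    }
                }
            }
        }
        return RunState.RUNNING;
    }
}
```

The point of the sketch is the asymmetry between the two `catch` branches: a failure that is known to be permanent ends the run, while everything else only delays it.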

rzvoncek mentioned this issue Jul 15, 2015

Zvo/resume errors #107

Merged

Bj0rnen pushed a commit that referenced this issue Jun 21, 2017

Symlink 'src/main/resource/assets' to 'reaper_ui/build' just like in …

d34b305

…original 'reaper_ui' so we can actually use webpack dev server and hot reload when working on UI. (#106)
