This repository has been archived by the owner on Mar 23, 2023. It is now read-only.

Cube Dependency #39

Open
mtotheikle opened this issue Oct 8, 2014 · 26 comments

Comments

@mtotheikle

Cube seems to be getting very little development now, and since the NPM module "websocket-server" has been unpublished, I can no longer install this project because Cube fails to install.

Has anyone else had this problem recently? Any plans to move away from Cube?

See square/cube#149 for more information.

@pwhelan

pwhelan commented Oct 8, 2014

I was able to install websocket-server by hand. Just git clone the https://github.com/miksago/node-websocket-server repository into node_modules (in cube).

Given the current situation with cube I certainly agree with the idea of moving away from it. Any ideas on what to replace it with? Off the top of my head the only thing I could think of is statsd + ratchetphp to handle websockets.

@wa0x6e
Owner

wa0x6e commented Oct 14, 2014

The reason I went with Cube was not really the websocket, but all the data aggregation done behind it. I'm not really familiar with statsd, but the replacement should be backward compatible, so that the existing database is kept.

@pwhelan

pwhelan commented Oct 22, 2014

StatsD does not have its own native datastore but instead uses several different backends. The best candidate for ResqueBoard is the MongoDB backend: https://github.com/dynmeth/mongo-statsd-backend.

@wa0x6e
Owner

wa0x6e commented Oct 22, 2014

Do you know if statsd can also compute metrics, like the number of events occurring between two dates?

@pwhelan

pwhelan commented Oct 23, 2014

Do you know if statsd can also compute metrics, like the number of events occurring between two dates?

I am pretty sure the mongo schema supports this. Any chance you could point me to where these queries are? I can take a quick look and figure out which backend(s) would work.

@wa0x6e
Owner

wa0x6e commented Oct 26, 2014

I haven't mastered node.js enough to understand how Cube works, but from what I saw, it uses the JavaScript part of mongodb to run functions and compute the metrics. Also, ResqueBoard does not really query mongodb directly; it uses the Cube API for most of the jobs.

There are 2 things ResqueBoard expects from the cube database:

  • A database of events: what happens and when it happens (so we can fetch a list of jobs, sorted by time). I think that's how statsd works, so no surprise there. With Cube, these events are stored in a [EVENT_NAME]_events collection, and each entry looks like:
{
  "_id": ObjectId("503c4a613bab703704000148"),
  "d": {
    "worker": "KAMISAMA-MAC.local:987",
    "level": NumberInt(200)
  },
  "t": ISODate("1970-01-01T12:33:32.0Z")
}
  • A database of metrics. When fetching the number of jobs occurring within a timeframe, ResqueBoard does not query mongodb with a complex find() and a lot of filters; it just queries a [EVENT_NAME]_metrics collection, where all metrics are already computed offline. Each entry looks like:
{
  "_id": {
    "e": "sum(got)",
    "l": NumberInt(86400000),
    "t": ISODate("2013-04-22T00:00:00.0Z")
  },
  "i": false,
  "v": 25609
}

e: name of the metric => sum of all the got events
l: timeframe => 86400000 ms = 1 day
t: time of the metric
v: value of the metric

So it reads: there were 25609 got events in the 24 hours preceding 2013-04-22T00:00:00.0Z.

All this data is computed offline, so nothing is computed when queried via the Cube API. These metrics are where Cube really shines, and there's a metric for each timeframe (day, hour, min).

The Cube replacement should have a similar structure, where most of the metrics are computed offline, and computed fast enough to be synced with the realtime events.
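
For illustration, the precomputed metrics described above can be sketched in Python. This is only a minimal sketch under stated assumptions, not Cube's actual implementation: `rollup_counts` and `bucket_start` are hypothetical names, events are plain dicts carrying a timezone-aware `t`, and the sketch assumes `t` in a metric marks the start of its bucket, which may differ from how Cube anchors it.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DAY_MS = 86_400_000  # same value as the "l" field in the example above


def bucket_start(t, tier_ms):
    """Floor a timezone-aware datetime to the start of its tier_ms-wide bucket."""
    elapsed_ms = (t - EPOCH) // timedelta(milliseconds=1)
    return EPOCH + timedelta(milliseconds=elapsed_ms - elapsed_ms % tier_ms)


def rollup_counts(events, event_name, tier_ms=DAY_MS):
    """Count events per bucket and emit documents shaped like the metrics example.

    Assumption: "t" is the bucket start. The "i" field is copied from the
    example above; its meaning is not explained in this thread.
    """
    counts = defaultdict(int)
    for event in events:
        counts[bucket_start(event["t"], tier_ms)] += 1
    return [
        {
            "_id": {"e": f"sum({event_name})", "l": tier_ms, "t": start},
            "i": False,
            "v": count,
        }
        for start, count in sorted(counts.items())
    ]
```

Running such a rollup whenever new events arrive would keep reads cheap, since a query for a timeframe becomes a lookup of ready-made documents instead of a scan over raw events.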

@wa0x6e
Owner

wa0x6e commented Oct 28, 2014

Seems like statsD can provide the type of granularity needed. But its mongodb structure differs from Cube's, so migrating to statsd will mean losing all of cube's data.

@pwhelan

pwhelan commented Nov 13, 2014

@Kamisama

Seems like statsD can provide the type of granularity needed. But its mongodb structure differs from Cube's, so migrating to statsd will mean losing all of cube's data.

I will attempt to create a branch of ResqueBoard that uses the StatsD MongoDB schema. I think getting the websocket functionality that cube has will be the hardest part. Long polling might be an option if we assume that there will not be many clients using the interface.

Once it seems to be working I could try and provide a migration script that translates Cube's metrics to the StatsD schema.

@pwhelan

pwhelan commented Nov 13, 2014

It seems like Graphite might be another option: http://graphite.readthedocs.org/en/1.0/url-api.html#graphing-metrics

@wa0x6e
Owner

wa0x6e commented Nov 14, 2014

Graphite seems to come with a visualization library, something unneeded in this case. statsD + another websocket framework seems a good combo. Since statsD already runs on node.js, the websocket framework should preferably also run under nodejs.

@pwhelan

pwhelan commented Nov 14, 2014

The advantage of Graphite is that it already handles aggregation of time series data like Cube does, but yes, it does include a lot of other components that are unnecessary.

@wa0x6e
Owner

wa0x6e commented Nov 14, 2014

From what I learned from the statsD documentation, it also seems to do data aggregation.

@pwhelan

pwhelan commented Feb 27, 2015

StatsD does aggregation; it's what it does. But what it does not do is log errors or text in any form. These are the data types it supports: https://github.com/etsy/statsd/blob/master/docs/metric_types.md.

The actual failure counts, success counts, memory usage, cpu usage and even the amount of time each job takes can be recorded though.

@pwhelan

pwhelan commented Feb 28, 2015

I have a branch that works with Monolog's MongoDB handler instead of Cube's: https://github.com/pwhelan/ResqueBoard/tree/nocube.

Monolog's MongoDB log format (if tweaked correctly) is almost exactly the same as the Cube format except for two things:

  • All the data is under 'context' instead of 'd'.
  • It uses a single collection for all events where 'context.type' defines the type of event.
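
The two differences listed above can be sketched as a small conversion in Python. This is a rough sketch based only on what is stated here: the function names are hypothetical, documents are plain dicts, and every field other than 'd' is assumed to carry over unchanged, which the real handler may not do.

```python
def event_type_from_collection(collection_name):
    """Derive the event type from a Cube collection name such as "got_events"."""
    suffix = "_events"
    if not collection_name.endswith(suffix):
        raise ValueError(f"not a Cube events collection: {collection_name!r}")
    return collection_name[: -len(suffix)]


def cube_event_to_monolog(event, event_type):
    """Return a copy of a Cube event with 'd' moved under 'context'.

    The event type, which Cube encodes in the collection name, is stored
    in context['type'] so that one collection can hold every event.
    Note: an existing 'type' key inside 'd' would be overwritten.
    """
    context = dict(event.get("d", {}))
    context["type"] = event_type
    converted = {key: value for key, value in event.items() if key != "d"}
    converted["context"] = context
    return converted
```

For example, a document read from the got_events collection would be converted with `cube_event_to_monolog(doc, event_type_from_collection("got_events"))`.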

Statistics will have to be handled separately for this branch. It does require changing the formatter for the Monolog MongoDB connection to have an infinite nesting level. It also does not represent the final direction I want to take.

Here was my general idea for the future of this branch:

  • Abstract or hide the difference in the schema between Cube versus plain Monolog (MongoDB).
  • Make the source for stats configurable between either Cube (decided by the Mongo config...) or MongoDB (backed by StatsD).
  • Use Long-Polling for the realtime stats (and query them in PHP either from Cube or StatsD).

Using Long-Polling should make it possible to avoid a websocket server and also make it compatible with most browsers. Performance is important, but I think taking a bit of a hit on the connection should be fine for admins. I am assuming here that most systems will only have a few admins using RB for monitoring.
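
To make the long-polling idea concrete, here is a minimal Python sketch of the server-side wait. It is not the planned PHP implementation: `StatsFeed`, `publish`, and `poll` are hypothetical names, and a real endpoint would wrap `poll` in an HTTP handler.

```python
import threading


class StatsFeed:
    """Holds the latest stats and lets pollers block until something newer arrives."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._latest = None

    def publish(self, stats):
        """Store new stats and wake up every waiting poller."""
        with self._cond:
            self._version += 1
            self._latest = stats
            self._cond.notify_all()

    def poll(self, last_seen, timeout=25.0):
        """Block until a version newer than last_seen exists or timeout expires.

        Returns (version, stats); stats is None when nothing new arrived,
        in which case the client simply polls again.
        """
        with self._cond:
            self._cond.wait_for(lambda: self._version > last_seen, timeout=timeout)
            if self._version > last_seen:
                return self._version, self._latest
            return self._version, None
```

Each client keeps the version it last saw and sends it with the next request, so a request returns immediately when data is already waiting and otherwise holds the connection open until the timeout.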

I was also planning to make stats optional (enabled only when a separate Mongo connection/collection for stats is defined or when cube is used). I am in no way thinking of abandoning this feature though. I also plan to add cpu usage statistics, using gettimeofday/posix_times to calculate them and then submitting them to statsd.
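
As a rough illustration of the cpu usage idea, the same calculation can be sketched in Python, with `time.time()` and `os.times()` standing in for PHP's gettimeofday/posix_times. The function names, the metric name, and the StatsD address are assumptions for this sketch; the gauge line uses StatsD's `name:value|g` format.

```python
import os
import socket
import time


def cpu_snapshot():
    """Return (wall clock seconds, cpu seconds used so far by this process)."""
    times = os.times()
    return time.time(), times.user + times.system


def cpu_percent(before, after):
    """CPU usage between two snapshots, as a percentage of elapsed wall time."""
    wall = after[0] - before[0]
    if wall <= 0:
        return 0.0
    return 100.0 * (after[1] - before[1]) / wall


def statsd_gauge(name, value):
    """Format a StatsD gauge line."""
    return f"{name}:{value:.2f}|g"


def send_to_statsd(line, host="127.0.0.1", port=8125):
    """Send one metric line over UDP (host and port are assumed defaults)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

A worker would take one snapshot before a job and one after it, then call `send_to_statsd(statsd_gauge("resque.worker.cpu", cpu_percent(before, after)))`, where the metric name is only a placeholder.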

I'll start working on my branch. Feel free to comment or suggest anything.

pwhelan added a commit to pwhelan/ResqueBoard that referenced this issue Feb 28, 2015
Add new ResqueStat wrapper classes for Mongo queries, refactor most Cube
Mongo queries to use it.
@HMAZonderland

Bump @Kamisama!

@wa0x6e
Owner

wa0x6e commented Mar 28, 2015

Seems interesting, I'm looking forward to it.

@pwhelan

pwhelan commented Mar 29, 2015

I have it working at the moment without the logs tab and without statistics. The major snags are:

  • The Monolog Init library needs to be patched so that the MongoHandler uses a fully nested schema (otherwise it serializes the data into JSON at about the third nesting level).
  • Monolog 1.12+ needs to be installed; I believe 1.5 is the lowest version right now.
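
The nesting problem in the first item can be illustrated with a small Python sketch. This is not Monolog's code: `normalize` and `max_depth` are hypothetical names, and the behavior simply mirrors the description above, where anything deeper than the limit is stored as a JSON string instead of a nested document.

```python
import json


def normalize(value, max_depth=2, depth=1):
    """Copy nested dicts, JSON-encoding any dict deeper than max_depth.

    max_depth=None means unlimited nesting, which is the behavior the
    patch described above is after. Lists are left untouched to keep
    the sketch short.
    """
    if isinstance(value, dict):
        if max_depth is not None and depth > max_depth:
            return json.dumps(value, sort_keys=True)
        return {key: normalize(item, max_depth, depth + 1) for key, item in value.items()}
    return value
```

With a limit in place, a query on a deeply nested field cannot match, because that part of the document is a string; with `max_depth=None` the record is stored as-is.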

Next weekend I can finalize my branch, without the advanced statistics. I'd like that to be finalized before tackling the advanced stuff. I have been using it successfully at work on our test servers.

@Sieberkev

I was able to install websocket-server by hand. Just git clone the https://github.com/miksago/node-websocket-server repository into node_modules (in cube).

How exactly did you do this? After git cloning this repository in cube/node_modules, I still get the same error on "npm install"...

edit on 17/04/2015

Someone suggested that I remove websocket-server from the package.json file after installing it manually, but then another error happens:

...
npm http 200 https://registry.npmjs.org/wordwrap/-/wordwrap-0.0.2.tgz
npm ERR! Error: shasum check failed for /root/tmp/npm-4389-jKgT7sg0/1429259818152-0.3462741563562304/tmp.tgz
npm ERR! Expected: 500d26d883ddc8e02f2c88011627636111c105c5
npm ERR! Actual: 72b0e88de3feeb269db2effe14e95751b031ab04
npm ERR! at /usr/local/node/lib/node_modules/npm/node_modules/sha/index.js:38:8
...

The same error remains after replacing "websocket-server": "1.4.04" with "node-websocket-server": "1.1.4" (0.0.1 fails) in the package.json, as suggested as a possible fix in https://github.com/square/cube/pull/149/files

Bummer :(

@maxcanna

Try this:
cd node_modules/
git clone https://github.com/miksago/node-websocket-server
mv node-websocket-server websocket-server
cd ..
npm install cube

@Sieberkev

Thanks a lot for the pointer, @maxcanna!

After battling a lot of new errors (corrupted downloads and one runtime error), I got it up and running by using the most recent version of NodeJS (instead of the one specified in an outdated installation script I was using as reference).

@lucups

lucups commented Nov 13, 2015

@pwhelan
@maxcanna
@Sieberkev
Thanks very much!

@pwhelan

pwhelan commented May 1, 2016

awesome!

@pwhelan

pwhelan commented May 1, 2016

I'll push what I have when I can. Nothing out of the ordinary though.

@Techbrunch

@pwhelan Any update?

@pwhelan

pwhelan commented Jul 26, 2016

@Techbrunch most of my work has been implemented in https://github.com/pwhelan/ResqueBoard/tree/cft. Feel free to test it out. Note that you must configure fresque or your resque workers to log to mongo with full recursion (otherwise resque-ex/monolog only logs objects that have 2 levels of nesting).

@pwhelan

pwhelan commented Jul 26, 2016

I also did a lot of work to use the fresque.ini configuration directly instead of the config in RB.


8 participants