This repository has been archived by the owner on Jun 22, 2018. It is now read-only.
I've just had another look at the timings. This only occurs when I start my CPU-intensive app. I think the app is starving the Marathon/Mesos/ZooKeeper containers of resources, so they time out, and after a refresh they seem to struggle to reconnect.
I've found the issue. ZooKeeper is taking seconds to fsync its transaction log:
2016-07-18 08:33:18,848 [myid:] - WARN [SyncThread:0:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:0 took 2802ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
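If you want to check whether your own ZooKeeper logs show the same problem, a small scan for this warning is enough. The following is a minimal Python sketch that assumes the log lines look exactly like the one above (timestamp first, then the `took NNNms` text); the function name `slow_fsyncs` and its `threshold_ms` parameter are my own, not part of ZooKeeper.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumes the line format of the warning quoted above: "<date> <time> ... fsync-ing
# the write ahead log in <thread> took <N>ms ...". Other log layouts will not match.
FSYNC_WARNING = re.compile(
    r"^(?P<timestamp>\S+ \S+) .*fsync-ing the write ahead log in \S+ took (?P<ms>\d+)ms"
)


def slow_fsyncs(lines: Iterable[str], threshold_ms: int = 0) -> Iterator[Tuple[str, int]]:
    """Yield (timestamp, duration_ms) for every fsync warning at or above threshold_ms."""
    for line in lines:
        match = FSYNC_WARNING.search(line)
        if match:
            ms = int(match.group("ms"))
            if ms >= threshold_ms:
                yield match.group("timestamp"), ms


if __name__ == "__main__":
    import sys

    # Usage: python slow_fsyncs.py zookeeper.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for timestamp, ms in slow_fsyncs(log_file, threshold_ms=2000):
            print(f"{timestamp}: fsync took {ms} ms")
```

ZooKeeper itself only logs this warning when an fsync exceeds its `fsync.warningthresholdms` setting (1000 ms by default), so `threshold_ms` is only useful for picking out the worst stalls.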
The solution for me is to use a faster disk (not EBS volumes in AWS) or to change the ZooKeeper settings so that it does not force each transaction log write to disk, i.e. let the writes sit in the operating system's cache rather than waiting for them to be flushed. This is obviously a bit risky: if the machine fails before the data is flushed, recent transactions can be lost and ZooKeeper's state will be out of sync.
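To compare disks before moving the transaction log, you can time fsync directly. This is a rough Python sketch, assuming you point it at the directory ZooKeeper uses for its transaction log (`dataLogDir`, or `dataDir` if that isn't set). `fsync_latencies_ms` is a hypothetical helper, and it runs without any competing load, so it shows the raw disk latency rather than reproducing the CPU starvation.

```python
import os
import statistics
import sys
import tempfile
from time import perf_counter
from typing import List


def fsync_latencies_ms(directory: str, writes: int = 20, size: int = 4096) -> List[float]:
    """Append `size` bytes `writes` times to a scratch file in `directory`,
    fsync after each append, and return each fsync's duration in milliseconds."""
    payload = os.urandom(size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        latencies = []
        for _ in range(writes):
            os.write(fd, payload)
            start = perf_counter()
            os.fsync(fd)  # block until the kernel reports the data is on the device
            latencies.append((perf_counter() - start) * 1000.0)
        return latencies
    finally:
        # Always clean up the scratch file, even if a write fails.
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    samples = fsync_latencies_ms(target)
    print(f"{target}: median {statistics.median(samples):.2f} ms, max {max(samples):.2f} ms")
```

For example, run it once against a directory on an EBS volume and once against local instance storage and compare the medians; a single multi-second maximum would line up with the warning above.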
forceSync
(Java system property: zookeeper.forceSync)
Requires updates to be synced to media of the transaction log before finishing processing the update. If this option is set to no, ZooKeeper will not require updates to be synced to the media.
So: -Dzookeeper.forceSync=no (note that the system property needs the zookeeper. prefix).
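The trade-off that `forceSync` controls can be illustrated with a toy write-ahead log. This Python sketch is not ZooKeeper's implementation; `TxnLog` and `force_sync_from_property` are hypothetical names, and the rule that only the value `no` disables syncing is taken from the documentation quoted above.

```python
import os
import tempfile


class TxnLog:
    """Toy append-only transaction log. `force_sync` mirrors the idea behind
    ZooKeeper's zookeeper.forceSync option; this is not ZooKeeper's code."""

    def __init__(self, path: str, force_sync: bool = True) -> None:
        self.force_sync = force_sync
        self._file = open(path, "ab")

    def append(self, record: bytes) -> None:
        # One record per line; records must not contain newlines in this toy format.
        self._file.write(record + b"\n")
        # Always hand the bytes to the OS page cache...
        self._file.flush()
        # ...but only wait for them to reach the disk when force_sync is on.
        if self.force_sync:
            os.fsync(self._file.fileno())

    def close(self) -> None:
        self._file.close()


def force_sync_from_property(value: str) -> bool:
    """Interpret a yes/no property: anything other than "no" keeps syncing on."""
    return value != "no"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "txn.log")
        log = TxnLog(path, force_sync=force_sync_from_property("no"))
        log.append(b"create /demo")
        log.close()
        with open(path, "rb") as f:
            print(f.read())  # b'create /demo\n'
```

With `force_sync=False` the records still reach the OS page cache on every append, so they survive the ZooKeeper process dying; what you give up is protection against the whole host crashing or losing power before the kernel writes them out.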
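I would recommend this for minimesos, since it is intended for testing, where losing the last few transactions on a crash matters much less than latency. On a slow disk like the one above it should significantly improve ZooKeeper's write performance.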
There seems to be an issue with hostname resolution in ZooKeeper versions prior to 3.5:
https://issues.apache.org/jira/browse/ZOOKEEPER-2367
https://issues.apache.org/jira/browse/ZOOKEEPER-2171
soabase/exhibitor#269
http://grokbase.com/t/kafka/users/163jd6pj49/zookeeper-dns-ttl
d2iq-archive/marathon#412
I think the issue is that ZooKeeper resolves hostnames to IP addresses via DNS once at startup, then never (or only rarely) re-resolves them.
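To make the suspected failure mode concrete, here is a Python sketch contrasting a client that resolves a hostname once with one that re-resolves after a failed connection. Everything here is hypothetical: `PeerAddress` is not a ZooKeeper class, and the resolver and connector are injected functions so the behaviour can be shown without real DNS.

```python
from typing import Callable, Dict

Resolver = Callable[[str], str]   # hostname -> IP address
Connector = Callable[[str], bool]  # IP address -> did the connection succeed?


class PeerAddress:
    """Hypothetical endpoint. With re_resolve=False it behaves like the suspected
    pre-3.5 behaviour: look the hostname up once and keep reusing that IP."""

    def __init__(self, hostname: str, resolver: Resolver, re_resolve: bool) -> None:
        self.hostname = hostname
        self.resolver = resolver
        self.re_resolve = re_resolve
        self.ip = resolver(hostname)  # resolved once, at startup

    def connect(self, connector: Connector) -> bool:
        if connector(self.ip):
            return True
        if self.re_resolve:
            # Only after a failed attempt do we ask DNS again, then retry once.
            self.ip = self.resolver(self.hostname)
            return connector(self.ip)
        return False


if __name__ == "__main__":
    dns: Dict[str, str] = {"zookeeper": "10.0.0.1"}
    reachable = {"10.0.0.2"}
    stale = PeerAddress("zookeeper", lambda h: dns[h], re_resolve=False)
    fresh = PeerAddress("zookeeper", lambda h: dns[h], re_resolve=True)
    dns["zookeeper"] = "10.0.0.2"  # the container comes back with a new IP
    print(stale.connect(lambda ip: ip in reachable))  # False
    print(fresh.connect(lambda ip: ip in reachable))  # True
```

The second client only recovers because a connection attempt failed first, which is consistent with the observation that it has to fail before it picks up a valid IP address.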
And my minimesos Marathon /v2/info states:
So I think the connection actually has to fail before ZooKeeper tries to get a valid IP address. The odd thing is that this only seems to be a serious issue on AWS.
I think an upgrade to ZooKeeper 3.5 would fix this, but I'm not confident about the source of the issue.