JGroups eating memory
Posted: 2010-03-04 07:54:20
Tags: jgroups

I currently have a problem with my JGroups configuration that causes thousands of messages to get stuck in the NAKACK.xmit_table. In fact, all of them seem to end up in the xmit_table, and another dump taken a few hours later indicates that they never leave.

This is the protocol stack configuration:

UDP(bind_addr=xxx.xxx.xxx.114;
bind_interface=bond0;
ip_mcast=true;ip_ttl=64;
loopback=false;
mcast_addr=228.1.2.80;mcast_port=45589;
mcast_recv_buf_size=80000;
mcast_send_buf_size=150000;
ucast_recv_buf_size=80000;
ucast_send_buf_size=150000):
PING(num_initial_members=3;timeout=2000):
MERGE2(max_interval=20000;min_interval=10000):
FD_SOCK:
FD(max_tries=5;shun=true;timeout=10000):
VERIFY_SUSPECT(timeout=1500):
pbcast.NAKACK(discard_delivered_msgs=true;gc_lag=50;retransmit_timeout=600,1200,2400,4800;use_mcast_xmit=true):
pbcast.STABLE(desired_avg_gossip=20000;max_bytes=400000;stability_delay=1000):
UNICAST(timeout=600,1200,2400):
FRAG(frag_size=8192):
pbcast.GMS(join_timeout=5000;print_local_addr=true;shun=true):
pbcast.STATE_TRANSFER

Startup message...

2010-03-01 23:40:05,358 INFO  [org.jboss.cache.TreeCache] viewAccepted(): [xxx.xxx.xxx.35:51723|17] [xxx.xxx.xxx.35:51723, xxx.xxx.xxx.36:53088, xxx.xxx.xxx.115:32781, xxx.xxx.xxx.114:32934]
2010-03-01 23:40:05,363 INFO  [org.jboss.cache.TreeCache] TreeCache local address is 10.35.191.114:32934
2010-03-01 23:40:05,393 INFO  [org.jboss.cache.TreeCache] received the state (size=32768 bytes)
2010-03-01 23:40:05,509 INFO  [org.jboss.cache.TreeCache] state was retrieved successfully (in 146 milliseconds)

... indicates that everything is fine so far.

The logs, set to WARN level, do not indicate that anything is wrong except for the occasional

2010-03-03 09:59:01,354 ERROR [org.jgroups.blocks.NotificationBus] exception=java.lang.IllegalArgumentException: java.lang.NullPointerException

which I'm guessing is unrelated, since it has been seen earlier without the memory issue.

I have been digging through two memory dumps from one of the machines looking for oddities, but have found nothing so far, except maybe for some statistics from the different protocols.

UDP has

num_bytes_sent 53617832
num_bytes_received 679220174
num_messages_sent 99524
num_messages_received 99522

while NAKACK has...

num_bytes_sent 0
num_bytes_received 0
num_messages_sent 0
num_messages_received 0

... and a huge xmit_table.

Each machine has two JChannel instances, one for ehcache and one for TreeCache. Due to a misconfiguration, both of them share the same diagnostics mcast address, but this should not pose a problem unless I want to send diagnostics messages, right? They do of course use different mcast addresses for the messages themselves.

Please ask for clarifications. I have lots of information, but I'm a bit uncertain about what is relevant at this point.

Best answer

It turns out that one of the nodes in the cluster did not receive any multicast messages at all. This caused all nodes to hang on to their own xmit_tables, since they did not get any stability messages from the isolated node, stating that it had received their messages.
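To make the mechanism concrete, here is a minimal Java sketch of the idea, not JGroups' actual implementation. The class `StabilitySketch`, its methods, and the member names are hypothetical. It assumes a simplified model in which a sender may purge a message from its retransmission table only once every member has reported delivering it, so a single silent member blocks purging entirely.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model (an assumption, not JGroups code): a message is "stable"
// once every cluster member has reported delivering it.
class StabilitySketch {

    // Highest seqno delivered by ALL members, or -1 if any member is silent.
    static long stableSeqno(List<String> members, Map<String, Long> delivered) {
        if (members.isEmpty()) {
            return -1;
        }
        long min = Long.MAX_VALUE;
        for (String member : members) {
            Long seqno = delivered.get(member);
            if (seqno == null) {
                return -1; // no report from this member: nothing is stable
            }
            min = Math.min(min, seqno);
        }
        return min;
    }

    // Remove every retransmission entry with seqno <= stable.
    static void purge(TreeMap<Long, String> xmitTable, long stable) {
        xmitTable.headMap(stable, true).clear();
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList("A", "B", "C", "D");
        TreeMap<Long, String> xmitTable = new TreeMap<>();
        for (long seqno = 1; seqno <= 5; seqno++) {
            xmitTable.put(seqno, "msg-" + seqno);
        }

        // A, B and C report delivery; D receives no multicasts and stays silent.
        Map<String, Long> delivered = new HashMap<>();
        delivered.put("A", 5L);
        delivered.put("B", 5L);
        delivered.put("C", 5L);

        purge(xmitTable, stableSeqno(members, delivered));
        System.out.println("entries while D is silent: " + xmitTable.size());

        // Once D reports seqno 3, messages 1..3 can be purged.
        delivered.put("D", 3L);
        purge(xmitTable, stableSeqno(members, delivered));
        System.out.println("entries after D reports 3: " + xmitTable.size());
    }
}
```

In this model the table keeps all five entries for as long as D stays silent, which matches the ever-growing xmit_table seen in the dumps.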

Restarting the application servers (ASs) with a changed multicast address solved the issue.
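For anyone wanting to check whether a node receives multicast traffic at all, here is a rough sketch using only the standard `java.net` API. The class `McastListenCheck`, its method names, and the command-line arguments are made up for illustration. The idea is to run it on each node and pass the IPs of the other nodes as the expected senders. A node that hears nobody is a candidate for the isolated one. It assumes something is actually sending to the given group and port while it listens.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic: records which source IPs this host hears on a
// multicast group, then lists the expected senders that stayed silent.
class McastListenCheck {

    // Expected senders that were never heard from.
    static Set<String> silentSenders(Set<String> expected, Set<String> heard) {
        Set<String> silent = new TreeSet<>(expected);
        silent.removeAll(heard);
        return silent;
    }

    // Join the group and collect source IPs for listenMillis milliseconds.
    static Set<String> listen(String group, int port, long listenMillis) throws Exception {
        Set<String> heard = new TreeSet<>();
        InetSocketAddress groupAddr =
                new InetSocketAddress(InetAddress.getByName(group), port);
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.joinGroup(groupAddr, null); // null = default interface
            socket.setSoTimeout(1000);         // wake up once per second
            byte[] buf = new byte[65535];
            long deadline = System.currentTimeMillis() + listenMillis;
            while (System.currentTimeMillis() < deadline) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(packet);
                    heard.add(packet.getAddress().getHostAddress());
                } catch (SocketTimeoutException e) {
                    // nothing arrived in this window; keep waiting
                }
            }
            socket.leaveGroup(groupAddr, null);
        }
        return heard;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 3) {
            System.err.println("usage: McastListenCheck <group> <port> <seconds> [expected-ip ...]");
            return;
        }
        Set<String> expected =
                new TreeSet<>(Arrays.asList(args).subList(3, args.length));
        Set<String> heard =
                listen(args[0], Integer.parseInt(args[1]), Long.parseLong(args[2]) * 1000L);
        System.out.println("heard from:   " + heard);
        System.out.println("silent nodes: " + silentSenders(expected, heard));
    }
}
```

Note that this uses the default network interface. On a host like the one above, where the stack is bound to a specific interface (bond0), the second argument to `joinGroup` would need to be the matching `NetworkInterface` for the result to be meaningful.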
