We have an Ignite cluster with 10 nodes (each node has 12 CPUs and 62 GB of memory). Occasionally one of the Ignite nodes goes down. When we examine the Ignite logs, we cannot find any errors, warnings, or other explanation. Shortly after the first node is gone, a second node goes down as well, and we start getting partition lost errors. While investigating, we found in the server logs that the JVM on the first node to leave the cluster had received a SIGSEGV:
Apr 19 07:12:25 tr-ignite-6 service.sh[478792]: # SIGSEGV (0xb) at pc=0x00007f97110526d0, pid=478876, tid=0x00007f9675dc6700
Apr 19 08:02:22 tr-ignite-9 service.sh[494185]: # SIGSEGV (0xb) at pc=0x00007fbe71052854, pid=494269, tid=0x00007fbdd9f0e700
Apr 19 08:02:35 tr-ignite-12 service.sh[387860]: # SIGSEGV (0xb) at pc=0x00007f2501052840, pid=387944, tid=0x00007f2475e77700
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # A fatal error has been detected by the Java Runtime Environment:
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: #
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # SIGSEGV (0xb) at pc=0x00007efccc77e6ee, pid=1607467, tid=0x00007efc4c43b700
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: #
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # JRE version: Java(TM) SE Runtime Environment (8.0_281-b09) (build 1.8.0_281-b09)
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.281-b09 mixed mode linux-amd64 compressed oops)
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # Problematic frame:
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # v ~StubRoutines::jbyte_disjoint_arraycopy
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: #
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # Core dump written. Default location: /var/lib/apache-ignite/core or core.1607467
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: #
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # An error report file with more information is saved as:
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # /var/lib/apache-ignite/hs_err_pid1607467.log
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: Compiled method (c2) 1403659804 15609 4 org.apache.ignite.internal.binary.BinaryWriterExImpl::write (13 bytes)
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: total in heap [0x00007efccf1c29d0,0x00007efccf1c3138] = 1896
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: relocation [0x00007efccf1c2af8,0x00007efccf1c2b40] = 72
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: main code [0x00007efccf1c2b40,0x00007efccf1c2e20] = 736
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: stub code [0x00007efccf1c2e20,0x00007efccf1c2e58] = 56
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: oops [0x00007efccf1c2e58,0x00007efccf1c2e60] = 8
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: metadata [0x00007efccf1c2e60,0x00007efccf1c2ea8] = 72
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: scopes data [0x00007efccf1c2ea8,0x00007efccf1c3038] = 400
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: scopes pcs [0x00007efccf1c3038,0x00007efccf1c30d8] = 160
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: dependencies [0x00007efccf1c30d8,0x00007efccf1c30e8] = 16
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: handler table [0x00007efccf1c30e8,0x00007efccf1c3118] = 48
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: nul chk table [0x00007efccf1c3118,0x00007efccf1c3138] = 32
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: #
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # If you would like to submit a bug report, please visit:
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: # http://bugreport.java.com/bugreport/crash.jsp
Mar 24 08:28:02 tr-ignite-2 service.sh[1607384]: #
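For anyone who wants to tally these crashes per node, here is a minimal sketch of how the SIGSEGV lines could be pulled out of journal output like the above. The function names are hypothetical, and the regex assumes exactly the line format shown in these logs (syslog-style timestamp, host, `service.sh[pid]:` prefix); it is not part of Ignite or the JVM tooling.

```python
import re
from collections import defaultdict

# Assumed format: "<Mon> <day> <hh:mm:ss> <host> <unit>[<pid>]: # SIGSEGV (0xb) at pc=..., pid=..., tid=..."
SIGSEGV_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+\S+:\s+#\s+SIGSEGV \(0xb\) "
    r"at pc=(?P<pc>0x[0-9a-f]+), pid=(?P<pid>\d+), tid=(?P<tid>0x[0-9a-f]+)"
)


def parse_sigsegv(lines):
    """Return one dict (when, host, pc, pid, tid) per SIGSEGV header line."""
    crashes = []
    for line in lines:
        match = SIGSEGV_RE.match(line.strip())
        if match:
            crashes.append(match.groupdict())
    return crashes


def crashes_by_host(crashes):
    """Group crash timestamps by the host that logged them."""
    grouped = defaultdict(list)
    for crash in crashes:
        grouped[crash["host"]].append(crash["when"])
    return dict(grouped)
```

Feeding it the lines above (for example, saved from the journal into a text file and read with `open(...).readlines()`) gives one entry per crash, which makes it easier to see which nodes crash and how close together in time.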
We looked but couldn't find anything that explains the crash. What could be causing the SIGSEGV?