Wednesday, May 9, 2012

A JVM networking bug

My former colleagues sent me a strange error from a production JBoss instance running on Windows Server 2003. Occasionally an access violation (AV) terminates the VM:


EXCEPTION_ACCESS_VIOLATION (0xc0000005)


C  [ntdll.dll+0x2b583]  wcscpy+0x108
C  [ntdll.dll+0x2ba81]  RtlTimeFieldsToTime+0x2cb
C  [ntdll.dll+0x2b646]  wcscpy+0x1cb
C  [msvcr71.dll+0x218a]  free+0x39
C  [net.dll+0x70fd]  Java_java_net_SocketInputStream_socketRead0+0x1c6
J  java.net.SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I


From another VM:


C  [ntdll.dll+0x2be3e]
C  [ntdll.dll+0x2b561]
C  [ntdll.dll+0x2ba81]
C  [ntdll.dll+0x2b646]
C  [msvcr71.dll+0x218a]
C  [net.dll+0x7129]
j  java.net.SocketInputStream.socketRead0(Ljava/io/FileDescriptor;[BIII)I+0
j  java.net.SocketInputStream.read([BII)I+84
j  org.apache.coyote.http11.InternalInputBuffer.fill()Z+59


Searching forums didn't help much. There is even a bug report for this: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5040096 (I tried to comment on it, but the Sun portal has not been cooperating lately); see also https://forums.oracle.com/forums/thread.jspa?threadID=1582665 .
What I could figure out is that the error must be in the JVM's net.dll. Strangely, though, it seems to occur only on "Windows Server 2003 family Build 3790 Service Pack 2".


I checked the source of the socketRead0 method:



/*
 * Class:     java_net_SocketInputStream
 * Method:    socketRead
 * Signature: (Ljava/io/FileDescriptor;[BIII)I
 */
JNIEXPORT jint JNICALL
Java_java_net_SocketInputStream_socketRead0(JNIEnv *env, jobject this,
                                            jobject fdObj, jbyteArray data,
                                            jint off, jint len, jint timeout)
{
    char *bufP;
    char BUF[MAX_BUFFER_LEN];
    jint fd, newfd;
    jint nread;

    if (IS_NULL(fdObj)) {
        JNU_ThrowByName(env, JNU_JAVANETPKG "SocketException", "socket closed");
        return -1;
    }
    fd = (*env)->GetIntField(env, fdObj, IO_fd_fdID);
    if (fd == -1) {
        NET_ThrowSocketException(env, "Socket closed");
        return -1;
    }

    /*
     * If the caller buffer is large than our stack buffer then we allocate
     * from the heap (up to a limit). If memory is exhausted we always use
     * the stack buffer.
     */
    if (len <= MAX_BUFFER_LEN) {
        bufP = BUF;
    } else {
        if (len > MAX_HEAP_BUFFER_LEN) {
            len = MAX_HEAP_BUFFER_LEN;
        }
        bufP = (char *)malloc((size_t)len);
        if (bufP == NULL) {
            /* allocation failed so use stack buffer */
            bufP = BUF;
            len = MAX_BUFFER_LEN;
        }
    }


    if (timeout) {
        if (timeout <= 5000 || !isRcvTimeoutSupported) {
            int ret = NET_Timeout (fd, timeout);

            if (ret <= 0) {
                if (ret == 0) {
                    JNU_ThrowByName(env, JNU_JAVANETPKG "SocketTimeoutException",
                                    "Read timed out");
                } else if (ret == JVM_IO_ERR) {
                    JNU_ThrowByName(env, JNU_JAVANETPKG "SocketException", "socket closed");
                } else if (ret == JVM_IO_INTR) {
                    JNU_ThrowByName(env, JNU_JAVAIOPKG "InterruptedIOException",
                                    "Operation interrupted");
                }
                if (bufP != BUF) {
                    free(bufP);
                }
                return -1;
            }

            /*check if the socket has been closed while we were in timeout*/
            newfd = (*env)->GetIntField(env, fdObj, IO_fd_fdID);
            if (newfd == -1) {
                NET_ThrowSocketException(env, "Socket Closed");
                return -1;
            }
        }
    }

    nread = recv(fd, bufP, len, 0);
    if (nread > 0) {
        (*env)->SetByteArrayRegion(env, data, off, nread, (jbyte *)bufP);
    } else {
        if (nread < 0) {
            /*
             * Recv failed.
             */
            switch (WSAGetLastError()) {
                case WSAEINTR:
                    JNU_ThrowByName(env, JNU_JAVANETPKG "SocketException",
                        "socket closed");
                    break;

                case WSAECONNRESET:
                case WSAESHUTDOWN:
                    /*
                     * Connection has been reset - Windows sometimes reports
                     * the reset as a shutdown error.
                     */
                    JNU_ThrowByName(env, "sun/net/ConnectionResetException",
                        "");
                    break;

                case WSAETIMEDOUT :
                    JNU_ThrowByName(env, JNU_JAVANETPKG "SocketTimeoutException",
                                   "Read timed out");
                    break;

                default:
                    NET_ThrowCurrent(env, "recv failed");
            }
        }
    }
    if (bufP != BUF) {
        free(bufP);
    }
    return nread;
} 

And I found that right after this comment: "check if the socket has been closed while we were in timeout" - the method returns without freeing bufP, which may have been allocated on the heap. Well, I'm not good at C, but this looks like a bug. It is still there in the latest JDK 6 (update 31), but it is fixed in OpenJDK 7.
So I think this is the error that somehow causes the access violation on Windows Server 2003. Upgrading to JDK 7 should help.
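
To make the missing cleanup concrete, here is a minimal sketch of the same buffer pattern with a single exit path, so the early "socket closed" return also frees the heap buffer. This is not the JDK code and not necessarily the exact change made in OpenJDK 7: sketch_read, fd_after_wait, live_heap_buffers and the two buffer-size values are hypothetical names and assumptions used only for illustration, and the recv call is replaced by a memset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BUFFER_LEN      2048   /* assumed value, for illustration only */
#define MAX_HEAP_BUFFER_LEN 65536  /* assumed value, for illustration only */

/* Hypothetical counter: lets a caller check that every malloc was freed. */
static int live_heap_buffers = 0;

/*
 * Hypothetical stand-in for socketRead0. fd_after_wait plays the role of the
 * second GetIntField call: -1 means the socket was closed during the wait.
 * dst must have room for min(len, MAX_HEAP_BUFFER_LEN) bytes.
 */
int sketch_read(int fd_after_wait, char *dst, int len)
{
    char stack_buf[MAX_BUFFER_LEN];
    char *bufP;
    int result;

    assert(dst != NULL && len > 0);

    /* Same policy as the original: small reads use the stack buffer. */
    if (len <= MAX_BUFFER_LEN) {
        bufP = stack_buf;
    } else {
        if (len > MAX_HEAP_BUFFER_LEN) {
            len = MAX_HEAP_BUFFER_LEN;
        }
        bufP = (char *)malloc((size_t)len);
        if (bufP == NULL) {
            /* allocation failed, fall back to the stack buffer */
            bufP = stack_buf;
            len = MAX_BUFFER_LEN;
        } else {
            live_heap_buffers++;
        }
    }

    if (fd_after_wait == -1) {
        /* The code quoted above returns -1 here without freeing bufP. */
        result = -1;
        goto cleanup;
    }

    memset(bufP, 'x', (size_t)len);   /* pretend recv filled the buffer */
    memcpy(dst, bufP, (size_t)len);
    result = len;

cleanup:
    /* Single exit path: the heap buffer is released on every return. */
    if (bufP != stack_buf) {
        free(bufP);
        live_heap_buffers--;
    }
    return result;
}
```

Calling sketch_read(-1, dst, 4096) takes the heap path and the early exit; afterwards live_heap_buffers is back to 0, which is the property the quoted code lacks on that branch.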
