WebSphere problems related to new default nproc limit in RHEL 6
We recently had an incident on one of our production systems running under Red Hat Enterprise Linux where under certain load conditions WebSphere Application Server would fail with an
OutOfMemoryError with the following message:
Failed to create a thread: retVal -1073741830, errno 11
Error number 11 corresponds to
EAGAIN and indicates that the C library function creating the thread fails because of insufficient resources. Often this is related to native memory starvation, but in our case it turned out that it was the
nproc limit that was reached. That limit puts an upper bound on the number of processes a given user can create. It may affect WebSphere because in this context, Linux counts each thread as a distinct process.
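As a quick sanity check, the error number and the limit itself can be inspected from Python's standard library. This is only a sketch: the helper names `describe_errno` and `current_nproc_limit` are ours, and the limit it reports is that of the Python process, not of a running WebSphere instance.

```python
import errno
import os
import resource


def describe_errno(code):
    """Return a string such as '<code> = <NAME>: <message>' for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} = {name}: {os.strerror(code)}"


def current_nproc_limit():
    """Return the (soft, hard) nproc limit of the calling process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    print(describe_errno(errno.EAGAIN))
    soft, hard = current_nproc_limit()
    # resource.RLIM_INFINITY means that no limit is set.
    print("soft nproc limit:", soft, "hard nproc limit:", hard)
```

Run as the user that owns the WebSphere processes, the second line gives a first idea of the limits that user would get from a fresh login session.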
Starting with RHEL 6, the soft
nproc limit is set to 1024 by default, while in previous releases this was not the case. The corresponding configuration can be found in
/etc/security/limits.d/90-nproc.conf. A WebSphere instance generally uses only a few hundred threads, so this problem may go unnoticed for some time before an unusual load condition triggers it. Note also that the limit applies to the sum of all threads created by all processes running as the same user as the WebSphere instance. In particular, it is not unusual to have IBM HTTP Server running as the same user on the same host. Since the WebSphere plug-in uses a multithreaded processing model (and not an asynchronous one), the
nproc limit may be reached if the number of concurrent requests increases too much.
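To see how close a user is to the limit, one can add up the threads of all processes owned by that user. The following is a minimal sketch that reads the `Uid` and `Threads` fields from `/proc/<pid>/status`. The function names and the `proc_root` parameter are our own, and the result is only a snapshot taken at one point in time.

```python
import os


def parse_status(text):
    """Return (real_uid, thread_count) from the text of /proc/<pid>/status."""
    uid = threads = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Uid":
            uid = int(value.split()[0])  # the first field is the real UID
        elif key == "Threads":
            threads = int(value.strip())
    return uid, threads


def threads_for_uid(uid, proc_root="/proc"):
    """Sum the thread counts of all processes whose real UID is `uid`."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                owner, threads = parse_status(f.read())
        except OSError:
            continue  # the process exited or is not readable
        if owner == uid and threads is not None:
            total += threads
    return total


if __name__ == "__main__":
    print(threads_for_uid(os.getuid()))
```

Comparing this number with the soft nproc limit of the user shows how much headroom is left for WebSphere and IBM HTTP Server combined.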
One solution is to remove or edit the
90-nproc.conf file to increase the
nproc limit for all users. However, since the purpose of the new default value in RHEL 6 is to prevent accidental fork bombs, it may be better to define new hard and soft
nproc limits only for the user running the WebSphere instance. While this is easy to configure, there is one other problem that needs to be taken into account.
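For illustration, per-user limits of this kind could be placed in a separate file under /etc/security/limits.d/. The file name, the user name wasadm and the value 4096 below are assumptions made for the example, not values from our system.

```text
# /etc/security/limits.d/91-websphere.conf (hypothetical file name)
# <domain>  <type>  <item>  <value>
wasadm      soft    nproc   4096
wasadm      hard    nproc   4096
```

The limits are applied when a new session is opened for the user, so processes that are already running keep their old limits until they are restarted.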
For reasons that are unclear,
sudo (in contrast to
su) is unable to set the soft limit for the new process to a value larger than the hard limit set on the parent process. If that occurs, instead of failing,
sudo creates the new process with the same soft limit as the parent process. This means that if the hard
nproc limit for normal users is lower than the soft
nproc limit of the WebSphere user and an administrator uses
sudo to start a WebSphere instance, then that instance will not have the expected soft
nproc limit. To avoid this problem, you should do the following:
- Increase the soft nproc limit for the user running WebSphere.
- Increase the hard nproc limit for all users to the same (or a higher) value, keeping the soft limit unchanged (to avoid accidental fork bombs).
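To see why both steps are needed, the sudo behavior described above can be written down as a toy model. This is not how sudo is implemented; the function only restates the rule from the text, and the values 1024 and 4096 are hypothetical.

```python
def soft_limit_after_sudo(parent_soft, parent_hard, target_soft):
    """Toy model of the sudo behavior described in the text.

    If the soft limit configured for the target user is larger than the
    hard limit of the calling process, the new process keeps the caller's
    soft limit instead of getting the configured one.
    """
    if target_soft > parent_hard:
        return parent_soft
    return target_soft


ADMIN_SOFT = 1024  # soft limit of the administrator's shell (hypothetical)
WAS_SOFT = 4096    # soft limit configured for the WebSphere user (hypothetical)

# Hard limit for normal users left at 1024: WebSphere ends up with 1024.
before = soft_limit_after_sudo(ADMIN_SOFT, parent_hard=1024, target_soft=WAS_SOFT)

# Hard limit for all users raised to 4096: WebSphere gets the expected 4096.
after = soft_limit_after_sudo(ADMIN_SOFT, parent_hard=4096, target_soft=WAS_SOFT)

print(before, after)  # prints "1024 4096"
```

Under this model, raising only the soft limit of the WebSphere user has no effect when the instance is started through sudo, while raising the hard limit for all users leaves their soft limit, and therefore the fork bomb protection, untouched.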
Note that you can verify that the limits are set correctly for a running WebSphere instance by determining the PID of the instance and checking the