Checking System Stability

Support knowledgebase (hmeyer_memtest-sig11)

Request

You would like to make sure that Linux runs and is stable on your hardware setup.

Procedure

Provided that you have installed the kernel sources and the necessary developer tools (e.g., the compiler), you can check your system stability quite reliably with the following little script:

#!/bin/bash
#
# Adapted from http://www.bitwizard.nl/sig11
#
#set -x

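# Abort if the kernel source directory is missing; otherwise "make clean"
# would run in whatever directory the script was started from.
cd /usr/src/linux || exit 1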

# Find the first log number that is not in use yet.
t=1
while [ -f log.$t ]
  do
  t=$((t + 1))
done

if [ ! -r .config ]; then
  echo -e
  echo -e "There was no .config file found in /usr/src/linux ."
  echo -e "This means that the kernel sources have not been configured yet."
  echo -e "If you continue, \"make cloneconfig\" will be executed to create"
  echo -e "a kernel configuration based on the currently running kernel."
  echo -e "\n"
  echo -e "Press <Ctrl>-<C> to abort or <ENTER> to continue ..."
  read
  make cloneconfig
fi

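# Make sure at least one log file exists so that "ls" has something to list.
touch log.1
# Show a continuously updated listing of the log files in the background.
watch "ls -lt log.*" &

# Build the kernel over and over, writing the output of each run to its own log.
while true
  do
  make clean &> /dev/null
  make -k bzImage > log.$t 2> /dev/null
  t=$((t + 1))
done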

The original version of this script can be found at http://www.bitwizard.nl/sig11. That page also provides background information on this test.

Advantages of this test compared to other methods (e.g., memtest86) are:

  1. The complete system will be tested, not just the RAM
  2. The system can stay operational while this test is running

The script runs an endless loop of kernel compiles (make bzImage) and saves the output of make for each run in a separate log file (which will be quite large).

On stable hardware, every run is expected to produce identical output.
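
If two completed runs differ after all, the first differing lines usually show where the build went wrong. The following is only a minimal sketch: show_log_diff is a hypothetical helper that is not part of the original script, and the default of 20 lines is an arbitrary choice.

```shell
# show_log_diff is a hypothetical helper, not part of the original script.
# It prints the first lines that differ between two log files.
# Usage: show_log_diff FILE1 FILE2 [MAX_LINES]
show_log_diff() {
  if cmp -s "$1" "$2"; then
    echo "identical"
  else
    # diff exits with status 1 when the files differ; that is expected here.
    { diff "$1" "$2" || true; } | head -n "${3:-20}"
  fi
}
```

For example, "show_log_diff log.2 log.3" run in /usr/src/linux compares the second and the third run.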

While running, the script shows a continuously updated listing of the log files in /usr/src/linux, produced by

watch "ls -lt log.*"

It could look like this:

Every 2s: ls -lt log.*               Wed Aug  8 15:22:02 2001 
-rw-r--r--    1 root     root         5472 Aug  8 15:22 log.4
-rw-r--r--    1 root     root       127120 Aug  8 15:21 log.3
-rw-r--r--    1 root     root       127120 Aug  8 15:12 log.2
-rw-r--r--    1 root     root       127120 Aug  8 15:04 log.1

In this example, the first three runs have already completed. It is a good sign that the corresponding log files all have the same size. To be on the safe side, however, let the test run for about 24 hours. In addition to comparing the file sizes, you can use md5sum to generate checksums and verify that the log files are really identical:

linux:/usr/src/linux # md5sum log.*
51e25c01370ce034b2c00d4c71995f02  log.1
51e25c01370ce034b2c00d4c71995f02  log.2
51e25c01370ce034b2c00d4c71995f02  log.3
a014cc76b1fb46a3cc5b84484403a1b7  log.4

It is no surprise that the fourth log file has a different checksum, since that run has not completed yet. All the other (completed) runs, however, should show identical checksums.

Note: Under certain circumstances, the output of the first run can differ slightly from that of all following runs. As a general rule, therefore, all runs except the first and the last (not yet completed) one must produce identical log files.
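
The rule from the note can be checked automatically. The following is only a sketch under stated assumptions: check_logs is a hypothetical helper that is not part of the original script, it assumes that the log files are named log.N as above, and it treats the highest-numbered log as the run still in progress.

```shell
# check_logs is a hypothetical helper, not part of the original script.
# It compares the md5sum of every log file except the first and the last one.
# Usage: check_logs [DIRECTORY]   (default: /usr/src/linux)
check_logs() {
  dir=${1:-/usr/src/linux}

  # List log.N in numeric order of N, then drop the first and the last entry.
  middle=$(cd "$dir" && ls log.* 2>/dev/null | sort -t. -k2 -n | sed '1d;$d')
  count=$(printf '%s\n' "$middle" | awk 'NF { n++ } END { print n + 0 }')

  # At least two runs are needed for a meaningful comparison.
  if [ "$count" -lt 2 ]; then
    echo "not enough completed runs yet"
    return 2
  fi

  # Word splitting on $middle is intended: log.N names contain no spaces.
  distinct=$(cd "$dir" && md5sum $middle | awk '{ print $1 }' | sort -u | awk 'END { print NR }')

  if [ "$distinct" -eq 1 ]; then
    echo "OK: $count logs identical"
  else
    echo "MISMATCH: $distinct different checksums"
    return 1
  fi
}
```

Calling "check_logs" without an argument while the test is running inspects /usr/src/linux. A return status of 1 means that at least one of the compared runs produced different output.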


See also:
o General Hardware Problems
o Checking for Memory Errors with memtest86

Keywords: CRASH, FREEZE, SEGMENTATION FAULT, SIGNAL 11, SIG11, MEMTEST86

SDB-hmeyer_memtest-sig11, Copyright SuSE Linux AG, Nürnberg, Germany - Version: 08. Aug 2001
SuSE Linux AG - Last generated: 15. Oct 2002 by hmeyer (sdb_gen 1.40.0)