Bull GNU/Linux NFSv4 project

SLES10 SP1 Robustness regression tests
September 2007

Introduction

Here are the latest results of the robustness testing between SLES10 SP1 and a kernel carrying the LINUX-2.6.22-rc5-CITI_NFS4_ALL-1 CITI patch.

Tests performed

The tests were run with the following benchmarks:

Test #6: locks testing
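
As an illustration of what a lock test verifies, here is a minimal sketch in Python: a parent process holds an exclusive POSIX byte-range lock and a forked child checks whether it can obtain a conflicting one. This is not the test program used for the results below; the function name child_can_lock and the temporary file are assumptions made for the example, and it runs on a local file rather than on an NFSv4 mount.

```python
import fcntl
import os
import tempfile


def child_can_lock(path, parent_holds_lock):
    """Return True if a forked child can take an exclusive lock on `path`.

    Hypothetical helper for illustration only. POSIX locks are owned by a
    process, so the forked child competes with its parent for the lock.
    """
    with open(path, "r+b") as parent_file:
        if parent_holds_lock:
            # Non-blocking exclusive lock on the whole file.
            fcntl.lockf(parent_file, fcntl.LOCK_EX | fcntl.LOCK_NB)

        pid = os.fork()
        if pid == 0:
            # Child: try to take the same lock without blocking.
            status = 1
            try:
                with open(path, "r+b") as child_file:
                    fcntl.lockf(child_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    status = 0  # lock granted
            except OSError:
                status = 1  # lock refused (typically EAGAIN or EACCES)
            finally:
                os._exit(status)

        # Parent: collect the child's verdict while still holding the lock.
        _, wait_status = os.waitpid(pid, 0)
        return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 16)
    try:
        print("child lock while parent holds it:", child_can_lock(tmp.name, True))
        print("child lock while file is free:   ", child_can_lock(tmp.name, False))
    finally:
        os.unlink(tmp.name)
```

A correct lock implementation refuses the child's request while the parent holds the lock and grants it once the file is free. Running the same check against a file on an NFSv4 mount would exercise the client and server lock handling instead of the local filesystem.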

Robustness Results

SLES10-SP1 robustness results as client, with 2.6.22-rc5-CITI_NFS4_ALL-1 as server:


Testing tool                           NFSv4 sec=sys   NFSv4 sec=sys   NFSv4 sec=sys
(each one alone)                       2 hours         15 hours        60 hours

nfs_fsstress                           PASSED          PASSED          PASSED
FSX                                    PASSED          PASSED          PASSED
FFSB (stress file)                     PASSED          PASSED          PASSED
IOZONE (-U)                            PASSED          PASSED          -
Cthon04 (-t) extended                  PASSED          PASSED          FAILED(1)
Locks local, 500 processes             PASSED          PASSED          PASSED
Locks network, 1000 processes,         PASSED          -               -
  2 clients
DBENCH, 10, 100, 1000 clients          PASSED          PASSED          -
Simultaneously: fsstress + fsx +       PASSED          PASSED          FAILED(1)
  iozone (no -U) + ffsb +
  cthon04 -t (ext)

(- : no result reported)

Remarks:
(1) The lock tests from the Connectathon suite failed after 40 hours, with the process tlocklfs hanging with the following stack:
Stack traceback for pid 26100
0xffff810069eef810    26100    26093  0    0   S  0xffff810069eefab0  tlocklfs
rsp                rip                Function (args)
0xffff81004ca8db38 0xffffffff805d8fc6 __sched_text_start+0x7be
0xffff81004ca8dbf0 0xffffffff80241150 futex_wait+0x243
0xffff81004ca8dce0 0xffffffff802213b5 __wake_up+0x43
0xffff81004ca8dd40 0xffffffff805dab5a _read_unlock_irq+0x9
0xffff81004ca8dd58 0xffffffff80223d36 default_wake_function
0xffff81004ca8ddb0 0xffffffff80241753 do_futex+0x74
0xffff81004ca8de20 0xffffffff80416d9c __up_read+0x86
0xffff81004ca8de40 0xffffffff8023dfd7 up_read+0x9
0xffff81004ca8de50 0xffffffff805dca44 do_page_fault+0x45e
0xffff81004ca8dea8 0xffffffff805dab4d _spin_unlock_irq+0x9
0xffff81004ca8deb8 0xffffffff805d902b thread_return+0x64
0xffff81004ca8df00 0xffffffff802428fc sys_futex+0xee
0xffff81004ca8df80 0xffffffff80209b3e system_call+0x7e

SLES10-SP1 robustness results as server, with 2.6.22-rc5-CITI_NFS4_ALL-1 as client:



Testing tool                           NFSv4 sec=sys   NFSv4 sec=sys   NFSv4 sec=sys
(each one alone)                       2 hours         15 hours        60 hours

nfs_fsstress                           PASSED          PASSED          PASSED
FSX                                    FAILED(2)       FAILED(2)       FAILED(2)
FFSB (stress file)                     PASSED          PASSED          PASSED
IOZONE (-U)                            PASSED          PASSED          -
Cthon04 -t extended                    FAILED(1)       PASSED          FAILED(1)
Locks local, 500 processes             PASSED          PASSED          -
Locks network, 1000 processes,         PASSED          -               -
  2 clients
dbench, 10, 100, 1000 clients          PASSED          PASSED          -
Simultaneously: fsstress + fsx +       PASSED          PASSED          FAILED(1)(2)
  iozone + ffsb + cthon04 -t (ext)

(- : no result reported)

Remarks:
(1) Same failure as remark (1) of the client results above: the lock tests from the Connectathon suite failed after 40 hours, with the process tlocklfs hanging with the identical stack trace (pid 26100). See BUGZILLA#147.
(2) fsx intermittently fails with file size errors. This issue appears to be caused by 2.6.22-rc5-CITI_NFS4_ALL-1. See BUGZILLA#143.
fsx-linux regularly reports size errors such as the following:
truncating to largest ever: 0x3ffff
Size error: expected 0x1fe6 stat 0x2b000 seek 0x2b000
LOG DUMP (38414 total operations):
38415(15 mod 256): WRITE 0x134c7 thru 0x20c13 (0xd74d bytes)
38416(16 mod 256): WRITE 0x22379 thru 0x2f895 (0xd51d bytes) EXTEND
38417(17 mod 256): TRUNCATE UP from 0x2f896 to 0x3b913
38418(18 mod 256): WRITE 0xbba3 thru 0x1b0ca (0xf528 bytes)
38419(19 mod 256): MAPWRITE 0xd8cd thru 0xe939 (0x106d bytes)
38420(20 mod 256): READ 0x1203c thru 0x2064c (0xe611 bytes)
38421(21 mod 256): MAPREAD 0x194de thru 0x24595 (0xb0b8 bytes)
38422(22 mod 256): MAPWRITE 0x36ed7 thru 0x39bca (0x2cf4 bytes)
38423(23 mod 256): READ 0x2611b thru 0x33e13 (0xdcf9 bytes)
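
To read the "Size error" line: fsx keeps its own record of the size the file should have ("expected") and compares it with the size returned by stat and the offset obtained by seeking to the end of the file. The short Python sketch below parses the line above and quantifies the mismatch. The name parse_size_error is hypothetical and is not part of fsx.

```python
import re

# Matches the three hexadecimal sizes in an fsx "Size error" line.
SIZE_ERROR_RE = re.compile(
    r"Size error: expected (0x[0-9a-fA-F]+) "
    r"stat (0x[0-9a-fA-F]+) seek (0x[0-9a-fA-F]+)"
)


def parse_size_error(line):
    """Return the sizes from a "Size error" line as integers, or None.

    Hypothetical helper for illustration; "excess" is how many bytes the
    size seen through stat exceeds the size fsx expected.
    """
    match = SIZE_ERROR_RE.search(line)
    if match is None:
        return None
    expected, stat_size, seek_size = (int(v, 16) for v in match.groups())
    return {
        "expected": expected,
        "stat": stat_size,
        "seek": seek_size,
        "excess": stat_size - expected,
    }


report = parse_size_error("Size error: expected 0x1fe6 stat 0x2b000 seek 0x2b000")
print(report)
```

For the line reported here, fsx expected 8166 bytes while both stat and seek returned 176128 bytes, a multiple of 4096 that is 167962 bytes larger than expected.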

Software versions


Linux SLES 10 SP1: 2.6.16.46-0.12-smp
Linux Fedora Core 6: 2.6.22-rc5-CITI_NFS4_ALL-1
Client userland package: util-linux-2.12 + util-linux-2.12-CITI_NFS4_ALL-3.dif
Linux nfs-utils version: nfs-utils-1.1.0 + nfs-utils-1.1.0-CITI_NFS4_ALL-1.dif
gssapi library: libgssapi-0.11
rpcsecgss library: librpcsecgss-0.14
nfsidmap library: libnfsidmap-0.19
acl library: acl_2.2.29-1 + acl-2.2.29-CITI_NFS4_ALL-3
Linux TI-RPC: 0.1.7
Kerberos: V5 MIT


Hardware configuration

X86_64

Network

Ethernet: 1Gb/s link

Conclusion

In a two-way client/server configuration, NFSv4 on SLES10-SP1 can be considered stable with the sec=sys security flavor, both as client and as server. The issues found were reproduced between two 2.6.22-rc5-CITI_NFS4_ALL-1 machines and are therefore not due to SLES10-SP1.
The Kerberos flavors krb5 and krb5i have not been tested.
krb5p is not supported by SLES10-SP1.