Bull GNU/Linux NFSv4 project


RHEL5.1 Public Gold - linux-2.6.24-rc2-CITI_NFS4_ALL-1

Robustness Regression tests
December 2007


Introduction

Here are the latest results of the robustness testing between:
RHEL5.1 (Public Gold)
and
linux-2.6.24-rc2-CITI_NFS4_ALL-1 (CITI patch)

Tests performed

The tests were run with the following benchmarks:

Test #6: locks testing

Robustness Results

RHEL5.1 Public Gold robustness results as client, against 2.6.24-rc2-CITI_NFS4_ALL-1 as server:


Testing tool                  NFSV4       NFSV4       NFSV4       NFSV4       NFSV4       NFSV4       NFSV4       NFSV4       NFSV4
(each one alone)              sec=sys     sec=krb5    sec=krb5i   sec=krb5p   sec=sys     sec=krb5    sec=krb5i   sec=krb5p   sec=sys
                              2 hours     2 hours     2 hours     2 hours     15 hours    15 hours    15 hours    15 hours    60 hours

nfs_fsstress                  PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
FSX                           PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
FFSB (stress file)            PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
IOZONE (-U)                   PASSED(3)   PASSED      PASSED      PASSED      FAILED (2)
Cthon04 (-t) extended         PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
Locks local, 500 processes    PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
DBENCH, 10,100,1000 clients   PASSED      PASSED      PASSED      PASSED      PASSED

Simultaneously:               PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      FAILED (1)
fsstress + fsx +
iozone (no -U) + ffsb +
cthon04 -t (ext)

Remarks:
(3) Memory leak on size-8192 buckets on the client side (see https://bugzilla.redhat.com/show_bug.cgi?id=423521)
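
A leak of this kind can be followed by sampling the size-8192 slab cache on the client while a test is running. The following is only a minimal sketch, assuming the 2.6-era /proc/slabinfo layout in which the first three fields are the cache name, the number of active objects and the total number of objects. The helper name slab_usage and the sample figures are illustrative and do not come from the test logs.

```shell
#!/bin/sh
# Print "<active_objs> <num_objs>" for one slab cache.
# Assumed slabinfo fields: name active_objs num_objs objsize ...
# slab_usage is a hypothetical helper; $2 defaults to /proc/slabinfo.
slab_usage() {
    cache=$1
    file=${2:-/proc/slabinfo}
    awk -v c="$cache" '$1 == c { print $2, $3 }' "$file"
}

# Example on made-up sample data (not measured values).
sample=$(mktemp)
cat > "$sample" <<'EOF'
size-8192           1200   1200   8192    1    2 : tunables    8    4    0 : slabdata   1200   1200      0
size-4096             64     64   4096    1    1 : tunables   24   12    8 : slabdata     64     64      0
EOF
slab_usage size-8192 "$sample"
rm -f "$sample"
```

On a live client, calling slab_usage size-8192 at regular intervals (reading /proc/slabinfo may require root) and comparing successive samples gives a rough picture: an active count that keeps growing over hours of testing points to a leak.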

RHEL5.1 Public Gold robustness results as server, against 2.6.24-rc2-CITI_NFS4_ALL-1 as client:



Testing tool                  NFSV4       NFSV4       NFSV4       NFSV4       NFSV4       NFSV4       NFSV4       NFSV4
(each one alone)              sec=sys     sec=krb5    sec=krb5i   sec=krb5p   sec=sys     sec=krb5    sec=krb5i   sec=krb5p
                              2 hours     2 hours     2 hours     2 hours     15 hours    15 hours    15 hours    15 hours

nfs_fsstress                  PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
FSX                           PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
FFSB (stress file)            PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
IOZONE (-U)                   PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
Cthon04 -t extended           PASSED      PASSED      PASSED      PASSED      PASSED(1)   PASSED      PASSED      PASSED
Locks local, 500 processes    PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
dbench, 10,100,1000 clients   PASSED      PASSED      PASSED      PASSED      PASSED

Simultaneously:               PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED      PASSED
fsstress + fsx +
iozone + ffsb +
cthon04 -t (ext)

Remarks:
(1) The Connectathon lock tests sometimes hang, with the processes in the following state:

 ps -eaf
root 22597 22183  0 00:34 pts/7    00:00:00 sh runtests -t
root 22608 22597 0 00:34 pts/7  00:00:00 tlock64 -r /mnt/nosec/cthon_nfs4_nfs1_gb
root 22609 22608  0 00:34 pts/7    00:00:00 [tlock64] <defunct>

pstack 22608
#0  0x000000309d4d93b8 in __lll_mutex_lock_wait () from /lib64/libc.so.6
#1  0x000000309d44682d in _L_lock_11742 () from /lib64/libc.so.6
#2  0x000000309d44678b in buffered_vfprintf () from /lib64/libc.so.6
#3  0x000000309d44227f in vfprintf () from /lib64/libc.so.6
#4  0x000000309d44b95a in printf () from /lib64/libc.so.6
#5  0x000000000040148a in comment ()
#6  0x000000000040134a in childdied ()
#7  <signal handler called>
#8  0x000000309d46b511 in _IO_cleanup () from /lib64/libc.so.6
#9  0x000000309d432e92 in exit () from /lib64/libc.so.6
#10 0x0000000000404c10 in runtests ()
#11 0x0000000000404f6a in main ()

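The backtrace shows the parent tlock64 process waiting for a lock inside libc's printf(), called from the childdied() signal handler, which interrupted exit() while it was running _IO_cleanup().
See BUGZILLA #147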
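
When a run seems stuck, the same symptom can be looked for in a process listing before attaching pstack. The following is a minimal sketch, assuming the `ps -eaf` column order shown above (UID, PID, PPID, ...). The helper name find_defunct is hypothetical.

```shell
#!/bin/sh
# List defunct (zombie) processes from `ps -eaf`-style input on stdin.
# Prints "<pid> <ppid>" for each one; the parent PID is the process
# worth inspecting with pstack.
find_defunct() {
    awk 'index($0, "<defunct>") { print $2, $3 }'
}

# Example on the listing quoted in remark (1).
find_defunct <<'EOF'
root 22597 22183  0 00:34 pts/7    00:00:00 sh runtests -t
root 22608 22597 0 00:34 pts/7  00:00:00 tlock64 -r /mnt/nosec/cthon_nfs4_nfs1_gb
root 22609 22608  0 00:34 pts/7    00:00:00 [tlock64] <defunct>
EOF
```

On a live system the function would be fed with `ps -eaf | find_defunct`. For the listing above it reports the defunct child 22609 and its parent 22608, the process examined with pstack.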

RHEL5.1 Public Gold loopback robustness results:

Testing tool                  NFSV4
                              sec=sys
                              2 hours

Simultaneously in loopback:   FAILED (1)
fsstress + fsx +
iozone (no -U) + ffsb +
cthon04 -t (ext)

Remarks:
(1) All the applications hung after about 2 hours
(see https://bugzilla.redhat.com/show_bug.cgi?id=408861)


2.6.24-rc2-CITI_NFS4_ALL-1 Loopback Robustness Results:

Testing tool                  NFSV4
                              sec=sys
                              2 hours

Simultaneously in loopback:   FAILED (1)
fsstress + fsx +
iozone (no -U) + ffsb +
cthon04 -t (ext)

Remarks:
(1) All the applications hung after about 2 hours. The hang is nevertheless different from the one seen
with 2.6.23-rc9-CITI_NFS4_ALL-1,
where nfsd was looping and using 100% CPU. See BUGZILLA #152

Observations for RHEL5.1 (Public Gold):

1) By default, the rpcsec_gss_krb5 module needs to be loaded.
2) Exports syntax for security flavors
RHEL5.1 supports the "gss/krb5, gss/krb5i, gss/krb5p" syntax but not the "sec=krb5p:krb5i:krb5:sys" syntax.
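
For illustration, the two export syntaxes could look as follows once written to /etc/exports. This is a sketch under assumptions: the exported path /export, the option list and the "*" client wildcard are placeholders and were not taken from the test setup, and the helper names exports_gss and exports_sec are hypothetical.

```shell
#!/bin/sh
# Sketch: build /etc/exports entries in the two syntaxes.
# Path, options and client wildcard are placeholders (assumptions).

# gss/krb5* syntax: each security flavor is written as a pseudo-client.
exports_gss() {
    path=$1; opts=$2
    line=$path
    for flavor in krb5 krb5i krb5p; do
        line="$line gss/$flavor($opts)"
    done
    printf '%s\n' "$line"
}

# sec= syntax: one client entry carrying a colon-separated flavor list.
exports_sec() {
    path=$1; opts=$2
    printf '%s *(%s,sec=krb5p:krb5i:krb5:sys)\n' "$path" "$opts"
}

exports_gss /export rw,sync,fsid=0
exports_sec /export rw,sync,fsid=0
```

The first line printed uses the form that RHEL5.1 accepted in these tests; the second uses the sec= form that it did not accept.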

Observations for 2.6.24-rc2-CITI_NFS4_ALL-1:

1) Mount error. See BUGZILLA #145
During the following NFSV4 mount and umount sequence, the first "umount", run from inside the mount point, prints no error message, whereas the second one correctly reports that unmounting is not possible.

mount /mnt/nosec
cd /mnt/nosec
umount /mnt/nosec

* No message says that unmounting is not possible.
* The "mount" command reports that /mnt/nosec is no longer mounted,
* but /proc/mounts still contains the mount.

A second attempt:

umount /mnt/nosec

gives the expected error:

umount: /mnt/nosec: device is busy

After leaving the mounted directory:

cd /
umount /mnt/nosec

the file system is actually unmounted.
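
The inconsistency described above, where "mount" no longer lists the mount point while /proc/mounts still does, can be checked with a small script. The following is only a sketch, assuming that "mount" reports from /etc/mtab (as util-linux of that period did) and that both files carry the mount point in their second field. The helper names in_table and check_mount are hypothetical.

```shell
#!/bin/sh
# Succeed if mount point $1 appears as field 2 of table file $2.
in_table() {
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' "$2"
}

# Compare the userland table with the kernel table for one mount point.
# $2 and $3 default to /etc/mtab and /proc/mounts (assumed locations).
check_mount() {
    mnt=$1
    mtab=${2:-/etc/mtab}
    kernel=${3:-/proc/mounts}
    if in_table "$mnt" "$kernel" && ! in_table "$mnt" "$mtab"; then
        echo "$mnt: still mounted in kernel, missing from mtab"
    elif in_table "$mnt" "$kernel"; then
        echo "$mnt: mounted"
    else
        echo "$mnt: not mounted"
    fi
}

# Usage on a live system:
#   check_mount /mnt/nosec
```

Run right after the first "umount" of the sequence, a check of this kind would be expected to report the first case, since the kernel table still holds the mount while the userland table does not.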

2) Exports syntax for security flavors

Both the "gss/krb5, gss/krb5i, gss/krb5p" syntax and the new "sec=krb5p:krb5i:krb5:sys" syntax are supported.

Software versions


Linux RHEL5.1 Public Gold     2.6.18-53.el5

Linux Fedora Core 6           2.6.24-rc2-CITI_NFS4_ALL-1
Client userland package       util-linux-2.12 + util-linux-2.12-CITI_NFS4_ALL-3.dif
Linux nfs-utils version       nfs-utils-1.1.0 + nfs-utils-1.1.0-CITI_NFS4_ALL-2.dif
gssapi library                libgssapi-0.11
rpcsecgss library             librpcsecgss-0.14
nfsidmap library              libnfsidmap-0.19
acl library                   acl_2.2.29-1 + acl-2.2.29-CITI_NFS4_ALL-3
Linux TI-RPC                  0.1.7
Kerberos                      V5 MIT


Hardware configuration

X86_64

Network

Ethernet: 1Gb/s link

Conclusion

Between a two-way x86_64 (64-bit) client and a two-way x86_64 (64-bit) server, NFSV4 on RHEL5.1 Gold and on 2.6.24-rc2-CITI_NFS4_ALL-1 can be considered stable for the security flavors sys, krb5, krb5i and krb5p.
Nevertheless, we are continuing the tests to investigate what happened during the 60-hour run. Neither of the two looks stable enough when the same machine acts as client and server at the same time (loopback).