Introduction
Here are the latest results of the robustness and performance testing of the kernel with the CITI patch:
linux-2.6.25-rc6-CITI_NFS4_ALL-1
Tests performed
The following tools have been used:
Test #1 : connectathon 04 (update 2003/12) (ltp-full-20060822) LTP
- run basic (-b), general (-g), special (-s), and lock tests (-l)
- extended mode (-t) is used
Test #2 : nfs_fsstress (ltp-full-20060822) LTP
- Tests are done with various numbers of subprocesses: 2, 4, 8, 16, 32
- Tests are done with various lengths of instruction lists: 1, 2, 4, 8, 16, 32, 64, 128
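The two parameter lists of Test #2 can be enumerated as a test matrix. The sketch below assumes a full cross product of both lists, which the report does not state explicitly; the function name is hypothetical and no nfs_fsstress command-line options are implied.

```python
from itertools import product

# Values taken from the Test #2 description.
SUBPROCESS_COUNTS = [2, 4, 8, 16, 32]
INSTRUCTION_LIST_LENGTHS = [1, 2, 4, 8, 16, 32, 64, 128]


def fsstress_matrix():
    """Return every (subprocesses, list_length) pair.

    Assumption: each subprocess count is combined with each list length.
    """
    return list(product(SUBPROCESS_COUNTS, INSTRUCTION_LIST_LENGTHS))


if __name__ == "__main__":
    matrix = fsstress_matrix()
    print(f"{len(matrix)} combinations")
    for nproc, length in matrix[:3]:
        print(f"subprocesses={nproc} instruction_list_length={length}")
```

Under that assumption the matrix has 5 x 8 = 40 combinations per run.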
Test #3 : fsx (ltp-full-20060822) LTP
- fsx is run with 50000 iterations
Test #4 : Iozone (version 3.263) HERE
- Run iozone -+q 30 -ace -r 64 -i 0 -i 1 -i 2 -g 2G -U mountpoint
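For scripted runs, the iozone command line of Test #4 can be assembled programmatically. This is only a sketch: the function name, its parameter names and the mount point /mnt/nfs are hypothetical; the option values are the ones quoted above.

```python
import shlex


def iozone_argv(mountpoint, delay_s=30, record_kb=64, tests=(0, 1, 2), max_size="2G"):
    """Build the Test #4 iozone argument list for a given mount point."""
    argv = ["iozone", "-+q", str(delay_s), "-ace", "-r", str(record_kb)]
    for test in tests:
        # One "-i N" pair per selected iozone test.
        argv += ["-i", str(test)]
    argv += ["-g", max_size, "-U", mountpoint]
    return argv


if __name__ == "__main__":
    # /mnt/nfs is a placeholder mount point, not taken from the report.
    print(shlex.join(iozone_argv("/mnt/nfs")))
```

Keeping the arguments as a list avoids shell quoting problems when the command is later passed to a process launcher.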
Test #5 : ffsb (version 5.2.1) FFSB
- Run with the default profile_stress_test file as profile
Test #6 : locks testing HERE
- Local mode is run with 500 processes
- Network mode is run with two clients, 500 processes each
Test #7 : dbench (3.04) HERE
- Run with 10, 100, 1000 clients
Test #8 : ACL testing (ltp-full-20060822) LTP
Performance Results
- We run iozone (Test #4) to measure:
- read and write throughput for the asynchronous and synchronous modes of the mount command (async exports are used in both cases)
- the different security flavors: sys, krb5, krb5i, krb5p
- The client machine has two processors; the server machine has four processors
The following parameters are used:
iozone -+q 5 -ace -r 64 -i 0 -i 1 -g 1G -f /mnt/dir/file -U /mnt/dir
- mount options rsize and wsize are set to 262144 (because both client and server have 2 GB of RAM)
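The performance runs combine four security flavors with two mount modes. A minimal sketch of how the corresponding mount option strings could be generated is shown below; the helper names and the server:/export path are hypothetical, while rsize, wsize, sec, sync and async are the standard NFS mount options referred to above.

```python
FLAVORS = ["sys", "krb5", "krb5i", "krb5p"]
MODES = ["async", "sync"]
IO_SIZE = 262144  # rsize and wsize value used in the report


def mount_options(flavor, mode):
    """Return the -o option string for one flavor and one mount mode."""
    return f"{mode},rsize={IO_SIZE},wsize={IO_SIZE},sec={flavor}"


def mount_argv(server_export, mountpoint, flavor, mode):
    """Build a mount command line (server_export is a placeholder)."""
    return ["mount", "-t", "nfs4", "-o", mount_options(flavor, mode),
            server_export, mountpoint]


if __name__ == "__main__":
    for mode in MODES:
        for flavor in FLAVORS:
            print(" ".join(mount_argv("server:/export", "/mnt/dir", flavor, mode)))
```

This yields eight configurations, one per (mode, flavor) pair, each of which would then be measured with the iozone command given above.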
The kernel .config file has a very large impact on performance; check it first when performance is poor.
For example, setting CONFIG_DEBUG_SLAB drastically decreases performance.
- The results on reads for asynchronous mount are HERE
Remarks:
As expected, we observe a degradation that follows the security flavor used.
- The results on reads for synchronous mount are HERE
Remarks:
Same as asynchronous mode.
- The results on writes for asynchronous mount are HERE
For every flavor except krb5p, write performance degrades drastically for file sizes between 256 Mbytes and 512 Mbytes and above, even with 2 GB of RAM. While this could already be seen with linux-2.6.22-rc5-CITI_NFS4_ALL-1.diff, we did not see it with the earlier 2.6.19-rc6-CITI_NFS4_ALL-1.
- The results on writes for synchronous mount are HERE
Robustness Results
- The objective was to run:
- Tests #1 #2 #3 #4 #5 #6 #7 alone for: 2 hours, 15 hours, 60 hours.
- Tests #1 #2 #3 #4 #5 were also run all together simultaneously for: 2 hours, 15 hours, 60 hours (option -U is not used for iozone in this combined long run)
- Asynchronous mode is used for both the mount command and the export option
- Both client and server have two processors
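The report does not say how each test was kept running for the target duration. One possible approach, sketched here purely as an assumption, is to rerun the test command in a loop until a deadline and count the failing runs; run_until and its parameters are hypothetical names.

```python
import subprocess
import sys
import time

DURATIONS_HOURS = [2, 15, 60]  # durations listed in the report


def run_until(argv, duration_s, clock=time.monotonic):
    """Rerun argv until duration_s seconds have elapsed.

    Returns (runs, failures). A run that starts before the deadline is
    allowed to finish, so the total time can slightly exceed duration_s.
    """
    deadline = clock() + duration_s
    runs = failures = 0
    while clock() < deadline:
        runs += 1
        if subprocess.run(argv).returncode != 0:
            failures += 1
    return runs, failures


if __name__ == "__main__":
    # Short demonstration with a trivial command instead of a real benchmark.
    print(run_until([sys.executable, "-c", "pass"], duration_s=1))
```

For a real long run the duration would be one of DURATIONS_HOURS multiplied by 3600, and argv would be the command of the test under study.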
Remarks:
- Tests #1 #2 #3 #4 #5 together
- special tests from connectathon suite fail
In such an overloaded environment we randomly get the following error:
Second check for lost reply on non-idempotent requests
testing 50 idempotencies in directory "testdir"
rmdir 1: Directory not empty
special tests failed
See BUGZILLA #162
We randomly get nfsd looping:
  PID USER  PR  NI VIRT RES SHR S %CPU %MEM     TIME+ COMMAND
10826 root  20   0    0   0   0 R  100  0.0 769:34.99 nfsd
    5 root  RT  -5    0   0   0 S    5  0.0  20:42.48 watchdog/0
A similar bug was found by Miklos Szeredi: an infinite loop in generic_file_splice_read(), triggered with fsx, one of the benchmarks we are using.
With the patch found at http://lkml.org/lkml/2008/2/29/443, the infinite loop no longer appears and we were able to run the benchmarks for 60 hours and more.
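During long runs, the nfsd looping symptom can be spotted automatically from top output. The sketch below assumes the twelve-column layout shown in the listing above (PID ... COMMAND) with one process per line; the function name and the 90% threshold are hypothetical choices.

```python
def spinning_tasks(top_lines, command="nfsd", threshold=90.0):
    """Return (pid, cpu) for running tasks named `command` at or above threshold %CPU."""
    hits = []
    for line in top_lines:
        fields = line.split()
        # Skip the header and any line that is not a 12-column process row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        pid, state, cpu, name = int(fields[0]), fields[7], float(fields[8]), fields[11]
        if name == command and state == "R" and cpu >= threshold:
            hits.append((pid, cpu))
    return hits


if __name__ == "__main__":
    sample = [
        "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
        "10826 root 20 0 0 0 0 R 100 0.0 769:34.99 nfsd",
        "5 root RT -5 0 0 0 S 5 0.0 20:42.48 watchdog/0",
    ]
    print(spinning_tasks(sample))
```

A single sample at 100% CPU is not proof of an infinite loop; a monitor would typically require the condition to hold over several consecutive samples.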
Kerberos Robustness Results
- The objective was to run:
- Tests #1 #2 #3 #4 #5 alone for 2 hours, 15 hours, 60 hours using the different flavors krb5, krb5i, krb5p
- Tests #1 #2 #3 #4 #5 were also run all together simultaneously for: 2 hours, 15 hours, 60 hours
- Asynchronous mode is used for both the mount command and the export option
- Both client and server have two processors
- The results are HERE
Remarks:
- Tests #1 #2 #3 #4 #5 run all together
- special tests from connectathon suite fail
In such an overloaded environment we randomly get the following error:
Second check for lost reply on non-idempotent requests
testing 50 idempotencies in directory "testdir"
rmdir 1: Directory not empty
special tests failed
See BUGZILLA #162
We randomly get nfsd looping:
  PID USER  PR  NI VIRT RES SHR S %CPU %MEM     TIME+ COMMAND
10826 root  20   0    0   0   0 R  100  0.0 769:34.99 nfsd
    5 root  RT  -5    0   0   0 S    5  0.0  20:42.48 watchdog/0
A similar bug was found by Miklos Szeredi: an infinite loop in generic_file_splice_read(), triggered with fsx, one of the benchmarks we are using.
With the patch found at http://lkml.org/lkml/2008/2/29/443, the infinite loop no longer appears and we were able to run the benchmarks for 60 hours and more.
kernel BUG at net/sunrpc/auth_gss/svcauth_gss.c:1243!
See BUGZILLA #165
Loopback Robustness Results:
- Client and server on the same machine: Intel(R) Xeon(TM) X86_64 CPU 2.80GHz, 2-way, 64 bits
| Testing tool | NFSV4 sec=sys 2 hours | NFSV4 sec=sys 15 hours | NFSV4 sec=krb5 2 hours | NFSV4 sec=krb5 15 hours | NFSV4 sec=krb5i 2 hours | NFSV4 sec=krb5p 2 hours | NFSV4 sec=krb5p 15 hours |
| Simultaneously in loopback: fsstress + fsx + iozone (no -U) + ffsb + cthon04 -t | PASSED (1) | PASSED (1) | PASSED (1) | FAILED (3) | FAILED (1)(2) | PASSED (1) | FAILED (1)(3) |
| iozone -U | PASSED | PASSED | | | PASSED | | |
| locktests | PASSED | | | | PASSED | | |
| dbench | PASSED | | | | | | |
(1) special tests from connectathon suite randomly fail
(2) kernel bug, see BUGZILLA #165
(3) Hang
- Client and server on the same machine: Intel(R) Xeon(TM) X86_64 CPU 2.80GHz, 4-way, 64 bits
| Testing tool | NFSV4 sec=sys 15 hours | NFSV4 sec=krb5 2 hours | NFSV4 sec=krb5 15 hours | NFSV4 sec=krb5i 2 hours | NFSV4 sec=krb5p 2 hours |
| Simultaneously in loopback: fsstress + fsx + iozone (no -U) + ffsb + cthon04 -t (ext) | PASSED (1) | PASSED (1) | FAILED (3) | FAILED (1)(2) | PASSED (1) |
(1) special tests from connectathon suite randomly fail
(2) kernel bug, see BUGZILLA #165
(3) Hang
Remarks:
- linux-2.6.25-rc6-CITI_NFS4_ALL-1 is the first release on which we have seen the above benchmarks run simultaneously for more than 2 hours without hanging
- (1) special tests from connectathon suite fail
In such an overloaded environment we randomly get the following error:
Second check for lost reply on non-idempotent requests
testing 50 idempotencies in directory "testdir"
rmdir 1: Directory not empty
special tests failed
See BUGZILLA #162
- (2) kernel BUG at net/sunrpc/auth_gss/svcauth_gss.c:1243!
Software versions
| Linux | linux-2.6.25-rc6-CITI_NFS4_ALL-1 |
| Client userland package | util-linux-2.12 + util-linux-2.12-CITI_NFS4_ALL-3.dif |
| Linux nfs-utils version | nfs-utils-1.1.2 + nfs-utils-1.1.2-CITI_NFS4_ALL-1.dif |
| gssapi library | libgssglue-0.1 |
| rpcsecgss library | librpcsecgss-0.17 |
| nfsidmap library | libnfsidmap-0.20 |
| acl library | acl_2.2.29-1 + acl-2.2.29-CITI_NFS4_ALL-3 |
| Linux TI-RPC | 1.0.8 |
Hardware configuration
Client
- Xeon(TM) CPU 2.80GHz, cache 512 KB, 2 processors
- Total memory: 2GBytes
- Ethernet: 1Gbits/s link
- Distribution : Fedora Core 8 64bits
- Kerberos V5 MIT1.4.3-4.
Server
- Xeon(TM) CPU 2.80GHz, cache 512 KB, 2 processors or 4 processors
- Total memory: 2GBytes
- Ethernet: 1Gbits/s link
- Distribution : Fedora Core 8 64bits
- Kerberos V5 MIT1.4.3-4.
Conclusion
Core linux-2.6.25-rc6-CITI_NFS4_ALL-1 functions can be considered stable for the security flavors sys, krb5, krb5i and krb5p when the "splice patch" is applied. Note that this is the first release on which all the benchmarks ran simultaneously for more than 2 hours.
There is nevertheless the random issue with the special tests of the connectathon suite.
Regarding write performance, we also see a large degradation when using iozone with file sizes from 512 Mbytes upward.
The mount issue BUGZILLA #145, which could be troublesome in many scripts, no longer occurs.