[netperf-dev] [netperf-talk] setting SO_DONTROUTE for UDP tests by default?
Rick Jones
rick.jones2 at hp.com
Mon Oct 26 09:40:54 PDT 2009
Andrew Gallatin wrote:
> Rick Jones wrote:
>
>> also connected to their employer's site lans. Some of their tests
>> include triggering link-down events on their NICs under test. This
>> can then cause the IP stack on the test system to seek another path,
>> and that can be the default route leading out onto the site network.
>> If the netperf test happens to be UDP_STREAM things can get rather
>> ugly from that point on...
>
>
> You're describing the Solaris NICDRV certification test, eh?
Not necessarily :)
>> Clearly these folks are being, well frankly they are being idiots
>> running network tests on systems without an "air gap" between them and
>> the rest of the
>
> It is hard to implement an air gap when your test hardware is 2500 miles
> from you, and your only access is remote. Serial console access gets
> old after a while..
We choose to implement the air gap, and do the other things, not because they
are easy, but because they are correct :)
>> world. And while I don't normally like to cover the backsides of
>> idiots I am getting a little soft in my middle years and was thinking...
>>
>> Would it inconvenience folks very much if for UDP tests, by default
>> netperf were to set SO_DONTROUTE on the data socket, and only not set
>> it by explicit command-line action on the part of the user?
I should also point out that I'm not completely sold on the idea of setting
SO_DONTROUTE for UDP, on the premise that, despite its utility in such matters,
netperf is a benchmark, not a functional test suite.
> Can you optionally set SO_DONTROUTE even for TCP tests? I had
> an issue (with NICDRV) where the test suite would bounce our
> interface up & down while running netperf TCP_STREAM. The problem
> was that the DUT (myri10ge) implemented LSO, and the Solaris driver
> for the on-board interface holding the default route did not. When our
> NIC was taken down, the kernel re-sent some 60KB "packets" to
> a driver that was utterly unprepared to handle anything larger
> than 1518 bytes. It scribbled over random kernel memory when
> copying the 60KB packet to a 1518 byte buffer, and things
> went boom in strange and unpredictable ways. It took me
> over a day to figure out what was happening.. The "fix" was
> to delete the default route before running those tests..
Strange, I'd have thought the fix was to make sure the driver wasn't so
completely borked by implicitly trusting the upper layers to give it
correctly sized packets :) (And/or that TCP would check the status of LSO/TSO on
a retransmission/route change.)
All ribbing aside though, I'm even less inclined to set SO_DONTROUTE for TCP,
even as an option. In contrast to the example above, the case I was
familiar with was one where the stack was doing precisely what it was supposed
to do: find another path. The bug was in the test setup. Apart from the value
of such an option while debugging, the example above was a bug that arguably
needed to be found, and optionally letting netperf close one eye might have let
it be missed.
And even had the on-board driver done the right thing with the too large
packet(s), the TCP case differs from UDP in that there *is* flow control, so
when a different path is chosen, and it leads into the void, the flow of packets
into the void will come to a rather quick halt, very much unlike a UDP_STREAM
test, which will continue to toss packets at speed into the void until the test
timer expires.
I'm not slamming the door on either, of course, but I am not yet convinced. I'm
still grumpy over having to have three variables to track socket buffer sizes
thanks to Linux's doubling and autotuning :)
happy benchmarking,
rick jones