Hi,

thanks a lot, your fastcopy1.S works well here. :-)

Have you tried my patches too? I mean skbuff.c.diff, and also avoiding the memcpy() in adapter.c:xenet_FifoSend() and the memmove() in adapter.c:FifoRecvHandler(). Has the speed increased for you as well?

Together with your patches, plus placing the IRQ entry path and the system-call entry path in BRAM, the time for one TCP/IP packet (including receiving the ACK) dropped a lot here, from about 2 ms to 1 ms.

Smart socket handling in the user application also helps a lot: using buffer sizes that are multiples of 1460 bytes for send() increased my transfer rate considerably. Also, don't forget that switching off the Nagle algorithm with the socket option TCP_NODELAY may help.
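A minimal C sketch of the two socket-level tips, assuming a standard BSD/POSIX socket API and an MSS of 1460 bytes (a 1500-byte Ethernet MTU minus 20-byte IPv4 and 20-byte TCP headers, with no TCP options). The helpers mss_aligned_len(), disable_nagle() and send_all_mss() are hypothetical names for illustration, not code from either patch set.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed MSS: 1500-byte MTU - 20 (IPv4) - 20 (TCP), no TCP options. */
#define TCP_MSS 1460

/* Hypothetical helper: round a requested send() size down to a whole
   number of segments, but never below one segment. */
static size_t mss_aligned_len(size_t want)
{
    size_t segs = want / TCP_MSS;
    return (segs == 0 ? 1 : segs) * TCP_MSS;
}

/* Hypothetical helper: disable the Nagle algorithm so small writes are
   not held back waiting for an outstanding ACK. Returns setsockopt()'s
   result (0 on success, -1 on error). */
static int disable_nagle(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
}

/* Hypothetical helper: send a buffer in chunks whose size is a multiple
   of the MSS; loops because send() may write less than requested. */
static ssize_t send_all_mss(int fd, const char *buf, size_t len, size_t chunk)
{
    size_t done = 0;

    chunk = mss_aligned_len(chunk);
    while (done < len) {
        size_t n = (len - done < chunk) ? len - done : chunk;
        ssize_t r = send(fd, buf + done, n, 0);
        if (r < 0)
            return -1;
        done += (size_t)r;
    }
    return (ssize_t)done;
}
```

Whether TCP_NODELAY pays off depends on the traffic pattern; it mainly matters for small writes that would otherwise be delayed until the previous segment is acknowledged.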
That's all for now. Anyway, I've learned a lot about the kernel internals concerning the TCP stack... maybe some of it is useful for you too.
-----Original Message-----
From: Jim Law [mailto:jlaw@xxxxxxxxxxxxx]
Sent: Wednesday, May 07, 2008 4:46 PM
To: microblaze-uclinux@xxxxxxxxxxxxxx
Subject: [microblaze-uclinux] MicroBlaze fast memcpy / memmove
Hi all,

I took some suggestions from Goran Bilski and the trials Falk had been doing to speed up the Ethernet, and built assembler memcpy and memmove functions that incorporate some loop unrolling, etc., to get uniformly faster moves and copies.
General code flow for both ascending and descending transfers is:
1. Transfer 1 byte per loop until the destination address is aligned.
2. Transfer 8-word blocks per loop while at least 8 words are left to transfer, using unrolled loops per Goran's suggestions.
3. Transfer 1 word per loop while at least 1 word is left to transfer.
4. Transfer 1 byte per loop to finish.
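The four steps can be sketched in C to show the structure of the ascending copy; this is not the attached fastcopy.S. It assumes 32-bit words and a GCC/Clang compiler (for the may_alias attribute), and sketch_memcpy() is a hypothetical name. To keep it short, the word paths are only taken when source and destination share the same alignment, otherwise it falls back to bytes; the assembler version may handle a misaligned source differently.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: lets byte buffers be accessed as 32-bit words
   without violating strict-aliasing rules. */
typedef uint32_t __attribute__((__may_alias__)) word_t;

/* Ascending copy following the four-step flow (illustrative sketch). */
void *sketch_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word paths only if src and dest have the same offset within a word,
       so aligning dest also aligns src (assumption of this sketch). */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3) == 0) {
        /* 1. One byte per loop until dest is word aligned. */
        while (n > 0 && ((uintptr_t)d & 3) != 0) {
            *d++ = *s++;
            n--;
        }

        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;

        /* 2. Unrolled 8-word (32-byte) blocks. */
        while (n >= 32) {
            dw[0] = sw[0]; dw[1] = sw[1]; dw[2] = sw[2]; dw[3] = sw[3];
            dw[4] = sw[4]; dw[5] = sw[5]; dw[6] = sw[6]; dw[7] = sw[7];
            dw += 8;
            sw += 8;
            n -= 32;
        }

        /* 3. One word per loop. */
        while (n >= 4) {
            *dw++ = *sw++;
            n -= 4;
        }

        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* 4. Remaining bytes (or the whole copy if alignments differ). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```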
I kept the ascending move (slightly slower than the descending move) for the memcpy function, just in case anyone depends on this behaviour. If anyone with more experience thinks (knows?) that the direction of memcpy's operation is irrelevant to every program ever written <G>, then I can make a quick change and get slightly better performance on memcpy moves.
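For context on the direction question: in ISO C, calling memcpy() on overlapping objects is undefined behaviour, so a conforming program cannot rely on memcpy copying in ascending order; only memmove() guarantees a correct result for overlapping regions. The byte-wise sketch below (hypothetical name sketch_memmove(), without the block and word steps) shows how a memmove picks its direction.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise memmove sketch: chooses the copy direction so that
   overlapping source and destination regions are handled correctly. */
void *sketch_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (d == s || n == 0)
        return dest;

    if ((uintptr_t)d < (uintptr_t)s || (uintptr_t)d >= (uintptr_t)s + n) {
        /* dest below src, or no overlap: ascending copy is safe. */
        while (n--)
            *d++ = *s++;
    } else {
        /* dest overlaps the tail of src: copy descending from the end. */
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    return dest;
}
```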
Attached is the fastcopy.S file that I place and build in /arch/microblaze/lib. I just commented out the C versions of memcpy and memmove in their respective .c files in the same directory.
Obviously this change won't make a night-and-day difference on small moves, but it might be noticeable on larger blocks of data.
Jim Law
Iris Power LP