
RE: [microblaze-uclinux] MicroBlaze fast memcpy / memmove



Hi,
 
Thanks a lot. Your fastcopy1.S works well here. :-)
 
Have you tried my patches, too? I mean skbuff.c.diff, and also avoiding the memcpy() in adapter.c:xenet_FifoSend() and the memmove() in adapter.c:FifoRecvHandler(). Has the speed increased for you as well?
 
Together with your patches, plus placing the IRQ entry path and the system-call entry path in BRAM, the time to send one TCP/IP packet (including receiving the ACK) dropped considerably here (from about 2 ms to 1 ms). Smart socket handling in the user application (I mean buffer sizes that are multiples of 1460 bytes for send()) also helps a lot, and increased my transfer rate considerably. And don't forget that you may also want to switch off the Nagle algorithm with the socket option TCP_NODELAY.
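
For anyone who wants to try the socket side, here is a minimal sketch of the two user-space settings. The helper names and the SKETCH_MSS macro are made up for illustration. I also assume that 1460 is the TCP payload of a standard 1500-byte Ethernet frame without IP or TCP options, so check the segment size on your own link.

```c
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumed payload per TCP segment: 1500-byte MTU minus 20 bytes of IP
 * header and 20 bytes of TCP header (no options). */
#define SKETCH_MSS 1460

/* Hypothetical helper: switch off the Nagle algorithm on a TCP socket.
 * Returns 0 on success, -1 on error (as setsockopt() does). */
int sketch_disable_nagle(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Hypothetical helper: round a buffer length down to a whole number of
 * segments, so that each send() hands the stack only full-size segments. */
size_t sketch_round_to_mss(size_t len)
{
    return (len / SKETCH_MSS) * SKETCH_MSS;
}
```

Call sketch_disable_nagle() once after creating the socket, and size the buffer passed to send() with sketch_round_to_mss().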
 
That's all for now. Anyway I've learned a lot about the kernel internals concerning the TCP stack... maybe there was something that was useful for you too...
 
Cheers, F@lk
 
 
 
 
-----Original Message-----
From: Jim Law [mailto:jlaw@xxxxxxxxxxxxx]
Sent: Wednesday, May 07, 2008 4:46 PM
To: microblaze-uclinux@xxxxxxxxxxxxxx
Subject: [microblaze-uclinux] MicroBlaze fast memcpy / memmove

Hi all,
 
I took some suggestions from Goran Bilski and the trials that Falk had been doing trying to speed up the Ethernet, and built assembler memcpy and memmove functions that incorporate some loop unrolling, etc. to try and get uniformly faster moves and copies.
 
General code flow for both ascending and descending transfers is:
1. Transfer 1-byte per loop to get to an aligned destination address.
2. Transfer 8-word blocks per loop if at least 8 words left to transfer, using unrolled loops per Goran's suggestions.
3. Transfer 1-word per loop if at least 1 word left to transfer.
4. Transfer 1-byte per loop to finish.
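
For readers without the attachment, the four steps can be sketched in C for the ascending case roughly as follows. This is not the code from fastcopy.S: the function and type names are made up, the may_alias attribute assumes GCC or Clang, and falling back to byte copies when the source is not word aligned is an assumption of this sketch only.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit word that may alias other types (GCC/Clang extension, assumed). */
typedef uint32_t __attribute__((may_alias)) sketch_word_t;

/* Hypothetical illustration of the four-step ascending copy. */
void *sketch_copy_ascending(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* 1. One byte per loop until the destination is word aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Assumption of this sketch: use word transfers only when the source
     * is word aligned too; otherwise step 4 copies everything by bytes. */
    if (((uintptr_t)s & 3u) == 0) {
        sketch_word_t *dw = (sketch_word_t *)d;
        const sketch_word_t *sw = (const sketch_word_t *)s;

        /* 2. Eight words per loop, unrolled, while 8 or more words remain. */
        while (n >= 32) {
            dw[0] = sw[0]; dw[1] = sw[1]; dw[2] = sw[2]; dw[3] = sw[3];
            dw[4] = sw[4]; dw[5] = sw[5]; dw[6] = sw[6]; dw[7] = sw[7];
            dw += 8;
            sw += 8;
            n -= 32;
        }

        /* 3. One word per loop while at least one word remains. */
        while (n >= 4) {
            *dw++ = *sw++;
            n -= 4;
        }

        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* 4. One byte per loop to finish. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The descending variant has the same shape, but starts at the end of both buffers and decrements the pointers before each transfer.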
 
I kept the ascending move (slightly slower than the descending move) for the memcpy function, just in case anyone is dependent on this behaviour. If anyone with more experience thinks (knows?) that the direction of memcpy's operation is irrelevant to every program ever written <G>, then I can do a quick change and get slightly better performance on memcpy moves.
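
To show where direction matters, here is a byte-wise sketch of how a memmove can pick its direction when the regions overlap. It is a hypothetical illustration (the name sketch_memmove is made up) and leaves out all the alignment and unrolling work in fastcopy.S. It also assumes a flat address space, so pointers can be compared as integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: choose the copy direction so that overlapping
 * source bytes are read before they are overwritten. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d <= (uintptr_t)s) {
        /* Destination is below the source: ascending copy is safe. */
        while (n > 0) {
            *d++ = *s++;
            n--;
        }
    } else {
        /* Destination is above the source: copy descending from the end. */
        d += n;
        s += n;
        while (n > 0) {
            *--d = *--s;
            n--;
        }
    }
    return dst;
}
```

The C standard leaves memcpy undefined for overlapping regions, so only code that relies on non-standard behaviour would see a difference if memcpy changed direction.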
 
Attached is the fastcopy.S file that I place and build in /arch/microblaze/lib.  I just commented out the C versions of memcpy and memmove in their respective .c files in the same directory.
 
Obviously this change won't make a night-and-day difference on small moves, but it might be noticeable on larger blocks of data.
 
Jim Law
Iris Power LP
 
 