|
Thanks for the response!

I'd like to clarify our requirement before I delve further into these optimizations (though I'll try them anyway, to see what happens and get a little more familiar with the stack). In our eventual production design, we're counting on being able to move data at 200-400Mbits/second; that's why we are designing in a Gigabit interface. The modifications that you discuss below sound like "tweaks"; my guess is I'd be looking at a 2x to 3x improvement in throughput? That's not going to get us where we need to be.

Either way, your suggestion of benchmarking our design and examining bottlenecks is clearly an excellent one, and I'll make that my immediate priority!

Thank you again for your analysis. I'll post what we've seen when we get further along.

Dan Miller

John Williams wrote:

Hi Dan,

With the ll_temac connected to the MPMC, we've seen around 20-25Mbps using iperf as the performance benchmark. That performance was adequate for the customer who supported (read 'paid for') the ll_temac driver integration work, so we didn't take it any further.

13Mbps is a little on the slow side, but in the ballpark region reported by others here on the list.

In terms of how to improve it, the first step is to do some serious benchmarking and profiling to find out where the bottlenecks are. There are a few prime suspects:

1 - Data copy from kernel to user space. This is unavoidable unless you are using the sendfile() API to directly push a file out over the network - all network packets that terminate in userspace applications will have to be copied from the kernel into userspace buffers. Currently this is done with a somewhat optimised memcpy routine, but it might be further improved with a hardware memcpy implementation.

2 - Cache invalidation - when incoming packets are received via DMA, it is necessary to invalidate the CPU data cache to prevent cache coherence problems.
The MicroBlaze architecture requires a loop over the entire region to be invalidated; there's no magic "cache invalidate" bit we can hit to do it all in a couple of cycles. Even worse, in early MicroBlaze versions, the reference manual said that the cache itself had to be disabled during the invalidation, making it a serious performance hit. I believe this restriction has been removed in the most recent CPU versions. Maybe if Goran is listening he can comment?

3 - Alignment issues - Ethernet frame headers are not a multiple of 32 bits long - something stupid like 14 bytes (18 with a VLAN tag). This means that the IP packet headers that follow them are not 32-bit aligned when they arrive. This can force slow unaligned accesses during IP packet header handling in the kernel.

Some things you might like to try:

1. Take a look at arch/microblaze/kernel/cpu/cache.c. Find the various references to __disable_dcache() in the _flush_dcache_* functions, and comment them out. You might also like to try the same with the instruction cache - comment out the __disable_icache() calls in the various flush_icache_* functions.

2. Try tweaking the ALIGNMENT_RECV value on line 87 of drivers/net/xilinx_lltemac/xlltemac_main.c to 34 (from 32). This should cause the IP packet headers to become word aligned, and may improve TCP packet header handling.

Let us know how you go.

Cheers,

John

On Wed, Jul 1, 2009 at 9:57 AM, Dan Miller <dan@xxxxxxxxxx> wrote:

We've finally got our ML505 board booting PetaLinux 0.40-rc3 !! Things generally seem to work. We're using the TeMac ethernet module.

When our other engineer had been testing his design with a raw Ethernet connection, he said he was getting 400Mbit/second throughput (I don't know what tools he was using to test this). With PL loaded and running, though, we're only able to get about 13Mbit/second through the network!!

We've connected directly to a Quad Core machine running WinXP SP2, using a short 1000Mbit crossover cable. The boards were able to connect at 1Gbit.
However, when I attempt ftp transfers to/from the box, I'm getting 1.8Mbps into the board, and 13Mbps out of the board.

What is going on here?? Is this because of the default tcp/ip stack? Our HW engineer saw another tcp/ip stack somewhere that he says claims to get over 100Mbps, but I'm hesitant to try to rip out the installed stack and link something else in. Is that really necessary?

Dan Miller

___________________________
microblaze-uclinux mailing list
microblaze-uclinux@xxxxxxxxxxxxxx
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
|