|
Okay, that's an easy first change; we're using 8K I/D cache right now. Also, I suspect that ftp isn't the best test for network throughput; I'll look at iperf and other benchmark tools.

Dan

Goran Bilski wrote:

Hi,

In the latest version of MicroBlaze you don't need to disable the caches while invalidating them.

I have been doing some benchmarking using lwIP in standalone mode and was able to reach around 150-160 Mbit/s; it could have reached much more if lwIP supported jumbo frames.

A couple of things that improved the performance are:
- Large cache, especially dcache. I was using 32 kbyte I and D caches
- Writeback mode on the dcache; unfortunately the Linux port doesn't support writeback yet (or?)
- Use checksum offload in the temac
- Analyze the critical functions and place them in LMB using the linker script

Göran

-----Original Message-----
From: owner-microblaze-uclinux@xxxxxxxxxxxxxxxxxxxx [mailto:owner-microblaze-uclinux@xxxxxxxxxxxxxxxxxxxx] On Behalf Of John Williams
Sent: Wednesday, July 01, 2009 2:32 AM
To: microblaze-uclinux@xxxxxxxxxxxxxx
Subject: Re: [microblaze-uclinux] Is the standard tcp/ip stack slow??

Hi Dan,

With the ll_temac connected to the MPMC, we've seen around 20-25Mbps using iperf as the performance benchmark. That performance was adequate for the customer who supported (read 'paid for') the ll_temac driver integration work, so we didn't take it any further.

13Mbps is a little on the slow side, but in the ballpark region reported by others here on the list.

In terms of how to improve it, the first step is to do some serious benchmarking and profiling to find out where the bottlenecks are. There are a few prime suspects:

1 - Data copy from kernel to user space. This is unavoidable unless you are using the sendfile() API to directly push a file out over the network - all network packets that terminate in userspace applications will have to be copied from the kernel into userspace buffers.
Currently this is done with a somewhat optimised memcpy routine, but it might be further improved with a hardware memcpy implementation.

2 - Cache invalidation - when incoming packets are received via DMA, it is necessary to invalidate the CPU data cache to prevent cache coherence problems. The MicroBlaze architecture requires a loop over the entire region to be invalidated; there's no magic "cache invalidate" bit we can hit to do it all in a couple of cycles. Even worse, in early MicroBlaze versions, the reference manual said that the cache itself had to be disabled during the invalidation, making it a serious performance hit. I believe this restriction has been removed in the most recent CPU versions; maybe if Goran is listening he can comment?

3 - Alignment issues - Ethernet frame headers are not a multiple of 32 bits long - something stupid like 14 bytes (18 with a VLAN tag). This means that the IP packet headers that follow them are not 32-bit aligned when they arrive. This can force slow unaligned accesses during IP packet header handling in the kernel.

Some things you might like to try:

1. Take a look at arch/microblaze/kernel/cpu/cache.c. Find the various references to __disable_dcache() in the _flush_dcache_* functions, and comment them out. You might also like to try the same for the icache - comment out the __disable_icache() calls in the various flush_icache_* functions.

2. Try tweaking the ALIGNMENT_RECV value on line 87 of drivers/net/xilinx_lltemac/xlltemac_main.c to 34 (from 32). This should cause the IP packet headers to become word aligned, and may improve TCP packet header handling.

Let us know how you go.

Cheers,

John

On Wed, Jul 1, 2009 at 9:57 AM, Dan Miller <dan@xxxxxxxxxx> wrote:

We've finally got our ML505 board booting PetaLinux 0.40-rc3 !! Things generally seem to work. We're using the TeMac ethernet module.
When our other engineer had been testing his design with a raw Ethernet connection, he said he was getting 400Mbit/second throughput (I don't know what tools he was using to test this). With PetaLinux loaded and running, though, we're only able to get about 13Mbit/second through the network!!

We've connected directly to a Quad Core machine running WinXP SP2, connecting with a short 1000Mbit crossover cable. The boards were able to connect at 1Gbit. However, when I attempt ftp transfers to/from the box, I'm getting 1.8Mbps into the board, and 13Mbps out of the board.

What is going on here?? Is this because of the default tcp/ip stack? Our HW engineer saw another tcp/ip stack somewhere that he says claims to get over 100Mbps, but I'm hesitant to try to rip out the installed stack and link something else in. Is that really necessary?

Dan Miller

___________________________
microblaze-uclinux mailing list
microblaze-uclinux@xxxxxxxxxxxxxx
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
|
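
The alignment point in John's reply (his third suspect and the ALIGNMENT_RECV suggestion) comes down to simple offset arithmetic, which might be sketched in C as below. This is an illustration only, not code from the xlltemac driver: the names ETH_HEADER_LEN, VLAN_TAG_LEN, WORD_SIZE, ip_header_offset, ip_header_is_aligned and pad_for_alignment are all hypothetical. It assumes the receive buffer base is itself 32-bit aligned and that the extra 2 in ALIGNMENT_RECV (34 rather than 32) ends up as padding in front of the frame, which has not been checked against the driver source.

```c
#include <stddef.h>

/* Ethernet II header: 6-byte destination MAC, 6-byte source MAC,
 * 2-byte EtherType. An 802.1Q VLAN tag adds 4 more bytes. */
#define ETH_HEADER_LEN 14u
#define VLAN_TAG_LEN    4u
#define WORD_SIZE       4u  /* 32-bit words on MicroBlaze */

/* Offset of the IP header from a word-aligned buffer base, given
 * "pad" bytes (hypothetical parameter) placed before the frame. */
static size_t ip_header_offset(size_t pad, int vlan_tagged)
{
    size_t l2_len = ETH_HEADER_LEN + (vlan_tagged ? VLAN_TAG_LEN : 0u);
    return pad + l2_len;
}

/* Non-zero when the IP header starts on a 32-bit boundary. */
static int ip_header_is_aligned(size_t pad, int vlan_tagged)
{
    return ip_header_offset(pad, vlan_tagged) % WORD_SIZE == 0;
}

/* Smallest padding that puts the IP header on a 32-bit boundary. */
static size_t pad_for_alignment(int vlan_tagged)
{
    size_t l2_len = ip_header_offset(0, vlan_tagged);
    return (WORD_SIZE - l2_len % WORD_SIZE) % WORD_SIZE;
}
```

With no padding the IP header lands at offset 14, which is not a multiple of 4; with 2 bytes of padding it lands at offset 16. The same 2-byte pad also works for VLAN-tagged frames (2 + 18 = 20). This is the same idea as the NET_IP_ALIGN convention used by Linux network drivers.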