
Re: [microblaze-uclinux] Preliminary 2.6.19 support for PetaLinux



Hi Hans,

Dr. Johann Pfefferl wrote:

- After login every access to the filesystems seems to be very slow. But
  browsing the shell history with the cursor keys works at normal speed.

I see this too - it's directly related to the size of the binary executable. I instrumented the binfmt_flat loader - here's what you get when loading busybox (in my case an executable of almost 300 KB):

# busybox
[ 1251.310000] load_flat_binary start
[ 1251.320000] load_flat_file start
[ 1251.320000] mmapping text+data+bss
[ 1251.580000] flush_icache_range(0x23c80000,0x23ce15ad)
[ 1251.590000] updated:0x23c80000->0x23c82010
[ 1251.590000] flush_icache_range() end
[ 1251.600000] done
[ 1251.600000] loading flat file
[ 1251.890000] done
[ 1251.890000] stating relocations
[ 1252.140000] done
[ 1252.140000] flush_icache_range(0x23c80040,0x23cbf580)
[ 1252.150000] updated:0x23c80040->0x23c82050
[ 1252.150000] flush_icache_range() end
[ 1252.160000] zeroing BSS etc
[ 1252.300000] done
[ 1252.310000] load_flat_binary start thread
[ 1252.310000] load_flat_binary done

The big time killers are all approximately linear in application size:

1. loading the binary from file into memory (0.3 seconds)
2. doing mmap() on the text/data/bss segment (0.28 seconds)
3. performing the relocations (0.25 seconds)
4. zeroing BSS and stack (0.15 seconds)

This all adds up to make it almost 1 second to load the app.
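
For anyone who wants to redo this breakdown on their own trace, here is a minimal sketch that computes the gap between consecutive printk timestamps. The helper names (parse, deltas, total) are made up for this example, and it assumes the usual "[ seconds.micros] message" prefix seen in the log above. Note that the gaps are per message, so the 0.28 seconds quoted for mmap() corresponds to the mmapping gap plus the icache flush that follows it.

```python
import re

# Trace copied from the busybox run above.
LOG = """\
[ 1251.310000] load_flat_binary start
[ 1251.320000] load_flat_file start
[ 1251.320000] mmapping text+data+bss
[ 1251.580000] flush_icache_range(0x23c80000,0x23ce15ad)
[ 1251.590000] updated:0x23c80000->0x23c82010
[ 1251.590000] flush_icache_range() end
[ 1251.600000] done
[ 1251.600000] loading flat file
[ 1251.890000] done
[ 1251.890000] stating relocations
[ 1252.140000] done
[ 1252.140000] flush_icache_range(0x23c80040,0x23cbf580)
[ 1252.150000] updated:0x23c80040->0x23c82050
[ 1252.150000] flush_icache_range() end
[ 1252.160000] zeroing BSS etc
[ 1252.300000] done
[ 1252.310000] load_flat_binary start thread
[ 1252.310000] load_flat_binary done
"""

# Matches "[ 1251.310000] message" and captures timestamp and message.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")


def parse(log):
    """Return a list of (timestamp_seconds, message) tuples."""
    events = []
    for line in log.splitlines():
        m = STAMP.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


def deltas(events):
    """Return (message, seconds until the next message) pairs."""
    return [
        (msg, round(nxt - t, 2))
        for (t, msg), (nxt, _) in zip(events, events[1:])
    ]


def total(events):
    """Seconds between the first and the last message."""
    return round(events[-1][0] - events[0][0], 2)


if __name__ == "__main__":
    events = parse(LOG)
    for msg, dt in deltas(events):
        if dt >= 0.1:  # only show the expensive steps
            print(f"{dt:5.2f}s  {msg}")
    print(f"{total(events):5.2f}s  total")
```

The 0.1 second threshold is arbitrary; lower it to see the cache flushes as well.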

On a tiny executable like "cat" (8 KB) the times are much smaller.
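
As a rough sanity check on the "approximately linear" claim, the sketch below scales the busybox figure down to a smaller binary. It is only an estimate: the function name is hypothetical, the reference values (about 300 KB, about 0.98 seconds) are the rounded numbers from this mail, and it ignores any fixed per-exec overhead, which would make a real small binary somewhat slower than predicted.

```python
def estimate_load_time(size_kb, ref_size_kb=300.0, ref_time_s=0.98):
    """Scale the measured busybox load time linearly with binary size.

    ref_size_kb and ref_time_s are the rounded busybox figures from the
    trace above; the purely linear model is an assumption.
    """
    return ref_time_s * size_kb / ref_size_kb


if __name__ == "__main__":
    # Estimate for an 8 KB binary such as "cat".
    print(f"{estimate_load_time(8):.3f} s")
```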

I get similar results on both romfs and cramfs - romfs is slightly faster because no zlib decompression is required during the application load.

mmap() is slow because do_mmap_private() does a big memset() to zero over the entire chunk of memory it allocates.

dcaches are not enabled yet - I get lockups in the early network initialisation code otherwise. See arch/microblaze/kernel/setup.c. We should see some app load time improvement once we get the dcaches working.

Some big improvements would also come if we could convince the compiler to generate position independent code. This would give big savings:

1. No load-to-RAM (we could XIP), and therefore no relocation of the text segment.
2. The initial mmap() of the code would do the load into memory, avoiding do_mmap_private()'s memset() to zero of pages that we write over immediately anyway.

A PIC option appeared in mb-gcc a few generations ago but it's gone now - I am not sure it ever worked :(

- I have switched on the
  General setup --->
   Configure standard kernel features (for small systems)  --->
     BUG() support
  debug feature and got the following warning/error:
  BUG: warning at kernel/softirq.c:144/local_bh_enable()

Yep - I get this too with BUG enabled.  Haven't traced it yet.

Cheers,

John
___________________________
microblaze-uclinux mailing list
microblaze-uclinux@xxxxxxxxxxxxxx
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/