
Re: [microblaze-uclinux] Bad page state in process 'swapper'



In both configurations networking is disabled in menuconfig.

My method of preventing interrupts was to comment out the code in the opb_intc_enable() and opb_intc_end() functions in the arch/microblaze/kernel/opb_int.c file. It seems to me that all enabling of interrupts to service peripherals has to be routed through these functions. This doesn't stop interrupts from being enabled at the processor level, but it should silence interrupts routed through the interrupt controller, which should include all peripherals. Of course there may be code elsewhere that monkeys with the interrupt controller, but that would be really bad.
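As a rough illustration of that approach, here is a minimal host-side sketch. It is not the actual code from opb_int.c: the function names, the `sketch_enable_stubbed` flag, and the register struct are hypothetical, and the register offsets are an assumption based on the Xilinx OPB interrupt controller data sheet. The registers are modelled as plain memory so the logic can be checked off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed register layout of the Xilinx OPB interrupt controller.
 * Here it is just a struct in RAM, not a real memory-mapped device. */
struct intc_regs {
    volatile uint32_t isr; /* 0x00 interrupt status */
    volatile uint32_t ipr; /* 0x04 interrupt pending */
    volatile uint32_t ier; /* 0x08 interrupt enable */
    volatile uint32_t iar; /* 0x0C interrupt acknowledge */
    volatile uint32_t sie; /* 0x10 set interrupt enable bits */
    volatile uint32_t cie; /* 0x14 clear interrupt enable bits */
    volatile uint32_t ivr; /* 0x18 interrupt vector */
    volatile uint32_t mer; /* 0x1C master enable */
};

/* Hypothetical switch: non-zero means the enable path is "commented out". */
int sketch_enable_stubbed = 1;

/* Stand-in for an enable hook: does nothing while the stub is active. */
void sketch_intc_enable(struct intc_regs *r, unsigned irq)
{
    assert(irq < 32);
    if (sketch_enable_stubbed)
        return;
    r->ier |= (uint32_t)1 << irq;
}

/* Belt and braces: clear every enable bit and the master enable, so no
 * other code path can have left a peripheral interrupt switched on. */
void sketch_intc_mask_all(struct intc_regs *r)
{
    r->ier = 0;
    r->mer = 0;
}
```

Clearing the master enable as well is one way to guard against other code re-enabling individual sources, though it still does not touch the interrupt enable bit in the processor itself.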

To always initialize memory to a known configuration we created an image.bin that zeroed out all memory in the entire contiguous 32M SDRAM region not occupied by the kernel and romfs. This image made no discernible change in the crashing behavior.
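The idea can be sketched as follows. This is not the tool we used to build image.bin, only a minimal illustration; the function name `fill_outside` and its parameters are hypothetical, and the buffer stands in for the SDRAM region with the kernel and romfs occupying the range that is left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill buf[0, size) with pattern, except for the half-open range
 * [keep_start, keep_end), which stands for the kernel and romfs image. */
void fill_outside(unsigned char *buf, size_t size,
                  size_t keep_start, size_t keep_end,
                  unsigned char pattern)
{
    assert(keep_start <= keep_end && keep_end <= size);

    /* Region below the image. */
    memset(buf, pattern, keep_start);
    /* Region above the image. */
    memset(buf + keep_end, pattern, size - keep_end);
}
```

Using a non-zero pattern on a second run is a cheap way to check whether the crash location depends on the initial memory contents at all.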

With regard to the cache, I'm not sure if CONFIG_XILINX_UNCACHED_SHADOW should be set or unset. I'm wondering if the code in arch/microblaze/mm/consistent.c and init.c is doing the right thing.

Something we are doing is using the old Suzaku EDK projects to generate the bitstream for the FPGA. I'm just enabling the Petalinux BSP in that project. For the Memec I created a project from scratch using the Base System Builder in the EDK. Are you using updated EDK projects for the Gemini, or using the Base System Builder to create the projects? I'm wondering if there are some legacy Xilinx settings making it into the auto-config files that are causing the kernel some grief.

BTW, I'm also using the 8.2 EDK/ISE.

Mike

Cor Venner wrote:
Mike,

it is not that I was doubting the hardware. I have tested my case with an extension of the memory test program.

The point that I wanted to make is that in the SDRAM case, compared to the DDR case, some different info is passed from the Xilinx hardware build to the Petalinux software build. This could result, e.g., in a different control of the cache, as you already stated. However, I would expect that a cache can never go wrong in a flat memory system, unless it is controlled in the wrong way (due to a difference in the SDRAM/DDR info?).

I think that consistency in the crash behaviour can only be expected as long as no interrupts occur. Once they do, there will be multiple interacting "processes" influenced by the factor "time". However, you wrote that you zeroed memory (something that I was also thinking of) and disabled interrupts.

Did you disable the interrupts by hardware? In software they could be enabled again by a regular part of the kernel.

Cor

----- Original Message ----- From: "Mike Thompson" <mike@xxxxxxxxxx>
To: <microblaze-uclinux@xxxxxxxxxxxxxx>
Sent: Saturday, November 10, 2007 5:43 PM
Subject: Re: [microblaze-uclinux] Bad page state in process 'swapper'


Cor,

I'll look to provide more details on the Suzaku (not working) and Memec (working) differences when I get back into the office on Monday. However, I am using the exact same FPGA bitstream with the Petalinux 2.6 kernel that I am using with the more ancient Suzaku uClinux 2.4 kernel. Since it is working fine with the Suzaku 2.4 kernel it at least assures me the hardware configuration is in a stable and working configuration when the drivers are properly configured.

The difference between SDRAM and DDR SDRAM didn't occur to me. I'll look into this as well on my side. Something I have noticed is that my crash results are not entirely consistent and sometimes it will crash prior to the Dentry/Inode-cache entry appearing on the console. It never gets past those two entries though and the 'Bad page state in process 'swapper'' will appear. Such inconsistent crashing could potentially indicate some memory problem.

One diagnostic attempt we made was to set all memory to a precise known state and disable the enabling of interrupts and see what the result was. We still were getting inconsistent crashing although we expected to find the system to crash in exactly the same place each time. This is perhaps another clue that memory is unstable with the 2.6 kernel.

Unfortunately, I'm not seeing where the memory controller is exposed as a peripheral to the Microblaze so I don't see how the 2.4 kernel could be doing anything different in memory than the 2.6 kernel. Perhaps the issue is in the handling of the instruction/data cache controller as that could also cause memory issues. Just a wild guess though on my part at this time.

I'll provide more details on the other information you mention in a follow up email.

Mike

Cor Venner wrote:
Hello Mike and everyone who is interested,

let us share some more details to look if we can identify a pattern:
- I use the image.srec as the source for the kernel and romfs; I expect that most of us use image.bin or image.ub.
- Networking is disabled in "menuconfig".
- Initially the board had 128 Mbyte of SDRAM (not DDR) configured, which is quite a lot compared to the evaluation boards mostly used. I have brought this down to 32 Mbyte in Kconfig. The difference that I could notice was that the downsized memory version was able to reach the "Memory: 29378k/32768k available" and "Calibrating delay loop.." messages while backtracing. So it seems that some multiprocessing is going on.
- This SDRAM<>DDR difference is quite intriguing. As far as I can remember, the Memec board is a DDR board; the SZ130 is an SDRAM board as far as I could find out on the web. My Gemini board is also SDRAM. Could it be something in the configuration info of the IP?
- I have applied the peripheral test functions as provided by Xilinx. They run successfully. Unfortunately they do not test the serial interface interrupt. I expect that the timer and interrupt controller are running correctly. See the previous statement.
- Xilinx tests the memory in a simple way. They check whether they can do char, short and long accesses. I have extended this test and it seems to run successfully on 128 Mbyte of SDRAM.
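For reference, a test of the kind described in the last point could look roughly like the sketch below. It is neither the Xilinx test code nor the exact extension used here; the function name `memtest_widths` and the data patterns are hypothetical. Each pass writes an address-dependent pattern over the whole region before reading it back, so that aliased addresses show up as mismatches rather than passing a simple write/read-back of one cell.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Test a region with 8-, 16- and 32-bit accesses.
 * base must be 4-byte aligned and bytes a multiple of 4.
 * Returns the number of mismatches found. */
size_t memtest_widths(volatile void *base, size_t bytes)
{
    volatile uint8_t  *p8  = base;
    volatile uint16_t *p16 = base;
    volatile uint32_t *p32 = base;
    size_t i, errors = 0;

    assert(bytes % 4 == 0);

    /* char accesses: fill the whole region, then verify it */
    for (i = 0; i < bytes; i++)
        p8[i] = (uint8_t)(i ^ (i >> 8));
    for (i = 0; i < bytes; i++)
        if (p8[i] != (uint8_t)(i ^ (i >> 8)))
            errors++;

    /* short accesses */
    for (i = 0; i < bytes / 2; i++)
        p16[i] = (uint16_t)(i * 0x9E37u);
    for (i = 0; i < bytes / 2; i++)
        if (p16[i] != (uint16_t)(i * 0x9E37u))
            errors++;

    /* long accesses */
    for (i = 0; i < bytes / 4; i++)
        p32[i] = (uint32_t)i * 2654435761u;
    for (i = 0; i < bytes / 4; i++)
        if (p32[i] != (uint32_t)i * 2654435761u)
            errors++;

    return errors;
}
```

On the target this would be called with the SDRAM base address and size; a test like this runs with caches in whatever state the test program leaves them, so a pass does not by itself rule out a cache-handling problem in the kernel.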

I don't quite understand the relation between printk() and early_printk(). At what point does printk() have to take over the role of early_printk()? And what happens when you apply the "keep" option in init_early_printk() on a correctly running system? Is the info displayed twice?

Mike, is this something which you can try on the Memec board?

Best regards, Cor Venner


___________________________
microblaze-uclinux mailing list
microblaze-uclinux@xxxxxxxxxxxxxx
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/


