Re: [microblaze-uclinux] kernel BUG at sched.c:687!



On Monday 24 April 2006 01:58, DeviPrasad Natesan wrote:
> Hi Alejandro,
> I appreciate your response. Adding this along with your stack checking
> code worked. I am going to leave the system up and running to find
> out whether it hangs due to kernel stack corruption.
>
> BTW, do I need to keep the interrupt disabling in switch_thread
> once I remove your stack checking code? (In other words, is this a
> bug that needs to be fixed?)

Hi Prasad,

This is not a bug. It is only needed while the stack checking code is in place.

Regards.
>
> Thanx
> - DeviPrasad
>
> On 4/23/06, Alejandro Lucero <alucero@xxxxxxxxx> wrote:
> > Hi Prasad,
> >
> > The problem is most likely an interrupt arriving in the middle of
> > switch_thread. I don't have this problem in my code since I have other
> > modifications to the entry.S file.
> >
> > Adding a BIPSET(r11) and a BIPCLR(r11) to the switch_thread code
> > should fix it:
> >
> > C_ENTRY(switch_thread):
> >
> >         // Return the previous task
> >         // Do this before push_state, so that return value is on stack
> >         // Update the current task pointer
> >         //GET_CURRENT_TASK(CURRENT_TASK)
> > +     BIPSET(r11)
> >         RETRIEVE_CURRENT_TASK(r3);
> >
> >         // First, push the current processor state on the stack
> >         PUSH_STATE(SWITCH)
> >
> >         // Now save the location of the kernel stack pointer for this thread;
> >         // since we've pushed all other state on the stack, this is enough to
> >         // restore it all later.
> >         swi     r1, r5, THREAD_KSP;
> >         // Now restore the stack pointer from the new process
> >         lwi     r1, r6, THREAD_KSP;
> >         // ... and restore all state from that
> >         POP_STATE(SWITCH)
> >         // Update the current task pointer
> >         GET_CURRENT_TASK(CURRENT_TASK)
> >         // Now return into the new thread
> > +     BIPCLR(r11)
> >         rtsd    r15,8;
> >         nop;
> > C_END(switch_thread)
> >
> > On Saturday 22 April 2006 00:55, DeviPrasad Natesan wrote:
> > > > Hi Alejandro,
> > > > I still have a problem using that code, even after moving it
> > > > below SAVE_STATE. It gives some weird results and goes into a loop,
> > > > even with the minimal code that used to work OK before. Maybe I am
> > > > missing some point!
> > >
> > > - Prasad
> > >
> > > On 4/21/06, Alejandro Lucero <alucero@xxxxxxxxx> wrote:
> > > > Hi Prasad,
> > > >
> > > > > This is my fault. I sent an email an hour ago but it seems to be
> > > > > delayed. I made a mistake in my instructions: the code must go AFTER
> > > > > SAVE_STATE. Otherwise the stack pointer used could be the user stack
> > > > > pointer, and the check will give a false positive.
> > > >
> > > > Sorry again.
> > > >
> > > > On Friday 21 April 2006 09:41, DeviPrasad Natesan wrote:
> > > > > > Hi Alejandro,
> > > > > > I put in your stack checking code (with the "current" pointer
> > > > > > updated to mine) and, strange as it seems, the code goes into the
> > > > > > loop just as Linux is coming up. (Boot works normally without this
> > > > > > code.) I checked on our custom hardware and also on the ML403 eval
> > > > > > board, and both behave the same. I removed all the custom
> > > > > > applications, so it is a simple system with Ethernet, a UART, and
> > > > > > the root filesystem mounted on JFFS2. Even this seems to corrupt the
> > > > > > kernel stack (according to this test code).
> > > > > >
> > > > > > Does it mean that the last "current" task corrupts the stack? I have
> > > > > > to continue debugging tomorrow. It seems totally illogical at this
> > > > > > point.
> > > > >
> > > > > - Prasad
> > > > >
> > > > > On 4/20/06, Alejandro Lucero <alucero@xxxxxxxxx> wrote:
> > > > > > On Thursday 20 April 2006 13:29, Brettschneider Falk wrote:
> > > > > > > Hi Alejandro,
> > > > > > > thanks! Am I right, it will infinitely loop in case of a kernel
> > > > > > > stack overflow? If yes, how could I write a fixed value to a
> > > > > > > register address in that loop (to e.g. switch an LED on)? Going
> > > > > > > to try that soon... Cheers, F@lk
> > > > > >
> > > > > > > If the CPU executes this infinite loop, it means the kernel stack
> > > > > > > usage is more than 7500 bytes, which is very close to 8192.
> > > > > > > Moreover, the real limit is 8192 - sizeof(struct task_struct),
> > > > > > > since the process descriptor is at the bottom. This does not
> > > > > > > prove a stack overflow, but the odds are high.
> > > > > >
> > > > > > > If you know the LED address (look at
> > > > > > > arch/microblaze/platform/uclinux-auto/autoconfig.in) you can write
> > > > > > > to it with a swi instruction:
> > > > > >
> > > > > > >         addi r11, r0, 0x1;  /* Assuming 0x00000001 turns an LED on */
> > > > > > >         swi r11, r0, led_address;
> > > > > >
> > > > > > You can add this before the first nop instruction.
> > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: owner-microblaze-uclinux@xxxxxxxxxxxxxx
> > > > > > > > [mailto:owner-microblaze-uclinux@xxxxxxxxxxxxxx]On Behalf Of
> > > > > > > > Alejandro Lucero
> > > > > > > > Sent: Thursday, April 20, 2006 1:50 PM
> > > > > > > > To: microblaze-uclinux@xxxxxxxxxxxxxx
> > > > > > > > Subject: Re: [microblaze-uclinux] kernel BUG at sched.c:687!
> > > > > > > >
> > > > > > > > On Thursday 20 April 2006 10:31, Brettschneider Falk wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Alejandro Lucero wrote:
> > > > > > > > > > > I assumed you are using the entry.S without my patch
> > > > > > > > > > > reported two days ago, aren't you?
> > > > > > > > > >
> > > > > > > > > > I've tried JW's version of your patch but it doesn't help as a
> > > > > > > > > > bugfix. My environment is one active user app with several
> > > > > > > > > > threads (SCHED_RR), a high IRQ frequency (about 2 per
> > > > > > > > > > millisecond), many thread switches, and many locks/unlocks of
> > > > > > > > > > semaphores and mutexes. From time to time one thread of that
> > > > > > > > > > application calls pthread_cancel() on another thread. Often
> > > > > > > > > > (after about 20 kill actions) this leads to either a Linux
> > > > > > > > > > crash (with several versions of "kernel BUG at sched.c:***"),
> > > > > > > > > > a total hang, or an exit of the app with return code 5. (The
> > > > > > > > > > statistical distribution is: scheduler bug displayed = 0.01%,
> > > > > > > > > > Linux hang = 60%, process exit = the rest.) I don't have these
> > > > > > > > > > problems if either the IRQ frequency is very low or no threads
> > > > > > > > > > are cancelled. That's why I asked whether you ever tried to
> > > > > > > > > > kill threads in your application; here this greatly increases
> > > > > > > > > > the chance of a Linux crash.
> > > > > > > >
> > > > > > > > > Perhaps you could do some tests to rule out a kernel stack
> > > > > > > > > overflow. Try putting this in your entry.S file, but update the
> > > > > > > > > "current" pointer and make sure you are not using memory at
> > > > > > > > > 0x554, 0x558, 0xc64 and 0xc68 (surely LMB memory). This code
> > > > > > > > > looks at the kernel stack size, and if it is greater than 0x1d4c
> > > > > > > > > (7500 bytes) the system will execute an endless loop with
> > > > > > > > > interrupts disabled. The maximum kernel stack size used is
> > > > > > > > > stored at 0xc64.
> > > > > > > > >
> > > > > > > > > Remember to update the address of "current", which in my kernel
> > > > > > > > > is 0x0213472c. To find yours, use
> > > > > > > > > objdump -t image.elf | grep current
> > > > > > > > >
> > > > > > > > > Try putting this in ENTRY(irq) just after swi r1, r0, ENTRY_SP
> > > > > > > > > and before SAVE_STATE
> > > > > > > >
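> > > > > > > > >         swi    r11, r0, 0x554;        /* save r11 in scratch memory */
> > > > > > > > >         swi    r12, r0, 0x558;        /* save r12 in scratch memory */
> > > > > > > > >         lwi    r11, r0, 0x0213472c;   /* r11 = value of "current" */
> > > > > > > > >         addi   r11, r11, 0x2000;      /* r11 = top of the 8 KB area */
> > > > > > > > >         rsub   r11, r1, r11;          /* r11 = top - sp = stack in use */
> > > > > > > > >         lwi    r12, r0, 0xc64;        /* r12 = maximum seen so far */
> > > > > > > > >         swi    r11, r0, 0xc68;        /* stash the current usage */
> > > > > > > > >         rsub   r11, r11, r12;         /* r11 = maximum - usage */
> > > > > > > > >         bgei   r11, 1f;               /* no new maximum: skip */
> > > > > > > > >         lwi    r11, r0, 0xc68;
> > > > > > > > >         swi    r11, r0, 0xc64;        /* record the new maximum */
> > > > > > > > > 1:
> > > > > > > > >         lwi    r11, r0, 0x0213472c;   /* recompute the usage */
> > > > > > > > >         addi   r11, r11, 0x2000;
> > > > > > > > >         rsub   r11, r1, r11;
> > > > > > > > >         addi   r12, r0, 0x1d4c;       /* r12 = 7500 */
> > > > > > > > >         rsub   r11, r12, r11;         /* r11 = usage - 7500 */
> > > > > > > > >         blei   r11, 2f;               /* within the limit: skip */
> > > > > > > > >         addi   r11, r0, 0;            /* r11 = 0 */
> > > > > > > > >         mts    rmsr, r11;             /* MSR = 0: interrupts disabled */
> > > > > > > > >         nop;
> > > > > > > > >         nop;
> > > > > > > > >         nop;
> > > > > > > > >         bri    -8;                    /* endless loop */
> > > > > > > > > 2:
> > > > > > > > >         lwi    r11, r0, 0x554;        /* restore r11 */
> > > > > > > > >         lwi    r12, r0, 0x558;        /* restore r12 */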
> > > > > > > >
> > > > > > > > > Cheers, F@lk
> > > > > > > > > > ___________________________
> > > > > > > > > > microblaze-uclinux mailing list
> > > > > > > > > > microblaze-uclinux@xxxxxxxxxxxxxx
> > > > > > > > > > Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
> > > > > > > > > > Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
> > > > > >
> > > > > > --
> > > > > > Alejandro Lucero
> > > > > > Technical Director
> > > > > > +34 665 68 71 68
> > > > > > Valencia (SPAIN)
> > > > > > www.os3sl.com
> > > > >
> > > >
> > > > --
> > > > Alejandro Lucero
> > > > Technical Director
> > > > +34 665 68 71 68
> > > > Valencia (SPAIN)
> > > > www.os3sl.com
> > > > ___________________________
> > > > microblaze-uclinux mailing list
> > > > microblaze-uclinux@xxxxxxxxxxxxxx
> > > > Project Home Page :
> > > > http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux Mailing List
> > > > Archive :
> > > > http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
> > >
> >
> > --
> > Alejandro Lucero
> > Technical Director
> > +34 665 68 71 68
> > Valencia (SPAIN)
> > www.os3sl.com
> > ___________________________
> > microblaze-uclinux mailing list
> > microblaze-uclinux@xxxxxxxxxxxxxx
> > Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
> > Mailing List Archive :
> > http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
>

-- 
Alejandro Lucero
Technical Director
+34 665 68 71 68
Valencia (SPAIN)
www.os3sl.com
___________________________
microblaze-uclinux mailing list
microblaze-uclinux@xxxxxxxxxxxxxx
Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/