Re: [microblaze-uclinux] kernel BUG at sched.c:687!
Hi Prasad,
The problem is most likely an interrupt arriving in the middle of
switch_thread. I don't see it in my code because I have other modifications
to the entry.S file.
Adding BIPSET(r11) and BIPCLR(r11) to the switch_thread code should fix it:
C_ENTRY(switch_thread):
// Return the previous task
// Do this before push_state, so that return value is on stack
// Update the current task pointer
//GET_CURRENT_TASK(CURRENT_TASK)
+ BIPSET(r11) // hold off interrupts while r1 and current may disagree
RETRIEVE_CURRENT_TASK(r3);
// First, push the current processor state on the stack
PUSH_STATE(SWITCH)
// Now save the location of the kernel stack pointer for this thread;
// since we've pushed all other state on the stack, this is enough to
// restore it all later.
swi r1, r5, THREAD_KSP;
// Now restore the stack pointer from the new process
lwi r1, r6, THREAD_KSP;
// ... and restore all state from that
POP_STATE(SWITCH)
// Update the current task pointer
GET_CURRENT_TASK(CURRENT_TASK)
// Now return into the new thread
+ BIPCLR(r11) // r1 and current agree again: allow interrupts
rtsd r15,8;
nop;
C_END(switch_thread)
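The following is a minimal C model of the race described above, built on stated assumptions. The entry.S stack check reads `current` and r1. Inside switch_thread there is a window after `lwi r1, r6, THREAD_KSP` where r1 already points into the new thread's stack but GET_CURRENT_TASK has not yet updated `current`. The struct, the addresses and the `masked` flag, which stands in for BIPSET/BIPCLR holding off interrupts, are all hypothetical. This is a sketch of the idea, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define THREAD_SIZE 0x2000u /* 8 KB block: task_struct at the base, stack above */

/* Hypothetical snapshot of what an interrupt handler would observe. */
struct cpu_state {
    uint32_t sp;      /* r1 */
    uint32_t current; /* base of the running task's 8 KB block */
};

/* True when sp lies inside the kernel stack block owned by current. */
static bool sp_matches_current(struct cpu_state s)
{
    return s.sp > s.current && s.sp <= s.current + THREAD_SIZE;
}

/*
 * Count the interruptible points of switch_thread at which sp and current
 * disagree. masked == true models BIPSET before the switch and BIPCLR after
 * it: only the states before and after the whole sequence can be interrupted.
 */
static int inconsistent_windows(uint32_t prev, uint32_t next, bool masked)
{
    assert(prev != next);
    const struct cpu_state steps[3] = {
        { prev + 0x1f00u, prev }, /* before the switch                        */
        { next + 0x1e00u, prev }, /* after lwi r1, r6, THREAD_KSP / POP_STATE */
        { next + 0x1e00u, next }, /* after GET_CURRENT_TASK                   */
    };
    int bad = 0;
    for (int i = 0; i < 3; i++) {
        bool interruptible = !masked || i != 1;
        if (interruptible && !sp_matches_current(steps[i]))
            bad++;
    }
    return bad;
}
```

Take two hypothetical task blocks at 0x02100000 and 0x02104000. In that case `inconsistent_windows(0x02100000, 0x02104000, false)` returns 1, which is the middle state. The masked call returns 0.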
On Saturday 22 April 2006 00:55, DeviPrasad Natesan wrote:
> Hi Alejandro,
> I think I still have a problem using that code after putting it
> below SAVE_STATE. It gives some weird results and goes into a loop, even
> with the minimal code that used to work OK before. Maybe I am missing
> something.
>
> - Prasad
>
> On 4/21/06, Alejandro Lucero <alucero@xxxxxxxxx> wrote:
> > Hi Prasad,
> >
> > This is my fault. I sent an email an hour ago but it seems to be delayed.
> > There was a mistake in my instructions: the code must go AFTER SAVE_STATE.
> > Otherwise the stack pointer in use could still be the user stack pointer,
> > and the check will fail with a false positive.
> >
> > Sorry again.
> >
> > On Friday 21 April 2006 09:41, DeviPrasad Natesan wrote:
> > > Hi Alejandro,
> > > I put in your stack checking code (with the current pointer updated to
> > > match my kernel), and strange as it seems, the code goes into the loop
> > > just as Linux is coming up. (It boots normally without this code.) I
> > > checked with our custom hardware and also the ML403 eval board, and both
> > > behave the same. I removed all the custom applications, so it is a simple
> > > system with Ethernet and a UART, with root mounted on JFFS2. Even this
> > > seems to corrupt the kernel stack (according to this test code).
> > >
> > > Does it mean that the last "current" task corrupts the stack? I have to
> > > continue debugging tomorrow. It seems totally illogical at this point.
> > >
> > > - Prasad
> > >
> > > On 4/20/06, Alejandro Lucero <alucero@xxxxxxxxx> wrote:
> > > > On Thursday 20 April 2006 13:29, Brettschneider Falk wrote:
> > > > > Hi Alejandro,
> > > > > thanks! Am I right that it will loop forever in case of a kernel
> > > > > stack overflow? If so, how could I write a fixed value to a
> > > > > register address in that loop (e.g. to switch an LED on)? Going to
> > > > > try that soon... Cheers, F@lk
> > > >
> > > > If the CPU executes this infinite loop, it means the kernel stack depth
> > > > has exceeded 7500 bytes, which is very close to 8192. Moreover, the real
> > > > limit is 8192 - sizeof(struct task_struct), since the process descriptor
> > > > sits at the bottom of the same 8 KB block. This does not prove a stack
> > > > overflow, but the odds are high.
> > > >
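The stack budget described above can be sketched in C. THREAD_SIZE (8192) and the 7500-byte alarm come from this thread. The task_struct size is a parameter because it depends on the kernel configuration, and the 600-byte value used in the example is purely hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define THREAD_SIZE 8192u /* 8 KB shared by task_struct (bottom) and kernel stack */
#define ALARM_DEPTH 7500u /* 0x1d4c, threshold of the entry.S check */

/* Kernel stack bytes actually available above the task_struct. */
static size_t usable_stack(size_t task_struct_size)
{
    assert(task_struct_size < THREAD_SIZE);
    return THREAD_SIZE - task_struct_size;
}

/* Bytes left before the stack runs into the task_struct (0 if already there). */
static size_t headroom(size_t depth, size_t task_struct_size)
{
    size_t usable = usable_stack(task_struct_size);
    return depth >= usable ? 0 : usable - depth;
}
```

Assuming a hypothetical 600-byte task_struct, `usable_stack(600)` is 7592 and `headroom(ALARM_DEPTH, 600)` is only 92 bytes. That is why crossing 7500 bytes makes an overflow likely, even if one has not happened yet.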
> > > > If you know the LED address (look at
> > > > arch/microblaze/platform/uclinux-auto/autoconfig.in), you can write to
> > > > it with a swi instruction:
> > > >
> > > > addi r11, r0, 0x1;        /* assuming writing 0x00000001 turns an LED on */
> > > > swi r11, r0, led_address; /* led_address: the LED register address */
> > > >
> > > > You can add this before the first nop instruction.
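For comparison, here is the same store written in C as a minimal sketch. `led_on` and its pointer parameter are hypothetical names, and the real address has to come from autoconfig.in as described above. The `volatile` qualifier keeps the compiler from optimizing the write to the memory-mapped register away.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store 0x1 to a memory-mapped LED/GPIO data register,
 * assuming (as in the assembly above) that writing 1 turns the LED on. */
static inline void led_on(volatile uint32_t *led_reg)
{
    assert(led_reg != 0);
    *led_reg = 0x1u; /* same effect as: addi r11, r0, 0x1; swi r11, r0, led_address */
}
```

On the board it would be called as `led_on((volatile uint32_t *)LED_ADDRESS)`, where `LED_ADDRESS` is a placeholder for the address from autoconfig.in. On a host it can be exercised against an ordinary variable.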
> > > >
> > > > > > -----Original Message-----
> > > > > > From: owner-microblaze-uclinux@xxxxxxxxxxxxxx
> > > > > > [mailto:owner-microblaze-uclinux@xxxxxxxxxxxxxx]On Behalf Of
> > > > > > Alejandro Lucero
> > > > > > Sent: Thursday, April 20, 2006 1:50 PM
> > > > > > To: microblaze-uclinux@xxxxxxxxxxxxxx
> > > > > > Subject: Re: [microblaze-uclinux] kernel BUG at sched.c:687!
> > > > > >
> > > > > > On Thursday 20 April 2006 10:31, Brettschneider Falk wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Alejandro Lucero wrote:
> > > > > > > > I assumed you are using the entry.S without the patch I reported
> > > > > > > > two days ago, aren't you?
> > > > > > >
> > > > > > > I've tried JW's version of your patch, but it doesn't help as a
> > > > > > > bugfix. My environment is one active user app with several threads
> > > > > > > (SCHED_RR), a high IRQ frequency (about 2 per millisecond), many
> > > > > > > thread switches, and many locks/unlocks of semaphores and mutexes.
> > > > > > > From time to time one thread of that application calls
> > > > > > > pthread_cancel() on another thread. Often (after about 20
> > > > > > > cancellations) this leads to either a Linux crash (with several
> > > > > > > versions of "kernel BUG at sched.c:***"), a total hang, or an exit
> > > > > > > of the app with return code 5. (The statistical distribution is:
> > > > > > > scheduler bug displayed = 0.01%, Linux hang = 60%, process exit =
> > > > > > > the rest.) I don't have these problems if either the IRQ frequency
> > > > > > > is very low or no threads are cancelled. That's why I asked you if
> > > > > > > you ever tried to kill threads in your application; it increases
> > > > > > > the chance of a Linux crash dramatically here.
> > > > > >
> > > > > > Perhaps you could do some tests to rule out a kernel stack
> > > > > > overflow. Try putting this in your entry.S file, but update the
> > > > > > "current" pointer address and make sure nothing else uses the
> > > > > > memory at 0x554, 0x558, 0xc64 and 0xc68 (most likely LMB memory).
> > > > > > This code measures the kernel stack depth and, if it is greater
> > > > > > than 0x1d4c (7500 bytes), the system executes an endless loop with
> > > > > > interrupts disabled. The maximum kernel stack depth seen so far is
> > > > > > stored at 0xc64.
> > > > > >
> > > > > > Remember to update the address of current, which in my kernel is
> > > > > > 0x0213472c. Use
> > > > > > objdump -t image.elf | grep current
> > > > > >
> > > > > > Try to put this in ENTRY(irq) just after swi r1, r0, ENTRY_SP and
> > > > > > before SAVE_STATE
> > > > > >
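> > > > > > swi r11, r0, 0x554           /* save r11 and r12 in scratch words */
> > > > > > swi r12, r0, 0x558
> > > > > > lwi r11, r0, 0x0213472c;     /* r11 = current (base of the 8 KB block) */
> > > > > > addi r11, r11, 0x2000;       /* r11 = top of the kernel stack */
> > > > > > rsub r11, r1, r11;           /* r11 = top - sp = stack depth */
> > > > > > lwi r12, r0, 0xc64;          /* r12 = maximum depth so far */
> > > > > > swi r11, r0, 0xc68;          /* remember the current depth */
> > > > > > rsub r11, r11, r12;          /* r11 = max - depth */
> > > > > > bgei r11, 1f;                /* not deeper than before: skip */
> > > > > > lwi r11, r0, 0xc68;
> > > > > > swi r11, r0, 0xc64;          /* record the new maximum */
> > > > > > 1:
> > > > > > lwi r11, r0, 0x0213472c;     /* recompute the depth */
> > > > > > addi r11, r11, 0x2000;
> > > > > > rsub r11, r1, r11;
> > > > > > addi r12, r0, 0x1d4c;        /* 7500 bytes */
> > > > > > rsub r11, r12, r11;          /* r11 = depth - 7500 */
> > > > > > blei r11, 2f;                /* depth <= 7500: all is well */
> > > > > > addi r11, r0, 0;             /* MSR = 0 (was lwi, which read memory at 0) */
> > > > > > mts rmsr, r11;               /* interrupts disabled */
> > > > > > nop;
> > > > > > nop;
> > > > > > nop;
> > > > > > bri -8;                      /* spin forever */
> > > > > > 2:
> > > > > > lwi r11, r0, 0x554           /* restore r11 and r12 */
> > > > > > lwi r12, r0, 0x558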
> > > > > >
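Here is a C rendering of what the assembly above computes, offered as a sketch for readers who find the rsub/bgei sequence hard to follow. The struct fields stand in for the scratch words at 0xc64 and 0xc68 and are hypothetical names. As described in the thread, `current` is the base of the 8 KB task block. The return value says whether the assembly would enter its interrupts-disabled endless loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define THREAD_SIZE 0x2000 /* 8 KB: task_struct at the base, kernel stack above */
#define ALARM_DEPTH 0x1d4c /* 7500 bytes */

/* Hypothetical stand-ins for the scratch words used by the assembly. */
struct stack_probe {
    int64_t max_depth;  /* 0xc64: deepest kernel stack seen so far */
    int64_t last_depth; /* 0xc68: depth at the latest interrupt */
};

/* Returns true where the assembly would disable interrupts and spin. */
static bool probe_irq_entry(struct stack_probe *p, uint32_t current, uint32_t sp)
{
    assert(p != 0);
    /* rsub r11, r1, r11: depth = (current + 0x2000) - sp, compared as a
     * signed value, as bgei/blei do */
    int64_t depth = (int64_t)current + THREAD_SIZE - (int64_t)sp;
    p->last_depth = depth;      /* swi r11, r0, 0xc68 */
    if (depth > p->max_depth)   /* bgei skips the update when max >= depth */
        p->max_depth = depth;   /* swi r11, r0, 0xc64 */
    return depth > ALARM_DEPTH; /* blei skips the loop when depth <= 7500 */
}
```

Feeding it the depths 1000, 400 and 7600 leaves max_depth at 1000, then 1000, then 7600, and only the last call reports the alarm. Suppose r1 still holds a user stack pointer, as when the check runs before SAVE_STATE. Then depth says nothing about the kernel stack, and the alarm can fire on a healthy system. That is the false positive mentioned in the correction quoted higher up in this thread.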
> > > > > > > Cheers, F@lk
> > > > > > > ___________________________
> > > > > > > microblaze-uclinux mailing list
> > > > > > > microblaze-uclinux@xxxxxxxxxxxxxx
> > > > > > > Project Home Page : http://www.itee.uq.edu.au/~jwilliams/mblaze-uclinux
> > > > > > > Mailing List Archive : http://www.itee.uq.edu.au/~listarch/microblaze-uclinux/
> > > >
> > > > --
> > > > Alejandro Lucero
> > > > Technical Director
> > > > +34 665 68 71 68
> > > > Valencia (SPAIN)
> > > > www.os3sl.com
> > >
> >
--
Alejandro Lucero
Technical Director
+34 665 68 71 68
Valencia (SPAIN)
www.os3sl.com