Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

native: Resurect native/perif/timer and handle #715 (WIP) #3282

Merged
merged 6 commits into from
Jul 9, 2015
Merged

native: Resurect native/perif/timer and handle #715 (WIP) #3282

merged 6 commits into from
Jul 9, 2015

Conversation

benoit-canet
Copy link
Contributor

-An eINT() was missing before calling the trampoline.
-A potential dINT()/eINT() recursive call has been eliminated.
-schedule_timer() was setting the native timer while signals where disabled this could leads to SIGALARM being blocked.
-Add a define for the native platform timer resolution
-A bunch of case in the signal handler where mishandled leading to usefull signal being not processed.So a mechanism to postpone the processing of these usefull signals was added.

Pass the hwtimer* and vtimer* tests.

@benoit-canet
Copy link
Contributor Author

There is still a very rare early random crash of hwtimer_wait() I will look at it.

@benoit-canet benoit-canet changed the title native: Fix some bugs in IRQ, signals, and hwtimer native: Fix some bugs in IRQ, signals, and hwtimer (wip) Jul 1, 2015
@OlegHahm OlegHahm added Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors) Platform: native Platform: This PR/issue effects the native platform labels Jul 1, 2015
@@ -325,6 +385,7 @@ void native_isr_entry(int sig, siginfo_t *info, void *context)
_native_cur_ctx = (ucontext_t *)sched_active_thread->sp;

DEBUG("\n\n\t\tnative_isr_entry: return to _native_sig_leave_tramp\n\n");
dINT();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use enableIRQ() instead of the deprecated dINT

@haukepetersen
Copy link
Contributor

Nice one, pulling this into my network stack test branch prevents some ISR error messages at the least... Let us know once we can remove the WIP and test this in more depth!

@OlegHahm OlegHahm added the State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet label Jul 1, 2015
@kaspar030
Copy link
Contributor

@benoit-canet Do you mind also fixing #2870? That periph timer implementation is basically using the same code, and I think you could adapt the fixes a lot quicker than me.

@benoit-canet
Copy link
Contributor Author

@kaspar030 could #2870 be implemented on top of native hwtimers ?
I am putting some effort resurecting it and would like the code be reused.

@benoit-canet
Copy link
Contributor Author

Ah !

I just saw you created RIOT.

I think I will finish to polish hwtimer first by fixing corner case bugs in native.
Then I'll probably move to #2870.

@haukepetersen
Copy link
Contributor

Just one idea: does it make sense to switch over to the periph_timer entirely on native and use the hwtimer_compat module on top of that for now?

@kaspar030
Copy link
Contributor

does it make sense to switch over to the periph_timer entirely on native and use the hwtimer_compat module on top of that for now?

Totally, as it would bring native closer to many other platforms, and get the periph/timer implementation some testing until we switch away from hwtimer/vtimer completely.

@haukepetersen
Copy link
Contributor

That's what I wanted to hear :-)

@kaspar030
Copy link
Contributor

@benoit-canet hwtimer will be phased out soon, as we think the slimmer periph/timer interface coupled with a redesigned higher level interface is the way to go.

#2870 is basically copy&paste from native hwtimer code to implement the periph/timer interface. It actually makes more sense to concentrate on stabilizing that.

Compare #2926 to see where we're headed.

@benoit-canet benoit-canet changed the title native: Fix some bugs in IRQ, signals, and hwtimer (wip) native: Fix some bugs in IRQ, signals, and hwtimer Jul 2, 2015
@benoit-canet
Copy link
Contributor Author

Covered a few more corner cases that could cause hwtimer_wait stall.

There is still a rare intermitent crash but I think it related on how context switchs are done
on the native platform: the cpu try to jump to a stack start...

I test with:

--#!/bin/bash

make clean all-debug

while :
do

screen -d -m -L gdb -ex 'run > log' ./bin/native/hwtimer_wait.elf
pid=$!
sleep 9

success=$(cat log|grep success)

if [ "$success" != "success" ]
then
echo "\nBUG\n"
exit
fi

pkill screen
pkill hwtimer

done

@benoit-canet
Copy link
Contributor Author

@kaspar030 I'll look a these issues/PR.

@benoit-canet benoit-canet changed the title native: Fix some bugs in IRQ, signals, and hwtimer native: Resurect native/perif/timer and handle #715 (WIP) Jul 2, 2015
@benoit-canet
Copy link
Contributor Author

@kaspar030 I resurected your perif/timer series an rebased it in top of my patches.

@benoit-canet
Copy link
Contributor Author

@kaspar030

Why the idle_thread never try to thread_yield_higher ?

What if it's stuck in lpm's pause and the signal that wake it up does not reschedule something else ?

@kaspar030
Copy link
Contributor

Why the idle_thread never try to thread_yield_higher ?

In RIOT, a thread is always run if there's no higher thread waiting to be run.

CPU time is never time-sliced between multiple running threads, the scheduler is completely tickless.
This implies that if the idle_thread gets scheduled, all other threads are in some sleep mode.

What if it's stuck in lpm's pause and the signal that wake it up does not reschedule something else ?

Interrupts are supposed to trigger the scheduler, i.e., by setting "sched_context_switch_request = 1", if another thread might need scheduling. So if idle goes to LPM, then an interrupt occurs, and another thread needs to be run, the scheduler would switch to this other thread, and only if this other thread (and all others) are finished, switch back to the idle thread.

@benoit-canet
Copy link
Contributor Author

@kaspar030 thanks for the clear explanation.

@benoit-canet
Copy link
Contributor Author

@kaspar030 work fine on two machines now up for review

@benoit-canet
Copy link
Contributor Author

I don't have access to an osX machine for testing though.

@kaspar030
Copy link
Contributor

@benoit-canet I've tried the ng_networking example. It loses some packets (compared to master, which loses none). Also, when I hammer it with pings from the linux box, it crashes at some point:

[kaspar@booze ng_networking (benoit-timer)]$ /home/kaspar/src/riot.8/examples/ng_networking/bin/native/ng_networking.elf tap0
RIOT native uart0 initialized.
RIOT native interrupts/signals initialized.
LED_GREEN_OFF
LED_RED_ON
RIOT native board initialized.
RIOT native hardware initialization complete.

kernel_init(): This is RIOT! (Version: 2014.12-2057-g2f60e5-booze-benoit-timer)
kernel_init(): jumping into first task...
UART0 thread started.
uart0_init() [OK]
RIOT network stack example application
All up, running the shell now
> ifconfig
ifconfig
Iface  7   HWaddr: da:5d:58:4d:66:16 
           inet6 addr: ff02::1/128  scope: local [multicast]
           inet6 addr: fe80::d85d:58ff:fe4d:6616/64  scope: local
           inet6 addr: ff02::1:ff4d:6616/128  scope: local [multicast]

> *** stack smashing detected ***: /home/kaspar/src/riot.8/examples/ng_networking/bin/native/ng_networking.elf terminated
Segmentation fault (core dumped)
[kaspar@booze ng_networking (benoit-timer)]$

master works fine (unless two packets arrive at the same time).

edit that was on Linux.

@benoit-canet
Copy link
Contributor Author

The trampoline removal is the culprit.

I'll try without it.

@benoit-canet
Copy link
Contributor Author

@kaspar030 Everything seems to work fine without the trampoline related commits. I will run the hwtimer_wait test a few hours and if it work I'll postpone my no assembly crusade for later ;)

@benoit-canet
Copy link
Contributor Author

@kaspar030 works fine now including the ping6 -f test.

@kaspar030
Copy link
Contributor

It doesn't always freeze...

@kaspar030
Copy link
Contributor

Enabling debug in timer.c and hwtimer.c makes it freeze a lot less. ;(

@kaspar030
Copy link
Contributor

The problem is vtimer setting an absolute timeout in the past.

@PeterKietzmann
Copy link
Member

In that case the problem shouldn't be related to OSX right? tests/vtimer_msg works for me. Is that still the misbehaving application you're talking about?

@kaspar030
Copy link
Contributor

No, it's not vtimer. Seems like using "timer_set_absolute()" with very small offsets causes timer.c to underflow, e.g., set a hwtimer in the past.

@benoit-canet
Copy link
Contributor Author

@kaspar030 that what NATIVE_TIMER_MIN_RES is suposed to prevent

@benoit-canet
Copy link
Contributor Author

@kaspar030 but increasing it will slow down test/hwtimer_wait test further

@thomaseichinger
Copy link
Member

@benoit-canet with NATIVE_TIMER_MIN_RES increased to 2000 it still freezes (sometimes)

@benoit-canet
Copy link
Contributor Author

@thomaseichinger Could you test this ? http://paste.debian.net/280649/

@benoit-canet
Copy link
Contributor Author

@thomaseichinger and this http://paste.debian.net/280657/ to test @kaspar030 hypothesis

@jnohlgard
Copy link
Member

@kaspar030 Does #3147 help with the vtimer issue?

@thomaseichinger
Copy link
Member

patches result
[1] some improvement, i.e. less freezes
[1]+[2] no further improvement
[1]+[2]+[3] didn't see a freeze in 20 runs
[3] didn't see a freeze in 20 runs

[1] http://paste.debian.net/280649/
[2] http://paste.debian.net/280657/
[3] #3147

@kaspar030
Copy link
Contributor

Same here (didn't try [3]).

@kaspar030
Copy link
Contributor

[2] doesn't fix anything, delta calculation needs to be cast to (int32_t) to actually get a negative value.
(```int64_t delta = (int32_t)(value - now);)

@benoit-canet
Copy link
Contributor Author

[2] should be delta <= 0 also to avoid cancelling the timer.

@benoit-canet
Copy link
Contributor Author

@kaspar030 is it expected from the upper layer that setting a relative timer to 0 does a cancel ? Or should we rework timer.c ?

@kaspar030
Copy link
Contributor

@benoit-canet No, setting to absolute 0 shouldn't cancel.

@benoit-canet
Copy link
Contributor Author

Ok I'l prepare a series

@kaspar030
Copy link
Contributor

I have a weird effect. If catching underflows like this:

int timer_set_absolute(tim_t dev, int channel, unsigned int value)
{
    uint32_t now = timer_read(dev);
    int64_t target = (int32_t)(value - now);

    DEBUG("timer_set_absolute(): delta=%lli\n", target);
    if (target < 0 && target > -NATIVE_TIMER_MIN_RES) {
        DEBUG("timer_set_absolute(): preventing underflow.\n");
        target = now + NATIVE_TIMER_MIN_RES;
    }   

    timer_set(dev, channel, target);

    return 1;
}

When the underflow happens, the test seems to delay exactly the amount of time timer_read() would return.

edit nevermind, I see it...

@OlegHahm
Copy link
Member

OlegHahm commented Jul 9, 2015

No, it's not vtimer. Seems like using "timer_set_absolute()" with very small offsets causes timer.c to underflow, e.g., set a hwtimer in the past.

Can't we just replace this call by timer_set() and get rid of this function completely? IIRC vtimer is the only user of this function and even vtimer is not really needing it.

@kaspar030
Copy link
Contributor

wtimer also uses it, it's needed there to e.g., set a timer exactly to overflow. But wtimer checks a lot better for underflows.

@kaspar030
Copy link
Contributor

@benoit-canet I have a patch ready to prevent the underflow.

@kaspar030
Copy link
Contributor

See #3344. Together with [1], this makes the test run fine for me.

@benoit-canet
Copy link
Contributor Author

@kaspar030 I did a pr for the zero value case

@kaspar030
Copy link
Contributor

alright, @thomaseichinger, could you confirm that master + #3344 works for you?

@benoit-canet
Copy link
Contributor Author

@kaspar030 any idea for the slowliness of the hwtimer_wait test case on native ? Should we define the idea of a minimal timer resolution in a central place and bake it in the hwtimer_wait/main.c loop test ?

@thomaseichinger
Copy link
Member

@kaspar030 didn't see any freezes with #3344.

@PeterKietzmann
Copy link
Member

@benoit-canet I stopped following your debugging steps here and in the other PRs so I can't comment in detail. In general I'd like to have an application that does what it says :-). If this is dependent on general timer specifications (like granularity), a global macro and a comment about the behaviour seem to be reasonable.

@jnohlgard jnohlgard mentioned this pull request Jul 21, 2015
7 tasks
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Platform: native Platform: This PR/issue effects the native platform State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet Type: bug The issue reports a bug / The PR fixes a bug (including spelling errors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants