Assertion failed (vector overflow) during initialization of ToDPDKDevice #1

Closed
davidek opened this issue Feb 17, 2016 · 4 comments

@davidek
Contributor

davidek commented Feb 17, 2016

Hi Tom,

(I realize there are no issues yet; I hope this is the appropriate place for bug reports/help)

I was dealing with a failed assertion raised by the Vector indexing operation. Here's the stack trace just before the failure:

#0  operator[] (i=<optimized out>, this=<optimized out>) at ../include/click/vector.hh:292
#1  thread_for_queue (queue=<optimized out>, this=<optimized out>) at ../elements/userlevel/queuedevice.hh:259
#2  QueueDevice::get_runnable_threads (this=0xaf6c60, bmk=...) at ../elements/userlevel/queuedevice.hh:153
#3  0x0000000000677859 in InputThreadVisitor::visit (this=0x7fffffffddd0, e=<optimized out>) at ../lib/element.cc:1676
#4  0x000000000068f6c1 in Router::visit (this=0xaf5770, first_element=first_element@entry=0xaff3f0, forward=forward@entry=false, 
    first_port=first_port@entry=-1, visitor=visitor@entry=0x7fffffffddd0) at ../lib/router.cc:926
#5  0x0000000000675a9e in Element::get_threads (this=this@entry=0xaff3f0, is_pull=is_pull@entry=false) at ../lib/element.cc:1700
#6  0x000000000064c32e in QueueDevice::initialize_tx (this=this@entry=0xaff3f0, errh=errh@entry=0x7fffffffdf10)
    at ../elements/userlevel/queuedevice.cc:99
#7  0x0000000000651cd7 in ToDPDKDevice::initialize (this=0xaff3f0, errh=0x7fffffffdf10) at ../elements/userlevel/todpdkdevice.cc:81
#8  0x0000000000692cb8 in Router::initialize (this=this@entry=0xaf5770, errh=errh@entry=0xa9e9c0) at ../lib/router.cc:1233
#9  0x000000000065b10e in parse_configuration (text=..., text_is_expr=text_is_expr@entry=false, hotswap=hotswap@entry=false, 
    errh=errh@entry=0xa9e9c0) at click.cc:392
#10 0x000000000065c4e6 in main (argc=<optimized out>, argv=<optimized out>) at click.cc:703

Where:

  • At frames #7-#5, this is an instance of ToDPDKDevice
  • At frame #2, instead, this is a FromDPDKDevice which (if I'm not mistaken) has not been initialized yet:
(gdb) frame 2
#2  QueueDevice::get_runnable_threads (this=0xaf6c60, bmk=...) at ../elements/userlevel/queuedevice.hh:153
153                                     bmk[thread_for_queue(i) - j] = 1;
(gdb) p this->class_name()
$21 = 0x72e058 "FromDPDKDevice"
(gdb) p this->_queue_to_thread
$22 = {vm_ = {l_ = 0x0, n_ = 0, capacity_ = 0}}

For the moment I've been able to fix this by forcing FromDPDKDevice to be initialized earlier, by patching its configure_phase (which also influences initialization order):

int configure_phase() const {
    return CONFIGURE_PHASE_PRIVILEGED - 1;  // was CONFIGURE_PHASE_PRIVILEGED
}

Not sure whether this could have subtle drawbacks, would you suggest any other approach?
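
To make the effect of that one-line patch concrete, here is a minimal, self-contained sketch. It assumes the router visits elements in ascending configure_phase order and keeps declaration order among elements that share a phase. ToyElement, init_order and the numeric value 90 are hypothetical stand-ins, not Click's actual code.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for CONFIGURE_PHASE_PRIVILEGED; the value is illustrative.
constexpr int kPhasePrivileged = 90;

// Toy element: just a name and the phase it reports.
struct ToyElement {
    std::string name;
    int configure_phase;
};

// Returns element names in the order a phase-sorted router would visit them.
// Assumption: lower phases go first; std::stable_sort keeps declaration
// order among elements that share a phase.
std::vector<std::string> init_order(std::vector<ToyElement> elements) {
    std::stable_sort(elements.begin(), elements.end(),
                     [](const ToyElement& a, const ToyElement& b) {
                         return a.configure_phase < b.configure_phase;
                     });
    std::vector<std::string> names;
    for (const ToyElement& e : elements)
        names.push_back(e.name);
    return names;
}
```

Under these assumptions, a ToDPDKDevice declared before a FromDPDKDevice with the same phase is visited first, while lowering the FromDPDKDevice phase by one moves it ahead regardless of declaration order.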

Some more context follows.

  • Excerpt from the configuration:

    from0 :: FromDPDKDevice(0, MINQUEUES 2, MAXTHREADS 64, RSS_AGGREGATE yes);
    from1 :: FromDPDKDevice(1, MINQUEUES 2, MAXTHREADS 64, RSS_AGGREGATE yes);
    from0, from1 => ... => ToDPDKDevice(0), ToDPDKDevice(1);
    

    This appears to happen only when there is actually something in the middle, but I guess this depends on how Click chooses the order for configuration/initialization.

  • configure line

    ./configure --enable-multithread --disable-linuxmodule --enable-intel-cpu --enable-user-multithread --verbose CFLAGS="-g -Og" CXXFLAGS="-g -std=gnu++11 -Og" --disable-dynamic-linking --enable-poll --enable-bound-port-transfer --with-netmap=no --enable-zerocopy --disable-dpdk-pools --enable-dpdk --disable-batch --with-numa=no
    

    It's almost like the one in the README except for -Og, --disable-batch, and --with-numa=no (those have other unrelated stories)

  • uname -a

    Linux anslab1 3.16.0-50-generic #67~14.04.1-Ubuntu SMP Fri Oct 2 22:07:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

  • gcc --version

    gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4

  • DPDK version: 2.2.0

Besides the issue, thanks for all the open source work.

Best regards,
Davide Kirchner

@tbarbette
Owner

Thanks for reporting the issue, this is indeed a good place to do so.

What's your launch command, and in particular the number of threads? I'll try to reproduce the error. I changed some things around get_runnable_threads not long ago, so I may have introduced a regression...

Tom

@davidek
Contributor Author

davidek commented Feb 17, 2016

When I first found it I was running with -c 0x3 -n 2, but it also happens with -c 0x1 -n 1 -- -f testconf.click (the stack traces refer to this latter case, but I doubt there is any difference).

I also managed to write a configuration file that causes the crash without requiring our custom elements (it appears that, in my case, some restructuring with compound elements causes Click to schedule initialization differently):

io :: {
  input[0,1] => ToDPDKDevice(0), ToDPDKDevice(1);
  from0 :: FromDPDKDevice(0, MINQUEUES 2, MAXTHREADS 64, RSS_AGGREGATE yes);
  from1 :: FromDPDKDevice(1, MINQUEUES 2, MAXTHREADS 64, RSS_AGGREGATE yes);
  from0, from1 => [0,1]output;
}();
io[0,1] => [0,1]io;

@tbarbette
Owner

OK, I reproduced it. I'm looking into it ;)

@tbarbette
Owner

Your approach is the right one. It's what's done with FromDevice/ToDevice, and what I also did with FromNetmapDevice/ToNetmapDevice. I forgot to do it for DPDK...

The problem is that in FromDPDKDevice, the get_runnable_threads() function depends on the element having been initialized. But to initialize ToDPDKDevice, in our model we need to know which elements will "launch" threads, so we can find which threads will end up in the ToDPDKDevice and allocate the minimal number of queues. So ToDPDKDevice calls get_runnable_threads() on upstream elements in its own initialization routine. It's still a little unclean. I was thinking of adding a thread-initialization phase or something like that.

But the latest commit, a62349e, solves your problem, and no, it should not introduce any bugs. Thanks for reporting!
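
The dependency described in this comment can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyFrom and ToyTo are hypothetical stand-ins for the receive and transmit elements, the queue-to-thread mapping is made up, and std::vector::at() throws std::out_of_range where the real Vector::operator[] hit its assertion.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy stand-in for the receive side: its queue-to-thread table only exists
// after initialize() has run.
struct ToyFrom {
    std::vector<int> queue_to_thread;  // empty until initialized
    int n_queues = 2;

    void initialize() {
        // Illustrative mapping (an assumption): queue i is served by thread i.
        queue_to_thread.clear();
        for (int i = 0; i < n_queues; ++i)
            queue_to_thread.push_back(i);
    }

    // Marks every thread that serves one of this element's queues.
    // at() is bounds-checked, so an uninitialized table raises an exception.
    void get_runnable_threads(std::vector<bool>& mask) const {
        for (int i = 0; i < n_queues; ++i)
            mask.at(static_cast<std::size_t>(queue_to_thread.at(i))) = true;
    }
};

// Toy stand-in for the transmit side: sizing its queues requires asking the
// upstream element which threads can reach it.
struct ToyTo {
    int n_tx_queues = 0;

    void initialize(const ToyFrom& upstream, int n_threads) {
        std::vector<bool> mask(n_threads, false);
        upstream.get_runnable_threads(mask);
        for (bool reachable : mask)
            if (reachable) ++n_tx_queues;
    }
};
```

Calling ToyTo::initialize before ToyFrom::initialize fails on the empty table, while the reverse order lets the transmit side count one queue per reachable thread. That is why making the receive element initialize first removes the crash.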
