dpss segmentation faults

Document created by cdnadmin on Jan 25, 2014
Version 1
This document was generated from a CDN thread.

Created by: Viktor S. Wold Eide on 27-05-2013 10:28:38 AM
Hi,

I am experiencing segmentation faults for the dpss process (dpss_mp_32-0.7.0.503), as illustrated by the segfaults from today shown below. Several of the segfaults appear to share the same cause. Any suggestions in this respect are welcome.

Best regards
Viktor
 
Thu Jan 01 04:16:17.485 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Thu Jan 01 05:13:56.494 CET : onep_dpss_transport.c:82: platform messaging: Not received IPv4 encapsulated GRE packet from the platform
Segmentation fault

Thu Jan 01 05:15:01.816 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Thu Jan 01 05:22:20.598 CET : src/shared/onep_dpss_paktype.c:97: platform messaging: Unexpected section with section type : 9
Segmentation fault

Thu Jan 01 05:23:15.721 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Thu Jan 01 05:32:44.037 CET : src/shared/onep_dpss_paktype.c:40: platform messaging: Decoding of section header failed with error: invalid section type
Segmentation fault

Thu Jan 01 05:45:34.520 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Thu Jan 01 05:50:55.528 CET : src/shared/onep_dpss_paktype.c:104: platform messaging: Decoding of message failed with error code : invalid section type ,for section type : 2
Segmentation fault

Thu Jan 01 06:14:43.162 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Thu Jan 01 06:35:34.235 CET : src/shared/onep_dpss_paktype.c:40: platform messaging: Decoding of section header failed with error: invalid section type
Segmentation fault

Thu Jan 01 06:36:30.160 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Thu Jan 01 06:57:34.187 CET : src/shared/onep_dpss_paktype.c:40: platform messaging: Decoding of section header failed with error: invalid section type
Segmentation fault

Thu Jan 01 07:17:06.385 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Thu Jan 01 07:18:18.470 CET : src/shared/onep_dpss_paktype.c:40: platform messaging: Decoding of section header failed with error: invalid section type
Segmentation fault

Thu Jan 01 07:27:48.640 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Segmentation fault

Thu Jan 01 07:40:54.818 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Thu Jan 01 07:43:15.772 CET : src/shared/onep_dpss_paktype.c:40: platform messaging: Decoding of section header failed with error: invalid section type
Segmentation fault

Subject: Re: New Message from Viktor S. Wold Eide in onePK - Troubleshooting: dpss s
Replied by: Einar Nilsen-Nygaard on 27-05-2013 04:27:23 PM
By any chance do you have NAT enabled?

If you have a core file you are willing to share, that would be helpful.

Cheers,

Einar


Subject: RE: dpss segmentation faults
Replied by: Joseph Clarke on 28-05-2013 03:35:39 AM
Actually, a core dump might not be created since dpss_mp is setuid.  If you don't get a core, Viktor, please run:
 
sudo sysctl fs.suid_dumpable=2
 
Then reproduce the problem.  That should give you a core file.
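
A process can also opt itself back in programmatically via the Linux prctl(2) interface. A minimal sketch (this is not something dpss_mp is known to do; shown only to illustrate the mechanism):

#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
    /* Re-enable core dumps for this one process even though it is
     * setuid; the process-local analogue of fs.suid_dumpable
     * (Linux-specific). */
    if (prctl(PR_SET_DUMPABLE, 1, 0, 0, 0) != 0) {
        perror("prctl(PR_SET_DUMPABLE)");
        return 1;
    }
    /* ... rest of the program ... */
    return 0;
}

Either way, make sure the core file size limit is not zero in the shell that starts the process (ulimit -c unlimited).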

Subject: New Message from Einar Nilsen-Nygaard in onePK - Troubleshooting: Re: New M
Replied by: Viktor S. Wold Eide on 28-05-2013 12:31:35 PM
No, NAT should not be enabled.

The segmentation faults occurred while testing the dpss, where the onep
application connects to the router over a VLAN.

How do I provide you with core dump files? Or did you want me to just
attach core files to messages here?

It seems like some of the segfaults are similar to the one below:

gdb dpss_mp_32-0.7.0.503 20130528-1552/core
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /root/onep/dpss_mp_32-0.7.0.503...done.
Illegal process-id: 20130528-1552/core.

warning: core file may not match specified executable file.
[New LWP 12587]
[New LWP 12588]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./dpss_mp_32-0.7.0.503 -f'.
Program terminated with signal 11, Segmentation fault.
#0  0x08073071 in onep_dpss_handle_client_packet (dpss=0x86d23d0,
pak=0xf7256c82)
    at src/main/onep_dpss_engine.c:403
403     src/main/onep_dpss_engine.c: No such file or directory.
(gdb) bt
#0  0x08073071 in onep_dpss_handle_client_packet (dpss=0x86d23d0,
pak=0xf7256c82)
    at src/main/onep_dpss_engine.c:403
#1  0x0807341a in onep_dpss_get_packet_from_client (d=0x86f1cc0, fd=10,
user_data=0x86f2010)
    at src/main/onep_dpss_engine.c:465
#2  0x0807247d in dc_poll ()
#3  0x080725a2 in dc_run ()
#4  0x0807520c in onep_dpss_engine_main (obj=0x86d23d0) at
src/main/onep_dpss_engine.c:1222
#5  0x080546d4 in main (argc=2, argv=0xffc1d044) at
src/main/onep_dpss_main.c:409
(gdb)


Best regards
Viktor



--
Viktor S. Wold Eide
CEO - Lividi AS
Lividi co/Simula Innovation AS
P.O. Box 134 NO-1325 Lysaker, Norway
+47 9775 9449
viktor@lividi.com
http://www.lividi.com

Subject: RE: dpss segmentation faults
Replied by: Joseph Clarke on 28-05-2013 04:40:17 PM
Thanks for the backtrace.  I'm having a problem lining up the lines in the trace with the 15.3(2)T codebase.  Could you run dpss_mp with the "-d all" parameter?  That will help narrow down where the crash is occurring.  Also, from the messages, it seems you might be punting non-IP traffic to the dpss_mp.  What are you using to match interesting traffic for the punt?

Subject: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: dpss seg
Replied by: Viktor S. Wold Eide on 29-05-2013 05:12:34 AM
Below you find a new backtrace with a segfault for `./dpss_mp_32-0.7.0.503
-f -d all', as well as the output from the dpss process right before the
segfault.

Any IP traffic is punted (at least that was the intention), similar to
access-list 40 permit ip any any

Best regards
Viktor

gdb ../dpss_mp_32-0.7.0.503 core
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /root/onep/eg/dpss_mp_32-0.7.0.503...done.

warning: core file may not match specified executable file.
[New LWP 19054]
[New LWP 19055]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./dpss_mp_32-0.7.0.503 -f -d all'.
Program terminated with signal 11, Segmentation fault.
#0  0x08073071 in onep_dpss_handle_client_packet (dpss=0x9e17410,
pak=0xf72c0f52) at src/main/onep_dpss_engine.c:403
403     src/main/onep_dpss_engine.c: No such file or directory.
(gdb) bt
#0  0x08073071 in onep_dpss_handle_client_packet (dpss=0x9e17410,
pak=0xf72c0f52) at src/main/onep_dpss_engine.c:403
#1  0x0807341a in onep_dpss_get_packet_from_client (d=0x9e37030, fd=10,
user_data=0x9e37380) at src/main/onep_dpss_engine.c:465
#2  0x0807247d in dc_poll ()
#3  0x080725a2 in dc_run ()
#4  0x0807520c in onep_dpss_engine_main (obj=0x9e17410) at
src/main/onep_dpss_engine.c:1222
#5  0x080546d4 in main (argc=4, argv=0xffcbb704) at
src/main/onep_dpss_main.c:409
(gdb)


Thu Jan 01 22:46:21.200 CET : CFT Engine: CFT_ERRMSG_CLIENT_API,
Thu Jan 01 22:46:21.200 CET : CFT Engine: CFT_ERRMSG_CLIENT_API,
Thu Jan 01 22:46:21.200 CET : CFT Engine: CFT_ERRMSG_CLIENT_API,
Thu Jan 01 22:46:21.200 CET : src/posix/onep_posix_shared_memory_queue.c:91: shared memory: enqueing packet
Thu Jan 01 22:46:21.200 CET : src/posix/onep_posix_shared_memory_queue.c:121: queue #[0] enqueue at ndx 89, queue size = 1
Thu Jan 01 22:46:21.200 CET : src/posix/onep_dpss_shared_memory_queue_private.c:318: shared memory: allocating packet
Thu Jan 01 22:46:21.200 CET : src/main/onep_dpss_engine.c:604: Received 204 bytes from platform
Thu Jan 01 22:46:21.200 CET : src/shared/onep_dpss_paktype.c:110: platform messaging: Decoded message from platform of type 1
Thu Jan 01 22:46:21.200 CET : CFT L4: Error: fid[10] invalid seq in state bi-established
Thu Jan 01 22:46:21.200 CET : src/posix/onep_posix_shared_memory_queue.c:91: shared memory: enqueing packet
Thu Jan 01 22:46:21.200 CET : src/posix/onep_posix_shared_memory_queue.c:121: queue #[0] enqueue at ndx 90, queue size = 1
Thu Jan 01 22:46:21.201 CET : src/posix/onep_dpss_shared_memory_queue_private.c:318: shared memory: allocating packet
Thu Jan 01 22:46:21.201 CET : src/main/onep_dpss_engine.c:604: Received 204 bytes from platform
Thu Jan 01 22:46:21.201 CET : src/shared/onep_dpss_paktype.c:110: platform messaging: Decoded message from platform of type 1
Thu Jan 01 22:46:21.201 CET : CFT L4: Error: fid[10] invalid seq in state bi-established
Thu Jan 01 22:46:21.201 CET : src/posix/onep_posix_shared_memory_queue.c:91: shared memory: enqueing packet
Thu Jan 01 22:46:21.201 CET : src/posix/onep_posix_shared_memory_queue.c:121: queue #[0] enqueue at ndx 91, queue size = 1
Thu Jan 01 22:46:21.201 CET : src/posix/onep_posix_shared_memory_queue.c:172: queue #[0] dequeued from ndx 93, size = 1
Thu Jan 01 22:46:21.201 CET : src/main/onep_dpss_engine.c:463: Received packet from client with name dpss-101, pid 19421
Thu Jan 01 22:46:21.201 CET : src/main/onep_dpss_engine.c:418: Packet advanced to platform
Thu Jan 01 22:46:21.201 CET : src/shared/onep_dpss_paktype.c:253: platform messaging: RE-INJECT message created
Thu Jan 01 22:46:21.201 CET : onep_dpss_transport.c:168: platform messaging: Created IPv4 packet for injection.
Thu Jan 01 22:46:21.201 CET : onep_dpss_transport.c:281: 151 bytes written to platform
Thu Jan 01 22:46:21.201 CET : src/posix/onep_dpss_shared_memory_queue_private.c:359: shared memory: recycling packet
Thu Jan 01 22:46:21.201 CET : src/posix/onep_posix_shared_memory_queue.c:172: queue #[0] dequeued from ndx 94, size = 1
Thu Jan 01 22:46:21.201 CET : src/main/onep_dpss_engine.c:463: Received packet from client with name dpss-101, pid 19421
Thu Jan 01 22:46:21.201 CET : src/main/onep_dpss_engine.c:418: Packet advanced to platform
Thu Jan 01 22:46:21.201 CET : src/shared/onep_dpss_paktype.c:253: platform messaging: RE-INJECT message created
Thu Jan 01 22:46:21.201 CET : onep_dpss_transport.c:168: platform messaging: Created IPv4 packet for injection.
Thu Jan 01 22:46:21.201 CET : onep_dpss_transport.c:281: 131 bytes written to platform
Thu Jan 01 22:46:21.201 CET : src/posix/onep_dpss_shared_memory_queue_private.c:359: shared memory: recycling packet
Thu Jan 01 22:46:21.201 CET : src/posix/onep_posix_shared_memory_queue.c:172: queue #[0] dequeued from ndx 95, size = 0
Thu Jan 01 22:46:21.201 CET : src/main/onep_dpss_engine.c:463: Received packet from client with name dpss-101, pid 19421
Thu Jan 01 22:46:21.201 CET : src/main/onep_dpss_engine.c:418: Packet advanced to platform
Thu Jan 01 22:46:21.201 CET : src/posix/onep_dpss_shared_memory_queue_private.c:359: shared memory: recycling packet
Thu Jan 01 22:46:21.201 CET : src/posix/onep_dpss_shared_memory_queue_private.c:318: shared memory: allocating packet
Thu Jan 01 22:46:21.201 CET : src/main/onep_dpss_engine.c:604: Received 204 bytes from platform
Thu Jan 01 22:46:21.201 CET : src/shared/onep_dpss_paktype.c:110: platform messaging: Decoded message from platform of type 1
Thu Jan 01 22:46:21.201 CET : CFT Engine: CFT_ERRMSG_CLIENT_API,
Thu Jan 01 22:46:21.201 CET : CFT Engine: CFT_ERRMSG_CLIENT_API,
Thu Jan 01 22:46:21.201 CET : CFT Engine: CFT_ERRMSG_CLIENT_API,
Thu Jan 01 22:46:21.201 CET : src/posix/onep_posix_shared_memory_queue.c:91: shared memory: enqueing packet
Thu Jan 01 22:46:21.201 CET : src/posix/onep_posix_shared_memory_queue.c:121: queue #[0] enqueue at ndx 92, queue size = 1
Thu Jan 01 22:46:21.201 CET : src/posix/onep_posix_shared_memory_queue.c:172: queue #[0] dequeued from ndx 96, size = 0
Thu Jan 01 22:46:21.201 CET : src/main/onep_dpss_engine.c:463: Received packet from client with name dpss-101, pid 19421
Thu Jan 01 22:46:21.201 CET : src/main/onep_dpss_engine.c:418: Packet advanced to platform
Thu Jan 01 22:46:21.201 CET : src/shared/onep_dpss_paktype.c:253: platform messaging: RE-INJECT message created
Thu Jan 01 22:46:21.201 CET : onep_dpss_transport.c:168: platform messaging: Created IPv4 packet for injection.
Thu Jan 01 22:46:21.201 CET : onep_dpss_transport.c:281: 151 bytes written to platform
Thu Jan 01 22:46:21.201 CET : src/posix/onep_dpss_shared_memory_queue_private.c:359: shared memory: recycling packet
Thu Jan 01 22:46:21.202 CET : src/posix/onep_dpss_shared_memory_queue_private.c:318: shared memory: allocating packet
Thu Jan 01 22:46:21.202 CET : src/main/onep_dpss_engine.c:604: Received 1180 bytes from platform
Thu Jan 01 22:46:21.202 CET : src/shared/onep_dpss_paktype.c:110: platform messaging: Decoded message from platform of type 1
Thu Jan 01 22:46:21.202 CET : CFT L4: Error: fid[10] invalid seq in state bi-established
Thu Jan 01 22:46:21.202 CET : src/posix/onep_posix_shared_memory_queue.c:91: shared memory: enqueing packet
Thu Jan 01 22:46:21.202 CET : src/posix/onep_posix_shared_memory_queue.c:121: queue #[0] enqueue at ndx 93, queue size = 1
Thu Jan 01 22:46:21.202 CET : src/posix/onep_dpss_shared_memory_queue_private.c:318: shared memory: allocating packet
Thu Jan 01 22:46:21.202 CET : src/main/onep_dpss_engine.c:604: Received 1180 bytes from platform
Thu Jan 01 22:46:21.202 CET : src/shared/onep_dpss_paktype.c:40: platform messaging: Decoding of section header failed with error: invalid section type
Thu Jan 01 22:46:21.202 CET : src/posix/onep_posix_shared_memory_queue.c:172: queue #[0] dequeued from ndx 97, size = 0
Thu Jan 01 22:46:21.202 CET : src/main/onep_dpss_engine.c:463: Received packet from client with name dpss-101, pid 19421
Segmentation fault (core dumped)


Subject: RE: dpss segmentation faults
Replied by: Joseph Clarke on 29-05-2013 06:59:34 PM
I've been unable to reproduce the problem, though it now appears the GRE frame coming from the device is corrupt in some way.  I wonder if the VLAN sourcing has anything to do with it.  Would it be possible to share the config from this ISR?

Subject: RE: dpss segmentation faults
Replied by: Zach Seils on 29-05-2013 08:03:43 PM
A packet capture of the traffic between the network element and dpss_mp would also be helpful.

Thanks.

Subject: New Message from Joseph Clarke in onePK - Troubleshooting: RE: dpss segment
Replied by: Viktor S. Wold Eide on 31-05-2013 01:11:34 PM
I have tried to connect the computers directly to the routers to avoid the
VLAN issue. However, that alone does not seem to make a difference.

Best regards
Viktor





Subject: RE: dpss segmentation faults
Replied by: Zach Seils on 31-05-2013 04:04:44 PM
Would it be possible to get a copy of a packet capture when this occurs?

Subject: RE: New Message from Joseph Clarke in onePK - Troubleshooting: RE: dpss seg
Replied by: Joseph Clarke on 01-06-2013 04:10:08 AM
I don't think the VLAN is the problem per se.  I think it has to do with how the GRE tunnel is sourced and how the traffic traverses the router.  Zach's request is a good one.  Seeing the packets would help.  I realize that may be tough, though.  If possible I'd still like to see the router config.

Subject: Re: New Message from Zach Seils in onePK - Troubleshooting: RE: dpss segmen
Replied by: Viktor S. Wold Eide on 01-06-2013 05:58:13 AM
Hi Zach,

I will try to get a representative packet trace next week.

Best regards
Viktor

Subject: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: New Mess
Replied by: Viktor S. Wold Eide on 01-06-2013 06:09:41 AM
Hi Joseph,

I did a quick test using the sample-apps/CustomEncryption app. No extensive
testing, but I have not seen any segfaults so far for the same setup. In
that case, in addition to the applications being different, only tcp port
23 packets are punted for the CustomEncryption app. I will try to get a
representative trace next week.

Best regards
Viktor

2013/6/1 Cisco Developer Community Forums <cdicuser@developer.cisco.com>

> Joseph Clarke has created a new message in the forum "Troubleshooting":
> -------------------------------------------------------------- I don't
> think the VLAN is the problem per se.  I think it has to do with how the
> the GRE tunnel is sourced and how the traffic traverses the router.  Zach's
> request is a good one.  Seeing the packets would help.  I realize that may
> be tough, though.  If possible I'd still like to see the router config.
> --
> To respond to this post, please click the following link:
> http://developer.cisco.com/web/onepk/community/-/message_boards/view_message/15792896or simply reply to this email.

Subject: RE: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: New
Replied by: Joseph Clarke on 01-06-2013 10:36:13 AM
Interesting.  To what interface is your policy applied?  How is that interface configured?  Perhaps we're seeing IP frames with VLAN (i.e., dot1q) headers on them.  Since the tag would come before the ethertype, that could explain some of the error messages we see.
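
To illustrate the dot1q hypothesis: a parser that assumes the IPv4 header always starts 14 bytes into the frame will misread 802.1Q-tagged frames, where a 4-byte tag sits between the source MAC and the real ethertype. A minimal sketch of the required ethertype check (illustrative C only; the dpss_mp source is not available):

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>

#define ETHERTYPE_IPV4  0x0800
#define ETHERTYPE_DOT1Q 0x8100

/* Return the offset of the IPv4 header within an Ethernet frame,
 * or -1 if the frame is neither plain nor 802.1Q-tagged IPv4. */
static int ipv4_offset(const uint8_t *frame, size_t len)
{
    uint16_t etype;

    if (len < 14)
        return -1;
    memcpy(&etype, frame + 12, sizeof(etype));
    etype = ntohs(etype);

    if (etype == ETHERTYPE_DOT1Q) {       /* 4-byte VLAN tag present */
        if (len < 18)
            return -1;
        memcpy(&etype, frame + 16, sizeof(etype));
        etype = ntohs(etype);
        return etype == ETHERTYPE_IPV4 ? 18 : -1;
    }
    return etype == ETHERTYPE_IPV4 ? 14 : -1;
}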

Subject: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re: New
Replied by: Viktor S. Wold Eide on 06-06-2013 02:14:15 PM
Hi,

Sorry for replying somewhat late. I did some packet capturing and also some
testing. It seems the segfault may happen for packets that are injected
from the application to the dpss, that is, before the packet enters the
network. Below is some information from the dpss being run under gdb,
since for a normal core file the contents of the packet are not available.

First, the contents of pak are printed after the segfault. Below that are
the contents of pak in the normal case (with a breakpoint at the same
place as the segfault).

In the segfault case, buffer_len=0 and the packet does not seem to be
correctly encapsulated (which might explain the error message). In the
normal case, buffer_len=308 and the packet appears to be encapsulated.

gdb ../../dpss_mp_32-0.7.0.503
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /root/onep/eb8570w2/dpss_mp_32-0.7.0.503...done.
(gdb) run  -f -c ../../dpss.conf
Starting program: /root/onep/eb8570w2/dpss_mp_32-0.7.0.503 -f -c
../../dpss.conf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0xf7a6cb40 (LWP 19903)]
Tue Jan 06 03:36:39.222 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Tue Jan 06 03:43:15.229 CET : onep_dpss_transport.c:82: platform messaging: Not received IPv4 encapsulated GRE packet from the platform
Program received signal SIGSEGV, Segmentation fault.
0x08073071 in onep_dpss_handle_client_packet (dpss=0x809b3f0,
pak=0xf7b3e12a) at src/main/onep_dpss_engine.c:403
403     src/main/onep_dpss_engine.c: No such file or directory.
(gdb) bt
#0  0x08073071 in onep_dpss_handle_client_packet (dpss=0x809b3f0,
pak=0xf7b3e12a) at src/main/onep_dpss_engine.c:403
#1  0x0807341a in onep_dpss_get_packet_from_client (d=0x80bace0, fd=14,
user_data=0x80bb060) at src/main/onep_dpss_engine.c:465
#2  0x0807247d in dc_poll ()
#3  0x080725a2 in dc_run ()
#4  0x0807520c in onep_dpss_engine_main (obj=0x809b3f0) at
src/main/onep_dpss_engine.c:1222
#5  0x080546d4 in main (argc=4, argv=0xffffd6b4) at
src/main/onep_dpss_main.c:409
(gdb) print pak
$1 = (onep_dpss_paktype_t *) 0xf7b3e12a
(gdb) print *pak
$2 = {packet_debug = false, msg_debug = false, traffic_reg = 0x0, next =
0x0, next_client = 0x0, prev_client = 0x0, shared_mem_ndx = 9, is_new =
true,
  buffer =
"\377\377\377\377\377\377\000\000\000\000\000\000\b\000E\000\000\251\000\362\000\000@\021\036\366\n\032dC\n\000\341\377\206?\304\000\225\037s{
\"ver\": 3, \"type\": \"hello\", \"seq\": 66, \"sid\":
\"cd1b0695-e999-4de2-8900-ce62bc2734b2\", \"name\": \"eb8570w2\",
\"src_tid\": \"GigabitEthernet0\\/0\" }: \"GigabitEthernet0\\/0\", \"did\":
"..., buffer_len = 0, cef_msg_offset = 90,
  cef_msg_len = 13, pkt_data_offset = 0, l2_length = 183, ft_cache =
{isset__ = 0, fid = 4, ft_err = 0, aging = 5, app_id = 0, fo = 0,
l4_payload_size = 1288, l4_flow_state = 4, l3_protocol = 2048,
    bitmask = 0, l4_protocol = 6 '\006', l4_error = 5 '\005', is_initiator
= 1 '\001', fif = 0 '\000', lif = 0 '\000', first_payload = 0 '\000',
tunnel_type = 0 '\000', l3_start = 0 '\000',
    l4_start = 20 '\024', payload = 52 '4', fragmented = 0 '\000',
superctx_fo = 0, superctx_id = 0, flow_pkt_count = 704228, l7_flow_bytes =
612146096}, client_data = {event = 0 '\000',
    is_modified = false, bypass_info = {mode = 0 '\000', number = 0},
redirect_info = {ip = {ip_addr_type = 0, a = {ipv4 = 0, ipv6 = {u =
{ipv6addr8 = '\000' <repeats 15 times>, ipv6addr16 = {0, 0,
                0, 0, 0, 0, 0, 0}, ipv6addr32 = {0, 0, 0, 0}}}}}, port =
0}}, is_dropped = false, p2d_sync = {l7_flow_bytes = 0, flow_pkt_count = 0,
flow_aging_time = 0, actions_taken = 0 '\000'},
  verb = {msg_type = 0 '\000', version = 0 '\000', seq_num = 0,
num_misordered = 0}, opt_sections = ONEP_DPSS_OPT_SCT_INJCT, inject_data =
{l3_start = 0, l4_start = 0, vrf_id = 0, interface = 4,
    location_id = 1 '\001'}, d2p_sync = {platform_flow_id = 0, dpss_flow_id
= 0, redirect_info = {ip = {ip_addr_type = 0, a = {ipv4 = 0, ipv6 = {u =
{ipv6addr8 = '\000' <repeats 15 times>,
              ipv6addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, ipv6addr32 = {0, 0, 0,
0}}}}}, port = 0, flow_direction = 0 '\000'}, app_id = 0}, flow_actions =
{platform_flow_id = 0,
    action_bitmask = 0 '\000', bypass_info = {mode = 0 '\000', number = 0},
redirect_info = {ip = {ip_addr_type = 0, a = {ipv4 = 0, ipv6 = {u =
{ipv6addr8 = '\000' <repeats 15 times>, ipv6addr16 = {
                0, 0, 0, 0, 0, 0, 0, 0}, ipv6addr32 = {0, 0, 0, 0}}}}},
port = 0, flow_direction = 0 '\000'}}, dpss_metadata = {location_id = 0
'\000', input_interface = 4, output_interface = 0,
    platform_flow_id = 486, dpss_flow_id = 0, dpi_app_id = 0, vrf_id = 0,
timestamp = 519641836, l4_flow_state = 4 '\004', l3_start = 14, l4_start =
34, pre_nat_src_ip = 0, pre_nat_dst_ip = 0,
    pre_nat_src_port = 0, pre_nat_dst_port = 0, app_count = 1 '\001',
app_data = {{app_id = 19904, action_type = 1 '\001', local_id = 1}, {app_id
= 0, action_type = 0 '\000', local_id = 0}},
    tunnel_type = 0 '\000'}, platform_addr = {sin_family = 2, sin_port = 0,
sin_addr = {s_addr = 216203274}, sin_zero =
"\000\000\000\000\000\000\000"}, local_ip_addr = {sin_family = 0,
    sin_port = 0, sin_addr = {s_addr = 434307082}, sin_zero =
"\000\000\000\000\000\000\000"}, pkt_send_fd = 7}
(gdb)


For comparison, with a breakpoint at the same place where the segfault occurred
(from another run):
(gdb) print *pak
$1 = {packet_debug = false, msg_debug = false, traffic_reg = 0x94bb3a0,
next = 0x0, next_client = 0x0, prev_client = 0x80bb060, shared_mem_ndx =
29, is_new = false,
  buffer = "E\000\001\064RB@\000\377/N2\n\000\343\f\n\000\343\031\000\000\211!\001\000\n\001\002\a\355RB\000\000\001<\002\000\062\000\001\000\000V\247\001\000\000\000\001\000\000\000\004\000\000\000\000\000\000\001\373",
'\000' <repeats 16 times>,
"##R\300\001\000\016\000\"\000\000\003\000\n\001\000\000\000\000<\000\000w\006\a\000\021\000\000\000\001\000\000\000\000\000\000\000\214\000\000\000\005\000\b\000\266\377\377\377\377\377\377\000\000\000\000\000\000\b\000E\000\000\250\000\362\000\000?\021\037\366\n\033dC\n\000\341\377\206?\304\000\224\271\257{
\"ver\": 3, \"type\": \"hello\", \"seq\": 0, \"sid\":
\"76bf48f8-c0b2-4b5c"..., buffer_len = 308, cef_msg_offset = 90,
cef_msg_len = 13, pkt_data_offset = 126, l2_length = 182, ft_cache = {
    isset__ = 0, fid = 2, ft_err = 0, aging = 5, app_id = 0, fo = 0,
l4_payload_size = 140, l4_flow_state = 1, l3_protocol = 2048, bitmask = 0,
l4_protocol = 17 '\021',
    l4_error = 0 '\000', is_initiator = 1 '\001', fif = 1 '\001', lif = 0
'\000', first_payload = 1 '\001', tunnel_type = 0 '\000', l3_start = 0
'\000', l4_start = 20 '\024',
    payload = 28 '\034', fragmented = 0 '\000', superctx_fo = 0,
superctx_id = 0, flow_pkt_count = 1, l7_flow_bytes = 140}, client_data =
{event = 0 '\000', is_modified = false,
    bypass_info = {mode = 0 '\000', number = 0}, redirect_info = {ip =
{ip_addr_type = 0, a = {ipv4 = 0, ipv6 = {u = {ipv6addr8 = '\000' <repeats
15 times>, ipv6addr16 = {0, 0, 0, 0, 0,
                0, 0, 0}, ipv6addr32 = {0, 0, 0, 0}}}}}, port = 0}},
is_dropped = false, p2d_sync = {l7_flow_bytes = 140, flow_pkt_count = 1,
flow_aging_time = 5,
    actions_taken = 0 '\000'}, verb = {msg_type = 1 '\001', version = 2
'\002', seq_num = 132993602, num_misordered = 316}, opt_sections = 0,
inject_data = {l3_start = 0, l4_start = 0,
    vrf_id = 0, interface = 0, location_id = 0 '\000'}, d2p_sync =
{platform_flow_id = 0, dpss_flow_id = 0, redirect_info = {ip =
{ip_addr_type = 0, a = {ipv4 = 0, ipv6 = {u = {
              ipv6addr8 = '\000' <repeats 15 times>, ipv6addr16 = {0, 0, 0,
0, 0, 0, 0, 0}, ipv6addr32 = {0, 0, 0, 0}}}}}, port = 0, flow_direction = 0
'\000'}, app_id = 0},
  flow_actions = {platform_flow_id = 0, action_bitmask = 0 '\000',
bypass_info = {mode = 0 '\000', number = 0}, redirect_info = {ip =
{ip_addr_type = 0, a = {ipv4 = 0, ipv6 = {u = {
              ipv6addr8 = '\000' <repeats 15 times>, ipv6addr16 = {0, 0, 0,
0, 0, 0, 0, 0}, ipv6addr32 = {0, 0, 0, 0}}}}}, port = 0, flow_direction = 0
'\000'}}, dpss_metadata = {
    location_id = 0 '\000', input_interface = 4, output_interface = 0,
platform_flow_id = 507, dpss_flow_id = 0, dpi_app_id = 0, vrf_id = 0,
timestamp = 589517504,
    l4_flow_state = 1 '\001', l3_start = 14, l4_start = 34, pre_nat_src_ip
= 0, pre_nat_dst_ip = 0, pre_nat_src_port = 0, pre_nat_dst_port = 0,
app_count = 1 '\001', app_data = {{
        app_id = 22183, action_type = 1 '\001', local_id = 1}, {app_id = 0,
action_type = 0 '\000', local_id = 0}}, tunnel_type = 0 '\000'},
platform_addr = {sin_family = 2, sin_port = 0,
    sin_addr = {s_addr = 216203274}, sin_zero =
"\000\000\000\000\000\000\000"}, local_ip_addr = {sin_family = 0, sin_port
= 0, sin_addr = {s_addr = 434307082},
    sin_zero = "\000\000\000\000\000\000\000"}, pkt_send_fd = 7}
(gdb)
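
Comparing the two dumps: the crashing pak has buffer_len = 0 together with non-zero offsets and opt_sections = ONEP_DPSS_OPT_SCT_INJCT, while the healthy pak has buffer_len = 308. As a sketch of the failure mode only (the struct below is hypothetical and merely mirrors the field names in the dumps; the real onep_dpss_paktype_t is defined by the SDK), a handler that trusts the offsets without checking buffer_len walks off the end of a zero-length buffer:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of the packet handle. */
struct pak_view {
    const uint8_t *buffer;
    uint32_t       buffer_len;
    uint32_t       pkt_data_offset;
};

static int handle_packet(const struct pak_view *pak)
{
    /* The crashing dump shows buffer_len == 0 alongside non-zero
     * offsets; without a guard like this, dereferencing
     * buffer + pkt_data_offset is out of bounds. */
    if (pak->buffer_len == 0 || pak->pkt_data_offset >= pak->buffer_len) {
        fprintf(stderr, "dropping packet: len=%u offset=%u\n",
                pak->buffer_len, pak->pkt_data_offset);
        return -1;
    }
    /* ... safe to parse pak->buffer + pak->pkt_data_offset ... */
    return 0;
}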




Subject: RE: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re:
Replied by: Joseph Clarke on 06-06-2013 06:37:50 PM
It also appears that in the working case you're dealing with UDP and in the non-working case you're dealing with TCP.  Do you mean to be mixing protocols?  Could you include the snippet of code that modifies or injects your packets?  I have a feeling the buffer is being corrupted in some way, but I can't tell where or how from this.  The error we're seeing in the top packet (l4_error) points to an invalid TCP sequence number.  But again, without seeing what your intention is with this packet, I'm not sure where the problem lies exactly.

Subject: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re: New
Replied by: Viktor S. Wold Eide on 07-06-2013 11:05:15 AM
Hi,

The application is dealing with different kinds of packets. For a protocol
that we are using for test purposes, the protocol messages are transported
in UDP packets. At the same time the application is diverting (and possibly
modifying) other packets going through the router, including for example
TCP traffic.

This basically works as expected. The protocol itself (implemented in the
onepk application)  does what it is supposed to do, exchanging protocol
messages with other instances of the application connected to other routers.

Network traffic through the routers is also correctly handled, being
diverted to the application instance connected to the router and then
re-injected again, continuing towards the destination.

For performance testing, TCP (or UDP) bulk data is transferred from a
computer to another, through the routers.

So each application instance concurrently runs the test protocol on behalf
of a single router, while also diverting and re-injecting traffic flowing
through that router.

In this setting we sometimes experience the dpss process segfaulting. It
might be some kind of race. The application itself is currently handled by
a single libev-based event loop (historical reasons), and of course the
onepk callback system is also active.
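
For state shared between a libev loop thread and SDK callback threads, the usual guard is a mutex. A minimal generic sketch, assuming the onePK callbacks can run on a separate thread (this is not the application's actual locking code):

#include <pthread.h>

/* State shared between the libev loop thread and onePK callbacks. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long punted_packets;

/* Called from an onePK packet callback. */
static void on_packet_punted(void)
{
    pthread_mutex_lock(&state_lock);
    punted_packets++;
    pthread_mutex_unlock(&state_lock);
}

/* Called from the libev event-loop thread. */
static unsigned long read_punt_count(void)
{
    unsigned long n;
    pthread_mutex_lock(&state_lock);
    n = punted_packets;
    pthread_mutex_unlock(&state_lock);
    return n;
}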

Regarding the previous email: the pak buffer in both the segfault and the
non-segfault case contained an application protocol message. In the
segfault case the protocol message appeared to not be encapsulated. Is this
correct, or should a packet at this point in the dpss process always be
encapsulated?

Additionally, in the segfault case buffer_len=0 while in the normal case
buffer_len has a reasonable value. I also saw that some of the other values
in the segfault case looked suspicious, but I do not know how this is
supposed to look, in particular when the buffer_len is reported to be
zero.

Best regards
Viktor


Subject: RE: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re:
Replied by: Joseph Clarke on 08-06-2013 09:26:47 AM
Sorry, Viktor.  I'm flying blind here.  The packets shown in the two gdb outputs are different.  One is UDP, one is TCP, so I don't have a true like-for-like comparison.  Assuming the backtrace was taken at the same place both times, I would expect the buffers to be encapsulated similarly, as you say, but I can't tell from this.

Honestly, I just don't have enough information to figure out where the problem lies.  Is there any way to share the code with us, even privately?  If not, could you distill the code down to a test case where we could see the problem that wouldn't reveal any proprietary info?

[Edit: hit submit too soon.]

On the buffer_len thing (and this is another reason the code would be so helpful) I see that when buffer_len is 0 you have option ONEP_DPSS_OPT_SCT_INJCT set.  This looks like you're calling onep_dpss_inject_raw_packet() or onep_dpss_inject_packet().  And if the former, are you sure the "len" argument to that function is not zero?  The fact that this option is not set in the second, working backtrace makes me think I'm looking at two different places in the code path.  Any injected packet would have that option set.  This might explain the encapsulation difference.
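
A defensive pattern consistent with this suggestion is to validate the length before handing a buffer to the inject call. The function pointer type below is hypothetical, introduced only for illustration; the real prototype of onep_dpss_inject_raw_packet() is defined by the onePK SDK headers:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature standing in for the SDK inject call. */
typedef int (*inject_fn_t)(void *handle, const uint8_t *buf, size_t len);

static int inject_checked(inject_fn_t inject, void *handle,
                          const uint8_t *buf, size_t len)
{
    /* A zero len here would match the buffer_len == 0 seen in the
     * crashing pak dump; refuse to inject rather than pass it on. */
    if (buf == NULL || len == 0)
        return -1;
    return inject(handle, buf, len);
}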

Subject: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re: New
Replied by: Viktor S. Wold Eide on 09-06-2013 06:47:04 AM
Hi Joseph,

I did a quick modification of the CustomEncryption sample application, so
that it regularly injects messages in addition to diverting and
re-injecting TCP (or UDP) port 23 packets. See attached diff file. In other
words, this should mimic the behavior of our test application. I then
observe the same segfault behavior (see below). For simplicity, the packet
to inject is just hard coded. In our test application we have some
locking to protect shared structures; that is not included here.

For testing I have used netcat to send TCP (or UDP) bulk data through the
routers. The packets are punted and re-injected as intended. I also observe
the injected packets sent out from the router.

Let me know if you are able to reproduce the segfault behavior or if there
is some misunderstanding on our side here.

Best regards
Viktor

gdb --args ../../dpss_mp_32-0.7.0.503 -f -c ../../dpss.conf
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /root/onep/eb8570w1/dpss_mp_32-0.7.0.503...done.
(gdb) run
Starting program: /root/onep/eb8570w1/dpss_mp_32-0.7.0.503 -f -c
../../dpss.conf
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0xf7ab7b40 (LWP 9359)]
Wed Jan 14 01:56:12.462 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Wed Jan 14 01:59:07.493 CET : src/main/onep_dpss_engine.c:399: Error in CFT processing for new packet.
Wed Jan 14 02:01:00.508 CET : src/main/onep_dpss_engine.c:399: Error in CFT processing for new packet.
Wed Jan 14 02:02:52.523 CET : src/main/onep_dpss_engine.c:399: Error in CFT processing for new packet.
Wed Jan 14 02:05:04.540 CET : src/main/onep_dpss_engine.c:399: Error in CFT processing for new packet.
Wed Jan 14 02:05:33.544 CET : onep_dpss_transport.c:82: platform messaging: Not received IPv4 encapsulated GRE packet from the platform
Program received signal SIGSEGV, Segmentation fault.
0x08073071 in onep_dpss_handle_client_packet (dpss=0x809b3f0,
pak=0xf7b799f2) at src/main/onep_dpss_engine.c:403
403     src/main/onep_dpss_engine.c: No such file or directory.
(gdb) print *pak
$1 = {packet_debug = false, msg_debug = false, traffic_reg = 0x0, next =
0x0, next_client = 0x0, prev_client = 0x0,
  shared_mem_ndx = 14, is_new = true,
  buffer =
"\377\377\377\377\377\377\000\000\000\000\000\000\b\000E\000\000\250\000\362\000\000@\021\037\064\n\033d\005\n\000\341\377\206Ć\304\000\224\003~{
\"ver\": 3, \"type\": \"hello\", \"seq\": 1, \"sid\":
\"ebf28b6d-9dda-4adc-9715-45536324d1db\", \"name\": \"eb8570w2\",
\"src_tid\": \"GigabitEthernet0\\/0\"
}\b\n\v\205V\246\020ʇR\001\001\005\n\305\367\205\313\305\370\060\333\066\070\063\063\067\061\065\n36833716\n368"...,
buffer_len = 0, cef_msg_offset = 90, cef_msg_len = 13, pkt_data_offset = 0,
  l2_length = 182, ft_cache = {isset__ = 0, fid = 3, ft_err = 0, aging = 5,
app_id = 0, fo = 0, l4_payload_size = 1288,
    l4_flow_state = 4, l3_protocol = 2048, bitmask = 0, l4_protocol = 6
'\006', l4_error = 5 '\005', is_initiator = 1 '\001',
    fif = 0 '\000', lif = 0 '\000', first_payload = 0 '\000', tunnel_type =
0 '\000', l3_start = 0 '\000',
    l4_start = 20 '\024', payload = 52 '4', fragmented = 0 '\000',
superctx_fo = 0, superctx_id = 0, flow_pkt_count = 487452,
    l7_flow_bytes = 325707888}, client_data = {event = 0 '\000',
is_modified = false, bypass_info = {mode = 0 '\000',
      number = 0}, redirect_info = {ip = {ip_addr_type = 0, a = {ipv4 = 0,
ipv6 = {u = {ipv6addr8 = '\000' <repeats 15 times>,
              ipv6addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, ipv6addr32 = {0, 0, 0,
0}}}}}, port = 0}}, is_dropped = false, p2d_sync = {
    l7_flow_bytes = 0, flow_pkt_count = 0, flow_aging_time = 0,
actions_taken = 0 '\000'}, verb = {msg_type = 0 '\000',
    version = 0 '\000', seq_num = 0, num_misordered = 0}, opt_sections =
ONEP_DPSS_OPT_SCT_INJCT, inject_data = {l3_start = 0,
    l4_start = 0, vrf_id = 0, interface = 4, location_id = 1 '\001'},
d2p_sync = {platform_flow_id = 0, dpss_flow_id = 0,
    redirect_info = {ip = {ip_addr_type = 0, a = {ipv4 = 0, ipv6 = {u =
{ipv6addr8 = '\000' <repeats 15 times>, ipv6addr16 = {
                0, 0, 0, 0, 0, 0, 0, 0}, ipv6addr32 = {0, 0, 0, 0}}}}},
port = 0, flow_direction = 0 '\000'}, app_id = 0},
  flow_actions = {platform_flow_id = 0, action_bitmask = 0 '\000',
bypass_info = {mode = 0 '\000', number = 0},
    redirect_info = {ip = {ip_addr_type = 0, a = {ipv4 = 0, ipv6 = {u =
{ipv6addr8 = '\000' <repeats 15 times>, ipv6addr16 = {
                0, 0, 0, 0, 0, 0, 0, 0}, ipv6addr32 = {0, 0, 0, 0}}}}},
port = 0, flow_direction = 0 '\000'}}, dpss_metadata = {
    location_id = 0 '\000', input_interface = 5, output_interface = 0,
platform_flow_id = 490, dpss_flow_id = 0,
    dpi_app_id = 0, vrf_id = 0, timestamp = 155501224, l4_flow_state = 4
'\004', l3_start = 14, l4_start = 34,
    pre_nat_src_ip = 0, pre_nat_dst_ip = 0, pre_nat_src_port = 0,
pre_nat_dst_port = 0, app_count = 1 '\001', app_data = {{
        app_id = 9361, action_type = 1 '\001', local_id = 2}, {app_id = 0,
action_type = 0 '\000', local_id = 0}},
    tunnel_type = 0 '\000'}, platform_addr = {sin_family = 2, sin_port = 0,
sin_addr = {s_addr = 199360522},
    sin_zero = "\000\000\000\000\000\000\000"}, local_ip_addr = {sin_family
= 0, sin_port = 0, sin_addr = {s_addr = 400687114},
    sin_zero = "\000\000\000\000\000\000\000"}, pkt_send_fd = 7}
(gdb)

Subject: RE: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re:
Replied by: Joseph Clarke on 09-06-2013 04:59:16 PM
Thanks for this, Viktor.  I've been running with this patch for about 30 minutes now with no crash.  I have netcat running in an infinite loop piping IOL images across the routers.  I can confirm that I'm seeing both punts and injections.  How long did it take for this to crash for you?

Subject: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re: New
Replied by: Viktor S. Wold Eide on 10-06-2013 08:49:11 AM
Joseph, things go wrong with segfaults fairly quickly, usually within
minutes when sending data through the routers. I often also experience a
significant drop in throughput, but I am not sure if these things are
related. If you are not able to reproduce the behavior there might be some
differences in the setup or other software.

What kind of system / setup did you use for testing?

How do you ensure that packets with GRE encapsulation do not become too
large? See the forum message thread "Maximum Segment Size",
http://developer.cisco.com/web/onepk/community/-/message_boards/message/15483346

Our test environment is as follows:

sdk-c32-0.7.0.503g
The libraries used by the binaries are as follows:

ldd  ../../dpss_mp_32-0.7.0.503
        linux-gate.so.1 =>  (0xf77c3000)
        libonep32_core.so =>
/opt/cisco/onep/c32/sdk-c32-0.7.0.503g/c/lib/libonep32_core.so (0xf75c1000)
        librt.so.1 => /lib/i386-linux-gnu/librt.so.1 (0xf75a5000)
        libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf73fb000)
        libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf73e0000)
        libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf73db000)
        /lib/ld-linux.so.2 (0xf77c4000)

ldd ./dpss_encdec-and-inject
        linux-gate.so.1 =>  (0xf76f9000)
        libglib-2.0.so.0 => /lib/i386-linux-gnu/libglib-2.0.so.0
(0xf75ea000)
        libonep32_core.so =>
/opt/cisco/onep/c32/sdk-c32-0.7.0.503g/c/lib/libonep32_core.so (0xf73eb000)
        libonep32_datapath.so =>
/opt/cisco/onep/c32/sdk-c32-0.7.0.503g/c/lib/libonep32_datapath.so
(0xf737c000)
        libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf71d3000)
        libpcre.so.3 => /lib/i386-linux-gnu/libpcre.so.3 (0xf7197000)
        libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf717c000)
        librt.so.1 => /lib/i386-linux-gnu/librt.so.1 (0xf7173000)
        libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf716d000)
        /lib/ld-linux.so.2 (0xf76fa000)

We use Ubuntu 12.04 (running the 32 bit binaries under 64 bit Linux)
Linux 3.2.0-44-generic #69-Ubuntu SMP Thu May 16 17:35:01 UTC 2013 x86_64
x86_64 x86_64 GNU/Linux

Routers:
OnePK-1#show version
Cisco IOS Software, C2951 Software (C2951-UNIVERSALK9-M), Version 15.3(2)T,
RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Thu 28-Mar-13 13:17 by prod_rel_team

ROM: System Bootstrap, Version 15.0(1r)M13, RELEASE SOFTWARE (fc1)


By the way, updated IOL images would have been helpful to determine if this
is related to our setup or not.

Best regards
Viktor



Subject: RE: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re:
Replied by: Joseph Clarke on 10-06-2013 09:52:15 AM
I'm using SDK 0.7.0.503g as well for the libraries and dpss_mp:

[cisco@onePK-EFT1 bin]$ ldd ./dpss_mp
    linux-gate.so.1 =>  (0x0016b000)
    libonep32_core.so => /home/cisco/onep-ca/c32/sdk-c32-0.7.0.503g/c/lib/libonep32_core.so (0x003db000)
    librt.so.1 => /lib/librt.so.1 (0x00a9e000)
    libc.so.6 => /lib/libc.so.6 (0x008ee000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00a7a000)
    libdl.so.2 => /lib/libdl.so.2 (0x00a97000)
    /lib/ld-linux.so.2 (0x008cd000)

[cisco@onePK-EFT1 bin]$ ldd ./dpss_encdec
    linux-gate.so.1 =>  (0x00e28000)
    libglib-2.0.so.0 => /lib/libglib-2.0.so.0 (0x00aec000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00a7a000)
    librt.so.1 => /lib/librt.so.1 (0x00a9e000)
    libonep32_core.so => /home/cisco/onep-ca/c32/sdk-c32-0.7.0.503g/c/lib/libonep32_core.so (0x00110000)
    libonep32_datapath.so => /home/cisco/onep-ca/c32/sdk-c32-0.7.0.503g/c/lib/libonep32_datapath.so (0x0030f000)
    libc.so.6 => /lib/libc.so.6 (0x008ee000)
    /lib/ld-linux.so.2 (0x008cd000)
    libdl.so.2 => /lib/libdl.so.2 (0x00a97000)

I am testing on the all-in-one-VM running Fedora 14 with an internal IOL version 15.3(2)T:

Cisco IOS Software, Linux Software (I86BI_LINUX-ADVENTERPRISEK9-M), Version 15.3(2)T, DEVELOPMENT TEST SOFTWARE
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Thu 28-Mar-13 17:36 by prod_rel_team

I have not tried on real hardware yet.  My maximum frame size is 1514 bytes as sent from netcat.  With the gibberish I'm sending, I didn't think to confirm the receipt.  Wireshark shows the packets being transferred, and my console shows packets being punted.  I can try reducing the segment size.  I let the test run overnight and I didn't get any segfaults.

Subject: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re: New
Replied by: Viktor S. Wold Eide on 10-06-2013 02:29:24 PM
The onepk SDK then should be the same, while the Linux version and the
routers are different. Did you notice any drop in throughput while testing?

I was going to suggest reducing the sleep interval between injected
packets, e.g., to 100 injects per second instead of 1, but since you have
been running this overnight, it seems like a long shot.

The segfaults seem to occur only when inject and punt/re-inject happen
concurrently. Does your VM have multiple vCPUs, utilizing more than one
physical CPU? If not, could you try that?

I just did a quick test here, taking 7 out of 8 CPUs offline, with:
for N in `seq 1 7`; do echo 0 > /sys/devices/system/cpu/cpu$N/online; done
I have not seen any segfault since I started the process more than an hour
ago. I get some error messages, but no segfaults so far.
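
For completeness, the whole experiment, including bringing the CPUs back online afterwards, looks roughly like this (assumes root and a kernel with CPU hotplug; cpu0 typically cannot be taken offline):

    # Take cpu1..cpu7 offline, leaving only cpu0.
    for N in `seq 1 7`; do echo 0 > /sys/devices/system/cpu/cpu$N/online; done

    # Check which CPUs are still online.
    cat /sys/devices/system/cpu/online

    # Bring them back when done.
    for N in `seq 1 7`; do echo 1 > /sys/devices/system/cpu/cpu$N/online; done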

Best regards
Viktor

Subject: Re: New Message from Viktor S. Wold Eide in onePK - Troubleshooting: Re: Ne
Replied by: Viktor S. Wold Eide on 12-06-2013 07:22:05 AM
As mentioned, I still got error messages from the dpss process. I also
experienced segfaults, though much less often than with all CPUs online.

For various reasons we have been using 32-bit binaries so far. Since the
indications seem to point to the environment / platform, I am now running a
test with 64-bit binaries (both dpss and the application) on 64-bit Ubuntu
12.04. It has been running for almost two hours now without any segfaults
or errors reported by the dpss process.

Best regards
Viktor

Subject: RE: Re: New Message from Viktor S. Wold Eide in onePK - Troubleshooting: Re
Replied by: Joseph Clarke on 12-06-2013 10:07:53 AM
Sorry, I got a little delayed.  I just started testing on my VM with two vCPUs.  So far, no segfaults.  I also find it odd that the 64-bit version wouldn't segfault at all.  That leads me to believe there might be some kind of buffer overflow somewhere.  I'll let my test run some more, but I don't think I'm going to be able to reproduce.  Development thinks the problem you're facing may be fixed in current code, but unfortunately, I can't get you newer code until we go GA.  Just out of curiosity, what is the MD5 checksum of your 32-bit dpss_mp?

Subject: RE: Re: New Message from Viktor S. Wold Eide in onePK - Troubleshooting: Re
Replied by: Joseph Clarke on 12-06-2013 11:44:41 AM
It took a while, Viktor, but I was able to reproduce the crash with multiple CPUs.  So it does look like a locking problem leading to a race condition.  Now that I have the steps to reproduce, I'll see what development wants to do.  Thanks for the info.
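
If you need to keep testing before a fix lands, pinning dpss_mp to a single CPU may be a usable stopgap, since single-CPU runs appear stable; an untested sketch using taskset (binary and config paths as in your setup):

    # Pin the process to CPU 0 only, which should sidestep a race that
    # depends on multiple CPUs running concurrently (untested workaround).
    taskset -c 0 ./dpss_mp_32-0.7.0.503 -c ../../dpss.conf -f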

Subject: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re: New
Replied by: Viktor S. Wold Eide on 12-06-2013 12:04:56 PM
Hi Joseph,

The md5sum for the 32 bit dpss process:
md5sum dpss_mp_32-0.7.0.503
c9eed2bd400c6774ffee89363f63cdc3  dpss_mp_32-0.7.0.503

I ran the 64-bit version for four hours (one CPU online) and during that
time one of the dpss processes reported a single error, as follows. Note
that this was not a segfault, only an error message.

/opt/cisco/onep/c64/sdk-c64-0.7.0.503g/c/bin/dpss_mp_64-0.7.0.503 -c ../../dpss.conf -f
Sat Jan 17 00:57:14.160 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Sat Jan 17 03:34:01.422 CET : onep_dpss_transport.c:82: platform messaging: Not received IPv4 encapsulated GRE packet from the platform
Sat Jan 17 05:02:45.242 CET : src/main/server/onep_dpss_server.c:124: client closed after 0 bytes

Best regards
Viktor

Subject: RE: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re:
Replied by: Joseph Clarke on 12-06-2013 12:14:03 PM
I wonder if perhaps the 64-bit version is somehow based on a different code rev.  Nonetheless, since I have reproduced it, and I know it's dependent on the number of CPUs, I've asked development to take a closer look.  This may very well be fixed, but I want to confirm.

Subject: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re: New
Replied by: Viktor S. Wold Eide on 12-06-2013 12:17:38 PM
In a way I am happy that you could reproduce the segfault crash, Joseph.

Just to be clear, was this on the same platform that you described
earlier, that is, the "all-in-one-VM running Fedora 14 with an internal IOL
version 15.3(2)T", only with multiple CPUs?

In this thread I have also mentioned a slowdown for bulk data transfers,
but I will open a new thread for that.

Best regards
Viktor

Subject: RE: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re:
Replied by: Joseph Clarke on 12-06-2013 12:20:36 PM
Correct.  Same platform only now with two vCPUs.  Everything else remained the same.

Subject: Re: New Message from Joseph Clarke in onePK - Troubleshooting: RE: Re: New
Replied by: Viktor S. Wold Eide on 13-06-2013 05:58:38 AM
Just would like to add that we see segfaults on the 64 bit version as well.

/opt/cisco/onep/c64/sdk-c64-0.7.0.503g/c/bin/dpss_mp_64-0.7.0.503 -c ../../dpss.conf -f
Thu Jan 01 03:15:30.200 CET : src/main/rpc_client/onep_dpss_main_rpc_client.c:103: onep_dpss_connect_to_client_rpc_server: CONNECTED
Thu Jan 01 04:18:30.213 CET : src/main/onep_dpss_engine.c:399: Error in CFT processing for new packet.
Thu Jan 01 05:31:30.221 CET : src/shared/onep_dpss_paktype.c:104: platform messaging: Decoding of message failed with error code : invalid section type ,for section type : 2
Segmentation fault (core dumped)


gdb /opt/cisco/onep/c64/sdk-c64-0.7.0.503g/c/bin/dpss_mp_64-0.7.0.503 core
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /opt/cisco/onep/c64/sdk-c64-0.7.0.503g/c/bin/dpss_mp_64-0.7.0.503...done.

warning: core file may not match specified executable file.
[New LWP 4991]
[New LWP 4992]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/opt/cisco/onep/c64/sdk-c64-0.7.0.503g/c/bin/dpss_mp_64-0.7.0.503 -c ../../dpss'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000043007f in onep_dpss_handle_client_packet (dpss=0x1b116a0, pak=0x7fa42697d2e2) at src/main/onep_dpss_engine.c:403
403     src/main/onep_dpss_engine.c: No such file or directory.
(gdb) bt
#0  0x000000000043007f in onep_dpss_handle_client_packet (dpss=0x1b116a0, pak=0x7fa42697d2e2) at src/main/onep_dpss_engine.c:403
#1  0x00000000004303e0 in onep_dpss_get_packet_from_client (d=0x1b456e0, fd=10, user_data=0x1b45af0) at src/main/onep_dpss_engine.c:465
#2  0x000000000042f526 in dc_poll ()
#3  0x000000000042f665 in dc_run ()
#4  0x0000000000431feb in onep_dpss_engine_main (obj=0x1b116a0) at src/main/onep_dpss_engine.c:1222
#5  0x000000000040e57d in main (argc=4, argv=0x7fff72bc2988) at src/main/onep_dpss_main.c:409
(gdb) q
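
In case it is useful to others, the same backtrace can be captured non-interactively by pointing gdb at the SDK's shared libraries so that library frames resolve; a sketch using our install paths:

    gdb -batch \
        -ex 'set solib-search-path /opt/cisco/onep/c64/sdk-c64-0.7.0.503g/c/lib' \
        -ex 'thread apply all bt' \
        /opt/cisco/onep/c64/sdk-c64-0.7.0.503g/c/bin/dpss_mp_64-0.7.0.503 core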

Best regards
Viktor

Subject: RE: Re: New Message from Viktor S. Wold Eide in onePK - Troubleshooting: Re
Replied by: Joseph Clarke on 13-06-2013 03:43:59 PM
Looks like this has been fixed already as part of CSCud94223.  The fix will be available in the general availability release this summer.  The trigger is the throughput of traffic causing the dpss_mp to read a packet that was already allocated from the shared memory buffer.  The result was corruption and the segfault.
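
Until then, since the trigger is traffic throughput, throttling the traffic that ends up punted may reduce the exposure. A mitigation sketch using a Linux token bucket filter on the sending host (interface name and rate are placeholders, and this is only a stopgap, not a fix):

    # Throttle egress on eth0 to keep punt throughput low (example values only).
    tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms

    # Remove the limit afterwards.
    tc qdisc del dev eth0 root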

Subject: RE: Re: New Message from Viktor S. Wold Eide in onePK - Troubleshooting: Re
Replied by: Viktor S. Wold Eide on 13-08-2013 02:28:42 AM
Joseph Clarke:
Looks like this has been fixed already as part of CSCud94223.  The fix will be available in the general availability release this summer.  The trigger is the throughput of traffic causing the dpss_mp to read a packet that was already allocated from the shared memory buffer.  The result was corruption and the segfault.
Hi again,

I just looked quickly at the Release Notes for Cisco onePK Version 1.0.0, Controlled Availability Release, Last Updated: August 5, 2013
http://developer.cisco.com/web/onepk-developer/release-notes-for-c

At the end of "Open Caveats":
CSCui21136
Symptom: The dpss_mp process may crash.
Conditions: Occurs when multiple concurrent sessions or high throughput of data is being handled by the Datapath Service Set.
Workaround: There is no workaround.

Is this issue related to our discussion in this thread? We would just like to know what to expect.

Best regards
Viktor S. Wold Eide

Subject: RE: Re: New Message from Viktor S. Wold Eide in onePK - Troubleshooting: Re
Replied by: Joseph Clarke on 13-08-2013 09:52:45 AM
No, this is a different issue.  The issue you reported is fixed.  The problem here appears to be related to the number of concurrent connections (e.g., with HTTP).  It's also not clear this bug is reproducible yet.  That's not to say it isn't, but it's still being investigated.

Subject: Re: New Message from Joseph Clarke in onePK Developer - Troubleshooting: RE
Replied by: Viktor S. Wold Eide on 13-08-2013 10:01:39 AM
OK Joseph, thanks a lot for the info.

Best regards
Viktor

Subject: RE: Re: New Message from Viktor S. Wold Eide in onePK - Troubleshooting: Re
Replied by: Viktor S. Wold Eide on 09-10-2013 03:48:38 AM
Joseph Clarke:
Looks like this has been fixed already as part of CSCud94223.  The fix will be available in the general availability release this summer.  The trigger is the throughput of traffic causing the dpss_mp to read a packet that was already allocated from the shared memory buffer.  The result was corruption and the segfault.
For the record: after the v1.0.0.84 release we did some testing, and we have not seen these dpss segfaults so far.
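
The testing is essentially a soak test; a minimal sketch of the kind of supervision loop one can use to catch any recurrence (the binary path and log file name here are placeholders):

    # Restart dpss_mp whenever it exits and record each termination with its
    # exit status (a segfault shows up as status 139 = 128 + SIGSEGV).
    while true; do
        ./dpss_mp_64 -c ../../dpss.conf -f
        echo "`date`: dpss_mp exited with status $?" >> dpss_restarts.log
        sleep 1
    done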

Best regards
Viktor

Subject: RE: dpss segmentation faults
Replied by: Joseph Clarke on 09-10-2013 09:39:10 AM
Nor have I, and I'm doing a lot with DPSS and punt right now.  I did find an issue if you try to register for a punt when the device isn't configured for DPSS.  I'm going to run that down.
