Segfault when -p flag used #62

Closed
virtualdxs opened this issue Oct 26, 2016 · 25 comments

Comments

@virtualdxs

I'm just getting into APRS and I've hit a problem. Perhaps the shell output is the best way to demonstrate:

Ξ ~ → direwolf
Dire Wolf version 1.3
Includes optional support for:  gpsd

Reading config file direwolf.conf
Audio device for both receive and transmit: default  (channel 0)
Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, E+, 44100 sample rate, DTMF decoder enabled.
Note: PTT not configured for channel 0. (Ignore this if using VOX.)
Ready to accept AGW client application 0 on port 8000 ...
Use -p command line option to enable KISS pseudo terminal.
Ready to accept KISS client application on port 8001 ...
^C
QRT
Ξ ~ → direwolf -p
Dire Wolf version 1.3
Includes optional support for:  gpsd

Reading config file direwolf.conf
Audio device for both receive and transmit: default  (channel 0)
Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, E+, 44100 sample rate, DTMF decoder enabled.
Note: PTT not configured for channel 0. (Ignore this if using VOX.)
zsh: segmentation fault (core dumped)  direwolf -p
↑139 ~ →

OS: Arch Linux
Compiled from latest source with

git clone https://github.com/wb2osz/direwolf/
make
sudo make install
@dranch
Collaborator

dranch commented Oct 27, 2016

Please try building the "dev" branch (1.4-DEV) and see if you can reproduce the segfault. If you can, please provide a backtrace from the resulting core file.

@virtualdxs
Author

Reproduced on dev branch. Something I just noticed: if JACK isn't running it doesn't run and complains, so somehow it's detecting that I use JACK. But the JACK device that shows up seems to be an ALSA bridge.

@virtualdxs
Author

I don't know how to do anything with traces, but if you have a guide I can follow I will.

@dranch
Collaborator

dranch commented Oct 27, 2016

Please post your direwolf config. Technically speaking, though, Direwolf only supports low-level ALSA devices, not higher-level sound systems such as PulseAudio, PortAudio, or JACK.

@virtualdxs
Author

virtualdxs commented Oct 27, 2016

I just figured out that that was because of Cadence's alsa-jack bridge. However, the segfault still occurs with said bridge disabled, so that is not the culprit. Direwolf config follows: On second thought here's a pastebin.

@wb2osz
Owner

wb2osz commented Oct 27, 2016

Do this to figure out where it is crashing:

Edit Makefile.linux. Look for this line: "CFLAGS := -O3 -pthread -Igeotranz" and add " -g" to the end.

Type commands:

make clean
make
gdb ./direwolf

When you get the "(gdb)" prompt, type "run -p".

Note that any command line options that would normally go with direwolf are put here instead.

When it crashes, type "backtrace" to find out where it is and how it got there.

Type "quit" to get out.

@dranch
Collaborator

dranch commented Oct 28, 2016

From the output of the initial startup of Direwolf, it shows:

"Audio device for both receive and transmit: default (channel 0)"

You didn't configure a sound card in the direwolf.conf file (ADEVICE), so it's using the default sound device, shown above as "default", which is actually a PulseAudio device. Please follow the Direwolf User Guide on how to find your ALSA device and configure it in the direwolf.conf file. It would look something like:

ADEVICE plughw:1,0

--David

@virtualdxs
Author

Thanks, I'll let you know the results

@virtualdxs
Author

Didn't fix it.

Ξ ~ → direwolf -p
Dire Wolf DEVELOPMENT version 1.4 C (Oct 27 2016)
Includes optional support for:  gpsd

Reading config file direwolf.conf
Audio device for both receive and transmit: plughw:0,0  (channel 0)
Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, E+, 44100 sample rate.
Note: PTT not configured for channel 0. (Ignore this if using VOX.)
Ready to accept AGW client application 0 on port 8000 ...
Ready to accept KISS client application on port 8001 ...
zsh: segmentation fault (core dumped)  direwolf -t 0 -p
↑139 ~ →

@dranch
Collaborator

dranch commented Oct 30, 2016

Bummer.. well, follow John's use of gdb and see if you can get a backtrace of the issue

@virtualdxs
Author

Ξ ~git/direwolf git:(dev) ▶ gdb ./direwolf
GNU gdb (GDB) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./direwolf...done.
(gdb) run -p -t 0
Starting program: /var/git/direwolf/direwolf -p -t 0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Dire Wolf DEVELOPMENT version 1.4 C (Oct 30 2016)
Includes optional support for:  gpsd

Reading config file direwolf.conf
Audio device for both receive and transmit: default  (channel 0)
[New Thread 0x7fffef764700 (LWP 26304)]
[New Thread 0x7fffeef63700 (LWP 26305)]
Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, E+, 44100 sample rate.
Note: PTT not configured for channel 0. (Ignore this if using VOX.)
[New Thread 0x7fffee762700 (LWP 26306)]
[New Thread 0x7fffedf61700 (LWP 26307)]
Ready to accept AGW client application 0 on port 8000 ...
[New Thread 0x7fffed760700 (LWP 26308)]
[New Thread 0x7fffecf5f700 (LWP 26309)]
[New Thread 0x7fffdfffe700 (LWP 26310)]
[New Thread 0x7fffdf7fd700 (LWP 26311)]
Ready to accept KISS client application on port 8001 ...
[New Thread 0x7fffdeffc700 (LWP 26312)]

Thread 1 "direwolf" received signal SIGSEGV, Segmentation fault.
0x00000000004473d4 in strlcpy_debug (dst=dst@entry=0x825bc0 <pt_slave_name> "", src=0xfffffffff7184280 <error: Cannot access memory at address 0xfffffffff7184280>,
    siz=siz@entry=32, file=file@entry=0x45d97d "kiss.c", func=func@entry=0x45da48 <__func__.5056> "kiss_open_pt", line=line@entry=382) at misc/strlcpy.c:95
95                              if ((*d++ = *s++) == 0)
(gdb) backtrace
#0  0x00000000004473d4 in strlcpy_debug (dst=dst@entry=0x825bc0 <pt_slave_name> "", src=0xfffffffff7184280 <error: Cannot access memory at address 0xfffffffff7184280>,
    siz=siz@entry=32, file=file@entry=0x45d97d "kiss.c", func=func@entry=0x45da48 <__func__.5056> "kiss_open_pt", line=line@entry=382) at misc/strlcpy.c:95
#1  0x000000000042d09e in kiss_open_pt () at kiss.c:382
#2  kiss_init (mc=mc@entry=0x66d7e0 <misc_config>) at kiss.c:278
#3  0x00000000004045fc in main (argc=4, argv=0x7fffffffe848) at direwolf.c:724
(gdb) quit
A debugging session is active.

        Inferior 1 [process 26300] will be killed.

Quit anyway? (y or n) y
Ξ ~git/direwolf git:(dev) ▶

I'm not getting much more out of this than what I was told by the fact that it was a segfault, as I'm not really a dev myself.

@virtualdxs
Author

Also, is there a config option to make -t 0 the default? LXTerminal doesn't show any of the text besides the green text, so I have alias direwolf='direwolf -t 0', but it would be nicer to have that as a config option.

@dranch
Collaborator

dranch commented Oct 30, 2016

Ok.. thanks for the BT. As part of the troubleshooting, please configure Direwolf to use a real ALSA device and NOT use default. Please see the User Guide .pdf on how to identify your proper audio device. Also try running Direwolf as root just to see if it will run under the root user vs. a standard user.

@virtualdxs
Author

virtualdxs commented Oct 31, 2016

Didn't realize there was a direwolf.conf in the root of the git repo. That's why it did the default again.

Ξ ~git/direwolf git:(dev) ▶ gdb ./direwolf
GNU gdb (GDB) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./direwolf...done.
(gdb) run -t 0 -p
Starting program: /var/git/direwolf/direwolf -t 0 -p
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Dire Wolf DEVELOPMENT version 1.4 C (Oct 30 2016)
Includes optional support for:  gpsd

Reading config file /home/dxs/direwolf.conf
Audio device for both receive and transmit: plughw:1,0  (channel 0)
Channel 0: 1200 baud, AFSK 1200 & 2200 Hz, E+, 44100 sample rate.
Note: PTT not configured for channel 0. (Ignore this if using VOX.)
[New Thread 0x7ffff5458700 (LWP 2502)]
[New Thread 0x7ffff4c57700 (LWP 2503)]
Ready to accept AGW client application 0 on port 8000 ...
[New Thread 0x7ffff4456700 (LWP 2504)]
[New Thread 0x7ffff3c55700 (LWP 2505)]
[New Thread 0x7ffff3454700 (LWP 2506)]
[New Thread 0x7ffff2c53700 (LWP 2507)]
Ready to accept KISS client application on port 8001 ...
[New Thread 0x7ffff2452700 (LWP 2508)]

Thread 1 "direwolf" received signal SIGSEGV, Segmentation fault.
0x00000000004473d4 in strlcpy_debug (dst=dst@entry=0x825bc0 <pt_slave_name> "", src=0xfffffffff7184280 <error: Cannot access memory at address 0xfffffffff7184280>,
    siz=siz@entry=32, file=file@entry=0x45d97d "kiss.c", func=func@entry=0x45da48 <__func__.5056> "kiss_open_pt", line=line@entry=382) at misc/strlcpy.c:95
95                              if ((*d++ = *s++) == 0)
(gdb) backtrace
#0  0x00000000004473d4 in strlcpy_debug (dst=dst@entry=0x825bc0 <pt_slave_name> "", src=0xfffffffff7184280 <error: Cannot access memory at address 0xfffffffff7184280>,
    siz=siz@entry=32, file=file@entry=0x45d97d "kiss.c", func=func@entry=0x45da48 <__func__.5056> "kiss_open_pt", line=line@entry=382) at misc/strlcpy.c:95
#1  0x000000000042d09e in kiss_open_pt () at kiss.c:382
#2  kiss_init (mc=mc@entry=0x66d7e0 <misc_config>) at kiss.c:278
#3  0x00000000004045fc in main (argc=4, argv=0x7fffffffe848) at direwolf.c:724
(gdb) quit
A debugging session is active.

        Inferior 1 [process 2498] will be killed.

Quit anyway? (y or n) y
Ξ ~git/direwolf git:(dev) ▶

Still segfaults as root.

@virtualdxs
Author

For the record SELinux is not enabled, so it's not a permissions issue.

@virtualdxs
Author

Any updates? Not too much of a priority because it works on Linux Mint but it would be nice to have it on Arch Linux as it boots much faster.

@dranch
Collaborator

dranch commented Nov 4, 2016

If Direwolf runs fine on Mint but not Arch, this sounds like a potential kernel issue. What kernel version are you running on Arch? Are you able to upgrade the Arch kernel or build and install a new vanilla Linux kernel to see if that changes anything?

@bmayton

bmayton commented Nov 6, 2016

I can reproduce the same behavior on Arch. The segfault occurs in kiss_open_pt when strlcpy attempts to copy the name of the pseudo terminal from the pointer returned by ptsname (pts) into pt_slave_name. Reading from pts causes the segfault.

ptsname is one of the standard library functions that isn't thread-safe. I don't know the internals of Direwolf at all, but it appears to be multithreaded, so it's possible that another thread calls something that invalidates the pointer returned by ptsname (pts) between the ptsname call and the point where pts is passed to strlcpy.

As a quick fix, I replaced this line:

strlcpy (pt_slave_name, pts, sizeof(pt_slave_name));

with the thread-safe:

ptsname_r(fd, pt_slave_name, sizeof(pt_slave_name));

which copies the PTS name directly into pt_slave_name rather than returning a pointer into the standard library somewhere. As ptsname_r is a Linux-specific extension, this would break compilation on OS X, but it gives me a working Direwolf on Arch.
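
A minimal sketch of that quick fix in isolation, assuming a glibc system. open_pt_master is a hypothetical helper name, not a function from kiss.c, and the posix_openpt/grantpt/unlockpt sequence is the usual way to set up a pseudo terminal rather than a copy of Dire Wolf's code. The 32-byte buffer in the usage note mirrors the siz=32 shown in the backtrace.

```c
#define _GNU_SOURCE   /* ptsname_r() is a GNU extension; must precede every #include */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Open a pseudo terminal master and store the slave device name in
   name[0..size-1]. Returns the master fd, or -1 on any failure.
   (Hypothetical helper for illustration only.) */
static int open_pt_master(char *name, size_t size)
{
    int fd = posix_openpt(O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    /* ptsname_r() returns 0 on success and writes into the caller's
       buffer, so no pointer into libc's static storage is involved. */
    if (grantpt(fd) != 0 || unlockpt(fd) != 0 ||
        ptsname_r(fd, name, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would declare something like char pt_slave_name[32]; and call open_pt_master(pt_slave_name, sizeof(pt_slave_name)), checking for -1 before using the name.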

@virtualdxs
Author

Just curious, what makes this work in most Linuxes but not Arch?
Also, would it be possible to detect if it's an affected OS with an if/else or is there something I'm missing? I don't know much about programming, especially not with languages where you have to manage memory yourself, but would a

if (memory_is_readable) {
    use_strlcpy;
} else {
    use_ptsname_r;
}

of some sort be possible?

@bmayton

bmayton commented Nov 7, 2016

This turns out to not actually be a race condition, but is the result of a change made in glibc 2.24. It doesn't appear that any other threads in Direwolf are calling ptsname, so the usage here should be safe.

ptsname is part of the POSIX.1-2001 spec and shouldn't have been included in the definitions in stdlib.h without defining _XOPEN_SOURCE or _POSIX_C_SOURCE to a value indicating that these extensions should be included. Adding -D_XOPEN_SOURCE=600 to the CFLAGS in Makefile.linux does the trick; full details are in the feature_test_macros(7) man page.

glibc versions prior to 2.24 exposed these declarations regardless. In 2.24, without the correct feature test macro, the function is implicitly declared. C allows you to call functions that haven't been declared; the return type of an implicitly declared function is always assumed to be int. (This is an archaic feature of the language; no sane programmer depends on implicit declaration, and all modern compilers will emit a warning when a function is called this way.)

On a 64-bit machine with gcc, the size of a pointer is 8 bytes, but the size of an int is 4. ptsname returns a pointer (8 bytes) but because it hasn't been declared, the compiler treats the returned value as an int, taking only the lower 4 bytes, possibly truncating or inappropriately sign-extending the returned address. So it turns out not to be a race condition; the correct pointer was returned but misinterpreted because the compiler didn't know that a pointer, not an int, was being returned.
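
The arithmetic can be sketched in plain C. The function below only models the effect with integers and does not call ptsname. The input address 0x00007ffff7184280 is an assumption (a plausible value for the pointer that was really returned); the output is the src value that appears in the backtrace above.

```c
#include <assert.h>
#include <stdint.h>

/* Model what happens to a 64-bit address when the caller believes the
   function returned a 32-bit int: only the low 32 bits survive, and
   they are then sign-extended back to pointer width. */
static uint64_t through_implicit_int(uint64_t real_address)
{
    uint64_t low = real_address & 0xFFFFFFFFull;   /* keep the lower 4 bytes */
    if (low & 0x80000000ull)                       /* bit 31 set: negative int */
        low |= 0xFFFFFFFF00000000ull;              /* sign-extend to 64 bits */
    return low;
}

/* The input is a hypothetical "real" pointer value; the result matches
   the inaccessible src address reported by gdb. */
static void demo(void)
{
    assert(through_implicit_int(0x00007ffff7184280ull) == 0xfffffffff7184280ull);
}
```

Addresses that fit in 31 bits pass through unchanged, which is one reason this kind of bug can stay hidden until a library or memory layout changes.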

ptsname_r sidesteps the issue because it returns an int status and writes the name into a buffer supplied by the caller, so the int return type assumed by the implicit declaration happens to be correct.

Just curious, what makes this work in most Linuxes but not Arch?

It'll only happen in distributions running glibc-2.24, which is the current version. Most distributions are on a slower release cycle and are still using older glibc.

Also, would it be possible to detect if it's an affected OS with an if/else or is there something I'm missing?

No. For one, just because the memory is readable doesn't mean that it points to the value you expect. It would be possible for a pointer to get mangled into something that points into completely valid memory (no segfault on access) but is some other random data from the program. Two, while it would theoretically be possible to catch the segfault and do something else, you'd have to set up a signal handler for SIGSEGV, try accessing the memory, and then in the signal handler try to pick up the pieces and do the other thing instead. Not just a simple if/else. Finally, with non-Linux versions of the standard library, ptsname_r doesn't exist at all, so a reference to it in the code will cause the linker to fail (even if the if/else would never get there at runtime).

Getting back on topic, all that to say, -D_XOPEN_SOURCE=600 should probably be added to the CFLAGS. Possibly also -D_DEFAULT_SOURCE=1 for strsep, which has potential for similar issues: it otherwise ends up implicitly declared on newer glibc (>=2.19), and it also returns a pointer.
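
For anyone who would rather see the effect in source form, here is a rough sketch that defines the same two macros at the top of a file instead of in CFLAGS, assuming glibc. The names check_ptsname, check_strsep, prototypes_visible and count_fields are made up for illustration and are not part of Dire Wolf. The two pointer initializers act as a compile-time guard: they only compile cleanly if the real prototypes, with their char * return types, are visible.

```c
/* Feature test macros must come before the first #include. */
#define _XOPEN_SOURCE 600   /* exposes ptsname() in <stdlib.h> */
#define _DEFAULT_SOURCE 1   /* exposes strsep() in <string.h> on glibc >= 2.19 */

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* If either function were undeclared, these lines would not compile. */
static char *(*const check_ptsname)(int) = ptsname;
static char *(*const check_strsep)(char **, const char *) = strsep;

/* Returns nonzero when both guarded prototypes were resolved. */
static int prototypes_visible(void)
{
    return check_ptsname != NULL && check_strsep != NULL;
}

/* Count comma-separated fields in a modifiable string. Empty fields
   count too, because strsep() returns them as empty strings. */
static int count_fields(char *line)
{
    int n = 0;
    while (check_strsep(&line, ",") != NULL)
        n++;
    return n;
}
```

Building with gcc's -Werror=implicit-function-declaration is another way to catch a missing prototype at compile time rather than at run time.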

@virtualdxs
Author

Interesting. Works as you said with the CFLAGS additions. Thanks!

@dranch
Collaborator

dranch commented Nov 7, 2016

Excellent detail in your message bmayton! Hopefully this flag can be added to more legacy systems w/o any negative impacts.

@coreyreichle

I can confirm. Adding -D_XOPEN_SOURCE=600 to cflags stops the segfaulting on Ubuntu Yakkety as well.

@k9wkj

k9wkj commented Dec 19, 2016

debian stretch
newerlappy:~$ uname -a
Linux newerlappy 4.7.0-1-amd64 #1 SMP Debian 4.7.8-1 (2016-10-19) x86_64 GNU/Linux
DireWolf 1.4 development snapshot E

had to set the cflags as such
CFLAGS := -O3 -pthread -Igeotranz -D_XOPEN_SOURCE=600 -D_DEFAULT_SOURCE=1

segfault gone

@wb2osz wb2osz reopened this Dec 20, 2016
wb2osz added a commit that referenced this issue Dec 21, 2016
We need to define a couple more symbols for glibc >= 2.24.

Complete details:
#62

	modified:   Makefile.linux
	modified:   Makefile.macosx
@wb2osz
Owner

wb2osz commented Dec 21, 2016

Fixed with commit 40047e9 in the dev branch.

Thanks for the detailed explanation.

@wb2osz wb2osz closed this as completed Dec 21, 2016