Sorting and Sanitizers
13 mins readOctober 31, 2020
sorting,
algorithms
Title is misleading isn’t it? What has sanitizers to do with insertion sort?
Well one fine day, I was trying to brush-up algorithms and implementation skills so I started with insertion sort. The implementation took around 2 mins but the testing ruined around 15 mins. This blog just records the problem.
Here is where it all started.
#include <bits/stdc++.h>
using namespace std;
void isort(int n, int *a) {
for(int i=1; i<n; i++) {
int key = a[i];
int j;
for(j=i-1; j>=0 && a[j]>key; j--) {
a[j+1] = a[j];
}
a[j] = key;
}
}
int main(int argc, char ** argv) {
for(int i=0; i<1000; i++) {
int r = rand() % 10000;
int a[r];
for(int j=0; j<r; j++) {
a[j] = rand() % 10000;
}
isort(r, a);
for(int j=1; j<r; j++) {
assert(a[j-1] <= a[j]);
}
}
}
Looks pretty good right? When I execute it…
[mmp@mpataki insertionsort]$ g++ sol.cpp -o sol -g
[mmp@mpataki insertionsort]$ ./sol
Segmentation fault (core dumped)
I thought I will run it with gdb
to figure out the exception stack.
[mmp@mpataki insertionsort]$ echo -e "r\nwhere" | gdb -q ./sol | head -n 20
Reading symbols from ./sol...
(gdb) Starting program: /home/mmp/cp/algo/c2/insertionsort/sol
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.31-4.fc32.x86_64
Program received signal SIGSEGV, Segmentation fault.
0x00000d5400401349 in ?? ()
Missing separate debuginfos, use: dnf debuginfo-install libgcc-10.2.1-1.fc32.x86_64 libstdc++-10.2.1-1.fc32.x86_64
(gdb)
#0 0x00000d5400401349 in ?? ()
#1 0x0000150a0000150a in ?? ()
#2 0x0000150a0000150a in ?? ()
#3 0x0000150a0000150a in ?? ()
#4 0x0000150a0000150a in ?? ()
#5 0x0000150a0000150a in ?? ()
#6 0x0000150a0000150a in ?? ()
#7 0x0000150a0000150a in ?? ()
#8 0x0000150a0000150a in ?? ()
#9 0x0000150a0000150a in ?? ()
#10 0x0000150a0000150a in ?? ()
#11 0x0000150a0000150a in ?? ()
#12 0x0000150a0000150a in ?? ()
...
WTF is there on the stack? 0x*******
? Looks like the stack is corrupt. But which line corrupted it? gdb
looks definitely not of any help here. So I had do go through each line to figure out the line whic corrupted the stack. It was indeed line 11. Instead of writing to a[j+1]
I was writing to a[j]
.
All that is fine, mistakes are possible. But how will we catch them (the cause) in future? After a little googling I found something called sanitizers (there are many of them asan
, tsan
, ubsan
).
Asan
is address sanitizer which adds a few more instructions to your code to do sanity checks to detect such stack corruptions (possibly it can do more, which I haven’t explored much) and can be used as follows
[mmp@mpataki insertionsort]$ g++ -fsanitize=address sol.cpp -o sol -g
You have to install the libasan
first
# Fedora / RHEL
[mmp@mpataki insertionsort]$ sudo yum install libasan
Now when I run it (Note it clearly points out the line number 11)
mmp@mpataki insertionsort]$ ./sol
=================================================================
==447554==ERROR: AddressSanitizer: dynamic-stack-buffer-overflow on address 0x7fff01150a9c at pc 0x0000004013ea bp 0x7fff01150a40 sp 0x7fff01150a30
WRITE of size 4 at 0x7fff01150a9c thread T0
#0 0x4013e9 in isort(int, int*) /home/mmp/cp/algo/c2/insertionsort/sol.cpp:11
#1 0x40159e in main /home/mmp/cp/algo/c2/insertionsort/sol.cpp:23
#2 0x7f9531820041 in __libc_start_main (/lib64/libc.so.6+0x27041)
#3 0x40115d in _start (/home/mmp/cp/algo/c2/insertionsort/sol+0x40115d)
Address 0x7fff01150a9c is located in stack of thread T0
SUMMARY: AddressSanitizer: dynamic-stack-buffer-overflow /home/mmp/cp/algo/c2/insertionsort/sol.cpp:11 in isort(int, int*)
Shadow bytes around the buggy address:
0x100060222100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100060222110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100060222120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100060222130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100060222140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100060222150: ca ca ca[ca]00 00 00 00 00 00 00 00 00 00 00 00
0x100060222160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100060222170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100060222180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100060222190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x1000602221a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==447554==ABORTING
What does asan
do
Let’s see some disassembly and see what code did the asan
add.
Without sanitizer
[mmp@mpataki insertionsort]$ objdump -D ./sol
...
0000000000401166 <_Z5isortiPi>:
401166: 55 push %rbp
401167: 48 89 e5 mov %rsp,%rbp
40116a: 89 7d ec mov %edi,-0x14(%rbp)
40116d: 48 89 75 e0 mov %rsi,-0x20(%rbp)
401171: c7 45 fc 01 00 00 00 movl $0x1,-0x4(%rbp)
401178: 8b 45 fc mov -0x4(%rbp),%eax
40117b: 3b 45 ec cmp -0x14(%rbp),%eax
40117e: 0f 8d 9c 00 00 00 jge 401220 <_Z5isortiPi+0xba>
401184: 8b 45 fc mov -0x4(%rbp),%eax
401187: 48 98 cltq
401189: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
401190: 00
401191: 48 8b 45 e0 mov -0x20(%rbp),%rax
401195: 48 01 d0 add %rdx,%rax
401198: 8b 00 mov (%rax),%eax
40119a: 89 45 f4 mov %eax,-0xc(%rbp)
40119d: 8b 45 fc mov -0x4(%rbp),%eax
4011a0: 83 e8 01 sub $0x1,%eax
4011a3: 89 45 f8 mov %eax,-0x8(%rbp)
4011a6: 83 7d f8 00 cmpl $0x0,-0x8(%rbp)
4011aa: 78 52 js 4011fe <_Z5isortiPi+0x98>
4011ac: 8b 45 f8 mov -0x8(%rbp),%eax
4011af: 48 98 cltq
4011b1: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
4011b8: 00
4011b9: 48 8b 45 e0 mov -0x20(%rbp),%rax
4011bd: 48 01 d0 add %rdx,%rax
4011c0: 8b 00 mov (%rax),%eax
4011c2: 39 45 f4 cmp %eax,-0xc(%rbp)
4011c5: 7d 37 jge 4011fe <_Z5isortiPi+0x98>
4011c7: 8b 45 f8 mov -0x8(%rbp),%eax
4011ca: 48 98 cltq
4011cc: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
4011d3: 00
4011d4: 48 8b 45 e0 mov -0x20(%rbp),%rax
4011d8: 48 01 d0 add %rdx,%rax
4011db: 8b 55 f8 mov -0x8(%rbp),%edx
4011de: 48 63 d2 movslq %edx,%rdx
4011e1: 48 83 c2 01 add $0x1,%rdx
4011e5: 48 8d 0c 95 00 00 00 lea 0x0(,%rdx,4),%rcx
4011ec: 00
4011ed: 48 8b 55 e0 mov -0x20(%rbp),%rdx
4011f1: 48 01 ca add %rcx,%rdx
4011f4: 8b 00 mov (%rax),%eax
4011f6: 89 02 mov %eax,(%rdx)
4011f8: 83 6d f8 01 subl $0x1,-0x8(%rbp)
4011fc: eb a8 jmp 4011a6 <_Z5isortiPi+0x40>
4011fe: 8b 45 f8 mov -0x8(%rbp),%eax
401201: 48 98 cltq
401203: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
40120a: 00
40120b: 48 8b 45 e0 mov -0x20(%rbp),%rax
40120f: 48 01 c2 add %rax,%rdx
401212: 8b 45 f4 mov -0xc(%rbp),%eax
401215: 89 02 mov %eax,(%rdx)
401217: 83 45 fc 01 addl $0x1,-0x4(%rbp)
40121b: e9 58 ff ff ff jmpq 401178 <_Z5isortiPi+0x12>
401220: 90 nop
401221: 5d pop %rbp
401222: c3 retq
...
With sanitizer
[mmp@mpataki insertionsort]$ objdump -D ./sol
...
0000000000401216 <_Z5isortiPi>:
401216: 55 push %rbp
401217: 48 89 e5 mov %rsp,%rbp
40121a: 48 83 ec 20 sub $0x20,%rsp
40121e: 89 7d ec mov %edi,-0x14(%rbp)
401221: 48 89 75 e0 mov %rsi,-0x20(%rbp)
401225: c7 45 f4 01 00 00 00 movl $0x1,-0xc(%rbp)
40122c: 8b 45 f4 mov -0xc(%rbp),%eax
40122f: 3b 45 ec cmp -0x14(%rbp),%eax
401232: 0f 8d c0 01 00 00 jge 4013f8 <_Z5isortiPi+0x1e2>
401238: 8b 45 f4 mov -0xc(%rbp),%eax
40123b: 48 98 cltq
40123d: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
401244: 00
401245: 48 8b 45 e0 mov -0x20(%rbp),%rax
401249: 48 8d 0c 02 lea (%rdx,%rax,1),%rcx
40124d: 48 89 c8 mov %rcx,%rax
401250: 48 89 c2 mov %rax,%rdx
401253: 48 c1 ea 03 shr $0x3,%rdx
401257: 48 81 c2 00 80 ff 7f add $0x7fff8000,%rdx
40125e: 0f b6 12 movzbl (%rdx),%edx
401261: 84 d2 test %dl,%dl
401263: 40 0f 95 c7 setne %dil
401267: 48 89 c6 mov %rax,%rsi
40126a: 83 e6 07 and $0x7,%esi
40126d: 83 c6 03 add $0x3,%esi
401270: 40 38 d6 cmp %dl,%sil
401273: 0f 9d c2 setge %dl
401276: 21 fa and %edi,%edx
401278: 84 d2 test %dl,%dl
40127a: 74 08 je 401284 <_Z5isortiPi+0x6e>
40127c: 48 89 c7 mov %rax,%rdi
40127f: e8 dc fd ff ff callq 401060 <__asan_report_load4@plt>
401284: 8b 01 mov (%rcx),%eax
401286: 89 45 fc mov %eax,-0x4(%rbp)
401289: 8b 45 f4 mov -0xc(%rbp),%eax
40128c: 83 e8 01 sub $0x1,%eax
40128f: 89 45 f8 mov %eax,-0x8(%rbp)
401292: 83 7d f8 00 cmpl $0x0,-0x8(%rbp)
401296: 0f 88 02 01 00 00 js 40139e <_Z5isortiPi+0x188>
40129c: 8b 45 f8 mov -0x8(%rbp),%eax
40129f: 48 98 cltq
4012a1: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
4012a8: 00
4012a9: 48 8b 45 e0 mov -0x20(%rbp),%rax
4012ad: 48 8d 0c 02 lea (%rdx,%rax,1),%rcx
4012b1: 48 89 c8 mov %rcx,%rax
4012b4: 48 89 c2 mov %rax,%rdx
4012b7: 48 c1 ea 03 shr $0x3,%rdx
4012bb: 48 81 c2 00 80 ff 7f add $0x7fff8000,%rdx
4012c2: 0f b6 12 movzbl (%rdx),%edx
4012c5: 84 d2 test %dl,%dl
4012c7: 40 0f 95 c7 setne %dil
4012cb: 48 89 c6 mov %rax,%rsi
4012ce: 83 e6 07 and $0x7,%esi
4012d1: 83 c6 03 add $0x3,%esi
4012d4: 40 38 d6 cmp %dl,%sil
4012d7: 0f 9d c2 setge %dl
4012da: 21 fa and %edi,%edx
4012dc: 84 d2 test %dl,%dl
4012de: 74 08 je 4012e8 <_Z5isortiPi+0xd2>
4012e0: 48 89 c7 mov %rax,%rdi
4012e3: e8 78 fd ff ff callq 401060 <__asan_report_load4@plt>
4012e8: 8b 01 mov (%rcx),%eax
4012ea: 39 45 fc cmp %eax,-0x4(%rbp)
4012ed: 0f 8d ab 00 00 00 jge 40139e <_Z5isortiPi+0x188>
4012f3: 8b 45 f8 mov -0x8(%rbp),%eax
4012f6: 48 98 cltq
4012f8: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
4012ff: 00
401300: 48 8b 45 e0 mov -0x20(%rbp),%rax
401304: 48 8d 34 02 lea (%rdx,%rax,1),%rsi
401308: 8b 45 f8 mov -0x8(%rbp),%eax
40130b: 48 98 cltq
40130d: 48 83 c0 01 add $0x1,%rax
401311: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
401318: 00
401319: 48 8b 45 e0 mov -0x20(%rbp),%rax
40131d: 48 8d 0c 02 lea (%rdx,%rax,1),%rcx
401321: 48 89 f0 mov %rsi,%rax
401324: 48 89 c2 mov %rax,%rdx
401327: 48 c1 ea 03 shr $0x3,%rdx
40132b: 48 81 c2 00 80 ff 7f add $0x7fff8000,%rdx
401332: 0f b6 12 movzbl (%rdx),%edx
401335: 84 d2 test %dl,%dl
401337: 41 0f 95 c0 setne %r8b
40133b: 48 89 c7 mov %rax,%rdi
40133e: 83 e7 07 and $0x7,%edi
401341: 83 c7 03 add $0x3,%edi
401344: 40 38 d7 cmp %dl,%dil
401347: 0f 9d c2 setge %dl
40134a: 44 21 c2 and %r8d,%edx
40134d: 84 d2 test %dl,%dl
40134f: 74 08 je 401359 <_Z5isortiPi+0x143>
401351: 48 89 c7 mov %rax,%rdi
401354: e8 07 fd ff ff callq 401060 <__asan_report_load4@plt>
401359: 8b 36 mov (%rsi),%esi
40135b: 48 89 c8 mov %rcx,%rax
40135e: 48 89 c2 mov %rax,%rdx
401361: 48 c1 ea 03 shr $0x3,%rdx
401365: 48 81 c2 00 80 ff 7f add $0x7fff8000,%rdx
40136c: 0f b6 12 movzbl (%rdx),%edx
40136f: 84 d2 test %dl,%dl
401371: 41 0f 95 c0 setne %r8b
401375: 48 89 c7 mov %rax,%rdi
401378: 83 e7 07 and $0x7,%edi
40137b: 83 c7 03 add $0x3,%edi
40137e: 40 38 d7 cmp %dl,%dil
401381: 0f 9d c2 setge %dl
401384: 44 21 c2 and %r8d,%edx
401387: 84 d2 test %dl,%dl
401389: 74 08 je 401393 <_Z5isortiPi+0x17d>
40138b: 48 89 c7 mov %rax,%rdi
40138e: e8 9d fc ff ff callq 401030 <__asan_report_store4@plt>
401393: 89 31 mov %esi,(%rcx)
401395: 83 6d f8 01 subl $0x1,-0x8(%rbp)
401399: e9 f4 fe ff ff jmpq 401292 <_Z5isortiPi+0x7c>
40139e: 8b 45 f8 mov -0x8(%rbp),%eax
4013a1: 48 98 cltq
4013a3: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
4013aa: 00
4013ab: 48 8b 45 e0 mov -0x20(%rbp),%rax
4013af: 48 8d 0c 02 lea (%rdx,%rax,1),%rcx
4013b3: 48 89 c8 mov %rcx,%rax
4013b6: 48 89 c2 mov %rax,%rdx
4013b9: 48 c1 ea 03 shr $0x3,%rdx
4013bd: 48 81 c2 00 80 ff 7f add $0x7fff8000,%rdx
4013c4: 0f b6 12 movzbl (%rdx),%edx
4013c7: 84 d2 test %dl,%dl
4013c9: 40 0f 95 c7 setne %dil
4013cd: 48 89 c6 mov %rax,%rsi
4013d0: 83 e6 07 and $0x7,%esi
4013d3: 83 c6 03 add $0x3,%esi
4013d6: 40 38 d6 cmp %dl,%sil
4013d9: 0f 9d c2 setge %dl
4013dc: 21 fa and %edi,%edx
4013de: 84 d2 test %dl,%dl
4013e0: 74 08 je 4013ea <_Z5isortiPi+0x1d4>
4013e2: 48 89 c7 mov %rax,%rdi
4013e5: e8 46 fc ff ff callq 401030 <__asan_report_store4@plt>
4013ea: 8b 45 fc mov -0x4(%rbp),%eax
4013ed: 89 01 mov %eax,(%rcx)
4013ef: 83 45 f4 01 addl $0x1,-0xc(%rbp)
4013f3: e9 34 fe ff ff jmpq 40122c <_Z5isortiPi+0x16>
4013f8: 90 nop
4013f9: c9 leaveq
4013fa: c3 retq
...