Archive for September, 2004

The curious instance of the car in the night!

2004-09-21

Day 2 of Java Language course today.

The best bit from today had to be the explanation of an object
verses a reference type for which we were given the following example:

John bought a new car. It was his car, all his friends knew it was
his car and thus it was johns car:

public class example {
public static void main(String[] args) {
Car johnsCar = New Car();
...
}
}
-----------            --------
| johnsCar  |--------->|   Car  |
-----------            --------

However he shared the car with his wife, and her friends new the
car to be Debbie’s car. So now the car is know by two names, but
there is still only one car. It just has two references, represented
in java as:

public class example {
public static void main(String[] args) {
Car johnsCar = New Car();
Car debbiesCar = johnsCar;
...
}
}
-----------            --------
| johnsCar  |--------->|   Car  |
-----------            --------
^
------------         /
| debbiesCar |--------
------------

Before long the Daughter has passed her driving test and is also
allowed to borrow the Car. But to her friends it is know as…
Jackie’s Car. Its still the same car only it now has three references:

public class example {
public static void main(String[] args) {
Car johnsCar = New Car();
Car debbiesCar = johnsCar;
Car jackiesCar = johnsCar;
...
}
}
-----------            --------
| johnsCar  |--------->|   Car  |
-----------            --------
^   ^
------------         /   /
| debbiesCar |--------   /
------------           /
/
------------         /
| jackiesCar |--------
------------

Unfortunately, Jackie has an accident in the car. Jackie is all right,
but the same can not be said for the car. Jackie is no longer
allowed to use the car. But of course the car is damaged:

public class example {
public static void main(String[] args) {
Car johnsCar = New Car();
Car debbiesCar = johnsCar;
Car jackiesCar = johnsCar;
...
jackiesCar = null;
}
}
-----------            --------
| johnsCar  |--------->|   Car  |
-----------            --------
^
------------         /
| debbiesCar |--------
------------
------------
| jackiesCar |
------------

John then buys Jackie a car of her own. Thus we have two cars, one
with one reference and one with two:

public class example {
public static void main(String[] args) {
Car johnsCar = New Car();
Car debbiesCar = johnsCar;
Car jackiesCar = johnsCar;
...
jackiesCar = new Car();
}
}
-----------            -------
| johnsCar  |--------->|   Car |
-----------            -------
^
------------         /
| debbiesCar |--------
------------
------------          -----
| jackiesCar |------->| Car |
------------          -----

Of course it’s not a true story… If it was true then Debbie would
have Dis-associated herself from John’s car until he got it fixed! (ouch)!

Stace

import java.lang.*

2004-09-20

This week I am attending a JavaTM Programming Language class SL-275. This is the first time that I have studied an Object Oriented programming Language. And after the first day today I must say I’m quite impressed.

To prepare myself I read the the second chapter of David Flanagan‘s JAVA In a Nutshell book: “How Java Differs from C“. Which I found to be of great help when experiencing Attributes, Classes and Methods today.

It made me wonder… I first learned Fortran 77 in 1987. I stopped using F77 when I learned Pr1me’s System Programming Language (SPL, a subset of PL1/G). Which I subsequently stopped using after I learnt The Kernighan and Ritchie: C Programming Language; First Edition. So How long is it going to be before ‘C’ is but a memory in my not so long past….

Stace

To be honest I stopped using SPL when I left Pr1me Computer. Primos (Prime Operating System) was mainly written in SPL and PLP with bits in fortran, assembler and Modular. Unlike Solaris which is mainly written in ‘C’.

mdb: drilling sigsegv; by way of an introduction

2004-09-15

I have been using mdb(1), Solaris Modular
Debugger,
to help diagnose an issue with vold(1M).

Shortly after startup vold core dumped. So I configured
the system to save the core file using coreadm(1M) and restarted
vold with some additional debug flags. When vold next crashed the
logs it created were not particularly helpful in this case, but
the core file told another story:

$ mdb /var/cores/vold:seth-5361:0-0:1094116587
Loading modules: [ libc.so.1 ld.so.1 ]
> ::status
debugging core file of vold (32-bit) from seth
file: /usr/sbin/vold
initial argv: /usr/sbin/vold -v -t -L 99
threading model: multi-threaded
status: process terminated by SIGSEGV (Segmentation Fault)
> ::regs
%g0 = 0x0000000000000000                 %l0 = 0xfef50692
%g1 = 0x000000006e6c2cff                 %l1 = 0x01010101
%g2 = 0x0000000000007b34                 %l2 = 0xff266b00 libc.so.1`_uberdata
%g3 = 0xffffffffffffffff                 %l3 = 0x00001084
%g4 = 0xfffffffffffff810                 %l4 = 0x00001000
%g5 = 0x0000000000000000                 %l5 = 0x00000000
%g6 = 0x0000000000000000                 %l6 = 0x00000000
%g7 = 0x00000000ff139000                 %l7 = 0x01000000
%o0 = 0x00000000ff1e1010 libc.so.1`strlcat %i0 = 0xff30dda8
%o1 = 0x0000000000000041                 %i1 = 0xfef50680
%o2 = 0x00000000ff29f2dc libnsl.so.1`parse_default+0xec %i2 = 0xfef50620
%o3 = 0x00000000ff3ea108 ld.so.1`lml_main %i3 = 0x00000000
%o4 = 0x00000000ff3ec644 ld.so.1`dbg_mask %i4 = 0x0000ff00
%o5 = 0x0000000000000000                 %i5 = 0x80808080
%o6 = 0x00000000fef4f948                 %i6 = 0xfef505b8
%o7 = 0x00000000ff3b36e8 ld.so.1`elf_rtbndr+0x10 %i7 = 0xff2155d0
libc.so.1`snprintf+0x7c
%psr = 0xfe900000 impl=0xf ver=0xe icc=NzvC
ec=0 ef=0 pil=0 s=0 ps=0 et=0 cwp=0x0
%y = 0x00000000
%pc = 0xff21192c libc.so.1`_ndoprnt+4
%npc = 0xff211930 libc.so.1`_ndoprnt+8
%sp = 0xfef4f948
%fp = 0xfef505b8
%wim = 0x00000000
%tbr = 0x00000000
>

mdb has debugger commands, known as dcmds and
walkers: see the man page.

First off, the ::status dcmd shows us a summary of the
current target. A segmentation fault occurs when a process tries to
reference an area of memory (a segment) that has not been allocated
(mapped). The ::regs dcmd shows us the registers when the
fault occurred. The register %pc (Program Counter) points to the
instruction that caused the violation, so lets look at the
instructions around %pc

> <pc-8/4i
libc.so.1`___const_seg_900000505+0x168:         unimp     0x808
save      %sp, -0xc70, %sp
st        %i3, [%sp + 0x6c]
mov       %o7, %i5
> <pc ::dis -n 2
libc.so.1`___const_seg_900000505+0x168: unimp     0x808
libc.so.1`_ndoprnt:             save      %sp, -0xc70, %sp
libc.so.1`_ndoprnt+4:           st        %i3, [%sp + 0x6c]
libc.so.1`_ndoprnt+8:           mov       %o7, %i5
libc.so.1`_ndoprnt+0xc:         clr       %i3
>

Firstly I used the older adb command to disassemble instructions
around the PC, as SPARC instructions are always 4 words its easy to
rewind a little first. However mdb provides the dis dcmd
which nicely backs up a little for you and highlights the starting
address.

Anyhow, I digress… The instructions at the PC show register
%i3 being stored on the stack, %sp offset 0x6c. As the
fault is a SIGSEGV we can presume that %sp+0x6c is non-existent. To
confirm:

> <sp+0x6c=X
fef4f9b4
> <sp+0x6c/X
mdb: failed to read data from target: no mapping for address
0xfef4f9b4:
> ::mappings ! egrep 'fef|SIZE'
BASE    LIMIT     SIZE NAME
fef00000 fef02000     2000 /usr/lib/smedia/sm_scsi.so.1
fef12000 fef14000     2000 /usr/lib/smedia/sm_scsi.so.1
fef20000 fef22000     2000 /usr/lib/straddr.so.2
fef32000 fef34000     2000 /usr/lib/straddr.so.2
fef40000 fef48000     8000
fef50000 fef58000     8000
fef60000 fef64000     4000 /usr/lib/vold/dev_rmdisk.so.1
fef74000 fef76000     2000 /usr/lib/vold/dev_rmdisk.so.1
fef80000 fef84000     4000 /usr/lib/vold/dev_pcmem.so.1
fef94000 fef96000     2000 /usr/lib/vold/dev_pcmem.so.1
fefb0000 fefb2000     2000 /usr/lib/vold/dev_floppy.so.1
fefc2000 fefc4000     2000 /usr/lib/vold/dev_floppy.so.1
fefd6000 fefd8000     2000
fefe4000 fefe6000     2000
feff6000 feff8000     2000
>

<sp+0x6c=X calculates the address and displays it in
hex. <sp+0x6c/X calculates the address and attempts to
display the data from that location in hex.

Sure enough the stack pointer register seems to be in no mans land.
The mappings dcmd (output abbreviated by grep) confirms that.

So who set %sp to this value? Well that would have been the
save instruction just prior to this store instruction (see
disassemble above). Now the save instruction causes a new stack to be
popped on top (lowest address) of the current stack. The SPARC save
instruction places the old %sp in %o6 and the new
%sp in %i6 (in fact %sp is a pseudo for %i6).

So Why was my stack limited to only 0x8000 (32,768):
32k?
Good question to which I thank Nobutomo Nakano for his
assistance in helping me dig deeper…

One clue was right at the very beginning threading model:
multi-threaded
. Another is at the bottom (highest address) of the stack:

> ::stack
libc.so.1`_ndoprnt+4(ff30dda8, fef50680, fef50620, 0, ff00, 80808080)
libc.so.1`snprintf+0x7c(fef50e90, 800, ff30dda8, fef54150, 59048, f)
libnsl.so.1`parse_default+0x370(fef54150, 0, 694c0, 7e, 59048, 800)
libnsl.so.1`parse_path+0xc4(fef54150, fef516f0, 5a5a8, 694b8, 80, 3)
libnsl.so.1`__nis_getnames+0xfc(fef54150, fef549e4, 5a5a8, 694b8, 694b8,
ffbff415)
libnsl.so.1`nis_list+0x2a8(fef54a48, 10043, 0, 0, fef54a60, 1)
nss_nisplus.so.1`_nss_nisplus_expand_lookup+0x168(7ead8, fef54f78, fee86b3c,
fef575b9, fee86b44, 0)
nss_nisplus.so.1`getbyname+0x2c(7ead8, fef54f78, ff266b00, 15684, ff1d8364,
fee98000)
libc.so.1`nss_search+0x1d4(fee8296c, ff26573c, ff26a7cc, fef54f78, 1, ff262228)
libnsl.so.1`_switch_gethostbyname_r+0x50(fef575b9, 7c99c, 7c9b0, 2120, fef57088
, ff2b81d4)
libnsl.so.1`_door_gethostbyname_r+0x108(fef575b9, 7c99c, 7c9b0, 2120, fef57088,
fef575b9)
libnsl.so.1`_get_hostserv_inetnetdir_byname+0xac0(2, fef57138, fef57130, 0,
fef570ac, 0)
libnsl.so.1`netdir_getbyname+0x84(56f48, fef571bc, fef571b8, ff295aa8, 760c0,
ff22e73c)
libnsl.so.1`_getclnthandle_timed+0x198(fef575b9, 56f48, fef5728c, ff314f00, 0, 0
)
libnsl.so.1`__rpcb_findaddr_timed+0x134(1873b, 1, 56f48, fef575b9, fef57304,
ff314f00)
libnsl.so.1`clnt_tp_create_timed+0x5c(fef575b9, 1873b, 56f48, 73cb0, 0, ff29cc90
)
libnsl.so.1`clnt_create_timed+0x198(fef575b9, 1873b, 1, 0, 0, 516a8)
libsmedia.so.1`is_server_running+0x38(58a28, 10ea0, 1234, ff3515cc, ff110580,
ff362000)
libsmedia.so.1`get_handle_from_fd+0x10c(17, 0, 58a28, 46008, d, ff362000)
dev_reset_symname+4(66760, 17, 666b0, 800017, 1c, 7)
dev_rmdisk.so.1`rmdisk_thread_wait+0x26c(66760, 666ec, 1, fef62a00, 8, 2)
libc.so.1`_lwp_start(0, 0, 0, 0, 0, 0)
>

A stack belonging to _lwp_start(): vold(1M) is threaded and
the first argument to thr_create(3C) is the stack address. Now we could use
truss(1M) or (for those using Solaris Express) dtrace(1M) to see
thr_create() being used. But we need not as mdb (bless its
cotton socks) has a some walkers to help:

> $l
5
> ::walk thread
1
2
4
5
> ::walk ulwps
ff138000
ff138400
ff138c00
ff139000
>

$l” Displays the thread number. “::walk thread
displays a list of valid threads. “::walk ulwps lists ulwp_t
pointers.

I can sense that you are not impressed! But wait… Look at the
man page to libthread(3LIB). Notice a man page for thr_stksegment(3C)
which states “The thr_stksegment() function returns, in its
stack_t argument, the address and size of the calling threads
stack.
” meaning that its got to be stored (or calculated)
somewhere (or somehow). Well it would make sense to have that stored
somewhere in the ulwp_t structure… And it is! Thanks to the
walker ulwps we know where that is:

> ff139000 ::print ulwp_t ! grep stk
ul_stk = 0xfef50000
ul_stktop = 0xfef58000
ul_stksiz = 0x8000
>

Using print dcmd we can print the contents of the
ulwp_t data structure.

Again I have used grep to show you only what is important here.
Now remembering that SPARC stacks grow down, the end of the stack for
this LWP is at ul_stk = 0xfef50000, its size is ul_stksiz = 0x8000
(32k) and thus its start is at ul_stktop = 0xfef58000.

So the problem is that this program didn’t allow for the stack to
grow so large. Lets look at the stack and see what made it out
grow:

> ::stackregs 1 ! grep '('
fef4f948 libc.so.1`_ndoprnt+4(ff30dda8)
fef505b8 libc.so.1`snprintf+0x7c(fef50e90)
fef50630 libnsl.so.1`parse_default+0x370(fef54150)
fef51690 libnsl.so.1`parse_path+0xc4(fef54150)
fef540f0 libnsl.so.1`__nis_getnames+0xfc(fef54150)
fef54950 libnsl.so.1`nis_list+0x2a8(fef54a48)
fef549e8 nss_nisplus.so.1`_nss_nisplus_expand_lookup+0x168(7ead8)
fef54e48 nss_nisplus.so.1`getbyname+0x2c(7ead8)
fef54ea8 libc.so.1`nss_search+0x1d4(fee8296c)
fef54f18 libnsl.so.1`_switch_gethostbyname_r+0x50(fef575b9)
fef54fa8 libnsl.so.1`_door_gethostbyname_r+0x108(fef575b9)
fef57018 libnsl.so.1`_get_hostserv_inetnetdir_byname+0xac0(2)
fef570b0 libnsl.so.1`netdir_getbyname+0x84(56f48)
fef57150 libnsl.so.1`_getclnthandle_timed+0x198(fef575b9)
fef571d0 libnsl.so.1`__rpcb_findaddr_timed+0x134(1873b)
fef57298 libnsl.so.1`clnt_tp_create_timed+0x5c(fef575b9)
fef57308 libnsl.so.1`clnt_create_timed+0x198(fef575b9)
fef57398 libsmedia.so.1`is_server_running+0x38(58a28)
fef579c0 libsmedia.so.1`get_handle_from_fd+0x10c(17)
fef57bb0 dev_reset_symname+4(66760)
fef57c28 dev_rmdisk.so.1`rmdisk_thread_wait+0x26c(66760)
fef57fa0 libc.so.1`_lwp_start(0)
>

Here I have used stackregs dcmd as it prints out the stack
pointer at the beginning of each frame. Stackregs provides more
information than I need here, so again I have trimmed it with grep.

To work out where the space went one simply needs to calculate the
differences between the addresses (stack/frame pointers) down the
left.

But hang on a minute. This is mdb(1m), we can copy /usr/demo/mdb
and write our own module to do the calculation for us….

> >fp ::walk simple_stack | ::stack_size
FRAME    bytes kilobyte Owner
fef505b8      120        0 0
fef50630     4192        4 libnsl.so.1`parse_default+0x370
fef51690    10848       11 libnsl.so.1`parse_path+0xc4
fef540f0     2144        2 libnsl.so.1`__nis_getnames+0xfc
fef54950      152        0 libnsl.so.1`nis_list+0x2a8
fef549e8     1120        1 nss_nisplus.so.1`_nss_nisplus_expand_lookup+0x168
fef54e48       96        0 nss_nisplus.so.1`getbyname+0x2c
fef54ea8      112        0 libc.so.1`nss_search+0x1d4
fef54f18      144        0 libnsl.so.1`_switch_gethostbyname_r+0x50
fef54fa8     8304        8 libnsl.so.1`_door_gethostbyname_r+0x108
fef57018      152        0 libnsl.so.1`_get_hostserv_inetnetdir_byname+0xac0
fef570b0      160        0 libnsl.so.1`netdir_getbyname+0x84
fef57150      128        0 libnsl.so.1`_getclnthandle_timed+0x198
fef571d0      200        0 libnsl.so.1`__rpcb_findaddr_timed+0x134
fef57298      112        0 libnsl.so.1`clnt_tp_create_timed+0x5c
fef57308      144        0 libnsl.so.1`clnt_create_timed+0x198
fef57398     1576        2 libsmedia.so.1`is_server_running+0x38
fef579c0      496        0 libsmedia.so.1`get_handle_from_fd+0x10c
fef57bb0      120        0 dev_reset_symname+4
fef57c28      888        1 dev_rmdisk.so.1`rmdisk_thread_wait+0x26c
fef57fa0        0        0 libc.so.1`_lwp_start
>

The Modular nature of MDB makes it easy to add your own custom
dcmds and walkers. Note: kilobyte column is rounded to nearest K.

In short the fix was to provide a larger stack for this thread.

I hope this little walk through of mdb has been of some interest if
not educational to some.

Stace

Is this thing on?

2004-09-10

So this would be my first ever blog then. I am not really certain as to what I want to write here for my first entry. The polite thing would be to introduce myself; Hello, my name is Stacey Marshall. I work for Sun Microsystems in the Sustaining and Engineering group known internally (at this current time) as Operating Platforms and N1 Revenue Products Engineering or OP/N1 RPE for short.

Now that’s done (First post!) I want to read up some more about Roller Weblogger (Thanks to Constantin for the weblog tips). As one of the first things I’d like to do is personalise the look of my page here which means I may need to learn to use some graphical tools, Looks like Chandan has some helpful tips on sodipodi.

Thanks for listening,

Stacey