Alaska Biological Research Center MARK shortcourse

Imagine, if you will, that we wish to shrink the parameter space of a band-recovery model from 8 parameters to 4 in the following manner:

We are attempting to map the parameter space from 8 dimensions [real parameters] down to 4 [beta parameters] (for biological reasons, not capricious ones), akin to the efforts Joel described to you this morning.

S₁

ß₁

ß₂

ß₃

ß₄

S₂

ß₁

ß₂

ß₃

ß₄

S₃

ß₂

ß₁

ß₂

ß₃

ß₄

S₄

ß₂

ß₁

ß₂

ß₃

ß₄

r₁

ß₃

ß₁

ß₂

ß₃

ß₄

r₂

ß₄

ß₁

ß₂

ß₃

ß₄

r₃

ß₄

ß₁

ß₂

ß₃

ß₄

r₄

ß₄

ß₁

ß₂

ß₃

ß₄

Stripping away the sludge of arithmetic operators, and leaving behind just the coefficients of this system of equations, provides us with the glorious design matrix. Think of a design matrix as a juice squeezer. Place an orange in a squeezer, apply no pressure to the handle, and you get the same orange back out of the squeezer. Reef on the handle, and you produce juice.
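As a minimal sketch of the juice squeezer, assume the mapping above reads as follows: S₁ and S₂ share ß₁, S₃ and S₄ share ß₂, r₁ takes ß₃, and r₂ through r₄ share ß₄. The design matrix is then just the 8 × 4 array of coefficients, and the real parameters are its product with the ß vector. The ß values and the helper names below are invented purely for illustration.

```python
# Rows are the real parameters S1, S2, S3, S4, r1, r2, r3, r4.
# Columns are the beta parameters beta1..beta4 (assumed mapping, see lead-in).
DESIGN = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]


def apply_design(design, betas):
    """Multiply a design matrix (a list of rows) by a vector of betas."""
    return [sum(x * b for x, b in zip(row, betas)) for row in design]


def identity(n):
    """An n x n identity matrix: the squeezer with no pressure on the handle."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


betas = [0.6, 0.7, 0.2, 0.3]  # hypothetical values, not estimates from any data
reals = apply_design(DESIGN, betas)
print(reals)  # [0.6, 0.6, 0.7, 0.7, 0.2, 0.3, 0.3, 0.3]

# With an identity design matrix, eight betas come back as eight reals unchanged.
eight = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
unsqueezed = apply_design(identity(8), eight)
print(unsqueezed == eight)  # True
```

Reefing on the handle corresponds to the 8 × 4 matrix; the identity matrix is the orange coming back out whole.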

A design matrix must possess as many rows as there are parameters in the PIM. If you wish to place a design matrix structure only upon the survival rates, but not upon reporting rates, there will be a 1-to-1 correspondence between real and ß parameters for reporting rates. Consequently an identity matrix is used as a pass-through filter for the reporting component of the design matrix. Let us further contrive to have our model manifest our biological insight regarding the effect of a group covariate upon the real parameters of interest:
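The pass-through idea can be sketched as a block-diagonal matrix: a constrained block for survival in the upper left, an identity block for reporting in the lower right. The survival constraint used here (S₁ = S₂ and S₃ = S₄) and the function name are assumptions chosen only to make the example concrete.

```python
def block_diagonal(top, bottom):
    """Place two matrices corner to corner, padding the rest with zeros."""
    top_cols, bottom_cols = len(top[0]), len(bottom[0])
    rows = [row + [0] * bottom_cols for row in top]
    rows += [[0] * top_cols + row for row in bottom]
    return rows


# Hypothetical survival structure: four survival rates squeezed into two betas.
survival_block = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
]

# Identity block: each reporting rate keeps its own beta (1-to-1 pass-through).
reporting_block = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

design = block_diagonal(survival_block, reporting_block)
for row in design:
    print(row)
```

The result has 8 rows, one per real parameter in the PIM, and 6 columns: 2 survival ßs plus 4 reporting ßs.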

survival_i = ß₀ + ß₁(snow depth_i) + ß₂(sex)

The design matrix lets us implement this construct. We will now have a column in our design matrix for the intercept term, which will allow us to differentiate survival parameters in our vector of estimated parameters from the other kind of parameters in our model, namely reporting rates. The next column of the design matrix will contain not zeroes and ones, but rather the snow depth data that we will use in our analysis. The final column in our design matrix will be a single dummy variable, representing the sex category to which the real parameter belongs.
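A rough sketch of the survival rows of such a design matrix follows. The snow depths, the two-year by two-sex layout, the choice of female as the reference category, and the ß values are all hypothetical; none of them come from real data.

```python
# Hypothetical group covariate: one snow depth (cm) per year, shared by both sexes.
snow_depth = {1: 35.0, 2: 80.0}

# Hypothetical survival parameters: one per sex and year.
groups = [("female", 1), ("female", 2), ("male", 1), ("male", 2)]


def design_row(sex, year):
    """Intercept, snow depth for that year, and a sex dummy (male = 1)."""
    return [1.0, snow_depth[year], 1.0 if sex == "male" else 0.0]


design = [design_row(sex, year) for sex, year in groups]

betas = [1.5, -0.01, -0.25]  # hypothetical beta0, beta1 (snow), beta2 (sex)
linear_predictor = [sum(x * b for x, b in zip(row, betas)) for row in design]

for (sex, year), row, eta in zip(groups, design, linear_predictor):
    print(sex, year, row, round(eta, 3))
```

Note that the values printed in the last column are unbounded sums, not survival rates; nothing in this arithmetic keeps them inside (0,1).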

Three things happen with this use of a design matrix.

Most simply, we discover the use of dummy variables. We need k-1 dummy variables to describe categorical variables that have k categories.
Second, we discover the nature of group covariates: these are items that affect multiple individuals in our marked population concurrently.
Third, we learn that a design matrix need not be populated exclusively by stark zeroes and ones.
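The k-1 rule can be sketched as follows, using a made-up three-level age variable. The category names, the reference level, and the function name are assumptions for illustration only.

```python
def dummy_code(categories, reference):
    """Return the non-reference levels and one row of k-1 dummies per observation."""
    levels = [c for c in sorted(set(categories)) if c != reference]
    rows = [[1 if value == level else 0 for level in levels] for value in categories]
    return levels, rows


# Hypothetical categorical variable with k = 3 categories.
ages = ["adult", "juvenile", "subadult", "adult"]
levels, rows = dummy_code(ages, reference="adult")

print(levels)  # ['juvenile', 'subadult']
print(rows)    # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Three categories need only two columns because the reference category is absorbed by the intercept; a row of all zeroes identifies it.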

This invocation of what I called an ultrastructure yesterday is of considerable power in thorough data analysis. However, the power of this tool comes at a price. That price is the necessity of another device through which to pass our struggling real parameters. This hideous creature goes by the name link function.

The link function is a mathematical contrivance. Particularly under the snow-depth scenario described above, we are taking an unbounded independent variable, and applying it to the estimation of a bounded parameter (survival residing in the range (0,1)). Have you ever tried to perform such a task? It requires special talents, namely some type of a transformation, to enforce the boundedness constraint. The most prevalent form of this is logistic regression, hence a common link function is the logit link. The software we are using employs a sin link function, which in our experience has been adequate under all circumstances.
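To make the boundedness constraint concrete, here is a small sketch of the two back-transformations. The logit form is standard. The sin form, (sin(Xß) + 1)/2, is the one I understand the software to document; treat it as an assumption and check it against the help file. The function names and input values are invented for illustration.

```python
import math


def logit_inverse(eta):
    """Map an unbounded linear predictor into (0, 1) via the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def sin_inverse(eta):
    """Map a linear predictor into [0, 1] via the sin link (assumed form)."""
    return (math.sin(eta) + 1.0) / 2.0


# Hypothetical linear predictor values, some far outside (0, 1).
for eta in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(eta, round(logit_inverse(eta), 4), round(sin_inverse(eta), 4))
```

Whatever value the design matrix hands over, both functions return something that can be read as a probability. One difference worth noticing: the logit form is monotone in the linear predictor, while the sin form is periodic.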

Maximum likelihood estimation actually takes place in this transformed parameter space. The software produces estimates of what it describes as ß-parameters, which are equal in number to the columns of the design matrix. These estimates are not constrained to the interval (0,1). However, the software also performs the back-transformation (inverse of the link function) to report estimates and standard errors on the "biological" or real parameter scale.
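One common way to carry a standard error through such a back-transformation is the delta method. The sketch below applies it to the simplest case: a single ß under a logit link, with an identity row in the design matrix. That the software works this way internally is an assumption on my part, and the estimate and standard error are made-up numbers.

```python
import math


def back_transform_logit(beta_hat, se_beta):
    """Return (real estimate, approximate SE) for one beta under a logit link.

    Delta method: SE(real) is approximately |d real / d beta| * SE(beta),
    and for the logit link the derivative is real * (1 - real).
    """
    real = 1.0 / (1.0 + math.exp(-beta_hat))
    se_real = real * (1.0 - real) * se_beta
    return real, se_real


# Hypothetical beta estimate and standard error on the link scale.
real, se_real = back_transform_logit(0.0, 0.4)
print(real, se_real)
```

When a real parameter depends on several ßs, the same idea needs the full variance-covariance matrix of the ßs rather than a single standard error.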

Our advice to you is simple: stay away from link function alteration. There are no biological reasons for altering elements that are mathematical conveniences. Explorations on your part will reveal that different parameter estimates, AIC values, and perhaps model rankings result from applying different link functions. This is not justification for such explorations.

With those comments as background, let's take a tour of design matrices, from a visual as well as a numerical point of view, to discover more models we can construct within any of the sampling design frameworks described yesterday.