A Spatial Tale of Two Dolphin Species and Many Data Sources

Valencia - January 20, 2026

Sara Martino

Dept. of Mathematical Science, NTNU

Trondheim and NTNU

Trondheim and NTNU

Trondheim and NTNU

Introduction

Introduction

  • Goal: Understand the spatial distribution of wild species
  • How do we collect data?

How do we collect data?

  1. Traditional data sources \(\longrightarrow\) go out and search for dolphins!!

How do we collect data?

  1. Traditional data sources \(\longrightarrow\) go out and search for dolphins!!
  • The observation process introduces a bias…
  • … but we know the searching protocol…

How do we collect data?

  1. Traditional data sources \(\longrightarrow\) go out and search for dolphins!!

    • The observation process introduces a bias…
    • … but we know the searching protocol…
    • ..therefore, we can correct for such bias!! 😀

How do we collect data?

  1. Traditional data sources \(\longrightarrow\) go out and search for dolphins!!

    • The observation process introduces a bias…
    • … but we know the searching protocol…
    • ..therefore, we can correct for such bias!! 😀

BUT These data are expensive and limited in space and time! 😔

Can we also use data from other sources

Social Data

  • Many people are out in the sea with leisure boats
  • People like to take pictures of dolphins if they spot one…such pictures are often put on social media…

Source: AI-generated image via ChatGPT (DALL·E), OpenAI (2026)

Social Data

  • Many people are out in the sea with leisure boats
  • People like to take pictures of dolphins if they spot one…such pictures are often put on social media…
  • This can be a valuable source of data

Social Data

  • Many people are out in the sea with leisure boats
  • People like to take pictures of dolphins if they spot one…such pictures are often put on social media…
  • This can be a valuable source of data
  • BUT these data have biases…and there is no searching protocoll!!

The Data

  • All our data are presence-only
  • We want to merge all data sources….
  • …accounting for each specific bias!

The Data

Data Type Years N. Campaigns Sightings (Stenella) Sightings (Tursiope)
FERRY 2007–2018 311 133 16
UNIRM 2017–2019 73 14 98
Social media 2008–2019 136 465

Notes:

  • We have many “Social media” data
  • We have both a “Spatial” and a “Temporal” bias!!

The Data

Sightings for bottlenose and striped dophins

Statistical Tools

Statistical tools

  • Log Gaussian Cox Processes (presence only data)

    • SPDE representation of Gaussian fields
    • Inference using INLA
  • Thinned point process (observation bias)

    • Detection function
    • Needs more than just INLA \(\longrightarrow\) inlabru
  • Joint modeling (merging of all data sources)

    • Easy with INLA+inlabru

Log Gaussian Cox Processes

  • We observe \(N\) points in the domain \(\Omega\).

  • Given the intensity \(\lambda(s)\) the likelihood is given by \[ \pi(Y|\lambda) =\exp \left\{ |\Omega| - \int_{\Omega} \lambda(s) ds\right\}\prod_{i = 1}^N\lambda(s_i) \]

  • The log-intensity is a Gaussian process \[ \log(\lambda(s)) = Z(s); \ Z(s)\sim \text{GP} \]

  • ‼️ Not analytically tractable ‼️

Implementation - Grid discretization

  • Discretize the domain into a grid
  • \(N_{ij}\) = # of observation in cell \((i,j)\)
  • \(N_{ij} \sim \text{Pois}(\Lambda_{ij})\) where \[\Lambda_{ij} = \int_{s_{ij}}\lambda(s)ds \approx |s_{ij}|\exp(z_{ij})\]
  • Possible models for \(Z(s)\):
    • Continuous GF \(\longrightarrow\) dense covariance matrix
    • GMRF \(\longrightarrow\) sparse covariance matrix
  • NOTE The grid serves both to approximate the latent Gaussian field and to approximate the likelihood

Implementation - SPDE Approach1

  • Use the SPDE approach over a mesh to represent the GF

\[ Z(s) = \sum_{i = 1}^nz_i\phi_i(s) \]

  • Approximate the GF

  • Do not need to approximate the observation location

  • Efficient computationally

  • Fits into the INLA framework!

Thinned point process

Thinned point process

  • The true intensity: \(\lambda(s)\) (..this is what we are interested in!)

  • The thinned intensity \[ \lambda^{\star}(s) = \lambda(s)g(s) \]

    • \(g(s)\) is the thinning (detection) function

    • Unless \(g(s)\) is log-linear in all parameters the INLA framework will not work!

    • inlabru is an extention of INLA that allowes for non linear terms

INLA-friendly models

Latent Gaussian Models

  1. Likelihood

\[ \pi(\mathbf{y}|\mathbf{u},\theta) = \prod_{i = 1}^n\pi(y_i|u_i,\theta) \]

INLA-friendly models

Latent Gaussian Models

  1. Likelihood

\[ \pi(\mathbf{y}|\mathbf{u},\theta) = \prod_{i = 1}^n\pi(y_i|u_i,\theta) \] 2. Latent field

  • Gaussianity

\[ \mathbf{u}|\theta\sim\mathcal{N}(0,\mathbf{Q}^{-1}(\theta)) \]

  • Linearity of the predictor

\[ \eta = \mathbf{A}^T\mathbf{u} \]

INLA-friendly models

Latent Gaussian Models

  1. Likelihood

\[ \pi(\mathbf{y}|\mathbf{u},\theta) = \prod_{i = 1}^n\pi(y_i|u_i,\theta) \] 2. Latent field

  • Gaussianity

\[ \mathbf{u}|\theta\sim\mathcal{N}(0,\mathbf{Q}^{-1}(\theta)) \]

  • Linearity of the predictor

\[ \eta = \mathbf{A}^T\mathbf{u} \]

  1. Hyper-prior \(\theta\sim\pi(\theta)\)

Thinned LGCP

Remember:

We assume that \(\log(\lambda(s))\) is linear wrt the latent Gaussian field

The log-observed intensity is then:

\[ \log(\lambda^{\star}(s)) = \log(\lambda(s)) + \log(g(s)) \]

This is linear only if

\[ \log(g(s)) \]

is linear…. which is often not the case!

Can we bring these models into the family of models that are amenable to INLA?

Yes (up to a certain degree) and the method is implemented in the inlabru library

inlabru: linearizing the non-linear term

Let \(\tilde{\eta}(\mathbf{u})\) be the non-linear predictor, and let \(\bar{\eta}(\mathbf{u})\) be the 1st order Taylor approximation at some \(\mathbf{u}_0\) \[ \bar{\eta}(\mathbf{u}) = \tilde{\eta}(\mathbf{u}_0) + B(\mathbf{u} - \mathbf{u}_0) = [\tilde{\eta}(\mathbf{u}_0)-B\mathbf{u}_0] + B\mathbf{u} \]

where \(B\) is the derivative matrix for the non-linear predictor, evaluated at \(\mathbf{u}_0\).

The non-linear observation model \(\tilde{\pi}(\mathbf{y}|\mathbf{u}, θ)\) is approximated by \[ \bar{\pi}(\mathbf{y}|\mathbf{u}, θ) = \pi(\mathbf{y}|\bar{\eta}(\mathbf{u}), θ)\approx\pi(\mathbf{y}|\tilde{\eta}(\mathbf{u}), θ) = \tilde{\pi}(\mathbf{y}|\mathbf{u}, θ) \] This model can be fit with INLA.

Iterative INLA algorithm

  • Start from a point \(\mathbf{u}_0^{(0)}\)

  • Repeat until convergence

    • Compute the first order Taylor approximation for \(\tilde{\eta}(\mathbf{u})\) \[ \bar{\eta}(\mathbf{u}) = [\tilde{\eta}(\mathbf{u}_0)-B\mathbf{u}_0] + B\mathbf{u} \]

    • Fit the model with INLA

    • Compute \[ \begin{aligned} \hat{\theta}^{(k)} & = \text{argmax}_{\theta}\bar{\pi}(\theta|\mathbf{y}; \mathbf{u}_0^{{(k-1)}}) \\ \hat{\mathbf{u}}^{(k)} & = \text{argmax}_{\mathbf{u}}\bar{\pi}(\mathbf{y}| \hat{\theta}; \mathbf{u}_0^{{(k-1)}}) \\ \end{aligned} \]

    • choose the next linearisation point as the point \(\mathbf{u}^{(k)}\) as \[ \mathbf{u}^{(k)} = (1-\alpha)\mathbf{u}^{(k-1)} + \alpha \mathbf{u}^{(k-1)} \]

    for some \(\alpha >0\) that minimises a norm of the difference between the linearised and non-linear predictors.

Some notes

  • Upon convergence (\(|\tilde{\eta}(\mathbf{u}_*) - \bar{\eta}(\mathbf{u}_*)| = 0\)), the overall posterior approximation accuracy depends on the degree of nonlinearity in the vicinity of \(\mathbf{u}_*\)
  • Some methods exist to check the approximation accuracy.
  • Great care has to be taken when defining prior for the non-linear terms!
  • More details in the inlabru ArXiv paper1!

Modeling the intensity

Modeling the intensity

  • The “true” (unthinned) intensity: \[ \lambda(s) = \beta_0 + \beta^TX(s) + \sum_k f_k(x_k(s)) + u(s) \]

    • \(u(s)\) is a GRF with Matern correlation function
    • could be spatio-temporal but would need more data!
  • The observed intensity:

\[ \lambda_j^{\star}(s) = t_j \ \lambda(s)\ g_j(s);\qquad j = 1,\dots \]

where

  • \(\lambda(s)\) is the true density
  • \(g_j(s)\) is the thinning function for observation process \(j\)
  • \(t_j\) is a time-scaling factor ( this is known for all observations processes except for the social data!)

Accounting for spatial bias: detection functions

  • For FERRY data \[ g_{\text{ferry}}(s) = \exp\left(-\frac{1}{\sigma^2_{\text{ferry}}} d(s)^2\right) \] where \(d(s)\) is the perpendicular distance to the transect

Accounting for spatial bias: detection functions

  • For FERRY data \[ g_{\text{ferry}}(s) = \exp\left(-\frac{1}{\sigma^2_{\text{ferry}}} d(s)^2\right) \] where \(d(s)\) is the perpendicular distance to the transect

  • For UNIRM data \[ g_{unirm}(s) = \begin{cases} 1 \text{ for } d(s)< K\\ 0 \text{ for } d(s)> K\\ \end{cases} \]

  • How about the SOCIAL MEDIA data?

Modeling spatial bias for social data

  • We assume that the sightings are biased towards area where there are more leisure boats….

  • but we do not have data about that…

  • Three different ideas:

    • Distance from the coastline
    • Boat density data from EmodNET platform
    • Use animal intensity as proxy for small boat intensity

Distance from the coastline

Assume that the closer to the coast there are more small boats…. higher detection probability close to the coast

  • This in is not necessary true (people like islands a lot!)
  • This is also a covariate often used to model species density

\[ g_{\tiny{SOCIAL\_COAST}}(s) = \exp\left\{-\frac{d(s)^2}{\xi_{\tiny{FERRY}}^2}\right\} \]

EMODnet data for boat density

EmodNET (European Marine Observation and Data Network) records boats using AIS (Automatic Identification System, mandatory above 15m length)

  • Detection probability is higher where boat intensity is higher

  • Does not consider small boats which are often those reporting sightings

\[ g_{\tiny{SOCIAL\_EMODNET}}(s) = \Phi\left\{\frac{d(s)^2}{\xi_{\tiny{EMODNET}}^2}- \mu_{\tiny{EMODNET}} \right\} \]

Social data sightings for all species

  • Use all sightings as a proxy for boat density
  • Data include species with very different behavior
  • Detection probability is higher where boat intensity is higher

\[ g_{\tiny{SOCIAL\_ANIMAL}}(s) = \Phi\left\{\frac{d(s)^2}{\xi_{\tiny{ANIMAL}}^2}- \mu_{\tiny{ANIMAL}} \right\} \]

Putting things togeter

The true intensity:

\[ \lambda(s) = \beta_0 + \beta X(s) + u(s); \qquad u(s)\sim GRF(\rho, \sigma^2_u) \]

The observed intensities:

\[ \begin{aligned} \lambda_{\tiny{FERRY}}(s) &= t_{\tiny{FERRY}} \ \lambda(s)\ g_{\tiny{FERRY}}(s);\\ \lambda_{\tiny{UNIRM}}(s) &= t_{\tiny{UNIRM}} \ \lambda(s)\ g_{\tiny{UNIRM}}(s);\\ \lambda_{\tiny{SOCIAL}}(s) &= t_{\tiny{SOCIAL}} \ \lambda(s)\ g_{\tiny{SOCIAL\_??}}(s);\\ \end{aligned} \]

Four choices for \(g_{\small{SOCIAL}}(s)\):

  • No (constant) detection \(g_{\tiny{SOCIAL\_BASELINE}}(s) = 1\) (benchmark)
  • Detection based on distance from the coastline \(g_{\tiny{SOCIAL\_COAST}}(s)\)
  • Detection based on boat intensity \(g_{\tiny{SOCIAL\_EMODNET}}(s)\)
  • Detection based on sightings intensity \(g_{\tiny{SOCIAL\_ANIMAL}}(s)\)

Is the model identifiable?

Low sightings intensity can result from:

  • There are no animals in the area

  • There are no observer in the area

  • How to solve this?

    • Gather information about the observation process
    • Use informative prior to “guide” inference

Priors and posteriors for the parameters in the detection functions

\(g_{\tiny{SOCIAL\_COAST}}(s)\)

\(g_{\tiny{SOCIAL\_EMODNET}}(s)\)

\(g_{\tiny{SOCIAL\_ANIMAL}}(s)\)

Results

Reconstructed intensity surface (Stenella)

Posterior Median

RWPCI

\[ \text{RWPCI} = (Q_3-Q_1)/Q_2 \]

Reconstructed intensity surface (Tursiope)

Posterior Median

RWPCI

\[ \text{RWPCI} = (Q_3-Q_1)/Q_2 \]

Summary and conclusions

  • Complex but very topical problem
  • What can we do:
    • Model several data sources jointly
    • Correct for the bias induced by the observation process
    • Recover known covariate effects
    • Estimate intensity surface with associated uncertainty
  • INLA + inlabru give a huge model flexibility….with great power comes great responsibility, models should be carefully considered!