BOSKALIS – Intent-Driven Knowledge Graph Synthesis via Semantic Topology Optimization #SWI2026
About the company:
With its roots in the Netherlands, Boskalis has over 100 years’ experience in hydraulic engineering, coastal protection and land reclamation.
Boskalis has more than 11,000 experts, operates in 94 countries and across six continents, with a versatile fleet of more than 500 vessels and floating equipment.
Introduction
Traditional Knowledge Graph (KG) construction typically relies on Large Language Models
(LLMs) for Open Information Extraction (OIE). However, these methods are often stochastic
and computationally opaque. We are looking for deterministic alternatives: construction of a KG
through the optimization of its topological structure within a semantic vector space.
Objective: Given a corpus and a specific user intent, the optimal relational structure
of a KG can be induced by optimizing the graph topology itself such that semantic connectivity
between intent-relevant concepts is maximized under sparsity and locality constraints, bypassing
the need for generative linguistic inference.
Mathematical Framework
Suppose \(D = \{d_1, d_2, \ldots, d_n\}\) be a dataset of text segments. We define an embedding function
\[
\phi : D \to \mathbb{R}^d
\]
that represents each segment as a \(d\)-dimensional vector \(v_i \in \mathbb{R}^d\)$. Each embedding corresponds to a
node in the graph, and we denote the node set by \(V = \{v_1, \ldots, v_n\}\).
Topology as a Design Variable
Rather than fixing the graph connectivity a priori, we introduce a weighted adjacency matrix
\[
W = [w_{ij}] \in [0,1]^{n \times n}
\]
where \(w_{ij}\) represents the strength (or existence) of a directed edge from node \(i\) to node \(j\). The
matrix \(W\) constitutes the primary optimization variable.
To enforce semantic locality, we define intent-independent semantic kernels
\[
k_{ij} = \exp\!\bigl(-\tau\, \mathrm{dist}(\mathbf{v}_i, \mathbf{v}_j)\bigr),
\]
and define edge conductances as
\[
c_{ij} = w_{ij}k_{ij}.
\]
Intent Modeling
A user intent is represented as a vector \(\mathbf{u} \in \mathbb{R}^d\). Each node receives an intent relevance score
\[
s_i(\mathbf{u}) = \frac{\mathbf{v}_i \cdot \mathbf{u}}{\|\textbf{v}_i\|\,\|\textbf{u}\|}
\]
which is normalized into a probability distribution
\[
p_i(\textbf{u}) = \frac{\exp(\lambda s_i)}{\sum_{j=1}^n \exp(\lambda s_j)}.
\]
This distribution identifies nodes most relevant to the user intent.
Intent-Driven Topology Optimization
Let \(\textbf{C} = [c_{ij}]\) be the conductance matrix and \(L(\textbf{W})\) the associated (symmetrized) graph Laplacian.
We define the effective resistance \(R_{ij}(\textbf{W})\) between nodes \(i\) and \(j\) as induced by \(L(\textbf{W})\).
We propose the following topology optimization objective:
\begin{equation}
\min_{W}\;
\underbrace{\sum_{i,j} p_i(\textbf{u})\,p_j(\textbf{u})\, R_{ij}(\textbf{W})}_{\text{intent-conditioned connectivity}}
\;+\;
\underbrace{\mu \sum_{i,j} w_{ij}\,\mathrm{dist}(\textbf{v}_i, \textbf{v}_j)}_{\text{semantic locality}}
\;+\;
\underbrace{\eta \|\textbf{W}\|_1}_{\text{sparsity}}
\label{eq:objective}
\end{equation}
subject to
\[
0 \le w_{ij} \le 1,\qquad \sum_{i,j} w_{ij} \le B.
\]
Interpretation.
- The first term minimizes effective resistance between intent-relevant nodes, encouraging
efficient semantic connectivity. - The second term penalizes edges between semantically distant nodes.
- The third term enforces sparsity, analogous to a material budget in classical topology
optimization.
Proposed Methodologies
Optimization Strategies
The problem admits multiple solution strategies:
- Gradient-based methods using continuous relaxations of \(\textbf{W}\).
- Alternating minimization between connectivity and sparsity terms.
- Evolutionary or heuristic methods for nonconvex regimes.
Discrete graphs are recovered via thresholding or continuation schemes on \(\textbf{W}\).
Relation Labeling via Semantic Residuals
Once the optimized topology is obtained, relations are inferred from edge-wise semantic transitions.
For an edge \((i,j)\) with \(w_{ij} > 0\), the relation vector is defined as
\begin{equation}
\textbf{r}_{ij} = \mathrm{Proj}_{u}(\textbf{v}_j – \textbf{v}_i).
\label{eq:relationvec}
\end{equation}
Relation labels are selected from a vocabulary \(W\) via cosine similarity:
\begin{equation}
\mathrm{Label}_{ij} = \arg\max_{w \in W}
\left(
\frac{\mathrm{vec}(w)\cdot \textbf{r}_{ij}}{\|\mathrm{vec}(w)\|\,\|\textbf{r}_{ij}\|}
\right).
\label{eq:label}
\end{equation}
Research Objectives
- \textbf{Topology Sensitivity:} Analyze how sparsity budgets and locality penalties affect the
emergence of long-range semantic relations. - \textbf{Evaluation:} Benchmark the resulting KGs against LLM-based extraction using ranking
metrics, e.g., such as the Mean Reciprocal Rank (MRR).
