How It Works
The Technology
A language-agnostic knowledge graph requires rethinking every layer of the knowledge representation stack — from how individual words are grouped, to how concepts are encoded, to how queries resolve across linguistic boundaries.
Architecture
Three-Layer Semantic Stack
The NeuroCollective architecture separates linguistic surface forms from conceptual meaning through three distinct layers, each with a specific function.
Lexeme Groups
Surface form variants for a single concept within a language. walk, walking, walked, walker — different morphological forms, same underlying meaning. Grouped by language, normalized by a language-specific lemmatizer.
Interlinguals
Cross-language concept bridges. An interlingual node links the English lexeme group for WALK to the German group (Lauf, laufen), the French group (marcher, marche), and so on. Informed by multilingual lexical resources and typological data (Grambank, WALS).
SupraConcepts
The language-neutral semantic core. A SupraConcept node is the canonical representation of a meaning — no language attached, no string label, just a unique identifier and a set of semantic relationships to other SupraConcepts. Every lexeme in every language resolves to a SupraConcept.
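The three layers can be pictured as a small data model. The Python sketch below is illustrative only: the class names, fields, and the identifier "SC-0001" are assumptions made for this example, not NeuroCollective's actual schema, and the hand-written sets of forms stand in for the output of a language-specific lemmatizer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SupraConcept:
    """Language-neutral node: a unique identifier, no string label."""
    concept_id: str


@dataclass
class LexemeGroup:
    """Surface-form variants of one concept within a single language."""
    language: str
    forms: set[str] = field(default_factory=set)


@dataclass
class Interlingual:
    """Bridge that links lexeme groups across languages to one SupraConcept."""
    concept: SupraConcept
    groups: dict[str, LexemeGroup] = field(default_factory=dict)

    def attach(self, group: LexemeGroup) -> None:
        # One lexeme group per language in this simplified sketch.
        self.groups[group.language] = group

    def resolve(self, language: str, form: str) -> SupraConcept | None:
        # Return the SupraConcept if the form belongs to that language's group.
        group = self.groups.get(language)
        if group is not None and form.lower() in group.forms:
            return self.concept
        return None


# Hypothetical sample data, using the word forms mentioned in the text.
walk = Interlingual(SupraConcept("SC-0001"))
walk.attach(LexemeGroup("en", {"walk", "walking", "walked", "walker"}))
walk.attach(LexemeGroup("de", {"lauf", "laufen"}))
walk.attach(LexemeGroup("fr", {"marcher", "marche"}))

same_concept = walk.resolve("en", "walked") == walk.resolve("fr", "marche")
```

In this sketch the SupraConcept carries nothing but an identifier, so all language-specific material lives in the lexeme groups and the interlingual bridge.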
Interactive Demo
SupraConcept Resolution
Seven lexemes across four languages. One SupraConcept. The graph below is a static demonstration that uses pre-loaded mock data only; the actual knowledge graph is not publicly accessible.
Why It Matters
The Problem with Language-Dependent KGs
English Centrism
Most knowledge graphs — including Wikidata, ConceptNet, and enterprise data dictionaries — use English as the organizing principle. Non-English data is either translated or excluded.
Translation Bottleneck
Bridging knowledge across languages via translation introduces errors, loses nuance, and fails entirely on concepts that don’t have clean translations. Some concepts only exist in specific languages.
Maintenance at Scale
A language-dependent KG serving N languages requires N separate maintenance tracks, so each added language brings another full track of work. In NeuroCollective’s architecture, a new language instead creates Lexeme Groups that are attached to Interlinguals without touching the semantic core, so the architecture scales linearly with language coverage.
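A minimal Python sketch of that idea follows. It assumes that every word in the new language maps onto a concept that already exists. The dictionary layout, the function name add_language, and the identifier "SC-0001" are hypothetical and chosen only for illustration.

```python
# Semantic core: SupraConcept identifiers only. "SC-0001" is a made-up ID.
supra_concepts = {"SC-0001"}

# Interlingual layer: concept ID -> language code -> lexeme group (surface forms).
interlinguals = {
    "SC-0001": {"en": {"walk", "walking", "walked", "walker"}},
}


def add_language(language, lexeme_groups):
    """Attach a new language's lexeme groups to existing concepts.

    lexeme_groups maps a concept ID to the surface forms in that language.
    The semantic core (supra_concepts) is read but never modified here.
    """
    for concept_id, forms in lexeme_groups.items():
        if concept_id not in supra_concepts:
            raise KeyError(f"unknown SupraConcept: {concept_id}")
        interlinguals.setdefault(concept_id, {})[language] = set(forms)


core_before = set(supra_concepts)
add_language("fr", {"SC-0001": {"marcher", "marche"}})
core_unchanged = supra_concepts == core_before
```

The work done by add_language is proportional to the number of lexeme groups in the new language; existing languages and the core are left as they were.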
Query Failure
When a user queries in Thai, they shouldn’t miss results that were entered in Finnish. Language-agnostic graph traversal means the query language and the data language are irrelevant to retrieval quality. Human knowledge does not change if it is presented in a different language than originally recorded.
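One way to picture language-agnostic retrieval is to resolve both stored records and queries to a concept identifier before any matching happens. The Python sketch below is a simplified assumption, not the actual query engine: the index contents, the function names, and the identifier "SC-0001" are invented for illustration, and exact string lookup stands in for real lexeme resolution.

```python
from collections import defaultdict

# (language code, surface form) -> SupraConcept ID. Hypothetical sample data.
LEXEME_INDEX = {
    ("th", "เดิน"): "SC-0001",
    ("fi", "kävellä"): "SC-0001",
    ("en", "walk"): "SC-0001",
}

# Records are stored under the concept ID, not under the language they were
# written in. The original language is kept only as metadata.
records_by_concept = defaultdict(list)


def add_record(language, term, text):
    """Store a record under the SupraConcept its term resolves to."""
    concept_id = LEXEME_INDEX[(language, term)]
    records_by_concept[concept_id].append((language, text))


def query(language, term):
    """Return every record for the concept, whatever language it was entered in."""
    concept_id = LEXEME_INDEX.get((language, term))
    if concept_id is None:
        return []
    return list(records_by_concept.get(concept_id, []))


add_record("fi", "kävellä", "record entered by a Finnish-speaking contributor")
results = query("th", "เดิน")
```

Here results contains the record entered in Finnish even though the query was issued in Thai, because both terms resolve to the same concept identifier before retrieval.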