About SafeGloss


SafeGloss is an open, research-capable glossary system designed to support multilingual learners while providing educators and researchers with a transparent, inspectable platform for studying vocabulary use in authentic learning contexts.

SafeGloss is intended to function both as a classroom tool and as a research instrument for examining how learners interact with vocabulary, definitions, translations, and language-support scaffolds during normal instructional use.

SafeGloss.org is operated as a non-profit (501(c)(3) status pending). Subscription fees are used to cover the cost of hosting, software maintenance, and administration.


Open-Source Core

The SafeGloss core platform is free and open-source software, released under the GNU Affero General Public License (AGPL).

The open-source core includes the glossary engine, data model, and core interaction logic required to deploy SafeGloss in instructional or research environments.

This licensing ensures that SafeGloss remains:


Why AGPL?

SafeGloss is licensed under the AGPL to support research integrity and reproducibility.

The AGPL requires that if the software is modified and deployed as a networked service, the modified source code must also be made available to users. This ensures that:

In short, AGPL aligns with the values of open science, methodological transparency, and verifiable instrumentation.


SafeGloss as a Research Instrument

SafeGloss may be used in two primary research configurations:

  1. Hosted use via SafeGloss.org

    SafeGloss includes built-in analytics that allow researchers to study patterns such as:

    • Frequency and timing of glossary interactions
    • Types of terms accessed
    • Duration and repetition of engagement
    • Aggregate and group-level usage trends
    • Engagement patterns from optional quiz and gamification features when enabled

    This pathway supports observational and quasi-experimental research conducted in real classroom settings, using a shared, maintained platform.

  2. Self-hosted deployment

    Researchers and institutions may install SafeGloss on their own infrastructure.

    Self-hosting enables:

    • Full control over data storage and retention
    • Alignment with institutional review board (IRB) or ethics requirements
    • Custom instrumentation, logging, or experimental conditions
    • Use in restricted, offline, or jurisdiction-specific environments

    In this configuration, SafeGloss functions as a locally deployed research tool, rather than a third-party service.
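
As a rough illustration of the usage patterns listed above (frequency, duration, repetition, and group-level trends), the following sketch summarizes a small interaction log. The record layout and field names (`learner`, `group`, `term`, `opened`, `closed`) are assumptions made for this example and do not describe SafeGloss's actual data model or export format.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical log: one record per glossary lookup.
# Field names are illustrative assumptions, not the SafeGloss schema.
events = [
    {"learner": "s01", "group": "A", "term": "photosynthesis",
     "opened": "2025-03-03T09:00:05", "closed": "2025-03-03T09:00:20"},
    {"learner": "s01", "group": "A", "term": "photosynthesis",
     "opened": "2025-03-03T09:10:00", "closed": "2025-03-03T09:10:05"},
    {"learner": "s02", "group": "A", "term": "chlorophyll",
     "opened": "2025-03-03T09:01:00", "closed": "2025-03-03T09:01:30"},
    {"learner": "s03", "group": "B", "term": "photosynthesis",
     "opened": "2025-03-03T10:00:00", "closed": "2025-03-03T10:00:10"},
]


def summarize(records):
    """Aggregate lookups by term, by group, and by repeated learner/term pairs."""
    lookups_per_term = Counter()
    seconds_per_term = defaultdict(float)
    lookups_per_group = Counter()
    lookups_per_pair = Counter()  # keyed by (learner, term)

    for record in records:
        opened = datetime.fromisoformat(record["opened"])
        closed = datetime.fromisoformat(record["closed"])
        term = record["term"]

        lookups_per_term[term] += 1
        seconds_per_term[term] += (closed - opened).total_seconds()
        lookups_per_group[record["group"]] += 1
        lookups_per_pair[(record["learner"], term)] += 1

    # A pair counts as "repeated" when the same learner opened the term more than once.
    repeated = {pair: n for pair, n in lookups_per_pair.items() if n > 1}

    return {
        "lookups_per_term": dict(lookups_per_term),
        "seconds_per_term": dict(seconds_per_term),
        "lookups_per_group": dict(lookups_per_group),
        "repeated_lookups": repeated,
    }


summary = summarize(events)
for name, values in summary.items():
    print(name, values)
```

Because the summary is keyed by term and group rather than by individual learner (apart from the repetition count), a self-hosted deployment could drop or hash the `learner` field before analysis if its review process requires it.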


Data Ethics

SafeGloss is intended for responsible educational and research use across diverse instructional contexts, including classrooms that use quizzes and gamified learning workflows.

Institutions and researchers should define appropriate data handling practices for their setting, including retention, access controls, and governance procedures.

Institutions and researchers retain responsibility for consent, anonymization, and ethical use in accordance with their local regulations and review processes.


Source Code and Research Use

The SafeGloss source code is publicly available on GitHub and may be forked, modified, and redeployed for research purposes in accordance with the AGPL.

GitHub repository: https://github.com/safegloss/safegloss-unified

Researchers are encouraged to:

We welcome scholarly use, critique, and extension of the platform.


Research-Aligned Design Philosophy

SafeGloss is developed with the following principles:


Use of LLMs

SafeGloss can use large language models (LLMs) to help generate glossary content. This is designed to reduce teacher workload while keeping outputs tied to a specific instructional setting.

When AI generation is enabled, SafeGloss uses two key inputs:

SafeGloss uses this information to generate:

Live Models
$ safegloss ai status
LLM provider: openrouter
TTS provider: openai
Definitions: google/gemini-2.0-flash-001
Translations: google/gemini-2.0-flash-001
Example sentences: google/gemini-2.0-flash-001
Audio (TTS): gpt-4o-mini-tts
# Values reflect the running server configuration (env vars / provider settings).
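
For reproducibility, a study may want to record which models were active when glossary content was generated. The sketch below parses status text in the `Key: value` layout shown above into a dictionary. It assumes the output keeps that layout; the function name `parse_ai_status` is hypothetical and not part of SafeGloss.

```python
# Sample text copied from the status output shown above.
STATUS_TEXT = """\
$ safegloss ai status
LLM provider: openrouter
TTS provider: openai
Definitions: google/gemini-2.0-flash-001
Translations: google/gemini-2.0-flash-001
Example sentences: google/gemini-2.0-flash-001
Audio (TTS): gpt-4o-mini-tts
# Values reflect the running server configuration (env vars / provider settings).
"""


def parse_ai_status(text):
    """Return a dict of the 'Key: value' lines, skipping prompts and comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, the shell prompt line, comment lines, and headings.
        if not line or line.startswith(("$", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


settings = parse_ai_status(STATUS_TEXT)
for key, value in settings.items():
    print(f"{key} -> {value}")
```

Storing this dictionary alongside exported data gives a simple record of the generation settings in effect during a given collection period.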

Glossary Research