About SafeGloss

SafeGloss is an open, research-capable glossary system designed to support multilingual learners while providing educators and researchers with a transparent, inspectable platform for studying vocabulary use in authentic learning contexts.

SafeGloss is intended to function both as a classroom tool and as a research instrument for examining how learners interact with vocabulary, definitions, translations, and language-support scaffolds during normal instructional use.

SafeGloss.org is operated as a nonprofit organization whose 501(c)(3) status is pending. Subscription fees cover the cost of hosting, software maintenance, and administration.

Open-Source Core

The SafeGloss core platform is free and open-source software, released under the GNU Affero General Public License (AGPL).

The open-source core includes the glossary engine, data model, and core interaction logic required to deploy SafeGloss in instructional or research environments.

This licensing ensures that SafeGloss remains:

- Transparent and auditable
- Modifiable for research purposes
- Freely usable in educational contexts
- Resistant to enclosure as a closed or proprietary system

Why AGPL?

SafeGloss is licensed under the AGPL to support research integrity and reproducibility.

The AGPL requires that anyone who modifies the software and runs it as a networked service must make the modified source code available to the users of that service. This ensures that:

- Researchers can inspect the exact code running in a deployed study
- Experimental conditions are not hidden behind proprietary changes
- Published findings can be meaningfully replicated or extended
- Educational research tools remain accountable to the academic community

In short, the AGPL aligns with the values of open science, methodological transparency, and verifiable instrumentation.

SafeGloss as a Research Instrument

SafeGloss may be used in two primary research configurations:

Hosted use via SafeGloss.org
SafeGloss includes built-in analytics that allow researchers to study patterns such as:
- Frequency and timing of glossary interactions
- Types of terms accessed
- Duration and repetition of engagement
- Aggregate and group-level usage trends
- Engagement patterns from optional quiz and gamification features when enabled
This pathway supports observational and quasi-experimental research conducted in real classroom settings, using a shared, maintained platform.
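
As a rough illustration of the kinds of patterns listed above, the following Python sketch aggregates a few interaction records. The `GlossaryEvent` fields and the `summarize` function are hypothetical names chosen for this example; they are not SafeGloss's actual data model or analytics API.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GlossaryEvent:
    """One hypothetical interaction record (not SafeGloss's real schema)."""
    learner_id: str       # pseudonymous identifier
    group: str            # e.g. a class section
    term: str             # glossary term that was opened
    opened_at: datetime   # when the entry was opened
    seconds_open: float   # how long the entry stayed open


def summarize(events):
    """Return lookup counts per term, repeat lookups, and mean duration per group."""
    # Frequency: how often each term was opened.
    lookups_per_term = Counter(e.term for e in events)

    # Repetition: number of (learner, term) pairs opened more than once.
    per_pair = Counter((e.learner_id, e.term) for e in events)
    repeated_pairs = sum(1 for n in per_pair.values() if n > 1)

    # Group-level trend: mean time an entry stayed open, per group.
    durations = defaultdict(list)
    for e in events:
        durations[e.group].append(e.seconds_open)
    mean_seconds_by_group = {g: sum(v) / len(v) for g, v in durations.items()}

    return {
        "lookups_per_term": dict(lookups_per_term),
        "repeated_pairs": repeated_pairs,
        "mean_seconds_by_group": mean_seconds_by_group,
    }


events = [
    GlossaryEvent("s1", "A", "photosynthesis", datetime(2025, 1, 6, 9, 0), 12.0),
    GlossaryEvent("s1", "A", "photosynthesis", datetime(2025, 1, 6, 9, 5), 8.0),
    GlossaryEvent("s2", "B", "chlorophyll", datetime(2025, 1, 6, 9, 7), 20.0),
]
summary = summarize(events)
print(summary)
```

Keeping the summary at the term and group level, rather than per learner, is one way to report aggregate trends without exposing individual students.
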
Self-hosted deployment
Researchers and institutions may install SafeGloss on their own infrastructure.

Self-hosting enables:
- Full control over data storage and retention
- Alignment with institutional review board (IRB) or ethics requirements
- Custom instrumentation, logging, or experimental conditions
- Use in restricted, offline, or jurisdiction-specific environments
In this configuration, SafeGloss functions as a locally deployed research tool, rather than a third-party service.

Data Ethics

SafeGloss is intended for responsible educational and research use across diverse instructional contexts, including classrooms that use quizzes and gamified learning workflows.

Institutions and researchers should define appropriate data handling practices for their setting, including retention, access controls, and governance procedures.

Institutions and researchers retain responsibility for consent, anonymization, and ethical use in accordance with their local regulations and review processes.

Source Code and Research Use

The SafeGloss source code is publicly available on GitHub and may be forked, modified, and redeployed for research purposes in accordance with the AGPL.

GitHub repository: https://github.com/safegloss/safegloss-unified

Researchers are encouraged to:

- Fork the repository for experimental or institutional use
- Modify instrumentation to suit specific study designs
- Document and publish methodological changes alongside findings

We welcome scholarly use, critique, and extension of the platform.

Research-Aligned Design Philosophy

SafeGloss is developed with the following principles:

- Transparency over optimization
- Reproducibility over proprietary advantage
- Teacher control over algorithmic opacity
- Respect for student data and classroom context

Use of LLMs

SafeGloss can use large language models (LLMs) to help generate glossary content. This feature is designed to reduce teacher workload while keeping outputs tied to a specific instructional setting.

When AI generation is enabled, SafeGloss uses two key inputs:

- Course and glossary context (e.g., course/program/authority metadata, subject, grade level, and optional teacher-provided context) to scope definitions and examples to the way terms are used in that learning environment.
- User language settings to determine which translation languages are needed and how audio pronunciations should be generated (including language-appropriate voices and pronunciation steering).

SafeGloss uses this information to generate:

- Definitions (scoped to course context, not generic dictionary meaning)
- Translations (term + definition + example sentence, into learner languages)
- Example sentences (simple, instructional, context-relevant usage)
- Audio pronunciations (text-to-speech using the selected provider/model)
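
To make the role of the two inputs concrete, here is a minimal Python sketch of how course context and learner languages might be combined into a single generation prompt. Every name in it (`GlossaryContext`, `GenerationRequest`, `build_definition_prompt`) and the prompt wording are assumptions made for illustration; the actual prompts and data structures in SafeGloss may differ.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryContext:
    """Hypothetical container for course and glossary context."""
    subject: str
    grade_level: str
    teacher_notes: str = ""   # optional teacher-provided context


@dataclass
class GenerationRequest:
    """Hypothetical request: one term plus the two inputs described above."""
    term: str
    context: GlossaryContext
    learner_languages: list = field(default_factory=list)


def build_definition_prompt(req):
    """Assemble a course-scoped prompt; illustrative wording only."""
    lines = [
        f"Define the term '{req.term}' as it is used in a "
        f"{req.context.grade_level} {req.context.subject} course.",
        "Give one simple example sentence that fits this course.",
    ]
    # Teacher notes are optional, so include them only when present.
    if req.context.teacher_notes:
        lines.append(f"Teacher context: {req.context.teacher_notes}")
    # Learner language settings decide which translations are requested.
    if req.learner_languages:
        lines.append(
            "Translate the term, definition, and example sentence into: "
            + ", ".join(req.learner_languages)
            + "."
        )
    return "\n".join(lines)


req = GenerationRequest("cell", GlossaryContext("biology", "grade 7"), ["es", "vi"])
prompt = build_definition_prompt(req)
print(prompt)
```

Scoping the prompt to subject and grade level is what would steer a term such as "cell" toward its biology meaning rather than a generic dictionary sense.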

Live Models

$ safegloss ai status
LLM provider: openrouter
TTS provider: openai
Definitions: google/gemini-2.0-flash-001
Translations: google/gemini-2.0-flash-001
Example sentences: google/gemini-2.0-flash-001
Audio (TTS): gpt-4o-mini-tts
# Values reflect the running server configuration (env vars / provider settings).

Glossary Research

Abedi, J., Hofstetter, C. H., & Lord, C. (2004). Assessment accommodations for English language learners: Implications for policy-based empirical research. Review of Educational Research, 74(1), 1–28.
Nerlinger, S. J. (2021). The bilingual dictionary accommodation: Can it help your students succeed on tests? NABE Journal of Research and Practice, 11(1–2), 22–31. https://doi.org/10.1080/26390043.2021.1962227