About SafeGloss
SafeGloss is an open, research-capable glossary system designed to support multilingual learners while providing educators and researchers with a transparent, inspectable platform for studying vocabulary use in authentic learning contexts.
SafeGloss is intended to function both as a classroom tool and as a research instrument for examining how learners interact with vocabulary, definitions, translations, and language-support scaffolds during normal instructional use.
SafeGloss.org is operated as a (pending) 501(c)(3) non-profit. Subscription fees are used to cover the cost of hosting, software maintenance, and administration.
Open-Source Core
The SafeGloss core platform is free and open-source software, released under the GNU Affero General Public License (AGPL).
The open-source core includes the glossary engine, data model, and core interaction logic required to deploy SafeGloss in instructional or research environments.
This licensing ensures that SafeGloss remains:
- Transparent and auditable
- Modifiable for research purposes
- Freely usable in educational contexts
- Resistant to enclosure as a closed or proprietary system
Why AGPL?
SafeGloss is licensed under the AGPL to support research integrity and reproducibility.
The AGPL requires that if the software is modified and deployed as a networked service, the modified source code must also be made available to users. This ensures that:
- Researchers can inspect the exact code running in a deployed study
- Experimental conditions are not hidden behind proprietary changes
- Published findings can be meaningfully replicated or extended
- Educational research tools remain accountable to the academic community
In short, AGPL aligns with the values of open science, methodological transparency, and verifiable instrumentation.
SafeGloss as a Research Instrument
SafeGloss may be used in two primary research configurations:
- Hosted use via SafeGloss.org
SafeGloss includes built-in analytics that allow researchers to study patterns such as:
- Frequency and timing of glossary interactions
- Types of terms accessed
- Duration and repetition of engagement
- Aggregate and group-level usage trends
- Engagement patterns from optional quiz and gamification features when enabled
This pathway supports observational and quasi-experimental research conducted in real classroom settings, using a shared, maintained platform.
- Self-hosted deployment
Researchers and institutions may install SafeGloss on their own infrastructure.
Self-hosting enables:
- Full control over data storage and retention
- Alignment with institutional review board (IRB) or ethics requirements
- Custom instrumentation, logging, or experimental conditions
- Use in restricted, offline, or jurisdiction-specific environments
In this configuration, SafeGloss functions as a locally deployed research tool, rather than a third-party service.
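To illustrate the kind of analysis the usage patterns listed above allow (frequency, duration, and repetition of glossary interactions), here is a minimal Python sketch that aggregates a small interaction log. The event fields (`student`, `term`, `opened_at`, `seconds_open`) and the `summarize` helper are hypothetical and do not reflect SafeGloss's actual data model or export format. A real study would adapt them to whatever the deployment records.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GlossaryEvent:
    """One glossary lookup. Field names are hypothetical, not SafeGloss's schema."""
    student: str          # pseudonymous learner identifier
    term: str             # glossary term that was opened
    opened_at: datetime   # when the entry was opened
    seconds_open: float   # how long the entry stayed open


def summarize(events):
    """Return per-term lookup counts, total viewing time, and repeat lookups."""
    lookups = Counter(e.term for e in events)
    per_student = Counter((e.student, e.term) for e in events)
    seconds = defaultdict(float)
    for e in events:
        seconds[e.term] += e.seconds_open
    # Count, per term, how many students opened it more than once.
    repeats = Counter()
    for (_student, term), n in per_student.items():
        if n > 1:
            repeats[term] += 1
    return {
        term: {
            "lookups": lookups[term],
            "total_seconds": seconds[term],
            "students_repeating": repeats[term],
        }
        for term in lookups
    }


events = [
    GlossaryEvent("s1", "photosynthesis", datetime(2025, 1, 6, 9, 0), 12.0),
    GlossaryEvent("s1", "photosynthesis", datetime(2025, 1, 6, 9, 5), 8.0),
    GlossaryEvent("s2", "photosynthesis", datetime(2025, 1, 6, 9, 1), 5.0),
    GlossaryEvent("s2", "chlorophyll", datetime(2025, 1, 6, 9, 3), 7.5),
]
summary = summarize(events)
print(summary["photosynthesis"])
# {'lookups': 3, 'total_seconds': 25.0, 'students_repeating': 1}
```

The same aggregation could run against a self-hosted log or a hosted export, provided the records carry equivalent fields.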
Data Ethics
SafeGloss is intended for responsible educational and research use across diverse instructional contexts, including classrooms that use quizzes and gamified learning workflows.
Institutions and researchers should define appropriate data handling practices for their setting, including retention, access controls, and governance procedures.
Institutions and researchers retain responsibility for consent, anonymization, and ethical use in accordance with their local regulations and review processes.
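One common building block for the anonymization step is replacing student identifiers with keyed pseudonyms before analysis. The sketch below uses Python's standard `hmac` and `hashlib` modules. The `pseudonymize` helper and the key handling are illustrative assumptions, not part of SafeGloss, and keyed hashing yields pseudonymized rather than fully anonymized data, so whether it meets a given requirement is a question for the local review process.

```python
import hashlib
import hmac


def pseudonymize(student_id: str, secret_key: bytes) -> str:
    """Map a student identifier to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability in exports


# Hypothetical key: in practice, store it separately from the dataset.
key = b"replace-with-a-secret-kept-outside-the-dataset"
alias_a = pseudonymize("student-001", key)
alias_b = pseudonymize("student-001", key)
alias_c = pseudonymize("student-002", key)

# The same input always maps to the same alias; different inputs differ.
print(alias_a == alias_b, alias_a != alias_c)
```

Because the mapping depends on the secret key, discarding the key after a study ends prevents the pseudonyms from being recomputed from a class roster.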
Source Code and Research Use
The SafeGloss source code is publicly available on GitHub and may be forked, modified, and redeployed for research purposes in accordance with the AGPL.
GitHub repository: https://github.com/safegloss/safegloss-unified
Researchers are encouraged to:
- Fork the repository for experimental or institutional use
- Modify instrumentation to suit specific study designs
- Document and publish methodological changes alongside findings
We welcome scholarly use, critique, and extension of the platform.
Research-Aligned Design Philosophy
SafeGloss is developed with the following principles:
- Transparency over optimization
- Reproducibility over proprietary advantage
- Teacher control over algorithmic opacity
- Respect for student data and classroom context
Use of LLMs
SafeGloss can use large language models (LLMs) to help generate glossary content. This is designed to reduce teacher workload while keeping outputs tied to a specific instructional setting.
When AI generation is enabled, SafeGloss uses two key inputs:
- Course and glossary context (e.g., course/program/authority metadata, subject, grade level, and optional teacher-provided context) to scope definitions and examples to the way terms are used in that learning environment.
- User language settings to determine which translation languages are needed and how audio pronunciations should be generated (including language-appropriate voices and pronunciation steering).
SafeGloss uses this information to generate:
- Definitions (scoped to course context, not generic dictionary meaning)
- Translations (term + definition + example sentence, into learner languages)
- Example sentences (simple, instructional, context-relevant usage)
- Audio pronunciations (text-to-speech using the selected provider/model)
$ safegloss ai status
LLM provider: openrouter
TTS provider: openai
Definitions: google/gemini-2.0-flash-001
Translations: google/gemini-2.0-flash-001
Example sentences: google/gemini-2.0-flash-001
Audio (TTS): gpt-4o-mini-tts
# Values reflect the running server configuration (env vars / provider settings).
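The two inputs described above (course context and learner language settings) could be combined into a single generation request roughly as follows. This is a sketch under stated assumptions: `CourseContext`, `GenerationRequest`, and `build_request` are hypothetical names, the prompt wording is invented for illustration, and no LLM or TTS provider is called. SafeGloss's actual prompt construction may differ.

```python
from dataclasses import dataclass, field


@dataclass
class CourseContext:
    """Hypothetical container for the course and glossary context."""
    subject: str
    grade_level: str
    teacher_notes: str = ""


@dataclass
class GenerationRequest:
    """Hypothetical bundle of what would be sent for one term."""
    term: str
    prompt: str
    translation_languages: list = field(default_factory=list)


def build_request(term, course, learner_languages):
    """Combine course context and learner language settings for one term."""
    lines = [
        f"Define '{term}' as it is used in {course.subject} "
        f"for {course.grade_level} learners.",
        "Include one simple example sentence.",
    ]
    if course.teacher_notes:
        lines.append(f"Teacher context: {course.teacher_notes}")
    # Deduplicate languages while keeping first-seen order.
    languages = list(dict.fromkeys(learner_languages))
    return GenerationRequest(term, "\n".join(lines), languages)


course = CourseContext("Biology", "grade 9", "Unit on plant cells")
request = build_request("cell wall", course, ["es", "ar", "es"])
print(request.prompt)
print(request.translation_languages)  # ['es', 'ar']
```

Keeping the course context in the prompt is what scopes the definition to classroom usage rather than a generic dictionary sense, while the deduplicated language list determines which translations and audio voices would be requested.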
Glossary Research
- Abedi, J., Hofstetter, C. H., & Lord, C. (2004). Assessment accommodations for English language learners: Implications for policy-based empirical research. Review of Educational Research, 74(1), 1–28.
- Nerlinger, S. J. (2021). The bilingual dictionary accommodation: Can it help your students succeed on tests? NABE Journal of Research and Practice, 11(1–2), 22–31. https://doi.org/10.1080/26390043.2021.1962227