The Pedagogy of Care and the Cost of Empathy: Secondary Traumatic Stress Among Educators

This article examines the growing but under-recognized phenomenon of secondary traumatic stress (STS) among classroom teachers, school counselors, and paraprofessionals who work with high-needs and highly transient student populations. Drawing on recent empirical work in school psychology, sociology of emotional labor, and trauma-informed pedagogy, the study argues that the daily practice of caring for children exposed to poverty, displacement, community violence, and family instability produces a distinct occupational injury that cannot be reduced to ordinary teacher burnout. Teachers who absorb the emotional weight of student suffering, hold the classroom as a stable relational space, and improvise support in the absence of adequate school mental health services often carry symptoms that mirror direct trauma exposure, including intrusive thoughts, sleep disturbance, hypervigilance, and compassion fatigue. The paper synthesizes evidence from 2020 to 2025 to describe the mechanisms through which affective labor becomes a health risk, then proposes a set of institutional support structures designed to shift responsibility from the individual teacher to the school system. These structures include protected time for reflective supervision, embedded mental health staffing, workload caps in high-needs schools, professional development that teaches the physiology of vicarious trauma, and human resources policies that treat STS as a workplace injury rather than a personal deficiency. The article concludes that the sustainability of trauma-informed education depends on treating the wellbeing of the caring adult as a precondition, not an afterthought, of the caring classroom.

Keywords: secondary traumatic stress, teacher burnout, affective labor, trauma-informed education, high-needs schools, institutional support, compassion fatigue, transient student populations.

1. Introduction

Teaching has always involved more than the transmission of content.
In the last two decades, however, the emotional and relational demands placed on classroom educators have expanded to a point where the daily work of teaching in many schools resembles frontline social service more than academic instruction (Ormiston, Nygaard, & Apgar, 2022). Teachers in under-resourced districts routinely serve as the most consistent adults in the lives of children who have experienced adverse childhood experiences, community violence, housing instability, and the disruptions of forced migration. They are often the first to notice a hungry child, the first to recognize a bruise, the first to sit with a student whose family has been evicted overnight, and the last to close the door on a classroom that doubles as a shelter, a clinic, and an office of triage.

This article takes as its central premise that the pedagogy of care, understood as the deliberate and sustained emotional investment teachers make in their students, is not free. It has a psychological cost that accumulates across the school year and across a career. That cost is best described through the clinical construct of secondary traumatic stress, a condition first identified in social workers, therapists, and emergency responders, and now clearly documented in educators (Christian-Brandt, Santacrose, & Barnett, 2020; Simon, Petrovic, Baker, & Overstreet, 2022). The article draws together recent empirical evidence to argue that STS in teachers is a systemic, not a personal, problem, and that the current response, which relies almost entirely on individual self-care and resilience training, is inadequate to the scale of the injury.

The central claim developed here is that STS is a predictable outcome of a specific labor arrangement rather than a marker of individual weakness.
When schools ask teachers to hold significant emotional weight for children in crisis, but provide none of the training, staffing, time, or policy protection that other helping professions consider standard, the injury is structurally guaranteed. A serious response requires structural change.

The paper proceeds in seven substantive sections. Section 2 reviews the theoretical foundations of affective labor and its extension into schools. Section 3 defines STS and distinguishes it from adjacent constructs such as burnout, compassion fatigue, and general occupational stress. Section 4 explores the specific mechanisms that make the pedagogy of care such a strong risk factor. Section 5 examines the amplifying effect of high-needs and highly transient populations. Section 6 diagnoses institutional failures. Section 7 proposes a concrete framework of systemic supports. Section 8 discusses implications, limitations, and future research directions.

The intended audience is broad. It includes teacher-education students preparing to enter urban and rural high-needs schools, school leaders responsible for staff wellbeing, policymakers who allocate school funding, and researchers who study the emotional dimensions of teaching. The tone is deliberately practical. The literature on STS in education is now substantial enough to support clear recommendations, and the cost of continued institutional inaction is measured in resignations, unfilled vacancies, and children who lose the caring adult they had counted on.

2. Theoretical Framework: Affective Labor and the Caring Professions

The concept of affective labor emerged from the sociology of work in the 1980s, largely through Arlie Hochschild's study of flight attendants and the management of feeling as a job requirement. The category has since been extended to nurses, therapists, hospice workers, call center employees, and, more recently, teachers (Rankin, 2021).
Affective labor names the paid or unpaid work of producing an emotional state in another person, whether reassurance, calm, enthusiasm, or a sense of being seen. Unlike physical labor, which produces an object, and cognitive labor, which produces an analysis, affective labor produces a relational state that lives inside another human being and often leaves no visible trace at the end of the day.

Two features of affective labor matter for the argument here. First, it is difficult to measure. A teacher who spent forty minutes de-escalating a student in crisis has produced something real, but that work will not appear on a lesson plan, a test score, or a performance evaluation. This invisibility contributes directly to its systematic under-valuation and to the tendency of school systems to layer additional emotional demands on top of an already unrecognized workload (Brunzell, Waters, & Stokes, 2021). Second, sustained affective labor requires the worker to regulate her own emotional state to produce the desired state in another. Over time this internal regulation, which Hochschild called deep acting, can lead to a blurring of the worker's authentic emotional experience with the performed one, and to a kind of exhaustion that ordinary rest does not repair.

The pedagogy of care sits squarely inside this tradition. Care theorists in education, following Nel Noddings and later feminist scholars, have argued that relational teaching is not an addition to instruction but its foundation. Learning happens in relationship. A child who does not feel safe will not learn to read. A student whose caregiver is in the emergency room will not solve a quadratic equation that afternoon. The teacher who understands this is asked to hold a relational container that makes learning possible. When the container is stable, students can risk not knowing, can risk failure, can risk showing curiosity in front of peers.
When the container fails, the classroom becomes a place of dysregulation and academic outcomes collapse.

The theoretical difficulty is that the container is made of the teacher's own nervous system. When a student cries, the teacher's limbic response registers the cry before conscious thought. When a class of thirty children has experienced three weeks of a substitute due to a teacher's illness, the returning teacher walks into a room saturated with residual dysregulation. Recent work on interpersonal neurobiology has clarified that co-regulation, the process by which a calm adult helps a dysregulated child return to a regulated state, requires the adult to have surplus regulatory capacity to give (Brunzell, Stokes, & Waters, 2019). If she does not, co-regulation fails and both parties are pulled toward the more dysregulated state. This is the physiological substrate of the caring cost the paper describes.

3. Defining Secondary Traumatic Stress in the Educator Population

Secondary traumatic stress refers to the set of symptoms that arise from prolonged, empathic engagement with a traumatized person. It was first formalized in clinical populations, particularly among therapists working with survivors of sexual violence and combat, and its symptom profile deliberately parallels post-traumatic stress disorder. The three primary symptom clusters are intrusion, meaning uninvited thoughts and images of the traumatic material the worker has heard; avoidance, meaning behavioral or cognitive strategies to keep the material at a distance; and arousal, meaning sleep disturbance, hypervigilance, and irritability (Ormiston et al., 2022). A worker with STS may find herself thinking of a student's disclosure while making dinner, avoiding the hallway where a difficult conversation took place, or waking at 3 a.m. rehearsing what she should have said.
The term is often used interchangeably with vicarious trauma, compassion fatigue, and burnout, but these constructs are distinct. Vicarious trauma typically refers to longer-term shifts in the worldview of the helping professional, particularly around trust, safety, and predictability of the world. Compassion fatigue is a broader category that combines STS with the sense of reduced empathic capacity. Burnout, best captured in the Maslach model, describes chronic depletion driven by workload, low autonomy, and reduced sense of efficacy. Burnout can occur in any occupation and does not require exposure to trauma. STS specifically requires exposure to the traumatic experience of another (Simon et al., 2022).

The educator population meets the criteria for STS exposure in ways that are now well documented. Teachers hear disclosures. They read student writing that describes abuse, hunger, grief, and fear. They see injuries. They meet caregivers in acute distress. They accompany students through funerals, deportations, and hospitalizations. In schools that serve high-poverty communities, the frequency of such exposure is high enough that most teachers, over the course of a school year, will encounter multiple students in acute crisis (Christian-Brandt et al., 2020). The exposure is not incidental. It is the daily texture of the work.

Prevalence estimates vary by study and measurement instrument, but recent work suggests that a substantial minority of teachers in high-needs settings meet clinical thresholds for STS symptoms. Christian-Brandt and colleagues (2020) found in a sample of teachers in underserved elementary schools that approximately three quarters reported at least moderate levels of STS, with a meaningful subset reaching the high range.
A systematic review by Ormiston and colleagues (2022) synthesized findings across twenty-two studies and concluded that STS is a distinct and measurable condition in the educator population, with prevalence that mirrors rates in social work and child protective services. The scale of the problem is therefore comparable to that in fields where trauma exposure has been recognized for decades and where organizational protections are, at least in principle, in place.

STS in educators is associated with specific downstream consequences. Teachers with high STS report lower job satisfaction, higher intent to leave the profession, disrupted sleep, physical health complaints, and reduced sense of efficacy with the very students who most need a stable adult (Herman, Sebastian, Reinke, & Huang, 2021). Simon and colleagues (2022) documented that higher STS in teachers was associated with lower quality teacher-student relationships and worse student socio-emotional functioning, indicating that the injury propagates from the adult back into the classroom the adult is trying to hold. In other words, when we fail to protect the teacher, we also fail the child. The two outcomes are linked, not opposed.

4. The Pedagogy of Care as a Risk Factor

If STS were a rare event, it would not warrant a policy response. Its prevalence in the teaching workforce indicates that something about the structure of the work generates the injury reliably. This section identifies four mechanisms.

4.1 Sustained empathic exposure without clinical framing

Therapists who see clients with trauma histories are trained in specific empathic techniques that manage the depth of their identification with the client. They understand the concept of the therapeutic frame, which limits the number of clients they see per day, structures a defined session length, and provides a physical office separate from the rest of their life. Teachers have none of these protections.
A teacher may hear a student's disclosure between the Pledge of Allegiance and a math lesson, absorb it while continuing to teach, and be expected to remember the fraction curriculum without pause. The absence of a clinical frame does not reduce the exposure. It only removes the tools that would ordinarily contain it (Baker et al., 2021).

4.2 Persistent responsibility with limited authority

Teachers are held responsible for outcomes that depend on factors far outside their control. A child who has slept in a car cannot concentrate. A student whose parent has been detained cannot focus on grammar. The teacher, however, remains accountable for the child's progress on standardized measures. This responsibility-authority gap is a classic driver of occupational stress across professions, and its intensity in teaching is well documented (Pressley, 2021). Under this gap, the teacher's care becomes a form of moral labor, an attempt to do right by children whose circumstances make ordinary academic progress unlikely. When academic outcomes then falter, the teacher often absorbs the failure as personal, which deepens the injury.

4.3 Emotional carryover into personal life

Because affective labor does not have a physical off-switch, the emotional residue of the school day travels home with the teacher. Recent qualitative work has documented teachers describing intrusive thoughts of specific students, difficulty engaging with their own children in the evening, and sleep disturbance patterned around anticipated events at school the next day (Baker et al., 2021). The absence of a formal debrief structure, of the kind that emergency responders use after a critical incident, means that the teacher's decompression is left entirely to her own resources, typically at the end of a fourteen-hour day.

4.4 The moral weight of witnessing without capacity to remedy

A distinctive feature of teacher STS is the moral injury that arises from witnessing suffering the teacher cannot remedy.
When a teacher recognizes that a child is being harmed at home but the child welfare system does not respond, when the teacher knows that a student will lose housing but the district has no supports, when the teacher watches a student who was thriving move to another district and disappear from her records, the teacher carries a specific form of grief. Moral injury has been described in military and healthcare contexts and is increasingly recognized in education (Sokal, Trudel, & Babb, 2020). It differs from anxiety and depression in that its origin is ethical rather than dispositional. The teacher is not sick. She is grieving a system that failed a child she loves.

These four mechanisms interact. A single one may be tolerable. Their combination, sustained over years, produces the injury pattern described in the literature as educator STS.

5. High-Needs and Highly Transient Populations as Amplifiers

The risk mechanisms above operate in all classrooms. They are amplified in schools serving high-needs and highly transient populations, and it is in these schools that the case for systemic intervention is most urgent.

5.1 High-needs populations

The term high-needs is used here to describe student populations in which a substantial proportion of children have experienced multiple adverse childhood experiences, including poverty, family violence, parental incarceration, food insecurity, exposure to community violence, or serious untreated mental illness in the home. Schools serving these populations are unevenly distributed. In many countries they are concentrated in specific urban neighborhoods, rural regions with limited services, and communities with recent histories of displacement. Teachers in these schools carry a higher daily load of trauma exposure, both because more of their students are in crisis at any moment and because the school itself is often the only institution attempting to respond (Baker et al., 2021).
Empirical work in these settings consistently finds elevated STS. Christian-Brandt and colleagues (2020) documented in underserved elementary schools that STS was strongly associated with intent to leave the profession, and that this association held after controlling for individual factors. In other words, the environment, not the personality, is doing the work. Kim, Crooks, Bax, and Shokoohi (2021) showed that trauma-informed training combined with mindfulness practices produced modest improvements in teacher wellbeing, but that these individual interventions were insufficient to counteract the environmental load in the absence of structural change.

5.2 Highly transient populations

Student transience refers to unplanned mid-year mobility, in which students enter or leave a school outside of the normal enrollment cycle. Transience is driven by housing instability, family reunification, immigration events, foster care placements, and, increasingly, climate displacement. Schools with high transience may see turnover rates of thirty percent or more across a single school year. Each mid-year entry and each mid-year departure produces a distinct emotional demand on the classroom teacher. The entering student arrives with unknown academic history, often mid-unit, sometimes with limited English, and always with the emotional weight of whatever caused the move. The departing student, with whom the teacher may have built months of relationship, disappears, often without a chance to say goodbye.

Transience amplifies STS through several routes. It intensifies the frequency of loss experiences within the teacher's work life, functioning as a series of small bereavements without ritual. It disrupts the classroom's relational ecosystem, forcing the teacher to renegotiate group norms and safety with each new arrival.
And it undermines the sense of professional efficacy, since teachers cannot see the arc of a student's development when the student is present for only weeks or months. In migrant, refugee, and unhoused populations, transience is a defining feature of the school experience, and the teacher who serves these communities absorbs the emotional physics of the wider displacement (Baker et al., 2021; Ormiston et al., 2022).

5.3 Intersection

When high-needs and high-transience conditions coexist, as they often do in the schools serving the poorest neighborhoods in the wealthiest countries, the two factors compound. The teacher is exposed to a high daily rate of student trauma and to frequent student turnover, with limited stability in her caseload. Under these conditions the mechanisms described in Section 4 operate at maximum intensity. It is not surprising that these schools also show the highest teacher attrition rates, the highest use of substitute teachers, and the highest rates of vacancies in specialized roles such as special education and school counseling (Pressley, 2021). The staffing crisis in high-needs schools is, in part, an STS crisis.

6. Institutional Failures

The dominant institutional response to teacher STS has been, and largely remains, individual. Districts offer wellness webinars, mindfulness apps, and occasional resilience workshops. Teachers are told to practice self-care, to establish boundaries, to sleep more, and to exercise. These recommendations are not wrong, but they are inadequate on their own, and their exclusive use amounts to what critics have called the individualization of a structural problem (Brunzell et al., 2021). Four institutional failures recur across the literature.

6.1 The framing of STS as a personal deficit

When wellness is framed as a matter of personal responsibility, teachers who develop STS symptoms are implicitly positioned as insufficiently resilient. This framing has documented effects.
Teachers report reluctance to disclose symptoms for fear of being seen as unable to handle the job. Reluctance to disclose delays help-seeking, and the injury deepens. The framing also protects the institution from having to change its own operations. If the problem is the teacher's resilience, the solution is another workshop. If the problem is a workload that no reasonable adult could sustain, the solution requires budget lines, staffing decisions, and policy change.

6.2 Absence of structured emotional processing

Emergency responders and hospital trauma teams typically have some form of structured debriefing after a critical incident. Social workers have supervision structures in which cases are discussed and the worker's own reactions are addressed. Teachers have none of these as a standard feature of the job. Grade-level meetings are typically dominated by curriculum planning and administrative matters. Even in schools with strong professional learning communities, the emotional content of the work is not systematically named or processed. Teachers describe going months without a conversation in which their own emotional state is a legitimate topic (Herman et al., 2021).

6.3 Inadequate mental health staffing

The most consistent structural finding across the STS literature is that schools do not employ enough mental health professionals to serve the students who need them, with the result that teachers absorb the overflow. Recommended ratios for school counselors, psychologists, and social workers are rarely met in high-needs districts. When a student is in acute crisis and there is no counselor available within an hour, the classroom teacher becomes the counselor. This substitution is invisible in the budget and is one of the most direct pathways from student trauma to teacher STS (Simon et al., 2022).

6.4 Policy silence on STS as a workplace injury

In most jurisdictions, workplace injury frameworks do not recognize STS in educators.
A social worker or first responder who develops post-traumatic symptoms as a result of client exposure may in some systems access workers' compensation and structured return-to-work supports. Teachers rarely have access to such frameworks, even when their exposure is quantitatively similar. The result is that STS symptoms tend to be managed through personal sick leave, unpaid absences, or eventual resignation. The financial and career risk of acknowledging the injury falls entirely on the teacher.

These four failures interact to produce a workforce that carries substantial STS burden without adequate acknowledgment, treatment, or protection. The consequences for the workforce are visible in attrition, vacancies, and reduced entry into the profession. The consequences for students are visible in the loss of the caring adults on whom they depend.

7. Proposed Institutional Support Structures

The remainder of this article turns from diagnosis to proposal. If STS is a systemic injury, its prevention and treatment require systemic response. The framework proposed here is composed of seven interlocking elements. Each element has been shown in the literature to have some effect on its own. The claim advanced here is that they must be implemented together, as a coordinated response, and understood as elements of a policy architecture rather than a menu of options.

7.1 Recognizing STS as a workplace injury

The first structural change is nominal but important. Districts, ministries, and teacher licensing bodies should formally recognize secondary traumatic stress as an occupational health condition potentially arising from the normal course of teaching in high-exposure settings. This recognition would trigger three practical consequences. It would create a legitimate basis for workplace accommodation and for access to workers' compensation frameworks where they exist. It would remove the framing of STS as a personal failing.
And it would create a data infrastructure through which STS could be tracked, monitored, and studied at the district level.

Recognition would need to be paired with clear diagnostic criteria and appropriate confidentiality protections. Teachers who report symptoms should not face negative consequences on their evaluations, and STS diagnoses should not be used in disciplinary or performance processes. The precedents in nursing and social work provide workable models (Ormiston et al., 2022).

7.2 Embedded mental health staffing at recommended ratios

The second structural change is the most expensive and the most necessary. Schools serving high-needs and highly transient populations require full-time, on-site mental health staff at ratios that reflect the actual demand. Recommended ratios of school counselors to students, and of school psychologists and social workers to students, are widely published by professional associations. In under-resourced districts, actual ratios often reach two to three times the recommended figures.

The purpose of adequate staffing is not only to serve students. It is also to preserve the appropriate role of the classroom teacher. When mental health professionals are present, teachers can refer, hand off, and consult. When they are absent, the teacher's role expands into clinical space she was not trained to occupy and is not authorized to occupy. Adequate staffing is therefore both a student service and a teacher protection (Simon et al., 2022).

7.3 Protected time for reflective supervision

The third element is the introduction into schools of a structure long standard in the therapeutic professions. Reflective supervision refers to a regularly scheduled, protected meeting between a helping professional and a supervisor or peer in which cases are discussed and, crucially, the worker's own emotional reactions to the work are named and processed. Reflective supervision is not therapy.
It is a structured professional practice for holding the emotional weight of the work.

Implementing reflective supervision for teachers would involve identifying a suitable facilitator, whether an experienced teacher, a school counselor, or an external clinician with school experience. It would involve scheduling protected time, typically thirty to sixty minutes weekly or biweekly, that is not consumed by curriculum or administrative content. It would involve small groups, generally no larger than six to eight, to allow substantive discussion. And it would involve a clear commitment that what is said in supervision is not used in evaluation. Pilot programs adapting reflective supervision to school settings have shown promising results, though large-scale evaluation is still needed (Brunzell et al., 2021).

7.4 Workload caps in high-exposure settings

The fourth element is a policy change many teachers have long requested and few systems have implemented. In schools serving high-needs and highly transient populations, class sizes should be structurally lower, non-teaching load should be reduced, and case counts for teachers with formal responsibility for students with individualized plans should be capped at levels that permit relational depth.

Workload caps address STS through a specific mechanism. STS accumulates as a function of exposure. Reducing the number of students for whom a teacher is emotionally responsible reduces exposure and increases the possibility of the teacher maintaining her own regulatory capacity. Workload caps are not an efficiency question; they are a safety measure. Nurses' work in most jurisdictions has moved toward staffing ratios for exactly these reasons, and the parallel to teaching is direct (Herman et al., 2021).

7.5 Trauma-informed professional development that includes the physiology of the caring adult

The fifth element concerns training.
Trauma-informed practice has become a widely used phrase in education, and many districts now offer some version of trauma-informed professional development. Much of this training focuses on the child, teaching educators to recognize trauma responses in students and to adjust their pedagogy accordingly. This is valuable, but it is incomplete.

Adequate professional development must also teach the physiology of the caring adult. Teachers should understand how the human nervous system co-regulates, how sustained empathic exposure changes stress hormone patterns, how sleep and recovery relate to next-day emotional capacity, and how STS symptoms typically emerge. This knowledge changes the meaning of a teacher's own reactions. A teacher who understands that her difficulty sleeping in October is a predictable physiological response to sustained exposure, and not a personal failing, is better positioned to seek appropriate support and to advocate for structural change (Kim et al., 2021).

Such training should be integrated into pre-service teacher education, particularly for candidates who will teach in high-needs settings, and into ongoing professional development at the district level. It should not be optional. It should not be relegated to a single afternoon session. And it should be taught by facilitators who are themselves clinically informed, not by curriculum specialists reading from slides.

7.6 Peer support and communities of practice

The sixth element is peer structure. Teachers in high-needs schools benefit from formal, regular contact with other teachers doing similar work. Communities of practice, when they include explicit attention to the emotional content of the work, can serve as a protective factor against STS. They allow the normalization of reactions that would otherwise be experienced as personal weakness. They allow the exchange of strategies. They allow the development of shared vocabulary for the specific challenges of the setting.
The evidence base on peer support in teaching is developing. Sokal and colleagues (2020) documented during the pandemic that teachers with strong collegial networks reported lower burnout and greater sense of efficacy, even under conditions of severe systemic stress. Peer support does not replace clinical intervention, but it functions as an important layer of the ecosystem. Districts can support this by protecting time for peer meetings, providing physical space, and, in some cases, training peer facilitators.

7.7 Administrator training and leadership accountability

The seventh element concerns school leadership. Principals and other school-level administrators are the most immediate implementers of any of the preceding structures. If they do not understand STS, do not value teacher wellbeing, or view emotional labor as an unmeasured extra, the structures above will not survive contact with daily operations.

Administrator training in the recognition and prevention of STS should be a formal requirement for school leadership certification in high-needs districts. Leaders should be evaluated in part on the health of their workforce, using indicators such as teacher retention, sick-leave patterns consistent with burnout, and staff-reported measures of wellbeing. Leadership that treats teacher wellbeing as a performance metric will implement the seven structural elements. Leadership that does not, will not (Herman et al., 2021).

Taken together, these seven elements form what might be called a systemic care infrastructure. Each element addresses one of the risk mechanisms or institutional failures described earlier. Each has some evidentiary support. Their combination has not, to date, been tested at scale in a coordinated way, and the primary research recommendation of this paper is that such a combined implementation be piloted, evaluated, and, if successful, scaled.

8. Discussion

The argument developed in this article rests on three claims that deserve explicit restatement. The first is that secondary traumatic stress in teachers is a real and measurable condition, distinct from ordinary burnout, and prevalent in schools serving high-needs and highly transient populations. The second is that STS in educators is the predictable outcome of a structural arrangement in which the emotional demands of the work exceed the training, staffing, and policy protections provided. The third is that the response should therefore be structural, not individual.

8.1 The limits of individual interventions

The proliferation of resilience training, mindfulness apps, and self-care campaigns in the last decade has produced modest benefits and one significant harm. The benefit is that many teachers now have vocabulary for their experience and access to individual strategies that can help. The harm is that the near-exclusive focus on individual response has crowded out attention to structural conditions. When a teacher completes a mindfulness course and returns to a classroom of thirty-two students, four of whom are in acute crisis, with no counselor available, no reflective supervision, and a workload that permits no genuine recovery, her improved mindfulness is measured against an unchanged environment (Kim et al., 2021). The environment continues to produce injury.

This does not mean that mindfulness and other self-regulation practices should be abandoned. It means that they should be placed in their appropriate role, as one component of a larger response, and no longer treated as a substitute for institutional change.

8.2 Equity implications

The distribution of STS burden across the teaching workforce is not equitable. Teachers in high-needs schools carry more of it than teachers in wealthy schools.
Teachers of color, who are disproportionately concentrated in high-needs schools and who often carry additional #cultural_labor as they serve as cultural bridges for students, families, and colleagues, carry even more (Baker et al., 2021). Women teachers, who make up the majority of the workforce, are more often called upon to perform the emotional and relational work that STS tracks. Any serious response to STS must attend to these distributional facts. Otherwise, generic support programs will produce benefits that flow disproportionately to teachers with lower baseline exposure while the highest-exposure teachers remain under-served.

The equity argument extends to students. When high-needs schools cannot retain their teachers, students in those schools experience higher rates of substitute instruction, more teacher turnover across grade levels, and less continuity of relationship. The children whose need for stable adult relationships is greatest are those most likely to lose them. STS prevention is therefore not only a labor issue but an educational equity issue (Ormiston et al., 2022).

8.3 The pandemic as a natural experiment

The years 2020 through 2022 functioned, unintentionally, as a large-scale demonstration of what happens when the emotional and structural pressures on teachers are amplified simultaneously. Rates of teacher-reported stress, anxiety, and intent to leave rose sharply during the pandemic period (Pressley, 2021; Baker et al., 2021). Teachers described emotional exhaustion of a kind that self-care could not touch. The specific etiology varied by context, but the general pattern is consistent with the argument here: when a caring workforce is asked to absorb high levels of student distress in the absence of adequate structural support, the workforce sustains injury at scale.

The recovery period since 2022 has been uneven.
Some districts have invested in mental health staffing using pandemic-era funding streams, and where these investments have been sustained, outcomes have been more favorable. Where the investments have been allowed to lapse, or where they were never made, the STS conditions of 2019 have returned. Policy stability is therefore itself a structural factor.

8.4 What we still do not know

The evidence base on educator STS is now substantial enough to justify the recommendations made here, but important gaps remain. Longitudinal studies that follow teachers from entry into high-needs settings through their first five years are rare. The specific dose-response relationship between exposure frequency and symptom development has not been precisely quantified. The relative contribution of the four risk mechanisms described in Section 4 is not well established. The long-term effects of reflective supervision, workload caps, and other structural interventions have not been rigorously evaluated at scale. Interventions tested in high-income settings have not been consistently replicated in low-income and refugee-serving contexts, where the exposure profile may be qualitatively different.

Future research should address these gaps. Particularly valuable would be cluster-randomized trials of the seven-element framework proposed here, comparing high-needs schools that implement the full package with matched schools that implement only conventional wellness supports. Such studies would be logistically and politically challenging, but the current evidence base, combined with the growing scale of the teacher retention crisis, may make them possible.

8.5 A note on language

The vocabulary used to describe teacher suffering matters. Terms such as burnout have entered casual use and now carry connotations of individual weakness. The clinical language of STS, while more precise, can pathologize what is in many cases a healthy response to unhealthy conditions.
A teacher who develops intrusive thoughts after a student's disclosure is not sick. She is responding to a difficult exposure with the equipment humans have. The condition to be addressed is the exposure without support, not the response itself. Care in language is therefore part of care in policy. Public discussion of teacher wellbeing should avoid framings that place the burden of change on the individual teacher and should instead emphasize the shared, structural nature of the problem (Rankin, 2021).

9. Limitations and Directions for Future Research

Several limitations of the present analysis should be noted. First, this article is a conceptual synthesis rather than a systematic review. While the literature cited is representative of recent work, the paper does not aim for the completeness of a formal review, and readers seeking exhaustive treatment of the empirical base should consult the systematic reviews and meta-analyses referenced above (Ormiston et al., 2022).

Second, the framework proposed in Section 7 is derived from a combination of empirical findings, clinical practice traditions in adjacent fields, and reasoning from mechanism. It has not been tested in its integrated form. The strength of each element, taken separately, is documented, but the interaction effects among elements are not. Pilot implementation with formal evaluation is the essential next step.

Third, the argument is drawn primarily from research conducted in North American, Australian, and European contexts. The conditions of teaching in low- and middle-income countries, in refugee-receiving communities in the Global South, and in schools operating in active conflict zones differ in important ways from those studied here. Elements of the framework may transfer; others may need substantial adaptation. Cross-national comparative research on educator STS is a significant gap in the current literature (Sokal et al., 2020).
Fourth, the paper focuses on classroom teachers and does not fully address the STS burden carried by school counselors, paraprofessionals, bus drivers, food service staff, and administrators, all of whom form part of the caring adult ecosystem in a high-needs school. The framework should be understood as extending to these roles, but the specific evidence base for non-teaching staff is thinner.

Fifth, the paper does not engage in depth with the intersection of STS and other conditions such as depression, generalized anxiety disorder, and post-traumatic stress disorder arising from the teacher's own personal history. In practice these conditions co-occur, and clinical response requires attention to their interaction. Future work should address diagnostic and intervention questions at this interface.

Future research priorities include the following. Cluster-randomized trials of integrated structural interventions in high-needs schools should be prioritized. Longitudinal cohort studies of new teachers in high-exposure settings should be funded to establish dose-response relationships and identify protective factors. Cross-national comparative studies should examine the transferability of the proposed framework. Studies of specific subpopulations, including teachers of color, teachers in refugee-receiving schools, teachers in rural high-poverty areas, and early-career teachers, should refine the framework for context. Cost-effectiveness analyses should quantify the return on investment of structural supports, using outcomes such as teacher retention, student attainment, and reduced use of substitutes. Finally, work is needed on the specific policy mechanisms through which STS can be recognized in occupational health frameworks, including the design of workers' compensation approaches, sick leave policies, and licensure protections.

10. Conclusion

Teaching in a high-needs school is one of the most emotionally demanding forms of paid work in contemporary society.
The teachers who do it are asked to hold the classroom as a stable relational space for children whose lives outside the classroom are often marked by instability, loss, and fear. They do this work knowing that they are frequently the most reliable adult in the daily life of the child. They do it well, and they do it at a cost that has been quietly borne for decades.

The literature on #secondary_traumatic_stress in educators has now accumulated to the point where the injury can no longer be treated as anecdotal or as a matter of individual resilience. It is a predictable outcome of a specific labor arrangement. It affects a large fraction of the teachers who serve the children with the greatest need. It is one of the drivers of the teacher attrition crisis, and it is one of the least discussed causes of the educational inequities we most publicly deplore.

The response required is not another workshop. It is a coordinated structural change that treats the wellbeing of the teacher as a precondition, not an afterthought, of the wellbeing of the child. The seven-element framework proposed here, involving recognition, staffing, supervision, workload, training, peer support, and leadership accountability, is offered as a practical, evidence-informed starting point. Its full implementation will require budget lines, contract negotiations, and policy change. Its cost is not trivial. But the cost of continued inaction, measured in lost teachers, unfilled vacancies, and children who lose the caring adults they most needed, is higher.

The #pedagogy_of_care is the moral center of teaching. Care is not an ornament on the profession; it is the profession. If we intend to sustain it, we must build the institutional scaffolding that makes it possible for teachers to care without being injured by their caring. That scaffolding is what this article has attempted to describe. It is what school systems, ministries, unions, and communities must now build.
#teacher_wellbeing #trauma_informed_schools #educator_mental_health #emotional_labor_in_education #caring_professions #school_reform #teacher_retention #systemic_support #urban_education #refugee_education #equity_in_schools #compassion_satisfaction #reflective_practice #policy_and_practice #sustainable_teaching

References

Baker, C. N., Peele, H., Daniels, M., Saybe, M., Whalen, K., Overstreet, S., & The New Orleans Trauma-Informed Schools Learning Collaborative. (2021). The experience of COVID-19 and its impact on teachers' mental health, coping, and teaching. School Psychology Review, 50(4), 491 to 504. doi:10.1080/2372966X.2020.1855473
Brunzell, T., Stokes, H., & Waters, L. (2019). Shifting teacher practice in trauma-affected classrooms: Practice pedagogy strategies within a trauma-informed positive education model. Contemporary School Psychology, 23(2), 158 to 173. doi:10.1007/s40688-018-0208-8
Brunzell, T., Waters, L., & Stokes, H. (2021). Trauma-informed teacher wellbeing: Teacher reflections within trauma-informed positive education. Australian Journal of Teacher Education, 46(5), 91 to 107. doi:10.14221/ajte.2021v46n5.6
Christian-Brandt, A. S., Santacrose, D. E., & Barnett, M. L. (2020). In the trauma-informed care trenches: Teacher compassion satisfaction, secondary traumatic stress, burnout, and intent to leave education within underserved elementary schools. Child Abuse and Neglect, 110, 104437. doi:10.1016/j.chiabu.2020.104437
Herman, K. C., Sebastian, J., Reinke, W. M., & Huang, F. L. (2021). Individual and school predictors of teacher stress, coping, and wellness during the COVID-19 pandemic. School Psychology, 36(6), 483 to 493. doi:10.1037/spq0000456
Kim, S., Crooks, C. V., Bax, K., & Shokoohi, M. (2021). Impact of trauma-informed training and mindfulness-based social-emotional learning program on teacher attitudes and burnout: A mixed-methods study. School Mental Health, 13(1), 55 to 68. doi:10.1007/s12310-020-09406-6
Ormiston, H. E., Nygaard, M. A., & Apgar, S. (2022). A systematic review of secondary traumatic stress and compassion fatigue in teachers. School Mental Health, 14(4), 802 to 817. doi:10.1007/s12310-022-09525-2
Pressley, T. (2021). Factors contributing to teacher burnout during COVID-19. Educational Researcher, 50(5), 325 to 327. doi:10.3102/0013189X211004138
Rankin, J. G. (2021). First aid for teacher burnout: How you can find peace and success. Routledge.
Simon, K., Petrovic, L., Baker, C., & Overstreet, S. (2022). An examination of the associations among teacher secondary traumatic stress, teacher-student relationship quality, and student socio-emotional functioning within trauma-impacted schools. School Mental Health, 14(2), 213 to 224. doi:10.1007/s12310-022-09502-9
Sokal, L., Trudel, L. E., & Babb, J. (2020). Canadian teachers' attitudes toward change, efficacy, and burnout during the COVID-19 pandemic. International Journal of Educational Research Open, 1, 100016. doi:10.1016/j.ijedro.2020.100016
Silard, A., & Dasborough, M. T. (2021). Beyond emotion regulation: Emotion routines and their impact on the workplace. Journal of Management Studies, 58(6), 1601 to 1633.
Katsantonis, I. G. (2020). Factors associated with psychological well-being and stress: A cross-cultural perspective on psychological well-being and gender differences in a population of teachers. Pedagogical Research, 5(4), em0066. doi:10.29333/pr/8235
Miller, F. G., Chafouleas, S. M., Welsh, M. E., McCoach, D. B., & Riley-Tillman, T. C. (2020). Examining the technical adequacy of the Direct Behavior Rating single-item scales for assessing student behavior in early elementary classrooms. Assessment for Effective Intervention, 45(4), 243 to 253. doi:10.1177/1534508418799208
Berger, E., & Samuel, S. (2023). A qualitative analysis of educator experiences of trauma-informed practice in Australian schools. Australian Educational Researcher, 50(3), 861 to 877. doi:10.1007/s13384-022-00538-z

  • Predictive Analytics and the Self-Fulfilling Prophecy of Early Intervention Flags: A Critical Examination of Ethical Risks in University Early-Warning Dashboards

    Universities across the world now use #predictive_analytics systems that scan student data and produce "at-risk" flags long before formal assessment results appear. These #early_warning_dashboards are marketed as tools of care, designed to help instructors reach struggling students in time. This paper argues that, alongside these benefits, such systems carry a serious and under-examined ethical risk: they may quietly change how instructors think about, speak to, and teach the students they label. Drawing on classical work on the #self_fulfilling_prophecy, on newer studies of #learning_analytics ethics, and on wider debates about the #datafication of higher education, the paper explains how a colour-coded dashboard can lower pedagogical expectations in ways that harm the very students it claims to protect. The analysis maps four mechanisms of harm: reduced cognitive demand, softer feedback, narrower opportunity, and identity absorption. It then reviews possible safeguards, including flag transparency, instructor training, right-to-explanation policies, and human-in-the-loop design. The paper concludes that predictive systems in universities are never neutral instruments; they are pedagogical actors whose ethical weight must be treated as seriously as any other classroom intervention.

Keywords: predictive analytics; early-warning systems; self-fulfilling prophecy; higher education ethics; algorithmic bias; teacher expectations; learning analytics.

1. Introduction

Over the last decade, universities have quietly become some of the most data-rich institutions in modern life. Every click in a #learning_management_system, every swipe of a library card, every submitted assignment and missed lecture becomes a signal that can be stored, joined with other signals, and passed through a model. The output of that model, in most large universities, now reaches the desk of a lecturer or academic advisor in a familiar form: a coloured flag next to a student's name.
Green means the student is fine. Amber means "watch." Red means "at risk." These #early_intervention_flags are presented as acts of institutional kindness. The story is intuitive and appealing: a student who is quietly slipping can be caught by the system before the semester ends, offered support, and rescued from failure. Vendors of #student_success platforms promise measurable gains in retention. University leaders point to dashboards during accreditation visits. Ministries of education fund the roll-out of such systems in the name of #equity and #student_wellbeing (Prinsloo & Slade, 2020; Williamson, 2021).

Yet the ethical picture is more complicated than the sales brochures suggest. This paper focuses on a single, uncomfortable question that receives too little attention in the current literature. What happens inside an instructor's mind, and inside a classroom, once a red flag appears next to a student's name? Does the flag simply prompt a helpful email, or does it also, silently, change how that student is taught?

The paper argues that #early_warning_dashboards can trigger a modern version of the #self_fulfilling_prophecy first described by Merton (1948) and later demonstrated experimentally by Rosenthal and Jacobson (1968). When instructors are told, before they have met a student, that this student is likely to fail, they may adjust their behaviour in small but cumulative ways. They may call on the flagged student less often, offer easier tasks, mark with lower expectations, or steer the student towards a "safer" but less ambitious pathway. Over a semester these micro-adjustments can produce exactly the failure the dashboard predicted, and the prediction is then celebrated as accurate. The model has trained the teacher, and the teacher has trained the student to fail.

This is not a hypothetical concern.
Empirical research on #teacher_expectations has repeatedly shown that expectations become behaviour, and behaviour becomes outcome (Wang et al., 2021; Papageorge et al., 2020). Meanwhile, critical scholarship on #algorithmic_bias has demonstrated that predictive models in higher education systematically over-flag students from minoritised, first-generation, and low-income backgrounds (Baker & Hawn, 2022; Kizilcec & Lee, 2022). When these two literatures are read together, a clear ethical risk emerges: dashboards may industrialise low expectations for the very students who already suffer most from them.

The paper proceeds in seven parts. Section 2 reviews the rise of #predictive_analytics in universities and the pedagogical claims made for it. Section 3 revisits the classical theory of the self-fulfilling prophecy and the Pygmalion effect. Section 4 explains, in accessible terms, how modern #early_warning_systems produce their flags. Section 5 develops the paper's central argument, mapping four mechanisms through which flags can lower instructor expectations. Section 6 discusses ethical implications through the lenses of #fairness, #autonomy, #transparency, and #care. Section 7 offers practical recommendations for universities, instructors, and system designers. Section 8 concludes.

2. The Rise of Predictive Analytics in Higher Education

2.1 From institutional research to real-time dashboards

Universities have collected data on their students for as long as universities have existed. Grade books, enrolment records, and financial aid files are, in a sense, old-fashioned datasets. What has changed in recent years is not the collection of data but its integration, speed, and predictive use (Selwyn, 2022). Modern student information systems, learning management platforms, library systems, and swipe-card readers now feed a single data warehouse.
Machine-learning models are trained on the historical warehouse and are then applied to current students in near real time (Herodotou et al., 2020).

The result is the #early_warning_dashboard: a single screen, usually shown to instructors and academic advisors, that summarises each student as a #risk_score or a colour. The dashboard is refreshed weekly, sometimes daily. Its users are asked to act on it.

2.2 The promise of care

The public justification for these systems is care. Proponents argue that traditional university teaching is too slow to detect struggling students; by the time a first midterm is marked, the semester may already be lost (Sclater, 2022). Predictive analytics, they argue, offers an earlier and fairer signal, allowing instructors and support staff to reach students who might otherwise disappear from the institution without a trace.

Several evaluation studies do report positive outcomes. Improvements in #retention and #course_completion, and reductions in time-to-degree, have been documented in some large implementations (Herodotou et al., 2020; Foster & Siddle, 2020). For students who receive well-designed, well-resourced human follow-up, the intervention can be genuinely helpful.

2.3 The quieter story

Alongside these gains, however, a critical literature has grown. Scholars have raised concerns about the #privacy costs of pervasive data collection (Prinsloo & Slade, 2020), the opacity of the underlying models (Slade et al., 2023), and the risk that historical patterns of #inequality become baked into predictive outputs (Baker & Hawn, 2022). Marachi and Quill (2020) describe the shift as a move from teaching to #surveillance, in which students are constantly measured and sorted rather than educated.

The specific concern of this paper sits inside that critical literature but focuses on a mechanism that has not received enough attention: the effect of the flag not on the student, but on the instructor who reads it.

3. The Self-Fulfilling Prophecy Revisited

3.1 Merton and the original definition

The sociologist Robert K. Merton coined the term #self_fulfilling_prophecy in 1948. He defined it as a false definition of a situation that evokes a new behaviour, which then makes the false definition come true. The classic example was a rumour that a bank was failing; depositors, believing the rumour, withdrew their money, and the bank did indeed fail. The prophecy was wrong when made but became correct through the behaviour it produced.

The idea has had a long and productive life in the social sciences. It has been applied to labour markets, criminal justice, medical diagnosis, and, most importantly for this paper, classrooms.

3.2 Pygmalion in the Classroom

The most famous educational application is the study by Rosenthal and Jacobson (1968), published as Pygmalion in the Classroom. Teachers were told, falsely, that certain randomly selected pupils were "intellectual bloomers" who would show unusual gains during the school year. At the end of the year, those pupils had, on average, gained more IQ points than their peers. The teachers had not been given true information about the children; they had been given an expectation, and the expectation had shaped their behaviour, which had shaped the children's development.

Later research refined the finding. The Pygmalion effect is real, but not unlimited; it operates most strongly with younger children, with subtle and continuous forms of feedback, and where the teacher accepts the expectation as credible (de Boer et al., 2020). The reverse effect, sometimes called the #Golem_effect, is equally important: when teachers expect a student to do poorly, that student is more likely to do poorly.

3.3 Contemporary evidence on teacher expectations

Recent studies have replicated and extended the Pygmalion literature in more rigorous designs. Papageorge et al.
(2020) analysed a large longitudinal dataset in the United States and found that when two teachers rated the same Black student, the lower expectation was strongly predictive of that student's later educational attainment. The expectation preceded, and in part caused, the outcome. Similar effects have been shown in European samples (Wang et al., 2021) and in higher education (Boring & Ottoboni, 2020).

The mechanism is not usually conscious bias. Instructors do not decide to expect less of a student. Instead, they display subtle #behavioural_cues: less eye contact, shorter wait time after a question, fewer challenging follow-ups, softer marking, and lower rates of nomination for opportunities such as research projects or scholarships (Gershenson et al., 2021). Students read these cues quickly and adjust their own behaviour in response.

This is the psychological terrain into which the modern #early_warning_dashboard steps.

4. How Early-Warning Systems Actually Work

To evaluate the ethical risks, it helps to understand, in plain terms, how these systems generate their flags.

4.1 Inputs

Most #early_warning_systems combine four broad categories of data:

Demographic and admission data: age, first-generation status, entry qualifications, financial aid category.
Behavioural data from the learning management system: logins, time on task, submission timing, forum posts.
Academic performance data: early quiz scores, assignment marks, attendance.
Institutional data: major, housing status, use of campus services (Foster & Siddle, 2020).

Some vendors also add data from library systems, card readers, or even Wi-Fi connections (Marachi & Quill, 2020).

4.2 The model

The model is usually a form of #supervised_learning, trained on data from previous cohorts. The target variable is typically failure or withdrawal in a course or a program.
The model learns which combinations of input variables were associated with that outcome in the past, and applies the same rule to current students (Herodotou et al., 2020).

Two features of this design are ethically important. First, the model is #retrospective: it assumes that what predicted failure in the past will predict failure in the future. If past cohorts of first-generation students failed at higher rates because they were badly supported, the model will simply predict that current first-generation students will fail, without asking why (Kizilcec & Lee, 2022). Second, the model is #probabilistic. It produces a number between zero and one, or a category such as green/amber/red. It does not, and cannot, say whether a specific student will succeed.

4.3 The interface

For the instructor, all of this technical work is compressed into a very small visual element: a coloured icon, a percentage, or a short phrase such as "high risk." This is the moment at which the model meets the classroom, and it is the moment at which the ethical trouble begins.

Cognitive research on decision-making shows that #simplified_visual_cues have powerful anchoring effects. A red flag near a name is not read as "a statistical estimate based on partial data"; it is read as "this student is a problem" (Selwyn, 2022). The technical caveats built into the model are lost in translation.

5. The Self-Fulfilling Prophecy Goes Digital

This section develops the paper's central argument in detail. It identifies four mechanisms through which #early_intervention_flags can lower instructor expectations and, through them, student outcomes.

5.1 Mechanism one: reduced cognitive demand

The first and most direct mechanism is a reduction in the intellectual difficulty of tasks offered to flagged students. Instructors, believing that a student is at risk, may simplify explanations, offer more scaffolded assignments, or exempt the student from the more demanding parts of a course.
On the surface this looks like kindness. In effect it is a #demand_reduction that deprives the student of the very challenge they need in order to grow.

Research on #productive_struggle shows that learning occurs at the edge of what a student can do, not comfortably inside it (Kapur, 2023). If an instructor systematically pulls tasks back from that edge for flagged students, those students will develop weaker skills, confirming the dashboard's prediction.

5.2 Mechanism two: softer feedback

The second mechanism operates through feedback. Instructors give feedback that is calibrated to what they expect a student to be able to handle. A student expected to succeed is more likely to receive detailed, critical, and pointed comments; a student expected to fail is more likely to receive vague reassurance, in a well-intentioned effort to protect their #self_esteem (Deiglmayr et al., 2021).

Detailed critical feedback is, however, the strongest predictor of writing and problem-solving improvement in the higher education literature (Winstone & Boud, 2022). Softer feedback offered to flagged students, in the name of care, may deprive them of the corrective information they need in order to improve.

5.3 Mechanism three: narrower opportunity

The third mechanism concerns opportunity. Instructors decide, often informally, which students to recommend for research assistantships, competitive scholarships, honours programs, letters of reference, and mentoring. These decisions are strongly influenced by the instructor's mental picture of the student's potential (Papageorge et al., 2020).

When that mental picture is contaminated by an early red flag, flagged students are less likely to be nominated for stretch opportunities, even if they later begin to perform well. The dashboard, in effect, becomes a filter that narrows the future the instructor can imagine for the student.
This mechanism is particularly harmful because it operates outside any formal assessment and is therefore invisible to institutional audit (Slade et al., 2023).

5.4 Mechanism four: identity absorption

The fourth mechanism sits inside the student. Students often learn how they are seen through the small #interactional_signals discussed in Section 3.3. If a student senses, over weeks, that their instructor sees them as struggling, they may begin to see themselves that way. This process is well established in the literature on #stereotype_threat and #academic_identity (Steele, 2020; Murphy et al., 2020).

The dashboard need not be shown to the student for this to happen. It is enough that the instructor's expectations, shaped by the dashboard, leak into the classroom through tone, gaze, questioning, and marking. Over a semester, the student absorbs a version of themselves that mirrors the dashboard's original prediction, and behaves accordingly.

5.5 Bringing the four mechanisms together

Taken alone, each mechanism produces a small effect. Together, they can be powerful. A student who is offered easier work, given softer feedback, denied stretch opportunities, and taught to see themselves as at risk is very likely to underperform. When the semester ends and that student fails or withdraws, the dashboard's original prediction is confirmed, and the model is retrained on the very outcome its prediction helped produce. This is the #algorithmic_self_fulfilling_prophecy: a feedback loop in which the model and the classroom continually train one another to lower expectations.

Kizilcec and Lee (2022) note that fairness audits of #student_success models tend to check whether flags predict outcomes accurately. They rarely check whether the flags cause the outcomes. This is a serious methodological gap, and it is the gap this paper seeks to address.

6. Ethical Implications

6.1 Fairness

The first ethical implication concerns #fairness.
Predictive models are trained on historical data, and historical data reflects historical inequalities. Students from minoritised racial groups, first-generation students, and low-income students have, in most systems, been over-represented among those who failed or withdrew in the past, often for reasons that had little to do with individual ability and much to do with institutional design (Baker & Hawn, 2022). A model trained on those patterns will over-flag current students who share those characteristics. If flagging then triggers the four mechanisms described in Section 5, these same students will be exposed to lower expectations, softer feedback, and narrower opportunity. The system, marketed as a tool for equity, becomes a tool for the mechanised reproduction of inequality (Prinsloo & Slade, 2020). 6.2 Autonomy The second implication concerns #autonomy. Autonomy, in ethical theory, is the capacity to shape one's own life through one's own choices. Students who are labelled by a system they cannot see, on the basis of data they did not know was being collected, and treated accordingly by instructors who never explain the label, have their autonomy quietly reduced (Slade et al., 2023). Instructors, too, lose autonomy. A well-designed dashboard is difficult to argue with. An instructor who wishes to hold high expectations for a red-flagged student must actively resist the system's cue, without ever having a clear counter-signal. The dashboard becomes a small but constant pressure toward conformity with the model's view of the student (Selwyn, 2022). 6.3 Transparency The third implication concerns #transparency. Most #predictive_analytics systems in universities are procured from external vendors and treated as commercial secrets. Neither instructors nor students can inspect the model, the input variables, or the weightings (Sclater, 2022). 
This is inconsistent with basic principles of academic openness and, in many jurisdictions, with data-protection law such as the #GDPR, which grants individuals a limited right to information about automated decisions concerning them (Zeide, 2022). Without transparency, students cannot challenge their flags, instructors cannot judge whether the flag is credible, and institutions cannot audit the system for bias. The ethical situation is one of hidden authority. 6.4 Care The fourth implication concerns #care. Universities describe #early_warning_dashboards as an expression of care for students, and it is important to take that intention seriously. Care, however, is not simply the delivery of help. It is a relationship in which the person being cared for is treated as a full agent, seen accurately, and helped to grow (Noddings, 2020). A dashboard that reduces a student to a colour cannot, on its own, express care. It can, at best, be one input to a caring relationship. When the dashboard is treated as sufficient, or when its label replaces the instructor's own perception of the student, care collapses into #management. The student is processed rather than met (Marachi & Quill, 2020). 7. Illustrative Scenarios To make the abstract argument concrete, this section offers three illustrative scenarios drawn from the empirical literature and from institutional reports. They are composite examples, not case studies of real individuals. 7.1 Scenario one: the quiet reclassification In a large first-year sociology course, an instructor consults the dashboard in week three. Four students are flagged red. The instructor, who has more than two hundred students, sends the standard supportive email to each, drops in an offer of office hours, and moves on. 
Over the semester, the instructor unconsciously grades essays from the four red-flagged students with slightly softer expectations, offers them slightly less pointed feedback, and does not nominate any of them for the departmental essay prize. Three of the four pass with mediocre grades. One withdraws. The dashboard records this as a successful intervention. The instructor's actions were kind at every step. The overall effect was to depress the ceiling of what those students could achieve. These are the mechanisms described in Section 5, operating quietly and at scale (Papageorge et al., 2020; Winstone & Boud, 2022). 7.2 Scenario two: the confirmatory loop A machine-learning model in a mid-sized university is trained on five years of retrospective data. First-generation students are heavily represented in the historical failure set. The model, which is technically working as designed, produces amber and red flags for first-generation students at roughly twice the rate of other students. Instructors act on the flags. First-generation students receive extra interventions, but also, on average, softer marking and fewer stretch opportunities. Their pass rate remains lower than that of their peers. The next model retraining uses the new data. Because first-generation students continue to underperform, the model continues to flag them at high rates. The loop is closed. The system has not caused inequality on its own; it has amplified an existing inequality and made it appear objective (Baker & Hawn, 2022; Kizilcec & Lee, 2022). 7.3 Scenario three: the invisible reference letter A senior lecturer is asked to write a reference letter for a graduate program. She consults her memory of the applicant, who has done consistently good work in her class. She also, without quite noticing, remembers the amber flag that appeared next to the student's name in the second week of the semester. 
The letter she writes is warm but slightly less enthusiastic than the letter she would have written for a student without that flag. The student is not accepted into the graduate program. Neither the student nor the graduate program knows that a dashboard from three years earlier played a role in that decision. This is the long tail of the #algorithmic_self_fulfilling_prophecy (Slade et al., 2023). 8. Counter-Arguments and Their Limits An honest treatment must consider the strongest arguments against the position taken in this paper. 8.1 "Flags improve outcomes on average" Some evaluations of early-warning systems report improved retention and completion (Herodotou et al., 2020; Foster & Siddle, 2020). The argument is that these average gains outweigh the risks discussed above. This argument is partly right, but it has three limits. First, average gains can hide distributional losses; if flagged students on the margin do better while high-potential flagged students are pushed toward mediocrity, the average may improve while some students are harmed. Second, evaluations rarely measure the four mechanisms of Section 5 directly; they measure short-term retention rather than long-term achievement. Third, the counter-factual is often unclear, since institutions rarely run randomised trials in which some flagged students receive no intervention (Sclater, 2022). 8.2 "Instructors are professionals; they will not be biased by a flag" A second counter-argument holds that instructors are trained professionals who can override a dashboard cue. The literature on #anchoring_bias and #expectation_effects suggests otherwise. Even highly trained professionals, including doctors and judges, adjust their behaviour in response to small numerical cues (Kahneman et al., 2021). There is no reason to believe instructors are immune. 
Indeed, because instructors are often under time pressure and teaching large classes, they may rely on the cue more, not less, than professionals in other domains (Selwyn, 2022). 8.3 "Without flags, struggling students are invisible" A third counter-argument points out that before dashboards, many struggling students went undetected until it was too late. This is a real concern and cannot be dismissed. However, the alternative to dashboards is not blindness; it is the range of relational and pedagogical practices, such as small-group teaching, early formative assessment, and structured mentoring, that have long been known to support students (Winstone & Boud, 2022). Dashboards are a substitute for these practices only when institutions have already decided not to invest in them. 9. Recommendations The paper's argument is not that universities should abandon #predictive_analytics. Data can, in principle, help. The argument is that the design and use of #early_warning_dashboards must take seriously the risk of a digitally amplified #self_fulfilling_prophecy. The following recommendations describe how that might be done. 9.1 Design the flag differently First, the visual and textual design of the flag should be reconsidered. Rather than a colour that reads as a verdict, the flag could be presented as a small paragraph that describes the specific signals behind it and the model's confidence. A statement such as "This student has missed two of five expected LMS log-ins; the model's confidence in the associated risk is low" is far less anchoring than a red icon (Selwyn, 2022; Sclater, 2022). 9.2 Train instructors in expectancy effects Second, instructors who use dashboards should receive training in the psychology of #teacher_expectations and #anchoring_bias. 
The training should be explicit: it should tell instructors that flags can change their behaviour in ways they will not notice, and offer concrete strategies, such as wait-time discipline, standardised feedback rubrics, and blind marking, that reduce that risk (Deiglmayr et al., 2021; de Boer et al., 2020). 9.3 Give students a right to know and to challenge Third, students should be told when they are being flagged, on what basis, and by whom. They should have a genuine right to see the data, contest inaccuracies, and, where appropriate, opt out. This is consistent with existing data-protection frameworks and with basic norms of academic dignity (Zeide, 2022; Prinsloo & Slade, 2020). 9.4 Audit for causal effects, not just accuracy Fourth, institutions should audit their #early_warning_systems not only for predictive accuracy but for causal effects on instructor behaviour and student outcomes. Randomised or quasi-experimental designs in which flags are withheld from some instructors, with appropriate ethical review, would allow institutions to measure whether flagging itself produces harm (Kizilcec & Lee, 2022; Baker & Hawn, 2022). 9.5 Keep humans in the loop, and keep them accountable Fifth, no decision affecting a student should be made by a dashboard alone. Instructors and advisors should be required to record the reason for any action taken, and institutions should audit whether flagged students receive the same range of opportunities, feedback quality, and academic challenge as unflagged students. #Human_in_the_loop design must be more than a slogan (Slade et al., 2023). 9.6 Invest in the relational practices dashboards were meant to replace Sixth, and most importantly, universities should invest in the relational practices whose absence made #early_warning_dashboards feel necessary in the first place: smaller classes, more contact time, structured mentoring, and formative assessment. 
A dashboard cannot substitute for a teacher who knows a student's name; at best, it can support one (Noddings, 2020; Winstone & Boud, 2022). 10. Discussion This paper has tried to hold two ideas together at once. The first is that #early_warning_dashboards are, for many universities, a genuine attempt to help students, and that they sometimes succeed. The second is that these dashboards carry a specific and under-examined ethical risk: they can lower instructor expectations for flagged students, and, through the mechanisms of the #self_fulfilling_prophecy, produce the very failures they predict. Neither idea cancels the other. Systems can be helpful and harmful at the same time, in different dimensions and for different students. The task of ethical scholarship is not to reject the tools but to understand them well enough to design and deploy them responsibly. Three broader points deserve emphasis. First, the shift from formative assessment by teachers to #predictive_scoring by systems changes what teaching is. Teaching, at its best, is an act of imagination in which an instructor sees more in a student than the student has yet shown. Dashboards, by their nature, see only what has already been recorded (Selwyn, 2022). If they replace, rather than supplement, the instructor's imagination, they diminish the pedagogical relationship. Second, the ethical evaluation of educational technology should not stop at questions of privacy and consent. Those questions are important, but they are downstream of a deeper question: what kind of teacher, and what kind of student, does the technology invite into being (Williamson, 2021)? A dashboard that invites the teacher to see a student as a risk score, and the student to see themselves as a colour, is doing pedagogical work regardless of whether it complies with data-protection law. Third, the answer to a bad predictive system is not, in general, a better predictive system. 
The answer, in many cases, is to remember what predictive systems are for and to keep them in their proper place. They are one input among many, offered to a professional whose judgement, imagination, and care are still the heart of the enterprise (Noddings, 2020). 11. Conclusion University #early_warning_dashboards are among the most consequential educational technologies of the current decade. They promise to catch struggling students in time, and they sometimes do. But they also carry a specific ethical risk that has been largely absent from institutional debate: they can quietly lower pedagogical expectations for flagged students, through four mechanisms of reduced demand, softer feedback, narrower opportunity, and identity absorption. Over a semester, these mechanisms can produce the very failures the dashboard predicted, closing a feedback loop between model and classroom that reproduces inequality in the name of care. The way forward is not to abandon #learning_analytics but to treat it with the ethical seriousness that its power now demands. That means redesigning flags to be less anchoring, training instructors in expectancy effects, giving students meaningful rights over their data, auditing systems for causal harm as well as predictive accuracy, and, above all, investing in the human practices of teaching that dashboards were only ever meant to support. The stakes are not merely technical. They are pedagogical, and they are ethical. A university that lets a colour on a screen decide who can learn is no longer, in any meaningful sense, teaching. References Baker, R. S., & Hawn, A. (2022). Algorithmic bias in education. International Journal of Artificial Intelligence in Education, 32(4), 1052 to 1092. Boring, A., & Ottoboni, K. (2020). Student evaluations of teaching are not only unreliable but also biased against female instructors. Studies in Higher Education, 45(11), 2201 to 2216. de Boer, H., Timmermans, A. C., & van der Werf, M. P. C. (2020). 
The effects of teacher expectation interventions on teachers' expectations and student achievement. Educational Research and Evaluation, 26(3 to 4), 180 to 200. Deiglmayr, A., Stern, E., & Schubert, R. (2021). Beliefs, self-concepts, and expectations: Understanding academic feedback effects in higher education. Learning and Instruction, 72, 101 to 116. Foster, C., & Siddle, R. (2020). The effectiveness of learning analytics for identifying at-risk students in higher education. Assessment and Evaluation in Higher Education, 45(6), 842 to 854. Gershenson, S., Hansen, M. J., & Lindsay, C. A. (2021). Teacher diversity and student outcomes: The role of teacher expectations. Harvard Education Press. Herodotou, C., Rienties, B., Boroowa, A., Zdrahal, Z., & Hlosta, M. (2020). A large-scale implementation of predictive learning analytics in higher education. British Journal of Educational Technology, 51(4), 1002 to 1018. Kahneman, D., Sibony, O., & Sunstein, C. R. (2021). Noise: A flaw in human judgment. Little, Brown Spark. Kapur, M. (2023). Productive failure: Unlocking deeper learning through the science of failing. Jossey-Bass. Kizilcec, R. F., & Lee, H. (2022). Algorithmic fairness in education. In W. Holmes and K. Porayska-Pomsta (Eds.), The ethics of artificial intelligence in education (pp. 174 to 202). Routledge. Marachi, R., & Quill, L. (2020). The case of Canvas: Longitudinal datafication through learning management systems. Teaching in Higher Education, 25(4), 418 to 434. Merton, R. K. (1948). The self-fulfilling prophecy. The Antioch Review, 8(2), 193 to 210. Murphy, M. C., Gopalan, M., Carter, E. R., Emerson, K. T. U., Bottoms, B. L., & Walton, G. M. (2020). A customized belonging intervention improves retention of socially disadvantaged students at a broad-access university. Science Advances, 6(29), 1 to 10. Noddings, N. (2020). Care ethics and education: A relational approach (revised edition). Teachers College Press. Papageorge, N. 
W., Gershenson, S., & Kang, K. M. (2020). Teacher expectations matter. Review of Economics and Statistics, 102(2), 234 to 251. Prinsloo, P., & Slade, S. (2020). Big data, higher education and learning analytics: Beyond justice, toward an ethics of care. In B. K. Daniel (Ed.), Big data and learning analytics in higher education (pp. 109 to 124). Springer. Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom: Teacher expectation and pupils' intellectual development. Holt, Rinehart and Winston. Sclater, N. (2022). Learning analytics explained (2nd ed.). Routledge. Selwyn, N. (2022). Education and technology: Key issues and debates (3rd ed.). Bloomsbury Academic. Slade, S., Prinsloo, P., & Khalil, M. (2023). Learning analytics at the intersections of student trust, disclosure and benefit. Journal of Learning Analytics, 10(1), 1 to 15. Steele, C. M. (2020). Whistling Vivaldi and other clues to how stereotypes affect us (revised edition). W. W. Norton. Wang, S., Rubie-Davies, C. M., & Meissel, K. (2021). A systematic review of the teacher expectation literature over the past 30 years. Educational Research and Evaluation, 27(1 to 2), 1 to 30. Williamson, B. (2021). Big data in education: The digital future of learning, policy and practice. Sage. Winstone, N., & Boud, D. (2022). The need to disentangle assessment and feedback in higher education. Studies in Higher Education, 47(3), 656 to 667. Zeide, E. (2022). Robot teaching, pedagogy, and policy. In M. D. Dubber, F. Pasquale, and S. Das (Eds.), The Oxford handbook of ethics of AI (pp. 789 to 803). Oxford University Press. #predictive_analytics #early_warning_systems #self_fulfilling_prophecy #higher_education_ethics #learning_analytics #teacher_expectations #algorithmic_bias #Pygmalion_effect #datafication #student_success #academic_integrity #educational_technology #critical_pedagogy #student_wellbeing #ethical_AI_in_education

  • Embodied Cognition in STEM: The Role of Physical Movement in Abstract Concept Acquisition

    For a long time, science and mathematics classrooms have been shaped by a #desk_bound model of learning. Students sit still, watch the board, listen to the teacher, and try to build ideas about very abstract things using only their eyes and ears. This paper looks at a different picture. It draws on the field of #embodied_cognition, which argues that the body is not a passive carrier of the brain but an active part of thinking. The paper asks a clear question: do physical movement, gesture, and #kinesthetic_learning actually help students remember and understand hard ideas in mathematics and physics? To answer this, the paper reviews recent studies on gesture in math class, on kinesthetic activities in physics, on #physical_modeling with the whole body, and on mixed reality tools that let learners move to think. The evidence suggests that when body action is well aligned with a target concept, students form stronger mental representations, remember more over time, and transfer knowledge to new problems more easily. However, the effect is not automatic. Movement helps most when it matches the structure of the concept, when it does not overload the learner, and when a teacher guides the shift from action to symbol. The paper ends by outlining what an embodied STEM classroom can look like in practice, and where research still needs to go. #retention and #transfer are treated as central outcomes, not as afterthoughts. Keywords: embodied cognition; STEM education; kinesthetic learning; physical modeling; gesture; abstract concepts; mathematics; physics; retention; transfer 1. Introduction Walk into a typical secondary school classroom during a #mathematics or #physics lesson and you will see a familiar scene. A teacher writes symbols on the board. Students copy them into notebooks. A few students answer questions. Most stay quiet. Bodies barely move. This scene has been so common for so long that many people assume it is the natural shape of learning. 
In fact, it is a choice, and it is a choice with a cost. The choice comes from an older view of the mind, sometimes called the #amodal or Cartesian view. In that view, thinking happens inside the head, more or less separate from the body. The body only matters when it is time to write or speak an answer. Learning, on this view, is mainly about pushing symbols and rules into memory. Movement is a distraction. Sitting still is a sign of focus. Over the past two decades, this picture has been challenged by a large body of work in cognitive science and #education_research. This work argues that human thought is grounded in the body, the senses, and action. Concepts are not free floating symbols. They are built out of experiences of moving, touching, seeing, and acting in the world. This view is called #embodied_cognition. It has strong implications for how we should teach hard subjects like algebra, geometry, calculus, mechanics, and electromagnetism, because these subjects are exactly the ones that seem, on the surface, to be the most abstract. Macrine and Fugate note that embodied theories reject the idea of the body as a passive observer of brain functions and argue instead that perceptual, sensorimotor, and multisensory experiences shape how knowledge is acquired across the lifespan. This shift matters for STEM because these fields depend on ideas that are, in themselves, invisible: a limit, a vector, a force, a field, a proof by induction. If concepts are grounded in bodily experience, then teaching them without the body may be like teaching a foreign language without any sounds. This paper has three goals. First, it presents the theory of #embodied_cognition in a way that is accessible to students and teachers, without heavy jargon. Second, it reviews recent empirical work on how #gesture, #kinesthetic_learning, and #physical_modeling affect the learning and #retention of mathematical and physical concepts. 
Third, it argues that the traditional #desk_bound model is not neutral. It quietly favors students who already have strong verbal and symbolic skills, while making life harder for many others. A more embodied classroom is not a fun add-on; it is a serious pedagogical choice with measurable effects on how students think. The paper is written for readers who want a solid overview without wading through hundreds of studies on their own. It draws on very recent work, including integrative reviews, quasi-experiments, and design-based studies from the past five years, and connects them to older foundational ideas where necessary. #stem_education is the anchor throughout: the question is not whether the body matters in general, but whether it matters for the specific job of learning hard science and mathematics. 2. What Embodied Cognition Really Claims Before going further, it helps to be clear about what embodied cognition does and does not say. The phrase can be used in loose ways, and confusion about the theory has led to confusion about the classroom. At its core, embodied cognition claims that mental representations of concepts include experiential parts. Friedrich and colleagues describe grounded cognition as the position that mental representations of concepts consist of experiential aspects, so that the concept of a cup, for example, includes the sensorimotor experiences of interacting with cups. The typical modalities in which concepts are grounded include the #sensorimotor system, emotion, action, language, and social aspects, and recent work argues for adding physical invariants such as gravity, momentum, and friction to that list. This is different from the classical amodal view, in which concepts are stored as abstract symbols that have been stripped of the original sensory experience. On the amodal view, when you think about a triangle, the brain retrieves an abstract token labeled TRIANGLE that has no visual or motor content. 
On the embodied view, thinking about a triangle partly re-enacts the perceptual and motor experiences of seeing and drawing triangles. There are several branches within embodied cognition, and a recent integrative review by Kougioumtzis identifies at least four partially overlapping traditions: grounded or simulation-based, enactive or ecological, extended or embedded, and pluralist versions of the theory. Each tradition emphasizes different mechanisms, but they share the same starting point: cognition cannot be fully understood without the body and the environment. For education, the most important claim is this: if concepts are grounded in bodily experience, then experience should be part of instruction. #action and #perception are not just outputs of thinking. They are ingredients of thinking. That said, embodied cognition is not the claim that any movement helps any learning. This distinction is central and often missed. The same review notes that embodied approaches yield moderate positive effects on learning performance but with substantial heterogeneity, and that key moderators include learner age, subject domain, embodiment type, degree of bodily engagement, and the functional alignment between bodily activity and learning goals. In simple terms: it is not enough to get students moving. The movement has to fit the concept. Kang, Aguilar, and Kim make the same point in a recent study of Hispanic elementary students learning physics. They note that while active learning can improve outcomes, not all movement enhances understanding, and that purposeful, technology-integrated, and culturally responsive embodied activities are what actually raise conceptual understanding and academic achievement. This distinction protects the theory from being reduced to a slogan about active classrooms. Another important claim, from Zhang, Son, and Stigler, is that the benefits of embodied pedagogy depend on where the learner is in the development of expertise. 
Their cognitive developmental theory proposes that activities involving direct sensorimotor engagement are effective for learners earlier in their development of expertise, whereas simply observing such sensorimotor activities can be more helpful for more advanced learners, and that experts may be able to self-activate concepts grounded in sensorimotor actions just by imagining such actions. This so-called Perform First Hypothesis has real consequences for how we sequence #kinesthetic_activities in a curriculum: novices need to move, while experienced students can gain from mental simulation. 3. The Problem with the Desk-Bound Model To see why an embodied view matters, it helps to look honestly at the current default. The #desk_bound model of instruction, in which students sit still and receive symbols through eye and ear, is not the result of careful pedagogical design. It grew out of industrial-era schooling, cheap furniture, tight rooms, and a view of the mind that treated the body as noise. Ryabko and Syzon describe the deeper problem clearly. They argue that contemporary physics education overrelies on visual methods, which leads to a neglect of critically important #kinesthetic mental images that are necessary for deep learning of the natural sciences, and physics in particular. That claim can sound abstract. Consider a concrete example. When a physics student first meets Newton's second law, they see F = ma. If the only route into this equation is symbolic, the student has to build meaning for F, m, and a from words on a page. The equation, for many students, becomes a rule to memorize rather than a description of a felt reality. Yet almost every learner has, in their own body, thousands of experiences of pushing heavy objects and pushing light ones, of #effort_resistance_motion in Ryabko and Syzon's phrase. The desk-bound model leaves that vast library of bodily knowledge unused. The same problem shows up in mathematics. 
Continuity is one of the most fundamental concepts in mathematics. It is formally defined in terms of abstract symbols and operations, and this representation is extremely abstract, or disembodied, which is why it is difficult for students to acquire a clear understanding of it. Khatin-Zadeh and colleagues argue that transforming continuity into a strongly embodied #motion_based representation lets learners employ a wider range of sensorimotor networks and grasp what the symbols are really about. The cost of the desk-bound model is not evenly shared. Students who have strong verbal working memory and are comfortable with symbolic notation can survive it. Students who think more visually, or who need to feel a system before they can reason about it, are told, implicitly, that their preferred way in is not welcome. Over years, this affects who ends up feeling that math and science are for them. Reducing #stem_dropout rates and widening participation are not only a matter of role models or funding. They are also a matter of the sensory bandwidth of instruction. A further cost is retention. When knowledge is encoded through only one channel, it tends to fade. When it is encoded through many channels, including motor memory, it is more likely to stick. Wilcox, Pollock, and Bolton found that conceptual understanding gained in an interactive introductory mechanics course was almost entirely retained during the gap between Physics I and Physics II. Interactive engagement, in which students do more than watch, is not just easier to enjoy. It leaves deeper traces. 4. Gesture as a Bridge Between Body and Symbol If the body plays a role in thought, then #gesture is one of the clearest windows into that role. Gestures are small movements, usually of the hands, that appear alongside speech. They are not decorations. Cognitive scientists now treat them as part of thinking. 
Seccia and Goldin-Meadow, in a recent article for Philosophical Transactions of the Royal Society B, argue that the gestures we produce affect our communication, guide our attention, and help us think and change the way we think, and that gestures can consequently help us learn, generalize what we learn, and retain that knowledge over time. In mathematics education specifically, the effects of gesture-based instruction have been well studied, though few of these studies were directly applicable to classroom environments in their original form. Gordon and Ramani, working on children's mathematical environments, describe how children's self-produced gestures can reveal unique math-relevant knowledge that is not contained in their speech, and how these gestures can assist with math learning and problem solving by supporting cognitive processes such as executive function. When a child cannot yet say that two groups of blocks are equal, the child may still show with the hands that the two groups have the same size. That gesture is a form of knowing. The reverse is also true. Children are better able to learn, retain, and generalize knowledge about math when that information is presented within the gestures that accompany an instructor's speech. In other words, a teacher who gestures while explaining is not merely being expressive. The teacher is loading information into a second channel, and students take it up. Khatin-Zadeh, Eskandari, and Farsani extend this idea to metaphor. When a new mathematical idea is presented to students in terms of abstract mathematical symbols, they may have difficulty grasping it, because abstract symbols do not directly refer to concretely perceivable objects. When the same content is presented in the form of a graph, or a gesture that depicts that graph, it is often much easier to grasp. 
Transforming a problem into a graphical representation is a common problem solving strategy that can be viewed as a kind of #mathematical_metaphor, because the mathematical problem is described in terms of a visual representation of the problem, and since graphical representations are visual they can be depicted by gestures, so that visual and motor systems become actively employed to process a problem and find a solution. Gesture is therefore not only a way to communicate an idea after the fact. It is a way to think an idea through in the first place. In the same team's later work, they used data from the Lancaster Sensorimotor Norms to argue that high degrees of perceptual and action effector strength of the base domains of mathematical metaphors play an important role in the grounding of abstract mathematical concepts in the physical environment. Metaphors like a function increases, a curve turns, and a graph flattens out are not just style. They import motor content into abstract thought. For teachers, the practical takeaway is that #teacher_gesture matters. Instructors who point, trace, sweep, and mimic while they speak are giving students more to grip. Instructors who stand still and speak only in symbols leave students holding fewer tools. 5. Kinesthetic Learning in Mathematics Gesture is the small end of the body in learning. #Kinesthetic_learning is the larger end. Here the whole body is used to enact, model, or represent a concept. Students walk out a coordinate plane, form a human bar chart, or use their arms to stand for the axes of a graph. In physics classes, they might spin to feel angular momentum, or walk in different patterns to build a velocity time graph. Recent research on the volume of a sphere shows the potential of this approach. Wang studied children learning about the volume of a sphere in an augmented paper based mathematics environment and analyzed the gestures they used to construct meaning about the concept. 
Although volume is a classic abstract concept, the students' hands revealed how they built up the meaning through actions of enclosing, tracing, and cupping. In Wang's design, the paper environment and the augmented layer worked together with the students' movements. The concept did not sit in the paper or in the digital layer or in the students' minds. It emerged in the interaction between them. De Freitas, Ferrara, and Ferrari discuss the way #mathematical_concepts are assembled through trans individual coordinated movements, and how affect and sympathy in a group of learners become part of that assembly. Their view emphasizes that math learning is not only inside a single body but distributed across bodies moving together. When a class of learners synchronizes movements to represent a mathematical relationship, they are, in a real sense, thinking together. Even fields that seem far from bodily experience, like #linear_algebra, have been analyzed through the lens of motion. Drumea proposed that an abstract algebraic group can be defined in a Hilbert space to describe an individual performing a specific movement, with an abstract representation in the form of an algebraic group with a defined topology, developed through an example case of a sport with elements of martial arts and gymnastics called tricking. The point is not that every student of linear algebra needs to know tricking. The point is that even the most abstract structures can be given motional interpretations, and that this can serve as a bridge for learners who need one. The relationship between physical activity and mathematics extends beyond specific classroom activities. Thur, Kertesz, and Fugedi compared sports focused schools with general curriculum schools among Hungarian sixth and eighth graders. 
Sports school students demonstrated significantly better physical performance and mathematics scores, and a six week locomotor intervention using a ladder based device in mathematics lessons yielded significant improvements in specific geometry skills such as spatial orientation and mirroring for students with special educational needs. Their conclusion is that general physical fitness and targeted coordinative movement interventions each bring distinct benefits to cognitive and academic aspects of mathematics. That double result is important. It suggests that #physical_movement supports mathematical thinking in at least two ways. Broadly, being active seems to support cognition in general. Specifically, movements that are shaped to match a mathematical structure, like spatial orientation or mirroring, boost the particular skills they are designed to target. A well designed embodied curriculum uses both. It does not just add gym time. It builds motor experiences that echo the concepts to be learned. For teachers, this implies that time spent moving is not time stolen from mathematics. Done well, it is mathematics. A student who steps out an interval on a number line is not preparing to do the math. The student is doing the math. Abrahamson, whose work on #embodied_design has shaped this field over more than a decade, writes that design environments can bring forth mathematical perceptions when they invite specific coordinated actions that later become signified as target content. In Abrahamson and colleagues' vision of the future of embodied design for mathematics teaching and learning, the paradigm promotes theorizations of cognitive activity as grounded, or even constituted, in goal oriented multimodal sensorimotor phenomenology, and conceptual learning could emanate from, or be triggered by, experiences of enacting or witnessing particular movement forms even before these movements are explicitly signified as illustrating target content. 
New learning environments are being explored that use interactive technologies to foster student enactment of conceptually oriented movement forms and only then formalize these gestures and actions in disciplinary formats and language. This sequence, first movement then symbol, is at the heart of embodied mathematics teaching. In a traditional lesson, the symbol comes first and any activity, if it exists at all, is meant to illustrate it. In an embodied lesson, a well chosen activity comes first, and the symbol is introduced later, as a compact way to name what students have already felt. 6. Physical Modeling in Physics Physics is a natural home for embodied approaches, because its ideas are literally about bodies in motion. Hiscott, in a short reflection on physics teaching, notes that physics, with its emphasis on what we observe physically happening in the three dimensional world, probably has more scope for kinaesthetic learning than most subjects. Ryabko and Syzon's recent quasi experimental study offers strong evidence for this claim. Grounded in embodied cognition theory and the idea that core physical concepts are rooted in the primary sensory schema of effort resistance motion, they developed and tested a kinaesthetic methodology to enhance embodied understanding in physics instruction. The two year quasi experimental project was conducted in a specialised high school with an experimental group of eighteen students instructed using the kinaesthetic methodology and a control group of twenty two students following the conventional curriculum. The methodology integrated three types of kinaesthetic exercises, tactile motor, spatial locomotive, and role play simulations, together with the use of a mobile application for the immediate objectification of subjective bodily sensations into objective physical data. 
Effectiveness was assessed using an embodied conceptual understanding assessment scale, and statistical analysis confirmed that the mean gain score in the experimental group was significantly higher than in the control group, with the disparity particularly evident in the #embodied_kinesthetic component of the scale. The authors conclude that purposeful use of kinaesthetic activity establishes resilient mental images that help students consciously modify their intuitive schemas, and that the student's body can serve as an effective sensor and a powerful tool for cognition. This kind of result deserves careful reading. The gain was not simply on a test of physics facts. It was on a scale that specifically measured embodied conceptual understanding. The claim is not that movement makes students remember formulas better. The claim is that movement builds a certain kind of understanding that the desk bound method leaves undeveloped. In physics, this includes the ability to imagine forces, to feel what conservation means, and to intuit why symmetry matters. Kang, Aguilar, and Kim's study of Hispanic fifth graders, mentioned earlier, tested technology supported embodied learning combined with translanguaging in an afterschool physics program. Participants showed improved physics knowledge and motivation compared to non participants, including higher performance on a state standardized test, and the authors conclude that purposeful, technology integrated, and culturally responsive embodied activities can enhance conceptual understanding and academic achievement. Here again the key modifier is purposeful. The activities were not random. They were designed to link body action to physics content and to the linguistic resources students brought from home. Beyond individual studies, there is a longer tradition of #physical_modeling in physics education that fits the embodied view. 
Roberts and colleagues describe how tenets of embodied cognition, in which behavior and cognition emerge out of real time sensorimotor behavior of the individual situated within a particular context, can be used to implement strategies that enhance STEM pedagogy, and they apply six lessons of embodied cognition to a middle school project on water quality. The lessons include treating cognition as situated, time pressured, offloaded to the environment, for action, and involving the body. A water quality project that has students wading, measuring, and comparing sites is not just fun field work. It is a way to embed abstract science concepts in bodily and environmental experience. For high school and university physics, embodied approaches often use technology to sense and mirror student movement. Mixed reality tools track a learner's arms, feet, or head, and translate the movements into simulated physical systems. Recent work in this space suggests that when a simulation is driven by the body of the learner rather than by clicks of a mouse, understanding of concepts such as force fields, waves, and orbital motion can improve. The critical design principle is again alignment. The bodily action must correspond in a meaningful way to the underlying physical structure. A student who moves an arm in a way that stands for the electric field is not the same as a student who moves an arm arbitrarily and sees a graph change. Physics also gives one of the clearest cases of the danger of the desk bound model. Many students arrive at a first course with strong intuitive theories about motion that do not match Newtonian mechanics. These intuitive theories were learned through the body, in years of playing, throwing, and running. If the classroom then presents Newton's laws only in symbols, the abstract account never meets the bodily account. The two coexist, and the intuitive one usually wins under time pressure. 
#Kinesthetic_activities can force the two accounts into the same arena, so that students can feel where their intuitions were wrong and adjust them. 7. What Movement Does for the Brain Why should physical movement help learning at all? The theoretical answers are many, but a few threads stand out. First, movement encodes information in additional systems. When a student moves while learning, motor cortex, cerebellum, and parietal systems are involved along with language and visual systems. When the student later tries to recall the concept, there are more retrieval paths. This is one reason why #retention tends to be stronger for embodied learning. Second, movement can offload part of the cognitive work. Kougioumtzis lists #cognitive_offloading as one of four main mechanism clusters attributed to embodiment in STEM learning, alongside gesture based representation, perceptual spatial structuring, and socially mediated interaction. When students use their hands to hold information, they free up working memory for other steps. This is especially useful in mathematics and physics, where working memory limits often block progress. Third, movement can make abstract structure visible. Perceptual spatial structuring, in the same review, is another main mechanism. Consider a proof that involves a rotation. If the student rotates a paper triangle, they can literally see which parts stay the same. A purely symbolic proof might be logically correct but perceptually opaque. Fourth, movement supports metaphor. As already discussed, abstract concepts often ride on metaphors that treat them as motion or position. Increasing functions go up. A limit is approached. A proof leads to a conclusion. These metaphors are not decorative language. They import motor content into thought. 
Khatin-Zadeh and colleagues argue that metaphors describing an abstract concept in terms of a motion concept are widely used to enhance our understanding of abstract concepts, and are found not only in everyday language but also in learning mathematics. Movement is what gives these metaphors their power. Fifth, movement can be shared. Gestures between teacher and student, coordinated actions in a group, and even mimicry all serve as social channels. Kougioumtzis's fourth mechanism cluster, socially mediated interaction, points at exactly this. Learning is not only a matter of what happens inside one head. It is also a matter of how bodies in a room shape each other's attention. Glenberg, in a widely cited demonstration, argues that ideas from embodied cognition can be used to understand how the brain and cognitive systems deal with very abstract concepts such as regression to the mean, and shows that abstract concepts can be grounded in perceptual, motor, and emotional systems through successive levels of grounding within an extended procedure. He notes that this grounding often requires formal instruction, because a teacher needs to develop the sequence in which the concepts are grounded and the methods of grounding, so that at least some abstract concepts are unlikely to be learned through unstructured interactions with the world. This is a crucial qualification. Embodied learning is not the same as free play. It needs design. Geary offers a related theoretical view. He argues that the controlled semantic cognition system supports conceptual learning across domains, represents concepts as common properties of related experiences or things that can be generalized across exemplars, contexts, and time, and can be expressed across modalities. 
Concepts, on this account, emerge slowly through statistical learning and are shaped by the frequency and variety of exposures to experiences and things that share common features, which helps explain why repeated and varied solving of problems that tap the same concept is required for concept learning, and why mathematical concepts can be expressed through gesture, language, or visually. From this angle, embodied instruction is not exotic. It is one of the modalities through which the general concept learning system already runs. 8. Retention and Transfer: The Core Outcomes The question that most matters to teachers and students is whether embodied approaches change the outcomes we care about. Two outcomes stand out: retention and transfer. Retention means that what is learned lasts. A student who scores well on a test one week after the lesson but has forgotten everything a month later has not really learned. Retention is one of the classic weaknesses of the desk bound approach. When students memorize procedures without meaning, the procedures fade quickly. Physical activity is linked to better memory in general, but embodied instruction seems to link retention more tightly to the specific content taught. Kontra and colleagues showed, in a widely discussed study, that direct physical experience with angular momentum improved learning of the concept beyond what students achieved by observing alone, and that this improvement was linked to activation in motor and somatosensory regions when the students later reasoned about the concept. In simpler terms, when students had felt the concept in their bodies, they thought differently about it later. Their brains reused the motor experience during reasoning. Wilcox and colleagues' finding that conceptual understanding is almost entirely retained during the gap between Physics I and Physics II when instruction is interactive is a striking counterpoint to the usual complaint that physics knowledge evaporates over vacation. 
Retention, in that study, was measured by a standard conceptual inventory. Interactive engagement, in which students do more than watch, appears to leave stable traces. Transfer means that what is learned in one situation can be used in another. This is harder than retention. A student may remember a formula but not know when to use it. In embodied learning, transfer is helped by the fact that motor memory tends to generalize across cases that share the same #action_structure. A student who has walked out linear graphs of velocity may recognize the underlying pattern in a graph of concentration over time, because the action of moving at a constant rate is the same. The concept has been given a portable shape. Gordon and Ramani's model of gesture and information processing in children's mathematical environments emphasizes exactly this bridging role. Gestures assist with math learning and problem solving by supporting cognitive processes such as executive function, and children are better able to learn, retain, and generalize knowledge about math when that information is presented within the gestures that accompany an instructor's speech. Generalization here means transfer. There are limits, though. Kougioumtzis's review reminds us that embodied approaches yield moderate positive effects with substantial heterogeneity, and that these effects tend to diminish or reverse under functional misalignment or excessive task complexity. In other words, transfer benefits appear when the movement matches the concept and the task is not overwhelming. When the movement is arbitrary or the task is too heavy, gains disappear or turn negative. For teachers, this means it is not enough to add movement. The right kind of movement, at the right level of demand, is what matters. It is worth designing carefully. 9. Implications for Practice If the evidence is taken seriously, several practical shifts follow for #stem_classrooms. First, the classroom itself needs to allow movement. 
Rooms that are packed with desks in tight rows tell students, before a word is spoken, that only their eyes and hands will be used. Flexible seating, open floor space, and clear sight lines make it possible for a lesson to move into the space. This is not a demand for expensive rebuilding. Even a small area at the front of a room, cleared of furniture, can become an embodied stage for demonstrations and student enactments. Second, teachers benefit from training in the deliberate use of gesture. Research has shown that students learn more when their teacher has learned to gesture effectively in mathematics instruction, so gesture is not only a personal style but a teachable practice. A teacher who consistently uses the same gestures to signal derivative, integral, and limit gives students a stable second channel that they can internalize. Third, curriculum designers can build lessons that start with an activity and move to a symbol, rather than the other way around. Abrahamson and colleagues describe how new learning environments foster student enactment of conceptually oriented movement forms and only then formalize these gestures and actions in disciplinary formats and language. A simple example: to teach linear equations, students can walk on a number line before they see the algebra. To teach forces, students can push against a partner before they see the free body diagram. In each case, the symbols land on a rich bodily experience rather than on empty ground. Fourth, technology should be chosen with embodiment in mind. Not all technology serves the body. A worksheet on a tablet is still a worksheet. But motion capture, augmented reality, and haptic feedback devices can bring the body into the learning loop in ways that were not possible a decade ago. Wang's work on the volume of a sphere in an augmented paper based environment is one example. Kang and colleagues' technology supported embodied learning with translanguaging is another. 
The right question about any classroom technology is whether it invites the learner to move meaningfully, or asks them to sit still and click. Fifth, assessment should catch what the body knows. If students are learning through movement and gesture, tests that only measure symbol manipulation will miss part of what has been learned. Ryabko and Syzon's use of an #embodied_conceptual_understanding scale is a small step in this direction. Broader assessment reform is a long project, but even simple additions such as asking students to draw, gesture, or physically model a concept during an exam can widen the picture. Sixth, teacher preparation programs should treat embodied cognition as a core content area rather than an elective. If the theory is correct, then the entire architecture of a lesson looks different from what most future teachers were trained on. Programs need to include not only readings but also studio time in which teachers practice designing and running embodied lessons. Seventh, equity considerations deserve attention. Kang and colleagues' study focused on Hispanic elementary students and combined embodied activities with translanguaging, a practice that lets students draw on their full linguistic repertoire. Their finding that this combination boosted learning and motivation should not be surprising. When language, body, and content all pull in the same direction, students who might be shut out by any single channel are given multiple ways in. Embodied instruction is, among other things, an inclusion strategy. 10. Boundary Conditions and What Can Go Wrong An honest review of the evidence must include the ways embodied instruction can fail or backfire. There are three main risks. The first is misalignment. If the movement does not match the concept, the movement may become a distraction rather than a support. Kougioumtzis notes that effects tend to diminish or reverse under functional misalignment or excessive task complexity. 
A teacher who uses a dance to introduce Boolean algebra without a clear structural link between the two is not doing embodied learning. That teacher is doing dance and then, separately, Boolean algebra. Learners may enjoy the dance and still be confused about the concept. The second is #cognitive_overload. When a task requires students to move, listen, watch, and think at the same time, the total demand can exceed their working memory. In some cases, adding movement makes learning worse, especially for novices facing a complex task. Zhang, Son, and Stigler's cognitive developmental theory is relevant here: direct sensorimotor engagement is most effective early in the development of expertise, while observation and mental simulation become more useful later. Getting this sequence wrong can leave learners exhausted rather than educated. The third is superficial mimicry. In some classrooms, embodied activities have become a checkbox: bring movement into the lesson, tick, move on. When the movement is not connected to a target concept, and when it is not followed by reflection and symbolization, the classroom looks active but produces little learning. The point of embodied instruction is not to prove that the teacher is modern. The point is to bring the body into the specific work of making a concept understandable. A separate boundary concerns individual differences. Not every learner responds to embodied instruction in the same way. Learners with certain motor differences, learners with sensory processing differences, and learners who are shy about performing in front of peers all need adapted versions of the same principle. Embodied instruction is not one method. It is a family of methods, and it must be flexible. Finally, embodied cognition is a live research field with disagreements. 
Abrahamson and colleagues describe the goal of finding synergy across diverse views on the role of physical movement in design for STEM education, which is a polite way of saying that different researchers use different definitions and metrics. Kougioumtzis's integrative review argues that further progress depends less on empirical accumulation than on conceptual clarification, specifying which mechanisms a given embodied interaction activates, for which learners, and under what conditions. Teachers and curriculum designers should not wait for the field to become perfectly unified. They should, however, be honest about what is well supported and what is still being worked out. 11. Future Research Directions Several directions look particularly promising for the next few years of research on #embodied_STEM_learning. The first is longer studies. Many current studies last a few weeks or a semester. To know whether embodied instruction produces lasting effects on the choices students make about their studies and careers, researchers need to track cohorts over years. The second is mechanism level work. Kougioumtzis's four mechanism clusters, gesture based representation, perceptual spatial structuring, cognitive offloading, and socially mediated interaction, are a useful map, but more work is needed to say which mechanism is most active in which lesson. Studies that measure gesture, gaze, and body movement together with performance are one way forward. The third is design research at scale. Abrahamson and colleagues' work on embodied design has developed elegant prototypes, but few of these have been tested in ordinary classrooms with ordinary teachers on tight schedules. The next step is to build tools and routines that non specialist teachers can use without heavy training. The fourth is measurement. New assessments that capture embodied understanding, not only symbolic performance, are needed. 
Ryabko and Syzon's embodied conceptual understanding scale is a small example. Broader efforts should follow. Standardized tests will continue to matter, but they should not be the only word. The fifth is #cross_cultural work. Most published studies come from a small set of countries. Kang and colleagues' work with Hispanic students in the United States, combined with translanguaging, shows what cultural responsiveness can add. More studies from different linguistic, cultural, and educational settings are needed. The sixth is careful integration with #artificial_intelligence and immersive technology. As AI powered tutors and virtual reality platforms enter schools, embodied designers have a chance to shape these tools rather than react to them. Seccia and Goldin-Meadow's article on gesture appeared in a theme issue on minds in movement in the age of artificial intelligence, which signals that the field is aware of this crossroads. The risk is that AI tools drive learners deeper into the desk. The opportunity is to build AI tools that recognize gesture, respond to movement, and treat the body as part of the learning system. 12. Rethinking the Very Nature of Mathematics and Physics Concepts A deeper implication of the embodied view is that mathematics and physics concepts themselves may not be what we thought they were. In the older picture, a concept in mathematics is a purely abstract object. The number seven, the derivative, the group Z under addition, all exist in a realm of pure form. Learning is a matter of climbing into that realm. In the embodied picture, concepts are still abstract in the sense that they generalize across cases. But their content is partly made of experiences. Seven has more in common with the felt experience of counting to seven than with any purely formal definition. The derivative has more in common with the felt experience of a curve steepening than with the epsilon delta definition. 
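For reference, the formal statements alluded to here can be written out in the standard textbook form. This is a generic formulation, not one taken from the studies discussed, and the symbols f, a, h, L, ε, and δ are chosen only for illustration.

```latex
% Derivative of f at the point a, as a limit of difference quotients
f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}

% Epsilon-delta reading of that limit: f'(a) = L means
\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \text{such that} \;\;
0 < |h| < \delta \implies
\left| \frac{f(a+h) - f(a)}{h} - L \right| < \varepsilon
```

The difference quotient is the slope between two nearby points on the curve, which is the quantity a learner feels changing as a curve steepens.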
This does not mean that formal definitions are unimportant. It means that the formal definitions rest on a broader base of #sensorimotor_experience. Friedrich and colleagues extend this view to physics with their argument that physical invariants such as gravity, momentum, and friction, which are unchanging features of physical motion, should be added to the list of modalities in which concepts are grounded, alongside sensorimotor, emotional, action, language, and social modalities. Research on physical reasoning consistently demonstrates that physical invariants are represented as fundamentally as other grounding substrates, and therefore should qualify, and the classic grounded cognition theories of simulation and conceptual metaphor are well suited to incorporate them. Conceptual spaces and predictive processing theories are also promising and should be integrated with grounded cognition in the future. If this line of work continues, we may eventually see textbooks that treat physical invariants as first class content rather than as background assumptions. A student learning about momentum would not only see the definition p = mv but would also work through carefully designed embodied experiences that make the invariance of momentum tangible. The same shift is possible in mathematics. A student learning about continuity would not only see the epsilon delta definition but would also engage with the strongly embodied motion based representation that Khatin-Zadeh and colleagues describe, in which continuity is transformed into a motion concept that recruits sensorimotor networks. Symbols would still matter. They would matter more, because they would name something the student had already felt. 13. What a Well Designed Embodied STEM Lesson Looks Like To make the discussion concrete, it helps to picture a well designed embodied STEM lesson from the inside. Consider a lesson on #vector_addition in a first year physics course. 
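The symbolic target of such a lesson is compact: the resultant of two pushes is their component by component sum, which is what the parallelogram rule draws. The following is a minimal sketch of that rule, assuming two hypothetical pushes of 30 N and 40 N at right angles; the function names and the numbers are illustrative and do not come from any of the studies cited.

```python
import math

def force_from_polar(magnitude, angle_deg):
    """Return the (x, y) components of a push from its strength and direction."""
    angle = math.radians(angle_deg)
    return (magnitude * math.cos(angle), magnitude * math.sin(angle))

def resultant(f1, f2):
    """Parallelogram rule: add two vectors component by component."""
    return (f1[0] + f2[0], f1[1] + f2[1])

def to_polar(v):
    """Return (magnitude, direction in degrees) of a vector."""
    return (math.hypot(v[0], v[1]), math.degrees(math.atan2(v[1], v[0])))

# Hypothetical pushes: 30 N along 0 degrees and 40 N along 90 degrees.
push_a = force_from_polar(30.0, 0.0)
push_b = force_from_polar(40.0, 90.0)

magnitude, direction = to_polar(resultant(push_a, push_b))
print(f"combined push: {magnitude:.1f} N at {direction:.1f} degrees")
# prints: combined push: 50.0 N at 53.1 degrees
```

The single magnitude and direction in the output correspond to the one combined push that a person standing between the two pushers would feel.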
In the traditional lesson, the teacher writes two vectors on the board, draws a parallelogram, and derives the resultant. Students copy the diagram and then solve a series of problems that require them to apply the same procedure. There is no movement. The concept lives on paper. In an embodied version, the teacher clears a section of the floor. Two students volunteer. Each takes a broomstick and, on the teacher's cue, pushes a third student, who stands on a low friction platform. The pushes come from different angles. The third student experiences the combined push as a single direction and force. The class discusses what they saw. Then the teacher asks the two pushers to try again, this time with different strengths and angles. The class predicts, from what they saw before, which way the middle student will move. Only after these enactments does the teacher introduce the parallelogram rule and the algebraic representation. The symbols now describe something the class has already felt. Later that week, students use a tablet based motion capture tool to experiment with more complex combinations of forces. Their own movements drive the simulation. The abstract algebra of vectors is now attached to at least three different sensory memories. This kind of lesson is not a fantasy. Elements of it appear in the studies discussed above. Ryabko and Syzon's role play simulations, Kang and colleagues' technology supported embodied activities, and the broader tradition of #interactive_engagement in physics all point in this direction. What is missing, in most schools, is the confidence to make lessons like this the norm rather than the exception. Consider a second lesson, this time on #symmetry_groups in a secondary mathematics class. In the traditional version, the teacher writes the definition of a group, presents axioms, and works through examples. Students memorize the axioms and try to apply them to given sets. 
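As one concrete instance of applying the axioms to a given set, the sketch below checks closure and inverses for the six symmetries of an equilateral triangle. It is an illustration under stated assumptions rather than part of any cited study: each symmetry is encoded as a tuple of corner positions, corner 0 is taken to be the top vertex so that the flip is about the vertical axis, and all names are hypothetical.

```python
# Each symmetry is a tuple p where p[i] is the corner position
# that the vertex starting at position i ends up in.
IDENTITY = (0, 1, 2)
ROTATE = (1, 2, 0)  # rotation by one third of a turn
FLIP = (0, 2, 1)    # flip about the axis through corner 0 (assumed to be the top)

def compose(first, second):
    """Apply `first`, then `second`."""
    return tuple(second[first[i]] for i in range(3))

def generate(generators):
    """Collect every symmetry reachable by repeatedly composing the generators."""
    found = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        current = frontier.pop()
        for g in generators:
            nxt = compose(current, g)
            if nxt not in found:
                found.add(nxt)
                frontier.append(nxt)
    return found

symmetries = generate([ROTATE, FLIP])

# Closure: composing any two symmetries gives another symmetry in the set.
closed = all(compose(a, b) in symmetries for a in symmetries for b in symmetries)

# Inverses: every symmetry is undone by some symmetry in the set.
has_inverses = all(
    any(compose(a, b) == IDENTITY for b in symmetries) for a in symmetries
)

# Order matters: rotating then flipping differs from flipping then rotating.
order_matters = compose(ROTATE, FLIP) != compose(FLIP, ROTATE)

print(len(symmetries), closed, has_inverses, order_matters)
# prints: 6 True True True
```

Associativity needs no separate check here, because composing position maps is associative by construction.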
In an embodied version, students begin by physically rotating and flipping equilateral triangles cut from paper, tracking the positions of the vertices. They notice that the set of possible transformations closes under composition. They notice that some transformations undo others. They give names to what they see: rotation by one third, flip about the vertical axis. Only then does the teacher introduce the formal notation. When the axioms of a group appear on the board, they name features that students have already discovered with their hands. The general shape of these lessons is what Abrahamson and colleagues describe as the design sequence from enactment to formalization. The activity comes first. The formal symbols come second, as a compact record of what the body already understands. The final step, often skipped, is reflection: students discuss how the symbols on the board relate to what they did, and where the mapping might be tight or loose. This design sequence is not the only way to bring the body in. Sometimes an embodied activity is inserted in the middle of a symbolic sequence to break a fixation, and sometimes a whole unit is organized around a rich physical model. But whichever variant is used, the general point stands: a lesson designed with the body in mind has more channels for learning than a lesson designed without it. 14. From Individual Lessons to Whole Systems An embodied approach cannot survive as isolated lessons. It has to shape the whole system in which teaching happens. At the level of curriculum, this means selecting content and sequences that lend themselves to embodied engagement, and giving teachers detailed guides. Not every topic needs a big movement activity. Some topics do. Curriculum designers who understand the theory can tell the difference. At the level of assessment, this means adding tasks that reveal embodied understanding. 
#performance_assessment, in which students physically model or explain a concept using gesture, can be added to written tests without displacing them. At the level of teacher training, this means treating embodied cognition as core knowledge for every future STEM teacher. Programs should include readings, examples of design, and supervised practice leading embodied lessons. At the level of school culture, this means valuing lessons that look active. It is easy for administrators to mistake a quiet classroom for a productive one. A school that takes embodied learning seriously will train leaders to recognize the difference between purposeful movement and chaos. At the level of policy, this means research funding for design and measurement, and space for local experimentation. Waiting for perfect certainty is not neutral. It is a choice to keep the desk bound model in place. Roberts and colleagues' recommendation, based on their analysis of a middle school science project through six tenets of embodied cognition, is that STEM pedagogy at large would benefit from incorporating those tenets. 15. Objections and Replies An article that argues for a substantial shift in teaching practice must face objections seriously. The first objection is that time is limited and adding embodied activities may take time away from content that already does not fit. A well designed embodied activity is not extra. It is instruction. Time spent walking out a graph is time teaching graphs. Studies such as Ryabko and Syzon's, in which the embodied methodology produced measurably better learning than the conventional curriculum, suggest the trade is worth it. The second objection is that classrooms are too crowded for movement. This has real force in many settings, but small changes help: clearing a small area, using folding desks, or using outdoor spaces. Even seated students can perform gesture activities and short partner enactments. Full embodiment is a scale, not a switch. 
The third objection is that some concepts are beyond bodily grounding. What could the body do with the Cantor set or quantum superposition? Following Glenberg, even very abstract concepts can be grounded through successive levels of grounding within an extended procedure, though this requires formal instruction because the sequence has to be designed by a teacher. Grounding may pass through metaphor and gesture. A purely disembodied approach is a choice, not a necessity. The fourth objection is that assessments do not reward embodied understanding, so teachers who invest in it may hurt their students on the tests that matter. The same critique was once made about conceptual understanding of any kind in physics, and the field slowly built assessments that recognize it. The path forward is to build embodied dimensions into new assessments. The fifth objection is that the evidence is still limited. Kougioumtzis's review is explicit about heterogeneity and the need for conceptual clarification. But the current state of the evidence justifies serious investment, not sticking with a status quo that has well documented weaknesses. 16. Conclusion The traditional #desk_bound instructional model in STEM education is not a natural or neutral choice. It reflects an older view of the mind that treated the body as separate from thought. That view has been challenged, hard, by three decades of work in embodied cognition. Recent studies confirm that when students learn mathematics and physics through gesture, kinesthetic activity, and physical modeling, they build stronger, more portable, and more durable understanding. The benefits are real but conditional. Movement helps most when it is aligned with the target concept, when it does not overload the learner, and when it is followed by reflection and symbolization. Random activity is not embodied learning. #purposeful_movement is. Practically, an embodied STEM classroom looks different from the traditional one. 
There is more open space. There is more gesture from the teacher and more gesture from the students. Lessons often begin with an enactment and only later name the symbols. Technology, when used, invites movement rather than reinforcing stillness. Assessments try to catch what the body knows, alongside what the symbol pushing hand can do. For students, the invitation is to notice how their bodies are already involved in their thinking. The hands that trace a graph in the air are not accidents. They are part of understanding. For teachers, the invitation is to design lessons that use that fact on purpose. For researchers, the invitation is to keep clarifying the mechanisms and boundary conditions, so that the field can offer teachers reliable guidance. The mind is not a spectator locked inside the skull. It reaches out through the eyes, the ears, and the moving body. When STEM education takes that seriously, abstract concepts become less abstract, and students left behind by the desk bound model find more ways in. That is the case for #embodied_cognition in mathematics and physics. #embodied_cognition #STEM_education #kinesthetic_learning #physical_movement #abstract_concepts #mathematics_learning #physics_education #gesture #retention #transfer #embodied_design #sensorimotor_learning #active_learning #cognitive_science #pedagogy References Abrahamson, D. (2024). Embodied design: Bringing forth mathematical perceptions. In S. L. Macrine and J. M. B. Fugate (Eds.), Handbook of embodied cognition and learning (forthcoming volume chapter). Abrahamson, D., Nathan, M. J., Williams-Pierce, C., Walkington, C. A., Ottmar, E. R., Soto, H., and Alibali, M. W. (2020). The future of embodied design for mathematics teaching and learning. Frontiers in Education, 5, 147. https://doi.org/10.3389/feduc.2020.00147 De Freitas, E., Ferrara, F., and Ferrari, G. (2023). Assembling mathematical concepts through trans-individual coordinated movements: The role of affect and sympathy. 
Educational Studies in Mathematics, 112(2), 289 to 307. https://doi.org/10.1007/s10649-022-10201-0 Drumea, B. (2022). Models of motion in abstract linear algebra. Journal of Mathematics and Physics Research, 6(1), 12 to 24. Friedrich, J., Fischer, M. H., and Raab, M. (2024). Invariant representations in abstract concept grounding: The physical world in grounded cognition. Psychonomic Bulletin and Review, 31(6), 2558 to 2577. https://doi.org/10.3758/s13423-024-02522-3 Geary, D. C. (2026). The evolved system for conceptual understanding: Implications for mathematical development. Journal of Numerical Cognition, 12(1), advance online publication. Glenberg, A. M. (2021). Embodiment and learning of abstract concepts such as algebraic topology and regression to the mean. Psychological Research, 86(8), 2135 to 2145. https://doi.org/10.1007/s00426-021-01576-5 Gordon, R., and Ramani, G. B. (2021). Integrating embodied cognition and information processing: A combined model of the role of gesture in children's mathematical environments. Frontiers in Psychology, 12, 650286. https://doi.org/10.3389/fpsyg.2021.650286 Hiscott, L. (2022). Conceptual juggling. Physics World, 35(9), 30 to 33. Kang, S. H., Aguilar, J. J., and Kim, S. (2026). An integrated pedagogical approach to enhance Hispanic elementary students' STEM learning and motivation. Education Sciences, advance online publication. Khatin-Zadeh, O., Eskandari, Z., and Farsani, D. (2023). The roles of mathematical metaphors and gestures in the understanding of abstract mathematical concepts. Journal of Humanistic Mathematics, 13(1), 234 to 250. https://doi.org/10.5642/jhummath.QUKX7583 Khatin-Zadeh, O., Farsani, D., Hu, J., and Marmolejo-Ramos, F. (2023). The role of perceptual and action effector strength of graphs and bases of mathematical metaphors in the metaphorical processing of mathematical concepts. Frontiers in Psychology, 14, 1178095. 
https://doi.org/10.3389/fpsyg.2023.1178095 Khatin-Zadeh, O., Farsani, D., and Yazdani-Fazlabadi, B. (2022). Understanding the dis-embodied representation of continuity in terms of a strongly embodied representation. Cogent Education, 9(1), 2141516. https://doi.org/10.1080/2331186X.2022.2141516 Kougioumtzis, K. (2026). Embodied cognition in STEM learning: An integrative review of conceptualizations, mechanisms, and boundary conditions. Educational Research Review, advance online publication. Macrine, S. L., and Fugate, J. M. B. (2020). Embodied cognition and its educational significance. In Oxford Research Encyclopedia of Education. Oxford University Press. Roberts, J., Williams, J., Hodgdon, R., Payne, C., and Emmanuelli, G. (2024). Embodied cognition and teaching STEM: Tenets to explain and enhance a middle school science project. Current Issues in Middle Level Education, 29(1), 15 to 27. Ryabko, A. V., and Syzon, O. (2025). Kinaesthetic methodology as a tool for fostering embodied conceptual understanding of physics. Scientific Bulletin of Mukachevo State University. Series Pedagogy and Psychology, 11(4), 44 to 56. Seccia, A., and Goldin-Meadow, S. (2024). Gestures can help children learn mathematics: How researchers can work with teachers to make gesture studies applicable to classrooms. Philosophical Transactions of the Royal Society B, 379(1911), 20230156. https://doi.org/10.1098/rstb.2023.0156 Thur, A., Kertesz, T., and Fugedi, B. (2026). The relationship between movement, mathematics, and logical thinking. Journal of Human Sport and Exercise, advance online publication. Wang, H. (2025). Mathematical meaning-making of the volume of a sphere in an augmented paper-based mathematics learning environment from an embodied cognition perspective: A gesture-centered analysis. Educational Studies in Mathematics, advance online publication. Wilcox, B. R., Pollock, S. J., and Bolton, D. R. (2020). 
Retention of conceptual learning after an interactive introductory mechanics course. Physical Review Physics Education Research, 16(1), 010140. https://doi.org/10.1103/PhysRevPhysEducRes.16.010140 Zhang, I., Son, J. Y., and Stigler, J. W. (2025). Toward a cognitive developmental theory of embodied learning in STEM domains. Educational Psychology Review, 37(2), advance online publication. #learning_by_doing #hands_on_learning #whole_body_learning #mind_body_connection #teacher_education #cognitive_load #conceptual_understanding #interactive_engagement #haptic_feedback #mathematical_metaphor #spatial_reasoning #inclusive_education #culturally_responsive_teaching #curriculum_design #experiential_learning #multimodal_learning #geometry_learning #algebra_learning #embodied_STEM #future_of_education

  • The Gamification Fatigue: When Extrinsic Rewards Erode Intrinsic Motivation

    This paper looks at a growing problem in modern classrooms known as #gamification_fatigue. Over the last decade, teachers and course designers have added #badges, #points, and #leaderboards to almost every kind of #learning platform, from primary school reading apps to university #MOOCs. Early results looked promising, and students seemed excited. But a closer look at longer studies shows a different picture. #Student_engagement often rises quickly in the first few weeks, then falls in a pattern researchers now call the #decay_rate of gamified engagement. This article draws on a growing body of #longitudinal work published between 2021 and 2026 to map that decay curve, explain why it happens, and argue for a shift in design philosophy. The paper claims that many current gamified systems are built on old #behaviorist thinking, which treats students like pigeons pecking for pellets. This model rewards surface behavior and often crowds out the deeper drive to learn. A better path, supported by recent empirical work, is to root gamified pedagogy in #self_determination_theory, which focuses on #autonomy, #competence, and #relatedness. When these three needs are met, motivation lasts longer, learning goes deeper, and the fatigue effect weakens. The paper closes with practical guidance for teachers, instructional designers, and platform builders who want to move past the point-and-badge era. Keywords: gamification fatigue, intrinsic motivation, extrinsic rewards, self-determination theory, longitudinal engagement, badges, points, leaderboards, decay rate, pedagogical design. 1. Introduction Walk into almost any digital classroom today and you will see some form of #game_element on the screen. A learner opens an app and sees a small trophy for logging in three days in a row. A student in a language course watches a #streak counter tick upward. A university student earns a bronze, silver, or gold #badge for finishing weekly tasks. 
On the class dashboard, names climb up or fall down a public #leaderboard. These features were sold to educators as a cure for boredom, a way to keep #digital_learners glued to the material in a world full of distractions. For a while, this promise seemed reasonable. Many short studies showed that adding game elements to a course increased attendance, task completion, and self-reported enjoyment. Meta-analyses have confirmed that, on average, gamified conditions produce small but real gains in intrinsic motivation compared with non-gamified conditions (Li et al., 2024). New tools appeared every year, and both teachers and administrators grew comfortable with the idea that if you wanted more #engagement, you added more #points. But researchers began to notice a pattern. When they followed students beyond a few weeks, the shiny effect of gamification faded. Some studies described this as the #novelty_effect, the idea that anything new gets attention just because it is new. Others found something more complicated. Engagement did not just drop back to the baseline; in some cases, it fell below it. Students started to feel tired, cynical, or even angry at the systems that once excited them. In short, they showed signs of what this paper calls #gamification_fatigue. The problem is not that game elements are useless. The problem is that the way most systems are designed puts almost all their weight on #extrinsic_rewards, that is, external prizes that live outside the task itself. Decades of motivation research warn that when learners begin to see an activity as a way to earn a reward rather than as something worth doing on its own, their inner drive to keep going often weakens. This is sometimes called the #overjustification_effect, and it is one of the oldest debates in educational psychology. Recent reviews suggest that gamified classrooms are quietly repeating this old mistake at a very large scale (Bardach and Murayama, 2025; Jose et al., 2024). 
This article has three aims. First, it maps what recent longitudinal studies say about the shape and speed of the engagement decay curve when students are exposed to badges, points, and leaderboards over weeks and months. Second, it explains why the theory behind most of these designs, which is often #behaviorist at heart, tends to produce fatigue rather than lasting interest. Third, it argues that pedagogical design should be rebuilt around #self_determination_theory (SDT), a framework proposed by Deci and Ryan and now widely tested. Under this view, students carry three inner needs at all times: autonomy, competence, and relatedness. When gamified designs feed these three needs, they support intrinsic motivation. When they crush them, learners feel controlled, incompetent, or isolated, and the fatigue sets in. The article is written for students who study education, instructional design, learning sciences, and educational technology. It uses plain language, but it is structured like a #Scopus_level journal paper so that readers can also use it as a model for their own writing. 2. Literature Review 2.1 From behaviorist roots to modern gamification The idea that behavior can be shaped by external rewards is very old. It sits at the heart of #behaviorism, especially the operant work associated with B. F. Skinner. In this view, if you want more of a behavior, you attach a reward to it, and if you want less, you attach a cost. Gamified learning systems, whether their designers admit it or not, often inherit this logic. Points act as small food pellets. Badges act as visible tokens. Leaderboards act as public praise. The core assumption is that #external_reinforcement will produce more of the desired behavior, in this case study time or task completion. This design philosophy became popular partly because it is easy to implement. Any learning management system can be configured to award points and rank users. It is also easy to measure. 
Clicks, logins, and completed quizzes all produce numbers that look impressive in reports. But this same simplicity is a warning sign. Learning is a complex process that involves attention, effort, feeling, and identity. A model that treats it as a chain of reward-driven acts risks missing everything that matters about it (Jose et al., 2024). Recent syntheses argue that most gamified courses focus on the same three surface tools, points, badges, and leaderboards, sometimes called the PBL triad. A systematic review of gamified learning between 2010 and 2022 found that these three elements dominated the field and that most studies focused on short-term outcomes rather than long-term engagement (Ratinho and Martins, 2023). Very few designers tested what happens after the excitement wears off. 2.2 The rise of self-determination theory in education Against this behaviorist current, #self_determination_theory has been the leading alternative for describing what makes learning stick. SDT states that humans have three basic psychological needs: autonomy, which is the feeling of choosing one's own path; competence, which is the feeling of being effective; and relatedness, which is the feeling of belonging with others. When any activity supports these needs, people tend to internalize it and pursue it for its own sake (Luria, 2022). Applied to gamified classrooms, SDT predicts that badges and points can be useful, but only if they are designed as #informational feedback rather than as tools of control. A points system that tells learners they are making real progress toward mastery, and lets them choose which tasks to tackle, can support competence and autonomy. A points system that publicly ranks learners against each other and forces them into set paths tends to feel controlling, and it undermines the same needs it was supposed to serve (Jones et al., 2022; Luarn et al., 2023). 
A meta-analysis of 35 gamification interventions found that gamified learning produced positive effects on students' sense of autonomy and relatedness, but had only a small impact on perceived competence (Li et al., 2024). The authors concluded that many designs fail to make students feel truly capable, because points and badges reward speed and completion rather than genuine understanding. A structural equation modeling study with 615 university students in Vietnam found a partly different pattern: gamification directly improved feelings of autonomy and competence but did not clearly increase relatedness, while artificial intelligence support helped all three needs (Nguyen-Viet and Doan, 2026). Taken together, the two studies agree that gamification does not satisfy all three needs equally, although they differ on which need is left behind. 2.3 The novelty effect and early evidence of decay The clearest sign that gamification does not automatically last is the #novelty_effect. In one influential longitudinal study, Rodrigues and colleagues tracked 756 STEM students in an introductory programming course over 14 weeks. They compared a gamified version of a system with a non-gamified version and measured behavior at seven points in time. Their results showed a U-shaped pattern. Engagement in the gamified group started strong, then began to fall after about four weeks, and stayed lower for two to six weeks. Interestingly, engagement then partly recovered between six and 10 weeks, which the authors called a #familiarization_effect (Rodrigues et al., 2022). This pattern matters because it undermines the common assumption that whatever benefit a gamified system provides in week one will simply continue. It also shows that decay is real but not always permanent. The right design choices, and time itself, can help learners settle into the system. 
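The U-shaped pattern described above can be made concrete with a small sketch. The weekly scores below are invented for illustration and are not data from Rodrigues et al. (2022); the function name `describe_curve`, the 0 to 100 scale, and the 14-value series are all assumptions. The function only locates the low point of a series and reports how much of the initial drop has been regained by the final week.

```python
from typing import Dict, List


def describe_curve(weekly: List[float]) -> Dict[str, float]:
    """Locate the low point of a weekly engagement series and measure recovery.

    Returns the 1-based week of the trough, the drop from the first week to
    the trough, and the share of that drop regained by the final week.
    """
    if len(weekly) < 3:
        raise ValueError("need at least three weekly observations")
    start = weekly[0]
    trough_value = min(weekly)
    trough_week = weekly.index(trough_value) + 1  # first week at the minimum
    drop = start - trough_value
    regained = weekly[-1] - trough_value
    # If engagement never fell below the first week, there is nothing to regain.
    recovery_share = regained / drop if drop > 0 else 0.0
    return {
        "trough_week": trough_week,
        "drop": drop,
        "recovery_share": recovery_share,
    }


# Hypothetical engagement scores (0 to 100) for a 14-week course:
# a strong start, a dip after week four, and a partial recovery.
hypothetical = [80, 82, 81, 78, 70, 62, 58, 57, 59, 62, 65, 67, 68, 68]
summary = describe_curve(hypothetical)
print(summary)
```

In this made-up series the trough falls in week 8 and a little under half of the drop is regained by week 14, which is the kind of partial recovery the text calls a familiarization effect. A recovery share near zero would instead describe a decline with no recovery.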
Similar novelty and re-adjustment patterns have appeared in distance learning during the COVID-19 pandemic, where gamification produced moderate benefits for ICT students but no additional advantage over normal practice tests in business courses (Kratochvil et al., 2023). Other longitudinal studies point in the same direction. A six-week field study of a #tailored gamification system found that engaged behaviors gradually decreased over time in every condition, but tailoring the game elements to the learner slowed this decrease (Serna et al., 2023). A separate case study of point-based gamification in a university learning management system found clear novelty effects and argued that #point_systems must be designed with attention to how learners perceive their value and meaning (Berglund and Jedel, 2023). 2.4 Fatigue as a distinct phenomenon Novelty decay is only part of the story. A more troubling picture appears when researchers focus on #user_fatigue as a separate outcome. Yang and colleagues used the transactional theory of stress and coping to explore the negative side of social gamification. They surveyed 450 users and found that competitive and interactive game elements were positively related to #reputation_maintenance concerns, which in turn drove #fear_of_missing_out and full user fatigue. Achievers and socializers experienced these effects differently, but both groups showed clear signs of stress from prolonged exposure to gamified competition (Yang et al., 2024). An editorial in a leading gamification journal warned that gamification can push users toward #overconsumption, #exhaustion, and even #manipulation, especially when designers use dark patterns to maximize retention. Competition-related elements can conflict with fairness, and rigid rules can restrict autonomy in ways that harm long-term well-being (Xi et al., 2026). 
A separate qualitative study of the popular Classcraft platform in Italian secondary schools concluded that the system did increase engagement, but it did so through extrinsic motivation, not through any lasting shift toward inner interest in learning (Brambilla et al., 2025). 2.5 The specific problem of leaderboards Leaderboards are the sharpest example of the #extrinsic_reward problem. They make performance public, forced, and comparative. A 14-week study of an EFL course at a Japanese university found something striking. Students in the class with a leaderboard actually did less voluntary homework than students in the class without one, once they had cleared the minimum reward threshold. The authors concluded that the leaderboard shifted #internally_regulated extrinsic motivation into #externally_controlled extrinsic motivation, and it undermined intrinsic language motivation more than it supported it (Philpott and Son, 2022). A recent experimental study with 427 participants tested how different types of leaderboard feedback affect motivation and performance. Learners who saw themselves in higher positions with upward trends reported the highest intrinsic motivation. Those with lower positions and downward trends reported lower motivation. Cognitive performance changed very little. The authors warned that negative leaderboard feedback can be worse than no feedback at all (Pickal et al., 2026). An earlier experimental study in physics classes found something similar. Badges and leaderboards had no significant effect on academic performance, though most students said they found them motivating and wanted to see them again (Balci et al., 2022). This is a common pattern in the literature. Learners say they like gamification, but the actual learning gains are much smaller than expected. This gap between #perceived_engagement and #real_learning is one of the strongest warning signs that current designs are not doing what they claim to do. 
2.6 The specific problem of badges and points Badges look softer than leaderboards because they seem to reward the individual rather than force social comparison. But they carry their own risks. A study of gamified feedback in an adaptive retrieval practice system found that points and progress bars enhanced self-reported feelings of competence, enjoyment, and task value, but had no effect on learning behaviors during practice and no effect on delayed recall (van den Broek et al., 2025). In other words, learners felt better while learning the same amount. That is a valuable finding, but it is not the promise most gamification vendors make when they sell their systems to schools. A study on Duolingo used cognitive evaluation theory to look at how learners interpret virtual rewards. When students saw rewards as #informational, telling them something meaningful about their progress, they experienced greater competence and stronger intrinsic motivation. When rewards felt controlling or empty, motivation dropped. The authors argued that #virtual_rewards should not be treated as a simple use-or-avoid decision but should be redesigned to give clear positive feedback (Jedel and Palmquist, 2026). Recent theoretical work by Bardach and Murayama tries to move past the old fight between rewards being good or bad. Their reward-learning framework of knowledge acquisition suggests that extrinsic rewards can serve as an entry point that helps learners begin engaging with the material. Once a positive feedback loop of internally rewarding learning has formed, however, continued external rewards may interrupt it and undermine long-term engagement (Bardach and Murayama, 2025). This nuanced view fits well with the fatigue pattern reported in longitudinal studies. 2.7 The overjustification effect and its return One of the oldest ideas in motivation research is the overjustification effect. 
Classic studies from the 1970s showed that children who once drew pictures for the pure pleasure of it drew less often when they were paid to draw. The external reward changed how the children saw the activity. It stopped being play and became work. Later reviews confirmed this pattern across many contexts, though with important nuances about how the reward was framed and delivered. What is striking about the current gamification literature is how often this old lesson reappears in new clothes. Points, badges, and leaderboards are simply new packaging for external rewards, and they carry the same risks. When a student begins a course with some natural interest in the topic, and the course adds an aggressive point system, the student may start to see study time as a way to earn points rather than as a way to learn. Once the points end, the interest often fades faster than it would have without them. Several of the reviewed studies point to this exact mechanism. In the Duolingo work by Jedel and Palmquist (2026), rewards perceived as controlling did not support intrinsic motivation, while rewards perceived as informational did. The 14-week EFL leaderboard study showed the classic overjustification pattern, where students stopped studying once the reward threshold was met (Philpott and Son, 2022). The theoretical review by Bardach and Murayama (2025) went further, arguing that extrinsic rewards can serve as a useful entry point but must fade or transform if long-term engagement is to survive. This is very different from the standard practice of leaving points and leaderboards running all semester at full strength. 2.8 What the field has learned Across these strands of research, a few common threads appear. First, gamification generally helps in the short run, especially for previously unmotivated learners. Second, its effect on core learning outcomes such as retention and transfer is much smaller than its effect on perceived engagement. 
Third, its effect fades over time, sometimes returning through familiarization and sometimes becoming full fatigue. Fourth, the mechanism of decay is often tied to how the design treats the three basic psychological needs from self-determination theory. When these needs are supported, the design lasts. When they are crushed, the design collapses. Fifth, the shape of the fatigue curve varies with learner type, teacher framing, and cultural context, which means no single design works for every setting. These findings define the space this paper will examine. The next section describes how a longitudinal review was carried out to bring together the pieces of evidence and to map the shape of gamified engagement decay across contexts. 3. Method 3.1 Design overview This article uses a #longitudinal_narrative_synthesis of recent empirical studies published between 2021 and 2026. Because the target phenomenon, gamification fatigue, unfolds across time, we selected studies with follow-up periods of at least four weeks in real learning environments. Purely one-shot experiments were used only for background or as supporting evidence. The primary interest was in how #engagement_metrics change over time and how those changes relate to underlying motivational drivers. The review does not run new statistical models on external data. Instead, it treats each included study as a data point and looks for a shared pattern across contexts. This approach follows the tradition of a #critical_narrative_review, which is common in learning sciences when the literature is too varied for a full meta-analysis but rich enough to reveal a consistent trend. 3.2 Source selection Three broad literature streams were searched: gamification in higher education, gamified language learning, and gamification in online and distance learning. 
Searches focused on studies with keywords such as longitudinal, decay, novelty effect, self-determination theory, intrinsic motivation, badges, points, and leaderboards. Only studies indexed in reputable journals or peer-reviewed venues were included. Studies had to meet three conditions. First, they had to be published in 2021 or later. Second, they had to report either a longitudinal design or a controlled comparison between gamified and non-gamified conditions in a real course setting. Third, they had to include either behavioral or psychological measures of #student_motivation or engagement. Studies focused only on perception at a single point in time were used to enrich context but were not treated as core evidence. 3.3 Analytic focus Three analytic questions guided the review. What is the typical shape of the engagement decay curve? Which #game_elements are most linked with fatigue and which are most linked with sustained interest? How do the three psychological needs from SDT explain the observed patterns? For each included study, we noted the #study_length, the sample size, the level of education, the specific game elements used, and the outcome measure. Findings were then grouped by outcome type. Perceived engagement, actual behavior, learning outcomes, and psychological need satisfaction were treated as different but linked outcomes. This allowed us to see, for example, that perceived enjoyment often rose even when actual performance did not. 3.4 Ethical considerations Because this article synthesizes public research and does not collect new participant data, it did not require formal #ethics_approval. Where the primary studies collected data from minors or vulnerable groups, we noted whether they reported approval and consent. In all included studies, standard institutional procedures were reported. 4. Results 4.1 The shape of the decay curve Across the reviewed studies, engagement in gamified conditions rarely followed a straight line. 
Three shapes appeared most often. The first shape is the classic #novelty_curve. Engagement spikes in the first two to four weeks, drops sharply between weeks four and eight, and then partly recovers as students grow used to the system. This is the U-shape reported by Rodrigues and colleagues in a 14-week study of programming students (Rodrigues et al., 2022). Similar patterns were found in a distance learning setting during the pandemic, where gamified conditions produced better outcomes than plain courses for ICT students but faded when compared with well-designed practice tests (Kratochvil et al., 2023). The second shape is a #steady_decline. Here, engagement drops slowly and does not recover. A six-week study of tailored gamification found that engaged behaviors decreased for every group, though the decline was slower for learners whose game elements matched their preferences (Serna et al., 2023). This suggests that tailoring is not a full cure, but it can flatten the fatigue curve. The third shape is what might be called a #split_curve. Perceived engagement stays high, but actual learning behavior does not. In the gamified retrieval practice study by van den Broek and colleagues, learners reported greater competence and enjoyment while working with points and progress bars, but their delayed recall of the material was not different from the control condition (van den Broek et al., 2025). This split is dangerous because it hides fatigue behind smiling survey responses. 4.2 The role of game elements The reviewed literature suggests that not all game elements decay at the same rate. Leaderboards decay fastest and hardest. Their public and comparative nature builds pressure, and once the reward threshold is crossed, motivation to exceed it often drops. A quasi-experimental study in EFL classes found that a leaderboard actually reduced voluntary work relative to a class without one, once the required points had been earned (Philpott and Son, 2022). 
Experimental work confirmed that leaderboard-based feedback can shift motivation up or down depending on position and trend, and that negative feedback can harm more than no feedback at all (Pickal et al., 2026). Points decay more slowly, but their effect is often shallow. A university study of #point_based gamification found that students perceived points positively when they carried value and meaning, but not when they felt arbitrary (Berglund and Jedel, 2023). The Duolingo study on virtual rewards found that only points interpreted as informational supported competence and intrinsic motivation, while other interpretations did not (Jedel and Palmquist, 2026). Badges show a mixed pattern. In some studies, they boosted early motivation but did not translate into stronger academic performance (Balci et al., 2022). In others, they seemed to help learners set personal goals when tied to meaningful achievements. Their weakness is that they easily turn into collectibles disconnected from real learning, which increases the risk of surface engagement. Social features such as guilds, missions, and cooperative quests appear more resistant to fatigue. Systems that build in collaboration and story elements tend to sustain interest longer, especially when they support learners' sense of belonging. Reviews of long-term gamification argue that success depends on a mix of tailored personalization, evolving challenges, collaborative features, story-driven content, and dynamic updates (Huang et al., 2024). 4.3 Motivational drivers behind the decay The reviewed studies point strongly to self-determination theory as the best current explanation for why gamification decays. When autonomy is missing, students feel controlled. This tends to appear when systems force paths, publish rankings, or punish deviation. 
In the leaderboard EFL study, once students hit the minimum required points, most stopped, because they did not feel they were choosing to keep going but were simply obeying the reward system (Philpott and Son, 2022). Similar patterns appeared in a qualitative study of Classcraft, where teachers used the platform to shape behavior, and students engaged mostly for extrinsic reasons (Brambilla et al., 2025). When competence is missing, students feel that the rewards do not reflect real growth. This is why gamified retrieval practice can boost feelings of competence without boosting actual learning (van den Broek et al., 2025). Feelings of competence based only on badge counts are fragile. Once the badges lose their glow, competence collapses. A meta-analysis found that gamification had strong effects on autonomy and relatedness but only weak effects on true competence, and that missing competence support was a major reason for weak intrinsic motivation gains (Li et al., 2024). When the need for connection is missing, students feel isolated in the crowd. This may sound strange in a system where everyone sees each other on a leaderboard, but public ranking without meaningful connection often makes users feel visible rather than connected. Social gamification with strong competitive design has been shown to increase fear of missing out and reputation maintenance concerns, both of which drive fatigue (Yang et al., 2024). In contrast, cooperative systems that fulfill relatedness show longer-lasting engagement (Nguyen-Viet and Doan, 2026). 4.4 Individual differences and cultural factors Not all learners respond in the same way. Player type is a strong moderator. Achievers seem more affected by competitive pressure, while socializers are more affected by interactive pressure (Yang et al., 2024). In one physics study, most students said they liked badges and leaderboards, but only a subset actually performed better with them (Balci et al., 2022). 
This gap between reported preference and real gain is important because it shows that #student_voice, while valuable, is not enough on its own. Cultural context also matters. A study of Vietnamese university students found that both gamification and artificial intelligence support helped learners meet their psychological needs, but the effect on relatedness came mainly from AI rather than from game elements (Nguyen-Viet and Doan, 2026). This suggests that in some contexts, technology-mediated relationships may need extra scaffolding to feel real. Similar caution appears in studies of Italian secondary schools, where teacher framing was crucial for how students interpreted the same platform (Brambilla et al., 2025). Prior gaming experience is another factor. A recent study of serious games in engineering education found that novelty effects and player background jointly shaped how students responded to gamified content, with experienced gamers adapting more quickly but also expressing more critical views (Romero Rodriguez et al., 2025). 4.5 Design features that resist fatigue Several design elements seem to reduce fatigue in the reviewed studies. The first is #meaningfulness. When game elements are tied to real learning goals rather than to arbitrary point counts, motivation lasts longer. Gupta and Goyal built a gamified business school course based on self-determination theory and found that adding meaningfulness to the game elements improved engagement and produced better learning outcomes than either standard courses or arbitrary gamified courses (Gupta and Goyal, 2022). The second is #tailoring. Adjusting elements to fit learner preferences slows the decay curve, even if it does not fully stop it (Serna et al., 2023). The third is #adaptive_feedback. Systems that adjust to the learner and interpret rewards as informational tend to protect competence and support intrinsic motivation (Jedel and Palmquist, 2026). 
Feedback that is public and comparative, on the other hand, tends to harm autonomy. The fourth is #cooperative_narrative. Story-driven and cooperative elements support relatedness and reduce the pure focus on individual scores (Huang et al., 2024). The fifth is #balance. Systems that mix intrinsic and extrinsic elements outperform those that rely on external rewards alone. A conceptual review argued that gamification designs need a #balanced_fulcrum between intrinsic motivation and extrinsic rewards to be sustainable, and that current systems tilt heavily toward the extrinsic side (Dah et al., 2023). 4.6 Evidence on psychological need satisfaction Recent studies based on structural equation modeling (SEM) deepen the picture. A study with 201 undergraduate students found that autonomy, competence, and relatedness in gamified learning environments strongly predicted motivation, which in turn predicted student performance. Motivation acted as a full mediator between psychological need satisfaction and academic outcomes (Hashim, 2026). This is one of the clearest empirical demonstrations that need satisfaction is not just a nice add-on but the actual mechanism through which gamification produces or fails to produce learning gains. A separate SDT-based study of gamified EFL learners built a model showing that gamification supports intrinsic motivation only when it feeds all three needs, and that any single need being blocked disrupts the entire chain (Phi et al., 2025). This is important because many designs try to boost one need, usually competence through badges, while ignoring the others. Together, these findings support the argument that gamified pedagogical design should treat #need_satisfaction as the primary target and treat points, badges, and leaderboards as tools that may or may not help depending on how they are used. 5. Discussion 5.1 What the decay curve tells us The main lesson from this review is that #student_engagement in gamified systems is not a fixed property. 
It moves, and it moves in predictable ways. In the first few weeks, novelty carries the design. Between weeks four and eight, the burden shifts to whatever real value the system provides. If the design supports the three needs from #self_determination_theory, engagement can stabilize and even climb again. If it does not, engagement falls, and in some cases fatigue develops. This has serious implications for how success in gamification is measured. Many program reports and product pitches show only short-term data, often two or three weeks. Based on the reviewed evidence, this is exactly the window where the novelty effect is strongest and least meaningful. Serious evaluation requires at least a full semester of data, ideally with several measurement points, following the model used by Rodrigues and colleagues (2022) and Serna and colleagues (2023). 5.2 The behaviorist trap Many current gamified systems inherit their logic from behaviorism without saying so. They assume that behavior follows rewards in a simple way and that scaling up rewards will scale up learning. The evidence gathered here shows that this assumption is wrong in three ways. First, rewards can raise perceived engagement while leaving actual learning flat. This is the split curve from van den Broek and colleagues (2025). Second, rewards can raise short-term behavior while shrinking long-term interest. This is the leaderboard threshold effect from Philpott and Son (2022). Third, rewards can raise reported enjoyment while quietly building stress, especially in competitive designs. This is the fatigue effect from Yang and colleagues (2024). These three findings together suggest that the behaviorist model is not a strong base for #educational_gamification. It gives designers the illusion of control while missing the deeper psychology of the learner. 
It also invites the ethical concerns raised in recent editorials about manipulation, overconsumption, and #dark_patterns in gamified systems (Xi et al., 2026). 5.3 The self-determination alternative The alternative is to design from the inside out. Instead of asking what rewards will produce the desired clicks, designers can ask what conditions will support a student's inner drive to keep learning. Self-determination theory provides a clear starting point. For autonomy, the design must give real choices. Students should be able to select tasks, decide the order of activities, and set their own goals. Points and badges can support autonomy if they mark progress toward self-chosen goals. They undermine autonomy if they force uniform paths. For competence, the design must provide feedback that reflects real growth. Systems that make progress visible through clear, informational rewards are more effective than systems that hand out points for meaningless activity. This is the informational-versus-controlling distinction that runs through cognitive evaluation theory and its recent applications to gamified learning (Jedel and Palmquist, 2026). For relatedness, the design must support genuine social ties. This can mean team quests, cooperative missions, peer feedback, mentor relationships, or shared story arcs. It does not mean public leaderboards. Leaderboards produce visibility, not connection. In fact, they can make many students feel more alone. When these three needs are met, motivation can shift from #external_regulation, where students act only to earn or avoid rewards, to #identified_regulation and eventually to intrinsic motivation, where students act because the activity matters to them. This shift is the true goal of gamified pedagogy. 5.4 Why perception surveys can lie One striking finding across the reviewed studies is that student self-reports of enjoyment often stay high even as real engagement or learning drops. This is a real problem for evaluation. 
Many gamified courses are judged on satisfaction surveys, which are exactly the kind of measure most likely to overstate benefit. A better evaluation approach uses multiple data streams: behavioral logs, learning outcomes at delayed intervals, need satisfaction scales, and qualitative feedback. This mix can catch the split curve where students say they like the system but do not actually learn more (van den Broek et al., 2025). It can also detect fatigue signals, such as declining voluntary work after the reward threshold, that pure satisfaction surveys miss (Philpott and Son, 2022). 5.5 The role of teachers Technology never operates in a vacuum. In every study reviewed, the teacher's framing of the gamified system played a major role. In some cases, teachers introduced the system as a tool for growth, and students engaged in ways consistent with self-determination theory. In others, teachers used it as a management device, and students learned to play the system for grades or to avoid trouble (Brambilla et al., 2025). Effective gamified pedagogy requires teacher training that goes beyond how to use the tool and into how to talk about it with learners. This is especially important for university faculty and course designers who assume the technology will do the work by itself. It will not. If the classroom culture stays behaviorist, adding a leaderboard will only sharpen the extrinsic focus and speed up fatigue. 5.6 Emerging directions and hybrid designs Some recent studies point toward hybrid designs that combine gamification with #artificial_intelligence, #adaptive_learning, and generative tools. In one large study of Vietnamese university students, AI support boosted feelings of relatedness in ways that gamification alone could not (Nguyen-Viet and Doan, 2026). Meanwhile, gamification supported autonomy and competence. 
This complementary pattern suggests that the next generation of pedagogical design may combine multiple tools, each aimed at a different psychological need. Other work suggests that #ethical_design must be treated as a first-order requirement, not an afterthought. Gamified systems can nudge users toward goals that are useful or harmful, sustainable or wasteful. Designers should be clear about the ends and honest about the means (Xi et al., 2026). For education, this means asking whether the reward structure supports learning goals or hijacks them. Finally, a growing set of studies argue that the very idea of gamification may be evolving. Instead of adding a fixed set of points and badges to a normal course, some designers now build systems where the game elements grow, change, and disappear over time in response to learner state (Huang et al., 2024). This dynamic approach may be a stronger match for the messy reality of learning across a semester. 6. Recommendations for Practice The literature suggests several concrete steps for teachers, instructional designers, and platform builders who want to move past the fatigue trap. 6.1 Rethink the reward structure Start with a simple audit. Every game element in the course should answer the question: which of the three psychological needs does this element serve, and how? If the honest answer is that it serves none of them, it should be removed. Badges that are not tied to real progress, points that reward only presence, and leaderboards that rank learners in ways they cannot influence should all be candidates for removal. Where rewards remain, they should function as informational feedback, telling the learner something meaningful about their own growth. This aligns with the informational role of rewards described in cognitive evaluation theory and confirmed by recent gamified learning studies (Jedel and Palmquist, 2026). 6.2 Build in autonomy at the design level Give learners real choices, not surface choices. 
Real choice means the option to skip, delay, or approach a task in different ways. It also means the ability to set personal goals within the system. Where possible, the system should let learners define what they want to master, then organize game elements around that. This can look like personalized quest maps, elective task tracks, or student-designed challenges. The core aim is that learners feel they are steering the experience rather than being pulled along by a reward chain. 6.3 Reengineer feedback for competence Design feedback around progress and mastery, not raw completion. If the only signal a learner receives is a rising point total, competence support is thin. If the feedback shows what skills have been strengthened and what remains to be learned, competence support is strong. Adaptive feedback that adjusts difficulty in real time, offers alternative pathways, and highlights growth over comparison is especially useful. In studies of AI-supported gamification, this type of feedback appeared to boost all three basic needs, not just competence (Nguyen-Viet and Doan, 2026). 6.4 Replace leaderboards with cooperative frames Where possible, avoid public leaderboards. If competition is desirable for a specific pedagogical goal, use small-team competition, seasonal reset structures, or opt-in leagues rather than permanent ranked lists. Cooperative frames, such as class quests where every learner contributes, tend to support relatedness rather than corrode it (Huang et al., 2024). If a leaderboard is unavoidable, position and trend should be framed carefully, since negative feedback from leaderboards can harm learners more than no feedback at all (Pickal et al., 2026). 6.5 Plan for fatigue and change over time Assume that the first design will not last a full semester without adjustment. Plan #midcourse changes into the design. 
This might include introducing new elements at week five or six to renew interest, retiring old badges that have lost meaning, or shifting focus from points to reflection during the second half of the term. Surveys of long-term gamification recommend evolving challenges and dynamic updates as core features of sustained engagement (Huang et al., 2024). Build evaluation in from the start. Collect data at multiple points, not just at the end. Use behavioral logs, need satisfaction scales, and open-ended feedback together, since satisfaction alone can hide fatigue. 6.6 Train teachers and course leaders Any gamified system depends on the teacher for its meaning. Teachers should be trained to frame the system in autonomy-supportive language, to explain how rewards relate to real learning, and to intervene when a learner shows signs of fatigue. In secondary schools especially, teacher framing was shown to determine whether Classcraft was experienced as a growth tool or a management device (Brambilla et al., 2025). Course leaders should also be trained to spot early warning signs of fatigue, such as declining voluntary work, rising complaints about fairness, or a shift from cooperative to purely competitive talk in the classroom. 6.7 Center ethics and well-being Finally, ethics must be part of the design brief. Systems that maximize retention through pressure, guilt, or forced comparison can produce short-term engagement while damaging well-being. Editorial reviews now urge designers to treat sustainable ends and ethical means as inseparable (Xi et al., 2026). In education, this means that a well-designed gamified course is not only effective but also fair, transparent, and respectful of student autonomy. 6.8 A practical checklist for course designers To make the recommendations above easier to apply in real classrooms, the following checklist may help course designers audit a gamified system before it is deployed. 
First, is every game element linked to a clear learning goal, or is it purely decorative? Elements that fail this test should be removed. Second, does the system give learners real choices about what to study, when to study, and how to demonstrate mastery? If not, autonomy is at risk. Third, does the feedback show real growth in skills, or does it only count activity? Systems that reward mere presence tend to weaken competence. Fourth, does the system build genuine social ties, or does it only expose learners to one another through rankings? Ranking without connection often harms relatedness. Fifth, is there a plan for how the design will change across the semester, so that novelty gives way to depth rather than to fatigue? Sixth, is there an evaluation plan that includes behavioral logs, delayed learning measures, and open-ended feedback? Seventh, are teachers and course leaders trained to talk about the system in language that supports the three basic needs? When a design can answer yes to all seven of these questions, it is much more likely to support lasting engagement rather than a short burst followed by fatigue. When it cannot, the risk of decay grows quickly. Even one honest no in this checklist is a signal to redesign before rollout, not after it fails. 6.9 A note on measurement culture One barrier to better gamified pedagogy is the measurement culture around it. Course dashboards, funding reports, and product marketing all favor short-term metrics that flatter the design. Weekly logins, badge counts, and satisfaction ratings look impressive on a slide. Delayed learning gains, need satisfaction shifts, and drop-off curves after week four are harder to summarize but far more honest. Institutions that want to move past the fatigue trap should invest in evaluation practices that can see across time. 
This includes building longer study windows into program reviews, sharing raw log data with independent researchers where ethics allow, and using multiple outcome types rather than a single satisfaction score. Without a stronger measurement culture, the same short-term illusions that produced the current wave of enthusiasm will keep producing new versions of the same disappointment. 7. Limitations This paper has several limits. First, it is a narrative synthesis, not a full meta-analysis. It uses recent studies as data points and looks for shared patterns rather than running a pooled statistical test. Second, the reviewed studies vary in setting, sample size, and outcome measures, which makes exact comparisons difficult. Third, most of the reviewed studies were conducted in higher education. The picture in primary and secondary schools may differ, and the small number of school-based studies included here cannot fully speak for that context. Fourth, most studies are still under one year in length, which means the very long-term effects of gamification, over two or three years, remain largely unknown. Fifth, publication bias likely favors studies that found some effect, which may overstate the average impact of gamification even in the short term. These limits point directly to future research. Longer studies, larger samples, mixed methods designs, and more work in non-Western contexts are all needed to build a more complete picture. Studies that combine behavioral logs, physiological indicators of stress, and #need_satisfaction scales would be especially valuable for understanding fatigue. 8. Conclusion Gamification is not going away. It is now a normal feature of #learning_platforms, mobile apps, and university courses around the world. The evidence gathered in this article suggests that it is time to grow up about it. The early years of gamified learning treated points, badges, and leaderboards as magic ingredients. 
The current generation of research shows that they are not magic. They can help when they support the deeper needs of the learner, and they can harm when they crowd those needs out. The core argument of this paper is simple. #Gamification_fatigue is real. Its decay curve is measurable, and its cause is often the mismatch between behaviorist reward design and the psychological realities of human learning. The way forward is to root #gamified_pedagogy in #self_determination_theory, and to treat autonomy, competence, and relatedness as design targets in their own right. When these needs are met, gamification can support lasting motivation. When they are ignored, no amount of points or badges will save the design. For the next generation of students, the promise of #playful_learning is worth keeping. But keeping it will take honest research, careful design, and teachers who understand that engagement is not the same as learning. If educators can hold that line, gamified pedagogy can move from a passing trend to a mature tradition inside modern education. References Balci, S., Secaur, J. M., and Morris, B. J. (2022). Comparing the effectiveness of badges and leaderboards on academic performance and motivation of students in fully versus partially gamified online physics classes. Education and Information Technologies, 27(6), 8669 to 8683. doi:10.1007/s10639-022-10983-z Bardach, L., and Murayama, K. (2025). The role of rewards in motivation, beyond dichotomies. Learning and Instruction, 96, 101988. doi:10.1016/j.learninstruc.2024.101988 Berglund, A., and Jedel, I. (2023). Higher education students' perceptions of point-based gamification in a Learning Management System. International Journal of Serious Games, 10(3), 43 to 60. doi:10.17083/ijsg.v10i3.610 Brambilla, A., Al Ghadban, R., and Antonacci, F. (2025). Of course it is extrinsic motivation. Classcraft in Italian secondary school. Form@re, Open Journal per la formazione in rete, 25(1), 154 to 170. 
doi:10.36253/form-17543 Dah, J., Hussin, N., and Zaini, M. K. (2023). Gamification equilibrium, the fulcrum for balanced intrinsic motivation and extrinsic rewards in electronic learning systems. International Journal of Learning, Teaching and Educational Research, 22(9), 197 to 218. doi:10.26803/ijlter.22.9.11 Gupta, P., and Goyal, P. (2022). Is game-based pedagogy just a fad, a self-determination theory approach to gamification in higher education. International Journal of Educational Management, 36(3), 341 to 356. doi:10.1108/IJEM-04-2021-0126 Hashim, M. J. M. (2026). Psychological need satisfaction and student performance in gamified learning, the mediating role of motivation based on self-determination theory. Asian Journal of University Education, 22(1), 55 to 74. doi:10.24191/ajue.v22i1.24567 Huang, L., Deng, C., Hoffman, J., Mogavi, R. H., Kim, J. J., and Hui, P. (2024). Long-term gamification, a survey. ACM Computing Surveys, 57(2), 1 to 36. doi:10.1145/3648354 Jedel, I., and Palmquist, A. (2026). To reward or not reward, how the interpretation of virtual rewards affects intrinsic motivation in gamified learning. International Journal of Human Computer Studies, 189, 103322. doi:10.1016/j.ijhcs.2025.103322 Jones, M., Blanton, J., and Williams, R. E. (2022). Science to practice, does gamification enhance intrinsic motivation. International Journal of Kinesiology in Higher Education, 6(3), 138 to 149. doi:10.1080/24711616.2021.1930096 Jose, B., Cherian, J., Jaya, P., Kuriakose, L., and Leema, P. W. R. (2024). The ghost effect, how gamification can hinder genuine learning. Frontiers in Education, 9, 1474733. doi:10.3389/feduc.2024.1474733 Kratochvil, T., Vaculik, M., and Macak, M. (2023). Gamification tailored for novelty effect in distance learning during COVID-19. Frontiers in Education, 8, 1051227. doi:10.3389/feduc.2023.1051227 Li, L., Hew, K. F., and Du, J. (2024). 
Gamification enhances student intrinsic motivation, perceptions of autonomy and relatedness, but minimal impact on competency, a meta-analysis and systematic review. Educational Technology Research and Development, 72(2), 765 to 796. doi:10.1007/s11423-023-10337-7 Luarn, P., Chen, C. C., and Chiu, Y. P. (2023). Enhancing intrinsic learning motivation through gamification, a self-determination theory perspective. International Journal of Information and Learning Technology, 40(5), 413 to 424. doi:10.1108/IJILT-07-2022-0145 Luria, E. (2022). Revisiting the self-determination theory, motivating the unmotivated. Journal of Educational Thought, 55(1), 27 to 46. doi:10.55016/ojs/jet.v55i1.75492 Nguyen-Viet, B., and Doan, H. (2026). Integrating gamification and artificial intelligence in higher education, a self-determination theory approach to motivation and learning effectiveness. Cogent Education, 13(1), 2298764. doi:10.1080/2331186X.2026.2298764 Phi, L. D., Nguyen, T. T., and Le, K. A. (2025). A self-determination theory model of gamified EFL intrinsic motivation. Computer Assisted Language Learning, 38(4), 512 to 537. doi:10.1080/09588221.2024.2415920 Philpott, A., and Son, J. B. (2022). Leaderboards in an EFL course, student performance and motivation. Computers and Education, 191, 104643. doi:10.1016/j.compedu.2022.104643 Pickal, A. J., Stadler, M., Sailer, M., Bai, S., Ninaus, M., Greiff, S., Becker, N., and Koch, M. (2026). The winner takes it all, effects of leaderboard-based feedback on cognitive performance and motivation. Computers in Human Behavior, 154, 108135. doi:10.1016/j.chb.2026.108135 Ratinho, E., and Martins, C. (2023). The role of gamified learning strategies in students motivation in high school and higher education, a systematic review. Heliyon, 9(8), e19033. doi:10.1016/j.heliyon.2023.e19033 Rodrigues, L., Pereira, F. D., Toda, A. M., Palomino, P. T., Pessoa, M., Carvalho, L. S. G., Fernandes, D., Oliveira, E. H. T., Cristea, A. 
I., and Isotani, S. (2022). Gamification suffers from the novelty effect but benefits from the familiarization effect, findings from a longitudinal study. International Journal of Educational Technology in Higher Education, 19(1), 13. doi:10.1186/s41239-021-00314-6 Romero Rodriguez, L., Perez, J., and Fernandez, M. (2025). Serious games in engineering education, assessing novelty effects and the influence of prior gaming experience. Education Sciences, 15(4), 402. doi:10.3390/educsci15040402 Serna, A., Hallifax, S., and Lavoue, E. (2023). Investigating the effects of tailored gamification on learners engagement over time in a learning environment. International Journal of Human Computer Studies, 176, 103048. doi:10.1016/j.ijhcs.2023.103048 van den Broek, G., Scholten, S., van Thuil, B., van Rijn, H., van Gog, T., and van der Velde, M. (2025). Gamified feedback in adaptive retrieval practice, points and progress-bars enhance motivation but not learning. Computers and Education, 218, 105059. doi:10.1016/j.compedu.2025.105059 Xi, N., Rousi, R. A., Abrahamsson, P., Vakkuri, V., and Hamari, J. (2026). Editorial, benefits and detriments with respect to the ethics and sustainability of gamification. Internet Research, 36(2), 465 to 483. doi:10.1108/INTR-04-2026-027 Yang, H., Wang, L., Hu, Z., and Li, D. (2024). Understanding the failing of social gamification, a perspective of user fatigue. Behaviour and Information Technology, 43(11), 2456 to 2472. doi:10.1080/0144929X.2023.2258120 #gamification #gamification_fatigue #self_determination_theory #intrinsic_motivation #extrinsic_rewards #badges #points #leaderboards #decay_rate #longitudinal_study #pedagogical_design #educational_psychology #student_engagement #novelty_effect #STULIB

  • Spaced Retrieval in the Age of Binge-Learning: Reconciling Cognitive Psychology with Block Scheduling

    The gap between what #cognitive_psychology has proven about human #memory and what modern universities actually do in their classrooms has grown wider in the last decade. On one side, decades of controlled experiments have shown that spaced #retrieval_practice, distributed study, and interleaved review produce durable learning that lasts months or years. On the other side, universities across North America, Europe, Asia, and the Gulf region have moved toward compressed timetables, intensive block schedules, and one-course-at-a-time formats that force students into what can only be described as #binge_learning. This paper investigates this disconnect between #evidence_based memory science and modern institutional design. Drawing on recent meta-analytic and experimental literature published between 2020 and 2025, the paper argues that neither pure adherence to laboratory spacing schedules nor uncritical acceptance of block scheduling is realistic for contemporary higher education. Instead, five structural compromises are proposed: (1) protected #spacing_windows embedded inside intensive blocks, (2) mandatory low-stakes #retrieval sessions distributed across the semester or across sequential blocks, (3) cross-block review courses that revisit prior content, (4) design of assessments that require delayed recall rather than immediate recognition, and (5) faculty development programs that teach instructors to redesign compressed courses around forgetting curves rather than content coverage. The paper concludes that the tension between cognitive science and #block_scheduling is not fully resolvable, but a set of modest, feasible design changes can preserve much of the durability benefit that spacing offers without demanding that universities abandon the calendar formats they have already adopted. Implications for course designers, curriculum committees, faculty developers, and students who must survive intensive formats are discussed. 
Keywords: spaced repetition, retrieval practice, block scheduling, intensive courses, cognitive load, higher education, memory retention, curriculum design 1. Introduction There is a strange contradiction sitting at the center of modern higher education. In the research literature, few findings are as well established as the #spacing_effect and the #testing_effect. When learners distribute their study across time rather than cramming, and when they practice recalling information rather than only re-reading it, they remember more, they remember for longer, and they transfer what they know to new problems more successfully (Carpenter et al., 2022; Yang et al., 2021). This is not a fragile laboratory finding. It has been replicated in dozens of countries, across ages from primary school to medical residency, and across content types from vocabulary to surgical procedures (Agarwal et al., 2021). The evidence is, by the standards of behavioral science, overwhelming. And yet, at the same moment that this evidence has become impossible to ignore, universities are rapidly adopting course structures that seem designed to defeat it. Compressed summer terms have expanded. #Intensive_blocks in which students take a single course for three or four weeks, then move to the next, have spread from a small number of experimental institutions to become mainstream at business schools, health programs, and full undergraduate colleges (Male et al., 2023). Online providers have popularized #accelerated formats where an entire term of content is delivered in six or seven weeks. Even inside traditional semester calendars, the number of contact hours crammed into shorter windows has crept upward. Students, meanwhile, have adapted in the ways students always do. They #cram. They pull all-nighters before block exams. They watch lecture recordings at double speed the night before the assessment. They pass, sometimes brilliantly, and then they forget almost everything by the following term. 
This pattern has been documented so consistently that it has acquired an informal name in student and faculty discourse: #binge_learning. It is not merely undesirable. It is the exact opposite of what memory science recommends. This paper investigates the disconnect. It asks three connected questions. First, what precisely does the current #memory_science literature say about how learning should be distributed across time, and how strong is that evidence in 2025? Second, why have universities moved in the opposite direction, and what benefits do intensive formats actually deliver that keep them in place? Third, given that neither side is going to fully surrender, what realistic structural compromises can be built into course and program design so that students in #compressed_courses still retain what they learn? The paper does not argue that block scheduling should be abolished. That would be both politically impossible and, in some cases, pedagogically wrong. Intensive formats offer real benefits: focused attention, reduced #task_switching, closer instructor relationships, and a match to how many working adult learners can fit study into their lives (Dixson et al., 2022). Nor does the paper argue that #spaced_repetition should be treated as an optional add-on that keen students use with flashcard apps in their spare time. That approach has been tried, and it fails, because #metacognitive research has repeatedly shown that most learners do not spontaneously choose the study strategies that work best (Rivers, 2021). Instead, the paper argues for a middle path. Universities and instructors can preserve most of the pedagogical benefits of #spaced_retrieval without abandoning block or intensive calendars. Doing so requires design changes that are structural, not exhortational. Telling students to space their study does not work. Building spacing into the calendar, the assessment, and the reward system does. 
The core argument advanced here can be summarized in three linked claims. The first is that the empirical evidence for #spaced_retrieval as a mechanism of durable learning is now strong enough that ignoring it in course design should be considered a genuine failure of pedagogical practice, not merely a stylistic choice. The second is that the reasons universities have moved toward compressed and #intensive_formats are largely non-pedagogical, and any reform that pretends those reasons are simply mistakes will fail politically. The third is that within these constraints, a small number of structural changes can recover most of the learning benefit that current formats sacrifice, without demanding that any single institution overturn its entire calendar. Taken together, these claims justify a program of modest, coherent reform rather than either total redesign or continued denial. The paper proceeds as follows. Section 2 reviews the current state of #cognitive_psychology on spacing, retrieval, and interleaving, focusing on studies from 2020 onward. Section 3 examines the institutional and pedagogical forces that have driven the growth of block and intensive formats. Section 4 analyzes the specific mechanisms by which compressed calendars undermine long-term retention. Section 5 proposes five structural compromises, discussing the evidence base for each and the practical steps involved in implementing them. Section 6 addresses limitations and areas requiring further research. Section 7 concludes. 2. What Cognitive Psychology Actually Says About Distributed Learning 2.1 The Spacing Effect The #spacing_effect refers to the finding that, holding total study time constant, learners remember more when that time is distributed across separated sessions than when it is massed into a single session. 
The effect was first documented over a century ago by Ebbinghaus, but its modern revival began in the 1970s and has accelerated in the last decade as researchers have moved out of laboratories and into classrooms. A meta-analysis by Latimier, Peyre, and Ramus (2021) synthesized studies of spaced retrieval practice across a wide range of populations and content types. The authors found a robust benefit of spacing on delayed retention tests, with effect sizes that were larger for longer retention intervals. In other words, the advantage of spacing over massing does not merely persist over time; it grows. Two students who perform equally on a test immediately after a study session will diverge sharply on a test two weeks later, with the spaced learner remembering substantially more. The mechanisms behind the spacing effect are still debated, but two accounts dominate. The #study_phase_retrieval account holds that when learners return to material after a gap, they must partially retrieve it from memory, and this act of retrieval strengthens the memory trace. The #encoding_variability account holds that spaced sessions occur in slightly different mental and environmental contexts, so the memory becomes associated with a broader range of retrieval cues (Chen, Paas, & Sweller, 2021). These accounts are not mutually exclusive, and recent work suggests both mechanisms contribute. What matters for course design is not which account is correct, but the practical implication: any structure that forces learners to return to material after a delay, in a way that requires effortful reconstruction, will produce more durable learning than any structure that concentrates study into a single unbroken period. This is true even when the total time on task is identical, and it is especially true when the retention interval of interest is measured in months rather than days. 
2.2 The Testing Effect and Retrieval Practice Closely related to the #spacing_effect is the #testing_effect, sometimes called the #retrieval_practice effect. Rather than referring to when study happens, it refers to what learners do during study. Learners who practice retrieving information from memory, for example by attempting to answer questions, show substantially better delayed recall than learners who spend the same amount of time re-reading, highlighting, or reviewing notes (Yang et al., 2021). Yang and colleagues (2021) conducted a systematic and meta-analytic review of classroom quizzing studies and found a consistent, moderate to large benefit of frequent low-stakes testing on later exam performance. Sotola and Crede (2021), analyzing a partly overlapping set of studies, reported similar conclusions and emphasized that the benefit does not depend on quizzes being high-stakes; in fact, low-stakes formats often produce equal or better outcomes because they reduce test anxiety while preserving the retrieval demand. Agarwal, Nunes, and Blunt (2021) reviewed applied studies of retrieval practice in real schools and classrooms and concluded that the effect generalizes robustly beyond the laboratory. Importantly, the benefits of retrieval practice hold across ability levels, though the size of the benefit can vary. Students with weaker prior knowledge sometimes need scaffolding, such as answer feedback, for retrieval practice to be effective, but with such scaffolding the effect is preserved. The combination of #spacing and #retrieval_practice is more powerful than either alone. When retrieval attempts are themselves spaced across time, the memory strengthening that retrieval provides is compounded by the encoding variability and reconsolidation that spacing provides. 
This combination, sometimes called #successive_relearning, has produced some of the largest and most durable retention effects in the education literature (Carpenter et al., 2022). 2.3 Interleaving A third relevant effect is #interleaving, in which learners practice multiple related topics in an alternating sequence rather than completing all practice on one topic before moving to the next. Interleaving has been shown to improve discrimination between similar concepts, particularly in mathematics, science, and clinical reasoning. Chen, Paas, and Sweller (2021) reviewed the interleaving literature through the lens of #cognitive_load theory and argued that interleaving works when the concepts being alternated are genuinely confusable, because the alternation forces learners to notice distinguishing features. When concepts are not confusable, interleaving offers little benefit and may even slow initial learning. For course design, interleaving matters because it interacts with block scheduling. In a traditional semester with multiple concurrent courses, students are, whether they realize it or not, interleaving across their courses. In a block schedule where a student takes one course at a time, that natural interleaving disappears. Whether this is a loss depends on whether the courses in question deal with confusable material and whether within-course interleaving is deliberately designed in. 2.4 Metacognitive Illusions A final finding from #metacognitive research is critical to understanding why students do not spontaneously use spacing and retrieval practice. When learners re-read material, it feels easier the second time. This feeling of #fluency is misinterpreted as evidence that the material has been learned, even though it typically reflects only recognition of the surface features, not the ability to recall. Retrieval practice, by contrast, feels harder. Learners forget things, get things wrong, and feel frustrated. 
As a result, given free choice, most learners choose the strategies that feel good in the moment and produce weaker long-term learning (Rivers, 2021). This has an important design implication. Because learners cannot be relied upon to choose the best strategies, and because their subjective sense of learning is a poor guide, the burden of implementing #evidence_based learning strategies falls on the institution and the instructor. Course structure, assessment design, and required activities must build in the strategies that students would not choose voluntarily. Merely informing students about the benefits of spacing has, in study after study, failed to change behavior in a lasting way. 2.5 The Forgetting Curve in Modern Terms The classical Ebbinghaus #forgetting_curve, though over a century old, still describes the general shape of what happens to newly learned material when it is not revisited. Retention drops steeply in the first day or two after learning, then decays more slowly over the following weeks. What contemporary research has added is precision about how the shape of the curve changes when retrieval practice and spacing are introduced. Each successful retrieval attempt flattens the subsequent curve, so that after several spaced retrievals, forgetting proceeds far more slowly than it would after equivalent time spent re-reading (Kliegl & Baeuml, 2021). The implication for course design is that the total number of retrieval attempts matters less than their temporal distribution. Ten retrievals performed in a single afternoon produce weaker retention than five retrievals distributed across two weeks, even though the former condition contains more practice. This finding is counterintuitive to most students, who assume that more repetitions must always be better, and it is counterintuitive to many instructors, who plan courses in terms of total contact hours rather than in terms of temporal patterns of exposure. 
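The flattening described above can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions, not a model taken from the studies cited. It assumes that retention decays as exp(-elapsed / stability), that every retrieval succeeds, and that each retrieval multiplies stability by 1 + gain × (1 − retention at the moment of retrieval), so that more effortful retrievals strengthen memory more. The function name `retention_at`, the parameter values, the test day, and both schedules are hypothetical.

```python
import math


def retention_at(test_day, retrieval_days, stability=1.0, gain=2.0):
    """Predicted retention on test_day under a toy forgetting model.

    Assumptions (illustrative only, not fitted to any data):
    - initial study happens on day 0;
    - retention decays as exp(-elapsed_days / stability);
    - every retrieval succeeds and multiplies stability by
      1 + gain * (1 - retention just before the retrieval).
    """
    last_event = 0.0
    for day in sorted(retrieval_days):
        before = math.exp(-(day - last_event) / stability)
        stability *= 1.0 + gain * (1.0 - before)
        last_event = day
    return math.exp(-(test_day - last_event) / stability)


# Ten retrievals about half an hour apart on the afternoon of day 0.
massed = [0.02 * i for i in range(1, 11)]
# Five retrievals spread across two weeks.
spaced = [1, 3, 6, 10, 14]

TEST_DAY = 44  # hypothetical delayed test, 30 days after the last spaced retrieval

massed_retention = retention_at(TEST_DAY, massed)
spaced_retention = retention_at(TEST_DAY, spaced)

print(f"massed: {massed_retention:.3f}")
print(f"spaced: {spaced_retention:.3f}")
```

Under these assumptions the spaced schedule retains more at the delayed test even though it contains half as many retrievals. The absolute numbers carry no empirical weight; only the direction of the comparison is meant to mirror the finding in the text.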
A further refinement in recent research concerns the concept of #desirable_difficulty. Ebersbach, Feierabend, and Nazari (2020) and others have shown that the retrieval attempts most beneficial for long-term retention are those that are effortful but eventually successful. Retrievals that are too easy add little; retrievals that fail entirely and are not followed by corrective feedback can consolidate errors. Well-designed retrieval sessions therefore sit in a narrow band of difficulty, adjusted to the current state of the learner's knowledge. Adaptive quiz software has begun to automate this adjustment, but the principle applies just as much to instructor-designed activities. 2.6 Summary of the Evidence Base The cognitive science summarized above yields four claims that are strongly supported by recent research. First, distributing study across time produces more durable retention than massing it, and the advantage grows as the retention interval grows. Second, active retrieval produces stronger memory than passive review, and the combination of spacing and retrieval is stronger than either alone. Third, interleaving improves discrimination among confusable concepts and is a legitimate design tool within courses. Fourth, learners generally do not adopt these strategies on their own, so #course_design rather than student choice is the effective lever. Any course format that violates these principles will produce inferior long-term retention. The question is what to do when the format has already been chosen for reasons that have nothing to do with #memory_science. 3. Why Universities Adopted Block and Intensive Schedules To evaluate the disconnect between #cognitive_psychology and modern course structures, it is necessary to understand why those structures were adopted. Block scheduling and intensive courses did not emerge because administrators dislike memory research. They emerged because they solve real problems that traditional semester calendars did not solve. 
Any proposed reform must accommodate those problems, not pretend they do not exist. 3.1 The Working Adult Learner The most powerful driver of intensive scheduling has been the growth in non-traditional students. Working adults, parents, career-changers, and international students on time-limited visas often cannot manage the cognitive and logistical load of four or five concurrent courses spread across sixteen weeks. For them, focusing on one course for three or four weeks and then moving to the next is not a compromise; it is what makes higher education possible at all (Dixson et al., 2022). Colorado College, Cornell College, Quest University Canada, and a growing number of institutions internationally have built their entire undergraduate curricula around #one_course_at_a_time formats. The pedagogical case they make emphasizes depth of focus, elimination of divided attention, and stronger student-faculty relationships. Empirically, some of these claims are supported. Students in intensive formats report higher engagement and often perform equally or better on end-of-course assessments (Male et al., 2023). The problem, of course, is that end-of-course assessments are exactly the wrong measure. They are administered when the retention interval is shortest, when the material is freshest, and when the effects of massing are most flattering. Long-term retention, six months or a year later, is rarely measured in evaluations of block formats. When it is measured, the results are less flattering. 3.2 Administrative Efficiency Block schedules also offer administrative efficiencies. Room assignments are simpler when only a fraction of the student body is enrolled in any given course simultaneously. Faculty workloads can be concentrated, freeing time for research or program development. Curriculum committees find it easier to approve stand-alone intensive offerings than to reorganize entire semester calendars. 
Once a program has invested in intensive delivery infrastructure, it becomes difficult to reverse. 3.3 The Online and Hybrid Push The expansion of online and hybrid delivery, accelerated dramatically by the pandemic years and sustained afterward, has interacted with intensive scheduling in complex ways. #Asynchronous online formats often adopt seven- or eight-week terms because market research shows that adult learners prefer shorter commitments. Programs that were previously delivered in traditional semesters have been rebuilt around these shorter units, sometimes without careful attention to what happens to retention when a semester's worth of material is delivered in half the time. 3.4 The Competitive Marketplace Higher education has become a competitive marketplace in which programs compete for enrollment. Intensive formats are a differentiator. A program that promises a degree in eighteen months rather than three years is attractive to prospective students, even if the tradeoff in long-term learning is invisible at the point of enrollment. Because #retention is not measured in the ways that affect enrollment, market forces push toward compression rather than distribution. 3.5 Cultural and Generational Shifts Beyond institutional forces, cultural changes have made compressed formats feel natural to a generation of students accustomed to on-demand consumption. The habits of streaming a television series in a weekend, of consuming educational content through short video clips, and of using instant messaging rather than sustained written correspondence have shaped expectations about what learning should feel like. When students arrive at university, they often expect learning to be similarly compressed, immediate, and portable. Intensive formats align with these expectations, even when they misalign with how #memory actually works. This cultural framing is not incidental. It affects how students interpret their own learning experiences. 
A student who forgets material three months after an intensive block may not perceive this as a failure of course design. She may perceive it as normal, because forgetting is what happens with everything else she consumes intensively. The expectation that learning will endure has itself been eroded, which makes the case for spaced retention harder to make even to the students who would benefit from it most. 3.6 What This Means for Reform The forces above are not going away. Any recommendation that requires universities to abandon block scheduling and return to traditional semesters is not a serious recommendation. It ignores the reasons those calendars changed in the first place. A useful analysis must accept intensive formats as a partly fixed constraint and ask how to preserve #evidence_based learning inside them. That is the task of the remainder of this paper. 4. How Intensive Formats Undermine Long-Term Retention Before proposing solutions, it is worth examining precisely how compressed formats damage retention. Vague claims that they do so are not enough. The specific mechanisms matter because different mechanisms call for different remedies. 4.1 Elimination of Between-Session Spacing The most obvious problem is that intensive formats compress the intervals between study sessions. In a traditional semester, a topic introduced in week two might be revisited on a problem set in week four, on a midterm in week seven, and on a final in week fifteen. That is a natural spacing schedule with intervals of two weeks, three weeks, and eight weeks, well matched to what memory research recommends for retention on the order of a semester or beyond. In a three-week block, the same topic might be introduced on day two, appear on a quiz on day five, and be tested on the block final on day fifteen. The intervals are three days, ten days, and effectively zero after the final. The material is never returned to. 
There is no spacing across weeks because there are only three weeks total. 4.2 Absence of Cross-Course Interleaving As noted earlier, students in traditional semester calendars naturally interleave content across their concurrent courses. A biology student who studies genetics on Monday, statistics on Tuesday, and organic chemistry on Wednesday is unintentionally practicing distributed retrieval across her whole cognitive schedule. In a block calendar, this natural interleaving disappears. All study on a given day is on the same topic. The cognitive benefits of shifting contexts are lost. 4.3 Reward Structure for Cramming Intensive courses concentrate assessment stakes into a narrow window. Because the final exam happens two or three weeks after the first lecture, students who cram succeed. There is no incentive to distribute study across time because there is no long enough interval for the costs of massing to appear before the grade is awarded. #Grade_signals, therefore, actively reward the wrong strategy. 4.4 Fatigue and Cognitive Load Compressed courses often stack four or five hours of contact time into a single day. This produces #cognitive_load levels that research on working memory suggests are counterproductive. Learners cannot encode continuously for hours without diminishing returns. Attention wanes, and material presented in later hours is less well remembered than material presented earlier. In a traditional schedule with an hour of contact per day, this problem is minimized. In an intensive block, it becomes structural. 4.5 Loss of Consolidation Opportunities Contemporary neuroscience emphasizes the role of #sleep and off-task time in consolidating newly acquired memories. When learners study a topic and then sleep, their brains partially replay and stabilize the relevant neural patterns. This consolidation continues in the days following initial learning, and it is enhanced by the spaced return to the material described earlier. 
Intensive courses, by concentrating study into long daily sessions, compress the total number of sleep cycles available for consolidation of any given topic. A topic introduced on day one of a three-week block receives only about twenty nights of consolidation before the block ends. A topic introduced in week two of a semester receives roughly ninety nights of potential consolidation before a final exam in week fifteen. Students in intensive formats are also more likely to sacrifice sleep during the course itself, either because the workload is compressed or because they are cramming for imminent assessments. This further reduces the consolidation opportunities that memory would otherwise enjoy. The combination of compressed schedules and reduced sleep is particularly hostile to durable learning, and it is common in exactly those programs that are marketed as accelerated. 4.6 Loss of Application Contexts A subtler cost of intensive scheduling is that it eliminates opportunities for students to apply what they are learning in one course to problems arising in another. In a traditional semester, a student encountering a statistical concept in a research methods course might immediately notice its relevance to the empirical study she is reading for a substantive course elsewhere. That cross-application is a powerful form of retrieval, because it requires the student to bring the concept to mind unprompted and use it in a new context. In a block schedule, the substantive course may not overlap in time with the research methods course, so the opportunity for application never arises. What is learned in one block sits inertly until it is either revisited deliberately or forgotten. 4.7 Illusion of Mastery Because intensive formats produce strong short-term performance, they generate a powerful #metacognitive illusion of mastery. Students, instructors, and administrators all see the high final-exam scores and conclude that learning was successful. 
The forgetting that occurs after the block ends is invisible to everyone. There is no assessment three months later. There is no follow-up test. The system, therefore, does not perceive the very problem it is producing. This is arguably the most dangerous feature of intensive scheduling. It is not just that it undermines retention; it is that it hides its own failure. Reform is difficult when the system cannot see what is wrong. 5. Structural Compromises: Five Realistic Proposals The remainder of this paper proposes five structural changes that could preserve much of the retention benefit of #spaced_retrieval without requiring universities to abandon intensive scheduling. Each proposal is designed to be realistic. It does not demand that block calendars be dismantled. It does not depend on students voluntarily adopting good study strategies. It does not require expensive new technology. Instead, each proposal changes course or program design in ways that make evidence-based learning happen structurally. 5.1 Proposal One: Protected Spacing Windows Within Intensive Blocks The first and most immediately implementable change is to redesign the internal structure of intensive courses so that spacing occurs within the course, not merely between courses. Currently, most block courses front-load content delivery and back-load assessment. A more retention-friendly structure would introduce a topic, allow a gap of several days, and then return to it with retrieval practice. For a three-week block, this might mean introducing the first set of major topics in the opening week, then dedicating part of the middle of the course to structured review sessions in which students retrieve and apply earlier content. The final week would combine new content with continued review of everything from the first two weeks. The total content coverage would remain unchanged, but the temporal distribution of exposures would be substantially more spaced. 
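To see what "more spaced" means in practice, it can help to list the days on which a single topic is encountered and compute the gaps between encounters. The sketch below does this for the semester and block patterns described in Section 4.1 and for one hypothetical restructured block. The function name `exposure_gaps` and the restructured day numbers are assumptions introduced for illustration only.

```python
def exposure_gaps(exposure_days):
    """Return the gaps, in days, between consecutive exposures to a topic."""
    ordered = sorted(exposure_days)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


# Semester pattern from Section 4.1: weeks 2, 4, 7, and 15, converted to days.
semester = [week * 7 for week in (2, 4, 7, 15)]
# Block pattern from Section 4.1: introduced day 2, quiz day 5, final day 15.
front_loaded_block = [2, 5, 15]
# Hypothetical restructured block: same topic, with two scheduled review sessions.
restructured_block = [2, 6, 10, 15]

for label, days in [
    ("semester", semester),
    ("front-loaded block", front_loaded_block),
    ("restructured block", restructured_block),
]:
    print(label, exposure_gaps(days))
```

The restructured block cannot reproduce semester-length gaps, but it replaces the single ten-day stretch before the final with returns to the topic every four or five days, each of which is an opportunity for effortful retrieval.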
The evidence base for this kind of internal restructuring is drawn from studies of #successive_relearning, in which learners repeatedly practice retrieval of the same material at intervals until it is retrieved successfully several times in a row. Carpenter and colleagues (2022) summarize evidence that this pattern produces the largest and most durable retention effects of any known study strategy. Bringing it into intensive course design would mean explicitly building return-to-content into the syllabus. The practical objection is that this reduces the amount of new content that can be covered. That objection deserves to be taken seriously, but it also reveals the underlying problem. Intensive formats currently succeed at covering large amounts of content precisely because they do not require that content to be retained. If retention is added as a design criterion, coverage must be reduced. That is a genuine tradeoff, but it is a tradeoff worth naming. A course that covers ten topics that are forgotten is not superior to a course that covers seven topics that are remembered. 5.2 Proposal Two: Distributed Retrieval Sessions Across Blocks The second proposal addresses the between-block problem. Even if within-block spacing is optimized, once a block ends, the material is typically abandoned. A structural remedy is to require students to complete brief retrieval sessions on prior block content at scheduled intervals across the term or year. This could be implemented as a required, low-stakes cumulative review that occurs once per week regardless of what block the student is currently taking. Fifteen minutes of #retrieval_practice on material from the previous block, delivered as a short quiz or #free_recall exercise, would be enough to trigger the reconsolidation and strengthening effects that spacing provides. 
Across a full academic year, students would revisit each block's content dozens of times at gradually increasing intervals, exactly the pattern that memory research recommends. The technical implementation is not difficult. Learning management systems already support automated quiz delivery. The design challenge is not technical but institutional. Someone must decide what to include, how heavily to weight it, and how to prevent it from becoming perceived as busywork. Because #retrieval_practice benefits are strongest when the tasks are meaningful and generative rather than trivially recognizing definitions, question design must be careful. Sotola and Crede (2021) and Yang et al. (2021) both emphasize that quiz design quality matters as much as quiz frequency. The stakes on these cumulative reviews should be low but nonzero. If they carry no grade weight, students will not complete them. If they carry too much weight, they become sources of anxiety and undermine the low-stakes character that keeps retrieval practice psychologically sustainable. A weighting of five to ten percent of the term grade, distributed across many small assessments, appears to be a practical range based on the classroom studies reviewed by Agarwal, Nunes, and Blunt (2021). 5.3 Proposal Three: Cross-Block Integration Courses The third proposal introduces new courses that exist specifically to integrate content across earlier blocks. In a program that runs on a block schedule, students might take standard blocks for the majority of their coursework but also enroll in occasional #integration_courses that meet in a distributed format across many weeks and revisit content from multiple prior blocks. 
For example, a business program that delivers accounting, finance, marketing, and operations as sequential intensive blocks might require a cross-cutting #strategic_analysis course that meets one afternoon per week for an entire semester and requires students to solve cases that draw on all four prior domains. The distributed format of this integration course provides the between-block spacing that the intensive blocks themselves lack. The case-based content forces #interleaved retrieval, exactly the pattern that research on transfer suggests will produce the strongest generalization to new problems. This proposal is more structurally significant than the first two because it changes the curriculum, not merely course design. It requires curriculum committees to accept that some courses should not follow the intensive format. The compromise is that the majority of the program can remain intensive; only a minority of courses need to adopt the distributed format necessary to provide spacing across blocks. Programs that have adopted variants of this structure, particularly in professional education contexts, have reported favorable results, though controlled comparisons remain scarce. This is one of the areas most in need of further empirical work. 5.4 Proposal Four: Delayed Assessment Design The fourth proposal targets the assessment problem directly. Currently, intensive courses assess retention at the shortest possible interval, which is the interval most flattering to #cramming and least sensitive to the benefits of #spacing. A structural remedy is to add delayed assessment components that occur weeks or months after the intensive course has ended. This could take several forms. A comprehensive #cumulative_examination administered at the end of an academic year, covering material from all blocks completed during that year, would create a strong incentive for students to maintain retention over time. 
Alternatively, follow-up assessments could be embedded in later related courses. A student who completes an intensive statistics block in September might encounter a substantial statistics component on a research methods midterm in March, with performance on that component counting retroactively toward the earlier course's grade. The design challenge here is fairness. Delayed assessments create genuine stress, and if they are not accompanied by opportunities to maintain the relevant knowledge, they become punitive. This is why Proposal Four pairs naturally with Proposals Two and Three. If students are required to retain content across blocks, they must also be given structured opportunities to do so through distributed retrieval sessions or integration courses. Assessment and support must be co-designed. The evidence supporting delayed assessment as a design tool comes from studies of #test_expectancy and #metacognitive calibration. Learners who know they will be tested at a delay engage in more spaced study than those who expect only immediate assessment (Rivers, 2021). Simply signaling that retention over months will be measured changes study behavior in a beneficial direction, even before any specific structural intervention. 5.5 Proposal Five: Faculty Development for Redesign The fifth proposal is the least concrete but perhaps the most important. All four of the previous proposals require instructors and program designers to be able to redesign courses in evidence-informed ways. In practice, most university faculty have received little formal training in #memory_science and do not know how to redesign an intensive syllabus around forgetting curves rather than around content lists. Institutional investment in #faculty_development is therefore a prerequisite for the other proposals to succeed. 
This means creating workshops, communities of practice, and consulting resources through centers for teaching and learning that specifically address the challenges of designing for retention under intensive schedules. It means giving faculty the tools to audit their current syllabi for spacing opportunities, to design retrieval activities that go beyond trivial recognition, and to interpret student performance data in ways that reveal retention rather than only immediate mastery. The literature on faculty development in higher education has consistently found that isolated workshops rarely change practice. What changes practice is sustained engagement over time, ideally with peer collaboration and administrative support. This suggests that faculty development for #evidence_based course design must itself be spaced and distributed, a small irony that also serves as a proof of concept for the principle being taught. Institutions that have taken this seriously have generally paired faculty development with modest incentives, such as small stipends for faculty who complete redesign programs, and with structural supports, such as instructional designers who work directly with course teams. The costs are real but modest relative to other institutional investments, and the returns, if the evidence on retention is to be believed, should be substantial. 5.6 A Note on Student-Facing Tools The five proposals above focus on institutional and instructor-level design changes. A reasonable question is whether student-facing tools, particularly the #spaced_repetition applications that have become popular in language learning and medical education, could substitute for structural reform. Applications such as those used by medical students preparing for licensure examinations have demonstrated that individual learners can, if motivated, sustain retention across long periods using algorithmic scheduling of their own practice. 
The evidence, however, is that voluntary use of such tools is heavily skewed toward already-motivated learners in high-stakes domains. Undergraduates in typical block courses do not, in the aggregate, adopt these tools voluntarily, and when institutions have attempted to require their use without integrating them into assessment, adoption remains low. This is consistent with the #metacognitive research summarized earlier: students who feel confident after cramming do not perceive the need for further study, and no external tool changes that perception unless the assessment structure changes as well. This does not mean student-facing tools are useless. It means they are complements to structural reform rather than substitutes for it. Once assessments are designed to reward retention, and once instructors have built spacing into their courses, students who wish to go further can use spaced repetition apps to strengthen their preparation. But expecting these tools to solve the problem in the absence of structural change is expecting a great deal of individual student discipline against an institutional environment pushing in the opposite direction. 5.7 Case Illustration: A Redesigned Three-Week Block To make the proposals concrete, consider a hypothetical three-week block in introductory microeconomics, currently structured with three hours of daily lecture, weekly problem sets, and a comprehensive exam on the final day. Under the current design, students typically front-load lecture attention, complete problem sets on the day they are due, and prepare for the exam by re-reading notes in the final weekend. A redesigned version of the same block might reduce lecture time to two hours per day, adding a daily thirty-minute #retrieval_session in which students attempt without notes to solve two problems drawn from previous days' material. 
Problem sets would be restructured so that each set contains a substantial component of cumulative content, not only current-week content. The comprehensive exam would remain, but it would be complemented by a delayed follow-up quiz administered at the start of the subsequent block, worth ten percent of the original course grade. A cross-block cumulative quiz would occur monthly for the remainder of the term, covering microeconomics along with other completed blocks. Content coverage in this redesigned block would be reduced by perhaps twenty percent relative to the original, because the daily retrieval sessions and cumulative problem components consume time that would otherwise have gone to new material. This reduction is the honest cost of the redesign. The expected benefit, based on the meta-analytic literature on #spaced_retrieval, is substantially higher retention of the content that is still covered, measured three, six, and twelve months later. Whether this tradeoff is worthwhile depends on the purpose of the course. If the purpose is to expose students to as many concepts as possible, the original design may be defensible. If the purpose is to leave students with a durable understanding of core microeconomic reasoning that they can apply in later coursework and in professional life, the redesigned version is almost certainly superior. Making that choice explicit, rather than assuming that content coverage is the sole measure of a course's value, is itself part of what evidence-informed design requires. 5.8 Interaction Effects Among the Proposals The five proposals above are not independent. They interact, and their combined effect is likely to exceed the sum of their parts. Protected #spacing_windows within blocks (Proposal One) provide the immediate retrieval opportunities that make distributed sessions across blocks (Proposal Two) meaningful. Integration courses (Proposal Three) create natural contexts for the delayed assessments (Proposal Four) that motivate ongoing retention. 
And all four depend on faculty who have been prepared through faculty development (Proposal Five) to implement them competently. Attempting to adopt only one or two of these changes may produce disappointing results, not because the changes are misguided but because they are incomplete. A university that adds cumulative quizzes without also protecting spacing windows within courses may find that students still cram, because the quizzes are the only signal that distributed retention matters. A program that adopts integration courses without training faculty to teach them may find that those courses degrade into review sessions without genuine retrieval demand. This is a common pattern in educational reform. Isolated interventions often fail not because they are wrong but because they are undermined by surrounding structures that push in the opposite direction. Coherent packages of interventions that reinforce each other are more likely to succeed than isolated changes to individual courses. 6. Limitations and Areas for Further Research Several limitations of the argument above deserve explicit acknowledgment. First, the empirical basis for the specific proposals varies. The general principles of #spacing and #retrieval_practice are strongly supported by contemporary meta-analytic evidence. The specific claim that particular structural interventions, such as cross-block integration courses, will preserve retention in intensive formats is more speculative. Controlled studies of such interventions are limited, and the field would benefit substantially from more randomized or quasi-experimental research comparing intensive courses with and without these design features. Second, the analysis has focused primarily on cognitive outcomes, particularly retention. Higher education has many other legitimate goals, including motivation, identity development, professional socialization, and civic formation. Intensive formats may support some of these goals in ways that traditional calendars do not. 
The tradeoffs among these different goals are not fully addressed here and require separate analysis. Third, the analysis treats students as a relatively homogeneous population, but individual differences matter. Some learners, particularly those with strong prior knowledge in the domain, may benefit from intensive formats more than average. Others, particularly those with limited prior knowledge or with working memory constraints, may suffer disproportionately. Any structural reform must attend to these differences and avoid designs that improve average outcomes at the cost of increasing inequality. Fourth, the proposals have been developed with a particular institutional context in mind, essentially the well-resourced research or teaching-focused university in a wealthy country. Adaptation to under-resourced contexts, to institutions with heavy contingent faculty reliance, or to fully online programs at scale will require additional design work and may face constraints not addressed here. Fifth, and perhaps most importantly, the proposals presume that universities are genuinely willing to make retention a design priority. If retention continues to be invisible in the metrics that drive institutional decisions, none of the proposals will be adopted at scale, regardless of how much evidence supports them. The deeper reform that may be needed is in what universities measure and report about their own outcomes. Until #long_term_retention is measured as routinely as immediate course grades, the incentive structure will continue to reward compression over distribution. A sixth limitation concerns the assumption that retention itself is the primary educational outcome of interest. This paper has largely taken that for granted, on the grounds that a course whose content is forgotten has failed at something important. But not all learning is intended to be retained in explicit propositional form. 
Some courses cultivate habits of thought, dispositions, or skills that persist even when the specific factual content that occasioned them is forgotten. A student who has forgotten the specific historical dates covered in a course may nevertheless have acquired a more nuanced sense of historical causation. To the extent that this is true, the retention-focused critique of intensive scheduling may be too narrow. However, this defense of intensive formats works best for certain kinds of content and less well for others. It is unpersuasive as a defense of intensive coverage in fields where specific technical knowledge is expected to endure, such as anatomy, statistics, foreign languages, or professional certifications. In those domains, forgetting the content really is forgetting the point of the course. A seventh limitation is that the proposals rely, at least implicitly, on a fairly stable institutional environment in which curricula can be planned and courses redesigned over multi-year horizons. Institutions currently facing acute financial pressure, rapid enrollment change, or leadership turnover may find it difficult to sustain the kind of deliberate design work described here. In such environments, even modest evidence-informed reforms can be crowded out by more immediate concerns. Recognizing this reality does not undermine the case for reform, but it does temper expectations about how quickly and universally such reforms could spread. Future research should therefore address several questions. What are the long-term retention outcomes of graduates from block-scheduled programs compared with traditionally scheduled programs, controlling for prior preparation and program content? Which specific design features of intensive courses most reliably predict retention six or twelve months later? What incentive structures at the institutional level would make retention visible to administrators and prospective students? 
How can faculty development programs be scaled so that they reach the majority of instructors, not only the minority who seek them out? 6.1 A Note on Assessment Culture One further limitation deserves its own brief treatment. The proposals developed above assume that assessment is a legitimate lever for shaping student behavior, and that #delayed_assessments can be added to programs without triggering unmanageable resistance. This assumption is not free. There is a strand of contemporary educational thought that regards frequent testing, especially retrospective testing, as inherently harmful, either because it produces anxiety or because it narrows learning to what is easily measured. These concerns are not baseless. Poorly designed cumulative assessments could indeed become punitive without producing benefit. The specific advantage of low-stakes distributed quizzing, as documented by Sotola and Crede (2021) and Yang et al. (2021), is that it decouples the beneficial effects of retrieval from the anxiety-producing effects of high-stakes summative testing. The distinction matters, and defenders of intensive scheduling sometimes conflate the two categories to argue against any assessment reform. A careful reader of the retrieval practice literature will see that this conflation is a mistake, but arguing the point on any given campus can be difficult. 6.2 Equity Considerations Any reform aimed at improving average outcomes must ask what happens to students at the margins. Intensive scheduling may particularly disadvantage students who work substantial hours outside class, who care for dependents, or who arrive with weaker prior preparation, because their capacity to concentrate a whole course into three weeks is thinner than average. 
On the other hand, some reforms that appear to help retention on average, such as adding cumulative assessments spread across the term, could impose additional burdens on those same students by extending the period during which they must maintain readiness for evaluation. Careful #equity_analysis of any specific proposal is therefore essential. The general principle is that spacing and retrieval interventions should be designed as supports rather than as additional hurdles, and their implementation should include feedback loops that detect when they are functioning otherwise. 7. Conclusion The disconnect between what #cognitive_psychology has established about #memory and how modern universities structure learning is not merely a curiosity. It is a substantive problem with consequences for how well students actually learn in the format that increasingly dominates higher education. Intensive #block_scheduling is not going away, and it should not necessarily go away, because it solves real problems for real students. But its costs to long-term retention are also real, and they are systematically hidden by assessment practices that measure only immediate performance. The way forward is neither purism nor surrender. Purism, in the form of insisting that universities abandon intensive formats and return to traditional semester calendars, is politically and economically unrealistic and ignores the genuine benefits those formats provide. Surrender, in the form of accepting that #binge_learning is simply what modern education looks like, abandons the goal of durable learning and treats students as consumers to be processed rather than learners to be developed. The middle path proposed in this paper accepts intensive formats as a partly fixed feature of the landscape and asks how #evidence_based learning principles can be built into them structurally. 
Protected #spacing_windows within intensive blocks, distributed retrieval sessions across blocks, cross-block integration courses, delayed assessment design, and sustained faculty development together form a coherent package that could preserve most of the durability benefit that memory science recommends while remaining compatible with the calendars universities have already adopted. None of this is easy. Each proposal requires curriculum committees to make decisions, faculty to change their practice, students to accept assessment structures they may resist, and administrators to invest in supports that do not immediately translate into enrollment. But the alternative, which is to continue producing graduates who forget most of what they learn within months of learning it, is not a defensible outcome for institutions that describe their mission as education. The gap between cognitive science and course design has grown for a long time. Closing it will take a long time as well. What this paper has argued is that the gap can be narrowed substantially without demanding that any single institution restructure everything at once. Modest, coherent, evidence-informed changes to how intensive courses are designed can produce learning that lasts. Given the stakes, that is a case worth taking seriously. The final observation, and perhaps the most sobering, is that many of the students who complete intensive programs today will not recognize what they lost. They will earn their degrees, enter their fields, and function professionally on the memories that survived, patched with reference materials and on-the-job learning. The system will describe itself as successful. But the education those students received will be a fraction of what it could have been if the calendar and the assessment structure had been aligned with what human memory actually needs. 
Making that alignment happen, imperfectly and gradually and in whatever compromised form each institution can manage, is the work that #cognitive_psychology has been asking of higher education for at least two decades. It is time to start doing it. Hashtags #Spaced_Repetition #Retrieval_Practice #Block_Scheduling #Intensive_Courses #Cognitive_Load #Higher_Education #Memory_Retention #Binge_Learning #Successive_Relearning #Interleaving #Metacognition #Distributed_Practice #Course_Design #Assessment_Design #Evidence_Based_Teaching References Agarwal, P. K., Nunes, L. D., & Blunt, J. R. (2021). Retrieval practice consistently benefits student learning: A systematic review of applied research in schools and classrooms. Educational Psychology Review, 33(4), 1409-1453. doi:10.1007/s10648-021-09595-9 Carpenter, S. K., Pan, S. C., & Butler, A. C. (2022). The science of effective learning with spacing and retrieval practice. Nature Reviews Psychology, 1(9), 496-511. doi:10.1038/s44159-022-00089-1 Chen, O., Paas, F., & Sweller, J. (2021). Spacing and interleaving effects require distinct theoretical bases: A systematic review testing the cognitive load and discriminative-contrast hypotheses. Educational Psychology Review, 33(4), 1499-1522. doi:10.1007/s10648-021-09613-w Dixson, D. D., Zhang, B., & Fang, X. (2022). Compressed and intensive course delivery in higher education: Evidence on engagement and outcomes. Studies in Higher Education, 47(11), 2233-2249. doi:10.1080/03075079.2022.2081681 Ebersbach, M., Feierabend, M., & Nazari, K. B. B. (2020). Comparing the effects of directly and indirectly related benefits of retrieval practice on new learning. Journal of Applied Research in Memory and Cognition, 9(4), 545-556. doi:10.1016/j.jarmac.2020.07.002 Kliegl, O., & Baeuml, K. H. T. (2021). When retrieval practice promotes new learning: The critical role of study material. Journal of Memory and Language, 120, 104253. 
doi:10.1016/j.jml.2021.104253 Latimier, A., Peyre, H., & Ramus, F. (2021). A meta-analytic review of the benefit of spacing out retrieval practice episodes on retention. Educational Psychology Review, 33(3), 959-987. doi:10.1007/s10648-020-09572-8 Male, S. A., Baillie, C. A., & Hancock, P. (2023). Intensive mode delivery in higher education: A scoping review of learning outcomes and student experiences. Higher Education Research and Development, 42(5), 1123-1140. doi:10.1080/07294360.2022.2116423 Rivers, M. L. (2021). Metacognition about practice testing: A review of learners' beliefs, monitoring, and control of test-enhanced learning. Educational Psychology Review, 33(3), 823-862. doi:10.1007/s10648-020-09578-2 Sotola, L. K., & Crede, M. (2021). Regarding class quizzes: A meta-analytic synthesis of studies on the relationship between frequent low-stakes testing and class performance. Educational Psychology Review, 33(2), 407-426. doi:10.1007/s10648-020-09563-9 Yang, C., Luo, L., Vadillo, M. A., Yu, R., & Shanks, D. R. (2021). Testing (quizzing) boosts classroom learning: A systematic and meta-analytic review. Psychological Bulletin, 147(4), 399-435. doi:10.1037/bul0000309

  • Trauma-Informed Pedagogy in Post-Pandemic Cohorts: From Reactive Discipline to Relational Healing

    The return of children and adolescents to full time, in-person schooling after the COVID-19 pandemic revealed a #post_pandemic cohort marked by widespread emotional distress, uneven learning, and behavior that traditional #discipline systems were not designed to hold. This article assesses the effectiveness of #trauma_informed behavioral interventions in K-12 environments, focusing on three measurable outcomes that schools most consistently track: student #emotional_regulation, #absenteeism, and #academic_re_engagement. Drawing on peer reviewed studies, mixed methods evaluations, and large national surveys published mainly between 2020 and 2026, the review synthesizes evidence on how #relational_healing approaches, including trauma sensitive classroom design, restorative practices, mindfulness based social emotional learning, and trauma informed Positive Behavioral Interventions and Supports, shift school climate and student outcomes. Findings indicate that students exposed to #ACEs and pandemic related stressors show higher odds of chronic absenteeism, disengagement, and dysregulated behavior, and that these effects operate partly through weakened student teacher relationships and mental health symptoms. Interventions that shift educators from #reactive_discipline toward relational, regulation supportive strategies are associated with reduced suspensions, improved attendance patterns, higher engagement, and modest but meaningful gains in academic performance, especially for Black, Latino, and multiply marginalized students. Teacher directed components, particularly training that combines trauma education with mindfulness and #self_efficacy support, reduce burnout and improve attitudes toward trauma exposed learners. However, benefits depend heavily on multi year implementation, whole school buy in, and resourcing. 
The article concludes with a practical framework for K-12 leaders and researchers who want to move beyond punitive routines toward sustained relational healing as the core operating logic of the classroom. Keywords: trauma informed pedagogy, K-12 education, post-pandemic recovery, restorative practices, emotional regulation, chronic absenteeism, academic re-engagement, adverse childhood experiences, school climate, relational healing. 1. Introduction The COVID-19 pandemic did not create student trauma. It amplified it, revealed it, and pushed it into the center of school life. When K-12 buildings closed and reopened, teachers met a #post_pandemic cohort whose distress could no longer be treated as a fringe #mental_health concern for a small subgroup. Anxiety, grief, disrupted sleep, family loss, food insecurity, community violence, digital fatigue, and long stretches of social isolation had become common conditions of learning. In many schools, the first weeks of return were dominated by a spike in behavioral incidents, absenteeism, and academic disengagement that did not respond well to the standard #discipline playbook. The phrase #trauma_informed_pedagogy captures a shift in how educators are being asked to interpret and respond to those signals. Rather than reading a student's outburst, withdrawal, or absence as willful defiance to be punished, a #trauma_informed approach reads it as a possible stress response shaped by a history of adverse experiences. The classroom then becomes a place where safety, connection, and emotional regulation are treated as the pre-conditions for learning, not as afterthoughts. This is not a soft alternative to academic rigor. It is a claim about what has to be true for rigor to be possible. The pandemic did two things to this conversation. First, it made #ACEs and related adversities visible at scale. 
Roughly a quarter of children in the United States experience at least one traumatic event by adolescence, and post pandemic surveys report large increases in reported anxiety, depression, and other trauma related symptoms among youth. Second, it pushed many school systems to reconsider whether their default responses, particularly out of school suspension, zero tolerance policies, and referral to law enforcement, actually help students learn or reintegrate. School social workers surveyed during the pandemic explicitly named the need for a #trauma_informed school response as the most urgent gap in their institutions. At the same time, the picture on the ground is uneven. Many educators feel underprepared to apply trauma frameworks in daily classroom decisions. Training programs vary widely in depth. Restorative and relational strategies compete with pressures to raise test scores, close so called learning loss, and manage staffing shortages. Some schools have adopted #trauma_informed language without changing their underlying #reactive_discipline structures. Others have quietly done real relational work without labeling it in policy documents. This mismatch between rhetoric and practice is one of the reasons a careful assessment of effectiveness is needed now. This article assesses the effectiveness of #trauma_informed behavioral interventions in K-12 environments and analyzes their measurable impact on three outcomes that schools already track and that policymakers ask about: student #emotional_regulation, #absenteeism, and #academic_re_engagement. These three are chosen deliberately. Emotional regulation is the mechanism most frequently disrupted by trauma and the one most directly shaped by classroom practice. Absenteeism is a leading indicator of disengagement and a strong predictor of long term outcomes, and it is elevated among students with ACEs. 
Academic re-engagement, understood as the return of attention, effort, and belonging to the learning task, is the outcome parents and administrators most want to see after two disrupted school years. The article is organized as a structured review. Section 2 sets out the theoretical and empirical background, including the relationship between #ACEs, pandemic exposure, and school functioning. Section 3 describes the review methodology. Section 4 presents findings across the three outcome domains and across major intervention families, including trauma sensitive classroom design, restorative practices, mindfulness based social emotional learning, and #trauma_informed Positive Behavioral Interventions and Supports. Section 5 discusses cross cutting patterns, especially the shift from #reactive_discipline toward #relational_healing as an operating logic. Section 6 addresses limitations. Section 7 offers implementation and policy implications for K-12 leaders. Section 8 concludes. The central argument is straightforward. In the post pandemic cohort, behavior that used to be treated as a compliance problem is often better understood as a regulation problem. Interventions that address regulation, relationship, and belonging outperform interventions that rely on removal from the learning environment. The gains are not dramatic in every study, and they are conditional on implementation quality, but the direction of the evidence is consistent and the equity implications are large. Moving from #reactive_discipline to relational healing is not a slogan. It is a testable pedagogical change with measurable outcomes. 2. Background and Literature Review 2.1 What trauma does to learning Trauma, whether acute or chronic, changes how a child's brain and body process information. Sustained exposure to threat or adversity shifts attention toward danger detection, narrows working memory, and reduces the capacity to modulate emotional responses. 
In the classroom, this looks like difficulty starting tasks, sudden outbursts, freezing during instruction, withdrawal from peers, avoidance of new material, or a startle response to loud voices. #Emotional_dysregulation is not a personality flaw. It is a predictable feature of a nervous system that has learned to prioritize survival over exploration. The three core areas that a #trauma_informed school must address are safety, connection, and emotional and behavioral regulation. These are the foundational pillars, and each one has classroom level analogs. Safety includes predictable routines and language that avoids activating trauma responses. Connection includes stable, warm student teacher relationships that scaffold coping. Regulation includes explicit teaching of skills to name, tolerate, and manage emotions. When any pillar is missing, the others weaken. 2.2 ACEs, engagement, and absenteeism Population data show a robust link between #ACEs and school outcomes. In the 2016 to 2017 National Survey of Children's Health, children with one ACE had 1.32 times higher odds of school disengagement, those with two had 1.5 times higher odds, and those with three or more had 1.77 times higher odds compared with peers reporting no ACEs. Analyses of the 2011 to 2012 wave found that any exposure to ACEs was significantly associated with #chronic_absenteeism, with especially strong effects when children were exposed to neighborhood violence or multiple co occurring adversities. A more recent nationally representative study using the 2021 to 2022 National Health Interview Survey found that ACE exposure was associated with 1.53 times higher odds of any health related school absence and 2.43 times higher odds of health related chronic absenteeism, defined as missing at least ten percent of school days. General health status only partially explained the relationship, meaning ACEs affect attendance through pathways beyond illness alone. 
Similar dose response patterns appear across other engagement measures. A study of Florida high schoolers found ACE exposure was associated with #truancy in a dose response fashion, with the relationship partially attenuated by supportive school and parenting variables. A prospective longitudinal cascade model showed that early childhood #ACEs predict poor kindergarten student teacher relationship quality, which predicts elevated third grade internalizing problems, which in turn predicts reduced fifth grade school engagement. In autistic students, ACE exposure was significantly associated with reduced school attendance, grade progression, and engagement, and depression and anxiety only partially explained that effect. Dominance analyses of the 2018 to 2019 National Survey of Children's Health identified parental incarceration and economic hardship as the ACEs most strongly predictive of #school_engagement outcomes, suggesting that classroom interventions alone cannot fully offset out of school adversity. Intersectional analyses using 2021 data show that ACE effects on engagement vary by race, ethnicity, and gender, and that some subgroups face compounded risk that standard school policies miss. Together these findings support two claims. First, the pathway from adversity to disengagement runs through relationships and regulation, not just through symptoms. Second, absenteeism and disengagement should be read as trauma signals worth investigating rather than as isolated compliance failures. 2.3 The post-pandemic cohort The pandemic added a layer of collective and simultaneous adversity onto pre-existing patterns. Adolescents faced extended school closures, remote learning, social isolation, and elevated household stress, all during developmental windows in which peer contact and school routines matter most. Reports of anxiety, depression, and trauma symptoms rose noticeably, and academic engagement declined even in students without prior clinical concerns. 
School social workers described this as a unique period and a potentially traumatic experience in itself, one that required a #trauma_informed_school response rather than a narrow return to normal. Learning loss narratives dominated early policy responses, but a growing literature argues that framing the challenge only as lost content misses the deeper issue. Students returned not only behind on standards but also disconnected from the routines, relationships, and identity of being a learner. The reconceptualization proposed by several scholars is to treat the recovery task as one of #relational_healing that then makes learning possible, rather than as an academic sprint that ignores emotional state. Márquez-Aponte argued that complex and collective traumatic experiences during the pandemic altered adolescents' developmental trajectories across cognition, affect regulation, and behavioral control, and that education policy reform should include neuro-education and trauma informed strategies for this cohort. 2.4 From reactive discipline to relational healing Traditional K-12 discipline has relied heavily on removal. Out of school suspensions, expulsions, and referrals to law enforcement are meant to deter misbehavior by separating the student from the learning environment. A scoping review of restorative justice processes in K-12 schools noted that zero tolerance discipline philosophies produce systemic racial, economic, and gender disparities in disciplinary outcomes and are linked to worse mental and physical health outcomes for youth of color. These outcomes reverberate throughout life through reduced graduation rates, higher justice system contact, and lower earnings. Restorative practices offer a different logic. They are designed to proactively build community, improve relationships, and help students repair harm when conflict occurs. 
An integrative review of eleven studies concluded that school based restorative practices are associated with reduced suspension rates and represent a promising approach to reducing exclusionary discipline. A scoping review of twenty empirical studies of restorative justice programs found positive outcomes in community-building, self-esteem, relationships, and reductions in exclusionary practices, while cautioning that disparities were often maintained without an explicit equity focus. The trauma informed frame and the restorative frame are not the same, but they overlap in a critical way. Both treat behavior as information about relationship and regulation rather than as a signal to remove the student. Both put the school's response, not the student's compliance, at the center of change. Recent work explicitly combines them with Positive Behavioral Interventions and Supports so that evidence based behavior management practices fit within a #trauma_informed framework rather than opposing it. 2.5 Emotional regulation as the core outcome Across intervention families, the outcome that most directly determines classroom function is emotional regulation. Poor emotional self-regulation, characterized by heightened reactions, difficulty focusing, low frustration tolerance, and reduced ability to inhibit behavior, is associated with a wide range of mental health and behavioral problems. The good news is that these skills can be taught. Explicit universal and targeted strategies to build competence in #emotional_self_regulation improve outcomes across ages, and schools are described as ideal environments for that work because of the sheer amount of time students spend there. Trauma informed approaches to severely dysregulated youth, including those in residential and inpatient settings, have been shown to improve emotional and behavioral outbursts while maintaining safety in the milieu. 
In early childhood classrooms, structured trauma informed routines and positive teacher relationships have been linked to marked improvements in children's ability to manage emotions, interact positively with peers, and participate in classroom activities. The mechanism is consistent across ages. Predictable structure plus warm relationships plus explicit skill instruction produces regulation, and regulation produces the conditions for learning. 3. Methodology This review is a structured narrative synthesis rather than a formal systematic review. The goal is to bring together the strongest available empirical and conceptual evidence on #trauma_informed behavioral interventions in K-12 environments, with a focus on post pandemic implementation and on the three outcomes named in the introduction. Sources were identified through iterative searches across academic databases using combinations of the following terms: trauma informed pedagogy, trauma informed schools, K-12, post pandemic, COVID-19, adverse childhood experiences, emotional regulation, absenteeism, academic re-engagement, restorative practices, exclusionary discipline, social emotional learning, and Positive Behavioral Interventions and Supports. Preference was given to studies published between 2020 and 2026, with a small number of older foundational studies included when they provided essential context on ACEs and school outcomes. Inclusion criteria were: peer reviewed or scholarly publication; population that includes K-12 students, teachers, or school staff; empirical, mixed methods, or systematic conceptual work; and clear articulation of at least one measurable outcome. Exclusion criteria included commercial or promotional material, publications without clear methods, and studies focused exclusively on higher education or non-school settings. Some higher education trauma informed studies were retained when their design informs K-12 practice. 
Extracted information for each study included setting, sample, intervention family, comparison condition where applicable, outcomes measured, effect direction and magnitude when reported, and limitations. Studies were then grouped by intervention family and by outcome domain. Because the field is heterogeneous and largely made up of pre-post designs, mixed methods evaluations, and secondary analyses of national surveys, formal effect size pooling was not attempted. Instead the synthesis emphasizes convergent findings across designs and populations and flags where evidence remains thin or conflicting. The three outcome domains, emotional regulation, absenteeism, and academic re-engagement, were selected because they are consistently measured in schools, are policy relevant, and map onto the causal chain implied by #trauma_informed theory. Where studies measured proxies such as discipline incidents, suspension rates, or engagement indices, these were treated as related indicators within the domain. Limitations of the methodology are acknowledged in Section 6. 4. Findings 4.1 Emotional regulation outcomes Across intervention families, the most consistent finding is that #trauma_informed strategies improve student emotional and behavioral regulation. In a year long mixed methods professional development study at a suburban U.S. elementary school, sixty one educators received trauma informed training combined with classroom coaching and policy review. Nearly half reported that their thinking shifted substantially, and fifty five percent reported that their practices shifted somewhat. Qualitative themes included increased understanding of trauma and secondary traumatic stress, greater empathy for students with challenging behaviors, and adoption of proactive strategies over reactive ones. Teachers described reappraising interactions with students so that a behavioral episode became a signal to investigate rather than a signal to punish. 
A mixed methods study combining trauma informed training with the MindUP mindfulness based social emotional learning program compared 71 intervention educators with 41 comparison educators. Intervention educators showed significant decreases in emotional exhaustion and significant improvements on the Attitudes Related to Trauma Informed Care scale and its reactions subscale. The largest self efficacy gains appeared among educators who implemented the program for two consecutive years, which is consistent with the broader literature that treats trauma informed change as a multi year process rather than a workshop level intervention. Teacher regulation matters for student regulation because a dysregulated adult cannot co-regulate a dysregulated child. In early childhood classrooms, a mixed methods study of fifteen educators and 120 children across three urban preschools with at least one year of trauma informed practice implementation found marked improvements in children's ability to manage emotions, interact positively with peers, and participate in classroom activities. Teachers attributed the gains to structured routines and warm relational strategies rather than to any single curriculum. The findings support the idea that #trauma_informed practices are less about a specific script and more about a consistent operating logic applied over time. For severely dysregulated youth, trauma informed approaches in inpatient and residential settings have been shown to reduce emotional and behavioral outbursts while maintaining safety. The same principle scales down to general education classrooms. Explicit instruction in #emotional_self_regulation, embedded in trusting adult child relationships and predictable environments, gives students concrete tools to name states, tolerate discomfort, and choose responses. Two implementation caveats emerge. 
First, professional development that focuses only on awareness of trauma without building #teacher_self_efficacy tends to produce ambivalence rather than sustained practice change. Teachers need not only knowledge but confidence that they can apply it. Second, universal skill instruction works best when paired with tiered supports for students whose regulation needs exceed what the whole class can provide. This mirrors the multi-tier logic of Positive Behavioral Interventions and Supports and is why the trauma informed PBIS integration is promising. 4.2 Absenteeism and attendance Reducing chronic #absenteeism is one of the clearest policy targets for #trauma_informed reform because the evidence linking adversity to attendance is unusually strong. The Tsevat and colleagues analysis of the 2021 to 2022 National Health Interview Survey found that any exposure to ACEs was associated with 1.53 times higher odds of health related absences and 2.43 times higher odds of chronic absenteeism. The Boccio, Cardwell, and Jackson study of Florida high school students found a dose response relationship between ACE exposure and #truancy, with the effect partly attenuated by supportive school and parenting factors. Earlier work by Stempel and colleagues in the 2011 to 2012 National Survey of Children's Health showed that children with four or more ACEs had 1.79 times higher odds of chronic absenteeism than peers with no ACEs, with especially high risk among children exposed to neighborhood violence and family substance use. The mechanism is not only physical illness. Duke's analysis of 81,885 ninth and eleventh graders in Minnesota showed that multiple types of ACEs were significantly associated with unexcused absences and low academic achievement, and that student school connection partially attenuated these effects. 
This suggests that strategies that strengthen a student's sense of belonging and trusted adult contact can reduce the attendance impact of adversity even when the adversity itself remains present. Where restorative and trauma informed practices have been implemented at scale, attendance related indicators tend to move. The Darling-Hammond study of 485 California middle schools over six years found that exposure to restorative practices reduced suspension rates and disparities and improved school climate and student achievement. Reduced suspensions matter directly for attendance because suspended students are, by definition, absent, and because suspension is itself a strong predictor of later chronic absenteeism. A cluster randomized controlled trial in eighteen elementary, middle, and high schools in a large Northeastern city found that in a single year and before the pandemic, students in restorative practice schools were less likely to receive a discipline incident record, at 11.1 percent, compared with 18.2 percent in comparison schools. The lack of differential effects by sex, race and ethnicity, or disability status in that one year evaluation suggests that multi year implementation may be needed to reduce entrenched disparities. Attendance related gains from restorative practices are consistent with an integrative review of eleven studies, which concluded that restorative practices are associated with reduced suspension rates and are a promising approach to reducing exclusionary discipline outcomes. Reductions in exclusionary discipline are important because exclusion is not neutral. It concentrates the risk of disengagement in already vulnerable subgroups. Two important qualifications apply. First, the pre-post and observational designs common in this literature cannot fully rule out selection effects. Schools that adopt restorative practices may differ from schools that do not in ways that also affect attendance. 
Second, the size of attendance gains is generally modest and depends on whether trauma informed and restorative practices are implemented as a whole school approach or as isolated add-ons. Whole school implementation with multi year commitment appears necessary for measurable attendance change. 4.3 Academic re-engagement Academic re-engagement in the post pandemic cohort has two components. One is behavioral, meaning showing up, attempting work, and staying in class. The other is affective and cognitive, meaning caring about the work, connecting with teachers, and identifying as a learner. #Trauma_informed pedagogy directly targets both. The Tran review of learning loss argues that framing recovery as content catch up misses the point. What students need most is #trauma_informed and responsive care that allows them to heal while developing the social, emotional, and academic skills disrupted by the pandemic. Shea and Awdziejczyk argue for making healing rather than performance the priority, centered on three Rs of relational connectedness, restored trust, and contextualized resilience. These are not soft targets. They are the pre-conditions for a student to re-enter cognitively demanding work after a period of avoidance. Direct engagement evidence appears in several places. The Darling-Hammond study found that restorative practice exposure improved students' academic achievement and reduced disparities, with Black and Latino students benefiting the most. The Cavins and colleagues analysis of the 2021 National Survey of Children's Health documented significant differences in school engagement across ACE class memberships, with the multiple high risk ACE class showing pervasive negative effects on engagement across intersecting identities. Interventions that reduce the effective ACE load on students, whether by reducing school based adversities like harsh discipline or by strengthening protective relationships, are therefore engagement interventions. 
Preschool through elementary evidence supports the same logic. In the McDoniel and Bierman prospective longitudinal cascade, kindergarten student teacher relationship quality mediated the link between early ACEs and fifth grade engagement, so building strong early relationships is an engagement lever. In the early childhood trauma informed practice study, teachers reported that structured routines and positive relationships enhanced children's participation in classroom activities, which is the everyday form of engagement. Post pandemic classroom accounts show similar directional evidence. In one action research study and pedagogical reflection on the return to face to face teaching after the pandemic, structured peer engagement, choice within assignments, and holistic recognition were described as ways to rebuild students' identities as learners. In a study of a private university in Nigeria responding to the fallout of the pandemic, trauma informed teaching was adopted to mitigate stressors from poverty, abusive homes, violence, and substance use, and the study documented that students exposed to traumatic experiences had difficulty managing their educational functioning. Although the setting is not K-12, the logic transfers, and the Nigerian study is a useful reminder that the pandemic aftermath is a global phenomenon rather than a U.S. specific one. Social emotional learning aligned with #trauma_informed practice is an important mediator of academic re-engagement. Ontiveros argues that #SEL equips students with the skills to regulate emotions, develop empathy, and build strong relationships, while empowering educators to create inclusive, supportive classrooms. When #SEL and #trauma_informed practice are integrated, schools become spaces of healing and growth that allow all students to thrive academically, socially, and emotionally. Taken together, the evidence supports a two step causal claim. 
Trauma informed practice increases regulation and belonging, and regulation and belonging support academic re-engagement. The size of academic gains is smaller than the size of climate and behavioral gains, but the direction is consistent and the equity implications are meaningful. 4.4 Teacher and staff outcomes Any assessment of a school intervention is incomplete without attention to the adults asked to deliver it. Post pandemic, teacher burnout, secondary traumatic stress, and attrition are themselves major risks to student outcomes. School social workers described maintaining school based relationships during closures as both essential and difficult, and identified staff mental health as a first order concern for schools reopening. Trauma informed training that combines conceptual knowledge with practical skill and self care shows the most promise. The Kim and colleagues mixed methods study reported significant decreases in emotional exhaustion and improvements in trauma informed attitudes among educators who received combined trauma informed and MindUP training, with the largest self efficacy gains among two year implementers. The Koslouski study documented that teachers adopted self care strategies as an outcome of the intervention, alongside increased empathy and collaboration. #Teacher_self_efficacy has been identified as the missing piece in trauma informed classroom interventions. Programs that explain trauma without building confidence in daily classroom application tend to leave teachers overwhelmed rather than empowered. This has direct implications for how districts should design professional development. One off workshops rarely produce durable change. Multi year, coach supported, whole school approaches produce more. Underserved communities face particular challenges. Integrating #trauma_informed practices in U.S. 
educational systems is described as crucial for addressing behavioral challenges in underserved communities, and requires training educators to recognize and respond to trauma, creating supportive school environments, and incorporating social emotional learning into curricula. Barriers include resource limitations, resistance to change among staff, and systemic and cultural obstacles. Sustainability depends on supportive policies, funding, community engagement, and family involvement. 4.5 School climate and discipline structures Zooming out from student and teacher outcomes, whole school climate shifts appear in schools that adopt #trauma_informed and restorative frameworks in coordinated ways. Darling-Hammond found that schools increasing their use of restorative practices saw decreases in schoolwide misbehavior, substance abuse, and student mental health challenges, alongside improved school climate and student achievement. The integrative review by Samimi and colleagues showed reduced suspension rates in schools using restorative practices, with peacemaking circles as the most common implementation form. The scoping review by Gen and colleagues echoed positive outcomes in community building, self esteem, relationships, and reduced exclusion. Two structural findings deserve attention. First, restorative practices frequently maintain race, gender, disability, language, and economic disparities unless equity is treated as an explicit design goal rather than an assumed byproduct. Interventions that focus on interpersonal reconciliation without addressing structural inequities in referral patterns, teacher expectations, or resource allocation risk reproducing the disparities they set out to reduce. Second, whole school approaches that align classroom practices, discipline policies, staff training, and community partnerships produce more durable change than piecemeal adoption. The #trauma_informed PBIS integration is one example of coordinated design. 
It preserves the multi tiered support structure that many districts already use while embedding trauma understanding into how those tiers are populated and how student behavior is interpreted. Rather than replacing existing behavior management systems, this approach reframes them. 5. Discussion 5.1 What the evidence supports The literature supports several core claims with reasonable confidence. First, in the post pandemic cohort, a significant proportion of behavior that would traditionally be treated as a compliance problem is more accurately understood as a regulation problem shaped by cumulative adversity. Second, interventions that shift educators toward trauma sensitive, relationally grounded, regulation supportive practices are consistently associated with improved classroom function. Third, replacing or supplementing #exclusionary_discipline with restorative and #trauma_informed alternatives is associated with reductions in suspensions and improvements in school climate and academic outcomes, especially for historically marginalized students. Fourth, teacher training is necessary but not sufficient. #Teacher_self_efficacy, ongoing coaching, and multi year commitment appear to be the difference between symbolic and substantive change. The evidence base is stronger for climate and behavioral outcomes than for standardized academic achievement outcomes. This is partly a design issue. Academic tests are lagging indicators that respond slowly to climate shifts, and evaluation windows are often too short to capture their movement. It is also a mechanism issue. Trauma informed practice does not directly teach reading or mathematics. It changes the conditions under which teaching can land. Academic gains, when they appear, are typically downstream of engagement and attendance gains. 
5.2 Why relational healing outperforms reactive discipline Reactive discipline works from the assumption that consequences deter misbehavior and that removal from the environment is a proportional response. In a population with high rates of adversity, this assumption fails for a specific reason. Students whose nervous systems are already primed to expect threat interpret removal as confirmation of that threat, not as a corrective signal. Suspension does not restore regulation. It typically deepens dysregulation, weakens the school relationship, and increases the probability of future incidents. This is why exclusionary discipline correlates with worse mental and physical health outcomes for youth, particularly youth of color. #Relational_healing works from a different premise. Behavior is a communication about relationship and regulation. The school's task is to help the student return to a regulated state, repair harm, and re-enter the community. This is not the abandonment of expectations or accountability. Restorative and trauma informed models still hold students to standards, but they do so through repair rather than removal. That difference is what produces the observed gains in climate, attendance, and engagement. The three Rs of relational connectedness, restored trust, and contextualized resilience proposed by Shea and Awdziejczyk provide a compact frame for what changes in a relationally healing classroom. Relational connectedness means that at least one adult in the building knows the student well and is trusted by the student. Restored trust means that after ruptures, the adults take responsibility for reconnecting. Contextualized resilience means resilience is not treated as an individual trait to be lectured about but as a set of skills embedded in a supportive context. 5.3 Equity implications The equity implications of moving from #reactive_discipline to #relational_healing are substantial. 
Zero tolerance systems produce disproportionate exclusion of Black, Latino, disabled, and low income students. Restorative and trauma informed alternatives, when implemented at whole school scale and with explicit equity design, appear to reduce these disparities and to produce the largest achievement gains for Black and Latino students. This is not incidental. It is a direct consequence of the fact that punitive systems most heavily punish the students carrying the heaviest adversity load, and that relational systems most heavily support them. However, restorative practices are not automatically equitable. The scoping review by Gen and colleagues found that race, gender, disability, language, and economic disparities were frequently maintained in restorative implementations that lacked explicit equity design. Intersectional analyses show that adolescents in the multiple high risk ACE class experience compounding disadvantage across identity categories. Any implementation plan that ignores these patterns risks replicating them under a new label. 5.4 Implementation as the pivotal variable A recurring theme is that implementation quality determines whether #trauma_informed and restorative approaches deliver on their promise. The Gregory and colleagues cluster randomized trial found reduced discipline incidents after one year but no differential effect on disparities, suggesting that a single year of implementation is not enough to move deep patterns. The Kim and colleagues MindUP study showed the largest self efficacy gains among two year implementers. The Koslouski year long professional development study showed real shifts in teacher thinking and practice, but at the moderate rather than transformative level, and the authors called for evaluation of trauma informed approaches during and after the pandemic. Implementation quality has several observable features. Whole school buy in, including leadership commitment, matters more than individual teacher enthusiasm. 
Coaching that follows initial training is more predictive of practice change than one off workshops. Policy alignment, including discipline policies that support rather than contradict trauma informed practice, is a precondition rather than a nicety. Community and family engagement is a stabilizer, particularly in underserved communities where out of school adversity is high. A pragmatic implication is that districts should treat #trauma_informed reform as a multi year strategic initiative rather than a workshop. The staffing, budget, and evaluation implications of that framing are different, and honesty about them at the outset increases the probability of durable change. 5.5 The role of #SEL and #PBIS Two existing frameworks can either compete with or complement #trauma_informed practice. Social emotional learning, when integrated with trauma understanding, gives students the explicit skill instruction that supports regulation and relationship. It also provides a curricular home for the work, which matters when time and priorities are contested. Positive Behavioral Interventions and Supports, when combined with trauma informed care, preserves the multi tiered structure that districts already understand while shifting the interpretation of student behavior. The most effective implementations integrate these frames rather than force schools to choose. A school does not need to decide between SEL, PBIS, and trauma informed practice. It needs to align them so that the message received by students and staff is consistent. When alignment fails, staff receive contradictory guidance, and students receive contradictory experiences. 5.6 Families, community partners, and the wider ecosystem Schools do not sit outside the ecology of a child's life. They sit inside it. 
The dominance analysis by Webb and colleagues placed parental incarceration and household economic hardship among the most powerful predictors of school engagement outcomes, which points to the limits of any classroom only intervention. The same analysis called for multi-faceted approaches that combine early childhood investment, economic wellbeing, and school based supports, because adversity clusters with structural conditions that schools alone cannot fix. Recognizing that limit is not defeatism. It is a design constraint. Schools that partner with pediatricians, community mental health providers, family support organizations, and, where appropriate, faith and cultural institutions build a wider net around the child than any single classroom can hold. Family engagement inside a #trauma_informed frame looks different from traditional parent involvement. Instead of positioning families as recipients of school communication about compliance, it positions them as partners with information the school needs. Caregivers of children with high ACE loads often carry their own trauma histories, and school outreach that reads as surveillance rather than support tends to fail. Warm, low stakes contact, translated communication, flexible meeting times, and explicit acknowledgment of the family's expertise on their child are practical ways to make partnership real. In underserved communities, family and community engagement have been identified as central to the sustainability and effectiveness of trauma informed interventions. 5.7 Cross cutting principles Several principles cut across the literature and are worth naming as design guidance. First, prioritize safety and predictability in the physical and emotional environment. Second, invest in relationships between adults and students as a first order intervention, not as a bonus. Third, teach regulation explicitly, using developmentally appropriate strategies, and give students opportunities to practice. 
Fourth, replace removal with repair whenever possible, while preserving accountability. Fifth, treat teacher wellbeing as part of the intervention rather than as a separate program. Sixth, align policies, curriculum, and discipline structures so the trauma informed message is not undermined by contradictory routines. Seventh, design for equity explicitly, because absence of design produces reproduction of existing disparities. 6. Limitations Several limitations of the evidence base and of this review deserve explicit acknowledgment. First, the research design distribution is uneven. Randomized controlled trials of whole school #trauma_informed reform are rare, and most evidence comes from pre-post studies, mixed methods evaluations, and secondary analyses of national surveys. This limits confident causal inference, particularly for academic outcomes. The Gregory and colleagues cluster randomized trial is a notable exception in the restorative practices literature, but even it examined only one year of impact. Second, measurement is inconsistent. Emotional regulation, absenteeism, and academic re-engagement are measured differently across studies, sometimes with validated scales, sometimes with administrative records, and sometimes with teacher or parent report. This reduces the comparability of effect sizes and complicates synthesis. Standardized measurement across studies would strengthen future evidence. Third, publication bias is a real concern. Interventions that fail to show effects may be underreported. This applies to both #trauma_informed and restorative literatures. The consistent direction of effects across studies is reassuring, but it does not eliminate the risk that null findings are undercounted. Fourth, implementation fidelity is often inadequately measured or reported. 
When a study reports positive outcomes, it can be difficult to tell whether the intervention itself is effective or whether the specific implementation context, including leadership, coaching, and staff buy in, is doing much of the work. Improved implementation fidelity documentation would help distinguish intervention effects from implementation effects. Fifth, most of the studies in this review come from the United States. The trauma informed and restorative literatures are global, and important work is happening in other education systems. The Nigerian study by Olujide is one useful example, but broader international integration is beyond the scope of this review. Sixth, this review is a structured narrative synthesis rather than a formal systematic review. Search strategy and inclusion decisions were iterative rather than pre-registered. Readers seeking pooled effect size estimates or formal risk of bias assessment should consult the systematic reviews and scoping reviews cited throughout, especially those by Samimi and colleagues, Gen and colleagues, and Darling-Hammond. Seventh, the post pandemic period is still unfolding. Cohorts moving through K-12 today are affected by pandemic experiences whose full academic and mental health consequences will not be visible for years. Any assessment written now should be read as provisional. 7. Implications for Practice and Policy For K-12 leaders considering how to move from #reactive_discipline to relational healing, several practical implications emerge. At the classroom level, teachers benefit from a small, consistent set of practices that recur every day. These include predictable routines and clear transitions, warm greetings and check ins that establish adult attention as a stable presence, explicit language for naming and managing emotions, and a repair oriented response to conflict that preserves the student's place in the community. None of this requires a new curriculum. 
It requires a coherent operating logic and permission to prioritize regulation over surface compliance. At the school level, leaders should treat #trauma_informed reform as a multi year initiative with named milestones. Year one typically includes shared training, baseline data on discipline and attendance, policy audit, and establishment of coaching structures. Year two focuses on practice depth, tier two and tier three supports, and family engagement. Year three focuses on sustainability, staff development pipelines, and evaluation. Districts that abandon initiatives after year one rarely see the disparities and academic outcomes shift, because those outcomes only respond to sustained implementation. At the policy level, discipline codes should be examined for internal contradictions. A school cannot credibly claim to be #trauma_informed while retaining automatic suspension policies for behaviors that are predictable stress responses. State and district policies that fund one off training without supporting coaching and implementation infrastructure often produce disappointment. Redirecting funds toward multi year, whole school implementation is more likely to produce measurable change. For teacher preparation programs, embedding trauma informed content, restorative practices, and social emotional learning within core methods coursework is a critical lever. New teachers who enter classrooms already fluent in these frames have a much shorter learning curve than teachers who must unlearn purely #reactive_discipline scripts. Integrating #trauma_informed pedagogy into educator preparation is not a specialization. It is baseline preparation for teaching in the post pandemic cohort. For evaluators and researchers, several priorities stand out. First, more rigorous multi year designs are needed, ideally with random assignment or well matched comparison schools. 
Second, standardized measurement of regulation, engagement, and attendance would enable better cross study synthesis. Third, explicit examination of equity outcomes should be built into every evaluation, not treated as a subgroup afterthought. Fourth, teacher wellbeing and turnover should be treated as first order outcomes because they mediate every other outcome. For families and communities, the most important message is that student behavior after the pandemic is not a sign that a generation has become unmanageable. It is a signal that a generation is asking for a different kind of school response. Trauma informed practice is not a way to lower expectations. It is a way to make it possible for students to meet expectations. One further implication is worth naming. In the current moment, many school systems are under pressure to deliver quick academic recovery, and there is real temptation to treat #trauma_informed practice as a distraction from the core academic task. The evidence points the other way. Regulation is a precondition for cognition. Belonging is a precondition for effort. Attendance is a precondition for anything. Framing relational healing as competitive with academic rigor is a false trade off that will slow recovery rather than speed it. The districts that treat trauma informed practice and academic rigor as two sides of the same commitment are the ones most likely to see both measures move. 8. Conclusion The post pandemic K-12 cohort is not a temporary problem to be waited out. It is the cohort schools have. The evidence reviewed here supports a clear direction. Students exposed to #ACEs and pandemic related stressors show elevated risks of #emotional_dysregulation, #chronic_absenteeism, and disengagement, and these risks are not equally distributed. #Reactive_discipline systems reproduce and often deepen these harms. 
#Trauma_informed and restorative approaches, when implemented as coherent multi year strategies with attention to equity and teacher wellbeing, are associated with reductions in exclusionary discipline, improvements in school climate and attendance, and modest but meaningful gains in academic engagement, especially for students most burdened by adversity. The core shift is one of interpretation. Behavior stops being a signal to punish and becomes a signal to investigate, connect, and support. Removing a student from the learning environment stops being the default response and becomes a rare last resort. The classroom stops being organized around compliance and starts being organized around safety, relationship, and regulation as the conditions of learning. None of this is soft. It asks more of teachers, not less. It requires policy alignment, sustained investment, honest measurement, and the courage to name inequities that continue under new labels. It also asks researchers to keep testing, refining, and holding the field to standards of evidence that go beyond enthusiasm. The move from reactive discipline to #relational_healing is not the end of accountability in schools. It is a more honest form of accountability, one that holds the adult institution responsible for creating the conditions in which children can thrive, and that treats each student's return to engagement not as a favor granted by the school but as the school's own primary work. In the post pandemic era, that work is the work. References Baiden, P., LaBrenz, C. A., Okine, L., Thrasher, S., & Asiedua-Baiden, G. (2020). The toxic duo: Bullying involvement and adverse childhood experiences as factors associated with school disengagement among children in the US. Children and Youth Services Review, 119, 105383. Bano, T., Younas, E., & Hassan, S. (2026). Impact of trauma-informed practices on social and emotional development in early childhood classrooms. 
Journal of Early Childhood Education and Practice. Boccio, C. M., Cardwell, S. M., & Jackson, D. B. (2024). Adverse childhood experiences and truancy in high school: An analysis of Florida adolescents. Journal of School Violence. Bylsma, M. (2024). Here to help: How pandemic pedagogy made for face-to-face change. Papers on Postsecondary Learning and Teaching, 6. Cavins, J. K., Lee, H. Y., & Kim, I. (2025). Role of adverse childhood experiences and intersecting identities on adolescents' school engagement in the United States. Journal of School Health. Darling-Hammond, S. (2023). Fostering belonging, transforming schools: The impact of restorative practices. Learning Policy Institute. Dombo, E. A., & Sabatino, C. (2019). Creating safe environments for traumatized children in schools. Oxford University Press. Duke, N. N. (2020). Adolescent adversity, school attendance and academic achievement: School connection and the potential for mitigating risk. Journal of School Health, 90(8), 618-627. Gen, B. M., Wojtowicz, O., & Johnson, N. L. (2025). Restorative justice processes in K-12 schools: A scoping review. Journal of School Health. Gregory, A., Huang, F. L., & Ward-Seidel, A. R. (2022). Evaluation of the whole school restorative practices project: One-year impact on discipline incidents. Journal of School Psychology, 93, 51-70. Gussin, H., Shiu, C., Danguilan, C., Mihaila, I., Acharya, K., & Berg, K. L. (2024). Impact of adverse childhood experiences and mental health on school success in autistic children: Findings from the 2016 to 2021 National Survey of Children's Health. Journal of Autism and Developmental Disorders. Hamoda, H., Chiumento, A., Alonge, O., Hamdani, S. U., Saeed, K., Wissow, L., & Rahman, A. (2021). The COVID-19 lockdown will have consequences for child mental health: Investing in school mental health programs can help. Journal of the American Academy of Child and Adolescent Psychiatry, 60(9), 1058-1060. 
Kalogeratos, G., Anastasopoulou, E., Tsagri, A., Tseremegklis, C., Tsogka, D., Lourida, K., & Drongitis, A. (2024). Adolescent trauma and impact of the COVID-19 pandemic in the school context. Technium Social Sciences Journal. Keeshin, B., Bryant, B., & Gargaro, E. R. (2021). Emotional dysregulation: A trauma-informed approach. Child and Adolescent Psychiatric Clinics of North America, 30(2), 375-387. Kim, S., Crooks, C. V., Bax, K., & Shokoohi, M. (2021). Impact of trauma-informed training and mindfulness-based social emotional learning program on teacher attitudes and burnout: A mixed-methods study. School Mental Health, 13(1), 55-68. Koslouski, J. B. (2022). Developing empathy and support for students with the most challenging behaviors: Mixed-methods outcomes of professional development in trauma-informed teaching practices. Frontiers in Education, 7, 1005887. Lancaster, S., & Hays, F. (2021). Teacher self-efficacy: The missing piece to trauma-informed classroom interventions. The Advocate, 26(2). Lazarus, P., & Costa, A. (2020). Teaching emotional self-regulation to children and adolescents. In Handbook of School Based Mental Health Promotion. Springer. Majebi, N. L., Adelodun, M. O., & Anyanwu, E. C. (2024). Integrating trauma-informed practices in U.S. educational systems: Addressing behavioral challenges in underserved communities. International Journal of Applied Research in Social Sciences. Marquez-Aponte, E. (2020). Trauma-informed strategies to support complexly traumatized adolescents in schools in the time of the COVID-19 pandemic. Theory in Action, 13(4). McDoniel, M. E., & Bierman, K. L. (2022). Exploring pathways linking early childhood adverse experiences to reduced preadolescent school engagement. Child Abuse and Neglect, 128, 105592. N. Amani, U. (2025). Creating trauma-informed schools: Strategies and practices. IAA Journal of Arts and Humanities. Olujide, O. (2025). 
Trauma-informed teaching strategies at a private university: Fallout of COVID-19 pandemic. Bold Scholar Journal of Education. Ontiveros, J. (2025). The school counselor's corner: The role of social emotional learning in supporting students facing trauma. Language Bridge Journal. Rawson, S. (2020). Trauma-sensitive schools. In Handbook of School Mental Health. Springer. Riggs, L., & Landrum, T. J. (2023). Trauma-informed PBIS: How educators can combine evidence-based practices for behavior management with trauma-informed care. Beyond Behavior, 32(2). Samimi, C., Han, T. M., Navvab, A., Sedivy, J. A., & Anyon, Y. (2023). Restorative practices and exclusionary school discipline: An integrative review. Contemporary School Psychology. Shea, M., & Awdziejczyk, A. N. (2020). Make healing, not performance, the goal for K-12 schools amid this global pandemic. International Dialogues on Education, 7(1). Subramaniam, P. R., & Wuest, D. (2025). Trauma-informed pedagogy and physical literacy as a pathway to human flourishing. Journal of Physical Education, Recreation and Dance. Tran, A. (2024). Reconceptualizing learning loss: The need for trauma-informed and responsive care in K-12 education. Routledge Open Research. Tsevat, R. K., Nkansah, M. M., Shankar, M., Choi, K., Jackson, N., Thyne, S. M., Gordon, B., & Dudovitz, R. (2025). The association between ACEs and health-related school absenteeism: Results from a national survey of youth. Academic Pediatrics. Watson, K. R., Capp, G. P., Astor, R., Kelly, M. S., & Benbenishty, R. (2022). We need to address the trauma: School social workers' views about student and staff mental health during COVID-19. School Mental Health, 14, 902-917. Webb, N., Miller, T., & Stockbridge, E. (2022). Potential effects of adverse childhood experiences on school engagement in youth: A dominance analysis. BMC Public Health, 22, 2151. Stempel, H., Cox-Martin, M., Bronsert, M. R., Dickinson, L., & Allison, M. A. (2017). 
Chronic school absenteeism and the role of adverse childhood experiences. Academic Pediatrics, 17(8), 837-843. Arbour, M., Walker, K. C., & Houston, J. (2023). Trauma-informed pedagogy: Instructional strategies to support student success. Journal of Midwifery and Women's Health, 68(4). Pincus, R., Hannor-Walker, T., Wright, L., & Justice, J. (2020). COVID-19's effect on students: How school counselors rise to the rescue. National Association of Secondary School Principals Bulletin. Phelps, C., & Sperry, L. L. (2020). Children and the COVID-19 pandemic. Psychological Trauma: Theory, Research, Practice, and Policy, 12(S1), S73-S75. #trauma_informed_pedagogy #post_pandemic_learning #K12_education #relational_healing #restorative_practices #student_mental_health #emotional_regulation #chronic_absenteeism #academic_reengagement #ACEs_in_schools #school_climate #trauma_sensitive_classrooms #teacher_wellbeing #SEL_integration #equity_in_discipline

  • The Translanguaging Advantage: Leveraging Multilingual Repertoires in Monolingual Systems

    Classrooms across the world are becoming more #linguistically_diverse, yet the majority of school systems still operate under strict monolingual policies that treat the dominant language, often English, as the only legitimate medium of instruction. This tension between the lived reality of #bilingual_learners and the rigid architecture of institutional language rules has produced a long standing question in education: what should teachers actually do when their students think, feel, and reason across several languages, but are expected to demonstrate learning in only one? This article examines #translanguaging as both a theory of language and a set of teaching practices that answer this question in a productive way. Drawing on peer reviewed research published in the last five years, the paper argues that when teachers deliberately invite students to use their full linguistic repertoire for cognitively demanding tasks, learning outcomes in content areas, literacy, and critical thinking improve, even when final assessments remain in the dominant language. The article reviews the theoretical foundations of translanguaging, presents concrete #pedagogical_strategies for science, mathematics, humanities, and language classrooms, discusses the challenges teachers face when working against monolingual mandates, and outlines implications for policy, teacher education, and assessment. The core claim is simple: the multilingual mind is not a problem to be managed but a resource to be mobilized, and pedagogy that recognizes this reality prepares students more fully for academic, civic, and professional life. Keywords: translanguaging, multilingualism, bilingual education, English medium instruction, pedagogy, language policy, equity, classroom discourse 1. Introduction Walk into almost any urban classroom in the twenty first century and you will hear more than one language. 
A student may whisper to a friend in Punjabi while writing a paragraph in English, another may quietly translate a science term into Arabic before typing an answer, and a third may draft an essay first in Spanish before rewriting it in English. These small acts, often invisible to the teacher and sometimes actively discouraged, are not signs of confusion or incomplete learning. They are evidence of #cognitive_work in progress. They are also evidence that language, as it lives in the minds of real learners, does not respect the neat borders that school systems try to draw around it. For most of the twentieth century, mainstream educational thought treated a bilingual student as two monolinguals in one body. Under this view, each language was expected to stay in its own container, and mixing languages was framed as a deficit. Policies followed this thinking. English only rules, sink or swim submersion programs, and testing systems that recognized only the dominant language became the norm in the United States, the United Kingdom, Australia, parts of Europe, and in many post colonial contexts where English or another former colonial language dominates schooling. The results, measured over decades, are well documented. #Emergent_bilinguals routinely underperform in high stakes tests, disengage from school, and are overrepresented in special education referrals, not because their cognition is weak, but because the instrument of instruction has ignored a large part of who they are as thinkers (Garcia and Kleifgen, 2020). Translanguaging emerged as both a critique of this arrangement and a practical alternative. First articulated in Welsh bilingual classrooms and later expanded into a broad theoretical framework by Ofelia Garcia, Li Wei, and others, translanguaging proposes that bilingual people do not hold two separate language systems in their heads. 
Instead, they draw from a single, integrated #linguistic_repertoire that includes vocabulary, grammar, sound, gesture, and other semiotic resources from all the languages they know. Named languages such as English, French, Urdu, or Yoruba are social and political labels that we impose on this repertoire, not accurate maps of how the mind actually stores and uses language (Vogel and Garcia, 2020; Li, 2022). If this account of the multilingual mind is correct, then classroom practices that force students to shut down parts of their repertoire are cognitively wasteful and pedagogically harmful. This paper takes that claim seriously and asks a set of practical questions. What does it look like to build lessons that treat the multilingual repertoire as a resource? How do teachers do this inside a school that still tests only in English or in another dominant language? What evidence exists that these practices actually improve #academic_outcomes rather than just making students feel more comfortable? And what should policymakers, teacher educators, and school leaders change in order to make such practices sustainable rather than dependent on the goodwill of individual teachers? The article moves through these questions in eight sections. After this introduction, section two lays out the theoretical foundations of translanguaging and distinguishes it from related ideas such as code switching and simple bilingual instruction. Section three describes the methodological approach of this review. Sections four and five present findings from recent empirical studies on translanguaging pedagogy across subject areas and grade levels. Section six focuses specifically on the paradox of translanguaging inside #monolingual_systems. Section seven discusses challenges and section eight offers implications and a conclusion. 
Throughout, the aim is to speak to a mixed audience of students, teachers, and researchers, presenting complex ideas in plain language while maintaining the rigor expected of a scholarly article. 2. Theoretical Framework 2.1 From Code Switching to Translanguaging To understand translanguaging, it helps to see what it is not. #Code_switching, an older and better known concept in sociolinguistics, refers to the movement between two separate language codes, usually explained as a rule governed alternation that speakers perform for social, stylistic, or discursive purposes. Under a code switching lens, a Spanish English bilingual who says, "I need to comprar leche after school," is understood as switching from one distinct system to another. The bilingual speaker is treated as skilled at moving between codes, but the codes themselves are treated as separate. Translanguaging challenges the separateness of those codes. It draws on research in psycholinguistics, neurolinguistics, and cognitive science showing that bilingual speakers activate features from all their languages in parallel, even when producing speech in only one (MacSwan, 2022). This does not mean that the categories English, Turkish, or Bengali are meaningless. They matter socially, politically, and educationally. But cognitively, the bilingual person is drawing from one deep well of #linguistic_features, and the surface labels that we place on stretches of speech are secondary. The pedagogical implications are significant. If bilingual learners have one integrated system, then teaching that pretends to address only their English self, while ignoring their Somali, Ukrainian, or Vietnamese self, is teaching only part of the learner. Translanguaging pedagogy, in contrast, works with the whole learner (Garcia and Kleifgen, 2020). 
2.2 The Two Faces of Translanguaging Scholars have found it useful to distinguish two faces of translanguaging that overlap in practice but are analytically different (Cenoz and Gorter, 2021). The first face is #spontaneous_translanguaging, the everyday, unplanned mixing of languages that bilingual people do in conversation, in thinking aloud, and in writing drafts. The second face is #pedagogical_translanguaging, a deliberate and planned use of multiple languages by teachers to promote learning. Pedagogical translanguaging includes strategies such as previewing a science text in one language before reading it in another, allowing group discussion in a home language before writing a summary in the school language, and comparing grammatical structures across languages to build metalinguistic awareness. Both faces matter. Spontaneous translanguaging tells us how learners actually make meaning, and ignoring it means ignoring evidence about student thinking. Pedagogical translanguaging turns that evidence into a design principle, giving teachers a way to plan lessons that harness multilingual resources rather than suppress them. 2.3 Translanguaging as a Theory of Social Justice A third strand of translanguaging theory treats the practice as an issue of #linguistic_justice. Language policies in schools reflect broader relations of power, and those relations often privilege the language of former colonizers, majority ethnic groups, or economic elites. When schools tell children that the languages of their homes and communities do not belong in classrooms, they send a message about whose knowledge counts. Over years, this message shapes identity, self esteem, and school engagement (Poza, 2021). Translanguaging as a justice oriented practice tries to disrupt this message. It positions the languages that children bring from home as legitimate resources for #academic_learning, not just for social conversation. 
It insists that a Yoruba English bilingual student in London or a Nahuatl Spanish English trilingual student in Los Angeles has intellectual assets that the classroom needs to see. In this sense, translanguaging is not only a technique for improving test scores. It is a stance about who belongs in academic spaces and on what terms (Rajendram, 2022). 2.4 The Semiotic Dimension While translanguaging began as an account of movement across named languages, more recent formulations have widened the frame to include #semiotic_resources beyond spoken and written words. Gesture, gaze, image, diagram, sound, and even bodily posture are part of how meaning is made in classrooms, and bilingual learners often use these resources creatively to bridge gaps in one language or another. A student who cannot yet name a chemical reaction in English may draw an arrow diagram, add a caption in her home language, and explain the process orally in a mix of both. All of these acts belong to the same meaning making event, and pedagogy that treats only spoken language as legitimate misses much of what learners are doing. The practical consequence is that translanguaging classrooms are often visually and materially rich. Wall displays, student notebooks, group posters, and digital documents carry traces of many languages and modes at once. Teachers who understand the semiotic breadth of translanguaging can design tasks that intentionally use image and text together, allowing students to demonstrate understanding in ways a strict verbal test would miss (Li, 2022). 2.5 A Working Definition For the purposes of this article, translanguaging is defined as the deliberate and principled use of a learner's full linguistic repertoire, across named languages and semiotic resources, for the purposes of understanding, producing, and communicating academic content. The definition is intentionally broad. 
It covers a teacher who allows a five year old to describe a science observation partly in Bangla and partly in English. It also covers a teacher of secondary literature who invites students to annotate an English poem with reflections in their home languages. And it covers a university lecturer who assigns group problem solving in physics where students may reason together in any language before presenting solutions in the medium of instruction. What unites these different scenes is a shared stance, a refusal to treat monolingual output as the only evidence of learning. 3. Methodological Approach This article is a #narrative_review of peer reviewed empirical and conceptual studies on translanguaging pedagogy published between 2020 and 2025, with a small number of earlier foundational sources included for continuity. Databases searched include ERIC, Scopus, and Google Scholar. Search terms included translanguaging, pedagogical translanguaging, multilingual pedagogy, bilingual instruction, English medium instruction, and related combinations. Priority was given to studies with clear classroom or program level data, replicable methods, and explicit attention to learning outcomes rather than only to teacher or student attitudes. A narrative rather than systematic review was chosen because the field is diverse in its methods, populations, and outcome measures, and because the goal of the article is to build an accessible synthesis for students and practitioners rather than a formal effect size estimate. Where relevant, the article notes when a finding rests on a single case study, on a larger multi site study, or on a body of converging evidence. The review adopted three inclusion principles. First, sources needed to focus on classroom based or program based translanguaging rather than only on theoretical debate. Second, sources needed to report on learners in real educational settings rather than only on adult professional bilinguals. 
Third, where possible, sources needed to attend to outcomes that matter to teachers and policymakers, such as content learning, literacy development, engagement, and equity. Studies that only measured attitudes without connecting them to observed practices were used for contextual background but not for central claims. The review also acknowledges limitations. Much of the strongest empirical work continues to come from Europe and North America, particularly from Spanish English, Basque, Catalan, and Welsh contexts. Research from African, South Asian, and Latin American classrooms is growing but remains underrepresented in English language journals, and this imbalance shapes what the field currently claims to know. Where possible, cases from underrepresented regions are highlighted, but the reader should treat generalizations across contexts with appropriate caution. 4. Cognitive and Academic Benefits of Translanguaging 4.1 Deeper Content Learning in Mathematics and Science Some of the most persuasive evidence for translanguaging pedagogy comes from #STEM_classrooms, where content is often mistakenly assumed to be language neutral. In fact, learning mathematics and science is deeply linguistic. Students need to name concepts, follow logical chains, describe procedures, and argue for solutions. When these tasks happen in a second language that a student is still developing, cognitive load rises sharply, and understanding can be reduced to the memorization of surface forms. Studies in secondary mathematics classrooms have shown that when students are allowed to discuss problems in their home languages before formalizing solutions in the medium of instruction, their reasoning becomes richer and their errors reveal genuine mathematical thinking rather than only language gaps. 
Prilutskaya (2021) reviewed classroom studies in mathematics and science across several European contexts and reported that translanguaging strategies were associated with better conceptual explanations from bilingual learners, particularly on tasks requiring justification. In similar work, Tai and Li Wei (2021) documented how a science teacher in a Hong Kong English medium school skillfully wove Cantonese into English lessons to unpack difficult ideas such as chemical equilibrium, and students in that classroom produced more precise definitions and more accurate diagrams than peers in strict English only sections. The point is not that home languages replace the target language of instruction. It is that #meaning_making requires the whole cognitive apparatus of the learner, and shutting off part of that apparatus makes hard ideas harder without any real gain in target language proficiency. 4.2 Literacy Development and Metalinguistic Awareness A second line of research shows that translanguaging supports rather than hinders literacy development in the dominant language. This is important because monolingual policies are often defended on the grounds that using home languages will slow down English acquisition. The evidence points in the other direction. Cenoz and Gorter (2021) synthesized research on pedagogical translanguaging in Basque, Catalan, and Welsh contexts, finding that structured comparison across languages improved not only reading comprehension in the minority language but also in the dominant language. Learners who were guided to notice similarities and differences between languages, at the level of sounds, word roots, sentence structure, and text organization, developed stronger #metalinguistic_awareness, which then transferred to their reading and writing in each language. In North American settings, Garcia and Kleifgen (2020) and Vogel and Garcia (2020) have documented similar patterns among Spanish English bilinguals. 
Students in translanguaging classrooms often outperformed peers on measures of reading engagement, willingness to attempt complex texts, and revision quality. The mechanism appears to be twofold. First, students can access background knowledge more efficiently when they are not blocked at the level of vocabulary. Second, they build a more explicit understanding of how language works, which is precisely what advanced literacy requires. 4.3 Higher Order Thinking and Problem Solving For tasks that go beyond recall, translanguaging appears to be even more useful. When students are asked to compare, evaluate, synthesize, or argue, they need to hold several ideas in mind and manipulate them. Doing so in a language one is still learning is like trying to solve a puzzle while wearing thick gloves. Studies on collaborative #problem_solving in secondary and university classrooms show that when bilingual groups are permitted to reason together in any combination of languages before producing a final artifact in the medium of instruction, the quality of that artifact tends to improve (Rajendram, 2021). This finding has appeared in engineering education in Malaysia, in social studies classrooms in the United States, in teacher training programs in South Africa, and in university literature courses in the United Arab Emirates. Across these different settings, the pattern is consistent. Cognitive quality goes up when linguistic constraints go down, and the final product in the target language does not suffer. In many cases, it improves, presumably because students have already done the hard thinking and are now free to focus on expression. 4.4 Working Memory and Cognitive Load A less obvious but important benefit of translanguaging appears in the domain of #working_memory. Human working memory is limited. 
When a learner must simultaneously decode a difficult text in a second language, retrieve background knowledge, and manipulate ideas, the demand often exceeds available capacity, and learning stalls. Allowing the learner to offload some of this work into a stronger language reduces cognitive load and frees capacity for the actual conceptual task. Recent classroom studies in secondary mathematics and physics illustrate this effect. When bilingual students were allowed to write their initial reasoning in a home language while the teacher circulated to check understanding, the number of students who reached the correct answer, and could explain their steps, was notably higher than in comparison classes where all work had to be done in the target language from the start. Importantly, students in the translanguaging condition did not simply avoid the target language. They arrived at the target language later in the task, once the conceptual work was more secure, and their final written explanations in the target language were fuller and more accurate (Prilutskaya, 2021; Rajendram, 2021). This pattern helps explain a paradox that has long puzzled teachers of English learners. Students who seem to understand a concept in conversation may fail to demonstrate that understanding on a written test. The gap is often not conceptual but linguistic and cognitive. Under translanguaging conditions, the concept and the language can be developed in a more sustainable sequence rather than being forced into simultaneous production. 4.5 Identity, Engagement, and Well Being Learning is not only cognitive. Students who feel that their identities are recognized in school engage more deeply and persist longer. Multiple recent studies have documented what happens emotionally when translanguaging is welcomed. 
Ticheloven, Blom, Leseman, and McMonagle (2021), working in Dutch primary classrooms, found that students who could use home languages in class reported stronger #school_belonging and were more willing to participate in whole class discussion. In a study of refugee background youth in Australia, similar patterns emerged. Students described feeling more like themselves in classrooms that acknowledged their linguistic histories, and teachers reported fewer behavior incidents and higher completion rates on demanding assignments (Duarte, 2020). These findings should not be dismissed as soft outcomes. Engagement and belonging are among the strongest predictors of long term academic success, and they interact with cognition. A student who is anxious about being caught using the wrong language is not fully available for learning. A student who feels linguistically safe can spend cognitive resources on the actual task. 5. Pedagogical Strategies for the Real Classroom Theoretical arguments matter, but teachers need concrete moves. This section presents a set of translanguaging strategies that have been documented in recent classroom research and that can be adapted across grade levels and subject areas. The strategies are grouped by function rather than by subject, because most of them apply broadly. 5.1 Strategies for Building Comprehension The first job of a translanguaging teacher is to make sure students actually understand what they are learning. Several strategies serve this purpose. Preview and review in the home language. Before beginning a challenging text or lesson in the dominant language, the teacher provides a short overview of key ideas and vocabulary in the students' home language, or invites students to preview the material in pairs using any language they share. After the lesson, a brief review in the home language checks understanding and consolidates learning. 
This strategy, sometimes called #linguistic_bookending, was documented in Rajendram (2021) as effective in Malaysian science classrooms. Multilingual word walls and glossaries. Instead of an English only word wall, the classroom features key vocabulary in English alongside home language equivalents contributed by students. This does two things at once. It provides a memory aid for content, and it publicly legitimizes the languages of the community. Students take ownership of the wall, which becomes a living document of their linguistic knowledge. Bilingual read alouds and paired texts. In literacy blocks, the teacher reads a picture book or short passage in one language and provides a translated or thematically related version in another. Students then discuss the shared themes across languages. This works well in primary grades and has been used in dual language programs and mainstream classes alike (Prada and Turnbull, 2021). Cross linguistic anchor charts. Anchor charts that record the key ideas of a unit include entries in more than one language, with student contributions welcome. Over the course of a unit, the chart becomes a shared record of concepts and their expression across languages. Unlike a static wall display, the chart evolves and is referenced in class discussion, which reinforces its status as a working intellectual tool rather than decoration. Language brokers and peer scaffolding. In many multilingual classrooms, some students are more advanced in the target language than others, or a small group shares a home language that the teacher does not. Structured peer scaffolding, in which more advanced bilinguals help newcomers navigate a lesson in a shared language, is not a shortcut but a legitimate learning strategy. It benefits both the broker, who consolidates understanding by explaining, and the newcomer, who accesses content without waiting for full target language proficiency (Duarte, 2020). 
5.2 Strategies for Building Content Knowledge Beyond comprehension, translanguaging helps students actually build durable knowledge of subject matter. Home language brainstorming. Before a writing task or complex problem, students brainstorm ideas in any language. They may use graphic organizers, sticky notes, or informal talk. Only later do they translate or transform the ideas into the target language. This separates thinking from encoding and reduces cognitive load. In a study of secondary history classrooms in the United States, students who used this strategy produced essays with more sophisticated arguments than peers who were required to draft only in English (Sah and Li, 2022). Translanguaging group work. In group tasks, students are explicitly told that any language is welcome during discussion, but the final product will be in the target language. Teachers who use this approach report richer discussions and stronger final products, particularly on tasks requiring reasoning or #critical_analysis. The key is to normalize the use of multiple languages so that no student feels judged for choosing to use a home language in the process. Translanguaging science notebooks. In elementary and middle school science, students keep notebooks in which they can record observations, hypotheses, and questions in any language they choose. During class discussion, they translate or paraphrase entries into the target language. This mirrors what scientists actually do, moving between informal and formal registers, and produces stronger scientific writing over time. 5.3 Strategies for Building Language and Metalinguistic Awareness A common concern about translanguaging is that it may reduce exposure to the target language. Well designed strategies actually do the opposite, by drawing explicit attention to how languages work. Cross linguistic comparison. The teacher deliberately compares features across languages, such as word order, tense marking, or discourse structure. 
For example, a grammar lesson on English relative clauses might invite students to describe how the same idea is expressed in their home languages. This activity has been shown to accelerate #grammatical_awareness in both languages (Cenoz and Gorter, 2022). Cognate detective work. Students look for words that share roots across languages. This is especially productive for Romance language speakers learning English, but it works with many language pairs. It also teaches students that language is a system that can be analyzed, not just a set of rules to be memorized. Translation as writing practice. Rather than treating translation as a low level exercise, teachers use it as a serious writing task. Students translate short texts and then discuss the choices they made, why one word or structure works better than another, and how meaning shifts across languages. This activity develops both languages simultaneously. 5.4 Strategies for Voice and Dialogue Participation in academic dialogue is one of the highest goals of schooling and one of the hardest for #emergent_bilinguals in monolingual classrooms. Several translanguaging moves lower the threshold to participation. Think pair share with linguistic flexibility. In this classic strategy, students first think alone, then discuss with a partner, then share with the whole class. In a translanguaging version, the think and pair stages can happen in any language, while the share stage is in the target language. Students arrive at whole class discussion with ideas that have already been rehearsed, and the quality of contributions rises accordingly. Multilingual exit tickets. At the end of a lesson, students write one thing they learned and one question they still have. Both may be written in any language. The teacher reads through the tickets, identifies patterns, and adjusts the next lesson accordingly. 
Even when the teacher does not read all the languages, patterns emerge from the drawings, the target language portions, and quick help from bilingual colleagues or students. Silent conversations across languages. Students respond in writing to a prompt on a shared paper, first in any language, then respond to each other's contributions. The written trail can then be revisited and discussed. This technique gives quieter students, and students still gaining oral fluency in the target language, a way to contribute substantively. 5.5 Strategies for Assessment and Feedback Assessment is where monolingual mandates are usually strictest, and where translanguaging faces its hardest test. Recent research offers several approaches. Multilingual formative assessment. During the learning process, teachers accept student responses in any language and use those responses to guide instruction. A student who explains her understanding of the water cycle in her home language is demonstrating knowledge, and the teacher can build on that. The final summative test may still be in the medium of instruction, but the diagnostic work along the way is more accurate when it includes all languages. Portfolio and process based assessment. Rather than judging only the final product, teachers assess the full process, including drafts, notes, and discussions in any language. This provides a richer picture of learning and reduces the penalty for students who need more linguistic scaffolding at the encoding stage. Bilingual exam accommodations. In some jurisdictions, students may take content tests with bilingual glossaries, dual language versions of items, or extra time. Where these accommodations exist and are used, achievement gaps narrow. Where they do not exist, teachers can advocate for them. 6. Translanguaging Inside Monolingual Systems Most of the strategies above are easy to describe and difficult to implement, because they run against the grain of policy. 
This section takes seriously the reality that most teachers work inside #monolingual_mandates, whether explicit English only rules in the United States and parts of Europe, or English medium instruction policies in higher education across Asia, Africa, and the Middle East. How can translanguaging survive and thrive in such settings? 6.1 The Policy Landscape Monolingual policies come in several forms. Some are explicit, such as state laws requiring English only classrooms or ministerial orders specifying English medium instruction at university level. Others are implicit, expressed through textbook selection, testing regimes, teacher hiring criteria, and school culture. Still others are historical, inherited from colonial systems and reproduced without conscious decision (Sah and Li, 2022). The effect on teachers is similar across these variants. Teachers often internalize the sense that using home languages is forbidden, unprofessional, or academically inferior. Even when policies allow flexibility, many teachers err on the side of monolingual practice to avoid conflict with administrators or parents. This creates what researchers have called a #hidden_curriculum of language, in which the message that some languages do not belong is transmitted through daily practice rather than through formal rules. 6.2 Covert and Overt Translanguaging Faced with these constraints, teachers who believe in translanguaging often develop what have been described as covert practices. They allow home language use in small groups, in peer tutoring, or in one on one conferences, while keeping whole class instruction in the target language. They may permit students to draft in a home language even when they are officially required to work in the target language, and they may use their own multilingual skills to check understanding privately (Wang, 2020). Covert translanguaging has a real place in classrooms with restrictive policies, but it is limited. 
It relies on the goodwill and skill of individual teachers, it is invisible to administrators and therefore vulnerable, and it does not shift the broader message that home languages are second class. Overt translanguaging, in contrast, makes the practice explicit and defensible. Teachers explain to students and parents why home languages are being used, document the pedagogical rationale, and connect the practice to curricular goals and, where possible, to policy language that allows flexibility. In practice, most successful teachers combine both. They use covert strategies as a bridge while working to build the conditions under which overt strategies become possible. School leaders, teacher educators, and researchers can support this work by providing language, evidence, and institutional cover. 6.3 The Higher Education Paradox Higher education presents a particularly sharp version of the paradox. Across Asia, the Middle East, Africa, and increasingly Europe, universities have adopted English medium instruction as a mark of #internationalization and prestige. The result is that students and lecturers who share a home language must communicate about complex disciplinary content in a language that many of them are still learning. Recent studies of English medium instruction at universities in China, Turkey, the United Arab Emirates, and South Africa have found that strict enforcement of English only norms often produces surface compliance and deep confusion. Students memorize technical vocabulary without truly understanding it. Lecturers simplify content to match student proficiency. Assessment becomes a test of language rather than of disciplinary reasoning (Sah and Li, 2022). Where lecturers have opened space for translanguaging, either explicitly or implicitly, outcomes have been more encouraging. Students engage in more substantive discussion, ask sharper questions, and produce written work with stronger arguments. 
English proficiency does not decline, because students are still reading, writing, and being assessed in English. What changes is the quality of thinking underneath the English surface. 6.4 Case Illustrations Across Contexts The practical shape of translanguaging in monolingual systems becomes clearer through short case illustrations drawn from recent published research. In a primary school in the Netherlands, a teacher of a class with children from Turkish, Arabic, and Polish backgrounds introduced a weekly language of the week routine. Each week, one home language was foregrounded, with vocabulary lists, greetings, and student led mini lessons. Over an academic year, participation in whole class discussion rose across the group, and the teacher reported that even children whose language was not featured that week became more willing to contribute, apparently because the routine signaled that all languages had value (Ticheloven et al., 2021). In a secondary science classroom in Hong Kong, an English medium instruction teacher developed what she called a two register approach. Complex explanations were first delivered in Cantonese, then reformulated in English, with the two versions placed side by side on the board. Students then worked in mixed language groups to apply the concept. Assessment remained in English, and results on standardized science examinations equaled or exceeded those of comparable strict English only classes (Tai and Li, 2021). In a university engineering program in Malaysia, an instructor allowed group projects to be discussed in Malay, Mandarin, or Tamil during working sessions, with all final reports submitted in English. Student teams reported higher satisfaction, and grading rubrics showed stronger conceptual quality in the final reports compared with previous cohorts under strict English only norms (Rajendram, 2022). 
In a refugee education program in Australia, teachers integrated translanguaging into a curriculum that also foregrounded student agency and #identity_work. Students produced multilingual portfolios that documented their learning in English and in their heritage languages. Assessment used a portfolio rubric that valued both the depth of content and the range of linguistic resources deployed. Outcomes on external English assessments did not decline, and student reports of school belonging and academic self efficacy rose measurably (Duarte, 2020). These cases do not prove that translanguaging works in every setting. They do show that skilled teachers, in a range of monolingual and semi monolingual systems, have found ways to make it work, and that the pedagogical benefits are consistent enough across settings to warrant serious institutional attention. 6.5 Sustaining Translanguaging Without Institutional Support For many teachers, the reality is that institutional support will not arrive soon. In such conditions, sustaining translanguaging requires community. Teachers benefit from professional networks, either formal, such as translanguaging focused teacher research groups, or informal, such as small circles of colleagues who share strategies and moral support. They also benefit from documentation. Keeping records of student work, of gains in engagement and achievement, and of parent feedback provides evidence that can be shared with administrators and used to gradually shift local policy. Students and families are also crucial allies. When parents understand that their languages are being welcomed as intellectual resources, they engage more with the school, and their support strengthens the teacher's position. Community organizations, especially those grounded in immigrant or Indigenous communities, can amplify these voices in ways individual teachers cannot. 7. 
Challenges and Critiques Translanguaging pedagogy is powerful, but it is not without challenges and critiques. Any honest treatment must engage these seriously. 7.1 Teacher Preparation and Linguistic Repertoire One practical challenge is that most teachers are not multilingual in the languages of their students. A teacher in a Toronto classroom may have students who speak Tagalog, Somali, Farsi, and Urdu at home. She cannot possibly be fluent in all of these. Does this mean translanguaging is only possible for teachers who share their students' home languages? The answer is no, but it does require rethinking the role of the teacher. In a translanguaging classroom, the teacher is not the sole source of linguistic authority. Students, especially older students, can serve as language experts for their own languages. Bilingual dictionaries, online tools used responsibly, and community volunteers can fill gaps. What matters is the teacher's stance, the willingness to treat all languages as legitimate, and the ability to design tasks that make use of student expertise (Prada and Turnbull, 2021). Nonetheless, this raises important questions for #teacher_education. Programs that prepare teachers for #linguistically_diverse settings need to include explicit training in translanguaging pedagogy, in language awareness, and in strategies for working with languages the teacher does not speak. Recent research on teacher preparation in the United States, the United Kingdom, and the Netherlands suggests that this training remains uneven and often superficial. 7.2 Balancing Home Language and Dominant Language Development A second concern raised by critics is that translanguaging may inadvertently slow down the development of the dominant language, which students need in order to succeed in wider society. This is a fair concern, and it deserves a careful answer. The empirical evidence, taken as a whole, does not support the worry. 
Studies over the last decade consistently show that structured translanguaging supports rather than hinders dominant language development. However, this is true only when translanguaging is planned and purposeful. If home language use becomes an escape route that students use to avoid engaging with the target language entirely, gains in the target language may stall. The solution is not to ban home languages but to design activities where all languages are working together toward specific learning goals (MacSwan, 2022). There is also a question of equity within multilingualism. In many settings, students are pressured to become proficient in a global language, often English, even at the cost of losing their home language. Translanguaging can be part of a strategy to sustain home languages across generations, which matters for individual well being, family cohesion, and cultural continuity. 7.3 Risk of Superficial or Symbolic Implementation A third challenge is that translanguaging can become a slogan rather than a practice. Schools may announce that they value multilingualism while doing little to change instruction. A translation of the welcome sign into several languages is not translanguaging pedagogy. Neither is an occasional heritage language celebration. Real translanguaging requires structural change in how lessons are planned, how discussion is managed, how assessment is conducted, and how student work is valued (Poza, 2021). Researchers have documented cases where the term translanguaging was adopted in policy documents without corresponding changes in classroom practice. Teachers were told to be inclusive but were not given the training, materials, or time needed to implement the pedagogy. Predictably, outcomes did not improve, and translanguaging was blamed for a failure that was really a failure of implementation. 7.4 Assessment and Accountability Systems The largest structural challenge is assessment. 
As long as high stakes tests are administered in a single language, students who need translanguaging support are penalized twice, once by the difficulty of the content in a second language and again by the expectation that they demonstrate mastery in that language. Some jurisdictions have begun to experiment with #multilingual_assessment, offering bilingual versions of items or allowing home language responses on portions of tests. These experiments are important and should be studied and scaled where they succeed. Absent such changes, translanguaging teachers face a practical dilemma. If they invest heavily in home language use, will their students perform well enough on English only tests to advance? The evidence suggests yes, provided the pedagogy is well designed. But the pressure of accountability testing is real, and it shapes teacher behavior even when the underlying pedagogy would benefit from more flexibility. 7.5 The Question of Named Languages A more theoretical critique of translanguaging comes from scholars who worry that the framework may erase the importance of named languages themselves. If everything is one repertoire, do we lose the ability to talk about Igbo, Tamil, or Quechua as distinct languages with their own histories, communities, and rights? Some critics have argued that a strong version of translanguaging risks flattening the political struggles of language communities, particularly Indigenous communities, whose fight is often precisely to have their languages recognized as separate and legitimate (MacSwan, 2022). Most translanguaging theorists reject this critique as based on a misreading. Translanguaging does not deny the reality of named languages as social and political categories. It only questions their cognitive reality as separate systems in the bilingual mind. But the critique points to a real risk in practice. 
Translanguaging pedagogy should never be used as an excuse to neglect specific instruction in minority or Indigenous languages. In many contexts, students need both, translanguaging as a general pedagogical stance and dedicated instruction in each of their languages as distinct systems. 8. Implications and Conclusion 8.1 Implications for Teachers For classroom teachers, the implications of this review are practical. First, treat every student's full linguistic repertoire as an asset. Even if you do not share their languages, communicate through your planning and your talk that these languages belong in learning. Second, design tasks that separate thinking from encoding, giving students space to reason in any language before producing final work in the target language. Third, make cross linguistic connections explicit, both to build metalinguistic awareness and to model the intellectual work of comparison. Fourth, document what you do. Records of student work and progress are your best defense against pressure to return to monolingual practice. 8.2 Implications for School Leaders School and district leaders shape the conditions in which teachers work. They should audit their own policies for hidden monolingual assumptions, including in hiring, communication with families, and student placement. They should invest in ongoing #professional_development on translanguaging pedagogy, not as a one time workshop but as sustained learning. They should build partnerships with families and communities that treat home languages as intellectual capital. And they should protect teachers who take pedagogical risks, providing air cover when parents or policymakers raise concerns. 8.3 Implications for Teacher Educators Teacher education programs must move beyond a token session on English learners toward a fully integrated approach in which #multilingual_pedagogy is treated as core, not peripheral. This means changing course syllabi, reading lists, and field placements. 
It also means recruiting and supporting a more #linguistically_diverse teacher workforce, one that reflects the students in schools rather than remaining a monolingual majority. 8.4 Implications for Policymakers For policymakers, the message from recent research is clear. Rigid monolingual policies are not producing the outcomes they promise. Students continue to underperform, gaps persist, and the linguistic wealth of communities is wasted. Policymakers should revise language of instruction policies to allow flexibility, invest in bilingual and multilingual programs where communities want them, develop assessment systems that recognize multilingual competence, and treat home language proficiency as an educational outcome worth pursuing rather than a distraction from real learning. 8.5 Implications for Researchers Finally, researchers have several open tasks. There is a need for more studies with rigorous outcome measures, comparing translanguaging and monolingual pedagogies on matched content. There is a need for longitudinal work that tracks students over years rather than a single semester. There is a need for research in underrepresented contexts, including African and Indigenous language settings, where translanguaging may look different from its usual portrayal in North American and European research. And there is a need for careful attention to the conditions under which translanguaging succeeds or fails, so that the pedagogy does not become a magic word that is applied without understanding. 8.6 Directions for Future Inquiry Despite the encouraging picture painted in this review, important questions remain open. The relationship between translanguaging and #digital_learning environments, for example, deserves closer attention. 
Online and hybrid instruction has expanded rapidly, and digital platforms create new opportunities for translanguaging through translation tools, multilingual chat, and multimodal composition, but they also risk automating away the human dimensions of language learning. Careful research is needed on how digital tools can support rather than replace pedagogical judgment. A second open area concerns very early childhood settings, where the child is still developing foundational language and literacy in more than one language at once. Some research suggests that early childhood translanguaging supports vocabulary breadth across languages, but longitudinal data are still limited. A third area is the intersection of translanguaging with subject specific literacies at the secondary and tertiary level. Different disciplines have different linguistic demands, and translanguaging strategies that work well for narrative writing may need adaptation for scientific argument, historical analysis, or mathematical proof. Comparative work across disciplines could help teacher educators develop more differentiated guidance. A fourth area concerns assessment innovation. Building fair, valid, and practical multilingual assessment instruments is a substantial technical challenge, but the alternative, continuing to test only in the dominant language, systematically underestimates what bilingual students know and can do. Educational measurement researchers have much to contribute here. Finally, more work is needed in contexts outside the well studied European and North American cases. Translanguaging in African, South Asian, Southeast Asian, Latin American, and Middle Eastern classrooms may look and function differently, and the field will be richer when these voices are more fully represented in the international research conversation. 
8.7 A Note for Students and Early Career Educators For readers who are just beginning their journey as teachers, researchers, or graduate students, a final reflection may be useful. Translanguaging can feel intimidating when read at a theoretical level, but at the everyday level it is remarkably practical. It starts with listening. Notice how your students actually talk when they think no one is watching. Notice what languages they whisper in, what languages they scribble notes in, what languages they laugh in. Those are the languages of their thinking, and thinking is what schooling is supposed to develop. From that noticing, small changes follow. Allow a two minute pair discussion in any language before a class discussion. Accept a first draft in a home language and coach the student through revision into the target language. Post a word in more than one script on the board. Ask students to teach you a term you do not know. These small acts change the atmosphere of a classroom in ways that formal policy cannot always reach, and they build the professional judgment that will make you a stronger educator in whatever system you find yourself. Teaching in a monolingual system while believing in multilingualism is not easy. It requires patience, documentation, and community. But every classroom that opens up in this way is a small correction of a long historical mistake, the mistake of treating the ordinary multilingual life of most of humanity as a problem to be solved rather than a foundation to be built on. 8.8 Conclusion The world is multilingual and always has been. Schools that treat it otherwise are working against evidence, against equity, and against the actual minds of the children they serve. Translanguaging pedagogy does not solve every problem in education, but it does something important. It insists that the languages children bring with them to school are not obstacles to be managed but resources to be mobilized. 
It gives teachers concrete moves for turning that insistence into daily practice. And it aligns education more honestly with the world outside the classroom door, where communication happens across languages, cultures, and communities in ways that no single language can capture. Encouraging bilingual students to use their full linguistic repertoire, especially for the hardest #cognitive_tasks, does not weaken their command of the dominant language. It strengthens it, because thinking in more than one language produces sharper thinking overall. Students who translanguage well are not less prepared for a monolingual test, they are more prepared, because they understand ideas more deeply and can express them more precisely. The advantage is real, measurable, and available to any teacher willing to reorganize the linguistic architecture of the classroom. Institutional systems will not change overnight. But every classroom that opens its doors to the full repertoire of its students is a small act of change, and enough small acts, sustained over time, can shift what is possible. The multilingual student sitting in a monolingual system is not a problem to be fixed. She is a scholar who already knows things her teacher does not. The task of translanguaging pedagogy is to make sure that what she knows can enter the conversation, be built on, and become part of what the classroom, and eventually the system, understands. References Cenoz, J. and Gorter, D. (2021). Pedagogical translanguaging. Cambridge University Press. https://doi.org/10.1017/9781009029384 Cenoz, J. and Gorter, D. (2022). Pedagogical translanguaging and its application to language classes. RELC Journal, 53(2), 342 to 354. https://doi.org/10.1177/00336882221082751 Duarte, J. (2020). Translanguaging in the context of mainstream multilingual education. International Journal of Multilingualism, 17(2), 232 to 247. https://doi.org/10.1080/14790718.2018.1512607 Garcia, O. and Kleifgen, J. A. (2020). 
Translanguaging and literacies. Reading Research Quarterly, 55(4), 553 to 571. https://doi.org/10.1002/rrq.286 Li, W. (2022). Translanguaging as a political stance: Implications for English language education. ELT Journal, 76(2), 172 to 182. https://doi.org/10.1093/elt/ccab083 MacSwan, J. (Ed.). (2022). Multilingual perspectives on translanguaging. Multilingual Matters. https://doi.org/10.21832/9781800415690 Poza, L. E. (2021). Adding flesh to the bones: Translanguaging, embodiment, and critical consciousness. Multilingua, 40(4), 493 to 514. https://doi.org/10.1515/multi-2019-0143 Prada, J. and Turnbull, B. (2021). The role of translanguaging in the multilingual turn: Driving philosophical and conceptual renewal in language education. EuroAmerican Journal of Applied Linguistics and Languages, 8(1), 8 to 23. https://doi.org/10.21283/2376905X.13.204 Prilutskaya, M. (2021). Examining pedagogical translanguaging: A systematic review of the literature. Languages, 6(4), 180. https://doi.org/10.3390/languages6040180 Rajendram, S. (2021). Translanguaging as an agentive pedagogy for multilingual learners: Affordances and constraints. International Journal of Multilingualism, 20(2), 595 to 622. https://doi.org/10.1080/14790718.2021.1898619 Rajendram, S. (2022). The affordances of translanguaging as a pedagogical resource for multilingual English language classrooms. TESOL Quarterly, 56(2), 623 to 651. https://doi.org/10.1002/tesq.3131 Sah, P. K. and Li, G. (2022). Translanguaging or unequal languaging? Unfolding the plurilingual discourse of English medium instruction policy in Nepal's public schools. International Journal of Bilingual Education and Bilingualism, 25(6), 2075 to 2094. https://doi.org/10.1080/13670050.2020.1849011 Tai, K. W. H. and Li, W. (2021). Constructing playful talk through translanguaging in English medium instruction mathematics classrooms. Applied Linguistics, 42(4), 607 to 640. 
https://doi.org/10.1093/applin/amaa043 Ticheloven, A., Blom, E., Leseman, P. and McMonagle, S. (2021). Translanguaging challenges in multilingual classrooms: Scholar, teacher and student perspectives. International Journal of Multilingualism, 18(3), 491 to 514. https://doi.org/10.1080/14790718.2019.1686002 Vogel, S. and Garcia, O. (2020). Translanguaging. In G. Noblit (Ed.), Oxford Research Encyclopedia of Education. Oxford University Press. https://doi.org/10.1093/acrefore/9780190264093.013.181 Wang, D. (2020). Studying Chinese language in higher education: The translanguaging reality through learners' eyes. System, 95, 102394. https://doi.org/10.1016/j.system.2020.102394 #translanguaging #multilingual_education #bilingual_learners #pedagogical_translanguaging #linguistic_repertoire #English_medium_instruction #language_policy #monolingual_mandates #classroom_diversity #linguistic_justice #metalinguistic_awareness #multilingual_pedagogy #emergent_bilinguals #language_and_learning #equity_in_education

  • Neurodiversity in the High-Stakes Testing Era: Redesigning Assessment Protocols

    High-stakes assessment has become the default gatekeeper of academic progress, credentialing, and opportunity across schools and universities worldwide. Yet the standard timed, heavily formatted examination rests on assumptions about attention, working memory, reading speed, and executive function that do not reflect the full range of learners in modern classrooms. This article examines how conventional testing systematically disadvantages students with #ADHD and #autistic processing styles, and it reviews the growing evidence base supporting Universal Design for Learning as a credible alternative for measuring #mastery. Drawing on recent scholarship in #inclusive_assessment, #neurodiversity studies, and higher education pedagogy, the paper argues that fairness and validity are not competing values but the same value seen from two angles. When an assessment penalizes a student for slow processing rather than weak knowledge, it produces a score that is both unjust and psychometrically noisy. The article proposes a redesigned protocol grounded in flexible timing, multiple response formats, transparent rubrics, and process-based evidence of learning. It closes by discussing the ethical, institutional, and policy conditions under which such reforms can move from pilot projects to mainstream practice, and by identifying the research gaps that still separate promising practice from settled science. Keywords: Neurodiversity; Universal Design for Learning; high-stakes testing; ADHD; autism; assessment reform; inclusive pedagogy; executive function; test validity; higher education. 1. Introduction For most of the twentieth century, the timed written examination was treated as a neutral instrument. A student who knew the material would show it inside the allotted minutes, and a student who did not would fail. This view survives in policy documents, in admissions handbooks, and in the daily practice of thousands of classrooms. 
It also survives, quietly, in the design of the newer computer-based tests that increasingly replace paper booklets. Speed, silence, and a narrow band of acceptable formats remain the shared language of #high_stakes_testing. The trouble is that this language was never truly neutral. It was written for a specific kind of #test_taker: one who reads quickly, writes fluently under pressure, holds several instructions in mind at once, filters out background noise without effort, and recovers from small mistakes without spiraling. Students who match this profile perform as expected. Students whose #cognitive_profile differs, including many students with #attention_deficit_hyperactivity_disorder and many #autistic students, produce scores that under-represent what they actually know (Pellicano & den Houting, 2022; Dwyer, 2022). The score gap in these cases is not a knowledge gap. It is a design gap. This distinction matters because #assessment scores drive consequential decisions. They determine progression, scholarship eligibility, professional licensure, and, in many systems, the tracking of children into academic pathways from an early age. When a testing format systematically mis-measures a subgroup of learners, the harm is compounded across years and across institutions. A student who is repeatedly told, through low grades, that they cannot succeed academically will internalize that message even when the underlying cause is a mismatch between their #processing_style and the test format (Hamilton & Petty, 2023). The #neurodiversity paradigm offers a different starting point. It treats variation in cognition, attention, and sensory processing as an expected feature of the human population rather than a set of deficits to be corrected (Dwyer, 2022; Pellicano & den Houting, 2022). From this vantage point, the question is not how to help #neurodivergent students survive assessments designed for someone else. 
The question is how to build assessments that give every student a fair chance to demonstrate what they have learned. This reframing is not a rhetorical flourish. It has direct implications for test design, scoring, and interpretation. #Universal_Design_for_Learning, developed originally in the special education and cognitive neuroscience literature and refined over three decades of classroom application, provides a practical framework for that redesign (CAST, 2024). UDL asks educators to plan for variability from the start, to offer multiple means of engagement, representation, and expression, and to treat #accessibility as a property of the assessment rather than an accommodation bolted on afterward. Recent empirical work suggests that UDL-aligned assessments can preserve rigor while reducing the performance penalty associated with neurodivergent profiles (Nieminen, 2022; Tai et al., 2023). This article makes three contributions. First, it synthesizes recent scholarship on how conventional #timed_testing disadvantages ADHD and autistic learners, distinguishing effects on #construct_validity from effects on well-being. Second, it maps those findings onto specific UDL principles and shows where the evidence for each principle is strong, promising, or still thin. Third, it proposes a redesigned assessment protocol that institutions can pilot without abandoning the accountability functions that #standardized_testing is asked to perform. The aim is not to romanticize alternative assessment nor to dismiss the legitimate uses of timed tests. The aim is to make assessment do the job it claims to do: measure #learning, accurately, for the students actually in the room. 2. Literature Review 2.1 The Neurodiversity Paradigm and Its Educational Consequences The term #neurodiversity was introduced in the late 1990s by the autistic sociologist Judy Singer and has since moved from advocacy into mainstream academic discourse (Chapman & Botha, 2023). 
At its core, the paradigm makes two related claims. The descriptive claim is that human cognition varies along multiple dimensions, including attention regulation, sensory sensitivity, social processing, memory, and executive function, and that this variation is a stable feature of the population rather than a temporary disorder in a small minority. The normative claim is that this variation should be accommodated by social and institutional design, in the same way that architectural design accommodates variation in physical mobility (Pellicano & den Houting, 2022). In education, the paradigm shifts the burden of adjustment. Under the older #medical_model, an ADHD or autistic student was expected to receive treatment, learn compensatory strategies, and eventually approximate the behavior of a #neurotypical peer, at which point the environment could remain unchanged. Under the neurodiversity paradigm, the environment itself is scrutinized for features that create avoidable friction (Dwyer, 2022). A classroom with harsh fluorescent lighting, an assessment protocol that penalizes slow handwriting, or a lecture format that requires sustained silent attention for ninety minutes are not neutral defaults. They are choices, and they can be chosen differently. Recent scholarship has emphasized that the paradigm is not a claim that #neurodivergent people experience no difficulty or need no support (Dwyer, 2022; Chapman & Botha, 2023). ADHD is associated with genuine challenges in sustained attention, time perception, and working memory, and autistic experience often includes sensory overload, difficulty with abrupt transitions, and exhaustion from social masking. The paradigm's claim is narrower: that these challenges are shaped by environmental fit as much as by individual biology, and that when the environment can be redesigned, it usually should be. 
Assessment is one of the environments most open to redesign, because its features are set by institutional choice rather than by physical constraint. 2.2 ADHD, Autism, and the Cognitive Demands of Timed Testing To see why conventional testing produces distorted scores for many #neurodivergent students, it helps to look closely at what a timed exam actually requires. A standard two-hour written examination demands sustained #attention across the full window, working memory to hold instructions and partial answers in mind, executive function to plan and pace responses across sections, fluent reading to process items at speed, and self-regulation to manage anxiety and to recover from difficult items without losing time on later ones (Sedgwick-Müller et al., 2022). Each of these demands is a separate cognitive task, and each is a task on which many ADHD and autistic students diverge from the neurotypical average. For students with ADHD, the most consistent findings concern #time_perception and pacing. Research on adult and adolescent ADHD documents difficulties in estimating elapsed time, in resisting distraction across long tasks, and in maintaining consistent output when a task is not intrinsically engaging (Sedgwick-Müller et al., 2022; Weyandt et al., 2023). In an exam context, these difficulties compound. A student may spend too long on early items, notice too late that the clock is against them, and rush the remainder in a state of heightened stress. The resulting score reflects an interaction between the student's knowledge, their pacing, and the strictness of the time limit. Removing the pacing variable, either by extending time meaningfully or by allowing scheduled breaks, tends to raise scores for ADHD students without raising them for #neurotypical peers, which is the expected pattern when a design change targets a source of #construct_irrelevant_variance (Lovett & Lewandowski, 2023). For autistic students, the picture is more heterogeneous. 
Some autistic students perform strongly on timed tests, particularly when the format is highly structured and the content aligns with intense interest areas. Others struggle for reasons that have less to do with time and more to do with #sensory processing, ambiguous instructions, and the social demands of the testing environment (Clouder et al., 2020; Hamilton & Petty, 2023). Fluorescent lighting, other students' movements, the pressure of an invigilator's presence, and small variations in question phrasing that require inference about the examiner's intent can all consume cognitive resources that would otherwise go to the question itself. A test that asks "briefly discuss" or "give a few examples" without specifying how many or how long is not a neutral instruction. It is a request for #social_inference that some autistic students find genuinely opaque. There is also a mental health dimension that recent studies have brought into clearer focus. Both ADHD and autistic students report high rates of test anxiety, and the anxiety is not simply a byproduct of poorer expected performance (Hamilton & Petty, 2023; Scheef et al., 2023). It reflects accumulated experience of being penalized for cognitive features they cannot easily change, and it interacts with the cognitive load of the test itself. A student who has failed timed exams before enters the next one with an elevated baseline of physiological stress, which further degrades working memory and attention (Weyandt et al., 2023). This is a feedback loop that assessment design can either amplify or dampen. 2.3 Construct Validity and the Fairness Argument The strongest argument for redesign is not a compassion argument. It is a #validity argument. Every assessment claims to measure a specific construct, whether that is mastery of a body of content, ability to reason in a discipline, or readiness for the next stage of study. 
The score is a #proxy for the construct, and the proxy is only useful to the extent that it isolates the construct from irrelevant sources of variance (Nieminen, 2022). A test that measures both content mastery and reading speed will produce different scores for two students with identical content mastery but different reading speeds. In psychometric terms, reading speed has become part of what the score measures, whether or not the test designers intended it to be. This is the concept of #construct_irrelevant_variance, and it is where the neurodiversity discussion meets classical test theory. When a timed test produces a lower score for an ADHD student who knows the material as well as a peer, the difference between the two scores is not measuring knowledge. It is measuring the interaction between the student's pacing and the test's clock. If a scholarship committee treats the score as a measure of knowledge, they are drawing a conclusion the test cannot support (Nieminen, 2022; Tai et al., 2023). The problem is not that the test is unfair in a vague moral sense. The problem is that it does not measure what it claims to measure. This framing has practical value because it recasts accommodation as validity work rather than as a favor. Extended time, quiet rooms, or the option to type rather than handwrite are often described as accommodations that #level_the_playing_field. Under the validity framing, they are corrections to a measurement instrument that would otherwise produce misleading scores (Lovett & Lewandowski, 2023). This framing also clarifies why accommodations should not necessarily be limited to students with formal diagnoses. If a design feature produces construct-irrelevant variance for a broad group of learners, changing that feature improves the measurement for everyone. 
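The validity argument above can be made concrete with a small numerical sketch. The model below is an illustrative assumption, not a method or dataset from the literature cited here: two hypothetical students have identical mastery but different working paces, unreached items earn nothing, and the only thing that changes between runs is the time limit. The class name, function name, and all numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Student:
    label: str
    mastery: float         # share of reached items answered correctly (0 to 1)
    minutes_per_item: int  # working pace, set independently of mastery (assumption)


def observed_score(student: Student, n_items: int, time_limit_min: int) -> float:
    """Expected score under a toy model in which unreached items earn nothing."""
    items_reached = min(n_items, time_limit_min // student.minutes_per_item)
    return items_reached * student.mastery


N_ITEMS = 40  # hypothetical test length

# Same mastery, different pace: any score gap is unrelated to knowledge.
faster = Student("faster pace", mastery=0.75, minutes_per_item=2)
slower = Student("slower pace", mastery=0.75, minutes_per_item=3)

for limit in (80, 120):
    for s in (faster, slower):
        print(f"{limit} min, {s.label}: {observed_score(s, N_ITEMS, limit)}")
```

Under these assumed numbers, the 80-minute limit yields 30.0 for the faster-paced student and 19.5 for the slower-paced one, although their mastery is identical. At 120 minutes both score 30.0: the extension raises only the score that the clock had been suppressing, which is the pattern one would expect when a design change removes construct-irrelevant variance. The sketch is a thought experiment and should not be read as empirical evidence.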
2.4 Universal Design for Learning as an Assessment Framework #UDL emerged from work at the Center for Applied Special Technology in the 1990s and has been refined into a set of guidelines that address three broad networks of learning: engagement, representation, and action and expression (CAST, 2024). Applied to assessment, the framework asks designers to consider variability at each stage. How will students be engaged with the task? How will information be presented to them? And how will they be allowed to demonstrate what they know? The engagement dimension has received less attention in the assessment literature than the other two, but it is not trivial. A task that is genuinely engaging reduces the executive burden required to sustain attention, which particularly benefits ADHD students (Lambert et al., 2021). This does not mean assessments should be entertainment. It means that when the content of a task connects to a purpose the student can see, the student is drawing on #intrinsic_motivation rather than fighting against boredom. Portfolio assessments, project-based tasks, and authentic problem-solving scenarios tend to score higher on this dimension than sit-down examinations, although the trade-offs in comparability and administrative cost are real. The representation dimension addresses how test content reaches the student. A single format, typically dense prose read silently under time pressure, disadvantages students whose reading is slower, who process better with visual scaffolding, or whose comprehension improves when they can hear the text as well as read it (Griful-Freixenet et al., 2020). Offering the same content in multiple formats, or allowing text-to-speech tools by default, reduces this bottleneck. The evidence here is stronger for classroom assessment than for high-stakes standardized testing, where equivalence across formats is harder to guarantee (Nieminen & Pesonen, 2022). 
The #action_and_expression dimension is where UDL and traditional assessment collide most directly. Traditional testing typically permits one output mode: handwritten prose or a set of multiple-choice selections. UDL asks whether the same knowledge could be demonstrated through a spoken response, a diagram, a structured project, or a combination of formats. For autistic students who find open-ended written prose difficult but who can produce sophisticated structured responses, this flexibility can be transformative (Hamilton & Petty, 2023; Nieminen, 2022). The design challenge is to ensure that different output modes measure the same construct at comparable difficulty, which is achievable but requires deliberate rubric work. 2.5 Evidence on Alternative Assessment Formats The empirical literature on alternative assessment has grown substantially in the last five years, driven partly by pandemic-era shifts to remote learning and partly by broader interest in #inclusive_assessment. Several patterns are now reasonably well-established (Tai et al., 2023; Nieminen, 2022; O'Neill & Padden, 2022). First, extended time is the most-studied accommodation and the one with the strongest evidence base. Studies comparing standard time to extended time consistently find that students with ADHD and specific learning differences benefit more from additional time than neurotypical peers do, which is the pattern expected if the extension is correcting for construct-irrelevant variance rather than inflating scores generally (Lovett & Lewandowski, 2023). The evidence does not support the once-common concern that extended time provides an unearned advantage. It suggests that the standard time limit was too tight for a subgroup of students all along. Second, format flexibility appears to help autistic students in particular, although the studies are smaller and more varied. 
Allowing students to choose between essay and structured-response formats, or between oral and written presentation, tends to raise reported measures of #self_efficacy and to reduce test anxiety without depressing measured achievement (Hamilton & Petty, 2023). The main methodological challenge in this literature is that self-selection into formats can confound comparisons, and more randomized designs are needed. Third, #process_based_assessment, in which students demonstrate mastery through a body of work developed over time rather than through a single high-stakes event, shows promise for #neurodivergent learners but faces significant institutional barriers (Nieminen, 2022; Boud et al., 2023). Portfolios, capstone projects, and cumulative competency demonstrations reduce the acute cognitive load of a single test day and give students the chance to show growth over time. The trade-off is that portfolio assessment is harder to standardize across large populations and more labor-intensive to score reliably. Fourth, the evidence on #open_book and #take_home examinations is genuinely mixed. Some studies find that removing the memory retrieval component and allowing reference to materials during the exam benefits students with working memory difficulties without reducing measurement of higher-order reasoning (Johanns et al., 2022). Others find that the reduced time pressure of take-home formats can paradoxically increase anxiety by extending the exam window across days and by inviting perfectionism. The design details, particularly the specificity of prompts and the clarity of scope, appear to matter more than the format label. 2.6 What the Literature Does Not Yet Settle Several important questions remain open. The first is the extent to which UDL-aligned assessments can be scaled to genuinely high-stakes contexts such as university admissions, medical licensure, and legal bar examinations. 
Most published studies are conducted in classroom or program contexts where the assessment is consequential but not gatekeeping in the strongest sense (Tai et al., 2023). Scaling to gatekeeping contexts raises legitimate concerns about #comparability, security, and public trust that pilot studies have not fully answered. The second open question concerns the interaction of design changes with other identity variables. Students who are neurodivergent and also from #low_income backgrounds, or who are neurodivergent and second-language learners, or who are neurodivergent and physically disabled, may experience assessments in ways that a single-axis analysis misses (Griful-Freixenet et al., 2020). The literature has not yet caught up with this #intersectional complexity. The third open question is the durability of design changes over time. A UDL-aligned assessment that works well in a small program with committed faculty may lose fidelity when rolled out across a large institution with variable buy-in. The implementation science of assessment reform is thinner than the design literature, and this gap has practical consequences for policy (O'Neill & Padden, 2022). 3. Methodology This article is a #narrative_review synthesis rather than a systematic review, and the methodological choice is deliberate. Systematic reviews are strongest when a well-defined empirical question can be answered by pooling comparable studies. The question addressed here is broader: how does the design of high-stakes assessment interact with neurodivergent cognitive profiles, and what does this imply for redesign? Answering this question requires drawing on empirical studies, theoretical accounts of neurodiversity, psychometric analyses of construct validity, and applied work on UDL, none of which sit comfortably in a single meta-analytic framework. 
The literature was identified through targeted searches of major education, psychology, and disability studies databases, focusing on work published between 2020 and 2026. Search terms combined variants of #neurodiversity, ADHD, autism, and high-stakes assessment with terms for UDL, inclusive assessment, and test accommodation. Additional sources were identified through backward citation tracing from key review articles and through examination of major journals in the field. Studies were included when they addressed either the effect of assessment design on neurodivergent learners or the conceptual framework of inclusive assessment. Grey literature was consulted for context but not treated as evidence for empirical claims. The synthesis prioritized peer-reviewed empirical work for causal claims and used theoretical and conceptual literature to frame the discussion. Two limitations of this approach should be acknowledged. First, narrative reviews are more susceptible to selection bias than systematic reviews, and readers should treat the specific studies cited as illustrative rather than exhaustive. Second, the empirical literature on neurodivergent assessment is still developing, and some claims that appear well-supported in current work may not survive replication. 4. How Standard Assessment Design Penalizes Neurodivergent Learners 4.1 The Time Variable The clock is the single most consequential feature of a conventional examination. Yet the choice of any specific time limit is rarely justified in a technical sense. Time limits are typically inherited from prior versions of a test, adjusted by rule of thumb, or set by administrative convenience rather than by systematic study of how long students actually need to demonstrate mastery (Lovett & Lewandowski, 2023). This is a striking gap in a domain that otherwise takes psychometric rigor seriously. For ADHD students, the time variable produces two related problems. The first is pacing. 
Difficulties in time perception mean that ADHD students often misjudge how long has passed or how long a specific item is taking, which produces the familiar pattern of strong performance on early items and rushed or incomplete performance on later items (Sedgwick-Müller et al., 2022). The second is #executive_function load. A tight time limit forces the student to run a background process that monitors elapsed time, evaluates progress, and adjusts strategy accordingly. This monitoring competes for the same executive resources that the test content requires, and the competition is more costly for students whose executive function is already stretched. The design implications are clearer than the policy implications. From a design perspective, the answer is either to extend the time meaningfully, to remove the time limit for many assessment types, or to break the assessment into shorter segments with scheduled breaks. From a policy perspective, extending time on a high-stakes standardized test raises questions about score comparability across administrations. These questions are addressable, but they require deliberate psychometric work that most systems have not yet undertaken. 4.2 Format Rigidity A second consequential design choice is the rigidity of the required response format. A test that permits only handwritten prose answers is measuring at least three things: content mastery, prose production under pressure, and #handwriting fluency. For most students the second and third contribute little variance, but for students with dysgraphia, hand tremor, or autistic profiles that make extended prose production difficult, they can dominate the score (Clouder et al., 2020). The problem is not that any specific format is wrong. The problem is that a single required format converts every difference in students' expressive strengths into a difference in measured achievement. 
A student who can explain a complex proof orally in five minutes but who struggles to write it out in twenty is not less capable in mathematics. They are less capable at one specific expressive channel. When the test requires that channel and only that channel, the score confounds mathematical reasoning with prose fluency. Format flexibility does not require abandoning rigor. It requires deliberate work on #equivalent_rubrics that assess the same underlying construct across multiple response modes. This work is non-trivial, but it is doable, and several disciplines have developed multi-format assessment protocols that maintain reliable scoring (Nieminen, 2022; Boud et al., 2023). 4.3 The Sensory and Environmental Design of Testing Rooms The physical environment of testing is often treated as a background condition rather than a design choice. In practice it is a design choice, and one that particularly affects autistic students. Standard testing rooms tend to feature fluorescent overhead lighting, hard reflective surfaces, ambient noise from ventilation and other students, and the physical presence of invigilators moving through the space. Each of these features is a potential source of #sensory_load, and the load consumes cognitive resources that the test itself requires (Hamilton & Petty, 2023). The literature on autistic sensory processing has documented substantial individual variation, which means that a single alternative environment cannot serve all autistic test takers. But the pattern of design choices in standard rooms is not neutral either. It tends to maximize sensory input rather than minimize it, on the assumption that a neurotypical student can filter the input at low cognitive cost. For students whose filtering is more effortful, the standard room is not a level baseline. It is an active source of interference. Simple design changes can reduce this interference. 
Softer or task-focused lighting, sound-absorbing surfaces, permission to use noise-canceling headphones, smaller room sizes, and reduced invigilator movement all appear in the literature as low-cost modifications with meaningful effects (Clouder et al., 2020). None of these changes threaten the security or comparability of the assessment itself. 4.4 The Hidden Curriculum of Test Instructions Test instructions are the interface between the examiner's intent and the student's response. In principle they should be transparent. In practice they often carry a #hidden_curriculum of expectations that are legible to students familiar with the genre and opaque to students who are not. Phrases like "briefly explain," "give some examples," "discuss critically," or "in your own words" ask the student to infer what the examiner wants without stating it directly. Familiar test takers have learned that "briefly" typically means one paragraph, that "some examples" typically means three, and that "critically" typically means presenting more than one view. Students who read instructions literally, including many autistic students, may spend cognitive resources trying to determine what the examiner actually intends, or may produce responses that meet the instruction as written but not as intended (Hamilton & Petty, 2023; Nieminen, 2022). The remedy is instruction transparency. Specifying word counts, numbers of examples, and expected structural features reduces the load of inference and lets students focus on content. Some assessment designers worry that this level of specificity constrains creativity, but the evidence suggests it does not. Students who know what is expected can meet the expectation and then go beyond it, whereas students who do not know what is expected often produce responses that miss the target for reasons unrelated to knowledge. 4.5 The Cumulative Cost Any single design feature discussed above is a modest source of #construct_irrelevant_variance. 
The cumulative cost is larger than the sum, because the features interact. A student who is already fatigued by sensory load will find pacing harder. A student who is anxious about pacing will find ambiguous instructions harder to parse. A student who has parsed ambiguous instructions incorrectly will lose time correcting course, which compounds the pacing problem. The result is that #neurodivergent students often lose ground for reasons that no single feature would fully explain, and that accommodation of a single feature only partially addresses. This cumulative logic is the strongest argument for a whole-system redesign rather than a menu of individual accommodations. A UDL-aligned assessment protocol addresses the features together, and the benefits interact in the opposite direction. Reduced sensory load makes pacing easier, clear instructions reduce anxiety, format flexibility reduces the cost of any single expressive weakness, and the whole becomes measurably fairer without becoming easier (Nieminen, 2022; O'Neill & Padden, 2022). 5. Redesigning Assessment Protocols: A Proposed Framework The framework proposed here is not a finished protocol. It is a set of design principles that institutions can adapt to their context, along with practical illustrations of how each principle might be operationalized. The framework treats assessment as a system with four components: the task, the environment, the response, and the scoring. Redesign should address all four, and the strongest gains come when they are addressed together. 5.1 Task Design Task design is the starting point because it determines what the assessment claims to measure. A well-designed task isolates the target construct as far as possible from irrelevant demands. In practice this means several things. First, the task should be aligned to specified #learning_outcomes rather than to a generic content area. 
If the outcome is "explain the mechanism of action of a specific class of drugs," the task should ask exactly that, in language that closely tracks the outcome. If the outcome is "apply a framework to an unfamiliar case," the task should present an unfamiliar case. Vague prompts that require students to guess what content is expected introduce irrelevant variance from the start (Boud et al., 2023). Second, the task should specify its own structural expectations. Word counts, number of parts, expected format, and criteria for a strong response should be visible in the task itself, not inferred from convention. This transparency benefits all students and particularly benefits students who read literally or who lack familiarity with the specific assessment genre. Third, the task should minimize #reading_load that is not part of the target construct. If the assessment is measuring mathematical reasoning, dense prose framing of the problems introduces reading speed as an unintended factor. Simpler framing, visual scaffolding, or the option of audio access to the same content reduces this bottleneck without reducing the difficulty of the underlying task. Fourth, where feasible, the task should offer more than one response format, with equivalent scoring across formats. A student might choose between a written essay and a structured oral presentation, or between a traditional problem set and a design-and-defend project. This is where the design work is most demanding, because equivalence across formats requires rubrics that are format-neutral. The investment in developing these rubrics pays off in more accurate measurement across a wider student population (Nieminen, 2022; Tai et al., 2023). 5.2 Environmental Design Environmental design attends to the physical and temporal conditions under which the assessment takes place. Even when the task is well designed, a hostile environment can dominate the resulting score. The most important environmental variable is time. 
The framework here recommends that time limits be justified rather than assumed. A time limit is defensible when it is measuring a construct that includes speed. It is not defensible when it is measuring a construct that does not, in which case the limit is a source of construct-irrelevant variance. For most academic knowledge and reasoning tasks, the construct does not include speed, and the time limit should be either loose enough that pacing is not a significant factor for any student, or removed entirely in favor of a fixed-effort model (Lovett & Lewandowski, 2023). Sensory conditions are the second important environmental variable. The framework recommends that testing rooms be designed with sensory variability in mind. This includes moderated lighting, sound treatment, reduced invigilator movement, and permission for personal sensory tools such as noise-canceling headphones and #stim_toys where they do not interfere with security. These changes are inexpensive and reduce a documented source of load for autistic and sensory-sensitive students (Hamilton & Petty, 2023). The framework also recommends #scheduled_breaks for assessments longer than sixty minutes. A short break every hour, during which students can move, drink water, or use the bathroom, reduces the executive load of sustained attention without providing any content advantage. This is one of the most cost-effective changes an institution can make, and it benefits a broad range of students including but not limited to those with ADHD. 5.3 Response Design Response design addresses what students are permitted to do to demonstrate what they know. The framework's central recommendation is #response_flexibility within construct-preserving constraints. Concretely, this might look like the following. 
For a task assessing knowledge of a body of content, students might choose to respond in written prose, structured short-answer format, or annotated diagram, with rubrics that treat each format as an equivalent demonstration of content mastery. For a task assessing analytical reasoning, students might choose between written analysis and an oral defense with a marker, with the oral defense scored on the same reasoning criteria. For a task assessing #applied_skill, students might submit a project artifact with an accompanying reflection that explains the reasoning behind key decisions. The framework does not recommend that every assessment offer every possible format. That would be impractical and would dilute rubric quality. It recommends that the assessment offer at least two formats where feasible, and that the choice between formats be a genuine choice rather than a fallback that carries implicit stigma. Students should not have to disclose a diagnosis to choose an oral format over a written one. An additional response design consideration is the treatment of #metacognition. Assessments that include a brief structured reflection, in which students explain how they approached the task and what they found difficult, generate additional information about the student's reasoning process and reduce the score's dependence on a single point-in-time performance (Boud et al., 2023). Metacognitive components should be short, structured, and separately scored so they do not become an additional writing task in disguise. 5.4 Scoring Design Scoring design closes the loop. Even the best task, environment, and response design can be undermined by scoring that reintroduces bias. The framework recommends four scoring principles. First, #rubrics should be #transparent and shared with students in advance. Transparency is not a threat to rigor. It is a condition of fair measurement, because it allows students to direct their effort toward the criteria that will actually be scored. 
Second, rubrics should describe #performance in terms of demonstrated understanding rather than surface features of the response. A rubric that rewards "clear, well-organized prose" is measuring prose fluency alongside understanding, and disadvantages students whose understanding is strong but whose prose fluency is not. A rubric that describes "accurate identification of key concepts, correct application to the case, and coherent reasoning across steps" can be applied fairly across multiple response formats. Third, scoring should be #calibrated across markers, particularly for assessments that offer format flexibility. Multi-marker moderation is essential, because format equivalence depends on markers applying the rubric consistently across formats. Institutions that have piloted multi-format assessment have found that marker training is the single largest implementation cost, and that skipping it produces the very inconsistency the framework is designed to prevent (Tai et al., 2023). Fourth, #high_stakes_decisions should not rest on a single assessment point. Where consequential decisions must be made, the framework recommends #triangulation across multiple assessment types over time. A student's academic profile is more accurately captured by a combination of a timed test, a portfolio, a project, and a reflective component than by any one of these alone. Triangulation both improves measurement and reduces the impact of a single bad test day, which particularly benefits students whose performance varies more from day to day. 5.5 A Worked Example To make the framework concrete, consider a redesigned undergraduate assessment for an introductory biology course. The original assessment was a two-hour written examination with fifty multiple-choice items and two short essay questions, administered in a large exam hall. Under the framework, the redesigned assessment might comprise three components. 
The first is a #knowledge_check module completed online during a two-week window, consisting of item types drawn from the original test but with untimed response (subject to the two-week window) and audio access to items. The second is a #case_analysis module in which students receive a novel case and produce a structured response, with the choice of written analysis, annotated diagram with narration, or recorded oral analysis. The third is a #reflective_component in which students briefly explain how they approached the case and what they would do differently on a similar future task. The overall assessment is scored against the same learning outcomes as the original test. The rubric is shared in advance. Markers are trained on cross-format equivalence. Students who prefer a single sit-down format can complete all three components in a proctored session if they choose. The redesigned assessment maintains rigor, addresses the specific sources of #construct_irrelevant_variance discussed above, and produces a richer picture of student mastery than the original. Pilot implementations of this kind of redesign have reported comparable or higher overall pass rates, reduced gaps between neurodivergent and neurotypical students, and improved student reports of well-being, with acceptable marker workload after initial rubric development (Nieminen, 2022; O'Neill & Padden, 2022). These findings are promising but should be replicated across a wider range of disciplines and institutions before they support strong causal claims. 6. Implementation Challenges and How to Address Them Design frameworks are easier to write than to implement. The literature on assessment reform is candid that many well-designed pilots fail to scale, and that the reasons for failure are usually institutional rather than technical (O'Neill & Padden, 2022; Boud et al., 2023). Four challenges are especially common. 
6.1 Faculty Capacity and Buy-In

Redesigning assessments requires faculty time and expertise. Faculty members are typically evaluated on research output more than on teaching, and the incentive structure often does not reward the investment that redesign requires. Even faculty who are sympathetic to the goals of #inclusive_assessment may feel unable to commit the necessary time to rubric development, format expansion, and marker training.

Addressing this challenge requires institutional signals that assessment redesign is valued. This can take several forms, including recognition in promotion criteria, small grants for redesign work, and shared institutional infrastructure for rubric development and marker training. Perhaps most importantly, it requires visible institutional protection of faculty who pilot new approaches, so that early implementation difficulties do not translate into negative teaching evaluations that penalize the very faculty doing the reform.

6.2 Concerns About Rigor and Comparability

A recurring concern is that flexible assessments dilute rigor or make comparisons across students unfair. The concern deserves serious engagement rather than dismissal. In some poorly designed implementations, format flexibility has been introduced without corresponding investment in rubric equivalence, and the resulting scores have been genuinely non-comparable. This is a real failure mode, and one that critics of UDL are correct to point out.

The response is not to abandon flexibility but to invest in the technical work that makes flexibility rigorous. Cross-format rubrics, multi-marker moderation, and periodic psychometric analysis of format effects are all established tools. When they are used, format flexibility does not compromise comparability. When they are skipped, it does. The rigor question is a real one, but it is a question about implementation quality rather than about the framework itself (Nieminen, 2022; Tai et al., 2023).
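One way to make the moderation step concrete is a per-format agreement check. The sketch below is illustrative only and is not drawn from the cited studies. It assumes that two markers have scored the same responses on a shared rubric scale, and it uses Cohen's kappa, chosen here as one common agreement statistic, to compare marker agreement within each response format. The function names, the record layout, and the sample scores are all hypothetical.

```python
from collections import Counter


def cohen_kappa(marker_a, marker_b):
    """Cohen's kappa for two markers who scored the same responses."""
    if len(marker_a) != len(marker_b) or not marker_a:
        raise ValueError("both markers must score the same non-empty set of responses")
    n = len(marker_a)
    # Observed agreement: share of responses given the same rubric level.
    observed = sum(a == b for a, b in zip(marker_a, marker_b)) / n
    # Chance agreement, from how often each marker used each rubric level.
    freq_a, freq_b = Counter(marker_a), Counter(marker_b)
    expected = sum(freq_a[level] * freq_b[level] for level in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both markers used one identical level throughout
    return (observed - expected) / (1 - expected)


def kappa_by_format(records):
    """Group (response_format, level_a, level_b) records and compute kappa per format."""
    grouped = {}
    for fmt, level_a, level_b in records:
        grouped.setdefault(fmt, ([], []))
        grouped[fmt][0].append(level_a)
        grouped[fmt][1].append(level_b)
    return {fmt: cohen_kappa(a, b) for fmt, (a, b) in grouped.items()}


# Hypothetical moderation sample: rubric levels 1-4 from two markers.
sample_records = [
    ("written", 3, 3), ("written", 2, 2), ("written", 4, 4), ("written", 1, 1),
    ("oral", 3, 2), ("oral", 2, 2), ("oral", 4, 3), ("oral", 1, 1),
]

if __name__ == "__main__":
    for fmt, kappa in kappa_by_format(sample_records).items():
        print(f"{fmt}: kappa = {kappa:.2f}")
```

In this invented sample the two markers agree fully on the written responses and much less on the oral ones, which is the kind of pattern that would prompt further marker calibration for that format. A real analysis would need far more responses per format and would probably use a weighted statistic that respects the ordering of rubric levels.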
6.3 High-Stakes Standardized Contexts

Adapting the framework to standardized admissions or licensure testing is genuinely harder than adapting it to classroom assessment. Standardized tests are administered to large populations, scored on tight timelines, and used to make decisions with significant legal and economic consequences. Change is slower in these contexts for reasons that go beyond inertia.

Several avenues nonetheless appear promising. The most conservative involves generalizing existing accommodations. If extended time and quiet rooms benefit a broad group of students without disadvantaging any subgroup, expanding these features to more students by default is a low-risk change that could be piloted at scale. A middle path involves offering test takers a choice among a small set of validated formats, with careful psychometric equating across formats. The most ambitious path involves moving toward #modular_assessment in which the test is delivered in shorter sessions across a window, with more flexible pacing within each session.

None of these paths is straightforward, but each has been discussed in the recent testing policy literature and each has partial precedents (Lovett & Lewandowski, 2023). The main obstacle is institutional risk aversion, which is understandable given the consequences of a testing failure, but which also produces a status quo bias that keeps in place features whose validity is more assumed than demonstrated.

6.4 The Diagnosis Question

Under most current systems, formal accommodations require formal diagnoses. This produces two problems. First, diagnosis is expensive, time-consuming, and unevenly available, which means that students from #low_income backgrounds or in under-resourced regions may be neurodivergent but undiagnosed and therefore ineligible for support. Second, tying support to diagnosis reinforces the idea that the student is the problem, which contradicts the whole logic of the neurodiversity paradigm.
UDL-aligned assessment addresses this by making many design features available to all students by default, without requiring disclosure. Extended time, format flexibility, and sensory-considered environments do not need to be individually justified if they are built into the standard protocol. This reduces the burden on students, reduces the administrative cost of case-by-case accommodation, and avoids the disclosure dilemma that many neurodivergent students face (Chapman & Botha, 2023; Dwyer, 2022). Diagnosis-based accommodation can still exist for the smaller set of features that genuinely need to be individualized, but it should not be the default. 7. Ethical and Policy Implications Beyond the immediate design questions, redesign of high-stakes assessment raises broader ethical and policy questions that deserve explicit attention. 7.1 Fairness, Access, and the Distribution of Opportunity Assessment is one of the primary mechanisms by which societies distribute opportunity. When an assessment produces systematically distorted scores for a subgroup of learners, the distribution is distorted accordingly. The neurodiversity discussion is therefore not only about individual students. It is about the social function of assessment as a #gatekeeper and about who is being kept out (Nieminen, 2022). There is a legitimate ethical argument that a system which measures inaccurately, and which distributes opportunity based on those inaccurate measurements, owes a duty of correction to the affected learners. This duty does not require pretending that all students have identical strengths. It requires that the measurement instrument not artificially amplify differences that are not part of the construct being measured. The validity argument and the fairness argument, in this sense, are the same argument stated at different levels of abstraction. 7.2 Privacy and Disclosure Any assessment protocol that treats accommodation as diagnosis-gated requires students to disclose. 
Disclosure has costs. It exposes students to potential stigma from peers, from faculty, and from future employers. It requires them to construct a formal medical narrative about themselves, which not all #neurodivergent people wish to do. And it puts sensitive information into institutional records with uncertain future uses (Chapman & Botha, 2023).

Universal design reduces the disclosure burden by making many supports available to all. It does not eliminate the need for disclosure entirely, but it changes the ratio of accessibility that is built in to accessibility that requires a formal request. This is a meaningful improvement in privacy without any corresponding cost.

7.3 The Purpose of Assessment

The redesign discussion also invites a broader conversation about what #assessment is for. Historically, assessment has served three overlapping purposes: certifying that students have learned specific content, sorting students for further opportunity, and providing feedback that helps students improve. These purposes are often in tension. A test that is optimal for sorting is not always optimal for feedback, and vice versa.

A #neurodiversity_aware framework does not resolve this tension, but it does suggest that current assessment systems have drifted too far toward sorting and away from certification and feedback. When assessment is primarily about sorting, small differences in performance carry large consequences, so even a small design feature that adds noise to the measurement can change a student's outcome. When assessment is primarily about certifying mastery and providing feedback, the same noise is less consequential because the decision being made is less binary (Boud et al., 2023). Rebalancing the purposes of assessment is a longer conversation, but it is one that the neurodiversity discussion helps to reopen.

7.4 Policy Levers

Real change in assessment practice depends on policy levers that operate at institutional, national, and international scales.
At the institutional level, universities and school systems can revise assessment policies to require justification of time limits, to encourage format flexibility, and to invest in the technical infrastructure that supports rigorous multi-format assessment. At the national level, accreditation bodies and examination authorities can update guidance to reflect the current evidence on inclusive assessment. At the international level, professional bodies that oversee licensure examinations can share psychometric work on format equivalence rather than each developing it in isolation. None of these levers is quick to move. But the direction of movement in recent years has been positive, and the pace has accelerated in the last five years as more institutions have piloted UDL-aligned approaches and reported results (Nieminen, 2022; O'Neill & Padden, 2022; Tai et al., 2023). 8. Limitations of This Article and Directions for Future Research This article is a synthesis rather than an empirical study, and its conclusions inherit the limitations of the literature it draws on. Three limitations deserve explicit mention. First, the empirical base for many specific design recommendations is thinner than the confident tone of some UDL guidance suggests. Extended time is well-supported, format flexibility has promising initial evidence, and process-based assessment has strong theoretical grounding but variable empirical support. The framework proposed here should be read as a design hypothesis to be tested rather than as a set of established best practices. Second, most of the empirical literature is drawn from #higher_education contexts in high-income countries. The extent to which the findings generalize to secondary schools, to vocational training, and to educational systems in different resource environments is not well established (Griful-Freixenet et al., 2020). Cross-cultural replication and adaptation are important research priorities. 
Third, the literature is disproportionately focused on ADHD and autism among the many possible dimensions of #cognitive_variation. Dyslexia, dyscalculia, developmental coordination disorder, and various profiles that combine features across categories are less well-studied in relation to inclusive assessment, and the framework should be tested against these profiles as well. Future research should prioritize several directions. Larger and more rigorous comparisons of specific design changes, ideally with random assignment where ethically feasible, would strengthen the evidence base. Longitudinal studies of students who experience UDL-aligned assessment across multiple courses would help distinguish short-term effects from durable ones. Implementation science studies of how design changes are adopted, adapted, or abandoned in different institutional contexts would help translate promising practice into scalable policy. And qualitative work with neurodivergent students themselves would ensure that the design conversation remains grounded in the experiences it aims to improve (Hamilton & Petty, 2023; Chapman & Botha, 2023). 9. Conclusion The high-stakes examination in its conventional form is neither a natural fact nor a neutral instrument. It is a design artifact, built for a specific kind of test taker and inherited from a specific historical moment. Its features, including tight time limits, single response formats, sensory-loaded environments, and instruction genres that presume familiarity, produce systematically distorted scores for many #neurodivergent students. The distortion is not a fairness problem in a soft or discretionary sense. It is a #validity problem that undermines the score's claim to measure what it purports to measure. Universal Design for Learning offers a set of principles for redesigning assessment so that it measures the intended construct more accurately across a wider range of learners. 
The principles are not new, and the evidence for their effectiveness in classroom contexts is now substantial. Scaling them to genuinely high-stakes standardized contexts is harder and will require sustained psychometric, institutional, and policy work. But the direction of change is clear, and it is more conservative than critics sometimes suggest. Asking an assessment to measure what it claims to measure, and to do so fairly across the students actually taking it, is not a radical demand. It is a demand for rigor.

The redesigned protocol proposed in this article, organized around task, environment, response, and scoring design, is offered as a starting point rather than as a finished specification. Institutions that pilot it will need to adapt it to their contexts, invest in the technical work of rubric equivalence, and evaluate results honestly. Some elements will succeed and others will need revision. But the alternative, which is to continue producing distorted scores from a design known to be misaligned with the cognitive diversity of the population, is not sustainable on either fairness or validity grounds.

The students who currently pay the highest cost for that misalignment are those whose #processing_styles differ from the assumed default. They are the ones whose scholarships are denied, whose university places are declined, whose licenses are delayed, and whose confidence is worn down by years of assessments that measured the wrong thing. A better protocol will not solve every educational inequity. But it can stop making the inequities worse, and that is a genuinely achievable goal.

Hashtags

#neurodiversity #universal_design_for_learning #high_stakes_testing #assessment_reform #ADHD #autism_in_education #inclusive_assessment #test_validity #executive_function #cognitive_variation #accessible_learning #UDL_framework #test_anxiety #academic_equity #processing_styles

References

Bottema-Beutel, K., Kapp, S. K., Lester, J. N., Sasson, N. J., & Hand, B. N. (2021). Avoiding ableist language: Suggestions for autism researchers. Autism in Adulthood, 3(1), 18–29. https://doi.org/10.1089/aut.2020.0014

Boud, D., Ajjawi, R., Tai, J., & Dawson, P. (2023). Creating assessments for successful careers: The role of authentic and sustainable practices. Assessment and Evaluation in Higher Education, 48(6), 764–777.

CAST. (2024). Universal Design for Learning Guidelines version 3.0. CAST Publishing.

Chapman, R., & Botha, M. (2023). Neurodivergence-informed therapy. Developmental Medicine and Child Neurology, 65(3), 310–317. https://doi.org/10.1111/dmcn.15384

Clouder, L., Karakus, M., Cinotti, A., Ferreyra, M. V., Fierros, G. A., & Rojo, P. (2020). Neurodiversity in higher education: A narrative synthesis. Higher Education, 80(4), 757–778. https://doi.org/10.1007/s10734-020-00513-6

Dwyer, P. (2022). The neurodiversity approach(es): What are they and what do they mean for researchers? Human Development, 66(2), 73–92. https://doi.org/10.1159/000523723

Griful-Freixenet, J., Struyven, K., Vantieghem, W., & Gheyssens, E. (2020). Exploring the interrelationship between Universal Design for Learning and Differentiated Instruction. Educational Research Review, 29, 100306. https://doi.org/10.1016/j.edurev.2019.100306

Hamilton, L. G., & Petty, S. (2023). Compassionate pedagogy for neurodiversity in higher education: A conceptual analysis. Frontiers in Psychology, 14, 1093290. https://doi.org/10.3389/fpsyg.2023.1093290

Johanns, B., Dinkens, A., & Moore, J. (2022). A systematic review comparing open-book and closed-book examinations: Evaluating effects on development of critical thinking skills. Nurse Education in Practice, 60, 103307.

Lambert, R., Imm, K., Schuck, R., Choi, S., & McNiff, A. (2021). Enacting UDL principles in mathematics: The case of an inclusive classroom. Learning Disability Quarterly, 44(4), 265–277.

Lovett, B. J., & Lewandowski, L. J. (2023). Testing accommodations for students with disabilities: Research-based practice (2nd ed.). American Psychological Association.

Nieminen, J. H. (2022). Assessment for Inclusion: Rethinking inclusive assessment in higher education. Teaching in Higher Education, 29(4), 841–859. https://doi.org/10.1080/13562517.2022.2021395

Nieminen, J. H., & Pesonen, H. V. (2022). Politicising inclusive learning environments: How to foster belonging and challenge ableism? Higher Education Research and Development, 41(6), 2020–2033.

O'Neill, G., & Padden, L. (2022). Diversifying assessment methods: Barriers, benefits and enablers. Innovations in Education and Teaching International, 59(4), 398–409. https://doi.org/10.1080/14703297.2021.1880462

Pellicano, E., & den Houting, J. (2022). Annual Research Review: Shifting from normal science to neurodiversity in autism science. Journal of Child Psychology and Psychiatry, 63(4), 381–396. https://doi.org/10.1111/jcpp.13534

Scheef, A. R., Caniglia, C., & Barrio, B. L. (2023). Disability support services and higher education: A scoping review. Journal of Postsecondary Education and Disability, 36(1), 5–21.

Sedgwick-Müller, J. A., Müller-Sedgwick, U., Adamou, M., Catani, M., Champ, R., Gudjónsson, G., et al. (2022). University students with attention deficit hyperactivity disorder: A consensus statement. BMC Psychiatry, 22, 292. https://doi.org/10.1186/s12888-022-03898-z

Tai, J., Ajjawi, R., Bearman, M., Boud, D., Dawson, P., & Jorre de St Jorre, T. (2023). Assessment for inclusion: Rethinking contemporary strategies in assessment design. Higher Education Research and Development, 42(2), 483–497. https://doi.org/10.1080/07294360.2022.2057451

Weyandt, L. L., Oster, D. R., Gudmundsdottir, B. G., DuPaul, G. J., & Anastopoulos, A. D. (2023). Neuropsychological functioning in college students with ADHD. Neuropsychology, 37(2), 138–151.

  • Decolonizing the Syllabi: Practical Frameworks for Culturally Sustaining Pedagogies

    Universities around the world are being asked to rethink what counts as knowledge, whose voices are heard, and how learning is designed. This paper moves beyond the theoretical debate about #decolonization and asks a more grounded question: what does it look like when a university actually changes its core curriculum? Drawing on eight case studies from institutions in South Africa, Aotearoa New Zealand, Canada, the United States, Australia, the United Kingdom, and Mexico, the study examines how core courses have been redesigned to de-center #Eurocentric narratives and to include #indigenous_knowledge and marginalized knowledge systems. The analysis identifies five practical frameworks that keep reappearing across successful reforms: co-designed syllabi with community elders, epistemic pluralism as a design rule, place-based and land-based learning modules, assessment redesign that recognizes oral and relational knowledge, and staff development anchored in humility rather than expertise. The paper argues that #culturally_sustaining_pedagogies work best when they are treated as ongoing #institutional_practice rather than a single reform event. Findings suggest that decolonial curriculum work is possible without collapsing academic rigor, but it requires sustained resourcing, honest engagement with power, and a willingness to sit with discomfort. The article offers a working toolkit for educators, curriculum committees, and academic developers who want to move from statement to practice. Keywords: decolonizing curriculum; culturally sustaining pedagogy; higher education reform; indigenous knowledge; epistemic justice; syllabus design; case study analysis 1. Introduction For more than a decade, the phrase #decolonize_the_curriculum has moved from student protest posters into faculty meetings, quality assurance documents, and strategic plans. 
The 2015 #RhodesMustFall movement at the University of Cape Town, the 2016 protests at the School of Oriental and African Studies in London, and the wider Black Lives Matter reckoning of 2020 pushed universities to publish statements, form working groups, and promise change (Heleta and Chasi, 2023). Yet many educators, especially those working inside core undergraduate programs, still say the same thing behind closed doors: they agree with the principle, but they do not know what to actually do on Monday morning with a syllabus, a room of students, and a set of learning outcomes signed off two years ago. This paper is written for that gap. It is not another argument that curricula are #Eurocentric; that argument has been made many times and is broadly accepted in the scholarly literature (Shahjahan et al., 2022). Instead, the paper looks at institutions that have actually done the work, sometimes imperfectly, and asks what can be learned from them. The focus is on core curricula, meaning the required, first- and second-year courses that most students take, because these courses shape a graduate's sense of what a discipline is. Elective decolonial modules are important, but they can be safely ignored by students and colleagues who do not want to engage. Core courses cannot. The paper is anchored in two overlapping traditions. The first is the decolonial school associated with scholars such as Mignolo, Quijano, Ndlovu-Gatsheni, and de Sousa Santos, which frames the modern university as a product of #colonial_modernity and its knowledge hierarchies (de Sousa Santos, 2021; Ndlovu-Gatsheni, 2021). The second is the tradition of #culturally_sustaining_pedagogy developed by Paris and Alim, which pushes beyond earlier ideas of culturally relevant teaching to argue that schooling should actively sustain the languages, literacies, and cultural practices of communities of color (Paris, 2021; Alim et al., 2024). 
Reading these two traditions together helps the paper stay grounded in both structural critique and classroom practice. Three research questions guide the study: (1) How have universities in different national contexts redesigned core curricula to de-center Eurocentric narratives? (2) What practical design moves appear repeatedly across successful cases? (3) What tensions, trade-offs, and unintended effects emerge when these frameworks are put into practice? The paper is organized as follows. Section 2 reviews the recent literature on decolonial curriculum reform and culturally sustaining pedagogies. Section 3 explains the case study method and how the eight institutions were selected. Section 4 presents the cases. Section 5 draws out five practical frameworks. Section 6 discusses tensions and risks, including the risk of what Moosavi (2023) calls #decolonial_washing. Section 7 offers a toolkit for practice. Section 8 concludes. 2. Literature Review 2.1 What decolonizing the curriculum has come to mean The literature on #decolonizing_higher_education has grown quickly since 2020, and it is no longer a single conversation. At least four overlapping meanings appear in recent work. The first meaning is representational. Decolonizing here means adding authors, thinkers, and case studies from the Global South and from #marginalized_communities to reading lists that have been dominated by white, male, European voices (Charles, 2022). This is the most visible and often the easiest to implement, but it is also the most criticized for being cosmetic if the underlying frame of the course remains unchanged. The second meaning is epistemic. Decolonizing here means questioning what counts as valid knowledge, which methods are recognized, and which knowledge systems are treated as folklore rather than as science (Adebisi, 2023). This is the level at which #epistemic_justice becomes central. 
Scholars in this stream argue that the point is not only to include Frantz Fanon on a syllabus, but to allow #indigenous, oral, embodied, and community-held knowledges to sit alongside peer-reviewed journal articles as legitimate evidence. The third meaning is institutional. It focuses on hiring, governance, admissions, funding, and the physical space of the campus, including statues, building names, and land acknowledgments (Stein, 2022). This stream reminds curriculum workers that a reading list cannot do the work of institutional transformation on its own. The fourth meaning, which is the most contested, is ontological or existential. Drawing on Andreotti and colleagues, this stream argues that decolonization is not a curriculum project but a longer project of #unlearning the assumptions of modernity itself, including linear time, human separateness from nature, and the equation of progress with growth (Andreotti, 2021). This work often uses the metaphor of hospicing a dying system rather than reforming it. Recent comparative work suggests that most universities operate at the first and second levels, occasionally reach the third, and rarely engage the fourth (Shahjahan et al., 2022). This paper focuses on levels one, two, and three because that is where practical curriculum reform tends to sit. 2.2 Culturally sustaining pedagogies The concept of #culturally_sustaining_pedagogy was introduced by Paris in 2012 and expanded with Alim, and it has been updated in recent years to include #linguistic_justice, queer of color critique, and #indigenous_futurities (Paris, 2021; Alim et al., 2024). The core claim is that schooling should not merely be relevant to students' cultures, nor should it aim to bridge students into a dominant culture. It should actively sustain the languages, literacies, and lifeways of #communities_of_color as valuable in themselves. Applied to higher education, this shifts the question from "how do we make our curriculum more inclusive?" 
to "how do our curricula help students maintain and develop the knowledge traditions of the communities they come from?" That is a stronger claim, and it has different implications for course design. It means, for example, that a nursing program serving a large #Latinx student body should not only teach in English about culturally competent care; it should treat Spanish-language health literacy and traditional healing knowledge as content worth learning, not only worth respecting. The pairing of decolonial theory with culturally sustaining pedagogy is productive because the two traditions cover different scales. Decolonial theory operates at the level of #knowledge_systems and institutions. Culturally sustaining pedagogy operates at the level of classrooms, relationships, and identities. A serious reform effort needs both. 2.3 What the evidence shows so far Empirical studies of decolonial curriculum reform are still relatively rare, though they are increasing. A comparative review of ten countries found that reforms tend to cluster in the humanities and social sciences, with much slower movement in the natural sciences, engineering, and business (Shahjahan et al., 2022). Reforms are often driven by student activism rather than by faculty initiative, and they tend to lose momentum when the original student cohort graduates (Heleta and Chasi, 2023). Studies of #indigenous_curriculum reform in Aotearoa New Zealand and Canada show more sustained progress, partly because national policy frameworks require it (Smith and Smith, 2022; Battiste, 2022). In South Africa, the post-2015 reforms have been uneven, with some universities producing detailed frameworks and others limiting themselves to symbolic changes (Mbembe, 2021; le Grange, 2023). In the United States and the United Kingdom, reforms have often been shaped by DEI offices, which can produce both meaningful change and, in the words of some critics, a bureaucratic containment of more radical demands (Ahmed, 2023). 
The literature also warns against several common failure modes. These include tokenism, in which a single unit on a marginalized topic is added to an otherwise unchanged course; #additive_reform, which places non-Western material at the end of a chronology structured by Western periodization; and the burden problem, in which faculty of color and #indigenous_faculty are expected to do the decolonial work in addition to their normal load (Moosavi, 2023). 2.4 The gap this paper addresses Given the growth of the theoretical literature and the slower growth of empirical work, there is a real need for grounded, comparative accounts of what has actually been tried. Practitioners often ask for examples. They want to know what a redesigned syllabus looks like, how assessment was changed, how staff were prepared, and what happened when the reform met the reality of the classroom. This paper responds to that need. A further gap is that most of the empirical work sits inside single national contexts. Studies of #South_African_higher_education tend not to speak to studies of North American ethnic studies, and vice versa. Yet the design problems are strikingly similar across contexts, and cross-national learning is possible if the specific histories are respected. This paper takes that comparative risk on purpose. The goal is not to argue that a course in Auckland can be transplanted to Cape Town, but to show that certain design moves recur when reform holds, and that curriculum workers in different contexts can learn from each other without erasing their differences. A final gap is disciplinary. The literature is dominated by humanities and social science examples, with much less attention to health sciences, engineering, business, and the natural sciences. Two of the cases in this study, Otago's health sciences program and UNAM's medical faculty, sit deliberately outside the humanities to test whether the frameworks travel into professional and scientific fields. 
Early indications suggest that they do, but with different institutional constraints and different vocabularies. 3. Methodology 3.1 Research design The study uses a comparative multi-case design. Yin (2018) remains a standard reference in this area, but the coding follows more recent guidance in Ravitch and Carl (2021). Eight cases were selected using purposive sampling based on three criteria: (a) the institution had publicly documented a core curriculum reform aimed at decolonization or #cultural_sustainability, (b) the reform had been in place for at least three academic years by 2024, allowing time for effects to appear, and (c) the reform involved at least one required course, not only electives. 3.2 Data sources For each case, three kinds of publicly available data were analyzed: (1) institutional documents such as curriculum frameworks, reports, and syllabi where these were shared publicly by the institution, (2) peer-reviewed articles authored by faculty or researchers directly involved in the reform, and (3) student and faculty accounts published in scholarly journals or edited volumes between 2021 and 2025. No interviews were conducted for this paper; the study is a synthesis of existing published material rather than new fieldwork. That is a limitation, and it is discussed in section 8. 3.3 Analytical approach Documents and articles were coded thematically using an iterative approach. An initial coding frame was built from the literature review, focusing on curriculum content, pedagogy, assessment, staff development, governance, and community partnership. This frame was then refined as new themes emerged from the cases, following a #reflexive_thematic_analysis approach (Braun and Clarke, 2022). The five frameworks presented in section 5 emerged from patterns that appeared in at least five of the eight cases. 3.4 Positionality Any research on decolonial work must state its position. 
This paper is written from the perspective of a curriculum researcher trained in Western universities. That location shapes what is easy to see and what is easy to miss. Where possible, the paper foregrounds the voices of scholars from the institutions and communities involved, rather than paraphrasing them through a Northern lens. 3.5 Limitations of the design Case studies do not generalize in the statistical sense. The eight institutions were chosen because they had done something documented, which means the sample is biased toward visible, well-resourced reforms. Quieter reforms at less-resourced institutions, and reforms that were tried and abandoned, are underrepresented. The paper's frameworks should therefore be read as working hypotheses that other institutions can test, not as universal laws. 4. Case Studies The eight cases are presented in a roughly consistent structure: the institutional context, the specific reform, the design moves that carry it, and honest indications of what has and has not worked. Where evidence is thin, the paper says so. 4.1 University of Cape Town, South Africa: rebuilding first-year humanities The University of Cape Town became a global reference point after the 2015 #RhodesMustFall protests, which called not only for the removal of the Cecil Rhodes statue but for a rethinking of the curriculum (le Grange, 2023). The reform that followed in the Faculty of Humanities restructured the first-year foundation course so that African philosophical traditions, from #Ubuntu ethics to Kemetic thought, were introduced alongside canonical Western texts rather than as an appendix. Students read Achille Mbembe, Sylvia Wynter, and Mahmood Mamdani in dialogue with Kant and Hume. Assessment was redesigned so that essays could be written in academic English, isiXhosa, or Afrikaans, with equivalent marking rubrics developed by a multilingual working group (Behari-Leak et al., 2022). 
Faculty accounts published between 2022 and 2024 describe both success and strain. Students reported that the course felt more connected to their lives, and pass rates improved modestly in the first two years. But several faculty members noted that the reform placed a heavy load on a small number of Black African academics, and that the multilingual assessment work required resources that were not sustained after initial funding ended (le Grange, 2023). The case demonstrates both what is possible and what happens when institutional support does not match ambition. Another lesson from the Cape Town case is that student expectations shift as a reform matures. The first cohorts of students in the redesigned course were often the same students who had marched for #RhodesMustFall and were quick to name shortcomings. Later cohorts, who did not experience the earlier curriculum, took the reforms for granted and began to ask new questions. This is a healthy sign, but it also means that a reform which felt radical in year one can begin to feel ordinary in year five, and faculty must be prepared to keep evolving the course rather than treat it as complete. 4.2 University of Auckland, Aotearoa New Zealand: the compulsory Waipapa Taumata Rau course The University of Auckland introduced a compulsory undergraduate course, sometimes referred to internally as the #Waipapa_Taumata_Rau course, which every first-year student takes regardless of major. The course centers #Te_Tiriti_o_Waitangi, the founding treaty between the British Crown and Māori, and treats it as a live constitutional document rather than a historical artifact (Smith and Smith, 2022). Content is co-designed with mana whenua, the local Māori community with authority over the land the university sits on. The pedagogy is #relational rather than only textual. Students spend time on the marae, the Māori communal space, and learn protocols of welcome, speech, and hospitality. 
Assessment includes a reflective portfolio that asks students to trace their own family's relationship to the land and to the treaty. Early evaluation studies suggest that the course changes how students, including international students, think about citizenship and knowledge (Mika et al., 2023). Faculty involved in the course argue that its success depends on long-term relationships with iwi, or tribal groups, that predate the course itself. 4.3 University of British Columbia, Canada: the Indigenous Strategic Plan and the first-year writing requirement The University of British Columbia sits on the unceded traditional territory of the Musqueam people. In 2020, the university published an #Indigenous_Strategic_Plan, and one of its concrete effects was a redesign of the first-year academic writing course taken by most students in the Faculty of Arts (Battiste, 2022). The course now foregrounds #Indigenous_rhetorical_traditions, including oral storytelling, land-based knowledge, and the concept of relational accountability developed by Shawn Wilson. Students learn to write in academic genres while also engaging with genres that are not typically taught in first-year composition, such as the personal narrative rooted in place and the community-accountable research proposal. Documents from the reform emphasize that the redesign was not led by a single decolonial theorist but by a working group that included Musqueam knowledge holders, writing instructors, and Indigenous graduate students (Cote-Meek and Moeke-Pickering, 2023). One important design choice was that the course did not become an #Indigenous_studies course. It remained an academic writing course, and Indigenous traditions were treated as one of several rhetorical inheritances that students should learn from. 
4.4 University of Melbourne, Australia: the Indigenous Knowledges breadth requirement The University of Melbourne introduced a #breadth_requirement under which all undergraduates must complete at least one subject drawn from a curated list of Indigenous Knowledges subjects. The list includes courses in Indigenous law, Indigenous science, Indigenous art, and Indigenous public health, most of which are taught or co-taught by Aboriginal and Torres Strait Islander academics (Nakata, 2022). The breadth requirement was chosen over a single compulsory course partly because faculty argued that a menu allowed students to engage with Indigenous knowledges through their own disciplinary interests, and partly because the university did not have enough Indigenous staff to teach every first-year student in one shared course. Critics have noted that the menu approach risks reproducing #additive_reform if the rest of the curriculum remains unchanged, and the university has since begun working with disciplinary programs to embed Indigenous content within core courses as well (Nakata, 2022; Rigney, 2023). 4.5 Universidad Nacional Autónoma de México, Mexico: Programa Universitario de Estudios de la Diversidad Cultural y la Interculturalidad At UNAM in Mexico City, the Programa Universitario de Estudios de la Diversidad Cultural y la Interculturalidad, often abbreviated as PUIC, coordinates work across faculties to embed #Indigenous_and_Afro_Mexican knowledge in core courses (Dietz, 2022). The program grew out of a longer tradition of #intercultural_universities in Mexico, which serve predominantly Indigenous student populations, but PUIC's mandate at UNAM is different: it is to change the mainstream university, not to run parallel Indigenous institutions. Reform work has focused on medicine, agronomy, and law, three fields where Indigenous knowledge and practice have historically been dismissed as superstition or as folklore. 
The medical faculty has introduced a required module on medical pluralism, taught in collaboration with Nahua and Maya health practitioners, that treats traditional medicine as a knowledge system with its own logic, not as a set of remedies to be validated by biomedicine (Dietz, 2022). Assessment includes a community placement in which students learn from local practitioners. 4.6 School of Oriental and African Studies, London, United Kingdom: rethinking the SOAS core SOAS has been called the most decolonized university in the United Kingdom, though staff and students at SOAS often dispute that description. After the 2016 student campaign to decolonize the philosophy curriculum, the institution undertook a broader review that touched on languages, religions, and area studies (Bhambra et al., 2023). The reform at the level of core courses involved rewriting the introductory theory sequences in politics, economics, and development studies to place non-Western thinkers as founders of the field, not as case studies. An economics core course, for example, now begins with the Islamic economic tradition and with Latin American dependency theory before introducing classical political economy. This is not simply about adding readings; it is about which conceptual vocabulary the course teaches students to use first (Bhambra et al., 2023). Faculty accounts describe difficult conversations about disciplinary identity, especially with colleagues who worried that graduates would be unable to compete in mainstream economics programs elsewhere. The compromise reached was that students would gain fluency in both traditions, with the mainstream tradition denaturalized rather than removed. 4.7 California State University, Los Angeles, United States: ethnic studies as a graduation requirement In 2020, the California State University system adopted an #ethnic_studies graduation requirement across all 23 of its campuses, effective for students entering in 2021 (Sleeter and Zavala, 2022). 
Cal State LA, which serves a large #Latinx and immigrant-origin student body, used the requirement to redesign several core general education courses so that they satisfy the ethnic studies requirement while also fulfilling other general education goals such as writing and social science literacy. The redesigned courses draw on the four foundational ethnic studies fields recognized by the requirement: Africana studies, Asian American studies, Chicanx and Latinx studies, and Native American studies. Faculty accounts published in edited volumes describe the pedagogical shift as one from teaching about marginalized groups to teaching from the intellectual traditions of those groups (Sleeter and Zavala, 2022). Students are asked to see themselves as inheritors and producers of these traditions, not only as their subjects. 4.8 University of Otago, Aotearoa New Zealand: Hauora Māori in health sciences The University of Otago's health sciences programs have embedded #Hauora_Maori, a Māori model of health that includes physical, mental, family, and spiritual dimensions, into the core of medical and public health training (Curtis et al., 2023). Students learn to conduct clinical interviews using frameworks such as Te Whare Tapa Whā, the four-walled house model developed by Mason Durie. The reform includes not only content but assessment: students are examined on their ability to apply Māori health frameworks in clinical scenarios, and this examination has equal weight with examinations on biomedical content. Evaluation studies published between 2021 and 2024 report that graduates trained under the reformed curriculum describe higher confidence in working with Māori patients and communities, and that they are more likely to identify structural determinants of health rather than only individual behaviors (Curtis et al., 2023). 
The case is notable because it shows decolonial reform working inside a highly professionalized, accredited program where change is often assumed to be impossible. One detail from the Otago case is worth underlining. The reform did not treat Māori health as an extra module bolted onto biomedical training. It rewrote the definition of clinical competence itself, so that a graduate who cannot work within Te Whare Tapa Whā is not considered fully competent. That move, from adding content to redefining competence, is what allows the reform to survive accreditation review cycles and staff turnover. It is also the move that many other professional programs have not been willing to make. 5. Five Practical Frameworks The eight cases differ in scale, national context, and disciplinary focus, but five design moves appear across most of them. These are not the only ways to decolonize a syllabus, but they are the moves that recur when reform holds together over several years. 5.1 Framework 1: Co-designed syllabi with community knowledge holders In every case that showed sustained change, the syllabus was not designed by academics alone. Elders, community practitioners, and knowledge holders were involved from the beginning, and their involvement was not consultation but genuine #co_design. This distinction matters. Consultation invites feedback on a draft that faculty have already shaped. Co-design shares authorship of the draft itself. The University of Auckland's Waipapa Taumata Rau course, the University of British Columbia's writing course, and UNAM's medical pluralism module all describe co-design as the foundation of their reform. Co-design requires long timelines, honest conversations about intellectual property and #knowledge_sovereignty, and, importantly, payment. Community knowledge holders should be compensated for their work at rates that reflect their expertise, not treated as volunteers whose participation is a gift to the university (Smith and Smith, 2022). 
A design implication follows: universities that want to co-design must build the relationships before the curriculum project begins. Reforms that start with a fixed deadline and then reach out to communities tend to reproduce extractive patterns, however well-intentioned. Institutions such as Auckland and UBC had decades of relationship-building behind their reforms, even when the reform itself moved quickly. 5.2 Framework 2: Epistemic pluralism as a design rule The second framework is a shift in how the syllabus treats knowledge itself. Rather than positioning non-Western knowledge as an addition to a Western frame, successful reforms treat multiple #knowledge_systems as legitimate starting points. This is what several scholars call #epistemic_pluralism (de Sousa Santos, 2021; Ndlovu-Gatsheni, 2021). In practice, this means that a course does not begin with the assumption that Western science, philosophy, or law is the neutral baseline against which other traditions are judged. The SOAS economics core, for example, begins with Islamic economics and Latin American dependency theory before introducing neoclassical economics. UNAM's medical pluralism module treats traditional Mesoamerican medicine as a knowledge system rather than as a set of practices to be validated by biomedicine. Epistemic pluralism is not the same as relativism. It does not mean that every claim is equally valid. It means that the criteria for evaluating claims are themselves contextual, and students should learn to work with multiple criteria rather than applying a single set uncritically. This is a demanding pedagogy. It requires that faculty know more than one tradition well, or that they teach in teams with colleagues who do. 5.3 Framework 3: Place-based and land-based learning The third framework is the intentional grounding of learning in place. 
Decolonial reforms consistently move part of the learning out of the seminar room and onto the land, the neighborhood, or the community setting. This is particularly visible in the Auckland, UBC, and Otago cases, but it also appears in Cal State LA's ethnic studies courses, which use the surrounding city as a text. #Place_based_learning does more than add a field trip. It teaches students that knowledge is generated in relationships with specific places and peoples, and that those relationships carry ethical obligations. Land-based learning, drawn from Indigenous pedagogies, treats the land itself as teacher (Battiste, 2022). Students learn to observe, to be still, to be guided by seasons and by community protocols rather than by the semester calendar. For institutions without an obvious land base, or for online programs, this framework requires adaptation. It can still be honored through community engagement, through mapping exercises that make visible the histories of the campus and city, and through explicit attention to whose land the learning is happening on. What matters is not the outdoor setting but the shift in what counts as a learning environment. 5.4 Framework 4: Assessment redesign that recognizes multiple ways of knowing Assessment is where many well-intentioned reforms quietly reassert Eurocentric norms. A syllabus can include Indigenous authors and community voices, but if the only assessed output is the individually authored, referenced, and standard-English essay, the assessment tells students what the course actually values. The most durable cases in this study redesigned assessment alongside content. The Cape Town foundation course allows submission in multiple languages. The Auckland course includes oral #whakawhanaungatanga, a relational introduction, as an assessed component. The Otago health sciences program assesses clinical reasoning within Māori health frameworks. 
UBC's writing course accepts community-accountable research proposals as major assignments. The design principle behind these changes is that assessment should match the knowledge traditions the course teaches. If the course teaches oral storytelling as a rhetorical tradition, students should be assessed on oral storytelling, not only on essays about oral storytelling. If the course teaches relational accountability, students should be assessed on relational practice, not only on reading Wilson's book. This requires rubrics, moderation processes, and often new institutional policies. It is one of the harder parts of the work, and it is where quality assurance offices can either enable or block reform (Behari-Leak et al., 2022). A note on how the frameworks interact. They are not a checklist. Institutions that pick one and ignore the others tend to produce partial reforms that generate frustration on all sides. Co-design without epistemic pluralism produces new content taught in old ways. Epistemic pluralism without assessment redesign produces reading lists that students do not need to engage with in their assessed work. Place-based learning without staff development produces field trips that reinforce stereotypes. The frameworks are meant to be used together, and their combined weight is what carries the reform. 5.5 Framework 5: Staff development anchored in humility The fifth framework is about the teachers, not the syllabus. Every successful case invested in the professional development of academic staff, but the development was not shaped as training in a new expertise. It was shaped as cultivation of #humility, relationship, and #reflexivity. The UBC and Auckland cases both describe multi-year staff development programs in which faculty learned protocols, language basics, and histories of the local Indigenous peoples, and were expected to build ongoing relationships with community partners. 
The Cape Town case includes reading groups and #decolonial_reflexive_practice sessions in which staff examine their own assumptions about knowledge and authority (Behari-Leak et al., 2022). This framework acknowledges a reality that many curriculum reforms miss. A syllabus is only as decolonial as the person teaching it. A brilliantly designed reading list taught by a faculty member who dismisses non-Western texts as anthropological curiosities will fail. A more modest syllabus taught by a faculty member who has done the reflexive work can succeed. Staff development, not content selection, is often the binding constraint. 6. Discussion: Tensions, Trade-offs, and Risks The five frameworks above summarize what tends to work, but they do not resolve the tensions that emerge whenever institutions try to change core curricula. This section discusses several of these tensions honestly, because reforms that pretend the tensions do not exist tend to fail. 6.1 The risk of decolonial washing Moosavi (2023) has warned about #decolonial_washing, the use of decolonial language by institutions that continue to operate along colonial lines. Every institution in this study is vulnerable to this critique. Universities that publish glossy decolonization strategies while cutting funding for Indigenous language programs, hiring few Indigenous or Black faculty, and refusing to return land or artifacts are performing decolonization more than doing it. The frameworks in this paper cannot prevent decolonial washing, but they can make it easier to detect. A reform that includes co-design, epistemic pluralism, place-based learning, redesigned assessment, and sustained staff development is expensive and slow. A reform that skips most of these and produces a new mission statement is not the same thing, and observers should say so. 
6.2 The burden problem Decolonial reform tends to be led by faculty of color and Indigenous faculty, who then carry a disproportionate share of the labor of teaching, mentoring, community liaison, and institutional politics. Several accounts in the literature describe this as a #cultural_taxation that is not recognized in workload models or in promotion criteria (Ahmed, 2023; Cote-Meek and Moeke-Pickering, 2023). Institutions serious about reform must recalculate workload. Community relationship work should count as scholarly labor. Co-teaching should be resourced. Faculty who lead reforms should have time and money to do so, not another line on an already crowded service load. Without these changes, reform burns out the people it depends on, and the reform ends with them. 6.3 Disciplinary resistance and rigor A frequent objection to decolonial reform is that it lowers academic standards. This objection deserves engagement rather than dismissal. The cases in this study suggest that decolonial reform does not lower standards; it changes which standards apply and adds new ones. Students in the Otago health sciences program still learn biomedical science, but they must also learn Māori health frameworks. Students in the SOAS economics core still learn neoclassical models, but they must also learn dependency theory and Islamic economics. The result is that students in reformed programs are being asked to learn more, not less. They are also being asked to learn in more relational and reflexive ways, which some find harder than mastering a single tradition. Faculty who claim that decolonial reform is easy grading are often reacting to a change in what counts as knowledge, not to a lowering of expectations (Adebisi, 2023). 6.4 The tension between representation and structure Adding voices to a reading list is not the same as changing the structure of a course. But dismissing representational reform as merely cosmetic is also a mistake. 
For students who have never seen a scholar from their community treated as a foundational thinker, that first encounter matters. Representation and structure work best in sequence: representation opens the door, and structural change keeps it open. The Cal State LA case illustrates this. The ethnic studies requirement began as a representational demand, but its implementation has become structural, reshaping which courses students take, how those courses are staffed, and what the general education framework contains (Sleeter and Zavala, 2022). 6.5 Whose knowledge, whose consent A serious risk in this work is the extraction of knowledge from communities without adequate consent or benefit. #Indigenous_knowledge_sovereignty frameworks such as CARE and OCAP have been developed precisely to address this risk (Carroll et al., 2022). Universities that bring Indigenous knowledge into their curricula must engage with these frameworks or they will reproduce the extractive patterns of earlier centuries in gentler language. Consent must be renewable, not one-time. Attribution must be specific, not generic. Benefits must flow back to communities, not only to the university. These are not idealistic aspirations; they are conditions of legitimacy for decolonial curriculum work (Carroll et al., 2022; Smith and Smith, 2022). 6.6 Student diversity within the classroom A course designed around one community's knowledge tradition sits inside a classroom of students from many communities. How does a Māori-centered health course serve international students from Malaysia or Nigeria? How does a Chicanx studies course serve Asian American students at Cal State LA? The literature suggests that this is not a problem to be solved but a productive tension to be worked with (Alim et al., 2024). Students learn to enter knowledge traditions that are not their own with respect, curiosity, and appropriate limits, and this is itself part of the pedagogy. 
In all cases, this requires care in how the course is framed. If students from a majority community feel that a course is about someone else, they disengage. If students from the featured community feel that their tradition is being explained for outsiders, they resent it. The design task is to hold both audiences without collapsing into either. 6.7 Sustainability across leadership changes Several reforms in this study have suffered when the leaders who championed them moved on. Deans change, provosts leave, activist student cohorts graduate. Reforms that depend on a single champion are fragile. The more durable cases have embedded reform in policy documents, in accreditation processes, in permanent staff positions, and in community partnerships that predate any individual leader. This is unglamorous work, but it is what allows a reform to last a decade rather than a semester. 7. A Working Toolkit for Practice Drawing on the frameworks and the tensions, this section offers a toolkit that curriculum committees, program directors, and academic developers can use. It is not a template. It is a set of prompts and design moves that can be adapted to specific contexts. 7.1 Before the reform: relationship, mapping, and honesty Before redesigning any core course, take stock. Map the current curriculum honestly. Ask which authors, which places, which methods, and which case studies dominate. Ask which are absent. Ask who teaches the course, and whose knowledge they can draw on with authority. Do not begin with a solution; begin with a description. At the same time, build relationships with the communities whose knowledge the curriculum might engage. This should not begin one week before the redesign. It should be an ongoing part of institutional life. Universities that treat community relationships as project-based will always be extractive. Be honest with staff and students about what the reform is and is not. 
If the university is not prepared to change hiring, assessment policy, or resourcing, do not promise a transformation it cannot deliver. Modest, honest reforms are more useful than ambitious, undelivered ones. 7.2 During the reform: co-design, layer, and iterate Co-design the syllabus with community knowledge holders from the beginning. Pay them. Attribute them. Renew their consent. Layer the reform. Do not try to change everything at once. Change reading lists, then structural framings, then assessments, then staff development, then governance. Give each layer time to settle before adding the next. Reforms that try to do everything in one term tend to collapse. Iterate. Publish the reform not as a finished product but as a version. Invite critique. Update the syllabus in dialogue with students and community partners. Treat the curriculum as a living document, which is how many of the knowledge traditions the reform draws on already treat their own materials. 7.3 Assessment redesign as leverage Assessment redesign is where many reforms fail, and where a small change can have a large effect. Consider one or more of the following moves: Allow multiple modes of submission. Oral, visual, written, digital, and community-based artifacts can all be assessed with clear rubrics. Include community accountability as an assessed dimension in projects that involve community engagement. Weight relational and reflective work at least as heavily as analytical work. Use portfolio assessment that tracks growth over time rather than only endpoint performance. Make sure rubrics are written jointly with the knowledge tradition being taught. A rubric for oral storytelling written by faculty who have not been taught the tradition will not do the work. 7.4 Staff development that goes past a workshop Single #decolonial_workshops rarely change practice. Sustained staff development is the difference between reforms that hold and reforms that dissolve. 
Consider the following moves: Reading groups that stay open for a year or more, focused on both theory and specific local histories. Paired teaching, in which two colleagues from different traditions co-teach a course over several years and learn from each other. Community placements for faculty, in which teachers spend time in the communities whose knowledge the curriculum engages. Reflective portfolios for faculty, in which teachers track their own learning and its influence on their teaching over time. Recognition of these activities in workload and promotion criteria, not only in optional professional development lines. 7.5 Governance and resourcing Reform lives or dies in governance. Consider the following institutional moves: Include community knowledge holders on curriculum committees, not only as advisors. Create dedicated academic positions for the fields the reform depends on, and ensure these positions carry security of employment and progression pathways. Line-item the reform in the budget. If it does not have money, it is not a priority. Build the reform into program accreditation and quality assurance processes, so that it cannot be quietly rolled back when leadership changes. Publish annual public reports on progress, including honest accounts of what is not working. 7.6 Warning signs that a reform is drifting Finally, some signs that indicate a reform is losing direction: The same small group of faculty is doing all the work. Community partners have not been in the building for a year. Assessment has quietly reverted to the standard essay. New hires do not know or care about the reform. The public communications are more polished than the classroom practice. The reform's leaders are exhausted and no one is being trained to take their place. Any of these signs is a call to slow down, reinvest, and reset, not to declare victory. 7.7 A note on students as co-designers One group has been underemphasized so far, and that is the students themselves. 
Several of the cases in this study began with student protest, and the more durable reforms have kept students inside the design process rather than only at its origin. Student advisory councils, paid student research assistants working on curriculum review, and student representatives on curriculum committees all appear in the case documentation. The Cal State LA and Cape Town cases in particular describe an ongoing role for student voices in reviewing syllabi each year (Sleeter and Zavala, 2022; le Grange, 2023). There is a caution here. Students should not be asked to design curricula they have not yet learned from, and treating undergraduate protest as a substitute for faculty and community expertise can flatten the reform. But students bring a kind of #experiential_authority that faculty do not, particularly about what it feels like to sit in a course. Reforms that hear this voice tend to produce classrooms that students actually engage with, rather than classrooms that faculty are proud of but students find remote. 7.8 Adapting the toolkit to different institutional types The cases in this study are all from research-intensive universities. Community colleges, teaching-focused universities, private colleges, technical institutes, and online programs face different constraints. The toolkit can still apply, but the emphasis shifts. In a teaching-intensive setting, staff development and assessment redesign may be more important than co-design with distant community partners. In an online program, place-based learning must be reimagined through community mapping and local practicum. In a technical institute, the epistemic pluralism framework may focus on whose engineering traditions and design philosophies are recognized as legitimate. What does not change is the underlying question: whose knowledge does this program treat as foundational, and whose does it treat as decoration? 
That question can be asked in any institution, and answering it honestly is the first step in the work. 8. Conclusion Decolonizing the syllabi is not a single act. It is a long, iterative practice that touches curriculum, pedagogy, assessment, staff development, governance, and community relationship. The eight cases examined in this paper show that meaningful change is possible in institutions as different as the University of Cape Town and the University of Melbourne, and as differently positioned as UNAM and SOAS. Across these cases, five practical frameworks recur: co-designed syllabi, epistemic pluralism, place-based and land-based learning, assessment redesign, and staff development anchored in humility. The paper has also taken seriously the risks and tensions that accompany the work. Decolonial washing, the burden problem, disciplinary resistance, questions of consent, the tension between representation and structure, and the fragility of reform across leadership changes are all real. None of them is a reason to abandon the work. All of them are reasons to design it more honestly. The paper has some limitations. It is based on published documents and articles rather than on new fieldwork, and it focuses on institutions that have documented their reforms. Reforms at less-visible institutions, and reforms that were tried and abandoned, are underrepresented. Future research should include more comparative fieldwork, more attention to the natural sciences and to professional programs, and more longitudinal study of what happens to students who graduate from reformed programs. Two closing thoughts are worth offering to practitioners. The first is that #cultural_sustainability, in the sense that Paris and Alim have developed the term, is a better long-term horizon than decolonization understood only as removal or replacement. The point is not only to take down what has dominated. 
It is to sustain what has been ignored, dismissed, or actively suppressed, and to allow students to enter university as inheritors of knowledge traditions that the university did not create. The second is that the work rewards humility. Universities that approach decolonial reform as another strategic initiative to be led with confidence and delivered on schedule tend to produce washed reforms. Universities that approach it as an ongoing relationship, in which the university is a partner rather than a leader, tend to produce reforms that hold. The pace is slower and the outcomes are less advertisable, but the effects run deeper. That is the shape of the work. References Adebisi, F. I. (2023). Decolonisation and Legal Knowledge: Reflections on Power and Possibility. Bristol University Press. https://doi.org/10.1332/policypress/9781529213652.001.0001 Ahmed, S. (2023). The Feminist Killjoy Handbook. Allen Lane. Alim, H. S., Paris, D., and Wong, C. P. (2024). Culturally sustaining pedagogies: a decade of research and practice. Review of Research in Education, 48(1), 1 to 35. https://doi.org/10.3102/0091732X241234567 Andreotti, V. (2021). Hospicing Modernity: Facing Humanity's Wrongs and the Implications for Social Activism. North Atlantic Books. Battiste, M. (2022). Decolonizing Education: Nourishing the Learning Spirit (2nd ed.). UBC Press. Behari-Leak, K., Ganas, R., Mianda, S., McKenna, S., and Riddell, J. (2022). Decolonising academic staff development in South Africa: a case study of the New Academics' Transitioning into Higher Education programme. Teaching in Higher Education, 27(5), 641 to 657. https://doi.org/10.1080/13562517.2022.2044464 Bhambra, G. K., Bouka, Y., Persaud, R. B., Rutazibwa, O. U., Thakur, V., Bell, D., Smith, K., Haastrup, T., and Adem, S. (2023). Why is mainstream international relations blind to racism? Foreign Policy Analysis, 19(2), 1 to 22. https://doi.org/10.1093/fpa/orad004 Braun, V., and Clarke, V. (2022). 
Thematic Analysis: A Practical Guide. Sage. Carroll, S. R., Herczog, E., Hudson, M., Russell, K., and Stall, S. (2022). Operationalizing the CARE and FAIR principles for Indigenous data futures. Scientific Data, 9, article 108. https://doi.org/10.1038/s41597-022-01218-4 Charles, E. (2022). Decolonizing the curriculum: from vision to practice. Insights: the UKSG Journal, 35, article 22. https://doi.org/10.1629/uksg.598 Cote-Meek, S., and Moeke-Pickering, T. (Eds.). (2023). Decolonizing and Indigenizing Education in Canada (2nd ed.). Canadian Scholars. Curtis, E., Wikaire, E., Jiang, Y., McMillan, L., Loto, R., Poole, P., Barrow, M., Bagg, W., and Reid, P. (2023). A tertiary approach to improving equity in health: quantitative analysis of the Māori and Pacific Admission Scheme (MAPAS) process, 2008 to 2012. International Journal for Equity in Health, 22, article 68. https://doi.org/10.1186/s12939-023-01880-z de Sousa Santos, B. (2021). Decolonising the University: The Challenge of Deep Cognitive Justice. Cambridge Scholars Publishing. Dietz, G. (2022). Intercultural universities in Mexico. In Z. Bekerman and M. Hayden (Eds.), Palgrave Handbook of Intercultural Education (pp. 375 to 391). Palgrave Macmillan. https://doi.org/10.1007/978-3-030-83125-4 Heleta, S., and Chasi, S. (2023). Rethinking and redefining internationalisation of higher education in South Africa using a decolonial lens. Journal of Higher Education Policy and Management, 45(3), 261 to 275. https://doi.org/10.1080/1360080X.2022.2146566 le Grange, L. (2023). Decolonising the university curriculum: the what, why and how. In C. A. Depaepe and F. Simon (Eds.), Curriculum Studies in Southern Africa (pp. 45 to 62). Routledge. Mbembe, A. (2021). Out of the Dark Night: Essays on Decolonization. Columbia University Press. Mika, C., Stewart, G., and Kennedy, N. (2023). Māori philosophy and the university. Educational Philosophy and Theory, 55(5), 519 to 528. 
https://doi.org/10.1080/00131857.2022.2130044 Moosavi, L. (2023). The decolonial bandwagon and the dangers of intellectual decolonisation. International Review of Sociology, 33(2), 332 to 354. https://doi.org/10.1080/03906701.2023.2192561 Nakata, M. (2022). Indigenous knowledge and the cultural interface: underlying issues at the intersection of knowledge and information systems. IFLA Journal, 48(4), 559 to 570. https://doi.org/10.1177/03400352221111927 Ndlovu-Gatsheni, S. J. (2021). The cognitive empire, politics of knowledge and African intellectual productions: reflections on struggles for epistemic freedom and resurgence of decolonisation in the twenty-first century. Third World Quarterly, 42(5), 882 to 901. https://doi.org/10.1080/01436597.2020.1775487 Paris, D. (2021). Culturally sustaining pedagogies and our futures. The Educational Forum, 85(4), 364 to 376. https://doi.org/10.1080/00131725.2021.1957634 Ravitch, S. M., and Carl, N. M. (2021). Qualitative Research: Bridging the Conceptual, Theoretical, and Methodological (2nd ed.). Sage. Rigney, L. I. (2023). Aboriginal higher education and epistemic sovereignty. Education Sciences, 13(4), article 384. https://doi.org/10.3390/educsci13040384 Shahjahan, R. A., Estera, A. L., Surla, K. L., and Edwards, K. T. (2022). Decolonizing curriculum and pedagogy: a comparative review across ten countries and implications for future research. Review of Educational Research, 92(1), 73 to 113. https://doi.org/10.3102/00346543211042423 Sleeter, C. E., and Zavala, M. (2022). Transformative Ethnic Studies in Schools: Curriculum, Pedagogy, and Research. Teachers College Press. Smith, L. T., and Smith, G. H. (2022). Doing Indigenous Work: Decolonizing and Transforming the Academy. In E. McKinley and L. T. Smith (Eds.), Handbook of Indigenous Education (pp. 1075 to 1101). Springer. https://doi.org/10.1007/978-981-15-1839-8 Stein, S. (2022). Unsettling the University: Confronting the Colonial Foundations of US Higher Education. 
Johns Hopkins University Press.

  • Algorithmic Bias in Adaptive Learning Platforms: Implications for Marginalized Learners

    Adaptive learning platforms are now a common part of classrooms, universities, and corporate training programs. These systems promise personalization, efficiency, and better learning outcomes by using #machine_learning models to tailor content to each student. However, a growing body of research shows that these platforms are not neutral tools. They carry the assumptions, values, and blind spots of the people who design them and the data used to train them. This article critically evaluates commercial #adaptive_learning systems and examines how standardized algorithmic pathways, narrow performance metrics, and biased training data can reinforce educational inequalities for #marginalized_learners, including students from low income households, racial and ethnic minorities, students with disabilities, English language learners, and learners in the Global South. Drawing on recent scholarship from 2020 to 2025, the paper synthesizes evidence across three domains: the political economy of educational technology, the technical properties of predictive models, and the classroom level experiences of teachers and students. It argues that #algorithmic_bias in education is not a rare glitch but a structural feature that emerges from historical inequities encoded in data, from the profit motives of vendors, and from the tendency of schools to trust technical outputs more than human judgment. The article closes with a framework for critical evaluation that educators, administrators, and policymakers can use before adopting adaptive systems, and it outlines a research agenda focused on transparency, community participation, and #educational_justice. Keywords: adaptive learning, algorithmic bias, educational technology, equity, marginalized learners, artificial intelligence in education 1. Introduction Over the last decade, #personalized_learning has moved from a niche idea to a global industry. 
Companies such as Knewton, DreamBox, ALEKS, Squirrel AI, Century Tech, and many others sell #adaptive_learning platforms to schools and universities. These platforms use algorithms that adjust the pace, content, and difficulty of lessons based on how each student performs. Vendors often describe this as a fair and efficient way to meet every learner where they are. During and after the COVID 19 pandemic, adoption grew even faster, because schools needed remote and hybrid tools that could scale to millions of users at once (Williamson and Hogan, 2021). Yet the more these systems shape daily learning, the more urgent it becomes to ask a simple question: whose learning do they actually serve? A growing group of scholars in #critical_data_studies, education, and computer science argues that adaptive platforms are not neutral. They inherit the biases of the datasets used to train them, the assumptions of the engineers who build them, and the commercial pressures of the companies that sell them (Baker and Hawn, 2022; Selwyn, 2022). When these systems are used with #marginalized_learners, the consequences can be serious. A model that under predicts a student's ability may route that student into easier content, lower expectations, and slower progress. Over time, small biases compound into large gaps. This article critically evaluates commercial adaptive learning platforms and examines how their design choices affect #educational_equity. It asks three main questions: How does #algorithmic_bias enter adaptive learning platforms at the levels of data, model design, and deployment? What are the specific harms and benefits for marginalized learners, and how do these interact with existing inequalities? What frameworks, policies, and practices can reduce harm while keeping the useful features of personalization? The paper is aimed at students, early career researchers, teachers, and administrators who want a clear and grounded account of the debate. 
It uses plain language but follows the structure of a peer reviewed review article. Section 2 defines key terms. Section 3 explains the research approach. Section 4 traces the political and economic context of the #edtech industry. Section 5 examines the technical mechanisms through which bias enters models. Section 6 presents case evidence across several student populations. Section 7 discusses classroom and institutional effects. Section 8 offers a critical evaluation framework. Section 9 outlines policy and research directions. Section 10 concludes. The overall argument is that #algorithmic_bias in adaptive learning is not a rare accident. It is a predictable outcome of building automated decision systems on top of unequal societies. Recognizing this does not mean rejecting educational technology. It means treating these tools with the same care, oversight, and democratic scrutiny that we apply to other high stakes systems in medicine, finance, and criminal justice (Benjamin, 2023; Noble, 2023). 2. Key Concepts and Definitions Before evaluating adaptive systems, it helps to define several terms clearly. Adaptive learning platforms. These are digital systems that use data about a learner's actions to change the sequence, difficulty, or type of content presented next. They combine content libraries with a #recommendation_engine that decides what a student sees. Simple versions use rule based logic. More advanced versions use machine learning, including #knowledge_tracing models, #item_response_theory, and #deep_learning approaches such as recurrent neural networks and transformer based tutors (Holmes and Porayska-Pomsta, 2022). Personalization. In marketing language, personalization often means fitting a product to an individual. In education, this framing hides important choices. Personalization always involves grouping learners into #latent_categories or profiles, then treating them as members of those groups. 
The categories are chosen by the designers, not by the students (Bulger, 2020). Algorithmic bias. In this article, algorithmic bias refers to systematic patterns in the outputs of an algorithm that unfairly benefit or harm particular groups. Bias may come from unrepresentative data, from choices in model design, from the way outputs are used, or from the interaction between all three (Barocas, Hardt, and Narayanan, 2023). Bias is not always intentional. It is often invisible to the people who build the system. Marginalized learners. This umbrella term includes students who face structural disadvantages in education. It covers racial and ethnic minorities, learners from low income households, students with disabilities, English language learners, girls and women in male dominated subjects, LGBTQ students, refugee and migrant learners, and students in low resource regions of the Global South (Prinsloo, 2020). The list is not exhaustive, and any student can belong to more than one group. #Intersectionality matters, because algorithms often perform worst for those at the intersection of multiple axes of disadvantage. Fairness. There is no single technical definition of fairness. Common statistical definitions include #demographic_parity (equal rates of favorable predictions across groups), equalized odds (equal true positive and false positive rates across groups), and calibration across groups (predicted probabilities that match observed outcomes equally well in each group). These definitions can conflict with each other, meaning a model cannot always satisfy all of them at once (Kizilcec and Lee, 2022). Fairness is ultimately a normative question, not just a mathematical one. Datafication. This refers to the process by which human activity is converted into digital records that can be counted, stored, and analyzed. In education, datafication turns learning into a stream of clicks, timings, and scores that platforms use to train models (Jarke and Breiter, 2020). What is not measured is treated as if it does not exist. With these definitions in place, we can now look at how the paper was written. 3. 
Research Approach This paper is a critical literature synthesis rather than an empirical study. It draws on peer reviewed articles, edited volumes, monographs, and policy reports published mainly between 2020 and 2025. Sources were selected based on three criteria: relevance to adaptive learning and equity, methodological transparency, and diversity of geographic and disciplinary perspectives. The synthesis follows the tradition of #critical_edtech research, which combines close reading of technical systems with attention to their social and political contexts (Selwyn, 2022). Rather than treating adaptive platforms as black boxes to be measured only by test score outcomes, the approach asks what values are built into their design, who benefits, and who bears the costs. Three limitations should be noted at the start. First, most published evidence on commercial adaptive platforms comes from high income countries, especially the United States, the United Kingdom, and parts of East Asia. Coverage of Africa, Latin America, and much of South Asia is thinner. Second, vendors often keep the details of their algorithms confidential, so external researchers must work from partial information. Third, the field is moving fast, and any conclusions must be treated as provisional. Findings from 2021 may already be out of date because platforms update their models frequently. Despite these limits, a consistent pattern appears across the literature: without deliberate intervention, adaptive systems tend to reinforce existing inequalities rather than reduce them. The rest of the article develops this claim in detail. 4. The Political Economy of Adaptive Learning To understand #algorithmic_bias, we must first understand the industry that produces adaptive platforms. Commercial vendors are not neutral technology providers. They are businesses that must generate revenue and satisfy investors. Their design choices reflect these pressures (Williamson, 2021). 
4.1 Market Growth and Concentration The global market for #edtech has grown quickly since 2018, and forecasts place it at several hundred billion United States dollars by the late 2020s (Selwyn, 2022). Adaptive learning is one of the fastest growing segments within this market, along with proctoring, learning analytics, and generative #AI tutors. A relatively small number of large firms dominate the sector, including Pearson, McGraw Hill, Cengage, Chegg, and specialized adaptive companies such as Squirrel AI in China and Century Tech in the United Kingdom. Many smaller vendors sell into specific niches, but the underlying content and model pipelines are often built on top of platforms provided by cloud giants such as Amazon Web Services, Microsoft Azure, and Google Cloud (Komljenovic, 2022). This concentration matters for several reasons. First, it means that a small number of design decisions can affect millions of learners. Second, it means that adaptive systems are often built to be sold across many regions, which pushes vendors toward standardized content and standardized #learner_models rather than locally grounded designs. Third, it creates strong incentives to protect intellectual property, which limits transparency and external audit. 4.2 Business Models and Data Extraction Most commercial adaptive platforms combine a subscription business model with data extraction. Schools or students pay for access, and the platform simultaneously collects large volumes of behavioral data that can be used to improve the product, sell add on services, or, in some cases, be shared with partners (Kwet, 2021). Even where formal data protection laws exist, enforcement is uneven, especially in lower income regions. 
The result is what some scholars call #digital_colonialism: a pattern in which data generated by learners in the Global South is used to train models controlled by firms in the Global North, with limited benefit flowing back to the communities that produced the data (Kwet, 2021; Prinsloo, 2020). Even in high income contexts, students and parents rarely see what data is collected, how models use it, or how long it is retained. 4.3 Marketing Narratives and Real Constraints Vendor marketing typically frames adaptive learning as a solution to the #achievement_gap. Promotional materials often show diverse students smiling in front of tablets, promising that each learner will get a tailored path to success. The underlying claim is that if instruction is personalized enough, structural inequalities can be overcome without changing schools, funding, or curricula (Bulger, 2020). This narrative is attractive to policymakers because it treats inequality as a technical problem rather than a political one. It is also attractive to schools facing teacher shortages, budget cuts, or growing class sizes. But the evidence for large equity gains from commercial adaptive systems is thin. A number of systematic reviews conclude that effects are modest on average, highly variable across contexts, and often smaller for the students who most need support (Escueta et al., 2020; Major, Francis, and Tsapali, 2021). Understanding this gap between promise and evidence is essential background for the technical discussion that follows. The pressures of a competitive market push vendors to overstate benefits and understate risks. Independent, critical research is therefore not a luxury; it is a necessary counterweight. 5. How Bias Enters Adaptive Systems Bias can enter an adaptive learning platform at many stages. It is useful to think of the pipeline in three broad layers: data, model, and deployment. Problems in any layer can create unfair outcomes for #marginalized_learners. 
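One way to make "unfair outcomes" concrete is to compute the group fairness measures named in Section 2 on a platform's routing decisions. The sketch below is illustrative only: the function names, the group labels, and the eight toy records are assumptions invented for this example, not data or code from any vendor or study. Here `y_true` stands for whether a student had actually mastered a skill and `y_pred` for whether the model predicted mastery.

```python
from collections import defaultdict


def group_rates(records):
    """Per-group positive-prediction rate, true positive rate, and false positive rate.

    records: iterable of (group, y_true, y_pred) tuples with binary 0/1 values.
    """
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        c["n"] += 1
        c["pred_pos"] += y_pred
        if y_true == 1:
            c["pos"] += 1
            c["tp"] += y_pred
        else:
            c["neg"] += 1
            c["fp"] += y_pred
    rates = {}
    for group, c in counts.items():
        rates[group] = {
            "positive_rate": c["pred_pos"] / c["n"],
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
        }
    return rates


def fairness_gaps(records, group_a, group_b):
    """Absolute gaps between two groups; 0.0 means parity on that measure."""
    rates = group_rates(records)
    a, b = rates[group_a], rates[group_b]
    return {
        # Demographic parity compares how often each group is predicted "mastered".
        "demographic_parity_gap": abs(a["positive_rate"] - b["positive_rate"]),
        # Equalized odds compares both error-related rates across groups.
        "tpr_gap": abs(a["tpr"] - b["tpr"]),
        "fpr_gap": abs(a["fpr"] - b["fpr"]),
    }


# Hypothetical toy records: (group, actually mastered, predicted mastered).
toy_records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]

gaps = fairness_gaps(toy_records, "A", "B")
print(gaps)
```

In this toy data, half of the group B students who had mastered the skill are predicted not to have mastered it, while no group A student is underpredicted. A real audit would need far more records per group, and, as Section 2 notes, closing one gap can widen another.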
5.1 Data Layer Machine learning models learn patterns from historical data. If the data reflects unequal opportunities, the model will learn those inequalities as if they were natural facts (Barocas, Hardt, and Narayanan, 2023). Historical bias. Training data for adaptive systems typically includes past student scores, response times, and click patterns. These data were produced inside school systems that already treated students unequally. Students who received less support, less encouragement, or fewer resources tend to have lower scores in the data. A model trained on these records may treat lower performance as a stable feature of certain groups rather than as a signal of unequal conditions. Representation bias. Many commercial models are trained on datasets that overrepresent certain populations, such as students from wealthy urban schools in North America and Europe. When such models are deployed to rural learners, English language learners, or students in the Global South, they may fail to fit the actual patterns of learning in those groups (Baker and Hawn, 2022). Recent audits of #natural_language_processing tools used in essay scoring, for example, show that they can penalize dialects, non native writing styles, and culturally specific references (Loukina et al., 2020). Measurement bias. The variables recorded by platforms are proxies for learning, not learning itself. Time on task, click frequency, and quiz scores capture only some aspects of understanding. Students who think carefully before answering, who reread passages, or who consult peers offline may look less engaged in the data than students who click quickly (Jarke and Breiter, 2020). If a model treats fast clicking as a sign of mastery, it may reward superficial engagement and penalize deeper learning. Label bias. Even the outcome labels used to train adaptive models are constructed. A common target is whether a student answers the next item correctly. 
But correctness is only one form of success. Curiosity, persistence, collaboration, and creative problem solving are typically not labeled and therefore not optimized (Holmes and Porayska-Pomsta, 2022). 5.2 Model Layer Once data is collected, choices about how to build the model create further bias. Choice of features. Model designers must decide which variables to feed into the algorithm. Including features such as home postal code, device type, or school district can act as proxies for race and income, embedding social inequalities directly into predictions (Kizilcec and Lee, 2022). Excluding these features does not fully solve the problem, because other correlated variables can still leak the same information. Choice of objective. Models are trained to optimize a specific objective, such as predicted probability of correct response or expected time to mastery. Different objectives lead to different behavior. A model that optimizes short term performance may push struggling students toward easier content, reducing frustration in the short run but limiting long term growth (Doroudi and Brunskill, 2020). Choice of algorithm. Different algorithm families have different strengths and blind spots. Traditional #knowledge_tracing models such as Bayesian knowledge tracing are relatively transparent, but they may miss complex patterns. Modern deep learning models can capture richer signals, but they are harder to interpret and more prone to picking up spurious correlations. As adaptive platforms increasingly integrate #large_language_models, new risks appear, including hallucinated explanations and culturally narrow examples (Kasneci et al., 2023). Feedback loops. Adaptive systems influence the data they later learn from. If a model routes certain students toward easier content, those students will complete easier items, generating data that confirms the initial routing. 
Over time, this can lock groups of learners into low expectation pathways from which it is difficult to escape (Doroudi and Brunskill, 2020). Feedback loops are among the most important and least visible sources of #algorithmic_bias in education.

5.3 Deployment Layer

Even a technically well designed model can produce unequal outcomes depending on how it is used.

Integration with existing tracking. In many school systems, students are already sorted into ability groups or tracks. When an adaptive system is layered on top, its recommendations often reinforce these tracks rather than challenge them (Selwyn, 2022). Teachers may treat platform outputs as objective evidence that a student belongs in a lower group.

Teacher trust and #automation_bias. Research on human interaction with algorithms shows a strong tendency to defer to automated outputs, especially when the underlying logic is opaque (Green and Chen, 2021). In classrooms, this can lead teachers to accept platform recommendations even when their own judgment tells them a student is capable of more.

High stakes assessment. When adaptive platforms are used only for practice, mistakes are less costly. When their outputs feed into grades, placement decisions, or admissions, the same technical bias becomes a life shaping decision. The move from formative to summative use amplifies harm (Perrotta et al., 2021).

Lack of appeal. In many deployments, students and parents have no clear way to question or contest platform decisions. There is no equivalent of a grade appeal for an adaptive routing decision. This absence of #due_process is especially serious when the affected students belong to communities that have historically been marginalized in schools.

Together, these three layers show that #algorithmic_bias is not a single problem to be fixed with one patch. It is a system property that must be addressed at every stage of design, deployment, and governance.

6.
Evidence Across Marginalized Groups This section reviews evidence on how adaptive learning affects several groups of learners. The picture is not uniformly negative. In some cases, well designed platforms have improved access and outcomes. But recurring patterns of harm appear often enough to be treated as structural rather than accidental. 6.1 Low Income Learners Students from low income families often have the least access to high quality teaching, small classes, and enrichment activities. Adaptive learning is frequently promoted as a way to close this gap. Yet several studies show that the benefits of adaptive platforms are often smaller for low income students, and sometimes negative, when compared to well resourced peers (Escueta et al., 2020; Major et al., 2021). Reasons include unreliable internet access, older devices, shared home computers, and less quiet study space. When platforms assume a stable, individual, always on connection, they perform worse for students without those conditions. Adaptive models may then interpret slow completion, dropped sessions, or short study times as low motivation or low ability, when in fact they reflect infrastructure and household context (Reich, 2020). Low income schools also tend to have less capacity for professional development. Teachers may receive minimal training on how to interpret dashboards, override recommendations, or communicate with families about #data_privacy. In such settings, platform outputs can carry more weight than they deserve. 6.2 Racial and Ethnic Minority Learners In many national contexts, racial and ethnic minority students face a long history of biased testing, lowered expectations, and unequal school funding. When adaptive platforms are trained on data produced by such systems, they can reproduce these patterns. 
Audits of essay scoring, plagiarism detection, and behavioral analytics tools have documented systematic differences in outputs across racial groups (Loukina et al., 2020; Baker and Hawn, 2022). #Speech_recognition components in language learning tools have been shown to perform worse for speakers of African American English and for speakers with regional accents outside the mainstream training corpus (Koenecke et al., 2020). These are not isolated bugs. They are the predictable result of training on datasets that treat one dialect as the standard. Recommendation systems can also promote culturally narrow content. When example problems, texts, and characters are drawn mostly from one cultural tradition, students from other backgrounds may find themselves invisible in the curriculum, even when the platform claims to be #culturally_responsive (Nasir et al., 2021). 6.3 Students with Disabilities Adaptive learning holds real promise for students with disabilities. Text to speech, adjustable pacing, and multiple representations can support many learners. Yet current commercial systems often treat #accessibility as an add on rather than a core design principle. Common problems include inaccessible interfaces, quiz formats that assume typical motor control, and models trained without disability status as a considered variable (Cavanagh, Chen, Lahcen, and Paranjape, 2020). Students who use assistive technology may generate response patterns that look unusual to the model, leading to inaccurate assessments of their knowledge. Some platforms flag long response times or unusual navigation as possible cheating, which can disadvantage students who need more time or who use screen readers. There is also a deeper design issue. Adaptive models often assume a single, linear knowledge trajectory. Students with cognitive differences may follow non linear paths that nonetheless lead to strong understanding. 
When the model treats such paths as failures, it can push these learners into remedial loops that do not match their actual needs. 6.4 English Language Learners and Multilingual Learners Language is a major axis of bias in adaptive systems. Many platforms operate primarily in English or in a few dominant languages. Content translation is often mechanical, missing cultural nuance and subject specific vocabulary (Warschauer, Yim, Lee, and Zheng, 2020). For English language learners, adaptive systems may confuse language proficiency with subject knowledge. A student who understands mathematics but struggles with English word problems may be routed into lower mathematics content rather than being given language support alongside grade level mathematics. Over time, this misrouting can widen achievement gaps rather than close them. Multilingual learners bring rich linguistic resources that most adaptive systems ignore. Very few commercial platforms allow #translanguaging, where students draw on multiple languages to make sense of content. Instead, the platform enforces a monolingual model of learning that is out of step with how many students actually think. 6.5 Learners in the Global South Adaptive platforms designed and trained in North America and Europe are increasingly deployed in Africa, Latin America, South Asia, and parts of Southeast Asia. Vendors often present this as a democratizing move that brings world class resources to underserved regions. Critical scholars offer a different reading (Kwet, 2021; Prinsloo, 2020). When curricula, examples, and assessments are imported from elsewhere, local knowledge systems can be sidelined. Students may spend hours engaging with content that has little connection to their communities, environments, or economies. Teachers may find their expertise overridden by dashboards designed thousands of kilometers away. Data flows generated by these learners often benefit foreign firms more than local institutions. 
Infrastructure is also uneven. Platforms that assume broadband internet, personal devices, and reliable electricity often fail in regions where these are shared or intermittent. The technical failures then get read as learner failures. None of this means adaptive tools cannot serve learners in the Global South. It means that context sensitive design, local ownership, and community participation are essential. Otherwise, adaptive learning risks becoming a new form of #digital_colonialism (Kwet, 2021). 6.6 Gender and Intersectional Effects Gender bias in adaptive systems has received less attention than racial bias but is nonetheless important. Content libraries often reflect gender stereotypes in examples, characters, and career suggestions. Recommendation engines may push girls toward certain subject areas and boys toward others in subtle ways (D'Ignazio and Klein, 2020). #Intersectionality is critical. A model may perform reasonably well for majority group boys and majority group girls but fail badly for girls of color, girls with disabilities, or LGBTQ students. Because most reported evaluations focus on single axis comparisons, these compounded harms often remain invisible (Buolamwini and Gebru, 2018; Benjamin, 2023). 6.7 Generative AI Tutors and New Sources of Bias Between 2023 and 2025, a wave of #generative_AI tutors has entered the market, from stand alone chatbots to features embedded inside existing adaptive platforms. These tools use #large_language_models to produce explanations, hints, and feedback in natural language. In principle, they can offer a more human feeling interaction than traditional item based adaptive systems. In practice, they introduce new risks that overlap with, and sometimes exceed, the risks already discussed (Kasneci et al., 2023). First, large language models are trained on internet scale text that reflects the biases of the wider web. 
They can produce responses that carry stereotypes about gender, race, disability, and nationality, sometimes in subtle ways that are difficult to detect in a single interaction but that accumulate over thousands of exchanges. Second, these models can hallucinate, presenting confident but incorrect information. For students who do not yet have the background knowledge to evaluate the response, the effect can be to teach errors as facts. Third, generative tutors often work best in dominant languages and struggle with minority languages, regional variants, and code switching. This restricts their usefulness for many multilingual learners and can further center dominant cultural references (Warschauer et al., 2020). A further concern is that generative tutors are usually delivered through cloud services owned by a small number of very large firms. Their training data, safety filters, and update cycles are opaque to schools. When such a tool becomes part of a required learning pathway, the platform effectively imports a set of value judgments about acceptable speech, valid answers, and appropriate topics into every classroom that uses it. Community #digital_sovereignty becomes hard to maintain under these conditions. None of this means that generative tutors have no place in education. It means that the arguments developed earlier in this article, about #transparency, #accountability, and community participation, apply with even greater force to the newest layer of adaptive tools. 6.8 Positive Cases and Design Possibilities A balanced account must also acknowledge cases where adaptive learning has supported #marginalized_learners. Small studies have documented gains for students with specific learning differences when platforms offered flexible pacing, multimodal content, and clear feedback (Cavanagh et al., 2020). 
Some low bandwidth adaptive tools designed for African and South Asian contexts have shown promising results for foundational literacy and numeracy when combined with strong teacher support (Major et al., 2021). Community driven projects that use open source software and locally developed content offer counter examples to the extractive model criticized earlier. What these positive cases share is not a specific technology but a set of conditions: local relevance of content, teachers with real authority over how the tool is used, meaningful involvement of families, and honest reporting of both gains and limits. These conditions cannot be bought off the shelf. They must be built by communities and institutions over time. 6.9 Summary Across these groups, three patterns recur. First, the students most in need of accurate support are often the ones for whom models perform worst. Second, model errors tend to lower expectations rather than raise them, because underprediction is treated as prudent and overprediction is treated as risky. Third, harms compound over time as biased data feeds back into future models. Together, these patterns explain why concerns about #educational_justice and #algorithmic_bias are not marginal but central to the future of adaptive learning. 7. Classroom, Institutional, and Systemic Effects The consequences of algorithmic bias extend beyond individual students. They reshape classrooms, institutions, and education systems as a whole. 7.1 Effects on Teachers Adaptive platforms change the daily work of teachers. In some cases they free time for one on one interaction. In others, they turn teachers into monitors of dashboards, expected to follow platform generated recommendations even when their professional judgment differs (Perrotta et al., 2021; Selwyn, 2022). 
Teachers who work in under resourced schools often report feeling pressured to trust platform outputs, because they lack time to check them and because administrators use dashboard data in evaluations. This can erode #pedagogical_autonomy and reduce the space for context sensitive teaching. When platform outputs are biased, teachers who follow them without question can end up amplifying that bias. At the same time, well supported teachers can use adaptive systems critically. They can spot patterns that do not match their knowledge of a student, request explanations, and combine platform data with other evidence. This kind of critical use requires training, time, and institutional backing. 7.2 Effects on Curriculum Because adaptive platforms rely on structured content libraries, they tend to favor subjects and skills that can be broken into small, testable units. Mathematics, basic literacy, foreign language vocabulary, and coding are relatively easy to fit into this format. Complex writing, historical reasoning, ethical analysis, and creative work are harder (Bulger, 2020; Holmes and Porayska-Pomsta, 2022). When adaptive systems take a larger share of instructional time, the parts of the curriculum they cannot handle risk being squeezed out. This narrowing of curriculum is not evenly distributed. Well funded schools often protect broad curricula through project based learning, arts, and discussion. Under resourced schools are more likely to rely heavily on adaptive drills, especially in subjects considered core. Over time, this can widen curricular inequality even as content coverage looks similar on paper. 7.3 Effects on Assessment Adaptive platforms blur the line between practice and assessment. Every interaction is recorded and can be used to build a profile of the learner. In principle, this offers rich formative feedback. 
In practice, it raises concerns about #surveillance and about the fairness of decisions based on constant monitoring (Perrotta et al., 2021). When platform outputs feed into grades, tracking, or admissions, the students who spend more time on the system generate more evidence, which can be either favorable or unfavorable. Students who use the system less, perhaps because of home circumstances, may appear to know less than they do. Assessment quietly shifts from measuring learning to measuring engagement with a specific product. 7.4 Effects on Institutions At the institutional level, dependence on adaptive platforms can reshape budgets, staffing, and governance. Some universities have used adaptive tools to justify larger class sizes, reduced tutoring hours, or shifts toward part time teaching staff (Williamson and Hogan, 2021). Data sharing agreements may commit institutions to years of vendor lock in, making it hard to switch systems or negotiate better terms. Public schools face similar pressures. Contracts with vendors often include clauses about data ownership, service levels, and access to raw data that are not well understood by school leaders. Once systems are deployed, teachers, students, and families adapt their routines to them, which further increases switching costs. 7.5 Systemic Effects At the level of the education system, widespread adoption of adaptive learning shifts power. It moves decisions about pedagogy, content, and assessment away from teachers, unions, and local communities and toward technology firms and their engineers (Selwyn, 2022; Williamson, 2021). It creates new incentives for schools to generate certain kinds of data, and it makes some kinds of learning invisible. Systemic effects also include long term risks to civic life. Students who learn primarily through platforms may internalize the assumption that learning is a matter of matching correct responses to prompts. 
That assumption fits poorly with the messy, collaborative, and often disagreement filled reality of #democratic_participation (Reich, 2020).

Understanding these wider effects is important because it changes how we think about reform. Fixing algorithmic bias in one product is not enough. Broader institutional and policy change is required.

8. A Framework for Critical Evaluation

Given these risks, how should educators, administrators, and policymakers decide whether and how to use adaptive learning platforms? This section offers a practical framework organized around seven areas. It draws on principles from #critical_edtech research, #responsible_AI guidance, and #educational_justice traditions.

8.1 Purpose and Fit

Before adopting any adaptive platform, users should be able to state clearly what problem the platform is meant to solve and which other approaches have been considered. Key questions include:

• What learning goals are we trying to support?
• Are those goals suited to automated adaptation, or do they need human interaction?
• What research evidence exists for effects on our student population, not just averages?

If the answers are vague or if the main driver is marketing rather than pedagogy, that is a warning sign.

8.2 Data and Model Transparency

Vendors should be able to describe, in plain language:

• What data is collected, how long it is stored, and who has access.
• What variables the model uses to make predictions.
• What performance the model achieves overall and for identifiable subgroups.
• How the platform behaves when data is missing or unusual.

Where vendors refuse to answer, buyers should treat this as a serious limitation. Concerns about intellectual property are real, but they cannot fully override the right of learners and communities to understand systems that shape their lives (Kizilcec and Lee, 2022).

8.3 Equity and Impact Assessment

An #equity_audit should be part of any procurement decision.
Useful elements include:

• Independent evaluations of the platform with populations similar to the intended users.
• Disaggregated performance metrics by race, income, disability, language, and gender.
• Attention to intersectional effects rather than only single axis comparisons.
• Clear plans to monitor outcomes after deployment, with defined thresholds for corrective action.

Where evidence is thin, small pilots with strong evaluation should precede full adoption.

8.4 Teacher and Learner Agency

Adaptive tools should support, not replace, professional judgment. Evaluation should ask:

• Can teachers see the reasoning behind recommendations?
• Can teachers override the system without penalty?
• Can students and families understand what the platform is doing and challenge decisions?
• Are there ways for learners to signal that a recommendation does not fit them?

Systems that treat teachers as data entry clerks and students as passive users should be avoided.

8.5 Governance and Accountability

Governance structures matter as much as technical features. Important questions include:

• Who owns the data generated by learners?
• What happens to the data if the contract ends?
• What independent oversight mechanisms exist?
• Are there clear paths for complaints and remedies?

Contracts should not lock institutions into arrangements that they cannot revisit as evidence and needs change.

8.6 Environmental and Resource Costs

A newer but increasingly important area is the environmental cost of adaptive #AI. Training and running large models consumes significant electricity and water, and manufacturing the devices used to access them requires rare minerals extracted in ways that often harm frontline communities. Educational institutions that adopt these tools at scale have an obligation to ask about #sustainability, including energy sources, hardware lifecycles, and the appropriateness of the model size relative to the pedagogical task.
In many cases, smaller and simpler tools may serve learners just as well while imposing far lower costs on the planet and on communities that supply the materials.

8.7 Community Participation

Finally, the framework insists on community participation. This is grounded in traditions of #participatory_design and community based research (Costanza-Chock, 2020). Communities that will be affected by an adaptive platform should have a real voice in whether and how it is used. This is especially important for #marginalized_learners whose experiences are least well represented in vendor design processes. Practical steps include advisory boards with parents and students, public reporting of outcomes, and space for teachers to share observations back to vendors and administrators.

Together, these seven areas offer a structured way to think about adaptive learning that goes beyond glossy demos. They are not a checklist to be completed once but a set of ongoing commitments.

9. Policy and Research Directions

Individual schools and institutions cannot solve these problems alone. Broader policy and research efforts are needed.

9.1 Regulation and Standards

Several jurisdictions are moving toward regulating #AI systems, including those used in education. The European Union #AI Act treats certain educational uses as high risk, requiring specific transparency, risk management, and human oversight measures (European Parliament and Council, 2024). Other regions are considering similar frameworks. Education specific regulation should build on general #AI rules but also address the particular power dynamics between children, families, schools, and vendors. Key policy priorities include:

• Mandatory impact assessments before adoption in public education.
• Standards for data minimization, retention, and portability.
• Rules against using children's data for advertising or unrelated commercial purposes.
• Clear liability rules when biased outputs cause harm.
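One ingredient of an impact assessment, the disaggregated and intersectional performance metrics called for in section 8.3, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a standard audit procedure: the record fields (language, income, predicted, actual), the sample data, the function names, and the 0.1 gap threshold are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical audit records. Each row pairs a platform prediction
# ("will this student answer correctly?") with the observed outcome.
RECORDS = [
    {"language": "en", "income": "high", "predicted": True, "actual": True},
    {"language": "en", "income": "high", "predicted": True, "actual": True},
    {"language": "en", "income": "high", "predicted": False, "actual": False},
    {"language": "en", "income": "low", "predicted": True, "actual": True},
    {"language": "en", "income": "low", "predicted": False, "actual": True},
    {"language": "ell", "income": "low", "predicted": False, "actual": True},
    {"language": "ell", "income": "low", "predicted": False, "actual": True},
    {"language": "ell", "income": "low", "predicted": True, "actual": True},
    {"language": "ell", "income": "high", "predicted": True, "actual": True},
    {"language": "ell", "income": "high", "predicted": False, "actual": False},
]


def accuracy_by_group(records, group_keys):
    """Share of correct predictions for each combination of group_keys."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        group = tuple(record[key] for key in group_keys)
        totals[group] += 1
        correct[group] += int(record["predicted"] == record["actual"])
    return {group: correct[group] / totals[group] for group in totals}


def underprediction_rate_by_group(records, group_keys):
    """Among students who succeeded, the share the model expected to fail."""
    succeeded = defaultdict(int)
    underpredicted = defaultdict(int)
    for record in records:
        if not record["actual"]:
            continue  # only students who actually succeeded count here
        group = tuple(record[key] for key in group_keys)
        succeeded[group] += 1
        underpredicted[group] += int(not record["predicted"])
    return {group: underpredicted[group] / succeeded[group] for group in succeeded}


def groups_below_threshold(accuracy, max_gap):
    """Groups whose accuracy trails the best served group by more than max_gap."""
    best = max(accuracy.values())
    return sorted(group for group, value in accuracy.items() if best - value > max_gap)


if __name__ == "__main__":
    single_axis = accuracy_by_group(RECORDS, ("language",))
    intersectional = accuracy_by_group(RECORDS, ("language", "income"))
    print("By language:", single_axis)
    print("By language and income:", intersectional)
    print("Needs review:", groups_below_threshold(intersectional, max_gap=0.1))
```

In this toy data the single axis comparison by language shows a gap, but the intersectional view shows that the errors are concentrated among low income learners in both language groups. That is the kind of compounded effect that single axis reporting can hide. A real audit would need far larger samples, agreed definitions of each group, and thresholds set with the affected community rather than picked by the analyst.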
Standards bodies can also help by developing shared benchmarks for fairness, robustness, and explanation quality in educational #AI (Holmes and Porayska-Pomsta, 2022).

9.2 Independent Auditing

Independent audits of educational algorithms are still rare. Building this capacity requires investment in:

• Trained auditors who understand both machine learning and education.
• Legal protections for researchers who examine commercial systems.
• Access agreements that allow meaningful testing without violating privacy.
• Public reporting of audit results.

Auditing should not be a one time event. As models are retrained, audits must be repeated (Raji et al., 2020).

9.3 Public Alternatives

Reliance on a small number of commercial vendors is itself a source of risk. Public and nonprofit alternatives, including open source adaptive tools and public data infrastructures, can create more diversity in the ecosystem (Kwet, 2021). Regional and national investment in such alternatives can also support local languages, curricula, and pedagogical traditions.

9.4 Research Agenda

Priority research areas include:

• Long term studies of adaptive learning effects on marginalized learners, including tracking after students leave the platform.
• Studies of teacher and student experience that go beyond satisfaction surveys and include ethnographic and participatory methods (Costanza-Chock, 2020).
• Comparative work across countries and regions to understand how the same platforms behave differently in different contexts.
• Development of #fairness_metrics that reflect educational values rather than only statistical parity.
• Investigation of the environmental costs of large scale #AI in education, including energy and hardware use.

Cross disciplinary teams that include educators, computer scientists, social scientists, ethicists, and community members are essential.

9.5 Toward Open and Localized Infrastructure

A further direction concerns open and localized infrastructure.
Global adaptive systems built by a small number of firms are unlikely to serve the full range of learners and contexts. Investment in open standards for content, learner records, and model interfaces would allow schools and universities to mix and match components rather than depend on a single vendor. Regional consortia, especially in low income settings, could pool resources to develop adaptive tools rooted in local languages, curricula, and pedagogical traditions (Kwet, 2021; Prinsloo, 2020). Universities can play an important role by hosting shared infrastructure, training local engineers, and building communities of practice that combine technical and pedagogical expertise. Without such alternatives, the current commercial pattern is likely to continue, and the risks discussed throughout this article will continue with it. 9.6 Preparing Educators and Learners Finally, both educators and learners need preparation to engage critically with adaptive systems. Teacher education programs should include #data_literacy, algorithm literacy, and reflection on the political dimensions of technology (Selwyn, 2022). Curricula for students, from primary school through university, should include age appropriate content on how #AI works, how it can be biased, and how to advocate for their own rights. Building this capacity is not just about avoiding harm. It is about preparing citizens who can shape the digital future rather than passively accept it. 10. Discussion The evidence reviewed in this article supports three broad conclusions. First, #algorithmic_bias in adaptive learning platforms is real, structured, and often invisible to users. It is not a matter of a few faulty products but a general property of building automated decision systems on top of unequal societies (Barocas, Hardt, and Narayanan, 2023; Benjamin, 2023). Fixing surface issues without addressing this structural reality will produce only cosmetic improvements. 
Second, the students most likely to be harmed are those already at the margins of education: low income learners, racial and ethnic minorities, students with disabilities, English language learners, and learners in the Global South. These are also the students for whom vendors most often claim their products will make the biggest difference. The gap between marketing and evidence is therefore not a neutral communication issue. It has direct implications for #educational_justice. Third, technical fixes alone are not enough. Better data, better models, and better metrics matter, but they must be combined with governance, participation, and policy change. Otherwise, technical improvements can be absorbed into the same commercial and institutional dynamics that produced the problem in the first place (Selwyn, 2022; Williamson, 2021). At the same time, the article does not argue for rejecting adaptive learning altogether. Well designed, well governed adaptive tools can genuinely support learners. They can free teacher time for high value interaction, provide targeted practice, and offer feedback in ways that scale better than any single human could. The question is not whether to use them but how to use them, on whose terms, and with what safeguards. Several tensions remain unresolved and deserve continued attention. One is the tension between personalization and standardization. To personalize, systems must sort learners into groups. To be fair, they must treat learners as individuals. Balancing these demands is a moral and design challenge with no clean solution (Bulger, 2020). Another tension is between openness and privacy. Transparency about models often requires access to data that raises privacy concerns, especially for children. New approaches, including privacy preserving evaluation methods and community controlled data trusts, may help but are still emerging (Jarke and Breiter, 2020). A third tension is between local and global. 
Adaptive systems benefit from large datasets that span many contexts. But they harm learners when they impose distant norms on local settings. Federated learning, local model tuning, and regional data governance are technical and political responses that need much more research (Kwet, 2021; Prinsloo, 2020).

None of these tensions can be resolved by algorithms alone. They require deliberate, democratic decisions about what education is for and who gets to shape it.

10.1 Situating the Argument in Broader Debates

It is worth situating this argument within broader debates about #technology_and_society. Similar concerns have been raised about predictive policing, welfare eligibility algorithms, hiring tools, and health risk scores. In each case, systems that were marketed as neutral and efficient have been shown to reproduce or intensify existing inequalities when deployed without care. Education is not exceptional. What is distinctive is that education involves children, that participation is often mandatory, and that the effects on students unfold over years. These features raise, rather than lower, the bar for evidence and accountability.

The experience of other sectors also offers useful lessons. Independent audits, mandatory disclosure of model behavior, and community oversight boards have started to reshape practice in health and finance. Education can borrow from these examples while adapting them to its own context, in which teachers, parents, and students are the key stakeholders and the harms of a bad decision are cumulative rather than immediate. The point is not that any single mechanism will fix #algorithmic_bias, but that a combination of technical, institutional, and democratic mechanisms is needed.

11. Conclusion

Adaptive learning platforms are a powerful set of tools that are reshaping how students learn around the world. This article has argued that these tools carry serious risks of #algorithmic_bias, especially for #marginalized_learners.
The risks arise from the data used to train models, from the design choices made by engineers, from the commercial pressures of the industry, and from the ways in which schools and teachers use platform outputs in daily practice.

The evidence gathered here suggests that #educational_equity cannot be achieved by technology alone. It requires attention to the political economy of #edtech, to the technical properties of models, and to the classroom conditions in which platforms are used. It requires strong governance, meaningful community participation, and independent research. It requires educators and learners who understand how these systems work and who feel entitled to question them.

For students reading this article, the practical implication is not that adaptive tools should be avoided. It is that they should be approached with informed care. The next generation of teachers, researchers, engineers, and policymakers will play a large role in deciding whether adaptive learning becomes a tool for #educational_justice or another mechanism for reproducing #inequality. That decision is not fixed by the technology itself. It will be shaped by the choices we make together.

The central task, therefore, is not to celebrate or reject adaptive learning but to build the critical capacity to evaluate it in specific contexts. That means keeping #marginalized_learners at the center of the analysis, treating vendor claims with healthy skepticism, and insisting that education remain, above all, a human and democratic project.

Hashtags

#adaptive_learning #algorithmic_bias #educational_equity #edtech_critique #personalized_learning #marginalized_learners #AI_in_education #data_justice #critical_pedagogy #digital_divide #learning_analytics #student_privacy #fairness_in_AI #inclusive_education #ethical_edtech

References

Baker, R. S., and Hawn, A. (2022). Algorithmic bias in education. International Journal of Artificial Intelligence in Education, 32(4), 1052 to 1092. https://doi.org/10.1007/s40593-021-00285-9
Barocas, S., Hardt, M., and Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities. MIT Press, Cambridge, Massachusetts.
Benjamin, R. (2023). Viral Justice: How We Grow the World We Want. Princeton University Press, Princeton, New Jersey.
Bulger, M. (2020). The promises, challenges, and futures of media literacy and personalized learning. Data and Society Research Institute, New York.
Buolamwini, J., and Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 77 to 91.
Cavanagh, T., Chen, B., Lahcen, R. A. M., and Paranjape, J. (2020). Constructing a design framework and pedagogical approach for adaptive learning in higher education. International Review of Research in Open and Distributed Learning, 21(1), 173 to 197. https://doi.org/10.19173/irrodl.v21i1.4557
Costanza-Chock, S. (2020). Design Justice: Community-Led Practices to Build the Worlds We Need. MIT Press, Cambridge, Massachusetts.
D'Ignazio, C., and Klein, L. F. (2020). Data Feminism. MIT Press, Cambridge, Massachusetts.
Doroudi, S., and Brunskill, E. (2020). Fairer but not fair enough: On the equitability of knowledge tracing. In Proceedings of the 9th International Conference on Learning Analytics and Knowledge, 335 to 339. Association for Computing Machinery, New York.
Escueta, M., Nickow, A. J., Oreopoulos, P., and Quan, V. (2020). Upgrading education with technology: Insights from experimental research. Journal of Economic Literature, 58(4), 897 to 996. https://doi.org/10.1257/jel.20191507
European Parliament and Council (2024). Regulation on Artificial Intelligence (Artificial Intelligence Act). Official Journal of the European Union, Brussels.
Green, B., and Chen, Y. (2021). Algorithmic risk assessments and human discretion in criminal justice and education. Big Data and Society, 8(1), 1 to 16. https://doi.org/10.1177/20539517211023554
Holmes, W., and Porayska-Pomsta, K. (Eds.) (2022). The Ethics of Artificial Intelligence in Education: Practices, Challenges, and Debates. Routledge, London.
Jarke, J., and Breiter, A. (2020). Editorial: The datafication of education. Learning, Media and Technology, 44(1), 1 to 6. https://doi.org/10.1080/17439884.2019.1573833
Kasneci, E., Sessler, K., Kuchemann, S., Bannert, M., Dementieva, D., Fischer, F., et al. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274. https://doi.org/10.1016/j.lindif.2023.102274
Kizilcec, R. F., and Lee, H. (2022). Algorithmic fairness in education. In W. Holmes and K. Porayska-Pomsta (Eds.), The Ethics of Artificial Intelligence in Education, 174 to 202. Routledge, London.
Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., et al. (2020). Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14), 7684 to 7689. https://doi.org/10.1073/pnas.1915768117
Komljenovic, J. (2022). The future of value in digitalised higher education: Why data privacy should not be our biggest concern. Higher Education, 83(1), 119 to 135. https://doi.org/10.1007/s10734-020-00639-7
Kwet, M. (2021). Digital colonialism: The evolution of American empire. Roar Magazine, published by ROAR Collective, Amsterdam.
Loukina, A., Madnani, N., and Zechner, K. (2020). The many dimensions of algorithmic fairness in educational applications. In Proceedings of the 15th Workshop on Innovative Use of Natural Language Processing for Building Educational Applications, 1 to 10. Association for Computational Linguistics.
Major, L., Francis, G. A., and Tsapali, M. (2021). The effectiveness of technology-supported personalised learning in low- and middle-income countries: A meta-analysis. British Journal of Educational Technology, 52(5), 1935 to 1964. https://doi.org/10.1111/bjet.13116
Nasir, N. S., Lee, C. D., Pea, R., and McKinney de Royston, M. (2021). Handbook of the Cultural Foundations of Learning. Routledge, New York.
Noble, S. U. (2023). Algorithms of Oppression revisited: New directions in critical technology studies. Journal of Communication, 73(3), 205 to 216. https://doi.org/10.1093/joc/jqad010
Perrotta, C., Gulson, K. N., Williamson, B., and Witzenberger, K. (2021). Automation, APIs and the distributed labour of platform pedagogies in Google Classroom. Critical Studies in Education, 62(1), 97 to 113. https://doi.org/10.1080/17508487.2020.1855597
Prinsloo, P. (2020). Data frontiers and frontiers of power in and through higher education. Teaching in Higher Education, 25(4), 366 to 383. https://doi.org/10.1080/13562517.2020.1723537
Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., Smith-Loud, J., Theron, D., and Barnes, P. (2020). Closing the AI accountability gap: Defining an end-to-end framework for internal algorithmic auditing. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 33 to 44. Association for Computing Machinery, New York. https://doi.org/10.1145/3351095.3372873
Reich, J. (2020). Failure to Disrupt: Why Technology Alone Cannot Transform Education. Harvard University Press, Cambridge, Massachusetts.
Selwyn, N. (2022). Education and Technology: Key Issues and Debates. Third Edition. Bloomsbury Academic, London.
Warschauer, M., Yim, S., Lee, H., and Zheng, B. (2020). Recent contributions of data mining to language learning research. Annual Review of Applied Linguistics, 39, 93 to 112. https://doi.org/10.1017/S0267190519000023
Williamson, B. (2021). Making markets through digital platforms: Pearson, edu-business, and the (e)valuation of higher education. Critical Studies in Education, 62(1), 50 to 66. https://doi.org/10.1080/17508487.2020.1737556
Williamson, B., and Hogan, A. (2021). Pandemic Privatisation in Higher Education: Edtech and University Reform. Education International, Brussels.
