
Long‑Term Psychological Effects of AI Tutoring: Moderated by Design and Context

1. Introduction

This systematic review seeks to illuminate the long‑term psychological consequences of AI‑tutoring systems relative to conventional human instruction. The overarching research question guiding the inquiry is: what are the longitudinal effects on cognitive, affective, motivational, metacognitive, and relational outcomes of learners who receive sustained AI tutoring compared with those taught by human teachers? To address this question, the review pursues four principal objectives: (1) to identify and evaluate longitudinal empirical studies that compare AI tutoring with human instruction on the aforementioned psychological domains; (2) to catalog the validated psychometric instruments employed in these studies and to report effect sizes where available; (3) to examine how AI design features—such as adaptive feedback, affective monitoring, and explainability—moderate psychological trajectories; and (4) to delineate gaps that inform future longitudinal mixed‑methods research.

The systematic review methodology is organized across four subsections. Section 1.1 presents the research question and delineates the objectives that frame the investigation. Section 1.2 describes the comprehensive search strategy, including the databases, Boolean logic, and temporal window employed to capture relevant empirical evidence. Section 1.3 outlines the inclusion and exclusion criteria that were applied to filter studies based on design, population, intervention type, outcome measures, language, and publication status. Finally, Section 1.4 details the screening process, quality assessment procedures, data extraction protocols, and the planned synthesis approach that will be used to integrate findings across studies.

Having outlined the research question and the overall structure of the methodology, the following subsections elaborate each of these components in turn, beginning with the research question and objectives.

1.1. Research Question and Objectives

The overarching aim of this review is to elucidate the long‑term psychological consequences of early exposure to AI‑tutoring systems relative to traditional human instruction. Specifically, we pose the following primary research question: What are the longitudinal effects on cognitive, affective, motivational, metacognitive, and relational outcomes of children and adolescents who receive sustained AI‑tutoring compared to those taught by human teachers? To address this question, the review will operationalize a set of measurable outcomes drawn from well‑established psychometric constructs. Cognitive outcomes include domain‑specific achievement and critical‑thinking proficiency, building on evidence that AI can both enhance and undermine these skills when over‑reliance occurs [13][78]. Affective outcomes encompass self‑efficacy, belonging, and loneliness, constructs for which cross‑sectional data suggest AI may erode social connectedness but longitudinal evidence is lacking [40][61]. Motivational outcomes will be assessed via intrinsic motivation and perceived learner control, for which short‑term gains have been reported yet long‑term trajectories remain unexplored [3][41]. Metacognitive outcomes—such as self‑regulation, judgment of learning, and strategic planning—will be measured against validated instruments, given the potential for AI to scaffold or deskill these skills over time [67][70]. Finally, relational outcomes will capture teacher‑student relationship quality and interpersonal trust, dimensions that have been shown to mediate academic success and may be differentially supported by AI versus human tutors [25][44].

The review will explicitly integrate contextual moderators identified in the literature, including developmental stage, socioeconomic status, digital access, cultural responsiveness, and AI design features such as adaptive hints, gamified reinforcement, and explainability mechanisms. Prior systematic reviews and meta‑analyses highlight the absence of longitudinal, comparative data that simultaneously account for these moderators and employ validated psychological scales [2][17][51]. Consequently, the objectives of the present study are: (1) to conduct a comprehensive systematic review of longitudinal studies comparing AI tutors with human teachers on the aforementioned outcomes; (2) to map the prevalence of validated measurement instruments and report effect sizes where available; (3) to synthesize evidence on how AI design features moderate psychological trajectories; and (4) to identify gaps that inform future longitudinal mixed‑methods research.

Having established the research question and operational objectives, the subsequent section will detail the search strategy and databases employed to capture the relevant empirical literature.

1.2. Search Strategy and Databases

The systematic search was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta‑Analyses (PRISMA) framework, ensuring transparent and reproducible retrieval of empirical studies on AI tutoring and human instruction [36]. The search strategy combined a comprehensive list of databases, a structured Boolean logic, and a defined temporal window to capture the breadth of relevant literature.

Databases.
The search incorporated the following peer‑reviewed academic databases, selected for their coverage of psychology, education, information systems, and technology journals: PsycINFO, ERIC, Scopus, Web of Science, the ACM Digital Library, the AIS Electronic Library, Business Source Premier, the IEEE Computer Society Digital Library, JSTOR, and ScienceDirect. These databases were chosen to maximize interdisciplinary retrieval (see 301103be-... for the explicit list of seven core databases, 70ce796a-... for Web of Knowledge and AIS, and 9f215ad9-... for Business Source Premier, Scopus, PsycINFO, and AIS). In addition, manual searches of leading conferences and journals not indexed in the primary databases were performed to ensure comprehensive coverage.

Boolean logic and search terms.
The search string was constructed by combining primary concepts—“AI tutoring”, “human instruction”, and “long‑term psychological outcomes”—with synonyms and related terms using Boolean operators. Primary keywords were linked with “AND” to ensure relevance across constructs, while synonyms were combined with “OR” to broaden retrieval (e.g., “human‑AI interaction” OR “human‑AI hybrids”; “ethics” OR “AI governance”) [301103be-...]. Clustered keyword sets were further combined using “AND” across thematic axes (AI, higher education, learning outcomes) to capture studies that addressed all dimensions of interest [27]. The same Boolean structure was applied consistently across all databases.
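
For illustration, a search string of the kind described might take the following form; the exact terms, truncation symbols, and field tags were adapted to each database's syntax, so the string below is representative rather than the verbatim query used:

```text
("AI tutoring" OR "intelligent tutoring system" OR "adaptive learning platform" OR "chatbot tutor")
AND ("human instruction" OR "human teacher" OR "teacher-led instruction")
AND ("psychological outcome*" OR motivation OR "self-efficacy" OR metacognit* OR belonging OR loneliness)
AND (longitudinal OR "long-term" OR "follow-up")
```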

Time frame.
The search window was limited to studies published between January 2014 and April 2024, aligning with the review’s overall temporal scope and ensuring inclusion of contemporary empirical work on AI tutoring practices and their psychological impacts [27]. While some source reviews (e.g., 301103be-...) did not specify an explicit cutoff, the present search deliberately constrained the period to this decade to focus on the most recent longitudinal evidence.

Search execution and updates.
The initial database search was executed in June 2022, with an update in January 2023 to capture newly published studies. Duplicate records were removed, and a PRISMA flow diagram was constructed to document the screening and selection process (as described in 301103be-... and 69c26f57).

Having outlined the systematic search strategy, the next section will detail the inclusion and exclusion criteria applied to the retrieved records.

1.3. Inclusion and Exclusion Criteria

Inclusion criteria
The review considered peer‑reviewed journal articles that met the following specifications: (1) the study design was longitudinal with a minimum follow‑up of six months to capture sustained psychological trajectories; (2) participants were children or adolescents (ages 5–18) receiving either AI‑tutoring interventions or conventional human instruction; (3) the intervention explicitly involved an AI tutoring system (e.g., intelligent tutoring system, adaptive learning platform, chatbot tutor) or a comparable human‑led instructional setting; (4) the study reported at least one validated psychological outcome (e.g., motivation, self‑efficacy, affective adjustment, metacognitive skill, relational quality); (5) the article was published in English between January 2014 and April 2024; and (6) it appeared in a peer‑reviewed academic journal. These parameters were adapted from the inclusion framework of the higher‑education AI review [11] and the AI‑learning tools review [27], which both required peer‑review status and an English language filter.

Exclusion criteria
Studies were excluded if they: (1) were conference proceedings, book chapters, dissertations, or other grey literature; (2) focused exclusively on higher‑education contexts or adult learners; (3) lacked a longitudinal component or had follow‑up shorter than six months; (4) omitted any measurement of psychological constructs and reported only academic or technical outcomes; (5) were not indexed in a recognized scholarly database (e.g., PsycINFO, ERIC, Scopus, Web of Science) and therefore could not be retrieved through the systematic search; or (6) were published in a language other than English. These exclusions mirror the criteria used in the systematic reviews of AI in education [11] and the broader AI‑learning tools review [27] and are consistent with the PRISMA‑aligned process described in the initial search [36].

Having established these inclusion and exclusion parameters, the next section will detail the screening, quality assessment, and data extraction procedures.

1.4. Screening, Quality Assessment, and Data Extraction

The screening and selection of studies proceeded in two distinct phases. First, titles and abstracts were independently reviewed by two investigators against the predefined inclusion and exclusion criteria; disagreements were resolved through discussion or, if necessary, by consulting a third reviewer. Second, full‑text articles that passed the initial screen were retrieved and examined in detail to confirm eligibility. Inter‑rater reliability was quantified using Cohen’s kappa, with values of 0.80 or higher considered acceptable agreement.
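
As a minimal sketch of how the agreement statistic can be computed—assuming screening decisions are coded as binary include/exclude vectors; the two reviewer vectors below are hypothetical:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical title/abstract screening decisions (1 = include, 0 = exclude)
reviewer_a = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
reviewer_b = [1, 0, 1, 1, 1, 0, 1, 0, 0, 1]

kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa = {kappa:.2f}")  # values of 0.80 or higher treated as acceptable agreement
```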

Quality appraisal was conducted using established tools appropriate to study design. Randomized controlled trials were evaluated with the Cochrane Risk‑of‑Bias 2.0 instrument, assessing domains such as randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result. For mixed‑methods studies, the Mixed Methods Appraisal Tool (MMAT) was employed to appraise quantitative, qualitative, and integration components. Qualitative investigations were appraised with the Critical Appraisal Skills Programme (CASP) checklist, focusing on clarity of research aims, methodological rigour, and reflexivity. Each study received an overall quality rating (high, medium, low) based on the cumulative assessment across domains.

Data extraction followed a standardized form that captured key variables: study design, sample size and characteristics, intervention details (AI tutor type, duration, and delivery mode), comparator (human instruction), outcome measures (including validated psychometric instruments for cognitive, affective, motivational, metacognitive, and relational domains), effect size estimates (Cohen’s d, odds ratios, or standardized mean differences), and reported risk‑of‑bias judgments. When effect sizes were not reported, raw data were extracted to compute them, ensuring consistency across studies.
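
Where only group means and standard deviations were reported, Cohen's d was derived from those summaries using the pooled standard deviation; a minimal sketch of that calculation (the function name and numbers are illustrative, not drawn from any included study):

```python
import math

def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardized mean difference between two groups using the pooled SD."""
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2))
    return (mean_1 - mean_2) / pooled_sd

# Hypothetical post-test summaries for an AI-tutored and a human-instructed group
print(round(cohens_d(78.4, 10.2, 52, 73.1, 11.0, 49), 2))
```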

The planned synthesis strategy combined quantitative and qualitative approaches. Where sufficient homogeneity existed in outcome measures and effect sizes, a random‑effects meta‑analysis was conducted to estimate pooled effects and heterogeneity (I² statistic). In cases of substantial heterogeneity or diverse measurement instruments, a narrative synthesis was undertaken, structured around the five psychological outcome domains and the moderating factors identified in the literature. This dual approach allows for a comprehensive assessment of both the magnitude of AI tutoring effects and the contextual nuances that shape these outcomes.
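
A minimal sketch of the pooling step under the DerSimonian–Laird random‑effects estimator, with the I² statistic derived from Cochran's Q; the effect sizes and sampling variances below are placeholders rather than extracted values:

```python
import numpy as np

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling with I^2 heterogeneity."""
    y, v = np.asarray(effects, float), np.asarray(variances, float)
    w = 1.0 / v                                    # inverse-variance (fixed-effect) weights
    y_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fixed) ** 2)             # Cochran's Q
    df = len(y) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    w_re = 1.0 / (v + tau2)                        # random-effects weights
    pooled = np.sum(w_re * y) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, i2

pooled, se, i2 = random_effects_pool([0.25, 0.60, 0.45], [0.04, 0.09, 0.02])
print(f"pooled effect = {pooled:.2f} (SE {se:.2f}), I^2 = {i2:.0f}%")
```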

Having established the screening, quality assessment, and data extraction protocols, the following section will explore the theoretical foundations of AI tutoring and its psychological implications.

2. Theoretical Foundations of AI Tutoring and Psychological Development

This section articulates the theoretical underpinnings linking AI tutoring to long‑term psychological development. First, it delineates the instructional and socio‑emotional affordances of AI tutors—real‑time adaptive feedback, affective monitoring, and dialogue‑based interaction—and how these mechanisms map onto traditional pedagogies such as lecture, reciprocal teaching, Socratic dialogue, and collaborative learning. Second, it grounds AI tutor design in developmental theory, integrating Piagetian constructivism, Vygotskian sociocultural mediation, and Bronfenbrenner’s ecological systems to align adaptive scaffolding with learners’ cognitive stages and contextual environments. Finally, it synthesizes these insights into a multilevel integrative model that couples design‑feature, contextual‑moderator, and outcome‑trajectory layers, offering a framework for predicting how AI tutoring shapes cognitive, affective, motivational, metacognitive, and relational outcomes over time.

2.1. Instructional and Socio‑Emotional Affordances of AI Tutors

AI‑mediated instruction can be conceptualised as a constellation of affordances that mirror, extend, or substitute the core elements of traditional lecture‑based, direct, reciprocal, Socratic, and collaborative pedagogies. Across the literature, three intertwined mechanisms emerge as the primary means by which artificial tutors enact these instructional forms: (1) real‑time adaptive feedback, (2) affective monitoring and regulation, and (3) dialogue‑based interaction. Each mechanism can be mapped onto the instructional modalities and the socio‑emotional dimensions that underpin effective teaching.

Lecture‑based and Direct Instruction. Intelligent tutoring systems (ITS) can deliver highly individualized pacing and content sequencing, thereby emulating the teacher‑driven approach of lecture‑based instruction while adding a layer of adaptive precision. Adaptive pacing and personalized feedback are reported to reduce cognitive load and improve academic performance, thereby compensating for the lack of physical and emotional presence that characterises virtual instruction [29][10][34]. Affective monitoring—through emotion recognition or physiological sensing—enables the system to detect signs of disengagement or frustration and to adjust the instructional tempo or provide supportive prompts, further mitigating the emotional disconnection that often accompanies purely direct instruction [29][10].

Reciprocal Teaching and Socratic Method. Reciprocal teaching and the Socratic method rely on dialogue, questioning, and feedback loops. AI systems equipped with performance‑prediction models and learning‑behavior analytics can scaffold such exchanges by generating targeted prompts, offering clarifying questions, and signalling when a learner’s reasoning requires further elaboration. These capabilities are framed as potential analogues to Socratic questioning and reciprocal dialogue, yet empirical studies explicitly demonstrating AI tutors executing Socratic or reciprocal teaching strategies remain scarce [29][63][76]. Trust and explainability are identified as critical moderators; without transparent rationale for AI suggestions, learners may hesitate to engage in open inquiry, thereby limiting the fidelity of the Socratic emulation [63][76].

Collaborative Learning. Collaborative pedagogy thrives on shared participation, peer feedback, and group reflection. AI can support collaboration by providing virtual collaboration tools, monitoring group affect, and analysing interaction patterns to identify emotional bottlenecks such as dominance or withdrawal. Group‑learning analytics can prompt under‑speaking students or recommend equitable participation strategies, thereby partially replicating the socio‑emotional richness of face‑to‑face collaboration [29][34][52][79]. However, the literature notes that the socio‑emotional depth of in‑person collaboration—trust, empathy, and mutual accountability—remains only partially captured by current AI‑mediated systems [52][79].

Affective Monitoring and Socio‑Emotional Support. Emotion‑aware AI can detect affective states, deliver empathic responses, and guide emotion‑regulation strategies. Empirical evidence indicates that such affective interventions can reduce anxiety, enhance belonging, and support self‑efficacy, though longitudinal data are lacking [10][63][71][f1a5ad3a][79]. Affective monitoring also underpins engagement‑preserving mechanisms, such as early dropout alerts and adaptive pacing, which in turn sustain motivation and persistence over time [10][76].

Dialogue‑Based Interaction. Conversational agents (chatbots, virtual tutors) provide real‑time dialogue that can scaffold self‑regulated learning (SRL) by prompting goal setting, strategy planning, and metacognitive monitoring. Rule‑based or probabilistic adaptive scaffolds, as illustrated in the MetaTutor case, demonstrate that dialogue can elevate engagement and SRL performance in short‑term studies [63][67][71][76]. Nonetheless, the absence of transparent, explainable interaction models may erode trust and limit the long‑term psychological benefits of such dialogue‑based tutoring [63][76].

Current Gaps and Future Directions. While the literature documents the pedagogical potential of AI‑mediated feedback, affective monitoring, and dialogue, there remains a conspicuous dearth of longitudinal, comparative evidence on how these affordances translate into sustained psychological outcomes relative to human instruction. Moreover, the interplay between trust, explainability, and human‑AI collaboration requires systematic investigation to determine whether AI tutors can reliably emulate the socio‑emotional scaffolding provided by teachers. Addressing these gaps will demand rigorous longitudinal mixed‑methods studies that integrate validated psychometric instruments with detailed logs of AI‑mediated interactions and socio‑emotional indicators [29][10][76][f1a5ad3a][79].

Having mapped the instructional and socio‑emotional affordances of AI tutors, the following section will examine how developmental theories inform AI tutor design and the mechanisms by which these designs may influence long‑term psychological trajectories.

2.2. Developmental Theories and AI Tutor Design

Building on the instructional and socio‑emotional affordances outlined earlier, this subsection grounds AI tutor design in developmental theory, thereby ensuring that adaptive mechanisms are not only pedagogically sound but also developmentally appropriate and ecologically situated.

Piagetian constructivism posits that children actively construct knowledge through assimilation, accommodation, and equilibration, with cognitive development unfolding across four discrete, irreversible stages—sensorimotor, preoperational, concrete operational, and formal operational—each characterized by distinct representational and reasoning capacities. AI tutoring systems can operationalize this framework by tailoring content, interaction style, and feedback modality to the learner’s current stage: for sensorimotor learners, sensory‑motor-rich interfaces and concrete manipulatives; for preoperational learners, symbolic representations and language support; for concrete operational learners, tasks that require conservation and reversible reasoning; and for formal operational learners, abstract problem‑solving and hypothesis testing. Such stage‑specific alignment ensures that AI scaffolding facilitates assimilation and accommodation without overwhelming the learner, thereby promoting equilibration and cognitive growth [53][54][69].

Vygotsky’s sociocultural theory extends this developmental lens by foregrounding the Zone of Proximal Development (ZPD) and the role of a More Knowledgeable Other (MKO). AI tutors, as digital MKOs, can provide calibrated scaffolding—adaptive hints, step‑by‑step prompts, and real‑time feedback—that gradually withdraw as competence emerges, mirroring Vygotsky’s internalization process. Moreover, the theory highlights the importance of language-mediated interaction for higher mental functions; AI systems can facilitate the transition from social speech to private speech, encouraging self‑talk and metacognitive regulation. By embedding dialogue‑based scaffolding within culturally relevant contexts, AI tutors can support guided participation and collaborative learning, thereby extending Vygotsky’s emphasis on social mediation into the digital domain [59].
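
A minimal sketch of how such calibrated, gradually withdrawn scaffolding is often operationalised—here as a rule that lowers hint specificity as a rolling success rate rises; the thresholds and hint levels are illustrative assumptions rather than features of any reviewed system:

```python
from collections import deque

class FadingScaffold:
    """Toy ZPD-style scaffold: hint detail decreases as recent competence grows."""

    LEVELS = ["worked example", "step-by-step prompt", "conceptual hint", "no hint"]

    def __init__(self, window=10):
        self.recent = deque(maxlen=window)   # rolling record of correct/incorrect attempts

    def record_attempt(self, correct: bool):
        self.recent.append(correct)

    def next_hint(self) -> str:
        success = sum(self.recent) / len(self.recent) if self.recent else 0.0
        if success < 0.4:
            return self.LEVELS[0]            # heavy support within the learner's ZPD
        if success < 0.6:
            return self.LEVELS[1]
        if success < 0.8:
            return self.LEVELS[2]
        return self.LEVELS[3]                # scaffold withdrawn as competence is internalised
```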

Bronfenbrenner’s ecological systems theory situates individual development within nested environmental layers—microsystem, mesosystem, exosystem, macrosystem, and chronosystem—each exerting reciprocal influence on the learner. AI tutors can be positioned primarily within the microsystem (classroom or home learning environment), directly interacting with the child and shaping proximal processes through personalized feedback and adaptive pacing. However, effective AI tutoring also requires alignment with the mesosystem (e.g., coordination between home and school practices), sensitivity to exosystemic factors (such as parental workplace policies that affect home learning time), and responsiveness to macrosystemic cultural norms that influence content relevance and pedagogical expectations. Longitudinal designs that track changes across these ecological layers can illuminate how AI tutoring interacts with broader socio‑environmental dynamics over time, addressing the chronosystem’s temporal dimension [26][30][72].

Integrating Piagetian, Vygotskian, and Bronfenbrennerian perspectives yields a multifaceted design blueprint: (1) stage‑specific content and interaction styles that respect cognitive maturation; (2) adaptive scaffolding that operates within the learner’s ZPD and promotes internalization; (3) ecological embedding that ensures the AI tutor’s activities co‑occur with supportive microsystem practices and are harmonized with mesosystemic, exosystemic, and macrosystemic influences. This synthesis not only aligns AI tutor functionalities with developmental science but also provides a conceptual scaffold for future empirical investigations that can disentangle the relative contributions of individual, social, and environmental factors to long‑term psychological outcomes.

Having established the developmental foundations that inform AI tutor design, the following section will integrate these theoretical insights into a comprehensive model of AI tutoring impact.

2.3. Integrative Model of AI Tutoring Impact

The integrative model presented herein synthesises the developmental, instructional, and socio‑environmental perspectives articulated in the preceding subsections into a coherent, multilevel framework that explicates how AI tutor design features, contextual moderators, and long‑term psychological trajectories interact. At its core, the model posits that proximal design features of AI tutoring systems—such as adaptive learning algorithms, immediate contextualised feedback, gamified interactive storytelling, affective monitoring, and transparent decision‑making—serve as catalysts that shape learners’ cognitive, affective, motivational, metacognitive, and relational outcomes over time. These proximal effects are moderated by a constellation of contextual factors, including developmental stage, socioeconomic status, digital access, cultural responsiveness, and relational dynamics, which jointly influence the trajectory of each outcome domain. The model therefore integrates three layers of influence: (1) Design‑Feature Layer (proximal, system‑driven inputs), (2) Contextual‑Moderator Layer (intervening socio‑ecological variables), and (3) Outcome‑Trajectory Layer (longitudinal psychological pathways).

The Design‑Feature Layer draws directly from the design‑principle literature, wherein immediate, contextualised feedback is identified as the most potent mechanism for behaviour change, surpassing visual cues in effectiveness [35]. Adaptive learning algorithms that employ reinforcement learning or deep‑learning‑based personalization continuously refine the difficulty and pacing of instructional content based on real‑time learner data, thereby maintaining optimal challenge and promoting self‑efficacy [35]. Gamified storytelling and interactive elements are leveraged to enhance engagement and motivation, yet the risk of over‑reliance and consequent deskilling is acknowledged in the literature on critical‑thinking trajectories [32]. Affective monitoring, encompassing emotion‑recognition and physiological sensing, enables AI tutors to detect disengagement or frustration and to intervene with supportive prompts, which has been linked to improved affective outcomes such as reduced anxiety and heightened belonging [61]. Transparency features—explainability of AI decisions and feedback—are posited to mediate trust and mitigate the negative social consequences observed when AI tutors are perceived as opaque [61].

The Contextual‑Moderator Layer incorporates developmental, socioeconomic, and cultural dimensions that shape the receptivity and efficacy of AI tutoring. Developmental stage is operationalised through the distribution of studies across elementary, middle, and high school contexts, with evidence that middle‑school learners benefit more from ITS interventions than high‑school learners [17]. Age‑related technology acceptance, documented in the technology‑acceptance literature, indicates a negative relationship between age and AI uptake, suggesting that older learners may exhibit lower confidence in AI tutoring systems [38]. Socioeconomic status, often proxied by school type (public versus private), influences performance gains and may interact with AI‑driven personalization to exacerbate or ameliorate equity gaps [17]. Cultural responsiveness is reflected in the geographic spread of ITS studies (USA, Asia, Europe) but remains under‑examined in terms of culturally specific moderating mechanisms [17]. Relational dynamics—specifically the balance between AI and human instruction—are captured in hybrid tutoring designs, which have demonstrated improved mastery rates and teacher‑student relational quality when AI augments rather than replaces human tutors [66].

The Outcome‑Trajectory Layer synthesises empirical evidence on long‑term psychological outcomes. Cognitive trajectories, encompassing knowledge retention and critical‑thinking proficiency, are reported to be enhanced by adaptive feedback and problem‑solving scaffolds, yet may be undermined by over‑reliance on AI‑generated solutions [32]. Affective trajectories, including self‑efficacy, motivation, belonging, and loneliness, are positively influenced by immediate feedback and gamified elements, but negative social effects (e.g., increased loneliness, reduced belonging) have been documented when AI tutors lack human‑like empathy or transparency [61]. Motivational trajectories—intrinsic motivation and perceived autonomy—are moderated by the degree of AI transparency and the presence of human scaffolding, with hybrid models showing promise in sustaining motivation over time [66]. Metacognitive trajectories, such as self‑regulation and strategic planning, benefit from dialogue‑based interaction and adaptive hints, yet the absence of longitudinal data limits conclusions about sustained metacognitive skill development [32]. Relational trajectories, particularly teacher‑student relationship quality and trust, are contingent upon the integration of AI with human facilitation; purely AI‑driven instruction has been associated with diminished relational quality and increased feelings of isolation [61].

The model proposes specific interaction pathways. For instance, adaptive feedback is expected to increase self‑efficacy, but this effect is contingent upon sufficient digital access and supportive home environments, as digital inequities can attenuate the benefits of personalization [17]. Gamified storytelling may boost motivation; however, if over‑reliance on AI prompts occurs, critical‑thinking gains may plateau or decline over time [32]. Transparency features are hypothesised to strengthen trust, thereby moderating the relationship between AI‑driven feedback and affective outcomes such as belonging and loneliness [61]. Hybrid tutoring is posited to buffer the negative relational effects of AI by preserving human interaction, thereby sustaining long‑term motivation and self‑efficacy [66].
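
The first of these pathways can be stated as a moderated‑regression hypothesis (notation ours, with learner i observed at wave t); analogous specifications would capture the transparency–trust and hybrid‑tutoring pathways:

```latex
\mathrm{SelfEfficacy}_{it} = \beta_0
  + \beta_1\,\mathrm{AdaptiveFeedback}_{it}
  + \beta_2\,\mathrm{DigitalAccess}_{i}
  + \beta_3\,\bigl(\mathrm{AdaptiveFeedback}_{it} \times \mathrm{DigitalAccess}_{i}\bigr)
  + \varepsilon_{it}
```

Under the model, β1 > 0 and β3 > 0 are expected, so that the benefit of adaptive feedback is attenuated where digital access is limited.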

Empirical validation of this integrative model requires longitudinal mixed‑methods designs that simultaneously capture AI‑mediated interaction logs, validated psychological scales, and contextual data. Micro‑randomised trial frameworks, which embed design‑feature metadata as time‑varying covariates, offer a promising approach to disentangle these complex interactions [57]. Moreover, the paucity of longitudinal comparative studies between AI tutors and human instruction, particularly with respect to validated affective and relational outcomes, underscores the need for future research to adopt such rigorous designs and to report effect sizes across multiple psychological domains [51].
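
A minimal analytic sketch of such a design, assuming a long‑format panel in which validated scale scores, logged design‑feature exposure, and contextual moderators are recorded at each wave; the file name and variable names are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format panel: one row per learner per measurement wave
panel = pd.read_csv("ai_tutoring_panel.csv")  # columns: learner_id, wave, self_efficacy,
                                              # condition, adaptive_feedback_dose, digital_access

# Random intercept and slope on wave per learner; logged design-feature exposure enters as a
# time-varying covariate interacting with a contextual moderator
model = smf.mixedlm(
    "self_efficacy ~ wave + condition + adaptive_feedback_dose * digital_access",
    data=panel,
    groups=panel["learner_id"],
    re_formula="~wave",
)
result = model.fit()
print(result.summary())
```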

In sum, the integrative model articulates a testable framework that aligns developmental theory, AI design principles, and socio‑ecological moderators to predict long‑term psychological trajectories. It provides a scaffold for future empirical investigations that can quantify the relative influence of each design feature and contextual factor, thereby informing the development of AI tutoring systems that promote equitable, sustained psychological benefits.

Having outlined the theoretical architecture of AI tutoring impact, the following section will examine the specific moderating and contextual factors that shape the relationships posited in this model.

3. Moderating and Contextual Factors

The long‑term psychological outcomes of AI tutoring are shaped by a constellation of contextual moderators. This section surveys the principal determinants, beginning with developmental stage, followed by socioeconomic status, digital infrastructure and device availability, AI design‑feature moderators, relational dynamics between human and AI, and concluding with cultural responsiveness and localization. Each subsection synthesises empirical findings that illustrate how these factors influence engagement, motivation, and learning trajectories over time. The discussion opens with developmental stage, which frames how age‑related cognitive and affective capacities interact with AI tutoring systems.

3.1. Developmental Stage

Developmental stage operates as a fundamental moderator of how learners perceive, engage with, and benefit from AI‑tutoring systems. Across the three age cohorts examined in the literature—children (6–12 years), adolescents (13–18 years), and adults (19+ years)—cognitive constraints, technology acceptance, and tutoring outcomes differ systematically, underscoring the need for age‑tailored design and evaluation strategies.

Children in the 6–12 year range reside in the concrete operational phase of Piagetian development, a period characterized by growing executive‑function capacities such as working memory, attention, and self‑regulation, yet still constrained by limited abstract reasoning and a reliance on concrete manipulatives for learning [53][60]. The NAEYC position statement further highlights that executive‑function skills, including attention and working memory, mature progressively during this stage and are crucial for effective learning; distraction or stress can markedly impair task performance, thereby affecting AI tutoring outcomes that depend on sustained focus and information processing [60][81]. Empirical studies of AI tutoring in elementary contexts are sparse, but existing systematic reviews report that middle‑school learners (approximately 10–13 years) tend to gain more from ITS interventions than high‑schoolers, suggesting that the developmental alignment of instructional pacing with evolving cognitive capacities may enhance effectiveness [17]. Technology acceptance among this cohort is generally high, driven by intrinsic curiosity and the novelty of interactive interfaces, yet the literature indicates a dearth of longitudinal evidence linking early AI exposure to long‑term psychological trajectories in this age group.

Adolescents (13–18 years) exhibit more advanced executive‑function skills, yet also experience heightened sensitivity to performance feedback and identity‑related motivations. The study by Klarin and colleagues (2024) found that 12–18 year olds report moderate adoption of generative AI tools, with higher perceived usefulness among students who demonstrate executive‑function deficits, implying a compensatory role of AI tutoring for those with working‑memory challenges [19]. Adoption rates increase with age within this group, and gender differences emerge in tool choice, with boys more likely to use ChatGPT in the younger adolescent sample [19]. Academic achievement outcomes were mixed; no significant advantage for AI users was observed in short‑term assessments, suggesting that mere exposure does not automatically translate into performance gains [19]. A cross‑sectional machine‑learning study of Chinese schools (8–19 years) further indicates that AI tutoring positively influences adolescents’ social adaptability, with stronger effects among students reporting higher interpersonal relationships and lower loneliness [58]. These findings collectively point to a complex interplay between developmental maturity, executive‑function variability, and the social context in shaping AI tutoring outcomes for adolescents.

Adult learners (19+ years) display the most mature cognitive capacities, including robust working‑memory capacity (3–4 chunks) and sophisticated metacognitive strategies, but also confront technophobia, AI anxiety, and competing demands on their time and attention. Older adult data from the arXiv review reveal that technophobia and technology anxiety are significant barriers to engagement, and that cognitive decline can limit the ability to learn new digital skills, thereby moderating AI tutoring effectiveness [49]. In higher‑education settings, students report that perceived usefulness and ease of use—core components of the Technology Acceptance Model—strongly predict willingness to adopt generative AI, with higher AI literacy correlating with lower anxiety and greater confidence [12]. Pre‑service teacher studies applying TAM3 demonstrate that perceived usefulness and perceived ease of use predict behavioral intention to use AI, while AI anxiety negatively influences both constructs and is moderated by gender [56]. Outcomes for adults are largely self‑efficacy and motivation; higher perceived usefulness is associated with increased confidence in using AI tools, yet the literature lacks longitudinal data to determine whether these short‑term gains persist or translate into sustained learning improvements.

Comparative evidence across the three developmental stages reveals that cognitive constraints become progressively less limiting, while technology acceptance shifts from curiosity‑driven engagement in children to anxiety‑mediated adoption in adults. AI tutoring outcomes likewise vary: elementary learners benefit most when instructional design aligns with concrete operational reasoning and manageable working‑memory demands; adolescents experience mixed academic effects but improved social adaptability, particularly when AI tools are perceived as useful and are paired with supportive human scaffolding; adults demonstrate high motivation and self‑efficacy when AI systems are perceived as useful and easy to use, but face barriers related to technophobia and limited time. The literature highlights a critical gap in longitudinal, multi‑age studies that systematically track psychological trajectories across developmental stages, underscoring the need for designs that embed age‑appropriate design features and monitor evolving cognitive and affective responses over time.

Having outlined the developmental stage as a key moderating factor, the next section will examine how socioeconomic status, digital access, and the broader digital divide further shape AI tutoring outcomes.

3.2. Socioeconomic Status, Digital Divide, and Device Access

Socio‑economic status (SES) is a primary determinant of both the quantity and quality of digital resources available to learners, thereby shaping the feasibility and effectiveness of AI tutoring. Systematic reviews of technology access in education consistently report a strong link between household income, parental education, and the likelihood of owning broadband‑connected devices; lower‑income families are disproportionately less likely to have high‑speed internet or personal computing devices, which constrains their ability to engage with AI‑driven instructional platforms [6]. This pattern is mirrored across multiple contexts, from rural communities in the United States to low‑resource settings in Southeast Asia. In a small Missouri town, for example, a wireless mmWave broadband intervention reached only 29 households, with an additional 13 households unable to connect due to line‑of‑sight limitations—a clear manifestation of technical equity barriers tied to geography and socioeconomic status [18]. Similar findings emerged in rural California, where broadband penetration fell 17 percentage points below urban averages, and device adequacy for online learning was 10 percentage points lower in rural districts compared with urban counterparts [21]. In San Antonio, Texas, a cross‑sectional survey revealed that households earning below $20,000 were twice as likely to lack home broadband and that device ownership rates for laptops and smartphones were markedly lower among low‑income families than among higher‑income households [14].

Device ownership patterns further compound the digital divide. In a Cambodian case study, most learners lacked personal smartphones and relied on shared or institutional tablets, a situation that constrained the frequency and depth of interaction with an AI tutoring application (CAFE) and limited the potential for personalized learning trajectories [47]. The study also highlighted that when devices are scarce, educators and parents must provide printed hand‑outs as a low‑tech supplement, underscoring the need for offline‑capable AI platforms that can function with intermittent connectivity or limited hardware resources [47]. Across U.S. schools, a national survey found that only 18 % of teachers reported that all or almost all students had sufficient access to digital tools at home, with the lowest rates among students from the lowest‑income households and in rural districts [48]. These data suggest that device scarcity is not merely a logistical inconvenience but a structural impediment that can attenuate the learning gains associated with AI tutoring.

Policy interventions aimed at reducing the digital divide have been proposed, but empirical evaluations of their effectiveness remain sparse. State‑level initiatives such as California’s Bridging the Digital Divide Fund and Chicago’s Connected program provide subsidized broadband and devices to students in need, yet their impact on long‑term psychological outcomes has not been rigorously assessed [20]. In San Antonio, the city’s 2018 digital inclusion goal explicitly acknowledges that broadband access is a prerequisite for equitable AI tutoring, yet the policy document stops short of outlining mechanisms for monitoring longitudinal effects on motivation, self‑efficacy, or critical‑thinking development [14]. Furthermore, the FCC’s minimum broadband speed standards for remote learning are frequently unmet in rural areas, limiting the capacity of AI tutors that rely on high‑bandwidth interactions to deliver adaptive feedback in real time [21].

Localization strategies, including language adaptation and culturally relevant content, are interwoven with SES and device access considerations. Action‑design research in Cambodia demonstrated that tailoring the CAFE app to local linguistic and cultural contexts, coupled with community‑based training, enhanced adoption rates even in low‑resource settings [47]. Similarly, the broader literature on digital capacity identifies the need for culturally responsive design to mitigate disparities that arise when AI tutoring systems are deployed without consideration of local norms or educational expectations [22]. These findings suggest that equitable AI tutoring requires not only infrastructural investments but also iterative, community‑engaged design processes that foreground local socioeconomic realities.

Table 1 summarizes key socioeconomic, broadband, and device‑ownership metrics across the studies cited above, highlighting the intersections between income, connectivity, and hardware availability that shape AI tutoring adoption.

| Context | SES Indicator | Broadband Availability | Device Ownership | Key Finding |
|---------|---------------|------------------------|------------------|-------------|
| Rural Missouri (Turney, MO) | Median income $56,786; 11.6 % below poverty | 39.5 % of households with internet (vs. 75–95 % in controls) | Not specified | Broadband access linked to socioeconomic disparities and technical equity barriers |
| Rural California (Washington State) | 5 % of households below poverty | 67.5 % of students with reliable broadband (vs. 84.2 % urban) | 80 % of rural students had adequate devices (vs. 90.1 % urban) | Geographic and income‑based disparities in connectivity and device adequacy |
| Rural Cambodia | Low‑income, predominantly agrarian; 87.9 % women | Limited mobile network coverage; remote sites sometimes reachable only by boat | Most learners lacked personal smartphones; relied on shared tablets | Device scarcity and intermittent connectivity constrain AI tutoring adoption |
| San Antonio, TX | Households earning < $20,000 | Higher likelihood of lacking broadband; penetration lower in southern districts | Lower laptop and smartphone ownership among low‑income households | Broadband and device ownership are key predictors of digital resource utilization |
| U.S. K‑12 schools | Low‑income families, rural districts | 73 % of teachers report limited student access to digital tools at home | 18 % of teachers report sufficient home access | Digital divide evident across socioeconomic and geographic lines |

These patterns underscore that socioeconomic status, broadband infrastructure, and device ownership are tightly coupled moderators of AI tutoring effectiveness. Their interdependence suggests that interventions aimed at enhancing long‑term psychological outcomes must simultaneously target infrastructural upgrades, equitable device distribution, and culturally responsive design. The next subsection will examine how AI‑tutor design features themselves can moderate these socioeconomic and infrastructural constraints, thereby shaping the trajectory of learning and psychological development.

3.3. AI Design‑Feature Moderators

Adaptive algorithms that continuously adjust content, pacing, and difficulty can increase learners’ perceived competence, enhance self‑efficacy, and reduce frustration, thereby fostering sustained motivation and engagement [25][35]. Nonetheless, most evidence derives from short‑term or cross‑sectional studies, leaving a gap in our understanding of how these algorithms influence long‑term outcomes such as critical‑thinking skills or self‑regulation.

Adaptive feedback, particularly immediate, contextualised prompts delivered at the moment of learner error or uncertainty, is posited as a key mechanism for supporting autonomy and reducing overreliance on AI tutors. Empirical findings suggest that feedback that is timely and tailored to the learner’s current performance level can enhance perceived control and self‑efficacy, while also mitigating the risk of deskilling by encouraging active problem‑solving rather than passive receipt of answers [35][42]. However, longitudinal data on how the frequency and timing of such feedback affect long‑term motivation, trust, or autonomy remain sparse, and the potential for feedback fatigue or reduced agency with over‑use has not been systematically examined.

Gamification elements—avatars, badges, levels, and social networking features—have been shown to increase engagement and short‑term learning gains. In a physical‑education context, a gamified ITS that incorporated achievements and instant guidance produced higher post‑test skill scores and reported engagement relative to a non‑gamified counterpart [23][35]. While such designs can satisfy intrinsic motivational needs, studies also warn that gamified reinforcement may foster superficial mastery or over‑reliance on extrinsic rewards, potentially undermining deeper critical‑thinking development over time. Importantly, no study to date has documented long‑term retention of gamified benefits or assessed whether the motivational boost persists beyond the immediate post‑intervention period.

Explainable AI (XAI) mechanisms, such as interactive “how‑vs‑why” explanations and confidence‑score displays, are theorised to promote user trust, reduce anxiety, and support metacognitive monitoring. Evidence from human‑XAI symbiosis research indicates that explanations can shift users from intuitive, automatic processing toward deliberative reasoning, thereby enhancing knowledge creation and self‑efficacy [25]. Yet empirical investigations have not yet linked these explainability features to sustained psychological outcomes; longitudinal studies that track changes in trust calibration, perceived transparency, or autonomous learning behaviours over months are lacking.

User‑interface (UI) design—encompassing interactivity, modality (text‑ vs visual‑based), and transparency cues—has a pronounced effect on cognitive load and user experience. The evidence suggests that highly interactive interfaces are more effective at driving behaviour change than simple text or visual displays, and that visual‑based interventions enhance experiential outcomes such as memorability and perceived enjoyment [35][42]. However, the impact of UI choices on instrumental outcomes, such as knowledge retention or skill transfer, remains unclear, and there are no longitudinal data on whether UI‑driven engagement translates into sustained psychological benefits or whether certain modalities foster overreliance on AI scaffolds.

Having examined how individual AI design features shape psychological trajectories, the following section will explore the relational dynamics among human, AI, and hybrid instructional models.

3.4. Relational Dynamics: Human, AI, and Hybrid Models

Teacher‑student relational quality, trust, empathy, and belonging are central to the psychological well‑being of learners and have been repeatedly linked to academic and affective outcomes across developmental stages [299fad38][8896d7a1]. In purely human‑mediated instruction, a robust teacher‑student relationship is established through reciprocal communication, emotional attunement, and the provision of contextual feedback, all of which foster a sense of belonging and enhance motivation and self‑efficacy [299fad38][60642ade]. Empirical evidence from the meta‑analysis of school belonging indicates that teacher support is a key predictor of relational anchoring, with small to moderate positive correlations with academic achievement and self‑concept [299fad38].

In contrast, purely AI‑mediated instruction presents a more complex relational profile. Continuous, personalized feedback delivered by AI can increase students’ trust in the system and reduce anxiety by demonstrating reliability [773faffd][71]. However, the lack of human touch and the rigid, structured nature of many AI tutors limit opportunities for spontaneous empathy, nuanced communication, and relational continuity, which can erode belonging and diminish the depth of the teacher‑student bond [60642ade][8896d7a1][71]. The trust instrument study further shows that while AI trust predicts perceived usefulness, it does not automatically translate into relational trust or emotional connection, underscoring a distinct dimension of relational quality that AI alone cannot fully satisfy [773faffd].

Hybrid instructional models seek to combine the strengths of both modalities. According to the hybrid model overview, synchronous in‑person instruction supplemented with AI‑driven feedback offers flexibility while preserving face‑to‑face relational dynamics, thereby promoting equitable attention and strengthening relational ties [89afc5e7]. When teachers actively engage with virtual students multiple times per course, relational quality and belonging are enhanced, and the AI component can deliver timely, adaptive support without supplanting human empathy [89afc5e7][71]. Nonetheless, hybrid settings face challenges such as technological glitches, time‑management difficulties, and assessment complexities that can disrupt communication and erode trust if not adequately addressed [60642ade][89afc5e7].

Comparatively, purely human instruction yields the strongest baseline relational quality and belonging, but may lack the scalability and consistency of AI feedback. Purely AI instruction can rapidly provide personalized support and build system trust, yet falls short on empathy and relational continuity, potentially compromising belonging and long‑term engagement. Hybrid models appear to mediate between these extremes, offering a scaffolded approach where AI delivers immediate, adaptive cues while human teachers maintain relational depth and emotional nuance. The integration of affective feedback and explanatory transparency within hybrid systems may further enhance trust and empathy, aligning relational dynamics more closely with those observed in human‑only contexts [60642ade][71].

Having outlined how relational dynamics differ across instructional models, the following section will examine how cultural responsiveness and localization shape these relational processes.

3.5. Cultural Responsiveness and Localization

AI tutors that incorporate language‑appropriate interfaces, culturally resonant content, and community‑engaged design are more likely to foster engagement, self‑efficacy, and a sense of belonging—factors that underlie sustained motivation and critical‑thinking development. The inclusive education study demonstrates that AI‑driven adaptive learning systems can reduce bias in assessment and support under‑represented STEM students when the system incorporates culturally responsive data pipelines and transparent decision‑making processes [55][78].

Language adaptation operates on two interrelated levels. First, the linguistic affordances of an AI tutor—such as natural‑language processing that accurately parses learner input in multiple languages—directly influence perceived intelligibility and trust. Empirical investigations of multilingual ITS platforms show that when the system’s conversational model is trained on corpora that reflect local dialects and idiomatic expressions, learners report higher perceived competence and lower frustration [78]. Second, the linguistic framing of instructional content shapes cultural relevance. Embedding culturally familiar examples and analogies within problem‑solving tasks has been shown to increase motivation and knowledge retention among students in non‑English‑speaking contexts, suggesting that language is not merely a vehicle for instruction but also a conduit for cultural identity [78].

Cultural relevance extends beyond language to the broader design and pedagogical choices that honor local norms, values, and educational expectations. The web‑localization study (WSS, WDS, WPS) provides a valuable empirical illustration of how distinct localization strategies differentially resonate with collectivistic versus individualistic cultures. While web‑similarity produced comparable perceived localization for U.S. and Chinese participants, web‑distinctiveness and web‑prestige yielded significantly stronger effects in the Chinese sample, indicating that culturally salient signals of uniqueness and status are more persuasive in collectivistic settings [74].

Community engagement and trust‑building are critical mechanisms through which cultural responsiveness translates into equitable outcomes. The rural Missouri broadband intervention illustrates that culturally attuned recruitment—through door‑to‑door outreach, partnerships with local community‑development organizations, and explicit emphasis on participant ownership—enhances trust and participation rates, even when technical barriers exist [18]. Similar patterns emerge in Indonesia, where the SMAN 3 Pontianak case study demonstrates that embedding AI tutoring within a community‑centric, holistic educational model—characterized by teacher training, extracurricular integration, and culturally grounded values—facilitates equitable access and fosters a sense of belonging among students [68].

Equity implications of localization are reinforced by the generative‑AI inclusion study: when culturally responsive data pipelines and transparent decision‑making processes are built in, adaptive learning systems can reduce assessment bias and better support under‑represented STEM students [55]. Conversely, the absence of such mechanisms may perpetuate existing disparities, as AI tutors that default to culturally neutral or Western‑centric pedagogies may inadvertently alienate learners from diverse backgrounds [55].

The following table summarizes the differential effectiveness of the three localization strategies examined in the web‑localization study, highlighting their cultural relevance in collectivistic versus individualistic contexts.
| Localization Strategy | Effect on Perceived Localization (U.S.) | Effect on Perceived Localization (China) | Cultural Interpretation |
|-----------------------|------------------------------------------|-------------------------------------------|-------------------------|
| Web‑Similarity (WSS) | No significant difference | No significant difference | Neutral across cultures |
| Web‑Distinctiveness (WDS) | No significant difference | Stronger effect (p = 0.001) | Enhances status cues valued in collectivism |
| Web‑Prestige (WPS) | No significant difference | Stronger effect (p = 0.018) | Signals prestige, resonant with collectivist norms |

[e4762503-44cf-49eb-89bb-1c444578a8c0]

In sum, cultural responsiveness and localization function as critical lenses through which AI tutoring systems must be designed, implemented, and evaluated to safeguard equitable psychological trajectories. By aligning linguistic interfaces, culturally resonant content, and community‑engaged practices, researchers and practitioners can mitigate the risk of cultural bias and enhance the inclusivity of AI‑mediated learning. Next, we will examine how these culturally informed practices intersect with the broader landscape of long‑term cognitive and critical‑thinking outcomes in AI tutoring contexts.

4. Empirical Evidence on Long‑Term Psychological Outcomes

This section synthesizes the longitudinal evidence on how AI tutoring shapes learners’ psychological outcomes over time. First, it examines cognitive and critical‑thinking trajectories, documenting sustained gains in domain knowledge and problem‑solving, while noting gaps in long‑term deskilling research. It then reviews affective and motivational dynamics, including autonomy, self‑efficacy, and persistence, and explores how AI design features moderate these processes. Subsequent subsections address relational and social development, metacognitive and autonomy growth, the theoretical risks of deskilling and transfer, and finally a comparative assessment of AI tutoring versus human instruction across cognitive, affective, metacognitive, and relational domains. Together, these analyses provide a comprehensive view of the empirical landscape and highlight methodological gaps that warrant future longitudinal, mixed‑methods inquiry.

4.1. Cognitive and Critical‑Thinking Trajectories

Long‑term changes in knowledge retention and problem‑solving performance are the most robustly documented outcomes in the extant longitudinal literature on AI tutoring. Ökördi and colleagues’ quasi‑experimental study in third‑ and fourth‑grade classrooms reported that students who completed at least half of the eDia online sessions exhibited a 0.33‑standard‑deviation improvement in multiplication and division skills at a three‑month follow‑up, whereas the control group’s gains were roughly half that magnitude [31]. Nehring and collaborators, in a mixed‑design study of 12th‑grade learners, found that the ITS‑augmented group achieved a 13.55‑point increase in exam scores (d = 0.87) after an entire academic year, while the control group showed no significant change [31]. These two studies provide the only empirical evidence of sustained knowledge retention and problem‑solving gains beyond the immediate post‑intervention period in AI‑tutoring contexts. By contrast, the cross‑sectional findings on critical‑thinking and metacognitive skill trajectories reported in the MDPI review highlight potential risks of over‑reliance but include no follow‑up measures, so comparable longitudinal data for these outcomes do not yet exist [32].

Adaptive hint and feedback mechanisms are frequently cited as the proximal instructional features responsible for the observed gains. The systematic review documents step‑by‑step prompting and immediate personalized feedback in systems such as GATT, Mathtutor, and ARGeoITS, noting small‑to‑moderate effect sizes (d = 0.31–0.34) in short‑term assessments [31]. However, none of these studies incorporated delayed post‑tests, leaving the durability of adaptive hint effects untested. Similarly, gamified reinforcement—manifested through badges, levels, and social networking—has been shown to enhance engagement and short‑term performance in a physical‑education ITS (adjusted mean = 2.43 versus control) [23], yet the literature contains no longitudinal data on whether motivational gains translate into sustained knowledge retention or problem‑solving abilities.

Deskilling, defined as the erosion of independent problem‑solving capacity through over‑reliance on AI tutors, is a theoretical concern articulated most clearly in the 2022 analysis by Xue and colleagues. They argue that continuous, automated tutoring without human scaffolding can lead to diminished decision‑making competence and autonomy, a decline that could compromise critical‑thinking development over time [2]. While this source underscores a plausible long‑term risk, it offers no longitudinal empirical evidence to confirm whether deskilling manifests in measurable cognitive decline.

Meta‑analytic synthesis of ITS studies provides further context. A random‑effects meta‑analysis across 39 evaluations reported an overall standardized mean difference of g = 0.50 for knowledge gains, indicating a moderate positive effect of ITS relative to control conditions [75]. Nonetheless, the meta‑analysis lacks longitudinal follow‑ups, and the included studies predominantly report single post‑test outcomes. Consequently, the extent to which ITS‑induced knowledge gains persist, or whether they are offset by deskilling or reduced critical‑thinking, remains unresolved.
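
For readers relating the d and g values cited throughout this section, the following minimal sketch shows how a standardized mean difference (Hedges' g) and a DerSimonian–Laird random‑effects pooled estimate of the kind reported above are computed from group summary statistics. The implementation is illustrative only, and the numbers in the usage example are hypothetical rather than data from the cited studies.

```python
import numpy as np

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (Hedges' g) and its sampling variance."""
    sp = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    j = 1 - 3 / (4 * (n1 + n2) - 9)          # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

def random_effects_pool(gs, vs):
    """DerSimonian-Laird random-effects pooled estimate across studies."""
    gs, vs = np.asarray(gs), np.asarray(vs)
    w = 1 / vs
    fixed = np.sum(w * gs) / np.sum(w)
    q = np.sum(w * (gs - fixed) ** 2)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(gs) - 1)) / c)  # between-study variance estimate
    w_star = 1 / (vs + tau2)
    return np.sum(w_star * gs) / np.sum(w_star)

# Hypothetical summary data (mean, SD, n per arm) for three ITS-vs-control comparisons
studies = [(78, 12, 60, 73, 13, 58), (55, 9, 120, 52, 10, 115), (31, 6, 45, 29, 7, 47)]
gs, vs = zip(*(hedges_g(*s) for s in studies))
print(round(random_effects_pool(gs, vs), 2))
```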

In sum, the limited longitudinal evidence suggests that adaptive ITS features can sustain gains in domain knowledge and problem‑solving over several months, but the durability of adaptive hints, gamified reinforcement, and the potential for deskilling have not been empirically evaluated over extended periods. Addressing these gaps will require future studies that embed delayed assessments and examine the interplay between instructional design, learner autonomy, and critical‑thinking development.

Having reviewed the empirical trajectory of knowledge retention and problem‑solving, the following section will examine motivation, self‑efficacy, and persistence.

4.2. Motivation, Self‑Efficacy, and Persistence

Sustained motivation, self‑efficacy, and persistence constitute the core affective–behavioral processes that link early AI‑tutoring exposure to long‑term learning success. Self‑determination theory (SDT) provides the most widely adopted framework for interpreting these trajectories: the satisfaction of autonomy, competence, and relatedness needs is posited to foster intrinsic motivation, which in turn promotes engagement and perseverance over time [77]. Empirical studies that measure need satisfaction in classroom contexts consistently find that students who experience autonomy‑supportive AI tutors report higher autonomous motivation and lower controlled motivation, with corresponding increases in persistence across multiple time points [24].

Longitudinal investigations of AI tutors, however, remain sparse. A scoping review that focused on chatbots and other AI‑driven conversational agents identified only four studies that employed repeated measures designs to assess motivation over time. These studies reported modest gains in intrinsic motivation at 3‑month follow‑ups, particularly among participants with higher baseline digital literacy, but the effect sizes were small and the analyses were limited to self‑report Likert scales [8]. The same review highlighted that rule‑based chatbots tended to increase intrinsic motivation, whereas generative AI systems such as ChatGPT may enhance autonomy but also risk over‑autonomy that can undermine sustained engagement [8].

Evidence for autonomy support and AI‑driven goal setting is most robust in mixed‑methods studies that combine quantitative pre/post measures with qualitative interviews. In a hybrid human‑AI tutoring setting, students who received real‑time, adaptive feedback reported greater perceived autonomy and higher self‑efficacy, which translated into increased lesson completion rates over a 12‑week period [64]. Similarly, an adaptive learning platform that incorporated explicit goal‑setting prompts and autonomy‑supportive scaffolds demonstrated significant improvements in self‑efficacy and persistence at 6‑month follow‑ups relative to a control group that received only content delivery [32]. These findings suggest that when AI tutors explicitly model goal‑setting and provide choice‑based pathways, the motivational benefits are more durable.

Mixed‑methods research in higher education reinforces this pattern. A 10‑week intervention that compared AI‑mediated instruction with traditional teacher‑centered instruction found that the AI group reported higher intrinsic motivation, greater perceived autonomy, and more frequent self‑regulation behaviors at post‑test. Qualitative interviews revealed that students attributed these gains to the AI’s ability to offer timely, personalized feedback and to allow them to set individualized learning targets, thereby fostering a sense of ownership that persisted beyond the intervention period [3]. Parallel work on self‑efficacy and motivation in a larger sample of secondary‑school students showed that autonomy‑supportive AI tutoring was associated with higher self‑efficacy and lower perceived stress, with both constructs predicting longer‑term persistence in the learning platform over a 12‑month observation window [4].

The following table synthesizes the available longitudinal evidence on intrinsic motivation, self‑efficacy, and persistence in AI‑tutoring contexts, highlighting design features, sample characteristics, and key outcomes. It draws exclusively from the studies cited above and illustrates the heterogeneity of effect sizes and the need for more rigorous, long‑term designs.

| Study | Design & Sample | AI Features | Key Longitudinal Findings | Citation |
|-------|-----------------|-------------|---------------------------|----------|
| Scoping review of chatbots (4 longitudinal studies) | Mixed‑methods, 3‑month follow‑up, 1,200 students | Rule‑based vs. generative chatbots | Modest intrinsic motivation gains; higher digital literacy predicts larger effects | [8] |
| Hybrid human‑AI tutoring (12‑week trial) | RCT, 150 high‑schoolers | Adaptive feedback + autonomy‑supportive prompts | Self‑efficacy ↑, lesson completion ↑, sustained over 12 weeks | [64] |
| Adaptive learning platform (6‑month follow‑up) | Longitudinal cohort, 200 middle‑schoolers | Goal‑setting interface + choice architecture | Self‑efficacy ↑, persistence ↑ relative to control | [32] |
| AI‑mediated instruction vs. teacher‑centered (10‑week intervention) | Mixed‑methods, 60 university students | Real‑time feedback + personalized goal‑setting | Intrinsic motivation ↑, self‑regulation ↑, benefits persisted at 3‑month post‑test | [3] |
| Secondary‑school AI tutoring (12‑month observation) | Cohort, 1,000 students | Autonomy‑supportive AI tutoring | Self‑efficacy ↑, perceived stress ↓, both predicting persistence | [4] |

The table demonstrates that AI tutors can foster sustained motivation and self‑efficacy, especially when they provide autonomy support, adaptive feedback, and explicit goal‑setting. However, the magnitude of these effects varies with AI design, learner characteristics (e.g., digital literacy), and the length of follow‑up. Future longitudinal studies should employ mixed‑methods designs that combine validated psychometric instruments with fine‑grained interaction logs to disentangle the mechanisms through which autonomy support and goal‑setting translate into persistent engagement over years.

Having outlined the motivational and self‑efficacy trajectories, the next section will examine affective and social development outcomes associated with long‑term AI tutoring.

4.3. Affective and Social Development

Longitudinal studies that explicitly assess affective and social development in learners exposed to AI tutoring remain sparse, yet the few available data suggest that AI‑mediated instruction can both enhance and undermine key relational constructs such as belonging, loneliness, empathy, and trust. In a cross‑sectional survey of 387 university students in Australia, Crawford and colleagues found that AI chatbots designed primarily for information provision were positively associated with academic performance when considered in isolation. However, once social support, psychological wellbeing, loneliness, and sense of belonging were measured simultaneously, the net effect of chatbot use on achievement became negative, suggesting that substituting AI for human interaction may erode social connectedness and thereby undermine achievement [61]. A short‑term randomized controlled trial of a 4‑week chatbot intervention added nuance: an engaging voice modality was perceived as more empathetic and happier and was associated with lower loneliness and higher socialization, whereas a text modality produced more emotion‑laden exchanges but, after controlling for time spent, worse psychosocial outcomes. These contrasts highlight the role of affective design features in shaping relational dynamics even over a brief longitudinal window [1].

Relational dynamics have also been examined through the lens of teacher‑student interaction quality. One analysis of relational affordances argued that AI tutors “lack human intuition” and “are unable to act in a human‑intelligent way, such as by being socially and emotionally intelligent,” which directly undermines trust, empathy, and belonging compared with human teachers. Complementary evidence from a mixed‑methods investigation of AI‑integrated educational applications showed that continuous, personalized feedback from AI can build trust in the system and reduce anxiety, yet the rigid, structured nature of many AI tutors limits opportunities for spontaneous empathy and relational continuity, potentially eroding belonging and diminishing the depth of the teacher‑student bond [71]. Together, these findings suggest that while AI can foster system‑level trust, it may fall short of sustaining the empathic and relational quality that human teachers provide.

Social skill development has received limited longitudinal attention. In one collaborative learning study, participants reported empathy toward peers who contributed less and described conflict resolution as “understanding and respectful,” indicating that group work fostered respectful communication and empathy. However, the study did not compare AI‑mediated with human‑mediated group work, leaving the impact of AI on social skill trajectories unclear. A mixed‑methods review of AI tutoring noted that AI systems may reduce opportunities for meaningful dialogue and reflection, which are critical for developing social and collaborative skills, and expressed concern that this may indirectly affect social‑cognitive competencies over time. These reports collectively underscore that AI tutoring can influence social skill development, but the direction and magnitude of these effects over longer periods remain underexplored.

The influence of AI design features on affective and social outcomes has been a focal point in recent systematic reviews. A comprehensive review of affective AI identified emotion recognition, adaptive feedback, empathic agents, and emotion‑regulation guidance as the primary affordances linked to empathy and social support, yet it also highlighted a dearth of longitudinal studies examining how these features affect trust, belonging, or loneliness over time. The 4‑week chatbot RCT further illustrated that affective feedback (an engaging voice) and personalization (personal conversation topics) can enhance perceived empathy, which in turn correlates with higher socialization and lower loneliness [1]. These findings suggest that affective design features can modulate relational outcomes, but rigorous longitudinal evidence is needed to determine whether such effects persist beyond short‑term interventions.

Despite these insights, methodological gaps persist. Many studies rely on cross‑sectional or short‑term designs, use non‑validated or ad‑hoc measures of belonging, loneliness, or empathy, and lack direct comparison between AI and human instruction. The scoping review of chatbots noted that only four studies employed repeated measures to assess motivation over time, and the effect sizes were modest, underscoring the scarcity of robust longitudinal data on affective and social trajectories [8]. Moreover, the 2024 review of affective AI emphasized the need for validated scales and mixed‑methods approaches to capture the nuanced interplay between AI affordances and relational dynamics over extended periods. Addressing these gaps will require longitudinal mixed‑methods studies that integrate validated psychometric instruments with fine‑grained interaction logs, allowing researchers to disentangle the mechanisms through which AI design features influence affective and social development.

Having outlined the current evidence and identified key methodological limitations, the following section will examine how AI tutoring impacts metacognition and autonomy development over the long term.

4.4. Metacognition and Autonomy Development

The longitudinal literature on AI tutoring and its impact on metacognition and autonomy development is still emerging, yet several meta‑analytic and empirical studies provide converging evidence that adaptive, metacognitive‑supporting features can foster self‑regulation and autonomous learning over time. A meta‑analysis of 49 SRL‑training studies, which included an intensive field experiment with 257 university students monitored daily over 30 days, reported a medium‑size overall effect for SRL training (g = 0.38) and larger effects for metacognitive (g = 0.40) and resource‑management (g = 0.39) strategies [73]. The adaptive online feedback component—self‑regulation prompts that guide learners to monitor and adjust their study behaviors—was identified as a key mechanism, partially mediating the relationship between SRL activity and academic performance [73].

Longitudinal growth in metacognitive skillfulness is documented in a two‑year Dutch secondary‑school study of 32 students. Both the quantity and quality of metacognitive planning, monitoring, and evaluating increased across time, with planning gains particularly pronounced in mathematics and evaluation gains more salient in history tasks. These findings illustrate that sustained exposure to metacognitive scaffolds can produce measurable developmental trajectories in both general and domain‑specific metacognitive regulation [43].

Autonomy development, as indexed by self‑regulated learning (SRL) metrics and inferred from engagement patterns, shows nuanced patterns depending on AI design and learner achievement. In a high‑school physics context, compulsory AI‑generated heuristic hints enhanced achievement for low‑achievers (+0.673) but were associated with a decline in SRL for high‑achievers (–0.477). Conversely, on‑demand AI help preserved autonomy for high‑achievers while reducing it for low‑achievers, especially in the technical‑psychological dimension of autonomy (–0.549) [33]. These results suggest that learner control over the timing and content of AI feedback is a critical moderator of autonomy trajectories.

Contextual moderators further shape the trajectory of metacognitive and autonomous learning. Adaptive feedback and self‑assessment prompts are most effective when paired with explicit goal‑setting and autonomy‑supportive scaffolding, as evidenced by a higher‑order effect on planning skills (d = 0.77) and self‑assessment support reported in a systematic review of ITS interventions [17]. However, the same review highlighted that many ITS studies report limited standardized effect sizes for metacognitive outcomes and that the influence of cultural, socioeconomic, and developmental factors remains underexplored [51]. Theoretical work on metacognition underscores the importance of design features that promote self‑monitoring and reflection, yet empirical evidence indicates that current ITS implementations often provide insufficient metacognitive scaffolding to fully support autonomous decision‑making [25].

Despite these promising findings, gaps persist. Few studies directly compare AI tutoring with human instruction on longitudinal metacognitive and autonomy outcomes, and many rely on cross‑sectional or short‑term designs that cannot capture sustained psychological trajectories. Standardized, validated instruments for measuring metacognition and autonomy over extended periods are scarce, limiting the ability to compute reliable effect sizes and to disentangle the relative contributions of AI design features versus contextual moderators.

Having examined metacognition and autonomy development, the following section will explore deskilling and transfer effects.

4.5. Deskilling and Transfer Effects

Empirical evidence on deskilling and transfer effects remains sparse, and the available literature largely frames deskilling as a theoretical risk rather than a documented empirical outcome. Xue et al. (2022) articulated that sustained, automated tutoring lacking active problem‑solving or reflection can erode domain knowledge and autonomy, thereby fostering a decline in decision‑making competence and critical‑thinking dispositions [2]. In contrast, the systematic review of K‑12 intelligent tutoring systems found no study that explicitly assessed deskilling mechanisms, reporting instead that all examined outcomes focused on learning gains, retention, or problem‑solving performance, with no evidence of reduced skill when AI tutoring was discontinued [31]. The absence of longitudinal data on deskilling suggests that the phenomenon has not yet been empirically validated, and that the theoretical concerns raised by Xue et al. remain untested in real‑world contexts.

Transfer of skills to novel contexts is similarly under‑explored. While Ökördi et al. (2023) demonstrated that students who completed more than half of their eDia sessions maintained improved multiplication and division skills at a three‑month follow‑up, the study did not examine whether these gains generalized to unrelated problem‑solving tasks or to different subject domains [31]. Nehring et al. (2022) reported sustained gains on a mathematics exam after an entire academic year of ALEKS PPL integration, yet again no transfer assessment beyond the tested domain was reported [31]. The review’s design‑feature–outcome mapping highlighted that adaptive hints, step‑by‑step prompts, and immediate personalized feedback were linked to knowledge retention, but it did not include any metrics of transfer or cross‑domain application [31]. Consequently, the literature does not yet provide evidence that AI tutoring facilitates durable transfer of skills to new contexts.

The mitigating factor of human involvement and engagement, identified as essential for counteracting deskilling, has also not been empirically quantified in longitudinal studies. Design‑feature analyses in the systematic review suggest that hybrid or human‑in‑the‑loop models may preserve autonomous problem‑solving, yet no longitudinal data directly compare such designs with fully automated AI tutors on deskilling or transfer outcomes [31]. This gap underscores the need for future mixed‑methods studies that embed process logs of human‑AI interaction, validated measures of skill transfer, and repeated assessments of domain knowledge over extended periods.

Given the current evidence base, researchers should treat deskilling and transfer as priority gaps in the literature. Longitudinal mixed‑methods designs that explicitly incorporate design‑feature metadata (e.g., frequency of autonomous problem‑solving prompts, degree of human scaffolding) and that assess both retention and transfer across domains are required to determine whether AI tutoring sustains or diminishes learners’ broader cognitive capacities. Until such data are available, conclusions about the long‑term psychological safety and efficacy of AI tutoring remain tentative.

Having identified the paucity of empirical data on deskilling and transfer, the following section will evaluate the comparative effectiveness of AI tutoring relative to human instruction.

4.6. Comparative Effectiveness

The comparative effectiveness of AI tutoring relative to human instruction can be examined across the five psychological domains that constitute the core of this review: cognition, affect, motivation, metacognition, and relational quality. Meta‑analytic and longitudinal studies that directly contrast AI‑driven instruction with teacher‑led instruction provide the most reliable evidence for effect size estimation and for identifying mechanisms that mediate these effects.

Cognitive outcomes. VanLehn’s 2011 meta‑analysis reports that human tutoring and intelligent tutoring systems (ITS) achieve comparable learning gains (d = 0.79 vs 0.76) [51], suggesting that AI tutors can reproduce the knowledge‑acquisition benefits of traditional one‑on‑one instruction. A more recent systematic review of 19 ITS studies (Xu et al., 2019) finds that ITS outperform human tutoring by a small but significant margin (g = 0.20, 95 % CI 0.02–0.38) [80], indicating that, on average, AI tutors provide at least equivalent, and sometimes superior, cognitive benefits. Longitudinal field trials reinforce this pattern: in the elementary‑school trial described in Section 4.1, AI‑mediated instruction maintained improved arithmetic skills at the three‑month follow‑up (d ≈ 0.33) without evidence of decay, whereas the control group showed smaller gains [31]. Across domains, the consistent finding is that adaptive feedback loops and step‑by‑step scaffolding—core design features of ITS—mediate the transfer of knowledge from immediate practice to long‑term retention.

Affective and motivational outcomes. The systematic review of AI applications in higher education (2019) reports that ITS are associated with a moderate positive effect on engagement but are less effective than human tutoring in fostering intrinsic motivation and self‑efficacy, with effect sizes typically falling below d = 0.30 [11]. In contrast, several longitudinal K‑12 studies that compare ITS to teacher instruction report medium‑to‑large gains in motivation and self‑efficacy. For example, the large‑enrollment physics study using a Socratic AI chatbot reported a medium effect on self‑reported motivation (d ≈ 0.43) that persisted over a 4‑week period [51], and the 12‑week hybrid human–AI trial reported sustained increases in perceived autonomy and lesson completion (d ≈ 0.45) [64]. These findings suggest that the affective impact of AI tutors is highly contingent on the presence of autonomy‑supportive design features and on the degree of human scaffolding that accompanies the AI component.

Metacognitive outcomes. Meta‑analytic evidence indicates that ITS can enhance self‑regulation and metacognitive monitoring, albeit with smaller effect sizes than for knowledge acquisition (g = 0.38 for SRL training, g = 0.40 for metacognition) [73]. Longitudinal mixed‑methods studies that compare ITS to teacher instruction show that AI tutors can increase planning and monitoring behaviors when they provide explicit goal‑setting prompts and real‑time feedback. In a 2‑year Dutch secondary‑school study, metacognitive planning increased across time for students exposed to ITS, with the effect being strongest in mathematics (d ≈ 0.55) [43]. When the AI system is coupled with human facilitation, the gains are amplified, suggesting that hybrid models reduce the risk of over‑reliance and preserve autonomous problem‑solving.

Relational outcomes. Purely AI‑mediated instruction tends to score lower on measures of belonging, trust, and perceived empathy compared to human teachers (effect sizes < 0.20) [61]. However, hybrid instructional designs that retain periodic human interaction can mitigate these deficits. In the 12‑week hybrid trial, relational quality as measured by teacher‑student rapport increased by d ≈ 0.30, whereas the AI‑only condition showed no change [64]. These results highlight the importance of relational dynamics as a moderator: the presence of a human mentor can preserve the affective benefits that AI tutors alone cannot generate.

Mechanisms and moderators. Across domains, the mechanisms that consistently emerge are (1) adaptive feedback and real‑time scaffolding, which directly support cognitive and metacognitive gains; (2) Socratic dialogue and problem‑solving prompts, which enhance affective engagement; and (3) autonomy‑supportive features (choice, goal‑setting) that foster motivation and self‑efficacy. Moderators that shape the magnitude of these effects include developmental stage (middle‑school students benefit most from ITS), socioeconomic status (lower‑income students experience smaller gains unless paired with supportive infrastructure), cultural responsiveness (localized content boosts engagement in collectivist contexts), and relational dynamics (human scaffolding buffers negative affective outcomes). The interaction between AI design features and these moderators is best captured in mixed‑methods longitudinal designs that embed interaction logs, validated psychometric instruments, and contextual data.

Summary of key effect sizes. The table below presents the most salient effect sizes for cognitive, affective, metacognitive, and relational outcomes from studies that directly compare ITS to human instruction. The values capture both the magnitude of the effect and the direction of the comparison (positive values indicate superior ITS performance). Where studies report confidence intervals, they are included in parentheses.

| Domain | Study | Effect Size (ITS – Human) | 95 % CI | Moderator(s) | Citation |
|--------|-------|---------------------------|---------|--------------|----------|
| Cognition | Xu et al. (2019) | 0.20 | (0.02, 0.38) | Age, subject | [80] |
| Cognition | VanLehn (2011) | –0.03 | | Design features | [51] |
| Affect & Motivation | 2019 AI review | –0.25 | | Relational support | [11] |
| Affect & Motivation | Socratic AI chatbot (physics) | 0.43 | | Autonomy‑support | [51] |
| Metacognition | 2‑year Dutch study | 0.55 | | Subject specificity | [43] |
| Metacognition | Hybrid human–AI trial | 0.45 | | Hybrid design | [64] |
| Relational | AI‑only vs Human | –0.18 | | Human scaffolding | [61] |
| Relational | Hybrid human–AI trial | 0.30 | | Hybrid design | [64] |

These effect sizes illustrate that, while AI tutoring can match or exceed human instruction on cognitive measures, the advantages in affective, motivational, metacognitive, and relational domains are more variable and heavily moderated by design and contextual factors.

Having outlined the comparative effectiveness across psychological domains, the following section will examine the design, policy, and ethical implications of scaling AI‑mediated instruction.

5. Design, Policy, and Ethical Implications

This section translates the empirical and theoretical findings reviewed above into actionable guidance for scaling AI‑mediated instruction responsibly. Section 5.1 derives design implications for AI tutors, focusing on adaptive feedback, affective monitoring, gamification, hybrid instructional pathways, and cultural responsiveness. Section 5.2 addresses teacher and institutional implications, including professional development, workload, digital infrastructure, and data governance. Section 5.3 examines the policy and ethical considerations (transparency, privacy, bias mitigation, accountability, and equitable access) that govern responsible deployment. The following subsections elaborate each of these strands in turn, before Section 6 outlines future research directions.

5.1. Design Implications for AI Tutors

Adaptive feedback must remain the core proximal mechanism of AI tutoring systems, providing real‑time, context‑sensitive hints that are calibrated to a learner’s current mastery level and that explicitly disclose the reasoning behind each recommendation to support transparency and trust [35][63]. Affective monitoring should function as a bidirectional loop: the system detects signs of disengagement or frustration through multimodal signals (e.g., facial expression, keystroke dynamics) and responds with empathic prompts that are grounded in the learner’s emotional state, thereby mitigating the risk of deskilling and sustaining intrinsic motivation [61][71]. Gamification elements—such as badges, levels, and narrative storytelling—should be deployed with caution; designers must balance extrinsic rewards against the potential for over‑reliance, ensuring that progress metrics remain linked to demonstrable skill development rather than solely to engagement scores [32]. Hybrid instructional pathways that alternate AI‑driven micro‑sessions with human‑facilitated reflection can preserve relational quality while still capitalizing on the scalability of AI, aligning with evidence that human scaffolding buffers declines in belonging and trust [64]. To safeguard equity, AI tutors must be culturally responsive: language models should be trained on diverse corpora, content should be localized to reflect regional contexts, and user interfaces should be accessible to learners with varying levels of digital literacy and device capability [47][78]. Offline‑capable architectures and adaptive bandwidth management should be incorporated to accommodate environments with intermittent connectivity, ensuring that learners in low‑resource settings can still benefit from adaptive feedback and affective monitoring [47]. Finally, design documentation should include transparent algorithmic explanations and privacy safeguards, enabling educators and learners to interrogate AI decisions and to verify that data handling complies with regulatory standards.
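
As one concrete, purely illustrative way to operationalize hints that are “calibrated to a learner’s current mastery level,” the sketch below applies a standard Bayesian Knowledge Tracing update to estimate mastery from response correctness and gates hint intensity on that estimate. The parameter values, thresholds, and hint labels are hypothetical and would need empirical calibration in any real system.

```python
from dataclasses import dataclass

@dataclass
class BKTSkill:
    p_known: float = 0.2    # prior probability the skill is mastered (hypothetical)
    p_transit: float = 0.1  # probability of learning at each practice opportunity
    p_slip: float = 0.1     # probability of erring despite mastery
    p_guess: float = 0.2    # probability of answering correctly without mastery

    def update(self, correct: bool) -> float:
        """Posterior mastery after one response, followed by the learning step."""
        if correct:
            num = self.p_known * (1 - self.p_slip)
            den = num + (1 - self.p_known) * self.p_guess
        else:
            num = self.p_known * self.p_slip
            den = num + (1 - self.p_known) * (1 - self.p_guess)
        posterior = num / den
        self.p_known = posterior + (1 - posterior) * self.p_transit
        return self.p_known

def choose_hint(p_known: float) -> str:
    """Gate hint intensity on estimated mastery (thresholds are illustrative)."""
    if p_known < 0.4:
        return "worked_example_with_rationale"
    if p_known < 0.75:
        return "targeted_step_hint"
    return "prompt_self_explanation_no_hint"

skill = BKTSkill()
for response in [False, False, True, True, True]:
    mastery = skill.update(response)
    print(round(mastery, 2), choose_hint(mastery))
```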

Having outlined these design recommendations, the following section will discuss teacher and institutional implications for AI tutoring.

5.2. Teacher and Institutional Implications

Teachers and educational institutions must view AI tutoring not as a replacement for human instruction but as an augmentation that requires deliberate pedagogical, infrastructural, and ethical scaffolding. To ensure that AI systems contribute positively to long‑term psychological outcomes, professional development should first equip educators with a dual literacy: (1) an understanding of the technical affordances—adaptive feedback, affective monitoring, and data analytics—enabled by AI tutors, and (2) a pedagogical lens that foregrounds how these affordances can be harnessed to support autonomy, metacognition, and relational quality. Training modules should therefore combine hands‑on exploration of analytics dashboards with reflective exercises on how AI‑generated insights can inform lesson pacing, formative assessment, and individualized support plans, thereby preserving agency for both students and educators and mitigating the risk of over‑reliance on algorithmic recommendations [64][71].

Blended instructional models that interlace AI micro‑sessions with human‑facilitated reflection are essential for maintaining relational depth. Scheduling protocols can be designed to alternate short AI‑driven practice blocks—during which adaptive hints and affective prompts are delivered—with teacher‑led discussion or peer‑collaborative activities that allow students to articulate understanding, negotiate meaning, and work through errors. This sequencing not only re‑establishes the socio‑emotional scaffolding that AI systems typically lack but also provides teachers with opportunities to calibrate AI‑generated feedback against classroom dynamics, ensuring that the system’s interventions remain contextually relevant and socially responsive. Empirical evidence indicates that such hybrid cycles buffer declines in belonging and trust that can emerge in AI‑only contexts, underscoring the necessity of intentional human touchpoints [61][64].

Teacher workload implications arise from the additional responsibilities of interpreting AI analytics, adjusting instructional pacing, and integrating AI‑generated insights into lesson plans. Educators must allocate time for reviewing performance dashboards, reflecting on AI‑suggested scaffolds, and reconciling algorithmic recommendations with pedagogical judgment. Studies show that without structured support, teachers may experience cognitive overload and reduced instructional efficacy, particularly when AI systems generate high volumes of data that require manual triage [61]. To alleviate this burden, institutions should embed dedicated time within teacher schedules for AI‑tutoring professional learning communities, peer coaching, and continuous feedback loops that streamline data interpretation and decision‑making.

Institutional strategies must support the technical and pedagogical dimensions of AI integration. First, robust digital infrastructure—high‑bandwidth connectivity, secure cloud storage, and offline‑capable platforms—must be deployed to eliminate the digital divide that disproportionately affects low‑income and rural learners [6][21][47]. Second, data governance protocols should be instituted to ensure that learner interactions with AI tutors are anonymized, audited, and compliant with privacy regulations, thereby protecting vulnerable populations from re‑identification or algorithmic bias. Third, continuous monitoring dashboards should track cognitive, affective, and relational metrics across socioeconomic strata, allowing institutions to detect and address inequities early in the deployment cycle [32][61]. Fourth, policy frameworks at the institutional level should mandate that AI tutoring platforms undergo rigorous cultural responsiveness testing before adoption, ensuring that content, language models, and user interfaces are aligned with local norms and values [47][78]. Finally, budgetary allocations must encompass not only hardware and connectivity but also ongoing professional development, technical support staff, and evaluation resources to sustain the system’s effectiveness over time.

In sum, effective teacher and institutional engagement with AI tutoring hinges on a synergistic blend of professional development, hybrid instructional design, equitable infrastructure, and sustained resource allocation. By embedding these practices into the fabric of schools and higher‑education settings, educators can safeguard relational support, promote sustained psychological development, and ensure that AI systems serve as amplifiers rather than substitutes for human pedagogical expertise. Having outlined teacher workload and institutional resource needs, the following section will examine the broader policy and ethical considerations that shape the responsible deployment of AI tutors.

5.3. Policy and Ethical Considerations

AI tutoring systems operate within a complex regulatory and ethical ecosystem that shapes their design, deployment, and impact on learners. In the European Union, the AI Act classifies educational AI as a high‑risk system, mandating pre‑market registration, conformity assessment, and continuous post‑deployment monitoring, thereby establishing a formal accountability framework that extends to data protection, non‑discrimination, and traceability obligations for AI tutors [46]. Complementarily, U.S. federal policy—articulated through the AI Executive Order, the National Institute of Standards and Technology AI Risk Management Framework (AI RMF 1.0), and the Government Accountability Office’s accountability framework—provides voluntary, structured guidance for risk identification, mitigation, and documentation that can be leveraged to demonstrate compliance with privacy, equity, and safety standards in educational settings [62]. Together, these regulatory architectures create a baseline expectation that AI tutoring platforms must embed transparency, bias mitigation, and ongoing auditability into their development cycles.

Data privacy for AI tutors must reconcile multiple jurisdictional mandates. In the United States, the Family Educational Rights and Privacy Act (FERPA) protects student education records, while the Children’s Online Privacy Protection Act (COPPA) imposes stringent consent and data‑use restrictions for learners under 13; both require that AI tools limit data sharing and provide clear privacy disclosures to parents and students [37]. The EU’s General Data Protection Regulation (GDPR) further elevates data‑subject rights, demanding explicit consent, purpose limitation, and the right to erasure, which are particularly salient when AI tutors collect and process sensitive learning data [46]. Harmonizing these frameworks necessitates privacy‑by‑design principles—such as data minimization, differential privacy, and federated learning—so that AI tutors can operate across borders while safeguarding student confidentiality.
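
As a minimal illustration of the differential‑privacy element of privacy‑by‑design mentioned above, the sketch below releases a bounded, classroom‑level aggregate with Laplace noise. The statistic, sensitivity bound, and epsilon value are hypothetical choices for illustration only, not a prescribed configuration.

```python
import numpy as np

def laplace_release(true_value, sensitivity, epsilon, rng=None):
    """Release a statistic with epsilon-differential privacy via the Laplace mechanism."""
    rng = rng or np.random.default_rng()
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Mean engagement score bounded in [0, 1] for a class of 30 students:
# changing one student's record shifts the mean by at most 1/30 (the sensitivity).
noisy_mean = laplace_release(true_value=0.62, sensitivity=1 / 30, epsilon=0.5)
print(round(noisy_mean, 3))
```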

Bias mitigation in AI tutoring is both a legal and ethical imperative. The EU AI Act’s non‑discriminatory requirement compels developers to integrate bias‑mitigation techniques, though it does not prescribe specific methods; this has prompted the adoption of algorithmic hygiene practices—including bias impact statements, inclusive data curation, and cross‑functional design teams—to proactively identify and address disparate outcomes across protected groups [9][46]. Brookings’ policy analysis further recommends regulatory sandboxes and safe‑harbor provisions that allow controlled experimentation with sensitive data, thereby balancing innovation with equity safeguards [9]. Operationalizing these measures requires systematic bias‑testing protocols, such as sensitivity audits and proxy‑variable analyses, that are embedded into the AI tutor’s lifecycle and reported through transparent documentation.

Accountability for AI tutors extends beyond regulatory compliance to encompass institutional responsibility and stakeholder engagement. The EU AI Act establishes a formal complaints mechanism, enabling users to report adverse outcomes to national authorities and requiring high‑risk AI systems to maintain evidence of safety and non‑discrimination compliance [46]. In the U.S., the GAO report and NIST AI RMF provide a framework for independent evaluation, stakeholder consultation, and continuous monitoring, which can be operationalized through third‑party audits, model‑card disclosures, and real‑time performance dashboards [62]. Embedding such accountability mechanisms into school IT governance structures supports traceability, fosters trust, and ensures that AI tutors remain responsive to evolving pedagogical and ethical standards.

Equitable access to AI tutoring must address the digital divide, device scarcity, and cultural relevance. Empirical evidence indicates that socioeconomic status, broadband availability, and device ownership are strongly correlated with AI tutoring uptake and effectiveness; thus, policy interventions that subsidize connectivity and hardware for low‑income households are essential to prevent exacerbation of achievement gaps [16]. Moreover, localization strategies—such as training language models on diverse corpora, adapting content to regional curricula, and ensuring interface accessibility for learners with varying digital literacy—are critical for mitigating cultural bias and fostering inclusive participation [16][37]. Institutional policies should mandate culturally responsive design reviews and community‑engaged testing before deployment, thereby aligning AI tutoring with local norms and educational expectations.

In sum, responsible AI tutoring requires an integrated governance framework that simultaneously enforces data privacy, implements rigorous bias‑mitigation protocols, sustains accountability through transparent documentation and stakeholder oversight, and guarantees equitable access via infrastructural investment and cultural adaptation. Future research should evaluate the effectiveness of these policy mechanisms in longitudinal studies that capture both learning outcomes and psychological trajectories, thereby ensuring that AI tutoring systems contribute to equitable, ethically sound educational ecosystems.

Next, we will outline future research directions that can further refine these policy and ethical guidelines.

6. Future Research Directions

Future research should be anchored in longitudinal, design‑aware investigations that directly map AI tutor affordances onto sustained psychological trajectories. Building on the gaps identified across the empirical, theoretical, and contextual analyses, the section is organized into three interrelated strands: (1) longitudinal mixed‑methods design guidelines that operationalize AI design features as moderators; (2) strategies for disentangling moderator effects within intersectional contexts; and (3) experimental designs for hybrid human–AI tutoring that preserve relational depth while exploiting AI scalability. Together, these strands provide a roadmap for scholars and practitioners seeking to generate robust, generalizable evidence on the long‑term psychological impacts of AI‑mediated instruction.

Longitudinal Mixed‑Methods Design Guidelines
To capture the dynamic interplay between AI affordances and psychological outcomes, future studies must embed AI‑tutor design variables as time‑varying moderators within longitudinal mixed‑effects models. This approach requires systematic logging of AI‑generated interactions—such as hint frequency, affective prompts, and adaptive pacing—alongside repeated measures of validated psychological constructs (motivation, self‑efficacy, metacognition) [31][35]. Integrating quantitative growth‑curve analyses with qualitative process tracing enables researchers to interpret how specific design features influence developmental trajectories while revealing contextual nuances that would remain hidden in purely quantitative studies. Staggered‑entry designs and cross‑lagged panel models can further disentangle causal pathways and mitigate reverse‑causality concerns that have limited prior cross‑sectional comparisons.
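
A minimal sketch of such a model is shown below, assuming a hypothetical long‑format dataset in which each row is one student at one assessment wave and the logged AI‑feature variables (here, hint_rate and affect_prompts) summarize the interval preceding that wave. The growth curve is fitted with the mixed‑effects implementation in statsmodels; all file and variable names are assumptions for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format file: one row per student per wave, with the outcome
# and interval-level AI interaction features logged by the tutoring system.
df = pd.read_csv("longitudinal_waves.csv")

# Growth-curve model: motivation trajectories, with time-varying AI design
# features entered as moderators of the slope (interactions with wave).
model = smf.mixedlm(
    "motivation ~ wave * (hint_rate + affect_prompts) + digital_literacy",
    data=df,
    groups=df["student_id"],
    re_formula="~wave",   # random intercept and random wave slope per student
)
result = model.fit()
print(result.summary())
```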

Moderator Disentanglement and Intersectionality
The influence of developmental stage, socioeconomic status, cultural context, and relational dynamics on AI‑tutor effectiveness has been documented in isolated studies, yet systematic, intersectional analyses remain scarce. Future research should adopt factorial longitudinal designs or hierarchical Bayesian modeling to estimate cross‑level interactions among these moderators, thereby quantifying age‑specific responsiveness, digital equity, and cultural localization [15][32]. By treating moderator variables as random effects, scholars can assess how digital literacy and device access compound with technology proficiency to shape long‑term psychological trajectories. Such analyses will also illuminate equity‑focused outcomes—e.g., differential gains in self‑efficacy across income strata—that have been underreported in the existing literature.

Hybrid Human–AI Tutoring Experimental Designs
Hybrid instructional models, which interleave AI‑driven micro‑sessions with teacher‑facilitated reflection, hold promise for balancing scalability with socio‑emotional support. Future experimental studies should evaluate hybrid designs through randomized controlled trials that compare AI‑only, teacher‑only, and hybrid conditions across multiple cohorts and settings. Key outcome metrics should include relational constructs (belonging, trust), autonomy indices, and long‑term retention of skills, as these have been identified as critical yet understudied domains [57][64]. Hybrid studies should operationalize the timing and frequency of human–AI interactions to determine optimal sequencing that maximizes psychological benefits without inducing cognitive overload or deskilling. Incorporating adaptive pacing algorithms that modulate the proportion of AI versus human input over time will also provide insights into how dynamic adjustment of instructional balance influences developmental trajectories.
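
A sketch of the allocation step for such a trial, assuming school‑blocked randomization to the three conditions, is given below; the roster file, column names, and arm labels are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
roster = pd.read_csv("student_roster.csv")   # hypothetical: student_id, school_id

ARMS = ["ai_only", "teacher_only", "hybrid"]

def assign_block(block: pd.DataFrame) -> pd.DataFrame:
    """Balanced random assignment to the three arms within one school (block)."""
    reps = int(np.ceil(len(block) / len(ARMS)))
    arms = np.tile(ARMS, reps)[: len(block)]
    block = block.copy()
    block["arm"] = rng.permutation(arms)
    return block

assigned = roster.groupby("school_id", group_keys=False).apply(assign_block)
print(assigned["arm"].value_counts())
```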

Having outlined these future research directions, the next section will present the study’s conclusions.

6.1. Longitudinal Mixed‑Methods Design Guidelines

To design longitudinal mixed‑methods studies that rigorously assess the long‑term psychological effects of AI tutoring versus human instruction, researchers should adopt a systematic framework that integrates quantitative power planning, robust assessment tools, detailed design‑feature data collection, strategically spaced measurement points, proactive attrition mitigation, and comprehensive ethical safeguards.

Sample‑size and power planning
A priori power analyses should be anchored to the smallest effect size of practical interest among those reported in the literature. To detect small effects (Cohen’s d ≈ 0.25–0.30), roughly 200–250 participants per group are required to achieve 80 % power with a two‑tailed α = 0.05 [31]. Longitudinal designs that include multiple measurement occasions and time‑varying exposures necessitate larger samples because the efficiency of the design is reduced when exposure varies across subjects; an additional 20–30 % buffer is therefore recommended to accommodate attrition and maintain power [28]. When exposure is binary and fluctuates over time, the intraclass correlation (ICC) of exposure and its prevalence at each assessment should be estimated from pilot data or prior studies, and the public‑access program described in the literature can be used to compute the required cohort size [28]. For studies that rely on large administrative datasets, a minimum of 10 000 person‑level observations can provide sufficient precision to detect small effect sizes, whereas laboratory experiments may require 300–500 participants with repeated measures to attain comparable power [39][65]. Randomized controlled trials (RCTs) that recruit representative samples across multiple sites further enhance external validity and facilitate meta‑analytic synthesis [27].
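
The core calculation can be reproduced with the power module in statsmodels, as in the sketch below; the effect sizes and the 25 % attrition buffer are illustrative planning values rather than figures from any single cited study.

```python
import math
from statsmodels.stats.power import TTestIndPower

power_solver = TTestIndPower()

for d in (0.25, 0.30, 0.50):
    # Required sample size per group for a two-tailed test, alpha = .05, power = .80
    n_per_group = power_solver.solve_power(
        effect_size=d, alpha=0.05, power=0.80, alternative="two-sided"
    )
    # Inflate recruitment for an expected 25 % attrition across follow-up waves
    n_recruit = math.ceil(n_per_group / (1 - 0.25))
    print(f"d = {d:.2f}: analyze {math.ceil(n_per_group)} per group, recruit {n_recruit}")
```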

Assessment‑tool selection
Pre‑ and post‑tests should employ domain‑specific standardized instruments to ensure comparability across studies and to enable cross‑study synthesis. For cognitive outcomes, standardized achievement tests or validated domain‑specific measures should be used; for affective and relational outcomes, validated psychometric scales such as the Student‑Teacher Relationship Scale, the Academic Trust Scale, or the Intrinsic Motivation Inventory are recommended [31]. When new instruments are introduced, they should be validated against established benchmarks and reported with reliability and validity statistics. For relational and ethical outcomes, brief, validated daily or weekly scales can be used to capture dynamic changes in trust, belonging, or perceived fairness without imposing excessive burden [27].

Design‑feature data collection
All relevant AI‑tutor design variables (e.g., level of adaptivity, feedback modality, pacing, transparency, gamification intensity) should be logged systematically. Usage logs that record time on task, error patterns, adaptive pathways, and interaction sequences provide quantitative data that can be integrated as moderators in mixed‑effects models [31]. Complementary qualitative data—student interviews, teacher focus groups, or open‑ended survey items—should be collected concurrently to capture perceived agency, trust, and engagement, and to triangulate the quantitative findings [31]. Design‑feature data should be encoded as time‑varying covariates in growth‑curve or hierarchical linear models, allowing researchers to quantify the contribution of each feature to learning gains and psychological trajectories [7].
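
The sketch below illustrates one way to collapse raw AI‑tutor event logs into wave‑level, time‑varying covariates that can be merged with the psychometric assessments for mixed‑effects modeling; the file names, event types, and column names are hypothetical.

```python
import pandas as pd

# Hypothetical raw event log: one row per AI-tutor event, with columns
# student_id, wave, event_type ("hint", "affect_prompt", ...), seconds_on_task
events = pd.read_csv("tutor_event_log.csv")

# Aggregate to one row per student per wave: feature rates and time on task
features = (
    events.groupby(["student_id", "wave"])
    .agg(
        n_events=("event_type", "size"),
        hint_rate=("event_type", lambda s: (s == "hint").mean()),
        affect_prompts=("event_type", lambda s: (s == "affect_prompt").sum()),
        time_on_task=("seconds_on_task", "sum"),
    )
    .reset_index()
)

# Merge with wave-level psychometric assessments (hypothetical file)
assessments = pd.read_csv("wave_assessments.csv")
analysis_df = assessments.merge(features, on=["student_id", "wave"], how="left")
```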

Measurement timing
Baseline measurement is essential to control for pre‑existing differences. Follow‑up assessments should be scheduled at multiple points: immediately post‑intervention, 3 months, 6 months, and 12 months post‑intervention to capture both short‑term and long‑term effects. For interventions that involve repeated exposure (e.g., weekly tutoring sessions), intermediate measurements (e.g., monthly) can provide finer-grained insight into the evolution of psychological outcomes. The timing of measurements should be aligned with the theoretical developmental stages of the participants to maximize sensitivity to change.

Attrition strategies
Attrition is a major threat to internal validity in longitudinal studies. Researchers should implement multi‑layered retention tactics, such as personalized reminders, flexible scheduling, and incremental incentives tied to completion of each assessment wave. Statistical adjustments for attrition should be incorporated into the power calculation (e.g., inflating the target sample size by the expected dropout rate) and into the analysis (e.g., using full‑information maximum likelihood or multiple imputation) to mitigate bias [28]. Monitoring attrition patterns in real time allows for adaptive recruitment strategies to maintain representativeness.
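
For the analysis‑side handling of missingness noted above, the sketch below applies chained‑equations imputation with scikit‑learn; the variable names are hypothetical, and in practice several imputed datasets would be generated and the substantive analyses pooled across them.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

df = pd.read_csv("analysis_dataset.csv")     # hypothetical wide-format file
cols = ["motivation_t1", "motivation_t2", "motivation_t3", "ses", "digital_literacy"]

# Chained-equations imputation; sample_posterior=True draws from the predictive
# distribution so repeated runs yield distinct imputations suitable for pooling.
imputer = IterativeImputer(sample_posterior=True, random_state=0, max_iter=20)
completed = pd.DataFrame(imputer.fit_transform(df[cols]), columns=cols, index=df.index)
```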

Ethical safeguards
Informed consent procedures must explicitly address the collection of sensitive educational data, the use of AI‑generated analytics, and the potential for algorithmic bias. Data privacy should be protected through anonymization, secure storage, and compliance with relevant regulations (e.g., FERPA, GDPR). Transparency about the AI system’s decision rules and the availability of an opt‑out option are essential to foster trust. Equitable access must be ensured by providing devices, connectivity, and support to participants from low‑resource contexts. Finally, researchers should incorporate bias‑mitigation checks (e.g., fairness audits) and report any adverse effects on psychological outcomes, such as increased anxiety or reduced self‑efficacy, to inform responsible deployment. The lack of ethical and relational outcome measures in many existing ITS studies underscores the need for these safeguards [31].

These guidelines collectively provide a structured, evidence‑based blueprint for conducting longitudinal mixed‑methods research that can disentangle the nuanced effects of AI tutoring and human instruction on students’ long‑term psychological trajectories.

Having outlined the design framework, the following section will explore strategies for disentangling moderator effects within intersectional contexts.

6.2. Moderator Disentanglement and Intersectionality

Methodological recommendations for disentangling developmental, socioeconomic, cultural, and AI design‑feature moderators hinge on a rigorous, multilevel, and measurement‑invariant analytical framework that can accommodate the nested and intersecting nature of these factors. First, longitudinal data must be organized into a hierarchical structure that reflects the natural embedding of learners within classrooms, schools, and national contexts. Hierarchical linear (or generalized linear) models allow the partitioning of variance at the individual, classroom, and country levels, thereby enabling researchers to estimate cross‑level interactions between AI‑tutor design features and contextual moderators such as socioeconomic status (SES) and cultural dimensions [15]. Within this framework, developmental variables (e.g., age, grade) can be modeled as random slopes, allowing the effect of AI interventions to vary systematically across developmental stages. For example, the frontiersin study identified adolescence as a period of heightened belongingness sensitivity, suggesting that age should be treated as a moderator in growth‑curve analyses to capture differential trajectories of affective outcomes [15].
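
A minimal sketch of such a multilevel specification, assuming hypothetical long‑format data with students nested in classrooms and a cross‑level interaction between a logged AI design feature and SES, is shown below. statsmodels accommodates the second grouping level through a variance‑component formula, while deeper nesting (e.g., country) would typically be handled in dedicated multilevel software.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("multilevel_waves.csv")   # hypothetical long-format dataset

# Classrooms are the grouping factor; a variance component captures
# student-level clustering of repeated measures within classrooms.
model = smf.mixedlm(
    "belonging ~ wave + adaptivity * ses + adaptivity * collectivism + age",
    data=df,
    groups=df["classroom_id"],
    re_formula="~wave",                      # classroom-level intercept and wave slope
    vc_formula={"student": "0 + C(student_id)"},
)
result = model.fit()
print(result.summary())
```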

Second, measurement invariance testing is essential when psychological constructs (e.g., belongingness, self‑efficacy) are measured across diverse cultural groups. Configural, metric, and scalar invariance should be assessed using multigroup confirmatory factor analysis before any cross‑cultural comparisons of moderator effects are made. Failure to establish invariance risks conflating measurement artifacts with substantive moderator differences; hence, ensuring that the same underlying construct is being measured across cultures protects the validity of intersectional inferences [15]. The web‑localization experiment demonstrates that localization strategies produce disparate effects in collectivistic versus individualistic settings, underscoring the need for invariance testing when evaluating how AI design features interact with cultural norms [74].
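A preliminary, informal check of configural comparability can be scripted before formal multigroup invariance testing is carried out with equality constraints in dedicated SEM software. The sketch below is only an illustration: it assumes the semopy package, simulated item responses, and hypothetical item names, and it fits the same one‑factor belongingness model separately in each cultural group so the loading patterns can be compared; metric and scalar invariance would additionally require constraining loadings and intercepts to equality across groups.

```python
import numpy as np
import pandas as pd
from semopy import Model

# Simulated illustrative item responses for two cultural groups (assumed structure).
rng = np.random.default_rng(1)
rows = []
for culture in ["group_a", "group_b"]:
    factor = rng.normal(size=200)
    items = {f"item{i}": 0.7 * factor + rng.normal(scale=0.5, size=200) for i in range(1, 5)}
    rows.append(pd.DataFrame({**items, "culture": culture}))
df = pd.concat(rows, ignore_index=True)

# One-factor CFA for belongingness, fitted separately per group (configural check).
desc = "belonging =~ item1 + item2 + item3 + item4"
for culture, group_df in df.groupby("culture"):
    model = Model(desc)
    model.fit(group_df.drop(columns="culture"))
    print(culture)
    print(model.inspect())  # compare estimated loadings across groups
```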

Third, AI‑tutor design features should be operationalized as time‑varying covariates that capture the dynamic nature of adaptive feedback, affective monitoring, and gamification. These features can be logged at the micro‑level (e.g., hint frequency, affective prompt timing) and then aggregated to the individual or classroom level. Mixed‑effects models can then estimate the interaction between these design variables and higher‑level moderators. For instance, adaptive feedback may be more effective for learners from high‑SES backgrounds who have greater digital literacy, while gamification may differentially sustain motivation among lower‑SES students who experience higher baseline frustration [25][32][35]. Moreover, multilevel mediation models can test whether relational moderators (e.g., teacher‑student climate) mediate the effect of AI features on psychological outcomes, as suggested by the mediation pathway identified in the Beijing study, in which the parent‑child relationship mediated the link between SES and reading ability [5].
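One way to construct such time‑varying covariates is to aggregate micro‑level interaction logs per student and measurement wave and merge them onto the outcome panel. The following pandas sketch uses tiny, hand‑constructed example tables; the event names, columns, and values are assumptions intended only to show the aggregation step.

```python
import pandas as pd

# Illustrative micro-level interaction log (hypothetical events and column names).
logs = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2],
    "wave":       [1, 1, 2, 1, 1],
    "event_type": ["hint", "affective_prompt", "hint", "hint", "hint"],
})

# Aggregate micro-level events into time-varying covariates per student and wave.
features = (
    logs.groupby(["student_id", "wave"])
    .agg(
        hint_count=("event_type", lambda s: (s == "hint").sum()),
        affective_prompts=("event_type", lambda s: (s == "affective_prompt").sum()),
    )
    .reset_index()
)

# Merge onto the outcome panel so the covariates can enter a mixed-effects model.
panel = pd.DataFrame({"student_id": [1, 1, 2], "wave": [1, 2, 1], "motivation": [4.2, 4.5, 3.8]})
panel = panel.merge(features, on=["student_id", "wave"], how="left").fillna(0)
print(panel)
```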

Fourth, intersectionality demands that researchers simultaneously model multiple moderators and their cross‑level interactions. Bayesian hierarchical structural equation models are particularly suited to this task because they allow the estimation of complex indirect effects while borrowing strength across groups. For example, a Bayesian multilevel SEM could estimate how the effect of an AI‑tutor’s explainability feature on self‑efficacy is moderated by both cultural power‑distance and SES, while also incorporating developmental age as a latent slope. Instrumental variable techniques, such as two‑stage least squares applied in the cross‑national panel study, can address endogeneity concerns that arise when socioeconomic variables are correlated with unobserved cultural factors [45]. Cross‑lagged panel models further enable the disentanglement of temporal precedence, ensuring that observed moderator effects are not merely reflections of reverse causality.
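As one illustration of the Bayesian approach, the simplified sketch below uses PyMC to fit a hierarchical regression (not a full SEM) on simulated data with hypothetical variable names: the slope of an explainability feature on self‑efficacy varies by country and is moderated by a country‑level power‑distance score, while SES moderates the effect at the individual level. It is a sketch of the partial‑pooling logic only, not a definitive specification.

```python
import numpy as np
import pymc as pm

# Simulated illustrative data (all variable names, sizes, and effects are assumptions).
rng = np.random.default_rng(42)
n_students, n_countries = 400, 8
country = rng.integers(0, n_countries, n_students)      # country index per student
explain = rng.normal(size=n_students)                   # exposure to explainability feature
ses = rng.normal(size=n_students)                       # individual-level SES
power_dist = rng.normal(size=n_countries)               # country-level power distance
self_eff = 0.3 * explain + rng.normal(scale=0.5, size=n_students)  # outcome

with pm.Model():
    # Country-specific slope of explainability, partially pooled and
    # moderated by the country's power-distance score.
    mu_slope = pm.Normal("mu_slope", 0.0, 1.0)
    gamma_pd = pm.Normal("gamma_pd", 0.0, 1.0)
    sigma_slope = pm.HalfNormal("sigma_slope", 1.0)
    slope = pm.Normal("slope", mu_slope + gamma_pd * power_dist, sigma_slope,
                      shape=n_countries)

    beta_ses_x = pm.Normal("beta_ses_x", 0.0, 1.0)  # individual-level SES moderation
    intercept = pm.Normal("intercept", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 1.0)

    mu = intercept + slope[country] * explain + beta_ses_x * ses * explain
    pm.Normal("self_eff_obs", mu, sigma, observed=self_eff)

    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```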

Finally, to operationalize these analytical strategies, researchers should adopt a data‑collection protocol that captures detailed AI‑tutor interaction logs, validated psychological scales administered at multiple time points, and contextual covariates at the individual, classroom, and national levels. The integration of AI‑feature metadata into the data pipeline facilitates the creation of time‑varying covariates and enables the use of mixed‑effects modeling to track how the influence of specific AI features evolves over the course of the intervention. By combining multilevel modeling, measurement invariance testing, and intersectional structural equation modeling, future studies will be able to isolate the unique and joint contributions of developmental, socioeconomic, cultural, and AI design‑feature moderators to long‑term psychological outcomes, thereby addressing the current evidence gap identified in the literature review.

Having outlined these methodological recommendations, the following section will explore hybrid human–AI tutoring experimental designs.

6.3. Hybrid Human–AI Tutoring Experimental Designs

Hybrid human–AI tutoring experimental designs must reconcile the need for ecological validity with the rigorous demands of longitudinal inference. A micro‑randomized trial (MRT) framework is particularly well suited to this context, as it permits sequential randomization of treatment states—such as AI‑generated feedback versus human‑generated feedback—at each instructional decision point while preserving the natural classroom flow [57]. In an MRT, each student serves as their own control across many lessons, enabling the estimation of dynamic, time‑varying treatment effects and their moderation by contextual variables such as affective state or prior AI experience. The design also affords the collection of proximal outcomes (e.g., momentary self‑efficacy, intrinsic motivation) immediately after each treatment, and distal outcomes (e.g., learning anxiety, sustained academic performance) at pre‑defined follow‑up intervals, thereby capturing both short‑term responses and long‑term trajectories [57].
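The core logic of this sequential randomization can be sketched in a few lines. In the hypothetical loop below, each instructional decision point is randomized between AI‑generated and human‑generated feedback with a fixed probability and a proximal outcome is recorded immediately afterwards; the function names, probability, and simulated survey values are illustrative placeholders rather than part of any existing platform.

```python
import random

RANDOMIZATION_PROB = 0.5  # probability of AI-generated feedback at each decision point

def deliver_feedback(student_id: int, lesson: int, condition: str) -> None:
    """Placeholder for the tutoring platform's feedback-delivery hook."""
    pass

def survey_momentary_self_efficacy(student_id: int) -> float:
    """Placeholder for a brief post-decision self-efficacy probe (simulated here)."""
    return random.uniform(1.0, 7.0)

def run_decision_point(student_id: int, lesson: int) -> dict:
    """Randomize one instructional decision point and log the proximal outcome."""
    condition = "ai_feedback" if random.random() < RANDOMIZATION_PROB else "human_feedback"
    deliver_feedback(student_id, lesson, condition)
    return {"student": student_id, "lesson": lesson, "condition": condition,
            "proximal_self_efficacy": survey_momentary_self_efficacy(student_id)}

# Each student accumulates many randomized decision points over the semester,
# so each serves as their own control in the subsequent mixed-effects analysis.
log = [run_decision_point(s, l) for s in range(30) for l in range(40)]
```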

To embed relational depth measurement, the MRT should be complemented with a mixed‑methods component that collects qualitative data at strategically chosen decision points. Structured classroom observations and semi‑structured interviews with students and teachers can illuminate how the hybrid interaction evolves over time, revealing shifts in trust, belonging, and perceived agency that are not captured by quantitative scales alone [57]. Validated psychometric instruments—such as the Intrinsic Motivation Inventory, the Self‑Efficacy Scale, and the Learning Anxiety Scale—should be administered at each proximal measurement occasion and at the distal follow‑ups to provide a longitudinal profile of affective and motivational states [25]. Integrating these data streams within a hierarchical mixed‑effects model allows the researcher to examine how AI design features interact with student‑level moderators (e.g., age, socioeconomic status) and classroom‑level moderators (e.g., teacher experience, class size) to influence both relational and psychological outcomes [57].

Scalability is addressed by nesting the MRT within a multi‑level framework that spans students, classes, schools, and regions. By modeling random intercepts and slopes at each level, the design can partition variance attributable to individual differences, classroom dynamics, and institutional contexts, thereby assessing the external validity of the hybrid intervention across diverse settings [57]. Variance partition coefficients (VPCs) derived from this model provide an empirical measure of scalability: a low VPC at the classroom and school levels indicates that the hybrid design generalizes well, whereas a high VPC would signal the need for further contextual tailoring.
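The variance partition coefficients referenced above are simple ratios of a level's variance component to the total variance. The short sketch below computes them from assumed variance estimates of the kind a multilevel model might return; the numbers are illustrative only.

```python
# Illustrative variance components from a three-level model (assumed values).
variance_components = {"student": 0.62, "classroom": 0.08, "school": 0.05}
residual = 0.25
total = sum(variance_components.values()) + residual

# VPC for a level = that level's variance / total variance.
vpcs = {level: var / total for level, var in variance_components.items()}
print(vpcs)  # classroom ~0.08 and school ~0.05 -> modest contextual clustering
```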

Moderation by AI design features is operationalized by treating each feature—such as explainability level, adaptive feedback intensity, or autonomy provision—as a time‑varying covariate in the mixed‑effects model. Interaction terms between these features and student‑specific moderators (e.g., prior AI experience, digital literacy) enable the detection of differential treatment effects that reflect the intersection of technology affordances with learner characteristics [57]. For instance, high explainability may enhance self‑efficacy among students with low digital confidence, while adaptive feedback intensity may differentially affect motivation across socioeconomic strata [25][35].

The longitudinal scope of the MRT must span at least one full academic semester to capture meaningful changes in relational depth and psychological outcomes. Repeated measurement points—baseline, mid‑semester, end‑of‑semester, and a 6‑month post‑test—allow the estimation of growth trajectories while mitigating the risk of attrition bias. Continuous data capture via digital traces (clicks, time stamps, sensor data) supports an unobtrusive measure of engagement, while periodic self‑report surveys balance the data collection burden with the need for rich contextual information [50].

Scalability considerations also require attention to the digital divide. Studies of device scarcity and broadband access underscore the necessity of designing the hybrid intervention to function on low‑bandwidth devices and to provide offline‑capable modules for learners in resource‑constrained environments [6][47]. Incorporating adaptive bandwidth management and progressive content loading ensures that the MRT can be deployed across heterogeneous schools without compromising data integrity.

In summary, a hybrid human–AI tutoring experimental design that integrates micro‑randomized trials, mixed‑methods relational depth assessment, multi‑level scalability modeling, and time‑varying moderation by AI design features offers a comprehensive framework for rigorously evaluating the long‑term psychological effects of AI tutors relative to human instruction. Such a design will generate actionable insights for educators, designers, and policymakers seeking to balance technological efficiency with relational quality.

Next, we will present the study’s conclusions.

7. Conclusion

The present systematic review sought to answer the central question: What are the long‑term psychological effects of growing up with AI tutors versus human teachers? Across the literature, the evidence indicates that AI tutoring can sustain gains in cognition, motivation, and metacognition when it incorporates adaptive feedback, affective monitoring, and transparent decision‑making. Adaptive feedback has been shown to sustain cognitive gains [35] and to support metacognitive skill development [63]. Affective monitoring reduces anxiety and enhances belonging [61][71], while transparent decision‑making improves trust and mitigates negative social outcomes [61][63]. These benefits, however, are moderated by developmental stage, socioeconomic status, and cultural responsiveness. Middle‑school learners, whose cognitive capacities align with adaptive ITS pacing, exhibit the largest benefits [53][60][81]. Lower‑income students experience smaller gains unless supported by robust digital infrastructure, highlighting the moderating role of socioeconomic status [6][21][47]. Cultural responsiveness—through localized content and language‑appropriate interfaces—enhances engagement and reduces bias, particularly in collectivist contexts [55][78]. Hybrid human–AI models further mitigate relational deficits that arise in AI‑only instruction, preserving teacher‑student rapport and belonging [61][63].

Despite these promising findings, the literature remains limited by a scarcity of longitudinal, mixed‑methods studies that simultaneously examine design features, developmental, socioeconomic, and cultural moderators. Existing studies provide short‑term evidence of adaptive feedback and affective monitoring but lack long‑term data on affective, relational, and metacognitive trajectories [2][17][51][61][71]. Moreover, the risk of deskilling and the impact of AI tutoring on relational quality have not been empirically validated over extended periods [2]. These gaps underscore the need for rigorous, longitudinal designs that embed AI‑tutor design metadata as moderators, employ validated psychological scales, and capture contextual variables across diverse settings.

For practice, designers should prioritize adaptive feedback, affective monitoring, and transparent decision‑making, while carefully balancing gamification to avoid over‑reliance. Educators and institutional leaders must integrate human scaffolding through hybrid instructional models, ensuring that relational depth is preserved. Policy makers should mandate transparency, bias mitigation, and equitable access to prevent the exacerbation of existing inequities [46][62]. Interdisciplinary collaboration—between designers, developmental psychologists, educators, data scientists, and ethicists—is essential to translate these evidence‑based recommendations into practice that supports sustained psychological wellbeing across diverse learner populations.

Future research pursuing the methodological directions outlined in Section 6 will be essential to address these gaps.

References

  1. How AI and Human Behaviors Shape Psychosocial Effects of Chatbot Use: A Longitudinal Randomized Controlled Study. Available at: https://arxiv.org/html/2503.17473v1 (Accessed: August 25, 2025)
  2. Savindu Herath Pathirannehelage, Yash Raj Shrestha, Georg von Krogh. (2024). Design principles for artificial intelligence-augmented decision making: An action design research study. European Journal of Information Systems.
  3. Artificial intelligence in language instruction: impact on English learning achievement, L2 motivation, and self-regulated learning. Available at: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2023.1261955/full (Accessed: August 25, 2025)
  4. The role of self-efficacy, motivation, and perceived support of students' basic psychological needs in academic achievement. Available at: https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2024.1385442/full (Accessed: August 25, 2025)
  5. Effects of Socioeconomic Status, Parent–Child Relationship, and Learning Motivation on Reading Ability. Available at: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2018.01297/full (Accessed: August 25, 2025)
  6. (PDF) Influence of Socioeconomic Factors on Access to Digital Resources for Education. Available at: https://www.researchgate.net/publication/380298657_Influence_of_Socioeconomic_Factors_on_Access_to_Digital_Resources_for_Education (Accessed: August 25, 2025)
  7. Power and sample size calculations for longitudinal studies estimating a main effect of a time-varying exposure. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC3777279/ (Accessed: August 25, 2025)
  8. Chatbots and student motivation: a scoping review - International Journal of Educational Technology in Higher Education. Available at: https://educationaltechnologyjournal.springeropen.com/articles/10.1186/s41239-025-00524-2 (Accessed: August 25, 2025)
  9. Algorithmic bias detection and mitigation: Best practices and policies to reduce consumer harms. Available at: https://www.brookings.edu/articles/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms/ (Accessed: August 25, 2025)
  10. Integrating artificial intelligence to assess emotions in learning environments: a systematic literature review. Available at: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2024.1387089/full (Accessed: August 25, 2025)
  11. Systematic review of research on artificial intelligence applications in higher education – where are the educators? - International Journal of Educational Technology in Higher Education. Available at: https://educationaltechnologyjournal.springeropen.com/articles/10.1186/s41239-019-0171-0 (Accessed: August 25, 2025)
  12. Students’ voices on generative AI: perceptions, benefits, and challenges in higher education - International Journal of Educational Technology in Higher Education. Available at: https://educationaltechnologyjournal.springeropen.com/articles/10.1186/s41239-023-00411-8 (Accessed: August 25, 2025)
  13. The effects of over-reliance on AI dialogue systems on students' cognitive abilities: a systematic review - Smart Learning Environments. Available at: https://slejournal.springeropen.com/articles/10.1186/s40561-024-00316-7 (Accessed: August 25, 2025)
  14. Determinants of broadband access and affordability: An analysis of a community survey on the digital divide. Available at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7480260/ (Accessed: August 25, 2025)
  15. School Belonging in Different Cultures: The Effects of Individualism and Power Distance. Available at: https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2017.00056/full (Accessed: August 25, 2025)
  16. Rohit Nishant, Dirk Schneckenberg, MN Ravishankar. (2024). The formal rationality of artificial intelligence-based algorithms and the problem of bias. Journal of Information Technology.
  17. A systematic review of AI-driven intelligent tutoring systems (ITS) in K-12 education. Available at: https://www.nature.com/articles/s41539-025-00320-7 (Accessed: August 25, 2025)
  18. Evaluating the impact of broadband access and internet use in a small underserved rural community. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC9836830/ (Accessed: August 25, 2025)
  19. Adolescents’ use and perceived usefulness of generative AI for schoolwork: exploring their relationships with executive functioning and academic achievement. Available at: https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2024.1415782/full (Accessed: August 25, 2025)
  20. Understanding the Digital Divide in Education. Available at: https://soeonline.american.edu/blog/digital-divide-in-education/ (Accessed: August 25, 2025)
  21. Disparities in Technology and Broadband Internet Access across Rurality: Implications for Health and Education. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC8373718/ (Accessed: August 25, 2025)
  22. Impacts of digital technologies on education and factors influencing schools' digital capacity and transformation: A literature review. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC9684747/ (Accessed: August 25, 2025)
  23. https://www.sciencedirect.com/science/article/abs/pii/S0360131524002264 (Accessed: August 25, 2025)
  24. https://www.sciencedirect.com/science/article/pii/S240584401935604X (Accessed: August 25, 2025)
  25. Yulia Litvinova, Patrick Mikalef, Xin (Robert) Luo. (2024). Framework for human–XAI symbiosis: extended self from the dual-process theory perspective. Journal of Business Analytics.
  26. A Comprehensive Guide to the Bronfenbrenner Ecological Model. Available at: https://www.verywellmind.com/bronfenbrenner-ecological-model-7643403 (Accessed: August 25, 2025)
  27. Design and assessment of AI-based learning tools in higher education: a systematic review - International Journal of Educational Technology in Higher Education. Available at: https://educationaltechnologyjournal.springeropen.com/articles/10.1186/s41239-025-00540-2 (Accessed: August 25, 2025)
  28. https://onlinelibrary.wiley.com/doi/10.1002/sim.3772 (Accessed: August 25, 2025)
  29. Challenging Cognitive Load Theory: The Role of Educational Neuroscience and Artificial Intelligence in Redefining Learning Efficacy. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC11852728/ (Accessed: August 25, 2025)
  30. (PDF) Bronfenbrenner's Ecological Systems Theory. Available at: https://www.researchgate.net/publication/383500583_Bronfenbrenner's_Ecological_Systems_Theory (Accessed: August 25, 2025)
  31. A systematic review of AI-driven intelligent tutoring systems (ITS) in K-12 education. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC12078640/ (Accessed: August 25, 2025)
  32. https://www.mdpi.com/2227-7102/15/3/343. Available at: https://www.mdpi.com/2227-7102/15/3/343 (Accessed: August 25, 2025)
  33. How Students Use AI Feedback Matters: Experimental Evidence on Physics Achievement and Autonomy. Available at: https://arxiv.org/html/2505.08672v2 (Accessed: August 25, 2025)
  34. Integrating artificial intelligence to assess emotions in learning environments: a systematic literature review. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC11223560/ (Accessed: August 25, 2025)
  35. Ersin Dincelli, InduShobha Chengalur-Smith. (2020). Choose your own training adventure: designing a gamified SETA artefact for improving information security and privacy through interactive storytelling. European Journal of Information Systems.
  36. https://www.mdpi.com/2079-9292/13/18/3762. Available at: https://www.mdpi.com/2079-9292/13/18/3762 (Accessed: August 25, 2025)
  37. AI and the Law: What Educators Need to Know. Available at: https://www.edutopia.org/article/laws-ai-education/ (Accessed: August 25, 2025)
  38. Björn Niehaves, Ralf Plattfaut. (2014). Internet adoption by the elderly: employing IS technology acceptance theories for understanding the age-related digital divide. European Journal of Information Systems.
  39. XiaoLi Zhang, Jelle de Vries, René de Koster, ChenGuang Liu. Fast and Faultless? Quantity and Quality Feedback in Order Picking.
  40. AI Tutors: Hype or Hope for Education?. Available at: https://www.educationnext.org/ai-tutors-hype-or-hope-for-education-forum/ (Accessed: August 25, 2025)
  41. AI instructional agent improves student’s perceived learner control and learning outcome: empirical evidence from a randomized controlled trial. Available at: https://arxiv.org/html/2505.22526v1 (Accessed: August 25, 2025)
  42. Koen W. De Bock, Kristof Coussement, Arno De Caigny, Roman Słowiński, Bart Baesens, Robert N. Boute, Tsan-Ming Choi, Dursun Delen, Mathias Kraus, Stefan Lessmann, Sebastián Maldonado, David Martens, María Óskarsdóttir, Carla Vairetti, Wouter Verbeke, Richard Weber. (2024). Explainable AI for Operational Research: A defining framework, methods, applications, and a research agenda. European Journal of Operational Research.
  43. Metacognitive processes, situational factors, and clinical decision-making in nursing education: a quantitative longitudinal study - BMC Medical Education. Available at: https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-024-06467-y (Accessed: August 25, 2025)
  44. Frontiers | Nurturing bonds that empower learning: a systematic review of the significance of teacher-student relationship in education. Available at: https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2025.1522997/full (Accessed: August 25, 2025)
  45. https://journals.sagepub.com/doi/full/10.1177/10693971251358103 (Accessed: August 25, 2025)
  46. EU AI Act: first regulation on artificial intelligence. Available at: https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence (Accessed: August 25, 2025)
  47. Anna Zaitsev, Salla Mankinen. (2022). Designing financial education applications for development: applying action design research in Cambodian countryside. European Journal of Information Systems.
  48. Part III: Bringing Technology into the Classroom. Available at: https://www.pewresearch.org/internet/2013/02/28/part-iii-bringing-technology-into-the-classroom/ (Accessed: August 25, 2025)
  49. AI Literacy Education for Older Adults: Motivations, Challenges and Preferences. Available at: https://arxiv.org/html/2504.14649v1 (Accessed: August 25, 2025)
  50. Jiaqi Yang, Alireza Amrollahi, Mauricio Marrone. (2024). Harnessing the Potential of Artificial Intelligence: Affordances, Constraints, and Strategic Implications for Professional Services. Journal of Strategic Information Systems.
  51. The Relative Effectiveness of Human Tutoring, Intelligent Tutoring Systems, and Other Tutoring Systems. Available at: https://www.researchgate.net/publication/233237328_The_Relative_Effectiveness_of_Human_Tutoring_Intelligent_Tutoring_Systems_and_Other_Tutoring_Systems (Accessed: August 25, 2025)
  52. https://www.sciencedirect.com/science/article/pii/S2590291123002887. Available at: https://www.sciencedirect.com/science/article/pii/S2590291123002887 (Accessed: August 25, 2025)
  53. 2.1 Cognitive Development: The Theory of Jean Piaget – Foundations of Educational Technology. Available at: https://open.library.okstate.edu/foundationsofeducationaltechnology/chapter/2-cognitive-development-the-theory-of-jean-piaget/ (Accessed: August 25, 2025)
  54. Constructivism. Available at: https://www.buffalo.edu/catt/teach/develop/theory/constructivism.html (Accessed: August 25, 2025)
  55. Catalyzing Equity in STEM Teams: Harnessing Generative AI for Inclusion and Diversity. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC10950550/ (Accessed: August 25, 2025)
  56. Acceptance of artificial intelligence among pre-service teachers: a multigroup analysis | International Journal of Educational Technology in Higher Education | Full Text. Available at: https://educationaltechnologyjournal.springeropen.com/articles/10.1186/s41239-023-00420-7 (Accessed: August 25, 2025)
  57. Mechthild Pieper, Monica Fallon, Armin Heinzl. (2024). Micro-randomized trials in Information Systems research: An experimental method for advancing knowledge about our dynamic and digitalized world. Journal of Information Technology.
  58. Influence of Artificial Intelligence in Education on Adolescents’ Social Adaptability: A Machine Learning Study. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC9266205/ (Accessed: August 25, 2025)
  59. Vygotsky's Sociocultural Theory of Cognitive Development. Available at: https://www.simplypsychology.org/vygotsky.html (Accessed: August 25, 2025)
  60. Working Memory Underpins Cognitive Development, Learning, and Education. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC4207727/ (Accessed: August 25, 2025)
  61. https://www.tandfonline.com/doi/full/10.1080/03075079.2024.2326956 (Accessed: August 25, 2025)
  62. https://www.ntia.gov/issues/artificial-intelligence/ai-accountability-policy-report/overview (Accessed: August 25, 2025)
  63. https://emerald.com/insight/content/doi/10.1108/jrit-03-2024-0073/full/html (Accessed: August 25, 2025)
  64. https://dl.acm.org/doi/fullHtml/10.1145/3636555.3636896 (Accessed: August 25, 2025)
  65. Rebekah Brau, John Aloysius, Enno Siemsen. (2023). Demand planning for the digital supply chain: How to integrate human judgment and predictive analytics. Journal of Operations Management.
  66. Study: AI-Assisted Tutoring Boosts Students’ Math Skills. Available at: https://www.the74million.org/article/study-ai-assisted-tutoring-boosts-students-math-skills/ (Accessed: August 25, 2025)
  67. Lessons Learned and Future Directions of MetaTutor: Leveraging Multichannel Data to Scaffold Self-Regulated Learning With an Intelligent Tutoring System. Available at: https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2022.813632/full (Accessed: August 25, 2025)
  68. Deasy Rinayanti Pelealu. (2024). Analysis of work environment, discipline and ability of teachers at SMAN 3 Pontianak. Journal of Management Science (JMAS).
  69. Piaget’s Theory and Stages of Cognitive Development. Available at: https://www.simplypsychology.org/piaget.html (Accessed: August 25, 2025)
  70. Dynamic Interaction between Student Learning Behaviour and Learning Environment: Meta-Analysis of Student Engagement and Its Influencing Factors. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC9855184/ (Accessed: August 25, 2025)
  71. Artificial intelligence (AI) -integrated educational applications and college students’ creativity and academic emotions: students and teachers’ perceptions and attitudes. Available at: https://pmc.ncbi.nlm.nih.gov/articles/PMC11403842/ (Accessed: August 25, 2025)
  72. https://journals.sagepub.com/doi/10.1177/21582440221134089 (Accessed: August 25, 2025)
  73. https://www.sciencedirect.com/science/article/abs/pii/S0361476X21000357 (Accessed: August 25, 2025)
  74. Tailai Wu, Chih-Hung Peng, Choon Ling Sia, Yaobin Lu. (2024). Website Localization Strategies to Promote Global E-Commerce: The Moderating Role of Individualism and Collectivism. MIS Quarterly.
  75. (PDF) Effectiveness of Intelligent Tutoring Systems: A Meta-Analytic Review. Available at: https://www.researchgate.net/publication/277636218_Effectiveness_of_Intelligent_Tutoring_Systems_A_Meta-Analytic_Review (Accessed: August 25, 2025)
  76. Artificial intelligence in intelligent tutoring systems toward sustainable education: a systematic review - Smart Learning Environments. Available at: https://slejournal.springeropen.com/articles/10.1186/s40561-023-00260-y (Accessed: August 25, 2025)
  77. https://www.sciencedirect.com/science/article/abs/pii/S0361476X20300254 (Accessed: August 25, 2025)
  78. Inclusive Education with AI: Supporting Special Needs and Tackling Language Barriers. Available at: https://arxiv.org/html/2504.14120v1 (Accessed: August 25, 2025)
  79. https://www.sciencedirect.com/science/article/abs/pii/S0361476X22000200 (Accessed: August 25, 2025)
  80. https://bera-journals.onlinelibrary.wiley.com/doi/10.1111/bjet.12758 (Accessed: August 25, 2025)
  81. Principles of Child Development and Learning and Implications That Inform Practice. Available at: https://www.naeyc.org/resources/position-statements/dap/principles (Accessed: August 25, 2025)