Jayanth Ramakrishnan and Parvendan Rangaswamy, Department of Computer Science and Engineering, SRM University-AP, Amaravati, Andhra Pradesh, India
Large Language Models (LLMs) have gotten really good at generating code, and we usually measure that with Pass@k. But Pass@k only tells you how often the model gets it right on its best try. It says nothing about how much the outputs vary or whether they’re structurally consistent across multiple runs. That’s a big problem for real-world use, where you typically call the model just once. So we came up with the Stochastic Stability Index (SSI). It’s a combined metric that looks at three things: whether the code is syntactically valid (Msyn), how similar its structure is to a correct solution using the normalised Zhang–Shasha Tree Edit Distance on Abstract Syntax Trees (Mast), and whether it actually works when run in a safe sandbox (Mfun). SSI directly penalises variation from one sample to the next, giving you a single reliability score that reflects actual deployment risk, not just best-case performance. We tested SSI on HumanEval and MBPP using three top code generation models (Qwen, DeepSeek-Coder-v2, and Yi-Coder-9b) at three temperatures (T ∈ {0.2, 0.5, 0.8}), generating 20 independent samples per problem (that’s n = 96,840 programs total). What we found is a Creativity–Stability Paradox that shows up across all models and both benchmarks: Pass@10 goes up as you raise the temperature, but SSI goes down as outputs become more variable. For Qwen on HumanEval, Pass@10 jumps from 86.8% to 94.0%, while SSI falls from 0.6833 to 0.6563. A one-way ANOVA on structural similarity across every model–benchmark combination (in all cases p > 0.05) shows that the drop comes almost entirely from functional inconsistency, not structural divergence. That tells us something important about how to design better stability metrics. The bottom line: if you really care about production-ready code generation, you need reliability-aware metrics alongside Pass@k.
Large Language Models, Code Generation, Stochastic Stability Index, Abstract Syntax Trees, Pass@k, Inference Temperature, Benchmark Evaluation, Output Variance
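To make the contrast between best-of-k success and single-call reliability concrete, here is a minimal Python sketch. The `pass_at_k` function is the standard unbiased estimator commonly used with HumanEval. The `stability_sketch` function is purely hypothetical: the abstract does not give the SSI formula, so the equal-weight average of the three component scores and the standard-deviation penalty are assumptions made for illustration, not the authors' definition.

```python
from math import comb
from statistics import mean, pstdev


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate: n samples drawn, c of them correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def stability_sketch(samples: list[tuple[float, float, float]]) -> float:
    """Hypothetical stability score (NOT the paper's SSI formula).

    Each sample is (m_syn, m_ast, m_fun), each in [0, 1]. The equal
    weighting and the std-dev penalty are assumptions for illustration.
    """
    scores = [(syn + ast + fun) / 3 for syn, ast, fun in samples]
    # Penalise sample-to-sample variation by subtracting the spread.
    return max(0.0, mean(scores) - pstdev(scores))


# A problem where only half of 20 samples are functionally correct.
half_right = [(1.0, 0.9, 1.0)] * 10 + [(1.0, 0.9, 0.0)] * 10
best_of_10 = pass_at_k(n=20, c=10, k=10)   # close to 1
single_call = pass_at_k(n=20, c=10, k=1)   # 0.5
stability = stability_sketch(half_right)   # well below the best-of-10 figure
```

With 10 of 20 samples correct, Pass@10 is nearly 1 even though a single call succeeds only half the time, which is the kind of gap a variance-aware score is meant to expose.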
Evariste Ntaryamira¹, Grace Murhula Kabi²; ¹Université Paris 8, Saint-Denis, France; ²Institut Supérieur de Commerce (ISC), Bukavu, République Démocratique du Congo
Educational Management Information Systems in Sub-Saharan Africa collect data from thousands of schools each year to feed the indicators required by Sustainable Development Goal 4 and the Continental Education Strategy for Africa. A recurring problem, however, undermines this process: as data travels up the administrative pyramid, each echelon that handles a file reformats it, reinterprets column names, and silently alters values. By the time a record reaches the national level, it often says more about the chain of people who handled it than about the school that produced it. This paper introduces a multi-layer semantic mediation architecture that addresses the problem at its root. Positioned between school-level data sources and the national server in a Hub-and-Spoke topology, the architecture comprises six formally defined processing layers: ingestion, statistical profiling, adaptive schema matching, coherence validation, incremental dictionary learning, and transactional synchronization. We formalize the mediation process as a structure-preserving functor between schema categories and prove its correctness under composition. Three original algorithms are proposed and analyzed: AdaptiveSchemaMatch, which combines TF-IDF lexical analysis, distributional divergence, and value-set overlap into a weighted similarity measure with gradient-based weight adaptation; IncrementalDictLearn, a reinforcement-inspired dictionary learner with provable convergence; and CoherenceGuard, a multi-tier consistency engine. We further show how cross-lingual sentence embeddings and lightweight autoencoders can augment the matching and anomaly detection capabilities in data-rich settings, while remaining deployable on resource-constrained devices. The architecture is designed for offline-first, low-connectivity environments and scales to national-level volumes through a formally analyzed batch-parallel processing model.
We describe the deployment context within the PAQABU project (UNESCO, Burundi) and the eSchool platform (Democratic Republic of Congo), and discuss the implications for educational data governance across the continent.
Semantic mediation, schema matching, educational data, Hub-and-Spoke, EMIS, incremental learning, data quality, Big Data, deep learning, Sub-Saharan Africa
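The weighted similarity measure described for AdaptiveSchemaMatch can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the abstract does not specify the features' exact form, so character-trigram cosine similarity stands in for TF-IDF lexical analysis, Jensen-Shannon divergence stands in for the distributional term, Jaccard overlap stands in for value-set overlap, and the weights and column names are hypothetical. The gradient-based weight adaptation is not shown.

```python
import math
from collections import Counter


def lexical_sim(a: str, b: str) -> float:
    """Cosine similarity over character-trigram counts (stand-in for TF-IDF)."""
    def grams(s: str) -> Counter:
        s = f"  {s.lower()} "
        return Counter(s[i:i + 3] for i in range(len(s) - 2))
    ga, gb = grams(a), grams(b)
    dot = sum(ga[g] * gb[g] for g in ga)
    norm = (math.sqrt(sum(v * v for v in ga.values()))
            * math.sqrt(sum(v * v for v in gb.values())))
    return dot / norm if norm else 0.0


def js_similarity(xs: list, ys: list) -> float:
    """1 minus the base-2 Jensen-Shannon divergence of the value distributions."""
    if not xs or not ys:
        return 0.0
    px, py = Counter(xs), Counter(ys)
    div = 0.0
    for v in set(px) | set(py):
        p, q = px[v] / len(xs), py[v] / len(ys)
        m = (p + q) / 2
        if p:
            div += 0.5 * p * math.log2(p / m)
        if q:
            div += 0.5 * q * math.log2(q / m)
    return 1.0 - div


def jaccard(xs: list, ys: list) -> float:
    """Overlap of the two columns' distinct value sets."""
    sx, sy = set(xs), set(ys)
    return len(sx & sy) / len(sx | sy) if sx | sy else 0.0


def column_similarity(name_a, vals_a, name_b, vals_b,
                      weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three features; the weights are hypothetical."""
    feats = (lexical_sim(name_a, name_b),
             js_similarity(vals_a, vals_b),
             jaccard(vals_a, vals_b))
    return sum(w * f for w, f in zip(weights, feats))


# Hypothetical column names and values from two school-level files.
same = column_similarity("nb_eleves", [30, 32, 30], "NB_ELEVES", [30, 32, 30])
diff = column_similarity("nb_eleves", [30, 32, 30], "teacher_count", [2, 3, 4])
```

In this sketch a renamed-by-case column with identical values scores close to 1, while an unrelated column with disjoint values scores close to 0; a real matcher would then pick the best-scoring candidate per source column.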
Kazim Ali Korejo, Ana Maria Martinez-Enriquez, and Faaiza Zulfqar, Computer Science, Cinvestav-IPN, Av Instituto Politecnico Nacional 2508 Zacatenco, 07360, Mexico City, Mexico
Generative adversarial networks have emerged as a significant paradigm in deep learning for Natural Language Processing, enabling improvements in multilingual learning, semantic modeling, text generation, and multimodal synthesis, to name a few. Our research analyzes the underlying learning mechanisms and the architectures used for different applications, such as paraphrase generation, text-to-image synthesis, machine translation, data augmentation, sentiment analysis, and noise reduction. In addition, our study analyzes the integration of Transformers with multilingual adaptation, focusing on limitations such as training instability, mode collapse, unreliable evaluation, and semantic inconsistency. Lastly, we outline opportunities for future work involving retrieval-augmented generation, explainable multimodal semantic alignment, symbolic reasoning, and hybrid large language model frameworks for more reliable, semantically grounded language generation.
GANs, Large Language Models, Multi-Agent, Generative Adversarial Networks, GAN-Variant.
Copyright © NCWC 2026