
Wolf Digest — Thursday, April 30, 2026

Coverage window: 2026-04-29 03:27 ET – 2026-04-30 03:18 ET
13m 57s · top-4 narrated briefing
#1 · Industry
Sources: Anthropic could raise a new $50B round at a valuation of $900B
TechCrunch is reporting, citing sources familiar with the matter, that Anthropic has received multiple pre-emptive offers for a new financing round in the $850 billion to $900 billion valuation range, with a target raise of around $50 billion. If it lands at the upper end, that w…
8.7 · 1 src
#2 · Infrastructure
Building the compute infrastructure for the Intelligence Age
OpenAI published a research-page essay framing Stargate — its multi-site data-center program with Oracle, SoftBank, and a roster of regional partners — as the operational core of how the company plans to scale toward AGI. The post is light on novel numbers and heavy on positionin…
8.2 · 1 src
#3 · Evaluations & Benchmarks
Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
X-WAM, presented in a new paper that surfaced through arXiv's robotics, vision, and AI feeds along with Hugging Face's Daily Papers, is an attempt to unify two streams of work that have been moving in parallel for the last year: world-model video synthesis and direct robotic acti…
8.0 · 6 srcs
#1
Industry 2026-04-30 TechCrunch — AI 8.7 9.5/8.5/8.5

TechCrunch is reporting, citing sources familiar with the matter, that Anthropic has received multiple pre-emptive offers for a new financing round in the $850 billion to $900 billion valuation range, with a target raise of around $50 billion. If it lands at the upper end, that would put Anthropic within striking distance of OpenAI's most recent valuation marks and would represent a roughly 7×–10× revaluation from where the company sat on its previous primary in 2024. The round is being characterized as pre-emptive — investors approached the company rather than the other way around — and the size puts it in the same bucket as the largest private financings in technology history.

The number is striking against Anthropic's reported revenue trajectory. Public estimates over the last twelve months have placed Anthropic's annualized run rate somewhere in the $4–7 billion range, dominated by Claude API revenue, Claude Code, and a fast-growing enterprise contract book. A $900 billion mark implies multiples that only make sense if buyers are pricing in a continued 3–5× per-year revenue ramp and an eventual stable share of the frontier-LLM market — and if they believe Claude's trajectory in coding and agentic workflows persists into the next training cycle. The same investors are presumably modeling Claude Opus 4.7 (released this month) and a likely Opus 5 generation later this year as the products that bear out those assumptions.
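The implied-multiple arithmetic above can be sanity-checked in a few lines. This is a minimal sketch using only the ranges reported in the story (a $900B mark against a $4–7B run rate and a hypothetical 4×/year ramp), not any disclosed figures; the function names are illustrative.

```python
# Sanity-check the implied revenue multiples behind a $900B mark.
# Inputs are the reported ranges from the story, not disclosed numbers.

def implied_multiple(valuation_b: float, run_rate_b: float) -> float:
    """Valuation over annualized revenue, both in $B."""
    return valuation_b / run_rate_b

def forward_multiple(valuation_b: float, run_rate_b: float,
                     growth_per_year: float, years: int) -> float:
    """Multiple against revenue projected forward at a constant yearly ramp."""
    return valuation_b / (run_rate_b * growth_per_year ** years)

print(implied_multiple(900, 4))                   # 225x trailing at the low-end run rate
print(round(implied_multiple(900, 7), 1))         # ~128.6x at the high end
print(round(forward_multiple(900, 7, 4, 2), 1))   # ~8.0x after two years of a 4x ramp
```

The point of the exercise: a triple-digit trailing multiple compresses to single digits within two years only if the ramp the story describes actually holds.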

The capital, if raised, points squarely at compute. Anthropic's training and serving spend is the bulk of its cost base, and the partnerships disclosed earlier this month — the expanded $50B+ AWS commitment for Trainium-class capacity, the Google–Broadcom collaboration on TPU access, and the NEC build in Japan — all imply a forward training spend that dwarfs current revenue. A $50 billion equity round materially extends Anthropic's runway against that build-out and meaningfully reduces the chance the company has to take on debt or convert future capacity commitments into dilution.

The signal to the rest of the industry is the harder thing to read. If Anthropic prints at $900B, it cements a market structure in which two private labs (OpenAI, Anthropic) and one or two public-cloud-aligned labs (Google DeepMind, possibly xAI) carry valuations and capital pools an order of magnitude beyond every other frontier player. That reshapes the talent market, the compute-allocation negotiation with TSMC and the hyperscalers, and the willingness of GPU-rich nation states to back national-champion labs as counterweights. Worth tracking whether the round closes near the floated number or compresses on the way to a final term sheet — a $50B round at $700B is a meaningfully different signal than the same round at $900B.

#2
Infrastructure 2026-04-29 OpenAI Research 8.2 8.5/8.5/7.0

OpenAI published a research-page essay framing Stargate — its multi-site data-center program with Oracle, SoftBank, and a roster of regional partners — as the operational core of how the company plans to scale toward AGI. The post is light on novel numbers and heavy on positioning: it lays out a mental model in which compute capacity is the binding constraint on capability progress, sets out a plan for adding "multiple gigawatts" of new training and serving capacity over the coming year, and asserts that frontier capability gains and product reach are now coupled to data-center construction in a way that can't be wished away with software-only optimizations.

The substantive content concentrates on three things. First, the post confirms continued buildout of the Abilene, Texas Stargate campus alongside additional U.S. sites in coordination with Oracle, plus the previously announced UAE expansion and new partnerships across Asia. Second, it sketches a power-procurement strategy that mixes long-term PPAs for clean firm power, behind-the-meter natural gas, and direct relationships with utilities to expedite grid interconnect timelines that would otherwise gate construction. Third, it positions Stargate's $500 billion+ committed capex as both training infrastructure and the inference base for hundreds of millions of free-tier and enterprise users — pre-empting the criticism that the largest training runs cannot economically justify their dedicated capacity by reframing the same hardware as serving infrastructure for the consumer footprint.

What's notable about the essay is the rhetorical shift. OpenAI has historically described compute as a means; this post is the clearest articulation yet that, internally, the company treats compute capacity as the objective function to be maximized — and reasoning, agents, and product features as the natural consequences of having the substrate. That mirrors the message Anthropic's public communications have been leaning into for the last six months, and the equivalent framing Google DeepMind has offered on Alphabet's earnings calls. It also sets the framing for the simultaneous Anthropic financing reporting, the Microsoft–OpenAI exclusivity unwind, and the Q1 cloud earnings prints from Google and AWS that all landed within the same 48-hour window: the entire frontier conversation has rotated from model architectures to gigawatts and grid interconnect.

The post does not commit to a specific GPU count, training-cluster size, or compute-vs-revenue ratio, and it does not disclose unit economics. Anyone trying to forecast 2026–2027 frontier compute supply will still need to reconstruct from public hyperscaler capex disclosures, Nvidia and Broadcom shipment guidance, and energy-permit filings. But the public framing is now unambiguous: OpenAI sees the next two years' capability frontier as being decided in concrete pours and substation upgrades, not in clever post-training tricks.

#3
Evaluations & Benchmarks 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv cs.RO (Robotics) · arXiv cs.CV (Computer Vision) · arXiv — Generative Media / Diffusion · arXiv — Evals & Benchmarks · Hugging Face Daily Papers 8.0 8.0/6.9/9.0

X-WAM, presented in a new paper that surfaced through arXiv's robotics, vision, and AI feeds along with Hugging Face's Daily Papers, is an attempt to unify two streams of work that have been moving in parallel for the last year: world-model video synthesis and direct robotic action prediction. Prior unified models such as UWM operate in 2D pixel space and consistently force a tradeoff — either you get high-fidelity rollouts that are too slow to act on, or you get fast action decoding from a model that hallucinates physics. X-WAM resolves that tradeoff by predicting multi-view RGB-D videos rather than flat pixels, which gives the model a depth representation that's grounded in geometry, and by adding what the authors call Asynchronous Noise Sampling — a denoising schedule that lets the model emit actions after a small number of diffusion steps while continuing to sharpen the corresponding video for full-resolution scene reconstruction.

The architectural move that makes this work is small but interesting. X-WAM reuses the final blocks of a pretrained video Diffusion Transformer to spawn a depth-prediction branch, rather than training a fresh depth head from scratch. That means the spatial reasoning rides on top of the visual priors already baked into a strong video DiT, and the additional parameter count is modest. ANS, the asynchronous denoising contribution, applies a specialized schedule at inference time so action tokens are decoded under a low-step regime while video tokens run the full denoising chain. Crucially, training samples from the joint distribution of action and video timesteps rather than fully decoupling them, which keeps the inference distribution aligned with the training distribution — an alignment that earlier asymmetric-rollout schemes have struggled with.
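The asymmetric schedule described above can be sketched concretely. The step counts, function names, and the constraint tying the two timesteps together are illustrative assumptions, not values from the paper — the point is only the shape of the mechanism: actions finish denoising early while video runs the full chain, and training samples the two timesteps jointly rather than independently.

```python
import random

K_ACTION = 4    # actions are fully decoded after a handful of steps
K_VIDEO = 50    # video tokens run the full denoising chain

def inference_schedule():
    """Yield (step, update_actions, update_video): the action branch
    finishes early while video keeps sharpening to full resolution."""
    for step in range(K_VIDEO):
        yield step, step < K_ACTION, True

def sample_training_timesteps(rng: random.Random):
    """Jointly sample (t_action, t_video), with the action branch never
    noisier than the video branch — an illustrative stand-in for the
    joint action/video timestep distribution the paper trains under."""
    t_video = rng.random()
    t_action = rng.random() * t_video
    return t_action, t_video
```

Sampling the pair jointly (rather than fully decoupling them) is what keeps the training distribution aligned with this asymmetric inference regime.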

The model is pretrained on over 5,800 hours of robotic data and evaluated on RoboCasa and RoboTwin 2.0. Reported success rates are 79.2% on RoboCasa and 90.7% on RoboTwin 2.0, both ahead of the prior unified baselines, while the visual generations match or exceed the dedicated 4D world models the team compares against. The combined result — fast policy decoding plus high-fidelity simulation rolling out of the same network — is the kind of capability that has been nominally promised by the world-model line of work for three years and has consistently failed to materialize at deployable speeds.

The paper landed on arXiv this morning and was almost immediately picked up by Hugging Face's Daily Papers, which placed it in the multi-source bucket for today's run. The right way to read it is alongside Physical Intelligence's recent π-series releases and the broader push from robot-learning groups to converge VLA models, world models, and policy heads into a single trainable object. If X-WAM's results replicate at scale, the gap between robot simulation and robot deployment narrows by another notch — and the unified-model design choice (one network, joint training) becomes harder to argue against.

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.AI, arXiv cs.RO, arXiv cs.CV.
  • Matched topical feeds: Generative Media / Diffusion, Evals & Benchmarks — wide thematic overlap.
  • Hugging Face Daily Papers picked it up — community-curated visibility signal.
cs.AI cs.RO cs.CV
#4
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) · arXiv — Efficiency (Quantization, MoE, Inference) · arXiv — Evals & Benchmarks · Hugging Face Daily Papers 7.9 8.3/6.3/9.0

TIDE is the first systematic framework for cross-architecture distillation of diffusion language models — the kind of distillation where teacher and student differ not only in size but in the underlying generation paradigm itself. Within-architecture distillation for diffusion LLMs has been studied for about a year and the techniques are well understood; what has remained open is whether you can distill a diffusion LLM down to a much smaller dense or MoE student that runs on different attention machinery and even a different tokenizer. TIDE's authors argue this gap is the binding constraint on practical deployment of diffusion LLMs because the existing teacher families are 8B+ dense or 16B MoE, both of which are too large for most edge or low-latency serving budgets.

The framework decomposes into three modular components. TIDAL is the cross-timestep distillation strength controller — it modulates the loss weight as a function of both training progress and the diffusion timestep, on the principle that the teacher's predictive reliability depends on the noise level at which it's queried. CompDemo enriches the teacher's context with complementary mask splits, which gives it a stronger signal under heavy masking conditions where diffusion LLMs tend to degrade. Reverse CALM is the cross-tokenizer alignment objective: it inverts the chunk-level likelihood matching used in earlier work to produce bounded gradients and to filter noise from both ends of the alignment, which the authors describe as the technical move that makes the cross-tokenizer setting tractable at all.
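The TIDAL idea — loss weight as a function of both training progress and diffusion timestep — can be illustrated with a toy controller. The functional form below (exponential reliability decay with noise, linear warm-up over training) is an invented placeholder consistent with the stated principle, not the paper's actual schedule.

```python
import math

# Illustrative TIDAL-style controller: down-weight the distillation loss
# at heavy-noise timesteps (where the teacher is less reliable) and ramp
# trust in the teacher over training. Exact form is an assumption.

def tidal_weight(progress: float, timestep: float,
                 sharpness: float = 4.0) -> float:
    """progress, timestep in [0, 1]; timestep=1 is heaviest noise."""
    teacher_reliability = math.exp(-sharpness * timestep)
    ramp = min(1.0, 2.0 * progress)  # warm up over the first half of training
    return teacher_reliability * ramp
```

Early in training at heavy noise the weight is near zero; late in training at light noise it approaches one — the qualitative behavior the component is described as providing.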

The headline empirical claim is that distilling 8B dense and 16B MoE teachers down to a 0.6B student through two heterogeneous pipelines beats baseline by an average of 1.53 points across eight benchmarks, with the largest gains concentrated in code generation. HumanEval climbs from 32.3 with the autoregressive baseline to 48.78 with the TIDE-distilled diffusion student — a ~50% relative improvement that's hard to dismiss. The result implies that the parallel-decoding and bidirectional-context advantages of diffusion LLMs survive aggressive size compression as long as the distillation is done correctly across the architectural boundary.

This matters for the broader frontier-LLM conversation because it provides a credible path to deploying diffusion LLMs at the scale that autoregressive LLMs already serve at. Diffusion language modeling has been a slow-burn research thread; its inference throughput advantages have been documented but the parameter inefficiency has been real, and the distillation literature has been the gating factor on whether you could get a sub-billion-parameter diffusion model that matters. TIDE's HumanEval result suggests that a small diffusion student can outperform an autoregressive baseline of comparable size, which inverts the prior conventional wisdom about which paradigm dominates at the edge.

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CL, arXiv cs.AI.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference), Evals & Benchmarks — wide thematic overlap.
  • Hugging Face Daily Papers picked it up — community-curated visibility signal.
cs.LG cs.CL cs.AI
#5
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) · arXiv — Reinforcement Learning · arXiv — Evals & Benchmarks · Hugging Face Daily Papers 7.8 8.0/6.3/9.0

ClawGym is a scalable framework for building what the authors call "Claw-style" personal agents — agents that operate over local files, tool calls, and persistent workspace state, the modal computer-use deployment surface. The framework targets the gap that's been frustrating practical claw/agent development since the GPT-4 era: there's no shared infrastructure for synthesizing verifiable training data, integrating it with agent training, and running diagnostic evaluation. Every group rolls its own pipeline and the resulting agents underperform because the data and the training loop are starved of signal.

The contribution comes in three pieces. ClawGym-SynData is a 13.5K-task dataset synthesized through persona-driven intent generation paired with skill-grounded operations, where each task is matched to a realistic mock workspace and verified through a hybrid mechanism that combines automated checks with LLM judges. The persona-and-skill combinatorial generation is the move that gets to coverage at scale — instead of hand-curating tasks, the framework samples from intent distributions plausible for given personas and from operation distributions over the available tool set. ClawGym-Agents is the family of trained models, produced first via supervised fine-tuning on rollouts and then refined with a lightweight reinforcement-learning pipeline that parallelizes rollouts across per-task sandboxes — a system design choice that's necessary to make the RL phase tractable since claw tasks involve real filesystem and tool side effects. ClawGym-Bench is a 200-instance evaluation slate filtered through automated quality checks and human-LLM review, designed to be reliable enough to compare different training recipes.
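The persona-and-skill combinatorial move can be sketched in miniature. The personas, skills, and task template below are invented placeholders; the real pipeline uses LLM-driven intent generation and a hybrid automated-check/LLM-judge verifier, where this sketch only leaves a hook.

```python
import itertools
import random

# Toy version of persona x skill task synthesis: sample from the cross
# product of persona-plausible intents and toolset operations instead of
# hand-curating tasks. All names here are illustrative.

PERSONAS = ["accountant", "grad student", "ops engineer"]
SKILLS = ["rename files by date", "summarize a folder of PDFs",
          "archive logs older than 30 days"]

def synthesize_tasks(n: int, seed: int = 0):
    rng = random.Random(seed)
    pool = list(itertools.product(PERSONAS, SKILLS))
    tasks = []
    for persona, skill in rng.choices(pool, k=n):
        tasks.append({
            "persona": persona,
            "intent": f"As a {persona}, {skill} in my workspace.",
            "verify": skill,  # hook for the automated half of verification
        })
    return tasks
```

Even this toy version shows why the combinatorial framing scales: coverage grows with the product of the two axes, not with curation effort.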

The 13.5K-task scale puts ClawGym in the same league as the largest open agentic-task corpora released in the last year and substantially ahead of most of them on verifiability — most prior corpora trade verifiability for scale or vice versa. The paper claims competitive performance from the resulting ClawGym-Agents family, though the absolute benchmark numbers are best read alongside the codebase release the authors promise. The framework's open release at github.com/ClawGym, once it lands, will be useful for any group trying to do agentic post-training without building data and evaluation from scratch.

This sits at the intersection of two important threads: the agentic capability push that everyone from Anthropic to OpenAI is pursuing in product, and the open-research push to give the community a trainable substrate that doesn't require frontier-lab data resources to be useful. The cross-source coverage on this paper — three arXiv categorical feeds plus two virtual feeds plus Hugging Face Daily Papers — is the cleanest signal yet that the field is treating agentic-data infrastructure as a first-class research surface rather than a tooling problem.

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CL, arXiv cs.AI.
  • Matched topical feeds: Reinforcement Learning, Evals & Benchmarks — wide thematic overlap.
  • Hugging Face Daily Papers picked it up — community-curated visibility signal.
cs.LG cs.CL cs.AI
#6
Infrastructure 2026-04-29 Dwarkesh Podcast 7.8 7.5/8.0/7.5

Dwarkesh Patel's latest interview is a meaningful departure in format. Rather than the usual conversational long-form, the episode is a blackboard lecture in which Reiner Pope — formerly of Google's TPU compiler and software-efficiency teams, now CEO of MatX — walks through the math of how frontier LLMs are trained and served. The premise Patel opens with is the right one: from a small set of equations, the published API prices, and chalk, you can recover most of what the frontier labs are doing internally. The episode tries to make that derivation legible.

The substantive content runs through the standard scaling-law and inference-economics arithmetic but with a level of mechanical precision that the public discourse usually elides. Pope walks through the parameter, FLOP, and memory-bandwidth budgets that determine whether a given model architecture is training-bound, inference-bound, or memory-bound, and shows how API price points reveal the rough cost-per-token at which a model is being served — which in turn tells you the likely model size, the precision regime, and whether the lab is running a dense or MoE backbone. He extends the same analysis to inference-time techniques like speculative decoding and test-time scaling, deriving the conditions under which they're economically dominant rather than treating them as catalog items. The episode also covers TPU and GPU architectural tradeoffs at a level of detail that is uncommon in public conversation, including the bandwidth and arithmetic-intensity envelopes that constrain modern accelerators.
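The kind of back-of-envelope serving arithmetic the lecture walks through can be reproduced in a few functions. Every number below is an illustrative placeholder (not a figure from the episode), and the formulas are the standard decode-time approximations: roughly 2 FLOPs per parameter per token, and a memory-bound regime whenever streaming the weights takes longer than the batch's compute.

```python
# Back-of-envelope decode economics: cost per token and the
# memory-bound vs compute-bound crossover. Illustrative inputs only.

def flops_per_token(params: float) -> float:
    """Decode-time FLOPs per token ~ 2 * parameter count (multiply + add)."""
    return 2.0 * params

def cost_per_million_tokens(params: float, chip_flops: float,
                            utilization: float,
                            chip_cost_per_hour: float) -> float:
    """$ per 1M generated tokens on one accelerator, compute-bound case."""
    tokens_per_sec = chip_flops * utilization / flops_per_token(params)
    return chip_cost_per_hour / (tokens_per_sec * 3600) * 1e6

def is_memory_bound(params: float, batch: int,
                    chip_flops: float, hbm_bytes_per_sec: float,
                    bytes_per_param: float = 1.0) -> bool:
    """Each decode step streams all weights once; compute scales with
    batch. Memory-bound when weight streaming exceeds compute time."""
    t_mem = params * bytes_per_param / hbm_bytes_per_sec
    t_compute = batch * flops_per_token(params) / chip_flops
    return t_mem > t_compute
```

This is the mechanism behind reading model size off API prices: posted cost-per-token plus plausible hardware envelopes over-determine the parameter count, precision, and batching regime.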

Pope is now CEO of MatX, the chip startup focused on building accelerators specifically optimized for transformer inference economics rather than general-purpose deep learning. That biographical detail is relevant: the lecture lands within a broader argument for purpose-built silicon, and listeners should weigh the framing accordingly. Pope's prior work on TPU compilers and on the open scaling book he co-authored gives him unusual credibility on the math, even setting aside the MatX framing.

The episode is best treated as a study guide. Patel publishes flashcards and practice problems alongside it — an admission that the material rewards re-listening with a notepad. For practitioners trying to forecast frontier model trajectories from public information, the playbook Pope lays out is one of the most useful summaries available. The release lands at the same time as OpenAI's compute-infrastructure essay and the hyperscaler Q1 prints, all of which together rotate the field's frame from architectures and post-training tricks toward the gigawatt-and-bandwidth substrate. Worth watching the YouTube version specifically — the chalkboard is doing real work.

#7
Interpretability 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv cs.CV (Computer Vision) · arXiv — Reinforcement Learning · arXiv — Mechanistic Interpretability · arXiv — Post-training / Alignment 7.5 7.3/7.6/7.5

This paper takes on a long-standing failure mode of vision-language models: they confidently produce factual hallucinations on long-tail or specialized visual content because they have no calibrated mechanism for refusing queries that fall outside their parametric knowledge. The authors propose a framework that delineates the model's knowledge boundaries through targeted probing and aligns the model to refuse appropriately rather than confabulate. The dataset they construct — Visual-Idk, for "Visual I don't know" — is built by multi-sample consistency probing across the model's own outputs, which is the right way to identify where a given model genuinely lacks coverage rather than measuring against an external oracle.
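The multi-sample consistency probe can be sketched as follows. `ask` stands in for a real VLM call, and the sample count and agreement threshold are illustrative choices, not the paper's settings; the mechanism — low self-agreement marks a query as outside parametric knowledge — is what the sketch shows.

```python
from collections import Counter

# Toy consistency probe: query the model k times and treat low
# self-agreement as "outside parametric knowledge". Threshold and k
# are illustrative, not taken from the paper.

def consistency_label(ask, query, k: int = 8, threshold: float = 0.6):
    """Returns ("known", answer) on stable agreement, else ("idk", None)."""
    answers = [ask(query) for _ in range(k)]
    top_answer, count = Counter(answers).most_common(1)[0]
    agreement = count / k
    return ("known", top_answer) if agreement >= threshold else ("idk", None)
```

Queries labeled "idk" become refusal targets in a Visual-Idk-style dataset, while stable queries keep their consensus answer — which is what grounds the boundary in the model's own coverage rather than an external oracle.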

The training recipe is two-stage. Supervised fine-tuning teaches the model the basic refusal pattern on the Visual-Idk dataset; preference-aware optimization (the authors evaluate both DPO and ORPO) then sharpens the boundary so the model refuses on uncertain queries while remaining responsive on queries within its competence. The headline result is a Truthful Rate increase from 57.9% to 67.3% on the Visual-Idk slate — a 9.4-point absolute improvement, which is substantial in a setting where most prior work has produced either marginal gains or required external retrieval scaffolding to get there.

What makes the paper more than a routine refusal-tuning study is the internal probing the authors run to verify that the model has learned to recognize its own boundaries rather than memorize refusal templates. The probing distinguishes "actually knows it doesn't know" behavior from "pattern-matched to refuse on superficial cues" — a distinction that has plagued the alignment literature on refusal more generally. The probing results suggest the trained model genuinely tracks knowledge boundaries internally, which is consistent with what mechanistic interpretability work on text-only refusals has been finding over the last year.

The framework generalizes to out-of-distribution medical and perceptual domains in the authors' evaluation, which is the right test — knowledge boundaries are precisely the place where naive fine-tuning approaches fail to transfer. Multi-source coverage on this paper is heavy: it surfaced through arXiv's AI and CV feeds, the post-training and interpretability virtual feeds, and the reinforcement-learning virtual feed, since DPO and ORPO are RL-adjacent. The combination places it in the cluster of trustworthy-VLM work that's becoming foundational for medical and scientific deployments where confabulation is a deal-breaker.

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.AI, arXiv cs.CV.
  • Matched topical feeds: Reinforcement Learning, Mechanistic Interpretability, Post-training / Alignment — wide thematic overlap.
cs.AI cs.CV
#8
Agents & Tool Use 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Reinforcement Learning · arXiv — Agents / Tool Use · Hugging Face Daily Papers 7.4 8.3/6.3/7.5

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, and GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Reinforcement Learning, Agents / Tool Use — wide thematic overlap.
  • Hugging Face Daily Papers picked it up — community-curated visibility signal.
cs.CV
#9
Industry 2026-04-29 Hacker News 7.4 7.5/6.0/8.6

Hacker News discussion (275 points) — Why AI companies want you to be afraid of them. Visit the comments thread for community caveats and the linked article for primary reporting.

#10
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CL (Computation & Language) · Hugging Face Daily Papers 7.2 7.5/6.8/7.0

RL post-training of frontier language models is increasingly bottlenecked by autoregressive rollout generation, making rollout acceleration a central systems challenge. Many existing efficiency methods improve throughput by changing the rollout or optimization regime, for example, through off-policy execution, replay, or lower-precision generation. We study speculative decoding as a lossless acceleration primitive for RL rollouts that preserves the target model's output distribution. We implement speculative decoding in NeMo-RL with a vLLM backend, supporting both synchronous and asynchronous pipelines and enabling speculation during RL rollouts. This benefit is realizable across speculation mechanisms,…
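The "lossless" property the abstract leans on comes from the standard speculative-sampling acceptance rule: accept a drafted token with probability min(1, p_target/p_draft), otherwise resample from the normalized residual, which makes the output distribution exactly the target model's. The toy version below works over an explicit vocabulary dict; real systems such as vLLM operate on logits, but the accept/reject math is the same.

```python
import random

# Toy speculative-sampling acceptance step over an explicit vocabulary.
# p_target and p_draft map token -> probability; the accepted-or-resampled
# output is distributed exactly as p_target, which is why RL rollouts
# accelerated this way stay on-policy.

def accept_or_resample(token, p_target, p_draft, rng):
    """Returns (emitted_token, was_accepted)."""
    if rng.random() < min(1.0, p_target[token] / p_draft[token]):
        return token, True
    # Rejected: sample from the normalized positive residual.
    residual = {t: max(p_target[t] - p_draft[t], 0.0) for t in p_target}
    z = sum(residual.values())
    r, acc = rng.random() * z, 0.0
    for t, w in residual.items():
        acc += w
        if r <= acc:
            return t, False
    return token, False  # numerical fallback
```

Exactness of the output distribution is the whole point for RL: unlike off-policy or low-precision rollout tricks, nothing about the training signal changes.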

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CL.
  • Hugging Face Daily Papers picked it up — community-curated visibility signal.
cs.LG cs.CL
#11
Post-Training 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) · arXiv — Reinforcement Learning · arXiv — Post-training / Alignment 7.2 7.0/6.8/7.5

Large language models (LLMs) demonstrate strong multilingual capabilities, yet often fail to consistently generate responses in the intended language, exhibiting a phenomenon known as language confusion. Prior mitigation approaches based on sequence-level fine-tuning, such as DPO, ORPO, and GRPO, operate at the level of entire responses and can lead to unintended degradation of general model capabilities, motivating the need for more fine-grained alternatives. To address this, we introduce Token-Level Policy Optimization (TLPO), a fine-tuning framework designed to mitigate language confusion through localized, token-level updates. TLPO identifies error-prone positions, explores alternative…

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CL, arXiv cs.AI.
  • Matched topical feeds: Reinforcement Learning, Post-training / Alignment — wide thematic overlap.
cs.LG cs.CL cs.AI
#14
Efficiency 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.RO (Robotics) · arXiv cs.CV (Computer Vision) · arXiv — Efficiency (Quantization, MoE, Inference) 7.1 7.3/7.8/6.0

Deploying accurate object detection for Vulnerable Road User (VRU) safety on edge hardware requires balancing model capacity against computational constraints. Large models achieve high accuracy but fail under INT8 quantization required for edge deployment, while small models sacrifice detection performance. This paper presents a knowledge distillation (KD) framework that trains a compact YOLOv8-S student (11.2M parameters) to mimic a YOLOv8-L teacher (43.7M parameters), achieving 3.9x compression while preserving quantization robustness. We evaluate on full-scale BDD100K (70K training images) with Post-Training Quantization to INT8. The teacher suffers catastrophic degradation under INT8…
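The distillation objective the abstract describes can be sketched in its generic classification-head form. Detection-specific terms (box regression, objectness) are omitted, and the temperature-scaled KL shape below follows the standard Hinton-style recipe rather than this paper's exact loss — it is a sketch of the family, not the method.

```python
import math

# Generic soft-target distillation loss: the student matches the
# teacher's temperature-softened output distribution. Classification
# head only; the paper's detection-specific terms are omitted.

def softmax(logits, temp=1.0):
    exps = [math.exp(x / temp) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, temp: float = 2.0) -> float:
    """KL(teacher_T || student_T), scaled by T^2 per the standard recipe."""
    p = softmax(teacher_logits, temp)
    q = softmax(student_logits, temp)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temp * temp * kl
```

The loss is zero when student and teacher agree and grows with divergence; the temperature softens the teacher's distribution so the student also learns the relative ordering of wrong classes.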

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.RO, arXiv cs.CV.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference) — wide thematic overlap.
cs.LG cs.RO cs.CV
#15
Agents & Tool Use 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) · arXiv — Reinforcement Learning · arXiv — Agents / Tool Use · arXiv — Evals & Benchmarks 7.1 7.3/6.3/7.5

Live future prediction refers to the task of making predictions about real-world events before they unfold. This task is increasingly studied using large language model-based agent systems, and it is important for building agents that can continually learn from the real world. Just as interactive environments have often driven progress in agents, advancing live future prediction naturally motivates viewing it as a learning environment. Prior works have explored future prediction from several different angles, but have generally not framed it as a unified learning environment. This task is appealing for learning because…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI.
  • Matched topical feeds: Reinforcement Learning, Agents / Tool Use, Evals & Benchmarks — wide thematic overlap.
cs.LG cs.AI
#16
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — State Space Models · arXiv — Reinforcement Learning · arXiv — Post-training / Alignment · arXiv — Evals & Benchmarks 7.1 7.3/6.3/7.5

The rapid growth of molecular foundation models and general-purpose large language models has encouraged a scale-centric view of artificial intelligence in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models and task-specific graph neural networks (GNNs). We test this assumption on 22 molecular property and activity endpoints, including public ADMET and Tox21 benchmarks and two internal anti-infective activity datasets. Across 167,056 held-out task–molecule evaluations under structure-similarity-separated five-fold cross-validation (37,756 ADMET, 77,946 Tox21, 49,266 anti-TB and 2,088 antimalaria), classical machine-learning (ML) models such as RF(ECFP4) and…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: State Space Models, Reinforcement Learning, Post-training / Alignment, Evals & Benchmarks — wide thematic overlap.
cs.LG
#17
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) · arXiv — Reinforcement Learning · arXiv — Evals & Benchmarks 7.0 7.3/7.6/6.0

Offline reinforcement learning (RL) agents often fail when deployed, as the gap between training datasets and real environments leads to unsafe behavior. To address this, we present SAS (Self-Alignment for Safety), a transformer-based framework that enables test-time adaptation in offline safe RL without retraining. In SAS, the main mechanism is self-alignment: at test time, the pretrained agent generates several imagined trajectories and selects those satisfying the Lyapunov condition. These feasible segments are then recycled as in-context prompts, allowing the agent to realign its behavior toward safety while avoiding parameter updates…
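The self-alignment loop can be sketched as a filter over imagined rollouts. The non-increasing-cost check below is a discrete stand-in for the Lyapunov condition, and `imagine` is a placeholder for the pretrained agent's rollout generator; both names are illustrative, not from the paper.

```python
# Toy self-alignment loop: imagine candidate trajectories, keep those
# whose step-wise safety costs never increase (a stand-in for the
# Lyapunov condition), and return them for use as in-context prompts.

def lyapunov_ok(costs):
    """Safety-cost sequence must be non-increasing along the trajectory."""
    return all(b <= a for a, b in zip(costs, costs[1:]))

def self_align(imagine, n: int = 16):
    """imagine() -> (trajectory, costs). Returns the feasible
    trajectories to prepend to the agent's context at test time."""
    feasible = []
    for _ in range(n):
        traj, costs = imagine()
        if lyapunov_ok(costs):
            feasible.append(traj)
    return feasible
```

Because the realignment happens entirely through the context, the parameters never change — which is the "without retraining" claim in the abstract.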

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI.
  • Matched topical feeds: Reinforcement Learning, Evals & Benchmarks — wide thematic overlap.
cs.LG cs.AI
#24
Frontier LLMs 2026-04-29 OpenAI Research 6.9 7.0/6.5/7.0

How goblin outputs spread in AI models: timeline, root cause, and fixes behind personality-driven quirks in GPT-5 behavior.

#25
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Reinforcement Learning · arXiv — Efficiency (Quantization, MoE, Inference) · arXiv — Evals & Benchmarks 6.8 7.3/6.8/6.0

Improving large language model (LLM) reasoning requires supervision that is both aligned with the model's own test-time states and informative at the token level. Reinforcement learning with verifiable rewards provides on-policy exploration but offers sparse, high-variance credit; supervised fine-tuning and distillation provide dense targets but often train on fixed trajectories or rely on stronger teachers. Recent privileged on-policy self-distillation explores a middle ground by scoring student rollouts with the same model under verified solution context. We revisit this setting through a contextual re-scoring lens: for reasoning, the important choices are…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Reinforcement Learning, Efficiency (Quantization, MoE, Inference), Evals & Benchmarks — wide thematic overlap.
cs.LG
#26
Generative Media 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) · arXiv — Generative Media / Diffusion 6.7 7.0/6.9/6.0

When do language diffusion models memorize their training data, and how can we quantitatively assess their true generative regime? We address these questions by showing that Uniform-based Discrete Diffusion Models (UDDMs) fundamentally behave as Associative Memories (AMs) with emergent creative capabilities. The core idea of an AM is to reliably recover stored data points as memories by establishing distinct basins of attraction around them. Historically, models like Hopfield networks use an explicit energy function to guarantee these stable attractors. We broaden this perspective by leveraging the observation that energy is not…

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CL, arXiv cs.AI.
  • Matched topical feeds: Generative Media / Diffusion — wide thematic overlap.
cs.LG cs.CL cs.AI
#27
Robotic Autonomy 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) · arXiv cs.RO (Robotics) · arXiv — Reinforcement Learning 6.7 7.0/7.0/6.0

This paper presents a hierarchical decision-making framework for unmanned aerial vehicle (UAV) missions motivated by search-and-rescue (SAR) scenarios under limited simulation training. The framework combines a fixed rule-based high-level advisor with an online goal-conditioned low-level reinforcement learning (RL) controller. To stress-test early adaptation, we also consider a strict no-pretraining deployment regime. The high-level advisor is defined offline from a structured task specification and compiled into deterministic rules. It provides interpretable mission- and safety-aware guidance through recommended actions, avoided actions, and regime-dependent arbitration weights. The low-level controller learns online from task-defined…
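The advisor's arbitration step, as described, amounts to re-weighting the low-level controller's action preferences with recommended/avoided sets. A toy sketch under assumed semantics (the weights, action names, and scoring here are hypothetical, not the paper's):

```python
def arbitrate(action_values, recommended, avoided, w_rec=0.2, w_avoid=0.5):
    # Bias the low-level policy's action values toward the advisor's
    # recommended actions and away from avoided ones, then pick the
    # best action. Weights would be regime-dependent in practice.
    adjusted = {}
    for action, value in action_values.items():
        if action in avoided:
            adjusted[action] = value - w_avoid
        elif action in recommended:
            adjusted[action] = value + w_rec
        else:
            adjusted[action] = value
    return max(adjusted, key=adjusted.get)
```

Because the rules are compiled offline from the task specification, every override is traceable to a named rule, which is where the interpretability claim comes from.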

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI, arXiv cs.RO.
  • Matched topical feeds: Reinforcement Learning — wide thematic overlap.
cs.LG cs.AI cs.RO
#29
Industry 2026-04-29 TechCrunch — AI 6.7 7.5/5.9/6.5

AI-generated video has gone from novelty to creative tool almost overnight, and Runway has a front row seat to the shift. The New York-based company has raised close to $860 million at a $5.3 billion valuation, and its models are going toe-to-toe with the most well-funded labs in the world, including Google and OpenAI. The technology goes way beyond […]

#31
Agents & Tool Use 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) · arXiv — Agents / Tool Use · arXiv — AI for Science 6.6 7.3/6.3/6.0

AI-for-Science (AI4Science) is increasingly transforming scientific discovery by embedding machine learning models into prediction, simulation, and hypothesis generation workflows across domains. However, the effectiveness of these models is fundamentally constrained by the AI-readiness of scientific data, for which no scalable and systematic evaluation mechanism currently exists. In this work, we propose SciHorizon-DataEVA, a novel agentic system for scalable AI-readiness evaluation of heterogeneous scientific data. At the evaluation-criteria level, we introduce the Sci-TQA2 principles, which organize AI-readiness into four complementary dimensions: Governance Trustworthiness, Data Quality, AI Compatibility, and Scientific Adaptability. Each…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI.
  • Matched topical feeds: Agents / Tool Use, AI for Science — wide thematic overlap.
cs.LG cs.AI
#32
Interpretability 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) · arXiv cs.CV (Computer Vision) · arXiv — Mechanistic Interpretability 6.6 7.3/6.3/6.0

Compositional generalization remains a foundational weakness of modern neural networks, limiting their robustness and applicability in domains requiring out-of-distribution reasoning. A central, yet unverified, assumption in neuro-symbolic AI is that compositional reasoning will emerge as a byproduct of successful symbol grounding. This work presents the first systematic empirical analysis to challenge this assumption by disentangling the contributions of grounding and reasoning. To operationalize this investigation, we introduce the Iterative Logic Tensor Network ($i$LTN), a fully differentiable architecture designed for multi-step deduction. Using a formal taxonomy of generalization -- probing for…

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI, arXiv cs.CV.
  • Matched topical feeds: Mechanistic Interpretability — wide thematic overlap.
cs.LG cs.AI cs.CV
#33
Evaluations & Benchmarks 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv — Efficiency (Quantization, MoE, Inference) · arXiv — Evals & Benchmarks 6.6 6.5/7.6/5.5

Small language models (SLMs) offer computational efficiency for scalable deployment, yet they often fall short of the reasoning power exhibited by their larger counterparts (LLMs). To mitigate this gap, current approaches invoke an LLM to generate tokens at points of reasoning divergence, but these external calls introduce substantial latency and costs. Alternatively, standard distillation is often hindered by the capacity limitation, as SLMs struggle to accurately mimic the LLM's complex generative distribution. We address this dilemma by identifying local sufficiency: at divergence points, the LLM's preferred token consistently resides within…
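The "local sufficiency" observation lends itself to a simple membership check; this sketch assumes token-level probability lists and is illustrative only, not the paper's method:

```python
def topk_ids(probs, k):
    # Indices of the k highest-probability tokens.
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

def locally_sufficient(slm_probs, llm_preferred, k=5):
    # "Local sufficiency": at a divergence point, the large model's
    # preferred token already sits inside the small model's top-k
    # candidate set, so the SLM only needs to re-rank a few local
    # candidates instead of mimicking the LLM's full distribution.
    return llm_preferred in topk_ids(slm_probs, k)
```

When the check holds, no external LLM call is needed at inference, which is the latency/cost win the abstract is pointing at.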

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CL.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference), Evals & Benchmarks — wide thematic overlap.
cs.CL
#34
Agents & Tool Use 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) · arXiv — Agents / Tool Use · arXiv — Evals & Benchmarks 6.6 7.3/6.3/6.0

Most enterprise document AI today is a pipeline. Parse, index, retrieve, generate. Each of those stages has been studied to death on its own -- what's still hard is evaluating the system as a whole. We built EnterpriseDocBench to take a swing at it: parsing fidelity, indexing efficiency, retrieval relevance, and generation groundedness, all on the same corpus. The corpus is built from public, permissively licensed documents across six enterprise domains (five represented in the current pilot). We ran three pipelines through it -- BM25, dense embedding, and a hybrid…
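Of the three pipelines, the hybrid one is the least standardized. A common way to blend BM25 and dense scores (not necessarily what the benchmark's authors did) is min-max normalization followed by a weighted sum:

```python
def minmax(scores):
    # Rescale scores to [0, 1] so lexical and dense scores are
    # comparable before blending.
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def hybrid_scores(bm25, dense, alpha=0.5):
    # Per-document weighted blend of BM25 and dense-embedding
    # scores; alpha weights the lexical side.
    return [alpha * b + (1 - alpha) * d
            for b, d in zip(minmax(bm25), minmax(dense))]
```

Normalization matters here because raw BM25 scores and cosine similarities live on very different scales.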

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.CL, arXiv cs.AI.
  • Matched topical feeds: Agents / Tool Use, Evals & Benchmarks — wide thematic overlap.
cs.CL cs.AI
#35
Agents & Tool Use 2026-04-29 arXiv cs.RO (Robotics) · arXiv cs.CV (Computer Vision) · arXiv — Robotic Autonomy / Embodied AI · arXiv — Agents / Tool Use 6.6 7.3/6.3/6.0

Embodied AI and robotic systems increasingly depend on scalable, diverse, and physically grounded 3D content for simulation-based training and real-world deployment. While 3D generative modeling has advanced rapidly, embodied applications impose requirements far beyond visual realism: generated objects must carry kinematic structure and material properties, scenes must support interaction and task execution, and the resulting content must bridge the gap between simulation and reality. This work presents the first survey of 3D generation for embodied AI and organizes the literature around three roles that 3D generation plays in embodied systems.…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.RO, arXiv cs.CV.
  • Matched topical feeds: Robotic Autonomy / Embodied AI, Agents / Tool Use — wide thematic overlap.
cs.RO cs.CV
#36
Evaluations & Benchmarks 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv cs.CV (Computer Vision) · arXiv — Efficiency (Quantization, MoE, Inference) · arXiv — Evals & Benchmarks 6.5 7.3/6.2/6.0

The rapid advancement of object detection architectures has positioned single-stage detectors as the dominant solution for real-time visual perception. A primary source of computational overhead in these models lies in the deep backbone stages, where C2f bottleneck modules at high stride levels accumulate a disproportionate share of parameters due to quadratic scaling with channel width. This work introduces QYOLO, a quantum-inspired channel mixing framework that achieves genuine architectural compression by replacing the two deepest backbone C2f modules at P4/16 (512 channels) and P5/32 (1024 channels) with a compact QMixBlock.…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.AI, arXiv cs.CV.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference), Evals & Benchmarks — wide thematic overlap.
cs.AI cs.CV
#37
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Mechanistic Interpretability · arXiv — Evals & Benchmarks 6.5 6.2/7.6/5.5

Sparse Autoencoders (SAEs) have demonstrated significant success in interpreting Large Language Models (LLMs) by decomposing dense representations into sparse, semantic components. However, their potential for analyzing Vision Transformers (ViTs) remains largely under-explored. In this work, we present the first application of SAEs to the ViT [CLS] token for out-of-distribution (OOD) detection, addressing the limitation of existing methods that rely on entangled feature representations. We propose a novel framework utilizing a Top-k SAE to disentangle the dense [CLS] features into a structured latent space. Through this analysis, we reveal that in-distribution…
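The Top-k SAE forward pass the abstract describes can be sketched in a few lines of NumPy. The dimensions, the tied decoder weights, and the absence of a ReLU below are simplifying assumptions, not details from the paper:

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, k):
    # Encode the dense [CLS] feature, zero out everything except
    # the k largest latent activations, then decode. The Top-k
    # constraint is what yields a disentangled sparse code.
    z = x @ W_enc + b_enc
    z_sparse = z.copy()
    z_sparse[np.argsort(z)[:-k]] = 0.0   # drop all but the top k
    return z_sparse, z_sparse @ W_dec
```

For OOD detection, one would then score inputs by reconstruction error or by which latent units fire, rather than by the entangled dense feature.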

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Mechanistic Interpretability, Evals & Benchmarks — wide thematic overlap.
cs.CV
#40
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Reinforcement Learning · arXiv — Evals & Benchmarks 6.4 6.5/7.0/5.5

Predictive safety filters (PSFs) leverage model predictive control to enforce constraint satisfaction during deep reinforcement learning (RL) exploration, yet their reliance on first-principles models or Gaussian processes limits scalability and broader applicability. Meanwhile, model-based RL (MBRL) methods routinely employ probabilistic ensemble (PE) neural networks to capture complex, high-dimensional dynamics from data with minimal prior knowledge. However, existing attempts to integrate PEs into PSFs lack rigorous uncertainty quantification. We introduce the Uncertainty-Aware Predictive Safety Filter (UPSi), a PSF that provides rigorous safety predictions using PE dynamics models by formulating future outcomes…
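One standard way a probabilistic ensemble feeds a safety filter (a generic sketch, not UPSi's actual formulation) is to penalize predictions by their ensemble spread before checking the constraint:

```python
def ensemble_stats(preds):
    # Mean and variance across an ensemble of dynamics-model
    # predictions for the same (state, action) pair.
    n = len(preds)
    mean = sum(preds) / n
    return mean, sum((p - mean) ** 2 for p in preds) / n

def safe_under_uncertainty(preds, limit, beta=2.0):
    # Pessimistic check: the predicted constraint value plus a
    # beta-sigma uncertainty margin must stay within the limit.
    mean, var = ensemble_stats(preds)
    return mean + beta * var ** 0.5 <= limit
```

The point the abstract makes is that this kind of heuristic margin lacks rigorous guarantees, which is the gap the paper targets.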

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Reinforcement Learning, Evals & Benchmarks — wide thematic overlap.
cs.LG
#41
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Mechanistic Interpretability · arXiv — Evals & Benchmarks 6.4 6.5/7.0/5.5

Multimodal large language models (MLLMs) have achieved impressive progress on general multimodal tasks, yet they remain brittle on dial-based measurement reading. In this paper, we study this problem through controlled benchmarks and feature-space probing, and show that current MLLMs not only achieve unsatisfactory accuracy on dial-based readout, but also suffer sharp performance drops under viewpoint and illumination changes even when the underlying dial state remains fixed. Our probing analysis further reveals that same-state samples under appearance variation are not consistently clustered, while neighboring states fail to preserve the local structure…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Mechanistic Interpretability, Evals & Benchmarks — wide thematic overlap.
cs.CV
#43
Infrastructure 2026-04-30 Latent Space 6.4 6.5/6.2/6.5

Just as we covered World Models early this year, we’ll be releasing a short miniseries on the CPU compute/sandbox industry on the pod over the coming weeks, and it’s a good time to explain why. In recent days: Noam Brown: “inference compute is a strategic resource, currently undervalued.” Sam Altman: “To a significant degree, we have to become an AI inference company now.” Taken individually, these comments might seem unremarkable: normal reactions to a very successful GPT 5.5…

#44
Industry 2026-04-29 Hacker News 6.4 6.6/6.0/6.6

Hacker News discussion (157 points) — "People who don't use AI will be left behind". Visit the comments thread for community caveats and the linked article for primary reporting.

#46
Interpretability 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CL (Computation & Language) · arXiv — Mechanistic Interpretability 6.3 6.2/7.0/5.5

Large language models (LLMs) acquire most of their factual knowledge during the pre-training stage, through next-token prediction. Subsequent stages of post-training often introduce new facts outside the parametric knowledge, giving rise to hallucinations. While it has been demonstrated that supervised fine-tuning (SFT) on new knowledge may exacerbate the problem, the underlying mechanisms are still poorly understood. We conduct a controlled fine-tuning experiment, focusing on closed-book QA, and find latent directions that causally contribute to hallucinations. Specifically, we fine-tune Llama 3.1 8B, Gemma 2 9B and Mistral 7B v03 on…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CL.
  • Matched topical feeds: Mechanistic Interpretability — wide thematic overlap.
cs.LG cs.CL
#47
Evaluations & Benchmarks 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) · arXiv — Evals & Benchmarks 6.3 6.5/6.8/5.5

Large reasoning models such as DeepSeek-R1 and OpenAI o1 generate extended chains of thought spanning thousands of tokens, yet their integration with retrieval-augmented generation (RAG) remains fundamentally misaligned. Current RAG systems optimize for providing context before reasoning begins, while reasoning models require evidence injection during multi-step inference chains. We introduce ReaLM-Retrieve, a reasoning-aware retrieval framework that addresses this mismatch through three key innovations: (1) a step-level uncertainty detector that identifies knowledge gaps at reasoning-step granularity rather than token or sentence level; (2) a retrieval intervention policy that learns when external…
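A step-level uncertainty detector could be as simple as thresholding mean token entropy per reasoning step. This toy version is a plausible baseline, not the paper's learned detector:

```python
import math

def step_entropy(token_dists):
    # Mean per-token Shannon entropy over one reasoning step,
    # given each token's probability distribution.
    ents = [-sum(p * math.log(p) for p in dist if p > 0)
            for dist in token_dists]
    return sum(ents) / len(ents)

def needs_retrieval(token_dists, threshold=1.0):
    # Trigger retrieval at reasoning-step granularity when the
    # model's own token distributions look uncertain.
    return step_entropy(token_dists) > threshold
```

The framing's contribution is operating at step granularity: retrieval fires mid-chain where the knowledge gap occurs, not once up front.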

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.CL, arXiv cs.AI.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CL cs.AI
#48
Evaluations & Benchmarks 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv cs.RO (Robotics) · arXiv — Evals & Benchmarks 6.3 6.5/6.9/5.5

Large language models (LLMs) are increasingly considered for deployment as the control component of robotic health attendants, yet their safety in this context remains poorly characterized. We introduce a dataset of 270 harmful instructions spanning nine prohibited behavior categories grounded in the American Medical Association Principles of Medical Ethics, and use it to evaluate 72 LLMs in a simulation environment based on the Robotic Health Attendant framework. The mean violation rate across all models was 54.4%, with more than half exceeding 50%, and violation rates varied substantially across behavior categories,…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.AI, arXiv cs.RO.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.AI cs.RO
#49
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Post-training / Alignment · arXiv — Evals & Benchmarks 6.3 6.5/6.8/5.5

Vision-language models (VLMs) have shown strong performance on static visual understanding, yet they still struggle with dynamic spatial reasoning that requires imagining how scenes evolve under egocentric motion. Recent efforts address this limitation either by scaling spatial supervision with synthetic data or by coupling VLMs with world models at inference time. However, the former often lacks explicit modeling of motion-conditioned state transitions, while the latter incurs substantial computational overhead. In this work, we propose World2VLM, a training framework that distills spatial imagination from a generative world model into a vision-language…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Post-training / Alignment, Evals & Benchmarks — wide thematic overlap.
cs.CV
#52
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) · arXiv — Evals & Benchmarks 6.2 6.5/6.3/5.5

I propose the Random Cloud method, a training-free approach to neural architecture search that discovers minimal feedforward network topologies through stochastic exploration and progressive structural reduction. Unlike post-training pruning methods that require a full train-prune-retrain cycle, this method evaluates randomly initialized networks without backpropagation, progressively reduces their topology, and only trains the best minimal candidate at the end. I evaluate on 7 classification benchmarks against magnitude pruning and random pruning baselines. The Random Cloud matches or outperforms both baselines in 6 of 7 datasets, achieving statistically significant improvements on Sonar…
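The search loop the abstract describes (evaluate without backprop, progressively reduce, keep the best minimal candidate) might look like this in miniature; the proxy `score` and the fixed shrink schedule below are stand-ins for the paper's stochastic exploration:

```python
def random_cloud_search(score, widths, shrink=0.5, tol=0.02, rounds=3):
    # Training-free search: candidates are scored by a cheap proxy
    # (no backpropagation); layer widths are progressively reduced
    # as long as the proxy stays within `tol` of the best seen.
    best, best_score = list(widths), score(widths)
    for _ in range(rounds):
        cand = [max(1, int(w * shrink)) for w in best]
        if score(cand) >= best_score - tol:
            best, best_score = cand, score(cand)
    return best
```

Only the surviving minimal topology would then be trained for real, which is where the claimed savings over train-prune-retrain come from.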

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.LG cs.AI
#53
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Post-training / Alignment · arXiv — Evals & Benchmarks 6.2 6.5/6.3/5.5

Despite being resource-intensive to train, 3D convolutional neural networks (CNNs) have been the standard approach to classify CT and MRI scans. Recent work suggests that deep multiple instance learning (MIL) may be a more efficient alternative for 3D brain scans, especially when the pre-trained image encoder used to embed each 2D slice is frozen and only the pooling operation and classifier are trained. In this paper, we provide a systematic comparison of simple MIL, attention-based MIL, 3D CNNs, and 3D ViTs across three CT and four MRI datasets, including two…
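For reference, the attention-based MIL pooling being compared is typically a softmax-weighted average of frozen slice embeddings. A dependency-free sketch (the scoring vector `w` would normally be a learned parameter):

```python
import math

def attention_mil_pool(instance_embs, w):
    # Attention-based MIL: score each (frozen) 2D-slice embedding
    # with a vector w, softmax the scores, and return the
    # attention-weighted average as the whole-scan embedding.
    scores = [sum(a * b for a, b in zip(e, w)) for e in instance_embs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]     # stable softmax
    alphas = [x / sum(exps) for x in exps]
    return [sum(a * e[j] for a, e in zip(alphas, instance_embs))
            for j in range(len(instance_embs[0]))]
```

With the encoder frozen, only `w` and the downstream classifier train, which is the efficiency argument for MIL over full 3D CNNs.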

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Post-training / Alignment, Evals & Benchmarks — wide thematic overlap.
cs.LG
#54
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv stat.ML (Statistical ML) · arXiv — Evals & Benchmarks 6.2 6.5/6.3/5.5

Uncertainty estimation is essential for robust decision-making in the presence of ambiguous or out-of-distribution inputs. Gaussian Processes (GPs) are classical kernel-based models that offer principled uncertainty quantification and perform well on small- to medium-scale datasets. Alternatively, formulating the weight space learning problem under tensor network assumptions yields scalable tensor network kernel machines. However, these assumptions break Gaussianity, complicating standard probabilistic inference. This raises a fundamental question: how can tensor network kernel machines provide principled uncertainty estimates? We propose a novel Bayesian Tensor Network Kernel Machine (LA-TNKM) that employs a (linearized)…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv stat.ML.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.LG stat.ML
#55
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Efficiency (Quantization, MoE, Inference) · arXiv — Evals & Benchmarks 6.2 6.5/6.3/5.5

GPU-accelerated Self-Organizing Map (SOM) implementations are among the most competitive options for large-scale SOM analysis, but growing dataset sizes increasingly challenge their practical use because workloads no longer fit cleanly within device-memory limits. We introduce FloatSOM, a SOM framework for scalable training and deployment that supports multi-GPU execution, out-of-memory disk-backed streaming, and novel topologies beyond regular lattices. We evaluate FloatSOM on 14 synthetic and real benchmark datasets together with controlled speed scaling benchmarks, and show that these improved topologies, combined with topology-aware hyperparameter fine-tuning, yield lower quantization error than current…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference), Evals & Benchmarks — wide thematic overlap.
cs.LG
#56
Evaluations & Benchmarks 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) · arXiv — Evals & Benchmarks 6.2 6.5/6.3/5.5

The digitisation of classical Sanskrit literature is impeded by a scarcity of annotated resources, particularly for Named Entity Recognition. While recent methodologies utilise generic Large Language Models (LLMs) for data augmentation, these approaches remain prone to error and often lack the reasoning depth required for classical grammar. In this work, we introduce Naamah, a high-quality silver-standard Sanskrit NER dataset comprising 102,942 sentences. We propose a methodology that combines entity extraction from DBpedia with the generative capabilities of a 24B parameter hybrid reasoning model to create grammatically natural and…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.CL, arXiv cs.AI.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CL cs.AI
#57
Robotic Autonomy 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv cs.RO (Robotics) · arXiv — Reinforcement Learning 6.2 6.2/6.7/5.5

Annotating long-horizon robotic demonstrations with precise temporal action boundaries is crucial for training and evaluating action segmentation and manipulation policy learning methods. Existing annotation tools, however, are often limited: they are designed primarily for vision-only data, do not natively support synchronized visualization of robot-specific time-series signals (e.g., gripper state or force/torque), or require substantial effort to adapt to different dataset formats. In this paper, we introduce ATLAS, an annotation tool tailored for long-horizon robotic action segmentation. ATLAS provides time-synchronized visualization of multi-modal robotic data, including multi-view video and proprioceptive signals,…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.AI, arXiv cs.RO.
  • Matched topical feeds: Reinforcement Learning — wide thematic overlap.
cs.AI cs.RO
#59
Industry 2026-04-29 TechCrunch — AI 6.2 6.0/5.9/6.5

Meta is losing billions on Reality Labs each quarter, and its growing AI expenditures will only push that spending higher.

#61
Industry 2026-04-29 TechCrunch — AI 6.2 6.0/5.9/6.5

Google TV just got more Gemini features, including the ability to transform photos and videos with its Nano Banana and Veo tools.

#63
Safety, Policy & Regulation 2026-04-29 Hacker News 6.2 6.3/6.0/6.1

Hacker News discussion (127 points) — Ramp's Sheets AI Exfiltrates Financials. Visit the comments thread for community caveats and the linked article for primary reporting.

#65
Government & Defense 2026-04-29 DefenseScoop 6.2 6.5/7.5/4.5

The U.S. military will soon have a new sub-unified command focused on autonomous warfare, Secretary of Defense Pete Hegseth told lawmakers Wednesday. Sub-unified commands, which combatant commanders can set up with the approval of the SecDef, are joint organizations designed to conduct operations and certain missions assigned to the geographic or functional combatant command that they fall under. The designation typically signifies that the organization’s mission is enduring and a high priority for military leadership. Examples of sub-unified commands include…

#66
State Space Models 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) · arXiv cs.NE (Neural & Evolutionary Computing) 6.1 6.2/6.3/5.5

Can Neural Assemblies -- groups of neurons that fire together and strengthen through co-activation -- learn the direction of causal influence between variables? While established as a computationally general substrate for classification, parsing, and planning, neural assemblies have not yet been shown to internalize causal directionality. We demonstrate that the inherent operations of neural assemblies -- projection, local plasticity control, and sparse winner selection -- are sufficient for directional learning. We introduce DIRECT (DIRectional Edge Coupling/Training), a mechanism that co-activates source and target assemblies under an adaptive gain schedule to…

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI, arXiv cs.NE.
cs.LG cs.AI cs.NE
#67
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) 6.1 6.2/6.3/5.5

Accurate and consistent Emergency Severity Index (ESI) assignment remains a persistent challenge in emergency departments, where highly variable free-text triage documentation contributes to mistriage and workflow inefficiencies. This study evaluates whether open-source small language models (SLMs) can serve as reliable, privacy-preserving decision-support tools for clinical triage. We systematically compared multiple SLMs across diverse prompting pipelines and found that clinical vignettes, concise summaries of triage narratives, yielded the most accurate predictions. Among the SLMs tested, Qwen2.5-7B demonstrated the strongest balance of accuracy, stability, and computational efficiency. Through large-scale domain adaptation using expert-curated and…

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CL, arXiv cs.AI.
cs.LG cs.CL cs.AI
#68
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 6.1 6.2/6.3/5.5

Recent advances in self-evolution video understanding frameworks have demonstrated the potential of autonomous learning without human annotations. However, existing methods often suffer from weakly controlled optimization and uncontrolled difficulty progression, as they lack structured guidance throughout the iterative learning process. To address these limitations, we propose CurEvo, a curriculum-guided self-evolution framework that introduces curriculum learning into self-evolution to achieve more structured and progressive model improvement. CurEvo dynamically regulates task difficulty, refines evaluation criteria, and balances data diversity according to model competence, forming a curriculum-guided feedback loop that aligns learning complexity…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.LG cs.CV
#69
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Reinforcement Learning · arXiv — Evals & Benchmarks 6.1 6.2/6.3/5.5

Electric truck operations require routing decisions that remain feasible under limited battery range, long charging times, uncertain travel and energy consumption, and competition for shared charging infrastructure. These features make electric truck routing a coupled logistics and energy problem, limiting the practicality of heuristics-based methods and rendering them computationally infeasible at scale. This paper proposes a learning-based framework for stochastic electric truck routing under charging constraints and operational uncertainty. The problem, solved with reinforcement learning, is formulated as an event-driven semi-Markov decision process with shared charging resources, stochastic travel and…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Reinforcement Learning, Evals & Benchmarks — wide thematic overlap.
cs.LG
#70
Multimodal 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) · arXiv cs.CV (Computer Vision) 6.1 6.2/6.3/5.5

Deploying Vision-Language Models (VLMs) on edge devices remains challenging due to their substantial computational and memory demands, which exceed the capabilities of resource-constrained embedded platforms. Conversely, fully offloading inference to the cloud is often impractical in bandwidth-limited environments, where transmitting raw visual data introduces substantial latency overhead. While recent edge-cloud collaborative architectures attempt to partition VLM workloads across devices, they typically rely on transmitting fixed-size representations, lacking adaptability to dynamic network conditions and failing to fully exploit semantic redundancy. In this paper, we propose a progressive semantic communication framework for…

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI, arXiv cs.CV.
cs.LG cs.AI cs.CV
#71
Post-Training 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Reinforcement Learning · arXiv — Post-training / Alignment 6.1 6.2/6.3/5.5

We study sequential interventions under prerequisite constraints. In this setting, admissible intervention sequences are paths in the ideal lattice of a finite prerequisite poset rather than unconstrained action strings. We give an exact local-to-global theory of order sensitivity on this state space. First, we prove that any two admissible paths with the same endpoints differ by a finite sequence of elementary diamond swaps. Second, for edge-additive path valuations, we show that path-independence is equivalent to vanishing diamond curvature, yielding an endpoint potential with a canonical Möbius parameterization on the ideal…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Reinforcement Learning, Post-training / Alignment — wide thematic overlap.
cs.LG
#72
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) 6.1 6.2/6.3/5.5

We describe our system for SemEval-2026 Task 6 (CLARITY: Unmasking Political Question Evasions), which classifies English political interview responses by coarse-grained clarity (3-way) and fine-grained evasion strategy (9-way). Since responses frequently exceed the 512-token limit of standard Transformer encoders, we apply an overlapping sliding-window chunking strategy with element-wise Max-Pooling aggregation over chunk representations. A shared RoBERTa-large encoder supplies two task-specific heads trained jointly via a multi-task objective, with inference-time ensembling over 7-fold stratified cross-validation. Our system achieves a Macro-F1 of 0.80 on Subtask 1 and 0.51 on Subtask 2, ranking…
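A minimal version of the overlapping sliding-window chunking with element-wise max-pooling (toy token lists stand in for real embeddings; window and stride are scaled down from the 512-token setting for illustration):

```python
def chunk_with_overlap(tokens, window=512, stride=256):
    # Overlapping sliding windows over a long token sequence, so
    # no token is dropped by the encoder's fixed input limit.
    chunks = []
    for start in range(0, max(1, len(tokens) - window + stride), stride):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
    return chunks

def maxpool(chunk_vecs):
    # Element-wise max over the per-chunk representation vectors,
    # yielding one fixed-size vector for the whole response.
    return [max(col) for col in zip(*chunk_vecs)]
```

The overlap means boundary tokens appear in two chunks, so evidence split across a window edge still reaches the pooled representation.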

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CL, arXiv cs.AI.
cs.LG cs.CL cs.AI
#73
Agents & Tool Use 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv — Agents / Tool Use · arXiv — Evals & Benchmarks 6.1 6.2/6.3/5.5

Autonomous LLM agents increasingly operate in long-horizon, interactive settings where success depends on reusing experience accumulated over extended histories. However, existing agent memory systems are fundamentally constrained by text-context budgets: storing or revisiting raw trajectories is prohibitively token-expensive, while summarization and text-only retrieval trade token savings for information loss and fragmented evidence. To address this limitation, we propose Optical Context Retrieval Memory (OCR-Memory), a memory framework that leverages the visual modality as a high-density representation of agent experience, enabling retention of arbitrarily long histories with minimal prompt overhead at retrieval…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CL.
  • Matched topical feeds: Agents / Tool Use, Evals & Benchmarks — wide thematic overlap.
cs.CL
#74
Post-Training 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) · arXiv — Post-training / Alignment 6.1 6.2/6.3/5.5

Timely and reliable multilingual communication is critical during natural and human-induced disasters, but developing effective solutions for crisis communication is limited by the scarcity of curated parallel data. We propose a domain-adaptive pipeline that expands a small reference corpus by retrieving and filtering data from general corpora. We use the resulting dataset to fine-tune a small language model for crisis-domain translation and then apply preference optimization to bias outputs toward CEFR A2-level English. Automatic and human evaluation shows that this approach improves readability, while maintaining strong adequacy. Our results indicate…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.CL, arXiv cs.AI.
  • Matched topical feeds: Post-training / Alignment — wide thematic overlap.
cs.CL cs.AI
#75
State Space Models 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) · arXiv cs.NE (Neural & Evolutionary Computing) 6.1 6.2/6.3/5.5

This paper investigates efficient methods for utilizing text-only data to improve speech recognition, focusing on encoder-dominated models that facilitate faster recognition. We provide a comprehensive comparison of techniques to integrate text-only data, including modality matching and dynamic downsampling to reach text-level representations within the encoder. Our experiments on the LibriSpeech corpus show that a larger encoder with a smaller decoder can equal or surpass the performance of architectures with larger decoders. We demonstrate that simple configurations, such as random duration models, are often more effective than complex alternatives, significantly simplifying…

How it was discussed
  • Cross-listed in 3 arXiv categorical feeds: arXiv cs.CL, arXiv cs.AI, arXiv cs.NE.
cs.CL cs.AI cs.NE
#76
Interpretability 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv cs.CV (Computer Vision) · arXiv — Mechanistic Interpretability 6.1 6.2/6.3/5.5

Transformer-based architectures have established a dominant paradigm in global semantic perception; however, they remain fundamentally constrained by the profound spatial heterogeneity inherent in natural images. Specifically, the imposition of a uniform global receptive field across regions of varying information density inevitably leads to local feature degradation, particularly in dense conflict zones populated by microscopic targets. To address this mechanistic limitation, we propose ViCrop-Det, a training-free inference framework that introduces adaptive spatial trust region shrinkage. Inspired by the use of attention entropy in anomaly segmentation, ViCrop-Det leverages the detection decoder's cross-attention…
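The excerpt says ViCrop-Det takes inspiration from attention entropy. As a hedged illustration of that signal only (not the paper's formulation), the Shannon entropy of a normalized attention map separates diffuse attention from attention concentrated on a few cells:

```python
import numpy as np

def attention_entropy(attn):
    """Shannon entropy (nats) of a nonnegative attention map after
    normalization; high values mean attention is spread thin, low
    values mean it is concentrated on few cells."""
    p = np.asarray(attn, dtype=float).ravel()
    p = p / p.sum()
    p = p[p > 0]                    # 0 * log(0) := 0
    return float(-(p * np.log(p)).sum())
```

Uniform attention over n cells gives the maximum value log(n); a one-hot map gives 0.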

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.AI, arXiv cs.CV.
  • Matched topical feeds: Mechanistic Interpretability — wide thematic overlap.
cs.AI cs.CV
#77
Agents & Tool Use 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv — Agents / Tool Use · arXiv — Efficiency (Quantization, MoE, Inference) 6.1 6.5/6.2/5.5

Operating and maintaining (O&M) large-scale online engine systems (search, recommendation, advertising) demands substantial human effort for release monitoring, alert response, and root cause analysis. While LLM-based agents are a natural fit for these tasks, the deployment bottleneck is not reasoning capability but orchestration: selecting, for each operational event, the relevant data (metrics, logs, change events) and the applicable operational knowledge (handbook rules and practitioner experience). Feeding all signals indiscriminately causes dilution and hallucination, while manually curating the event-to-(data, knowledge) mapping is intractable under dozens of daily releases. We present Bian…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.AI.
  • Matched topical feeds: Agents / Tool Use, Efficiency (Quantization, MoE, Inference) — wide thematic overlap.
cs.AI
#78
Evaluations & Benchmarks 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 6.1 6.5/6.2/5.5

Open-vocabulary change detection aims to identify semantic changes in bi-temporal remote sensing images without predefined categories. Recent methods combine foundation models such as SAM, DINO and CLIP, but typically process each timestamp independently or interact only at the final comparison stage. Such paradigms suffer from insufficient temporal coupling during semantic reasoning, which limits their ability to distinguish genuine semantic changes from non-semantic appearance discrepancies. In addition, patch-dominant inference on high-resolution images often weakens global semantic continuity and produces fragmented change regions. To address these issues, we propose MemOVCD, a training-free…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.AI, arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.AI cs.CV
#79
Agents & Tool Use 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv — Agents / Tool Use · arXiv — Evals & Benchmarks 6.1 6.5/6.2/5.5

Recent advances in large language models and agentic frameworks have enabled virtual customer assistants (VCAs) for complex support. We present SecMate, a multi-agent VCA for cybersecurity troubleshooting that integrates device, user, and service specificity from conversational and device-level signals. Device specificity is provided by a lightweight local diagnostic utility, while user specificity relies on implicit proficiency inference and profile-aware troubleshooting. Service specificity is achieved through a proactive, context-aware recommender. We evaluate SecMate in a controlled study with 144 participants and 711 conversations. Device-level evidence increased correct resolutions from about 50%…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.AI.
  • Matched topical feeds: Agents / Tool Use, Evals & Benchmarks — wide thematic overlap.
cs.AI
#80
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — State Space Models · arXiv — Evals & Benchmarks 6.1 6.2/6.3/5.5

Temporal modeling remains a fundamental challenge in video understanding, particularly as sequence lengths scale. Traditional video models relying on dense spatiotemporal attention suffer from quadratic computational costs for long videos. To circumvent these costs, recent approaches adapt image models for videos via Parameter-Efficient Fine-Tuning (PEFT) methods such as adapters. However, deeply inserting these modules incurs prohibitive activation memory overhead during back-propagation. While recent efficient State Space Models (SSMs) introduce linear complexity, they disrupt 2D spatial relationships and rely on extensive masked pre-training to recover spatial awareness. To overcome these limitations,…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: State Space Models, Evals & Benchmarks — wide thematic overlap.
cs.CV
#81
Industry 2026-04-30 Hacker News 6.1 6.1/6.0/5.9

Hacker News discussion (113 points) — Claude.ai and API unavailable [fixed]. Visit the comments thread for community caveats and the linked article for primary reporting.

#82
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Evals & Benchmarks 6.0 5.5/7.4/5.0

Federated Unlearning (FU) is an emerging paradigm in Federated Learning (FL) that enables participating clients to fully remove their contributions from a trained global model, driven by data protection regulations that mandate the right to be forgotten. However, existing FU methods mostly rely on synchronous coordination. This requirement forces the entire federation to halt and wait for stragglers to complete erasure, creating significant delays due to device heterogeneity. Furthermore, these methods often face the problem that the influence of erased data is merely suppressed temporarily and resurfaces during subsequent training,…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.LG
#83
Efficiency 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Efficiency (Quantization, MoE, Inference) 6.0 5.8/7.0/5.0

Post-training quantization (PTQ) has become an important technique for reducing the inference cost of Large Language Models (LLMs). While recent mixed-precision methods improve ultra-low bit quantization by preserving critical subspaces in high precision, they typically construct these subspaces relying solely on activation statistics. This ignores the fundamental nature of linear operations, where the output perturbation is jointly driven by both activation and weight quantization noise. In this paper, we propose CoQuant, a joint weight-activation subspace projection method. By theoretically modeling the expected output error, CoQuant formulates a closed-form weighted PCA…
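The excerpt does not reproduce CoQuant's closed form, but the general shape of a weighted PCA, with per-sample weights standing in for the error-derived importance terms, can be sketched (all names here are illustrative, not from the paper):

```python
import numpy as np

def weighted_pca_subspace(X, w, k):
    """Top-k principal directions of the rows of X under nonnegative
    per-sample weights w (a stand-in for error-derived importances).
    X: (n, d) samples; w: (n,) weights; returns a (d, k) basis."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mu = w @ X                          # weighted mean
    Xc = X - mu
    cov = (Xc * w[:, None]).T @ Xc      # weighted covariance, (d, d)
    vals, vecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order]
```

The high-precision subspace is then spanned by the returned directions, with everything else quantized aggressively.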

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference) — wide thematic overlap.
cs.LG
#84
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 6.0 5.8/7.6/4.5

Vision-language models have shown strong performance, but they often generalize poorly to specialized domains. While semi-supervised vision-language learning mitigates this limitation by leveraging a small set of labeled image-text pairs together with abundant unlabeled images, existing methods remain fundamentally pairwise and fail to model the global structure of multimodal representation manifolds. Existing topology-based alignment methods rely on persistence diagram matching, which neither guarantees geometric alignment nor utilizes the image-text pairing information central to vision-language learning. We propose Topology-Aware Multimodal Representation Alignment (ToMA), a framework that uses persistent homology to identify…

cs.LG
#85
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv stat.ML (Statistical ML) 5.9 5.5/6.9/5.0

We introduce Hyper Input Convex Neural Networks (HyCNNs), a novel neural network architecture designed for learning convex functions. HyCNNs combine the principles of Maxout networks with input convex neural networks (ICNNs) to create a neural network that is always convex in the input, is theoretically capable of leveraging depth, and performs reliably when trained at scale compared to ICNNs. Concretely, we prove that HyCNNs require exponentially fewer parameters than ICNNs to approximate quadratic functions up to a given precision. Across a series of synthetic experiments, we demonstrate that HyCNNs outperform existing…
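The excerpt does not give HyCNN's architecture, but the ICNN property it builds on is easy to demonstrate: nonnegative weights on previous activations plus a convex, nondecreasing activation keep the output convex in the input. A minimal numpy sketch (layer count and shapes are illustrative):

```python
import numpy as np

def icnn_scalar(x, Wx0, b0, Wz1, Wx1, b1):
    """Two-layer input-convex function of x.
    Convexity argument: z1 is convex in x (ReLU of an affine map); the
    second layer takes a nonnegative combination of convex terms
    (requires Wz1 >= 0 elementwise) plus an affine passthrough in x,
    then applies ReLU again -- all convexity-preserving steps."""
    z1 = np.maximum(Wx0 @ x + b0, 0.0)
    z2 = np.maximum(Wz1 @ z1 + Wx1 @ x + b1, 0.0)
    return float(z2.sum())
```

The midpoint inequality f((a+b)/2) ≤ (f(a)+f(b))/2 then holds for any inputs, which is what makes such networks usable for learning convex functions.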

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv stat.ML.
cs.LG stat.ML
#86
Efficiency 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Efficiency (Quantization, MoE, Inference) 5.9 5.5/7.0/5.0

Mixture-of-Experts (MoE) models offer high capacity with efficient inference cost by activating a small subset of expert models per input. However, deploying MoE models requires all experts to reside in memory, creating a gap between the resource used by activated experts and the provisioned resources. This underutilization is further pronounced in multi-tenant scenarios. In this paper, we propose FaaSMoE, a multi-tenant MoE serving architecture built on Function-as-a-Service (FaaS) platforms. FaaSMoE decouples the control and execution planes of MoE by deploying experts as stateless FaaS functions, enabling on-demand and scale-to-zero expert…
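The scale-to-zero idea hinges on sparse routing: only the experts the gate selects for a given input need a live function instance. A hedged sketch of standard top-k gating (FaaSMoE's actual router is not described in the excerpt):

```python
import numpy as np

def topk_route(gate_logits, k=2):
    """Select the k highest-scoring experts and softmax-renormalize
    their gate weights. In a FaaS deployment only the returned expert
    IDs need warm instances; the rest can stay scaled to zero."""
    idx = np.argsort(gate_logits)[::-1][:k]
    g = np.exp(gate_logits[idx] - gate_logits[idx].max())
    return idx, g / g.sum()
```

The gap the paper targets is exactly the difference between the k activated experts and the full expert set that conventional serving keeps resident in memory.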

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference) — wide thematic overlap.
cs.LG
#87
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv stat.ML (Statistical ML) 5.9 5.5/6.9/5.0

Deep learning methods have proved highly effective for classification and image recognition problems. In this paper, we ask whether this success can be transferred to hypothesis testing: if a neural network can distinguish, for example, an image of a handwritten digit from another, can it also distinguish an "image of a sample" (such as a scatter plot) generated under a given statistical model from one generated outside that model? Motivated by this idea, we propose a novel procedure called deep-testing, which approaches the classical inferential problem of hypothesis testing through…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv stat.ML.
cs.LG stat.ML
#88
Frontier LLMs 2026-04-29 arXiv cs.CL (Computation & Language) 5.9 5.5/7.6/4.5

Differential Privacy (DP) for text has matured from disjointed word-level substitutions to contiguous sentence-level rewriting by leveraging the generative capacity of language models. While this form of text privatization is best suited for balancing formal privacy guarantees with grammatical coherence, its impact on the register identity of text remains largely unexplored. By conducting a multidimensional stylistic profiling of differentially-private rewriting, we demonstrate that the cost of privacy extends far beyond lexical variation. Specifically, we find that rewriting under privacy constraints induces a systematic functional mutation of the text's communicative signature. This…

cs.CL
#89
Robotic Autonomy 2026-04-29 arXiv cs.RO (Robotics) 5.9 5.8/7.4/4.5

Assisting humans in open-world outdoor environments requires robots to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Existing map-based methods rely on costly pre-built HD maps, while learning-based policies are mostly limited to indoor and short-horizon settings. To bridge this gap, we propose Walk with Me, a map-free framework for long-horizon social navigation from high-level human instructions. Walk with Me leverages GPS context and lightweight candidate points-of-interest from a public map API for semantic destination grounding and waypoint proposal. A High-Level Vision-Language Model grounds abstract instructions…

cs.RO
#90
Evaluations & Benchmarks 2026-04-29 arXiv cs.RO (Robotics) · arXiv — Evals & Benchmarks 5.9 5.8/6.7/5.0

Deploying a neuro-symbolic task planner on a new domain today requires significant manual effort: a domain expert must author relaxation and complementary rules, and hundreds of training problems must be solved to supervise a Graph Neural Network (GNN) object scorer. We propose LLM-Flax, a three-stage framework that eliminates all three sources of manual effort using a locally hosted LLM given only a PDDL domain file. Stage 1 automatically generates relaxation and complementary rules via structured prompting with format validation and self-correction. Stage 2 introduces LLM-guided failure recovery with a feasibility-gated…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.RO.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.RO
#91
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.9 5.8/6.8/5.0

Fine-grained RGBT image semantic segmentation is crucial for all-weather unmanned aerial vehicle (UAV) scene understanding. However, UAV RGBT semantic segmentation faces two coupled challenges: cross-modal spatial misalignment caused by sensor parallax and platform vibration, and severe semantic confusion among fine-grained ground objects under top-down aerial views. To address these issues, we propose a Graph-based Semantic Calibration Network (GSCNet) for unaligned UAV RGBT image semantic segmentation. Specifically, we design a Feature Decoupling and Alignment Module (FDAM) that decouples each modality into shared structural and private perceptual components and performs deformable alignment…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#92
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.9 5.8/6.8/5.0

Existing 3D anomaly detection methods are built on a rigid prior: normal geometry is pose-invariant and can be canonicalized through registration or alignment. This prior does not hold for articulated objects with hinge or sliding joints, where valid pose changes induce structured geometric variations that cannot be collapsed to a single canonical template, causing pose-induced deformations to be misidentified as anomalies while true structural defects are obscured. No existing benchmark addresses this challenge. We introduce ArtiAD, the first large-scale benchmark for articulated 3D anomaly detection, comprising 15,229 point clouds across…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#93
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.9 5.8/6.6/5.0

Recent methods demonstrate that large-scale pretrained models, such as CLIP vision transformers, effectively detect AI-generated images (AIGIs) from unseen generative models when used as feature extractors. Many state-of-the-art methods for AI-generated image detection build upon the original CLIP-ViT to enhance this generalization. Since CLIP's release, numerous vision foundation models (VFMs) have emerged, incorporating architectural improvements and different training paradigms. Despite these advances, their potential for AIGI detection and AI image forensics remains largely unexplored. In this work, we present a comprehensive benchmark across multiple VFM families, covering diverse pretraining objectives,…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#94
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.9 5.8/6.6/5.0

Despite the rapid progress in data-driven 3D vision, aerial geometric 3D vision remains a formidable challenge due to the severe scarcity of large-scale, high-fidelity training data. Existing benchmarks, predominantly biased toward ground-level or object-centric views, do not account for complex viewpoint transformations and diverse environmental conditions in UAV-based sensing. To bridge this critical gap, we propose AirZoo, a unified large-scale dataset and benchmark for grounding aerial geometric 3D vision. AirZoo possesses three appealing properties: 1) Scalable Generation Pipeline: Leveraging freely available, world-scale photogrammetric 3D meshes, it renders vast outdoor environments…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#95
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.9 5.8/6.8/5.0

Long-term video understanding requires interpreting complex temporal events and reasoning over procedural activities. While instructional video corpora, like HowTo100M, offer rich resources for model training, they present significant challenges, including noisy ASR transcripts and inconsistent temporal alignments between narration and visual content. In this work, we introduce an automated, training-free pipeline to extract high-quality procedural annotations from in-the-wild instructional videos. Our approach segments videos into coherent shots, filters poorly aligned content, and leverages state-of-the-art multimodal and large language models (Qwen2.5-VL and DeepSeek-R1) to generate structured, temporally grounded procedural steps. This…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#96
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.9 5.8/6.6/5.0

Hyperspectral imaging (HSI) semantic segmentation typically relies on in-domain training, but limited data availability often restricts model performance in real-world applications. Current approaches to leverage foundation models in proximal sensing use cross-modality techniques, bridging RGB and HSI to exploit vision foundation models. However, these methods either discard spectral information or introduce architectural complexity. We propose cross-domain transfer as an alternative, reusing HSI foundation models - originally trained in remote sensing - for proximal sensing applications. By eliminating the need to bridge modality gaps, our approach preserves spectral information while maintaining…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#97
AI Coding 2026-04-30 Simon Willison · Hacker News 5.9 6.0/6.7/5.0

Zig has one of the most stringent anti-LLM policies of any major open source project: No LLMs for issues. No LLMs for pull requests. No LLMs for comments on the bug tracker, including translation. English is encouraged, but not required. You are welcome to post in your native language and rely on others to have their own translation tools of choice to interpret your words. The most prominent project written in Zig may be the Bun JavaScript runtime, which was…

How it was discussed
  • Simon Willison reported it.
  • Hacker News reported it.
#98
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Evals & Benchmarks 5.8 5.8/6.3/5.0

The Alternating Direction Method of Multipliers (ADMM) is a widely used method for structured convex optimization, and its practical performance depends strongly on the choice of penalty and relaxation parameters. Motivated by settings such as Model Predictive Control (MPC), where one repeatedly solves related optimization problems with fixed structure and changing parameter values, we propose learning online updates of the relaxation parameter to improve performance on problem classes of interest. This choice is computationally attractive in OSQP-like architectures, since adapting relaxation does not trigger the matrix refactorizations associated with penalty…
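To see where the relaxation parameter enters, here is plain relaxed ADMM on a toy scalar lasso split; the problem and all constants are illustrative (the paper's setting is OSQP-like QPs with learned online updates of this parameter):

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: proximal operator of t * |.|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_lasso_scalar(a, lam, rho=1.0, alpha=1.6, iters=200):
    """Relaxed ADMM for min_x 0.5*(x - a)^2 + lam*|x|, split as x = z.
    alpha is the over-relaxation parameter (alpha = 1.0 is plain ADMM);
    it blends the fresh x-update with the previous z before the z and
    dual updates, which is what the paper proposes to adapt online."""
    x = z = u = 0.0
    for _ in range(iters):
        x = (a + rho * (z - u)) / (1.0 + rho)  # prox of the smooth term
        xh = alpha * x + (1.0 - alpha) * z     # relaxation step
        z = soft(xh + u, lam / rho)            # prox of lam * |.|
        u = u + xh - z                         # dual ascent
    return float(z)
```

The solution of this toy problem is the soft-threshold of a by lam, which makes the iteration easy to check; the computational point in the abstract is that changing alpha, unlike rho, requires no matrix refactorization.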

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.LG
#99
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Evals & Benchmarks 5.8 5.8/6.3/5.0

Norway's electricity market is heavily dominated by hydropower, but the 2021--2022 energy crisis and stronger integration with Continental Europe have fundamentally altered price formation, reducing the reliability of forecasting models calibrated on historical data. Despite the critical need for updated models, a unified benchmark evaluating feature contributions across all structurally diverse Norwegian bidding zones remains lacking. Here we present a comprehensive evaluation of electricity price forecasting across all five Norwegian Nord Pool bidding zones. We constructed a multimodal hourly dataset spanning 2019--2025 and evaluated eight forecasting model families including LightGBM,…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.LG
#100
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) 5.8 5.8/6.3/5.0

Large Language Model (LLM)-based agents exhibit systemic failures in compositional generalization, limiting their robustness in interactive environments. This work introduces AGEL-Comp, a neuro-symbolic AI agent architecture designed to address this challenge by grounding actions of the agent. AGEL-Comp integrates three core innovations: (1) a dynamic Causal Program Graph (CPG) as a world model, representing procedural and causal knowledge as a directed hypergraph; (2) an Inductive Logic Programming (ILP) engine that synthesizes new Horn clauses from experiential feedback, grounding symbolic knowledge through interaction; and (3) a hybrid reasoning core where an…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI.
cs.LG cs.AI
#101
Government & Defense 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CV (Computer Vision) 5.8 5.8/6.3/5.0

One of the most exciting applications of vision models involves pixel-level reasoning. Despite the abundance of vision foundation models, we still lack representations that effectively embed spatio-temporal properties of visual scenes at the pixel level. Existing frameworks either train on image-based pretext tasks, which do not account for dynamic elements, or on video sequences for action-level reasoning, which does not scale to dense pixel-level prediction. We present a framework that learns pixel-accurate feature descriptors from videos, LILA. The core element of our training framework is linear in-context learning. LILA leverages…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CV.
cs.LG cs.CV
#102
AI Coding 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv — Evals & Benchmarks 5.8 5.8/6.3/5.0

LLMs have achieved strong results on both function-level code synthesis and repository-level code modification, yet a capability that falls between these two extremes -- compositional code creation, i.e., building a complete, internally structured class from a specification -- remains underserved. Current evaluations are either confined to isolated functions or rely on manually curated class-level tasks that are expensive to scale and increasingly susceptible to data contamination. We introduce ClassEval-Pro, a benchmark of 300 class-level tasks spanning 11 domains, constructed through an automated three-stage pipeline that combines complexity enhancement, cross-domain class…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CL.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CL
#103
Frontier LLMs 2026-04-29 arXiv cs.CL (Computation & Language) 5.8 5.8/7.0/4.5

Effective mental health counseling is a complex, theory-driven process requiring the simultaneous integration of psychological frameworks, real-time distress signals, and strategic intervention planning. This level of clinical reasoning is critical for safety and therapeutic effectiveness but is often missing in general-purpose Large Language Models (LLMs). We introduce SAGE (Strategy-Aware Graph-Enhanced), a novel framework designed to bridge the gap between structured clinical knowledge and generative AI. SAGE constructs a heterogeneous graph that unifies conversational dynamics with a psychologically grounded layer, explicitly anchoring interactions in a theory-driven lexicon. Our architecture first employs…

cs.CL
#104
Evaluations & Benchmarks 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv — Evals & Benchmarks 5.8 5.8/6.3/5.0

Speech Sound Disorders (SSD) affect roughly five percent of children, yet speech-language pathologists face severe staffing shortages and unmanageable caseloads. We test a hierarchical approach to SSD classification on the granular multi-task SLPHelmUltraSuitePlus benchmark. We propose a cascading approach from binary classification to type and symptom classification. By fine-tuning Speech Representation Models (SRMs) and using targeted data augmentation, we mitigate biases found by previous works and improve upon all clinical tasks in the benchmark. We also apply our data augmentation approach to Automatic Speech Recognition (ASR). Our results demonstrate that…
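The cascade itself is a simple control-flow pattern: later, finer-grained stages run only when the earlier stage fires. A schematic sketch with hypothetical stage classifiers (the paper's actual stages are fine-tuned speech representation models):

```python
def cascade_classify(x, gate, type_clf, symptom_clf):
    """Hierarchical cascade: binary gate -> disorder type -> symptom.
    gate, type_clf, symptom_clf are placeholder callables standing in
    for the three fine-tuned models."""
    if not gate(x):
        return {"disordered": False}
    t = type_clf(x)
    return {"disordered": True, "type": t, "symptom": symptom_clf(x, t)}
```

One design consequence: errors at the binary gate propagate to every downstream stage, which is why the gate usually gets the most attention in such pipelines.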

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CL.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CL
#105
Evaluations & Benchmarks 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv — Evals & Benchmarks 5.8 5.8/6.3/5.0

LLMs and speech assistants are increasingly used for task-oriented interactions, yet their evaluation often relies on controlled scenarios that fail to capture the variability and complexity of real user requests. Drink ordering, for example, involves diverse named entities, drink types, sizes, customizations, and brand-specific terminology, as well as spontaneous speech phenomena such as hesitations and self-corrections. To address this gap, we introduce StarDrinks, a test set in English and Korean containing speech utterances, transcriptions, and annotated slots. Our dataset supports speech-to-slots SLU, transcription-to-slots NLU, and speech-to-transcription ASR evaluation, providing…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CL.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CL
#106
Evaluations & Benchmarks 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv — Evals & Benchmarks 5.8 5.8/6.3/5.0

Stylistic personalization - making LLMs write in a specific individual's style, rather than merely adapting to task preferences - lacks evaluation grounded in authorship science. We show that grounding evaluation in authorship verification theory transforms what benchmarks can measure. Drawing on three measurement traditions - LUAR, a trained authorship verification model; an LLM-as-judge with decoupled trait matching; and classical function-word stylometrics - we evaluate four inference-time personalization methods across 50 authors and 1,000 generations. The theory-grounded metric, LUAR, provides what ad hoc alternatives cannot: calibrated baselines, with a human ceiling…
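Of the three measurement traditions, function-word stylometry is the simplest to sketch. An illustrative profile-and-compare routine (the word list and the similarity measure are illustrative choices, not the paper's):

```python
from collections import Counter
import math

# Illustrative subset; real stylometric lists run to hundreds of words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it"]

def fw_profile(text):
    """Relative function-word frequencies: topic-neutral and hard to
    fake, the classical authorship signal."""
    toks = text.lower().split()
    counts = Counter(toks)
    n = max(len(toks), 1)
    return [counts[w] / n for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two profiles (0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Comparing a generation's profile against the target author's profile gives a cheap baseline that trained verifiers like LUAR are evaluated against.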

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CL.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CL
#107
Robotic Autonomy 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv cs.RO (Robotics) 5.8 5.5/6.7/5.0

Skill libraries in deployed robotic systems are continually updated through fine-tuning, fresh demonstrations, or domain adaptation, yet existing typed-composition methods (BLADE, SymSkill, Generative Skill Chaining) treat the library as frozen at test time and do not analyze how composition outcomes change when a skill is replaced. We introduce a paired-sampling cross-version swap protocol on robosuite manipulation tasks to characterize this dimension of compositional skill learning. On a dual-arm peg-in-hole task we discover a dominant-skill effect: one ECM achieves 86.7% atomic success rate while every other ECM is at or below…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.AI, arXiv cs.RO.
cs.AI cs.RO
#108
Multimodal 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv cs.CV (Computer Vision) 5.8 5.5/6.9/5.0

The bottleneck in learning-based industrial defect detection is often limited not by model capacity, but by the scarcity of labeled defect data: defects are rare, annotations are expensive, and collecting balanced training sets is slow. We present an end-to-end pipeline for synthetic defect generation and annotation, combining Vision-Language-Model-based prompts, LoRA-adapted diffusion, mask-guided inpainting, and sample filtering with automatic label derivation, and demonstrate the potential of augmenting real data with realistic synthetic samples to overcome data scarcity. The evaluation is conducted on a challenging dataset of pitting defects on ball screw drives,…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.AI, arXiv cs.CV.
cs.AI cs.CV
#109
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.8 5.8/6.9/4.5

Alignment faking (AF) occurs when an LLM strategically complies with training objectives to avoid value modification, reverting to prior preferences once monitoring is lifted. Current detection methods focus on conversational settings and rely primarily on Chain-of-Thought (CoT) analysis, which provides a reliable signal when strategic reasoning surfaces, but cannot distinguish deception from capability failures if traces are absent or unfaithful. We formalize AF as a composite behavioural event and detect it through observable tool selection, where the LLM selects the safe tool when unmonitored, but switches to the unsafe tool…

cs.AI
#110
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.8 5.5/6.6/5.0

Depth ambiguity and joint uncertainty are the two main obstacles to obtaining accurate human pose predictions with the 2D-to-3D lifting methods proposed in the literature. In particular, these issues arise because 2D joint locations can map to multiple 3D positions, inducing multiple possible final poses. Following these considerations, we propose leveraging the generation capability of diffusion-based models to predict multiple hypotheses and aggregate them into a final accurate pose. Therefore, we introduce SnapPose3D, a pose-lifting framework trained deterministically to denoise 3D poses conditioned on both visual context and 2D pose…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#111
Post-Training 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Post-training / Alignment 5.8 5.5/6.8/5.0

Accurate quantification of the geometry of curvilinear biological structures is essential for understanding cellular mechanics and disease-related morphological alterations. Microtubule curvature is a key descriptor of filament rigidity and mechanical perturbations. However, reliable curvature extraction from fluorescence microscopy images remains challenging due to noise, low contrast, and partial filament visibility. Existing approaches rely on segmentation pipelines with pre- or post-processing, which are highly sensitive to segmentation errors and often fail under adverse imaging conditions. In this work, we propose MTCurv, a deep learning framework for direct, segmentation-free regression of microtubule…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Post-training / Alignment — wide thematic overlap.
cs.CV
#112
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.8 5.5/7.3/4.5

Adversarial Training (AT) is one of the most effective methods for developing robust deep neural networks (DNNs). However, AT faces a trade-off problem between clean accuracy and adversarial robustness. In this work, we reveal a surprising phenomenon for the first time: Varying input perturbation intensities for training samples near decision boundaries in AT have minimal impact on model robustness. This finding directly exposes the inconsistency between accuracy and robustness score fluctuations, leading us to identify the misalignment between input and latent spaces as a critical driver of the robustness-accuracy trade-off.…

cs.CV
#114
Safety, Policy & Regulation 2026-04-29 MIT Tech Review 5.8 6.0/6.9/4.5

Today, nuclear energy enjoys a rare moment of support across the political spectrum in the US. Interest from tech companies that are scrambling to meet demand for massive data centers has sparked a resurgence of money and attention in the industry. That newfound interest is exactly why it’s time to talk about an old problem: nuclear waste. In the US alone, nuclear reactors produce about 2,000 metric tons of high-level waste each year. And there’s nowhere to put it. Though…

#119
Government & Defense 2026-04-29 DefenseScoop 5.8 6.5/6.2/4.5

I co-founded Kessel Run, the Department of War’s (DoW) first software factory, with a simple mission: to continuously deliver valuable software that warfighters love. At our peak, we deployed five applications from concept to operations in an average of 124 days, reducing target development timelines by 85%. Section 31, the U.S. Space Force’s first software factory, deployed eight applications to operations in an average of 64 days and reduced conjunction analysis from three hours to 15 minutes. These outcomes…

#120
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv stat.ML (Statistical ML) 5.7 5.5/6.3/5.0

In Orabona and Pál [2016], we introduced the shifted KT potentials to remove the $\ln \ln T$ factor in the parameter-free learning-with-experts bound. In this short technical note, I show that this is equivalent to changing the prior in the Krichevsky--Trofimov algorithm. Then, I show how to use the same idea to remove the $\ln \ln T$ factor in the data-independent bound for the Squint algorithm.

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv stat.ML.
cs.LG stat.ML
#121
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv stat.ML (Statistical ML) 5.7 5.5/6.3/5.0

Learning curves are a fundamental primitive in supervised learning, describing how an algorithm's performance improves with more data and providing a quantitative measure of its generalization ability. Formally, a learning curve plots the decay of an algorithm's error for a fixed underlying distribution as a function of the number of training samples. Prior work on revenue-maximizing learning algorithms, starting with the seminal work of Cole and Roughgarden [STOC, 2014], adopts a distribution-free perspective, which parallels the PAC learning framework in learning theory. This approach evaluates performance against the hardest possible…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv stat.ML.
cs.LG stat.ML
#122
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv stat.ML (Statistical ML) 5.7 5.5/6.3/5.0

We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with MultiLayer Perceptron (MLP) blocks to a continuous-time stochastic interacting particle system. We also identify the stochastic partial differential equation describing the evolution of the tokens' distribution in this limit and prove propagation of chaos when the number of such tokens is large. The bounds we establish are quantitative and the limits we consider commute. We further prove that the limiting stochastic model displays synchronization by noise and establish exponential dissipation of the interaction…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv stat.ML.
cs.LG stat.ML
#123
Government & Defense 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CL (Computation & Language) 5.7 5.5/6.3/5.0

Patient portals now give individuals direct access to their electronic health records (EHRs), yet access alone does not ensure patients understand or act on the complex clinical information contained in these records. The ArchEHR-QA 2026 shared task addresses this challenge by focusing on grounded question answering over EHRs, and this paper presents the system developed by the HealthNLP_Retrievers team for this task. The proposed approach uses a multi-stage cascaded pipeline powered by the Gemini 2.5 Pro large language model to interpret patient-authored questions and retrieve relevant evidence from lengthy clinical…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CL.
cs.LG cs.CL
#124
Multimodal 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.CV (Computer Vision) 5.7 5.5/6.3/5.0

We present KAYRA, an end-to-end karyotyping system that operates inside the operational constraints of a clinical cytogenetic laboratory. KAYRA is architected as a containerized microservice pipeline whose ML stack combines an EfficientNet-B5 + U-Net semantic segmenter, a Mask R-CNN (ResNet-50 + FPN) instance detector, and a ResNet-18 classifier, orchestrated through a cascaded ROI-narrowing strategy that focuses each downstream model on the chromosome-bearing region. The same container images are deployed both as a cloud service and as an on-premise installation, supporting clinical environments where patient-data egress is not permitted as well…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.CV.
cs.LG cs.CV
#125
Efficiency 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Efficiency (Quantization, MoE, Inference) 5.7 5.5/6.3/5.0

Long-context LLM serving is bottlenecked by the cost of attending over ever-growing KV caches. Dynamic sparse attention promises relief by accessing only a small, query-dependent subset of the KV state per decoding step and extending the KV storage to CPU memory. In practice, however, these algorithmic savings rarely translate into end-to-end system-level gains because sparse methods typically operate at different granularities and thus rely on ad hoc, per-algorithm implementations. At the same time, hierarchical KV storage introduces a new systems bottleneck: retrieving fine-grained, irregular KV subsets across the GPU-CPU boundary…
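The per-step "query-dependent subset" idea can be sketched in a few lines of NumPy. This is a generic top-k sparse attention decode step under my own assumptions, not the paper's system; as the comment notes, a real implementation would use an index so the full CPU-resident cache is never scanned:

```python
import numpy as np

def topk_sparse_attention(q, K, V, k):
    """One decode step attending only to the k highest-scoring cached keys.
    Scores are computed densely here for clarity; a production system
    would estimate them so the whole KV cache is never touched."""
    scores = K @ q / np.sqrt(q.shape[0])
    idx = np.argpartition(scores, -k)[-k:]        # query-dependent KV subset
    s = scores[idx] - scores[idx].max()
    w = np.exp(s)
    w /= w.sum()
    return w @ V[idx]

rng = np.random.default_rng(0)
T, d = 64, 16
q, K, V = rng.normal(size=d), rng.normal(size=(T, d)), rng.normal(size=(T, d))
sparse_out = topk_sparse_attention(q, K, V, k=8)
dense_out = topk_sparse_attention(q, K, V, k=T)   # k = T recovers dense attention
```

The systems point in the abstract is that `V[idx]` becomes an irregular gather across the GPU-CPU boundary once the cache spills to host memory.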

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference) — wide thematic overlap.
cs.LG
#126
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Evals & Benchmarks 5.7 5.5/6.3/5.0

We present a quantum feature-selection framework based on a higher-order unconstrained binary optimization (HUBO) formulation that explicitly incorporates multivariate dependencies beyond standard quadratic encodings. In contrast to QUBO-based approaches, the proposed model includes one-, two-, and three-body interaction terms derived from mutual-information measures, enabling the objective function to capture feature relevance, pairwise redundancy, and higher-order statistical structure within a unified energy model. To suppress trivial all-selected solutions, we further include structured linear penalties that promote sparsity while preserving informative variables. The resulting HUBO instances are optimized with digitized counterdiabatic quantum…
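A HUBO objective with one-, two-, and three-body terms is easy to write down explicitly. The sketch below uses my own coefficient conventions (negative one-body weight = relevant feature, positive pair/triple weights = redundancy, `lam` = sparsity penalty); the paper's actual mutual-information-derived coefficients are not in the excerpt:

```python
import numpy as np

def hubo_energy(z, h, J2, J3, lam):
    """Energy of a 0/1 selection vector z with one-, two-, and three-body
    interaction terms plus a linear sparsity penalty. Coefficient signs
    are illustrative assumptions, not the paper's encoding."""
    e = float(h @ z) + lam * float(z.sum())       # relevance + sparsity
    for (i, j), c in J2.items():                  # pairwise redundancy
        e += c * z[i] * z[j]
    for (i, j, k), c in J3.items():               # higher-order redundancy
        e += c * z[i] * z[j] * z[k]
    return e

h = np.array([-1.0, -1.0, 0.0])                   # per-feature relevance
J2 = {(0, 1): 0.5}
J3 = {(0, 1, 2): 2.0}
z = np.array([1, 1, 0])
e = hubo_energy(z, h, J2, J3, lam=0.1)
```

Minimizing this energy over all 0/1 vectors is the optimization the paper hands to a digitized counterdiabatic quantum solver.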

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.LG
#127
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) 5.7 5.5/6.3/5.0

The Probabilistic Transformer (PT) establishes that the Transformer's self-attention plus its feed-forward block is mathematically equivalent to Mean-Field Variational Inference (MFVI) on a Conditional Random Field (CRF). Under this equivalence the Transformer ceases to be a black-box neural network and becomes a programmable factor graph: graph topology, factor potentials, and the message-passing schedule are all explicit and inspectable primitives that can be engineered. PT was originally developed for natural language and in this report we investigate its potential for time series. We first lift PT into the Spatial-Temporal Probabilistic Transformer…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI.
cs.LG cs.AI
#128
Efficiency 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Efficiency (Quantization, MoE, Inference) 5.7 5.5/6.3/5.0

Municipal solid waste incineration is increasingly central to urban waste management, yet its sustainability benefit depends on controlling carbon emissions and multiple air pollutants under highly heterogeneous operating conditions. Current data-driven models are often accurate within individual plants but are difficult to transfer across facilities, limiting their value for scalable emission-control strategies. Here we show that multi-site emission behaviour can be represented through transferable system-level structures when physical constraints, operating-regime heterogeneity and carbon--pollutant coupling are jointly considered. We develop a physics-informed transfer learning framework built on a carbon--pollutant mixture-of-experts model,…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference) — wide thematic overlap.
cs.LG
#129
Efficiency 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Efficiency (Quantization, MoE, Inference) 5.7 5.5/6.3/5.0

Dynamic quantization has emerged as a practical approach to increasing the utilization and efficiency of the machine learning serving flow. Unlike static quantization, which applies quantization offline, dynamic quantization operates on tensors at run-time, adapting its parameters to the actual input data. Today's mainstream machine learning frameworks, including ML compilers and inference engines, frequently recommend dynamic quantization as an initial step for optimizing model serving. This is because dynamic quantization can significantly reduce memory usage and computational load, leading to faster token generation and improved model serving efficiency without substantial loss…
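The static/dynamic distinction the abstract draws comes down to when the scale is computed. A minimal NumPy sketch of symmetric per-tensor int8 dynamic quantization, with the scale derived from the actual input at run-time rather than fixed offline (this is a generic illustration, not the paper's scheme):

```python
import numpy as np

def dynamic_quantize_int8(x):
    """Symmetric per-tensor int8 quantization whose scale is computed
    at run-time from the input values (the defining property of
    dynamic quantization)."""
    amax = float(np.max(np.abs(x)))
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.default_rng(0).normal(size=(4, 8)).astype(np.float32)
q, scale = dynamic_quantize_int8(x)
max_err = float(np.max(np.abs(dequantize(q, scale) - x)))
```

Because the scale tracks each tensor, rounding error stays bounded by half a quantization step even when activation ranges shift between requests.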

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference) — wide thematic overlap.
cs.LG
#130
Robotic Autonomy 2026-04-29 arXiv cs.LG (Machine Learning) 5.7 5.5/7.0/4.5

Safety-critical prediction systems, such as autonomous vehicles, weather forecasters, and medical monitors, commonly rely on probabilistic forecasters. These forecasters make predictions about possible future outcomes, and their quality and robustness need to be validated and certified. Often, only accuracy -- the mean of the predictions -- is evaluated against true outcomes. However, for safety-critical scenarios and decision making under uncertainty, the full distributional properties of the forecasts should be checked: do the observed prediction errors actually follow the forecasted probability distributions? To this end, we introduce a framework for calibration…
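One standard way to check "do the errors follow the forecasted distributions?" is the probability integral transform (PIT): for a calibrated forecaster, the forecast CDF evaluated at the outcomes is uniform on [0, 1]. The sketch below illustrates this classical check for Gaussian forecasts; whether the paper's framework uses PIT is not stated in the excerpt:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
mu = rng.normal(size=2000)            # forecast means
sigma = np.ones(2000)                 # forecast standard deviations
y = rng.normal(mu, sigma)             # outcomes actually drawn from the forecasts

# PIT: for a calibrated Gaussian forecaster, Phi((y - mu) / sigma)
# should be uniform on [0, 1].
pit = 0.5 * (1.0 + np.vectorize(erf)((y - mu) / (sigma * sqrt(2.0))))

# Empirical coverage of the central 80% prediction interval.
coverage_80 = float(np.mean((pit > 0.1) & (pit < 0.9)))
```

A forecaster that is accurate on the mean but overconfident in its spread would pass an accuracy check yet show central-interval coverage well below the nominal level.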

cs.LG
#131
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv cs.AI (Artificial Intelligence) 5.7 5.5/6.3/5.0

Accurate end-to-end tail-latency forecasting is critical for proactive SLO management in microservice systems. However, modeling long-range dependency propagation and non-stationary, bursty workloads while maintaining inference efficiency at scale remains challenging. We present STLGT (Scalable Trace-based Linear Graph Transformer), a per-API predictor that encodes traces as span graphs for multi-step p95 tail-latency forecasting. STLGT uses a structure-aware linear graph Transformer to propagate cross-service dependencies with inference time linear in span graph size, and a decoupled temporal module to capture workload dynamics. Across a personalized education microservice application, DeathStarBench, and Alibaba traces,…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.LG, arXiv cs.AI.
cs.LG cs.AI
#132
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.7 5.5/7.0/4.5

Runtime monitoring is essential to ensure the safety of ML applications in safety-critical domains. However, current research is fragmented, with independent methods emerging from different communities. In this paper, we propose a unified framework categorising runtime monitoring approaches into three distinct types: Operational Design Domain (ODD) monitoring, which ensures compliance with expected operating conditions; Out-of-Distribution (OOD) monitoring, which rejects inputs that deviate from the training data; and Out-of-Model-Scope (OMS) monitoring, which detects anomalous model behaviour based on its internal states or outputs. We demonstrate the benefits of this categorization with a…

cs.LG
#133
Evaluations & Benchmarks 2026-04-29 arXiv cs.LG (Machine Learning) · arXiv — Evals & Benchmarks 5.7 5.5/6.3/5.0

Federated Split Learning has been identified as an efficient approach to address the computational resource constraints of clients in classical federated learning, while guaranteeing data privacy for distributed model training across data owners. However, it faces some critical challenges when such a training strategy meets large language models (LLMs) for fine-tuning. These include setting the cut layer adaptively across different clients to address data and device heterogeneity, which significantly affects system performance. In addition, efficiently reducing the communication overhead during the fine-tuning procedure is another challenge.…
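The "cut layer" idea is simple to show concretely: the client runs the first few layers and ships the intermediate ("smashed") activations to the server, which runs the rest. A minimal NumPy sketch with a toy MLP (layer sizes and the per-client cut choice are illustrative assumptions):

```python
import numpy as np

def forward(layers, x):
    """Toy MLP forward pass over a list of weight matrices."""
    for W in layers:
        x = np.tanh(x @ W)
    return x

rng = np.random.default_rng(0)
widths = [8, 16, 16, 4]
layers = [rng.normal(size=(a, b)) for a, b in zip(widths[:-1], widths[1:])]
x = rng.normal(size=(2, 8))

cut = 1                                     # chosen per client, e.g. by its compute budget
smashed = forward(layers[:cut], x)          # client-side activations sent to the server
out_split = forward(layers[cut:], smashed)  # server completes the forward pass
out_full = forward(layers, x)               # identical to the unsplit computation
```

A deeper cut keeps more computation on the client but shrinks what must cross the network, which is exactly the heterogeneity trade-off the abstract describes.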

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.LG.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.LG
#134
Frontier LLMs 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) 5.7 5.5/6.3/5.0

We introduce HalluCiteChecker, a toolkit for detecting and verifying hallucinated citations in scientific papers. While AI assistant technologies have transformed the academic writing process, including citation recommendation, they have also led to the emergence of hallucinated citations that do not correspond to any existing work. Such citations not only undermine the credibility of scientific papers but also impose an additional burden on reviewers and authors, who must manually verify their validity during the review process. In this study, we formalize hallucinated citation detection as an NLP task and provide a…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.CL, arXiv cs.AI.
cs.CL cs.AI
#135
Frontier LLMs 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) 5.7 5.5/6.3/5.0

Trust in clinical artificial intelligence (AI) cannot be reduced to model accuracy, fluency of generation, or overall positive user impression. In medicine, trust must be engineered as a measurable system property grounded in evidence, supervision, and operational boundaries of AI autonomy. This article proposes a practical framework for trustworthy clinical AI built around three principles: evidence, supervision, and staged autonomy. Rather than replacing deterministic clinical logic wholesale with end-to-end black-box models, the proposed approach combines a deterministic core, a patient-specific AI assistant for contextual validation, a multi-tier model escalation mechanism,…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.CL, arXiv cs.AI.
cs.CL cs.AI
#136
Frontier LLMs 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv cs.AI (Artificial Intelligence) 5.7 5.5/6.3/5.0

Generating sports game reports from structured tables is a complex table-to-text task that demands both precise data interpretation and fluent narrative generation. Traditional model-based approaches require large, annotated datasets, while prompt-based methods using large language models (LLMs) often struggle with hallucination due to weak table comprehension. To overcome these challenges, we propose Tree-of-Text, a tree-structured prompting framework that guides LLMs through a three-stage generation process: (1) Content Planning, where relevant operations and arguments are selected from the input tables; (2) Operation Execution, which breaks down large tables into manageable sub-tables;…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.CL, arXiv cs.AI.
cs.CL cs.AI
#137
Efficiency 2026-04-29 arXiv cs.CL (Computation & Language) · arXiv — Efficiency (Quantization, MoE, Inference) 5.7 5.5/6.3/5.0

Speculative decoding accelerates LLM inference, but SOTA hidden-state-based drafters suffer from long-range decay: draft accuracy degrades as the speculative step increases. Existing work attributes this decay to train-inference mismatch and proposes test-time training (TTT) as a remedy, yet we observe that long-range decay persists even in TTT-trained drafters. We revisit long-range decay from the perspective of context information preservation. In hidden-state reuse, we argue the target hidden state acts as a biased context compression: it aggregates historical token information according to the attention query at the current position, yielding a…
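For readers new to speculative decoding, the basic draft-then-verify loop (here the greedy variant, not the paper's hidden-state-reuse drafter) looks like this; `draft_next` and `target_next` are hypothetical stand-ins for the two models' next-token calls:

```python
def speculative_step(draft_next, target_next, prefix, k):
    """Greedy speculative decoding: draft k tokens cheaply, verify against
    the target model, keep the longest agreeing prefix plus one target
    token. Verification is shown token-by-token for clarity; in practice
    it is a single batched target forward pass."""
    ctx, draft = list(prefix), []
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)
    ctx, out = list(prefix), []
    for t in draft:
        tgt = target_next(ctx)
        if tgt != t:
            out.append(tgt)               # first disagreement: take target's token
            return out
        out.append(t)
        ctx.append(t)
    out.append(target_next(ctx))          # all drafts accepted: free bonus token
    return out
```

The long-range decay the abstract analyzes is the tendency of draft accuracy to fall as the speculative step index grows, which shortens the accepted prefix and erodes the speedup.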

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CL.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference) — wide thematic overlap.
cs.CL
#138
Evaluations & Benchmarks 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv — Evals & Benchmarks 5.7 5.8/6.2/5.0

Standard density functional theory (DFT) routinely misclassifies the electronic ground state of correlated and structurally complex compounds, predicting metallic behaviour for materials that experiments report as semiconductors. Each such mismatch encodes a specific non-ideality -- magnetic ordering, electron correlation, an alternative polymorph, or a defect -- that the calculation excluded, but extracting that signal at scale has remained a manual exercise. Here we introduce XDFT, a closed-loop agent that diagnoses the mismatch automatically: it draws candidate hypotheses from a curated catalogue, executes the corresponding first-principles tests, and updates a global…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.AI.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.AI
#139
Evaluations & Benchmarks 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv — Evals & Benchmarks 5.7 5.8/6.2/5.0

Large Reasoning Models (LRMs) achieve strong performance on mathematical reasoning tasks but remain unreliable on challenging instances. Existing test-time scaling methods, such as repeated sampling, self-correction, and tree search, improve performance at the cost of increased computation, yet often exhibit diminishing returns on hard problems. We observe that output disagreement is strongly correlated with instance difficulty and prediction correctness, providing a useful signal for guiding instance-level strategy selection at test time. Based on this insight, we propose a training-free framework that formulates test-time scaling as an instance-level routing problem, rather…
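The routing signal described here, agreement among sampled answers as a difficulty proxy, can be sketched in a few lines. The threshold and escalation target are illustrative assumptions, not the paper's routing policy:

```python
from collections import Counter

def route_by_disagreement(sampled_answers, threshold=0.6):
    """Agreement rate among k sampled answers as a difficulty proxy:
    high agreement -> accept the majority answer cheaply; low agreement
    -> escalate to a costlier strategy (more samples, tree search, ...)."""
    answer, votes = Counter(sampled_answers).most_common(1)[0]
    agreement = votes / len(sampled_answers)
    strategy = "accept" if agreement >= threshold else "escalate"
    return strategy, answer, agreement
```

The appeal of this signal is that it is training-free: it reuses samples the system would draw anyway and spends extra compute only where disagreement indicates the instance is hard.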

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.AI.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.AI
#140
Evaluations & Benchmarks 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv — Evals & Benchmarks 5.7 5.8/6.2/5.0

As Competency-Based Education (CBE) gains traction around the world, the shift from marks-based assessment to qualitative competency mapping imposes a heavy manual burden on educators. This paper tackles that bottleneck by proposing a "Human-in-the-Loop" benchmarking framework to assess the effectiveness of multiple LLMs in automating secondary-level mathematics assessment. Based on the Grade 10 Optional Mathematics curriculum in Nepal, we created a multi-dimensional rubric for four topics and four cross-cutting competencies: Comprehension, Knowledge, Operational Fluency, and Behavior and Correlation. The multi-provider ensemble consisted of open-weight models -- Eagle (Llama 3.1-8B)…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.AI.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.AI
#141
Evaluations & Benchmarks 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv — Evals & Benchmarks 5.7 5.8/6.2/5.0

Technology mapping is a critical yet challenging stage in logic synthesis. While Large Language Models (LLMs) have been applied to generate optimization scripts, their potential for core algorithm enhancement remains untapped. We introduce MappingEvolve, an open-source framework that pioneers the use of LLMs to directly evolve technology mapping code. Our method abstracts the mapping process into distinct optimization operators and employs a hierarchical agent-based architecture, comprising a Planner, Evolver, and Evaluator, to guide the evolutionary search. This structured approach enables strategic and effective code modifications. Experiments show our method significantly…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.AI.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.AI
#142
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.7 5.8/6.7/4.5

Multi-agent deliberation systems using large language models (LLMs) are increasingly proposed for policy simulation, yet they suffer from artificial consensus: evaluator agents converge on the same option regardless of their assigned value perspectives. We present the AI Council, a three-phase deliberation framework, and conduct 120 deliberations across two policy scenarios to test two interventions. First, architectural heterogeneity (assigning a different 7-9B parameter model to each value perspective) significantly reduces first-choice concentration compared to a homogeneous baseline (child welfare: 70.9% to 46.1%, p < 0.001, r = 0.58; housing: 46.0% to…

cs.AI
#143
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.7 5.5/6.9/4.5

The increasing deployment of Large Language Model (LLM) inference on edge AI systems demands efficient execution under tight memory budgets. A key challenge arises from Key-Value (KV) caches, which often exceed available device memory. Although NVMe-based offloading offers scalable capacity, existing file-based designs rely heavily on the kernel page cache, leading to cache thrashing, unpredictable latency, and high software overhead under memory pressure. We present DUAL-BLADE, a dual-path KV residency framework that dynamically assigns KV tensors to either a page-cache path or an NVMe-direct path based on runtime memory availability.…
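The dual-path routing decision can be sketched as a small state machine: route KV tensors through the kernel page cache when memory is plentiful, switch to a direct NVMe path under pressure, with hysteresis so the system does not flap at the boundary. The thresholds and hysteresis policy below are my own assumptions; the paper's actual runtime policy is not described in the excerpt:

```python
class KVPathRouter:
    """Route KV tensors between the kernel page cache and an O_DIRECT
    NVMe path based on the free-memory ratio, with hysteresis to avoid
    flapping at the threshold. Thresholds are illustrative."""

    def __init__(self, low=0.10, high=0.25):
        self.low, self.high = low, high
        self.direct_mode = False

    def route(self, free_ratio):
        if free_ratio < self.low:
            self.direct_mode = True       # memory pressure: bypass page cache
        elif free_ratio > self.high:
            self.direct_mode = False      # pressure relieved: page cache again
        return "nvme-direct" if self.direct_mode else "page-cache"
```

Hysteresis matters here because KV traffic itself perturbs free memory, so a single threshold would oscillate exactly when the device is busiest.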

cs.AI
#144
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.7 5.5/6.9/4.5

Generative AI tools are widely used by youth and have introduced new privacy and safety challenges. While prior research has explored youth safety in GenAI within a Western context, it often overlooks the cultural, religious, and social dimensions of technology use that strongly shape youths' digital experiences in countries like Saudi Arabia. To address this gap, this study explores how children (aged 7 to 17), parents, and teachers interact with GenAI tools and perceive their risks through a non-Western lens. Through a mixed-methods approach, we analyzed 736 Reddit and 1,262 X (Twitter) posts and…

cs.AI
#145
Multimodal 2026-04-29 arXiv cs.RO (Robotics) · arXiv cs.CV (Computer Vision) 5.7 5.8/6.2/5.0

Breakthrough progress in vision-based navigation through unknown environments has been achieved by using multimodal large language models (MLLMs). These models can plan a sequence of motions by evaluating the current view at each time step against the task and goal given to the agent. However, current zero-shot Vision-and-Language Navigation (VLN) agents powered by MLLMs still tend to drift off course, halt prematurely, and achieve low overall success rates. We propose Three-Step Nav to counteract these failures with a three-view protocol: First, "look forward" to extract global landmarks and sketch a…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.RO, arXiv cs.CV.
cs.RO cs.CV
#146
Robotic Autonomy 2026-04-29 arXiv cs.RO (Robotics) 5.7 5.8/6.7/4.5

Robotic manipulation critically requires reasoning about future spatial-temporal interactions, yet existing VLA policies and world-model-enhanced policies do not fully model action-relevant spatial-temporal interaction structure. We propose STARRY, a world-model-enhanced action-generation policy that aligns spatial-temporal prediction with action generation. STARRY jointly denoises future spatial-temporal latents and action sequences, and introduces Geometry-Aware Selective Attention Modulation to convert predicted depth and end-effector geometry into token-aligned weights for selective action-attention modulation. On RoboTwin 2.0, STARRY achieves 93.82% / 93.30% average success under Clean and Randomized settings. Real-world experiments further improve average success from 42.5%…

cs.RO
#147
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.7 5.8/6.6/4.5

Surgical training involves didactic teaching, mentor-led learning, surgical skills laboratories, and direct exposure to surgery; however, increasing clinical pressures have limited operating room (OR) exposure. This work leverages virtual reality (VR) to provide a safe and immersive training environment. Existing VR training is often based on standardized scenarios not tailored to individual clinical cases. This study addresses this limitation using artificial intelligence (AI) based computer vision methods to generate patient-specific simulations from computed tomography (CT) and magnetic resonance imaging (MRI). This study focuses on patient-specific spinal decompression simulation for spinal…

cs.CV
#148
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.7 5.8/6.0/5.0

Structured information extraction from long, multilingual scanned financial documents is a core requirement in industrial KYC and compliance workflows. These documents are typically non-machine-readable, noisy, and visually heterogeneous. They usually span dozens of pages while containing only sparse task-relevant information. Although recent vision-language models achieve strong benchmark performance, directly applying them end to end to full financial reports often leads to unreliable extraction under real-world conditions. We present a multistage extraction framework that integrates image preprocessing, multilingual OCR, hybrid page-level retrieval, and compact VLM-based structured extraction.…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#149
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.7 5.8/6.0/5.0

Industrial object detection systems typically rely on large annotated datasets, which are expensive to collect and challenging to maintain in industrial scenarios where the inventory of objects changes frequently. This work addresses the challenge of few-shot object detection in such industrial scenarios, where only a limited number of labeled samples are available for newly introduced objects. We present a detection framework that leverages vision foundation models to recognize objects with minimal supervision. The method constructs class prototypes from a small set of reference samples by extracting feature representations. For a…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#151
Government & Defense 2026-04-29 FedScoop 5.7 5.5/7.0/4.5

The Office of Management and Budget’s public tally of governmentwide AI use again grew in 2025 — this time amid the Trump administration’s push to use the technology in the name of efficiency. Per OMB’s recent publication on GitHub, the U.S. government reported about 3,600 AI use cases across agencies, a nearly 70% increase in disclosed applications of the technology from the previous reporting year. As with previous disclosures, the accounting captures pre-deployment uses, pilot projects, those in active…

#152
Multimodal 2026-04-29 arXiv cs.AI (Artificial Intelligence) · arXiv cs.CV (Computer Vision) 5.6 5.5/6.2/5.0

Reliable celestial attitude determination is a critical requirement for autonomous spacecraft navigation, yet traditional "Lost-in-Space" (LIS) algorithms often suffer from high computational overhead and sensitivity to sensor-induced noise. While deep learning has emerged as a promising alternative, standard regression models are often confounded by the non-Euclidean topology of the celestial sphere and by the periodic boundary conditions of Right Ascension (RA) and Declination (Dec). In this paper, we present Star-Fusion, a multi-modal architecture that reformulates orientation estimation as a discrete topological classification task. Our approach leverages spherical K-Means clustering to…

How it was discussed
  • Cross-listed in 2 arXiv categorical feeds: arXiv cs.AI, arXiv cs.CV.
cs.AI cs.CV
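The core reformulation here — orientation estimation as discrete classification over sphere bins, which sidesteps the RA wrap-around that confounds regression — can be sketched simply. The six axis-aligned bin centers below are illustrative placeholders, not the paper's spherical K-Means clusters:

```python
from math import cos, sin, radians

def radec_to_unit(ra_deg, dec_deg):
    # Convert Right Ascension / Declination to a unit vector on the celestial sphere.
    ra, dec = radians(ra_deg), radians(dec_deg)
    return (cos(dec) * cos(ra), cos(dec) * sin(ra), sin(dec))

# Illustrative bin centers (the paper derives its bins via spherical K-Means).
BINS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def classify_attitude(ra_deg, dec_deg):
    # Assign the pointing direction to the nearest bin by dot product.
    # Working on unit vectors avoids the periodic boundary at RA = 360°.
    v = radec_to_unit(ra_deg, dec_deg)
    return max(range(len(BINS)),
               key=lambda i: sum(a * b for a, b in zip(v, BINS[i])))

print(classify_attitude(0.0, 0.0))    # +x bin
print(classify_attitude(359.0, 0.0))  # still the +x bin: no RA discontinuity
```

A regression model would see RA = 0.0 and RA = 359.0 as maximally distant targets; the classification view makes them neighbors for free.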
#153
Government & Defense 2026-04-29 arXiv cs.RO (Robotics) · arXiv — AI, Defense & National Security 5.6 5.5/6.2/5.0

Human-robot interaction is emerging as an important paradigm for integrating persons with disabilities into the workplace. While these systems can enable individuals to work, their design is mostly personalized, hindering widespread use beyond the individual user. The universal design paradigm is a central pillar of inclusive design, describing usability of systems by all. To incorporate universal design into process design for human-robot workplaces, expert knowledge is required that is often not available. To simplify process design of human-robot workplaces, we propose a persona-based design approach. First, typical impairments prevalent in…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.RO.
  • Matched topical feeds: AI, Defense & National Security — wide thematic overlap.
cs.RO
#154
Robotic Autonomy 2026-04-29 arXiv cs.RO (Robotics) 5.6 5.5/6.7/4.5

Navigating quadruped robots in unstructured 3D environments poses significant challenges, requiring goal-directed motion, effective exploration to escape from local minima, and posture adaptation to traverse narrow, height-constrained spaces. Conventional approaches employ a sequential mapping-planning pipeline but suffer from accumulated perception errors and high computational overhead, restricting their applicability on resource-constrained platforms. To address these challenges, we propose Hierarchical Posture-Adaptive Navigation (HiPAN), a framework that operates directly on onboard depth images at deployment. HiPAN adopts a hierarchical design: a high-level policy generates strategic navigation commands (planar velocity and body posture), which…

cs.RO
#155
Post-Training 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Post-training / Alignment 5.6 5.5/6.2/5.0

We introduce ProcFunc, a library for Blender-based procedural 3D generation in Python. ProcFunc provides a library of easy-to-use Python functions, which streamline creating, combining, analyzing, and executing procedural generation code. ProcFunc makes it easy to create large-scale diverse training data, by combinatorial compositions of semantic components. VLMs can use ProcFunc to edit procedural material and geometry code and can create new procedural code with significantly fewer coding errors. Finally, as an example use case, we use ProcFunc to develop a new procedural generator of indoor rooms, which includes a collection…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Post-training / Alignment — wide thematic overlap.
cs.CV
#156
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.6 5.5/6.6/4.5

The task of capturing and rendering 3D dynamic scenes from 2D images has become increasingly popular in recent years. However, most conventional cameras are bandwidth-limited to 30-60 FPS, restricting these methods to static or slowly evolving scenes. While overcoming bandwidth limitations is difficult for general scenes, recent years have seen a flurry of computational imaging methods that yield high-speed videos using conventional cameras for specific applications (e.g., motion capture and particle image velocimetry). However, most of these methods require modifications to a camera's optics or the addition of mechanically moving…

cs.CV
#157
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.6 5.5/6.0/5.0

Recent advances in 4D content generation have attracted increasing attention, yet creating high-quality animated 3D models remains challenging due to the complexity of modeling spatio-temporal distributions and the scarcity of 4D training data. We present AnimateAnyMesh++, a feed-forward framework for text-driven animation of arbitrary 3D meshes with substantial upgrades in data, architecture, and generative capability. First, we expand the DyMesh-XL dataset by mining dynamic content from Objaverse-XL, increasing the number of unique identities from 60K to 300K and substantially broadening category and motion diversity. Second, we redesign DyMeshVAE-Flex with power-law…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#158
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Generative Media / Diffusion 5.6 5.5/6.2/5.0

Synthesizing a target concept from a single reference image is challenging in diffusion-based personalized text-to-image generation, particularly for sticker personalization where prompts often require explicit attribute edits. With only one reference, test-time fine-tuning (TTF) methods tend to overfit, producing visual entanglement, where background artifacts are absorbed into the learned concept, and structural rigidity, where the model memorizes reference-specific spatial configurations and loses contextual controllability. To address these issues, we introduce SEmantic-aware single-image sticker personALization (SEAL), a plug-and-play, architecture-agnostic adaptation module that integrates into existing personalization pipelines without modifying their U-Net-based…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Generative Media / Diffusion — wide thematic overlap.
cs.CV
#159
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.6 5.5/6.0/5.0

Detectors often suffer from degraded performance, primarily due to the distributional gap between the source and target domains. This issue is especially evident in single-source domains with limited data, as models tend to rely on confounders (e.g., illumination, co-occurrence, and style) from the source domain, leading to spurious correlations that hinder generalization. To this end, this paper proposes a novel Basis-driven framework for domain generalization, namely Bridge, that incorporates causal inference into object detection. By learning the low-rank bases for front-door adjustment, Bridge blocks confounders' effects to mitigate spurious correlations,…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#160
Efficiency 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Efficiency (Quantization, MoE, Inference) 5.6 5.5/6.2/5.0

3D Gaussian Splatting (3DGS) achieves high-quality novel view synthesis with real-time rendering, but its storage cost remains prohibitive for practical deployment. Existing post-training compression methods still rely on many coupled hyperparameters across pruning, transformation, quantization, and entropy coding, making it difficult to control the final compressed size and fully exploit the rate-distortion trade-off. We propose MesonGS++, a size-aware post-training codec for 3D Gaussian compression. On the codec side, MesonGS++ combines joint importance-based pruning, octree geometry coding, attribute transformation, selective vector quantization for higher-degree spherical harmonics, and group-wise mixed-precision quantization with…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Efficiency (Quantization, MoE, Inference) — wide thematic overlap.
cs.CV
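The "group-wise mixed-precision quantization" ingredient has a simple core: quantize each attribute group uniformly, but at a bit width matched to its importance. A minimal sketch with hypothetical attribute groups (not the paper's actual grouping or rate-allocation rule):

```python
def quantize_group(values, bits):
    # Uniform quantization of one attribute group at a given bit width.
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    dequant = [lo + c * scale for c in codes]
    return codes, dequant, scale

# Mixed precision: spend more bits on perceptually important attributes,
# fewer on higher-degree spherical-harmonic coefficients.
base_color = [0.12, 0.55, 0.93, 0.41]
high_sh    = [0.02, -0.01, 0.03, 0.00]
_, base_hat, s_base = quantize_group(base_color, bits=8)
_, sh_hat,   s_sh   = quantize_group(high_sh,   bits=4)

max_err_base = max(abs(a - b) for a, b in zip(base_color, base_hat))
print(max_err_base <= s_base / 2 + 1e-12)  # error bounded by half a quantization step
```

The rate-distortion lever the paper automates is exactly the `bits` choice per group: halving the bit width halves storage for that group while roughly doubling its quantization step.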
#161
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.6 5.5/6.6/4.5

Optical vibration sensing enables recovering the scene sound directly from the surface vibration of nearby objects, turning everyday objects into “visual microphones”. However, most prior methods have focused on capturing the vibrations of specific objects with highly favorable vibration responses. These include objects where the surface vibrations are generated by the object itself (e.g., speaker membrane or guitar body) or objects consisting of a thin membrane which is highly reactive to sound (e.g., a chip bag or the leaf of a plant). In this paper, we tackle sound recovery for…

cs.CV
#162
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.6 5.5/6.0/5.0

Face Recognition (FR) is used in a variety of application domains, from entertainment and banking to security and surveillance. Such applications rely on the FR model to be robust and perform well in a variety of settings. To achieve this, state-of-the-art FR models typically use expressive adaptive margin loss functions, which tie the feature norm to concepts related to sample quality, such as recognizability and perceptual image quality. Recently, through the development of Face Image Quality Assessment (FIQA) techniques, biometric utility has become the preferred measure of face-image quality and…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#163
Evaluations & Benchmarks 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Evals & Benchmarks 5.6 5.5/6.0/5.0

The rapid evolution of deepfake technology poses an unprecedented threat to the authenticity of Graphics Interchange Format (GIF) imagery, which serves as a representative of short-loop temporal media in social networks. However, existing proactive forensics works are designed for static images, which limits their applicability to animated GIFs. To bridge this gap, we propose GIFGuard, the first spatiotemporal watermarking framework tailored for deepfake proactive forensics in GIFs. In the embedding stage, we propose the Spatiotemporal Adaptive Residual Encoder (STARE) to ensure robustness against high-level semantic tampering. It employs a 3D…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Evals & Benchmarks — wide thematic overlap.
cs.CV
#164
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) · arXiv — Generative Media / Diffusion 5.6 5.5/6.2/5.0

Diffusion models have achieved remarkable success in synthesizing complex static and temporal visuals, a breakthrough largely driven by Classifier-Free Guidance (CFG). However, despite its pivotal role in aligning generated content with textual prompts, standard CFG relies on a globally uniform scalar. This homogeneous amplification traps models in a well-documented "detail-artifact dilemma": low guidance scales fail to inject intricate semantics, while high scales inevitably cause structural degradation, color over-saturation, and temporal inconsistencies in videos. In this paper, we expose the physical root of this flaw through the lens of differential geometry.…

How it was discussed
  • Cross-listed in 1 arXiv categorical feed: arXiv cs.CV.
  • Matched topical feeds: Generative Media / Diffusion — wide thematic overlap.
cs.CV
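The "globally uniform scalar" the paper critiques is the standard classifier-free guidance update: one scale factor amplifies the conditional direction identically at every location. A toy sketch of that baseline (1-D stand-in for a noise-prediction tensor):

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    # Standard classifier-free guidance: a single global scalar amplifies the
    # conditional direction uniformly across every element of the prediction.
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_u = [0.10, -0.20, 0.05]   # unconditional noise prediction (toy 1-D "image")
eps_c = [0.30,  0.00, 0.25]   # text-conditional prediction
print(cfg_combine(eps_u, eps_c, scale=1.0))  # scale 1 recovers the conditional branch
print(cfg_combine(eps_u, eps_c, scale=7.5))  # a typical high scale over-amplifies everywhere
```

Because `scale` is one number, detail-poor and detail-rich regions get the same amplification — the "detail-artifact dilemma" the abstract describes; the paper's contribution is replacing this scalar with something spatially adaptive.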
#165
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.6 5.5/6.8/4.5

Visual data compression is shifting from human-centered reconstruction to machine-oriented representation coding. In this setting, an image is often mapped to a compact semantic embedding, which is then compressed and transmitted for downstream inference. We propose an adaptive transform-coding method for semantic-feature compression motivated by the conditional rate-distortion function of a Gaussian mixture model. The scheme uses mode-dependent transforms and quantizers selected according to the inferred source component, enabling more efficient coding of heterogeneous feature distributions. Evaluations on features from widely used vision backbones and foundation models show that the…

cs.CV
#166
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.6 5.5/6.6/4.5

Purpose: Rapid and reliable diagnostic tools are crucial for managing respiratory diseases like COVID-19, where chest X-ray analysis coupled with artificial intelligence techniques has proven invaluable. However, most existing works on X-ray images have not considered lung segmentation, raising concerns about their reliability. Additionally, some have employed disproportionate and impractical augmentation techniques, making models less generalized and prone to overfitting. This study presents a critical analysis of both issues and proposes a methodology (SDL-COVID) for more reliable classification of chest X-rays for COVID-19 detection. Methods: We use class activation mapping…

cs.CV
#167
Safety, Policy & Regulation 2026-04-29 MIT Tech Review 5.6 6.0/6.2/4.5

This is today’s edition of The Download, our weekday newsletter that provides a daily dose of what’s going on in the world of technology. It’s time to make a plan for nuclear waste. Today, nuclear energy enjoys rare support across the political spectrum. Public approval has spiked, and Big Tech is throwing money around to meet rising electricity demand. That newfound interest is exactly why it’s time to talk about an old problem: nuclear waste. In the US, nuclear…

#168
Government & Defense 2026-04-29 DefenseScoop 5.6 5.5/6.7/4.5

The U.S. government is modernizing its Small Business Innovation Research and Small Business Technology Transfer (SBIR/STTR) programs to get after contemporary warfare and national security gaps, senior officials involved in the work said on Wednesday. Referred to collectively as “America’s seed fund,” that decades-old pair of federal programs provides technology-focused small businesses and startups with early-stage investments and support to commercialize their products, and ultimately field them for use by federal agencies and the military. “I think what you’re going…

#169
Government & Defense 2026-04-29 FedScoop 5.6 5.5/6.6/4.5

Agencies would be pushed to pick up the pace on the elimination of legacy IT systems under a new bill from a bipartisan group of House lawmakers. The Legacy IT Reduction Act of 2026 (H.R. 8408) from Reps. Maxwell Frost, D-Fla., William Timmons, R-S.C., Eric Burlison, R-Mo., and Byron Donalds, R-Fla., would require agency chief information officers to lead the charge on lessening the federal government’s reliance on and expenditures for aging systems. The first step in that reduction…

#170
Government & Defense 2026-04-29 War on the Rocks 5.6 5.5/6.5/4.5

In 2024, Judd Devermont wrote, “Human Geography Is Mission-Critical,” where he argued that the United States should focus on behaviors and attitudes informed by human geography to craft better strategy. Two years later, we asked Judd to revisit his arguments. In your 2024 article, you argued that the United States needed to focus its attention on behaviors and attitudes informed by human geography to craft strategy that adequately navigates a more complex world and threat…

#171
Government & Defense 2026-04-29 War on the Rocks 5.6 5.5/6.5/4.5

The United States and Canada are both racing to rebuild their defense industrial bases, recognizing that future conflicts will be determined not only by military capability, but by the ability to produce at scale. But they cannot succeed alone — and importantly, they do not need to start from scratch. After decades of reliance on globalized supply chains for everything from consumer products to critical defense technologies, the United States is reasserting a more active industrial policy, using tools ranging from…

#172
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

This paper extends and explains the Multiple Additive Neural Networks (MANN) methodology, an enhancement to the traditional Gradient Boosting framework, utilizing nearly shallow neural networks instead of decision trees as base learners. This innovative approach leverages neural network architectures, notably Convolutional Neural Networks (CNNs) and Capsule Neural Networks, to extend its application to both structured data and unstructured data such as images and audio. For structured data the advantages of capsule neural networks as feature extractors are used and combined with MANN as a classifier. MANN's unique architecture promotes continuous…

cs.LG
#173
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

This paper proposes a novel algorithm for semi-supervised learning. This algorithm learns graph cuts that maximize the margin with respect to the labels induced by the harmonic function solution. We motivate the approach, compare it to existing work, and prove a bound on its generalization error. The quality of our solutions is evaluated on a synthetic problem and three UCI ML repository datasets. In most cases, we outperform manifold regularization of support vector machines, which is a state-of-the-art approach to semi-supervised max-margin learning.

cs.LG
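The harmonic function solution the abstract builds on has a simple characterization: labeled nodes are pinned to their labels, and every unlabeled node takes the average of its neighbors' values. A minimal Gauss-Seidel sketch on a toy path graph (not the paper's max-margin graph-cut algorithm itself):

```python
def harmonic_solution(adj, labels, iters=2000):
    # Fix labeled nodes; relax each unlabeled node to the mean of its
    # neighbors (Gauss-Seidel on the graph Laplacian system).
    f = {v: labels.get(v, 0.5) for v in adj}
    for _ in range(iters):
        for v in adj:
            if v not in labels:
                f[v] = sum(f[u] for u in adj[v]) / len(adj[v])
    return f

# Path graph 0-1-2-3 with the endpoints labeled 0 and 1.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
f = harmonic_solution(adj, labels={0: 0.0, 3: 1.0})
print(round(f[1], 4), round(f[2], 4))  # interior values interpolate: 1/3 and 2/3
```

On a path the solution is linear interpolation between the labels; on a general similarity graph it spreads label mass along dense regions, which is what makes the induced labels useful targets for the margin-maximizing cut.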
#174
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

Motivated by sensing modalities in modern autonomous systems that involve hardware-constrained spatial sampling over large arrays with limited coherence time, we develop a novel framework for rapid super-resolution multi-signal direction-of-arrival (DoA) estimation based on Hankel-structured sensing and data matrix decomposition of arbitrary rank, under both the $L_2$ and $L_1$-norm formulation. The resulting $L_2$-norm estimator is shown to be maximum-likelihood optimal in white Gaussian noise. The $L_1$-norm estimator is shown to be maximum-likelihood optimal in independent, identically distributed (i.i.d.) isotropic Laplace noise, offering broad robustness to impulsive interference and corrupted measurements…

cs.LG
#175
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

We consider the problems of computing the optimal rank-$1$ Hankel and Toeplitz-structured approximation of arbitrary matrices under $L_2$ and $L_1$-norm error. Such problems arise naturally in engineered systems, including the basic few-shot signal Direction-of-Arrival (DoA) estimation problem that is of importance to modern autonomous systems applications. We develop accurate and computationally efficient structured matrix decomposition algorithms for both formulations and then derive analytically grounded small-sample-support DoA estimators for practical sensing system deployments. The resulting estimators under the $L_2$ and $L_1$ norms are formally shown to be maximum-likelihood optimal under white…

cs.LG
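The rank-1 Hankel building blocks in this pair of papers can be illustrated compactly: a single damped exponential produces an exactly rank-1 Hankel matrix, and the L2-optimal rank-1 approximation is the dominant singular pair. A plain-Python sketch via power iteration (a generic illustration, not the papers' arbitrary-rank or L1-norm algorithms):

```python
def hankel(x, rows):
    # Hankel matrix whose anti-diagonals are constant: H[i][j] = x[i+j].
    cols = len(x) - rows + 1
    return [[x[i + j] for j in range(cols)] for i in range(rows)]

def rank1_approx(H, iters=200):
    # Dominant singular pair by power iteration; L2-optimal rank-1 approximation.
    m, n = len(H), len(H[0])
    v = [1.0] * n
    for _ in range(iters):
        u = [sum(H[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [sum(H[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    u = [sum(H[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = sum(x * x for x in u) ** 0.5
    u = [x / sigma for x in u]
    return [[sigma * u[i] * v[j] for j in range(n)] for i in range(m)]

# One damped exponential: x[i+j] = 0.9**i * 0.9**j, so H is exactly rank 1
# and the approximation should reproduce it almost perfectly.
x = [0.9 ** n for n in range(8)]
H = hankel(x, rows=4)
A = rank1_approx(H)
err = max(abs(H[i][j] - A[i][j]) for i in range(4) for j in range(5))
print(err < 1e-9)
```

With several superimposed signals the Hankel matrix has higher rank, which is where the structured arbitrary-rank decompositions and L1-robust variants in these papers come in.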
#176
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

We investigate variational quantum classifiers (VQCs) for land-cover classification from multispectral satellite imagery, adopting a feature-map perspective in which the quantum circuit defines a nonlinear data embedding while the readout determines how this representation is exploited. Using the EuroSAT-MS dataset, we perform a systematic one-vs-one evaluation across all class pairs under a controlled experimental protocol, comparing classical baselines (logistic regression, SVMs, neural networks) with VQCs employing both linear readout and quantum-kernel SVM strategies. Our results show that, while VQCs with linear readout do not outperform strong classical baselines such as…

cs.LG
#177
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

With the increasing availability of online information, recommender systems have become an important tool for many web-based systems. Due to the continuous aspect of recommendation environments, these systems increasingly rely on contextual multi-armed bandits (CMAB) to deliver personalized and real-time suggestions. A critical yet underexplored component in these systems is the representation of user state, which typically encapsulates the user's interaction history and is deeply correlated with the model's decisions and learning. In this paper, we investigate the impact of different embedding-based state representations derived from matrix factorization models on…

cs.LG
#178
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

We propose a dual-channel reservoir-computing scheme for inferring the dynamics of two distinct chaotic systems with a single machine. By augmenting a standard reservoir with a system-label channel and a parameter-control channel, the machine can be trained from time series collected from a few sampled states of the two systems. We show that the trained machine not only predicts the short-time evolution of the sampled states, but also reproduces the long-term statistical properties of unseen states, thereby enabling reconstruction of the bifurcation diagrams of both systems from partial observations. The…

cs.LG
#179
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

Federated learning (FL) trains a shared model from updates contributed by distributed clients, often implicitly assuming that contributing clients are representative of the target population. In practice, this representativeness assumption can fail at two distinct stages, inducing selection bias. First, eligibility rules such as device constraints, software requirements, or user consent determine which clients are ever enrolled and reachable for training, inducing \emph{enrollment bias}. Second, among enrolled clients, user and system factors such as battery state, network status, and local time determine which clients participate in each communication round, inducing…

cs.LG
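A standard remedy for the round-level participation bias described here is inverse-propensity (Horvitz-Thompson) weighting of client updates; it is a natural baseline for this setting, though the abstract does not confirm it is the paper's method. A toy sketch with scalar updates:

```python
def ipw_mean(participant_updates, n_population):
    # Horvitz-Thompson estimate of the population-average update.
    # participant_updates: (update, participation_probability) pairs for the
    # clients that actually showed up this round; each update is weighted
    # by 1 / P(participation) to debias the selection.
    total = sum(u / p for u, p in participant_updates)
    return total / n_population

# The client with p = 0.5 is under-represented in any given round;
# weighting by 1/p restores its expected share of the average.
est = ipw_mean([(2.0, 0.5), (1.0, 1.0)], n_population=3)
print(est)  # (2/0.5 + 1/1) / 3 = 5/3
```

Note this only corrects the second stage (round-level participation among enrolled clients); the enrollment bias the paper distinguishes cannot be fixed this way, since never-enrolled clients have participation probability zero.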
#180
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

Digital twins provide a powerful paradigm for diagnostic and prognostic tasks in the monitoring and control of engineered systems; however, their deployment for complex structures remains challenged by model-form uncertainty, arising from unknown nonlinear dynamics, and by sparse sensing. These limitations hinder reliable online state estimation using either purely physics-based or purely data-driven approaches. This work introduces the Physics-Guided Graph Neural ODE (PiGGO) framework, a physics-informed, graph-based Bayesian state estimation approach in which a learned graph neural ordinary differential equation (GNODE) serves as the continuous-time state-transition model within an extended…

cs.LG
#182
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

DNNs have gained widespread adoption in feature interaction recommendation models. However, there has been a longstanding debate on their roles. On one hand, some works claim that DNNs possess the ability to implicitly capture high-order feature interactions. Conversely, recent studies have highlighted the limitations of DNNs in effectively learning dot products, specifically second-order interactions, let alone higher-order interactions. In this paper, we present a novel perspective to understand the effectiveness of DNNs: their impact on the dimensional robustness of the representations. In particular, we conduct extensive experiments involving both parallel…

cs.LG
#183
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

Industrial systems increasingly depend on Machine Learning (ML), and operate on heterogeneous nodes that must satisfy tight latency, energy, and memory constraints. Dynamic ML models, which reconfigure their computational footprint at runtime, promise high energy efficiency and lower average latency for modest accuracy tradeoffs; however, their deployment is complex due to the additional hyperparameters they rely on. These hyperparameters, controlling the accuracy versus average latency tradeoff, are often tuned on a calibration dataset that must match the test time distribution, an assumption that rarely holds in real-world scenarios, leading to…

cs.LG
#184
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

We study three problems that involve identifying homogeneous halfspaces under Gaussian distributions: agnostic learning, one-sided reliable learning, and fairness auditing. In each of these problems, we are given labeled examples $(\mathbf{x}, \mathrm{y})$ drawn from an unknown distribution on $\mathbb{R}^d\times\{-1, +1\}$, whose marginal distribution on $\mathbf{x}$ is standard Gaussian and on $\mathrm{y}$ is arbitrary. The goal of each problem is to output a homogeneous halfspace that approaches the best-fitting homogeneous halfspace in terms of its corresponding loss measure. We prove near-optimal computational hardness results for these problems under the widely believed…

cs.LG
#185
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

We prove that any continuous function f from [0,1]^n to R representable by a finite computation tree with N internal nodes and compositional sparsity s = O(1) admits a deep Kolmogorov-Arnold Network (KAN) representation. Each internal node is realised by a primitive KAN block with controlled block depth and Lipschitz product. The layer-wise Lipschitz product satisfies the primary domain-sensitive bound independent of the input dimension n. It simplifies to P(KAN_f) <= max(C*,1)^L_f with L_f <= c_max * N. For the standard operations {+,-,x,sin,cos} with x nodes on [0,1]-bounded inputs we…

cs.LG
#186
Frontier LLMs 2026-04-29 arXiv cs.LG (Machine Learning) 5.5 5.5/6.3/4.5

Balancing differential privacy (DP) with recommendation accuracy is a key challenge in privacy-preserving recommender systems, since DP-noise degrades accuracy. We address this trade-off at both the data and model levels. At the data level, we apply DP only to the most stereotypical user data likely to reveal sensitive attributes, such as gender or age, to reduce unnecessary perturbation; we refer to this as targeted DP. At the model level, we use meta-learning to improve robustness to remaining DP-noise. This achieves a better trade-off between accuracy and privacy than standard approaches:…

cs.LG
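The data-level idea ("targeted DP") is to add Laplace noise only to the entries flagged as revealing sensitive attributes. A minimal sketch: the flagging rule, noise scale, and rating format are illustrative assumptions, not the paper's specifics:

```python
import random
from math import log

def laplace_noise(scale, rng):
    # Sample Laplace(0, scale) by inverse-CDF from a uniform draw.
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * log(1.0 - 2.0 * abs(u))

def targeted_dp(ratings, sensitive_idx, epsilon, rng):
    # Perturb only the items flagged as stereotypically revealing a sensitive
    # attribute (e.g., gender or age); all other entries pass through untouched.
    scale = 1.0 / epsilon  # assumes sensitivity 1 for bounded ratings
    return [r + laplace_noise(scale, rng) if i in sensitive_idx else r
            for i, r in enumerate(ratings)]

rng = random.Random(0)
ratings = [5.0, 3.0, 4.0, 1.0]
noisy = targeted_dp(ratings, sensitive_idx={1, 3}, epsilon=1.0, rng=rng)
print(noisy[0] == ratings[0], noisy[2] == ratings[2])  # untargeted entries unchanged
```

Leaving the non-stereotypical entries clean is what buys back recommendation accuracy; the model-level meta-learning in the paper then handles robustness to the noise that remains.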
#187
Frontier LLMs 2026-04-29 arXiv cs.CL (Computation & Language) 5.5 5.5/6.3/4.5

Many of the thousands of attested languages share common configurations of features, creating a spectrum from typologically very rare (e.g., object-verb-subject word order) or impossible languages to very common combinations of features (e.g., subject-object-verb word order). One central question is under what conditions such typological tendencies can be predicted, and specifically whether the learning bias of language models (LMs) is sufficient to reproduce such patterns. In this study, we add one dimensionality to such analysis -- the learning scenario for LMs -- to explore its interaction with the inductive bias…

cs.CL
#188
Frontier LLMs 2026-04-29 arXiv cs.CL (Computation & Language) 5.5 5.5/6.3/4.5

Parametric Retrieval-Augmented Generation (PRAG) encodes external documents into lightweight parameter modules that can be retrieved and merged at inference time, offering a promising alternative to in-context retrieval augmentation. Despite its potential, many PRAG implementations train document adapters with task-supervised objectives, which may cause each adapter to encode both document-specific facts and reusable task-solving behavior. This entanglement may make adapter composition less reliable: when multiple adapters are merged at inference time, their overlapping task behaviors can accumulate together with document-specific updates, potentially making the merged adapter less stable and less focused…

cs.CL
#189
Government & Defense 2026-04-29 arXiv cs.CL (Computation & Language) 5.5 5.5/6.3/4.5

Languages of the world vary concerning the order of subject, object and verb. The most frequent dominant orders are SOV and SVO, and researchers have tailored models to this fact. However, there are still languages whose dominant order does not conform to these expectations or even lack a dominant order. Here we show that across linguistic families and macroareas, word order variation within languages is shaped by the principle of swap distance minimization even when the dominant order is not SOV/SVO and even when a dominant order is lacking.

cs.CL
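Swap distance between two word orders is the minimum number of adjacent transpositions needed to turn one into the other, i.e. an inversion count. A short sketch using the single letters S, O, V as words:

```python
def swap_distance(order_a, order_b):
    # Minimum number of adjacent transpositions turning order_a into order_b,
    # computed as the inversion count of order_a relative to order_b.
    pos = {w: i for i, w in enumerate(order_b)}
    perm = [pos[w] for w in order_a]
    return sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
               if perm[i] > perm[j])

print(swap_distance("SOV", "SVO"))  # 1: swap the adjacent O and V
print(swap_distance("SOV", "OVS"))  # 2
```

Swap distance minimization then predicts that the non-dominant orders a language actually uses cluster near its dominant order under this metric, which is the pattern the paper reports across families and macroareas.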
#190
Frontier LLMs 2026-04-29 arXiv cs.CL (Computation & Language) 5.5 5.5/6.3/4.5

Aspect-based Sentiment Analysis (ABSA) extracts fine-grained opinions toward specific aspects within text but remains largely English-focused despite major advances in transformer-based and instruction-tuned models. This work presents a multilingual evaluation of state-of-the-art ABSA approaches across seven languages (English, German, French, Dutch, Russian, Spanish, and Czech) and four subtasks (ACD, ACSA, TASD, ASQP). We systematically compare different transformer architectures under zero-resource, data-only, and full-resource settings, using cross-lingual transfer, code-switching and machine translation. Fine-tuned Large Language Models (LLMs) achieve the highest overall scores, particularly in complex generative tasks, while few-shot counterparts approach…

cs.CL
#191
Frontier LLMs 2026-04-29 arXiv cs.CL (Computation & Language) 5.5 5.5/6.3/4.5

As Large Language Models (LLMs) are increasingly integrated into academic peer review, their vulnerability to adversarial prompts -- adversarial instructions embedded in submissions to manipulate outcomes -- emerges as a critical threat to scholarly integrity. To counter this, we propose a novel adversarial framework where a Generator model, trained to create sophisticated attack prompts, is jointly optimized with a Defender model tasked with their detection. This system is trained using a loss function inspired by Information Retrieval Generative Adversarial Networks, which fosters a dynamic co-evolution between the two models, forcing…

cs.CL
#192
Frontier LLMs 2026-04-29 arXiv cs.CL (Computation & Language) 5.5 5.5/6.3/4.5

Emotion perception and adaptive expression are fundamental capabilities in human-agent interaction. While recent advances in speech emotion captioning (SEC) have improved fine-grained emotional modeling, existing systems remain limited to static, single-emotion characterization within isolated sentences, neglecting dynamic emotional transitions at the discourse level. To address this gap, we propose Emotion Transition-Aware Speech Captioning (EmoTransCap), a paradigm that integrates temporal emotion dynamics with discourse-level speech description. To construct a dataset rich in emotion transitions while enabling scalable expansion, we design an automated pipeline for dataset creation. This is the first large-scale…

cs.CL
#193
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.5 5.8/6.2/4.5

Background & Objectives: In the last decade, machine learning research has grown rapidly, but large models are reaching their soft limits, demonstrating diminishing returns, and still lack solid reasoning abilities. These limits could be surpassed through a synergistic combination of machine learning scalability and rigid reasoning. Methods: In this work, we propose a theoretical framework for reasoning through object-relations in an automated manner, integrated with Artificial Neural Networks. We present a formal analysis of the Reasoning, and we show the theory in practice through a paradigm integrating Reasoning and Machine Learning.…

cs.AI
#194
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.5 5.8/6.0/4.5

We propose UAPAR, an Uncertainty-Aware Pedestrian Attribute Recognition framework. To the best of our knowledge, this is the first EDL-based uncertainty-aware framework for pedestrian attribute recognition (PAR). Unlike conventional deterministic methods, which fail to assess prediction reliability on low-quality samples, UAPAR effectively identifies unreliable predictions and thus enhances system robustness in complex real-world scenarios. To achieve this, UAPAR incorporates Evidential Deep Learning (EDL) into a CLIP-based architecture. Specifically, a Region-Aware Evidence Reasoning module employs cross-attention and spatial prior masks to capture fine-grained local features, which are further processed by an…

cs.CV
#195
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.5 5.8/6.0/4.5

Audio-visual deepfakes have reached a level of realism that makes perceptual detection unreliable, threatening media integrity and biometric security. While multimodal detection has shown promise, most approaches are framed as binary classification tasks that often latch onto dataset-specific artifacts rather than genuine generative traces. We argue that a detector incapable of identifying how a video was forged is likely learning the wrong signal. Unlike binary detection, attribution-guided learning imposes a stronger geometric constraint on the shared embedding space, forcing the model to encode generator-specific forensic content rather than shortcuts. We propose the…

cs.CV
#196
Safety, Policy & Regulation 2026-04-30 AI Alignment Forum 5.5 5.0/6.9/4.5

One of the main hopes for AI safety is using AIs to automate AI safety research. However, if models are misaligned, then they may sabotage the safety research. For example, misaligned AIs may try to: perform sloppy research in order to slow down the rate of research progress; make AI systems appear safer than they are; or train a successor model to be misaligned. Whether we should worry about those things depends substantially on how hard it is to sabotage…

#197
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.4 5.5/6.2/4.5

This paper provides a concise yet comprehensive review of recent advancements in millimeter-wave (mm-wave) oscillators below 100 GHz and sub-terahertz (sub-THz/THz) oscillators above 100 GHz for next-generation computing and communication systems, including 5G, 6G, and beyond. Various design approaches, including CMOS, SiGe, and III-V semiconductor technologies, are explored in terms of performance metrics such as phase noise, output power, efficiency, frequency tunability, and stability. The review highlights key challenges in achieving high-performance and reliable oscillator designs while discussing emerging techniques for performance enhancement. By evaluating recent design trends, this work…

cs.AI
#198
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.4 5.5/6.2/4.5

When generative AI (genAI) systems are used in high-stakes decision-making, their recommended role is to aid, rather than replace, human decision-making. However, there is little empirical exploration of how professionals making high-stakes decisions, such as those related to employment, perceive their agency and level of control when working with genAI systems. Through interviews with 22 recruiting professionals, we investigate how genAI subtly influences control over everyday workflows and even individual hiring decisions. Our findings highlight a pressing conflict: while recruiters believe they have final authority across the recruiting pipeline, genAI…

cs.AI
#199
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.4 5.5/6.2/4.5

We introduce a toolkit for uncovering spurious correlations between recording characteristics and target class in speech datasets. Spurious correlations may arise due to heterogeneous recording conditions, a common scenario for health-related datasets. When present both in the training and test data, these correlations result in an overestimation of the system performance -- a dangerous situation, especially in high-stakes applications where systems are required to satisfy minimum performance requirements. Our toolkit implements a diagnostic method based on the detection of the target class using only the non-speech regions in the audio.…

cs.AI
#200
AI Coding 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.4 5.5/6.2/4.5

Large language models (LLMs) accelerate software development but often exhibit instability, non-determinism, and weak adherence to development discipline in unconstrained workflows. While test-driven development (TDD) provides a structured Red-Green-Refactor process, existing LLM-based approaches typically use tests as auxiliary inputs rather than enforceable process constraints. We present an AI-native TDD framework that operationalizes classical TDD principles as structured prompt-level and workflow-level governance mechanisms. Extracted principles are formalized in a machine-readable manifesto and distributed across planning, generation, repair, and validation stages within a layered architecture that separates model proposal from deterministic engine…

cs.AI
#201
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.4 5.5/6.2/4.5

Reusing verification artefacts requires identifying structural and semantic similarities across programs and their specifications. In this paper, we focus on graph construction as a foundational step toward this goal. We present a pipeline that converts imperative programs and their annotations into typed, attributed graphs. Our experiments cover datasets including C with ACSL, Java with JML, and Dafny for C#. The pipeline integrates abstract syntax tree parsing with semantic embeddings derived from models such as SentenceTransformer and CodeBERT. This enables the generation of graph representations that capture both structural relationships and…

cs.AI
#202
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.4 5.5/6.2/4.5

Over the past 25 years, I have been involved in some intriguing developments in the foundations of physics, exploring the quantum reality problem, the relationship between quantum theory and gravity and the interplay between consciousness and physical laws. These investigations make it plausible that we will find physics beyond quantum theory, potentially including both new evolution laws and new types of measurement. There is also a significant chance they could have potentially transformative impact on information processing and on the development of and our future with AI.

cs.AI
#203
Research 2026-04-29 arXiv cs.AI (Artificial Intelligence) 5.4 5.5/6.2/4.5

This paper presents Quantum Gatekeeper, a context-bound image steganography framework where successful payload recovery depends on both cryptographic decryption and the reconstruction of a precise extraction path. The system integrates lossless least significant bit (LSB) embedding with a deterministic variational quantum circuit (VQC)-derived gate key, multi-factor contextual binding, and authenticated encryption. Payload extraction is contingent upon four requisite factors: a password, a shared secret, a user-supplied context string, and a reference image signature. Any deviation in these factors causes the system to read from an incorrect pixel sequence or fail…
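The lossless LSB embedding that the framework builds on has a simple generic form, sketched below. This is a plain illustration of LSB steganography, not Quantum Gatekeeper itself, which layers a key-derived extraction path and authenticated encryption on top; the point of the contextual binding is that a reader who walks the wrong pixel sequence recovers garbage.

```python
def lsb_embed(pixels, payload_bits):
    # Write one payload bit into the least significant bit of each pixel;
    # all other bits are untouched, so the embedding is lossless in the
    # upper 7 bits of every carrier value.
    assert len(payload_bits) <= len(pixels)
    out = list(pixels)
    for i, bit in enumerate(payload_bits):
        out[i] = (out[i] & ~1) | (bit & 1)
    return out

def lsb_extract(pixels, n_bits):
    # Recovery requires reading the same pixel sequence in the same order.
    return [p & 1 for p in pixels[:n_bits]]

cover = [200, 37, 96, 141, 58, 77]
stego = lsb_embed(cover, [1, 0, 1, 1])
```

A wrong ordering of `pixels` at extraction time, as induced by the paper's four contextual factors, would yield an unrelated bit string rather than the payload.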

cs.AI
#204
Robotic Autonomy 2026-04-29 arXiv cs.RO (Robotics) 5.4 5.5/6.2/4.5

This paper presents a planning pipeline framework for locomotion in rope-assisted robots climbing vertical surfaces. The proposed framework is formulated as a bi-level optimization scheme that addresses a mixed-integer problem: selecting feasible terrain regions for landing while simultaneously optimizing the control inputs, namely rope tensions and leg forces, and landing location. The outer level of the optimization is solved using the Cross-Entropy Method, while the inner level relies on gradient-based nonlinear optimization to compute dynamically feasible motions. The approach is validated on a novel climbing robot platform, ALPINE, across a…

cs.RO
#205
Robotic Autonomy 2026-04-29 arXiv cs.RO (Robotics) 5.4 5.5/6.2/4.5

Safe navigation in cluttered environments is an important challenge for autonomous systems. Robots navigating through obstacle-ridden scenarios must operate safely in the presence of obstacles, goals, and ego objects of varying geometries. In this work, reachable set representations of the robot's real-time capabilities in the state space are utilized to capture safe navigation requirements, while neural radiance fields (NeRFs) are utilized to compute, store, and manipulate the volumetric representations of the obstacles, or ego vehicle, as needed. Constrained optimal control is employed to represent…

cs.RO
#206
Robotic Autonomy 2026-04-29 arXiv cs.RO (Robotics) 5.4 5.5/6.2/4.5

Origami-inspired robotic grippers have shown promising potential for object manipulation tasks due to their compact volume and mechanical flexibility. However, robust capture of objects with random shapes in dynamic working environments often comes at the cost of additional actuation channels and control complexity. Here, we introduce a tendon-driven origami tentacle gripper capable of universal object gripping by exploiting a synergy between local, deterministic deformation programming and global, stochastic entanglements. Each origami tentacle is made by cutting thin Mylar sheets; it features carefully placed holes for routing an actuation tendon, origami…

cs.RO
#207
Robotic Autonomy 2026-04-29 arXiv cs.RO (Robotics) 5.4 5.5/6.2/4.5

Approximating collision-free space is fundamental to robot planning in complex environments. Convex geometric representations, such as polytopes and ellipsoids, are widely employed due to their structural properties, which can be easily integrated with convex optimization. Iterative optimization-based inflation methods can generate large volume polytopes in cluttered environments, but their efficiency degrades as the obstacle set becomes more complex or when sensor data are noisy. These methods are also sensitive to initialization and often rely on accurate geometric models. In this paper, we propose the STAR-Filter, a lightweight framework that employs…

cs.RO
#208
Robotic Autonomy 2026-04-29 arXiv cs.RO (Robotics) 5.4 5.5/6.2/4.5

As with every emerging technology, new tools in the hands of artists reshape the nature of artwork creation. Current frameworks for robotics in arts deploy the robot as an autonomous creator or a collaborator, thus leaving a certain gap between the human artist and the machine. Now, we stand at the dawn of an era where artists can escape physical limitations and reshape their creative identity by inhabiting an alternative body. This new paradigm allows artists not only to command a robot remotely, but also to *be* a robot,…

cs.RO
#209
Robotic Autonomy 2026-04-29 arXiv cs.RO (Robotics) 5.4 5.5/6.2/4.5

Dynamical systems (DS) methods for Learning-from-Demonstration (LfD) provide stable, continuous policies from few demonstrations. First-order dynamical systems (DS) are effective for many point-to-point and periodic tasks, as long as a unique velocity is defined for each state. For tasks with intersections (e.g., drawing an "8"), extensions such as second-order dynamics or phase variables are often used. However, by incorporating velocity, second-order models become sensitive to disturbances near intersections, as velocity is used to disambiguate motion direction. Moreover, this disambiguation may fail when nearly identical position-velocity pairs correspond to different onward…

cs.RO
#210
Robotic Autonomy 2026-04-29 arXiv cs.RO (Robotics) 5.4 5.5/6.2/4.5

In multi-agent systems, should limited resources be concentrated into a few capable agents or distributed among many simpler ones? This work formulates the split over $n$ resource sharing problem where a group of $n$ agents equally shares a common resource (e.g., monetary budget, computational resources, physical size). We present a case study in multi-agent coverage where the area of the disk-shaped footprint of agents scales as $1/n$. A formal analysis reveals that the initial coverage rate grows with $n$. However, if the speed of agents decreases proportionally with their radii,…
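The scaling argument in the case study can be worked through on the back of an envelope. The sketch below is my illustration with made-up units, not the paper's formal analysis: splitting a fixed disk area over $n$ agents shrinks each radius like $1/\sqrt{n}$, so the combined sweep rate grows like $\sqrt{n}$ when speed is fixed, and that gain vanishes when speed shrinks proportionally with radius.

```python
import math

def initial_coverage_rate(n, total_footprint_area=1.0, speed=1.0):
    # Total disk area is split over n agents: pi * r^2 = area / n,
    # so r ~ 1/sqrt(n); combined sweep rate n * (2r) * speed ~ sqrt(n).
    radius = math.sqrt(total_footprint_area / (n * math.pi))
    return n * 2 * radius * speed

def rate_with_radius_scaled_speed(n, total_footprint_area=1.0):
    # If speed also decreases proportionally with radius, the sqrt(n)
    # gain cancels and the sweep rate is independent of n.
    radius = math.sqrt(total_footprint_area / (n * math.pi))
    return n * 2 * radius * radius
```

Under these assumptions, four agents cover at twice the initial rate of one, matching the abstract's claim that the initial coverage rate grows with $n$ but only under a fixed-speed regime.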

cs.RO
#211
Robotic Autonomy 2026-04-29 arXiv cs.NE (Neural & Evolutionary Computing) 5.4 5.5/6.0/4.5

We present a Spatially Embedded Evolutionary Algorithm where robot individuals exist in a physically simulated 2D environment, must navigate to encounter potential mates, and compete for survival under various spatially-aware selection pressures. Using HyperNEAT evolved neural controllers for ARIEL gecko-inspired quadrupeds in MuJoCo, we investigate how spatial structure fundamentally alters evolutionary dynamics. Our experiments show a modest 4.9% difference in peak fitness between proximity-based and random pairing, possibly within stochastic variation, while combining spatial parent selection with stochastic death selection produces unstable population dynamics. We discover a continuous phase transition…

cs.NE
#212
State Space Models 2026-04-29 arXiv cs.NE (Neural & Evolutionary Computing) 5.4 5.5/6.0/4.5

This paper presents an application of the biologically realistic JASTAP neural network model to classification tasks. The JASTAP neural network model is presented as an alternative to the basic multi-layer perceptron model. An evolutionary procedure previously applied to the simultaneous solution of feature selection and neural network training on standard multi-layer perceptrons is extended with the JASTAP model. Preliminary results on the standard IRIS data set give evidence that this extension allows the use of smaller neural networks that can handle noisier data without any degradation in classification accuracy.

cs.NE
#213
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.4 5.5/6.0/4.5

Accurate BRDF acquisition is important for realistic rendering, but dense gonioreflectometer measurements are slow and expensive. We study how to select a small number of BRDF measurements that are most useful for reconstructing material appearance under a learned reflectance prior. Our method combines a set encoder for sparse coordinate-value observations, a pretrained hypernetwork-based BRDF reconstructor, and a differentiable renderer. During sampler training, the reconstructor is kept fixed and gradients from BRDF-space and rendered-image losses are used to optimize measurement locations. This separates sample selection from prior fitting and encourages the…

cs.CV
#214
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.4 5.5/6.0/4.5

Traditional iterative reconstruction methods are accurate but computationally expensive, limiting their use in high-throughput and real-time ptychography. Recent deep learning approaches improve speed, but often predict phase as a Euclidean scalar despite its $2\pi$ periodicity, which can introduce wrapping artifacts, discontinuities at $\pm\pi$, and a mismatch between the loss and the underlying signal geometry. We present a deep learning framework for ptychographic reconstruction that models phase on the unit circle using cosine and sine components. Phase error is optimized with a differentiable geodesic loss, which avoids branch-cut discontinuities and provides…

cs.CV
#215
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.4 5.5/6.0/4.5

Aerial-Ground Re-Identification (AG-ReID) is constrained by the viewpoint-domain gap, as drastic viewpoint disparities occlude or distort discriminative features, making cross-viewpoint image retrieval challenging. While existing methods rely on paired cross-view annotations, real-world deployments, such as wilderness search-and-rescue (SAR), often lack target-domain data, requiring retrieval from ground-level references alone. To our knowledge, we are the first to address this challenge by formalizing the Single-View AG-ReID (SV AG-ReID) setting, where models trained on a single real viewpoint must generalize to an unseen viewpoint. We propose 3D Lifting-based Elevated Novel-view Synthesis (3D-LENS), a…

cs.CV
#216
Multimodal 2026-04-29 arXiv cs.CV (Computer Vision) 5.4 5.5/6.0/4.5

Monocular depth estimation (MDE) is a fundamental yet inherently ill-posed task. Recent vision foundation models (VFMs), particularly DINO-based transformers, have significantly improved accuracy and generalization for dense prediction. Prior works generally follow a unified paradigm: sampling a fixed set of intermediate transformer layers at uniform intervals to build multi-scale features. This common practice implicitly assumes that geometric information is uniformly distributed across layers, which may underutilize the structural 3D cues encoded in VFMs. In this study, we present a systematic layer-wise analysis of DINOv3, revealing that 3D information is distributed…

cs.CV
#217
Research 2026-04-29 arXiv stat.ML (Statistical ML) 5.4 5.5/6.0/4.5

We prove that any random variable $X$ whose moment generating function is point-wise upper bounded by that of $ G \sim \mathcal{N}(0,1) $ must be dominated by $ G/\mathbb{E}[|G|] $ in convex order, meaning $ \mathbb{E}[f(X)] \le \mathbb{E}[f(G/\mathbb{E}[|G|])] $ for all convex $f$. Equality is attained by taking $ X \sim \mathrm{Unif}(\{-1,1\}) $ and $ f(x) = |x| $.
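The stated equality case invites a quick Monte Carlo sanity check. This is my illustration of the claim, not part of the paper: $X \sim \mathrm{Unif}(\{-1,1\})$ is sub-Gaussian (its MGF $\cosh t \le e^{t^2/2}$), and for $f(x) = |x|$ both sides of the bound equal 1, while a strictly convex $f(x) = x^2$ shows the domination with slack.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.standard_normal(1_000_000)
scale = np.sqrt(2.0 / np.pi)      # E[|G|] for G ~ N(0, 1)

# Equality case f(x) = |x|: E[|X|] = 1 and E[|G| / E|G|] = 1.
lhs_abs = 1.0
rhs_abs = np.abs(g / scale).mean()

# Strict case f(x) = x^2: E[X^2] = 1 <= E[(G / E|G|)^2] = pi / 2.
lhs_sq = 1.0
rhs_sq = np.mean((g / scale) ** 2)
```

The second pair illustrates the convex-order direction: the normalized Gaussian dominates, with the gap $\pi/2 - 1$ for the quadratic test function.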

stat.ML
#218
Research 2026-04-29 arXiv stat.ML (Statistical ML) 5.4 5.5/6.0/4.5

We show that if the conditional distribution p(C | T) factors through a sufficient statistic φ(T), then the Information Bottleneck (IB) problem for (T, C) is exactly equivalent to the IB problem for (φ(T), C). The reduction is loss-free: it preserves the full IB curve, the Lagrangian optimum at every trade-off parameter β, and the optimal representations up to pullback through φ. As a result, the computational complexity of solving the IB problem is governed by the dimension of the sufficient statistic rather than the ambient dimension of the source.…

stat.ML
#219
AI Coding 2026-04-29 Simon Willison 5.4 5.5/6.2/4.5

Release: llm 0.32a1 Fixed a bug in 0.32a0 where tool-calling conversations were not correctly reinflated from SQLite. #1426 Tags: llm

#220
AI Coding 2026-04-29 Simon Willison 5.4 5.5/6.2/4.5

I just released LLM 0.32a0, an alpha release of my LLM Python library and CLI tool for accessing LLMs, with some consequential changes that I've been working towards for quite a while. Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response:

import llm
model = llm.get_model("gpt-5.5")
response = model.prompt("Capital of France?")
print(response…

#221
AI Coding 2026-04-29 Simon Willison 5.4 5.5/6.2/4.5

Release: llm 0.32a0 See the annotated release notes . Tags: llm

#229
Government & Defense 2026-04-29 DefenseScoop 5.4 5.5/6.2/4.5

As the National Geospatial-Intelligence Agency integrates artificial intelligence into HR workflows, the organization is taking a prudent approach to ensure its workforce doesn’t become overdependent on the technology. “My biggest fear is that in five years, we’re going to lose a lot of expertise because we have automated so many of the things that have helped those individuals really understand their tradecraft, understand HR and the nuances and complexities and be able to grow,” Sasha Muth, deputy director of human…

#230
Government & Defense 2026-04-29 C4ISRNET 5.4 5.5/6.0/4.5

KYIV — President Volodymyr Zelenskiy has leveraged Ukraine’s expertise in drone warfare into a series of successful diplomatic deals during visits to the Middle East and Europe, showcasing how Kyiv is using military prowess to boost its diplomatic clout. Since Russia’s invasion in 2022, Zelenskiy has sought to strengthen Kyiv’s alliances, both with Western allies and with countries of the “global south,” to restrict Russia’s diplomatic sway. The Iran war has confirmed how central drones are to modern warfare and…

#231
Government & Defense 2026-04-29 FedScoop 5.4 5.5/6.0/4.5

Two DOGE associates dispatched to the Treasury Department in the early days of the second Trump administration flouted various IT security rules while the agency itself fell short on implementing proper cyber controls, a new watchdog report found. The Government Accountability Office examined access that a pair of DOGE staffers had to Bureau of the Fiscal Service payment systems from Jan. 20-April 11, 2025. The audit aimed to determine what the DOGE duo planned to do with BFS systems, and…

#232
Government & Defense 2026-04-29 War on the Rocks 5.4 5.5/6.0/4.5

On April 25, armed groups launched near-simultaneous attacks against military installations and key strategic sites across Mali. Claimed by Jama’at Nusrat al-Islam wal-Muslimin, a jihadist group, and conducted in coordination with Tuareg separatist forces from the Front de libération de l’Azawad, the attacks targeted multiple nodes across the country’s security architecture, from the capital Bamako to Gao, Mopti, and Kidal. While the attacks themselves were a shock, they should be understood as the logical endpoint of a deteriorating security trajectory…

#235
Industry 2026-04-29 Gradient Flow 5.2 5.0/5.9/4.5

Recent results suggest that research mathematics is no longer a purely speculative test case for AI. A growing set of examples shows AI contributing not just to short contest puzzles, but to open-ended mathematical work that requires literature search, cross-domain connection-making, revision, and verification. The important lesson for enterprise AI teams is not that AI has suddenly become a mathematician. It is that progress accelerates in settings where outputs can be checked, workflows are iterative, and human experts remain responsible…

#236
Research 2026-04-29 Computerphile 5.2 5.0/5.9/4.5

What my research team and I have been most interested in over the last few years is trying to make EDA tools more reliable. By EDA tools I mean the tools that hardware engineers use to make hardware. The basic flow would be: a hardware engineer comes up with a design for what they want the functionality of the hardware to be, and they'll agonize over…

Items: 236
Multi-source: 106
Long-form (≥7.5): 7
Sources OK / attempted: 58 / 69
Top category: Evaluations & Benchmarks (50 items)