-
Inference-time factuality improvement in LLMs: from layer contrasting to deep-thinking tokens
Paper Review · LLMs hallucinate (Huang et al., 2025). They...
-
Out-of-Distribution Detection in Vision-Language Models: A Survey
Paper Review · Vision-Language Models (VLMs) like CLIP have dramatically shifted the landscape of visual understanding. Trained on internet-scale image-text pairs, these models demonstrate remarkable zero-shot generalization, describing objects they have never explicitly seen during training. Yet this generalization comes with an underappreciated fragility: when deployed in the real world, VLMs routinely encounter inputs that bear no resemblance to anything...
-
Reasoning's Razor: When Thinking More Makes Safety Worse
Paper Review · Large Reasoning Models (LRMs) like DeepSeek-R1 and QwQ-32B have become remarkably capable at solving complex problems through extended chain-of-thought. The natural instinct is to apply this power to safety-critical tasks: detecting harmful content, catching hallucinations, flagging policy violations. More reasoning = more accuracy = safer AI, right?
A new paper challenges that intuition head-on. “Reasoning’s Razor”...