Introducing Ritual Interns


Introduction

This summer, we had the privilege of working with an exceptional group of interns who contributed significantly to Ritual’s mission at the intersection of AI and crypto. From advancing private LLM inference to improving model reasoning capabilities, our interns tackled some of the most challenging problems in the space. Today, we’re excited to share their remarkable achievements and the cutting-edge research they’ve been working on.

Rahul Thomas

I’m Rahul. I’ve been at Ritual for the past year, and I recently graduated from Stanford with a BS in math, with plans to complete an MS in CS in the fall. I’ve spent most of my time on Cascade, a project on third-party private LLM inference. In that work, we devised a novel reconstruction attack that can reverse-engineer prompts from permuted hidden states at any layer of an autoregressive LLM, and showed how this breaks the security of several existing private LLM inference schemes. We also developed a statistical defense against our attack based on sharding hidden states, and I had the opportunity to present the work at ICML in Vancouver a few weeks ago.

This summer, I’ve also been working on a project related to speculative sampling, a technique that accelerates autoregressive decoding in LLMs without changing the output distribution. Speculative sampling uses a cheap draft model to propose the target LLM’s next token, followed by an accept/reject step that guarantees the output matches the target LLM’s distribution. In the multi-draft setting, many candidate draft tokens can be generated, possibly even from different draft models. A few recent papers have established theoretical foundations here, framing the problem as an optimal transport linear program (OT-LP) with connections to importance sampling and subset selection. However, little is understood about the limits of optimal multi-draft sampling, particularly when more than two draft tokens are generated. My project aims to understand the relative performance benefits of increasing the draft count, particularly across varying domains and model sizes. So far, I have reconciled a few theoretical formulations of this problem and devised a more efficient max-flow algorithm to solve the OT-LP for higher draft counts. I’m currently working on extending this to the multi-step setting, where draft models autoregressively decode multiple tokens at a time. This has the potential to further improve wall-clock speedups over existing speculative sampling approaches.
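
To make the accept/reject step concrete, here is a minimal single-draft sketch in Python (the multi-draft case generalizes this rule into the OT-LP mentioned above); the function names and toy distributions are illustrative, not code from the project.

```python
import numpy as np

def speculative_step(p_target, q_draft, draft_token, rng):
    """One accept/reject step of speculative sampling.

    p_target, q_draft: next-token distributions (1-D arrays summing to 1).
    draft_token: index proposed by the draft model (sampled from q_draft).
    Returns a token whose marginal distribution is exactly p_target.
    """
    # Accept the draft token with probability min(1, p(x) / q(x)).
    accept_prob = min(1.0, p_target[draft_token] / q_draft[draft_token])
    if rng.random() < accept_prob:
        return draft_token
    # On rejection, resample from the residual distribution max(p - q, 0),
    # renormalized; this correction keeps the overall output distributed as p.
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual)

# Toy usage with a 4-token vocabulary.
rng = np.random.default_rng(0)
p = np.array([0.1, 0.6, 0.2, 0.1])      # target model's next-token distribution
q = np.array([0.25, 0.25, 0.25, 0.25])  # draft model's distribution
draft = rng.choice(4, p=q)
print(speculative_step(p, q, draft, rng))
```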

Erica Choi

I’m Erica, currently a graduate student in computer science at Columbia, with an undergraduate background in math and research experience in topology and graph neural networks; I now focus on LLM reasoning and reinforcement learning. This summer, I worked on improving RLVR (Reinforcement Learning with Verifiable Rewards) by integrating knowledge distillation, addressing RLVR’s tendency to reinforce known solutions rather than encourage discovery. I distilled mathematical reasoning traces from teacher models into student models via LoRA fine-tuning, experimenting with various regularization methods and selectively zeroing out layers and modules. These experiments showed that a small subset of parameters at targeted locations can effectively transfer mathematical reasoning ability and improve math pass@k benchmarks. I also ran comparative experiments on the Qwen, Gemma, and LLaMA model families, observing that LLaMA struggles to learn effectively from chain-of-thought data, a limitation we are actively investigating. Ongoing work explores applying RL within this constrained parameter space to promote novel reasoning.
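
As a rough illustration of the targeted-transfer idea (not our actual training code, which uses LoRA adapters), the sketch below freezes a student model except for a chosen set of layers and modules and then distills on teacher logits. The module names, the HF-style `.logits` output, and the loss weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def freeze_except(model, target_layers, target_modules=("q_proj", "v_proj")):
    """Freeze all parameters, then re-enable only the targeted layers/modules."""
    for name, param in model.named_parameters():
        in_layer = any(f".{i}." in name for i in target_layers)
        in_module = any(m in name for m in target_modules)
        param.requires_grad = in_layer and in_module

def distill_step(student, teacher, batch, optimizer, temperature=2.0):
    """One knowledge-distillation step on a batch of teacher reasoning traces."""
    with torch.no_grad():
        teacher_logits = teacher(**batch).logits
    student_logits = student(**batch).logits
    # KL divergence between temperature-softened teacher and student
    # next-token distributions; scaling by T^2 keeps gradient magnitudes stable.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```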

William Gvodjak

I’m William, currently a freshman at MIT studying math and computer science. Coming from a math background, I competed in math olympiads and, in high school, conducted research on the classification of modular categories through the MIT PRIMES-USA program. This spring and summer, I worked on the “privacy implies verifiability” project, where we designed a ‘weak’ scheme that sharpens the computational-integrity guarantees of privacy-preserving LLM inference nearly for free. I helped develop the strategy as well as design attacks to reveal potential weaknesses. To verify the scheme’s effectiveness, I ran tests across various datasets, models, and attacks, revealing successes and weaknesses in different settings and allowing us to iterate on and improve the strategy.

Dennis Chen

I’m Dennis, currently a sophomore studying mathematics and computer science at Carnegie Mellon, primarily interested in mathematical logic and algebra. This past summer, I focused on a line of work to accelerate proofs of computational integrity for transformer-based architectures. The key bottleneck in building SNARKs for large computations such as LLMs is committing to the data, which is the core cryptographic operation. Efficient commitment schemes require the data to first be encoded with an error-correcting code. Coincidentally, LLM constructions also make use of error-correcting codes via their embedding step, which places the input into a vector space according to semantic distance. The goal of my experiment was to test the viability of combining these two encoding steps to speed up the proof of an LLM computation. To be usable in cryptographic schemes, however, an error-correcting code needs a high minimum distance, which I empirically measured across a variety of open-weight embedding models within a compact ball of synonym space.
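
A rough sketch of the kind of empirical check described above: embed a cluster of near-synonyms and measure the minimum pairwise distance between their vectors, which plays the role of the code’s minimum distance. The `embed` function is a placeholder for whichever open-weight embedding model is being tested.

```python
import itertools
import numpy as np

def min_pairwise_distance(words, embed):
    """Smallest L2 distance between any two embeddings in a synonym cluster.

    `embed` maps a word to a 1-D numpy vector (placeholder for an open-weight
    embedding model); a larger minimum distance is better if the embedding is
    to behave like an error-correcting code.
    """
    vecs = {w: np.asarray(embed(w), dtype=np.float64) for w in words}
    return min(
        np.linalg.norm(vecs[a] - vecs[b])
        for a, b in itertools.combinations(words, 2)
    )

# Example: a small ball of synonym space around "fast".
synonyms = ["fast", "quick", "rapid", "speedy", "swift"]
# d_min = min_pairwise_distance(synonyms, embed)  # supply a real embed() here
```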

Teo Kitanovski

I’m Teo, currently a CS freshman at Vanderbilt, an IOI medalist, co-founder of a bionics startup (eBionyx), and a published researcher in constraint satisfaction problems. This spring and summer at Ritual, my work focused on understanding and controlling deference in large language models: when and why models change their answers to match user feedback, even when that feedback is misleading. I built custom evaluation pipelines, adapted and extended datasets like GPQA and GSM-Symbolic, and ran experiments to test how different prompts and internal model vectors influence answer changes. We identified a single latent vector that controls deference and successfully used activation steering to modulate this behavior. We are currently also exploring methods for model confidence elicitation, specifically logit-based and sampling-based estimation, to study the relationship between model confidence and answer persistence. Additional directions include calibration experiments and targeted fine-tuning to improve model reliability. Results are being prepared for publication.
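
For readers unfamiliar with activation steering, the sketch below shows the basic mechanism: add a scaled steering vector to a chosen layer’s activations via a forward hook. The layer path, the coefficient, and how the deference vector is obtained are all model-specific; the names here are illustrative assumptions, not our exact pipeline.

```python
import torch

def add_steering_hook(layer, steering_vector, alpha=1.0):
    """Register a forward hook that shifts a layer's output by alpha * vector.

    `layer` is a transformer block (e.g. model.model.layers[k] in many
    HF-style decoders); `steering_vector` has the model's hidden size.
    Returns the hook handle so the intervention can be removed later.
    """
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * steering_vector.to(hidden.device, hidden.dtype)
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return layer.register_forward_hook(hook)

# Usage sketch: steer away from deference (alpha < 0), generate, then clean up.
# handle = add_steering_hook(model.model.layers[20], deference_vector, alpha=-2.0)
# out = model.generate(**inputs)
# handle.remove()
```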

Arthur Liang

Hi, I’m Arthur, currently a rising senior at MIT studying CS and neuroscience. Prior to joining Ritual, I researched multi-modal LLMs for protein representation learning. This summer I’ve been working on research centered on LLMs defending their beliefs. The goal is to understand the relationship between an LLM’s response confidence and its propensity to defend or capitulate on its beliefs when challenged. We hope to suggest approaches that make this behavior more consistent: for example, when the model is very confident, it should ideally defend its beliefs more strongly. So far I’ve implemented different methods for confidence elicitation, such as sampling and calibration tuning, and added evaluations of new datasets to our framework.
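
As one example of sampling-based confidence elicitation, here is a minimal sketch: sample the model several times at nonzero temperature and use the agreement rate of the majority answer as a confidence estimate. The `generate` and `extract_answer` functions are placeholders for whatever inference stack and answer parser are in use.

```python
from collections import Counter

def sampling_confidence(prompt, generate, extract_answer,
                        n_samples=16, temperature=0.7):
    """Estimate confidence as the agreement rate among sampled answers.

    `generate(prompt, temperature)` returns one model completion and
    `extract_answer` parses the final answer out of it; both are placeholders.
    """
    answers = [
        extract_answer(generate(prompt, temperature=temperature))
        for _ in range(n_samples)
    ]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples

# Usage sketch:
# answer, confidence = sampling_confidence("What is 17 * 24?", generate, extract_answer)
# A model that defends its answer when confidence is high, and reconsiders when
# it is low, would be behaving consistently.
```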

Research Contributions

Our interns have made significant contributions to the scientific community through their research. Their work has resulted in several publications and papers that advance the state of the art in AI and crypto.

Looking Ahead

As we continue to push the boundaries of what’s possible at the intersection of AI and crypto, we’re incredibly grateful for the exceptional talent and dedication our interns brought to these challenging problems. Their work will continue to influence Ritual’s research direction and contribute to the broader scientific community.

We’re excited to see how these research directions evolve and look forward to welcoming the next cohort of brilliant minds to join us in building the future of AI infrastructure.

Join Our Team

Interested in contributing to cutting-edge research at the intersection of AI and crypto? We’re hiring for our next intern cohort.


Disclaimer: This post is for general information purposes only. It does not constitute investment advice or a recommendation, offer or solicitation to buy or sell any investment and should not be used in the evaluation of the merits of making any investment decision. It should not be relied upon for accounting, legal or tax advice or investment recommendations. The information in this post should not be construed as a promise or guarantee in connection with the release or development of any future products, services or digital assets. This post reflects the current opinions of the authors and is not made on behalf of Ritual or its affiliates and does not necessarily reflect the opinions of Ritual, its affiliates or individuals associated with Ritual. All information in this post is provided without any representation or warranty of any kind. The opinions reflected herein are subject to change without being updated.