Aswinkumar

AAAI Conference 2024 @ Vancouver

I recently completed my Master’s in Electrical and Computer Engineering at the University of Wisconsin-Madison, focusing on Computer Architecture, Robotics, and Machine Learning. During my master’s, I interned at AMD in Austin (Aug–Dec 2025), working on a simulation tool to identify the most efficient AMD GPU backend for disaggregated prefill-decode LLM inference, and earlier at Vayu Robotics (May–Aug 2025, since acquired by Serve Robotics), where I cut deep-learning model latency by 4× and reduced robot build time by 3.5×.

Previously, I was a GPU Advocate at NVIDIA, developing End to End AI for Science material focused on deep learning for scientific applications. Before that, I did three internships at NVIDIA, working on AI for Science (2019), DeepStream & DeepStream Performance Lab (2020), and Distributed Training (2021), and then joined full-time in 2022. In that role, I’ve mentored over 24 research and enterprise teams to accelerate their applications on GPUs (via OpenMP, OpenACC, CUDA, CUDA libraries, DeepStream, TAO, TensorRT-LLM, etc.) and have also been an instructor for over 16 boot camps in the APAC region in the last two years.

I completed my undergraduate degree at the Indian Institute of Technology Madras, majoring in Engineering Physics with a Minor in Computing. I spent a lot of time at CFI learning and tinkering with hardware. I did my Bachelor’s Thesis with Prof. Kamakoti Veezhinathan on Accelerating DCT and IDCT Algorithms for energy-efficient video decoding.

CV / Google Scholar / Email / GitHub / LinkedIn

publications

DAI-AAAI

MABViT – Modified Attention Block Enhances Vision Transformers

Aswinkumar* & Mahesh*

Feb 2024


Recent studies have demonstrated the effectiveness of Gated Linear Units (GLU) in enhancing transformer models, particularly in Large Language Models (LLMs). Additionally, utilizing a parallel configuration within each Transformer block rather than the conventional serialized method has been revealed to accelerate the training of LLMs without significantly impacting performance. However, when the MLP and attention block were run in parallel for the image classification task, we observed a noticeable decline in performance. We propose a novel transformer variant that integrates non-linearity within the attention block to tackle this problem. We implemented the GLU-based activation function on the Value tensor, and this new technique surpasses the current state-of-the-art S/16 variant of Vision Transformers by 0.6% on the ImageNet-1K dataset while utilizing fewer parameters. It also supersedes the B/16 variant while using only half the parameters. Furthermore, we provide results with the GELU activation function variant to confirm our assertions. Lastly, we showcase that the MABViT variants exhibit greater potential when utilized in deep transformers compared to the standard architecture.
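To make the central idea in the abstract concrete, here is a minimal sketch of single-head attention in which the Value tensor passes through a GLU-style gate before the attention weights are applied. This is one possible reading of "GLU-based activation on the Value tensor", not the paper's actual implementation: the extra projection `w_gate`, the sigmoid gate, the single head, and the toy sizes are all assumptions made for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def glu_value_attention(x, w_q, w_k, w_v, w_gate):
    """Single-head attention whose Value tensor is gated GLU-style.

    x has shape (tokens, dim); every weight has shape (dim, dim).
    w_gate is a hypothetical extra projection introduced for this sketch.
    """
    q, k = matmul(x, w_q), matmul(x, w_k)
    v_lin, gate = matmul(x, w_v), matmul(x, w_gate)
    # GLU: elementwise product of a linear branch and a sigmoid-gated branch.
    v = [[a * sigmoid(g) for a, g in zip(v_row, g_row)]
         for v_row, g_row in zip(v_lin, gate)]
    scale = 1.0 / math.sqrt(len(w_q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v), weights

def random_matrix(rows, cols, rng):
    """Small random matrix, used only as toy data."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
tokens, dim = 4, 8
x = random_matrix(tokens, dim, rng)
w_q, w_k, w_v, w_gate = (random_matrix(dim, dim, rng) for _ in range(4))
out, attn = glu_value_attention(x, w_q, w_k, w_v, w_gate)
```

Here `out` has one `dim`-sized row per token and each row of `attn` sums to 1. Replacing `sigmoid` with a GELU-style function would loosely correspond to the GELU variant the abstract mentions, though the exact formulation there may differ.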