Research Projects

Apache SystemML

Apache SystemML provides declarative large-scale machine learning (ML). It aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans, ranging from single-node, in-memory computations to distributed computations on Apache Hadoop and Apache Spark.

DataPath system

DataPath is a purely push-based research prototype database system. In DataPath, queries do not request data. Instead, data are automatically pushed onto processors, where they are processed by any interested computation. It has been tested on a multi-terabyte benchmark to show that this basic design principle makes for a very lean and fast database system.
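The push-based idea can be illustrated with a small sketch. This is not DataPath code: the names ChunkPusher, SumQuery and CountAbove are hypothetical, and the sketch only shows the control flow in which a single pass over the data feeds every registered computation.

```python
class ChunkPusher:
    """Hypothetical stand-in for a storage layer that pushes data to computations."""

    def __init__(self):
        self.consumers = []

    def register(self, consumer):
        # A computation declares interest once; it never asks for data itself.
        self.consumers.append(consumer)

    def push_all(self, chunks):
        # Each chunk is read once and handed to every interested computation.
        for chunk in chunks:
            for consumer in self.consumers:
                consumer.consume(chunk)


class SumQuery:
    """Illustrative query: running sum of all values."""

    def __init__(self):
        self.total = 0

    def consume(self, chunk):
        self.total += sum(chunk)


class CountAbove:
    """Illustrative query: count of values strictly above a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0

    def consume(self, chunk):
        self.count += sum(1 for x in chunk if x > self.threshold)


pusher = ChunkPusher()
total_query = SumQuery()
count_query = CountAbove(threshold=4)
pusher.register(total_query)
pusher.register(count_query)
pusher.push_all([[1, 2, 3], [4, 5, 6]])
print(total_query.total, count_query.count)
```

Both queries are answered from the same scan, which is the property the push model is meant to exploit.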

Online Aggregation for Large MapReduce Jobs

In online aggregation, a database system processes a user’s aggregation query in an online fashion. At all times during processing, the system gives the user an estimate of the final query result, with confidence bounds that become tighter over time. In this project, we built a system that performs online aggregation in a MapReduce environment for large-scale data analysis. Given the MapReduce paradigm’s close relationship with cloud computing (one might expect a large fraction of MapReduce jobs to be run in the cloud), online aggregation is a very attractive technology. Since large-scale cloud computations are typically pay-as-you-go, a user can monitor the accuracy obtained in an online fashion, and then save money by killing the computation early once sufficient accuracy has been obtained.
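A minimal single-machine sketch of the idea follows. It is not the estimator used in the project: it assumes rows arrive in random order and uses a textbook normal-approximation confidence interval for a mean, whereas a MapReduce setting needs more careful statistics. The function names online_mean and run_until are made up for illustration.

```python
import math
import random


def online_mean(values, z=1.96, report_every=1000):
    """Yield (rows_seen, estimate, half_width) while scanning values.

    Assumes the values arrive in random order, so the rows seen so far
    behave like a random sample of the full data set.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)
            half_width = z * math.sqrt(variance / n)
            yield n, mean, half_width


def run_until(values, target_half_width, **kwargs):
    """Stop scanning as soon as the confidence bound is tight enough."""
    last = None
    for last in online_mean(values, **kwargs):
        if last[2] <= target_half_width:
            break
    return last


rng = random.Random(0)
data = [rng.gauss(50.0, 10.0) for _ in range(100_000)]
rows, estimate, half_width = run_until(data, target_half_width=0.5)
print(f"stopped after {rows} rows: {estimate:.2f} +/- {half_width:.2f}")
```

The early return in run_until corresponds to the user killing the job once the bounds are acceptable, so only a fraction of the input is ever processed.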

Internship Projects

Table Analysis Tools (TAT) for Cloud (Summer 2008 Internship Project @ SQL Server Data Mining)

TAT Cloud is a set of canned data mining tasks that you can use without having SQL Server installed on your machine. It consists of encapsulations of some common data mining problems, such as detecting key influencers, forecasting, generating predictive scorecards, or doing market basket analysis. The tasks can be executed from a browser as well as from Excel 2007 (after installing the TAT add-in).

Spoken Web (Summer 2011 Internship Project @ IBM Research Lab)

Spoken Web is an alternative Web for low-literacy users in the developing world. People can create audio content over the phone and share it on the Spoken Web. This enables easy creation of locally relevant content.

Embedded Web Server using VxWorks Real Time Operating System (@ ECIL Hyderabad as part of PG Diploma)

Acts as a standalone web server with a remote file system. Since it is booted via RS-232 (serial port), it does not require a hard disk.

Sure Serve (Server Monitoring Utility) (BE Final year Project @ Rediff)

Allows the administrator to monitor server performance based on specified parameters. It comprises modules (TCP, HTTP, Database and Application) that monitor the major functional areas of a typical web server. It plots the parameters in real time using Multi Router Traffic Grapher.

Industry Projects

Usage Reporting of Hotmail data (Data warehouse)

Gathers data directly from product teams, transforms and loads it into a data warehouse for aggregation, and generates reports for partners.

ERM (Employee Resource Management) website

An Ajax-based web application through which MAQSoftware manages its employee timesheet details, project resource allocation, and report generation.

Crystal (for Swedish Sleep Institute)

Improves 14 existing legacy applications and migrates them to ASP.NET and SQL Server, so that they are accessible and housed under a single dashboard with a single sign-on.

Pet Projects

yadmt (Yet Another Data Mining Tool)

A tool to find the best classifier for your dataset using statistical tests suggested in the machine learning literature. The user of this tool does not need to know about the inner workings of the classifiers or the statistical tests. The main design goals of this tool are efficiency and load balancing. It is designed to work on a single server or on a cluster (that might be shared by multiple users), which covers the setup of most research labs. Here is a link to the demo.


Voca

Voca is a desktop app that is designed to run in the background, with minimal user interaction or interference, and that allows users to issue voice commands. Here is a link to the demo.

For the entire list of my projects, visit my LinkedIn page.