Design and analysis of efficient algorithms has been recognized as the common fabric of computer science for at least six decades. It has shaped and defined many important subareas of computer science that may not appear algorithmic but have underlying algorithmic layers. The primary motivation is to understand the computational complexity and relative hardness of various problems in specific computational frameworks such as the RAM, parallel, distributed and streaming models. The class of intractable problems (those requiring super-polynomial running time) has led to many interesting paradigms, such as approximation and fixed-parameter tractable algorithms, that go beyond the traditional worst-case complexity measures. While intractability is a challenge, it has been cleverly leveraged to develop many secure cryptographic protocols such as RSA, where the asymmetry in hardness between ciphering and deciphering is the key.
Given that much of the recent progress across all fields, including AI, ML and deep learning, is fuelled by computational methods, a basic understanding of algorithms and complexity is essential know-how for all researchers, not just theoreticians. The problems arise from diverse areas such as graph theory, geometry, algebra, numerical analysis and number theory, and it is quite challenging to explore the connections between problems from such diverse domains. The use of random sampling and randomization has been spectacularly successful in this endeavor, as has work on the critical question of efficient derandomization. Pursuing research in this field requires a deep understanding of algorithmic techniques, mathematical maturity and a zest for problem solving.
Approximation Algorithms/Geometry/Randomization
Dynamic data structures
Alternate computation models – Memory hierarchy/Parallel/Streaming/Distribution sensitive
Multi-party computation
We are especially interested in cases where things go wrong in MPC protocols, and in how accountability can be achieved in such scenarios. This interest extends to abuse in secure messaging protocols, again with a focus on designing trapdoors that reveal identities or provide some form of disincentive once illegal or abusive actions (or content) have been found to exist. A parallel question is that of producing a verifiable proof of correct computation in cases where no such “bad” actions occurred.
One theoretical focus is to find specific computational problems that are MPC- or TEE-hard in practice, such as functions requiring many memory lookups.
Post-quantum Cryptography
Lattice-based cryptography is the use of conjectured hard problems on point lattices in $\mathbb{R}^{n}$ as the foundation for secure cryptographic systems. Attractive features of lattice-based cryptography include apparent resistance to quantum attacks, security under worst-case intractability assumptions, high asymptotic efficiency and parallelism, and solutions to long-standing open problems in cryptography. Recent trends, such as the NIST initiative to standardize post-quantum cryptography, point to large-scale adoption of lattice-based cryptography in the near future. Our current efforts focus on analyzing state-of-the-art approaches for constructing lattice-based public-key primitives such as public-key encryption and signature schemes.
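As a concrete illustration of the kind of primitive involved, the following is a minimal toy sketch of Regev-style public-key encryption based on the learning-with-errors (LWE) problem. The parameters (n, m, q and the error range) are tiny and purely illustrative, nowhere near a secure instantiation, and the sketch is not the specific schemes studied in our work.

```python
import numpy as np

# Toy Regev-style LWE public-key encryption of a single bit.
# Parameters are tiny and purely illustrative; this is NOT a secure instantiation.
n, m, q = 16, 64, 4093          # secret dimension, number of samples, modulus
rng = np.random.default_rng(0)

def keygen():
    s = rng.integers(0, q, size=n)            # secret key
    A = rng.integers(0, q, size=(m, n))       # uniform public matrix
    e = rng.integers(-2, 3, size=m)           # small error terms
    b = (A @ s + e) % q                       # LWE samples: b = A s + e
    return (A, b), s

def encrypt(pk, bit):
    A, b = pk
    r = rng.integers(0, 2, size=m)            # random 0/1 combining vector
    u = (r @ A) % q
    v = (r @ b + bit * (q // 2)) % q          # encode the bit in the "high" half
    return u, v

def decrypt(sk, ct):
    u, v = ct
    d = (v - u @ sk) % q                      # equals bit*(q//2) + small noise r.e
    return int(abs(int(d) - q // 2) < q // 4) # decode by rounding

pk, sk = keygen()
for bit in (0, 1):
    assert decrypt(sk, encrypt(pk, bit)) == bit
```

Decryption works because the accumulated noise r·e stays well below q/4 for these parameter choices, so rounding recovers the encoded bit.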
As part of the PQC migration, we also explore techniques for integrating post-quantum cryptography into well-known network security protocols such as TLS, DNSSEC, and IPsec to achieve post-quantum security.
Privacy Enhancing Cryptography / Electronic voting
This field encompasses advanced cryptographic tools that allow parties to interact effectively toward a specific application goal without disclosing unnecessary private information to each other or to third parties. Our focus is on PEC tools and concepts, such as Mixnets, Anonymous Credentials, Zero-Knowledge Proofs, Blind Signatures, and Private Set Intersection. We delve into the design and analysis of these tools, as well as their applications in areas like electronic voting, anonymous account ownership, anonymous communication, privacy-preserving authentication and transactions, and decentralized identity.
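As one small illustration of such a tool, here is a minimal sketch of a Diffie-Hellman-style private set intersection flow, in which each party masks hashed elements with a secret exponent so that only doubly masked values are compared. The modulus, hashing and message flow are simplified for illustration; this is not a secure or vetted implementation of any of the protocols mentioned above.

```python
import hashlib
import secrets

# Toy Diffie-Hellman-style PSI flow (illustrative only; not a secure or vetted
# implementation).  Each party masks hashed items with a secret exponent, so only
# doubly masked values H(x)^(a*b) are ever compared.

P = 2**521 - 1   # toy modulus; a real protocol would use a proper prime-order group

def h2g(item: str) -> int:
    """Hash an item into the multiplicative group modulo P."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def psi(alice_set, bob_set):
    a = secrets.randbelow(P - 2) + 1      # Alice's secret exponent
    b = secrets.randbelow(P - 2) + 1      # Bob's secret exponent

    # Round 1: Alice sends H(x)^a for her items; Bob sends H(y)^b for his.
    masked_a = {x: pow(h2g(x), a, P) for x in alice_set}
    masked_b = [pow(h2g(y), b, P) for y in bob_set]

    # Round 2: Bob raises Alice's masked values to b and returns them in order,
    # so Alice keeps the association; Alice raises Bob's values to a.
    double_a = {x: pow(v, b, P) for x, v in masked_a.items()}   # H(x)^(ab)
    double_b = {pow(v, a, P) for v in masked_b}                 # H(y)^(ab)

    # Alice learns exactly those items whose double-masked value appears in both.
    return {x for x, v in double_a.items() if v in double_b}

print(psi({"alice@x.org", "carol@x.org"}, {"carol@x.org", "dave@x.org"}))
# -> {'carol@x.org'}
```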
Natural language processing
Though natural language processing has seen huge advances in recent years, especially with generative language models, these models still suffer from hallucinations, lack of interpretability and poor causal attribution. This makes them unreliable for practical applications in predictive analysis and decision making. Our research focuses on knowledge-guided language processing mechanisms to ensure reliable linguistic outcomes. Our current areas of interest span computational analysis for food computing, physical and mental health, sustainability and legal content.
Sequential learning
Efficient or optimal sequential learning is critical in many applications, including clinical trials, Internet advertising and recommendation systems, among others. In the vanilla multi-armed bandit form, one considers several unknown probability distributions (arms) that can be sampled, where each sample from an arm yields a distribution-dependent reward. The aim may be to sample the arms so as to maximise the overall expected reward, or to identify the arm with the best performance metric as quickly as possible while providing performance guarantees. Our work focuses on developing provably efficient algorithms for such problems.
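As a concrete illustration of the reward-maximisation variant, the sketch below runs the classical UCB1 index policy on a few Bernoulli arms. The arm means, horizon and seed are made up for illustration, and the policy shown is a textbook baseline rather than any of our algorithms.

```python
import numpy as np

# Minimal UCB1 sketch on Bernoulli arms (arm means, horizon and seed are illustrative).
rng = np.random.default_rng(1)
true_means = [0.3, 0.5, 0.7]              # unknown to the learner
K, horizon = len(true_means), 10_000

counts = np.zeros(K)                      # number of pulls of each arm
sums = np.zeros(K)                        # cumulative reward of each arm

for t in range(1, horizon + 1):
    if t <= K:                            # pull each arm once to initialise
        arm = t - 1
    else:                                 # index = empirical mean + exploration bonus
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.random() < true_means[arm]   # Bernoulli reward
    counts[arm] += 1
    sums[arm] += reward

regret = horizon * max(true_means) - sums.sum()
print("pulls per arm:", counts.astype(int), " cumulative regret approx:", round(regret, 1))
```

The exploration bonus shrinks as an arm is pulled more often, so suboptimal arms are sampled only logarithmically often in the horizon.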
Weather modelling
Accurate rainfall prediction in India during the monsoons is crucial for a variety of reasons: agricultural planning, disaster management, day-to-day transportation planning and so on. Anecdotally, it is well known that international numerical weather prediction (NWP) models do not perform well for rainfall prediction over India. It is also conjectured that, during the monsoons, rainfall across India has spatio-temporal memory, so that earlier rainfall information from neighbouring regions may be useful for future rainfall prediction. Moreover, rainfall has been shown to be affected by a variety of other atmospheric, soil and ocean variables, such as temperature, wind and soil moisture. In our work, we consider daily gridded precipitation data from the India Meteorological Department (IMD) and use it to predict rainfall for all of India, one day as well as three days into the future. We also use daily atmospheric and soil data as additional covariates in an attempt to improve our forecasts. We compare our performance with popular operational NWP forecasts and observe that our approach leads to substantial improvements in prediction.
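The sketch below illustrates the general shape of such a spatio-temporal forecasting pipeline on synthetic data standing in for the IMD grids: lagged rainfall at a cell and its neighbourhood is used as features for a one-day-ahead prediction, compared against a persistence baseline. The grid size, features and ridge-regression model are ours for illustration and are not the models used in our work.

```python
import numpy as np

# Sketch of a spatio-temporal rainfall forecaster on synthetic data standing in
# for the IMD grids: lagged rainfall at a cell and its 3x3 neighbourhood feeds a
# one-day-ahead ridge regression, compared against a persistence baseline.
# Grid size, features and model are illustrative choices, not our actual models.

rng = np.random.default_rng(0)
T, H, W = 200, 15, 15                                   # days, grid height, grid width
rain = rng.gamma(shape=0.5, scale=5.0, size=(T, H, W))  # synthetic daily rainfall (mm)

def features(t, i, j, lags=3):
    """Lagged rainfall at cell (i, j) and its neighbourhood over the previous `lags` days."""
    feats = []
    for lag in range(1, lags + 1):
        patch = rain[t - lag, max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
        feats += [patch.mean(), patch.max(), rain[t - lag, i, j]]
    return feats

lags, lam = 3, 1.0
X, y = [], []
for t in range(lags, T):
    for i in range(H):
        for j in range(W):
            X.append(features(t, i, j, lags))
            y.append(rain[t, i, j])                     # one-day-ahead target
X, y = np.array(X), np.array(y)

w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)   # ridge, closed form
pred = X @ w
print("model RMSE:      ", np.sqrt(np.mean((pred - y) ** 2)))
print("persistence RMSE:", np.sqrt(np.mean((X[:, 2] - y) ** 2)))   # yesterday's rain at the cell
```

A real pipeline would of course use a proper train/test split in time, multi-day-ahead targets and the additional atmospheric and soil covariates mentioned above.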
Computational radiology
In collaboration with the radiology group at AIIMS, New Delhi, we investigate improved detection of breast cancer from mammograms in situations where the presentation is obscure and difficult. We also investigate the use of auxiliary information to improve detection accuracy.
In a one-off problem, we also addressed detection of COVID-19 from chest X-rays in a hospital setting.
Computer vision and imaging
We look at a variety of computer vision and imaging problems, including object detection and recognition, segmentation, regression from satellite images, localisation, image generation and style transfer, and single-view reconstruction. We also study adversarial attacks and the robustness and reliability of machine learning models.
We are fundamentally interested in acquiring, curating and integrating large amounts of heterogeneous and multi-modal data for applications in diverse domains such as politics, economics, archaeology, biology and climate science. Our approach is twofold.
Analysis and conceptual design of digital public infrastructures
Governments around the world – and India in particular – are trying to build large data registries for effective delivery of a variety of public services. However, these efforts are often undermined due to serious concerns over trust and privacy risks associated with collection and processing of personally identifiable information, and the possibilities of exclusion due to unsafe use cases. While a rich set of special-purpose privacy-preserving techniques exists in computer science, they are unable to provide end-to-end protection in alignment with legal principles in the absence of an overarching operational architecture to ensure purpose limitation and protection against insider attacks.
We investigate the issues in designing an operational architecture for privacy-by-design and safety analysis of use cases.
Private modes of machine learning and aggregation
We focus on settings in which multiple entities collaboratively train a model while ensuring that their data remains decentralised.
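A minimal federated-averaging sketch of this setting is given below: clients fit a shared model on their own data and exchange only model parameters, never raw data. The linear model, synthetic data and hyperparameters are purely illustrative and do not represent our methods.

```python
import numpy as np

# Minimal federated-averaging sketch: clients fit a shared linear model on their
# own data and exchange only model parameters, never raw data.  The model, data
# and hyperparameters are synthetic and purely illustrative.

rng = np.random.default_rng(0)
d, n_clients, rounds, local_steps, lr = 5, 4, 50, 10, 0.1
true_w = rng.normal(size=d)

# Each client holds its own private dataset.
client_data = []
for _ in range(n_clients):
    X = rng.normal(size=(200, d))
    y = X @ true_w + 0.1 * rng.normal(size=200)
    client_data.append((X, y))

global_w = np.zeros(d)
for _ in range(rounds):
    local_models = []
    for X, y in client_data:                      # local training stays on the client
        w = global_w.copy()
        for _ in range(local_steps):
            grad = 2 * X.T @ (X @ w - y) / len(y) # gradient of the local squared loss
            w -= lr * grad
        local_models.append(w)
    global_w = np.mean(local_models, axis=0)      # server averages parameters only

print("distance to true model:", np.linalg.norm(global_w - true_w))
```

In practice the parameter exchange itself leaks information, which is why such schemes are combined with secure aggregation or differential privacy.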
Elections and digitalisation
India’s parliamentary election is the largest in the world, with 543 constituencies and, on average, well over 1 million voters per constituency; voting has been conducted electronically since 2004. However, there is considerable doubt about the integrity of both the Electronic Voting Machines (EVMs) used by the Election Commission of India (ECI) and the procedures for maintaining and updating voter lists.
We analyse the ECI solutions from the points of view of verifiability and compliance with democratic principles. We also investigate
Modeling rare events that can potentially have catastrophic consequences is increasingly important in our hi-tech driven world. Applications are varied and include accidents involving autonomous vehicles, failures of large electrical networks and communication systems, earthquakes and so on. These events are typically studied by developing stochastic simulation models of the underlying phenomenon, which are then simulated to understand the performance of the stochastic system. Unfortunately, when rare events are involved, the computational effort required to generate enough rare samples to reliably estimate performance statistics can be massive and computationally prohibitive.

Importance sampling involves changing the sampling probability measure so that the underlying rare event is no longer rare, and correcting the output with a likelihood ratio. Unfortunately, the likelihood ratio can be noisy, and only for special classes of rare events, primarily those involving interactions of random walks, does one know a change of measure under which the resulting output is well behaved. In such cases the efficiency gained by this methodology can be dramatic, making rare-event simulation a viable technology. In most applications, however, the correct change of measure is difficult to identify; indeed, identifying such a measure for a wide class of stochastic processes remains a difficult open problem. Fortunately, in many settings, including the popular setting of diffusions, one can characterise the ideal change of measure that has zero variance. Doing so involves solving a Hamilton-Jacobi-Bellman (HJB) partial differential equation, a computationally difficult exercise in high dimensions.

In our work we developed a methodology to learn a solution to this PDE using a deep neural network architecture. We further developed algorithms in which the rarity of the event is increased according to a schedule, so that the neural network is first trained on less rare events, where learning is more effective, and then on rarer ones. We demonstrate the effectiveness of the designed techniques on popular examples of diffusions and show orders-of-magnitude improvements over existing methods. In short, we developed deep-learning-based function approximation methods to efficiently simulate rare events associated with certain diffusions.
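As a minimal illustration of the basic importance-sampling idea described above (not of the deep-learning/HJB approach), the sketch below estimates a Gaussian tail probability by sampling from a mean-shifted proposal and reweighting with the likelihood ratio. The event and parameters are made up for illustration.

```python
import math
import numpy as np

# Estimate the rare-event probability p = P(X > 4) for X ~ N(0, 1) by sampling
# from a mean-shifted proposal N(4, 1), under which the event is no longer rare,
# and correcting with the likelihood ratio.  Event and parameters are illustrative.

rng = np.random.default_rng(0)
a, n = 4.0, 100_000

# Naive Monte Carlo: almost no samples fall in the rare region.
x = rng.normal(0.0, 1.0, n)
naive = np.mean(x > a)

# Importance sampling: sample y ~ N(a, 1) and reweight by the density ratio
# N(0,1)/N(a,1) evaluated at y, which equals exp(-a*y + a^2/2).
y = rng.normal(a, 1.0, n)
lr = np.exp(-a * y + a**2 / 2)
is_est = np.mean((y > a) * lr)

exact = 0.5 * math.erfc(a / math.sqrt(2.0))    # exact Gaussian tail for comparison
print(f"exact      = {exact:.3e}")
print(f"naive MC   = {naive:.3e}")
print(f"importance = {is_est:.3e}")
```

With the same sample budget, the naive estimate is dominated by a handful of hits while the importance-sampling estimate is accurate to several digits; the whole difficulty in general models lies in finding a comparably good change of measure.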
Analysis of large-scale media data using state-of-the-art computational methods is an active area of research. Media data is generated and updated on a daily basis, across varied socio-political and geographic contexts. The following are some of the research questions we could investigate along the lines of media analysis:
Political Economy Analysis:
Analysis of Misinformation: