Design and analysis of efficient algorithms has been recognized as the common fabric of computer science for at least six decades. It has shaped and defined many important subareas of computer science that may not appear algorithmic but have underlying algorithmic layers. The primary motivation is to understand the computational complexity and relative hardness of various problems in specific computational frameworks such as the RAM, parallel, distributed and streaming models. The class of intractable problems (those requiring super-polynomial running time) has led to many interesting paradigms, such as approximation and fixed-parameter tractable algorithms, that go beyond the traditional worst-case complexity measures. While intractability is a challenge, it has been cleverly leveraged to develop many secure cryptographic protocols such as RSA, where the asymmetry in hardness between ciphering and deciphering is the key.
Given that much of the recent progress across all fields, including AI, ML and deep learning, is fuelled by computational methods, a basic understanding of algorithms and complexity is essential know-how for all researchers, not just theoreticians. The problems arise from diverse areas such as graph theory, geometry, algebra, numerical analysis and number theory, and it is quite challenging to explore the connections between problems from such diverse domains. The use of random sampling and randomization has been spectacularly successful in this endeavor, as has work on the critical question of efficient derandomization. Pursuing research in this field requires a deep understanding of algorithmic techniques, mathematical maturity and a zest for problem solving.
Approximation Algorithms/Geometry/Randomization
Dynamic data structures
Alternate computation models – Memory hierarchy/Parallel/Streaming/Distribution sensitive
Multi-party computation
We are especially interested in cases where things go wrong in MPC protocols, and in how accountability can be achieved in such scenarios. This interest extends to abuse in secure messaging protocols, again with a focus on designing trapdoors that reveal identities or provide some form of disincentive once illegal or abusive actions (or content) have been found to exist. A parallel question is that of producing a verifiable proof of correct computation in cases where no such “bad” actions occurred.
One theoretical focus is to find specific computational problems that are MPC- or TEE-hard in practice, such as functions requiring many memory lookups.
Post-quantum Cryptography
Lattice-based cryptography is the use of conjectured hard problems on point lattices in $\mathbb{R}^{n}$ as the foundation for secure cryptographic systems. Attractive features of lattice-based cryptography include apparent resistance to quantum attacks, security under worst-case intractability assumptions, high asymptotic efficiency and parallelism, and solutions to long-standing open problems in cryptography. Recent trends, such as the NIST initiative to standardize post-quantum cryptography, point to large-scale adoption of lattice-based cryptography in the near future. Our current efforts focus on analyzing state-of-the-art approaches for constructing lattice-based public-key primitives such as public-key encryption and signature schemes.
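As a concrete illustration of the kind of primitive involved, the following is a minimal toy sketch of Regev-style public-key encryption based on the learning-with-errors (LWE) problem. The parameters (n, m, q and the error range) are tiny and purely illustrative, nowhere near a secure instantiation, and the sketch is not the specific schemes studied in our work.

```python
import numpy as np

# Toy Regev-style LWE public-key encryption of a single bit.
# Parameters are tiny and purely illustrative; this is NOT a secure instantiation.
n, m, q = 16, 64, 4093          # secret dimension, number of samples, modulus
rng = np.random.default_rng(0)

def keygen():
    s = rng.integers(0, q, size=n)            # secret key
    A = rng.integers(0, q, size=(m, n))       # uniform public matrix
    e = rng.integers(-2, 3, size=m)           # small error terms
    b = (A @ s + e) % q                       # LWE samples: b = A s + e
    return (A, b), s

def encrypt(pk, bit):
    A, b = pk
    r = rng.integers(0, 2, size=m)            # random 0/1 combining vector
    u = (r @ A) % q
    v = (r @ b + bit * (q // 2)) % q          # encode the bit in the "high" half
    return u, v

def decrypt(sk, ct):
    u, v = ct
    d = (v - u @ sk) % q                      # equals bit*(q//2) + small noise r.e
    return int(abs(int(d) - q // 2) < q // 4) # decode by rounding

pk, sk = keygen()
for bit in (0, 1):
    assert decrypt(sk, encrypt(pk, bit)) == bit
```

Decryption works because the accumulated noise r·e stays well below q/4 for these parameter choices, so rounding recovers the encoded bit.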
As part of the PQC migration, we also explore techniques for integrating post-quantum cryptography into well-known network security protocols such as TLS, DNSSEC, and IPsec to achieve post-quantum security.
Privacy Enhancing Cryptography / Electronic voting
This field encompasses advanced cryptographic tools that allow parties to interact effectively toward a specific application goal without disclosing unnecessary private information to each other or to third parties. Our focus is on PEC tools and concepts, such as Mixnets, Anonymous Credentials, Zero-Knowledge Proofs, Blind Signatures, and Private Set Intersection. We delve into the design and analysis of these tools, as well as their applications in areas like electronic voting, anonymous account ownership, anonymous communication, privacy-preserving authentication and transactions, and decentralized identity.
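As one small illustration of such a tool, here is a minimal sketch of a Diffie-Hellman-style private set intersection flow, in which each party masks hashed elements with a secret exponent so that only doubly masked values are compared. The modulus, hashing and message flow are simplified for illustration; this is not a secure or vetted implementation of any of the protocols mentioned above.

```python
import hashlib
import secrets

# Toy Diffie-Hellman-style PSI flow (illustrative only; not a secure or vetted
# implementation).  Each party masks hashed items with a secret exponent, so only
# doubly masked values H(x)^(a*b) are ever compared.

P = 2**521 - 1   # toy modulus; a real protocol would use a proper prime-order group

def h2g(item: str) -> int:
    """Hash an item into the multiplicative group modulo P."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

def psi(alice_set, bob_set):
    a = secrets.randbelow(P - 2) + 1      # Alice's secret exponent
    b = secrets.randbelow(P - 2) + 1      # Bob's secret exponent

    # Round 1: Alice sends H(x)^a for her items; Bob sends H(y)^b for his.
    masked_a = {x: pow(h2g(x), a, P) for x in alice_set}
    masked_b = [pow(h2g(y), b, P) for y in bob_set]

    # Round 2: Bob raises Alice's masked values to b and returns them in order,
    # so Alice keeps the association; Alice raises Bob's values to a.
    double_a = {x: pow(v, b, P) for x, v in masked_a.items()}   # H(x)^(ab)
    double_b = {pow(v, a, P) for v in masked_b}                 # H(y)^(ab)

    # Alice learns exactly those items whose double-masked value appears in both.
    return {x for x, v in double_a.items() if v in double_b}

print(psi({"alice@x.org", "carol@x.org"}, {"carol@x.org", "dave@x.org"}))
# -> {'carol@x.org'}
```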
Natural language processing
Though natural language processing has seen huge advances in recent years, especially with generative language models, these models still suffer from hallucinations, lack of interpretability and poor causal attribution. This makes them unreliable for practical applications in predictive analysis and decision making. Our research focuses on knowledge-guided language processing mechanisms to ensure reliable linguistic outcomes. Our current areas of interest span computational analysis for food computing, physical and mental health, sustainability and legal content.
Sequential learning
Efficient or optimal sequential learning is critical in many applications, including clinical trials, Internet advertising and recommendation systems, among others. In the vanilla multi-armed bandit form, one considers several unknown probability distributions (arms) that can be sampled, where each sample from an arm yields a distribution-dependent reward. The aim may be to sample the arms so as to maximise the overall expected reward, or to identify the arm with the best performance metric as quickly as possible while providing performance guarantees. Our work focuses on developing provably efficient algorithms for such problems.
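As a concrete illustration of the reward-maximisation variant, the sketch below runs the classical UCB1 index policy on a few Bernoulli arms. The arm means, horizon and seed are made up for illustration, and the policy shown is a textbook baseline rather than any of our algorithms.

```python
import numpy as np

# Minimal UCB1 sketch on Bernoulli arms (arm means, horizon and seed are illustrative).
rng = np.random.default_rng(1)
true_means = [0.3, 0.5, 0.7]              # unknown to the learner
K, horizon = len(true_means), 10_000

counts = np.zeros(K)                      # number of pulls of each arm
sums = np.zeros(K)                        # cumulative reward of each arm

for t in range(1, horizon + 1):
    if t <= K:                            # pull each arm once to initialise
        arm = t - 1
    else:                                 # index = empirical mean + exploration bonus
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.random() < true_means[arm]   # Bernoulli reward
    counts[arm] += 1
    sums[arm] += reward

regret = horizon * max(true_means) - sums.sum()
print("pulls per arm:", counts.astype(int), " cumulative regret approx:", round(regret, 1))
```

The exploration bonus shrinks as an arm is pulled more often, so suboptimal arms are sampled only logarithmically often in the horizon.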
Weather modelling
Accurate rainfall prediction in India during the monsoons is crucial for a variety of reasons: agricultural planning, disaster management, day-to-day transportation planning and so on. Anecdotally, it is well known that international numerical weather prediction (NWP) models do not perform well for rainfall prediction over India. It is also conjectured that, during the monsoons, rainfall across India has spatio-temporal memory, so that earlier rainfall information from neighbouring regions may be useful for future rainfall prediction. Moreover, rainfall has been shown to be affected by a variety of other atmospheric, soil and ocean variables, such as temperature, wind and soil moisture. In our work, we consider daily gridded precipitation data from the India Meteorological Department (IMD) and use it to predict rainfall for all of India, one day as well as three days into the future. We also use daily atmospheric and soil data as additional covariates in an attempt to improve our forecasts. We compare our performance with popular operational NWP forecasts and observe that our approach leads to substantial improvements in prediction.
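The sketch below illustrates the general shape of such a spatio-temporal forecasting pipeline on synthetic data standing in for the IMD grids: lagged rainfall at a cell and its neighbourhood is used as features for a one-day-ahead prediction, compared against a persistence baseline. The grid size, features and ridge-regression model are ours for illustration and are not the models used in our work.

```python
import numpy as np

# Sketch of a spatio-temporal rainfall forecaster on synthetic data standing in
# for the IMD grids: lagged rainfall at a cell and its 3x3 neighbourhood feeds a
# one-day-ahead ridge regression, compared against a persistence baseline.
# Grid size, features and model are illustrative choices, not our actual models.

rng = np.random.default_rng(0)
T, H, W = 200, 15, 15                                   # days, grid height, grid width
rain = rng.gamma(shape=0.5, scale=5.0, size=(T, H, W))  # synthetic daily rainfall (mm)

def features(t, i, j, lags=3):
    """Lagged rainfall at cell (i, j) and its neighbourhood over the previous `lags` days."""
    feats = []
    for lag in range(1, lags + 1):
        patch = rain[t - lag, max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
        feats += [patch.mean(), patch.max(), rain[t - lag, i, j]]
    return feats

lags, lam = 3, 1.0
X, y = [], []
for t in range(lags, T):
    for i in range(H):
        for j in range(W):
            X.append(features(t, i, j, lags))
            y.append(rain[t, i, j])                     # one-day-ahead target
X, y = np.array(X), np.array(y)

w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)   # ridge, closed form
pred = X @ w
print("model RMSE:      ", np.sqrt(np.mean((pred - y) ** 2)))
print("persistence RMSE:", np.sqrt(np.mean((X[:, 2] - y) ** 2)))   # yesterday's rain at the cell
```

A real pipeline would of course use a proper train/test split in time, multi-day-ahead targets and the additional atmospheric and soil covariates mentioned above.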
Computational radiology
In collaboration with the radiology group at AIIMS, New Delhi, we investigate improved detection of breast cancer from mammograms in situations where the presentation is obscure and difficult. We also investigate the use of auxiliary information to improve detection accuracy.
In a one-off problem, we also addressed detection of COVID-19 from chest X-rays in a hospital setting.
Computer vision and imaging
We look at a variety of computer vision and imaging problems, including object detection and recognition, segmentation, regression from satellite images, localisation, image generation and style transfer, and single-view reconstruction. We also study adversarial attacks and the robustness and reliability of machine learning models.
We are fundamentally interested in acquiring, curating and integrating large amounts of heterogeneous and multi-modal data for applications in diverse domains such as politics, economics, archaeology, biology and climate science. Our approach is twofold.
Analysis and conceptual design of digital public infrastructures
Governments around the world – and India in particular – are trying to build large data registries for effective delivery of a variety of public services. However, these efforts are often undermined due to serious concerns over trust and privacy risks associated with collection and processing of personally identifiable information, and the possibilities of exclusion due to unsafe use cases. While a rich set of special-purpose privacy-preserving techniques exists in computer science, they are unable to provide end-to-end protection in alignment with legal principles in the absence of an overarching operational architecture to ensure purpose limitation and protection against insider attacks.
We investigate the issues in designing an operational architecture for privacy-by-design and safety analysis of use cases.
Private modes of machine learning and aggregation
We focus on settings in which multiple entities collaboratively train a model while ensuring that their data remains decentralised.
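A minimal federated-averaging sketch of this setting is given below: clients fit a shared model on their own data and exchange only model parameters, never raw data. The linear model, synthetic data and hyperparameters are purely illustrative and do not represent our methods.

```python
import numpy as np

# Minimal federated-averaging sketch: clients fit a shared linear model on their
# own data and exchange only model parameters, never raw data.  The model, data
# and hyperparameters are synthetic and purely illustrative.

rng = np.random.default_rng(0)
d, n_clients, rounds, local_steps, lr = 5, 4, 50, 10, 0.1
true_w = rng.normal(size=d)

# Each client holds its own private dataset.
client_data = []
for _ in range(n_clients):
    X = rng.normal(size=(200, d))
    y = X @ true_w + 0.1 * rng.normal(size=200)
    client_data.append((X, y))

global_w = np.zeros(d)
for _ in range(rounds):
    local_models = []
    for X, y in client_data:                      # local training stays on the client
        w = global_w.copy()
        for _ in range(local_steps):
            grad = 2 * X.T @ (X @ w - y) / len(y) # gradient of the local squared loss
            w -= lr * grad
        local_models.append(w)
    global_w = np.mean(local_models, axis=0)      # server averages parameters only

print("distance to true model:", np.linalg.norm(global_w - true_w))
```

In practice the parameter exchange itself leaks information, which is why such schemes are combined with secure aggregation or differential privacy.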
Elections and digitalisation
India’s parliamentary election is the largest in the world, with 543 constituencies and, on average, well over 1 million voters per constituency; voting has been conducted electronically since 2004. However, there is considerable doubt about the integrity of both the Electronic Voting Machines (EVMs) used by the Election Commission of India (ECI) and the procedures for maintaining and updating voter lists.
We analyse the ECI solutions from the points of view of verifiability and compliance with democratic principles. We also investigate
Modeling rare events that can potentially have catastrophic consequences is increasingly important in our hi-tech driven world. Applications are varied and include accidents involving autonomous vehicles, failures of large electrical networks and communication systems, earthquakes and so on. These events are typically studied by developing stochastic simulation models of the underlying phenomenon, which are then simulated to understand the performance of the stochastic system. Unfortunately, when rare events are involved, the computational effort required to generate enough rare samples to reliably estimate performance statistics can be massive and computationally prohibitive.

Importance sampling involves changing the sampling probability measure so that the underlying rare event is no longer rare, and correcting the output with a likelihood ratio. Unfortunately, the likelihood ratio can be noisy, and only for special classes of rare events, primarily those involving interactions of random walks, does one know a change of measure under which the resulting output is well behaved. In such cases the efficiency gained by this methodology can be dramatic, making rare-event simulation a viable technology. In most applications, however, the correct change of measure is difficult to identify; indeed, identifying such a measure for a wide class of stochastic processes remains a difficult open problem. Fortunately, in many settings, including the popular setting of diffusions, one can characterise the ideal change of measure that has zero variance. Doing so involves solving a Hamilton-Jacobi-Bellman (HJB) partial differential equation, a computationally difficult exercise in high dimensions.

In our work we developed a methodology to learn a solution to this PDE using a deep neural network architecture. We further developed algorithms in which the rarity of the event is increased according to a schedule, so that the neural network is first trained on less rare events, where learning is more effective, and then on rarer ones. We demonstrate the effectiveness of the designed techniques on popular examples of diffusions and show orders-of-magnitude improvements over existing methods. In short, we developed deep-learning-based function approximation methods to efficiently simulate rare events associated with certain diffusions.
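As a minimal illustration of the basic importance-sampling idea described above (not of the deep-learning/HJB approach), the sketch below estimates a Gaussian tail probability by sampling from a mean-shifted proposal and reweighting with the likelihood ratio. The event and parameters are made up for illustration.

```python
import math
import numpy as np

# Estimate the rare-event probability p = P(X > 4) for X ~ N(0, 1) by sampling
# from a mean-shifted proposal N(4, 1), under which the event is no longer rare,
# and correcting with the likelihood ratio.  Event and parameters are illustrative.

rng = np.random.default_rng(0)
a, n = 4.0, 100_000

# Naive Monte Carlo: almost no samples fall in the rare region.
x = rng.normal(0.0, 1.0, n)
naive = np.mean(x > a)

# Importance sampling: sample y ~ N(a, 1) and reweight by the density ratio
# N(0,1)/N(a,1) evaluated at y, which equals exp(-a*y + a^2/2).
y = rng.normal(a, 1.0, n)
lr = np.exp(-a * y + a**2 / 2)
is_est = np.mean((y > a) * lr)

exact = 0.5 * math.erfc(a / math.sqrt(2.0))    # exact Gaussian tail for comparison
print(f"exact      = {exact:.3e}")
print(f"naive MC   = {naive:.3e}")
print(f"importance = {is_est:.3e}")
```

With the same sample budget, the naive estimate is dominated by a handful of hits while the importance-sampling estimate is accurate to several digits; the whole difficulty in general models lies in finding a comparably good change of measure.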
Analysis of large-scale media data using state-of-the-art computational methods is an active area of research. Media data is generated and updated on a daily basis, across varied socio-political and geographic contexts. The following are some of the research questions we could investigate along the lines of media analysis:
Political Economy Analysis:
Analysis of Misinformation: