Dr. Uzair Javaid

Dr. Uzair Javaid is the CEO and Co-Founder of Betterdata AI, a company focused on Programmable Synthetic Data generation using Generative AI and Privacy Engineering. Betterdata’s technology helps data science and engineering teams easily access and share sensitive customer/business data while complying with global data protection and AI regulations.
Previously, Uzair worked as a Software Engineer and Business Development Executive at Merkle Science (Series A, $20M+), where he developed taint analysis techniques for blockchain wallets.

Uzair has a strong academic background in Computer Science/Engineering, with a Ph.D. from the National University of Singapore (ranked in the world's top 10). His research focused on designing and analyzing blockchain-based cybersecurity solutions for cyber-physical systems, specializing in data security and privacy engineering techniques.

In one of his Ph.D. projects, he reverse-engineered the encryption algorithm of the Ethereum blockchain and ethically hacked 670 user wallets. His work has been cited 600+ times across 15+ publications in reputable international conferences and journals, and it has earned recognition that includes a Best Paper Award and scholarships.

In addition to his work at Betterdata AI, Uzair is an advisor at German Entrepreneurship Asia, providing guidance and expertise to support entrepreneurship initiatives in the Asian region. He is also active in paying it forward, having volunteered as a peer student support group member at the National University of Singapore and serving as a technical program committee member for the International Academy, Research, and Industry Association.

Using Privacy-Preserving Synthetic Data to Enhance Cybersecurity Ops for DHS

Dr. Uzair Javaid
April 29, 2025
Summary:
  • Betterdata’s Large Tabular Model (LTM) generated synthetic data with high fidelity and zero PII exposure.
  • LTM enables zero-shot and few-shot adaptability without constant retraining.
  • Built-in privacy audits ensure every synthetic dataset meets DHS’s strict compliance standards.
  • LTM accelerated cyber defense training, improved anomaly detection, and enabled secure interagency data sharing.

    In October 2024, Betterdata was the only startup in Southeast Asia, and one of four synthetic data startups worldwide, to be awarded a contract to build synthetic data generation capabilities for the US Department of Homeland Security (DHS). Under this contract, we enabled DHS to train ML models on high-quality, high-utility, non-anonymized data without compromising or risking data privacy.

    What Challenges Did DHS Face?

    The Department of Homeland Security faced two critical problems when training AI and ML models to improve cybersecurity:

    Restrictive Data Sharing:

    Operational data is often sensitive, and privacy, security, and regulatory constraints prevent it from being shared across departments.

    Ineffective Data Protection:

    Traditional anonymization techniques do not fully protect against re-identification, which limits the ability to conduct effective cybersecurity and infrastructure protection exercises.

    As a result, DHS needed high-quality data that it could use to train ML models, test critical systems, and simulate real-world scenarios without exposing sensitive information.

    Our Solution: The Large Tabular Model (LTM)

    With our foundation model, the Large Tabular Model (LTM), DHS can generate high-fidelity synthetic data that mirrors the statistical properties of real datasets while protecting sensitive information and personally identifiable information (PII).

    Zero-shot and few-shot adaptability: 

    Unlike rule-based or deep-learning-based synthetic data generators, which must be reconfigured or retrained for each use case, LTM can adapt to new datasets with minimal input.

    Built-in privacy audits: 

    Every generated synthetic dataset undergoes a rigorous privacy assessment, ensuring compliance with DHS’s stringent security and privacy standards.
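    The post does not describe what Betterdata's privacy audits measure. One widely used check for synthetic tabular data is distance to closest record (DCR): for each synthetic row, find how close it sits to the nearest real row, and flag rows that are near-copies of real records. The stdlib-only sketch below illustrates that idea under stated assumptions; the function names, the threshold, and the example rows are hypothetical, and features are assumed to be numeric and already scaled to comparable ranges.

```python
import math
from typing import Dict, List, Sequence

Row = Sequence[float]


def distance_to_closest_record(synthetic: Sequence[Row], real: Sequence[Row]) -> List[float]:
    """For each synthetic row, return its Euclidean distance to the nearest real row."""
    return [min(math.dist(s, r) for r in real) for s in synthetic]


def dcr_audit(synthetic: Sequence[Row], real: Sequence[Row], threshold: float) -> Dict[str, object]:
    """Flag synthetic rows that sit closer than `threshold` to some real row.

    Assumes numeric features already scaled to comparable ranges (e.g. [0, 1]);
    otherwise columns with large magnitudes dominate the distance.
    A distance of 0.0 means a real record was reproduced verbatim.
    """
    dcr = distance_to_closest_record(synthetic, real)
    flagged = [i for i, d in enumerate(dcr) if d < threshold]
    return {"min_dcr": min(dcr), "flagged_rows": flagged, "passed": not flagged}


# Hypothetical, pre-scaled example: the second synthetic row copies a real row.
real = [[0.2, 0.5], [0.8, 0.1], [0.4, 0.9]]
synthetic = [[0.25, 0.55], [0.8, 0.1], [0.6, 0.3]]
print(dcr_audit(synthetic, real, threshold=0.01))
# {'min_dcr': 0.0, 'flagged_rows': [1], 'passed': False}
```

    A fixed threshold keeps the sketch simple; one common alternative is to compare the synthetic-to-real DCR distribution against distances between the real training data and a held-out real sample.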

    What Are the Benefits of Synthetic Data?

    Synthetic data is increasingly being used as an alternative to real data because of the following advantages:

    Advanced Privacy Protection:

    • Synthetic data contains no real records or Personally Identifiable Information (PII), which greatly reduces the risk of re-identification attacks and sensitive data exposure.
    • At Betterdata, we strengthen privacy further by incorporating differential privacy across the entire synthetic data pipeline. This provides quantifiable privacy guarantees while balancing privacy against data utility.
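    The post does not specify how differential privacy is built into Betterdata's pipeline. As a hedged illustration of the underlying idea only, the sketch below applies the textbook Laplace mechanism to one categorical column (a hypothetical network-protocol field) and then samples synthetic values from the noisy histogram. It assumes add/remove-one-record neighboring datasets and a category list fixed in advance rather than read from the data. It is not LTM, and all names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale): the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_histogram(values: Sequence[str], categories: Sequence[str],
                 epsilon: float, rng: random.Random) -> Dict[str, float]:
    """Release per-category counts under epsilon-DP via the Laplace mechanism.

    With add/remove-one-record neighbors, one record changes exactly one count
    by 1, so the L1 sensitivity is 1 and the noise scale is 1 / epsilon.
    `categories` must be a fixed, public domain, not derived from the data.
    """
    counts = Counter(values)
    scale = 1.0 / epsilon
    noisy = {c: counts[c] + laplace_noise(scale, rng) for c in categories}
    # Clamping negatives is post-processing, so it does not weaken the guarantee.
    return {c: max(0.0, n) for c, n in noisy.items()}


def sample_synthetic(noisy_counts: Dict[str, float], n: int,
                     rng: random.Random) -> List[str]:
    """Sample n synthetic values in proportion to the noisy counts."""
    cats = list(noisy_counts)
    weights = [noisy_counts[c] for c in cats]
    if sum(weights) <= 0:
        weights = [1.0] * len(cats)  # every count clamped to 0: fall back to uniform
    return rng.choices(cats, weights=weights, k=n)


# Hypothetical protocol column with a seeded generator for reproducibility.
rng = random.Random(7)
real_protocols = ["tcp"] * 700 + ["udp"] * 250 + ["icmp"] * 50
noisy = dp_histogram(real_protocols, ["tcp", "udp", "icmp"], epsilon=1.0, rng=rng)
synthetic_protocols = sample_synthetic(noisy, n=1000, rng=rng)
```

    Smaller values of `epsilon` add more noise (stronger privacy, lower utility), which is the privacy/utility balance the bullet above refers to. A seeded `random.Random` is fine for a demo; a real deployment would need a cryptographically secure, floating-point-safe noise source such as those in vetted DP libraries.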

    Statistically Similar: 

    • Synthetic data mimics the statistical properties of real data, including marginal distributions, correlation structure, temporal and sequential patterns (for time series), and parent-child relationships (for relational data).
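    The post names the properties synthetic data should preserve but not how they are measured. Two simple, commonly used checks are the total variation distance (TVD) between a column's real and synthetic marginal distributions, and the gap between real and synthetic Pearson correlations for a pair of numeric columns. The stdlib-only sketch below is an illustration under those assumptions, not Betterdata's evaluation suite; the column names and values are hypothetical, and temporal or relational fidelity would need additional metrics.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def total_variation_distance(real: Sequence[Hashable], synthetic: Sequence[Hashable]) -> float:
    """TVD between the empirical marginals of one categorical column.

    0.0 means identical distributions; 1.0 means the two share no categories.
    """
    p, q = Counter(real), Counter(synthetic)
    n_p, n_q = len(real), len(synthetic)
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in set(p) | set(q))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def correlation_gap(real_x: Sequence[float], real_y: Sequence[float],
                    syn_x: Sequence[float], syn_y: Sequence[float]) -> float:
    """Absolute difference between the real and synthetic correlations."""
    return abs(pearson(real_x, real_y) - pearson(syn_x, syn_y))


# Hypothetical columns: protocol (categorical), bytes and packets (numeric).
print(total_variation_distance(["tcp", "tcp", "udp", "icmp"],
                               ["tcp", "udp", "udp", "icmp"]))  # 0.25
bytes_real, packets_real = [100, 220, 310, 405], [1, 2, 3, 4]
bytes_syn, packets_syn = [120, 200, 330, 390], [1, 2, 3, 4]
gap = correlation_gap(bytes_real, packets_real, bytes_syn, packets_syn)
```

    Lower values are better for both metrics. In practice they would be computed for every column and column pair and summarized, rather than checked one at a time.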

    Customizable:

    • Synthetic data can be customized to meet an enterprise’s unique data needs.
    • Synthetic data can be augmented to increase domain coverage and improve model generalization.
    • Synthetic data can be adjusted to improve fairness and to reduce class imbalance and bias in datasets.
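    The post does not say how Betterdata rebalances data. As one well-known, simple illustration of synthetic augmentation for class imbalance, the sketch below implements SMOTE-style interpolation (Synthetic Minority Over-sampling Technique) for numeric features: each new minority row is placed between an existing minority row and one of its nearest minority neighbors. This is a swapped-in classical technique, not LTM; the function name, the `k` parameter default, and the example data are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 3,
          rng: Optional[random.Random] = None) -> List[List[float]]:
    """Create n_new synthetic minority rows by SMOTE-style interpolation.

    Each new row lies on the line segment between a randomly chosen minority
    row and one of its k nearest minority-class neighbors (Euclidean distance).
    """
    if rng is None:
        rng = random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    k = min(k, len(minority) - 1)
    new_rows = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda row: math.dist(base, row))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # random position along the segment, in [0, 1)
        new_rows.append([b + gap * (nb - b) for b, nb in zip(base, neighbor)])
    return new_rows


# Hypothetical, pre-scaled features for a rare "attack" class.
attacks = [[0.9, 0.8], [0.85, 0.95], [0.95, 0.9], [0.8, 0.85]]
extra_attacks = smote(attacks, n_new=6, k=2, rng=random.Random(0))
balanced_attacks = attacks + extra_attacks  # 10 minority rows instead of 4
```

    Interpolating only between nearby minority rows keeps new samples inside the region the rare class already occupies, which reduces (but does not eliminate) the risk of generating implausible records.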

    Is Synthetic Data High-Quality?

    Yes. Synthetic data is not only high-quality but also:

    • High Utility
    • High Dimensional
    • Highly Private

    This makes it well suited to data-intensive tasks such as machine learning, data analysis, data sharing, and data monetization.

    The Impact:

    Adopting Betterdata’s LTM has delivered the following outcomes for DHS:

    Faster Cyber Defense Simulations: 

    DHS can now simulate sophisticated cyber-attack scenarios using realistic yet risk-free datasets, accelerating training and strategic planning.

    Enhanced Anomaly Detection: 

    ML models trained on synthetic data identify anomalies and threats more accurately, without ever accessing real user information.

    Secure Interagency Collaboration: 

    Agencies can share synthetic datasets freely, breaking down data silos without risking policy violations.

    Regulatory Compliance: 

    DHS remains fully aligned with national cybersecurity mandates while advancing its AI-driven threat intelligence programs.

    Synthetic data has the potential to transform industries because it gives government agencies and enterprises data that is accessible, fair, and scalable, letting them innovate without the restrictions that sensitive real data imposes. Until recently, this was not possible. With data protected and utility maintained (or in some cases even improved), the question for innovation is no longer how, but when.
