We are thrilled to announce the launch of our Public Health Dataset Project, an initiative focused on gathering comprehensive and high-quality public health data to drive the development of AI models dedicated to public health applications. As we embark on this exciting journey, we invite public health researchers, healthcare professionals, and organizations to contribute their data and expertise to help us build a robust resource that will revolutionize public health research and practice.

Why Current Language Models Fall Short in Public Health

While large language models (LLMs) such as OpenAI’s ChatGPT and Meta’s Llama models have shown remarkable capabilities in various applications, they currently lack the specificity and accuracy needed for public health use cases, for the following reasons:

  1. Generalization vs. Specialization: LLMs are designed to be general-purpose, meaning they can handle a wide range of topics but may not excel in specialized fields like public health. They often lack the domain-specific knowledge required to accurately interpret and analyze health data.
  2. Data Limitations: LLMs are trained on diverse datasets that may not include sufficient public health data. This results in a lack of context and understanding when dealing with public health terminology, concepts, and nuances.
  3. Regulatory and Ethical Considerations: Public health data is sensitive and requires careful handling to comply with regulations like HIPAA. General LLMs are not designed with these specific privacy and ethical considerations in mind.

The Need for Domain-Specific Public Health Models

To address these challenges, the Public Health Committee needs to develop domain-specific models that are customized to work effectively in public health settings. A model of this kind will:

  • Improve Accuracy: By focusing on public health data, the model can achieve higher accuracy and reliability in interpreting health-related information.
  • Enhance Decision-Making: Public health professionals can make better-informed decisions with a model trained on relevant and high-quality data.
  • Ensure Compliance: A domain-specific model can be designed to adhere to regulatory standards, ensuring that data privacy and ethical guidelines are met.

How Much Data Do We Need?

Creating a high-quality public health model requires a substantial amount of data. We are looking for contributions that cover a wide range of public health topics, including but not limited to epidemiology, disease surveillance, health behaviour and outcomes, environmental health, and healthcare access and quality.

What Qualifies as High-Quality Data?

High-quality data is critical for the success of our project. Here are some key attributes that define such data:

  1. Accuracy: Data should be correct and free from errors.
  2. Completeness: All relevant data fields should be filled out comprehensively.
  3. Timeliness: Data should be up-to-date and relevant to current public health scenarios.
  4. Consistency: Data should be consistent across different sources and formats.
  5. Relevance: Data should be directly related to public health topics and useful for research and analysis.
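
As an illustration only, the five attributes above could be turned into automated checks along the following lines. This is a minimal sketch, not the project's actual review process: the field names (`record_id`, `topic`, `region`, `report_date`, `case_count`), the one-year freshness window, and the `check_record` helper are all assumptions made for the example. The topic list is taken from the areas named earlier in this post.

```python
from datetime import date, timedelta

# Hypothetical schema: these field names are illustrative, not a project requirement.
REQUIRED_FIELDS = ("record_id", "topic", "region", "report_date", "case_count")

# Topic areas named in this post, lower-cased for comparison.
KNOWN_TOPICS = {
    "epidemiology",
    "disease surveillance",
    "health behaviour and outcomes",
    "environmental health",
    "healthcare access and quality",
}

MAX_AGE = timedelta(days=365)  # assumed freshness window, not an official threshold


def check_record(record, today):
    """Return a list of quality problems found in one record (empty if none)."""
    problems = []

    # Completeness: every required field is present and non-empty.
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        problems.append("completeness: missing " + ", ".join(missing))

    # Accuracy: a case count must be a non-negative integer.
    count = record.get("case_count")
    if count not in (None, "") and (not isinstance(count, int) or count < 0):
        problems.append("accuracy: case_count must be a non-negative integer")

    # Consistency: dates must be valid ISO 8601 dates such as 2024-03-01.
    raw_date = record.get("report_date")
    if raw_date not in (None, ""):
        try:
            reported = date.fromisoformat(raw_date)
        except (TypeError, ValueError):
            problems.append("consistency: report_date is not a valid ISO 8601 date")
        else:
            # Timeliness: reject future dates and records older than the window.
            if reported > today or today - reported > MAX_AGE:
                problems.append("timeliness: report_date outside accepted window")

    # Relevance: the topic should be one of the recognised areas.
    topic = record.get("topic")
    if topic not in (None, "") and str(topic).lower() not in KNOWN_TOPICS:
        problems.append("relevance: topic is not a recognised public health area")

    return problems
```

Calling `check_record(record, date.today())` on each row of a submission would give a contributor a list of issues to fix before uploading. Real checks would likely need to be stricter, for example cross-checking values against other sources to assess accuracy.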

How to Contribute

We encourage all stakeholders in the public health community to contribute to this vital initiative. You can submit your data using the link below. Our team will review all submissions to ensure they meet our quality standards.


Join Us in Transforming Public Health

Your contribution can make a significant difference in advancing public health research and improving outcomes. Together, we can build a powerful tool that supports public health professionals in their critical work.

For more information, please contact us. Let’s work together to create a healthier future for all.

This blog post is part of our ongoing effort to engage the public health community in meaningful collaborations. We look forward to your support and contributions.

Stay tuned for updates and follow our progress on social media.