About Us

This project explores how cancer rates vary across racial groups and geographical regions, with a specific focus on the factors that contribute to these disparities. By analyzing data from reliable sources, we highlight significant differences in cancer incidence and mortality rates among communities. Our goal is to uncover healthcare inequalities and provide insights for targeted interventions and policy changes. Using digital humanities tools, we present this information in an accessible and engaging manner to promote health equity and social justice.

Our Project

Our project combines data science and digital humanities to shed light on cancer disparities. The primary goal is to highlight the inequalities in cancer rates in a clear, accessible manner, ensuring our audience understands the underlying factors and implications.

Assets

The primary dataset is the cancer.csv file from the CORGIS database, which includes detailed information on cancer rates segmented by age, sex, and race. It covers several cancer types (lung, colorectal, breast) alongside demographic information, enabling a thorough analysis of disparities. We chose it for its relevance: its variables allow a detailed comparison of cancer rates across demographic groups, which aligns with our goal of exploring how cultural and historical contexts influence health outcomes, a key aspect of digital humanities.

Our project also uses the resources provided by the University of California, Berkeley’s digital humanities platforms. Berkeley offers valuable support in terms of data access, computational tools, and academic guidance.

Initially, the dataset’s broad scope was challenging. We addressed this by focusing on specific cancer types and key demographic factors, which allowed for a more detailed analysis. Ensuring data quality required thorough cleaning, which was important for maintaining accuracy and reliability. As Geoffrey Rockwell emphasizes, “Evaluators should ask about the decisions made and discuss these with a candidate” to ensure appropriate content and standards are met in digitization efforts.
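The narrowing and cleaning described above can be illustrated with a minimal pandas sketch. It is not the project's actual cleaning code: the function name clean_cancer_table, the column names, and the rule that a rate of 0 marks a missing value are all assumptions made for illustration, and the sample rows are placeholders rather than values from cancer.csv.

```python
import numpy as np
import pandas as pd


def clean_cancer_table(df: pd.DataFrame, rate_columns: list[str]) -> pd.DataFrame:
    """Keep the state label plus the chosen rate columns and tidy them."""
    out = df[["State"] + rate_columns].copy()
    # Coerce rates to numbers; unparseable entries become NaN.
    out[rate_columns] = out[rate_columns].apply(pd.to_numeric, errors="coerce")
    # Assumption: a rate of 0 marks a suppressed or missing value, not a true zero.
    out[rate_columns] = out[rate_columns].replace(0, np.nan)
    # Drop exact duplicate rows, then rows with no usable rate at all.
    out = out.drop_duplicates().dropna(subset=rate_columns, how="all")
    return out.reset_index(drop=True)


# Placeholder rows for illustration only; these are not values from cancer.csv.
# The column names are assumed and should be checked against the file's header.
sample = pd.DataFrame({
    "State": ["A", "B", "B", "C"],
    "Types.Lung.Race.White": ["50.0", "0", "0", "n/a"],
    "Types.Lung.Race.Black": ["60.0", "40.0", "40.0", "0"],
    "Total.Population": [1000, 2000, 2000, 3000],
})
cleaned = clean_cancer_table(
    sample, ["Types.Lung.Race.White", "Types.Lung.Race.Black"]
)
print(cleaned)
```

With the real file, the same function could be applied to the result of pd.read_csv("cancer.csv") once the column names have been confirmed. Whether a zero should count as missing is a judgment call that depends on how the source reports suppressed values.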

Services

We used Python and its libraries (pandas, numpy, etc.) for data cleaning, exploration, and visualization. We chose these tools for their ability to handle large datasets and their extensive functions for data analysis and visualization, which supported sound statistical analysis and clear visualizations.

The dataset lets users explore and understand cancer disparities based on their specific interests or demographic focus. It is available through public databases, promoting transparency and further research.

Initially, combining digital humanities perspectives with our data analysis was challenging. We addressed this by adding historical and cultural information to our visualizations, enriching the project’s interdisciplinary approach. Rockwell advises, “It is common in electronic text projects to bring in consultants to review encoding schemes and technical infrastructure,” highlighting the importance of expert consultations to ensure the quality and relevance of digital work.
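As one example of the kind of exploration pandas supports, the sketch below computes a simple rate ratio between two groups for each state. It is an illustrative measure, not necessarily the one used in this project; the function name rate_ratio and the column names are assumptions, and the numbers are placeholders rather than values from the dataset.

```python
import pandas as pd


def rate_ratio(df: pd.DataFrame, group_col: str, reference_col: str) -> pd.Series:
    """Ratio of one group's rate to a reference group's rate, per row."""
    # Values above 1 mean the group's rate exceeds the reference group's rate.
    ratio = df[group_col] / df[reference_col]
    return ratio.rename(f"{group_col} / {reference_col}")


# Placeholder values for illustration only; not taken from cancer.csv.
rates = pd.DataFrame({
    "State": ["A", "B"],
    "Types.Lung.Race.Black": [60.0, 40.0],
    "Types.Lung.Race.White": [50.0, 40.0],
}).set_index("State")

ratios = rate_ratio(rates, "Types.Lung.Race.Black", "Types.Lung.Race.White")
# Largest disparities first.
print(ratios.sort_values(ascending=False))
```

A ratio is only one way to summarize a disparity. An absolute difference between rates answers a slightly different question and may be more appropriate when the reference rate is very small.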

Interface & Display

The visualizations, built in Python from data prepared with pandas and numpy, are interactive, allowing users to engage with the data dynamically. Users can zoom, filter, and explore different aspects of the visualizations to gain deeper insights. The project is designed to be shareable, with links provided for easy access, and the interactive elements and clear explanations enhance the usability and educational value of the visualizations.

Creating these visualizations required significant labor, including data cleaning, coding, and design, and implementing the interactive features required technical expertise and time. Keeping the visualizations detailed yet clear and understandable also posed a challenge. We revised the design to balance detail and clarity, so that the visualizations are both informative and accessible to a broad audience. As Rockwell notes, “Interface was considered window dressing for serious projects that might be considered successful even if the only users were the content experts themselves,” emphasizing the need for thoughtful interface design that considers user engagement and accessibility.

Why We Chose This Dataset

Relevance

The cancer.csv dataset includes detailed information on cancer rates segmented by age, sex, and race, making it highly relevant for analyzing disparities in cancer incidence across different demographics. From a digital humanities perspective, this allows for a nuanced understanding of how socio-cultural and historical contexts influence health outcomes.

Comprehensive Data

The dataset provides a wide range of variables, including total cancer rates, specific cancer types (lung, colorectal, breast), and demographic information, allowing for a thorough analysis. This comprehensiveness enables us to draw connections between quantitative data and qualitative insights into the lived experiences of different demographic groups.

Public Health Significance

Understanding cancer rates and their disparities is crucial for public health planning and policy-making. This dataset helps shed light on areas needing attention and intervention, contributing to the broader goal of health equity. In digital humanities, this aligns with exploring how historical and social determinants of health shape current disparities.