Speakers
Laura Galuchie, BS
Senior Director, Global Clinical Development
Merck & Co, Inc.
Zachary Smith, MA
Assistant Director, Data Sciences & Analytics
Tufts Center for the Study of Drug Development
Tufts University School of Medicine
Keywords
Data; Optimization; Data Collection; Protocol Design
Key Points
- The TransCelerate Initiative – comprising a group of pharmaceutical companies with research and development organizations – seeks to identify key considerations in protocol design to optimize procedures and their frequency, while providing tools and a value-based framework for internal evaluation.
- Optimized data collection can improve patient and site experience, reduce complexity, enhance trial execution through better design decisions, and maintain (or potentially improve) quality.
- A seasoned approach to data collection is timely, as the volume of data is rising (and increasingly exceeds what is needed). Additionally, recent ICH and ethics updates emphasize fit-for-purpose data and eliminating unnecessary complexity in clinical trials.
- The TransCelerate-Tufts Center for the Study of Drug Development (CSDD) partnership was born of the need for continued tangible, actionable evidence to demonstrate the opportunity to optimize data collection.
- In 2024, the partners workshopped a data collection instrument, and 14 companies collected and provided data. Tufts CSDD ran a comprehensive quality control process, including data quality checks for accuracy, validity, and completeness. Data analysis took place in early 2025. Endpoints were defined as “core” and “non-core” based on procedure type.
- The study sought to quantify the collection and use of non-core and extraneous core protocol data; gather updated benchmarks on the amount, purpose, and impact of data collected in clinical trials; and identify ways to improve protocol design by reducing complexity and easing the burden on sites and participants.
- The research team found that the mean number of datapoints collected has grown sharply, from 930,000 in 2012 to nearly 6 million in 2025. More than one-third of all data collected come from non-core and non-essential procedures.
- Non-core and other non-essential procedures account for 25-30% of total participant and site burden. Note that some non-essential procedures may have other benefits; for example, site questionnaires help make sure patients are heard.
- The analysis provides empirical evidence encouraging protocol design discussion and a shift towards more intentional and fit-for-purpose data collection strategies. Planning frameworks and collection assessment tools can reduce unnecessary burden on patients, sites, regulators, and other stakeholders, as well as help sponsors critically assess what data are collected and why.
Discussion Themes
When looking at the factors that contributed to the overcollection of data, the study team found that no one function or department was responsible for the majority of the data points and procedures; the distribution of contributing factors was diffuse. It’s an equal-opportunity problem.
Factors driving non-core data collection included teams’ fear that regulators would ask for data they hadn’t collected and a lack of on-site experience among functional areas. In the latter case, teams are focused on their own objectives and lack perspective on how data collection affects the patient experience, the site experience, or another function within the group.
In addition to financial costs, there are time costs associated with data collection. The study team found a direct correlation between the amount of data collected and the complexity of a trial; as complexity increased, so did the time it took to conduct the trial.