ERuDIte is the educational resource discovery index that powers the BD2K Training Coordinating Center (TCC) Web Portal. ERuDIte serves not only as a resource collector and aggregator but also as a system, powered by Machine Learning, Information Retrieval, and Natural Language Processing, that intelligently organizes resources to provide a dynamic and personalized curriculum for biomedical researchers interested in learning about Data Science.
In this document, we refer to biomedical researchers, the intended audience of ERuDIte and the TCC Web Portal, as users or learners.
As a research initiative itself, ERuDIte aims to:
- Identify, store, and synthesize large volumes of relevant educational resources in a scalable fashion
- Maintain a schema that aligns with other resource collection initiatives to promote data sharing
- Serve high-quality, up-to-date educational content to the biomedical community (and the research community at large) that not only teaches Data Science concepts but also supports the practical application of those concepts to specific analysis tasks
- Aid learners in navigating the vast number of resources pertaining to Data Science through semi-automatic tagging and prerequisite identification
- Provide an individualized learning path through recommendations tailored to learners’ interests, experience, and progress over time
To accomplish these objectives, ERuDIte comprises multiple components that together form a pipeline carrying resources from their original sources, through ERuDIte, to the TCC Web Portal. The components of this pipeline are:
- Resource Identification Component: collects links to relevant resources and gathers any available data for the resources
- Resource Integration Component: unifies data from heterogeneous resources and conforms them to a standard schema
- Resource Database: stores resource data, making it available for the Resource Organization Engine, Curation Interface, TCC Web Portal, and Resource Personalization Engine
- Resource Organization Engine: automatically assigns tags, identifies prerequisites, and evaluates resource depth; it uses curator data from the Curation Interface and user data from the TCC User Database to improve its algorithms
- Curation Interface: tool for curators to validate organization data and assess resource quality
- TCC Web Portal: presents ERuDIte data and collects learner activity and progress
- TCC User Database: stores user data, including (but not limited to) learner profile data and usage activity, and informs the Resource Personalization Engine and the Resource Organization Engine
- Resource Personalization Engine: synthesizes user activity, resource tags and prerequisites, and resource similarity measurements to recommend resources to learners
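To make the Resource Integration Component's role concrete, the following is a minimal sketch of conforming heterogeneous records to a single schema. It is illustrative only: the `Resource` fields, the source names `source_a` and `source_b`, their field names, and the `integrate` function are all hypothetical assumptions, not ERuDIte's actual schema or code. Each source gets a field mapping, and keywords are normalized so that records from different sources can be compared downstream.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical standard schema (assumption; not the actual ERuDIte schema)."""
    title: str
    url: str
    description: str = ""
    keywords: list[str] = field(default_factory=list)
    source: str = ""


# Hypothetical per-source field mappings: source field name -> schema field name.
FIELD_MAPS = {
    "source_a": {"name": "title", "link": "url", "summary": "description", "tags": "keywords"},
    "source_b": {"title": "title", "href": "url", "abstract": "description", "subjects": "keywords"},
}


def integrate(raw: dict, source: str) -> Resource:
    """Conform one raw record from a known source to the standard schema."""
    mapping = FIELD_MAPS[source]
    # Keep only the fields this source actually provided, renamed to schema names.
    fields = {target: raw[src] for src, target in mapping.items() if src in raw}
    # Normalize keywords: accept a comma-separated string or a list; strip, lowercase, dedupe.
    kw = fields.get("keywords", [])
    if isinstance(kw, str):
        kw = kw.split(",")
    fields["keywords"] = sorted({k.strip().lower() for k in kw if k.strip()})
    return Resource(source=source, **fields)


if __name__ == "__main__":
    a = integrate({"name": "Intro to R", "link": "https://example.org/r", "tags": "R, Statistics"}, "source_a")
    b = integrate({"title": "Deep Learning 101", "href": "https://example.org/dl",
                   "subjects": ["ML", "neural networks"]}, "source_b")
    print(a)
    print(b)
```

A per-source mapping table keeps source-specific knowledge in data rather than code, so adding a new collection would mostly mean adding a new mapping entry; a production system would also need validation and handling of fields that do not map one-to-one.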
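The Resource Personalization Engine combines user activity, resource tags and prerequisites, and resource similarity. As a rough illustration of how those signals might fit together, here is a minimal sketch using a swapped-in, deliberately simple technique: Jaccard similarity between a resource's tags and a learner profile built from completed resources, with candidates filtered to those whose prerequisites are all completed. The `OrganizedResource` type, the `recommend` function, and the example resource IDs are hypothetical; this is not ERuDIte's actual recommendation algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class OrganizedResource:
    """Hypothetical view of a resource after tagging and prerequisite identification."""
    rid: str
    tags: set[str]
    prerequisites: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two tag sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(catalog: list[OrganizedResource], completed: set[str], k: int = 3) -> list[str]:
    """Rank unseen resources whose prerequisites are all completed by similarity to the learner's history."""
    by_id = {r.rid: r for r in catalog}
    # Learner profile (assumption): union of the tags of all completed resources.
    profile: set[str] = set()
    for rid in completed:
        if rid in by_id:
            profile |= by_id[rid].tags
    # Keep resources the learner has not completed and is ready for.
    candidates = [r for r in catalog if r.rid not in completed and r.prerequisites <= completed]
    # Sort by descending similarity, breaking ties by resource id for determinism.
    candidates.sort(key=lambda r: (-jaccard(r.tags, profile), r.rid))
    return [r.rid for r in candidates[:k]]


if __name__ == "__main__":
    catalog = [
        OrganizedResource("intro-python", {"python", "programming"}),
        OrganizedResource("pandas", {"python", "dataframes"}, {"intro-python"}),
        OrganizedResource("intro-stats", {"statistics"}),
        OrganizedResource("regression", {"statistics", "python"}, {"intro-stats"}),
        OrganizedResource("deep-learning", {"python", "neural networks"}, {"pandas", "regression"}),
    ]
    print(recommend(catalog, {"intro-python"}))                 # ['pandas', 'intro-stats']
    print(recommend(catalog, {"intro-python", "intro-stats"}))  # ['regression', 'pandas']
```

The prerequisite filter is what turns a flat similarity ranking into something closer to a learning path: "regression" becomes eligible only after "intro-stats" is completed, and "deep-learning" stays hidden until both of its prerequisites are done.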