Authors: David Lengweiler, Tobias Weber, Heiko Schuldt, Marco Vogt
Data exploration, integration, organization, and analysis are critical workflows for data scientists. In recent years, tools like Jupyter Notebooks have gained significant traction by incorporating these steps into a unified environment, allowing users to modify, extend, and document complex analytical processes with ease. However, while these tools streamline analysis, they often leave data integration to the user, frequently resulting in processes being executed on stale data. Furthermore, standard notebooks lack robust support for persisting large datasets, forcing data scientists to rely on file-based storage or ephemeral memory.
Databases, especially multi-model databases, offer a convenient repository for consolidating diverse data formats and providing simplified access. In this paper, we present an approach that combines these two paradigms. We show how integrating notebooks with multi-model databases leverages established data models to access and persist analytical data, ultimately improving performance for data science use cases.
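To illustrate the general idea, the following minimal sketch persists intermediate results of a notebook cell in a database rather than an exported file, so that later cells always query the current data. It uses Python's standard-library sqlite3 module purely as a stand-in for the multi-model database described in the paper; the table and column names are invented for this example:

```python
import sqlite3

# Stand-in for a multi-model database connection (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")

# An "ingestion" cell writes its results to the database instead of a file.
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.5), ("a", 2.5), ("b", 4.0)],
)
conn.commit()

# A later "analysis" cell queries the persisted, up-to-date data,
# avoiding re-runs against a stale exported snapshot.
rows = conn.execute(
    "SELECT sensor, AVG(value) FROM measurements "
    "GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)  # → [('a', 2.0), ('b', 4.0)]
```

In a real deployment, the database connection would be shared across notebook cells and sessions, which is what makes the persisted data a single source of truth for the whole analytical workflow.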
Link: