Ciência-IUL    Authors    João Caldeira    Research Projects
Research Projects
Data Science for Non-Programmers (Ciência dos Dados para não programadores)
The objective of this project is to explore the use of visual programming paradigms to enable non-programmers to join the Data Science workforce. Unlike existing approaches, which require programming, Scientific Workflow Management Systems (SWMS) can offer an alternative that supports the visual programming of data science projects. Such systems (e.g., Taverna and Kepler) let users build applications as simple graphical, graph-based structures. This simplicity has proven suitable in several scientific areas, such as bioinformatics, geophysics, and climate analysis. Despite the success of SWMS in data-intensive research, they have not yet reached a state where non-programmer data scientists can use them: they still require some programming and scripting skills to code individual processing tasks. That is why research teams using these systems are usually composed of both scientists and software developers.

We propose to extend current SWMS to support the parameterization of generic, prebuilt workflow templates. Workflow templates capture the processing tasks of data science projects. A template can be seen as a formalized best practice that data scientists can apply to common data analysis challenges. Templates are developed by multidisciplinary teams of experts and reused by non-programmer data scientists. Parameterized workflows have been used successfully in enterprise computing since 1970 to increase software reuse (e.g., SAP's parameterized workflows for automating business process models). We claim that the same type of benefits can be obtained by parameterizing scientific workflow templates.
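To make the idea of a parameterized workflow template more concrete, here is a minimal Python sketch under our own assumptions. A template is modelled as a directed acyclic graph of processing tasks that developers write once. Each task declares which upstream outputs it consumes and which named parameters it exposes, so a non-programmer only has to supply parameter values. The classes `Task` and `WorkflowTemplate`, the toy tasks and the data are all hypothetical illustrations; they do not reflect the API of Taverna, Kepler or any other SWMS.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable


@dataclass
class Task:
    """One processing step, written once by a developer (hypothetical model)."""
    name: str
    func: Callable[..., Any]
    inputs: list[str] = field(default_factory=list)  # names of upstream tasks
    params: list[str] = field(default_factory=list)  # parameters exposed to end users


@dataclass
class WorkflowTemplate:
    """A reusable DAG of tasks; end users only fill in parameter values."""
    tasks: dict[str, Task]
    defaults: dict[str, Any] = field(default_factory=dict)

    def run(self, **values: Any) -> dict[str, Any]:
        # Validate the user's parameter choices before executing anything.
        required = {p for t in self.tasks.values() for p in t.params}
        bound = {**self.defaults, **values}
        missing = required - set(bound)
        unknown = set(bound) - required
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")

        # Execute tasks in dependency order, feeding upstream results forward.
        graph = {name: set(t.inputs) for name, t in self.tasks.items()}
        results: dict[str, Any] = {}
        for name in TopologicalSorter(graph).static_order():
            task = self.tasks[name]
            args = [results[i] for i in task.inputs]
            kwargs = {p: bound[p] for p in task.params}
            results[name] = task.func(*args, **kwargs)
        return results


# Hypothetical tasks a developer might provide inside a template.
def load():
    return [3, 8, 1, 9, 5]  # toy dataset, stands in for a real data source


def keep_above(values, threshold):
    return [v for v in values if v >= threshold]


def summarize(values, statistic):
    ops = {"mean": lambda xs: sum(xs) / len(xs), "max": max, "min": min}
    return ops[statistic](values)


template = WorkflowTemplate(
    tasks={
        "load": Task("load", load),
        "filter": Task("filter", keep_above, inputs=["load"], params=["threshold"]),
        "summary": Task("summary", summarize, inputs=["filter"], params=["statistic"]),
    },
    defaults={"statistic": "mean"},
)

# A non-programmer only chooses parameter values.
print(template.run(threshold=4, statistic="max")["summary"])  # prints 9
```

In this sketch the programming effort (writing `keep_above`, `summarize` and wiring the graph) happens once, while each reuse only requires choosing `threshold` and, optionally, `statistic`. Checking for missing or unknown parameters before execution is one simple way to keep a non-programmer from reaching a half-run workflow.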
Project Information
Project Partners