Interview with Dr. Daniela Ledezma Tejeida
ETH Zürich
Professur für Systembiologie
1) What is the role of R in scientific research today?
From my experience in computational biology, R facilitates the handling, processing and analysis of massive datasets, especially for statistical analysis. Because it is open source, it enables indirect collaboration among thousands of researchers from different disciplines: a statistics expert can write a package that helps a biology expert analyze their data in a simple way.
2) What advantages does it have compared with other systems such as MATLAB? What are its disadvantages?
The most important advantage over MATLAB is that R is freely distributed software, so any researcher can use it regardless of their research budget. Alongside this, there are many free or online courses (such as those offered by the Software Carpentry and Data Carpentry projects), so anyone can learn to use it. Likewise, any developer can create and publish packages on very specific topics, which other users can then benefit from.
The biggest disadvantage is that the standards for package documentation are very lax, so some packages become difficult to use because their manuals are unclear. In this respect, MATLAB certainly has a clear advantage.
3) Could you tell us a little about the focus of your scientific research? What role does R play in it?
My research focuses on the relationship between transcriptional regulation and bacterial metabolism, in both functional and mechanistic terms. I generate massive amounts of gene expression and metabolomics data, and R allows me to process and analyze them.
4) Do you think current university curricula offer a solid foundation in bioinformatics and in the use of programs such as R?
I don't have enough information to give a definitive answer, since my own university program was specifically focused on bioinformatics. In my limited experience, university-level biology programs have gradually included more bioinformatics topics, such as programming principles. I don't think bioinformatics courses should be mandatory in every biology degree; I suspect there are areas of biology that do not require handling big data. I think master's programs would be a great place to start offering options much more focused on bioinformatics.
5) What are some examples of how R is used in everyday life or in basic scientific research?
Any task in daily life or research that calls for Microsoft Excel or a calculator can be solved in R. The example that comes to mind is that during the pandemic, some bioinformatics colleagues wrote R code so that we could play games over Zoom that required dice or any other kind of randomization.
6) What do you consider essential to know when working with R? What is the best way to learn it?
Ideally, a basic grasp of programming. Not in any specific language; simply being able to write pseudocode so that you are clear about what is required to solve the problem you face. Everything else is a matter of learning commands, which can easily be googled.
The best way to learn is through practice: finding basic programming problems and solving them by drawing flowcharts on paper.
7) Do you consider it important for people training in biotechnology to know how to work in R?
I believe R is a tool, as important as many others. I think it is more important for students to have an idea of which areas of biotechnology interest them and whether those areas use big data. If they do, then I consider it important to learn basic programming principles. That way, if they suddenly need to use Python instead of R, or they join a lab where only MATLAB is available, the transition will be much easier.
8) What is the state of bioinformatics today? What future do you see for it?
I believe bioinformatics is advancing rapidly. We are able to generate more and more data, and I think the current frontier is integrating data from different -omics technologies using machine learning.
In more classical bioinformatics, there is the idea that statistics is the discipline that allows us to make sense of the data, hence the relevance of R. In my opinion, statistics allows us to find the relevant data, but it takes people who are able to take the statistically significant results and put them back into their biological context; that is the real challenge. I believe that in the future, tools and applications that extract knowledge from data will be valued more than those that merely transform it. To give a more down-to-earth example, differential gene expression analyses (RNA-seq, microarrays, etc.) produce lists of genes with significant expression changes, which usually end up converted into lists of Gene Ontology terms. We need tools that tell us how these Gene Ontology terms are related to each other and what those relationships mean. Otherwise, we will keep reading articles that start with millions of data points and end up validating a single interesting example.