Since 2019, I have been teaching the course “Data science” as part of the LPI Digital Masters, together with Liubov Tupikina.
Syllabus
This course provides an advanced introduction to the field of data science, with a focus on network and spatial data. Topics include data project management; data collection, cleaning, analysis and visualization; network analysis and representation; and working with spatial data. The course has a strong focus on hands-on sessions and personal projects.
Why focus on network data? Over the past century, network studies have had a significant impact on disciplines as varied as mathematics, sociology, physics, biology, computer science and quantitative geography, giving birth to Network Science as a field in its own right. With the rise of online social networks over the last decade, network data and methods have become widespread in the digital world. This course provides an introduction to Network Science, from the theoretical foundations (generating, analyzing and perturbing networks) to the practical, hands-on side (analyzing and visualizing real-world networks).
By the end of the course, students will have developed the intuition needed to analyze real-world data. They will be able to use Python and R to work with data and run statistical analyses. They will know practical tools and packages for working with spatial data, as well as network visualization tools. Finally, they will have adopted good practices for code and data management.
Classes:
- Introduction to network and data science
- Elements of statistics and AI
- Building intuition with a dataset
- Kickstarting projects
- Structuring your project & API techniques
- Spatial data analysis
- Data and Network visualization
- Data fitting, embedding, modeling
- Work and feedback session on the projects
- Projects presentation
Resources in Network Science
Introductory material on networks:
- Introductory interactive textbook by A.-L. Barabási: http://networksciencebook.com/
  - Chapter 2 for network metrics
  - Chapter 9 for community detection
- An introduction to network visualisation:
  - BASIC
  - INTERMEDIATE
  - ADVANCED
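To connect the textbook chapters to practice, here is a minimal sketch in Python, assuming the networkx library is installed. It computes a few of the basic metrics discussed in Chapter 2 (average degree, clustering, average shortest path length) on Zachary's karate club, a small social network bundled with networkx. For community detection it uses greedy modularity maximisation, which is only one of several methods covered in Chapter 9 and was picked here because it ships with networkx.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Zachary's karate club: a small, classic social network bundled with networkx
G = nx.karate_club_graph()

# Basic network metrics
n_nodes = G.number_of_nodes()
n_edges = G.number_of_edges()
avg_degree = 2 * n_edges / n_nodes              # <k> = 2L / N for an undirected graph
avg_clustering = nx.average_clustering(G)       # mean local clustering coefficient
avg_path_length = nx.average_shortest_path_length(G)  # requires a connected graph

print(f"nodes={n_nodes}, edges={n_edges}, <k>={avg_degree:.2f}")
print(f"clustering={avg_clustering:.3f}, <d>={avg_path_length:.3f}")

# Community detection: greedy modularity maximisation (unweighted)
communities = greedy_modularity_communities(G)
# weight=None so the score matches the unweighted partition found above
Q = modularity(G, communities, weight=None)
print(f"{len(communities)} communities, modularity Q={Q:.3f}")
```

Swapping `karate_club_graph()` for a network downloaded from one of the databases below is a good first exercise.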
Network databases:
- Index of Complex Networks (ICON): https://icon.colorado.edu/, with 5,000+ networks
- Network Repository: http://networkrepository.com/, which already offers many visualisation tools directly on the website
- On GitHub:
  - Deezer Social Networks, Facebook Page-Page Networks, Wikipedia Article Networks: https://github.com/benedekrozemberczki/datasets
  - A Repository of Benchmark Graph Datasets for Graph Classification (31 graph datasets in total): https://github.com/shiruipan/graph_datasets
- Repository of network repositories: https://github.com/ComplexNetTSP/ComplexNetWiki/wiki/Networks-datasets
- Your own! After all, any dataset with two columns can be read as a network, with each row as an edge between two nodes. Why not try it with your favorite data?
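The idea that any two-column dataset is a network can be sketched in a few lines of Python, assuming networkx is installed. The CSV content below (co-authorship between four people) is a made-up example held in memory; in practice you would open your own file instead. Each row becomes an undirected edge, and nodes are created automatically the first time they appear.

```python
import csv
import io

import networkx as nx

# Hypothetical two-column dataset: each row links two entities.
# Replace io.StringIO(...) with open("your_file.csv") for real data.
raw = io.StringIO(
    "source,target\n"
    "alice,bob\n"
    "bob,carol\n"
    "carol,alice\n"
    "carol,dave\n"
)

reader = csv.reader(raw)
next(reader)  # skip the header row

# Each row becomes an undirected edge; nodes are added on the fly
G = nx.Graph()
G.add_edges_from((row[0], row[1]) for row in reader)

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
print("degrees:", dict(G.degree()))
```

If the data is already in a pandas DataFrame, `nx.from_pandas_edgelist(df, source="source", target="target")` builds the same graph in one call, and `nx.read_edgelist` handles plain whitespace- or comma-delimited files (via its `delimiter` argument).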
For visualisations:
- Gephi: https://gephi.org/ for simple graph visualisation
  - Introduction to Gephi: http://www.martingrandjean.ch/gephi-introduction/
- Cytoscape: https://cytoscape.org/ for more fine-grained visualisation
  - Introduction to Cytoscape: https://github.com/cytoscape/cytoscape-tutorials/wiki
- D3.js for interactive visualisations: https://www.d3-graph-gallery.com/network
- Cytoscape.js for other interactive visualisations: http://js.cytoscape.org/
- R, following the (amazing) guide at https://kateto.net/network-visualization
- A paper and a pen. Sometimes that’s all it takes: https://benfry.com/exd09/
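A common workflow is to compute metrics in Python and do the visual work in Gephi or Cytoscape. The sketch below, assuming networkx is installed, stores degree centrality as a node attribute (so it can drive node size or colour in the visualisation tool) and writes the graph to GEXF, Gephi's native format, and to GraphML, which both Gephi and Cytoscape can import. The file names and the temporary output directory are placeholders.

```python
import os
import tempfile

import networkx as nx

G = nx.karate_club_graph()

# Attach a computed metric to every node so the visualisation tool can map it
nx.set_node_attributes(G, nx.degree_centrality(G), "degree_centrality")

# Placeholder output location; use your project folder instead
out_dir = tempfile.mkdtemp()
gexf_path = os.path.join(out_dir, "karate.gexf")        # open directly in Gephi
graphml_path = os.path.join(out_dir, "karate.graphml")  # importable by Cytoscape and Gephi

nx.write_gexf(G, gexf_path)
nx.write_graphml(G, graphml_path)
print("wrote", gexf_path, "and", graphml_path)
```

Keeping the computation in code and only the layout and styling in the GUI tool makes the analysis easier to reproduce.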
For analysis:
- R https://www.rstudio.com/ with the igraph library (an introduction: https://kateto.net/networks-r-igraph)
- Python https://www.python.org/ with the networkx library (https://networkx.github.io/)
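networkx covers the whole "generate, analyze, perturb" cycle from the course introduction. As a small illustration, the sketch below runs a classic robustness experiment: it generates a scale-free network with the Barabási-Albert model, removes 10% of the nodes either at random or by targeting the highest-degree hubs, and compares the size of the largest connected component afterwards. The network size, the `m=2` parameter, the seeds and the 10% removal rate are arbitrary choices for illustration, and `giant_fraction` is a hypothetical helper defined here.

```python
import random

import networkx as nx


def giant_fraction(G, n_original):
    """Hypothetical helper: share of the original nodes in the largest connected component."""
    if G.number_of_nodes() == 0:
        return 0.0
    largest = max(nx.connected_components(G), key=len)
    return len(largest) / n_original


# Generate: a scale-free (Barabási-Albert) network
N = 1000
G = nx.barabasi_albert_graph(N, 2, seed=42)

n_remove = N // 10  # perturb: remove 10% of the nodes

# Random failures: nodes chosen uniformly at random
rng = random.Random(42)
G_random = G.copy()
G_random.remove_nodes_from(rng.sample(list(G.nodes()), n_remove))

# Targeted attack: highest-degree hubs removed first
hubs = sorted(G.degree(), key=lambda pair: pair[1], reverse=True)[:n_remove]
G_attack = G.copy()
G_attack.remove_nodes_from([node for node, _ in hubs])

# Analyze: how much of the network is still connected?
print(f"giant component after random failures: {giant_fraction(G_random, N):.2f}")
print(f"giant component after targeted attack: {giant_fraction(G_attack, N):.2f}")
```

Scale-free networks typically stay largely connected under random failures but fragment much faster when their hubs are attacked; re-running with different removal fractions makes a nice plot.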
Other resources
- Exploring complex systems (not just networks): http://www.complexity-explorables.org/
- About collective phenomena and emergence: https://youtu.be/16W7c0mb-rE
Papers about network science: