I am teaching the course “Big data and network science” as part of the Digital Masters of the CRI, along with Liubov Tupikina, Loic Saint Roch, Anirudh Krishnakumar, and Felix Schoeller. The course will take place on Wednesdays, 9am-12pm, from 16 October 2019 to 22 January 2020.
Syllabus
This course will provide an introduction to the field of big data, with a focus on network data and data for mental health. Topics will cover data project management, infrastructure of big data, data analysis and visualisation, and mental health data. The course will be divided into a big data part and a network data part.
Network part:
Why focus on network data? Over the past century, network studies have had a significant impact in disciplines as varied as mathematics, sociology, physics, biology, computer science and quantitative geography, giving birth to Network Science as a field in its own right. With the rise of social networks over the last decade, their use has become widespread in the digital world. Here we will provide an introduction to the field of Network Science, from the theoretical foundations (generating, analysing, perturbing networks) to the practical hands-on part (analysis and visualisation of real-world networks).
Network topics will cover:
a. How to construct networks from real data?
b. How to analyze networks? (centrality measures, community detection, statistical analyses etc.)
c. How to visualise networks?
d. Dynamics and spreading phenomena on networks (epidemics / information spreading, diffusion)
e. How does network wiring change over time? (network robustness, temporal networks)
f. How to represent more complex network data? Multilayer, multiplex networks.
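As a rough illustration of topics (b) and (d), the sketch below computes degree centrality and runs a simplified, deterministic spreading process on a toy network. The edge list, the function names and the "every infected node infects all its neighbours at each step" rule are assumptions made for this example, not course material; real epidemic models are usually probabilistic, and libraries such as networkx provide ready-made centrality measures.

```python
from collections import deque

# Hypothetical toy network, given as an undirected edge list.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("d", "e"), ("e", "f")]


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def spreading_times(adjacency, seed):
    """Simplified SI-style spread (an assumption of this sketch):
    at each step, every infected node infects all of its neighbours.
    Returns {node: step at which the node becomes infected}."""
    infected_at = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in infected_at:
                infected_at[nbr] = infected_at[node] + 1
                queue.append(nbr)
    return infected_at


adjacency = build_adjacency(EDGES)
centrality = degree_centrality(adjacency)
times = spreading_times(adjacency, seed="a")

most_central = max(centrality, key=centrality.get)
print("Most central node:", most_central)
print("Infection times:", times)
```

In this toy case the most connected node also reaches the rest of the network quickly, which is the kind of link between structure and dynamics that the spreading topic explores.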
Students will select, analyse and present a network of their choice as part of a personal project for the course. They will also choose an advanced topic in network science & big data, on which they will give a presentation in a flipped classroom setting. In particular, they will contribute to a Wikipedia page about that topic.
Data Efforts in Big Data for Mental Health part:
In this part, students will be presented with topics related to the infrastructure of ‘big data’. They will be introduced to barriers, current trends, types, protocols and importance of ‘big data’ collection in the sphere of mental health, specifically through the
(i) Healthy Brain Network project, which collects and shares neuroimaging & phenotypic data for 10,000 children.
Students will also contribute to the development of:
(ii) A Linked Semantic Mental Health Database and scientific framework mapping signs, symptoms and behaviors to subjective and objective measures, projects and technologies (https://github.com/ChildMindInstitute/mhdb/wiki)
(iii) MindLogger Data Collection Platform & App to dramatically improve the convenience, consistency, efficiency, accuracy & analysis of widely distributed data efforts (https://mindlogger.org/)
Students will then spend the last part of the course working on a research project developing and applying digital tools related to ‘big data’ and mental health, using the skills obtained from the first part of the course.
Course objectives
- Theoretical foundations of big data management and network science
- Analysis and visualisation of real-world network data
- Contributing to the development of digital tools (a semantic database & an app) to diagnose, assess, monitor, analyse and improve mental health
- Personal research projects related to data mining, analysis of a real-world network, and data for mental health
Key concepts
Network science; Data science; Data visualisation; Big data; Mental Health; Digital tools; Data collection platforms; Ontologies
Resources in Network Science
Introductory material on networks:
- Introductory interactive textbook by A-L Barabasi: http://networksciencebook.com/
- Chapter 2 for network metrics
- Chapter 9 for community detection
- An introduction to network visualisation:
- BASIC
- INTERMEDIATE
- ADVANCED
Network databases:
- Index of Complex Networks (ICON): https://icon.colorado.edu/ (5,000+ networks)
- Network repository: http://networkrepository.com/ (many visualisation tools are already available on the website)
- On Github:
- Deezer Social Networks, Facebook Page-Page Networks, Wikipedia Article Networks: https://github.com/benedekrozemberczki/datasets
- A Repository of Benchmark Graph Datasets for Graph Classification (31 graph datasets in total): https://github.com/shiruipan/graph_datasets
- Repository of Network repositories: https://github.com/ComplexNetTSP/ComplexNetWiki/wiki/Networks-datasets
- Your own!! Any dataset with two columns can be a network after all… Why not try with your favorite data?
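To make the "two columns" remark concrete, here is a minimal sketch using only the Python standard library. The person/film table is invented for illustration: each row is read as an edge between the value in the first column and the value in the second, and a node's degree is the number of rows it appears in.

```python
import csv
import io
from collections import Counter

# Hypothetical two-column dataset: each row links a person to a film they liked.
RAW = """person,film
Ana,Alien
Ben,Alien
Ben,Brazil
Cleo,Brazil
"""


def read_edges(text):
    """Parse two-column CSV text into a list of (source, target) edges."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [(row[0].strip(), row[1].strip()) for row in reader if len(row) >= 2]


edges = read_edges(RAW)

# Count how many edges touch each node (its degree).
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

print(edges)
print(degree.most_common())
```

For a real file, the same function could be fed the contents of a CSV export. The resulting edge list is also the kind of input that tools such as Gephi or networkx typically accept.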
For visualisations:
- Gephi https://gephi.org/ for simple graph visualisation
- Introduction to Gephi: http://www.martingrandjean.ch/gephi-introduction/
- Cytoscape https://cytoscape.org/ for more fine-grained visualisation
- Introduction to cytoscape https://github.com/cytoscape/cytoscape-tutorials/wiki
- D3.js for interactive visualisations: https://www.d3-graph-gallery.com/network
- Cytoscape.js for other interactive visualisations: http://js.cytoscape.org/
- R by following the (amazing) guide from https://kateto.net/network-visualization
- Paper and a pen. Sometimes that is all it takes: https://benfry.com/exd09/
For analysis:
- R https://www.rstudio.com/ with the igraph library (some intro here: https://kateto.net/networks-r-igraph)
- Python https://www.python.org/ with the networkx library (https://networkx.github.io/)
Other resources
- Exploring complex systems (not just networks): http://www.complexity-explorables.org/
- About collective phenomena and emergence: https://youtu.be/16W7c0mb-rE
Papers about network science:
Big Data & Mental Health:
Linked Semantic Mental health Database:
MindLogger Data Collection Platform & App:
Healthy Brain Network Data Collection Project:
- https://healthybrainnetwork.org/
- http://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network/index.html
News, Tutorials, Explorations and Data Adventures of a mental health lab:
Follow a Scientist – Projects and Papers :