Marios Papachristou Personal Homepage

Datasets

Here you will find a collection of datasets available in the public domain for various tasks. I strive to make the data I use for my research open. If anything is not working, please contact me at papachristoumarios@cs.cornell.edu.

Under construction

Social Networks

Opinion Dynamics

Data in this category consist of unweighted directed networks in which each node carries a multi-dimensional binary (0-1 valued) label indicating whether or not the node endorses each opinion. The file X.edges contains the directed edges between the nodes of the network, and the file X.feat contains a dense feature matrix in which the first entry of each row is the corresponding node id.

  • pokec. Derived from soc-pokec. The data contain users of the Pokec social network; users with private information have been filtered out. The attributes of each user are derived from the interests listed in his/her profile (described in the original network).
  • github. Contains data gathered from GHTorrent with the queries described in this gist. Nodes are GitHub users, and attributes are the programming languages that each user has programmed in as the owner of a project.
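
The X.edges / X.feat layout described above can be read with a few lines of Python. This is only a minimal sketch: it assumes that both files are whitespace-separated text with one record per line, that lines starting with "#" are comments, and that feature values parse as numbers. None of these details is stated on this page, so check them against the actual files. The function names load_edges and load_features are hypothetical helpers, and the in-memory strings stand in for real files.

```python
import io


def load_edges(handle):
    """Read directed edges, assuming one 'source target' pair per line."""
    edges = []
    for line in handle:
        parts = line.split()
        # Skip blank lines and (assumed) comment lines.
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        # Node ids are kept as strings so no id format is assumed.
        edges.append((parts[0], parts[1]))
    return edges


def load_features(handle):
    """Read the dense feature matrix: node id first, then its 0-1 labels."""
    features = {}
    for line in handle:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        node_id = parts[0]
        features[node_id] = [float(value) for value in parts[1:]]
    return features


# Tiny in-memory stand-ins for X.edges and X.feat (made-up toy data).
edges = load_edges(io.StringIO("0 1\n1 2\n2 0\n"))
features = load_features(io.StringIO("0 1 0\n1 0 1\n2 1 1\n"))
```

With real data, replace the io.StringIO objects with open("X.edges") and open("X.feat"), where X is the dataset name.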

Call Graphs

These datasets contain call graphs derived using the CScout tool. Each file represents a directed call graph where each line corresponds to a directed edge between two entities (files, functions, etc.).
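
An edge-per-line call graph of this kind can be turned into an adjacency map and traversed as follows. This is a sketch under assumptions: it supposes that each line holds exactly two whitespace-separated entity names (caller first, callee second), which this page does not specify. The names load_call_graph and reachable are hypothetical helpers, and the sample edges are made up.

```python
import io
from collections import deque


def load_call_graph(handle):
    """Build {caller: set of callees}, assuming 'caller callee' per line."""
    graph = {}
    for line in handle:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        caller, callee = parts
        graph.setdefault(caller, set()).add(callee)
        graph.setdefault(callee, set())  # make sure leaf entities appear too
    return graph


def reachable(graph, start):
    """Return every entity reachable from start, using breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Made-up toy call graph standing in for one of the dataset files.
graph = load_call_graph(io.StringIO("main parse\nmain run\nrun parse\n"))
from_main = reachable(graph, "main")
```

If the entity names in the files contain spaces or use another delimiter, the line.split() call needs to be adjusted accordingly.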

Other resources