Here you will find a collection of datasets available in the public domain for various tasks. I strive to make the data I use in my research open. If anything is not working, please contact me at papachristoumarios@cs.cornell.edu.
Data in this category consist of unweighted directed networks in which each node has a multi-dimensional binary (0-1 valued) label indicating whether the node endorses each of a set of opinions. The file X.edges contains the directed edges between the nodes of the network, and the file X.feat contains a dense feature matrix in which the first entry of each row is the corresponding node id.
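A minimal sketch of how the two files might be read in Python is given below. It assumes whitespace-separated integer fields, with the source node first on each edge line; these are assumptions about the on-disk layout rather than a documented specification, and the function names `parse_edges` and `parse_features` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_edges(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse directed edges, one 'source target' pair per line.

    Assumption: fields are whitespace-separated integer node ids.
    """
    edges = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        edges.append((int(parts[0]), int(parts[1])))
    return edges


def parse_features(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Parse the dense feature matrix into {node_id: [0/1 labels]}.

    The first entry of each row is taken as the node id; the
    remaining entries are assumed to be the 0-1 valued labels.
    """
    features = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        features[int(parts[0])] = [int(value) for value in parts[1:]]
    return features
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example `parse_edges(open("X.edges"))`. If the files turn out to use commas or tabs as separators, the `split()` calls would need to be adjusted accordingly.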
These datasets contain call graphs derived using the CScout tool. Each file represents a directed call graph in which each line corresponds to a directed edge between two entities (files, functions, etc.).
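As an illustration of working with such an edge-per-line file, the sketch below builds an adjacency map and finds everything reachable from a given entity. It assumes each line holds exactly two whitespace-separated entity names with the source first; that layout is an assumption, and `build_call_graph` and `reachable` are hypothetical helper names, not part of CScout.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


def build_call_graph(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Build {source: set of targets} from 'source target' lines."""
    graph = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        source, target = parts
        graph[source].add(target)
        graph.setdefault(target, set())  # keep sink nodes in the map
    return dict(graph)


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Return all entities reachable from start via breadth-first search.

    The start entity itself is excluded from the result.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen
```

Entity names that contain spaces would be skipped by this parser, so a different separator may be needed depending on how the files were exported.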
In these datasets, the nodes of a hypergraph represent users, and the hyperedges represent repositories, org members, etc. that these users belong to. Each user comes with features (such as number of commits, number of followers, etc.) used in the experiments for this paper. We provide the SQL queries to create the datasets based on the GHTorrent MySQL schema. The post-processed datasets follow the convention of these hypergraph datasets.