
B. Technical Summary: Graph Database Technology

For the analysis presented in this article, we populated a Neo4J graph database with our research data according to the schema described above. This research graph comprised roughly 46,000 vertices and 144,000 edges. We currently use the distributed graph database Titan to maintain our production dataset, consisting of approximately 400 million vertices and 1.2 billion edges. Because TinkerPop is graph-vendor agnostic, we are able to use the same tools to manipulate both our Titan production graph and our Neo4J research graph.

We built a custom DSL using Gremlin, the graph traversal language built into TinkerPop. The DSL composes custom graph traversals, queries, and calculations that can be executed at various scopes in the graph, such as a whole course, a whole discussion, a single thread, or a group or individual over time. Queries can generate sub-graphs that can be used for visualization or for testing traversals, statistical methods, machine learning techniques, and other approaches. While we provide limited examples to illustrate our approach, an in-depth discussion of these technologies is beyond the scope of this paper. You can learn more about them at http://tinkerpop.com and https://github.com/tinkerpop.

Gremlin enables the flexible construction of traversals for exploratory data analysis in the graph. For example, where 'g' is the graph and 'V' is the set of all vertices in the graph, the following Gremlin query would generate a list of all concepts mentioned by a person named Renlit over the history of all of Renlit's responses:

g.V.has('personName', 'Renlit').out('wrote').out('mentions')

In this manner, we can construct complex and unanticipated queries to explore and interrogate the data, and evolve new queries based on emergent understandings of the data. Query results are themselves graphs, which can be used for visualization and other analytical work.

If an interesting metric is discovered, it can be codified as an algorithm, expressed as a 'step', and used inline with other Gremlin commands. For example, imagine we have created a method for determining whether a person is a 'Thought Leader' in a course, based on some graph traversal. We could express that algorithm in a Gremlin step called isThoughtLeader, and use that step to discover all concepts discussed by thought leaders as follows:

g.V.has('type', 'person').isThoughtLeader.out('wrote').out('mentions')

The output of such algorithms can be tested and used to inform learning environment design, or studied in conjunction with other factors in the course of ongoing research.

VI - RQ1 Findings: Can we identify, differentiate and visualize individual attributes and behaviors in an online discussion or course?

A. RQ1 Conceptual Overview

There are many kinds of learner data available, depending on the environment, activity, platform, or product under study. In a general conversational context, much of what we can know about a person is derived from:

• What they contribute: Number, size, content, and attributes of individual comments
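Contribution attributes such as these can be read directly from the graph with short Gremlin traversals in the same style as the queries in the technical summary above. The following is a minimal sketch, not the authors' production DSL: it reuses the 'personName' and 'wrote' labels from the earlier example, and the 'wordCount' property on response vertices is a hypothetical addition for illustration, not part of the schema described above. It also assumes each response vertex actually carries that property.

// Number of individual comments written by one person
g.V.has('personName', 'Renlit').out('wrote').count()

// Combined size of those comments, summing a hypothetical
// 'wordCount' property on each response vertex
g.V.has('personName', 'Renlit').out('wrote').wordCount.toList().sum()

Unlike the sub-graph-producing queries discussed earlier, traversals of this kind return plain numbers, so their results can be used directly as per-person features alongside graph-shaped query results.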