Internet Learning Volume 3, Number 2, Fall 2014 | Page 82
Visualizing Knowledge Networks in Online Courses
B. Technical Summary: Graph Database Technology
For the analysis presented in this article,
we populated a Neo4J graph database
with our research data according to
the schema described above. This research
graph comprised roughly 46,000 vertices and
144,000 edges. We currently use the distributed
graph database Titan to maintain our
production dataset, consisting of approximately
400 million vertices and 1.2 billion
edges. Because TinkerPop is vendor-agnostic across
graph databases, we are able to use the same tools to
manipulate both our Titan production graph
and our Neo4J research graph. We built a custom
DSL using Gremlin, the graph traversal
language built into TinkerPop. The DSL
composes custom graph traversals, queries,
and calculations that can be executed in various
contexts in the graph, such as for a whole
course, a whole discussion, a single thread,
or a group or individual over time. Queries
can generate sub-graphs that can be used
for visualization, or to test traversals, statistical
methods, machine learning techniques,
or other approaches. While we will provide
limited examples to illustrate our approach,
an in-depth discussion of these technologies
is beyond the scope of this paper. You can
learn more about them at http://tinkerpop.com
and https://github.com/tinkerpop.
Gremlin enables the flexible construction
of traversals for exploratory data
analysis in the graph. For example, where ‘g’
is the graph, and ‘V’ is the set of all vertices
in the graph, the following Gremlin query
would generate a list of all concepts mentioned
by a person named Renlit over the
history of all of Renlit’s responses:
g.V.has('personName', 'Renlit').out('wrote').out('mentions')
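To make the semantics of this traversal concrete, the following is a minimal sketch in plain Python rather than Gremlin. It does not use TinkerPop or any graph database. The ToyGraph class, the vertex identifiers, the person named Avery, the concept names, and the 'name' property on concept vertices are all assumptions made for illustration; only the 'personName' property and the 'wrote' and 'mentions' edge labels come from the query above.

```python
class ToyGraph:
    """Tiny in-memory property graph (a hypothetical stand-in, not TinkerPop)."""

    def __init__(self):
        self.props = {}   # vertex id -> dict of properties
        self.edges = {}   # (source id, edge label) -> list of target ids

    def add_vertex(self, vid, **props):
        self.props[vid] = props

    def add_edge(self, src, label, dst):
        self.edges.setdefault((src, label), []).append(dst)

    def has(self, key, value):
        """Rough analogue of g.V.has(key, value): ids of matching vertices."""
        return [v for v, p in self.props.items() if p.get(key) == value]

    def out(self, vids, label):
        """Rough analogue of .out(label): follow outgoing edges with that label."""
        return [dst for v in vids for dst in self.edges.get((v, label), [])]


g = ToyGraph()
# Illustrative data only; none of it comes from the research graph.
g.add_vertex("p1", type="person", personName="Renlit")
g.add_vertex("p2", type="person", personName="Avery")
for rid in ("r1", "r2", "r3"):
    g.add_vertex(rid, type="response")
g.add_vertex("c1", type="concept", name="graphs")
g.add_vertex("c2", type="concept", name="learning")
g.add_vertex("c3", type="concept", name="assessment")

for src, label, dst in [
    ("p1", "wrote", "r1"), ("p1", "wrote", "r2"), ("p2", "wrote", "r3"),
    ("r1", "mentions", "c1"), ("r2", "mentions", "c1"),
    ("r2", "mentions", "c2"), ("r3", "mentions", "c3"),
]:
    g.add_edge(src, label, dst)

# person named Renlit -> responses they wrote -> concepts those responses mention
responses = g.out(g.has("personName", "Renlit"), "wrote")
concepts = g.out(responses, "mentions")
concept_names = [g.props[c]["name"] for c in concepts]
print(concept_names)
```

In this sketch a concept appears once for every response that mentions it, so repeated mentions are kept rather than collapsed. A deduplicated list would require an explicit extra step.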
In this manner, we can construct
complex and unanticipated queries to explore
and interrogate the data, and evolve
new queries based on emergent understandings
of the data. Query results are themselves
graphs, which can be used for visualization
and other analytical work. If an interesting
metric is discovered, it can be codified as
an algorithm, expressed as a ‘step’ and used
inline with other Gremlin commands. For
example, imagine we have created a method
for determining whether or not a person
is a ‘Thought Leader’ in a course, based
on some graph traversal. We could express
that algorithm in a Gremlin step called isThoughtLeader,
and use that step to discover
all concepts discussed by thought leaders as
follows:
g.V.has('type','person').isThoughtLeader.out('wrote').out('mentions')
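The idea of a codified metric used inline as a filtering step can be sketched in plain Python as follows. This is not the authors' isThoughtLeader algorithm, which is not specified here. The rule used below (a person who wrote at least two responses) is a placeholder assumption, as are the toy data, the helper function names, and the min_responses parameter.

```python
# Hypothetical toy data: vertex properties and labeled outgoing edges.
VERTICES = {
    "p1": {"type": "person", "personName": "Renlit"},
    "p2": {"type": "person", "personName": "Avery"},
    "r1": {"type": "response"},
    "r2": {"type": "response"},
    "r3": {"type": "response"},
    "c1": {"type": "concept", "name": "graphs"},
    "c2": {"type": "concept", "name": "learning"},
    "c3": {"type": "concept", "name": "assessment"},
}
EDGES = {
    ("p1", "wrote"): ["r1", "r2"],
    ("p2", "wrote"): ["r3"],
    ("r1", "mentions"): ["c1"],
    ("r2", "mentions"): ["c2"],
    ("r3", "mentions"): ["c3"],
}


def has(key, value):
    """Ids of vertices whose property `key` equals `value`."""
    return [v for v, p in VERTICES.items() if p.get(key) == value]


def out(vids, label):
    """Follow outgoing edges with the given label from each vertex id."""
    return [dst for v in vids for dst in EDGES.get((v, label), [])]


def is_thought_leader(vids, min_responses=2):
    """Filter step with a placeholder rule (an assumption, not the authors'
    metric): keep people who wrote at least `min_responses` responses."""
    return [v for v in vids if len(out([v], "wrote")) >= min_responses]


# all people -> keep thought leaders -> their responses -> mentioned concepts
leaders = is_thought_leader(has("type", "person"))
concepts = out(out(leaders, "wrote"), "mentions")
leader_concepts = [VERTICES[c]["name"] for c in concepts]
print(leader_concepts)
```

Because the filter takes and returns a list of vertex ids, it can be placed anywhere in the chain, which mirrors how a custom step sits between other Gremlin commands.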
The output of such algorithms can
be tested and used to inform learning environment
design, or studied in conjunction
with other factors in the course of ongoing
research.
VI - RQ1 Findings: Can we identify, differentiate
and visualize individual attributes
and behaviors in an online discussion or
course?
A. RQ1 Conceptual Overview
There are many kinds of learner data
available, depending on the environment,
activity, platform, or product
under study. In a general conversational
context, much of what we can know about a
person is derived from:
• What they contribute: Number, size,
content, and attributes of individual
comments