Category: Coding

Multiserver DoOO Data Management

This post can only possibly appeal to about 12 people and only when they’re really in the mood for weedsy code stuff. However, that’s about my normal readership, so here we go…

For the OU Create project, we now have 5 servers that are managed by Reclaim Hosting. We’ve got more than 4000 users, and, collectively, they have more than 5000 websites. Keeping track of all of the usernames, domains, and apps in use is difficult.

One of the ways that we study the digital ecology of these servers is by looking at the data.db files created by each server's instance of Installatron. These database files keep track of all of the web apps installed using Installatron, so we have a record of every installation of WordPress, Drupal, Omeka, MediaWiki, or any of the other 140 or so apps with Installatron one-click installation packages. I can find the user's name, the date, and the domain, subdomain, or subdirectory used for each installation. However, within the data.db file, the data is spread across five tables, and it's a SQLite file, so it's not exactly a quick or easy read. Further complicating everything, each server has its own data.db file, and each one is buried deep in the directory structure amongst the tens of thousands of files on the server.
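If you just want to poke around in one of these files yourself, Python's built-in sqlite3 library will open it. Here's a minimal sketch, assuming you've already downloaded a copy of data.db; the i_installs table is the one described below, but the exact column layout will vary by Installatron version:

import sqlite3

# Open a downloaded copy of one server's Installatron database
connection = sqlite3.connect("data.db")
c = connection.cursor()

# List the tables in the file
c.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(c.fetchall())

# Dump the install records: one row per app installation
c.execute("SELECT * FROM i_installs")
for row in c.fetchall():
    print(row)

connection.close()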

Here at OU, we have a couple of websites that grew out of studying the data.db files. The first is community.oucreate.com/activity.

This site is built on FeedWordPress. We feed it the URLs for each WordPress site in OU Create, and it aggregates the blog activity into one feed averaging 300+ posts a week.

The other site is community.oucreate.com/sites.

[Screenshot of community.oucreate.com/sites, 2017-12-06]

Sites provides a filterable set of cards displaying screen captures, links, and app metadata for all of the sites in OU Create. You can see what sites are running Omeka or Vanilla Forums or whatever other app you’d like.

To maintain these sites, I would normally log in to each server, navigate up one level from the default directory to the server root, go into the var directory, then the installatron directory, and download the data.db file. Once I've downloaded it, I use some software to open the file and export the i_installs table to a CSV. Then I find the installations since my last update of the system, copy the URLs, and paste them into the Activity site or run a script against them for the Sites site. I repeat this for each server. This process is terrible and I hate doing it, so it doesn't get done as often as it should.

This week, I wrote some code to do these updates for me. At the most basic level, the code uses secure shell (SSH) to log in to a server and download a desired file. My version of the code loops (repeats) over each of my five servers, downloading the data.db file from each and storing them all in one folder. Here is the code; below, I'll explain how I got here and why:

import os
import paramiko
import sqlite3
import csv

# The keypasses list holds the passwords for the keys to each of the servers
keypasses = ["xxxxxxxxxx", "xxxxxxxxxx", "xxxxxxxxxx", "xxxxxxxxxx", "xxxxxxxxxx"]
counter = 1
csvWriter = csv.writer(open("output.csv", "w"))

# Loop through the keypass list, downloading each server's data.db file
for keypass in keypasses:
    db = "data%s.db" % counter
    servername = "oklahoma%s.reclaimhosting.com" % counter
    keyloc = "/Users/johnstewart/Desktop/CreateUserLogs/id_oklahoma%s" % counter
    k = paramiko.RSAKey.from_private_key_file(keyloc, password=keypass)
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    print("oklahoma%s connecting" % counter)
    ssh.connect(servername, username="root", pkey=k)
    print("connected")
    sftp = ssh.open_sftp()
    localpath = '/Users/johnstewart/Desktop/CreateUserLogs/data%s.db' % counter
    remotepath = '../var/installatron/data.db'
    sftp.get(remotepath, localpath)
    sftp.close()
    print("data%s.db downloaded" % counter)
    ssh.close()
    # Export the desired table from the database to the aggregated csv
    with sqlite3.connect(db) as connection:
        c = connection.cursor()
        c.execute("SELECT * FROM i_users")
        rows = c.fetchall()
        csvWriter.writerows(rows)
        print("%s i_users exported to csv" % db)
    counter = counter + 1

 

On Monday, I wrote the first draft of this in bash and then rewrote it in a similar language called expect. With expect, I could SSH into a server and then respond to the various login prompts with the relevant usernames and passwords. However, this meant storing the passwords in plain text in the script file: anyone who got hold of the file would have the credentials for every server, and typing passwords at login prompts also leaves them open to interception. This is obviously not the best way to do things.

The solution was to use an SSH key. The key stays on your local computer and provides an encrypted credential for logging in to the server. You in turn use a passphrase to unlock the key on your own machine, so there's no 'man in the middle.' Unfortunately, I had no idea how to do this. Luckily for me, Tim Owens is a fantastical web host and has a video explaining how to set up keys on Reclaim accounts:

I set up keys for each of the servers and saved them into my project folder. This also denecessitated the ‘expect’ script because I no longer needed to enter a password for each server.

I turned back to a bash shell script but couldn't figure out what to do with my .db files once I had downloaded them all. This morning I turned from bash to Python, which is very good at handling data files. Python also has the paramiko library, which simplifies the process of logging in to and downloading files from servers. You can see in the loop portion of the code above where I call several paramiko functions to establish a connection to the server and then use SFTP to get the file I want.

Our servers are labeled numerically oklahoma1 through oklahoma5, and I had saved my keys for each server as id_oklahoma1 through id_oklahoma5, so it was easy to repeat the basic procedure for each server by writing a loop that runs five times. Each pass through the loop increments the number used in the server and key names, and saves the corresponding data.db file locally as data1.db, data2.db, and so on.

The last step was to use Python to compile the desired data from each of these data.db files. The sqlite3 library provides the needed methods for handling the database files: I could connect to each database after downloading it, select the table I wanted, and "fetch" all of its rows. From there, the csv library writes those rows to a CSV (an Excel-like, comma-separated values table). This whole process sits inside the larger loop, so each time I pulled a database from a server, I added its table rows to the collective CSV.

For me, this process will make it easy to pull a daily update of all of the domains in OU Create and then upload those to my two community websites. As we follow Tom Woodward and Marie Selvanadin's work on the Georgetown Community site, these up-to-date lists of sites will make it easier to build sites that pull together information from the APIs of the many OU Create websites. The process could also be generalized to pull or post any files across a set of similar servers, allowing for quicker maintenance and analysis projects. Hopefully Lauren, Jim, and Tim will find fun applications as they continue their Reclaim Hosting server tech trainings.

Twine Game Data to Google Sheets via Javascript

Using Twine, a free, open-source, text-based game software, you can build choose-your-own-adventure games that explore the untaken paths in literature, promote empathy through simulated experiences, and provide role-playing adventures in historical scenarios. Twine games are often used in the classroom because you can quickly build an educational experience about whatever subject you choose. They are also heavily used in the interactive fiction world as a medium for short stories and novels.

In the XP Twine workshop that Keegan and I are leading, several of the faculty members asked how Twine games could be used to track students' understanding of concepts. One faculty member is building a game that simulates entrepreneurial business investment. In the game, students can try out different investment strategies and equity stakes as they try to become successful venture capitalists. The professor wanted to be able to track the choices students made in the game in order to spur in-class discussion.

Twine games take the form of HTML files with embedded CSS and JavaScript. In my latest round of tinkering, I figured out how to use JavaScript within a Twine game to send an HTTP POST request that passes game-play data to a Google Spreadsheet, thereby creating a database that records each play-through.

Google Sheet/Apps Script Code

In order to track this game data, I suggested that we push the data from Twine to a Google Spreadsheet. Following the lead of Tom Woodward, I've found that Google Spreadsheets are a relatively easy place to collect and analyze data. I wanted to use Google Apps Script, which is mostly JavaScript plus a few custom services, to receive the data and parse it into the cells of the Google Sheet.

Martin Hawksey wrote a blog post a few years ago called "Google Sheets as a Database – INSERT with Apps Script using POST/GET methods (with ajax example)." Martin had set up an Ajax form, embeddable in any website, that passed data to his Google Apps Script, which then recorded it in his Google Sheet. Martin's code (below) receives an HTTP GET or POST call generated by the form, parses the parameters of that call, and stores them in a Google Sheet. Martin also provides comments in his code to help users customize the script and deploy it as a Web App.

//  1. Enter sheet name where data is to be written below
        var SHEET_NAME = "DATA";
 
//  2. Run > setup
//
//  3. Publish > Deploy as web app 
//    - enter Project Version name and click 'Save New Version' 
//    - set security level and enable service (most likely execute as 'me' and access 'anyone, even anonymously') 
//
//  4. Copy the 'Current web app URL' and post this in your form/script action 
//
//  5. Insert column names on your destination sheet matching the parameter names of the data you are passing in (exactly matching case)
 
var SCRIPT_PROP = PropertiesService.getScriptProperties(); // new property service
 
// If you don't want to expose either GET or POST methods you can comment out the appropriate function
function doGet(e){
  return handleResponse(e);
}
 
function doPost(e){
  return handleResponse(e);
}
 
function handleResponse(e) {
  // shortly after my original solution Google announced the LockService[1]
  // this prevents concurrent access overwriting data
  // [1] http://googleappsdeveloper.blogspot.co.uk/2011/10/concurrency-and-google-apps-script.html
  // we want a public lock, one that locks for all invocations
  var lock = LockService.getPublicLock();
  lock.waitLock(30000);  // wait 30 seconds before conceding defeat.
 
  try {
    // next set where we write the data - you could write to multiple/alternate destinations
    var doc = SpreadsheetApp.openById(SCRIPT_PROP.getProperty("key"));
    var sheet = doc.getSheetByName(SHEET_NAME);
 
    // we'll assume header is in row 1 but you can override with header_row in GET/POST data
    //var headRow = e.parameter.header_row || 1; Hawksey's code parsed parameter data
    var postData = e.postData.contents; //my code uses postData instead
    var data = JSON.parse(postData); //parse the postData from JSON
    var headers = sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues()[0];
    var nextRow = sheet.getLastRow()+1; // get next row
    var row = []; 
    // loop through the header columns
    for (i in headers){
      if (headers[i] == "Timestamp"){ // special case if you include a 'Timestamp' column
        row.push(new Date());
      } else { // else use header name to get data
        row.push(data[headers[i]]);
      }
    }
    // more efficient to set values as [][] array than individually
    sheet.getRange(nextRow, 1, 1, row.length).setValues([row]);
    // return json success results
    return ContentService
          .createTextOutput(JSON.stringify({"result":"success", "row": nextRow}))
          .setMimeType(ContentService.MimeType.JSON);
  } catch(e){
    // if error return this
    return ContentService
          .createTextOutput(JSON.stringify({"result":"error", "error": e}))
          .setMimeType(ContentService.MimeType.JSON);
  } finally { //release lock
    lock.releaseLock();
  }
}
 
function setup() {
    var doc = SpreadsheetApp.getActiveSpreadsheet();
    SCRIPT_PROP.setProperty("key", doc.getId());
}

I edited Martin's original code at the point where it reads the incoming data (the commented-out header_row line and the two postData lines in handleResponse above). In his code, he's looking for POST data in a slightly different format than what I generate: rather than reading the parameters of the HTTP POST, my code parses the JSON body of the POST.

Twine Game Code

Rather than using an Ajax form, I wanted to pass variables that had been collected during gameplay in a Twine game. Twine is built on JavaScript, so I decided to replace Martin's Ajax form with a JavaScript HTTP POST function embedded in Twine. Based on research into how Twine works, I decided that the best way to do this would be to write the JavaScript code directly into a Twine game passage. My passage, called PostData, would presumably come at or very near the end of my game, after all interesting variables have been set:

[Screenshot of the PostData passage in Twine]

I wrapped Twine's script syntax <script></script> around a standard XMLHttpRequest call. This sends an HTTP POST to whatever URL is provided (such as the URL for your Google Apps Script Web App) with a JSON package defined in the var data line. Here's the code (please note that you need to add the <script></script> tags yourself in Twine):

var xhr = new XMLHttpRequest();
var url = "URL for the Google App"; // paste your Web App URL here
xhr.open("POST", url, true);
xhr.setRequestHeader("Content-type", "application/json; charset=UTF-8");
// bundle the Harlowe story variables you want to track into a JSON string
var data = JSON.stringify({"var1": harlowe.State.variables['var1'], "var2": harlowe.State.variables['var2'], "var3": harlowe.State.variables['var3']});
xhr.send(data);

However, in order to pull variables out of the Harlowe story format that I was using, I also needed to add a bit of code to the game's Story JavaScript.


This bit of JavaScript passes all variables defined within the Twine game into a global object (window.harlowe) that is accessible to JavaScript code embedded in the game. Here's the code in case you want to try this out:

// expose Harlowe's State object (including story variables) to passage scripts
if (!window.harlowe){
	window.harlowe = {"State": State};
}

I hope this work will be useful for studying any Twine game to see how players are moving through it. You could record any variables in the game, and also the game's 'history,' to see which passages each player visited. This has obvious uses for educational games in being able to provide feedback to players, but it also has implications for game design more broadly, given the increasing use of metrics.

Implement in your own game

In order to implement this for your own game, I would suggest following these steps:

  1. Copy the JavaScript code above (the snippet starting with if (!window.harlowe)) into your Twine game's JavaScript panel
  2. Copy the PostData code above and paste it into a TwinePost passage towards the end of your game
  3. Then replace the variables in the TwinePost passage so that harlowe.State.variables['var1'] becomes harlowe.State.variables['your variable name here'] for each of the variables you want to track
  4. Click this link to get a copy of my Google Spreadsheet
  5. Make sure the column headers in the spreadsheet match your variable names from the TwinePost passage
  6. In the Google Sheet, click on Tools->Script Editor and follow Martin Hawksey’s instructions for steps 2-5
  7. When you publish your Script as a Web App, it will give you a URL for the Web App. Copy this URL and paste it into the URL variable in your TwinePost passage code.
  8. You’re done. Play your game and see if everything works. If it doesn’t work, tweet at Tom Woodward. He’s good at fixing code and has nothing but free time on his hands.

I am excited about this code because it answers a question for several of our faculty members and makes Twine games more useful as formative assessments. Hawksey did an excellent job in keeping his code very generalized, and I’ve tried to preserve that, so that you can track whatever variables you want.

You could also use the HTTP POST JavaScript code outside of Twine, in any other website or web app, to pass information to your Google Sheet. Tom has blogged a couple of times about using code to send data to Google Forms and autosubmit it into a Google Spreadsheet. I think the process described above denecessitates that Google Form pass-through and moves us a step closer to Google Sheets as a NoSQL database alternative.
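For what that might look like outside of Twine, here's a minimal sketch that reduces the pattern to a plain function. The postToSheet name and the sample fields are hypothetical placeholders, not part of the code above:

// Minimal sketch: POST any flat object to a Google Apps Script Web App.
// postToSheet and the sample fields below are hypothetical placeholders.
function postToSheet(webAppUrl, data) {
  var xhr = new XMLHttpRequest();
  xhr.open("POST", webAppUrl, true);
  xhr.setRequestHeader("Content-type", "application/json; charset=UTF-8");
  xhr.send(JSON.stringify(data));
}

// The object keys should match the column headers in your destination sheet
postToSheet("URL for the Google App", {"score": 42, "ending": "investor"});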

Using Reclaim Hosting’s Server Status API in WordPress

I am extremely excited about a server status page that I just published for OU Create.

The page simply displays the current status, operational or otherwise, of the two Reclaim Hosting servers that run our OU Create service. The page itself is boring and purely informational, but the code that powers it is exciting to me.

After a recent, short outage, Tim Owens at Reclaim Hosting and Tom Woodward at VCU started a conversation about using RSS to post notifications of outages for each of the schools using Reclaim Hosting servers.

Drawing on this new Server Status API, Tom wrote a Google Apps Script that posts a tweet from the ALTLab Twitter account whenever the servers are down. Tom's code runs an HTTP GET request against the Reclaim API every five minutes. It then parses the JSON response to check whether their server is 'Operational.' If it is, nothing happens; if it's not, the script generates a tweet.
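Tom's actual script lives on his blog, but the core check might look something like this minimal sketch. The endpoint URL, the shape of the JSON, and the server name here are assumptions, so inspect the real API response before relying on it; the tweet itself is stubbed out with a log line:

// Run this on a five-minute time-driven trigger in Apps Script.
// The URL and JSON field names are assumptions; check the actual API response.
function checkServerStatus() {
  var url = "https://status.reclaimhosting.com/api/v1/components"; // hypothetical endpoint
  var response = UrlFetchApp.fetch(url);
  var components = JSON.parse(response.getContentText()).data;
  for (var i = 0; i < components.length; i++) {
    var c = components[i];
    if (c.name === "oklahoma1" && c.status_name !== "Operational") {
      Logger.log(c.name + " is " + c.status_name); // Tom's version sends a tweet here instead
    }
  }
}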

Inspired by Tom's work, I wanted to use the API to add a server-status page to the OU Create account. To do this, I added a plugin called 'Insert PHP' to the OU Create WordPress account, which allows us to write PHP code directly into posts and pages and display the results. I then used Google and advice from our coding guru, Kerry Severin, to write this short PHP code.

The code first runs WordPress's version of the HTTP GET operation to pull in the Reclaim Hosting server status information as a string, an unstructured run of characters. Then it uses a PHP function to parse that string into an array of JSON attributes. Next, it sets variables equal to the relevant key-value pairs within that array to read the status of the two OU servers. The final if/else statement prints the status of the servers with either a green check mark if they are operational or a red 'X' if they're not (these images were modified from Wikimedia files and would need to be swapped out when adapting this code).
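The real code is in the GitHub repo linked below, but a stripped-down sketch of that flow looks roughly like the following. The API URL, the shape of the decoded array, and the image paths are all placeholders, not the actual values:

<?php
// Sketch of the server status check for use inside the 'Insert PHP' plugin.
// The URL, array shape, and image paths are placeholders; inspect the
// actual Reclaim status API response before adapting this.
$response = wp_remote_get( 'https://status.reclaimhosting.com/api/example' );
$body     = wp_remote_retrieve_body( $response ); // the raw JSON string
$servers  = json_decode( $body, true );           // parse JSON into a PHP array

foreach ( array( 'oklahoma1', 'oklahoma2' ) as $name ) {
    // read this server's status out of the decoded array (shape assumed)
    $status = $servers[ $name ]['status'];
    if ( 'Operational' === $status ) {
        echo '<p>' . $name . ': <img src="check.png" alt="operational"></p>';
    } else {
        echo '<p>' . $name . ': <img src="x.png" alt="down"></p>';
    }
}
?>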

This code is shared on GitHub and should be readily adaptable for anyone else who wants to produce a similar WordPress page. Simply swap out the PHP opening and closing tags per the instructions in the 'Insert PHP' plugin, change the server numbers to your own (dropping the code for the second server if you only have one), and don't forget to change the URLs for the image files (ours were open-sourced from Wikimedia).
