Etherpad-lite performances - an ongoing saga

This all started with Julie writing to me to ask about collaborative editing software, and describing the project. She wanted to do a live projection and web broadcast of a collaborative editing session with several women. She had been thinking of piratepad but wanted substantially different visuals.

This post is a summary of my analysis and technical solutions. Last night, I was still buzzing from the actual performance, and wrote a very different post reflecting my experience of it.

I took a brief inventory of every collaborative editing software I knew of and had actually used:

I quickly wrote down my goals:

  1. It would be best to use a web-based client to avoid having to install a client on multiple computers. This excluded gobby.
  2. I need to be able to run this thing on a laptop or on my puny little vserver.
  3. I need to be able to customize the look and feel of the interface for the performance.
  4. I would have to squeeze this project in around my regular working hours, so I needed to keep the number of new things to learn to a minimum.

After a couple of hours of cursory research, I quickly decided that etherpad-lite was the best candidate. I passed on my conclusions to Julie and we set a first meeting and test at the gallery, Skol.

The day of the first test came with me frantically deploying etherpad on my laptop, and coming up with ways to use ssh tunneling and a proxy to allow me to serve the etherpad locally on Skol's puny dsl connection but serve it over the web to outside visitors. I figured that way we could have a good redundancy and the show could go on if the gallery's internet connection went down.

Julie and I sat down and played around with etherpad-lite while she filled me in on some new details and wrinkles. I discovered we had a number of problems:

  • She now had participants who would be writing from overseas, rather than from inside the gallery.
  • Etherpad-lite had no built-in interface for managing groups and users and private pads. This had to be done via an API, and there was no API client written for a system I was familiar with and lightweight enough for my poor vserver.
  • Etherpad's recursive iframes structure and complexity were going to make my life difficult as far as customization went.

The first problem actually simplified my life. My original reasoning for having a locally hosted etherpad instance was that it would allow us some network redundancy in the event of losing the external internet connection. With outside participants involved, I figured the effort was no longer worth the marginal benefits.

In retrospect this is probably worth exploring for future performances - we had a handful of interruptions over the weekend that broke the flow and feel of the work in-gallery.

Problem 2 - An API Client

We needed to get the participants to try out and start getting used to the interface fast. I threw up a public pad for the time being but it was clear we needed something better.

After a thorough reading of the API documentation I decided to write a Django app for this purpose. The results of my initial efforts are on github, and I welcome collaborators and user comments.
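For readers unfamiliar with the etherpad-lite HTTP API, the calls such a client has to make look roughly like the sketch below. It is written in JavaScript for illustration and is not the Django app mentioned above. The helper name buildApiUrl, the host, the key and the example ids are all placeholders, and the /api/1/ prefix and function names should be checked against the API documentation for the version in use.

```javascript
// Hypothetical helper: builds the URL for one etherpad-lite HTTP API call.
// baseUrl and apiKey are placeholders for your own instance and its API key.
function buildApiUrl(baseUrl, apiKey, functionName, params) {
  var query = ['apikey=' + encodeURIComponent(apiKey)];
  Object.keys(params || {}).forEach(function (key) {
    query.push(encodeURIComponent(key) + '=' + encodeURIComponent(params[key]));
  });
  return baseUrl.replace(/\/+$/, '') + '/api/1/' + functionName + '?' + query.join('&');
}

// The sequence needed for one private pad with one participant.
var base = 'http://localhost:9001';
var key = 'SECRET';
var calls = [
  buildApiUrl(base, key, 'createGroupIfNotExistsFor', { groupMapper: 'performance' }),
  buildApiUrl(base, key, 'createAuthorIfNotExistsFor', { authorMapper: 'writer1', name: 'Writer One' }),
  // The groupID and authorID below are made up; real values come back
  // in the responses to the two calls above.
  buildApiUrl(base, key, 'createGroupPad', { groupID: 'g.EXAMPLE', padName: 'main' }),
  buildApiUrl(base, key, 'createSession', { groupID: 'g.EXAMPLE', authorID: 'a.EXAMPLE', validUntil: 1893456000 })
];
```

Each URL would then be fetched by whatever HTTP client the surrounding application already uses; the sketch deliberately stops at building the requests.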

It took most of my free hours over a week to get everyone up and running on the pad. Finally I was ready to start hacking etherpad itself.

Then Julie called me with a new problem...

Problem 3 - We Need a Website!

So, every web developer I know started off doing at least one static site. And somewhere, buried deep in my head, is lodged the suspicion that a static webpage is much superior to a big CMS, at least when it comes to delivering a single page.

So when Julie sent me a write-up of the project and asked me to put it online at thinkwemust.org which had been, up until then, an empty page, I followed my instincts, and pasted her text into vim, threw a little markup around it, and added the domain name to my vhost conf. Three days later Julie was writing me asking me about web design and additional tabs.

I should have set up a CMS.

Since I didn't set up a CMS, I had to come up with alternate solutions: rather than coding a commenting and image upload system, I provided a public pad for visitors to comment on (like a souped-up guestbook on a 90s-era Angelfire page) and used flickr to upload and embed photos.

This did give me one surprise: people responded to the public pad like I have not seen people take to a commenting system since Facebook. At last count etherpad had 30 different authors in memory who had contributed to the public pad. For an unknown website with a week of social media promotion, that kind of participation is pretty impressive.

Problem 4 - Oh Yeah, Customizing Etherpad

I found etherpad-lite overwhelming, plagued with the sort of javascript developer problems that I have gotten used to seeing - one letter variable names and a paucity of comments - the sort of code written by people still in an "every bit counts" mentality who apparently never heard of maintaining separate development and minified versions of their code.

The community, though, was awesome and welcoming, with people in the IRC channel and on github taking the time to help me find my way around.

I decided to break my etherpad problems into several parts, and take them one at a time, from simplest to most complex. The idea was that I would learn as I went and be in a better position to tackle the more complex tasks later.

I found it practical to work on a fork of the main etherpad-lite repo, in my own branch - I knew my hacks would not necessarily be the best way of accomplishing what I wanted, so I wanted to keep them separate from the existing branches of the project.

Styling Etherpad

I figured modifying the CSS would be simplest. I started in etherpad's static/custom subdirectory but I soon found that none of the existing css files offered me much of a chance to adjust the actual text editing area. So I dug around in the code and figured out how to add new custom css and js files that would scope to the outerdocbody and innerdocbody body elements in etherpad.

The bin/installDeps.sh file is called to actually create custom files:

--- a/bin/installDeps.sh
+++ b/bin/installDeps.sh
@@ -96,7 +96,7 @@ rm -f var/minified*

 echo "ensure custom css/js files are created..."

-for f in "index" "pad" "timeslider"
+for f in "index" "pad" "timeslider" "inner"
 do
   if [ ! -f "static/custom/$f.js" ]; then
     cp -v "static/custom/js.template" "static/custom/$f.js" || exit 1

but you actually have to load them somewhere next:

--- a/static/js/ace.js
+++ b/static/js/ace.js
@@ -265,6 +265,8 @@ function Ace2Editor()
       pushScriptsTo(iframeHTML);

       iframeHTML.push('');
+      iframeHTML.push('iframeHTML.push('&nb

       // Expose myself to global for my child frame.

After that styling was a simple matter of adding rules to inner.css, outer.css or pad.css depending on what I wanted to do.

Providing a Read-Only Interface

From our perspective the read-only version of pads in the master branch of etherpad-lite was extremely disappointing. Building off of my work from styling, I added a new view to etherpad, nicknamed spectator mode, that would be served at /s/padID. This involved:

Copying static/pad.html to static/spectator.html, and editing the resulting file to load new custom files, spectator.js and spectator.css.

Then, adjusting node/server.js to add lines to handle the new url:

--- a/node/server.js
+++ b/node/server.js
@@ -259,7 +259,14 @@ async.waterfall([
           res.send(html);
       });
     });
-    
+   
+    //serve spectator.html under /s
+    app.get('/s/:pad', function(req, res, next)
+    {    
+      var filePath = path.normalize(__dirname + "/../static/spectator.html");
+      res.sendfile(filePath, { maxAge: exports.maxAge });
+    });
+
     //serve pad.html under /p
     app.get('/p/:pad', function(req, res, next)
     {    

Making the necessary adjustments to bin/installDeps.sh for the new custom files:

--- a/bin/installDeps.sh
+++ b/bin/installDeps.sh
@@ -96,7 +96,7 @@ rm -f var/minified*
 
 echo "ensure custom css/js files are created..."
 
-for f in "index" "pad" "timeslider" "inner" "outer"
+for f in "index" "pad" "timeslider" "inner" "outer" "spectator"
 do
   if [ ! -f "static/custom/$f.js" ]; then
     cp -v "static/custom/js.template" "static/custom/$f.js" || exit 1

Adding one condition to static/js/ace2_inner.js to keep people from editing the file:

--- a/static/js/ace2_inner.js
+++ b/static/js/ace2_inner.js
@@ -111,6 +111,10 @@ function Ace2Inner(){
 
   var root, doc; // set in setup()
   var isEditable = true;
+  // isEditable is overriden to false if we are on spectator mode
+  if (parent.location.pathname.substring(1,2) == 's') {
+    isEditable = false;
+  }
   var doesWrap = true;
   var hasLineNumbers = true;
   var isStyled = true;

(All that in a single commit)

Of course this set-up is extremely brittle, depending as it does on client-side javascript code to control access to the pad. Long term for something like this a much more complete rewrite is required. Worst of all, since I wasn't able to quickly discern how access control worked, I simply worked around by writing yet another cgi script that called the API, generated an author and session cookie, and allowed me to embed this new read-only version of the pad.
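One small source of that brittleness is the substring test itself: it is true for any path whose second character is s, which includes /static/. A stricter test might look like the sketch below. isSpectatorPath is a hypothetical name, not something that exists in etherpad-lite, and it only tightens the client-side check; it does not add real access control.

```javascript
// Hypothetical helper: true only for paths of the form /s/<padID>
function isSpectatorPath(pathname) {
  return /^\/s\/[^\/]+\/?$/.test(pathname);
}

// The test from the diff above, wrapped in a function for comparison
function originalCheck(pathname) {
  return pathname.substring(1, 2) == 's';
}
```

With these two functions, /s/mypad passes both tests, while /static/pad.html passes only the original one.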

Replacing etherpad's line numbers with usernames

At this point in the game I was five days away from the performance date, with two nights left before a first rehearsal. I was also in the middle of a rush at my day job, so I sat down to deliver this central functionality at around 9pm last Tuesday night, armed with a massive overdose of caffeine, and the incentive of a pint of ice-cream in the freezer.

I had already done a fair bit of digging around and my conclusion was that the key function in all of this was the widget that allows etherpad users to change their author colors. I had found several functions in the code that seemed promising but had still not managed to wrap my head around etherpad's event model.

So I took the easy way out. I wrote a little script in static/custom/pad.js, that became increasingly complex as I went along.

Let's take a look. First I declare two empty variables, edBod and lastLine, that are reused.

The customStart function is called in static/pad.js, and is etherpad-lite's answer to $(document).ready():

// Shared state, reused by the functions below
var edBod = null;  // body of the inner editor iframe, set in customStart
var lastLine = 0;  // last line labelled so far; must be a number for the loop in userNameLines to run

function customStart() {
  jQuery(document).ready( function() {

    // Try and retrieve the first iframe, containing the outer body
    var ifr = jQuery('iframe');
    if (ifr.length < 1) {
      setTimeout(customStart, 200);
      return;
    }

    // Try and retrieve the inner iframe
    ifr = jQuery(ChildAccessibleAce2Editor.registry[1].editor.getFrame().contentDocument.body).children('iframe');
    if (ifr.length < 1) {
      setTimeout(customStart, 200);
      return;
    }

    // Set the value of edBod for later use
    edBod = ifr[0].contentDocument.body;

    // Try and retrieve more than one div
    var edDivs = jQuery(edBod).children('div');
    if (edDivs.length < 2) {
      setTimeout(customStart, 100);
      return;
    }

    // For this to work, we can't have people clearing the authorship,
    // so hide the button that does it
    jQuery('#clearAuthorship').toggle();

    // Relabel the sidebar once a second
    setInterval(userNameLines, 1000);
  });
}

The DOM isn't really fully baked when this function is called on a page - etherpad-lite calls a mountain of scripts to provide most of the actual document being edited, many of which will execute after customStart is fired. So I had to use those repeated setTimeout calls, combined with checks to make sure that all the elements I needed were present, before calling my other functions.
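The retry pattern in customStart can be factored into a small polling helper. The sketch below is one way to do it, not what the performance actually ran: waitFor and its parameters are hypothetical names, and the timer function is passed in only so the helper can be exercised outside a browser.

```javascript
// Hypothetical helper: call check() every delayMs until it returns a truthy
// value, then hand that value to onReady. Gives up after maxTries attempts.
function waitFor(check, onReady, delayMs, maxTries, schedule) {
  schedule = schedule || setTimeout;
  var tries = 0;
  (function attempt() {
    tries += 1;
    var result;
    try {
      result = check();
    } catch (e) {
      result = null; // e.g. an object the check depends on does not exist yet
    }
    if (result) {
      onReady(result);
    } else if (tries < maxTries) {
      schedule(attempt, delayMs);
    }
  })();
}

// In the pad this might be used as follows (not run here):
//   waitFor(function () {
//     var divs = jQuery(edBod).children('div');
//     return divs.length >= 2 ? divs : null;
//   }, function () { setInterval(userNameLines, 1000); }, 200, 50);
```

Wrapping the check in try/catch means a lookup that throws because an iframe is not there yet is treated the same as a lookup that finds nothing.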

The ChildAccessibleAce2Editor.registry[1].editor.getFrame().contentDocument.body object saved my night. This magical object, available from the top iframe of etherpad, contains all of the content of the outerdocbody. From there, you can get anything you want from inside of etherpad.

Let's take a look at the function that does the heavy lifting every second:

function userNameLines() {

  // Fetch the divs in the actual etherpad text
  var edDivs = jQuery(edBod).children('div');

  // Fetch data about the various pad authors
  var edAuths = clientVars.collab_client_vars.historicalAuthorData;

  // Fetch the side divs that contain the line numbers
  var sideDivs = jQuery(ChildAccessibleAce2Editor.registry[1].editor.getFrame().contentDocument.body).find('div#sidediv').find('td#sidedivinner').children();

  // Iterate through the document's lines, starting at the last line labelled
  for (var i = lastLine; i < edDivs.length; i++) {
    var sp = jQuery(edDivs[i]).find('span:first');
    if (sp.length > 0 && !jQuery(sideDivs[i]).hasClass('usernamed')) {
      var cl = sp.attr('class');
      if (cl && cl.length > 0) {

        // Assume the first class in the string is the author class
        var authorId = convertClass2Author(cl.split(' ')[0]);

        // Get the author object with the parsed id; skip the line if
        // the author is not (yet) in the historical data
        var auth = edAuths[authorId];
        if (!auth) {
          continue;
        }

        // Inject the author name into the appropriate sideDiv
        jQuery(sideDivs[i]).text(auth.name);
        jQuery(sideDivs[i]).css('color', auth.colorId);
        jQuery(sideDivs[i]).addClass('usernamed');
        lastLine = i;
      }
    }
  }
}

The clientVars.collab_client_vars.historicalAuthorData object was the other discovery that saved me that night. It contains author ids, usernames and colors for every author who has ever edited the etherpad. The rest of this function is pretty straightforward iteration.

I originally had the document iterate through the entire etherpad before instituting lastLine, but this brought in significant performance issues on most people's machines once the pad got over 2000 lines or so, so I settled on this compromise and occasional refreshes.
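The trade-off can be seen in a DOM-free sketch. Here lineAuthors is an array holding one author id per line and labels stands in for the sidebar; both, like the function name labelFrom and the sample names, are stand-ins for illustration rather than etherpad structures.

```javascript
// Stand-in for userNameLines: label lines from lastLine onward and
// return the index of the last line that received a label.
function labelFrom(lastLine, lineAuthors, authorNames, labels) {
  for (var i = lastLine; i < lineAuthors.length; i++) {
    var name = authorNames[lineAuthors[i]];
    if (name && labels[i] === undefined) {
      labels[i] = name;
      lastLine = i;
    }
  }
  return lastLine;
}

var names = { a1: 'Ada', a2: 'Bea' };
var labels = [];

// First pass: the pad has two lines, both get scanned
var last = labelFrom(0, ['a1', 'a2'], names, labels);

// Second pass: the pad has grown to four lines, only lines 1-3 get scanned
last = labelFrom(last, ['a1', 'a2', 'a2', 'a1'], names, labels);
```

Each pass costs time proportional to the number of new lines rather than the length of the whole pad. The price is that anything that changes above lastLine is not revisited until the page is reloaded.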

You'll notice userNameLines calls one more function, convertClass2Author. It is copied and pasted out of static/js/ace2_inner.js, since I could not figure out a practical way of calling it from pad.js directly:

function convertClass2Author(className) {
  if (className.substring(0, 7) == "author-")
  {
    return className.substring(7).replace(/[a-y0-9]+|-|z.+?z/g, function(cc)
    {
      if (cc == '-') return '.';
      else if (cc.charAt(0) == 'z')
      {
        return String.fromCharCode(Number(cc.slice(1, -1)));
      }
      else
      {
        return cc;
      }
    });
  }
}
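To make the decoding less opaque, here is what the matching encoding appears to be, judging only from the decoder above. author2Class is a hypothetical name and the author id is made up; this is a reconstruction for illustration, not code copied from etherpad-lite.

```javascript
// Hypothetical inverse of convertClass2Author, showing the encoding it undoes:
// characters a-y and 0-9 pass through unchanged, '.' becomes '-', and any
// other character becomes its character code wrapped in a pair of z's.
function author2Class(authorId) {
  return 'author-' + authorId.replace(/[^a-y0-9]/g, function (c) {
    if (c == '.') return '-';
    return 'z' + c.charCodeAt(0) + 'z';
  });
}

var exampleClass = author2Class('a.Bc9z'); // 'author-a-z66zc9z122z'
```

Passing that class name to convertClass2Author gives back 'a.Bc9z', which is the kind of key used to look authors up in historicalAuthorData.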

The crux of my approach, then, is to use the authorship colors assigned to span elements within the document to generate author names. This is also why I had to so crudely disable the clear authorship colors button at the end of my customStart function.

My custom pad.js file got one new function this morning, which I used to add a new UI element to the pad that lets me export the contents of the pad in a table, preserving the username meta information in relation to the lines. This let me take down my abominable custom hack of a script to display the pad and put up some nice clean html instead, with a heartfelt sigh:

function dumpSpecMode() {
  // Markup for the export popup and its menu button
  var dumper = '<div id="specdump" class="popup" style="display: none; position: absolute; top: 55px; right: 20px;"><h1>HTML export of spectator mode</h1><textarea readonly="readonly" cols="50" rows="10"></textarea></div>';
  var button = '<li id="specdumplink" style="text-align: center"><a class="buttonicon" title="HTML export of spectator mode" style="background-image: none; color: #666; font-size: 16px;">♥</a></li>';
  jQuery('div#embed').after(dumper);
  jQuery('ul#menu_right').prepend(button);

  // Bind behaviour to the new menu button
  jQuery('ul#menu_right li#specdumplink').click(function() {
    specdump = jQuery('div#specdump')[0];
    if (jQuery(specdump).hasClass('slid-down')) {
      jQuery(specdump).slideUp();
      jQuery(specdump).removeClass('slid-down');
    }
    else {
      jQuery(specdump).slideDown();
      jQuery(specdump).addClass('slid-down');
      
      // Fetch and process the pad contents and insert it into the textarea
      sideDivs = jQuery(ChildAccessibleAce2Editor.registry[1].editor.getFrame().contentDocument.body).find('div#sidediv').find('td#sidedivinner').children();
      edDivs = jQuery(edBod).children('div');
      output = "<table>\n";
      for (i=0;i<edDivs.length;i++) {
        output = output + "  <tr><td class='" + jQuery(sideDivs[i]).attr('class') + "' style='" + jQuery(sideDivs[i]).attr('style') +"'>" + sideDivs[i].innerHTML + "</td><td class='" +jQuery(edDivs[i]).attr('class') + "'>" + edDivs[i].innerHTML + "</td></tr>\n";
      }
      output = output + "</table>";
      jQuery(specdump).find('textarea').val(output);
    }
  });
}

The whole file is available as a gist.

Scrolling via Autopilot

My last item on the list was a sort of autopilot. Unfortunately, I was not able to take the time to really dig into etherpad's event model, as I said, so the initial spec, of detecting changes in the document and automatically scrolling to them, flew out the window. Instead I wrote a simple script that kept the window scrolling down to the bottom of the document every few seconds, and used a javascript confirmation dialogue to let people opt out of that.
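A script along those lines could be as small as the sketch below. It is not the script used for the performance: startAutopilot and its parameters are hypothetical, the prompt text is invented, and the window-like object, confirm function and timer are passed in so the logic can be tried outside a browser. Inside etherpad the element that actually scrolls may be one of the editor iframes rather than the top window.

```javascript
// Hypothetical autopilot: every intervalMs, scroll win to the bottom of doc,
// unless the visitor declines in the confirmation dialogue.
function startAutopilot(win, doc, confirmFn, intervalMs, setIntervalFn) {
  if (!confirmFn('Follow the text automatically as it is written?')) {
    return null; // visitor opted out; leave scrolling alone
  }
  return setIntervalFn(function () {
    win.scrollTo(0, doc.body.scrollHeight);
  }, intervalMs);
}

// In a browser this might be started as follows (not run here):
//   var timer = startAutopilot(window, document,
//     function (msg) { return window.confirm(msg); },
//     3000,
//     function (fn, ms) { return window.setInterval(fn, ms); });
//   clearInterval(timer) stops it again.
```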

The Outcome

  • 52 hours of coding and development logged
  • 519 hits to the performance pad
  • a high of 25 simultaneous spectators, and rarely fewer than 6
  • 180 hits to the public pad
  • 30 contributors to the public pad over the weekend

The energy of the collective writing experiment was amazing, with several gallery visitors staying several hours watching the projected text or participating on the public pad. All five of the participants, with relatively little experience with these kinds of tools before, were enticed into a more verbal, more immediate writing style.

At a party on Saturday night I watched some friends read the document to each other as a script - a possibility suggested by the user names as line numbers which I had not previously considered.

I am still re-reading and absorbing the text. I read it a couple times top to bottom, but I find it is more interesting hopping around the text, alighting on one interesting passage or another, or following one author's trajectory.

The public pad is still up, and we are hoping more people will contribute.

Comments

Alex Graveley

You should really try out hackpad.com for this kind of collaboration in the future.

sfyn

I've approved this blatant self-promotion because it is relevant and I wanted to respond. I had a look and hackpad is very interesting. It offers some of the same features I was trying to implement for thinkwemust. That said I couldn't have used the hackpad.com site itself, because we wanted far more control of user experience than it offers.

Give me the ability to embed your pads and control the style and available controls, and I might consider using hackpad for future projects of this type.

Based on this repo (https://github.com/hackpad/pad-mediawiki) hackpad is a fork of the original etherpad, based on mediawiki?

John McLear

Despite it being self promotion Alex is a really nice guy and a great hacker!

John McLear

Great job! :) Hopefully you can issue some pull requests helping us sort out the issues you had :) Cheers!

sfyn

Thanks for the congrats. Yes I will very likely be making some pull requests once I sort things out.

Matt

The public pad link is broken. It is missing ".org".

sfyn

fixed - thanks!

Lucas Cioffi

Great suggestions! Thanks-- this saved me a bunch of time.