Author Archives: Robert Brewer

About Robert Brewer

Senior Software Engineer at Tableau Software, Inc. We help people see and understand their data.

Nice LaTeX package: fixme

In writing my dissertation proposal, I came across this nice little LaTeX package called fixme. It provides new commands to insert notes on things that should be fixed in a document. A common way of putting these types of notes in a LaTeX document is to use comments in the .tex file, but the downside is that they leave no trace in the output document, so they are easy to forget about and people reviewing your PDF will never see them.

fixme is part of TeX Live 2009, so most LaTeX users can just say \usepackage{fixme} in their preamble and start using it. You can add different levels of corrections (from note to fatal), and they are displayed in the document in a variety of formats like margin notes, footnotes, etc. The package also prepares a list of corrections which you can use to keep track of things you need to fix.

The nice thing is that when you switch your document from draft to final mode, fixme removes all the comments and the list of corrections, except for fatal ones, which cause compilation to fail. So all the notes about minor stuff that can wait until the next revision disappear from the final copy, but anything you marked as fatal will have to be fixed before you can generate a PDF.

Unlocking a protected PDF on Mac OS X

Recently I needed to demonstrate proof of purchasing something via my credit card statement. Easy enough: I downloaded my most recent statement as a PDF file from American Express. Then I wanted to use Adobe Acrobat Pro’s nifty redaction features to redact all the irrelevant information from the appropriate page of the bill. Except Amex has decided that the statement should be a protected PDF, which means you can view it but cannot change it. This is of course totally bogus DRM; it’s my statement after all! I suppose they hope to curb statement forgeries, but as anyone akamai knows: if I can view it, I can edit it. I think Preview.app on Mac OS X used to ignore DRM and let you edit protected PDFs, but it doesn’t seem to on Snow Leopard.

I hunted around for a tool to unlock the PDF. There are lots of tools for Windows, which didn’t interest me. One person suggested opening the PDF and “printing” it to a PDF, but Adobe has disabled those features of the Print dialog box on Mac OS X (presumably since it would allow trivial circumvention of the DRM).

PDFKey Pro looks like a reasonable option for Mac OS X, but it is $25 which seems kinda steep for a single use. They have a downloadable demo, but it will just create an unlocked version of the first page of the PDF, which wasn’t the page I wanted. And of course I can’t edit the source PDF because it is protected, so the demo wasn’t useful to me.

Then I came upon MuPDF, which is a “lightweight PDF viewer and toolkit written in portable C”. It has an X11 GUI component, as well as command line tools. One of the command line tools is “pdfclean”, which will remove the DRM from a PDF.

Unfortunately, MuPDF isn’t in MacPorts yet, so I had to compile it by hand. It uses the Perforce jam tool instead of make, and has three library dependencies: zlib, libjpeg, and freetype2. Luckily, all of these are available in MacPorts, so I was able to install them and then edit the Jamrules file to point at the MacPorts location. Here is the updated section of Jamrules:


if $(OS) = MACOSX
{
    Echo Building for MACOSX ;

    BUILD_X11APP = true ;

    CCFLAGS = -Wall -std=gnu99 -I/opt/local/include -I/opt/local/include/freetype2 ;
    LINKFLAGS = -L/usr/X11R6/lib -L/opt/local/lib ;
    LINKLIBS = -lfreetype -ljpeg -lz -lm ;
    APPLINKLIBS = -lX11 -lXext ;

    if $(BUILD) = debug   { OPTIM = -g -O0 -fno-inline ; }
    if $(BUILD) = release { OPTIM = -O3 ; }

    if $(HAVE_JBIG2DEC) { LINKLIBS += -ljbig2dec ; }
    if $(HAVE_OPENJPEG)    { LINKLIBS += -lopenjpeg ; }
}

pdfclean worked like a charm, removing the DRM from the statement. After that I was able to redact the statement without incident.

Perhaps in my copious spare time I will make a MuPDF portfile for MacPorts, but until then perhaps this will help others who want an open source way to remove bogus PDF DRM.

The null ritual

Philip and I were discussing the design of my dissertation experiment, and he pointed me at an interesting book chapter titled “The Null Ritual: What You Always Wanted to Know About Significance Testing but Were Afraid to Ask”. It’s fascinating reading, as it walks through a lot of false beliefs about significance testing as used by psychologists in experiments. I found that my understanding of significance testing was definitely incorrect in the ways described in the chapter.

The “null ritual” from the title is described as:

  1. Set up a statistical null hypothesis of “no mean difference” or “zero correlation.” Don’t specify the predictions of your research hypothesis or of any alternative substantive hypotheses.
  2. Use 5% as a convention for rejecting the null. If significant, accept your research hypothesis.
  3. Always perform this procedure.

The problem is that the null hypothesis test yields p(D|H0), the probability of obtaining the observed data given that the null hypothesis is true. When doing an experiment, any real-world scientist will have a hypothesis that they are testing, and they usually hope to use the data from the experiment to show that it is true. What we really want is p(H1|D), the probability of our hypothesis being true given the observed data. However, getting from p(D|H0) to a conclusion about the hypothesis requires Bayes’ rule, and that in turn requires the prior probabilities of the hypotheses, which are often not available to us beforehand.
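To spell out why the priors matter, here is Bayes’ rule written for this case. This is a sketch under a simplifying assumption that is mine, not the chapter’s: H0 and H1 are treated as the only two hypotheses in play, so their priors sum to 1.

```latex
\[
p(H_1 \mid D) =
  \frac{p(D \mid H_1)\, p(H_1)}
       {p(D \mid H_1)\, p(H_1) + p(D \mid H_0)\, p(H_0)}
\]
```

A significance test supplies only the p(D|H0) term. Without the priors p(H0) and p(H1), and without the likelihood p(D|H1), the left-hand side cannot be computed, no matter how small p(D|H0) turns out to be.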

The chapter also brings out the controversies in statistics between different approaches and goals of particular techniques, which are usually glossed over in the teaching of statistics.

I’m planning to follow the authors’ recommendation in my research: “In many (if not most) cases, descriptive statistics and exploratory data analysis are all one needs.”

Rebuild Hawaii Consortium March 2010 meeting

I attended the Rebuild Hawaii Consortium quarterly meeting last week. I had never attended any of their meetings before, and I was somewhat surprised at the sizable number of people in attendance (40? 50?). It was held in a large stadium-style conference room at the Hawaii Convention Center. I had checked the agenda in advance, and thought I could arrive at 10 AM and still see everything I wanted to, but apparently the agenda changed since it was posted on the website.

The talk I missed that I wish I had seen was by Luis Vega on the Hawaii National Marine Renewable Energy Center. His slides look very interesting: lots of hard-nosed cost comparisons of wave and OTEC electricity generation.

Paul Norton gave a talk on Zero Energy Buildings, which was interesting. I attended his REIS seminar where he covered some of the same things, but this was focused on ZEB. Some points I found particularly interesting:

  • The introduction of air conditioning leads to a 70% increase in electricity use
  • The key conceptual shift is thinking about the monthly cost of a home being the mortgage + utility bill.
  • The efficiency / photovoltaic balance point is the point at which adding generation via PV is the same cost as additional efficiency measures
  • A cost neutral design (monthly cost is same as a home built to code) that uses efficiency and PV results in an 85% reduction in home electricity usage
  • Once major efficiency measures are in place (solar water heating, efficient lighting & air conditioning, insulation), the major remaining load is appliance plug loads
  • In one military housing complex on Oahu, there is a 4x difference in electricity usage between houses with identical efficiency measures. Presumably the differences are due to appliance purchases and behavior.
  • In a group of homes in Las Vegas, the difference was 5x
  • Further, the differences were fairly continuous: there is no nice average plateau
  • PV inverters on the neighbor islands have been causing problems because the utility frequency can sag during periods of high usage. By default, the inverters are set to disconnect from the grid when the frequency drops below 59.3 Hz, so inverters all over turn off, which puts additional strain on the utility, exacerbating the problem. Reducing that threshold frequency to 57 Hz can help. Thus there is a lot of research still to be done on renewable integration.

Another presentation was on HCEI and smart grid initiatives at PACOM. They are working on a project called SPIDERS that is trying to address the fact that access to electricity is a critical need for the military. One thing I was stunned to learn was that people living in military housing don’t pay for electricity! Thus they have no financial incentive at all to reduce their energy usage. Slide 8 shows an actual graph of HECO’s demand and generation for one particular day. Our work on OSCAR was all based on vague outlines of what the demand curve looks like, so it was great to see it “in the flesh”.

There was a lot of good information at the meeting, so I’m planning to attend in the future. Next meeting is June 2.

Community-Based Social Marketing workshop

Today I attended an all day workshop on Community-Based Social Marketing by Dr. Doug McKenzie-Mohr. While it may have originally been intended as a workshop, I think the number of participants was doubled to over 100 people, so it ended up being mostly a long lecture with a lot of question-taking.

I was already somewhat familiar with CBSM from reading a condensed version of the method on the web. CBSM is an attempt to modularize and standardize a process for “fostering sustainable behavior” across a range of domains. It’s based on a wide variety of social psychology research on how to get people to actually change their behavior. One major takeaway is that big mass-media campaigns to promote behavior change are not very effective for the money spent on them.

The CBSM process has 5 steps:

  • Select the behavior(s) that you wish people to adopt
  • Assess the barriers and benefits people face adopting the behaviors
  • Develop strategies to foster the behavior changes
  • Run a pilot project that tests your action plan
  • Implement your plan broadly and evaluate its effectiveness

The development section suggests the use of a variety of psychological “tools” that can help people change their behavior such as making public commitments, social diffusion, social norms, and prompts. The design of the Kukui Cup competition and the supporting website is already strongly influenced by these tools.

The target audience for the workshop, and for CBSM in general, is government agencies and NGOs that have a mission to foster some type of behavior change, which is somewhat different from our situation with the Kukui Cup. The Kukui Cup can be thought of as a sort of applied research into using these CBSM techniques but with a very narrow market segment (first-year college students) and making extensive use of a customized website. Rather than focus on encouraging a small number of behaviors in participants to achieve our goals of increased energy conservation and energy literacy, we are giving participants a smorgasbord of options via the website and trying to figure out which ones work the best based on what the participants do.

I was struck by how little the Internet and WWW were mentioned in the workshop. CBSM is clearly labor intensive compared to a traditional mass media campaign, but the claim is that CBSM delivers better results (i.e. more desired behavior) than informational campaigns. From a CBSM perspective, the Kukui Cup seeks to determine how much of that traditional CBSM labor can be embodied in the website, and whether the web-based CBSM retains the effectiveness of traditional CBSM. Does a web community provide the same benefits as real-world community? At least with the Kukui Cup the web community will mirror a series of small real world communities: floors of a dorm.

That said, there are some cautions from the workshop that are worrisome. There was a lot of emphasis on surveying the people the campaign is targeting, and running a pilot study. This makes a lot of sense, and from a certain perspective, the first Kukui Cup could be thought of as a pilot, though it will be a bigger pilot than most due to the infrastructure required to make it a competition. Obviously the risk is that the whole Kukui Cup could flop (nobody uses the website, general apathy towards the competition, no energy conservation, etc), which would pose significant problems for my graduation timeline. 🙂 Doug related a story of his own biggest failed campaign, which turned out to be his own dissertation project!! At great expense he created a media campaign across Canada that cost $100K to encourage people to participate in policy meetings in their communities. The ads included a 1-800 number, and his organization braced for tens of thousands of calls. Number of calls received: 8.

Using XPath to pick data out of XML

This week I wrote a WattDepot sensor for the TED 5000 home energy meter. The TED 5000 gateway (a small Internet-connected embedded computer) provides a URI that generates XML showing the current power data. First, I needed to figure out what the XML meant. Once that was done, I wanted a quick and simple way to pick out the 2 pieces of data from the XML that I care about using Java.

WattDepot uses JAXB extensively for XML processing, but that was kinda heavyweight for my needs here. I had heard about XPath, and it sounded like the right type of tool for just grabbing a little data from XML. Turns out that Java 1.5 and later have XPath built-in, so there are no additional dependencies.

IBM has a good tutorial on using XPath from Java by Elliotte Rusty Harold. Unfortunately, I was confused initially because all the XPath examples in the tutorial are for finding all XML nodes in a document that meet certain criteria, whereas I knew exactly where in the XML tree my data was lurking. Luckily, it turns out that XPath is really a lot like a path in a filesystem (duh), so traversing the tree is easy.

Say you have the following XML from TED (some parts elided):

<LiveData>
  ...
  <Power>
    <Total>
      <PowerNow>2995</PowerNow>
      ...
      <PowerMTD>515227</PowerMTD>
      ...
    </Total>
  ...
  </Power>
</LiveData>

The XPath that would pull out the value from PowerNow is /LiveData/Power/Total/PowerNow/text(), and for PowerMTD it is /LiveData/Power/Total/PowerMTD/text(). Simple!

Here is a code fragment that extracts those two values from an XML file (stealing liberally from the XPath tutorial linked above):

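import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class XPathTest {

  public static void main(String[] args) throws ParserConfigurationException, SAXException,
      IOException, XPathExpressionException {
    if (args.length != 1) {
      System.out.println("Need XML filename arg.");
      return;
    }
    // Parse the XML file named on the command line into a DOM tree.
    DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    domFactory.setNamespaceAware(true);
    DocumentBuilder builder = domFactory.newDocumentBuilder();
    Document doc = builder.parse(args[0]);

    // Compile one XPath expression per value we want to extract.
    XPathFactory factory = XPathFactory.newInstance();
    XPath powerXpath = factory.newXPath();
    XPath energyXpath = factory.newXPath();
    XPathExpression exprPower = powerXpath.compile("/LiveData/Power/Total/PowerNow/text()");
    XPathExpression exprEnergy = energyXpath.compile("/LiveData/Power/Total/PowerMTD/text()");
    // Asking for XPathConstants.NUMBER makes evaluate() return a Double.
    Object powerResult = exprPower.evaluate(doc, XPathConstants.NUMBER);
    Object energyResult = exprEnergy.evaluate(doc, XPathConstants.NUMBER);

    Double power = (Double) powerResult;
    Double energy = (Double) energyResult;
    System.out.println("Power from TED 5K: " + power + "W");
    System.out.println("Energy from TED 5K month to date: " + energy + "Wh");
  }
}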
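For comparison, here is a more compact variant that evaluates the same kind of path against XML held in a string, which is closer to what a sensor would have after fetching the data over HTTP. It is only a sketch: the class name TedXPathSketch and the helper numberAt are made up for illustration and are not part of WattDepot, and it drops the trailing text() step since converting an element to a number uses its text content anyway.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class TedXPathSketch {

  /**
   * Hypothetical helper: evaluates an absolute XPath against an XML string
   * and returns the selected value as a double.
   */
  public static double numberAt(String xml, String path) {
    XPath xpath = XPathFactory.newInstance().newXPath();
    try {
      // This overload parses the InputSource and applies the expression in one call.
      return (Double) xpath.evaluate(path, new InputSource(new StringReader(xml)),
          XPathConstants.NUMBER);
    } catch (XPathExpressionException e) {
      // Covers both an invalid path and XML that cannot be parsed.
      throw new IllegalArgumentException("Could not evaluate " + path, e);
    }
  }

  public static void main(String[] args) {
    // Trimmed-down version of the TED XML shown earlier.
    String xml = "<LiveData><Power><Total>"
        + "<PowerNow>2995</PowerNow><PowerMTD>515227</PowerMTD>"
        + "</Total></Power></LiveData>";
    System.out.println("Power now: " + numberAt(xml, "/LiveData/Power/Total/PowerNow") + "W");
    System.out.println("Energy MTD: " + numberAt(xml, "/LiveData/Power/Total/PowerMTD") + "Wh");
  }
}
```

One thing to watch for: if the path matches nothing, the NUMBER conversion yields NaN rather than throwing, so a real sensor would probably want to check Double.isNaN() before storing the value.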

It’s nice to have a quick and easy way to make use of XML from Java in my toolbox.

it’s electric: TED data storage and plotting

I was checking on the website for The Energy Detective the other day looking for API info, and found that their page of 3rd-party applications had been updated, and included an application called it’s electric. it’s electric is a Java web application that queries the TED gateway frequently for the 1 second resolution power data, and stores it in a Berkeley DB. That alone is useful, as the TED has a segmented data storage system, keeping the 1 second resolution data only for an hour (and so on for coarser grained data).

It also provides a graphing system based on Google’s Annotated Timeline visualization, with some enhancements like automatically changing the resolution of the displayed data depending on the time interval displayed. Here’s a screenshot:

Screenshot of graph produced by it's electric

There’s a Google group for support and discussion, and the author Robert Tupelo-Schneck seems quite responsive. A jar file is provided on the group page (which I won’t link to since you should download the latest version), which includes the Java bytecode as well as the source, which is released under the AGPL license. The application is not large, consisting of 5 class files.

Compared to WattDepot, it’s electric seems considerably snappier. Presumably this is due in part to using Berkeley DB for persistence instead of an SQL database. The code also stores data in byte form, rather than higher-level Java objects and XML. Also, it’s electric occupies a clear functionality niche: it provides long-term storage of the finest-grained TED data (which is otherwise lost every hour), and provides graphing of that data from locations outside the home network.

I experienced some problems when scrolling around the data on the live it’s electric website: sometimes the graph would not update, or I was unable to scroll to where I wanted to, apparently because new data was being loaded in for the current location.

Overall it’s electric looks like it could be useful for TED owners who want to hold on to that fine-grained data, and want more options for displaying that data outside the home.

WattDepot going “real-time”

In the past week I have added a new REST API method to support near-real-time queries in WattDepot. The goal is to support user interface widgets that display the latest sensor data from a source, such as a smart meter in a home or dormitory. I have also written a command line monitoring client that shows how to use the new functionality. Both of these will be released as part of WattDepot 1.2 in the near future, hopefully with the addition of a sensor that collects data from TED 5000 home smart meters.

Speaking of sensors, I created a wiki page that explains how to write a WattDepot sensor. This should be helpful for anyone planning to write a sensor to support a new type of meter.

In other WattDepot news, there are three projects in ICS414 this semester that are related to WattDepot. The WattDepot Apps team is working on demonstration web applications for WattDepot. The first application is a visualizer that makes use of the Google Visualization API to make graphs of WattDepot data. It should be ready for a 1.0 release very soon. Next they will be moving on to create a web application that monitors the latest sensor data from a source using the new API method. In the future, hopefully they will be working on a browsing application that lets users look over the users and sources in a WattDepot repository.

The Stoplight Gadget team is working on a Google Visualization gadget that checks a data source for a value, and based on user-settable thresholds displays a traffic light as either red, yellow, or green. While this is a general-purpose visualization gadget, we expect to use it with WattDepot data as part of the UH dorm energy competition, though precisely how is yet to be determined.
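The threshold logic at the heart of such a gadget is small enough to sketch in a few lines of Java. This is not the team’s code (their gadget is built on the Google Visualization API); the class name, the method, and the assumption that higher values are worse (as with power draw) are all mine, for illustration only.

```java
public class StoplightSketch {

  public enum Light { GREEN, YELLOW, RED }

  /**
   * Hypothetical classifier: values below yellowAt are green, values from
   * yellowAt up to (but not including) redAt are yellow, and everything
   * else is red. Assumes higher values are worse.
   */
  public static Light classify(double value, double yellowAt, double redAt) {
    if (yellowAt > redAt) {
      throw new IllegalArgumentException("yellowAt must not exceed redAt");
    }
    if (value < yellowAt) {
      return Light.GREEN;
    }
    if (value < redAt) {
      return Light.YELLOW;
    }
    return Light.RED;
  }

  public static void main(String[] args) {
    // Made-up thresholds in watts; a real gadget would read these from user settings.
    double yellowAt = 500.0;
    double redAt = 1000.0;
    for (double watts : new double[] {120.0, 750.0, 2995.0}) {
      System.out.println(watts + " W -> " + classify(watts, yellowAt, redAt));
    }
  }
}
```

For a metric where lower is worse, the comparisons would simply be flipped; the thresholds stay user-settable either way.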

Finally, the Energy Meter team is surveying power meters that can be used for the UH dorm energy competition. While they have been in a data gathering phase so far, they are now switching to implementing a Modbus/TCP sensor for WattDepot. This sensor will be used to collect data from the floors of the dorms in the energy competition.

Debugging Restlet connector problem

In the course of developing WattDepot, I ran into an annoying intermittent bug in my JUnit tests. I would sometimes get a failure in one particular test class, but not always in the same method of that class. The failure manifested as a 60 second pause on the affected test, followed by the WattDepotClient method returning a 1001 miscellaneous failure status code. Maddeningly, it would only fail sometimes, making it much harder to track down (and making continuous integration comical). Further, running the test from within Eclipse would work fine every time, so I was unable to use the debugger to figure out what was going on.

Philip pointed out that this sounded like a classic deadlock problem between threads, perhaps in Derby which I’m using for persistence. He suggested that I use VisualVM to see if I could track down any deadlocks. Mac OS X comes with VisualVM installed as “jvisualvm”, and it’s pretty easy to use. Luckily, since the failure manifested as a 60 second pause, I could start the test, and then attach to the JUnit process and obtain thread dumps to see what was going on.

After a few thread dumps, I tracked it down to HTTP communication. The failure happens when the client is using PUT to send a new resource to the server, and the server is waiting for the end of the entity body from the client. This happens before any Derby call, so it looks like Derby is ruled out (at least for this bug).

WattDepot uses the Restlet framework to make it easier to implement the REST API, and to perform all the HTTP client and server work. Restlet provides a variety of connectors for both the client and server HTTP connections. In fact, there are enough options that it is somewhat confusing trying to pick one. Restlet has internal HTTP client and server connectors that come in the core Restlet jars. According to this email thread, the choice of connector is done automatically by scanning the classpath, with the first match winning.

When first setting up WattDepot, I based the set of Restlet jars I was using on Hackystat. Hackystat’s SensorBase includes org.restlet.jar (API classes), com.noelios.restlet.jar (reference implementation, including internal HTTP connectors), com.noelios.restlet.ext.net.jar (client connector based on JDK HTTP code), and com.noelios.restlet.ext.simple_3.1.jar (server connector based on Simple framework). So it appears that WattDepot is using the Net connector for client HTTP connections, and the Simple connector for server connections, both overriding the internal HTTP connections in the reference implementation.

Since my problem was taking place in the HTTP code, I decided to try experimenting with removing Net and Simple from the classpath, thereby allowing the appropriate internal HTTP connector to kick in. Since I’m using Ivy and Ivy RoundUp for dependency management, this turns out to be as easy as changing the configuration parameter in the Restlet Ivy config, deleting the project “lib” directory and rerunning the tests.

After trying all combinations (all internal connectors, internal server & Net client, Simple server & internal client, Simple server & Net client), I found that only the combination of the Simple server connector and the Net client connector leads to my unit test failure. I guess I’m just lucky that way. 🙂

The solution is then to stop using either the Net client or the Simple server. Since the WattDepot server is likely to be the more performance-sensitive aspect of WattDepot, I opted to keep the Simple server and fall back to the internal client connector, on the assumption that the Simple server is higher performance than the internal Restlet server. It would be nice to figure out which of the variety of client and server connectors is recommended as the best performing, but this will do for now.

In the future I plan to post something to the Restlet mailing list to see if anyone else has run into this problem so it can be tracked down and perhaps fixed.

Future publication venues

Update March 5, 2010: now maintained as a wiki page rather than a blog post

Updated Feb 2, 2010: added IEEE upcoming journals and IEEE Sensors conference.

So I’ve been thinking lately about where I might publish my research on WattDepot, and later on the UH dorm energy challenge. Here’s what I have come up with so far.

Conferences:

  • IEEE Smart Grid Conference. The deadline for papers is May 1, with the conference happening Oct 4-6 in Maryland. Philip has suggested this might be a good place for a paper on WattDepot, and I agree. The maximum page length is 6 pages.
  • Behavior, Energy, Climate Change 2010. Philip attended BECC 2009 and it seems like an ideal conference for the dorm energy competition results. The call for abstracts (presentations & posters) goes out in March, with a mid-May deadline to submit abstracts. The conference is November 14-17 in Sacramento. There is no paper required for the presentation, just slides, so we could potentially present some actual results in the presentation (which wouldn’t be available in May when the abstract is submitted).
  • Hawaii International Conference on System Sciences 44. This is happening on Kauai January 4-7 2011. The deadline for papers is June 15. Jennifer Mankoff’s group has had a series of papers about StepGreen and related work at HICSS so this seems like a good venue, and the travel will be much easier. 🙂
  • CHI 2011. Apparently this is happening May 7-12 in Vancouver (BC I assume?). There have been a variety of papers on supporting green/sustainable behavior in CHI before, and the CHI community is a large and vibrant one.
  • Ubicomp 2010. There was plenty of sustainability work at Ubicomp 2008 (including a workshop I attended), so this is a possible venue. However, the submission deadlines are rather soon (March for papers) so it’s probably more realistic for 2011.
  • Pervasive 2011. This is similar to Ubicomp, but happens in May in Europe. Submission deadline is mid-October.
  • IEEE Sensors 2010. May 4 is the abstract submission deadline, with the conference happening November 1-4 at the Hilton Waikoloa. Maximum paper length is 4 pages. This is perhaps less relevant than the IEEE Smart Grid conference, but still worthy of consideration.

Journals:

  • International Journal of Sustainability in Higher Education. This is where the Oberlin dorm energy contest paper was published, so it seems an obvious choice. Not the broadest appeal though.
  • Environment & Behavior.
  • IEEE Transactions on Smart Grid. Journal to be launched soon.
  • IEEE Transactions on Sustainable Energy. Another journal to be launched soon. The smart grid one looks more relevant; this one looks to be focused on energy generation from renewables.

I’m sure more will come up as I read more, but this is a good starter list.