Introduction
Piggy Bank runs scrapers that it grabs from arbitrary third parties. We need formal technical mechanisms, as well as social rituals and supporting tools, for controlling the risk this creates.
Formal technical mechanisms certainly mean finding a way to run the scrapers in a 'sandbox' of some kind that limits the damage they can do. For example, we could mimic what GreaseMonkey does, or we could roll our own solution (see below for why that's desirable).
Social mechanisms (and supporting tools) certainly mean ways to make users aware of changes to scrapers and to give them the opportunity to decide whether they accept the risk of downloading and updating scrapers, including tools that let them clearly see the particulars of each change.
Sandboxing
Possible Approaches
- Mimic GreaseMonkey
  - Pro: This is probably well tested and widely reviewed, and the code is at hand.
  - Con: Page authors can frustrate scraping using the same techniques available to frustrate GreaseMonkey.
  - Con: Arbitrary pages could possibly mimic a scraper.
- Create our own sandbox
  - Create an empty scope, add back only what we approve as safe, and run the third-party script there.
    - Pro: Sounds great.
    - Con: We don't know how.
  - Remove dangerous things from the scope.
    - Pro: We know how; we already do this to a limited extent.
    - Con: Enumerating the long tail of danger is impossible.
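The weakness of the "remove dangerous things" option can be seen in a few lines of plain JavaScript. This is a minimal sketch, assuming a hypothetical runWithDenylist helper that is not part of Piggy Bank: it hides each denied global by shadowing it with a function parameter, which works for the names we list and fails for everything we forgot.

```javascript
// Hypothetical helper illustrating the denylist approach ("remove dangerous
// things from scope"). It is NOT a real sandbox: see the last example below.
function runWithDenylist(source, denylist, provided) {
  // Each denied name becomes a parameter bound to undefined, shadowing the
  // global of the same name inside the script; approved helpers follow as
  // extra parameters. Under "use strict", names such as "eval" and
  // "arguments" are not legal parameter names, so they cannot be denied here.
  const names = denylist.concat(Object.keys(provided));
  const values = denylist.map(() => undefined).concat(Object.values(provided));
  // "use strict" keeps `this` undefined inside the script body.
  const fn = new Function(...names, '"use strict";\n' + source);
  return fn(...values);
}

// A denied global is hidden from the script.
console.log(runWithDenylist("return typeof setTimeout;", ["setTimeout"], {}));

// An approved helper is visible.
console.log(runWithDenylist("return add(2, 3);", [], { add: (a, b) => a + b }));

// But anything we forgot to list is still reachable, including a route back
// to the real global object through the (undenied) Function constructor.
console.log(
  runWithDenylist('return Function("return this")() === globalThis;', ["setTimeout"], {})
);
```

The last call is the "long tail" in miniature: every escape route has to be found and listed by hand, which is why adding back only approved names is the more attractive option.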
Inspirations for the Sandbox
- Yay! Firefox 1.5 implements evalInSandbox, which seems to do *exactly* what we need! Here is a small example extracted from the IDL definition:
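```javascript
// "C.u" in the IDL excerpt is shorthand for Components.utils.
var Cu = Components.utils;

var s = new Cu.Sandbox("http://www.mozilla.org");         // the URI sets the sandbox's principal
var res = Cu.evalInSandbox("var five = 5; 2 + five", s);  // res is 7
var outerFive = s.five;                                    // top-level vars land on the sandbox: 5
s.seven = res;                                             // pass a value into the sandbox
var thirtyFive = Cu.evalInSandbox("five * seven", s);      // 35

// IDL declaration (the sandbox object is an extra argument not expressed in the IDL):
// void evalInSandbox(in AString source/*, obj */);
```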
Firefox itself calls it from JavaScript in only one place, but that use gives good inspiration:
```javascript
// From Firefox's proxy auto-config (PAC) component. pacUtils, pac, pacURL,
// myIpAddress, dnsResolve and proxyAlert are defined elsewhere in the
// component and are not shown here.
onStopRequest: function(request, ctxt, status) {
    if (!ProxySandBox) {
        ProxySandBox = new Sandbox();
    }

    // add predefined functions to pac
    var mypac = pacUtils + pac;
    ProxySandBox.myIpAddress = myIpAddress;
    ProxySandBox.dnsResolve = dnsResolve;
    ProxySandBox.alert = proxyAlert;

    // evaluate loaded js file
    evalInSandbox(mypac, ProxySandBox, pacURL);
    LocalFindProxyForURL = ProxySandBox.FindProxyForURL;
    this.done = true;
},
```
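Piggy Bank could follow the same pattern for scrapers: create a sandbox, copy in only the helpers we approve, evaluate the scraper, and pull its entry point back out. The sketch below is an assumption-laden stand-in, not Firefox code: it uses Node's vm module (vm.createContext and vm.runInContext) in place of Components.utils.Sandbox and evalInSandbox so it can run outside the browser, and the loadScraper helper and the scrape entry point are hypothetical. Node's documentation states that vm is not a security mechanism; in particular, any helper we hand in is an outer-realm function, and its constructor chain leads back to the outer Function constructor. That is exactly the "passing trusted objects to untrusted code" problem discussed below.

```javascript
const vm = require("vm");

// Hypothetical loader: evaluate third-party scraper source in a fresh context
// that contains only the helpers we explicitly hand it, then return the
// scraper's entry point (assumed here to be a global function named `scrape`).
function loadScraper(source, helpers) {
  // Copy the approved helpers onto a new object and make it the global
  // object of a new context (the analogue of `new Sandbox()`).
  const sandbox = vm.createContext(Object.assign({}, helpers));

  // Evaluate the scraper; top-level declarations become properties of
  // `sandbox`, just as `var five` became `s.five` in the IDL example.
  vm.runInContext(source, sandbox, { filename: "scraper.js", timeout: 1000 });

  // Pull the entry point out, as the PAC code does with FindProxyForURL.
  if (typeof sandbox.scrape !== "function") {
    throw new Error("scraper did not define a scrape() function");
  }
  return sandbox.scrape;
}

// Usage: the scraper can call the helper we injected, but Node globals such
// as `require` and `process` do not exist in its context.
const scrape = loadScraper(
  "function scrape(doc) { return [note(doc.title), typeof require, typeof process]; }",
  { note: (s) => "title: " + s }
);
console.log(scrape({ title: "Example" }));
```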
Browsing around in other branches turns up another use, which shows how scary things can get in a browser held together by JavaScript:
```javascript
// Ci is assumed to be the usual alias for Components.interfaces,
// defined elsewhere in the same file.
_isTrustedWindow: function(obj) {
    var s = new Components.utils.Sandbox("http://localhost.localdomain.:0/");
    /* Some notes:
     * 1. Doing an instanceof check outside of the sandbox is not safe because
     *    it would call the QueryInterface method of an untrusted object.
     * 2. Inside the sandbox (which does not have chrome privileges), the
     *    QueryInterface method of an untrusted object will never get called
     *    since it has a different origin.
     * 3. We cannot check whether the object is an instance of nsIDOMWindow
     *    because XPConnect wraps the window argument as an nsIDOMWindow
     *    due to the argument type (nsIDOMWindow, surprise surprise).
     */
    s.nsIInterfaceRequestor = Ci.nsIInterfaceRequestor;
    s.obj = obj;
    const IS_TRUSTED_CODE = "obj instanceof nsIInterfaceRequestor;";
    return Components.utils.evalInSandbox(IS_TRUSTED_CODE, s);
},
```
- Thread on dev-security@moz about the problem of passing trusted objects to untrusted code running in a sandbox.
- The JavaScript language specification (ECMA-262) will ultimately tell us what is and isn't possible.
- A detailed explanation of the inner workings of JavaScript closures suggests that the JavaScript 'with' statement can be used to change the execution scope of a function, and that the smallest scope a function can have contains just the global object (it can never be empty, since there is no way to programmatically remove an object from the scope chain).
- A clever trick for introducing private and privileged access to JavaScript objects suggests ways to use closures to build scope-locking scenarios.
- The JavaScript 'watch' function (a non-standard Gecko extension) could let us invalidate a scraper if it tries to change anything in the objects we pass into its execution scope.
- XPCNativeWrapper Documentation gives a detailed description of what the access protection wrappers do.
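The 'with' behaviour described in the list above can be checked directly. In this minimal sketch (evalWithScope is a hypothetical helper), the object passed as scope is pushed onto the front of the scope chain, so its properties shadow globals of the same name; any name it lacks still falls through to the global object, which is why 'with' alone cannot give a scraper an empty scope.

```javascript
// Hypothetical helper: evaluate `expr` with `scope` at the front of the scope
// chain. `with` is forbidden in strict mode, but functions created by the
// Function constructor are non-strict unless their body opts in, so this works
// even if the surrounding file is strict.
function evalWithScope(expr, scope) {
  const fn = new Function("scope", "with (scope) { return (" + expr + "); }");
  return fn(scope);
}

const scope = { answer: 42, JSON: "shadowed" };

console.log(evalWithScope("answer", scope));      // found on the scope object
console.log(evalWithScope("JSON", scope));        // the scope object shadows the global JSON
console.log(evalWithScope("typeof Math", scope)); // not on scope, so it falls through to the global
```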
Questions
- What does Chickenfoot do about this risk?
Collaborative Development: Possible Approaches
Strict collaborative schemes slow everything down, so the stronger the sandboxing, the less we have to lean on them.
Some principles, all tentative for now:
- No script should be installed without prompting the end user.
- No change should be installed without giving the user a chance to see what changed, even if that is just a transcript of the diff and commit messages.
- Some scripts should be changed only by members of trusted groups.
- Versioning and release points are good.
- Sign off schemes are nice.
- Voting schemes are nice.
- Voting, sign-off, and trust groups all raise the question of which group gets the franchise.
- We need something simple soon, rather than something complex later.
- 'Commit then review' is much much better than 'review then commit'.
- The wiki model confounds commit and release.
- Code should be fetched at the same time as metadata to avoid version skew?
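The first two principles and the version-skew question can be combined into one install-time check. This is a minimal sketch, assuming a hypothetical metadata record of the form { version, sha256, commitMessage } that arrives in the same fetch as the scraper source, and a hypothetical askUser prompt; it uses Node's crypto.createHash to compute the digest. The client refuses code whose SHA-256 digest does not match the metadata it came with, and never installs or updates a script without asking.

```javascript
const crypto = require("crypto");

// SHA-256 digest of the scraper source, as lowercase hex.
function sha256Hex(text) {
  return crypto.createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical install check. `meta` is a metadata record assumed to look
// like { version, sha256, commitMessage } and to arrive together with `code`;
// `installed` is the metadata of the installed copy (or null); `askUser(meta)`
// is a hypothetical prompt that shows the change and returns true or false.
function reviewUpdate(installed, meta, code, askUser) {
  // Code and metadata disagree: refuse rather than install a skewed pair.
  if (sha256Hex(code) !== meta.sha256) {
    return { install: false, reason: "code does not match its metadata" };
  }
  // Nothing changed, so there is nothing to ask about.
  if (installed !== null && installed.sha256 === meta.sha256) {
    return { install: false, reason: "already up to date" };
  }
  // New or changed script: never install without asking.
  const approved = askUser(meta) === true;
  return { install: approved, reason: approved ? "approved by user" : "rejected by user" };
}

// Usage: a first install that the user approves.
const code = "function scrape(doc) { return doc.title; }";
const meta = { version: "1", sha256: sha256Hex(code), commitMessage: "Initial scraper" };
console.log(reviewUpdate(null, meta, code, (m) => {
  console.log("Install version " + m.version + "? " + m.commitMessage);
  return true;
}));
```

Because the digest travels with the metadata, a mismatched code/metadata pair is rejected instead of being installed silently, whichever review model (commit-then-review, sign-off, voting) ends up deciding what goes into the metadata.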
See Also
- Example of GreaseMonkey's PR Debacle
- Technical Description of GreaseMonkey's Troubles.