Thursday, April 29, 2010

A require() for jQuery

I had a fun time at the Bay Area jQuery Conference. Great people, and I learned some neat things.

In the conference wrap-up, John Resig mentioned some requirements he has for a jQuery script loader:

1) script loading must be async

2) script loading should do as much in parallel as possible. This means, in particular, that it should be possible to avoid dynamic nested dependency loading.

3) it looks like a script wrapper is needed to allow #1 and #2 to work effectively, particularly for cross-domain loading. It is unfortunate, but a necessity for script loading in browsers.

I believe these requirements mesh very well with RequireJS. I will talk about how they mesh, and some other things that should be considered for any require() that might become part of jQuery.

Async Loading

As explained in the RequireJS Why page, I believe the best-performing, native browser option for async loading is dynamically created script tags. RequireJS uses only this type of script loading, not XHR.
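A minimal sketch of the dynamically created script tag technique. The `loadScript` helper is hypothetical, not part of the RequireJS API, and it takes the `document` as a parameter so it can be exercised outside a browser. Older versions of Internet Explorer signal completion through onreadystatechange instead of onload, which this sketch does not handle.

```javascript
// Hypothetical helper: load one script by appending a <script> element.
// `doc` is the browser's `document` (passed in so it can be stubbed);
// `callback(err, url)` fires once the browser reports load or error.
function loadScript(doc, url, callback) {
  var script = doc.createElement("script");
  script.type = "text/javascript";
  script.charset = "utf-8";
  script.async = true; // do not block parsing or other downloads
  script.onload = function () {
    callback(null, url);
  };
  script.onerror = function () {
    callback(new Error("Failed to load " + url), url);
  };
  script.src = url;
  doc.getElementsByTagName("head")[0].appendChild(script);
  return script;
}

// In a browser:
// loadScript(document, "scripts/some/module.js", function (err, url) { ... });
```

Several calls in a row start several downloads at once; the browser decides when each one finishes.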

The text plugin uses XHR in dev mode, but the optimization tool inlines the text content to avoid XHR for deployment. Also, the plugin capability in RequireJS is optional; it is possible to build RequireJS without it. That is what I do for the integrated jQuery+RequireJS build.

Parallel Loading

John mentioned that dynamic nested dependency resolution was slower and potentially a hazard for end users. Slow, because it means you need to fetch the module, wait for it to be received, then fetch its dependencies. So the module gets loaded serially relative to its dependencies. Potentially hazardous because a user may not know the loading pattern.
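To make the cost concrete: if a loader only discovers a module's dependencies after fetching that module, each level of the dependency tree costs one more network round trip. The sketch below uses a hypothetical in-memory dependency graph and counts those rounds with a breadth-first walk. Knowing every file up front, or inlining them in a build, collapses this to a single round.

```javascript
// graph maps a module name to the names it depends on (hypothetical data).
// Returns how many sequential fetch rounds dynamic nested loading needs:
// round 1 fetches the root, each later round fetches newly discovered deps.
function serialRounds(graph, root) {
  var seen = {};
  seen[root] = true;
  var level = [root];
  var rounds = 0;
  while (level.length > 0) {
    rounds += 1;
    var next = [];
    level.forEach(function (name) {
      (graph[name] || []).forEach(function (dep) {
        if (!seen[dep]) {
          seen[dep] = true;
          next.push(dep);
        }
      });
    });
    level = next;
  }
  return rounds;
}

var graph = {
  "app/main": ["app/view", "app/model"],
  "app/view": ["app/template"],
  "app/model": [],
  "app/template": []
};
console.log(serialRounds(graph, "app/main")); // 3
```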

The optimization tool in RequireJS avoids serial loading of nested dependencies by inlining the modules together. The optimization tool can also build files into "layers" that can be loaded in parallel.

For each build layer, there is an exclude option, in which you can list a module or modules you want to exclude. exclude will also exclude their nested dependencies from the build layer.

There is an excludeShallow option if you just want specific modules to exclude, but still want their nested dependencies included in the build layer. This is a great option for making your development process fast: just excludeShallow the current module you are debugging/developing.
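The difference between the two options can be sketched as set operations over a dependency graph. This is a hypothetical illustration of the semantics described above, not the RequireJS optimizer's actual code; the graph and the `layerContents` function are made up for the example.

```javascript
// Collect a module plus all of its nested dependencies.
function trace(graph, name, out) {
  out = out || {};
  if (out[name]) return out;
  out[name] = true;
  (graph[name] || []).forEach(function (dep) {
    trace(graph, dep, out);
  });
  return out;
}

// Hypothetical: compute which modules end up in a build layer.
// exclude: drop each listed module AND everything it depends on.
// excludeShallow: drop only the listed modules themselves.
function layerContents(graph, name, options) {
  options = options || {};
  var included = trace(graph, name);
  (options.exclude || []).forEach(function (ex) {
    Object.keys(trace(graph, ex)).forEach(function (m) {
      delete included[m];
    });
  });
  (options.excludeShallow || []).forEach(function (ex) {
    delete included[ex];
  });
  return Object.keys(included).sort();
}

var graph = {
  "app/page1": ["jquery", "app/common", "app/widget"],
  "app/common": ["jquery", "app/common/helper"],
  "app/widget": ["app/common/helper"],
  "app/common/helper": [],
  "jquery": []
};

var deep = layerContents(graph, "app/page1", { exclude: ["app/common"] });
// ["app/page1", "app/widget"]
var shallow = layerContents(graph, "app/page1", { excludeShallow: ["app/common"] });
// ["app/common/helper", "app/page1", "app/widget", "jquery"]
```

With exclude, "app/common/helper" drops out even though "app/widget" needs it, on the assumption that it arrives in the "app/common" layer instead.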

Even though dynamically loading nested dependencies can be slower than a full parallel load, each module still needs to list its own dependencies. There needs to be a way to know what an individual file needs to function if the file is to be portable in any fashion. So the question is how to specify those dependencies for a given file/module.

There are schemes that list the dependencies in a separate companion file next to the module, and schemes that list the dependencies in the module file. Using a separate file makes the module less portable: more "things" need to follow the module, which makes copy/pasting or distributing a single module more onerous.

So I prefer listing the dependencies in the file. Should the dependencies be listed in a comment or as some sort of script structure?

Comments can be nice since they can be stripped from the built/optimized layer. However, it means modules essentially need to communicate with each other through the global variable space. This ultimately does not scale -- at some point you will want to load two different versions of a module, or two modules that want to use the same global name, and you will be stuck. For that reason, I favor the way RequireJS does it:

require.def("my/module", ["dependency1"], function (dependency1) {
    //dependency1 is the module definition for "dependency1"

    //Return a value to define "my/module"
    return {
        limit: 500,
        action: function () {}
    };
});
With this model, dependency1 does not need to be global, and the module gets a very terse way to reference it. It also minifies nicely, since the function argument is local to that function and its name can be shortened. Because modules are referenced by string names and defined by the function's return value, it is possible to load two versions of a module in a page. See Multiversion Support in RequireJS for more info, and the unit tests for a working example.
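Why string names plus return values make two versions possible can be shown with a toy registry. This is a hypothetical, synchronous sketch: the `createContext`, `def` and `get` names are invented here, RequireJS's real context API differs, and cycles are not handled. Each context keeps its own name-to-value table, so the same module name can map to different values in different contexts without touching globals.

```javascript
// Hypothetical toy: one context = one private module table.
function createContext() {
  var factories = {};
  var values = {};
  return {
    // Register a module: name, dependency names, factory returning its value.
    def: function (name, deps, factory) {
      factories[name] = { deps: deps, factory: factory };
    },
    // Resolve a module (and, recursively, its dependencies) on demand,
    // caching the factory's return value as the module's definition.
    get: function get(name) {
      if (Object.prototype.hasOwnProperty.call(values, name)) {
        return values[name];
      }
      var entry = factories[name];
      if (!entry) {
        throw new Error("Module not defined: " + name);
      }
      var args = entry.deps.map(function (dep) { return get(dep); });
      values[name] = entry.factory.apply(null, args);
      return values[name];
    }
  };
}

// Two contexts, each with its own (made-up) version of "jquery".
var v1 = createContext();
var v2 = createContext();
v1.def("jquery", [], function () { return { fn: { jquery: "1.4.2" } }; });
v2.def("jquery", [], function () { return { fn: { jquery: "1.3.2" } }; });
v1.def("my/module", ["jquery"], function ($) { return { usesVersion: $.fn.jquery }; });
v2.def("my/module", ["jquery"], function ($) { return { usesVersion: $.fn.jquery }; });

console.log(v1.get("my/module").usesVersion); // 1.4.2
console.log(v2.get("my/module").usesVersion); // 1.3.2
```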

This model also frees the jQuery object from namespace collisions by allowing a terse way to reference modules without needing them to hang off of the jQuery object. There are many utility functions that do not need to be on the jQuery object to be useful, and today the jQuery object itself is starting to become a global of sorts that can have name collisions.

Script Wrapper

Because async script tags are used to load modules, each script needs to be wrapped in a function wrapper, to prevent its execution before its dependencies are ready. CommonJS recognizes this concern (one of the reasons for their Transport proposals), and so does YUI3. The xdomain builds for Dojo also use a script wrapper.

While it is unfortunate -- many people are not used to it -- it ends up being an advantage. Functions are JavaScript's natural module construct, and it encourages well scoped code that does not mess with the global space. For RequireJS, that wrapper is called require.def, as shown above.
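The reason the wrapper is needed can be sketched with a hypothetical loader that receives `def` calls in whatever order the async script tags happen to finish. Because each module body is a function, the loader can hold it until every dependency has been defined, then run it. The names here (`createLoader`, `def`, `isDefined`, `value`) are made up; this is not RequireJS's implementation.

```javascript
// Hypothetical loader: modules may arrive in any order.
function createLoader() {
  var defined = {}; // name -> module value
  var waiting = []; // { name, deps, factory } not yet runnable

  function ready(deps) {
    return deps.every(function (d) {
      return Object.prototype.hasOwnProperty.call(defined, d);
    });
  }

  // Run every waiting factory whose dependencies are now all defined;
  // repeat because defining one module may unblock others.
  function flush() {
    var progress = true;
    while (progress) {
      progress = false;
      for (var i = 0; i < waiting.length; i++) {
        var m = waiting[i];
        if (ready(m.deps)) {
          waiting.splice(i, 1);
          defined[m.name] = m.factory.apply(null, m.deps.map(function (d) {
            return defined[d];
          }));
          progress = true;
          break;
        }
      }
    }
  }

  return {
    def: function (name, deps, factory) {
      waiting.push({ name: name, deps: deps, factory: factory });
      flush();
    },
    isDefined: function (name) {
      return Object.prototype.hasOwnProperty.call(defined, name);
    },
    value: function (name) {
      return defined[name];
    }
  };
}

var loader = createLoader();
var order = [];
// "app/page" arrives first, but needs "app/util", which is not here yet.
loader.def("app/page", ["app/util"], function (util) {
  order.push("app/page");
  return { title: util.upper("hello") };
});
console.log(loader.isDefined("app/page")); // false
loader.def("app/util", [], function () {
  order.push("app/util");
  return { upper: function (s) { return s.toUpperCase(); } };
});
console.log(order.join(", ")); // app/util, app/page
console.log(loader.value("app/page").title); // HELLO
```

Without the wrapper, the body of "app/page" would have run as soon as its script arrived, before "app/util" existed.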

Here are some other things that should be considered for a require implementation:

require as a global

I believe it makes more sense to keep require as a global, not something that is a function hanging off of the jQuery object. require can be used to load jQuery itself, and as mentioned above, it would be possible to load more than one version of jQuery if it was constructed like this.

CommonJS awareness

The CommonJS module format was not constructed for the browser, but having an awareness of their design goals and a way to support their modules in the browser will allow more code reuse. RequireJS has an adapter for the CommonJS Transport/D proposal, and it has a conversion script to change CommonJS modules into RequireJS modules.

In addition, RequireJS was constructed with many of the same design goals as CommonJS: allow modules to be enclosed/do not pollute the global space, use the "path/to/module" module identifiers, have the ability to support the module and exports variables used in CommonJS.

Browsers need more than a require API

They also need an optimization/build tool that can combine modules together. RequireJS has such a system today. It is a server-independent command line tool that builds the layers as static files, which can be served from anywhere.

I am more than happy to look at a runtime system that uses the optimization tool on the server. RequireJS works in Node and in Rhino. The optimization tool is written in JavaScript and uses require.js itself to build the optimization layers.

I can see using either Node or Rhino to build a run-time server tool to allow combo-loading on the fly. Using Rhino via the Java VM has an advantage because Closure Compiler or YUI Compressor could be used to minify the response, but I am open to some other minification scheme that is implemented in plain JavaScript.

Loader plugins

I have found the text plugin for RequireJS to be very useful -- it allows you to reference HTML templates on disk and edit HTML in an HTML editor vs. dealing with HTML in a string. The optimization tool is smart enough to inline that HTML during a build, so the extra network cost goes away for deployment.

In addition, Sean Vaughan and I have been talking about support for JSONP-based services and scripts that need extra setup besides just being ready on the script onload event. I can see those as easy plugins to add that open up loading Google Ajax API services on the fly.

For these reasons I have found loader plugins to be useful. They are not needed in the basic case, but they can make overall dependency management better.

script.onload

Right now RequireJS has support for knowing when a script is loaded by waiting for the script.onload event. This could be avoided by mandating that anything loaded via require() register via require.def to indicate when it is loaded.

However, using script.onload allows some existing scripts to be loaded without modification today, which gives people time to migrate to the require.def pattern. I am open to doing a build without the script.onload support, but the savings in minified file size will not be that great.

Explicit .js suffix

RequireJS allows two different types of strings for dependencies. Here is an example:
require(["some/module", "http://some.site.com/path/to/script.js"]);
"some/module" is transformed to "some/base/path/some/module.js", while the other one is used as-is.

The transform rules for a dependency name are as follows: if the name contains a colon before a forward slash (has a protocol), starts with a forward slash, or ends in .js, do not transform the name. Otherwise, prepend the base path and append .js, so "some/module" becomes "some/base/path/some/module.js".
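The rules above can be written as a small function. This is a sketch under assumptions: the regular expression, the `nameToUrl` name, and the simple longest-prefix `paths` lookup are illustrative, not copied from require.js, which keeps its own regexp for this.

```javascript
// Hypothetical sketch of the dependency-name rules described above.
// Names matching this pattern are used as-is:
//   ^[^\/]*:   a colon before any forward slash (has a protocol)
//   ^\/        starts with a forward slash
//   \.js$      ends in .js
var USE_AS_IS = /^[^\/]*:|^\/|\.js$/;

function nameToUrl(name, config) {
  config = config || {};
  if (USE_AS_IS.test(name)) {
    return name;
  }
  var paths = config.paths || {};
  var parts = name.split("/");
  // Try the longest matching prefix in `paths` first, e.g. "jquery"
  // or "app/common", and swap it for the configured location.
  for (var i = parts.length; i > 0; i--) {
    var prefix = parts.slice(0, i).join("/");
    if (Object.prototype.hasOwnProperty.call(paths, prefix)) {
      parts.splice(0, i, paths[prefix]);
      break;
    }
  }
  var path = parts.join("/") + ".js";
  // A remapped path that is already absolute skips the baseUrl.
  if (/^[^\/]*:|^\//.test(path)) {
    return path;
  }
  var baseUrl = config.baseUrl || "./";
  if (baseUrl.charAt(baseUrl.length - 1) !== "/") {
    baseUrl += "/";
  }
  return baseUrl + path;
}

console.log(nameToUrl("some/module", { baseUrl: "some/base/path" }));
// some/base/path/some/module.js
console.log(nameToUrl("http://some.site.com/path/to/script.js"));
// http://some.site.com/path/to/script.js
console.log(nameToUrl("jquery", {
  baseUrl: "./scripts",
  paths: { jquery: "http://some.cdn.com/jquery/1.5/jquery" }
}));
// http://some.cdn.com/jquery/1.5/jquery.js
```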

I believe that gives a decent compromise between short, remappable module names (by changing the baseUrl or setting a specific path via a require config call) and loading scripts that do not participate in the require.def call pattern. There is also a regexp property on require that can be changed to allow more exceptions to the rules.

However, if this was found insufficient, I am open to other rules or a different way to list dependencies. The "some/module" format was chosen to be compatible with CommonJS module names, but probably some algorithm or approach could be used to satisfy both desires.

File Size/Implementation

Right now the stock RequireJS is around 3.7KB minified and gzipped. However, there are build options that get the size down to 2.6KB minified and gzipped by removing some features:
  • plugin support
  • require.modify
  • multiversion support (the "context" switching in RequireJS)
  • DOM Ready support

I am open to getting that file size smaller based on the feature set that needs to be supported.

3 layer loading

John mentioned a typical loading scenario that might involve three sections:

1) loading core libraries from a CDN (like jQuery and maybe a require implementation)
2) loading a layer of your common app scripts
3) loading a page-specific layer

RequireJS can support this scenario like so today:

<script src="http://some.cdn.com/jquery/1.5/require-jquery.js"></script>
<script>
require({
        baseUrl: "./scripts"
    },
    ["app/common", "app/page1"]
);
</script>

Then the optimization tool instructions would look like so:
{
    modules: [
        {
            //inside app/common.js there is a require call that
            //loads all the common modules.
            name: "app/common",
            exclude: ["jquery"]
        },
        {
            //app/page1 references jquery and app/common as dependencies,
            //as well as page-specific modules
            name: "app/page1",

            //jquery, app/common and all their dependencies will be excluded
            exclude: ["jquery", "app/common"]
        }
        //... other pages go here, following the same pattern ...
    ]
}
This would result in app/common and app/page1 being loaded async and in parallel. If require.js were a separate file from jquery.js, the following HTML could be used to load jQuery, app/common and app/page1 async and in parallel (the optimization instructions stay the same):

<script src="http://some.cdn.com/jquery/1.5/require.js"></script>
<script>
require({
        baseUrl: "./scripts",
        paths: {
            "jquery": "http://some.cdn.com/jquery/1.5/jquery"
        }
    },
    ["jquery", "app/common", "app/page1"]
);
</script>
Those configurations work today.

However, this is not quite flexible enough: modules that are part of app/page1 typically will not refer to the complete "app/common" as their only dependency, but will specify finer-grained dependencies, like "app/common/helper". So, depending on how fast "app/common" loads, the above could result in a separate request for "app/common/helper" from the "app/page1" script.

So I would build in support for the following:

<script src="http://some.cdn.com/jquery/1.5/require.js"></script>
<script>
require({
        baseUrl: "./scripts",
        paths: {
            "jquery": "http://some.cdn.com/jquery/1.5/jquery"
        },
        layers: ["jquery", "app/common", "app/page1"]
    },
    ["app/page1"]
);
</script>
Notice the new "layers" config option, and that the only module the page now requires is "app/page1". The "layers" config option would tell RequireJS to load all of those layers first, and find out which modules they contain before trying to fetch any other dependencies.
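A rough sketch of the decision the proposed "layers" option would let the loader make. After the layer files have loaded, the loader knows which module names they defined, and a dependency only gets its own request if no layer supplied it. Since the option is a proposal, the `missingAfterLayers` function and the data shapes below are hypothetical.

```javascript
// Hypothetical: given the module names each loaded layer defined,
// return the dependencies that still need their own HTTP request.
function missingAfterLayers(loadedLayers, deps) {
  var provided = {};
  Object.keys(loadedLayers).forEach(function (layer) {
    loadedLayers[layer].forEach(function (name) {
      provided[name] = layer;
    });
  });
  return deps.filter(function (name) {
    return !Object.prototype.hasOwnProperty.call(provided, name);
  });
}

var loadedLayers = {
  "jquery": ["jquery"],
  "app/common": ["app/common", "app/common/helper"],
  "app/page1": ["app/page1", "app/page1/widget"]
};

// "app/common/helper" came in with the app/common layer, so no extra request.
var none = missingAfterLayers(loadedLayers, ["jquery", "app/common/helper"]);
// []
var extra = missingAfterLayers(loadedLayers, ["app/page2", "app/page1/widget"]);
// ["app/page2"]
```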

This would give the most flexibility in coding individual modules, but give a very clear optimization path to getting a configurable number of script layers to load async and in parallel. I will be working on this feature for RequireJS for the next release.

Summary

Hopefully I have demonstrated how RequireJS could be the require implementation for jQuery. I am very open to doing code changes to support jQuery's desires, and even if jQuery or John feel like they want to write their own implementation, hopefully we can at least agree on the same API, and maybe even still use the optimization tool in RequireJS. I am happy to help with an alternative implementation too.

I know John and the jQuery team are busy, focusing mostly on mobile and templating concerns, but hopefully they can take the above into consideration when they get to script loading.

In the meantime, I will work on the layers config option support, improving RequireJS, and keeping my jQuery fork up to date with the changes. You can try out RequireJS+jQuery today if you want to give it a spin yourself.

7 comments:

Anonymous said...

LABjs loads *everything* in parallel, but maintains execution order when necessary. It does so with normal script tags (or XHR if you want) and no requirement for script wrappers or other stuff like that.

The only part that LABjs doesn't itself natively support is the idea of "discovering" nested dependencies and handling them. However, this would be an easy server-side add-on (working on one in fact!) to scan files and find expressed nested dependencies and then surface them all up to the single global $LAB loading chain in the proper order.

Why would that solution not accomplish what you're saying John "requires" for "require"?

Unknown said...

>>> The only part that LABjs doesn't itself natively support is the idea of "discovering" nested dependencies and handling them. However, this would be an easy server-side add-on...

The ability to load nested dependencies is important to me and I'd like to be able to work with static files w/o the server or any server addons.

James Burke said...

getify: having a build tool for LABjs would help it meet the goals outlined above. I believe it is required to get to the goal of parallel downloads for things that have nested dependencies.

I do think it is clever how LABjs uses a type="script/cache" attribute on the script tag to download but not execute a file, but I am concerned that it relies too much on browser cache behavior and correct server expiry headers. In my experience, those have not been predictable enough to guarantee that the script is not downloaded twice.

That said, LABjs has really helped some people, so I may be worried about extreme edge cases.

More importantly though, I believe a function wrapper has a few benefits that warrant its use:

1) It allows well-scoped modules that do not mess with the global space. This is really needed for very large apps or projects that have code loaded from multiple 3rd party sources.

This makes it possible to load two versions of jQuery in a page. While it is unfortunate that those cases might come up (not as efficient), I have personally encountered two higher traffic Dojo projects that needed to load two versions of Dojo. It is not a super common case, but it does come up.

2) With the function wrapper used by RequireJS, it allows the re-use of CommonJS modules. One of the primary goals of CommonJS modules is not messing about in the global space with module variables. Using the function wrapper and the same "some/module" names makes it easy to conform to CommonJS goals and convert CommonJS modules.

Since RequireJS also works in non-browser environments, this is important to have. Server side JS is getting hotter, and the ability to reuse the same modules on the client and server is useful.

3) It makes it possible to get easy references to dependencies via function arguments. This is mostly sugar, but I really like it. It also fits well with the practice in jQuery to wrap code via (function($){}(jQuery)).

Those references also work well with minification since those function args are just for that function and can be minified well.

4) It means you do not have to worry about executing modules in the order they are specified, and in comparison with LABjs, do not have to be concerned about caching edge cases that may lead to a double download.

In short, I believe a function wrapper is the most robust solution for the future of modular JS code in the browser.

If the LABjs build tool also wrapped code in a function wrapper, it is likely that would eliminate the concern about proper cache settings, but it means the scope of the code could be changed adversely.

For instance, var foo = "bar"; outside a function wrapper creates a global, but inside a function wrapper hides it from the global space. That may not be what that script wanted.

So I do not think a function wrapper can be bolted on after a build, it should exist in the source file.
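The scope change is easy to see in Node with the built-in `vm` module, where each context stands in for a page's global object. This is only an illustration of the `var foo = "bar";` point above: it runs the same line of code bare and inside a function wrapper.

```javascript
var vm = require("vm");

var source = 'var foo = "bar";';

// Run the script as-is: a top-level var becomes a property of the global.
var bare = vm.createContext({});
vm.runInContext(source, bare);

// Run the same script inside a function wrapper: foo stays local.
var wrapped = vm.createContext({});
vm.runInContext("(function () {" + source + "}());", wrapped);

console.log(bare.foo);    // bar
console.log(wrapped.foo); // undefined
```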

Anonymous said...

@James-

So, I'm curious how, even with function wrappers, do you parallel load all scripts if the expressed nested dependencies can't be discovered until after the file loads?

In other words, if A.js requires B.js, and you require() A.js in your page... how will the system know about B.js to load it in parallel with A.js? Won't it only know about B.js once it has executed A.js and found the nested def/require?

Unless I'm missing something, to load everything in parallel like John says he wants, you have to know about all files at the beginning. With LABjs' normal usage, you *do* know about all files at the beginning.

But with the idea of nested dependency annotation that requireJS has, the only way to surface all those to the beginning of the loading would be a server build tool, right?

James Burke said...

@getify: a build tool is definitely needed to bundle scripts into a small set of files to download in parallel, and that is needed for any script loader solution.

If you knew the nested dependencies, you could use RequireJS the way LABjs is used now and specify all the files up front, but RequireJS does not support the download-but-do-not-execute behavior that LABjs has.

However, that model, knowing all the dependencies up front, by itself is not sufficient to provide the best optimized parallel download support: even with LABjs to get the best performance you will want to do an optimization build to cut down the number of HTTP requests. You might want to cut it down to 1-4 requests instead of say 5-10 that probably exist in the raw source/development mode (for a smallish project, a webmail-type of app will be much bigger).

But that model, where the developer must know all the dependencies and specify the correct load order of modules, does not scale. It is probably manageable with about 10 scripts, but getting to 20 or more means automated dependency tracing should be used.

The most portable way to do automated dependency tracing is to place the immediate dependencies for a file in the file. Then use the optimization tool to do the automated dependency tracing.
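The tracing step can be sketched as a depth-first walk that emits each module after its dependencies, giving an order in which the files could be concatenated. The graph stands in for the dependency lists read from each file; the parsing step is left out, and the `traceOrder` name is an assumption, not the RequireJS optimizer's code.

```javascript
// Hypothetical: graph maps each module to the immediate dependencies
// listed in its own file. Returns a build order with dependencies first.
function traceOrder(graph, roots) {
  var order = [];
  var state = {}; // undefined = unvisited, 1 = visiting, 2 = done

  function visit(name, path) {
    if (state[name] === 2) return;
    if (state[name] === 1) {
      throw new Error("Circular dependency: " + path.concat(name).join(" -> "));
    }
    state[name] = 1;
    (graph[name] || []).forEach(function (dep) {
      visit(dep, path.concat(name));
    });
    state[name] = 2;
    order.push(name);
  }

  roots.forEach(function (root) {
    visit(root, []);
  });
  return order;
}

var graph = {
  "app/page1": ["jquery", "app/common/helper", "app/widget"],
  "app/widget": ["app/common/helper"],
  "app/common/helper": ["jquery"],
  "jquery": []
};
var order = traceOrder(graph, ["app/page1"]);
// ["jquery", "app/common/helper", "app/widget", "app/page1"]
```

Each file only names its immediate dependencies; the full ordering falls out of the walk, which is the part that stops scaling when done by hand.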

There still may be a manual tuning step during optimization, to figure out the right number of optimized files, and in RequireJS, the optimization tool lists the files included in any optimized layer to help with that.

So I am all for getting parallel downloads for best optimization, but hopefully I have demonstrated that dependencies need to be nested for code portability/sharing and for ease of optimization when bundling files for parallel download.

Finally, a slight clarification: I actually prefer an optimization/build tool that does not require a server dependency, and that is how the one in RequireJS works -- it works on the command line. It could also work in a server, but command line is a base requirement for me. That is the most portable -- it does not depend on a specific server type.

Anonymous said...

@James-

OK, that makes a little more sense. I thought maybe I was hearing that John wanted a jquery solution for parallel loading with nested dependencies that was purely client-side JS, which seems like it's not possible to do.

FWIW, the server-side LABjs tool I'm working on (hopefully launching in the next month or two) is intended to do what you're talking about: scanning files (html, js) in a post-processing step and figuring out the dependency chain, and then concat'ing/minifying files, and finally adjusting the output so that it uses optimal loading code automatically.

Anonymous said...

I've been working on a pretty complex javascript application using several classes that I had to split into several files. The setup is basically that you would load a script file in your page, and this script would then load all of the other required classes depending on the context.

So I made this very simple and effective jQuery plugin to lazy load the scripts and execute the object creation once the class is available:



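/**
 * jQuery plugin: load one or more scripts, then call back with the URLs
 * that loaded and the URLs that failed. Needs jQuery 1.5+ (.done/.fail).
 */
if (typeof jQuery !== "undefined") {
    jQuery.extend({
        requires: function (scripts, callback) {
            if (!jQuery.isArray(scripts)) {
                scripts = [scripts];
            }
            // Only string URLs are loaded, so only count those; otherwise
            // a non-string entry would stop the callback from ever firing.
            var urls = jQuery.grep(scripts, function (s) {
                return typeof s === "string";
            });
            var loaded = [];
            var failed = [];
            var done = function () {
                if (loaded.length + failed.length === urls.length &&
                        typeof callback === "function") {
                    callback(loaded, failed);
                }
            };
            if (urls.length === 0) {
                done();
                return;
            }
            jQuery.each(urls, function (i, url) {
                // Record the URL as passed in; this.url inside the callbacks
                // may carry a cache-busting "_=" parameter added by jQuery.
                jQuery.getScript(url)
                    .done(function () { loaded.push(url); done(); })
                    .fail(function () { failed.push(url); done(); });
            });
        }
    });
}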

Here is an example call:
/* Load required libraries */
jQuery.requires([
    "a js file.js",
    "another js file.js"
], function (loaded, failed) {
    /* Execute your code here */
});

The function can take a single string containing a URL or an array containing several. Once all the scripts have finished loading (or failed to load), the callback is executed and receives an array of the loaded scripts and an array of the failed ones.