An important feature of any web archiving tool is the ability to pull down embedded resources ("page requisites" in the parlance of wget) that are hosted on a domain other than the site being crawled. This situation arises not only when people "hot-link" other people's images, but also, and far more commonly, on sites that serve assets from cloud-based content delivery networks, such as Tumblr. Including such assets in a crawl is absolutely necessary when one's aim is a complete mirror of a site.
Heritrix has an option for including such assets in the crawl scope: https://webarchive.jira.com/wiki/display/Heritrix/unexpected+offsite+content
As does Httrack, using the --near flag: http://www.httrack.com/html/fcguide.html
But of course, Httrack does not offer WARC output.
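For illustration, an Httrack invocation using that flag might look like the following. The target URL and output directory are placeholders, and the exact flag spelling should be checked against the linked guide:

```shell
# Mirror a site, also fetching non-HTML files "near" an HTML page
# (e.g. images hosted on another domain). example.com and ./mirror
# are placeholders, not values from any real crawl.
httrack "http://example.com/" -O ./mirror --near
```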
Wget has the -H flag, allowing one to "span hosts" (in other words, fetch from domains other than the starting URL), but it lacks the ability to span hosts only for page requisites, so combining an infinitely recursive crawl with -H makes it try to download the entire web. There are some hacky ways of getting around this, but they aren't pretty or reliable. The great thing about wget, though, is that it can output both WARC and a directory tree of the crawled site, so one is not locked into WARC completely.
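To make the problem concrete, here is a sketch of the two invocations. The URLs and domain names are placeholders; the second command shows one of the hacky workarounds mentioned above, whitelisting known asset domains with -D, which only works if one knows the CDN hosts in advance:

```shell
# Naive attempt: with -r and -H together, wget follows off-site
# links recursively rather than only fetching off-site requisites,
# so the crawl can sprawl across the web.
wget -r -l inf -p -H --warc-file=example "http://example.com/"

# Hacky partial workaround: restrict host spanning to a whitelist
# of domains (-D). media.example-cdn.com is a placeholder for a
# CDN host discovered by inspecting the site beforehand.
wget -r -l inf -p -H -D example.com,media.example-cdn.com \
     --warc-file=example "http://example.com/"
```

Both commands write a WARC file alongside the normal directory tree, which is exactly the dual output that makes wget attractive here.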
Are there any tools of the trade that can conduct the comprehensive type of crawl described above while also outputting both WARC and a directory tree?