We intersect 2013 DNS Census virtual host cleanup with 2013 DNS census MX records and that leaves 460k hits. We did lose a third on the the MX records as of 260 hits since secureserver.net is only used in 1/3 of sites, but we also concentrate 9x, so it may be worth it.
Then we Wayback Machine CDX scanning. it takes about 5 days, but it is manageale.
We did a full Wayback Machine CDX scanning for JAR, SWF and cgi-bin in those, but only found a single new hit: