Use MSHTML to parse a local HTML file without using Internet Explorer (Microsoft HTML Object Library)

Anyone who has done some web scraping will be familiar with creating an instance of Internet Explorer (IE), navigating to a web address and then, once the page is ready, walking the DOM using the ‘Microsoft HTML Object Library’ (MSHTML) type library. The question asks what to do if IE is unavailable. I am in the same situation on my box running Windows 10.

I had suspected it was possible to spin up an instance of MSHTML.HTMLDocument directly, but how to create one is not obvious. Thanks to the questioner for asking this now. The answer lies in the MSHTML.IHTMLDocument4.createDocumentFromUrl method. It works with a local file (EDIT: it also accepts a web URL), and there is a nice tidy Windows API function called URLDownloadToFile to download the page to a local file first.

This code runs on my Windows 10 box, which runs Microsoft Edge and not Internet Explorer. This is an important find.

In this code we visit a used-car ("occasion") website, scrape some car brands and their prices, and display the result in the Immediate window of the Visual Basic Editor.
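The original listing is not reproduced here, so what follows is only a minimal sketch of the approach described above, under stated assumptions. It needs a reference to the Microsoft HTML Object Library (Tools > References) and a VBA7 host (Office 2010 or later) for the `PtrSafe` declaration. The URL is a placeholder, not the site used in this post, and the routine name `ScrapeWithoutIE` is made up for illustration. The loop over anchor elements merely stands in for the site-specific brand and price selectors, which cannot be recovered.

```vba
Option Explicit

' Requires Tools > References > Microsoft HTML Object Library.
' VBA7 declaration; URLDownloadToFile is exported by urlmon.dll.
Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" ( _
    ByVal pCaller As LongPtr, ByVal szURL As String, _
    ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As LongPtr) As Long

Public Sub ScrapeWithoutIE()
    ' Assumption: placeholder URL, not the site used in the post.
    Const sURL As String = "https://example.com/"

    ' Assumption: any writable path will do for the local copy.
    Dim sLocalFile As String
    sLocalFile = Environ$("TEMP") & "\mshtml_sample.html"

    ' URLDownloadToFile returns 0 (S_OK) on success.
    If URLDownloadToFile(0, sURL, sLocalFile, 0, 0) <> 0 Then
        Debug.Print "Download failed: " & sURL
        Exit Sub
    End If

    ' Create an empty document only to reach the IHTMLDocument4 interface.
    Dim oHtml4 As MSHTML.IHTMLDocument4
    Set oHtml4 = New MSHTML.HTMLDocument

    ' Load and parse the local file; no IE window is involved.
    Dim oHtml As MSHTML.HTMLDocument
    Set oHtml = oHtml4.createDocumentFromUrl(sLocalFile, "")

    ' Parsing is asynchronous, so yield until the document is ready.
    Do While oHtml.readyState <> "complete"
        DoEvents
    Loop

    ' Optional: uncomment to log the whole page (needs a LogInformation routine).
    'LogInformation (oHtml.body.outerHTML)

    ' Stand-in for the brand/price selectors: print every link's text and target.
    Dim oLink As Object
    For Each oLink In oHtml.getElementsByTagName("a")
        Debug.Print oLink.innerText, oLink.href
    Next oLink
End Sub
```

Run `ScrapeWithoutIE` from a standard module with the Immediate window open (Ctrl+G) to see the output. Keep the `DoEvents` call in the wait loop: it lets the parser make progress and lets you break into the code if the document never reaches "complete".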

© The Excel Development Platform

If you want to log the complete webpage, uncomment the following line in the above code:
'LogInformation (oHtml.body.outerHTML)
and insert the code below in your module.
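The logging routine itself is missing from this copy of the post, so the following is a hypothetical stand-in rather than the original code. It assumes that `LogInformation` takes a single string and appends it to a text file; the log path is likewise an assumption.

```vba
' Hypothetical helper: appends a message to a plain-text log file.
Public Sub LogInformation(ByVal LogMessage As String)
    ' Assumption: log location; change it to suit your machine.
    Dim sLogFile As String
    sLogFile = Environ$("TEMP") & "\mshtml_scrape.log"

    ' FreeFile returns the next unused file number.
    Dim iFile As Integer
    iFile = FreeFile

    ' Append mode creates the file if it does not exist yet.
    Open sLogFile For Append As #iFile
    Print #iFile, LogMessage
    Close #iFile
End Sub
```

`Print #` writes text in the system ANSI code page, so characters outside that code page may not survive in the log. If that matters, writing through an ADODB.Stream with an explicit charset is one alternative.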
