Anyone who has done some web scraping will be familiar with creating an instance of Internet Explorer (IE) and the navigating to a web address and then once the page is ready start navigating the DOM using the ‘Microsoft HTML Object Library’ (MSHTML) type library. The question asks if IE is unavailable what to do. I am in the same situation for my box running Windows 10.
I had suspected it was possible to spin up an instance of MSHTML.HTMLDocument but its creation is not obvious. Thanks to the questioner for asking this now. The answer lies in the MSHTML.IHTMLDocument4.createDocumentFromUrl method. One needs a local file to work with (EDIT: actually one can put a webby url in as well!) but we have a nice tidy Windows API function called URLDownloadToFile to download a file.
This codes runs on my Windows 10 box where Microsoft Edge is running and not Internet Explorer. This is an important find and thanks to the questioner for raising it.
In this code we visit a car occasion website and scrape some car brands and the price and display the result in the immediate window of the Visual Basic Editor.
Make sure you set a reference to the Microsoft HTML Object Library.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 |
Option Explicit 'Tools->Refernces Microsoft HTML Object Library 'MSDN - URLDownloadToFile function - https://msdn.microsoft.com/en-us/library/ms775123(v=vs.85).aspx Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" _ (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _ ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long Sub Find_Ford_Granada() Dim fso As Object Set fso = CreateObject("Scripting.FileSystemObject") Dim sLocalFilename As String sLocalFilename = Environ$("TMP") & "\urlmon.html" Dim sURL As String sURL = "https://www.autoscout24.de/lst/ford/granada?sort=standard&desc=0&ustate=N%2CU&atype=C&cy=D&ocs_listing=include&source=homepage_search-mask" Dim bOk As Boolean bOk = (URLDownloadToFile(0, sURL, sLocalFilename, 0, 0) = 0) If bOk Then If fso.FileExists(sLocalFilename) Then 'Tools->References Microsoft HTML Object Library Dim oHtml4 As MSHTML.IHTMLDocument4 Set oHtml4 = New MSHTML.HTMLDocument Dim oHtml As MSHTML.HTMLDocument Set oHtml = Nothing 'IHTMLDocument4.createDocumentFromUrl 'MSDN - IHTMLDocument4 createDocumentFromUrl method - https://msdn.microsoft.com/en-us/library/aa752523(v=vs.85).aspx Set oHtml = oHtml4.createDocumentFromUrl(sLocalFilename, "") 'need to wait a little whilst the document parses 'because it is multithreaded While oHtml.readyState <> "complete" DoEvents 'do not comment this out it is required to break into the code if in infinite loop Wend Debug.Assert oHtml.readyState = "complete" Dim sTest As String sTest = Left$(oHtml.body.outerHTML, 100) Debug.Assert Len(Trim(sTest)) > 50 'just testing we got a substantial block of text, feel free to delete 'You can log the information in a textfile. Uncheck the next line. 'LogInformation (oHtml.body.outerHTML) 'this is where the page specific logic now goes, here I am getting info from Autoscout page Dim htmlBrand As Object 'MSHTML.DispHTMLElementCollection Dim htmlPrice As Object 'MSHTML.DispHTMLElementCollection Set htmlBrand = oHtml.getElementsByTagName("h2") Set htmlPrice = oHtml.getElementsByClassName("Price_price__APlgs PriceAndSeals_current_price__ykUpx") Dim lngCounterLoop As Long For lngCounterLoop = 0 To htmlBrand.Length - 1 Dim vBrand Dim vPrice Set vBrand = htmlBrand.Item(lngCounterLoop) Set vPrice = htmlPrice.Item(lngCounterLoop) 'On Error GoTo err_chk If vBrand Is Nothing Or vPrice Is Nothing Then MsgBox "There are no more car brands or prices to be found or available" Else Debug.Print vBrand.outerText & " - " & vPrice.outerText End If 'On Error GoTo 0 Next End If End If 'err_chk: ' If Err.Number = 91 Then ' MsgBox "There was an ERROR!!! - " & Err.Number & " : " & Err.Description & vbNewLine & "There are no more brand or price for cars available" ' Else ' MsgBox Err.Number & ":" & Err.Description 'End If End Sub |
© The Excel Development Platform
If you want to Log the complete webpage then uncheck the next line in the above code:
‘LogInformation (oHtml.body.outerHTML)
and insert the below code in your module.
1 2 3 4 5 6 7 8 9 10 11 12 |
Sub LogInformation(LogMessage As String) Dim fileNum As Integer Const LogFileName As String = "C:\temp\textfile.html" Open "C:\temp\textfile.html" For Output As #1: Close #1 MsgBox "Clear complete" fileNum = FreeFile ' next file number Open LogFileName For Append As #fileNum ' creates the file if it doesn't exist Print #fileNum, LogMessage ' write information at the end of the text file Close #fileNum ' close the file End Sub |
