Save Web Pages As PDF Files

Share this

April 26, 2013

Save Web Pages As PDF Files

Introduction

Sometimes a task that you consider as “simple” tends to be so demanding that at the end you blame yourself for your initial assumption. The previous sentence describes exactly what happened to me the previous days. Around 10 days ago, a blog reader (Sangeetha) asked me if is possible to convert HTML files (web pages) to PDF files using VBA. My initial reaction was yes, why not? But when I started writing the VBA code I couldn’t imagine that it will take me so much time to finish this project.

In the code below I used a lot of different techniques and methods which I will try to describe as detailed as possible in case someone needs to use a part only of the code. There are multiple lines of VBA code in order to:

  • Select a folder using folder picker dialog.
  • Determine if a folder exists.
  • Set the default computer’s printer to your desired printer (using Windows Script Host Object Model Library).
  • Check if a file name contains special/illegal characters.
  • Open a web page in Internet Explorer and wait until is fully loaded.
  • Print the web page as PDF to a specified file path using Adobe Professional. The latter involves:

– API functions (FindWindow, SetForegroundWindow andFindWindowEx) in order to “find” the print window of Internet Explorer and its “child” windows.

– API functions (SendMessage and keybd_event) for changing the PDF file path.

– A custom WMI (Windows Management Instrumentation) function in order to determine if the printer has finished printing.
– API functions (FindWindow and PostMessage) for finding the opened PDF document and closing it.

The above list describes more or less the sequence of actions that I followed in order to fulfil this task. However, I should mention that without the following tools it would be impossible to finish this project:

  • Spy++ is a utility that gives you a graphical view of the system’s processes, threads, windows, and window messages.
  • API Viewer is a utility that helps you write the API declarations, by providing the correct syntax of each function.

VBA code

1. First of all, the code below contains all the necessary API functions and constants that are used in the various subs of this project. API Viewer helped me found the correct declaration syntax for all API functions.

Option Explicit

'This module contains all the necessary API functions and constants
'that are used in the subs of this workbook.

'By Christos Samaras
'https://myengineeringworld.net/////


'Retrieves a handle to the top-level window whose class name and window name match the specified strings.
'This function does not search child windows. This function does not perform a case-sensitive search.

Public Declare Function FindWindow Lib "user32" Alias "FindWindowA" _
(ByVal lpClassName As String, ByVal lpWindowName As String) As Long

'Retrieves a handle to a window whose class name and window name match the specified strings.
'The function searches child windows, beginning with the one following the specified child window.
'This function does not perform a case-sensitive search.

Public Declare Function FindWindowEx Lib "user32" Alias "FindWindowExA" _
(ByVal hWnd1 As Long, ByVal hWnd2 As Long, ByVal lpsz1 As String, _
ByVal lpsz2 As String) As Long

'Sets the specified window's show state.
Public Declare Function ShowWindow Lib "user32.dll" (ByVal hwnd As Long, ByVal nCmdShow As Long) As Long

'Brings the thread that created the specified window into the foreground and activates the window.
'Keyboard input is directed to the window, and various visual cues are changed for the user.
'The system assigns a slightly higher priority to the thread that created the foreground
'window than it does to other threads.

Public Declare Function SetForegroundWindow Lib "user32" (ByVal hwnd As Long) As Long

'Suspends the execution of the current thread until the time-out interval elapses.
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

'Sends the specified message to a window or windows. The SendMessage function calls the window procedure
'for the specified window and does not return until the window procedure has processed the message.

Public Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
(ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long

'Places (posts) a message in the message queue associated with the thread that created the specified
'window and returns without waiting for the thread to process the message.

Public Declare Function PostMessage Lib "user32.dll" Alias "PostMessageA" _
(ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As Long) As Long

'Synthesizes a keystroke. The system can use such a synthesized keystroke to generate a WM_KEYUP or
'WM_KEYDOWN message. The keyboard driver's interrupt handler calls the keybd_event function.

Public Declare Sub keybd_event Lib "user32.dll" (ByVal bVk As Byte, ByVal bScan As Byte, _
ByVal dwFlags As Long, ByVal dwExtraInfo As Long)

'Constants used in API functions.
Public Const SW_MAXIMIZE = 3
Public Const WM_SETTEXT = &HC
Public Const VK_DELETE = &H2E
Public Const KEYEVENTF_KEYUP = &H2
Public Const BM_CLICK = &HF5&
Public Const WM_CLOSE As Long = &H10

API Viewer Window
Figure 2: The API Viewer window.

 2. The PDFFolderSelection that follows shows the folder picker dialog in order for the user to select the folder in which the PDF files will be saved. The selected folder’s path is written in the cell B4 of the worksheet.

Sub PDFFolderSelection()

'Shows the folder picker dialog in order the user to select
'the folder in which the PDF files will be saved.

'By Christos Samaras
'https://myengineeringworld.net/////


Dim FoldersPath As String

'Show the folder picker dialog.
With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Select a folder to save your PDF files..."
.Show
If .SelectedItems.Count = 0 Then
shMain.Range("B4").Value = "-"
MsgBox "You did't select a folder!", vbExclamation, "Canceled"
Exit Sub
Else
FoldersPath = .SelectedItems(1)
End If
End With

'Pass the folder's path to the cell.
shMain.Range("B4").Value = FoldersPath

End Sub

Folder Picker Dialog

Figure 3: The PDFFolderSelection sub results (folder picker dialog).

3. The URLToPDF constitutes the main procedure.  First, it checks the folder’s path that was selected in the previous step. If the folder’s path exists and is not blank it tries to find if there are any illegal characters in the PDF files’ path; if it finds anyone it replaces it with a “-“. Afterwards, the code calls the WebpageToPDF sub using as parameters the URL address and the corresponding PDF file name that is provided in the main sheet (by the user).

Sub URLToPDF()
                   
    'Loops throuhg all the urls at column C and print the web
    'pages as PDF using Adobe Professional.
    'This is the main sub that calls the rest subs.
           
    'By Christos Samaras
    'https://myengineeringworld.net/////

   
    Dim PDFFolder           As String
    Dim LastRow             As Long
    Dim arrSpecialChar()    As String
    Dim dblSpCharFound      As Double
    Dim PDFPath             As String
    Dim i                   As Long
    Dim j                   As Integer
   
    'An array with special characters that cannot be used for naming a file.
    arrSpecialChar() = Split(" / : * ? " & Chr$(34) & " < > |", " ")
   
    'Find the last row.
     With shMain
        .Activate
        LastRow = .Cells(.Rows.Count, "C").End(xlUp).Row
    End With
   
    'Check if the PDF's folder exists.
    PDFFolder = shMain.Range("B4").Value
    If FolderExists(PDFFolder) = False Or PDFFolder = "" Then
        MsgBox "The PDF folder's path is incorect!", vbCritical, "Wrong path"
        shMain.Range("B4").Select
        Exit Sub
    End If
              
    'Check if there is at least one URL.
    If LastRow < 8 Then
        MsgBox "You did't enter a URL!", vbCritical, "No URL"
        Exit Sub
    End If
   
    'Add the backslash if not exists.
    If Right(PDFFolder, 1) <> "" Then
        PDFFolder = PDFFolder & ""
    End If
           
    'Set the default printer to Adobe PDF (for Adobe Professional).
    SetDefaultPrinter "Adobe PDF"
   
    'Convert the URLs to PDFs.
    For i = 8 To LastRow
        On Error Resume Next
        PDFPath = Cells(i, 4).Value
        'Check if the PDF name contains a special/illegal character.
        For j = LBound(arrSpecialChar) To UBound(arrSpecialChar)
            dblSpCharFound = WorksheetFunction.Find(arrSpecialChar(j), PDFPath)
            If dblSpCharFound > 0 Then
                PDFPath = WorksheetFunction.Substitute(PDFPath, arrSpecialChar(j), "-")
            End If
        Next j
        PDFPath = PDFFolder & PDFPath
        On Error GoTo 0
        'Save the PDF files to the selected folder.
        Call WebpageToPDF(Cells(i, 3).Value, PDFPath & ".pdf")
    Next i
   
    'Inform the user that macro finished.
    MsgBox LastRow - 7 & " web pages were successfully saved as PDFs!", vbInformation, "Done"
   
End Sub

4. Here is the small FolderExists function.

Function FolderExists(strFolderPath As String) As Boolean
   
    'Checks if a folder exists.
           
    'By Christos Samaras
    'https://myengineeringworld.net/////


    On Error Resume Next
    If Not Dir(strFolderPath, vbDirectory) = vbNullString Then FolderExists = True
    On Error GoTo 0
   
End Function

5. The WebpageToPDF creates a new web browser object, makes it visible, maximizes the browser window and navigates to the desired URL. If the IE window is visible it popups the print window. The API functions FindWindow and SetForegroundWindow are used in order to find the IE window and bring it to the foreground (above other windows). Then the code calls the PDFPrint procedure.

Sub WebpageToPDF(pageURL As String, PDFPath As String)
   
    'Creates a new web browser object, opens a selected URL and then prints
    'the web page as PDF using Adobe Professional.
   
    'By Christos Samaras
    'https://myengineeringworld.net/////
   
    'The macro needs a reference to Windows Script Host Object Model Library, as well
    'as to the Microsoft Internet Controls Library in order to work.
    'From VBA editor go to Tools -> References -> add the two references.
    'Or you can find them at C:Windowssystem32wshom.ocx and C:Windowssystem32ieframe.dll.

   
    Dim WebBrowser      As InternetExplorer
Dim StartTime As Date
    Dim intRet          As Long
 
    'Create new web browser object, make it visible,
    'maximize the window and navigate to the desired url.

    Set WebBrowser = New InternetExplorer
    WebBrowser.Visible = True
    ShowWindow WebBrowser.hwnd, SW_MAXIMIZE
    WebBrowser.Navigate (pageURL)
   
    'Wait until the web page is fully loaded.
    Do
        DoEvents
    Loop Until WebBrowser.ReadyState = READYSTATE_COMPLETE
   
    'Check if the internet explorer window exists.
StartTime = Now()
Do Until Now() > StartTime + TimeValue("00:00:05")
intRet = 0
        DoEvents
        'IEFrame is the class name for internet explorer.
        intRet = FindWindow("IEFrame", vbNullString)
        If intRet <> 0 Then Exit Do
    Loop
   
    'If the IE window exists, print the web page as PDF.
    If intRet <> 0 Then
        Call SetForegroundWindow(intRet)
        WebBrowser.ExecWB OLECMDID_PRINT, OLECMDEXECOPT_DONTPROMPTUSER
        Call PDFPrint(PDFPath)
        SetForegroundWindow (intRet)
    End If
   
    'Release the web browser object.
    WebBrowser.Quit
    Set WebBrowser = Nothing

End Sub

Webpage To PDF

Figure 4: WebpageToPDF sub results.

6. In order to change the default name (and path) of the save as dialog, I used Spy++ in order to specify and edit the combo box that contains the file name. The sequence goes like this: Save PDF File As (main window) → DUIViewWndClassName (first child) → DirectUIHWND (second child) → FloatNotifySink (third child) → ComboBox → Edit. Having found the Edit property of the combo box the SendMessage API is used to send the PDF file path.

Spyxx Window

Figure 5: Showing the hierarchy of Save As PDF window in Spy++.

Well, here there is a tricky part: for some unknown reason if you pass the PDF path using directly the SendMessage function and then press the Save button (again, using SendMessage) the file is not saved with the desired name and at the desired path! The file is named by the URL (for example vba-macro-to-convert-) and is saved at the last folder you selected within IE window! Quite strange….

I overcome this obstacle by doing a small trick: when I pass the PDF path in the combo box I use a space before the path. So in the combo box, the SendMessage function passes a string like “ C:UsersΧρήστοςDesktopNew folder Daily Schedule Charts.pdf” (notice the blank space before C). Then, I delete this space using the keybd_event API function. This function simulates a key press (here the delete button) and a key release. Why I did this? Well, because when I was experimenting (without using code) I saw that the PDF path changed only if there was a keyboard change. So, I tried to simulate this observation using VBA code.

Having passed the PDF path successfully and pressed the Save button, then the macro checks if the printer has finished printing (i.e. creating the PDF file) by using the CheckPrinterStatus function. If the function returns “Idle” it means that the printing finished.

Finally, since the Adobe Professional opens the file after finishing the printing, a combination of FindWindow and PostMessage API functions are used in order to find the PDF window and close it.

Option Explicit

Sub PDFPrint(strPDFPath As String)

'Prints a web page as PDF file using Adobe Professional.
'API functions are used to specify the necessary windows while
'a WMI function is used to check printer's status.

'By Christos Samaras
'https://myengineeringworld.net/////


Dim Ret As Long
Dim ChildRet As Long
Dim ChildRet2 As Long
Dim ChildRet3 As Long
Dim comboRet As Long
Dim editRet As Long
Dim ChildSaveButton As Long
Dim PDFRet As Long
Dim PDFName As String
Dim StartTime As Date

'Find the main print window.
StartTime = Now()
Do Until Now() > StartTime + TimeValue("00:00:05")
Ret = 0
DoEvents
Ret = FindWindow(vbNullString, "Save PDF File As")
If Ret <> 0 Then Exit Do
Loop

If Ret <> 0 Then
SetForegroundWindow (Ret)
'Find the first child window.
StartTime = Now()
Do Until Now() > StartTime + TimeValue("00:00:05")
ChildRet = 0
DoEvents
ChildRet = FindWindowEx(Ret, ByVal 0&, "DUIViewWndClassName", vbNullString)
If ChildRet <> 0 Then Exit Do
Loop

If ChildRet <> 0 Then
'Find the second child window.
StartTime = Now()
Do Until Now() > StartTime + TimeValue("00:00:05")
ChildRet2 = 0
DoEvents
ChildRet2 = FindWindowEx(ChildRet, ByVal 0&, "DirectUIHWND", vbNullString)
If ChildRet2 <> 0 Then Exit Do
Loop

If ChildRet2 <> 0 Then
'Find the third child window.
StartTime = Now()
Do Until Now() > StartTime + TimeValue("00:00:05")
ChildRet3 = 0
DoEvents
ChildRet3 = FindWindowEx(ChildRet2, ByVal 0&, "FloatNotifySink", vbNullString)
If ChildRet3 <> 0 Then Exit Do
Loop

If ChildRet3 <> 0 Then
'Find the combobox that will be edited.
StartTime = Now()
Do Until Now() > StartTime + TimeValue("00:00:05")
comboRet = 0
DoEvents
comboRet = FindWindowEx(ChildRet3, ByVal 0&, "ComboBox", vbNullString)
If comboRet <> 0 Then Exit Do
Loop

If comboRet <> 0 Then
'Finally, find the "edit property" of the combobox.
StartTime = Now()
Do Until Now() > StartTime + TimeValue("00:00:05")
editRet = 0
DoEvents
editRet = FindWindowEx(comboRet, ByVal 0&, "Edit", vbNullString)
If editRet <> 0 Then Exit Do
Loop

'Add the PDF path to the file name combobox of the print window.
If editRet <> 0 Then
SendMessage editRet, WM_SETTEXT, 0&, ByVal " " & strPDFPath
keybd_event VK_DELETE, 0, 0, 0 'press delete
keybd_event VK_DELETE, 0, KEYEVENTF_KEYUP, 0 ' release delete

'Get the PDF file name from the full path.
On Error Resume Next
PDFName = Mid(strPDFPath, WorksheetFunction.Find("*", WorksheetFunction.Substitute(strPDFPath, "", "*", Len(strPDFPath) _
- Len(WorksheetFunction.Substitute(strPDFPath, "", "")))) + 1, Len(strPDFPath))
On Error GoTo 0

'Save/print the web page by pressing the save button of the print window.
Sleep 1000
ChildSaveButton = FindWindowEx(Ret, ByVal 0&, "Button", "&Save")
SendMessage ChildSaveButton, BM_CLICK, 0, 0

'Sometimes the printing delays, especially in large colorful web pages.
'Here the code checks printer status and if is idle it means that the
'printing has finished.

Do Until CheckPrinterStatus("Adobe PDF") = "Idle"
DoEvents
If CheckPrinterStatus("Adobe PDF") = "Error" Then Exit Do
Loop

'Since the Adobe Professional opens after finishing the printing, find
'the open PDF document and close it (using a post message).

StartTime = Now()
Do Until StartTime > StartTime + TimeValue("00:00:05")
PDFRet = 0
DoEvents
PDFRet = FindWindow(vbNullString, PDFName & " - Adobe Acrobat Pro")
If PDFRet <> 0 Then Exit Do
Loop
If PDFRet <> 0 Then
PostMessage PDFRet, WM_CLOSE, 0&, 0&
End If
End If
End If
End If
End If
End If
End If
End Sub

7. Here is the CheckPrinterStatus function.

Function CheckPrinterStatus(strPrinterName As String) As String
   
    'Provided the printer name the functions returns a string
    'with the printer status.
           
    'By Christos Samaras
    'https://myengineeringworld.net/////


Dim strComputer As String
Dim objWMIService As Object
Dim colInstalledPrinters As Variant
Dim objPrinter As Object

'Set the WMI object and the check the install printers.
On Error Resume Next
strComputer = "."
Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\" & strComputer & "rootcimv2")
Set colInstalledPrinters = objWMIService.ExecQuery("Select * from Win32_Printer")

'If an error occurs in the previous step, the function will return error.
If err.Number <> 0 Then
CheckPrinterStatus = "Error"
End If
On Error GoTo 0

'The function loops through all installed printers and for the selected printer,
'checks it status.

For Each objPrinter In colInstalledPrinters
If objPrinter.Name = strPrinterName Then
Select Case objPrinter.PrinterStatus
Case 1: CheckPrinterStatus = "Other"
Case 2: CheckPrinterStatus = "Unknown"
Case 3: CheckPrinterStatus = "Idle"
Case 4: CheckPrinterStatus = "Printing"
Case 5: CheckPrinterStatus = "Warmup"
Case 6: CheckPrinterStatus = "Stopped printing"
Case 7: CheckPrinterStatus = "Offline"
Case Else: CheckPrinterStatus = "Error"
End Select
End If
Next objPrinter

'If there is a blank status the function returns error.
If CheckPrinterStatus = "" Then CheckPrinterStatus = "Error"

End Function

8. The final step was the creation of the Clear macro, which cleans up the main sheet.

Sub Clear()
   
    'Clears the URLs, the PDF names and the PDF folder's path.
           
    'By Christos Samaras
    'https://myengineeringworld.net/////


    Dim LastRow As Long
      
    'Find the last row.
     With shMain
        .Activate
        LastRow = .Cells(.Rows.Count, "C").End(xlUp).Row
    End With
   
    shMain.Range(Cells(8, 3), Cells(LastRow, 4)).Value = ""
    shMain.Range("B4").Value = ""
    shMain.Range("B4").Select
   
End Sub

Code results

The short video below demonstrates a test case with 3 URLs.

Conclusion

As you can see, it was quite a “simple” task!!! Nevertheless, by elaborating a project like this in many cases you have several positive side effects. In my case, for example, I learned how to use API functions in order to specify a control (here combo box) in a specific window. Of course, Spy++ helped me a lot on this. Furthermore, I saw the power of WMI functions. Initially, I tried to write the CheckPrinterStatus function in VB/VBA using some API functions. However, I realized that by using script (WMI) it was much easier to write the function.

Download it from here

The sample workbook contains all the VBA code that was presented above. It has been successfully tested using Excel 2010, Adobe Acrobat X Professional and Internet Explorer 9.0. However, it might be possible to run in other versions of these programs. Please, remember to enable macros before using it.

Page last modified: January 6, 2019

Christos Samaras

Hi, I am Christos, a Mechanical Engineer by profession (Ph.D.) and a Software Developer by obsession (10+ years of experience)! I founded this site back in 2011 intending to provide solutions to various engineering and programming problems.

  • Hi, Ben,

    Whenever I find some free time, I will re-write this code since there are a lot of things that need changing.
    However, even if I switch the code to 64bit, I am not sure that it will work with Adobe Reader.

    Best Regards,
    Christos

  • Please update the code for 64-bit and test the “Save Web Pages As PDF Files” code with free Adobe Reader.

    Thanks a lot!

  • Hi, Lidor,

    You can check your printer name in the Start -> Devices and Printers window.
    Another way to do this is to use the workbook or the code on this page.

    Although, to be honest, I haven’t tested the “Save Web Pages As PDF Files” code with Adobe Reader.

    Best Regards,
    Christos

  • Hey Christos Samaras, Great artical.
    I would like to ask some help on your code,
    if I use the free version of PDF, how I convert the printer to it?
    right now get an error on :
    Sub SetDefaultPrinter(ByVal PrinterName As String)
    PrinterName

    Thank you very much 🙂

  • Rajesh,

    First, the API declarations should switch to 64-bit.
    Then, all the variables that refer to window handles must change from Long to LongPtr type.
    When I find some free time, I will re-write this code since it can be simplified.

    Best Regards,
    Christos

  • Hi, Nemat,

    Compile error?
    Could you give me more info?
    I have Excel 2013 with all the latest updates and the codes still work.

    Best Regards,
    Christos

  • Hi, these codes works perfectly until Microsoft Excel got some updates in earlier 2017.
    VBA gives compile error:
    Could you please update these codes too? Thanks for advance.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
    Add Content Block
    >