copy

Friday, 26 April 2013



Save Web Pages As PDF Files

Introduction

Sometimes a task that you consider as “simple” tends to be so demanding that at the end you blame yourself for your initial assumption. The previous sentence describes exactly what happened to me the previous days. Around 10 days ago, a blog reader (Sangeetha) asked me if is possible to convert HTML files (web pages) to PDF files using VBA. My initial reaction was yes, why not? But when I started writing the VBA code I couldn’t imagine that it will take me so much time to finish this project.

In the code below I used a lot of different techniques and methods which I will try to describe as detailed as possible in case someone needs to use a part only of the code. There are multiple lines of VBA code in order to:
  • Select a folder using folder picker dialog.
  • Determine if a folder exists.
  • Set the default computer’s printer to your desired printer (using Windows Script Host Object Model Library).
  • Check if a file name contains special/illegal characters.
  • Open a web page in Internet Explorer and wait until is fully loaded.
  • Print the web page as PDF to a specified file path using Adobe Professional. The latter involves:
- API functions (FindWindow, SetForegroundWindow andFindWindowEx) in order to “find” the print window of Internet Explorer and its “child” windows.
- API functions (SendMessage and keybd_event) for changing the PDF file path.
- A custom WMI (Windows Management Instrumentation) function in order to determine if the printer has finished printing.
- API functions (FindWindow and PostMessage) for finding the opened PDF document and closing it.

The above list describes more or less the sequence of actions that I followed in order to fulfil this task. However, I should mention that without the following tools it would be impossible to finish this project:
  • Spy++ is a utility that gives you a graphical view of the system’s processes, threads, windows, and window messages.
  • API Viewer is a utility that helps you write the API declarations, by providing the correct syntax of each function.

VBA code

1. First of all, the code below contains all the necessary API functions and constants that are used in the various subs of this project. API Viewer helped me found the correct declaration syntax for all API functions.

Option Explicit

'This module contains all the necessary API functions and constants
'that are used in the subs of this workbook.

'By Christos Samaras
'http://www.myengineeringworld.net

'Retrieves a handle to the top-level window whose class name and window name match the specified strings.
'This function does not search child windows. This function does not perform a case-sensitive search.
Public Declare Function FindWindow Lib "user32" Alias "FindWindowA" _
(ByVal lpClassName As String, ByVal lpWindowName As String) As Long

'Retrieves a handle to a window whose class name and window name match the specified strings.
'The function searches child windows, beginning with the one following the specified child window.
'This function does not perform a case-sensitive search.
Public Declare Function FindWindowEx Lib "user32" Alias "FindWindowExA" _
(ByVal hWnd1 As Long, ByVal hWnd2 As Long, ByVal lpsz1 As String, _
ByVal lpsz2 As String) As Long

'Sets the specified window's show state.
Public Declare Function ShowWindow Lib "user32.dll" (ByVal hwnd As Long, ByVal nCmdShow As Long) As Long

'Brings the thread that created the specified window into the foreground and activates the window.
'Keyboard input is directed to the window, and various visual cues are changed for the user.
'The system assigns a slightly higher priority to the thread that created the foreground
'window than it does to other threads.
Public Declare Function SetForegroundWindow Lib "user32" (ByVal hwnd As Long) As Long

'Suspends the execution of the current thread until the time-out interval elapses.
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)

'Sends the specified message to a window or windows. The SendMessage function calls the window procedure
'for the specified window and does not return until the window procedure has processed the message.
Public Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
(ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long

'Places (posts) a message in the message queue associated with the thread that created the specified
'window and returns without waiting for the thread to process the message.
Public Declare Function PostMessage Lib "user32.dll" Alias "PostMessageA" _
(ByVal hwnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As Long) As Long

'Synthesizes a keystroke. The system can use such a synthesized keystroke to generate a WM_KEYUP or
'WM_KEYDOWN message. The keyboard driver's interrupt handler calls the keybd_event function.
Public Declare Sub keybd_event Lib "user32.dll" (ByVal bVk As Byte, ByVal bScan As Byte, _
ByVal dwFlags As Long, ByVal dwExtraInfo As Long)

'Constants used in API functions.
Public Const SW_MAXIMIZE = 3
Public Const WM_SETTEXT = &HC
Public Const VK_DELETE = &H2E
Public Const KEYEVENTF_KEYUP = &H2
Public Const BM_CLICK = &HF5&
Public Const WM_CLOSE As Long = &H10

API Viewer Window

Figure 2: The API Viewer window.


 2. The PDFFolderSelection that follows shows the folder picker dialog in order for the user to select the folder in which the PDF files will be saved. The selected folder’s path is written in the cell B4 of the worksheet.

Sub PDFFolderSelection()
    
    'Shows the folder picker dialog in order the user to select
    'the folder in which the PDF files will be saved.
    
    'By Christos Samaras
    'http://www.myengineeringworld.net

    Dim FoldersPath As String
    
    'Show the folder picker dialog.
    With Application.FileDialog(msoFileDialogFolderPicker)
        .Title = "Select a folder to save your PDF files..."
        .Show
            If .SelectedItems.Count = 0 Then
                shMain.Range("B4").Value = "-"
                MsgBox "You did't select a folder!", vbExclamation, "Canceled"
                Exit Sub
            Else
                FoldersPath = .SelectedItems(1)
            End If
    End With
    
    'Pass the folder's path to the cell.
    shMain.Range("B4").Value = FoldersPath
    
End Sub

Folder Picker Dialog

Figure 3: The PDFFolderSelection sub results (folder picker dialog).


3. The URLToPDF constitutes the main procedure.  First, it checks the folder’s path that was selected in the previous step. If the folder’s path exists and is not blank it tries to find if there are any illegal characters in the PDF files’ path; if it finds anyone it replaces it with a “-“. Afterwards, the code calls the WebpageToPDF sub using as parameters the URL address and the corresponding PDF file name that is provided in the main sheet (by the user).

Sub URLToPDF()
                    
    'Loops throuhg all the urls at column C and print the web
    'pages as PDF using Adobe Professional.
    'This is the main sub that calls the rest subs.
            
    'By Christos Samaras
    'http://www.myengineeringworld.net
    
    Dim PDFFolder           As String
    Dim LastRow             As Long
    Dim arrSpecialChar()    As String
    Dim dblSpCharFound      As Double
    Dim PDFPath             As String
    Dim i                   As Long
    Dim j                   As Integer
    
    'An array with special characters that cannot be used for naming a file.
    arrSpecialChar() = Split("\ / : * ? " & Chr$(34) & " < > |", " ")
    
    'Find the last row.
     With shMain
        .Activate
        LastRow = .Cells(.Rows.Count, "C").End(xlUp).Row
    End With
    
    'Check if the PDF's folder exists.
    PDFFolder = shMain.Range("B4").Value
    If FolderExists(PDFFolder) = False Or PDFFolder = "" Then
        MsgBox "The PDF folder's path is incorect!", vbCritical, "Wrong path"
        shMain.Range("B4").Select
        Exit Sub
    End If
               
    'Check if there is at least one URL.
    If LastRow < 8 Then
        MsgBox "You did't enter a URL!", vbCritical, "No URL"
        Exit Sub
    End If
    
    'Add the backslash if not exists.
    If Right(PDFFolder, 1) <> "\" Then
        PDFFolder = PDFFolder & "\"
    End If
            
    'Set the default printer to Adobe PDF (for Adobe Professional).
    SetDefaultPrinter "Adobe PDF"
    
    'Convert the URLs to PDFs.
    For i = 8 To LastRow
        On Error Resume Next
        PDFPath = Cells(i, 4).Value
        'Check if the PDF name contains a special/illegal character.
        For j = LBound(arrSpecialChar) To UBound(arrSpecialChar)
            dblSpCharFound = WorksheetFunction.Find(arrSpecialChar(j), PDFPath)
            If dblSpCharFound > 0 Then
                PDFPath = WorksheetFunction.Substitute(PDFPath, arrSpecialChar(j), "-")
            End If
        Next j
        PDFPath = PDFFolder & PDFPath
        On Error GoTo 0
        'Save the PDF files to the selected folder.
        Call WebpageToPDF(Cells(i, 3).Value, PDFPath & ".pdf")
    Next i
    
    'Inform the user that macro finished.
    MsgBox LastRow - 7 & " web pages were successfully saved as PDFs!", vbInformation, "Done"
    
End Sub


4. Here is the small FolderExists function.

Function FolderExists(strFolderPath As String) As Boolean
    
    'Checks if a folder exists.
            
    'By Christos Samaras
    'http://www.myengineeringworld.net

    On Error Resume Next
    If Not Dir(strFolderPath, vbDirectory) = vbNullString Then FolderExists = True
    On Error GoTo 0
    
End Function


5. The WebpageToPDF creates a new web browser object, makes it visible, maximizes the browser window and navigates to the desired URL. If the IE window is visible it popups the print window. The API functions FindWindow and SetForegroundWindow are used in order to find the IE window and bring it to the foreground (above other windows). Then the code calls the PDFPrint procedure.

Sub WebpageToPDF(pageURL As String, PDFPath As String)
    
    'Creates a new web browser object, opens a selected URL and then prints
    'the web page as PDF using Adobe Professional.
    
    'By Christos Samaras
    'http://www.myengineeringworld.net
    
    'The macro needs a reference to Windows Script Host Object Model Library, as well
    'as to the Microsoft Internet Controls Library in order to work.
    'From VBA editor go to Tools -> References -> add the two references.
    'Or you can find them at C:\Windows\system32\wshom.ocx and C:\Windows\system32\ieframe.dll.
    
    Dim WebBrowser      As InternetExplorer
    Dim StartTime       As Date
    Dim intRet          As Long
  
    'Create new web browser object, make it visible,
    'maximize the window and navigate to the desired url.
    Set WebBrowser = New InternetExplorer
    WebBrowser.Visible = True
    ShowWindow WebBrowser.hwnd, SW_MAXIMIZE
    WebBrowser.Navigate (pageURL)
    
    'Wait until the web page is fully loaded.
    Do
        DoEvents
    Loop Until WebBrowser.ReadyState = READYSTATE_COMPLETE
    
    'Check if the internet explorer window exists.
    StartTime = Now()
    Do Until Now() > StartTime + TimeValue("00:00:05")
        intRet = 0
        DoEvents
        'IEFrame is the class name for internet explorer.
        intRet = FindWindow("IEFrame", vbNullString)
        If intRet <> 0 Then Exit Do
    Loop
    
    'If the IE window exists, print the web page as PDF.
    If intRet <> 0 Then
        Call SetForegroundWindow(intRet)
        WebBrowser.ExecWB OLECMDID_PRINT, OLECMDEXECOPT_DONTPROMPTUSER
        Call PDFPrint(PDFPath)
        SetForegroundWindow (intRet)
    End If
    
    'Release the web browser object.
    WebBrowser.Quit
    Set WebBrowser = Nothing

End Sub 

Webpage To PDF

Figure 4: WebpageToPDF sub results.

6. In order to change the default name (and path) of the save as dialog, I used Spy++ in order to specify and edit the combo box that contains the file name. The sequence goes like this: Save PDF File As (main window) → DUIViewWndClassName (first child) → DirectUIHWND (second child) → FloatNotifySink (third child) → ComboBox → Edit. Having found the Edit property of the combo box the SendMessage API is used to send the PDF file path.

Spyxx Window

Figure 5: Showing the hierarchy of Save As PDF window in Spy++.

Well, here there is a tricky part: for some unknown reason if you pass the PDF path using directly the SendMessage function and then press the Save button (again, using SendMessage) the file is not saved with the desired name and at the desired path! The file is named by the URL (for example vba-macro-to-convert-) and is saved at the last folder you selected within IE window! Quite strange….

I overcome this obstacle by doing a small trick: when I pass the PDF path in the combo box I use a space before the path. So in the combo box, the SendMessage function passes a string like “ C:\Users\Χρήστος\Desktop\New folder\ Daily Schedule Charts.pdf" (notice the blank space before C). Then, I delete this space using the keybd_event API function. This function simulates a key press (here the delete button) and a key release. Why I did this? Well, because when I was experimenting (without using code) I saw that the PDF path changed only if there was a keyboard change. So, I tried to simulate this observation using VBA code.

Having passed the PDF path successfully and pressed the Save button, then the macro checks if the printer has finished printing (i.e. creating the PDF file) by using the CheckPrinterStatus function. If the function returns "Idle" it means that the printing finished.

Finally, since the Adobe Professional opens the file after finishing the printing, a combination of FindWindow and PostMessage API functions are used in order to find the PDF window and close it.

Option Explicit

Sub PDFPrint(strPDFPath As String)
    
    'Prints a web page as PDF file using Adobe Professional.
    'API functions are used to specify the necessary windows while
    'a WMI function is used to check printer's status.
    
    'By Christos Samaras
    'http://www.myengineeringworld.net
       
    Dim Ret                 As Long
    Dim ChildRet            As Long
    Dim ChildRet2           As Long
    Dim ChildRet3           As Long
    Dim comboRet            As Long
    Dim editRet             As Long
    Dim ChildSaveButton     As Long
    Dim PDFRet              As Long
    Dim PDFName             As String
    Dim StartTime           As Date
    
    'Find the main print window.
    StartTime = Now()
    Do Until Now() > StartTime + TimeValue("00:00:05")
        Ret = 0
        DoEvents
        Ret = FindWindow(vbNullString, "Save PDF File As")
        If Ret <> 0 Then Exit Do
    Loop
    
    If Ret <> 0 Then
        SetForegroundWindow (Ret)
        'Find the first child window.
        StartTime = Now()
        Do Until Now() > StartTime + TimeValue("00:00:05")
            ChildRet = 0
            DoEvents
            ChildRet = FindWindowEx(Ret, ByVal 0&, "DUIViewWndClassName", vbNullString)
            If ChildRet <> 0 Then Exit Do
        Loop
        
        If ChildRet <> 0 Then
            'Find the second child window.
            StartTime = Now()
            Do Until Now() > StartTime + TimeValue("00:00:05")
                ChildRet2 = 0
                DoEvents
                ChildRet2 = FindWindowEx(ChildRet, ByVal 0&, "DirectUIHWND", vbNullString)
                If ChildRet2 <> 0 Then Exit Do
            Loop
            
            If ChildRet2 <> 0 Then
                'Find the third child window.
                StartTime = Now()
                Do Until Now() > StartTime + TimeValue("00:00:05")
                    ChildRet3 = 0
                    DoEvents
                    ChildRet3 = FindWindowEx(ChildRet2, ByVal 0&, "FloatNotifySink", vbNullString)
                    If ChildRet3 <> 0 Then Exit Do
                Loop
                
                If ChildRet3 <> 0 Then
                    'Find the combobox that will be edited.
                    StartTime = Now()
                    Do Until Now() > StartTime + TimeValue("00:00:05")
                        comboRet = 0
                        DoEvents
                        comboRet = FindWindowEx(ChildRet3, ByVal 0&, "ComboBox", vbNullString)
                        If comboRet <> 0 Then Exit Do
                    Loop
                    
                    If comboRet <> 0 Then
                        'Finally, find the "edit property" of the combobox.
                        StartTime = Now()
                        Do Until Now() > StartTime + TimeValue("00:00:05")
                            editRet = 0
                            DoEvents
                            editRet = FindWindowEx(comboRet, ByVal 0&, "Edit", vbNullString)
                            If editRet <> 0 Then Exit Do
                        Loop
                        
                        'Add the PDF path to the file name combobox of the print window.
                        If editRet <> 0 Then
                            SendMessage editRet, WM_SETTEXT, 0&, ByVal " " & strPDFPath
                            keybd_event VK_DELETE, 0, 0, 0 'press delete
                            keybd_event VK_DELETE, 0, KEYEVENTF_KEYUP, 0 ' release delete
                            
                            'Get the PDF file name from the full path.
                            On Error Resume Next
                            PDFName = Mid(strPDFPath, WorksheetFunction.Find("*", WorksheetFunction.Substitute(strPDFPath, "\", "*", Len(strPDFPath) _
                            - Len(WorksheetFunction.Substitute(strPDFPath, "\", "")))) + 1, Len(strPDFPath))
                            On Error GoTo 0
                            
                            'Save/print the web page by pressing the save button of the print window.
                            Sleep 1000
                            ChildSaveButton = FindWindowEx(Ret, ByVal 0&, "Button", "&Save")
                            SendMessage ChildSaveButton, BM_CLICK, 0, 0
                                                                                   
                            'Sometimes the printing delays, especially in large colorful web pages.
                            'Here the code checks printer status and if is idle it means that the
                            'printing has finished.
                            Do Until CheckPrinterStatus("Adobe PDF") = "Idle"
                                DoEvents
                                If CheckPrinterStatus("Adobe PDF") = "Error" Then Exit Do
                            Loop
                            
                            'Since the Adobe Professional opens after finishing the printing, find
                            'the open PDF document and close it (using a post message).
                            StartTime = Now()
                            Do Until StartTime > StartTime + TimeValue("00:00:05")
                                PDFRet = 0
                                DoEvents
                                PDFRet = FindWindow(vbNullString, PDFName & " - Adobe Acrobat Pro")
                                If PDFRet <> 0 Then Exit Do
                            Loop
                            If PDFRet <> 0 Then
                                PostMessage PDFRet, WM_CLOSE, 0&, 0&
                            End If
                        End If
                    End If
                End If
            End If
        End If
   End If
End Sub


7. Here is the CheckPrinterStatus function.

Function CheckPrinterStatus(strPrinterName As String) As String
    
    'Provided the printer name the functions returns a string
    'with the printer status.
            
    'By Christos Samaras
    'http://www.myengineeringworld.net

    Dim strComputer As String
    Dim objWMIService As Object
    Dim colInstalledPrinters As Variant
    Dim objPrinter As Object
    
    'Set the WMI object and the check the install printers.
    On Error Resume Next
    strComputer = "."
    Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
    Set colInstalledPrinters = objWMIService.ExecQuery("Select * from Win32_Printer")
        
    'If an error occurs in the previous step, the function will return error.
    If err.Number <> 0 Then
        CheckPrinterStatus = "Error"
    End If
    On Error GoTo 0
    
    'The function loops through all installed printers and for the selected printer,
    'checks it status.
    For Each objPrinter In colInstalledPrinters
        If objPrinter.Name = strPrinterName Then
            Select Case objPrinter.PrinterStatus
                Case 1: CheckPrinterStatus = "Other"
                Case 2: CheckPrinterStatus = "Unknown"
                Case 3: CheckPrinterStatus = "Idle"
                Case 4: CheckPrinterStatus = "Printing"
                Case 5: CheckPrinterStatus = "Warmup"
                Case 6: CheckPrinterStatus = "Stopped printing"
                Case 7: CheckPrinterStatus = "Offline"
                Case Else: CheckPrinterStatus = "Error"
            End Select
        End If
    Next objPrinter
    
    'If there is a blank status the function returns error.
    If CheckPrinterStatus = "" Then CheckPrinterStatus = "Error"
    
End Function


8. The final step was the creation of the Clear macro, which cleans up the main sheet.

Sub Clear()
    
    'Clears the URLs, the PDF names and the PDF folder's path.
            
    'By Christos Samaras
    'http://www.myengineeringworld.net

    Dim LastRow As Long
       
    'Find the last row.
     With shMain
        .Activate
        LastRow = .Cells(.Rows.Count, "C").End(xlUp).Row
    End With
    
    shMain.Range(Cells(8, 3), Cells(LastRow, 4)).Value = ""
    shMain.Range("B4").Value = ""
    shMain.Range("B4").Select
    
End Sub


Code results

The short video below demonstrates a test case with 3 URLs.



Conclusion

As you can see, it was quite a “simple” task!!! Nevertheless, by elaborating a project like this in many cases you have several positive side effects. In my case, for example, I learned how to use API functions in order to specify a control (here combo box) in a specific window. Of course, Spy++ helped me a lot on this. Furthermore, I saw the power of WMI functions. Initially, I tried to write the CheckPrinterStatus function in VB/VBA using some API functions. However, I realized that by using script (WMI) it was much easier to write the function.


Download it from here

The sample workbook contains all the VBA code that was presented above. It has been successfully tested using Excel 2010, Adobe Acrobat X Professional and Internet Explorer 9.0. However, it might be possible to run in other versions of these programs. Please, remember to enable macros before using it.

Did you like this post? If yes, then share it with your friends. Thank you!


Categories:


Mechanical Engineer (Ph.D. cand.), M.Sc. Cranfield University, Dipl.-Ing. Aristotle University, Thessaloniki - Greece.
Communication: e-mail, Facebook, Twitter, Google+ and Linkedin. More info