The Macro Editor (Pro, Expert & Enterprise Editions)
The macro editor allows you to create and edit macros.
The name of the macro being edited appears in the top left corner of the Macro Editor panel, in a popup menu that lets you switch to any of the macros stored in your profile.
Start Page
Leave this field blank if you want the macro to be applied to the currently loaded page. To specify a starting page for the process, enter the full URL of the first page to which your macro will be applied. If OutWit finds a possible next page link and you have checked Browse, Dig or Slideshow in the Navigation zone, the macro will be applied to a whole series of pages. Start Page can also contain a column of the Catch (in the form "catch/theColumnName", or "catchData/theColumnName" if you are using a scraper in the macro and want the other columns of your catch included in the final scraped results), a Query Directory (in the form "queries/theDirectoryName", or "queriesData/theDirectoryName" to keep the Notes column in the scraped results) or a Query Generation Matrix. In these cases, the macro will be applied to all URLs found in the Catch column, to all URLs found in the directory of queries, or to all URLs generated by the matrix.
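The accepted forms of the Start Page field can be summarized with a small Python sketch. This is only an illustration of the forms listed above, not OutWit Hub code: the function name `classify_start_page` and the returned labels are hypothetical, and the syntax of a Query Generation Matrix is not covered, so such a value is simply grouped with plain URLs.

```python
from typing import Tuple

# Prefixes described for the Start Page field, mapped to hypothetical labels.
START_PAGE_PREFIXES = {
    "catch/": "catch_column",
    "catchData/": "catch_column_with_data",
    "queries/": "query_directory",
    "queriesData/": "query_directory_with_notes",
}


def classify_start_page(value: str) -> Tuple[str, str]:
    """Return a (kind, argument) pair for a Start Page field value."""
    value = value.strip()
    if not value:
        # Blank field: the macro applies to the currently loaded page.
        return ("current_page", "")
    for prefix, kind in START_PAGE_PREFIXES.items():
        if value.startswith(prefix):
            # The rest of the string is the column or directory name.
            return (kind, value[len(prefix):])
    # Anything else is assumed to be a full URL (or a matrix, not parsed here).
    return ("url_or_matrix", value)


print(classify_start_page("catchData/theColumnName"))
```

For example, a blank value is reported as `current_page`, while "queries/theDirectoryName" is reported as a query directory named `theDirectoryName`.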
The following series of buttons gives you access to the general management functions:
-
Save
Saves the current macro to your profile
-
New
Opens a new blank macro
-
Reset
Empties the macro editor
-
Revert
Restores the last saved version of the macro
-
Delete
Deletes the current macro from your profile
-
Import
Loads a previously exported macro
-
Export
Saves the macro as an XML file to your hard disk
-
Get from Views
Copies the current configuration of the bottom panels of all views to the extractors panel of the macro editor
-
Send to Views
Copies the current configuration of all extractors in the macro editor to the bottom panel of their respective views
-
Properties
Displays the Macro Properties Dialog where information about the current macro can be found and edited (name, author, comments, etc.)
-
Close
Closes the Macro Editor and displays the Automator Manager.
Below the previous controls, the Macro Editor view is divided into several zones:
-
Macro as URL (MAU)
-
Navigation
-
Catch
-
Destination files
-
Extractor settings
Macro as URL (MAU)
The Macro as URL is the representation of all the non-default settings as a single string that can be pasted into OutWit Hub's address bar for immediate execution, pasted into an email to be shared with other OutWit users, etc. The MAU is updated in real time when the characteristics of the macro are changed in the macro editor. Conversely, editing the MAU alters the definition of the macro in the user interface. Invalid syntax causes the MAU to be displayed in red.
Note: You can execute a macro by simply pasting the MAU in the Hub's address bar and hitting return.
'Navigation' Zone
-
Browse
When checked, the macro will look for a next page link after the execution of all the active extractors in the current page, and so on, until no following page is found. The number of pages to explore this way can be limited in the Browse popup menu.
-
Dig
When checked, the active extractors will be applied to all links found in the start page (or in all browsed pages, if Browse is checked). The domain and depth of the links to explore can be defined in the Dig popup menu. The advanced settings dialog allows you to set a criterion to visit only certain pages or to exclude certain pages from the automatic exploration. (Only links matching the list of extensions set in the advanced preference panel are explored in a Dig. Some link types are also systematically filtered out from the exploration: log-out pages, feeds which cannot be opened by the browser, etc.)
-
Slideshow
When checked, the program will display a slideshow of the images found in the current page (or in all browsed pages, if Browse is checked).
-
Fast Scraping
In this mode, the exploration is done without loading the full content of each page, only its textual part. This allows much faster processing of large series of pages. In the Pro version, when Fast Scraping is checked, the Browse, Dig and Slideshow checkboxes are disabled, as well as all extractors except for Scrapers.
The Expert & Enterprise editions allow you to Fast-Scrape and Dig at several levels of depth. If you check Fast-Scrape and Browse, the program will not automatically look for the next page link, as it does when Fast-Scraping is off, but will use the scraper queue of URLs to visit, which you can populate with the #addToQueue# directive.
Note that browsing/digging and fast scraping are very different: applying a scraper by loading a page and going to the scraped view does the extraction from the source code of the loaded page, whereas the Fast Scraping mode sends a query (an XMLHttpRequest) for each URL but does not really load the pages (ignoring images, etc.). Most of the time the result is the same, the Apply Scraper function being simply faster (hence the name). In some cases, however, the result can be different, or the Fast Scraping mode can even fail completely. The reason is that, in normal mode, events can happen that dynamically alter a page (mostly due to the execution of JavaScript). These dynamic changes do not occur in Fast Scraping mode, as scripts are not executed. This means that dynamically added information, JavaScript redirections, page reloads and so on simply do not happen in Fast Scraping mode. If you notice this kind of behavior, the best approach is to accept the slower method and browse through the URLs, scraping page after page.
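The difference described above can be illustrated with a short Python sketch. It is not how OutWit Hub works internally: the class `SourceTextExtractor`, the function `text_from_source` and the sample page are all made up for this example. It only shows that when the raw source of a page is read without executing its scripts, text that a script would add at load time is not there to extract.

```python
from html.parser import HTMLParser


class SourceTextExtractor(HTMLParser):
    """Collects the text present in the raw HTML source, ignoring scripts."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies are skipped: they are never executed here.
        if not self._in_script and data.strip():
            self.chunks.append(data.strip())


def text_from_source(html):
    """Return the list of text fragments found in the unexecuted source."""
    parser = SourceTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page: the price is only written into the page by a script.
PAGE = """<html><body>
<h1>Products</h1>
<div id="prices"></div>
<script>
  document.getElementById("prices").textContent = "Price: 42";
</script>
</body></html>"""

print(text_from_source(PAGE))  # ['Products']
```

In a browser, the same page would display the price once the script has run, which is why a scraper can find it in normal mode and miss it when scripts are not executed.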
'Catch Export' Zone
-
Saving and Emptying the Catch before execution
Depending on the checked option(s), the content of the Catch can be emptied (and optionally saved to a backup file in the destination folder) before the execution of the macro. Note that it will not be restored after the execution, which means that you will have to load the backup file if you wish to recover the Catch content.
-
Export
When checked, the content of the Catch will be exported to a file after the execution of the macro. The destination file format can be defined in the Export popup menu. The export file will be saved in the destination file folder (or FTP server in Expert and Enterprise editions) that you have selected in the Destination Files zone or in the current default folder if no destination folder is defined.
'Destination Files' Zone
This zone allows you to set the location (on a local drive or on an FTP server if you have the Expert or Enterprise edition) and the name format for the files generated by your different extraction processes.
Important Note: If you set login credentials to upload the exported data to an FTP server with the 'ftp' protocol (Expert or Enterprise), the login and password will be sent in the clear over the network. To avoid retyping the FTP address for each macro you create, you can set a default value in the advanced preference panel.
-
Save extracted data to: If you set an export destination for one or several extractors in the macro or for the Catch, the generated files will be saved to the destination folder (or FTP server on Expert & Enterprise) that you set here. If a file of the same name already exists, the popup menu lets you set the suffix to add to the filename.
-
Split into several files: This checkbox and menu allow you to ask OutWit Hub to close the current export file and create a new one when the chosen condition becomes true. In the current version, the only criterion available is the number of extracted rows.
-
Save downloaded files to: If you have instructed one or several extractors to download selected files, these will be saved in the folder you set here. If a file of the same name already exists, the popup menu lets you set the suffix to add to the filename.
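The effect of the "Split into several files" option can be pictured with the following Python sketch, which starts a new export file each time a row limit is reached. It is only an illustration under stated assumptions: the function `export_split`, the CSV format and the "-1", "-2" file naming are hypothetical and do not reflect the names OutWit Hub actually generates.

```python
import csv
import os
import tempfile


def export_split(rows, folder, base_name, max_rows):
    """Write rows to CSV files, starting a new file every max_rows rows."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    paths = []
    for start in range(0, len(rows), max_rows):
        index = start // max_rows + 1
        # Hypothetical naming scheme: export-1.csv, export-2.csv, ...
        path = os.path.join(folder, f"{base_name}-{index}.csv")
        with open(path, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows[start:start + max_rows])
        paths.append(path)
    return paths


# Usage: five rows with a limit of two rows per file give three files.
with tempfile.TemporaryDirectory() as folder:
    sample_rows = [[i, f"row {i}"] for i in range(5)]
    created = export_split(sample_rows, folder, "export", 2)
    print([os.path.basename(p) for p in created])
```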
'Extractor Setting' Panel
This panel contains the list of all extraction processes that can be performed during the execution of your macro.
-
Extractor column: In this column, check the extractors that you wish to activate during the execution of the macro.
-
Empty column: When checked (as in the views' bottom panels), the data resulting from the extraction of one page will be cleared each time a new page is loaded. Note that you should not uncheck 'Empty' if you automatically move the data to the Catch: the operation is executed each time a new page is visited, so the number of rows would grow very rapidly.
-
Options column: You can set, in this column, options that are specific to each extractor. These options correspond to the ones you will find in the bottom panel of each respective view in the Hub.
-
Select If column: Allows you to set the criterion that will trigger the actions defined in the destination column (moving the selected data to the Catch, downloading files, exporting the data to a file). As in the views' bottom panels, you can set a selection criterion on either a specific field of the extractor or on all fields.
-
Sorted by & Limit to columns: Used together with the Limit to option, the Sorted by option lets you set the sort column (field) and direction, so that only the first n rows of the sorted data are extracted.
-
Destination column: Allows you to set the destination of the extracted data and files:
- Catch: When checked, moves the selected data to the Catch (which can be exported to a file at the end of the process). Note that you should not uncheck 'Empty' if you automatically move the data to the Catch: the operation is executed each time a new page is visited, so the number of rows would grow very rapidly.
- Download: When checked, tries to download all files corresponding to URLs found in the selected data to the set destination folder.
- Export: Exports the selected data to a file in the Extracted data destination folder (or FTP server on Expert & Enterprise) set above. You can set the export format using the Export popup menu.
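The combined effect of the Select If, Sorted by and Limit to columns can be sketched in Python as follows. This is a minimal illustration, assuming that the selection is applied first, then the sort, then the limit; the function `select_sort_limit`, its parameters and the sample rows are hypothetical and are not part of OutWit Hub.

```python
def select_sort_limit(rows, select_if=None, sort_by=None, descending=False, limit=None):
    """Filter rows, sort them on one field, then keep only the first `limit` rows."""
    # Select If: keep only the rows matching the criterion (all rows if none is set).
    selected = [row for row in rows if select_if is None or select_if(row)]
    # Sorted by: order the remaining rows on the chosen field and direction.
    if sort_by is not None:
        selected.sort(key=lambda row: row[sort_by], reverse=descending)
    # Limit to: keep the first n rows of the sorted data.
    return selected if limit is None else selected[:limit]


# Hypothetical extracted rows.
rows = [
    {"name": "a.jpg", "size": 120},
    {"name": "b.png", "size": 300},
    {"name": "c.jpg", "size": 80},
    {"name": "d.jpg", "size": 250},
]

# Keep the two largest .jpg files.
largest = select_sort_limit(
    rows,
    select_if=lambda row: row["name"].endswith(".jpg"),
    sort_by="size",
    descending=True,
    limit=2,
)
print([row["name"] for row in largest])  # ['d.jpg', 'a.jpg']
```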
IMPORTANT NOTE: In general, you should choose ONLY ONE of these settings at a time:
- Uncheck 'Empty' to keep the data in the view, OR
- Check 'Catch' to move the data to the Catch (and export it at the end of the process), OR
- Check 'Export' to export the data to a file during the process.
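The reason for choosing only one of these settings can be seen with a little arithmetic, sketched here in Python. The model is an assumption made for illustration: every page yields the same number of rows, and at each page the whole content of the view is sent to the Catch. The function `catch_rows_after` is hypothetical.

```python
def catch_rows_after(pages, rows_per_page, empty_checked):
    """Count the rows in the Catch after visiting `pages` pages (simplified model)."""
    view_rows = 0
    catch_rows = 0
    for _ in range(pages):
        if empty_checked:
            # 'Empty' checked: the view is cleared when a new page is loaded.
            view_rows = 0
        view_rows += rows_per_page
        # Assumption: the whole current view content is sent to the Catch.
        catch_rows += view_rows
    return catch_rows


print(catch_rows_after(10, 20, empty_checked=True))   # 200
print(catch_rows_after(10, 20, empty_checked=False))  # 1100
```

Under these assumptions, with 'Empty' checked the Catch receives each row once (10 x 20 = 200 rows), whereas with 'Empty' unchecked the rows of earlier pages are sent again at every page (20 x (1 + 2 + ... + 10) = 1100 rows), a total that grows with the square of the number of pages.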