The Preference Panels
The Preference Panels (Tools > Preferences) display your current preference settings.
The dialog is divided into five panels: General, Exploration, Time Settings, Export, and Advanced.
The option labels should be self-explanatory. They change more often than the rest of the user interface, which makes it difficult to document them individually here. We have, however, documented these functions within the interface itself with tooltips: if you hover your cursor over the label of a control for two or three seconds, a yellow tooltip appears, explaining the expected content of the control.
Export
Renaming Files
One preference lets you rename downloaded files; another applies to data export files. The pattern entered in the corresponding text box determines the format of the new names.
These preferences set the default patterns used to automatically generate or rename the filenames of saved data. The same syntax can also be used to define a specific output filename format in some scraping directives (e.g. #exportEvery# or #screenshot#). The pattern syntax currently includes the following tokens:
- type a series of pound signs (###) to insert a numerical index in the final file name: the index is incremented if several of the downloaded files have the same original name, and leading zeros are added to match the number of pound signs
- type [url] to include the url of the page from which the file was downloaded: www.example.com › downloads › documents › help.pdf
- type [domain] to include the domain of the page from which the file was downloaded: example.com
- type [domainName] to include the domain name of the page from which the file was downloaded: example
- type [params] to include all the parameters from the url of the page from which the file was downloaded
- type [paramN] to include the Nth parameter of the url from which the file was downloaded: [param1] gives q=which for the url www.mySite.com/tester/test?q=which&page=2
- type [param-N] to include the Nth parameter, counting from the end, of the url from which the file was downloaded
- type [path] to include the path from which the file was downloaded
- type [pathN] to include the Nth element of the path from which the file was downloaded: [path1] gives tester for the url www.mySite.com/tester/test?q=which&page=2
- type [path-N] to include the Nth element, counting from the end, of the path from which the file was downloaded
- type [ordinal] to include the ordinal number of the page in the series of pages that are being visited in an automatic exploration
- type [originalName] or [originalRoot] to include the root (file name without the extension) of the file being downloaded. (Note that the extension is handled by the download manager and will be chosen by the program to match the type of the file.)
- type [originalExtension] to include the extension of the file being downloaded.
- type [date], [time], [datetime], [hours], [minutes], [seconds], [milliseconds]... to include date or time elements.
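To make the token substitution concrete, here is a minimal Python sketch of how such a pattern could be expanded. It is an illustration only, not the program's actual implementation: the `expand_pattern` helper is hypothetical, it covers only a subset of the tokens listed above, and the date and time formats are assumptions.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit

def expand_pattern(pattern, page_url, original_name, index=1):
    """Illustrative expansion of a renaming pattern (hypothetical helper,
    covering only a subset of the tokens described above)."""
    parts = urlsplit(page_url)
    root, _, ext = original_name.rpartition(".")
    now = datetime.now()
    tokens = {
        "[url]": page_url,
        "[domain]": parts.netloc,
        # Second-to-last dotted component, e.g. "example" in "www.example.com".
        "[domainName]": parts.netloc.split(".")[-2] if "." in parts.netloc else parts.netloc,
        "[params]": parts.query,
        "[path]": parts.path.strip("/"),
        "[originalRoot]": root or original_name,
        "[originalExtension]": ext,
        "[date]": now.strftime("%Y-%m-%d"),   # assumed format
        "[time]": now.strftime("%H-%M-%S"),   # assumed format
    }
    result = pattern
    for token, value in tokens.items():
        result = result.replace(token, value)
    # A run of pound signs becomes a zero-padded index of the same width.
    result = re.sub(r"#+", lambda m: str(index).zfill(len(m.group())), result)
    return result
```

For example, the pattern `[domainName]_###.[originalExtension]`, applied to the third copy of `help.pdf` downloaded from `www.example.com`, would produce something like `example_003.pdf`.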
When renaming downloaded files, the program replaces characters that are not valid in filenames on your operating system. Forward slashes, in particular, are replaced by a right-pointing double angle quotation mark (») or by whatever character is set in the Path Separator preference.
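This kind of character substitution can be sketched as follows. The slash replacement matches the behavior described above; replacing the other reserved characters with an underscore is an assumption made for this sketch, not necessarily what the program does.

```python
def sanitize_filename(name, path_separator="\u00bb"):
    """Replace characters that are invalid in filenames on common systems.
    Forward slashes get the configured path-separator character (the
    default here is the right-pointing double angle quotation mark, »);
    other reserved characters become underscores (an assumption for
    this sketch)."""
    sanitized = name.replace("/", path_separator)
    for ch in '\\:*?"<>|':
        sanitized = sanitized.replace(ch, "_")
    return sanitized
```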
Time Settings
The time settings let you set the timeout values for the HTTP and XHR queries sent by the program during automated processes. The first three sliders set, respectively:
- the maximum time the program will wait between the moment a query is sent to a server and the moment the first byte of data is received from that server
- the maximum time allowed between the first data received from the server and the moment when the page is fully loaded
- the maximum time allowed after the page is fully loaded, for OutWit extraction processes on this page
The following two pairs of sliders set the temporization (pace) of the automatic exploration and fast scraping processes. If the minimum and maximum values differ, the program waits for a random amount of time between these limits.
The last sliders let you pause the automatic application of scrapers to a series of URLs (fast scraping) for a given amount of time after a given number of queries.
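The pacing behavior described in the last two paragraphs can be sketched as a simple loop. This is an illustration of the logic only, not the program's implementation; `fetch` is a hypothetical callable standing in for the actual request, and the parameter names are assumptions.

```python
import random
import time

def run_fast_scraping(urls, min_wait, max_wait, pause_after, pause_duration, fetch):
    """Illustrative pacing loop: wait a random time between min_wait and
    max_wait seconds before each new query, and pause for pause_duration
    seconds after every pause_after queries."""
    results = []
    for count, url in enumerate(urls, start=1):
        results.append(fetch(url))
        if count < len(urls):
            # Random temporization between the two slider values.
            time.sleep(random.uniform(min_wait, max_wait))
            # Longer pause after a given number of queries.
            if count % pause_after == 0:
                time.sleep(pause_duration)
    return results
```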