OutWit Hub Screenshots

The Page View Among many other things, OutWit Hub is a browser: the page view lets you navigate, as you would with any Web browser, to the sites containing the data you wish to extract. Around the Web page, the interface includes a log panel at the top, a side panel on the left with the various extraction views and automators, and the Catch at the bottom, where you can store data or media before exporting them to your hard disk or uploading them to an FTP server in one of the dozen file formats available.
The Images View OutWit Hub views are accessible through the left side panel. The images view contains the URLs and a thumbnail preview of the images in various formats (JPG, PNG, GIF, SVG, BMP, TIF, etc.) that were found in the current page. You can download these images to your hard disk, or automate the downloading of all or selected images found on whole lists of URLs. You can also export the image URLs and info as a table: on the right side of the datasheet, as in every view, the Export panel allows you to preview the various export formats available. In this screenshot: the HTML output, with thumbnails of each image and a larger popup when you hover over them.
The Contacts View When applying this extractor, the program scans the source code of the current page using a number of different strategies to recognize contact records with as many fields as possible: email address, phone numbers, physical address, contact name, photo portrait, company, meta description, keywords, etc. The contact extractor can be applied to a single page, to a series of pages (like SERPs, for instance), to a list of URLs in a directory of the queries view, to a whole Website, etc.
The Documents View The documents view contains the URLs of document files in various formats (PDF, CSV, Excel, Word, PowerPoint, etc.) that were found in the current page. You can download these from the current page to your hard disk, or you can automate the downloading of all or selected documents from whole lists of URLs. You can also export the document URLs and info as a table in Excel, CSV, SQL or a number of other widely used file formats.
The Tables View Extracts and displays the data presented as HTML tables in the current page or in a list of pages, inserting the most pertinent URL of each row before the first column of data in the datasheet.
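To make the idea of the tables view concrete, here is a minimal Python sketch of the same kind of extraction using only the standard library. It is not OutWit Hub's implementation: the class and function names are hypothetical, and "most pertinent URL" is approximated, as an assumption, by the first link found in each row.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr>, preceded by a URL for the row."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows: [url, cell, cell, ...]
        self._cells = None   # cells of the row being read (None outside a row)
        self._text = None    # text pieces of the cell being read (None outside a cell)
        self._url = ""       # assumption: the first href in the row stands in for the "most pertinent" URL

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._text, self._url = [], None, ""
        elif tag in ("td", "th") and self._cells is not None:
            self._text = []
        elif tag == "a" and self._cells is not None and not self._url:
            self._url = dict(attrs).get("href") or ""

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cells is not None and self._text is not None:
            self._cells.append("".join(self._text).strip())
            self._text = None
        elif tag == "tr" and self._cells is not None:
            # The URL is inserted before the first column of data.
            self.rows.append([self._url] + self._cells)
            self._cells, self._text = None, None


def extract_rows(html):
    """Return one list per table row found in the given HTML string."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Calling `extract_rows` on the source of a page returns a list of rows that could then be written to CSV or another tabular format. Nested tables and rows without closing tags are not handled in this sketch.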
The List View Extracts and displays the data presented as HTML lists in the current page or in a list of pages, keeping the hierarchical organization of the lists and inserting the most pertinent URL of each row as the first column in the datasheet.
The Guess View When the data in the Web page is not presented as an HTML table or list, and before you create a custom scraper for this page (see next view), OutWit Hub can try to guess the structure of the data and do the extraction automatically. As with any automatic recognition algorithm, this will not work in all cases and the resulting format may not be exactly what you want, but it can be a simple alternative to creating a scraper.
The Scraped View This is the view that receives the data extracted by custom scrapers. A custom scraper is a simple template that you can create and edit within OutWit, and that allows the program to locate, extract, transform and save the data of a given Website exactly as you want it. You can apply a scraper to the page you are viewing with the browser or to a whole list of URLs (dozens or hundreds of thousands) that you can import to the queries view.
The Links View The links view contains all the Web links found in the current page. Dragging the horizontal divider up or down, you can view the extracted data as well as the page.
The Text View Displays the content of the current page as simple text.
The Words View Presents all the different words and recurring groups of two, three or four words, and sorts them by frequency in the current page or in the series of pages that were explored while the words extractor was active.
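The counting described for the words view can be sketched in a few lines of Python. This is only an illustration of the technique (word and n-gram frequency counting), not OutWit Hub's code: the function names and the tokenization rule are assumptions.

```python
import re
from collections import Counter


def ngram_frequencies(text, max_n=4):
    """Count every word and every group of 2..max_n consecutive words."""
    # Assumed tokenization: lowercase runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def words_view(text, max_n=4):
    """Keep all single words, keep groups only if they recur, sort by frequency."""
    counts = ngram_frequencies(text, max_n)
    kept = [(group, count) for group, count in counts.items()
            if " " not in group or count > 1]
    # Most frequent first; ties are broken alphabetically.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

To cover a series of pages instead of a single one, the texts could simply be concatenated or their counters added together. Note that this sketch lets word groups run across sentence boundaries.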
The News View Explores the current page and pertinent links within it, to find RSS feeds and display them as records in the view's datasheet.
The Source View The OutWit Hub source view can display three different types of source code that you can use for scraping the current page, depending on the type of site (and on the OutWit Hub edition you are using). Below the rendered page you can see the source code, colorized for data miners rather than programmers, so that you can easily spot data elements, links, images, text, etc. The original source (white background) is what the browser first received when loading the Web page. The dynamic source (pale yellow background) is the code as altered by scripts after the page was loaded, as happens on AJAX sites. The Expert and Enterprise editions can also display the dynamic source as a concatenation of all the frames composing the page (light green background).
The String Generation Panel The String Generation Panel is a string editor with which you can generate and edit URLs or any other series of strings, using a simple syntax to define ranges of numbers or letters, lists of values, etc.
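As an illustration of what generating strings from ranges and lists means, here is a small Python sketch. The pattern syntax used here (`[1:3]` for a range of numbers or letters, `{a,b}` for a list of values) is invented for this example and should not be taken as OutWit Hub's actual syntax; the function names are hypothetical as well.

```python
import itertools
import re

# Hypothetical syntax: [start:end] for a range, {v1,v2,...} for a list of values.
TOKEN = re.compile(r"\[(\w+):(\w+)\]|\{([^{}]*)\}")


def _choices(match):
    """Return the list of strings that one range or list token stands for."""
    start, end, items = match.groups()
    if items is not None:
        return [item.strip() for item in items.split(",")]
    if start.isdigit() and end.isdigit():
        # Keep zero padding when the start value is written with a leading zero.
        width = len(start) if start.startswith("0") else 0
        return [str(n).zfill(width) for n in range(int(start), int(end) + 1)]
    if len(start) == 1 and len(end) == 1:
        return [chr(code) for code in range(ord(start), ord(end) + 1)]
    raise ValueError(f"unsupported range: {match.group(0)}")


def expand(pattern):
    """Expand every token in the pattern and return all resulting strings."""
    parts = []
    pos = 0
    for match in TOKEN.finditer(pattern):
        parts.append([pattern[pos:match.start()]])  # literal text before the token
        parts.append(_choices(match))
        pos = match.end()
    parts.append([pattern[pos:]])                   # literal text after the last token
    return ["".join(combo) for combo in itertools.product(*parts)]
```

For instance, `expand("https://example.com/{news,blog}?page=[1:3]")` yields six URLs, three pages for each of the two sections. Such a list could then be fed to any batch downloader or extractor.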
The Search Query Builder Added in version 8, the Search Query Builder allows you to generate, edit and send multiple-criteria search URLs for the most used search engines. (Available in the Expert and Enterprise editions.)
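For readers who want to see what a multiple-criteria search URL looks like in practice, the following Python sketch assembles one with the standard library. Everything specific in it is an assumption: the engine address `https://search.example.com/search` and the `q` parameter are placeholders, the quoted-phrase, minus-sign and `site:` operators are conventions that many (but not all) search engines accept, and none of it reflects how the Search Query Builder itself works.

```python
from urllib.parse import urlencode


def build_query(all_words=(), exact_phrase="", none_of=(), site=""):
    """Combine several criteria into one query string (assumed operator syntax)."""
    terms = list(all_words)
    if exact_phrase:
        terms.append(f'"{exact_phrase}"')              # exact phrase in double quotes
    terms.extend(f"-{word}" for word in none_of)       # excluded words
    if site:
        terms.append(f"site:{site}")                   # restrict to one site
    return " ".join(terms)


def search_url(base, query, param="q", **extra):
    """Return the base address followed by the URL-encoded query parameters."""
    return f"{base}?{urlencode({param: query, **extra})}"


# Usage with a placeholder engine address:
query = build_query(["web", "scraping"], exact_phrase="data export",
                    none_of=["jobs"], site="example.com")
url = search_url("https://search.example.com/search", query)
```

The base address and parameter name differ from one search engine to another, so both are left as arguments in this sketch.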