The Page View

Among many other things, OutWit Hub is a browser: the page view allows you to simply navigate, as you would with any Web browser, to the sites containing the data you wish to extract.

Around the Web page, the interface includes a log panel at the top; a side panel on the left, with the various extraction views and automators; and the Catch at the bottom, where you can store any data or media before exporting it to your hard disk or uploading it to an FTP server in one of the dozen file formats available.

OutWit Hub, Page view

The Images View

OutWit Hub views are accessible through the left side panel.

The images view contains the URLs and thumbnail previews of the images found in the current page, in various formats such as JPG, PNG, GIF, SVG, BMP, TIF, etc.

You can of course download these images to your hard disk, and you can automate the downloading of all or selected images found on whole lists of URLs.

You can also export the image URLs and info as a table:
On the right side of the datasheet, as in every view, the Export panel allows you to preview the various export formats available. In this screenshot: the HTML output, with thumbnails of each image and a larger popup when you hover over them.
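
To illustrate the kind of extraction the images view performs, here is a minimal Python sketch that collects image URLs from a page's HTML. It is not OutWit Hub's implementation: the names `ImageCollector` and `find_images` and the example.com URLs are invented for this example, and it only reads the `src` attribute of `<img>` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects absolute URLs from the src attribute of <img> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the URL of the page.
            self.image_urls.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    """Hypothetical helper: return the image URLs found in an HTML string."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls


sample = '<p><img src="/img/logo.png"><img src="photo.jpg" alt="x"></p>'
for url in find_images(sample, "https://example.com/gallery/"):
    print(url)
```

A real extractor would also have to consider images referenced from CSS, `srcset` attributes, or links, which this sketch ignores.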

OutWit Hub standard extractors: the images view

The Contacts View

When applying this extractor, the program scans the source code of the current page using a number of different strategies to recognize contact records with as many fields as possible: email address, phone numbers, physical address, contact name, portrait photo, company, meta description, keywords, etc.

The contact extractor can be applied to a single page, to a series of pages (like SERPs, for instance), to a list of URLs in a directory of the queries view, to a whole Website, etc.
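
As an illustration of one such strategy, the sketch below finds email addresses in page source with a regular expression. It is only an assumption about how a single field might be recognized, not the program's actual method; the pattern is deliberately simple and the function name `find_emails` is invented for this example.

```python
import re

# A simple, permissive pattern; real-world address syntax is broader than this.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(source):
    """Return the distinct email addresses in a string, in order of appearance."""
    matches = EMAIL_PATTERN.findall(source)
    # dict.fromkeys removes duplicates while keeping the first occurrence order.
    return list(dict.fromkeys(matches))


page_source = (
    '<a href="mailto:jane.doe@example.com">jane.doe@example.com</a> '
    "or write to sales@example.org."
)
print(find_emails(page_source))
```

Recognizing the other fields (names, postal addresses, phone numbers) and grouping them into one record per contact is considerably harder and is not attempted here.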

OutWit Hub standard extractors: the contacts view

The Documents View

The documents view contains the URLs of document files of various formats like PDF, CSV, Excel, Word, PowerPoint, etc., that were found in the current page.

You can download these from the current page to your hard disk, or you can automate the downloading of all or selected documents from whole lists of URLs.

You can also export the document URLs and info as a table in Excel, CSV, SQL or a number of other widely used file formats.

OutWit Hub standard extractors: the documents view

The Tables View

Extracts and displays the data presented as HTML tables in the current page or in a list of pages, inserting the most pertinent URL of each row before the first column of data in the datasheet.
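
The behaviour described above can be sketched in Python as follows. This is an illustrative approximation, not the program's algorithm: it treats the first link found in a row as the "most pertinent" URL, which is an assumption, it does not handle nested tables, and the class name `TableRows` is invented for this example.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects table rows as lists: [first link in the row, cell 1, cell 2, ...]."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = None  # cells of the row being read, or None outside a row
        self._text = None   # text fragments of the cell being read, or None
        self._url = ""      # first href seen in the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._url = [], ""
        elif tag in ("td", "th") and self._cells is not None:
            self._text = []
        elif tag == "a" and self._cells is not None and not self._url:
            self._url = dict(attrs).get("href") or ""

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            self._cells.append("".join(self._text).strip())
            self._text = None
        elif tag == "tr" and self._cells is not None:
            # The URL goes before the first column of data.
            self.rows.append([self._url] + self._cells)
            self._cells = None


table_html = (
    "<table><tr><th>Name</th><th>Price</th></tr>"
    '<tr><td><a href="/p/1">Widget</a></td><td>9.50</td></tr></table>'
)
parser = TableRows()
parser.feed(table_html)
for row in parser.rows:
    print(row)
```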

OutWit Hub standard extractors: the tables view

The List View

Extracts and displays the data presented as HTML lists in the current page or in a list of pages, keeping the hierarchical organization of the lists and inserting the most pertinent URL of each row as the first column in the datasheet.

OutWit Hub standard extractors: the list view

The Guess View

When the data in the Webpage is not presented as an HTML table or list, and before you create a custom scraper for this page (see next view), OutWit Hub can try to guess the structure of the data and do the extraction automatically.

As with any automatic recognition algorithm, this will not work in all cases and the resulting format may not be exactly what you want, but it may be a simple alternative to creating a scraper.

OutWit Hub standard extractors: the guess view

The Scraped View

This is the view that receives the data extracted by custom scrapers.

A custom scraper is a simple template you can create and edit within OutWit, that allows the program to locate, extract, transform and save the data of a given Website, exactly as you want it.

You can apply a scraper to the page you are viewing with the browser or to a whole list of URLs (dozens or hundreds of thousands) that you can import to the queries view.

OutWit Hub standard extractors: the scraped view

The Text View

Displays the content of the current page as simple text.

OutWit Hub standard extractors: the text view

The Words View

Presents all the different words and recurring groups of two, three or four words, and sorts them by frequency in the current page or in the series of pages that were explored while the words extractor was active.
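
A frequency count of this kind can be sketched in a few lines of Python. The tokenization rule (lowercase runs of letters, digits and apostrophes) and the function names are assumptions made for this example; the program's own rules for splitting words may differ.

```python
import re
from collections import Counter


def ngram_frequencies(text, max_n=4):
    """Count every word and every group of 2 to max_n consecutive words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def words_and_recurring_groups(counts):
    """Keep all single words, but only the groups that occur more than once."""
    return Counter({k: v for k, v in counts.items() if " " not in k or v > 1})


counts = ngram_frequencies("the cat sat on the mat and the cat slept")
kept = words_and_recurring_groups(counts)
for entry, frequency in kept.most_common(3):
    print(frequency, entry)
```

Calling `ngram_frequencies` on the concatenated text of several pages gives the counts for a whole series of pages rather than a single one.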

OutWit Hub standard extractors: the words view

The News View

Explores the current page and pertinent links within it, to find RSS feeds and display them as records in the view's datasheet.
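
One common way for a page to advertise its feeds is a `<link rel="alternate">` element with an RSS or Atom media type. The Python sketch below looks for that convention only; whether and how OutWit Hub uses it is not stated here, and the class name `FeedFinder` and the example.com URLs are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedFinder(HTMLParser):
    """Collects feed URLs declared with <link rel="alternate" type="...">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        media_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel_values and media_type in FEED_TYPES and href:
            self.feeds.append(urljoin(self.base_url, href))


head_html = (
    '<head><link rel="stylesheet" href="s.css">'
    '<link rel="alternate" type="application/rss+xml" href="/feed.xml"></head>'
)
finder = FeedFinder("https://example.com/blog/")
finder.feed(head_html)
print(finder.feeds)
```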

OutWit Hub standard extractors: the news view

The Source View

The OutWit Hub source view can display three different types of source code that you can use for scraping the current page, depending on the type of site and on the OutWit Hub edition you are using.

Below the rendered page you can see the source code, colorized for data miners, not for programmers, so that you can easily view data elements, links, images, text, etc.

The original source (on a white background) is what was first received by the browser when loading the Webpage. The dynamic source (pale yellow background) is the code as altered by scripts after the page was loaded in AJAX sites. The Expert and Enterprise editions can also display the dynamic source as a concatenation of all the frames composing the page (light green background).

OutWit Hub standard extractors: the source view

The String Generation Panel

The String Generation Panel is a string editor with which you can generate and edit URLs or any other series of strings, using a simple syntax to define ranges of numbers or letters, lists of values, etc.
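
The idea of expanding a pattern into a series of strings can be sketched in Python as below. The bracket syntax used here (`[1:3]` for a number range, `[a:c]` for a letter range, `[x,y]` for a list of values) is invented for this example and is not the syntax of the String Generation Panel; the function names are likewise hypothetical.

```python
import itertools
import re

# Hypothetical syntax: anything between square brackets is a token to expand.
TOKEN = re.compile(r"\[([^\[\]]+)\]")


def expand_token(body):
    """Expand '1:3' to 1..3, 'a:c' to a..c, and 'x,y' to the listed values."""
    if ":" in body:
        start, end = body.split(":", 1)
        if start.isdigit() and end.isdigit():
            return [str(n) for n in range(int(start), int(end) + 1)]
        if len(start) == 1 and len(end) == 1:
            return [chr(c) for c in range(ord(start), ord(end) + 1)]
    return body.split(",")


def generate(pattern):
    """Return every string obtained by combining the expansions of all tokens."""
    # With one capture group, re.split alternates: literal, token, literal, ...
    parts = TOKEN.split(pattern)
    choices = [expand_token(p) if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


urls = generate("https://example.com/[news,blog]/page[1:3].html")
for url in urls:
    print(url)
```

Here the pattern yields six URLs, one for each combination of section and page number.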

OutWit Hub, String Generation Panel

The Search Query Builder

Added in version 8, the Search Query Builder allows you to generate, edit and send multiple-criteria search URLs for the most used search engines. (Available in the Expert and Enterprise editions.)
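
As a rough illustration of what a multiple-criteria search URL looks like, the Python sketch below assembles one from several criteria. The endpoint `https://search.example.com/search`, the `q` parameter and the `site:` operator are assumptions for this example, and `build_search_url` is a hypothetical helper; each real search engine has its own URL format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base, all_words=(), exact_phrase="", site=""):
    """Combine several criteria into a single query string (assumed 'q' parameter)."""
    terms = list(all_words)
    if exact_phrase:
        terms.append(f'"{exact_phrase}"')
    if site:
        terms.append(f"site:{site}")
    # urlencode takes care of escaping spaces, quotes and colons.
    return f"{base}?{urlencode({'q': ' '.join(terms)})}"


url = build_search_url(
    "https://search.example.com/search",
    all_words=["web", "scraping"],
    exact_phrase="data extraction",
    site="example.org",
)
print(url)
# Decoding the query string recovers the original criteria.
print(parse_qs(urlsplit(url).query)["q"][0])
```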

OutWit Hub, Search Query Builder