Any Page Parsing Plugin

Introduction

This plugin lets you retrieve the content of a web page and collect data from it manually. You can also save templates that automatically collect data from many pages of the same type.

Important

How To Setup

Place the Page Preview element on the page.

One Page Parsing

1. Place an Input element on the page. The link to the page to be processed is entered in this field.

2. Place a Page Preview element on the page. The content of the parsed page is displayed in this element.

3. Add a new workflow and pass the link to the plugin action "Web Page Parser - Get HTML From One URL".

Plugin action "Web Page Parser - Get HTML From One URL"
Example of workflow

4. Set two states of the Page Preview element:

First State Settings
Set Value Of First State
Second State Settings
Set Value Of Second State

5. Pass the states to the Page Preview element.

6. The following steps require three data tables:

  • A data table for templates, with one field: Title (type: Text)

Example of table for templates
  • A data table for template fields:
    1. Template ID (type: Text) - stores the ID of the template from the Templates table.
    2. Class List (type: Text) - stores the position of the field on the parsed page.
    3. Title (type: Text) - stores the title of the field.

Example of table for template data
  • A data table for fields selected directly in the Page Preview element:
    1. Title (type: Text) - stores the title of the selected field.
    2. Text (type: Text) - stores the text content of the selected field.

Example of table for data
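To make the relationships between the three tables explicit, here is a minimal sketch that models them as Python dataclasses. In Bubble these are data types, not code. The class names and the `fields_of` helper are hypothetical and serve only to show that each template field points back to its template through Template ID.

```python
from dataclasses import dataclass


# Hypothetical record types mirroring the three data tables described above.

@dataclass
class Template:
    title: str  # Title (type: Text); Bubble adds the unique ID itself


@dataclass
class TemplateField:
    template_id: str  # Template ID (type: Text): unique ID of a Templates row
    class_list: str   # Class List (type: Text): position of the field on the page
    title: str        # Title (type: Text): title of the field


@dataclass
class SelectedField:
    title: str  # Title (type: Text): title of the field selected in Page Preview
    text: str   # Text (type: Text): text content of that field


def fields_of(template_id: str, fields: list[TemplateField]) -> list[TemplateField]:
    """Return the fields that belong to one template (a search by Template ID)."""
    return [f for f in fields if f.template_id == template_id]
```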

7. Add a new workflow.

For Collecting Data From a Page


Note: You can supply the name for the data in any way you like (for example, through a popup with an input field).

For Creating a Template

  1. Create a template.

2. Add fields to the template.

Example of workflow to add fields to a template

In this example, the template's unique ID is taken directly from the database. You can obtain it from any source you like, for example from a dropdown, as on the demo page.

Optional Functions

Finding tags on the page

  1. Add a dropdown to the page.

  2. Add a new step to the workflow.

  3. Create a new state.

  4. Set the value of this state.

  5. Set the source of the dropdown to this state.

  6. Add a new workflow.

Now, whenever the tag selected in the dropdown changes, all elements of the parsed page with that tag are highlighted.

Multiple Page Parsing

  1. Place a File Uploader on the page.

2. Add a new workflow.

3. Add the action from the plugin.

5. Set the link from the File Uploader.

6. Add a step to the workflow.

7. Set the data of the action.

You can choose the template in any way that suits you (see the demo page).

8. Return the data in a convenient format:

  • As a link to a .csv file with the data

Return the generated link to an HTML element.
  • As formatted text

Example of output

Plugin Elements Properties

The plugin properties:

HTML Code Head - this field receives the HTML code of the "head" element of the parsed page. Its value is returned by the action "Get HTML From One URL".

HTML Code Body - this field receives the HTML code of the "body" element of the parsed page. Its value is returned by the action "Get HTML From One URL".

Page Preview properties

Page Preview Actions

  • Color Tags - highlights the elements of the parsed page that have the given tag name.

  • Unset selected fields - reloads the page and resets all selected elements to their default state.

Color Tags Action

Page Preview Events

  • Click - triggered when the Page Preview element is clicked.

Page Preview States

  • Field Text - returns the inner text of the element clicked in the Page Preview element.

  • Class List - returns the full path of the clicked element in the DOM tree.

Plugin Actions

Get HTML From One URL

Input Fields

  • Url - the full URL (including http:// or https://) of the page to be parsed.

    • Type: Text

Returned Values

JSON Object with fields:

  • Head - returns the full HTML code of the <head></head> tag of the parsed page, as text.

    • Type: Text

  • Body - returns the full HTML code of the <body></body> tag of the parsed page, as text.

    • Type: Text

  • Tags - returns a list of strings containing all tags on the page.

    • Type: List of Text
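As an illustration of what this action returns, here is a minimal sketch in Python, using only the standard library. It splits a downloaded page into the Head, Body and Tags fields. Two points are assumptions rather than documented plugin behavior. Head and Body are returned as the whole element, including the tags themselves. Tags is a list of distinct tag names in order of first appearance, which suits a dropdown like the one in "Finding tags on the page". The function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class _TagCollector(HTMLParser):
    """Records each distinct start-tag name, in order of first appearance."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.tags:
            self.tags.append(tag)


def _element(html: str, name: str) -> str:
    """Return the first <name ...>...</name> element as raw HTML, or "" if absent."""
    match = re.search(rf"<{name}\b[^>]*>.*?</{name}\s*>", html, re.DOTALL | re.IGNORECASE)
    return match.group(0) if match else ""


def split_page(html: str) -> dict:
    """Build a result shaped like the action's JSON object (assumed layout)."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return {"Head": _element(html, "head"), "Body": _element(html, "body"), "Tags": collector.tags}


def get_html_from_one_url(url: str) -> dict:
    """Download a page (url must include http:// or https://) and split it."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return split_page(html)
```

The parsing step is separate from the download so that it can be tried on any HTML string without network access.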

Get Number Of Tags

Input Fields

  • HTML Code - the HTML code in which to search for the tags.

    • Type: Text

  • Tag Name - name of the tag to find.

    • Type: Text

Returned Values

JSON Object with fields:

  • Num of entries - returns the number of occurrences of the tag in HTML Code.

    • Type: Number
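Counting tags is more reliable with a real HTML parser than with a substring search, because a search for "<p" would also match "<pre" or "<param". Here is a minimal Python sketch using the standard-library `HTMLParser`. It assumes that the action counts start tags case-insensitively. The function name is hypothetical.

```python
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Counts start tags with one name; HTMLParser lower-cases tag names."""

    def __init__(self, tag_name: str) -> None:
        super().__init__()
        self.tag_name = tag_name.strip().lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br/> also reach this method, because the
        # default handle_startendtag() calls handle_starttag() and handle_endtag().
        if tag == self.tag_name:
            self.count += 1


def get_number_of_tags(html_code: str, tag_name: str) -> dict:
    """Return a result shaped like the action's JSON object."""
    counter = _TagCounter(tag_name)
    counter.feed(html_code)
    counter.close()
    return {"Num of entries": counter.count}
```

For example, `get_number_of_tags("<div><p>a</p><P>b</P></div>", "p")` gives `{"Num of entries": 2}`.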

This action is similar to "Generate Download Link From Multiple Parse".

Input Fields

  • Content - the list of texts to write to the file.

    • Type: List of Text

  • Title - the names of the rows.

    • Type: List of Text

  • File Title - the name of the downloaded file.

    • Type: Text

Returned Values

JSON Object with fields:

  • Link - returns an HTML <a href="ContentInBase64String" download="File Title.csv">Download CSV</a> tag as text.

    • Type: Text
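The returned link works by embedding the whole CSV file in the href, so no file has to be stored on a server. Here is a minimal Python sketch of that idea. Three points are assumptions. The href is a `data:text/csv;base64,` URL, since the documentation only mentions a Base64 string. Each row holds one Title followed by its Content. The function name `make_csv_link` is hypothetical.

```python
import base64
import csv
import io
from html import escape


def make_csv_link(content: list[str], title: list[str], file_title: str) -> dict:
    """Return {"Link": "<a ...>"} with the CSV embedded as a Base64 data URL."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes cells that contain commas, quotes or newlines
    # Assumption: one row per (row title, row content) pair.
    for row_title, row_content in zip(title, content):
        writer.writerow([row_title, row_content])
    encoded = base64.b64encode(buf.getvalue().encode("utf-8")).decode("ascii")
    href = f"data:text/csv;base64,{encoded}"
    link = f'<a href="{href}" download="{escape(file_title)}.csv">Download CSV</a>'
    return {"Link": link}
```

Base64 output contains only letters, digits, "+", "/" and "=", so it can be placed inside the href attribute without further escaping.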

Extract Links From CSV

Input Fields

  • CSV File - a CSV file (.csv format) with one column containing links. Links must be complete (including http:// or https://).

    • Type: File

Returned Values

JSON Object with fields:

  • URL's - returns a list of the URLs from the uploaded file.

    • Type: List of Text
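The file format above, with one column of complete links, can be read with Python's standard `csv` module. This minimal sketch reads the first cell of each row. It skips empty rows and any cell that is not a complete http:// or https:// link, such as a header row. That filtering is an assumption; the documentation only says that links must be complete. The function name is hypothetical.

```python
import csv
import io


def extract_links_from_csv(csv_text: str) -> dict:
    """Return {"URL's": [...]} with the complete links from the first column."""
    urls = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # blank line
        cell = row[0].strip()
        # Assumption: non-link cells (for example a header) are skipped.
        if cell.startswith(("http://", "https://")):
            urls.append(cell)
    return {"URL's": urls}
```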

Get Data From Multiple URL

Input Fields

  • URL - the list of links to parse. The action "Extract Links From CSV" returns this value.

    • Type: List of Text

  • Classes - the list of element paths in the DOM tree (for example, the Class List values stored in a template).

    • Type: List of Text

Returned Values

JSON Object with fields:

  • Data - returns a list of the inner texts of the elements whose paths are given in Classes.

    • Type: List of Text
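This action applies every saved path to every page and collects the matching texts. The exact format of a Class List path is not documented here, so this minimal Python sketch assumes a hypothetical one: tag names from the root, joined by ">", each followed by its classes as ".class", as in `html>body>div.card>span.price`. It also assumes that Data is flattened page by page, with one value per path and "" where nothing matched. The sketch works on HTML that has already been downloaded, for example as in the "Get HTML From One URL" step. It does not implement HTML's implicit end-tag rules, so unclosed `<p>` or `<li>` elements can shift the paths.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _PathTextExtractor(HTMLParser):
    """Collects the text of every element whose path equals `path` (assumed format)."""

    def __init__(self, path: str) -> None:
        super().__init__()
        self.path = path
        self.stack: list[str] = []     # segments such as "div.card" for open elements
        self.match_depth = None        # stack depth where the current match started
        self.parts: list[str] = []     # text chunks of the current match
        self.results: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return  # void elements never get an end tag, so keep them off the stack
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(tag + "".join("." + c for c in classes))
        if self.match_depth is None and ">".join(self.stack) == self.path:
            self.match_depth = len(self.stack)
            self.parts = []

    def handle_endtag(self, tag):
        names = [seg.split(".", 1)[0] for seg in self.stack]
        if tag in VOID or tag not in names:
            return  # stray end tag: ignore it
        # Pop back to the most recent open element with this name.
        del self.stack[len(names) - 1 - names[::-1].index(tag):]
        if self.match_depth is not None and len(self.stack) < self.match_depth:
            self.results.append(" ".join("".join(self.parts).split()))
            self.match_depth = None

    def handle_data(self, data):
        if self.match_depth is not None:
            self.parts.append(data)


def extract_text(html: str, path: str) -> list[str]:
    parser = _PathTextExtractor(path)
    parser.feed(html)
    parser.close()
    return parser.results


def get_data_from_html_pages(pages: list[str], classes: list[str]) -> dict:
    """Flatten page by page: first match for each path, or "" if none (assumption)."""
    data = []
    for html in pages:
        for path in classes:
            found = extract_text(html, path)
            data.append(found[0] if found else "")
    return {"Data": data}
```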

Generate Download Link From Multiple Parse

Input Fields

  • Data - the data to write to the CSV file. The action "Get Data From Multiple URL" returns this value.

    • Type: List of Text

  • Fields - the names of the columns.

    • Type: List of Text

  • File Title - the name of the downloaded file.

    • Type: Text

  • URLs - the list of links from which the data was taken. The action "Extract Links From CSV" returns this value. Each URL is written to the first cell of its row, so that the row can be matched to the parsed page.

    • Type: List of Text

Returned Values

JSON Object with fields:

  • Link - returns an HTML <a href="DataInBase64String" download="File Title.csv">Download CSV</a> tag as text.

    • Type: Text
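This action builds one CSV row per parsed page: the URL first, then that page's values in column order. Here is a minimal Python sketch. Several points are assumptions. Data is flattened page by page, with len(Fields) values per URL. The header row is "URL" followed by Fields. The href is a `data:text/csv;base64,` URL. The function name is hypothetical.

```python
import base64
import csv
import io
from html import escape


def generate_csv_link_from_multiple_parse(data: list[str], fields: list[str],
                                          file_title: str, urls: list[str]) -> dict:
    n = len(fields)
    if n == 0 or len(data) != n * len(urls):
        raise ValueError("expected len(fields) values for every URL")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["URL", *fields])  # assumption: header label "URL" for column 1
    for i, url in enumerate(urls):
        # The URL goes in the first cell so each row can be traced to its page.
        writer.writerow([url, *data[i * n:(i + 1) * n]])
    encoded = base64.b64encode(buf.getvalue().encode("utf-8")).decode("ascii")
    link = (f'<a href="data:text/csv;base64,{encoded}" '
            f'download="{escape(file_title)}.csv">Download CSV</a>')
    return {"Link": link}
```

The length check makes a mismatch between Data, Fields and URLs fail loudly instead of silently shifting values into the wrong columns.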

Generate Output

Input Fields

  • Fields - the list of field names (for example, from a template).

    • Type: List of Text

  • URLs Data - the data from the parsed pages. The action "Get Data From Multiple URL" returns this value.

    • Type: List of Text

Returned Values

JSON Object with fields:

  • Generated Text - returns a list of texts in the form "Field: URLs Data for this field". Each item of the list is the text for one field.

    • Type: List of Text
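Here is a minimal Python sketch of the formatted-text output, producing one "Field: values" text per field. It assumes that URLs Data is flattened page by page with one value per field, so the values for field j are every n-th item starting at index j. It also assumes that values from different pages are joined with ", ". Both the layout and the separator are assumptions, and the function name is hypothetical.

```python
def generate_output(fields: list[str], urls_data: list[str]) -> dict:
    n = len(fields)
    if n == 0 or len(urls_data) % n:
        raise ValueError("URLs Data must hold one value per field for every page")
    texts = []
    for j, field in enumerate(fields):
        values = urls_data[j::n]  # this field's value on each page, in page order
        texts.append(f"{field}: {', '.join(values)}")
    return {"Generated Text": texts}
```

For example, `generate_output(["Title", "Price"], ["A", "1", "B", "2"])` gives `{"Generated Text": ["Title: A, B", "Price: 1, 2"]}`.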

Troubleshooting

  1. Pages created on bubble.io (and possibly on other no-code platforms) are not supported.

  2. Many marketplaces and other large projects use protection against bots and parsing, so their pages may not be parsed.

  3. Pages may not render correctly in the Page Preview element.

  4. Error handling is still in progress.

Demo to preview the settings:
