Any Page Parsing Plugin
Introduction
This plugin lets you fetch the content of a page, collect data from it manually, or save templates that collect data automatically from many pages of the same type.
Important
We cannot guarantee that every page will be fetched correctly. Many marketplaces and other large sites use protection against bots and parsing.
Attention: Pages built with bubble.io (and possibly other no-code platforms) cannot be processed because of how those platforms work.
How to Set Up
Place the Page Preview element on the page.
One Page Parsing
1. Place an Input element on the page. The link to the page to be processed is entered in this field.

2. Place a PagePreview element on the page. The content of the parsed page is displayed in this element.

3. Add a new workflow and pass the link to the plugin action "Web Page Parser - Get HTML From One URL".


4. Set two states on the PagePreview element:




5. Pass the states to the element.

6. For the next steps, you need three data tables.
Data table for templates, with one field:
1. Title (type: Text)

Data table for template fields:
1. Template ID (type: Text) - stores the template ID from the Templates table.
2. Class List (type: Text) - stores the position of the field on the parsed page.
3. Title (type: Text) - stores the title of the field.

Data table for fields selected directly from the PagePreview element:
1. Title (type: Text) - stores the title of the selected field.
2. Text (type: Text) - stores the text content of the selected field.

7. Add a new workflow.

For Collecting Data From Page


For Creating a Template
1. Create a template.

2. Add a field to the template.

Optional Functions
Finding tags on page
1. Add a dropdown to the page.
2. Add a new step to the workflow.
3. Create a new state.
4. Set the value of this state.
5. Set the dropdown's source to this state.
6. Add a new workflow.

Now, when the tag selected in the dropdown changes, all elements of the parsed page with that tag are highlighted.
Multiple Page Parsing
Even if a page opened successfully with single-page parsing, requests made during multiple parsing may still be blocked because of suspicious activity from your IP address.
To parse multiple pages, you need at least one template and a CSV file with links.
Example of a correct .csv file: Download
1. Place a FileUploader element on the page.

2. Add a new workflow.

3. Add an action from the plugin.

5. Set the link from the FileUploader.

6. Add a step to the workflow.

7. Set the data of the action.

8. Return the data in a convenient format:
As a link to a .csv file with the data



As formatted text



Plugin Element Properties
The plugin properties:
HTML Code Head - receives the HTML code of the "head" element of the parsed page. The value for this field is returned by the action "Get HTML From One URL".
HTML Code Body - receives the HTML code of the "body" element of the parsed page. The value for this field is returned by the action "Get HTML From One URL".

Page Preview Actions
Color Tags - highlights the elements of the parsed page that have the given tag name.
Unset selected fields - reloads the page and resets all selected elements to their default state.

Page Preview Events
Click - triggered when the Page Preview is clicked.
Page Preview States
Field Text - returns the inner text of the element clicked in the Page Preview element.
Class List - returns the full path of the element in the DOM tree.
Plugin Actions
Get HTML From One URL

Input Fields
Url - full URL (including http:// or https://) of the page to be parsed.
Type: Text
Returned Values
JSON Object with fields:
Head - returns all HTML code of the <head></head> tag from the parsed page, as text.
Type: Text
Body - returns all HTML code of the <body></body> tag from the parsed page, as text.
Type: Text
Tags - returns a list of strings containing all tags from the page.
Type: List of Text
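The shape of the returned object can be illustrated with a small standalone sketch. This is not the plugin's implementation: the function name split_page is hypothetical, the regex-based extraction is a simplification, and two points the text above does not settle are assumed here, namely that Head and Body hold the inner HTML without the wrapping tag, and that Tags lists each tag name once, in order of first appearance.

```python
import re
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects distinct tag names in order of first appearance (assumed behaviour)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.tags:
            self.tags.append(tag)


def split_page(html):
    """Return a dict shaped like the action's result (hypothetical helper)."""

    def inner(name):
        # Inner HTML of the first <name>...</name> pair, or "" if absent.
        match = re.search(rf"<{name}\b[^>]*>(.*?)</{name}>", html, re.S | re.I)
        return match.group(1).strip() if match else ""

    collector = TagCollector()
    collector.feed(html)
    return {"Head": inner("head"), "Body": inner("body"), "Tags": collector.tags}


sample = (
    "<html><head><title>Demo</title></head>"
    "<body><h1>Hi</h1><p>One</p><p>Two</p></body></html>"
)
result = split_page(sample)
print(result["Tags"])
```

In a Bubble workflow, the Head and Body values would then be passed to the HTML Code Head and HTML Code Body properties of the Page Preview element, and Tags could serve as the source of a dropdown.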
Get Number Of Tags

Input Fields
HTML Code - the HTML code in which to find the tags.
Type: Text
Tag Name - the name of the tag to find.
Type: Text
Returned Values
JSON Object with fields:
Num of entries - returns the number of occurrences of the tag in the HTML code.
Type: Number
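As a rough illustration of what this action computes, the sketch below counts opening tags with Python's standard HTML parser. The names TagCounter and count_tags are hypothetical, and it is an assumption that the plugin counts opening tags case-insensitively; the actual implementation may differ.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening tags whose name matches tag_name."""

    def __init__(self, tag_name):
        super().__init__()
        # HTMLParser reports tag names in lowercase, so normalise the input.
        self.tag_name = tag_name.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            self.count += 1


def count_tags(html_code, tag_name):
    """Return a dict shaped like the action's result (hypothetical helper)."""
    counter = TagCounter(tag_name)
    counter.feed(html_code)
    return {"Num of entries": counter.count}


print(count_tags("<div><p>a</p><p>b</p><br/></div>", "p"))
```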
Generate Download Link From Data
This action is similar to "Generate Download Link From Multiple Parse"

Input Fields
Content - list of texts containing the content to write to the file.
Type: List of Text
Title - names of the rows.
Type: List of Text
File Title - name of the download file.
Type: Text
Returned Values
JSON Object with fields:
Link - returns an HTML <a href="ContentInBase64String" download="File Title.csv">Download CSV</a> tag as text.
Type: Text
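The following sketch shows one way such a link could be produced. It rests on assumptions not confirmed by this documentation: that the Base64 string is embedded as a data: URI, and that each Title is written next to its Content value as one CSV row. The function name build_download_link is hypothetical.

```python
import base64
import csv
import html
import io


def build_download_link(content, titles, file_title):
    """Build an <a> tag whose href carries the CSV as Base64 (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumption: one row per title, with the title first and its value second.
    for title, value in zip(titles, content):
        writer.writerow([title, value])
    encoded = base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii")
    name = html.escape(f"{file_title}.csv", quote=True)
    return (
        f'<a href="data:text/csv;base64,{encoded}" '
        f'download="{name}">Download CSV</a>'
    )


link = build_download_link(["Widget", "9.99"], ["Name", "Price"], "export")
print(link)
```

Because the returned value is an HTML tag as text, it has to be rendered in an element that interprets HTML for the link to be clickable.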
Extract Links From CSV

Input Fields
CSV File
- CSV file ( format.csv
) with one column containts links. Links must be complete ( includehttp://
orhttps://
)Type: File
Returned Values
JSON Object with fields:
URL's - returns a list of URLs from the uploaded file.
Type: List of Text
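To make the expected file layout concrete, here is a minimal sketch that reads a one-column CSV and keeps only complete links. The sample URLs are placeholders, the function name extract_links is hypothetical, and skipping incomplete links is an assumption; the plugin may instead reject them or report an error.

```python
import csv
import io


def extract_links(csv_text):
    """Return a dict shaped like the action's result (hypothetical helper)."""
    links = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # ignore empty lines
        cell = row[0].strip()
        # Assumption: only complete links (with scheme) are kept.
        if cell.startswith(("http://", "https://")):
            links.append(cell)
    return {"URL's": links}


sample_csv = (
    "https://example.com/item/1\n"
    "https://example.com/item/2\n"
    "example.com/item/3\n"
)
print(extract_links(sample_csv))
```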
Get Data From Multiple URL

Input Fields
URL - list of links you want to parse. The action "Extract Links From CSV" returns the necessary value.
Type: List of Text
Classes - list of element paths in the DOM tree.
Type: List of Text
Returned Values
JSON Object with fields:
Data - returns a list with the inner text of the elements whose paths are given in Classes.
Type: List of Text
Generate Download Link From Multiple Parse

Input Fields
Data - data to write to the CSV file. The action "Get Data From Multiple URL" returns the necessary values.
Type: List of Text
Fields - names of the columns.
Type: List of Text
File Title - name of the download file.
Type: Text
URLs - list of links from which the data was taken. The action "Extract Links From CSV" returns the necessary value. Each link is written to the first cell of a row to associate that row with its parsed page.
Type: List of Text
Returned Values
JSON Object with fields:
Link - returns an HTML <a href="DataInBase64String" download="File Title.csv">Download CSV</a> tag as text.
Type: Text
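The row layout described for the URLs field can be sketched as follows. Several details are assumptions made for illustration only: that Data arrives as a flat list grouped by URL (all fields of the first page, then all fields of the second, and so on), that a header row is written, and that its first column is named "URL". The function name build_csv is hypothetical.

```python
import csv
import io


def build_csv(data, fields, urls):
    """Lay out parsed values as CSV text, one row per URL (assumed layout)."""
    width = len(fields)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["URL", *fields])  # assumed header row
    for index, url in enumerate(urls):
        # Slice out the values that belong to this URL.
        row = data[index * width:(index + 1) * width]
        writer.writerow([url, *row])
    return buffer.getvalue()


csv_text = build_csv(
    ["Widget", "9.99", "Gadget", "19.99"],
    ["Name", "Price"],
    ["https://example.com/1", "https://example.com/2"],
)
print(csv_text)
```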
Generate Output

Input Fields
Fields - list of field names (e.g. from a template).
Type: List of Text
URLs Data - data from the parsed pages. The action "Get Data From Multiple URL" returns the necessary value.
Type: List of Text
Returned Values
JSON Object with fields:
Generated Text - returns a list of texts in the form "Field: URLs Data for this field". Each item in the list is the text for one field.
Type: List of Text
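A possible reading of this output format is sketched below. It assumes that URLs Data is a flat list grouped by URL and that the values collected for a field across all pages are joined with commas; neither detail is confirmed by this documentation, and the function name generate_output is hypothetical.

```python
def generate_output(fields, urls_data):
    """Return one "Field: values" text per field (assumed format)."""
    width = len(fields)
    lines = []
    for position, field in enumerate(fields):
        # Every width-th value, starting at this field's position.
        values = urls_data[position::width]
        lines.append(f"{field}: {', '.join(values)}")
    return {"Generated Text": lines}


output = generate_output(
    ["Name", "Price"],
    ["Widget", "9.99", "Gadget", "19.99"],
)
print(output["Generated Text"])
```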
Troubleshooting
Pages created on bubble.io (and possibly on other no-code platforms) are not supported.
Many marketplaces and other large sites use protection against bots and parsing.
The page may be drawn incorrectly in the Page Preview element.
Error handling is still in development.
Demo to preview the settings: