I am trying to scrape the help pages for different platforms. Can you point me to documentation on how this works, please?
If I add the top-level page, does the scraper also pick up the pages below it?
Sometimes it scrapes the child pages, sometimes it does not. It differs from website to website. We are working on improving it to scrape more consistently. After you enter the URL, it will show you all the pages it successfully pulled with checkboxes next to them and you can select/deselect.
That is a real shame. I am told that tools like Intercom do a really good job of scraping websites like GitBook. It’s kind of vital that a tool like this can keep up with the most up-to-date information.
Hi @mrzang,
I’ve just pushed an update to our scraper. It now prioritizes the website’s sitemap.xml links, which should improve performance for sites using GitBook.
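For anyone curious what sitemap-first discovery looks like in general, here is a minimal sketch in Python. It is not our actual scraper code; the function name `extract_sitemap_urls` and the `docs.example.com` URLs are made up for illustration. It only assumes the site publishes a standard sitemap.xml using the sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> and <sitemapindex>.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL found in a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        # Skip empty <loc> elements and trim surrounding whitespace.
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical sitemap content, standing in for a downloaded sitemap.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.example.com/</loc></url>
  <url><loc>https://docs.example.com/getting-started</loc></url>
  <url><loc> https://docs.example.com/faq </loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in extract_sitemap_urls(SAMPLE):
        print(url)
```

Reading the sitemap gives the full page list up front, so the result does not depend on whether every child page is linked from the top page.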
That’s amazing, I will test it out.
@admin_mike and @stephenasuncion this is now working really well. Just fantastic! But it is highlighting an issue with the document limit.
I can understand a 50-doc limit for PDFs of 99 MB, but the GitBook site I want to import has 180 URLs and each page is quite small.
Is there any way you could change the limit for URLs to something like 250 pages? I have looked into scraping them using make.com and converting them to a PDF, but it’s going to be a huge effort. I love this tool, but this limit is a blocker for mini experts on rapidly changing tools with rapidly changing documentation (like Pickaxe :)
The 50 page limit per tool is just for Gold customers. Pro tier has unlimited uploads for tools.