
Build a Web Crawler in Minutes with Crawl4AI — Convert Sites to Markdown

21 Nov 2025 · 5 min read

Want to turn any website into clean Markdown in minutes?

Ever needed a whole website as simple Markdown files? Copying and cleaning pages by hand can take hours. Crawl4AI does that for you: it is a small tool that crawls pages, extracts text, and saves each page as Markdown. You can set how deep it goes and how many pages to fetch. If you want fast results, try a short crawl with Crawl4AI.

Quick one-line summary of what Crawl4AI does

Crawl4AI is an open-source tool that works as a web crawler: it fetches pages and gives you a neat Markdown export for each one. It helps with simple web scraping and offline reading.

Who benefits most (researchers, devs, content creators)

This tool helps many people. Here are the main groups who get value:

  • Researchers: Save site content for study and notes.
  • Developers: Make a quick dataset for testing or docs.
  • Content creators: Pull site text to edit or repurpose.
  • Learners: Keep a readable offline copy of tutorials and guides.

What you'll learn in this short guide

In this short guide, you will see the main parts of how the tool works. You will learn about crawl settings, three crawl strategies, and how to get your files as Markdown or a ZIP. You will also see some small quirks to watch out for. All steps are simple and clear.

Note: The tool has a simple UI where you enter a URL and choose options like max depth and total pages. You can include external links and add keywords to guide the crawl.

| Strategy | What it does | When to use | Notes |
| --- | --- | --- | --- |
| BFS (Breadth-First Search) | Explore all links at the same depth before going deeper | Good for wide site maps and even coverage | Covers many pages at top levels |
| Best-first search | Rank pages by a keyword relevance score and visit the best pages first | Best when you have topics or keywords to focus on | Uses keyword scoring to guide order |
| DFS (Depth-First Search) | Go deep into a path before backtracking | Rare for full site crawls; useful for deep single paths | May miss broad pages if depth is limited |

The three strategies matter. Use BFS when you want many top pages. Use best-first search if you have keywords. Use DFS only for deep, narrow crawling.
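The difference between BFS and DFS is easiest to see in code. Here is a minimal sketch with a made-up link graph (Crawl4AI discovers links by parsing real pages); one frontier structure gives both orders depending on which end you take pages from:

```python
from collections import deque

# Hypothetical link graph standing in for a site: page -> linked pages.
LINKS = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/intro", "/docs/api"],
    "/blog": ["/blog/post-1"],
    "/docs/intro": [],
    "/docs/api": [],
    "/blog/post-1": [],
}

def crawl_order(start, strategy="bfs", max_pages=10):
    """Return the visit order for a BFS or DFS crawl of LINKS."""
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier and len(order) < max_pages:
        # BFS takes the oldest discovered page; DFS takes the newest.
        url = frontier.popleft() if strategy == "bfs" else frontier.pop()
        order.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order
```

Running `crawl_order("/", "bfs")` visits both top-level sections before any deep page, while `"dfs"` dives down one branch first, which is why DFS can miss broad pages when depth is limited.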

The best-first mode uses a small score. The tool looks at your keywords and ranks links by how close the text matches. Then it asks the crawler to fetch the highest scored pages first. This helps when the site is big and you only want the most relevant pages.

The crawler runs in an async loop. It streams results back as pages arrive. A progress callback updates the UI so you see which pages finished. This makes long crawls feel live and clear.

After a crawl, you can download the pages in two ways. One way is a single long Markdown file with all pages joined together. The other way is a ZIP file with one Markdown file per page. The ZIP keeps each page as its own file so you can open them separately.
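The combined-file option is simple to picture in code. This sketch uses a hypothetical `{title: markdown}` mapping (the real app builds it from crawl results) and joins the pages with horizontal rules:

```python
def combine_pages(pages):
    """Join per-page Markdown into one long file, one section per page.

    `pages` is a hypothetical {title: markdown} mapping.
    """
    parts = []
    for title, body in pages.items():
        parts.append(f"# {title}\n\n{body.strip()}\n")
    # A horizontal rule separates pages in the combined file.
    return "\n---\n\n".join(parts)
```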

A few small technical notes. The tool may use a headless browser under the hood to read pages. It creates a config with the chosen strategy, depth, include rules, and a streaming flag. Then it runs an async crawler and collects pages as they arrive.
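Those settings map naturally to a small config object. The sketch below uses hypothetical field names (Crawl4AI's real config class differs) but captures the knobs described above:

```python
from dataclasses import dataclass, field

@dataclass
class CrawlConfig:
    """Hypothetical mirror of the run config described above;
    the real tool's config class has its own names."""
    start_url: str
    strategy: str = "bfs"          # "bfs", "best-first", or "dfs"
    max_depth: int = 2
    max_pages: int = 5
    include_external: bool = False # follow links to other domains?
    keywords: list = field(default_factory=list)  # guides best-first scoring
    stream: bool = True            # yield pages as they arrive
```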

"I built this app because I kept needing to crawl websites and turn them into Markdown."

Practical tips: Set a small max pages value the first time. Try depth 2 or 3 to avoid long runs. If you use keywords, best-first will favor pages that match them.

Watch for two quirks. First, a small off-by-one can happen with the max pages count. If you ask for five pages, you may get four. The fix is simple: add one to the max when you run a crawl. Second, DFS may not be stable in some builds. If depth-first seems broken, use BFS or best-first search instead.
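The page-count workaround is one line if you wrap the setting in a small helper (hypothetical name):

```python
def effective_max_pages(requested: int) -> int:
    """Work around the off-by-one: ask for one extra page so the
    crawl returns at least the number you wanted."""
    return requested + 1
```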

If you want the code, look for the project called Go Fetch on GitHub. That repo has the UI wrapper and the crawler logic. The code shows how to build the strategy, run the async crawler, and stream page results to the UI.

To get started right now: pick a site, set max depth and pages, turn on external links if you want outside pages, add keywords if you have a topic, then run the crawl. When it finishes, download a ZIP or a combined Markdown file. Go use Crawl4AI and test a small site first.

Keywords to remember: Crawl4AI, web crawler, markdown export, web scraping, best-first search, BFS, DFS, GitHub, Go Fetch.

What is Crawl4AI and why choose it for crawling?

Want to grab a site's pages and turn them into neat text files fast? Crawl4AI is a simple web crawler that does just that. It visits pages, reads them, and saves them as markdown export. The tool is easy to run and good for quick web scraping work.

Open-source crawler overview and purpose

Crawl4AI is open-source. That means anyone can read the code. It uses a crawl core to fetch pages and a parser to make markdown. It streams results so you see pages as they come. You can set depth, max pages, and include or skip external links.

Main advantages: markdown export, streaming results, simplicity

  • Markdown export: Gets clean .md files ready to read or edit.
  • Streaming results: See pages as they finish. Good for large sites.
  • Simplicity: Easy settings. Works without heavy setup.
  • Search strategies: Supports best-first search, BFS, and DFS.

When Crawl4AI is the right tool vs. other crawlers

Use Crawl4AI when you want quick markdown files and simple control. It is great for small scraping jobs, fast site snapshots, and content export. The project is on GitHub under the name Go Fetch if you want the code.

| Method | How it works | Best for |
| --- | --- | --- |
| BFS (Breadth-first) | Explore all links at one depth before going deeper | Covers site sections evenly; finding many shallow pages |
| Best-first | Score pages by keywords and visit high-score pages first | Targeted crawls when you have keywords |
| DFS (Depth-first) | Go deep into a path before backtracking | Rarely needed for site maps; deep content follow-up |

How Crawl4AI works under the hood (easy-to-follow)

Want to turn a whole website into clean markdown fast? This guide shows how Crawl4AI does it. Read simple steps about title picking, crawl styles, keyword ranking, and how results stream back while you watch.

Title extraction and safe filename creation

Crawl4AI finds the page title by looking for the first H1 tag. If it finds none, it falls back to the URL. Then it makes a safe filename. It keeps only letters, numbers and a few symbols. Other characters become underscores. This keeps each file name valid when you save the markdown export.
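A sketch of both steps, assuming a simple regex-based H1 lookup (the real parser may be more robust):

```python
import re

def page_title(html: str, url: str) -> str:
    """First <h1> text if present, else fall back to the URL."""
    m = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.I | re.S)
    return m.group(1).strip() if m else url

def safe_filename(title: str, max_len: int = 80) -> str:
    """Keep letters, digits, dash, underscore; everything else
    becomes an underscore, as described above."""
    name = re.sub(r"[^A-Za-z0-9_-]", "_", title).strip("_")
    return (name or "page")[:max_len] + ".md"
```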

Three crawl strategies: Breadth-First, Best-First, Depth-First

| Strategy | How it works | Best for |
| --- | --- | --- |
| BFS (Breadth-First) | Explore all links at one depth before going deeper | Site maps and wide coverage |
| Best-First | Score pages by keyword relevance and visit high-score pages first | Targeted web scraping with keywords |
| DFS (Depth-First) | Go deep along a path before backtracking | Deep article chains (rare) |

Keyword scoring for Best-First ranking

In the best-first search, pages get a score from the keywords you give. Pages with more matching keywords get higher scores. The crawler uses this score to pick the next page. This makes targeted scraping quick and smart.
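A minimal version of that scoring might look like this (hypothetical formula; the real scorer may weight matches differently):

```python
def keyword_score(text: str, keywords: list) -> float:
    """Fraction of the given keywords that appear in the page text."""
    if not keywords:
        return 0.0
    text = text.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)

def pick_next(frontier, keywords):
    """Best-first: fetch the highest-scoring candidate page first."""
    return max(frontier, key=lambda page: keyword_score(page["text"], keywords))
```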

Async crawling loop, progress callbacks, and result streaming

The crawler runs as an async loop. Pages come back in chunks. Each chunk updates a progress UI with a callback. This shows live results and a percent done. When finished, you can download a zip of markdown files. The code also supports verbose logs and skipping external links.
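A stripped-down sketch of that loop, with a stand-in fetcher in place of a real headless browser:

```python
import asyncio

async def fetch(url):
    """Hypothetical page fetcher; the real tool drives a browser."""
    await asyncio.sleep(0)  # stand-in for network I/O
    return {"url": url, "markdown": f"# {url}"}

async def crawl_stream(urls, on_progress):
    """Collect pages one by one, reporting percent done via callback."""
    results = []
    for i, url in enumerate(urls, start=1):
        page = await fetch(url)
        results.append(page)
        on_progress(i / len(urls) * 100, page["url"])
    return results
```

The UI wires `on_progress` to a progress bar, which is what makes long crawls feel live.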

Try Crawl4AI now. Get it on GitHub (search for Go Fetch) and start converting sites to markdown. It saves time for any web crawler or web scraping task.

Using the UI: settings, preview and export options

Want to turn a website into easy-to-read files? The UI makes it simple. Enter a URL, pick how many pages to grab, and set how deep the crawler should go. You can choose the crawl strategy: BFS (breadth-first), best-first search, or DFS (depth-first). Check the box to include external links or add keywords to guide the crawl. There is also a verbose option to show more detail while it runs.

Sidebar controls: URL, max pages, max depth, strategy, external links

The sidebar is where you build the job. Type the site address. Use sliders or boxes for max pages and max depth. Pick the strategy from the dropdown. Toggle external links to collect outside pages. Add keywords to favor pages that match them. These settings form the Crawl4AI crawler rules.

Previewing results, download as single Markdown or zipped files

After the run, open the preview area to read pages as Markdown. The app shows progress as pages arrive. You get two download choices: one big combined Markdown file, or a ZIP with each page as its own Markdown. Note: sometimes the app saves one fewer page than requested. If that happens, add one extra to your max pages as a quick fix.

Tips for organizing exported files (flat ZIP vs. nested paths)

By default, exports are flat — every page is one file in the ZIP. You can also create nested folders that match the site path. Nested folders keep related pages together and feel like the original site. Flat ZIPs are quick and easy to open.

| Flat ZIP | Nested paths | When to use |
| --- | --- | --- |
| All files in one folder | Files in folders by URL path | Quick review or small sites |
| Fast to create | More organized; mirrors the site | Large sites or long-term archives |
| Easy to search locally | Helps rebuild structure | When you need clear sections |
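Both layouts can come from the same export routine. The sketch below (a hypothetical helper, standard library only) switches between them with one flag:

```python
import io
import zipfile
from urllib.parse import urlparse

def export_zip(pages, nested=False):
    """Write one .md file per page into an in-memory ZIP.

    `pages` maps URL -> markdown. Flat mode folds the URL path into
    the filename; nested mode mirrors the path as folders.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for url, md in pages.items():
            path = urlparse(url).path.strip("/") or "index"
            name = path + ".md" if nested else path.replace("/", "_") + ".md"
            zf.writestr(name, md)
    buf.seek(0)
    return buf
```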

Ready to try it? Use Crawl4AI now to crawl a site and get clean markdown export files for editing or archiving.

Quick start, common quirks and where to get the code

Want to turn a whole website into neat notes fast? Try the Crawl4AI tool. It is an open-source web crawler made to grab pages and do a clean markdown export. You give it a URL, set how many pages and how deep to go, then press crawl. It can include external links and use keywords to guide the crawl. The app shows progress as it runs, and then lets you download either a single long markdown file or a zip of separate markdown files. This is handy for simple web scraping tasks.

Recommended starter settings (e.g., pages=5, depth=2, Best-First)

Use these simple starter settings: pages = 5, depth = 2, strategy = Best-First. Best-First uses a keyword score to pick pages, so it is the most useful for grabbing the best content fast. Do not confuse it with BFS (breadth-first) or DFS (depth-first), which follow link order rather than keyword relevance.

Known issues: off-by-one page count and depth-first limitations

Two quirks to watch for. First, the tool sometimes returns one less page than the max you set. This is an index off-by-one bug. Second, the depth-first mode may be unreliable. The best-first mode and external link option work well.

Workarounds and simple fixes (adjust page count or inspect index)

Quick fixes: add +1 to your page limit to avoid the off-by-one issue. If you need a true depth-first crawl, check the code for the index loop and fix the off-by-one. For most jobs, use Best-First and keywords — it finds the useful pages fast.

If the crawler gives one less page, add one to the page count as a quick fix.

Download and run Crawl4AI now from the Go Fetch GitHub repo

Want to try it? Get the code named Go Fetch on GitHub. Clone or download the repo, run the setup, and start crawling. This is a simple way to collect site content as markdown for notes, research, or training data. Try it and see how Crawl4AI speeds up your web scraping tasks.
