
Use Markdown to optimize websites for AI crawlers

AI crawlers read and analyze a website in a fraction of a second. The process becomes faster still if TYPO3 returns Markdown instead of HTML to the AI bot.

After all, Markdown carries none of the visual or functional markup and focuses on the content, so the bulk of the HTML tags never has to be transmitted.

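Setting this up requires only a small piece of middleware in your site template extension. The middleware converts the element with the id "content" (or the whole page if that element is missing). In the example below, the website returns Markdown instead of HTML whenever the requesting browser or crawler sends text/markdown in its Accept header: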

<?php

declare(strict_types=1);

namespace In2code\In2template\Middleware;

use League\HTMLToMarkdown\HtmlConverterInterface;
use Psr\Http\Message\ResponseInterface;
use Psr\Http\Message\ServerRequestInterface;
use Psr\Http\Message\StreamFactoryInterface;
use Psr\Http\Server\MiddlewareInterface;
use Psr\Http\Server\RequestHandlerInterface;

/**
 * Returns a Markdown representation of the rendered page when the client
 * announces `Accept: text/markdown`. The HTML pipeline runs unchanged; this
 * middleware only post-processes the resulting body.
 */
final class MarkdownContentNegotiation implements MiddlewareInterface
{
    private const string ACCEPT_TOKEN = 'text/markdown';
    private const string CONTENT_WRAPPER_ID = 'content';
    private const int TOKEN_CHAR_RATIO = 4;

    public function __construct(
        private readonly HtmlConverterInterface $htmlConverter,
        private readonly StreamFactoryInterface $streamFactory,
    ) {
    }

    public function process(ServerRequestInterface $request, RequestHandlerInterface $handler): ResponseInterface
    {
        $response = $handler->handle($request);
        if ($this->isMarkdownNegotiated($request) && $this->isHtmlResponse($response)) {
            $response = $this->convertResponseToMarkdown($response);
        }
        return $response->withAddedHeader('Vary', 'Accept');
    }

    private function isMarkdownNegotiated(ServerRequestInterface $request): bool
    {
        return str_contains($request->getHeaderLine('Accept'), self::ACCEPT_TOKEN);
    }

    private function isHtmlResponse(ResponseInterface $response): bool
    {
        return str_contains($response->getHeaderLine('Content-Type'), 'text/html');
    }

    private function convertResponseToMarkdown(ResponseInterface $response): ResponseInterface
    {
        $html = (string)$response->getBody();
        $contentFragment = $this->extractContentFragment($html);
        $markdown = trim($this->htmlConverter->convert($contentFragment));
        return $response
            ->withHeader('Content-Type', 'text/markdown; charset=utf-8')
            ->withHeader('X-Markdown-Tokens', (string)$this->estimateTokenCount($markdown))
            ->withHeader('ETag', '"' . md5($markdown) . '"')
            ->withoutHeader('Content-Length')
            ->withBody($this->streamFactory->createStream($markdown));
    }

    private function extractContentFragment(string $html): string
    {
        $fragment = $html;
        $previousState = libxml_use_internal_errors(true);
        $dom = new \DOMDocument();
        if ($dom->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_NOERROR | LIBXML_NOWARNING)) {
            $contentNode = $dom->getElementById(self::CONTENT_WRAPPER_ID);
            if ($contentNode !== null) {
                $fragment = (string)$dom->saveHTML($contentNode);
            }
        }
        libxml_clear_errors();
        libxml_use_internal_errors($previousState);
        return $fragment;
    }

    private function estimateTokenCount(string $markdown): int
    {
        return (int)ceil(mb_strlen($markdown) / self::TOKEN_CHAR_RATIO);
    }
}
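The middleware above only runs once TYPO3 knows about it. TYPO3 reads PSR-15 middleware registrations from Configuration/RequestMiddlewares.php inside an extension. The following is a minimal sketch of such a file; the identifier in2code/in2template/markdown-content-negotiation and the position after typo3/cms-frontend/site are assumptions made for illustration, not taken from the original setup, and may need adjusting to your middleware stack.

```php
<?php

declare(strict_types=1);

use In2code\In2template\Middleware\MarkdownContentNegotiation;

// Configuration/RequestMiddlewares.php of the site template extension.
return [
    'frontend' => [
        // Hypothetical identifier; any unique vendor/extension/name string works.
        'in2code/in2template/markdown-content-negotiation' => [
            'target' => MarkdownContentNegotiation::class,
            // Assumed position: somewhere after site resolution.
            'after' => [
                'typo3/cms-frontend/site',
            ],
        ],
    ],
];
```

After adding or changing this file, flush the TYPO3 caches so the middleware stack is rebuilt.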

For this to work, you also need the third-party package league/html-to-markdown, which you can add via composer.json:

{
    "name": "in2code/cms-boilerplate",
    "description": "in2code GmbH TYPO3 CMS Boilerplate",
    "license": "GPL-2.0",
    "require": {
        "league/html-to-markdown": "^5.1",
        ...
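Because the middleware type-hints HtmlConverterInterface, the dependency injection container has to know which concrete class to inject. How this is wired is not shown here, so the following is only a sketch under assumptions: it uses a Configuration/Services.php file (the same can be expressed in Services.yaml), assumes the extension's existing service configuration already autowires its own classes, and picks the strip_tags option purely as an example.

```php
<?php

declare(strict_types=1);

use League\HTMLToMarkdown\HtmlConverter;
use League\HTMLToMarkdown\HtmlConverterInterface;
use Symfony\Component\DependencyInjection\Loader\Configurator\ContainerConfigurator;

// Configuration/Services.php of the site template extension (assumed location).
return static function (ContainerConfigurator $configurator): void {
    $services = $configurator->services();

    // Register the concrete converter; the options array is its first
    // constructor argument. strip_tags drops tags without a Markdown equivalent
    // but keeps their text content.
    $services->set(HtmlConverter::class)
        ->args([['strip_tags' => true]]);

    // Resolve the interface used in the middleware constructor to that service.
    $services->alias(HtmlConverterInterface::class, HtmlConverter::class);
};
```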

Tip: You can test for yourself whether it actually works using a simple curl command:

curl -ks -H 'Accept: text/markdown' https://local.website.de/
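The curl call sends a bare Accept: text/markdown. Real clients may send a weighted list instead, and a plain substring check would also match text/markdown;q=0, which by HTTP semantics means "not acceptable". If that matters for your site, a stricter check could look like the sketch below. The function acceptsMarkdown is a hypothetical helper and not part of the middleware above; it deliberately ignores wildcards such as text/* and */* so that ordinary browsers keep receiving HTML.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns true only if the Accept header explicitly
 * lists text/markdown with a quality value greater than 0.
 */
function acceptsMarkdown(string $acceptHeader): bool
{
    foreach (explode(',', $acceptHeader) as $range) {
        // Split "type/subtype;param=value;..." into media type and parameters.
        $parts = array_map('trim', explode(';', $range));
        $mediaType = strtolower((string)array_shift($parts));
        if ($mediaType !== 'text/markdown') {
            continue;
        }

        // A missing q parameter means full preference (q=1).
        $quality = 1.0;
        foreach ($parts as $parameter) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $parameter, $matches) === 1) {
                $quality = (float)$matches[1];
            }
        }

        // The first explicit text/markdown entry decides.
        return $quality > 0.0;
    }

    return false;
}
```

With this helper, acceptsMarkdown('text/html, text/markdown;q=0.9') returns true, while acceptsMarkdown('text/markdown;q=0, text/html') returns false.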
