A transformer that uses the Mozilla Readability library to extract the main content from a web page.

Example

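// Import paths vary by LangChain.js version; these assume the @langchain/community package layout.
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { MozillaReadabilityTransformer } from "@langchain/community/document_transformers/mozilla_readability";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Load the raw HTML of the page as documents.
const loader = new CheerioWebBaseLoader("https://example.com/article");
const docs = await loader.load();

// chunkSize is the maximum number of characters per chunk.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 5000,
});
const transformer = new MozillaReadabilityTransformer();

// The sequence processes the loaded documents through the splitter and then the transformer.
const sequence = splitter.pipe(transformer);

// Invoke the sequence to transform the documents into a more readable format.
const newDocuments = await sequence.invoke(docs);

console.log(newDocuments);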

Constructors

Properties

options: Options = {}

Methods

  • Parameters

    • documents: Document<Record<string, any>>[]

    Returns Promise<Document<Record<string, any>>[]>
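
The signature above (an array of documents in, a promise of an array of documents out) can be illustrated with a small self-contained sketch. This is not the library's implementation: the `Doc` interface, the `SketchOptions` type with its `minLength` field, the `stripTags` helper, and the `TagStrippingTransformer` class are all hypothetical names made up for this example, and a naive regular expression stands in for the Readability extraction step.

```typescript
// Minimal stand-in for a document: text plus free-form metadata (assumed shape).
interface Doc {
  pageContent: string;
  metadata: Record<string, any>;
}

// Hypothetical options for this sketch only; the real Options type belongs to the library.
interface SketchOptions {
  minLength?: number; // drop documents whose extracted text is shorter than this
}

// Naive text extraction: replace tags with spaces, collapse whitespace, trim.
// Real content extraction (as done by Readability) is far more involved.
function stripTags(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

class TagStrippingTransformer {
  options: SketchOptions;

  // Options default to an empty object, mirroring the `options: Options = {}` property.
  constructor(options: SketchOptions = {}) {
    this.options = options;
  }

  // Same shape as the documented method: Doc[] in, Promise<Doc[]> out.
  async transformDocuments(documents: Doc[]): Promise<Doc[]> {
    const minLength = this.options.minLength ?? 0;
    return documents
      .map((doc) => ({
        pageContent: stripTags(doc.pageContent),
        metadata: { ...doc.metadata }, // copy so the input is not mutated
      }))
      .filter((doc) => doc.pageContent.length >= minLength);
  }
}
```

In this sketch each input document is replaced by a new object rather than modified in place, so callers can keep using the original array after the transformation.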
