Language models can reliably rewrite and improve business documents, but only if the user gives the machine very specific instructions. Without precise guidelines, artificial intelligence tools often introduce factual errors and awkward phrasing, which suggests that professional human editors are still needed for workplace communications. These findings were recently published in a writing research journal.
The rapid adoption of generative artificial intelligence is causing widespread anxiety in the writing and publishing industries. Many copywriters and translators worry that automated tools will eventually make their profession obsolete. More and more organizations are turning to digital tools for business communications, marketing materials, and internal reporting.
Previous experiments have shown that language models like ChatGPT can increase productivity and improve grammar for basic writing tasks. However, writing on behalf of an organization is different from writing an expressive personal essay. Organizational documents serve as a collective output that expresses a company’s identity and facilitates day-to-day operations.
Creating these documents requires an understanding of workplace dynamics, technical regulations, and the company’s preferred tone. Corporate documents often have multiple authors, which can lead to inconsistent messages. Companies frequently hire professional outside editors to disentangle these conflicting voices and simplify complex legal or technical information for everyday readers.
Daniël Janssen, a researcher at Utrecht University in the Netherlands, wanted to know whether this kind of editorial intuition could be replicated in a machine. Janssen and his colleagues Henri Raven, Lisanne van Weelden, and Johannes den Hertog designed an experiment to compare the software with experienced human experts. They sought to determine whether the software could independently apply the same level of nuance and audience awareness to everyday corporate documents.
The research team divided the experiment into two phases. In the first phase, the researchers observed three professional editors, each with over 20 years of industry experience. They gave the editors four Dutch business letters and asked them to make the sentences “good.” The original letters came from a variety of organizations and covered topics such as maternity leave policies, sick pay, and scheduling.
The researchers recorded the editors’ computer screens as they worked. Immediately after the revisions were completed, the study authors interviewed the editors. They used a technique called stimulated recall, in which the editors watched the screen recordings and described what they had been thinking as they typed. The editors consistently focused on improving the overall tone, replacing formal jargon with familiar language, and reorganizing the text so that the most urgent information appeared at the top of the page.
In the second phase, the researchers asked ChatGPT to rewrite the same four letters. They used three different prompts to see how different instruction strategies affected the machine’s output. The first prompt was intentionally simple, asking the software to make the text “reader-focused.”
The second prompt asked the software to rewrite the text at the “B1” language level. This label refers to the Common European Framework of Reference for Languages, in which a B1 rating represents intermediate language proficiency. It is the standard reading level that most mass-market communications aim for. The third prompt was a specialized eight-step instruction designed to simulate the workflow the human editors had described during the interviews.
To evaluate the results, the researchers used specialized text analysis software to check the readability of the Dutch texts. This digital tool measured syntax, semantic meaning, and the level of personal engagement in the text. The researchers also conducted a qualitative review to check each draft for factual accuracy and appropriate wording.
The human editors greatly improved the readability of the original letters. They used shorter sentences, incorporated active verbs, and increased the use of personal pronouns such as “you” and “we.” Additionally, the human revisions were free of factual errors and preserved the legal intent of the organizational documents.
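Two of the surface features mentioned here, sentence length and the use of personal pronouns, are simple enough to approximate in a few lines of Python. The sketch below is only an illustration and not the software used in the study: the function name `readability_features`, the English pronoun list, and both sample sentences are made up for this example, whereas the study analyzed Dutch letters with a dedicated tool.

```python
import re

# Hypothetical pronoun list for illustration only; the study analyzed Dutch
# texts with dedicated software whose word lists are not given in the article.
PERSONAL_PRONOUNS = {"you", "your", "we", "our", "us"}


def readability_features(text: str) -> dict:
    """Return two crude readability signals for a piece of English text."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())
    pronouns = sum(1 for w in words if w in PERSONAL_PRONOUNS)
    return {
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "pronoun_share": pronouns / len(words) if words else 0.0,
    }


# Invented sample sentences, not taken from the letters in the study.
original = "Employees are hereby notified that remuneration shall be continued during absence."
revised = "You keep your pay while you are away. We will confirm the dates."

print(readability_features(original))
print(readability_features(revised))
```

On these invented samples, the revised version has shorter sentences on average and a higher share of personal pronouns, which is the direction of change the article attributes to the human editors. A real readability tool would also look at vocabulary and syntax, which this sketch ignores.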
The artificial intelligence’s performance varied greatly depending on the instructions it received. Given a specific instruction to write at the B1 reading level, ChatGPT performed very well. This version achieved readability scores very close to those of the human editors’ work. The B1 prompt succeeded in shortening complex passages and simplifying vocabulary without changing the original meaning.
Conversely, the simple instruction to make the text reader-focused produced poor results. The software kept complex sentence structures and relied heavily on unfamiliar words. Even more problematic, this basic prompt led the machine to fabricate false information.
For example, in a letter discussing employee maternity and sick pay, the simple prompt generated a sentence congratulating the employer on its upcoming team expansion. This reflected a fundamental misunderstanding of the workplace situation. Such a congratulation is completely inappropriate for an HR document, since the baby is not joining the corporate team as a new employee.
The complex eight-step prompt also performed poorly compared with the B1 prompt and the human editors. Although it improved the visual layout of the text, it introduced multiple factual errors regarding the payment of certain medical benefits. Feeding the machine too many individual revision steps at once can cause the software to lose track of the core message.
The experiment has some limitations. The study was based on a very small number of business letters. Rewriting requirements vary widely depending on the type of document, such as press releases or consumer instruction manuals. The results for these short administrative messages may not reflect how the system handles longer, more complex reports.
The software also generated each response in a single attempt. In a real workplace setting, users might adjust their prompts, regenerate the text multiple times, or manually edit the machine’s first draft. Rather than testing how well humans and algorithms work together, the study evaluated human and machine output separately.
Future research could explore these collaborative workflows. The study authors suggest that the role of professional writers is changing. Rather than creating documents completely from scratch, experts increasingly serve as curators and directors of automated drafts.
This shift calls for a skill known as prompt engineering, in which writers learn how to feed specific context cues to the machine. Evaluating synthetic prose requires the same abilities used to evaluate human writing, such as judging rhetorical suitability and verifying sources. Effective writing may come to depend not only on traditional language proficiency but also on the ability to monitor and steer text-generating models.
The study, “Can ChatGPT do the same thing? ChatGPT compared to professional editors,” was authored by Daniël Janssen, Henri Raven, Lisanne van Weelden, and Johannes den Hertog.

