Fix Messy Text Copied from PDFs
Copying out of a PDF gives you a line break after every visual line, words chopped in half by hyphens, and spaces drifting around punctuation. Paste that text below and get back flowing paragraphs — with your lists, headings, and wording untouched.
🔒 Your text is processed locally in your browser. It is never uploaded or stored.
1. Paste your copied text
2. Pick which fixes to apply
3. Clean
4. Copy the result
What goes in, what comes out
Pasted from a PDF
The results of our experi-
ment show that the compu-
tational method works well
across all test cases.
Key points :
- Accuracy improved by 12 %
- Latency stayed flat
After cleaning
The results of our experiment show that the computational method works well across all test cases. Key points:
- Accuracy improved by 12%
- Latency stayed flat
Why copied PDF text breaks, and what this page does about it
A PDF stores text as positioned lines on a page, not as paragraphs. When you copy it, every visual line ends with a hard line break, and any word the typesetter split at a line end keeps its hyphen. OCR output from scanned papers and reports has the same shape, plus uneven spacing. Searches such as “remove line breaks from copied PDF text”, “fix broken PDF text”, or “remove hyphenation” all describe this one underlying problem.
This page fixes it with a fixed set of rules, each one inspectable in the options above: hard-wrapped lines inside a paragraph are merged back together; a word fragment ending in a hyphen at a line end is rejoined with the lowercase fragment that follows it; runs of spaces collapse to one; spaces before commas and periods are removed and missing spaces after sentence punctuation are added; three or more blank lines shrink to one blank line. That is the complete list — nothing else is changed.
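The rules in this list can be sketched in a few lines of JavaScript. This is an illustrative approximation, not the page's actual source: the function names `cleanPdfText` and `tidyParagraph` and every regular expression are assumptions. It also skips the list and heading protection described in the next paragraph, so it suits plain prose only.

```javascript
// Hypothetical sketch of the listed rules; not the page's real implementation.
function tidyParagraph(paragraph) {
  return paragraph
    // Merge hard-wrapped lines inside the paragraph.
    .replace(/[ \t]*\n[ \t]*/g, " ")
    // Collapse runs of spaces to one.
    .replace(/ {2,}/g, " ")
    // Remove spaces before commas and periods.
    .replace(/ +([,.])/g, "$1")
    // Add a missing space after sentence punctuation. Naive on purpose: it
    // only fires between a lowercase letter and a capital ("end.Next").
    .replace(/([a-z][.!?])([A-Z])/g, "$1 $2");
}

function cleanPdfText(input) {
  // Normalize line endings so the rules below only see "\n".
  let text = input.replace(/\r\n?/g, "\n");

  // Rejoin a fragment ending in a hyphen at a line end with the lowercase
  // fragment that follows it ("experi-\nment" becomes "experiment").
  text = text.replace(/([A-Za-z])-\n[ \t]*([a-z])/g, "$1$2");

  // Three or more blank lines shrink to one blank line.
  text = text.replace(/\n(?:[ \t]*\n){3,}/g, "\n\n");

  // Split on blank lines but keep the separators (capturing group), so that
  // paragraph boundaries pass through untouched at the odd indices.
  const parts = text.split(/(\n(?:[ \t]*\n)+)/);
  return parts
    .map((part, i) => (i % 2 === 1 ? part : tidyParagraph(part)))
    .join("");
}
```

Calling `cleanPdfText("experi-\nment show that the compu-\ntational method")` returns the three fragments as one line of flowing text. The hyphen rule runs first so that a split word is rejoined without a space before the remaining line breaks become spaces.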
Merging is deliberately cautious. A line that looks like a bullet point, a numbered item, an ALL-CAPS heading, indented code, or a table row is kept on its own line. If your text is mostly lists or tables, switch on conservative mode and the tool will only normalize line endings, strip trailing spaces, fix hyphen-split words, and trim extra blank lines.
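A minimal sketch of that caution, assuming simple heuristics for what "looks like" a protected line. The names `isProtectedLine`, `mergeCautiously`, and `conservativeClean` are hypothetical, and the real checks may well differ.

```javascript
// Heuristic sketch only: every name and pattern here is an assumption.
function isProtectedLine(line) {
  const text = line.trim();
  if (text === "") return false;
  const bullet = /^[-•*]\s+/.test(text);
  const numbered = /^(?:\d+[.)]|\([a-z0-9]+\))\s+/i.test(text);
  // ALL-CAPS heading: at least two capitals and no lowercase letters.
  const allCaps = /[A-Z].*[A-Z]/.test(text) && text === text.toUpperCase();
  // Indented code: four or more leading spaces, or a leading tab.
  const indentedCode = /^(?: {4,}|\t)/.test(line);
  // Table row: a pipe, or three or more cells separated by tabs or wide gaps.
  const tableRow = text.includes("|") || text.split(/\t| {2,}/).length >= 3;
  return bullet || numbered || allCaps || indentedCode || tableRow;
}

// Merge hard wraps inside one paragraph (no blank lines expected), but never
// merge a protected line, and never merge into one.
function mergeCautiously(paragraph) {
  const out = [];
  let previousProtected = true; // nothing to merge into yet
  for (const line of paragraph.split("\n")) {
    const currentProtected = isProtectedLine(line);
    if (!currentProtected && !previousProtected) {
      out[out.length - 1] += " " + line.trim();
    } else {
      out.push(line);
    }
    previousProtected = currentProtected;
  }
  return out.join("\n");
}

// Conservative mode: only the four fixes named above.
function conservativeClean(input) {
  return input
    .replace(/\r\n?/g, "\n")                         // normalize line endings
    .replace(/[ \t]+$/gm, "")                        // strip trailing spaces
    .replace(/([A-Za-z])-\n[ \t]*([a-z])/g, "$1$2")  // fix hyphen-split words
    .replace(/\n{4,}/g, "\n\n");                     // trim extra blank lines
}
```

In this sketch a line that follows a bullet is also left alone, on the assumption that wrongly merging a list is worse than leaving one extra line break.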
Common questions
Does my text get uploaded anywhere?
No. We do not upload or store your text. The cleaning rules ship with this page as JavaScript and run on your device; there is no server that receives your text, and nothing is kept after you close the tab.
How are line breaks removed without destroying my paragraphs?
A single line break is treated as a hard wrap and merged away; a blank line (two or more breaks) is treated as a real paragraph boundary and kept. Untick “Keep blank-line paragraph breaks” if you want everything merged into one block.
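The distinction between a hard wrap and a paragraph boundary might look like this in code. It is a sketch under stated assumptions: `mergeLineBreaks` and its `keepParagraphBreaks` parameter are hypothetical names standing in for the checkbox, and for brevity any run of blank lines is treated as a single boundary.

```javascript
// Hypothetical sketch: single breaks are wraps, blank lines are boundaries.
function mergeLineBreaks(text, keepParagraphBreaks = true) {
  const paragraphs = text
    .replace(/\r\n?/g, "\n")
    // One or more blank (or whitespace-only) lines separate paragraphs.
    .split(/\n(?:[ \t]*\n)+/)
    // Inside a paragraph, every remaining line break is a hard wrap.
    .map((p) => p.replace(/[ \t]*\n[ \t]*/g, " ").trim())
    .filter((p) => p.length > 0);
  // Keep the boundaries as one blank line, or merge everything into one block.
  return paragraphs.join(keepParagraphBreaks ? "\n\n" : " ");
}
```

With the default setting, `mergeLineBreaks("one\ntwo\n\nthree\nfour")` keeps two paragraphs. Passing `false` as the second argument returns a single block.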
Can it change the meaning of my text?
The rules only touch whitespace, line breaks, and hyphens at line ends. Words are never added, removed, or substituted, so the wording you paste is the wording you get back.
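One way to check that claim for yourself is to compare the text before and after cleaning with all whitespace and hyphens removed. The helper below is a hypothetical sketch, not part of the page. It is deliberately loose: because it ignores every hyphen, it cannot tell whether a hyphen was removed at a line end or elsewhere.

```javascript
// Hypothetical check: cleaning should change only whitespace and hyphens,
// so both texts must match once those characters are stripped.
function wordingUnchanged(before, after) {
  const strip = (s) => s.replace(/[\s-]/g, "");
  return strip(before) === strip(after);
}
```

For example, `wordingUnchanged("experi-\nment works", "experiment works")` is true, while any substituted, added, or dropped letter makes it false.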
When should I use conservative mode?
Use it for resumes, code snippets, contracts, poetry: anything where line positions carry meaning. It limits the tool to the four safest fixes and never merges lines into paragraphs; the only lines it joins are the two halves of a hyphen-split word.
Does it handle OCR text from scans?
Yes — OCR output has the same broken-line, split-word structure as copied PDF text, so the same repairs apply. What it won't do is guess at character-recognition mistakes (like “rn” misread as “m”), because guessing risks corrupting correct text.
What happens to my bullet points and numbered lists?
Lines starting with -, •, *, numbering like 1. or (a), or roman numerals are recognized as list items and stay on their own lines even while the paragraphs around them merge.
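A rough sketch of how those markers could be recognized with regular expressions. The patterns and the name `isListItem` are assumptions for illustration. The roman-numeral pattern is strict enough to reject ordinary words such as "mild.", although a real word that happens to be a valid numeral (for example "mix.") would still match.

```javascript
// Hypothetical list-marker patterns; each needs whitespace after the marker.
const LIST_MARKERS = [
  /^\s*[-•*]\s+/,           // -, •, *
  /^\s*\d+[.)]\s+/,         // 1. or 1)
  /^\s*\([a-z0-9]+\)\s+/i,  // (a) or (1)
  // Roman numerals followed by "." or ")", e.g. "iv." or "IX)".
  /^\s*(?=[ivxlcdm])m{0,3}(?:cm|cd|d?c{0,3})(?:xc|xl|l?x{0,3})(?:ix|iv|v?i{0,3})[.)]\s+/i,
];

function isListItem(line) {
  return LIST_MARKERS.some((pattern) => pattern.test(line));
}
```

Requiring whitespace after the marker keeps a line such as "1.5 times faster" from being mistaken for item 1.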