Formatting text into paragraphs - GPT can't fix its own error

Hello, I asked GPT4 and GPT4 Code Interpreter to write a PHP program to format a plain text (chapter of a book) into maximum 100 paragraphs of maximum 5000 characters, with separators.
After an hour and over 20 attempts, the AI still can’t find the reason why separators appear in every sentence.

Any help is much appreciated.


<?php
// Read the content of the file
$content = file_get_contents("input.txt");

// Split the content into lines
$lines = explode("\n", trim($content));

$outputs = [];
$current_output = "";
$separator = "============================================";
$warning_separator = "====================TEXTE=TROP=LONG=======================";

foreach ($lines as $line) {
    // Check if the line is empty, signaling a new paragraph
    if (trim($line) == "") {
        $outputs[] = trim($current_output);
        $current_output = "";
        continue;
    }

    // If adding the line (and a potential separator if not the first line in current output) would exceed the limit
    if (strlen($current_output . $line . ($current_output != "" ? $separator : "")) > 4950) {
        $outputs[] = trim($current_output);
        $current_output = $line;
    } else {
        $current_output .= ($current_output != "" ? $separator : "") . $line;
    }

    // Limit the number of paragraphs to 100
    if (count($outputs) == 100) {
        break;
    }
}

// If there are still unprocessed lines after creating 100 output paragraphs, append the warning separator
if (count($outputs) == 100) {
    $outputs[99] = $warning_separator;
} elseif ($current_output != "") {
    $outputs[] = trim($current_output);
}

// Display the outputs
echo implode("\n\n", $outputs);

?>

What would it do if you asked it in two parts? First ask for 100 paragraphs, then feed it back its own result and ask it to put a separator between each paragraph.
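
The same two-part idea can be applied to the script itself. Here is a rough sketch, not a definitive implementation: the function names `split_into_paragraphs` and `join_with_separator` are made up for illustration, and the 5000-character and 100-paragraph limits are left out to keep it short.

```php
<?php
// Hypothetical helper, step 1: split raw text into paragraphs on blank lines.
function split_into_paragraphs(string $text): array
{
    // Normalize Windows/old Mac line endings, then split on blank lines.
    $text = str_replace(["\r\n", "\r"], "\n", $text);
    $parts = preg_split('/\n\s*\n/', trim($text));

    $paragraphs = [];
    foreach ($parts as $part) {
        // Join the lines inside one paragraph with a single space.
        $joined = trim(preg_replace('/\s*\n\s*/', ' ', $part));
        if ($joined !== '') {
            $paragraphs[] = $joined;
        }
    }
    return $paragraphs;
}

// Hypothetical helper, step 2: put the separator between paragraphs only.
function join_with_separator(array $paragraphs, string $separator): string
{
    return implode("\n" . $separator . "\n", $paragraphs);
}

$sample = "a\nb\nc\nd\n\ne";
echo join_with_separator(split_into_paragraphs($sample), '#'), "\n";
```

Keeping the two steps apart means the separator logic never sees individual lines, so it cannot end up inside a paragraph.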

I’m… going to go with “That’s what you told the script to do”?

You told the script to put a separator in whenever a line is added to a paragraph that already contains text and is still under the character limit. So I assume that’s what it’s doing.

Your script defines a paragraph ending as an empty line between it and the next piece of text, or the character limit (4950 in your code), whichever comes first.

If your input is

a
b
c
d

e

and your separator is #, you should get

a#b#c#d

e

as an output.
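
To check this, the core loop of the posted script can be reduced to a small function. This is only a sketch: `format_like_first_script` is a name invented here, and the length limit and the 100-paragraph cap are dropped since they don't affect the example.

```php
<?php
// Hypothetical function wrapping the core loop of the posted script
// (character limit and 100-paragraph cap omitted for brevity).
function format_like_first_script(string $content, string $separator): string
{
    $outputs = [];
    $current = '';
    foreach (explode("\n", trim($content)) as $line) {
        if (trim($line) === '') {
            // Blank line: close the current paragraph.
            $outputs[] = trim($current);
            $current = '';
            continue;
        }
        // The separator goes between consecutive lines of the SAME paragraph.
        $current .= ($current !== '' ? $separator : '') . $line;
    }
    if ($current !== '') {
        $outputs[] = trim($current);
    }
    // Paragraphs themselves are joined with a blank line, not the separator.
    return implode("\n\n", $outputs);
}

echo format_like_first_script("a\nb\nc\nd\n\ne", '#'), "\n";
```

The separator is added in the line-appending step and the blank line in the final `implode`, which is the reverse of what was wanted.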


It’s what GPT4 told the script to do. I only told GPT what I wanted.
I’ve used it a lot for business research, but this coding experience shows how limited an AI that cannot think is.
Finally, by having the AI describe what each line of the script does, I was able to point it to the unwanted instruction you identified. So now it works fine; it even preserves full sentences.
The only remaining issue is that it can’t produce double line breaks that actually show up in the output, which is why I used the long ======= separator.


<?php
// Read the content of the file
$content = file_get_contents("input.txt");

// Split the content into lines
$lines = explode("\n", trim($content));

$outputs = [];
$current_output = "";
$separator = "============================================";
$warning_separator = "====================TEXTE=TROP=LONG=======================";

foreach ($lines as $line) {
    // Check if the line is empty, signaling a new paragraph
    if (trim($line) == "") {
        $outputs[] = trim($current_output);
        $current_output = "";
        continue;
    }

    // If adding the line to the current output exceeds the character limit
    if (strlen($current_output . " " . $line) > 4000) {
        // Find the last end-of-sentence punctuation before the limit
        $last_period_pos = strrpos($current_output, '.');
        $last_question_pos = strrpos($current_output, '?');
        $last_exclamation_pos = strrpos($current_output, '!');
        
        $last_punctuation_pos = max($last_period_pos, $last_question_pos, $last_exclamation_pos);

        if ($last_punctuation_pos === false) {
            // If there's no suitable punctuation, just use the previous logic (this is a fallback)
            $last_punctuation_pos = strrpos($current_output, ' ');
        }
        
        $part_to_next_output = substr($current_output, $last_punctuation_pos + 1);
        $current_output = substr($current_output, 0, $last_punctuation_pos + 1);

        $outputs[] = trim($current_output);
        $current_output = $part_to_next_output . " " . $line;
    } else {
        $current_output .= ($current_output != "" ? " " : "") . $line;
    }

    // Limit the number of paragraphs to 100
    if (count($outputs) == 100) {
        break;
    }
}

// If the 100-paragraph limit was reached, overwrite the last paragraph with the warning separator
if (count($outputs) == 100) {
    $outputs[99] = $warning_separator;
} elseif ($current_output != "") {
    $outputs[] = trim($current_output);
}

// Display the outputs
echo implode($separator, $outputs);

?>

Any idea how to fix this?

A browser doesn’t render CR/LF characters as line breaks; in HTML they count as “white space” in the code and are collapsed when the page is rendered. If you’re trying to output to the browser and have it render the break, you’ll need to use an HTML element for that purpose, such as <br>. (Better would be to wrap each block in an actual paragraph tag, <p>.)
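
A minimal sketch of the paragraph-tag approach, assuming the script already has its blocks in an array like `$outputs`. The helper name `render_paragraphs_as_html` is made up for this example; `htmlspecialchars` is the standard PHP function for escaping text before it goes into HTML.

```php
<?php
// Hypothetical helper: render each text block as an HTML paragraph.
function render_paragraphs_as_html(array $paragraphs): string
{
    $html = '';
    foreach ($paragraphs as $paragraph) {
        // Escape special characters so the text is not read as markup.
        $html .= '<p>' . htmlspecialchars($paragraph, ENT_QUOTES, 'UTF-8') . "</p>\n";
    }
    return $html;
}

echo render_paragraphs_as_html(['First block.', 'Second block with <angle> brackets.']);
```

In the script above, this would presumably replace the final `echo implode(...)` line. If only visible line breaks are needed, PHP's built-in `nl2br()` is a simpler alternative that inserts `<br />` before each newline.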
