Hello, I asked GPT4 and GPT4 Code Interpreter to write a PHP program that formats plain text (a chapter of a book) into at most 100 paragraphs of at most 5000 characters each, with separators between them.
After an hour and over 20 attempts, the AI still can’t find the reason why separators appear after every sentence.
Any help is much appreciated.
<?php
// Read the content of the file
$content = file_get_contents("input.txt");
// Split the content into lines
$lines = explode("\n", trim($content));
$outputs = [];
$current_output = "";
$separator = "============================================";
$warning_separator = "====================TEXTE=TROP=LONG=======================";
foreach ($lines as $line) {
    // Check if the line is empty, signaling a new paragraph
    if (trim($line) == "") {
        $outputs[] = trim($current_output);
        $current_output = "";
        continue;
    }
    // If adding the line (and a potential separator if not the first line in current output) would exceed the limit
    if (strlen($current_output . $line . ($current_output != "" ? $separator : "")) > 4950) {
        $outputs[] = trim($current_output);
        $current_output = $line;
    } else {
        $current_output .= ($current_output != "" ? $separator : "") . $line;
    }
    // Limit the number of paragraphs to 100
    if (count($outputs) == 100) {
        break;
    }
}
// If the 100-paragraph limit was reached, replace the 100th paragraph with the warning separator
if (count($outputs) == 100) {
    $outputs[99] = $warning_separator;
} elseif ($current_output != "") {
    $outputs[] = trim($current_output);
}
// Display the outputs
echo implode("\n\n", $outputs);
?>
What would it do if you asked it in two parts? First ask for 100 paragraphs, then feed it back its own result and ask it to put a separator between each paragraph.
I’m… going to go with “That’s what you told the script to do”?
You told the script to put a separator in whenever the current paragraph is shorter than 5000 characters and contains more than one line from the input. So I assume that’s what it’s doing.
Your script defines a paragraph ending as having an empty line between it and the next piece of text, or 5000 letters, whichever comes first.
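To make the difference concrete, here is a minimal sketch. The two function names are made up for illustration and are not part of the posted script. The first helper mirrors the `.=` logic in the script, where the separator is glued between lines; the second is what was presumably intended, with lines joined by a space and the separator applied only between finished paragraphs.

```php
<?php
// Hypothetical helpers for illustration only; the names are not from the original script.

// Mirrors the posted logic: the separator is inserted between every pair of lines.
function joinWithSeparatorPerLine(array $lines, string $separator): string
{
    $current = "";
    foreach ($lines as $line) {
        $current .= ($current !== "" ? $separator : "") . $line;
    }
    return $current;
}

// Presumed intent: lines inside a paragraph are joined with a space,
// and the separator is applied only between paragraphs.
function joinWithSeparatorPerParagraph(array $paragraphs, string $separator): string
{
    $joined = [];
    foreach ($paragraphs as $lines) {
        $joined[] = implode(" ", $lines);
    }
    return implode($separator, $joined);
}

$sep = "====";
// Prints: First line.====Second line.
echo joinWithSeparatorPerLine(["First line.", "Second line."], $sep), "\n";
// Prints: First line. Second line.====Next paragraph.
echo joinWithSeparatorPerParagraph([["First line.", "Second line."], ["Next paragraph."]], $sep), "\n";
```

If the input file has one sentence per line, the first variant is exactly what produces a separator after every sentence.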
It’s what GPT4 told the script to do. I only told GPT what I wanted.
I’ve used it a lot for business research, but this coding experience shows how limited an AI that cannot think really is.
Finally, by having the AI describe what each line of the script does, I was able to point out the unwanted instruction that you identified. So now it works fine; it even preserves full sentences.
The only remaining issue is that it can’t produce double line breaks that are visible in the output, which is why I used the long ======= separator.
<?php
// Read the content of the file
$content = file_get_contents("input.txt");
// Split the content into lines
$lines = explode("\n", trim($content));
$outputs = [];
$current_output = "";
$separator = "============================================";
$warning_separator = "====================TEXTE=TROP=LONG=======================";
foreach ($lines as $line) {
    // Check if the line is empty, signaling a new paragraph
    if (trim($line) == "") {
        $outputs[] = trim($current_output);
        $current_output = "";
        continue;
    }
    // If adding the line to the current output exceeds the character limit
    if (strlen($current_output . " " . $line) > 4000) {
        // Find the last end-of-sentence punctuation in the current output
        $last_period_pos = strrpos($current_output, '.');
        $last_question_pos = strrpos($current_output, '?');
        $last_exclamation_pos = strrpos($current_output, '!');
        $last_punctuation_pos = max($last_period_pos, $last_question_pos, $last_exclamation_pos);
        if ($last_punctuation_pos === false) {
            // If there's no suitable punctuation, fall back to splitting at the last space
            $last_punctuation_pos = strrpos($current_output, ' ');
        }
        // Everything after the split point is carried over into the next output
        $part_to_next_output = substr($current_output, $last_punctuation_pos + 1);
        $current_output = substr($current_output, 0, $last_punctuation_pos + 1);
        $outputs[] = trim($current_output);
        $current_output = $part_to_next_output . " " . $line;
    } else {
        // Lines within a paragraph are joined with a space, not the separator
        $current_output .= ($current_output != "" ? " " : "") . $line;
    }
    // Limit the number of paragraphs to 100
    if (count($outputs) == 100) {
        break;
    }
}
// If the 100-paragraph limit was reached, replace the 100th paragraph with the warning separator
if (count($outputs) == 100) {
    $outputs[99] = $warning_separator;
} elseif ($current_output != "") {
    $outputs[] = trim($current_output);
}
// Display the outputs, with the separator only between paragraphs
echo implode($separator, $outputs);
?>
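One thing that might be worth isolating is the sentence-boundary split, since `strrpos` returns `false` when nothing is found and mixing `false` into `max()` can behave unexpectedly at the edges. Below is a rough sketch of the same idea as a standalone function; the name `splitAtLastSentenceEnd` is hypothetical, and it assumes single-byte punctuation as in the script above.

```php
<?php
// Hypothetical helper for illustration; the name is not from the original script.
// Splits $text after its last '.', '?' or '!'. If none is present it falls back
// to the last space, and if there is no space either it leaves the text unsplit.
function splitAtLastSentenceEnd(string $text): array
{
    $positions = [];
    foreach (['.', '?', '!'] as $mark) {
        $pos = strrpos($text, $mark);
        if ($pos !== false) {
            // Only keep real positions, so false never reaches max()
            $positions[] = $pos;
        }
    }
    if ($positions === []) {
        $space = strrpos($text, ' ');
        if ($space === false) {
            return [$text, ""];
        }
        return [substr($text, 0, $space), substr($text, $space + 1)];
    }
    $cut = max($positions) + 1;
    return [trim(substr($text, 0, $cut)), trim((string) substr($text, $cut))];
}

[$head, $tail] = splitAtLastSentenceEnd("One sentence. Is this two? And a trailing frag");
echo $head, "\n"; // One sentence. Is this two?
echo $tail, "\n"; // And a trailing frag
```

The first element would go into `$outputs` and the second would be carried into the next chunk, which is the role `$part_to_next_output` plays in the script.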
A browser doesn’t treat CR/LF characters as printable; in HTML they are “white space” that gets collapsed and ignored for rendering purposes. If you’re trying to output to the browser and have it render the break, you’ll need to use the correct HTML element for that purpose. (Better still would be to wrap each block in an actual paragraph tag.)
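As a minimal sketch of that suggestion, assuming the paragraphs are already in an array like `$outputs`: the helper name `renderParagraphsAsHtml` is made up for illustration, while `htmlspecialchars` and `nl2br` are standard PHP functions.

```php
<?php
// Hypothetical helper for illustration; not part of the original script.
function renderParagraphsAsHtml(array $paragraphs): string
{
    $html = "";
    foreach ($paragraphs as $paragraph) {
        // Escape special characters so the text cannot be interpreted as markup,
        // then turn any remaining single line breaks into <br /> tags.
        $safe = nl2br(htmlspecialchars($paragraph, ENT_QUOTES, 'UTF-8'));
        // Wrap each block in a paragraph element so the browser renders the gap.
        $html .= "<p>" . $safe . "</p>\n";
    }
    return $html;
}

echo renderParagraphsAsHtml(["First paragraph.", "Second <b>paragraph</b>."]);
```

With this, the browser supplies the visible spacing between paragraphs, so a long ======= separator would only be needed if you actually want a visible rule. If the output is meant for a plain-text file or terminal instead, the `"\n\n"` from the first script should already show up as a blank line.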