Split csv file into different files on a specified character

#1

I have the output of a GPS route tracker in the format below. Due to problems with the microprocessor code I can not currently use individual files for each track as I would like.
I can not add a special character at the end of each track as I just turn the device off - it does not have many controls. I can add an extra character/row at the start if required and change the first line date time format.
I want to split the large file into each track with the date as the file name.
My idea was to count the data in the file; find a special character e.g. / and split the array on the line above that one but my tests have not been very successfull with the strpos() I was using. Is there an elegant way to do this?

24/5/2019 10:30
0.000000,0.000000,0.0000,0.00
0.000000,0.000000,0.0000,0.00
0.000000,0.000000,0.0000,0.00
53.474474,-2.252000,1.7032,44.60
53.474515,-2.252063,2.3706,40.20
53.474472,-2.252019,1.7377,31.10
53.474482,-2.251990,0.6559,27.40
53.474500,-2.251984,0.4833,27.80
53.474410,-2.251622,5.9265,46.60
53.474591,-2.251822,1.9908,54.20
53.473677,-2.250906,2.8539,9.60
53.473886,-2.251271,1.2774,26.60
53.471829,-2.248942,2.5893,-23.50
53.471148,-2.248520,4.7067,-55.20
53.470708,-2.248306,15.9613,-65.50
53.473323,-2.251162,2.1750,-7.40
53.474106,-2.251974,3.7055,5.40
24/5/2019 16:30
0.000000,0.000000,0.0000,0.00
0.000000,0.000000,0.0000,0.00
0.000000,0.000000,0.0000,0.00
53.474474,-2.252000,1.7032,44.60
53.474515,-2.252063,2.3706,40.20
53.474472,-2.252019,1.7377,31.10
53.474482,-2.251990,0.6559,27.40
53.474500,-2.251984,0.4833,27.80
53.474410,-2.251622,5.9265,46.60
53.474591,-2.251822,1.9908,54.20
53.473677,-2.250906,2.8539,9.60
53.473886,-2.251271,1.2774,26.60
53.471829,-2.248942,2.5893,-23.50
53.471148,-2.248520,4.7067,-55.20
53.470708,-2.248306,15.9613,-65.50
53.473323,-2.251162,2.1750,-7.40
53.474106,-2.251974,3.7055,5.40

<?php
if (($handle = fopen("GPSdata.txt", 'r')) !== FALSE) { 	
	$lines = array();
	while (($data = fgetcsv($handle, 100, ",")) !== FALSE) { 
	echo $data[0].'<br>';
	if( $data[0] != "0.000000" )// Ignore "empty" lines
	{
	$gps[] = $data;
	}
}	
    fclose($handle);
}

echo '<pre>';
print_r($gps);
echo '</pre>';

$data1 = "";
foreach ($gps as $key => $val)
{
    $row = array(); // reset to empty
    foreach ($val as $key2 => $val2)
    {
        $row[] = $val2;
    }
    $formatted_row_data = join(',', $row);
    $data1 .= $formatted_row_data . "\n";
}
echo '<pre>';
print_r($data1);
echo '</pre>';
?>

Out of interest a couple of photos of the device:


#2

How large is the file you’re processing? I would explode this file - as it is no real CSV - by the line ending, then foreach over every element, have a look if there’s a slash within the element - as this seems to be a major difference - and then use that elements value to define a file name that you can write any further row to.

lines = file(name)
foreach(lines as line)
 if(strpos(line, "/") !== false): // define file name and fopen the file with creation
 else: write line into file
#3

Thank you for the input @chorn; I had not thought of doing it that way.

Out of interest the file could be quite large if I am away for a few days. The day in Manchester had 1220 lines and I was thinking about decreasing the save location time to improve the accuracy so it would probably double. I am also contemplating adding the hours and minutes to each line as well.

Hopefully if I can get the microprocessor ( Arduino code ) to display the files on the SD card I might be able to save the individual files at that point.

#4

Is you PHP code running on an arduino? 1k lines is really nothing for every wokrstation computer of the last 10 years.

#5

No the Microprocessor is using the Arduino code but my major problem is the SD card and library. It is very frustrating as while programming the SD card usually will not start. The other problem is the SD library will not find/display the SD card contents. If I could get a reliable SD card start and SD content display I could create the individual files there and upload them separately.

#6

This looks like it does what I want; the only thing I am missing is a fclose() at the end of the file just created but it seems to work OK. I changed the format of the date/time to 24_5_2019_16_30
I am unsure about : in if else so stuck with { }

if (($handle = fopen("GPSdata.txt", 'r')) !== FALSE) { 	
	$lines = array();
	while (($data = fgetcsv($handle, 100, ",")) !== FALSE) {
	foreach($data as $line)		
		if(strpos($line, "_") !== false){ // define file name and fopen the file with creation
		$file = fopen($data[0].".csv","w");
	}
	else {
	$content = $data[0].','.$data[1].','.$data[2].','.$data[2]."\n";
	if( $data[0] != "0.000000" )// Ignore "empty" lines
	{
		fwrite($file,$content);}
	}
}	
}
    fclose($handle);
#7

Here’s how I would do it, but whatever works for you:
(Note: spitball code. Not tested, probably needs refinement.)

$data = preg_split("~\d+/\d+/\d+ \d+:\d+\r?\n~",file_get_contents("GPSdata.txt"), -1,PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
for($i = 0;  $i < len($data); $i += 2) {
   file_put_contents(preg_replace_all("~[\r\n\s/:]~","",data[$i]).".csv",data[$i+1]);
}

EDIT: No, Marc, that parameter is not optional if you want to use flags.

#8

Not 100% sure what you are doing @m_hutley but I have a couple of errors: Call to undefined function len()
With Notepad++ both len( ) and preg_replace_all( ) are not highlighted as functions

#9

What i’m doing in the first case is confusing my Python and PHP. Because i’m a derp.

len() = count().

In the second case, i’m mixing up preg_match and preg_replace. preg_match_all is a function, preg_replace_all isnt.

preg_replace_all() = preg_replace()

#10

These things happen - another Python slip here: file_put_contents(preg_replace_all("~[\r\n\s/:]~","",data[$i]).".csv",data[$i+1]); no dollar in front of the variables :slight_smile:

I have not used php for a while and kept forgetting simple things.

There is also a " Notice : Undefined offset: 1 in" on the line I posted above so assume there must be a problem with the preg_split. Posssibly due to me changing the / for _ :roll_eyes: I will check it out

#11

I mean… what the code SHOULD do is:

1: Load the contents of the CSV.
2: Split those contents on any date line, while capturing the delimiters.
3: Put the non-date contents into files named with the date in the form DDMMYYYYHHmm.csv

The result of #2 should be an array that looks like…

Array( [0] => "24/5/2019 10:30",
[1] => "0.000000,0.000000,0.0000,0.00
0.000000,0.000000,0.0000,0.00
...
53.474106,-2.251974,3.7055,5.40",
[2] => "24/5/2019 16:30",
[3] => "0.000000,0.000000,0.0000,0.00
0.000000,0.000000,0.0000,0.00
....

So then a step-2 for-loop can walk across it, and pick up the even numbers as the filename, and the odd numbers as the contents for those files.

#12

I was being daft; I did not realise it was the complete code!

You may not have noticed I said above I changed “24/5/2019 10:30” to “24_5_2019_10_30” as it was causing a problem with the file naming so your Regex is now incorrect.

I have changed the Regex but it may be incorrect and I now have the below but it is still returning the Undefined offset error. It looks like all the data is still in one piece.

$data = preg_split("~d+_d+_d+_d+_d+\n~",file_get_contents("GPSdata.txt"), PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
for($i = 0;  $i < count($data); $i += 2) {
	echo count($data).'<br>'; // prints 1
   file_put_contents(preg_replace("~[\n\s/:]~","",$data[$i]).".csv",$data[$i+1]);
}
#13

the \'s in front of the d’s in the regex are required. \d means “any digit”.

"~\d+_\d+_\d+_\d+_\d+\r?\n~"

So “Any number (+) of digits (\d) followed by a literal _ followed by Any number (+) of digits (\d) followed by a literal _ followed by Any number (+) of digits (\d) followed by a literal _ followed by Any number (+) of digits (\d) followed by a literal _ followed by Any number (+) of digits (\d) followed by a literal _ followed by either 1 or 0 (?) Return characters (\r)followed by a Newline character (\n).”

EDIT: Sigh, eventually i’ll wake up and remember my regex structures properly. + is not “any number” but “1 or more”

#14

Update:
This creates an empty file called .csv

file_put_contents(preg_replace("~[\n\s/:]~","",$data[$i]).".csv",$data[$i+1]);

This creates an empty file called test.csv
file_put_contents("test.csv", $data[$i+1]);

So I think the problem is still in the preg_split code as I am still getting a “Undefined offset: 3 in” it seems as though there is only on piece of data still?

If you want to carry on @m_hutley we can but the first code that @chorn suggested is working and I am happy to use that.

#15

I might explode the records by the comma and then if the array does not have four items then I would assume the record is a time record. I would audit the data and time for slashes and the colon but only for the records that don’t explode into four items.

Another possibility is to check the length (string size) of the record. If it is less that a chosen size, such as 20, then it can be assumed to be a time. You know the data and if that will work.

I would use the format yyyymmddhhmm (or only a two-digit year) because that helps with sorting and such. I don’t know how to get leading zeros in PHP but that is a detail.

#16

I did try that at the start ( before posting here ) and it did not work for some reason. I am very rusty with php and was probably doing something wrong.

I had a play with the code from @m_hutley again last night but as far as I can see I only had one result for data which was everything.

The code I posted in #6 suggested by @chorn works and I will stick with that. I was just interested in seeing the other code working.

I would use the format yyyymmddhhmm (or only a two-digit year) because that helps with sorting and such.

A good point.

#17

I forgot that DELIM_CAPTURE relies upon a parenthesized pattern in the regex.
Since you were curious, I put in the time to actually turn the spitball code into working code:


<?php
$data = preg_split("~(\d+_\d+_\d+_\d+_\d+\r?\n)~m",file_get_contents("GPSdata.txt"), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
for($i = 0;  $i < count($data); $i += 2) {
   file_put_contents(preg_replace("~[\r\n]~","",$data[$i]).".csv",$data[$i+1]);
}
?>

You can manipulate $data[$i] inside the loop to generate the filename to your liking; this was just a quick form to get a OS-legal filename.

EDIT: cleanup debug echos/print_r’s.

#18

Thank you @m_hutley

It works like a charm now

#19

Yaknow, if i was a smart coder, i’d have moved that close-parenthesis and obliviated the need for the preg_replace entirely.

Sometime’s i’m not a smart coder.

<?php
$data = preg_split("~(\d+_\d+_\d+_\d+_\d+)\r?\n~",file_get_contents("GPSdata.txt"), -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
for($i = 0;  $i < count($data); $i += 2) {
   file_put_contents($data[$i].".csv",$data[$i+1]);
}
?>
1 Like
#20

It looks a lot neater than the code I have and it is hard to believe it does the same thing. I know it does not remove the lines of data made up of zeros but I was removing them when generating the formatted code for the Google API when testing the API.