Best way to see if a string is comma delimited or tab delimited

Hi guys,

Part of my application is going to allow users to copy and paste data as either comma delimited or tab delimited.

str_getcsv should hopefully handle this nicely, but can anyone advise on the best way to pre-determine what delimiter I should use?

My initial thought was to explode the data using a delimiter character and then check the size of the array. But if my string is tab delimited and I explode on a comma, and part of my data contains a comma, am I going to get false results?

I guess I could do an explode using both characters and then assume the biggest array is what is used as the delimiter?
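A minimal sketch of that idea, using str_getcsv instead of explode so that commas inside quoted fields are not counted as separators. The function name guessDelimiter and its candidate list are hypothetical, and it assumes the first line has no line break inside a quoted field.

```php
<?php
// Hypothetical helper: parse the first line with each candidate delimiter
// and keep whichever one yields the most fields.
function guessDelimiter(string $input, array $candidates = [",", "\t"]): string
{
    // Only inspect the first line (assumption: no newline inside a quoted field).
    $firstLine = strtok($input, "\r\n");
    if ($firstLine === false) {
        return $candidates[0]; // empty input: fall back to the first candidate
    }

    $best = $candidates[0];
    $bestCount = 0;
    foreach ($candidates as $delimiter) {
        // str_getcsv respects quotes, so a quoted "a,b" counts as one field.
        $count = count(str_getcsv($firstLine, $delimiter, '"', '\\'));
        if ($count > $bestCount) {
            $best = $delimiter;
            $bestCount = $count;
        }
    }
    return $best;
}
```

On a tie the first candidate (comma) wins. Unquoted commas inside tab-delimited data can still outnumber the tabs and fool this guess.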

Have I just answered my own question? (:

Crabby

Most spreadsheet applications assume the delimiter is a comma, and provide an interface for users to specify alternatives.

You could experiment with looking for escaped characters. If a character has been escaped, there’s a pretty good chance it’s a delimiter.
If you find a line that starts with a quote, you can jump to the next un-escaped quote and you should find a delimiter there.
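Roughly, the quote-scanning idea could look like the sketch below. The function name delimiterAfterQuotedField is hypothetical, and it assumes quotes inside a field are escaped by doubling them (as spreadsheet exports usually do) rather than with a backslash.

```php
<?php
// Hypothetical helper: if the line opens with a quoted field, skip to its
// closing quote and report the character that follows (the likely delimiter).
function delimiterAfterQuotedField(string $line): ?string
{
    if ($line === '' || $line[0] !== '"') {
        return null; // the line does not start with a quoted field
    }

    $length = strlen($line);
    for ($i = 1; $i < $length; $i++) {
        if ($line[$i] !== '"') {
            continue;
        }
        // A doubled quote ("") is an escaped quote inside the field: skip both.
        if ($i + 1 < $length && $line[$i + 1] === '"') {
            $i++;
            continue;
        }
        // Unescaped closing quote: the next character, if any, is the candidate.
        return $i + 1 < $length ? $line[$i + 1] : null;
    }
    return null; // unterminated quote
}
```

It returns null when the line gives no evidence, so the caller would still need a fallback guess.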

Turn tabs to commas?

It has to be comma or tab; they’ll be copying from Excel or any other spreadsheet application, and from CSV files.

Yup, there are more likely to be commas in the data than tabs.

That approach will face trouble when commas are a part of the data. For example: $1,234.56
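A small illustration of the problem, assuming "that approach" means converting tabs to commas before parsing. The sample row and variable names are made up for the demo.

```php
<?php
// One tab-delimited row with two fields; the first field contains a comma.
$row = '$1,234.56' . "\t" . 'Widget';

// Parsed as tab delimited, the row keeps its two fields.
$asTabs = str_getcsv($row, "\t", '"', '\\');

// After turning tabs into commas, the comma inside the price
// becomes indistinguishable from a separator.
$converted = str_replace("\t", ',', $row);
$asCommas = str_getcsv($converted, ',', '"', '\\');

var_dump(count($asTabs));   // int(2)
var_dump(count($asCommas)); // int(3)
```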

The MailChimp import interface does this very thing; it looks rather swish too.

Good call.

Thanks for your replies guys

AnthonySterling, yeah I had thought about that, I think that case would be rare…hmmmm

logic_earth, that seems like a simple solution. If the original data was comma delimited, it wouldn’t cause any problems? (Unless a tab was used in the data.)

Interesting, it probably still wouldn’t work either.

|column f,d,d,d,d|column foo|

ponders :confused:

Using JavaScript, convert all commas that are part of the user-defined data into tabs and then submit it. At the server you would then have only tabs as separators.

I’ll check it out

thanks

Could your code attempt a guess and allow the user to specify an alternative? Perhaps with a message such as:

“I think that this is comma delimited. Try it as tab delimited instead?”
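One way this guess-then-confirm flow might be wired up is sketched below. The function name chooseDelimiter and the returned array keys are hypothetical, and $userChoice is assumed to be either "," or "\t". The placeholder guess leans on the point made earlier in the thread that tabs are less likely than commas to appear inside the data.

```php
<?php
// Hypothetical helper: honour the user's explicit choice when given;
// otherwise guess and build a prompt offering the alternative.
function chooseDelimiter(string $input, ?string $userChoice = null): array
{
    // Placeholder guess: any tab in the input suggests pasted spreadsheet cells.
    $guess = strpos($input, "\t") !== false ? "\t" : ",";
    $delimiter = $userChoice ?? $guess;

    $names = ["," => "comma", "\t" => "tab"];
    $other = $delimiter === "," ? "\t" : ",";

    return [
        'delimiter' => $delimiter,
        // Only prompt when the delimiter was guessed rather than chosen.
        'message' => $userChoice === null
            ? sprintf(
                'I think that this is %s delimited. Try it as %s delimited instead?',
                $names[$delimiter],
                $names[$other]
            )
            : null,
    ];
}
```

If the user clicks through on the prompt, the same input would be passed again with the other delimiter as $userChoice.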

This might be a silly question, but if there is going to be variety in the separator characters, will any given input be assured to be well-formed? Also, where are the users copying and pasting from, and do you have control over that?