Observations to improve Query CSV export functionality in different languages and standard date formats

Hi,

I work a lot with French projects using SMART and have some observations regarding common problems when exporting queries on a French language system (French Windows/SMART/and Excel). I expect these experiences are probably shared across a few other languages and would be really interested to hear if anyone else experiences these issues, whether there is something I am missing, or whether these issues could be looked at in a future update.

  • The first issue is that the default CSV export settings struggle with the encoding of French Special Characters. Whilst the query loads fine in SMART - any export to CSV ends up with a mismatch for a special / accented character with the default settings.

  • I am aware that this an encoding option is available with the “Export” button at the top after a query has been run. However there is also a context menu action on the query pane on the left for “Run and Export Results” which does not offer encoding and I suspect that some users may be confused by this.

  • One workaround to this which is to open the exported CSV up in Notepad and resave it as ANSI (I believe the default is UTF-8). The sheet will then open in excel and all the characters will display correctly.

  • The other workaround is to use the run the query and use the Export button functionality, selecting the Windows-1252 character set under the encoding options.

  • This problem is not hard to overcome, but the correct way to do it is not well signposted. I imagine a few tweaks or a section in the Training Guide regarding encoding might help out quite a few users struggling with this. The gold standard would be having the dropdown default encoding option automatically correspond to the windows settings so that most users do not even need to be aware of this consideration.

The second problem is that of dates not being read correctly - which then makes it burdensome to clean exported CSV data for use in a GIS program or to do any temporal analysis inside Excel.

  • Regardless of language settings, all dates are outputted in the format “D MMM YYYY” (6 Dec 2023), which is nice and easy to read in the query interface but poses some fairly significant compatibility problems as it is not a commonly accepted standard for most applications, and introduces a huge amount of variability when working across different languages.

  • Fortunately, in English, Excel automatically parses this non-standard format into “DD-MMM-YYYY” (06-Dec-2023"), which is then read effectively as a date, and converted to a serial date - e.g. 45266 (number of days after 01/01/1900). However in other languages it does not do this, leaving users with some big challenges to solve and always posing obstacles for a streamlined workflow. This most likely looks like a combination of DATE, LEFT, MID, and SEARCH, alongside 12 nested SUBSTITUTE formulas. Unless you have already fixed the encoding your export, you will have some fairly strange looking text months to replace with numbers. For French users, with the encoding fixed, and changing O2 to their required date target, an effective cleaning formula looks like:

    =DATE(DROITE(O2, 4), SI(ESTERREUR(TROUVE("janv", O2)), SI(ESTERREUR(TROUVE("févr", O2)), SI(ESTERREUR(TROUVE("mars", O2)), SI(ESTERREUR(TROUVE("avr.", O2)), SI(ESTERREUR(TROUVE("mai", O2)), SI(ESTERREUR(TROUVE("juin", O2)), SI(ESTERREUR(TROUVE("juil", O2)), SI(ESTERREUR(TROUVE("août", O2)), SI(ESTERREUR(TROUVE("sept", O2)), SI(ESTERREUR(TROUVE("oct.", O2)), SI(ESTERREUR(TROUVE("nov.", O2)), SI(ESTERREUR(TROUVE("déc.", O2)), 0, 12), 11), 10), 9), 8), 7), 6), 5), 4), 3), 2), 1), GAUCHE(O2, CHERCHE(" ", O2) - 1))
    
  • Offering other date formats as an dropdown in the export options would improve compatibility with a wide range of third party applications and give the user the option to output data in internationally recognised standards. I would particularly recommend offering ISO-8601: “YYYY-MM-DD” / “2023-12-06” - which is widely recognised as the main international standard for representing date in data applications. Having an options for DD/MM/YYYY, DD-MM-YYYY, MM/DD/YYYY, MM-DD-YYYY, or would also be very beneficial for a lot of users. The latter two are quite specific to the USA but are likely what will be expected by a reasonable percentage of SMART’s user base.

  • Some users may also be unaware that excel stores numeric dates and time in a single cell as the number of days after 01/01/1900 as the number before the decimal point, and the fraction of the day after the decimal point. For example 2023-12-06 12:00:00 is actually interpreted by excel as “45266.5”. A formula of [Your Date Cell]+[Your Time Cell] will give you a date and time, provided they are both being interpreted numerically. A custom cell format of “yyyy-mm-dd hh:mm:ss” or “dd/mm/yyyy hh:mm:ss” will read as a date and time and can be interpreted by virtually all analysis applications.

  • Being able to refer to both the date and the time in a single value opens up a lot of cool use-cases, like smoothly time lapsing high fidelity collar data or doing more advanced behaviour analysis without a user needing to write formulas to concatenate the date and time columns into a single Date.Time value. For this, being able to export in both “YYYY-MM-DD HH:MM:SS” (international standard), “DD/MM/YYYY HH:MM:SS” (widely accepted and can also be parsed by excel as a Date.Time) and in UTC “YYYY-MM-DDTHH:MM:SSZ” would be very beneficial.

I would be interested to hear people’s thoughts on this. I’d also love to see a few changes considered by the SMART team so that users across all languages directly load their CSV export data into their analysis software of choice without the needing to establish a bespoke process for data cleaning.

2 Likes

Hi Ruari,

Thank you for bringing these issues to our attention. We’ll circulate this internally for review and hope to provide some updates in the near future.

Cheers,

Alex

From what I can figure out, Excel just doesn’t have a nice, easy way to open a UTF-8 file properly no matter how it’s created or formatted. The best I can find is a step-by-step way of opening it so that it does work. This may be no better/faster than your current work-around of saving it in a different tool first, but in case it’s useful here it is:

  1. Save the exported file as a csv
  2. Open Excel
  3. Import the data using Data–>Import External Data → Import Data
  4. Select the file type of “csv” and browse to your file
  5. In the import wizard change the File_Origin to “65001 UTF” (or choose correct language character identifier)
  6. Change the Delimiter to comma
  7. Select where to import to and Finish

Ideally we will be able to add additional encoding and date format output options in more places in SMART so you can always export files exactly the way you want; I think that would resolve many if not all of the above concerns, so hopefully we can get that task added into the SMART 8 work plan and budget. I will create a request for that and put it into ticket system we use, and the powers-that-be will decide how it fits in with all their other priorites etc.

Thanks for the detailed feedback!

1 Like

Hi Ruari,

I work with SMART in Spanish, and I have also experienced the issue with special characters exporting the .csv with USF-8.

Your workaround of using Windows 1252 worked but I don’t understand why, since from what I know UTF-8 is like a more widely standard that can represent a broad range of characters, including those used in the Spanish.

Additionally, I would like to know if there is some technical information regarding query export and requirements of Excel or Windows system formats because another issue I face is with coordinates.

For example, I have

X: 957968,692 Y: 9922615,617 (WGS 84 UTM 17 SUR)

and when exporting to .csv they appear as

X: 9.579.686.919.899 Y: 9.922.615.617.102.660

Any suggestion about this?

Best
Jackie