Hi,
I work a lot with French projects using SMART and have some observations regarding common problems when exporting queries on a French language system (French Windows/SMART/and Excel). I expect these experiences are probably shared across a few other languages and would be really interested to hear if anyone else experiences these issues, whether there is something I am missing, or whether these issues could be looked at in a future update.
-
The first issue is that the default CSV export settings struggle with the encoding of French Special Characters. Whilst the query loads fine in SMART - any export to CSV ends up with a mismatch for a special / accented character with the default settings.
-
I am aware that this an encoding option is available with the “Export” button at the top after a query has been run. However there is also a context menu action on the query pane on the left for “Run and Export Results” which does not offer encoding and I suspect that some users may be confused by this.
-
One workaround to this which is to open the exported CSV up in Notepad and resave it as ANSI (I believe the default is UTF-8). The sheet will then open in excel and all the characters will display correctly.
-
The other workaround is to use the run the query and use the Export button functionality, selecting the Windows-1252 character set under the encoding options.
-
This problem is not hard to overcome, but the correct way to do it is not well signposted. I imagine a few tweaks or a section in the Training Guide regarding encoding might help out quite a few users struggling with this. The gold standard would be having the dropdown default encoding option automatically correspond to the windows settings so that most users do not even need to be aware of this consideration.
The second problem is that of dates not being read correctly - which then makes it burdensome to clean exported CSV data for use in a GIS program or to do any temporal analysis inside Excel.
-
Regardless of language settings, all dates are outputted in the format “D MMM YYYY” (6 Dec 2023), which is nice and easy to read in the query interface but poses some fairly significant compatibility problems as it is not a commonly accepted standard for most applications, and introduces a huge amount of variability when working across different languages.
-
Fortunately, in English, Excel automatically parses this non-standard format into “DD-MMM-YYYY” (06-Dec-2023"), which is then read effectively as a date, and converted to a serial date - e.g. 45266 (number of days after 01/01/1900). However in other languages it does not do this, leaving users with some big challenges to solve and always posing obstacles for a streamlined workflow. This most likely looks like a combination of DATE, LEFT, MID, and SEARCH, alongside 12 nested SUBSTITUTE formulas. Unless you have already fixed the encoding your export, you will have some fairly strange looking text months to replace with numbers. For French users, with the encoding fixed, and changing O2 to their required date target, an effective cleaning formula looks like:
=DATE(DROITE(O2, 4), SI(ESTERREUR(TROUVE("janv", O2)), SI(ESTERREUR(TROUVE("févr", O2)), SI(ESTERREUR(TROUVE("mars", O2)), SI(ESTERREUR(TROUVE("avr.", O2)), SI(ESTERREUR(TROUVE("mai", O2)), SI(ESTERREUR(TROUVE("juin", O2)), SI(ESTERREUR(TROUVE("juil", O2)), SI(ESTERREUR(TROUVE("août", O2)), SI(ESTERREUR(TROUVE("sept", O2)), SI(ESTERREUR(TROUVE("oct.", O2)), SI(ESTERREUR(TROUVE("nov.", O2)), SI(ESTERREUR(TROUVE("déc.", O2)), 0, 12), 11), 10), 9), 8), 7), 6), 5), 4), 3), 2), 1), GAUCHE(O2, CHERCHE(" ", O2) - 1))
-
Offering other date formats as an dropdown in the export options would improve compatibility with a wide range of third party applications and give the user the option to output data in internationally recognised standards. I would particularly recommend offering ISO-8601: “YYYY-MM-DD” / “2023-12-06” - which is widely recognised as the main international standard for representing date in data applications. Having an options for DD/MM/YYYY, DD-MM-YYYY, MM/DD/YYYY, MM-DD-YYYY, or would also be very beneficial for a lot of users. The latter two are quite specific to the USA but are likely what will be expected by a reasonable percentage of SMART’s user base.
-
Some users may also be unaware that excel stores numeric dates and time in a single cell as the number of days after 01/01/1900 as the number before the decimal point, and the fraction of the day after the decimal point. For example 2023-12-06 12:00:00 is actually interpreted by excel as “45266.5”. A formula of [Your Date Cell]+[Your Time Cell] will give you a date and time, provided they are both being interpreted numerically. A custom cell format of “yyyy-mm-dd hh:mm:ss” or “dd/mm/yyyy hh:mm:ss” will read as a date and time and can be interpreted by virtually all analysis applications.
-
Being able to refer to both the date and the time in a single value opens up a lot of cool use-cases, like smoothly time lapsing high fidelity collar data or doing more advanced behaviour analysis without a user needing to write formulas to concatenate the date and time columns into a single Date.Time value. For this, being able to export in both “YYYY-MM-DD HH:MM:SS” (international standard), “DD/MM/YYYY HH:MM:SS” (widely accepted and can also be parsed by excel as a Date.Time) and in UTC “YYYY-MM-DDTHH:MM:SSZ” would be very beneficial.
I would be interested to hear people’s thoughts on this. I’d also love to see a few changes considered by the SMART team so that users across all languages directly load their CSV export data into their analysis software of choice without the needing to establish a bespoke process for data cleaning.