There are two possible reasons why this file might not be accessible, and neither is necessarily related to whether MediaInfo can read Unicode. MediaInfo can otherwise read Unicode (meta) content, as it was designed to.
The question mark is possibly an illegal character - not strictly Unicode.
What appears to be a blank space might also be an illegal character (though in some cases, even the presence of a space might be undesirable).
The simplest resolution is to rename the file. I typically avoid question marks, trailing blank spaces, or any characters that might seem out of place. It’s not uncommon for an application to choke on them, and it’s not always obvious whether a character is actually supported.
Are you using zsh? If so, the question mark is a special character which needs to be escaped or put in quotation marks: “Abc? .mkv”. Quoting also takes care of the spaces.
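To illustrate the quoting described above, here is a minimal sketch that works in bash or zsh. It uses a throwaway directory from mktemp, and the variable name tmpdir is just for illustration. The mediainfo call is left as a comment because it assumes the tool is installed.

```shell
# Throwaway directory, so no real files are touched.
tmpdir=$(mktemp -d)

# Inside quotes, "?" and the space are passed to the command literally.
touch -- "$tmpdir/Abc? .mkv"

# Backslashes do the same job, one character at a time.
ls -- "$tmpdir"/Abc\?\ .mkv

# With mediainfo installed, the same quoting would apply:
#   mediainfo "Abc? .mkv"
```

This only affects how the shell hands the name to a program. It says nothing about how a GUI application opens the file.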
So, I am not talking about a solution for a single file, which can be resolved by a simple rename.
I only have the GUI version of MediaInfo.
Whether the media file is opened through right-click in Dolphin, or “File > Open > Open file(s)” in MediaInfo, the result is the same.
This question mark was perfectly readable in my previous installation.
And I have tonnes of media files with various Unicode characters in their names - none of them can be read by MediaInfo now - but they were all readable in my previous installation.
1st, when the file is renamed to A.mkv, it is perfectly readable.
2nd, MediaInfo fails to read the file only when a Unicode character is present, whether in the path (mnt/device/西游记/ABC.mkv) or in the file name (西游记.mkv).
As the files were readable in my previous installation, I suspect my re-installation might have missed some configuration, hence I’m seeking advice from the gurus to tackle the issue.
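Since the failure seems tied to non-ASCII characters anywhere in the path, a small helper can sort paths into "ASCII only" and "contains non-ASCII" before testing them in MediaInfo. This is only a sketch: has_non_ascii is a hypothetical name, and the three paths are the examples from the post above.

```shell
# Hypothetical helper: succeeds if the argument contains any byte
# outside 7-bit ASCII (which is the case for every non-ASCII UTF-8 character).
has_non_ascii() {
    # tr works byte by byte; deleting all ASCII bytes leaves only the rest.
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\000-\177')
    [ -n "$leftover" ]
}

for path in "A.mkv" "西游记.mkv" "mnt/device/西游记/ABC.mkv"; do
    if has_non_ascii "$path"; then
        echo "non-ASCII: $path"
    else
        echo "ASCII only: $path"
    fi
done
```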
Just for the sake of verification, I took an mkv from my system and copied it to a new name using the exact quoted characters.
I tested using the CLI:
08:16:16 ○ [fh@tiger] ~
$ mediainfo 西游记.mkv
General
Unique ID : 44828678184375984701446168201819827875 (0x21B9B07A702EACEF5B939779C65FAEA3)
Complete name : 西游记.mkv
Format : Matroska
Format version : Version 4
File size : 3.25 GiB
Duration : 59 min 43 s
Overall bit rate : 7 788 kb/s
Frame rate : 23.976 FPS
Writing application : mkvmerge v80.0 ('Roundabout') 64-bit
Writing library : libebml v1.4.4 + libmatroska v1.7.1
Conformance errors : 1
0x8538067 : Yes
0xF43B675 : Yes
0xFFFFFFFF : Yes
General compliance : Element size 3359550 is more than maximal permitted size 976531 (offset 0x9C9CC2)
Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4
Format settings : CABAC / 4 Ref Frames
Format settings, CABAC : Yes
Format settings, Reference frames : 4 frames
Codec ID : V_MPEG4/ISO/AVC
Duration : 59 min 43 s
Bit rate mode : Constant
Bit rate : 6 993 kb/s
Nominal bit rate : 10 000 kb/s
Width : 1 920 pixels
Height : 800 pixels
Display aspect ratio : 2.40:1
Frame rate mode : Constant
Frame rate : 23.976 (24000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.190
Stream size : 2.92 GiB (90%)
Default : Yes
Forced : No
Color range : Limited
Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709
Audio
ID : 2
Format : E-AC-3
Format/Info : Enhanced AC-3
Commercial name : Dolby Digital Plus
Codec ID : A_EAC3
Duration : 59 min 43 s
Bit rate mode : Constant
Bit rate : 640 kb/s
Channel(s) : 6 channels
Channel layout : L R C LFE Ls Rs
Sampling rate : 48.0 kHz
Frame rate : 31.250 FPS (1536 SPF)
Compression mode : Lossy
Stream size : 273 MiB (8%)
Language : English
Service kind : Complete Main
Default : Yes
Forced : No
Dialog Normalization : -31 dB
compr : -0.28 dB
dialnorm_Average : -31 dB
dialnorm_Minimum : -31 dB
dialnorm_Maximum : -31 dB
Text
ID : 3
Format : UTF-8
Codec ID : S_TEXT/UTF8
Codec ID/Info : UTF-8 Plain Text
Title : English (SDH)
Language : English
Default : No
Forced : No
I have never used mediainfo before. My test of your issue is the first time I ever installed this application.
I recognize you have an issue - but I cannot reproduce with the information given.
When I use the terminal to copy a file to the quoted characters they appear to be double-byte characters - indicating utf-16 - which is windows 1252 or similar codepage - perhaps you failed to note which filesystem the files are stored on?
If the filesystem is ntfs - it may explain why you are having issues.
08:41:44 ○ [fh@tiger] ~
$ ls -l 西游记.mkv
-rw-r--r-- 1 fh fh 3488469436 18 jun 14:17 西游记.mkv
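One way to check how a name is actually encoded is to dump its raw bytes. The sketch below assumes the script itself is saved as UTF-8 and uses the illustrative variable names name and byte_count. In UTF-8 each of these three characters takes three bytes, so the count should be 9; in UTF-16 the same three characters would take 6.

```shell
name='西游记'

# Hex dump of the raw bytes that make up the name.
printf '%s' "$name" | od -An -tx1

# Number of bytes in the name (wc may pad with spaces, so strip them).
byte_count=$(printf '%s' "$name" | wc -c | tr -d ' ')
echo "bytes: $byte_count"
```

To inspect a real directory entry rather than a typed string, the same od call can be fed from ls, for example `ls | od -An -tx1`.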
My system is a default Manjaro installation using utf-8 for encoding.
I have files stored on both NTFS and EXT4.
And files on both filesystems produce the same issue.
Would you be kind enough to share your locale settings?
I have checked / compared my current vs previous installation’s locale, and they are identical.
I’m really clueless right now as to what causes the issue.
For MediaInfo to function properly in Linux, the locale $LANG will need to be configured correctly. If your Manjaro system is UTF-8 - as is a typical Manjaro default - $LANG must also be UTF-8; for example: en_US.UTF-8:
Mine is
[nix@nix~]$ cat /etc/locale.conf
LANG=en_AU.UTF-8
Likewise for other languages, they should be formatted as UTF-8.
Check /etc/locale.gen and make sure only utf-8 entries are uncommented; and not ISO-8859 or other variants.
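The locale checks above could be scripted roughly as follows. This is a sketch, not a definitive test: is_utf8_locale is a hypothetical helper, and it relies on the usual precedence in which LC_ALL overrides LC_CTYPE, which overrides LANG.

```shell
# Hypothetical helper: succeeds if a locale name carries a UTF-8 codeset suffix.
is_utf8_locale() {
    case $1 in
        *.UTF-8*|*.utf8*|*.UTF8*|*.utf-8*) return 0 ;;
        *) return 1 ;;
    esac
}

# The effective character-type locale: LC_ALL, else LC_CTYPE, else LANG.
effective=${LC_ALL:-${LC_CTYPE:-${LANG:-}}}
if is_utf8_locale "$effective"; then
    echo "effective locale is UTF-8: $effective"
else
    echo "effective locale is not UTF-8: '${effective}'"
fi

# Show which entries are uncommented in /etc/locale.gen, if the file exists.
if [ -r /etc/locale.gen ]; then
    grep -v '^[[:space:]]*#' /etc/locale.gen | grep -v '^[[:space:]]*$' || true
fi
```

Note that a GUI application started from the desktop may see different environment variables than a terminal, so it can be worth running the check from the same session that launches MediaInfo.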
That won’t work. The “?” is an illegal character as well. It is a globbing character that expands to only a single character — as opposed to “*”, which can expand to any number of characters.
You can include it in a file name, but then it too must be escaped, either with a backslash or by using quotes around the filename.
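The difference between a quoted and an unquoted “?” can be seen with a few empty files in a throwaway directory. This is a small sketch; count_args and demo are names chosen for illustration.

```shell
# Hypothetical helper: prints how many arguments the shell passed in.
count_args() {
    echo "$#"
}

demo=$(mktemp -d)
touch -- "$demo/ab.mkv" "$demo/a?.mkv" "$demo/abc.mkv"

# Unquoted: "?" is a glob and matches exactly one character,
# so both ab.mkv and the literal a?.mkv match, but abc.mkv does not.
unquoted=$(count_args "$demo"/a?.mkv)

# Quoted: "?" is literal, so only the file really named a?.mkv is meant.
quoted=$(count_args "$demo/a?.mkv")

echo "unquoted: $unquoted, quoted: $quoted"
```

In zsh, an unquoted pattern that matches nothing is reported as an error by default, which is one more reason to quote such names.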
I don’t see any obvious inconsistencies with your system locale. There’s also nothing in recent updates that seems related. My guess would still be a possible issue with MediaInfo.
It’s the version from the official repositories, right? Not installed/built from the AUR; or possibly a flatpak or other containerized version?
They are (I presume) the defaults provided by KDE, and not strictly related to the system locale. I doubt it has much impact, if any. Of course, I can’t easily test how this might affect MediaInfo in other locales.
The reason I highlight this question is that if it’s an AUR version, it might need to be rebuilt (as AUR packages often need to be after an update).
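To answer the question of where MediaInfo came from, something along these lines might help. It is a sketch under assumptions: the package names mediainfo and mediainfo-gui are what I would expect on an Arch-based system, and listed_in is a hypothetical helper. `pacman -Qm` lists foreign packages, meaning those not found in the sync repositories, which typically means AUR or manual builds.

```shell
# Hypothetical helper: reads "name version" lines on stdin (the format
# printed by `pacman -Qm`) and succeeds if the given package is listed.
listed_in() {
    awk -v pkg="$1" '$1 == pkg { found = 1 } END { exit !found }'
}

if command -v pacman >/dev/null 2>&1; then
    for pkg in mediainfo mediainfo-gui; do   # assumed package names
        if pacman -Qm 2>/dev/null | listed_in "$pkg"; then
            echo "$pkg: foreign package (e.g. AUR), may need a rebuild"
        elif pacman -Qq "$pkg" >/dev/null 2>&1; then
            echo "$pkg: installed from the repositories"
        else
            echo "$pkg: not installed via pacman"
        fi
    done
fi

# A flatpak build would show up here instead.
if command -v flatpak >/dev/null 2>&1; then
    flatpak list --app 2>/dev/null | grep -i mediainfo || echo "no flatpak MediaInfo found"
fi
```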