MediaInfo unable to read files containing Unicode characters in path / filename

Example file:
ABC?.mkv

The above mkv file cannot be read by MediaInfo, but it can be played by VLC.
If it is renamed to ABC.mkv, MediaInfo can read it.

I’m not sure whether my locale was properly configured after reinstalling the OS; I had no such issue on the previous installation.

~/.config/user-dirs.locale

en_US

~/.config/plasma-localerc

[Formats]
LANG=en_US.UTF-8

[Translations]
LANGUAGE=en_US

/etc/locale.conf

LANG=en_SG.UTF-8
LANGUAGE=en_US.UTF-8

Any suggestions on where to start troubleshooting?

There are two reasons why this file might not be accessible, and they are not necessarily related to whether or not MediaInfo can read Unicode. MediaInfo can otherwise read Unicode (meta) content, as it was designed to.

  • The question mark is possibly an illegal character - not strictly Unicode.
  • What appears to be a blank space might also be an illegal character (though in some cases, even the presence of a space might be undesirable).

The simplest resolution is to rename the file. I typically avoid question marks, trailing blank spaces, or any characters that might seem out-of-place. It’s not uncommon for an application to potentially choke, and it’s not always obvious whether a character is actually supported.

How did you try to read this file with MediaInfo? Are there any errors if you use a terminal?

rename the file

mv ABC?\ .mkv ABC?.mkv

escape the space

play ABC?\ .mkv

or use quotation marks

play 'ABC? .mkv'

Are you using zsh? If so, the question mark is a special character that needs to be escaped or put inside quotation marks: “Abc? .mkv”. Quoting also takes care of the spaces.

I think I should be clearer:
I have tonnes of media files with Unicode names, such as
西游记.mkv, etc.

When I right-click one of these files and choose “Open with MediaInfo”, this is what I get:

<?xml version="1.0" encoding="UTF-8"?>

<MediaInfo
xmlns="https://mediaarea.net/mediainfo"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="https://mediaarea.net/mediainfo https://mediaarea.net/mediainfo/mediainfo_2_0.xsd"
version="2.0">
MediaInfoLib
<media />
</MediaInfo>

So I am not looking for a fix for a single file, which could be solved by a simple rename.

I only have the GUI version of MediaInfo.
Whether the media file is opened through right-click in Dolphin, or “File > Open > Open file(s)” in MediaInfo, the result is the same.

This question mark was perfectly readable in the previous installation.
I have tonnes of media files with various Unicode characters in their names. None of them can be read by MediaInfo now, but all were readable in the previous installation.

Appreciated - thank you.

Manjaro does not revert upstream changes or fix upstream bugs.

If the application has changed in the manner you describe, you are advised to open an issue with its developers.

First you should check whether the issue has already been reported.

As you say:

Now that you have been, I have nothing constructive to add.

I don’t think this is a bug.

1st, when the file is renamed to A.mkv, it is perfectly readable.
2nd, MediaInfo fails to read the file only when a Unicode character is present, whether in the path (mnt/device/西游记/ABC.mkv) or in the file name (西游记.mkv).

As the file was readable in the previous installation, I suspect my reinstallation missed some configuration, hence I’m seeking advice from the gurus to tackle the issue.

It’s not unreasonable to consider that it might be, as the same condition has apparently arisen before, as highlighted in another project:

Just for the sake of verification, I took an mkv from my system and copied it to a new name using the exact quoted characters.

I tested using the CLI

08:16:16 ○ [fh@tiger] ~
 $ mediainfo 西游记.mkv
General
Unique ID                                : 44828678184375984701446168201819827875 (0x21B9B07A702EACEF5B939779C65FAEA3)
Complete name                            : 西游记.mkv
Format                                   : Matroska
Format version                           : Version 4
File size                                : 3.25 GiB
Duration                                 : 59 min 43 s
Overall bit rate                         : 7 788 kb/s
Frame rate                               : 23.976 FPS
Writing application                      : mkvmerge v80.0 ('Roundabout') 64-bit
Writing library                          : libebml v1.4.4 + libmatroska v1.7.1
Conformance errors                       : 1
 0x8538067                               : Yes
  0xF43B675                              : Yes
   0xFFFFFFFF                            : Yes
    General compliance                   : Element size 3359550 is more than maximal permitted size 976531 (offset 0x9C9CC2)

Video
ID                                       : 1
Format                                   : AVC
Format/Info                              : Advanced Video Codec
Format profile                           : High@L4
Format settings                          : CABAC / 4 Ref Frames
Format settings, CABAC                   : Yes
Format settings, Reference frames        : 4 frames
Codec ID                                 : V_MPEG4/ISO/AVC
Duration                                 : 59 min 43 s
Bit rate mode                            : Constant
Bit rate                                 : 6 993 kb/s
Nominal bit rate                         : 10 000 kb/s
Width                                    : 1 920 pixels
Height                                   : 800 pixels
Display aspect ratio                     : 2.40:1
Frame rate mode                          : Constant
Frame rate                               : 23.976 (24000/1001) FPS
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Bit depth                                : 8 bits
Scan type                                : Progressive
Bits/(Pixel*Frame)                       : 0.190
Stream size                              : 2.92 GiB (90%)
Default                                  : Yes
Forced                                   : No
Color range                              : Limited
Color primaries                          : BT.709
Transfer characteristics                 : BT.709
Matrix coefficients                      : BT.709

Audio
ID                                       : 2
Format                                   : E-AC-3
Format/Info                              : Enhanced AC-3
Commercial name                          : Dolby Digital Plus
Codec ID                                 : A_EAC3
Duration                                 : 59 min 43 s
Bit rate mode                            : Constant
Bit rate                                 : 640 kb/s
Channel(s)                               : 6 channels
Channel layout                           : L R C LFE Ls Rs
Sampling rate                            : 48.0 kHz
Frame rate                               : 31.250 FPS (1536 SPF)
Compression mode                         : Lossy
Stream size                              : 273 MiB (8%)
Language                                 : English
Service kind                             : Complete Main
Default                                  : Yes
Forced                                   : No
Dialog Normalization                     : -31 dB
compr                                    : -0.28 dB
dialnorm_Average                         : -31 dB
dialnorm_Minimum                         : -31 dB
dialnorm_Maximum                         : -31 dB

Text
ID                                       : 3
Format                                   : UTF-8
Codec ID                                 : S_TEXT/UTF8
Codec ID/Info                            : UTF-8 Plain Text
Title                                    : English (SDH)
Language                                 : English
Default                                  : No
Forced                                   : No

The same file can be opened using the GUI

I have never used mediainfo before. My test of your issue is the first time I ever installed this application.

I recognize you have an issue - but I cannot reproduce with the information given.

:man_shrugging:

When I copy a file to the quoted name in the terminal, the characters appear to be double-byte, which suggests UTF-16, the encoding Windows filesystems such as NTFS use for file names. Perhaps you forgot to mention which filesystem the files are stored on?
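The byte-width question can be checked directly in a terminal. The following is only a minimal sketch: `byte_count` is a helper name made up for this illustration, and it assumes the terminal and shell are themselves running in UTF-8. In UTF-8 each of these three characters takes three bytes, whereas in UTF-16 each would take two.

```shell
# Count the bytes in a string exactly as the shell passes it along.
# (byte_count is a hypothetical helper, not part of any tool discussed here.)
byte_count() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

name='西游记'
echo "bytes in name: $(byte_count "$name")"

# Dump the raw bytes in hex so the encoding can be inspected by eye.
printf '%s' "$name" | od -An -tx1
```

On a UTF-8 system the count should be 9 for this three-character name; the same check can be run on a name copied from `ls` output to see what is really stored on disk.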

If the filesystem is ntfs - it may explain why you are having issues.

08:41:44 ○ [fh@tiger] ~
 $ ls -l 西游记.mkv
-rw-r--r-- 1 fh fh 3488469436 18 jun 14:17 西游记.mkv

My system is a default Manjaro installation using utf-8 for encoding.

Are your files on a Windows filesystem?

I have files stored on both NTFS and EXT4.
Files on both filesystems produce the same issue.

Would you be kind enough to share your locale settings?
I have checked / compared my current vs previous installation’s locale, and they are identical.
I’m really clueless right now as to what causes the issue.

Files on both NTFS and EXT4 produced the same issue.

For MediaInfo to function properly in Linux, the locale $LANG will need to be configured correctly. If your Manjaro system is UTF-8 - as is a typical Manjaro default - $LANG must also be UTF-8; for example: en_US.UTF-8:

Mine is

[nix@nix~]$ cat /etc/locale.conf
LANG=en_AU.UTF-8

Likewise for other languages, they should be formatted as UTF-8.
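To see which locale actually decides character handling for a session, something along these lines might help. It is a sketch only: the function names `effective_ctype` and `is_utf8_locale` are invented for this example, and it relies on the standard POSIX precedence in which LC_ALL overrides LC_CTYPE, which in turn overrides LANG.

```shell
# Print the locale that decides character encoding, following the POSIX
# precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
# (Both function names are hypothetical helpers for this illustration.)
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"
    else
        printf '%s\n' "${LANG:-C}"
    fi
}

# Succeed when a locale name such as en_US.UTF-8 declares a UTF-8 codeset.
is_utf8_locale() {
    case "$1" in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

current=$(effective_ctype)
if is_utf8_locale "$current"; then
    echo "effective locale $current is UTF-8"
else
    echo "effective locale $current is NOT UTF-8"
fi
```

Run it from a terminal started inside the desktop session, since that is the environment GUI applications inherit. `locale charmap` prints the codeset in effect and is a quick cross-check.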

Check /etc/locale.gen and make sure only utf-8 entries are uncommented; and not ISO-8859 or other variants.

cat /etc/locale.gen
sudo nano /etc/locale.gen

Run sudo locale-gen after editing to regenerate the locales, then reboot.
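Rather than reading through the whole file by eye, the enabled non-UTF-8 entries can be listed with a short filter. This is a rough sketch; `non_utf8_locales` is a name made up here, and it assumes the usual locale.gen layout of a locale name followed by a charset column (for example `en_US.UTF-8 UTF-8` or `en_US ISO-8859-1`).

```shell
# List enabled (uncommented) locale.gen entries whose charset column is not UTF-8.
# (non_utf8_locales is a hypothetical helper; the file path defaults to /etc/locale.gen.)
non_utf8_locales() {
    awk '
        /^[ \t]*#/    { next }   # skip commented-out entries
        NF == 0       { next }   # skip blank lines
        $2 != "UTF-8" { print }  # keep anything not generated as UTF-8
    ' "${1:-/etc/locale.gen}"
}

if [ -r /etc/locale.gen ]; then
    non_utf8_locales /etc/locale.gen
fi
```

No output means every enabled entry is UTF-8; any line printed is a candidate for commenting out.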

Sure - not sure you can deduce anything from it

 $ cat .config/plasma-localerc 
[Formats]
LANG=en_DK.UTF-8
 $ localectl
System Locale: LANG=en_DK.UTF-8
               LC_NUMERIC=da_DK.UTF-8
               LC_TIME=da_DK.UTF-8
               LC_MONETARY=da_DK.UTF-8
               LC_PAPER=da_DK.UTF-8
               LC_NAME=da_DK.UTF-8
               LC_ADDRESS=da_DK.UTF-8
               LC_TELEPHONE=da_DK.UTF-8
               LC_MEASUREMENT=da_DK.UTF-8
               LC_IDENTIFICATION=da_DK.UTF-8
    VC Keymap: dk-latin1
   X11 Layout: dk

That won’t work. The “?” is a problematic character as well: it is a globbing character that matches exactly one character, as opposed to “*”, which matches any number of characters.

You can include it in a file name, but then it too must be escaped, either with a backslash or by using quotes around the filename.
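The difference between the quoted and unquoted forms can be seen with a small experiment in a throwaway directory. This is just a sketch; the file names and the `count_args` helper are invented for the demonstration.

```shell
# Count how many arguments the shell passes after expanding a pattern.
# (count_args is a hypothetical helper for this illustration.)
count_args() { echo "$#"; }

# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
touch "$demo_dir/ABC?.mkv" "$demo_dir/ABCX.mkv"

# Unquoted: "?" is a wildcard and matches any single character.
unquoted_count=$(cd "$demo_dir" && count_args ABC?.mkv)
# Quoted: "?" is taken literally, so only the one real name is passed.
quoted_count=$(cd "$demo_dir" && count_args 'ABC?.mkv')

echo "unquoted ABC?.mkv expands to $unquoted_count name(s)"
echo "quoted 'ABC?.mkv' expands to $quoted_count name(s)"

rm -rf "$demo_dir"
```

The unquoted pattern picks up both files, while the quoted form refers only to the file that really has a question mark in its name.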

My current locale:

cat /etc/locale.conf
LANG=en_US.UTF-8
LC_CTYPE=en_US.UTF-8
cat .config/plasma-localerc
[Formats]
LANG=en_US.UTF-8
LC_MEASUREMENT=en_001.UTF-8
LC_PAPER=en_001.UTF-8
LC_TELEPHONE=en_SG.UTF-8
LC_ALL=C
localectl
System Locale: LANG=en_US.UTF-8
    VC Keymap: us
   X11 Layout: us

I’m still having no luck resolving the issue.

I have checked the locale settings in ~/.config that I saved before the reinstallation, and I didn’t find anything different from the current settings.

Could this be a KDE Framework issue?

One more observation: after changing the locale, MComix can finally open comics with Unicode filenames, e.g. 「戀人」.zip.

That’s something you would need to ask KDE.

I don’t see any obvious inconsistencies with your system locale. There’s also nothing in recent updates that seems related. My guess would still be a possible issue with MediaInfo; beyond that :man_shrugging: .

It’s the version from the official repositories, right? Not installed/built from the AUR; or possibly a flatpak or other containerized version?

I don’t think those are correct (actual existing) locales - perhaps this throws something off.

They are (I presume) the defaults provided by KDE, and not strictly related to the system locale. I doubt it has much impact, if any. Of course, I can’t easily test how this might affect MediaInfo in other locales.

The reason I highlight this question, is that if it’s an AUR version, it might need to be rebuilt (as AUR packages often need to be after an update).