Best approach to extract info in Google Sheets?

anon39617291 · 20 January 2025 00:06

Hi all.
I’m cracking my head, trying to extract info using Google Sheets.

The situation:
In the cell, the data is
<media ref="/ABC/XYZ123/QWE123/ASD/AAA BBB.123.CCC-DDD.mkv">

ABC, XYZ, QWE, ASD, AAA, BBB, CCC, DDD are random words.
123 are random numbers as well.

I want to retrieve AAA BBB.123.CCC-DDD.mkv in another cell.

I tried to use regextract, but other than the / and .mkv, I cannot find a constant to leverage on.
Any suggestion?

cscs · 20 January 2025 00:22

I’m not sure how you mean … as in are we working with the sheet itself or just the data?

$ printf '<media ref="/ABC/XYZ123/QWE123/ASD/AAA BBB.123.CCC-DDD.mkv">' | sed -e 's:.*/::' -e 's/..$//'
AAA BBB.123.CCC-DDD.mkv

anon39617291 · 20 January 2025 00:29

Hmm…
Say, cell A1 in Google Sheet, the data is:
<media ref="/ABC/XYZ123/QWE123/ASD/AAA BBB.123.CCC-DDD.mkv">

Now, I want cell A2 in Google Sheet, to “extract” the content from A1.
The needed content is:
AAA BBB.123.CCC-DDD.mkv

The problem is:
I tried to use regextract formula, but doesn’t seem like a fit here.
So, I’m hoping someone would have an idea how to do it.

linux-aarhus · 20 January 2025 11:11

search for the split() function in the documentation

Takakage · 20 January 2025 14:54

=regexextract(A1,"[^/]+ \S+\.mkv")

anon39617291 · 20 January 2025 15:31

After testing it,

This works for:
<media ref="/ABC/XYZ123/QWE123/ASD/AAA BBB.123.CCC-DDD.mkv">

But for:
<media ref="/ABC/XYZ123/QWE123/ASD/AAA.BBB.123.CCC-DDD.mkv">
It doesn’t work due to no white space present.

So the whole line was extracted in A2:
<media ref="/ABC/XYZ123/QWE123/ASD/AAA.BBB.123.CCC-DDD.mkv">

Additional thought:
instead of using .mkv as end-point, how should I use "> as end-point instead?

This way, I can include other type of file extensions.

Takakage · 20 January 2025 17:11

I have no expertise with either regular expressions or Google Sheets, so I am sure that someone else can come up with a more elegant solution, but this seems to work:

=regexextract(A1,"/([^/]+)"">")

Note that your question is probably better suited for a forum specific to Google Sheets or spreadsheets in general.

anon39617291 · 20 January 2025 17:44

Beautiful!
Elegant approach!

Nachlese · 20 January 2025 17:48

regex should be way easier than learning a natural foreign language,
but learning and grasping regular expressions is still very, very hard (for me)

… too much rigorous logic for me

I’ll keep trying to understand that one.

anon39617291 · 20 January 2025 18:17

His approach is to include everything between / and ">.

But the use of ^/ means the content must not contain /, which effectively pinpoint the string from the last / to ">, instead of any / in the string.

Simple and elegant.

Nachlese · 20 January 2025 18:42

Thank you for elaborating on what apparently works!
I’ll keep trying to understand.

It’s much easier for me when having some actual examples rather than some abstractions.

No worries - I was just trying to follow.
I’ll get it eventually.

system · 23 January 2025 18:42

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.