Best approach to extract info in Google Sheets?

Hi all.
I’m cracking my head, trying to extract info using Google Sheets.

The situation:
In the cell, the data is
<media ref="/ABC/XYZ123/QWE123/ASD/AAA BBB.123.CCC-DDD.mkv">

ABC, XYZ, QWE, ASD, AAA, BBB, CCC, DDD are random words.
123 are random numbers as well.

I want to retrieve AAA BBB.123.CCC-DDD.mkv in another cell.

I tried to use regextract, but other than the / and .mkv, I cannot find a constant to leverage on.
Any suggestion?

I’m not sure how you mean … as in are we working with the sheet itself or just the data?

$ printf '<media ref="/ABC/XYZ123/QWE123/ASD/AAA BBB.123.CCC-DDD.mkv">' | sed -e 's:.*/::' -e 's/..$//'
AAA BBB.123.CCC-DDD.mkv

Hmm…
Say, cell A1 in Google Sheet, the data is:
<media ref="/ABC/XYZ123/QWE123/ASD/AAA BBB.123.CCC-DDD.mkv">

Now, I want cell A2 in Google Sheet, to “extract” the content from A1.
The needed content is:
AAA BBB.123.CCC-DDD.mkv

The problem is:
I tried to use regextract formula, but doesn’t seem like a fit here.
So, I’m hoping someone would have an idea how to do it.

search for the split() function in the documentation

=regexextract(A1,"[^/]+ \S+\.mkv")

After testing it,

This works for:
<media ref="/ABC/XYZ123/QWE123/ASD/AAA BBB.123.CCC-DDD.mkv">

But for:
<media ref="/ABC/XYZ123/QWE123/ASD/AAA.BBB.123.CCC-DDD.mkv">
It doesn’t work due to no white space present.

So the whole line was extracted in A2:
<media ref="/ABC/XYZ123/QWE123/ASD/AAA.BBB.123.CCC-DDD.mkv">

Additional thought:
instead of using .mkv as end-point, how should I use "> as end-point instead?

This way, I can include other type of file extensions.

I have no expertise with either regular expressions or Google Sheets, so I am sure that someone else can come up with a more elegant solution, but this seems to work:

=regexextract(A1,"/([^/]+)"">")

Note that your question is probably better suited for a forum specific to Google Sheets or spreadsheets in general.

3 Likes

Beautiful!
Elegant approach!

1 Like

regex should be way easier than learning a natural foreign language,
but learning and grasping regular expressions is still very, very hard (for me)

… too much rigorous logic for me :slight_smile:



I’ll keep trying to understand that one.

His approach is to include everything between / and ">.

But the use of ^/ means the content must not contain /, which effectively pinpoint the string from the last / to ">, instead of any / in the string.

Simple and elegant.

Thank you for elaborating on what apparently works!
I’ll keep trying to understand.

It’s much easier for me when having some actual examples rather than some abstractions.

No worries - I was just trying to follow.
I’ll get it eventually.

1 Like

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.