Request hand holding for baloo indexing and search

I recently moved from Xubuntu to Manjaro KDE Plasma. I need some hand-holding to get search through my folders working with ‘baloo’.

I used System Settings > Search to enable file search. In the folder-specific configuration I chose NOT to index /home/abad. I chose to index the /home/abad/Documents folder, which is about 100 GB, including about 30 GB of Thunderbird mail folders.

I started indexing late in the evening and left it running as the only task, using “balooctl monitor” to display progress. The next morning, after 10+ hours of indexing, I noticed that it had got stuck on a small file. The System Settings > Search window showed 72% progress. The command “balooctl status” returned the following text:
Baloo File Indexer is running
Indexer state: Suspended
Total files indexed: 111,169
Files waiting for content indexing: 30,795
Files failed to index: 0
Current size of index is 4.14 GiB
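
As an aside, the two file counts in that output can be turned into a progress figure. The Python sketch below is a minimal, unofficial illustration. It assumes that “Total files indexed” includes the files still waiting for content indexing; that is an inference from the numbers in this post (it reproduces the 72% shown in System Settings), not documented Baloo behaviour. The names parse_count and content_progress_percent are made up for the example.

```python
import re

# Sample text copied from the `balooctl status` output quoted above.
STATUS_TEXT = """\
Baloo File Indexer is running
Indexer state: Suspended
Total files indexed: 111,169
Files waiting for content indexing: 30,795
Files failed to index: 0
Current size of index is 4.14 GiB
"""


def parse_count(text: str, label: str) -> int:
    """Return the integer that follows `label:` on its own line (commas stripped)."""
    match = re.search(rf"^{re.escape(label)}:\s*([\d,]+)\s*$", text, re.MULTILINE)
    if match is None:
        raise ValueError(f"label not found: {label!r}")
    return int(match.group(1).replace(",", ""))


def content_progress_percent(text: str) -> float:
    """Share of known files that are no longer waiting for content indexing.

    Assumption (inferred, not documented): "Total files indexed" includes the
    files still waiting, so the finished share is (total - waiting) / total.
    """
    total = parse_count(text, "Total files indexed")
    waiting = parse_count(text, "Files waiting for content indexing")
    if total == 0:
        return 0.0
    return 100.0 * (total - waiting) / total


if __name__ == "__main__":
    # For the sample above this prints 72.3%
    print(f"{content_progress_percent(STATUS_TEXT):.1f}%")
```

To use it on a live system, the sample text could be replaced by the captured output of “balooctl status”, provided the output has the same layout as shown here.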

I tried various options (suspend, resume, check, etc.) but could not get the indexing to finish. I do not recollect how I came to lose even the 4.14 GiB of index that had already been built.

I would be grateful if someone could help me:

  1. recover the lost 4.14 GiB index
  2. complete indexing of my Documents folder (about 100 GB)
  3. enjoy full-text search through my Documents folder.

My system information is as follows (inxi --admin --verbosity=7 --filter --width)

System:
  Kernel: 5.15.41-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-5.15-x86_64
    root=UUID=2e646780-2ffb-409b-b567-fb0780bd7300 rw rootflags=subvol=@ quiet
    apparmor=1 security=apparmor
    resume=UUID=2b349ccd-c725-40e4-927d-e49c60e5a66a udev.log_priority=3
  Desktop: KDE Plasma v: 5.24.5 tk: Qt v: 5.15.4 wm: kwin_x11 vt: 1 dm: SDDM
    Distro: Manjaro Linux base: Arch Linux
Machine:
  Type: Laptop System: LENOVO product: 81D2 v: Lenovo ideapad 330-15ARR
    serial: <superuser required> Chassis: type: 10 v: Lenovo ideapad 330-15ARR
    serial: <superuser required>
  Mobo: LENOVO model: LNVNB161216 v: SDK0Q55724 WIN
    serial: <superuser required> UEFI: LENOVO v: 7VCN46WW date: 12/10/2018
Battery:
  ID-1: BAT0 charge: 24.4 Wh (96.8%) condition: 25.2/35.0 Wh (72.1%)
    volts: 8.3 min: 7.6 model: LGC L17L2PF0 type: Li-poly serial: <filter>
    status: N/A
Memory:
  RAM: total: 7.38 GiB used: 5.01 GiB (67.8%)
  RAM Report:
    permissions: Unable to run dmidecode. Root privileges required.
CPU:
  Info: model: AMD Ryzen 3 2200U with Radeon Vega Mobile Gfx bits: 64
    type: MT MCP arch: Zen family: 0x17 (23) model-id: 0x11 (17) stepping: 0
    microcode: 0x810100B
  Topology: cpus: 1x cores: 2 tpc: 2 threads: 4 smt: enabled cache:
    L1: 192 KiB desc: d-2x32 KiB; i-2x64 KiB L2: 1024 KiB desc: 2x512 KiB
    L3: 4 MiB desc: 1x4 MiB
  Speed (MHz): avg: 1406 high: 1467 min/max: 1600/2500 boost: enabled
    scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 1395 2: 1467
    3: 1377 4: 1385 bogomips: 19969
  Flags: 3dnowprefetch abm adx aes aperfmperf apic arat avic avx avx2 bmi1
    bmi2 bpext clflush clflushopt clzero cmov cmp_legacy constant_tsc cpb cpuid
    cr8_legacy cx16 cx8 de decodeassists extapic extd_apicid f16c flushbyasid
    fma fpu fsgsbase fxsr fxsr_opt ht hw_pstate ibpb irperf lahf_lm lbrv lm
    mca mce misalignsse mmx mmxext monitor movbe msr mtrr mwaitx nonstop_tsc
    nopl npt nrip_save nx osvw overflow_recov pae pat pausefilter pclmulqdq
    pdpe1gb perfctr_core perfctr_llc perfctr_nb pfthreshold pge pni popcnt pse
    pse36 rapl rdrand rdseed rdtscp rep_good sep sev sev_es sha_ni skinit smap
    smca sme smep ssbd sse sse2 sse4_1 sse4_2 sse4a ssse3 succor svm svm_lock
    syscall tce topoext tsc tsc_scale v_vmsave_vmload vgif vmcb_clean vme
    vmmcall wdt xgetbv1 xsave xsavec xsaveerptr xsaveopt xsaves
  Vulnerabilities:
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: spec_store_bypass
    mitigation: Speculative Store Bypass disabled via prctl and seccomp
  Type: spectre_v1
    mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2
    mitigation: Retpolines, IBPB: conditional, STIBP: disabled, RSB filling
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: AMD Raven Ridge [Radeon Vega Series / Radeon Mobile Series]
    vendor: Lenovo driver: amdgpu v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
    ports: active: eDP-1 empty: HDMI-A-1 bus-ID: 03:00.0 chip-ID: 1002:15dd
    class-ID: 0300
  Device-2: IMC Networks EasyCamera type: USB driver: uvcvideo bus-ID: 3-1:2
    chip-ID: 13d3:5a02 class-ID: 0e02 serial: <filter>
  Display: x11 server: X.Org v: 21.1.3 compositor: kwin_x11 driver: X:
    loaded: amdgpu unloaded: modesetting alternate: fbdev,vesa gpu: amdgpu
    display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
    s-diag: 582mm (22.93")
  Monitor-1: eDP-1 mapped: eDP model: AU Optronics 0x38ed built: 2014
    res: 1920x1080 hz: 60 dpi: 142 gamma: 1.2 size: 344x193mm (13.54x7.6")
    diag: 394mm (15.5") ratio: 16:9 modes: max: 1920x1080 min: 640x480
  OpenGL: renderer: AMD RAVEN (LLVM 13.0.1 DRM 3.42 5.15.41-1-MANJARO)
    v: 4.6 Mesa 22.0.4 direct render: Yes
Audio:
  Device-1: AMD Raven/Raven2/Fenghuang HDMI/DP Audio vendor: Lenovo
    driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
    bus-ID: 03:00.1 chip-ID: 1002:15de class-ID: 0403
  Device-2: AMD Family 17h/19h HD Audio vendor: Lenovo driver: snd_hda_intel
    v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 03:00.6
    chip-ID: 1022:15e3 class-ID: 0403
  Sound Server-1: ALSA v: k5.15.41-1-MANJARO running: yes
  Sound Server-2: JACK v: 1.9.21 running: no
  Sound Server-3: PulseAudio v: 15.0 running: yes
  Sound Server-4: PipeWire v: 0.3.51 running: yes
Network:
  Device-1: Realtek RTL8821CE 802.11ac PCIe Wireless Network Adapter
    vendor: Lenovo driver: rtw_8821ce v: N/A modules: rtw88_8821ce pcie: gen: 1
    speed: 2.5 GT/s lanes: 1 port: 3000 bus-ID: 01:00.0 chip-ID: 10ec:c821
    class-ID: 0280
  IF: wlp1s0 state: up mac: <filter>
  IP v4: <filter> type: dynamic noprefixroute scope: global
    broadcast: <filter>
  IP v6: <filter> type: dynamic noprefixroute scope: global
  IP v6: <filter> type: noprefixroute scope: link
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    vendor: Lenovo driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1
    port: 2000 bus-ID: 02:00.0 chip-ID: 10ec:8168 class-ID: 0200
  IF: enp2s0 state: down mac: <filter>
  WAN IP: <filter>
Bluetooth:
  Device-1: Realtek Bluetooth Radio type: USB driver: btusb v: 0.8
    bus-ID: 3-2:3 chip-ID: 0bda:c024 class-ID: e001 serial: <filter>
  Report: rfkill ID: hci0 rfk-id: 2 state: up address: see --recommends
Logical:
  Message: No logical block device data found.
RAID:
  Message: No RAID data found.
Drives:
  Local Storage: total: 931.51 GiB used: 206.3 GiB (22.1%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/sda maj-min: 8:0 vendor: Toshiba model: MQ04ABF100
    size: 931.51 GiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
    type: HDD rpm: 5400 serial: <filter> rev: 3E scheme: GPT
  Message: No optical or floppy data found.
Partition:
  ID-1: / raw-size: 922.42 GiB size: 922.42 GiB (100.00%)
    used: 206.3 GiB (22.4%) fs: btrfs dev: /dev/sda2 maj-min: 8:2 label: N/A
    uuid: 2e646780-2ffb-409b-b567-fb0780bd7300
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 568 KiB (0.2%) fs: vfat dev: /dev/sda1 maj-min: 8:1 label: NO_LABEL
    uuid: B8BC-E27B
  ID-3: /home raw-size: 922.42 GiB size: 922.42 GiB (100.00%)
    used: 206.3 GiB (22.4%) fs: btrfs dev: /dev/sda2 maj-min: 8:2 label: N/A
    uuid: 2e646780-2ffb-409b-b567-fb0780bd7300
  ID-4: /run/timeshift/backup raw-size: 922.42 GiB
    size: 922.42 GiB (100.00%) used: 206.3 GiB (22.4%) fs: btrfs dev: /dev/sda2
    maj-min: 8:2 label: N/A uuid: 2e646780-2ffb-409b-b567-fb0780bd7300
  ID-5: /var/cache raw-size: 922.42 GiB size: 922.42 GiB (100.00%)
    used: 206.3 GiB (22.4%) fs: btrfs dev: /dev/sda2 maj-min: 8:2 label: N/A
    uuid: 2e646780-2ffb-409b-b567-fb0780bd7300
  ID-6: /var/log raw-size: 922.42 GiB size: 922.42 GiB (100.00%)
    used: 206.3 GiB (22.4%) fs: btrfs dev: /dev/sda2 maj-min: 8:2 label: N/A
    uuid: 2e646780-2ffb-409b-b567-fb0780bd7300
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default)
  ID-1: swap-1 type: partition size: 8.8 GiB used: 2 MiB (0.0%) priority: -2
    dev: /dev/sda3 maj-min: 8:3 label: swap
    uuid: 2b349ccd-c725-40e4-927d-e49c60e5a66a
Unmounted:
  Message: No unmounted partitions found.
USB:
  Hub-1: 1-0:1 info: Hi-speed hub with single TT ports: 4 rev: 2.0
    speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900
  Hub-2: 2-0:1 info: Super-speed hub ports: 4 rev: 3.1 speed: 10 Gb/s
    chip-ID: 1d6b:0003 class-ID: 0900
  Hub-3: 3-0:1 info: Hi-speed hub with single TT ports: 2 rev: 2.0
    speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900
  Device-1: 3-1:2 info: IMC Networks EasyCamera type: Video driver: uvcvideo
    interfaces: 2 rev: 2.0 speed: 480 Mb/s power: 500mA chip-ID: 13d3:5a02
    class-ID: 0e02 serial: <filter>
  Device-2: 3-2:3 info: Realtek Bluetooth Radio type: Bluetooth
    driver: btusb interfaces: 2 rev: 1.1 speed: 12 Mb/s power: 500mA
    chip-ID: 0bda:c024 class-ID: e001 serial: <filter>
  Hub-4: 4-0:1 info: Super-speed hub ports: 1 rev: 3.1 speed: 10 Gb/s
    chip-ID: 1d6b:0003 class-ID: 0900
Sensors:
  System Temperatures: cpu: N/A mobo: N/A gpu: amdgpu temp: 55.0 C
  Fan Speeds (RPM): N/A
Info:
  Processes: 252 Uptime: 56m wakeups: 6278 Init: systemd v: 250
  tool: systemctl Compilers: gcc: 12.1.0 clang: 13.0.1 Packages: 1334
  pacman: 1325 lib: 368 flatpak: 0 snap: 9 Shell: Zsh v: 5.9 default: Bash
  v: 5.1.16 running-in: konsole inxi: 3.3.16

Thank you.

I see your home folder is on BTRFS.
There is a bug already filed against Baloo: with every snapshot taken, it reindexes the new volume from scratch.
So, if you have 1 unique file with 1 unique keyword, then after 2 snapshots (+1 original file):

  1. Searching for this unique file will get you 3 results, all pointing to the same file.
  2. If your first index size is x MB, the first snapshot will make it 2x MB, and the second snapshot will make it 3x MB.
  3. Your system will be busy all the time indexing the same files, and will be too slow.
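
To make that arithmetic concrete, here is a small Python sketch. It only restates the claim above (one extra full copy of the index entries per snapshot); it is not a measurement of Baloo, and the function names and the 100 MB starting size are made up for illustration.

```python
def index_size_mb(base_mb: float, snapshots: int) -> float:
    """Index size if every snapshot adds one more full copy of the index data.

    Assumption (taken from the description above, not measured): each snapshot
    makes Baloo index the same files again, so the size grows linearly.
    """
    if snapshots < 0:
        raise ValueError("snapshots must be non-negative")
    return base_mb * (snapshots + 1)


def duplicate_hits(snapshots: int) -> int:
    """Search results for one unique file: the original plus one per snapshot."""
    if snapshots < 0:
        raise ValueError("snapshots must be non-negative")
    return snapshots + 1


if __name__ == "__main__":
    base = 100.0  # hypothetical first index size in MB
    for n in range(3):
        print(n, "snapshots:", index_size_mb(base, n), "MB,", duplicate_hits(n), "hits")
```

With a hypothetical 100 MB first index, two snapshots give 300 MB and 3 hits for the same file under this assumption.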

I have been there, tried BTRFS with Manjaro, then EndeavourOS, a lot of distro hopping for months… till I settled on Manjaro, with no BTRFS and enjoying a wonderful system, so responsive, so stable, so… so… so…

You may check my first posts on the forum here or at Endeavour… see how many problems I had, which I didn’t know were mainly caused by Baloo + BTRFS :skull_and_crossbones:

I hope Baloo fixes this bug, or that someone can suggest an alternative search tool that works with Dolphin the way Baloo does.

Just my experience.
So, since Baloo is as essential to you as it is to me, I would suggest reinstalling without BTRFS. I am using EXT4.

Maybe someone more experienced can tell you how to convert BTRFS to EXT4 without needing to reinstall or lose data.

Would like to hear from you.

P.S. I am absolutely not against BTRFS (to me it is how a file system should be), and I love Baloo, but they don’t love each other, so I can’t have them both, though having both would be great. I hope there is a solution or an alternative.

Thank you @limotux

I need btrfs for timeshift snapshots of my Manjaro system. I also need a good full-text search tool for about 100 GB of documents and mail folders. After two failed attempts at baloo indexing, I chose about 28 GB of pdf/odt/txt folders for indexing. However, the experience was very discouraging.

As of today, I am still looking for a good full-text search tool (either baloo or ??). I need help.

Here is my largely accurate indexing log (just in case it helps someone to guide my efforts)

**Baloo indexing log**.

1. Total size of the selected folders: 28 GB (largely pdf, odt, txt).
2. The indexing time shown is "dedicated indexing, without any other applications running".
3. The hours of indexing are approximate, with a margin of error of about 10%.
4. The indexing percentage is taken from System Settings > Search > File Search.

After about 15 hours indexing
220610 20:35hrs 81%
balooctl status
Baloo File Indexer is running
Indexer state: Suspended
Total files indexed: 106,226
Files waiting for content indexing: 19,468
Files failed to index: 0
Current size of index is 4.86 GiB

**Note : Indexing speed was very good during this first 15 hours of indexing. And thereafter indexing speed drastically deteriorated**.

After about 24 (15+9) hours indexing
balooctl status
Baloo File Indexer is running
Indexer state: Suspended
Total files indexed: 106,234
Files waiting for content indexing: 15,356
Files failed to index: 0
Current size of index is 5.18 GiB

After about 37 (24+13) hours indexing
220612 22:30 88%
balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 103,733
Files waiting for content indexing: 11,917
Files failed to index: 0
Current size of index is 5.31 GiB

After about 46.5 (37+9.5) hours indexing
220613 08:00 90%
balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 102,782
Files waiting for content indexing: 9,317
Files failed to index: 0
Current size of index is 5.35 GiB

After about 52.25 (46.5+5.75) hours indexing
220613 13:45 92%
balooctl status
Baloo File Indexer is running
Indexer state: Suspended
Total files indexed: 102,776
Files waiting for content indexing: 7,798
Files failed to index: 0
Current size of index is 5.38 GiB

After about 72.25 (52.25+20) hours indexing
23:00 97%
balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 102,780
Files waiting for content indexing: 2,165
Files failed to index: 0
Current size of index is 5.50 GiB

After about 79.25 (72.25+7) hours indexing
05am 71%
balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 150,174
Files waiting for content indexing: 43,161
Files failed to index: 0
Current size of index is 5.63 GiB

**Note : (i) Drastic increase in number of files to be indexed (there were no major file/folder movements, no new folders were added to indexing), (ii) % of indexing reduced from 97% to 71%**.

After about 83 (79.25+3.75) hours indexing
08:45  72%
balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 150,174
Files waiting for content indexing: 41,881
Files failed to index: 0
Current size of index is 5.65 GiB

After about 85.5 (83+2.5) hours indexing
11:15am 72%
balooctl status
Baloo File Indexer is running
Indexer state: Indexing file content
Total files indexed: 150,174
Files waiting for content indexing: 41,121
Files failed to index: 0
Current size of index is 5.73 GiB

**Note : On 18-June-2022 I gave up on baloo indexing and disabled file search (System Settings > Search > File Search)**.
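
For anyone who wants to put rough numbers on the log above, the Python sketch below computes how many files left the waiting queue per hour between consecutive entries, and a naive estimate of the time still needed. The (hours, waiting) pairs are copied from the log; the function names are made up for the example, and the estimate assumes the most recent rate would simply continue, which may well not hold.

```python
# (elapsed hours, files waiting for content indexing), copied from the log above.
# The series is split where the number of files suddenly jumped.
BEFORE_JUMP = [
    (15.0, 19_468),
    (24.0, 15_356),
    (37.0, 11_917),
    (46.5, 9_317),
    (52.25, 7_798),
    (72.25, 2_165),
]
AFTER_JUMP = [
    (79.25, 43_161),
    (83.0, 41_881),
    (85.5, 41_121),
]


def rates_per_hour(samples):
    """Files cleared from the waiting queue per hour between consecutive samples."""
    rates = []
    for (h0, w0), (h1, w1) in zip(samples, samples[1:]):
        rates.append((w0 - w1) / (h1 - h0))
    return rates


def hours_remaining(samples):
    """Naive estimate: last queue length divided by the most recent rate."""
    latest_rate = rates_per_hour(samples)[-1]
    return samples[-1][1] / latest_rate


if __name__ == "__main__":
    print("before jump:", [round(r) for r in rates_per_hour(BEFORE_JUMP)])
    print("after jump: ", [round(r) for r in rates_per_hour(AFTER_JUMP)])
    print("hours remaining (naive):", round(hours_remaining(AFTER_JUMP), 1))
```

With the figures from this log, the last interval works out to about 304 files per hour, so the remaining 41,121 files would need roughly 135 more hours at that pace.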

Well, let’s hope Baloo fixes that bug.