how to find file info in not-first rar file? #56

Closed
sanderjo opened this issue Sep 23, 2019 · 13 comments

sanderjo commented Sep 23, 2019

With rarfile, how can I find the files in a rar file other than the first?

With the unrar tool it's easy:

$ unrar l mytestrar.part5.rar

UNRAR 5.50 freeware      Copyright (c) 1993-2017 Alexander Roshal

Archive: mytestrar.part5.rar
Details: RAR 5, volume 5

 Attributes      Size     Date    Time   Name
----------- ---------  ---------- -----  ----
 -rw-rw-r--  20971520  2019-09-23 12:08  blabla.bin
 -rw-rw-r--  10485760  2019-09-23 12:08  rrrjfjfj.bin
----------- ---------  ---------- -----  ----
             10485760                    1

But with rarfile I get an error:

$ python3 -c "import rarfile; rf = rarfile.RarFile('mytestrar.part5.rar');"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/rarfile.py", line 685, in __init__
    self._parse()
  File "/usr/local/lib/python3.6/dist-packages/rarfile.py", line 879, in _parse
    self._file_parser.parse()
  File "/usr/local/lib/python3.6/dist-packages/rarfile.py", line 998, in parse
    self._parse_real()
  File "/usr/local/lib/python3.6/dist-packages/rarfile.py", line 1056, in _parse_real
    % (h.main_volume_number,)
rarfile.NeedFirstVolume: Need to start from first volume (current: 4)

FWIW: the filenames are in plaintext in the part5.rar file:

$ strings mytestrar.part5.rar
Rar!
blabla.bin
rrrjfjfj.bin
blabla.bin
        `:/(=
rrrjfjfj.bin
 

and

$ cat mytestrar.part5.rar | hd
00000000  52 61 72 21 1a 07 01 00  26 0f a8 76 0e 01 05 09  |Rar!....&..v....|
00000010  03 04 08 01 01 cf fe ff  82 80 00 bc 72 41 f7 30  |............rA.0|
00000020  02 0b 0b c3 86 c0 80 80  00 04 80 80 80 8a 80 00  |................|
00000030  b4 83 02 17 34 77 38 80  00 01 0a 62 6c 61 62 6c  |....4w8....blabl|
00000040  61 2e 62 69 6e 0a 03 13  a3 99 88 5d 25 c8 1f 09  |a.bin......]%...|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00100390  00 00 00 92 f9 38 21 32  02 13 0b 8d f7 bf 82 80  |.....8!2........|
001003a0  00 04 80 80 80 85 80 00  b4 83 02 de 14 c2 d7 80  |................|
001003b0  00 01 0c 72 72 72 6a 66  6a 66 6a 2e 62 69 6e 0a  |...rrrjfjfj.bin.|
001003c0  03 13 ad 99 88 5d 93 4b  2e 0d 00 00 00 00 00 00  |.....].K........|
001003d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
005fff50  00 00 00 00 00 00 00 bb  70 14 cf 0e 03 06 82 01  |........p.......|
005fff60  00 82 01 00 80 00 01 02  51 4f b3 e1 5f 52 3b 00  |........QO.._R;.|
005fff70  bc fe ff 02 35 bc 72 41  f7 30 02 0b 0b c3 86 c0  |....5.rA.0......|
005fff80  80 80 00 04 80 80 80 8a  80 00 b4 83 02 17 34 77  |..............4w|
005fff90  38 80 00 01 0a 62 6c 61  62 6c 61 2e 62 69 6e 0a  |8....blabla.bin.|
005fffa0  03 13 a3 99 88 5d 25 c8  1f 09 60 3a 2f 28 3d 00  |.....]%...`:/(=.|
005fffb0  c4 f7 bf 02 37 92 f9 38  21 32 02 13 0b 8d f7 bf  |....7..8!2......|
005fffc0  82 80 00 04 80 80 80 85  80 00 b4 83 02 de 14 c2  |................|
005fffd0  d7 80 00 01 0c 72 72 72  6a 66 6a 66 6a 2e 62 69  |.....rrrjfjfj.bi|
005fffe0  6e 0a 03 13 ad 99 88 5d  93 4b 2e 0d 8b 47 51 26  |n......].K...GQ&|
005ffff0  03 05 04 01 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00600000
Owner

markokr commented Sep 23, 2019

If you open first rar with RarFile, you get all files from all volumes.

Author

sanderjo commented Sep 23, 2019

> If you open first rar with RarFile, you get all files from all volumes.

Thanks for replying.

I think that is only true if all rar files are nicely and correctly named .partNN.rar, but in my case the file names are obfuscated and the directory contains mixed sets.

-rw-r--r--  1 sander sander 4194304 sep 23 20:11 06120ece-777c-41ce-8b3d-36cd4ca2379e
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 11b8c93a-b48b-4df7-90bc-479f311817e4
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 168fd2c0-9d1d-42b2-9298-33aff421b772
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 1df328fb-c0d3-4a21-a776-ee36144827b5
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 2249e985-2435-44a9-b729-e70255faebdc
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 2b96d7ac-4af6-4111-ac48-f0e20e468624
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 3149df5c-4062-47a2-9e64-d8b01ea3c234
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 3e130dc0-5c88-4587-b21d-728e5f8b4ba4
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 3eeb2bd3-d46c-44c3-a008-1cf09046934d
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 5526057a-5c32-4c48-afa4-1d44ead87b74
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 695541b5-71da-458f-8ebb-2669834b2e31
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 6c9b4918-687a-4223-935d-e09e466ddccf
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 7bdc9111-9dae-41f1-a015-5f7ef00bff68
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 958a40ad-091e-4102-b090-9a955c1aacd6
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 99b27813-9544-4436-b40e-8b3b274b329b
-rw-r--r--  1 sander sander 3149411 sep 23 20:11 9bdfacdb-40c9-46a2-bd54-ca6010067607
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 c06ad989-6c96-4fd3-849e-5da3182a4f7b
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 c7dfb39d-c14c-4adc-9b6b-9b60b5bd026f
-rw-r--r--  1 sander sander 4194304 sep 23 20:11 eb651a6e-2aea-4f75-85d4-50abe0977464

I wrote an unrar/bash script to find the contents:

$ ~/inspect-all-files.sh .
Directory .
File 06120ece-777c-41ce-8b3d-36cd4ca2379e:	RAR-file. Volume 16 blabla.bin   ccc.bin   
File 11b8c93a-b48b-4df7-90bc-479f311817e4:	RAR-file. Volume 11 blabla.bin   
File 168fd2c0-9d1d-42b2-9298-33aff421b772:	RAR-file. Volume 8 bbbbbbbbb.bin   blabla.bin   
File 1df328fb-c0d3-4a21-a776-ee36144827b5:	RAR-file. Volume 4 bbbbbbbbb.bin   
File 2249e985-2435-44a9-b729-e70255faebdc:	RAR-file. Volume 2 aaaaaa.bin   
File 2b96d7ac-4af6-4111-ac48-f0e20e468624:	RAR-file. Volume 13 blabla.bin   
File 3149df5c-4062-47a2-9e64-d8b01ea3c234:	RAR-file. Volume 12 blabla.bin   
File 3e130dc0-5c88-4587-b21d-728e5f8b4ba4:	RAR-file. Volume 15 blabla.bin   
File 3eeb2bd3-d46c-44c3-a008-1cf09046934d:	RAR-file. Volume 1 aaaaaa.bin   
File 5526057a-5c32-4c48-afa4-1d44ead87b74:	RAR-file. Volume 14 blabla.bin   
File 695541b5-71da-458f-8ebb-2669834b2e31:	RAR-file. Volume 5 bbbbbbbbb.bin   
File 6c9b4918-687a-4223-935d-e09e466ddccf:	RAR-file. Volume 9 blabla.bin   
File 7bdc9111-9dae-41f1-a015-5f7ef00bff68:	RAR-file. Volume 10 blabla.bin   
File 958a40ad-091e-4102-b090-9a955c1aacd6:	RAR-file. Volume 3 aaaaaa.bin   bbbbbbbbb.bin   
File 99b27813-9544-4436-b40e-8b3b274b329b:	RAR-file. Volume 18 ccc.bin   
File 9bdfacdb-40c9-46a2-bd54-ca6010067607:	RAR-file. Volume 19 ccc.bin   
File c06ad989-6c96-4fd3-849e-5da3182a4f7b:	RAR-file. Volume 6 bbbbbbbbb.bin   
File c7dfb39d-c14c-4adc-9b6b-9b60b5bd026f:	RAR-file. Volume 17 ccc.bin   
File eb651a6e-2aea-4f75-85d4-50abe0977464:	RAR-file. Volume 7 bbbbbbbbb.bin   

So:

  1. all contents are visible with unrar
  2. 3eeb2bd3-d46c-44c3-a008-1cf09046934d is volume number 1, so let's check that:
$ python myrarstuff.py 3eeb2bd3-d46c-44c3-a008-1cf09046934d
aaaaaa.bin 10485760

So only the contents of the first rar file ... :-(

If this is not possible with the rarfile module, I'll use unrar, but that will cost a system call per file.

For reference:

myrarstuff.py

import rarfile
import sys

# Print name and uncompressed size of every entry rarfile reports for the given archive
rf = rarfile.RarFile(sys.argv[1])
for f in rf.infolist():
    print(f.filename, f.file_size)

inspect-all-files.sh


#!/bin/bash
# For every file in directory $1: print the RAR volume number and contents, or the file type
echo "Directory" "$1"
cd "$1" || exit 1
for f in *
do
	if [ -f "$f" ]; then
		echo -n -e "File $f:\t"
		# check if rar file (relies on unrar reporting "not RAR" for anything else):
		if unrar l "$f" | grep -q "not RAR"; then
			# not a rar: let file(1) describe it
			file "$f"
		else
			echo -n "RAR-file. Volume "
			# volume number, old rar format:
			unrar l "$f" | grep -A20 "RAR 4" | awk '/  volume/ { print $3 }' | tr -d "\n"
			# volume number, RAR5 format:
			unrar l "$f" | grep -e "^Details" | awk '/RAR 5/ { print $NF }' | tr -d "\n"
			echo -n " "
			# contents: rows starting with a space, minus the header and summary rows
			unrar l "$f" | grep -e "^ " | sed '1d;$d' | cut -c42- | tr "\n" "*" | sed -e 's/\*/   /g'
			echo ""
		fi
	fi
done
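
For comparison, the same extraction can be sketched in Python on text already captured from `unrar l`. This is a rough illustration only: the function name `parse_unrar_listing` is made up here, and it assumes the RAR5 listing layout shown in the first post (a `Details: RAR 5, volume N` line and a table framed by dashed separator lines). Other unrar versions or the RAR 4 layout may need different patterns.

```python
import re

# Sample copied from the `unrar l mytestrar.part5.rar` output earlier in this thread.
SAMPLE = """\
Archive: mytestrar.part5.rar
Details: RAR 5, volume 5

 Attributes      Size     Date    Time   Name
----------- ---------  ---------- -----  ----
 -rw-rw-r--  20971520  2019-09-23 12:08  blabla.bin
 -rw-rw-r--  10485760  2019-09-23 12:08  rrrjfjfj.bin
----------- ---------  ---------- -----  ----
             10485760                    1
"""


def parse_unrar_listing(text):
    """Return (volume_number, [names]) from RAR5-style `unrar l` output."""
    volume = None
    names = []
    in_table = False
    for line in text.splitlines():
        match = re.match(r"^Details:.*\bvolume (\d+)", line)
        if match:
            volume = int(match.group(1))
            continue
        stripped = line.strip()
        # A line made only of dashes and spaces opens or closes the file table.
        if stripped and set(stripped) <= {"-", " "}:
            in_table = not in_table
            continue
        if in_table:
            # Columns: attributes, size, date, time, name (name may contain spaces).
            parts = line.split(None, 4)
            if len(parts) == 5:
                names.append(parts[4])
    return volume, names


volume, names = parse_unrar_listing(SAMPLE)
print(volume, names)
```

This still needs one `unrar` invocation per file to produce the text, so it does not remove the system-call cost mentioned above.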

Owner

markokr commented Sep 23, 2019

Heh, this is definitely not a common use case...

I think you simply need to comment out those errors in RarFile, so they don't get thrown. Secondly, you need to install info_callback so you can see all headers that pass by. If you collect headers of all partial files from the start and end of archives, you can then try to match them and put volumes into sequence.
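
The matching step could look roughly like the sketch below. It does not touch rarfile at all: it assumes you have already collected, per obfuscated file, a volume number and the inner file names (the sample records are the first four volumes from the script output above). The helper names `volumes_per_file` and `missing_volumes` are invented for this illustration, and the approach assumes a single archive set; with several sets mixed in one directory, volume numbers would collide and more header data would be needed to tell them apart.

```python
from collections import defaultdict


def volumes_per_file(records):
    """Map each inner file name to its (volume, path) pairs, ordered by volume.

    records: iterable of (path, volume_number, [inner file names]).
    """
    by_name = defaultdict(list)
    for path, volume, names in sorted(records, key=lambda r: r[1]):
        for name in names:
            by_name[name].append((volume, path))
    return dict(by_name)


def missing_volumes(parts):
    """Return volume numbers absent between the first and last known part."""
    present = {volume for volume, _ in parts}
    first, last = min(present), max(present)
    return [v for v in range(first, last + 1) if v not in present]


# Records taken from the inspect-all-files.sh output in this thread.
records = [
    ("958a40ad-091e-4102-b090-9a955c1aacd6", 3, ["aaaaaa.bin", "bbbbbbbbb.bin"]),
    ("3eeb2bd3-d46c-44c3-a008-1cf09046934d", 1, ["aaaaaa.bin"]),
    ("1df328fb-c0d3-4a21-a776-ee36144827b5", 4, ["bbbbbbbbb.bin"]),
    ("2249e985-2435-44a9-b729-e70255faebdc", 2, ["aaaaaa.bin"]),
]

for name, parts in volumes_per_file(records).items():
    print(name, [path for _, path in parts], "missing:", missing_volumes(parts))
```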

@sanderjo
Author

OK, like

rarfile/test/test_api.py

Lines 195 to 205 in cf92552

def test_infocb():
    infos = []

    def info_cb(info):
        infos.append((info.type, info.needs_password(), info.isdir(), info._must_disable_hack()))

    rf = rarfile.RarFile('test/files/seektest.rar', info_callback=info_cb)
    assert infos == [
        (rarfile.RAR_BLOCK_MAIN, False, False, False),
        (rarfile.RAR_BLOCK_FILE, False, False, False),
        (rarfile.RAR_BLOCK_FILE, False, False, False),
        (rarfile.RAR_BLOCK_ENDARC, False, False, False)]
?

Thank you for your help!

Contributor

Safihre commented Sep 24, 2019

Indeed I modified the RarFile we use in our application to have an extra parameter to ignore first-volume errors and be more robust against mid-volume file-listings:
sabnzbd/sabnzbd@de6d642#diff-5201af00afcd70a08ebe955593b071c1R1021

Author

sanderjo commented Sep 25, 2019

> Indeed I modified the RarFile we use in our application to have an extra parameter to ignore first-volume errors and be more robust against mid-volume file-listings:

Nice, that works: sabnzbd/sabnzbd#1331 (comment)

Contributor

Safihre commented Oct 8, 2020

@markokr I tried to apply a similar patch to the new RarFile, but it has been quite hard.
Would you be willing to reconsider adding a flag that limits RarFile to one file, even if it is part of a multi-part series?
That way we could get file info and contents from a single file that is part of a series, on its own.

We use RarFile a lot in our application SABnzbd and I would really like to include the original version, instead of the old modified RarFile version we still use now :)

Owner

markokr commented Oct 9, 2020

Sounds like a reasonable request, I'll look into it.

markokr reopened this Oct 9, 2020
markokr closed this as completed in b2d2507 Oct 9, 2020

Owner

markokr commented Oct 9, 2020

Please test with 'master': add the part_only=True flag to RarFile.

Contributor

Safihre commented Oct 12, 2020

Thanks for looking into this! But it doesn't seem to work:

>>> import rarfile
>>> rarfile.__version__
'4.1a1'
>>> aa=rarfile.RarFile(r"C:\Users\saf\Downloads\84868a4ced3d9ff30597d5b54d066a53.part11.rar",part_only=True)
>>> aa.infolist()
[]
>>> aa.namelist()
[]

Our own modified old RarFile:

>>> import sabnzbd.utils.rarfile as rf_sab
>>> bb=rf_sab.RarFile(r"C:\Users\saf\Downloads\84868a4ced3d9ff30597d5b54d066a53.part11.rar",single_file_check=True)
>>> bb.namelist()
['84868a4ced3d9ff30597d5b54d066a53.mkv']
>>> bb.infolist()
[<sabnzbd.utils.rarfile.Rar5FileInfo object at 0x0000027B1ABFC040>]

Owner

markokr commented Oct 13, 2020

Please use info_callback to collect archive records.

.namelist()/.infolist() operate on a sanitized file list (e.g. no versioned files), so they will not be robust for volume mapping.
And allowing part_only to affect all that filtering logic seems wrong.
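
A minimal sketch of that suggestion, for readers following along: a small callable is passed as `info_callback` and keeps only file headers. The names `FileHeaderCollector` and `list_single_part` are hypothetical. The sketch assumes that `info.type` equals `rarfile.RAR_BLOCK_FILE` for file headers, as in the test quoted earlier in this thread for one archive; whether that holds for every archive format has not been checked here.

```python
class FileHeaderCollector:
    """Callable for RarFile's info_callback that remembers file headers only."""

    def __init__(self, file_block_type):
        # Block type that marks a file header (rarfile.RAR_BLOCK_FILE in practice).
        self.file_block_type = file_block_type
        self.entries = []  # (filename, file_size) in the order headers were seen

    def __call__(self, info):
        if getattr(info, "type", None) != self.file_block_type:
            return  # main header, end-of-archive marker, etc.
        entry = (info.filename, info.file_size)
        # The same header can pass by more than once, so skip repeats.
        if entry not in self.entries:
            self.entries.append(entry)


def list_single_part(path):
    """Return (filename, file_size) pairs seen in one volume of a set."""
    import rarfile  # third-party; imported here so the collector works without it

    collector = FileHeaderCollector(rarfile.RAR_BLOCK_FILE)
    # Headers are parsed while the RarFile object is constructed.
    rarfile.RarFile(path, info_callback=collector, part_only=True)
    return collector.entries
```

Unlike patching the parser's private lists, this keeps the collected records separate from `.namelist()`/`.infolist()`, at the cost of not being a drop-in replacement for them.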

Contributor

Safihre commented Oct 18, 2020

Thank you for the info_callback hint! I created my own wrapper class to do what we need:

import rarfile as rf


class SabRarFile(rf.RarFile):
    def __init__(self, *args, **kwargs):
        """Patch RarFile-call when using `part_only`
        to store filenames inside the RAR-files"""
        if kwargs.get("part_only"):
            kwargs["info_callback"] = self.info_callback
        
        # Let RarFile handle the rest!
        super().__init__(*args, **kwargs)

    def info_callback(self, rar_obj: rf.RarInfo):
        """Called for every RarInfo-object found"""
        # We only care about files inside the Rar
        # For Rar5 there is a separate object, for Rar3 we need to check if a filename was parsed
        if isinstance(rar_obj, (rf.Rar5FileInfo, rf.Rar3Info)) and rar_obj.filename:
            # Avoid duplicates
            if rar_obj not in self._file_parser._info_list:
                self._file_parser._info_list.append(rar_obj)
                self._file_parser._info_map[rar_obj.filename.rstrip("/")] = rar_obj

Which works as we need it:

# Rar3
bb = SabRarFile(r"C:\Users\saf\Downloads\SkypeMeetingsApp.part2.rar", part_only=True)
print(bb.namelist())
print(bb.infolist())

['SkypeMeetingsApp.msi']
[<rarfile.Rar3Info object at 0x00000276877D3BB0>]


# Rar5
bb=SabRarFile(r"C:\Users\saf\Downloads\84868a4ced3d9ff30597d5b54d066a53.part11.rar", part_only=True)
print(bb.namelist())
print(bb.infolist())

['84868a4ced3d9ff30597d5b54d066a53.mkv']
[<rarfile.Rar5FileInfo object at 0x0000021C25C387C0>]

Contributor

Safihre commented Oct 18, 2020

When you create a new rarfile release, we can start using it 🥳
