Old 08-19-2019, 06:57 PM   #1
dsyrock
Human being with feelings
 
Join Date: Sep 2018
Location: China
Posts: 565
How to tell whether two files are the same one or not

For example, I have two files, D:\1.wav and E:\1.wav. Is it possible to compare them with the internal, SWS, or JS API and tell whether they are the same file or just have the same name?
Old 08-19-2019, 09:30 PM   #2
mpl
Human being with feelings
 
Join Date: Oct 2013
Location: Moscow, Russia
Posts: 3,960

Use io.open in binary mode, then check the content string length and some random bytes. No extension needed.
Old 08-19-2019, 09:39 PM   #3
dsyrock
Human being with feelings
 
Join Date: Sep 2018
Location: China
Posts: 565

Quote:
Originally Posted by mpl View Post
io.open in binary mode, check content string length and some random bytes, no extension needed
Thanks mpl!
Old 08-20-2019, 05:49 AM   #4
X-Raym
Human being with feelings
 
Join Date: Apr 2013
Location: France
Posts: 9,875

Rather than comparing bit by bit, just compare checksums :P
Old 08-20-2019, 08:53 PM   #5
dsyrock
Human being with feelings
 
Join Date: Sep 2018
Location: China
Posts: 565

Quote:
Originally Posted by X-Raym View Post
Rather than compare bit per bit, just compare Checksum :P
Can you give me a few more hints about how to compare checksums, please?
Old 08-21-2019, 01:51 AM   #6
X-Raym
Human being with feelings
 
Join Date: Apr 2013
Location: France
Posts: 9,875

Use a checksum / MD5 Lua library; there are some on GitHub:


https://github.com/kikito/md5.lua


https://github.com/philanc/plc/tree/master/plc


I didn't test it, but the main idea is that it outputs a short string derived from the file's contents,
so if two files are different, it will almost certainly output different strings (the same string for two different files is possible, but very unlikely with a good hash).


You only need to make a string comparison based on that.
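
A minimal sketch of the idea, with a swapped-in algorithm: to stay self-contained it uses Adler-32 (a simple, weak checksum) rather than MD5 from the libraries linked above, so treat it as an illustration only. The function names adler32_update, adler32_hex and file_checksum are made up for this example; only the standard Lua io and string libraries are used.

```lua
-- Adler-32 over a string, continuing from a previous (a, b) state.
-- NOTE: Adler-32 is a stand-in here; it is much weaker than MD5.
local function adler32_update(a, b, data)
  for i = 1, #data do
    a = (a + data:byte(i)) % 65521
    b = (b + a) % 65521
  end
  return a, b
end

-- Checksum of an in-memory string, as 8 hex digits.
local function adler32_hex(data)
  local a, b = adler32_update(1, 0, data)
  return string.format("%04x%04x", b, a)
end

-- Checksum of a file, read in 64 KB chunks so large files are not
-- loaded into memory at once. Returns nil plus a message on failure.
local function file_checksum(path)
  local f = io.open(path, "rb")
  if not f then return nil, "could not open " .. path end
  local a, b = 1, 0
  while true do
    local chunk = f:read(64 * 1024)
    if not chunk then break end
    a, b = adler32_update(a, b, chunk)
  end
  f:close()
  return string.format("%04x%04x", b, a)
end
```

Two files can then be compared with `file_checksum("D:\\1.wav") == file_checksum("E:\\1.wav")`. Equal checksums only suggest equal files; with a weak checksum like this one, different files can produce the same value.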
Old 08-21-2019, 03:06 AM   #7
Xenakios
Human being with feelings
 
Join Date: Feb 2007
Location: Oulu, Finland
Posts: 8,062

Quote:
Originally Posted by X-Raym View Post
You only need to make a string comparison based on that.
But to get the checksum, the file of course has to be read and processed fully first anyway, so using checksums doesn't necessarily make anything faster. For example, if you need to compare two files just once, you might as well compare them directly at the byte level. That could even be faster than calculating the checksums, because the comparison can exit early at the first difference between the files, while a checksum requires going through the entire file.
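
A rough sketch of that early-exit idea: it checks the sizes first, then reads both files in chunks and stops at the first chunk that differs. The name files_equal and the 64 KB default chunk size are choices made for this illustration, not part of the REAPER or SWS API; only the standard Lua io library is used.

```lua
-- Returns true if both files have identical contents, false if they
-- differ, or nil plus a message if either file cannot be opened.
local function files_equal(path1, path2, chunk_size)
  chunk_size = chunk_size or 64 * 1024
  local f1 = io.open(path1, "rb")
  local f2 = io.open(path2, "rb")
  if not f1 or not f2 then
    if f1 then f1:close() end
    if f2 then f2:close() end
    return nil, "could not open one of the files"
  end

  -- Cheap first test: files of different sizes cannot be identical.
  local equal = f1:seek("end") == f2:seek("end")

  if equal then
    f1:seek("set")
    f2:seek("set")
    while true do
      local a, b = f1:read(chunk_size), f2:read(chunk_size)
      if a ~= b then
        equal = false   -- early exit at the first differing chunk
        break
      end
      if a == nil then break end   -- both files ended without a mismatch
    end
  end

  f1:close()
  f2:close()
  return equal
end
```

Usage would look like `files_equal("D:\\1.wav", "E:\\1.wav")`. Comparing whole chunks as strings keeps the inner loop out of Lua, which tends to be faster than indexing one byte at a time.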
__________________
I am no longer part of the REAPER community. Please don't contact me with any REAPER-related issues.

Last edited by Xenakios; 08-21-2019 at 03:12 AM.
Old 08-21-2019, 09:36 AM   #8
X-Raym
Human being with feelings
 
Join Date: Apr 2013
Location: France
Posts: 9,875

@Xenakios
You are right,
so a checksum is relevant if a file needs to be compared against several files, or if one of the files is inaccessible (provided you already have its checksum beforehand).
Old 08-22-2019, 01:35 AM   #9
dsyrock
Human being with feelings
 
Join Date: Sep 2018
Location: China
Posts: 565

Quote:
Originally Posted by Xenakios View Post
For example if you need to do the comparison between 2 files just once, it would be just the same to just directly compare the files at the byte level. That could even be faster than calculating the checksums because you can do an early exit from the comparison algorithm at the first difference between the files, while checksums require to go through the entire file.
So I tried to write a script like this

Code:
-- msg is not a built-in; this helper prints to the REAPER console
local function msg(s) reaper.ShowConsoleMsg(tostring(s) .. "\n") end

-- Read a whole file in binary mode; returns nil if it cannot be opened
local function read_all(path)
    local file=io.open(path, "rb")
    if not file then return nil end
    local content=file:read("*all")
    file:close()
    return content
end

local content1=read_all("e:\\1.wav")
local content2=read_all("e:\\2.wav")

if not content1 or not content2 then msg("Could not open one of the files") return end

-- Files of different lengths can never be the same
local diff=content1:len()~=content2:len()

if not diff then
    for i=1, content1:len() do    --compare byte by byte, stop at the first mismatch
        if content1:byte(i)~=content2:byte(i) then
            diff=true
            break
        end
    end
end

if diff then msg("They are different") else msg("They are the same") end
Is it the right way to compare two files?

Last edited by dsyrock; 08-22-2019 at 02:18 AM.
Old 08-22-2019, 07:32 AM   #10
mpl
Human being with feelings
 
Join Date: Oct 2013
Location: Moscow, Russia
Posts: 3,960

Slightly easier check I suggested before:
Code:
-- msg is not a built-in; this helper prints to the REAPER console
local function msg(s) reaper.ShowConsoleMsg(tostring(s) .. "\n") end

local content1, content2

local file1=io.open("e:\\1.wav", "rb")
if file1 then 
  content1=file1:read("*all")
  file1:close()
end

local file2=io.open("e:\\2.wav", "rb")
if file2 then 
  content2=file2:read("*all")
  file2:close()
end

if not (content1 and content2) then msg("Could not open one of the files") return end

local len1, len2 = content1:len(), content2:len()
local same = len1==len2

-- Spot-check three random positions (1-based); passing this check
-- does not prove the files are identical
if same and len1>0 then
  for _=1, 3 do
    local test_byte = math.random(len1)
    if content1:byte(test_byte)~=content2:byte(test_byte) then
      same = false
      break
    end
  end
end

if same then msg("They are the same") else msg("They are different") end
Old 08-23-2019, 10:59 AM   #11
dsyrock
Human being with feelings
 
Join Date: Sep 2018
Location: China
Posts: 565

Quote:
Originally Posted by mpl View Post
Slightly easier check I suggested before:
Thanks again. It's much faster than comparing byte by byte. Is it safe enough? I mean, is it possible for two files to be very similar to each other and have the same size? In that situation, is comparing random bytes able to tell that they are different?
Old 08-23-2019, 11:00 AM   #12
X-Raym
Human being with feelings
 
Join Date: Apr 2013
Location: France
Posts: 9,875

@dsyrock
Of course that is possible.


A single ASCII letter is only 8 bits, so a one-letter difference in a txt document would almost certainly be missed if only random bits are sampled (even more so when you consider that different characters can have the same bits at some positions).


Bit by bit is the only way to be sure when the files have the same number of bits.
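
To put a rough number on it: if two files have the same size n and differ in d of their bytes, one random sample misses the difference with probability 1 - d/n, so k independent samples catch it with probability 1 - (1 - d/n)^k. A tiny sketch of that arithmetic (detection_probability is just an illustrative name, and it assumes the sampled positions are independent and uniformly distributed):

```lua
-- Probability that at least one of `samples` random byte positions
-- lands on one of the `differing_bytes` positions in a file of
-- `file_size` bytes (sampling with replacement).
local function detection_probability(file_size, differing_bytes, samples)
  local p_miss_one = 1 - differing_bytes / file_size
  return 1 - p_miss_one ^ samples
end
```

For a 1,000,000-byte file with a single differing byte and three samples, this comes out to about 3 in a million, which is why random spot checks cannot stand in for a full comparison.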
Old 08-24-2019, 01:22 PM   #13
Meo-Ada Mespotine
Human being with feelings
 
Join Date: May 2017
Location: Leipzig
Posts: 6,621

As X-Raym pointed out: the only way to be sure the result is exact is to check every single byte.
You could try using a command-line comparison tool instead, which might be faster than doing it in Lua.
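
A hedged sketch of the command-line route, untested inside REAPER: it assumes `cmp` is available on macOS/Linux and `fc` on Windows, both of which exit with status 0 when the files are identical. The function name same_by_command is made up for this example, and the simple quoting breaks if a path itself contains a double quote.

```lua
-- Returns true if the external tool reports identical files.
-- Returns false if the files differ OR if the tool fails for any
-- other reason (missing file, tool not installed).
local function same_by_command(path1, path2)
  -- package.config starts with the directory separator: "\" on Windows
  local is_windows = package.config:sub(1, 1) == "\\"
  local cmd
  if is_windows then
    cmd = string.format('fc /b "%s" "%s" > nul', path1, path2)
  else
    cmd = string.format('cmp -s "%s" "%s"', path1, path2)
  end
  -- Lua 5.2+ returns true on success; Lua 5.1 returns the status code
  local ok = os.execute(cmd)
  return ok == true or ok == 0
end
```

Launching a process has its own overhead, so this is only likely to pay off for large files.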
__________________
Use you/she/her.Ultraschall-Api Lua Api4Reaper - Donate, if you wish

On vacation for the time being...
Old 08-24-2019, 01:41 PM   #14
Xenakios
Human being with feelings
 
Join Date: Feb 2007
Location: Oulu, Finland
Posts: 8,062

Quote:
Originally Posted by X-Raym View Post
@dsyrock
Bit per bit is only way to be sure if file have same number of bits.
Or rather byte by byte; no sane code these days would do actual bit-by-bit comparisons unless there was some really good reason for it.