#how can I remove duplicates of a certain object in vector?

66 messages · Page 1 of 1 (latest)

potent marten
#

I wish to remove duplicates of satellite objects in a vector. So each satellite object has an "ID" and alot of other stuff, but I want to ensure that all satellite objects in my vector have a unique ID number.
How can I accomplish this?

carmine marlinBOT
#

When your question is answered use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For more information use !howto ask.

oak ferry
#

You could use an unordered_map maybe?
I guess at some point you may want to access the object by ID.

PS: you could make that an ordered map too if you want.
But you pay a performance penalty and in the majority of implementations the actual order of the map matters not.

potent marten
#

what would would be the "key" of the map ?

oak ferry
#

The ID

#

It's not a big deal if this ID was repeated in the object itself.
In case that has you puzzled.

potent marten
#

yeah I found that a bit weird, the ID is already part of the object

#

what about removing duplicates manually? would that have a deterimental effect on performance ?

#

or is map better?

oak ferry
#

There are other ways.
For example, storing in a set rather than a vector.
But that requires some minor boilerplate code.

potent marten
#

but how would a set actually determine if an object is unique?

oak ferry
#

Through the boilerplate code.

potent marten
#

because in some case, satellite might have the same ID but different other parameters. In that case I want to remove that duplicate.

potent marten
oak ferry
#

Give me a a few minutes to come up with an example.

potent marten
#

How is this manual method? 1- Check if the count(overload "==" operator) of an object is greater than 1. 2- Then remove that object count many times until there is only 1 left.

oak ferry
#

The manual route would be via std::find_if(), I guess.

solid cove
#

you could also std::sort by ID, then std::unique to remove consecutive duplicates, then vec.erase to get rid of them

#

but if you do this a lot, just keeping it sorted / in a std::set might be better? idk

potent marten
#

ok thanks for help

oak ferry
#

Here it is. I am sure it can be bettered by some cleverer than I

#include <set>
#include <string>
#include <iostream>
#include <cassert>

class Satellite
{
    friend bool operator!=(const Satellite& lh, const Satellite& rh);

public:
    Satellite(int id, const std::string& name)
        : id(id)
        , name(name)
    {}

    void print() const
    {
        std::cout << id << " : " << name << std::endl;
    }

private:
    int id;
    std::string name;
};

bool operator!=(const Satellite& lh, const Satellite& rh)
{
    if (lh.id == rh.id)
        return false;

    if (lh.name == rh.name)
        return false;

    return true;
}


int main()
{
    std::set<Satellite, std::not_equal_to<Satellite>> sats;

    {
        // success: first insertion always works
        auto [i, inserted] = sats.emplace(1, "ABC");
        assert(inserted);
    }

    {
        // error: Satellite 1 already exists
        auto [i, inserted] = sats.emplace(1, "DEF");
        assert(!inserted);
    }

    {
        // error: Satellite "ABC" already exists
        auto [i, inserted] = sats.emplace(2, "ABC");
        assert(!inserted);
    }

    {
        // success: Satellite 2,"DEF" is unique
        auto [i, inserted] = sats.emplace(2, "DEF");
        assert(inserted);
    }

    for (const auto& s : sats)
    {
        s.print();
    }
}
potent marten
oak ferry
#

Useful?
Not useful?

potent marten
#

Useful !

#

previously I was doing this:

#
void CustomClass::removeDuplicates(QVector<QGeoSatelliteInfo>& satellites){
    for (int i{};i<satellites.size();++i){
        for (int k{};k<satellites.size();++k){
            if (i==k){
                continue;
            }
            else{
                if (satellites[i].satelliteIdentifier() == satellites[k].satelliteIdentifier()){
                    satellites.erase(satellites.begin()+std::min(i,k));
                }
            }
        }
    }
}``` but this is like 0(N^2) I think
solid cove
#

hmm, I'm not sure if not_equal is strictly forming a weak order

#

I'd just make an operator< and let it sort by ID

oak ferry
#

because in some case, satellite might have the same ID but different other parameters. In that case I want to remove that duplicate.

Perhaps I misunderstood @potent marten but I took this to mean that the multiple attributes must be unique.

solid cove
#

with != as "<" , x < y also means y < x

#

it might break the sorting

oak ferry
#

I didn't think sorting was essential.
Just uniqueness.

solid cove
#

well, set sorts

potent marten
solid cove
#

godbolt doesn't show that, but this is what fails with your version @oak ferry

    if (_Result) {
        _STL_VERIFY(!_Pred(_Right, _Left), "invalid comparator");
    }

inside std::set

oak ferry
#

Perhaps unordered_set would be better, or indeed use set with the default less predicate.

oak ferry
# potent marten what I meant to say was that any satellite object with the duplicate ID must be ...

So only the ID guarantees uniqueness.
I still think unordered_map would provide a better solution, because I can easily imagine you needing to get the Satellite by this ID at various points in your code.

Still, if you are certain that this is not the case, here is the amended code (hopefully correct this time @solid cove ?)

#include <set>
#include <string>
#include <iostream>
#include <cassert>

class Satellite
{
    friend bool operator<(const Satellite& lh, const Satellite& rh);

public:
    Satellite(int id, const std::string& name)
        : id(id)
        , name(name)
    {}

    void print() const
    {
        std::cout << id << " : " << name << std::endl;
    }

private:
    int id;
    std::string name;
};

bool operator<(const Satellite& lh, const Satellite& rh)
{
    return lh.id < rh.id;
}


int main()
{
    std::set<Satellite> sats;

    {
        // success: first insertion always works
        auto [i, inserted] = sats.emplace(1, "ABC");
        assert(inserted);
    }

    {
        // error: Satellite 1 already exists
        auto [i, inserted] = sats.emplace(1, "DEF");
        assert(!inserted);
    }

    {
        // success: Satellite 2,"ABC" is unique
        auto [i, inserted] = sats.emplace(2, "ABC");
        assert(inserted);
    }

    {
        // error: Satellite 2 already exists
        auto [i, inserted] = sats.emplace(2, "GHI");
        assert(!inserted);
    }


    for (const auto& s : sats)
    {
        s.print();
    }
}

Note that you can now have two Satellites with the same name, but no two Satellites can exist with the same ID.
Or: ID guarantees uniqueness.

sleek breach
#

there is std::unique

solid cove
#

yeah, I mentioned it above

#

but it also requires keeping things sorted

#

so might as well use an ordered container if you're gonna do it on each insert

potent marten
# potent marten ```cpp void CustomClass::removeDuplicates(QVector<QGeoSatelliteInfo>& satellites...

I tested this and this is bad, when satellites.erase() is called it can crash the program due to i being out of range. Then I tried this(which works well and based on @solid cove 's advice): cpp void CustomClass::removeDuplicates(QVector<QGeoSatelliteInfo>& satellites) { std::sort(satellites.begin(), satellites.end(), compare_satelliteID); auto last = std::unique(satellites.begin(), satellites.end(), duplicate_IDcheck); satellites.erase(last, satellites.end()); } but which method is the fastest: 1- Use dimwits initial idea of QMap(similar to std::map) with the key being the satellite ID. 2- Stick to this code snippet. 3- Use the std::set shown by dimwit ?

#

performance is really important here because as soon as data is parsed from the serial port it needs to remove duplicates and emit another signal

solid cove
#

sort + unique + erase lets you choose when you remove duplicates,
std::set will prevent duplicates on each insertion

#

a hash map (using the ID as the key) wouldn't need to sort

#

so I think a hash map or hash set should be faster

#

but you'd have to try it

oak ferry
#

I noticed you mentioned QMap.
If you are on Qt, consider QHash instead.

QMap == std::map
QHash == std::unordered_map

solid cove
#

for example I could imagine decoding a few things from the serial port and putting them into the vector,
and only deduplicating once you want to use the data / think there won't be more data / ???

#

so you don't pay for the lookup every time you decode a thing

#

but if you have a hash map, the lookup should be cheap

#

so it might not matter 🤷

potent marten
#

yeah makes sense

#

thanks for the advice

potent marten
#

okay the QHash method works quite well. Now do I close this post right?

#

how do I mark it as solved?

solid cove
#

¯_(ツ)_/¯

oak ferry
#

!solved

carmine marlinBOT
oak ferry
#

You can (should?) also right-click the thread in the tree-view, select tags, and add "solved".
@potent marten

potent marten
oak ferry
potent marten
#

!solved

carmine marlinBOT
#

[SOLVED] how can I remove duplicates of a certain object in vector?