# How to add more tags automatically to an NLP program in C++

sage scaffold

I am trying to create an NLP program in C++ for a school assignment. How do I make my program handle more tags automatically, without me adding every extra tag by hand?

Below is the code:

```cpp
#include <iostream>
#include <string>
#include <utility>  // std::pair, std::make_pair
#include <vector>
#include <sstream>

// Function to tokenize a string into words
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::istringstream iss(text);
    std::string token;

    while (iss >> token) {
        tokens.push_back(token);
    }

    return tokens;
}

// Function to perform basic part-of-speech tagging
std::vector<std::pair<std::string, std::string>> posTag(const std::vector<std::string>& tokens) {
    std::vector<std::pair<std::string, std::string>> posTags;

    for (const std::string& token : tokens) {
        std::string posTag = "UNKNOWN";
        if (token == "is" || token == "are" || token == "was" || token == "were") {
            posTag = "VERB";
        }
        else if (token == "a" || token == "an" || token == "the") {
            posTag = "ARTICLE";
        }
        else if (token == "of" || token == "in" || token == "on" || token == "by") {
            posTag = "PREPOSITION";
        }
        else if (token == "NLP" || token == "AI") {
            posTag = "NOUN";
        }

        posTags.push_back(std::make_pair(token, posTag));
    }

    return posTags;
}

int main() {
    std::string inputText = "Natural language processing is a subfield of artificial intelligence.";

    // Tokenize the input text
    std::vector<std::string> tokens = tokenize(inputText);

    // Perform basic part-of-speech tagging
    std::vector<std::pair<std::string, std::string>> posTags = posTag(tokens);

    // Print the results
    for (const auto& pair : posTags) {
        std::cout << "Token: " << pair.first << " | POS Tag: " << pair.second << std::endl;
    }

    return 0;
}
```

snow wedge (bot)

When your question is answered, use !solved to mark the question as resolved.

Remember to ask specific questions, provide necessary details, and reduce your question to its simplest form. For tips on how to ask a good question, run !howto ask.

sage scaffold

To highlight the part I mean: how can I extend the `posTag` function so it handles more words without me adding each one to it manually?

hollow pasture

have you heard of associative containers?

sage scaffold

no

hollow pasture

right

snow wedge (bot)
```cpp
template<
    class Key,
    class T,
    class Hash = std::hash<Key>,
    class KeyEqual = std::equal_to<Key>,
    class Allocator = std::allocator<
        std::pair<const Key, T> > >
class unordered_map;
// ... and 1 more
```

hollow pasture

click the link and go to the example section

it's like a vector, except it stores "key-value" pairs, and instead of using indices to access values, you use a key to access the mapped value

in your case, "is" would map to "VERB", "are" also to "VERB", "a" to "ARTICLE", "of" to "PREPOSITION", and so on

then you can query the map and directly get the tag associated with a word

you'll still have to add entries (key-value pairs) to the map manually as you go, but at least you'll never have to modify your function again because of that

also I strongly recommend you use an enum instead of a string to represent your different tags

```cpp
enum class Tag {
    Unknown,
    Verb,
    Preposition,
    Article,
    Noun
};
```