#Create content based on data from external web page


tender marsh
#

add_filter( "mwai_context_search", 'my_page_search', 5, 3 );

function my_page_search( $context, $query, $options = [] ) {

// Match the first http(s) URL in the user's message.
$site_url_regex = '/https?:\/\/\S+/i';

if ( ! preg_match( $site_url_regex, $query->get_message(), $matches ) ) {
  return null; // No URL in the message, so there is nothing to fetch.
}

$site_url = $matches[0]; // The first URL in the message.

// Quote the URL for the shell so that text typed in the chat cannot inject commands.
$command = "pandoc -s -r html " . escapeshellarg( $site_url ) . " -t plain --no-highlight";
$output = shell_exec( $command );

// shell_exec() returns null (or false) when the command fails or prints nothing.
if ( $output === null || $output === false ) {
    return null;
}

// AI Engine expects a type (for logging purposes) and the content (which will be used by AI).

$context["type"] = "sitesearch";
$context["content"] = $output;

// Debug info saved to PHP error.log (comment out if not required)
error_log( "DEBUG: final context" );
error_log( print_r( $context, true ) );

return $context;
}
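
As a use-case illustration: a visitor types something like "Please summarize https://example.com/pricing, thanks!", the filter pulls the URL out of the message, and pandoc converts that page to plain text for the AI to use as context. The sketch below isolates just the URL-extraction and command-building steps so they can be tried from the PHP command line without WordPress. The function names and the example URL are hypothetical, and stripping trailing punctuation is an addition that is not part of the snippet above.

```php
<?php
// Hypothetical helpers; only the regex idea and the pandoc flags come from the snippet above.

// Return the first http(s) URL in a message, or null when there is none.
function demo_extract_first_url( string $message ): ?string {
    if ( ! preg_match( '/https?:\/\/\S+/i', $message, $matches ) ) {
        return null;
    }
    // Drop trailing punctuation that usually belongs to the sentence, not the URL.
    return rtrim( $matches[0], '.,;:!?)' );
}

// Build the pandoc command with the URL quoted for the shell.
function demo_build_pandoc_command( string $url ): string {
    return 'pandoc -s -r html ' . escapeshellarg( $url ) . ' -t plain --no-highlight';
}

$url = demo_extract_first_url( 'Please summarize https://example.com/pricing, thanks!' );
echo $url, PHP_EOL;                              // https://example.com/pricing
echo demo_build_pandoc_command( $url ), PHP_EOL; // the command that would be passed to shell_exec()
```

A message without a URL makes `demo_extract_first_url()` return null, which corresponds to the filter returning null and leaving the AI without extra context.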

glacial pine
#

Could you please share a use-case example that helps explain what this code does? Thanks in advance.

#

Please disregard my previous message; I have already seen the example you shared in the other channel. It seems promising and I will test it. Thank you very much 👍

tranquil valley
#

Awesome!

plush swift
#

Any chance you have this on GitHub?

plush swift
#

Which file do you include this in?

tender marsh
#

You can install the sample code using a plugin like Code Snippets

tender marsh
#

Indeed, most shared hosting services do not permit the installation of programs such as Pandoc. The best alternative in this situation would be to use a dedicated instance, such as a managed VPS or cloud server. While they are somewhat more costly, they offer greater flexibility and stability. Pandoc can be used to extract data from websites and supply this information to a chatbot. However, if your sole requirement is to integrate the chat functionality with Google, there's no need for this tool. Simply employ the Google Search API instead.
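
A minimal sketch of the Google route, assuming the "Google Search API" mentioned above means Google's Custom Search JSON API (the `customsearch/v1` endpoint with `key`, `cx`, `q` and `num` parameters). The function names, the `websearch` type label and the `YOUR_API_KEY` / `YOUR_ENGINE_ID` placeholders are hypothetical; the filter name and `$query->get_message()` are taken from the script at the top of this thread.

```php
<?php
// Build the request URL for the Custom Search JSON API.
// $api_key and $engine_id (the "cx" value) are placeholders you must supply.
function demo_google_search_url( string $api_key, string $engine_id, string $query, int $num = 3 ): string {
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query( [
        'key' => $api_key,
        'cx'  => $engine_id,
        'q'   => $query,
        'num' => $num,
    ] );
}

// Turn a decoded API response into plain text that can be used as context.
function demo_google_results_to_text( array $response ): string {
    $blocks = [];
    foreach ( $response['items'] ?? [] as $item ) {
        $blocks[] = ( $item['title'] ?? '' ) . "\n" . ( $item['link'] ?? '' ) . "\n" . ( $item['snippet'] ?? '' );
    }
    return implode( "\n\n", $blocks );
}

// Only register the filter when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'mwai_context_search', function ( $context, $query, $options = [] ) {
        $url      = demo_google_search_url( 'YOUR_API_KEY', 'YOUR_ENGINE_ID', $query->get_message() );
        $response = wp_remote_get( $url );
        if ( is_wp_error( $response ) ) {
            return $context; // Leave the context untouched on network errors.
        }
        $data = json_decode( wp_remote_retrieve_body( $response ), true );
        if ( ! is_array( $data ) || empty( $data['items'] ) ) {
            return $context; // No results or unexpected response.
        }
        return [ 'type' => 'websearch', 'content' => demo_google_results_to_text( $data ) ];
    }, 10, 3 );
}
```

Because this only makes an HTTP request, it does not need pandoc or `shell_exec()`, which is why it also suits shared hosting.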

frosty skiff
#

How do I use this code?

tender marsh
#

You can install this code using a plugin like Code Snippets

tender marsh
#

Hi @slender cradle, yes, this is possible, but it does not work out of the box and would require some coding. One alternative would be to create a post/page with the contents of your document (using a plugin such as Mammoth .docx converter) and then use Embeddings so the bot can answer questions about it.

tender marsh
#

Actually, it works with DALL-E (OpenAI), not Midjourney, and it uses the same API key as the chatbot (the one you save in Settings). Yes, it's charged per image:
https://openai.com/pricing

slender cradle
#

Clear, thank you very much! I have now started using TTS: https://github.com/Stevethebeef/ElevenlabsTTSforAIEngine#readme. Do you think it's possible to exclude some of the generated sentences from the text-to-audio conversion? Let me provide more detail. I like the idea of always having the voice instead of the text generated by the system. However, in some cases (using the prompt context), I would like to exclude some parts of the response and show them only as text, not audio, during the chat conversation. So I would like to know whether it's possible to add instructions to the prompt that exclude parts, words, or sentences from the text-to-audio output.


versed hill
#

So my website data is hidden behind shortcodes. Could I use this to search my own website for data that appears on the front end? For example, I have a listing website. If a user asks "I'm looking for a camera in Ohio", could the bot scan my site for a camera in Ohio and return the link in the chat? Thanks

tender marsh
#

As far as I know, there is currently no API access to DALL-E 3.

silver summit
#

Yes, DALL-E 3 is not yet available.

tender marsh
#

Hello, everyone! We've noticed that a number of people are struggling with integrating AI Engine with data from external sites and Google searches. To address this, we've developed a web service that caters exactly to this need 🙂

#

This service is still in its early stages of development (beta, perhaps even alpha). If you're interested in being one of our testers, please visit https://bot.centralserver.com/en/ and request early access. Please make sure to provide the IP address of your website, as we need this information to grant access to the service. If we gather sufficient interest, we plan to build fully fledged API access in the future.

#

Upon receiving access, please review the documentation, available here:

tender marsh
#

Hi @desert fjord, other people have managed to register just fine. Anyway, I've disabled the CAPTCHA for a while so that you can register. If it still doesn't work for you, DM me and I will sort it out.

gray violet
#

All registered - am I able to add this to my own site?

gray violet
#

Yes, registered, accepted and logged in...

tender marsh
#

Good, so you're all set. Did you have a look at the documentation available in the Help tab?

gray violet
#

I did read through - it was this bit that confused me

#

Do I need an api key or will this work because you have my site ip?

tender marsh
#

At the moment, access is granted based on your IP address. So if you install the filter, you should be able to test it right away.

gray violet
#

Great, thank you. I'll give it a try

tender marsh
#

Hi @tepid zinc, access granted. Let me know how it goes!

tender marsh
#

DM me here if you need anything

round socket
#

Hey @tender marsh - just signed up! Super interested!

tender marsh
#

Got it @round socket Thanks!

#

@round socket sent you an email with a question about the IP address of your web server

earnest forge
#

Hi @tender marsh, I'm trying to read data from an external site (an API that returns JSON), but it does not work. The address I specify contains my API key and works as a normal search result.
Any suggestions on how to do it?
Thanks
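
The message above does not say what fails, so the cause cannot be determined from the thread. One possibility, offered only as an assumption, is that the script at the top of this thread tells pandoc to read its input as HTML (`-r html`), which is not a good fit for a JSON response. A hedged alternative is to decode the JSON in PHP and flatten it into plain "path: value" lines before using it as context. The function names and the sample data below are hypothetical.

```php
<?php
// Flatten a decoded JSON value into "path: value" lines.
function demo_flatten_json( $value, string $prefix = '' ): array {
    if ( ! is_array( $value ) ) {
        // Strings are used as-is; numbers, booleans and null are spelled out via json_encode().
        $text = is_string( $value ) ? $value : json_encode( $value );
        return [ ( $prefix === '' ? 'value' : $prefix ) . ': ' . $text ];
    }
    $lines = [];
    foreach ( $value as $key => $child ) {
        $path  = $prefix === '' ? (string) $key : $prefix . '.' . $key;
        $lines = array_merge( $lines, demo_flatten_json( $child, $path ) );
    }
    return $lines;
}

// Convert a raw JSON string into context text, or null if it is not valid JSON.
function demo_json_to_context_text( string $json ): ?string {
    $data = json_decode( $json, true );
    if ( json_last_error() !== JSON_ERROR_NONE ) {
        return null;
    }
    return implode( "\n", demo_flatten_json( $data ) );
}

echo demo_json_to_context_text( '{"city":"Columbus","items":[{"name":"Camera","price":120}]}' ), PHP_EOL;
// city: Columbus
// items.0.name: Camera
// items.0.price: 120
```

Inside the filter, the JSON string would come from an HTTP request (for example WordPress's `wp_remote_get()`), and the returned text would be assigned to `$context["content"]`. Keeping the API key in server-side code rather than typing it into the chat avoids exposing it in conversation logs.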

round socket
#

Hey, thanks! Your tool is super promising. I like the fact that, as a user, I don't have to care about the prompt because you handle it for me. Super clever. By the way, I'm very interested in your advice on how to improve reply performance the way you did, because for me the chatbot is super slow (10-15 seconds, using embeddings). Did you make any code improvements to the initial code to get this amazing result? Thanks for your help 🙏

tender marsh
#

Hi @round socket thanks for the feedback. Did you have a chance to test the Web Search (Google) and the Web Scraping (Analyze Site) features too?

I'm glad you liked the performance 😀 Our test environment uses GPT-4 and the response time is typically 1-2 seconds. With turbo, it's even faster. We do not make any changes to the AI Engine code nor use cache plugins.

The main advice I would give is to use a dedicated instance for your WP site. Stay away from shared hosting or cheap VPS as they come with a lot of performance bottlenecks that impact the plugin.

A dedicated cloud server nowadays starts at $40-$50/month and provides enough resources to run a few WP sites + AI Engine with embeddings. It's a reasonable investment if you're building a professional solution.

DISCLAIMER: We are a cloud service provider.

Other than this, it's important to use a fast web server, protect your site against brute-force attacks, keep the WP installation light (i.e. only the necessary plugins) and up-to-date.

Let me know if you need any help!

jaunty bronze
#

I'm late to the party. How can I get this to work?

west fulcrum
#

Hi guys, as my hosting provider does not allow pandoc on my shared hosting, I needed another, simpler solution for crawling HTML, which I have now coded myself. It's already working and is a game changer for me. But I want some error logging, to understand why it sometimes does not work. By the way, it runs without any additional external providers or libraries. Is there anyone with coding skills who would like to help optimize the code? If we can make this run smoothly, I am happy to share it with @warm spoke so he can maybe integrate it into the core of the Pro version.
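
The crawler described above was not posted in this message, so the following is not that code. It is only a minimal sketch of what a pandoc-free approach with error logging could look like, using core PHP functions for the HTML-to-text step and `wp_remote_get()` for fetching when WordPress is loaded. All function names are hypothetical, and the choice of tags to strip is an assumption.

```php
<?php
// Reduce an HTML document to readable plain text using only core PHP functions.
function demo_html_to_text( string $html ): string {
    // Remove script and style blocks, including their contents.
    $html = preg_replace( '#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html ) ?? $html;
    // Add a line break after common block-level closing tags so paragraphs stay separated.
    $html = preg_replace( '#</(p|div|h[1-6]|li|tr)>|<br\s*/?>#i', '$0' . "\n", $html ) ?? $html;
    $text = html_entity_decode( strip_tags( $html ), ENT_QUOTES | ENT_HTML5, 'UTF-8' );
    // Collapse runs of spaces and tabs, then whitespace around line breaks.
    $text = preg_replace( '/[ \t]+/', ' ', $text ) ?? $text;
    $text = preg_replace( '/\s*\n\s*/', "\n", $text ) ?? $text;
    return trim( $text );
}

// Fetch a page and log the reason whenever something goes wrong.
function demo_fetch_page_text( string $url ): ?string {
    if ( function_exists( 'wp_remote_get' ) ) {
        $response = wp_remote_get( $url, [ 'timeout' => 10 ] );
        if ( is_wp_error( $response ) ) {
            error_log( 'crawl failed: ' . $response->get_error_message() );
            return null;
        }
        $status = (int) wp_remote_retrieve_response_code( $response );
        if ( $status !== 200 ) {
            error_log( "crawl failed: HTTP $status for $url" );
            return null;
        }
        $html = wp_remote_retrieve_body( $response );
    } else {
        // Fallback for running outside WordPress, e.g. from the command line.
        $html = @file_get_contents( $url );
        if ( $html === false ) {
            error_log( "crawl failed: could not fetch $url" );
            return null;
        }
    }
    return demo_html_to_text( $html );
}

echo demo_html_to_text( '<h1>Title</h1><p>Hello &amp; welcome</p><script>alert(1)</script>' ), PHP_EOL;
```

Logging the HTTP status and the transport error separately makes it easier to tell a blocked request from a page that simply returned no usable text.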

plush swift
#

I just used the Perplexity.ai API through the OpenRouter integration to create an excellent chat with web access.

west fulcrum
#

Hey. I had a working crawling feature last year (without pandoc). Now it's not working anymore, maybe because of the changes Jordy made to the structure. I've tried a lot of things but can't get it running. Please share your insights so I can fix it, and then of course you can all use it. https://gist.github.com/VABELHAVT/d9a81ba54b90861585d20e7a9dc2ef78


alpine meteor
#

Hi, thanks for the code. I installed pandoc and it is functional, but the code returns a fatal error. So I changed it a bit to catch up with the update, but it is still not working:

add_filter("mwai_context_search", 'my_page_search', 10, 3);

function my_page_search($context, $query, $options = []) {
    // Adjusting to the new method of getting the last message
    if (method_exists($query, 'get_message')) {
        $lastMessage = $query->get_message();
    } else {
        // Fallback or error handling if the new method doesn't exist
        // This is a placeholder. You might want to handle this differently.
        $lastMessage = '';
    }

    // Check if the request contains a URL. Use a regex to extract it.
    $url_regex = '/https?:\/\/[\S]+/';
    if (preg_match($url_regex, $lastMessage, $matches)) {
        $url = $matches[0]; // The first URL in the message
        $command = "/pandoc -s -r html $url -t plain --no-highlight";
        $output = shell_exec($command);
    }

    // Check if command executed successfully.
    if (isset($output) && $output === null) {
        return null;
    }

    // AI Engine expects a type (for logging purposes) and the content (which will be used by AI).
    $context["type"] = "sitesearch";
    $context["content"] = isset($output) ? $output : '';

    return $context;
}
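
The message above does not include the text of the fatal error, so its cause cannot be determined from the thread. As a hedged starting point, the sketch below checks two preconditions the snippet depends on: that `shell_exec()` is not disabled in PHP, and that a `pandoc` executable is on the PATH of the user running PHP. The function names are hypothetical and the PATH check assumes a POSIX shell. Note also that the snippet calls `/pandoc`, an absolute path at the filesystem root, whereas the original script calls `pandoc`.

```php
<?php
// Check whether a function name appears in a comma-separated disable_functions list.
function demo_is_disabled( string $function, string $disable_list ): bool {
    $disabled = array_map( 'trim', explode( ',', $disable_list ) );
    return in_array( $function, $disabled, true );
}

// Report reasons why running pandoc through shell_exec() might fail on this server.
function demo_pandoc_diagnostics(): array {
    $problems = [];
    $list     = (string) ini_get( 'disable_functions' );
    if ( ! function_exists( 'shell_exec' ) || demo_is_disabled( 'shell_exec', $list ) ) {
        $problems[] = 'shell_exec() is disabled in this PHP configuration.';
        return $problems; // Nothing else can be checked without shell access.
    }
    // "command -v" prints the path of an executable found on PATH (POSIX shells).
    $path = trim( (string) shell_exec( 'command -v pandoc 2>/dev/null' ) );
    if ( $path === '' ) {
        $problems[] = 'pandoc was not found on the PATH of the user running PHP.';
    }
    return $problems;
}

foreach ( demo_pandoc_diagnostics() as $problem ) {
    echo $problem, PHP_EOL;
}
```

An empty result only means these two preconditions hold; the PHP error log remains the place to find the actual fatal error message.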

tender marsh
#

Hi @alpine meteor! I've updated the script that I posted initially to reflect the changes on the AI Engine plugin. Tested it on my WP site and it worked just fine. Take a look to see if it works for you as well.

plush swift
#

You might want to switch to Tavily and Jordy's search plugin.